* [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
From: Anup Patel @ 2014-08-05  9:24 UTC
  To: kvmarm
  Cc: linux-arm-kernel, kvm, patches, marc.zyngier, christoffer.dall,
	will.deacon, ian.campbell, pranavkumar, Anup Patel

This patchset enables PMU virtualization in KVM ARM64. The
Guest can now directly use the PMU available on the host hardware.

The virtual PMU IRQ injection for Guest VCPUs is managed by a
small piece of code shared between KVM ARM and KVM ARM64. The
virtual PMU IRQ number depends on the Guest machine model, and
user space will provide it using the set device address VM ioctl.

The second-to-last patch of this series implements a full context
switch of PMU registers, i.e. all PMU registers are saved and
restored on every KVM world-switch.

The last patch implements a lazy context switch of PMU registers,
which is very similar to the lazy debug context switch.
(Refer http://lists.infradead.org/pipermail/linux-arm-kernel/2014-July/271040.html)

Also, we reserve the last PMU event counter for EL2 mode; it is
not accessible from Host or Guest EL1 mode. This reserved EL2-mode
PMU event counter can be used for profiling the KVM world-switch
and other EL2-mode functions.
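
Concretely (a sketch of the arithmetic only, not the actual hyp
code; the variable names are illustrative), with N implemented
event counters we program MDCR_EL2.HPMN = N - 1, and the
world-switch code derives the mask of EL0/EL1-visible counters as:

	/* n = PMCR_EL0.N, the implemented event counter count.
	 * EL0/EL1 keep counters 0..N-2 plus the cycle counter
	 * (bit 31); event counter N-1 is reserved for EL2. */
	u32 hpmn = n ? n - 1 : 0;
	u32 visible_mask = ((1U << hpmn) - 1) | (1U << 31);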

All testing has been done using KVMTOOL on X-Gene Mustang and the
Foundation v8 Model, for both AArch32 and AArch64 Guests.

Anup Patel (6):
  ARM64: Move PMU register related defines to asm/pmu.h
  ARM64: perf: Re-enable overflow interrupt from interrupt handler
  ARM: perf: Re-enable overflow interrupt from interrupt handler
  ARM/ARM64: KVM: Add common code PMU IRQ routing
  ARM64: KVM: Implement full context switch of PMU registers
  ARM64: KVM: Upgrade to lazy context switch of PMU registers

 arch/arm/include/asm/kvm_host.h   |    9 +
 arch/arm/include/uapi/asm/kvm.h   |    1 +
 arch/arm/kernel/perf_event_v7.c   |    8 +
 arch/arm/kvm/arm.c                |    6 +
 arch/arm/kvm/reset.c              |    4 +
 arch/arm64/include/asm/kvm_asm.h  |   39 +++-
 arch/arm64/include/asm/kvm_host.h |   12 ++
 arch/arm64/include/asm/pmu.h      |   44 +++++
 arch/arm64/include/uapi/asm/kvm.h |    1 +
 arch/arm64/kernel/asm-offsets.c   |    2 +
 arch/arm64/kernel/perf_event.c    |   40 +---
 arch/arm64/kvm/Kconfig            |    7 +
 arch/arm64/kvm/Makefile           |    1 +
 arch/arm64/kvm/hyp-init.S         |   15 ++
 arch/arm64/kvm/hyp.S              |  209 +++++++++++++++++++-
 arch/arm64/kvm/reset.c            |    4 +
 arch/arm64/kvm/sys_regs.c         |  385 +++++++++++++++++++++++++++++++++----
 include/kvm/arm_pmu.h             |   52 +++++
 virt/kvm/arm/pmu.c                |  105 ++++++++++
 19 files changed, 870 insertions(+), 74 deletions(-)
 create mode 100644 include/kvm/arm_pmu.h
 create mode 100644 virt/kvm/arm/pmu.c

-- 
1.7.9.5


* [RFC PATCH 1/6] ARM64: Move PMU register related defines to asm/pmu.h
From: Anup Patel @ 2014-08-05  9:24 UTC
  To: kvmarm
  Cc: linux-arm-kernel, kvm, patches, marc.zyngier, christoffer.dall,
	will.deacon, ian.campbell, pranavkumar, Anup Patel

To use the ARMv8 PMU related register defines from the KVM code,
we move the relevant definitions to the asm/pmu.h include file.

We also guard the C-only declarations with #ifndef __ASSEMBLY__ so
that asm/pmu.h can be included from assembly code.
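
As an illustrative sketch (host_pmu_nr_counters is a hypothetical
helper, not part of this patch), KVM C code can then derive the
number of implemented event counters without duplicating constants:

	#include <asm/pmu.h>

	static inline u32 host_pmu_nr_counters(void)
	{
		u64 pmcr;

		/* Read PMCR_EL0 and extract PMCR.N */
		asm volatile("mrs %0, pmcr_el0" : "=r" (pmcr));
		return (pmcr >> ARMV8_PMCR_N_SHIFT) & ARMV8_PMCR_N_MASK;
	}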

Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
---
 arch/arm64/include/asm/pmu.h   |   44 ++++++++++++++++++++++++++++++++++++++++
 arch/arm64/kernel/perf_event.c |   32 -----------------------------
 2 files changed, 44 insertions(+), 32 deletions(-)

diff --git a/arch/arm64/include/asm/pmu.h b/arch/arm64/include/asm/pmu.h
index e6f0878..f49cc72 100644
--- a/arch/arm64/include/asm/pmu.h
+++ b/arch/arm64/include/asm/pmu.h
@@ -19,6 +19,49 @@
 #ifndef __ASM_PMU_H
 #define __ASM_PMU_H
 
+/*
+ * Per-CPU PMCR: config reg
+ */
+#define ARMV8_PMCR_E		(1 << 0) /* Enable all counters */
+#define ARMV8_PMCR_P		(1 << 1) /* Reset all counters */
+#define ARMV8_PMCR_C		(1 << 2) /* Cycle counter reset */
+#define ARMV8_PMCR_D		(1 << 3) /* CCNT counts every 64th cpu cycle */
+#define ARMV8_PMCR_X		(1 << 4) /* Export to ETM */
+#define ARMV8_PMCR_DP		(1 << 5) /* Disable CCNT if non-invasive debug*/
+#define	ARMV8_PMCR_N_SHIFT	11	 /* Number of counters supported */
+#define	ARMV8_PMCR_N_MASK	0x1f
+#define	ARMV8_PMCR_MASK		0x3f	 /* Mask for writable bits */
+
+/*
+ * PMCNTEN: counters enable reg
+ */
+#define	ARMV8_CNTEN_MASK	0xffffffff	/* Mask for writable bits */
+
+/*
+ * PMINTEN: counters interrupt enable reg
+ */
+#define	ARMV8_INTEN_MASK	0xffffffff	/* Mask for writable bits */
+
+/*
+ * PMOVSR: counters overflow flag status reg
+ */
+#define	ARMV8_OVSR_MASK		0xffffffff	/* Mask for writable bits */
+#define	ARMV8_OVERFLOWED_MASK	ARMV8_OVSR_MASK
+
+/*
+ * PMXEVTYPER: Event selection reg
+ */
+#define	ARMV8_EVTYPE_MASK	0xc80003ff	/* Mask for writable bits */
+#define	ARMV8_EVTYPE_EVENT	0x3ff		/* Mask for EVENT bits */
+
+/*
+ * Event filters for PMUv3
+ */
+#define	ARMV8_EXCLUDE_EL1	(1 << 31)
+#define	ARMV8_EXCLUDE_EL0	(1 << 30)
+#define	ARMV8_INCLUDE_EL2	(1 << 27)
+
+#ifndef __ASSEMBLY__
 #ifdef CONFIG_HW_PERF_EVENTS
 
 /* The events for a given PMU register set. */
@@ -79,4 +122,5 @@ int armpmu_event_set_period(struct perf_event *event,
 			    int idx);
 
 #endif /* CONFIG_HW_PERF_EVENTS */
+#endif /* __ASSEMBLY__ */
 #endif /* __ASM_PMU_H */
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index baf5afb..47dfb8b 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -810,38 +810,6 @@ static const unsigned armv8_pmuv3_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 #define	ARMV8_IDX_TO_COUNTER(x)	\
 	(((x) - ARMV8_IDX_COUNTER0) & ARMV8_COUNTER_MASK)
 
-/*
- * Per-CPU PMCR: config reg
- */
-#define ARMV8_PMCR_E		(1 << 0) /* Enable all counters */
-#define ARMV8_PMCR_P		(1 << 1) /* Reset all counters */
-#define ARMV8_PMCR_C		(1 << 2) /* Cycle counter reset */
-#define ARMV8_PMCR_D		(1 << 3) /* CCNT counts every 64th cpu cycle */
-#define ARMV8_PMCR_X		(1 << 4) /* Export to ETM */
-#define ARMV8_PMCR_DP		(1 << 5) /* Disable CCNT if non-invasive debug*/
-#define	ARMV8_PMCR_N_SHIFT	11	 /* Number of counters supported */
-#define	ARMV8_PMCR_N_MASK	0x1f
-#define	ARMV8_PMCR_MASK		0x3f	 /* Mask for writable bits */
-
-/*
- * PMOVSR: counters overflow flag status reg
- */
-#define	ARMV8_OVSR_MASK		0xffffffff	/* Mask for writable bits */
-#define	ARMV8_OVERFLOWED_MASK	ARMV8_OVSR_MASK
-
-/*
- * PMXEVTYPER: Event selection reg
- */
-#define	ARMV8_EVTYPE_MASK	0xc80003ff	/* Mask for writable bits */
-#define	ARMV8_EVTYPE_EVENT	0x3ff		/* Mask for EVENT bits */
-
-/*
- * Event filters for PMUv3
- */
-#define	ARMV8_EXCLUDE_EL1	(1 << 31)
-#define	ARMV8_EXCLUDE_EL0	(1 << 30)
-#define	ARMV8_INCLUDE_EL2	(1 << 27)
-
 static inline u32 armv8pmu_pmcr_read(void)
 {
 	u32 val;
-- 
1.7.9.5


* [RFC PATCH 2/6] ARM64: perf: Re-enable overflow interrupt from interrupt handler
From: Anup Patel @ 2014-08-05  9:24 UTC
  To: kvmarm
  Cc: linux-arm-kernel, kvm, patches, marc.zyngier, christoffer.dall,
	will.deacon, ian.campbell, pranavkumar, Anup Patel

A hypervisor will typically mask the overflow interrupt before
forwarding it to Guest Linux, hence we need to re-enable the
overflow interrupt after clearing it in Guest Linux. This
re-enabling of the overflow interrupt is harmless in
non-virtualized scenarios.

Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Anup Patel <anup.patel@linaro.org>
---
 arch/arm64/kernel/perf_event.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 47dfb8b..19fb140 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -1076,6 +1076,14 @@ static irqreturn_t armv8pmu_handle_irq(int irq_num, void *dev)
 		if (!armv8pmu_counter_has_overflowed(pmovsr, idx))
 			continue;
 
+		/*
+		 * If we are running under a hypervisor such as KVM then
+		 * hypervisor will mask the interrupt before forwarding
+		 * it to Guest Linux hence re-enable interrupt for the
+		 * overflowed counter.
+		 */
+		armv8pmu_enable_intens(idx);
+
 		hwc = &event->hw;
 		armpmu_event_update(event, hwc, idx);
 		perf_sample_data_init(&data, 0, hwc->last_period);
-- 
1.7.9.5


* [RFC PATCH 3/6] ARM: perf: Re-enable overflow interrupt from interrupt handler
From: Anup Patel @ 2014-08-05  9:24 UTC
  To: kvmarm
  Cc: linux-arm-kernel, kvm, patches, marc.zyngier, christoffer.dall,
	will.deacon, ian.campbell, pranavkumar, Anup Patel

A hypervisor will typically mask the overflow interrupt before
forwarding it to Guest Linux, hence we need to re-enable the
overflow interrupt after clearing it in Guest Linux. This
re-enabling of the overflow interrupt is harmless in
non-virtualized scenarios.

Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Anup Patel <anup.patel@linaro.org>
---
 arch/arm/kernel/perf_event_v7.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index 1d37568..581cca5 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -1355,6 +1355,14 @@ static irqreturn_t armv7pmu_handle_irq(int irq_num, void *dev)
 		if (!armv7_pmnc_counter_has_overflowed(pmnc, idx))
 			continue;
 
+		/*
+		 * If we are running under a hypervisor such as KVM then
+		 * hypervisor will mask the interrupt before forwarding
+		 * it to Guest Linux hence re-enable interrupt for the
+		 * overflowed counter.
+		 */
+		armv7_pmnc_enable_intens(idx);
+
 		hwc = &event->hw;
 		armpmu_event_update(event);
 		perf_sample_data_init(&data, 0, hwc->last_period);
-- 
1.7.9.5


* [RFC PATCH 4/6] ARM/ARM64: KVM: Add common code PMU IRQ routing
From: Anup Patel @ 2014-08-05  9:24 UTC
  To: kvmarm
  Cc: linux-arm-kernel, kvm, patches, marc.zyngier, christoffer.dall,
	will.deacon, ian.campbell, pranavkumar, Anup Patel

This patch introduces common PMU IRQ routing code for
KVM ARM and KVM ARM64 under the virt/kvm/arm directory.

The virtual PMU IRQ number for each Guest VCPU will be
provided by user space using the set device address VM ioctl
with the following parameters (a user-space sketch follows the list):
dev_id = KVM_ARM_DEVICE_PMU
type = VCPU number
addr = PMU IRQ number for the VCPU
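
A rough sketch of the call, assuming a kvmtool-like setup (vm_fd,
vcpu_idx and pmu_irq are placeholders; the id encoding reuses the
existing KVM_ARM_DEVICE_*_SHIFT fields):

	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* Hypothetical helper: route PMU IRQ "pmu_irq" to VCPU "vcpu_idx" */
	static int set_vcpu_pmu_irq(int vm_fd, __u64 vcpu_idx, __u64 pmu_irq)
	{
		struct kvm_arm_device_addr dev_addr = {
			.id   = (KVM_ARM_DEVICE_PMU << KVM_ARM_DEVICE_ID_SHIFT) |
				(vcpu_idx << KVM_ARM_DEVICE_TYPE_SHIFT),
			.addr = pmu_irq,
		};

		return ioctl(vm_fd, KVM_ARM_SET_DEVICE_ADDR, &dev_addr);
	}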

The low-level context switching code of KVM ARM/ARM64 will
determine the state of the VCPU PMU IRQ and store it in the
"irq_pending" flag when saving the PMU context for the VCPU.

The common PMU IRQ routing code will then inject a virtual PMU
IRQ based on the "irq_pending" flag and clear the flag afterwards.

Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
---
 arch/arm/include/asm/kvm_host.h   |    9 ++++
 arch/arm/include/uapi/asm/kvm.h   |    1 +
 arch/arm/kvm/arm.c                |    6 +++
 arch/arm/kvm/reset.c              |    4 ++
 arch/arm64/include/asm/kvm_host.h |    9 ++++
 arch/arm64/include/uapi/asm/kvm.h |    1 +
 arch/arm64/kvm/Kconfig            |    7 +++
 arch/arm64/kvm/Makefile           |    1 +
 arch/arm64/kvm/reset.c            |    4 ++
 include/kvm/arm_pmu.h             |   52 ++++++++++++++++++
 virt/kvm/arm/pmu.c                |  105 +++++++++++++++++++++++++++++++++++++
 11 files changed, 199 insertions(+)
 create mode 100644 include/kvm/arm_pmu.h
 create mode 100644 virt/kvm/arm/pmu.c

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 193ceaf..a6a778f 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -24,6 +24,7 @@
 #include <asm/kvm_mmio.h>
 #include <asm/fpstate.h>
 #include <kvm/arm_arch_timer.h>
+#include <kvm/arm_pmu.h>
 
 #if defined(CONFIG_KVM_ARM_MAX_VCPUS)
 #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
@@ -53,6 +54,9 @@ struct kvm_arch {
 	/* Timer */
 	struct arch_timer_kvm	timer;
 
+	/* PMU */
+	struct pmu_kvm		pmu;
+
 	/*
 	 * Anything that is not used directly from assembly code goes
 	 * here.
@@ -118,8 +122,13 @@ struct kvm_vcpu_arch {
 
 	/* VGIC state */
 	struct vgic_cpu vgic_cpu;
+
+	/* Timer state */
 	struct arch_timer_cpu timer_cpu;
 
+	/* PMU state */
+	struct pmu_cpu pmu_cpu;
+
 	/*
 	 * Anything that is not used directly from assembly code goes
 	 * here.
diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index e6ebdd3..b21e6eb 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -75,6 +75,7 @@ struct kvm_regs {
 
 /* Supported device IDs */
 #define KVM_ARM_DEVICE_VGIC_V2		0
+#define KVM_ARM_DEVICE_PMU		1
 
 /* Supported VGIC address types  */
 #define KVM_VGIC_V2_ADDR_TYPE_DIST	0
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 3c82b37..04130f5 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -140,6 +140,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 	kvm_timer_init(kvm);
 
+	kvm_pmu_init(kvm);
+
 	/* Mark the initial VMID generation invalid */
 	kvm->arch.vmid_gen = 0;
 
@@ -567,6 +569,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
 			local_irq_enable();
 			kvm_timer_sync_hwstate(vcpu);
+			kvm_pmu_sync_hwstate(vcpu);
 			kvm_vgic_sync_hwstate(vcpu);
 			continue;
 		}
@@ -601,6 +604,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 *************************************************************/
 
 		kvm_timer_sync_hwstate(vcpu);
+		kvm_pmu_sync_hwstate(vcpu);
 		kvm_vgic_sync_hwstate(vcpu);
 
 		ret = handle_exit(vcpu, run, ret);
@@ -794,6 +798,8 @@ static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
 		if (!vgic_present)
 			return -ENXIO;
 		return kvm_vgic_addr(kvm, type, &dev_addr->addr, true);
+	case KVM_ARM_DEVICE_PMU:
+		return kvm_pmu_addr(kvm, type, &dev_addr->addr, true);
 	default:
 		return -ENODEV;
 	}
diff --git a/arch/arm/kvm/reset.c b/arch/arm/kvm/reset.c
index f558c07..42e6996 100644
--- a/arch/arm/kvm/reset.c
+++ b/arch/arm/kvm/reset.c
@@ -28,6 +28,7 @@
 #include <asm/kvm_coproc.h>
 
 #include <kvm/arm_arch_timer.h>
+#include <kvm/arm_pmu.h>
 
 /******************************************************************************
  * Cortex-A15 and Cortex-A7 Reset Values
@@ -79,5 +80,8 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 	/* Reset arch_timer context */
 	kvm_timer_vcpu_reset(vcpu, cpu_vtimer_irq);
 
+	/* Reset pmu context */
+	kvm_pmu_vcpu_reset(vcpu);
+
 	return 0;
 }
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 7592ddf..ae4cdb2 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -38,6 +38,7 @@
 
 #include <kvm/arm_vgic.h>
 #include <kvm/arm_arch_timer.h>
+#include <kvm/arm_pmu.h>
 
 #define KVM_VCPU_MAX_FEATURES 3
 
@@ -63,6 +64,9 @@ struct kvm_arch {
 
 	/* Timer */
 	struct arch_timer_kvm	timer;
+
+	/* PMU */
+	struct pmu_kvm		pmu;
 };
 
 #define KVM_NR_MEM_OBJS     40
@@ -109,8 +113,13 @@ struct kvm_vcpu_arch {
 
 	/* VGIC state */
 	struct vgic_cpu vgic_cpu;
+
+	/* Timer state */
 	struct arch_timer_cpu timer_cpu;
 
+	/* PMU state */
+	struct pmu_cpu pmu_cpu;
+
 	/*
 	 * Anything that is not used directly from assembly code goes
 	 * here.
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index e633ff8..a7fed09 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -69,6 +69,7 @@ struct kvm_regs {
 
 /* Supported device IDs */
 #define KVM_ARM_DEVICE_VGIC_V2		0
+#define KVM_ARM_DEVICE_PMU		1
 
 /* Supported VGIC address types  */
 #define KVM_VGIC_V2_ADDR_TYPE_DIST	0
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 8ba85e9..672213d 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -26,6 +26,7 @@ config KVM
 	select KVM_ARM_HOST
 	select KVM_ARM_VGIC
 	select KVM_ARM_TIMER
+	select KVM_ARM_PMU
 	---help---
 	  Support hosting virtualized guest machines.
 
@@ -60,4 +61,10 @@ config KVM_ARM_TIMER
 	---help---
 	  Adds support for the Architected Timers in virtual machines.
 
+config KVM_ARM_PMU
+	bool
+	depends on KVM_ARM_VGIC
+	---help---
+	  Adds support for the Performance Monitoring in virtual machines.
+
 endif # VIRTUALIZATION
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 72a9fd5..6be68bc 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -21,3 +21,4 @@ kvm-$(CONFIG_KVM_ARM_HOST) += guest.o reset.o sys_regs.o sys_regs_generic_v8.o
 
 kvm-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic.o
 kvm-$(CONFIG_KVM_ARM_TIMER) += $(KVM)/arm/arch_timer.o
+kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 70a7816..27f4041 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -24,6 +24,7 @@
 #include <linux/kvm.h>
 
 #include <kvm/arm_arch_timer.h>
+#include <kvm/arm_pmu.h>
 
 #include <asm/cputype.h>
 #include <asm/ptrace.h>
@@ -108,5 +109,8 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 	/* Reset timer */
 	kvm_timer_vcpu_reset(vcpu, cpu_vtimer_irq);
 
+	/* Reset pmu context */
+	kvm_pmu_vcpu_reset(vcpu);
+
 	return 0;
 }
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
new file mode 100644
index 0000000..1e3aa44
--- /dev/null
+++ b/include/kvm/arm_pmu.h
@@ -0,0 +1,52 @@
+/*
+ * Copyright (C) 2014 Linaro Ltd.
+ * Author: Anup Patel <anup.patel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#ifndef __ASM_ARM_KVM_PMU_H
+#define __ASM_ARM_KVM_PMU_H
+
+struct pmu_kvm {
+#ifdef CONFIG_KVM_ARM_PMU
+	/* PMU IRQ Numbers */
+	unsigned int		irq_num[CONFIG_KVM_ARM_MAX_VCPUS];
+#endif
+};
+
+struct pmu_cpu {
+#ifdef CONFIG_KVM_ARM_PMU
+	/* IRQ pending flag. Updated when registers are saved. */
+	u32			irq_pending;
+#endif
+};
+
+#ifdef CONFIG_KVM_ARM_PMU
+void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu);
+void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu);
+int kvm_pmu_addr(struct kvm *kvm, unsigned long cpu, u64 *irq, bool write);
+int kvm_pmu_init(struct kvm *kvm);
+#else
+static inline void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu) {}
+static inline void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu) {}
+static inline int kvm_pmu_addr(struct kvm *kvm,
+				unsigned long cpu, u64 *irq, bool write)
+{
+	return -ENXIO;
+}
+static inline int kvm_pmu_init(struct kvm *kvm) { return 0; }
+#endif
+
+#endif
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
new file mode 100644
index 0000000..98066ad
--- /dev/null
+++ b/virt/kvm/arm/pmu.c
@@ -0,0 +1,105 @@
+/*
+ * Copyright (C) 2014 Linaro Ltd.
+ * Author: Anup Patel <anup.patel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/cpu.h>
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+
+#include <kvm/arm_vgic.h>
+#include <kvm/arm_pmu.h>
+
+/**
+ * kvm_pmu_sync_hwstate - sync pmu state for cpu
+ * @vcpu: The vcpu pointer
+ *
+ * Inject virtual PMU IRQ if IRQ is pending for this cpu.
+ */
+void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu)
+{
+	struct pmu_cpu *pmu = &vcpu->arch.pmu_cpu;
+	struct pmu_kvm *kpmu = &vcpu->kvm->arch.pmu;
+
+	if (pmu->irq_pending) {
+		kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
+				    kpmu->irq_num[vcpu->vcpu_id],
+				    1);
+		pmu->irq_pending = 0;
+		return;
+	}
+}
+
+/**
+ * kvm_pmu_vcpu_reset - reset pmu state for cpu
+ * @vcpu: The vcpu pointer
+ *
+ */
+void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu)
+{
+	struct pmu_cpu *pmu = &vcpu->arch.pmu_cpu;
+
+	pmu->irq_pending = 0;
+}
+
+/**
+ * kvm_pmu_addr - set or get PMU VM IRQ numbers
+ * @kvm:   pointer to the vm struct
+ * @cpu:  cpu number
+ * @irq:  pointer to irq number value
+ * @write: if true set the irq number else read the irq number
+ *
+ * Set or get the PMU IRQ number for the given cpu number.
+ */
+int kvm_pmu_addr(struct kvm *kvm, unsigned long cpu, u64 *irq, bool write)
+{
+	struct pmu_kvm *kpmu = &kvm->arch.pmu;
+
+	if (CONFIG_KVM_ARM_MAX_VCPUS <= cpu)
+		return -ENODEV;
+
+	mutex_lock(&kvm->lock);
+
+	if (write) {
+		kpmu->irq_num[cpu] = *irq;
+	} else {
+		*irq = kpmu->irq_num[cpu];
+	}
+
+	mutex_unlock(&kvm->lock);
+
+	return 0;
+}
+
+/**
+ * kvm_pmu_init - Initialize global PMU state for a VM
+ * @kvm: pointer to the kvm struct
+ *
+ * Set all the PMU IRQ numbers to invalid value so that
+ * user space has to explicitly provide PMU IRQ numbers
+ * using set device address ioctl.
+ */
+int kvm_pmu_init(struct kvm *kvm)
+{
+	int i;
+	struct pmu_kvm *kpmu = &kvm->arch.pmu;
+
+	for (i = 0; i < CONFIG_KVM_ARM_MAX_VCPUS; i++) {
+		kpmu->irq_num[i] = UINT_MAX;
+	}
+
+	return 0;
+}
-- 
1.7.9.5


* [RFC PATCH 5/6] ARM64: KVM: Implement full context switch of PMU registers
From: Anup Patel @ 2014-08-05  9:24 UTC
  To: kvmarm
  Cc: linux-arm-kernel, kvm, patches, marc.zyngier, christoffer.dall,
	will.deacon, ian.campbell, pranavkumar, Anup Patel

This patch implements the following:
1. Save/restore all PMU registers for both Guest and Host on
KVM world-switch.
2. Reserve the last PMU event counter for performance analysis in
EL2 mode. To achieve this, we fake the number of event counters
available to the Guest by trapping PMCR_EL0 register accesses, and
we program MDCR_EL2.HPMN with the number of PMU event counters
minus one.
3. Clear and mask overflowed interrupts when saving the PMU context
for the Guest. The Guest will re-enable overflowed interrupts when
processing the virtual PMU interrupt.

With this patch the Guest has direct access to all PMU registers,
and we only trap-and-emulate PMCR_EL0 accesses in order to fake the
number of PMU event counters presented to the Guest.
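
For illustration (fake_guest_pmcr is a hypothetical helper that
mirrors what the access_pmcr trap handler below does on reads), a
host with six event counters shows the Guest PMCR_EL0.N == 5:

	static u64 fake_guest_pmcr(u64 host_pmcr)
	{
		u64 n = (host_pmcr >> ARMV8_PMCR_N_SHIFT) & ARMV8_PMCR_N_MASK;

		n = n ? n - 1 : 0;	/* hide the EL2-reserved counter */
		host_pmcr &= ~((u64)ARMV8_PMCR_N_MASK << ARMV8_PMCR_N_SHIFT);
		return host_pmcr | (n << ARMV8_PMCR_N_SHIFT);
	}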

Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
---
 arch/arm64/include/asm/kvm_asm.h |   36 ++++++--
 arch/arm64/kernel/asm-offsets.c  |    1 +
 arch/arm64/kvm/hyp-init.S        |   15 ++++
 arch/arm64/kvm/hyp.S             |  168 +++++++++++++++++++++++++++++++++++-
 arch/arm64/kvm/sys_regs.c        |  175 ++++++++++++++++++++++++++++----------
 5 files changed, 343 insertions(+), 52 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 993a7db..93be21f 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -53,15 +53,27 @@
 #define DBGWVR0_EL1	71	/* Debug Watchpoint Value Registers (0-15) */
 #define DBGWVR15_EL1	86
 #define MDCCINT_EL1	87	/* Monitor Debug Comms Channel Interrupt Enable Reg */
+#define PMCR_EL0	88	/* Performance Monitors Control Register */
+#define PMOVSSET_EL0	89	/* Performance Monitors Overflow Flag Status Set Register */
+#define PMCCNTR_EL0	90	/* Cycle Counter Register */
+#define PMSELR_EL0	91	/* Performance Monitors Event Counter Selection Register */
+#define PMEVCNTR0_EL0	92	/* Performance Monitors Event Counter Register (0-30) */
+#define PMEVTYPER0_EL0	93	/* Performance Monitors Event Type Register (0-30) */
+#define PMEVCNTR30_EL0	152
+#define PMEVTYPER30_EL0	153
+#define PMCNTENSET_EL0	154	/* Performance Monitors Count Enable Set Register */
+#define PMINTENSET_EL1	155	/* Performance Monitors Interrupt Enable Set Register */
+#define PMUSERENR_EL0	156	/* Performance Monitors User Enable Register */
+#define PMCCFILTR_EL0	157	/* Cycle Count Filter Register */
 
 /* 32bit specific registers. Keep them at the end of the range */
-#define	DACR32_EL2	88	/* Domain Access Control Register */
-#define	IFSR32_EL2	89	/* Instruction Fault Status Register */
-#define	FPEXC32_EL2	90	/* Floating-Point Exception Control Register */
-#define	DBGVCR32_EL2	91	/* Debug Vector Catch Register */
-#define	TEECR32_EL1	92	/* ThumbEE Configuration Register */
-#define	TEEHBR32_EL1	93	/* ThumbEE Handler Base Register */
-#define	NR_SYS_REGS	94
+#define	DACR32_EL2	158	/* Domain Access Control Register */
+#define	IFSR32_EL2	159	/* Instruction Fault Status Register */
+#define	FPEXC32_EL2	160	/* Floating-Point Exception Control Register */
+#define	DBGVCR32_EL2	161	/* Debug Vector Catch Register */
+#define	TEECR32_EL1	162	/* ThumbEE Configuration Register */
+#define	TEEHBR32_EL1	163	/* ThumbEE Handler Base Register */
+#define	NR_SYS_REGS	164
 
 /* 32bit mapping */
 #define c0_MPIDR	(MPIDR_EL1 * 2)	/* MultiProcessor ID Register */
@@ -83,6 +95,13 @@
 #define c6_IFAR		(c6_DFAR + 1)	/* Instruction Fault Address Register */
 #define c7_PAR		(PAR_EL1 * 2)	/* Physical Address Register */
 #define c7_PAR_high	(c7_PAR + 1)	/* PAR top 32 bits */
+#define c9_PMCR		(PMCR_EL0 * 2)	/* Performance Monitors Control Register */
+#define c9_PMOVSSET	(PMOVSSET_EL0 * 2)
+#define c9_PMCCNTR	(PMCCNTR_EL0 * 2)
+#define c9_PMSELR	(PMSELR_EL0 * 2)
+#define c9_PMCNTENSET	(PMCNTENSET_EL0 * 2)
+#define c9_PMINTENSET	(PMINTENSET_EL1 * 2)
+#define c9_PMUSERENR	(PMUSERENR_EL0 * 2)
 #define c10_PRRR	(MAIR_EL1 * 2)	/* Primary Region Remap Register */
 #define c10_NMRR	(c10_PRRR + 1)	/* Normal Memory Remap Register */
 #define c12_VBAR	(VBAR_EL1 * 2)	/* Vector Base Address Register */
@@ -93,6 +112,9 @@
 #define c10_AMAIR0	(AMAIR_EL1 * 2)	/* Aux Memory Attr Indirection Reg */
 #define c10_AMAIR1	(c10_AMAIR0 + 1)/* Aux Memory Attr Indirection Reg */
 #define c14_CNTKCTL	(CNTKCTL_EL1 * 2) /* Timer Control Register (PL1) */
+#define c14_PMEVCNTR0	(PMEVCNTR0_EL0 * 2)
+#define c14_PMEVTYPR0	(PMEVTYPER0_EL0 * 2)
+#define c14_PMCCFILTR	(PMCCFILTR_EL0 * 2)
 
 #define cp14_DBGDSCRext	(MDSCR_EL1 * 2)
 #define cp14_DBGBCR0	(DBGBCR0_EL1 * 2)
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index ae73a83..053dc3e 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -140,6 +140,7 @@ int main(void)
   DEFINE(VGIC_CPU_NR_LR,	offsetof(struct vgic_cpu, nr_lr));
   DEFINE(KVM_VTTBR,		offsetof(struct kvm, arch.vttbr));
   DEFINE(KVM_VGIC_VCTRL,	offsetof(struct kvm, arch.vgic.vctrl_base));
+  DEFINE(VCPU_PMU_IRQ_PENDING,	offsetof(struct kvm_vcpu, arch.pmu_cpu.irq_pending));
 #endif
 #ifdef CONFIG_ARM64_CPU_SUSPEND
   DEFINE(CPU_SUSPEND_SZ,	sizeof(struct cpu_suspend_ctx));
diff --git a/arch/arm64/kvm/hyp-init.S b/arch/arm64/kvm/hyp-init.S
index d968796..b45556e 100644
--- a/arch/arm64/kvm/hyp-init.S
+++ b/arch/arm64/kvm/hyp-init.S
@@ -20,6 +20,7 @@
 #include <asm/assembler.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/pmu.h>
 
 	.text
 	.pushsection	.hyp.idmap.text, "ax"
@@ -107,6 +108,20 @@ target: /* We're now in the trampoline code, switch page tables */
 	kern_hyp_va	x3
 	msr	vbar_el2, x3
 
+	/* Reserve last PMU event counter for EL2 */
+	mov	x4, #0
+	mrs	x5, id_aa64dfr0_el1
+	ubfx	x5, x5, #8, #4		// Extract PMUver
+	cmp	x5, #1			// Must be PMUv3 else skip
+	bne	1f
+	mrs	x5, pmcr_el0
+	ubfx	x5, x5, #ARMV8_PMCR_N_SHIFT, #5	// Number of event counters
+	cmp	x5, #0			// Skip if no event counters
+	beq	1f
+	sub	x4, x5, #1
+1:
+	msr	mdcr_el2, x4
+
 	/* Hello, World! */
 	eret
 ENDPROC(__kvm_hyp_init)
diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index d032132..6b41c01 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -23,6 +23,7 @@
 #include <asm/asm-offsets.h>
 #include <asm/debug-monitors.h>
 #include <asm/fpsimdmacros.h>
+#include <asm/pmu.h>
 #include <asm/kvm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_arm.h>
@@ -426,6 +427,77 @@ __kvm_hyp_code_start:
 	str	x21, [x2, #CPU_SYSREG_OFFSET(MDCCINT_EL1)]
 .endm
 
+.macro save_pmu, is_vcpu_pmu
+	// x2: base address for cpu context
+	// x3: mask of counters allowed in EL0 & EL1
+	// x4: number of event counters allowed in EL0 & EL1
+
+	mrs	x6, id_aa64dfr0_el1
+	ubfx	x5, x6, #8, #4		// Extract PMUver
+	cmp	x5, #1			// Must be PMUv3 else skip
+	bne	1f
+
+	mrs	x4, pmcr_el0		// Save PMCR_EL0
+	str	x4, [x2, #CPU_SYSREG_OFFSET(PMCR_EL0)]
+
+	and	x5, x4, #~(ARMV8_PMCR_E)// Clear PMCR_EL0.E
+	msr	pmcr_el0, x5		// This will stop all counters
+
+	mov	x3, #0
+	ubfx	x4, x4, #ARMV8_PMCR_N_SHIFT, #5	// Number of event counters
+	cmp	x4, #0			// Skip if no event counters
+	beq	2f
+	sub	x4, x4, #1		// Last event counter is reserved
+	mov	x3, #1
+	lsl	x3, x3, x4
+	sub	x3, x3, #1
+2:	orr	x3, x3, #(1 << 31)	// Mask of event counters
+
+	mrs	x5, pmovsset_el0	// Save PMOVSSET_EL0
+	and	x5, x5, x3
+	str	x5, [x2, #CPU_SYSREG_OFFSET(PMOVSSET_EL0)]
+
+	.if \is_vcpu_pmu == 1
+	msr	pmovsclr_el0, x5	// Clear HW interrupt line
+	msr	pmintenclr_el1, x5	// Mask irq for overflowed counters
+	str	w5, [x0, #VCPU_PMU_IRQ_PENDING] // Update irq pending flag
+	.endif
+
+	mrs	x5, pmccntr_el0		// Save PMCCNTR_EL0
+	str	x5, [x2, #CPU_SYSREG_OFFSET(PMCCNTR_EL0)]
+
+	mrs	x5, pmselr_el0		// Save PMSELR_EL0
+	str	x5, [x2, #CPU_SYSREG_OFFSET(PMSELR_EL0)]
+
+	lsl	x5, x4, #4
+	add	x5, x5, #CPU_SYSREG_OFFSET(PMEVCNTR0_EL0)
+	add	x5, x2, x5
+3:	cmp	x4, #0
+	beq	4f
+	sub	x4, x4, #1
+	msr	pmselr_el0, x4
+	mrs	x6, pmxevcntr_el0	// Save PMEVCNTR<n>_EL0
+	mrs	x7, pmxevtyper_el0	// Save PMEVTYPER<n>_EL0
+	stp	x6, x7, [x5, #-16]!
+	b	3b
+4:
+	mrs	x5, pmcntenset_el0	// Save PMCNTENSET_EL0
+	and	x5, x5, x3
+	str	x5, [x2, #CPU_SYSREG_OFFSET(PMCNTENSET_EL0)]
+
+	mrs	x5, pmintenset_el1	// Save PMINTENSET_EL1
+	and	x5, x5, x3
+	str	x5, [x2, #CPU_SYSREG_OFFSET(PMINTENSET_EL1)]
+
+	mrs	x5, pmuserenr_el0	// Save PMUSERENR_EL0
+	and	x5, x5, x3
+	str	x5, [x2, #CPU_SYSREG_OFFSET(PMUSERENR_EL0)]
+
+	mrs	x5, pmccfiltr_el0	// Save PMCCFILTR_EL0
+	str	x5, [x2, #CPU_SYSREG_OFFSET(PMCCFILTR_EL0)]
+1:
+.endm
+
 .macro restore_sysregs
 	// x2: base address for cpu context
 	// x3: tmp register
@@ -659,6 +731,72 @@ __kvm_hyp_code_start:
 	msr	mdccint_el1, x21
 .endm
 
+.macro restore_pmu
+	// x2: base address for cpu context
+	// x3: mask of counters allowed in EL0 & EL1
+	// x4: number of event counters allowed in EL0 & EL1
+
+	mrs	x6, id_aa64dfr0_el1
+	ubfx	x5, x6, #8, #4		// Extract PMUver
+	cmp	x5, #1			// Must be PMUv3 else skip
+	bne	1f
+
+	mov	x3, #0
+	mrs	x4, pmcr_el0
+	ubfx	x4, x4, #ARMV8_PMCR_N_SHIFT, #5	// Number of event counters
+	cmp	x4, #0			// Skip if no event counters
+	beq	2f
+	sub	x4, x4, #1		// Last event counter is reserved
+	mov	x3, #1
+	lsl	x3, x3, x4
+	sub	x3, x3, #1
+2:	orr	x3, x3, #(1 << 31)	// Mask of event counters
+
+	ldr	x5, [x2, #CPU_SYSREG_OFFSET(PMCCFILTR_EL0)]
+	msr	pmccfiltr_el0, x5	// Restore PMCCFILTR_EL0
+
+	ldr	x5, [x2, #CPU_SYSREG_OFFSET(PMUSERENR_EL0)]
+	and	x5, x5, x3
+	msr	pmuserenr_el0, x5	// Restore PMUSERENR_EL0
+
+	msr	pmintenclr_el1, x3
+	ldr	x5, [x2, #CPU_SYSREG_OFFSET(PMINTENSET_EL1)]
+	and	x5, x5, x3
+	msr	pmintenset_el1, x5	// Restore PMINTENSET_EL1
+
+	msr	pmcntenclr_el0, x3
+	ldr	x5, [x2, #CPU_SYSREG_OFFSET(PMCNTENSET_EL0)]
+	and	x5, x5, x3
+	msr	pmcntenset_el0, x5	// Restore PMCNTENSET_EL0
+
+	lsl	x5, x4, #4
+	add	x5, x5, #CPU_SYSREG_OFFSET(PMEVCNTR0_EL0)
+	add	x5, x2, x5
+3:	cmp	x4, #0
+	beq	4f
+	sub	x4, x4, #1
+	ldp	x6, x7, [x5, #-16]!
+	msr	pmselr_el0, x4
+	msr	pmxevcntr_el0, x6	// Restore PMEVCNTR<n>_EL0
+	msr	pmxevtyper_el0, x7	// Restore PMEVTYPER<n>_EL0
+	b	3b
+4:
+	ldr	x5, [x2, #CPU_SYSREG_OFFSET(PMSELR_EL0)]
+	msr	pmselr_el0, x5		// Restore PMSELR_EL0
+
+	ldr	x5, [x2, #CPU_SYSREG_OFFSET(PMCCNTR_EL0)]
+	msr	pmccntr_el0, x5		// Restore PMCCNTR_EL0
+
+	msr	pmovsclr_el0, x3
+	ldr	x5, [x2, #CPU_SYSREG_OFFSET(PMOVSSET_EL0)]
+	and	x5, x5, x3
+	msr	pmovsset_el0, x5	// Restore PMOVSSET_EL0
+
+	ldr	x5, [x2, #CPU_SYSREG_OFFSET(PMCR_EL0)]
+	msr	pmcr_el0, x5		// Restore PMCR_EL0
+1:
+.endm
+
 .macro skip_32bit_state tmp, target
 	// Skip 32bit state if not needed
 	mrs	\tmp, hcr_el2
@@ -775,8 +913,10 @@ __kvm_hyp_code_start:
 	msr	hstr_el2, x2
 
 	mrs	x2, mdcr_el2
+	and	x3, x2, #MDCR_EL2_HPME
 	and	x2, x2, #MDCR_EL2_HPMN_MASK
-	orr	x2, x2, #(MDCR_EL2_TPM | MDCR_EL2_TPMCR)
+	orr	x2, x2, x3
+	orr	x2, x2, #MDCR_EL2_TPMCR
 	orr	x2, x2, #(MDCR_EL2_TDRA | MDCR_EL2_TDOSA)
 
 	// Check for KVM_ARM64_DEBUG_DIRTY, and set debug to trap
@@ -795,7 +935,9 @@ __kvm_hyp_code_start:
 	msr	hstr_el2, xzr
 
 	mrs	x2, mdcr_el2
+	and	x3, x2, #MDCR_EL2_HPME
 	and	x2, x2, #MDCR_EL2_HPMN_MASK
+	orr	x2, x2, x3
 	msr	mdcr_el2, x2
 .endm
 
@@ -977,6 +1119,18 @@ __restore_debug:
 	restore_debug
 	ret
 
+__save_pmu_host:
+	save_pmu 0
+	ret
+
+__save_pmu_guest:
+	save_pmu 1
+	ret
+
+__restore_pmu:
+	restore_pmu
+	ret
+
 __save_fpsimd:
 	save_fpsimd
 	ret
@@ -1005,6 +1159,9 @@ ENTRY(__kvm_vcpu_run)
 	kern_hyp_va x2
 
 	save_host_regs
+
+	bl __save_pmu_host
+
 	bl __save_fpsimd
 	bl __save_sysregs
 
@@ -1027,6 +1184,9 @@ ENTRY(__kvm_vcpu_run)
 	bl	__restore_debug
 1:
 	restore_guest_32bit_state
+
+	bl __restore_pmu
+
 	restore_guest_regs
 
 	// That's it, no more messing around.
@@ -1040,12 +1200,16 @@ __kvm_vcpu_return:
 	add	x2, x0, #VCPU_CONTEXT
 
 	save_guest_regs
+
+	bl __save_pmu_guest
+
 	bl __save_fpsimd
 	bl __save_sysregs
 
 	skip_debug_state x3, 1f
 	bl	__save_debug
 1:
+
 	save_guest_32bit_state
 
 	save_timer_state
@@ -1068,6 +1232,8 @@ __kvm_vcpu_return:
 	str	xzr, [x0, #VCPU_DEBUG_FLAGS]
 	bl	__restore_debug
 1:
+	bl __restore_pmu
+
 	restore_host_regs
 
 	mov	x0, x1
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 4a89ca2..081f95e 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -23,6 +23,7 @@
 #include <linux/mm.h>
 #include <linux/kvm_host.h>
 #include <linux/uaccess.h>
+#include <linux/perf_event.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_host.h>
 #include <asm/kvm_emulate.h>
@@ -31,6 +32,7 @@
 #include <asm/cacheflush.h>
 #include <asm/cputype.h>
 #include <asm/debug-monitors.h>
+#include <asm/pmu.h>
 #include <trace/events/kvm.h>
 
 #include "sys_regs.h"
@@ -164,6 +166,45 @@ static bool access_sctlr(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+/* PMCR_EL0 accessor. Only called as long as MDCR_EL2.TPMCR is set. */
+static bool access_pmcr(struct kvm_vcpu *vcpu,
+			const struct sys_reg_params *p,
+			const struct sys_reg_desc *r)
+{
+	unsigned long val, n;
+
+	if (p->is_write) {
+		/* Only update writeable bits of PMCR */
+		if (!p->is_aarch32)
+			val = vcpu_sys_reg(vcpu, r->reg);
+		else
+			val = vcpu_cp15(vcpu, r->reg);
+		val &= ~ARMV8_PMCR_MASK;
+		val |= *vcpu_reg(vcpu, p->Rt) & ARMV8_PMCR_MASK;
+		if (!p->is_aarch32)
+			vcpu_sys_reg(vcpu, r->reg) = val;
+		else
+			vcpu_cp15(vcpu, r->reg) = val;
+	} else {
+		/*
+		 * We reserve the last event counter for EL2-mode
+		 * performance analysis, hence we show one less
+		 * event counter to the guest.
+		 */
+		if (!p->is_aarch32)
+			val = vcpu_sys_reg(vcpu, r->reg);
+		else
+			val = vcpu_cp15(vcpu, r->reg);
+		n = (val >> ARMV8_PMCR_N_SHIFT) & ARMV8_PMCR_N_MASK;
+		n = (n) ? n - 1 : 0;
+		val &= ~(ARMV8_PMCR_N_MASK << ARMV8_PMCR_N_SHIFT);
+		val |= (n & ARMV8_PMCR_N_MASK) << ARMV8_PMCR_N_SHIFT;
+		*vcpu_reg(vcpu, p->Rt) = val;
+	}
+
+	return true;
+}
+
 static bool trap_raz_wi(struct kvm_vcpu *vcpu,
 			const struct sys_reg_params *p,
 			const struct sys_reg_desc *r)
@@ -272,6 +313,20 @@ static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
 	{ Op0(0b10), Op1(0b000), CRn(0b0000), CRm((n)), Op2(0b111),	\
 	  trap_debug_regs, reset_val, (DBGWCR0_EL1 + (n)), 0 }
 
+/* Macro to expand the PMEVCNTRn_EL0 register */
+#define PMU_PMEVCNTR_EL0(n)						\
+	/* PMEVCNTRn_EL0 */						\
+	{ Op0(0b11), Op1(0b011), CRn(0b1110),				\
+	  CRm((0b1000 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)),		\
+	  NULL, reset_val, (PMEVCNTR0_EL0 + (n)*2), 0 }
+
+/* Macro to expand the PMEVTYPERn_EL0 register */
+#define PMU_PMEVTYPER_EL0(n)						\
+	/* PMEVTYPERn_EL0 */						\
+	{ Op0(0b11), Op1(0b011), CRn(0b1110),				\
+	  CRm((0b1100 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)),		\
+	  NULL, reset_val, (PMEVTYPER0_EL0 + (n)*2), 0 }
+
 /*
  * Architected system registers.
  * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -408,10 +463,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 
 	/* PMINTENSET_EL1 */
 	{ Op0(0b11), Op1(0b000), CRn(0b1001), CRm(0b1110), Op2(0b001),
-	  trap_raz_wi },
-	/* PMINTENCLR_EL1 */
-	{ Op0(0b11), Op1(0b000), CRn(0b1001), CRm(0b1110), Op2(0b010),
-	  trap_raz_wi },
+	  NULL, reset_val, PMINTENSET_EL1, 0 },
 
 	/* MAIR_EL1 */
 	{ Op0(0b11), Op1(0b000), CRn(0b1010), CRm(0b0010), Op2(0b000),
@@ -440,43 +492,22 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 
 	/* PMCR_EL0 */
 	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b000),
-	  trap_raz_wi },
+	  access_pmcr, reset_val, PMCR_EL0, 0 },
 	/* PMCNTENSET_EL0 */
 	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b001),
-	  trap_raz_wi },
-	/* PMCNTENCLR_EL0 */
-	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b010),
-	  trap_raz_wi },
-	/* PMOVSCLR_EL0 */
-	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b011),
-	  trap_raz_wi },
-	/* PMSWINC_EL0 */
-	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b100),
-	  trap_raz_wi },
+	  NULL, reset_val, PMCNTENSET_EL0, 0 },
 	/* PMSELR_EL0 */
 	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b101),
-	  trap_raz_wi },
-	/* PMCEID0_EL0 */
-	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b110),
-	  trap_raz_wi },
-	/* PMCEID1_EL0 */
-	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b111),
-	  trap_raz_wi },
+	  NULL, reset_val, PMSELR_EL0, 0 },
 	/* PMCCNTR_EL0 */
 	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b000),
-	  trap_raz_wi },
-	/* PMXEVTYPER_EL0 */
-	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b001),
-	  trap_raz_wi },
-	/* PMXEVCNTR_EL0 */
-	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b010),
-	  trap_raz_wi },
+	  NULL, reset_val, PMCCNTR_EL0, 0 },
 	/* PMUSERENR_EL0 */
 	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1110), Op2(0b000),
-	  trap_raz_wi },
+	  NULL, reset_val, PMUSERENR_EL0, 0 },
 	/* PMOVSSET_EL0 */
 	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1110), Op2(0b011),
-	  trap_raz_wi },
+	  NULL, reset_val, PMOVSSET_EL0, 0 },
 
 	/* TPIDR_EL0 */
 	{ Op0(0b11), Op1(0b011), CRn(0b1101), CRm(0b0000), Op2(0b010),
@@ -485,6 +516,74 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ Op0(0b11), Op1(0b011), CRn(0b1101), CRm(0b0000), Op2(0b011),
 	  NULL, reset_unknown, TPIDRRO_EL0 },
 
+	/* PMEVCNTRn_EL0 */
+	PMU_PMEVCNTR_EL0(0),
+	PMU_PMEVCNTR_EL0(1),
+	PMU_PMEVCNTR_EL0(2),
+	PMU_PMEVCNTR_EL0(3),
+	PMU_PMEVCNTR_EL0(4),
+	PMU_PMEVCNTR_EL0(5),
+	PMU_PMEVCNTR_EL0(6),
+	PMU_PMEVCNTR_EL0(7),
+	PMU_PMEVCNTR_EL0(8),
+	PMU_PMEVCNTR_EL0(9),
+	PMU_PMEVCNTR_EL0(10),
+	PMU_PMEVCNTR_EL0(11),
+	PMU_PMEVCNTR_EL0(12),
+	PMU_PMEVCNTR_EL0(13),
+	PMU_PMEVCNTR_EL0(14),
+	PMU_PMEVCNTR_EL0(15),
+	PMU_PMEVCNTR_EL0(16),
+	PMU_PMEVCNTR_EL0(17),
+	PMU_PMEVCNTR_EL0(18),
+	PMU_PMEVCNTR_EL0(19),
+	PMU_PMEVCNTR_EL0(20),
+	PMU_PMEVCNTR_EL0(21),
+	PMU_PMEVCNTR_EL0(22),
+	PMU_PMEVCNTR_EL0(23),
+	PMU_PMEVCNTR_EL0(24),
+	PMU_PMEVCNTR_EL0(25),
+	PMU_PMEVCNTR_EL0(26),
+	PMU_PMEVCNTR_EL0(27),
+	PMU_PMEVCNTR_EL0(28),
+	PMU_PMEVCNTR_EL0(29),
+	PMU_PMEVCNTR_EL0(30),
+	/* PMEVTYPERn_EL0 */
+	PMU_PMEVTYPER_EL0(0),
+	PMU_PMEVTYPER_EL0(1),
+	PMU_PMEVTYPER_EL0(2),
+	PMU_PMEVTYPER_EL0(3),
+	PMU_PMEVTYPER_EL0(4),
+	PMU_PMEVTYPER_EL0(5),
+	PMU_PMEVTYPER_EL0(6),
+	PMU_PMEVTYPER_EL0(7),
+	PMU_PMEVTYPER_EL0(8),
+	PMU_PMEVTYPER_EL0(9),
+	PMU_PMEVTYPER_EL0(10),
+	PMU_PMEVTYPER_EL0(11),
+	PMU_PMEVTYPER_EL0(12),
+	PMU_PMEVTYPER_EL0(13),
+	PMU_PMEVTYPER_EL0(14),
+	PMU_PMEVTYPER_EL0(15),
+	PMU_PMEVTYPER_EL0(16),
+	PMU_PMEVTYPER_EL0(17),
+	PMU_PMEVTYPER_EL0(18),
+	PMU_PMEVTYPER_EL0(19),
+	PMU_PMEVTYPER_EL0(20),
+	PMU_PMEVTYPER_EL0(21),
+	PMU_PMEVTYPER_EL0(22),
+	PMU_PMEVTYPER_EL0(23),
+	PMU_PMEVTYPER_EL0(24),
+	PMU_PMEVTYPER_EL0(25),
+	PMU_PMEVTYPER_EL0(26),
+	PMU_PMEVTYPER_EL0(27),
+	PMU_PMEVTYPER_EL0(28),
+	PMU_PMEVTYPER_EL0(29),
+	PMU_PMEVTYPER_EL0(30),
+	/* PMCCFILTR_EL0 */
+	{ Op0(0b11), Op1(0b011), CRn(0b1110), CRm(0b1111), Op2(0b111),
+	  NULL, reset_val, PMCCFILTR_EL0, 0 },
+
 	/* DACR32_EL2 */
 	{ Op0(0b11), Op1(0b100), CRn(0b0011), CRm(0b0000), Op2(0b000),
 	  NULL, reset_unknown, DACR32_EL2 },
@@ -671,19 +770,7 @@ static const struct sys_reg_desc cp15_regs[] = {
 	{ Op1( 0), CRn( 7), CRm(14), Op2( 2), access_dcsw },
 
 	/* PMU */
-	{ Op1( 0), CRn( 9), CRm(12), Op2( 0), trap_raz_wi },
-	{ Op1( 0), CRn( 9), CRm(12), Op2( 1), trap_raz_wi },
-	{ Op1( 0), CRn( 9), CRm(12), Op2( 2), trap_raz_wi },
-	{ Op1( 0), CRn( 9), CRm(12), Op2( 3), trap_raz_wi },
-	{ Op1( 0), CRn( 9), CRm(12), Op2( 5), trap_raz_wi },
-	{ Op1( 0), CRn( 9), CRm(12), Op2( 6), trap_raz_wi },
-	{ Op1( 0), CRn( 9), CRm(12), Op2( 7), trap_raz_wi },
-	{ Op1( 0), CRn( 9), CRm(13), Op2( 0), trap_raz_wi },
-	{ Op1( 0), CRn( 9), CRm(13), Op2( 1), trap_raz_wi },
-	{ Op1( 0), CRn( 9), CRm(13), Op2( 2), trap_raz_wi },
-	{ Op1( 0), CRn( 9), CRm(14), Op2( 0), trap_raz_wi },
-	{ Op1( 0), CRn( 9), CRm(14), Op2( 1), trap_raz_wi },
-	{ Op1( 0), CRn( 9), CRm(14), Op2( 2), trap_raz_wi },
+	{ Op1( 0), CRn( 9), CRm(12), Op2( 0), access_pmcr, NULL, c9_PMCR },
 
 	{ Op1( 0), CRn(10), CRm( 2), Op2( 0), access_vm_reg, NULL, c10_PRRR },
 	{ Op1( 0), CRn(10), CRm( 2), Op2( 1), access_vm_reg, NULL, c10_NMRR },
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [RFC PATCH 5/6] ARM64: KVM: Implement full context switch of PMU registers
@ 2014-08-05  9:24   ` Anup Patel
  0 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-08-05  9:24 UTC (permalink / raw)
  To: linux-arm-kernel

This patch implements the following:
1. Save/restore all PMU registers for both Guest and Host in
the KVM world switch.
2. Reserve the last PMU event counter for performance analysis
in EL2 mode. To achieve this, we fake the number of event counters
available to the Guest by trapping PMCR_EL0 register accesses and
programming MDCR_EL2.HPMN with the number of PMU event counters
minus one.
3. Clear and mask overflowed interrupts when saving the PMU context
for the Guest. The Guest will re-enable overflowed interrupts when
processing the virtual PMU interrupt.

With this patch the Guest has direct access to all PMU registers,
and we only trap-n-emulate PMCR_EL0 accesses to fake the number of
PMU event counters presented to the Guest.
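
For illustration only (this sketch is not part of the patch), the
guest-visible counter mask computed by the save_pmu/restore_pmu
macros corresponds roughly to the following C, assuming PMCR_EL0.N
is the 5-bit counter-count field at ARMV8_PMCR_N_SHIFT (bit 11):

	#include <stdint.h>

	/* Hypothetical helper mirroring the assembly: event counters
	 * 0..N-2 remain usable from EL0/EL1 (the last one is reserved
	 * for EL2) and bit 31 always covers the cycle counter. */
	static inline uint32_t guest_pmu_counter_mask(uint32_t pmcr)
	{
		uint32_t n = (pmcr >> 11) & 0x1f;	/* PMCR_EL0.N */
		uint32_t mask = n ? (1U << (n - 1)) - 1 : 0;

		return mask | (1U << 31);		/* cycle counter */
	}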

Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
---
 arch/arm64/include/asm/kvm_asm.h |   36 ++++++--
 arch/arm64/kernel/asm-offsets.c  |    1 +
 arch/arm64/kvm/hyp-init.S        |   15 ++++
 arch/arm64/kvm/hyp.S             |  168 +++++++++++++++++++++++++++++++++++-
 arch/arm64/kvm/sys_regs.c        |  175 ++++++++++++++++++++++++++++----------
 5 files changed, 343 insertions(+), 52 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 993a7db..93be21f 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -53,15 +53,27 @@
 #define DBGWVR0_EL1	71	/* Debug Watchpoint Value Registers (0-15) */
 #define DBGWVR15_EL1	86
 #define MDCCINT_EL1	87	/* Monitor Debug Comms Channel Interrupt Enable Reg */
+#define PMCR_EL0	88	/* Performance Monitors Control Register */
+#define PMOVSSET_EL0	89	/* Performance Monitors Overflow Flag Status Set Register */
+#define PMCCNTR_EL0	90	/* Cycle Counter Register */
+#define PMSELR_EL0	91	/* Performance Monitors Event Counter Selection Register */
+#define PMEVCNTR0_EL0	92	/* Performance Monitors Event Counter Register (0-30) */
+#define PMEVTYPER0_EL0	93	/* Performance Monitors Event Type Register (0-30) */
+#define PMEVCNTR30_EL0	152
+#define PMEVTYPER30_EL0	153
+#define PMCNTENSET_EL0	154	/* Performance Monitors Count Enable Set Register */
+#define PMINTENSET_EL1	155	/* Performance Monitors Interrupt Enable Set Register */
+#define PMUSERENR_EL0	156	/* Performance Monitors User Enable Register */
+#define PMCCFILTR_EL0	157	/* Cycle Count Filter Register */
 
 /* 32bit specific registers. Keep them at the end of the range */
-#define	DACR32_EL2	88	/* Domain Access Control Register */
-#define	IFSR32_EL2	89	/* Instruction Fault Status Register */
-#define	FPEXC32_EL2	90	/* Floating-Point Exception Control Register */
-#define	DBGVCR32_EL2	91	/* Debug Vector Catch Register */
-#define	TEECR32_EL1	92	/* ThumbEE Configuration Register */
-#define	TEEHBR32_EL1	93	/* ThumbEE Handler Base Register */
-#define	NR_SYS_REGS	94
+#define	DACR32_EL2	158	/* Domain Access Control Register */
+#define	IFSR32_EL2	159	/* Instruction Fault Status Register */
+#define	FPEXC32_EL2	160	/* Floating-Point Exception Control Register */
+#define	DBGVCR32_EL2	161	/* Debug Vector Catch Register */
+#define	TEECR32_EL1	162	/* ThumbEE Configuration Register */
+#define	TEEHBR32_EL1	163	/* ThumbEE Handler Base Register */
+#define	NR_SYS_REGS	164
 
 /* 32bit mapping */
 #define c0_MPIDR	(MPIDR_EL1 * 2)	/* MultiProcessor ID Register */
@@ -83,6 +95,13 @@
 #define c6_IFAR		(c6_DFAR + 1)	/* Instruction Fault Address Register */
 #define c7_PAR		(PAR_EL1 * 2)	/* Physical Address Register */
 #define c7_PAR_high	(c7_PAR + 1)	/* PAR top 32 bits */
+#define c9_PMCR		(PMCR_EL0 * 2)	/* Performance Monitors Control Register */
+#define c9_PMOVSSET	(PMOVSSET_EL0 * 2)
+#define c9_PMCCNTR	(PMCCNTR_EL0 * 2)
+#define c9_PMSELR	(PMSELR_EL0 * 2)
+#define c9_PMCNTENSET	(PMCNTENSET_EL0 * 2)
+#define c9_PMINTENSET	(PMINTENSET_EL1 * 2)
+#define c9_PMUSERENR	(PMUSERENR_EL0 * 2)
 #define c10_PRRR	(MAIR_EL1 * 2)	/* Primary Region Remap Register */
 #define c10_NMRR	(c10_PRRR + 1)	/* Normal Memory Remap Register */
 #define c12_VBAR	(VBAR_EL1 * 2)	/* Vector Base Address Register */
@@ -93,6 +112,9 @@
 #define c10_AMAIR0	(AMAIR_EL1 * 2)	/* Aux Memory Attr Indirection Reg */
 #define c10_AMAIR1	(c10_AMAIR0 + 1)/* Aux Memory Attr Indirection Reg */
 #define c14_CNTKCTL	(CNTKCTL_EL1 * 2) /* Timer Control Register (PL1) */
+#define c14_PMEVCNTR0	(PMEVCNTR0_EL0 * 2)
+#define c14_PMEVTYPR0	(PMEVTYPER0_EL0 * 2)
+#define c14_PMCCFILTR	(PMCCFILTR_EL0 * 2)
 
 #define cp14_DBGDSCRext	(MDSCR_EL1 * 2)
 #define cp14_DBGBCR0	(DBGBCR0_EL1 * 2)
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index ae73a83..053dc3e 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -140,6 +140,7 @@ int main(void)
   DEFINE(VGIC_CPU_NR_LR,	offsetof(struct vgic_cpu, nr_lr));
   DEFINE(KVM_VTTBR,		offsetof(struct kvm, arch.vttbr));
   DEFINE(KVM_VGIC_VCTRL,	offsetof(struct kvm, arch.vgic.vctrl_base));
+  DEFINE(VCPU_PMU_IRQ_PENDING,	offsetof(struct kvm_vcpu, arch.pmu_cpu.irq_pending));
 #endif
 #ifdef CONFIG_ARM64_CPU_SUSPEND
   DEFINE(CPU_SUSPEND_SZ,	sizeof(struct cpu_suspend_ctx));
diff --git a/arch/arm64/kvm/hyp-init.S b/arch/arm64/kvm/hyp-init.S
index d968796..b45556e 100644
--- a/arch/arm64/kvm/hyp-init.S
+++ b/arch/arm64/kvm/hyp-init.S
@@ -20,6 +20,7 @@
 #include <asm/assembler.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/pmu.h>
 
 	.text
 	.pushsection	.hyp.idmap.text, "ax"
@@ -107,6 +108,20 @@ target: /* We're now in the trampoline code, switch page tables */
 	kern_hyp_va	x3
 	msr	vbar_el2, x3
 
+	/* Reserve last PMU event counter for EL2 */
+	mov	x4, #0
+	mrs	x5, id_aa64dfr0_el1
+	ubfx	x5, x5, #8, #4		// Extract PMUver
+	cmp	x5, #1			// Must be PMUv3 else skip
+	bne	1f
+	mrs	x5, pmcr_el0
+	ubfx	x5, x5, #ARMV8_PMCR_N_SHIFT, #5	// Number of event counters
+	cmp	x5, #0			// Skip if no event counters
+	beq	1f
+	sub	x4, x5, #1
+1:
+	msr	mdcr_el2, x4
+
 	/* Hello, World! */
 	eret
 ENDPROC(__kvm_hyp_init)
diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index d032132..6b41c01 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -23,6 +23,7 @@
 #include <asm/asm-offsets.h>
 #include <asm/debug-monitors.h>
 #include <asm/fpsimdmacros.h>
+#include <asm/pmu.h>
 #include <asm/kvm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_arm.h>
@@ -426,6 +427,77 @@ __kvm_hyp_code_start:
 	str	x21, [x2, #CPU_SYSREG_OFFSET(MDCCINT_EL1)]
 .endm
 
+.macro save_pmu, is_vcpu_pmu
+	// x2: base address for cpu context
+	// x3: mask of counters allowed in EL0 & EL1
+	// x4: number of event counters allowed in EL0 & EL1
+
+	mrs	x6, id_aa64dfr0_el1
+	ubfx	x5, x6, #8, #4		// Extract PMUver
+	cmp	x5, #1			// Must be PMUv3 else skip
+	bne	1f
+
+	mrs	x4, pmcr_el0		// Save PMCR_EL0
+	str	x4, [x2, #CPU_SYSREG_OFFSET(PMCR_EL0)]
+
+	and	x5, x4, #~(ARMV8_PMCR_E)// Clear PMCR_EL0.E
+	msr	pmcr_el0, x5		// This will stop all counters
+
+	mov	x3, #0
+	ubfx	x4, x4, #ARMV8_PMCR_N_SHIFT, #5	// Number of event counters
+	cmp	x4, #0			// Skip if no event counters
+	beq	2f
+	sub	x4, x4, #1		// Last event counter is reserved
+	mov	x3, #1
+	lsl	x3, x3, x4
+	sub	x3, x3, #1
+2:	orr	x3, x3, #(1 << 31)	// Mask of event counters
+
+	mrs	x5, pmovsset_el0	// Save PMOVSSET_EL0
+	and	x5, x5, x3
+	str	x5, [x2, #CPU_SYSREG_OFFSET(PMOVSSET_EL0)]
+
+	.if \is_vcpu_pmu == 1
+	msr	pmovsclr_el0, x5	// Clear HW interrupt line
+	msr	pmintenclr_el1, x5	// Mask irq for overflowed counters
+	str	w5, [x0, #VCPU_PMU_IRQ_PENDING] // Update irq pending flag
+	.endif
+
+	mrs	x5, pmccntr_el0		// Save PMCCNTR_EL0
+	str	x5, [x2, #CPU_SYSREG_OFFSET(PMCCNTR_EL0)]
+
+	mrs	x5, pmselr_el0		// Save PMSELR_EL0
+	str	x5, [x2, #CPU_SYSREG_OFFSET(PMSELR_EL0)]
+
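+	// x5 = one past the x4 (counter, type) pairs: each pair occupies
+	// 16 bytes starting at PMEVCNTR0_EL0 in the context, and the loop
+	// below saves them walking downwards via PMSELR_EL0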
+	lsl	x5, x4, #4
+	add	x5, x5, #CPU_SYSREG_OFFSET(PMEVCNTR0_EL0)
+	add	x5, x2, x5
+3:	cmp	x4, #0
+	beq	4f
+	sub	x4, x4, #1
+	msr	pmselr_el0, x4
+	mrs	x6, pmxevcntr_el0	// Save PMEVCNTR<n>_EL0
+	mrs	x7, pmxevtyper_el0	// Save PMEVTYPER<n>_EL0
+	stp	x6, x7, [x5, #-16]!
+	b	3b
+4:
+	mrs	x5, pmcntenset_el0	// Save PMCNTENSET_EL0
+	and	x5, x5, x3
+	str	x5, [x2, #CPU_SYSREG_OFFSET(PMCNTENSET_EL0)]
+
+	mrs	x5, pmintenset_el1	// Save PMINTENSET_EL1
+	and	x5, x5, x3
+	str	x5, [x2, #CPU_SYSREG_OFFSET(PMINTENSET_EL1)]
+
+	mrs	x5, pmuserenr_el0	// Save PMUSERENR_EL0
+	and	x5, x5, x3
+	str	x5, [x2, #CPU_SYSREG_OFFSET(PMUSERENR_EL0)]
+
+	mrs	x5, pmccfiltr_el0	// Save PMCCFILTR_EL0
+	str	x5, [x2, #CPU_SYSREG_OFFSET(PMCCFILTR_EL0)]
+1:
+.endm
+
 .macro restore_sysregs
 	// x2: base address for cpu context
 	// x3: tmp register
@@ -659,6 +731,72 @@ __kvm_hyp_code_start:
 	msr	mdccint_el1, x21
 .endm
 
+.macro restore_pmu
+	// x2: base address for cpu context
+	// x3: mask of counters allowed in EL0 & EL1
+	// x4: number of event counters allowed in EL0 & EL1
+
+	mrs	x6, id_aa64dfr0_el1
+	ubfx	x5, x6, #8, #4		// Extract PMUver
+	cmp	x5, #1			// Must be PMUv3 else skip
+	bne	1f
+
+	mov	x3, #0
+	mrs	x4, pmcr_el0
+	ubfx	x4, x4, #ARMV8_PMCR_N_SHIFT, #5	// Number of event counters
+	cmp	x4, #0			// Skip if no event counters
+	beq	2f
+	sub	x4, x4, #1		// Last event counter is reserved
+	mov	x3, #1
+	lsl	x3, x3, x4
+	sub	x3, x3, #1
+2:	orr	x3, x3, #(1 << 31)	// Mask of event counters
+
+	ldr	x5, [x2, #CPU_SYSREG_OFFSET(PMCCFILTR_EL0)]
+	msr	pmccfiltr_el0, x5	// Restore PMCCFILTR_EL0
+
+	ldr	x5, [x2, #CPU_SYSREG_OFFSET(PMUSERENR_EL0)]
+	and	x5, x5, x3
+	msr	pmuserenr_el0, x5	// Restore PMUSERENR_EL0
+
+	msr	pmintenclr_el1, x3
+	ldr	x5, [x2, #CPU_SYSREG_OFFSET(PMINTENSET_EL1)]
+	and	x5, x5, x3
+	msr	pmintenset_el1, x5	// Restore PMINTENSET_EL1
+
+	msr	pmcntenclr_el0, x3
+	ldr	x5, [x2, #CPU_SYSREG_OFFSET(PMCNTENSET_EL0)]
+	and	x5, x5, x3
+	msr	pmcntenset_el0, x5	// Restore PMCNTENSET_EL0
+
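+	// x5 = one past the x4 saved (counter, type) pairs: each pair
+	// occupies 16 bytes starting at PMEVCNTR0_EL0 in the context, and
+	// the loop below restores them walking downwards via PMSELR_EL0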
+	lsl	x5, x4, #4
+	add	x5, x5, #CPU_SYSREG_OFFSET(PMEVCNTR0_EL0)
+	add	x5, x2, x5
+3:	cmp	x4, #0
+	beq	4f
+	sub	x4, x4, #1
+	ldp	x6, x7, [x5, #-16]!
+	msr	pmselr_el0, x4
+	msr	pmxevcntr_el0, x6	// Restore PMEVCNTR<n>_EL0
+	msr	pmxevtyper_el0, x7	// Restore PMEVTYPER<n>_EL0
+	b	3b
+4:
+	ldr	x5, [x2, #CPU_SYSREG_OFFSET(PMSELR_EL0)]
+	msr	pmselr_el0, x5		// Restore PMSELR_EL0
+
+	ldr	x5, [x2, #CPU_SYSREG_OFFSET(PMCCNTR_EL0)]
+	msr	pmccntr_el0, x5		// Restore PMCCNTR_EL0
+
+	msr	pmovsclr_el0, x3
+	ldr	x5, [x2, #CPU_SYSREG_OFFSET(PMOVSSET_EL0)]
+	and	x5, x5, x3
+	msr	pmovsset_el0, x5	// Restore PMOVSSET_EL0
+
+	ldr	x5, [x2, #CPU_SYSREG_OFFSET(PMCR_EL0)]
+	msr	pmcr_el0, x5		// Restore PMCR_EL0
+1:
+.endm
+
 .macro skip_32bit_state tmp, target
 	// Skip 32bit state if not needed
 	mrs	\tmp, hcr_el2
@@ -775,8 +913,10 @@ __kvm_hyp_code_start:
 	msr	hstr_el2, x2
 
 	mrs	x2, mdcr_el2
+	and	x3, x2, #MDCR_EL2_HPME
 	and	x2, x2, #MDCR_EL2_HPMN_MASK
-	orr	x2, x2, #(MDCR_EL2_TPM | MDCR_EL2_TPMCR)
+	orr	x2, x2, x3
+	orr	x2, x2, #MDCR_EL2_TPMCR
 	orr	x2, x2, #(MDCR_EL2_TDRA | MDCR_EL2_TDOSA)
 
 	// Check for KVM_ARM64_DEBUG_DIRTY, and set debug to trap
@@ -795,7 +935,9 @@ __kvm_hyp_code_start:
 	msr	hstr_el2, xzr
 
 	mrs	x2, mdcr_el2
+	and	x3, x2, #MDCR_EL2_HPME
 	and	x2, x2, #MDCR_EL2_HPMN_MASK
+	orr	x2, x2, x3
 	msr	mdcr_el2, x2
 .endm
 
@@ -977,6 +1119,18 @@ __restore_debug:
 	restore_debug
 	ret
 
+__save_pmu_host:
+	save_pmu 0
+	ret
+
+__save_pmu_guest:
+	save_pmu 1
+	ret
+
+__restore_pmu:
+	restore_pmu
+	ret
+
 __save_fpsimd:
 	save_fpsimd
 	ret
@@ -1005,6 +1159,9 @@ ENTRY(__kvm_vcpu_run)
 	kern_hyp_va x2
 
 	save_host_regs
+
+	bl __save_pmu_host
+
 	bl __save_fpsimd
 	bl __save_sysregs
 
@@ -1027,6 +1184,9 @@ ENTRY(__kvm_vcpu_run)
 	bl	__restore_debug
 1:
 	restore_guest_32bit_state
+
+	bl __restore_pmu
+
 	restore_guest_regs
 
 	// That's it, no more messing around.
@@ -1040,12 +1200,16 @@ __kvm_vcpu_return:
 	add	x2, x0, #VCPU_CONTEXT
 
 	save_guest_regs
+
+	bl __save_pmu_guest
+
 	bl __save_fpsimd
 	bl __save_sysregs
 
 	skip_debug_state x3, 1f
 	bl	__save_debug
 1:
+
 	save_guest_32bit_state
 
 	save_timer_state
@@ -1068,6 +1232,8 @@ __kvm_vcpu_return:
 	str	xzr, [x0, #VCPU_DEBUG_FLAGS]
 	bl	__restore_debug
 1:
+	bl __restore_pmu
+
 	restore_host_regs
 
 	mov	x0, x1
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 4a89ca2..081f95e 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -23,6 +23,7 @@
 #include <linux/mm.h>
 #include <linux/kvm_host.h>
 #include <linux/uaccess.h>
+#include <linux/perf_event.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_host.h>
 #include <asm/kvm_emulate.h>
@@ -31,6 +32,7 @@
 #include <asm/cacheflush.h>
 #include <asm/cputype.h>
 #include <asm/debug-monitors.h>
+#include <asm/pmu.h>
 #include <trace/events/kvm.h>
 
 #include "sys_regs.h"
@@ -164,6 +166,45 @@ static bool access_sctlr(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+/* PMCR_EL0 accessor. Only called as long as MDCR_EL2.TPMCR is set. */
+static bool access_pmcr(struct kvm_vcpu *vcpu,
+			const struct sys_reg_params *p,
+			const struct sys_reg_desc *r)
+{
+	unsigned long val, n;
+
+	if (p->is_write) {
+		/* Only update writeable bits of PMCR */
+		if (!p->is_aarch32)
+			val = vcpu_sys_reg(vcpu, r->reg);
+		else
+			val = vcpu_cp15(vcpu, r->reg);
+		val &= ~ARMV8_PMCR_MASK;
+		val |= *vcpu_reg(vcpu, p->Rt) & ARMV8_PMCR_MASK;
+		if (!p->is_aarch32)
+			vcpu_sys_reg(vcpu, r->reg) = val;
+		else
+			vcpu_cp15(vcpu, r->reg) = val;
+	} else {
+		/*
+		 * We reserve the last event counter for EL2-mode
+		 * performance analysis, hence we show one less
+		 * event counter to the guest.
+		 */
+		if (!p->is_aarch32)
+			val = vcpu_sys_reg(vcpu, r->reg);
+		else
+			val = vcpu_cp15(vcpu, r->reg);
+		n = (val >> ARMV8_PMCR_N_SHIFT) & ARMV8_PMCR_N_MASK;
+		n = (n) ? n - 1 : 0;
+		val &= ~(ARMV8_PMCR_N_MASK << ARMV8_PMCR_N_SHIFT);
+		val |= (n & ARMV8_PMCR_N_MASK) << ARMV8_PMCR_N_SHIFT;
+		*vcpu_reg(vcpu, p->Rt) = val;
+	}
+
+	return true;
+}
+
 static bool trap_raz_wi(struct kvm_vcpu *vcpu,
 			const struct sys_reg_params *p,
 			const struct sys_reg_desc *r)
@@ -272,6 +313,20 @@ static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
 	{ Op0(0b10), Op1(0b000), CRn(0b0000), CRm((n)), Op2(0b111),	\
 	  trap_debug_regs, reset_val, (DBGWCR0_EL1 + (n)), 0 }
 
+/* Macro to expand the PMEVCNTRn_EL0 register */
+#define PMU_PMEVCNTR_EL0(n)						\
+	/* PMEVCNTRn_EL0 */						\
+	{ Op0(0b11), Op1(0b011), CRn(0b1110),				\
+	  CRm((0b1000 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)),		\
+	  NULL, reset_val, (PMEVCNTR0_EL0 + (n)*2), 0 }
+
+/* Macro to expand the PMEVTYPERn_EL0 register */
+#define PMU_PMEVTYPER_EL0(n)						\
+	/* PMEVTYPERn_EL0 */						\
+	{ Op0(0b11), Op1(0b011), CRn(0b1110),				\
+	  CRm((0b1100 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)),		\
+	  NULL, reset_val, (PMEVTYPER0_EL0 + (n)*2), 0 }
+
 /*
  * Architected system registers.
  * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -408,10 +463,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 
 	/* PMINTENSET_EL1 */
 	{ Op0(0b11), Op1(0b000), CRn(0b1001), CRm(0b1110), Op2(0b001),
-	  trap_raz_wi },
-	/* PMINTENCLR_EL1 */
-	{ Op0(0b11), Op1(0b000), CRn(0b1001), CRm(0b1110), Op2(0b010),
-	  trap_raz_wi },
+	  NULL, reset_val, PMINTENSET_EL1, 0 },
 
 	/* MAIR_EL1 */
 	{ Op0(0b11), Op1(0b000), CRn(0b1010), CRm(0b0010), Op2(0b000),
@@ -440,43 +492,22 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 
 	/* PMCR_EL0 */
 	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b000),
-	  trap_raz_wi },
+	  access_pmcr, reset_val, PMCR_EL0, 0 },
 	/* PMCNTENSET_EL0 */
 	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b001),
-	  trap_raz_wi },
-	/* PMCNTENCLR_EL0 */
-	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b010),
-	  trap_raz_wi },
-	/* PMOVSCLR_EL0 */
-	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b011),
-	  trap_raz_wi },
-	/* PMSWINC_EL0 */
-	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b100),
-	  trap_raz_wi },
+	  NULL, reset_val, PMCNTENSET_EL0, 0 },
 	/* PMSELR_EL0 */
 	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b101),
-	  trap_raz_wi },
-	/* PMCEID0_EL0 */
-	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b110),
-	  trap_raz_wi },
-	/* PMCEID1_EL0 */
-	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b111),
-	  trap_raz_wi },
+	  NULL, reset_val, PMSELR_EL0, 0 },
 	/* PMCCNTR_EL0 */
 	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b000),
-	  trap_raz_wi },
-	/* PMXEVTYPER_EL0 */
-	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b001),
-	  trap_raz_wi },
-	/* PMXEVCNTR_EL0 */
-	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b010),
-	  trap_raz_wi },
+	  NULL, reset_val, PMCCNTR_EL0, 0 },
 	/* PMUSERENR_EL0 */
 	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1110), Op2(0b000),
-	  trap_raz_wi },
+	  NULL, reset_val, PMUSERENR_EL0, 0 },
 	/* PMOVSSET_EL0 */
 	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1110), Op2(0b011),
-	  trap_raz_wi },
+	  NULL, reset_val, PMOVSSET_EL0, 0 },
 
 	/* TPIDR_EL0 */
 	{ Op0(0b11), Op1(0b011), CRn(0b1101), CRm(0b0000), Op2(0b010),
@@ -485,6 +516,74 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ Op0(0b11), Op1(0b011), CRn(0b1101), CRm(0b0000), Op2(0b011),
 	  NULL, reset_unknown, TPIDRRO_EL0 },
 
+	/* PMEVCNTRn_EL0 */
+	PMU_PMEVCNTR_EL0(0),
+	PMU_PMEVCNTR_EL0(1),
+	PMU_PMEVCNTR_EL0(2),
+	PMU_PMEVCNTR_EL0(3),
+	PMU_PMEVCNTR_EL0(4),
+	PMU_PMEVCNTR_EL0(5),
+	PMU_PMEVCNTR_EL0(6),
+	PMU_PMEVCNTR_EL0(7),
+	PMU_PMEVCNTR_EL0(8),
+	PMU_PMEVCNTR_EL0(9),
+	PMU_PMEVCNTR_EL0(10),
+	PMU_PMEVCNTR_EL0(11),
+	PMU_PMEVCNTR_EL0(12),
+	PMU_PMEVCNTR_EL0(13),
+	PMU_PMEVCNTR_EL0(14),
+	PMU_PMEVCNTR_EL0(15),
+	PMU_PMEVCNTR_EL0(16),
+	PMU_PMEVCNTR_EL0(17),
+	PMU_PMEVCNTR_EL0(18),
+	PMU_PMEVCNTR_EL0(19),
+	PMU_PMEVCNTR_EL0(20),
+	PMU_PMEVCNTR_EL0(21),
+	PMU_PMEVCNTR_EL0(22),
+	PMU_PMEVCNTR_EL0(23),
+	PMU_PMEVCNTR_EL0(24),
+	PMU_PMEVCNTR_EL0(25),
+	PMU_PMEVCNTR_EL0(26),
+	PMU_PMEVCNTR_EL0(27),
+	PMU_PMEVCNTR_EL0(28),
+	PMU_PMEVCNTR_EL0(29),
+	PMU_PMEVCNTR_EL0(30),
+	/* PMEVTYPERn_EL0 */
+	PMU_PMEVTYPER_EL0(0),
+	PMU_PMEVTYPER_EL0(1),
+	PMU_PMEVTYPER_EL0(2),
+	PMU_PMEVTYPER_EL0(3),
+	PMU_PMEVTYPER_EL0(4),
+	PMU_PMEVTYPER_EL0(5),
+	PMU_PMEVTYPER_EL0(6),
+	PMU_PMEVTYPER_EL0(7),
+	PMU_PMEVTYPER_EL0(8),
+	PMU_PMEVTYPER_EL0(9),
+	PMU_PMEVTYPER_EL0(10),
+	PMU_PMEVTYPER_EL0(11),
+	PMU_PMEVTYPER_EL0(12),
+	PMU_PMEVTYPER_EL0(13),
+	PMU_PMEVTYPER_EL0(14),
+	PMU_PMEVTYPER_EL0(15),
+	PMU_PMEVTYPER_EL0(16),
+	PMU_PMEVTYPER_EL0(17),
+	PMU_PMEVTYPER_EL0(18),
+	PMU_PMEVTYPER_EL0(19),
+	PMU_PMEVTYPER_EL0(20),
+	PMU_PMEVTYPER_EL0(21),
+	PMU_PMEVTYPER_EL0(22),
+	PMU_PMEVTYPER_EL0(23),
+	PMU_PMEVTYPER_EL0(24),
+	PMU_PMEVTYPER_EL0(25),
+	PMU_PMEVTYPER_EL0(26),
+	PMU_PMEVTYPER_EL0(27),
+	PMU_PMEVTYPER_EL0(28),
+	PMU_PMEVTYPER_EL0(29),
+	PMU_PMEVTYPER_EL0(30),
+	/* PMCCFILTR_EL0 */
+	{ Op0(0b11), Op1(0b011), CRn(0b1110), CRm(0b1111), Op2(0b111),
+	  NULL, reset_val, PMCCFILTR_EL0, 0 },
+
 	/* DACR32_EL2 */
 	{ Op0(0b11), Op1(0b100), CRn(0b0011), CRm(0b0000), Op2(0b000),
 	  NULL, reset_unknown, DACR32_EL2 },
@@ -671,19 +770,7 @@ static const struct sys_reg_desc cp15_regs[] = {
 	{ Op1( 0), CRn( 7), CRm(14), Op2( 2), access_dcsw },
 
 	/* PMU */
-	{ Op1( 0), CRn( 9), CRm(12), Op2( 0), trap_raz_wi },
-	{ Op1( 0), CRn( 9), CRm(12), Op2( 1), trap_raz_wi },
-	{ Op1( 0), CRn( 9), CRm(12), Op2( 2), trap_raz_wi },
-	{ Op1( 0), CRn( 9), CRm(12), Op2( 3), trap_raz_wi },
-	{ Op1( 0), CRn( 9), CRm(12), Op2( 5), trap_raz_wi },
-	{ Op1( 0), CRn( 9), CRm(12), Op2( 6), trap_raz_wi },
-	{ Op1( 0), CRn( 9), CRm(12), Op2( 7), trap_raz_wi },
-	{ Op1( 0), CRn( 9), CRm(13), Op2( 0), trap_raz_wi },
-	{ Op1( 0), CRn( 9), CRm(13), Op2( 1), trap_raz_wi },
-	{ Op1( 0), CRn( 9), CRm(13), Op2( 2), trap_raz_wi },
-	{ Op1( 0), CRn( 9), CRm(14), Op2( 0), trap_raz_wi },
-	{ Op1( 0), CRn( 9), CRm(14), Op2( 1), trap_raz_wi },
-	{ Op1( 0), CRn( 9), CRm(14), Op2( 2), trap_raz_wi },
+	{ Op1( 0), CRn( 9), CRm(12), Op2( 0), access_pmcr, NULL, c9_PMCR },
 
 	{ Op1( 0), CRn(10), CRm( 2), Op2( 0), access_vm_reg, NULL, c10_PRRR },
 	{ Op1( 0), CRn(10), CRm( 2), Op2( 1), access_vm_reg, NULL, c10_NMRR },
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [RFC PATCH 6/6] ARM64: KVM: Upgrade to lazy context switch of PMU registers
  2014-08-05  9:24 ` Anup Patel
@ 2014-08-05  9:24   ` Anup Patel
  -1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-08-05  9:24 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, kvm, patches, marc.zyngier, christoffer.dall,
	will.deacon, ian.campbell, pranavkumar, Anup Patel

A full context switch of all PMU registers for both host and
guest can make the KVM world-switch very expensive.

This patch improves the current PMU context switch by implementing
a lazy context switch of the PMU registers.

To achieve this, we trap all PMU register accesses and use a
per-VCPU dirty flag to keep track of whether the guest has updated
the PMU registers. If the VCPU's PMU registers are dirty, or its
PMCR_EL0.E bit is set, then we do a full context switch for both
host and guest.
(This is very similar to the lazy world switch for debug registers:
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-July/271040.html)

Also, we always trap-n-emulate PMCR_EL0 to fake the number of event
counters available to the guest. For this PMCR_EL0 trap-n-emulation
to work correctly, we always save/restore PMCR_EL0 for both host and
guest, whereas the other PMU registers are saved/restored based on
the PMU dirty flag.
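
As an illustrative sketch (not part of the patch; the type and macro
names below are stand-ins for the actual kernel symbols), the
decision made on each world switch boils down to:

	#include <stdbool.h>
	#include <stdint.h>

	#define PMU_DIRTY	(1ULL << 0)	/* stand-in for KVM_ARM64_PMU_DIRTY */
	#define PMCR_E		(1ULL << 0)	/* stand-in for ARMV8_PMCR_E */

	/* Minimal stand-in for the relevant per-VCPU state. */
	struct vcpu_pmu_state {
		uint64_t pmu_flags;	/* dirty flag, set on trapped PMU writes */
		uint64_t pmcr_el0;	/* guest copy of PMCR_EL0 */
	};

	/* Full save/restore only if the guest touched a PMU register
	 * or counting is enabled; otherwise keep trapping accesses
	 * via MDCR_EL2.TPM and skip the PMU save/restore entirely. */
	static bool needs_full_pmu_switch(const struct vcpu_pmu_state *s)
	{
		return (s->pmu_flags & PMU_DIRTY) || (s->pmcr_el0 & PMCR_E);
	}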

Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
---
 arch/arm64/include/asm/kvm_asm.h  |    3 +
 arch/arm64/include/asm/kvm_host.h |    3 +
 arch/arm64/kernel/asm-offsets.c   |    1 +
 arch/arm64/kvm/hyp.S              |   63 ++++++++--
 arch/arm64/kvm/sys_regs.c         |  248 +++++++++++++++++++++++++++++++++++--
 5 files changed, 298 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 93be21f..47b7fcd 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -132,6 +132,9 @@
 #define KVM_ARM64_DEBUG_DIRTY_SHIFT	0
 #define KVM_ARM64_DEBUG_DIRTY		(1 << KVM_ARM64_DEBUG_DIRTY_SHIFT)
 
+#define KVM_ARM64_PMU_DIRTY_SHIFT	0
+#define KVM_ARM64_PMU_DIRTY		(1 << KVM_ARM64_PMU_DIRTY_SHIFT)
+
 #ifndef __ASSEMBLY__
 struct kvm;
 struct kvm_vcpu;
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index ae4cdb2..4dba2a3 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -117,6 +117,9 @@ struct kvm_vcpu_arch {
 	/* Timer state */
 	struct arch_timer_cpu timer_cpu;
 
+	/* PMU flags */
+	u64 pmu_flags;
+
 	/* PMU state */
 	struct pmu_cpu pmu_cpu;
 
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 053dc3e..4234794 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -140,6 +140,7 @@ int main(void)
   DEFINE(VGIC_CPU_NR_LR,	offsetof(struct vgic_cpu, nr_lr));
   DEFINE(KVM_VTTBR,		offsetof(struct kvm, arch.vttbr));
   DEFINE(KVM_VGIC_VCTRL,	offsetof(struct kvm, arch.vgic.vctrl_base));
+  DEFINE(VCPU_PMU_FLAGS,	offsetof(struct kvm_vcpu, arch.pmu_flags));
   DEFINE(VCPU_PMU_IRQ_PENDING,	offsetof(struct kvm_vcpu, arch.pmu_cpu.irq_pending));
 #endif
 #ifdef CONFIG_ARM64_CPU_SUSPEND
diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index 6b41c01..5f9ccee 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -443,6 +443,9 @@ __kvm_hyp_code_start:
 	and	x5, x4, #~(ARMV8_PMCR_E)// Clear PMCR_EL0.E
 	msr	pmcr_el0, x5		// This will stop all counters
 
+	ldr	x5, [x0, #VCPU_PMU_FLAGS] // Only save if dirty flag set
+	tbz	x5, #KVM_ARM64_PMU_DIRTY_SHIFT, 1f
+
 	mov	x3, #0
 	ubfx	x4, x4, #ARMV8_PMCR_N_SHIFT, #5	// Number of event counters
 	cmp	x4, #0			// Skip if no event counters
@@ -731,7 +734,7 @@ __kvm_hyp_code_start:
 	msr	mdccint_el1, x21
 .endm
 
-.macro restore_pmu
+.macro restore_pmu, is_vcpu_pmu
 	// x2: base address for cpu context
 	// x3: mask of counters allowed in EL0 & EL1
 	// x4: number of event counters allowed in EL0 & EL1
@@ -741,16 +744,19 @@ __kvm_hyp_code_start:
 	cmp	x5, #1			// Must be PMUv3 else skip
 	bne	1f
 
+	ldr	x5, [x0, #VCPU_PMU_FLAGS] // Only restore if dirty flag set
+	tbz	x5, #KVM_ARM64_PMU_DIRTY_SHIFT, 2f
+
 	mov	x3, #0
 	mrs	x4, pmcr_el0
 	ubfx	x4, x4, #ARMV8_PMCR_N_SHIFT, #5	// Number of event counters
 	cmp	x4, #0			// Skip if no event counters
-	beq	2f
+	beq	3f
 	sub	x4, x4, #1		// Last event counter is reserved
 	mov	x3, #1
 	lsl	x3, x3, x4
 	sub	x3, x3, #1
-2:	orr	x3, x3, #(1 << 31)	// Mask of event counters
+3:	orr	x3, x3, #(1 << 31)	// Mask of event counters
 
 	ldr	x5, [x2, #CPU_SYSREG_OFFSET(PMCCFILTR_EL0)]
 	msr	pmccfiltr_el0, x5	// Restore PMCCFILTR_EL0
@@ -772,15 +778,15 @@ __kvm_hyp_code_start:
 	lsl	x5, x4, #4
 	add	x5, x5, #CPU_SYSREG_OFFSET(PMEVCNTR0_EL0)
 	add	x5, x2, x5
-3:	cmp	x4, #0
-	beq	4f
+4:	cmp	x4, #0
+	beq	5f
 	sub	x4, x4, #1
 	ldp	x6, x7, [x5, #-16]!
 	msr	pmselr_el0, x4
 	msr	pmxevcntr_el0, x6	// Restore PMEVCNTR<n>_EL0
 	msr	pmxevtyper_el0, x7	// Restore PMEVTYPER<n>_EL0
-	b	3b
-4:
+	b	4b
+5:
 	ldr	x5, [x2, #CPU_SYSREG_OFFSET(PMSELR_EL0)]
 	msr	pmselr_el0, x5		// Restore PMSELR_EL0
 
@@ -792,6 +798,13 @@ __kvm_hyp_code_start:
 	and	x5, x5, x3
 	msr	pmovsset_el0, x5	// Restore PMOVSSET_EL0
 
+	.if \is_vcpu_pmu == 0
+	// Clear the dirty flag for the next run, as all the state has
+	// already been saved. Note that we nuke the whole 64bit word.
+	// If we ever add more flags, we'll have to be more careful...
+	str	xzr, [x0, #VCPU_PMU_FLAGS]
+	.endif
+2:
 	ldr	x5, [x2, #CPU_SYSREG_OFFSET(PMCR_EL0)]
 	msr	pmcr_el0, x5		// Restore PMCR_EL0
 1:
@@ -838,6 +851,23 @@ __kvm_hyp_code_start:
 9999:
 .endm
 
+.macro compute_pmu_state
+	// Compute PMU state: if PMCR_EL0.E is set then we do a
+	// full save/restore cycle and disable trapping
+	add	x25, x0, #VCPU_CONTEXT
+
+	// Check the state of PMCR_EL0.E bit
+	ldr	x26, [x25, #CPU_SYSREG_OFFSET(PMCR_EL0)]
+	and	x26, x26, #ARMV8_PMCR_E
+	cmp	x26, #0
+	b.eq	8887f
+
+	// If any interesting bits were set, we must set the flag
+	mov	x26, #KVM_ARM64_PMU_DIRTY
+	str	x26, [x0, #VCPU_PMU_FLAGS]
+8887:
+.endm
+
 .macro save_guest_32bit_state
 	skip_32bit_state x3, 1f
 
@@ -919,6 +949,12 @@ __kvm_hyp_code_start:
 	orr	x2, x2, #MDCR_EL2_TPMCR
 	orr	x2, x2, #(MDCR_EL2_TDRA | MDCR_EL2_TDOSA)
 
+	// Check for KVM_ARM64_PMU_DIRTY, and set the PMU to trap
+	// all PMU registers if the PMU state is not dirty.
+	ldr	x3, [x0, #VCPU_PMU_FLAGS]
+	tbnz	x3, #KVM_ARM64_PMU_DIRTY_SHIFT, 1f
+	orr	x2, x2, #MDCR_EL2_TPM
+1:
 	// Check for KVM_ARM64_DEBUG_DIRTY, and set debug to trap
 	// if not dirty.
 	ldr	x3, [x0, #VCPU_DEBUG_FLAGS]
@@ -1127,8 +1163,12 @@ __save_pmu_guest:
 	save_pmu 1
 	ret
 
-__restore_pmu:
-	restore_pmu
+__restore_pmu_host:
+	restore_pmu 0
+	ret
+
+__restore_pmu_guest:
+	restore_pmu 1
 	ret
 
 __save_fpsimd:
@@ -1160,6 +1200,7 @@ ENTRY(__kvm_vcpu_run)
 
 	save_host_regs
 
+	compute_pmu_state
 	bl __save_pmu_host
 
 	bl __save_fpsimd
@@ -1185,7 +1226,7 @@ ENTRY(__kvm_vcpu_run)
 1:
 	restore_guest_32bit_state
 
-	bl __restore_pmu
+	bl __restore_pmu_guest
 
 	restore_guest_regs
 
@@ -1232,7 +1273,7 @@ __kvm_vcpu_return:
 	str	xzr, [x0, #VCPU_DEBUG_FLAGS]
 	bl	__restore_debug
 1:
-	bl __restore_pmu
+	bl __restore_pmu_host
 
 	restore_host_regs
 
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 081f95e..cda6774 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -166,6 +166,130 @@ static bool access_sctlr(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+/* PMU reg accessor. Only called as long as MDCR_EL2.TPM is set. */
+static bool access_pmu_reg(struct kvm_vcpu *vcpu,
+			   const struct sys_reg_params *p,
+			   const struct sys_reg_desc *r)
+{
+	unsigned long val;
+
+	if (p->is_write) {
+		val = *vcpu_reg(vcpu, p->Rt);
+		if (!p->is_aarch32)
+			vcpu_sys_reg(vcpu, r->reg) = val;
+		else
+			vcpu_cp15(vcpu, r->reg) = val & 0xffffffffUL;
+		vcpu->arch.pmu_flags |= KVM_ARM64_PMU_DIRTY;
+	} else {
+		if (!p->is_aarch32)
+			val = vcpu_sys_reg(vcpu, r->reg);
+		else
+			val = vcpu_cp15(vcpu, r->reg);
+		*vcpu_reg(vcpu, p->Rt) = val;
+	}
+
+	return true;
+}
+
+/* PMU set reg accessor. Only called as long as MDCR_EL2.TPM is set. */
+static bool access_pmu_setreg(struct kvm_vcpu *vcpu,
+			      const struct sys_reg_params *p,
+			      const struct sys_reg_desc *r)
+{
+	unsigned long val;
+
+	if (p->is_write) {
+		val = *vcpu_reg(vcpu, p->Rt);
+		if (!p->is_aarch32)
+			vcpu_sys_reg(vcpu, r->reg) |= val;
+		else
+			vcpu_cp15(vcpu, r->reg) |= val & 0xffffffffUL;
+		vcpu->arch.pmu_flags |= KVM_ARM64_PMU_DIRTY;
+	} else {
+		if (!p->is_aarch32)
+			val = vcpu_sys_reg(vcpu, r->reg);
+		else
+			val = vcpu_cp15(vcpu, r->reg);
+		*vcpu_reg(vcpu, p->Rt) = val;
+	}
+
+	return true;
+}
+
+/* PMU clear reg accessor. Only called as long as MDCR_EL2.TPM is set. */
+static bool access_pmu_clrreg(struct kvm_vcpu *vcpu,
+			      const struct sys_reg_params *p,
+			      const struct sys_reg_desc *r)
+{
+	unsigned long val;
+
+	if (p->is_write) {
+		val = *vcpu_reg(vcpu, p->Rt);
+		if (!p->is_aarch32)
+			vcpu_sys_reg(vcpu, r->reg) &= ~val;
+		else
+			vcpu_cp15(vcpu, r->reg) &= ~(val & 0xffffffffUL);
+		vcpu->arch.pmu_flags |= KVM_ARM64_PMU_DIRTY;
+	} else {
+		if (!p->is_aarch32)
+			val = vcpu_sys_reg(vcpu, r->reg);
+		else
+			val = vcpu_cp15(vcpu, r->reg);
+		*vcpu_reg(vcpu, p->Rt) = val;
+	}
+
+	return true;
+}
+
+/* PMU extended reg accessor. Only called as long as MDCR_EL2.TPM is set. */
+static bool access_pmu_xreg(struct kvm_vcpu *vcpu,
+			    const struct sys_reg_params *p,
+			    const struct sys_reg_desc *r)
+{
+	unsigned long index, reg, val;
+
+	if (!p->is_aarch32)
+		index = vcpu_sys_reg(vcpu, PMSELR_EL0) & ARMV8_PMCR_N_MASK;
+	else
+		index = vcpu_cp15(vcpu, c9_PMSELR) & ARMV8_PMCR_N_MASK;
+
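+	/* A PMSELR SEL field of all-ones (0b11111) selects the cycle
+	 * counter; any other value indexes an event counter pair. */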
+	if (index == ARMV8_PMCR_N_MASK) {
+		if (!p->is_aarch32) {
+			if (r->reg == PMEVCNTR0_EL0)
+				reg = PMCCNTR_EL0;
+			else
+				reg = PMCCFILTR_EL0;
+		} else {
+			if (r->reg == c14_PMEVCNTR0)
+				reg = c9_PMCCNTR;
+			else
+				reg = c14_PMCCFILTR;
+		}
+	} else {
+		if (!p->is_aarch32)
+			reg = r->reg + 2*index;
+		else
+			reg = r->reg + 4*index;
+	}
+
+	if (p->is_write) {
+		val = *vcpu_reg(vcpu, p->Rt);
+		if (!p->is_aarch32)
+			vcpu_sys_reg(vcpu, reg) = val;
+		else
+			vcpu_cp15(vcpu, reg) = val & 0xffffffffUL;
+		vcpu->arch.pmu_flags |= KVM_ARM64_PMU_DIRTY;
+	} else {
+		if (!p->is_aarch32)
+			val = vcpu_sys_reg(vcpu, reg);
+		else
+			val = vcpu_cp15(vcpu, reg);
+		*vcpu_reg(vcpu, p->Rt) = val;
+	}
+
+	return true;
+}
+
 /* PMCR_EL0 accessor. Only called as long as MDCR_EL2.TPMCR is set. */
 static bool access_pmcr(struct kvm_vcpu *vcpu,
 			const struct sys_reg_params *p,
@@ -185,6 +309,7 @@ static bool access_pmcr(struct kvm_vcpu *vcpu,
 			vcpu_sys_reg(vcpu, r->reg) = val;
 		else
 			vcpu_cp15(vcpu, r->reg) = val;
+		vcpu->arch.pmu_flags |= KVM_ARM64_PMU_DIRTY;
 	} else {
 		/*
 		 * We reserve the last event counter for EL2-mode
@@ -318,14 +443,14 @@ static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
 	/* PMEVCNTRn_EL0 */						\
 	{ Op0(0b11), Op1(0b011), CRn(0b1110),				\
 	  CRm((0b1000 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)),		\
-	  NULL, reset_val, (PMEVCNTR0_EL0 + (n)*2), 0 }
+	  access_pmu_reg, reset_val, (PMEVCNTR0_EL0 + (n)*2), 0 }
 
 /* Macro to expand the PMEVTYPERn_EL0 register */
 #define PMU_PMEVTYPER_EL0(n)						\
 	/* PMEVTYPERn_EL0 */						\
 	{ Op0(0b11), Op1(0b011), CRn(0b1110),				\
 	  CRm((0b1100 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)),		\
-	  NULL, reset_val, (PMEVTYPER0_EL0 + (n)*2), 0 }
+	  access_pmu_reg, reset_val, (PMEVTYPER0_EL0 + (n)*2), 0 }
 
 /*
  * Architected system registers.
@@ -463,7 +588,10 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 
 	/* PMINTENSET_EL1 */
 	{ Op0(0b11), Op1(0b000), CRn(0b1001), CRm(0b1110), Op2(0b001),
-	  NULL, reset_val, PMINTENSET_EL1, 0 },
+	  access_pmu_setreg, reset_val, PMINTENSET_EL1, 0 },
+	/* PMINTENCLR_EL1 */
+	{ Op0(0b11), Op1(0b000), CRn(0b1001), CRm(0b1110), Op2(0b010),
+	  access_pmu_clrreg, reset_val, PMINTENSET_EL1, 0 },
 
 	/* MAIR_EL1 */
 	{ Op0(0b11), Op1(0b000), CRn(0b1010), CRm(0b0010), Op2(0b000),
@@ -495,19 +623,31 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	  access_pmcr, reset_val, PMCR_EL0, 0 },
 	/* PMCNTENSET_EL0 */
 	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b001),
-	  NULL, reset_val, PMCNTENSET_EL0, 0 },
+	  access_pmu_setreg, reset_val, PMCNTENSET_EL0, 0 },
+	/* PMCNTENCLR_EL0 */
+	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b010),
+	  access_pmu_clrreg, reset_val, PMCNTENSET_EL0, 0 },
+	/* PMOVSCLR_EL0 */
+	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b011),
+	  access_pmu_clrreg, reset_val, PMOVSSET_EL0, 0 },
 	/* PMSELR_EL0 */
 	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b101),
-	  NULL, reset_val, PMSELR_EL0, 0 },
+	  access_pmu_reg, reset_val, PMSELR_EL0, 0 },
 	/* PMCCNTR_EL0 */
 	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b000),
-	  NULL, reset_val, PMCCNTR_EL0, 0 },
+	  access_pmu_reg, reset_val, PMCCNTR_EL0, 0 },
+	/* PMXEVTYPER_EL0 */
+	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b001),
+	  access_pmu_xreg, reset_val, PMEVTYPER0_EL0, 0 },
+	/* PMXEVCNTR_EL0 */
+	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b010),
+	  access_pmu_xreg, reset_val, PMEVCNTR0_EL0, 0 },
 	/* PMUSERENR_EL0 */
 	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1110), Op2(0b000),
-	  NULL, reset_val, PMUSERENR_EL0, 0 },
+	  access_pmu_reg, reset_val, PMUSERENR_EL0, 0 },
 	/* PMOVSSET_EL0 */
 	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1110), Op2(0b011),
-	  NULL, reset_val, PMOVSSET_EL0, 0 },
+	  access_pmu_setreg, reset_val, PMOVSSET_EL0, 0 },
 
 	/* TPIDR_EL0 */
 	{ Op0(0b11), Op1(0b011), CRn(0b1101), CRm(0b0000), Op2(0b010),
@@ -582,7 +722,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	PMU_PMEVTYPER_EL0(30),
 	/* PMCCFILTR_EL0 */
 	{ Op0(0b11), Op1(0b011), CRn(0b1110), CRm(0b1111), Op2(0b111),
-	  NULL, reset_val, PMCCFILTR_EL0, 0 },
+	  access_pmu_reg, reset_val, PMCCFILTR_EL0, 0 },
 
 	/* DACR32_EL2 */
 	{ Op0(0b11), Op1(0b100), CRn(0b0011), CRm(0b0000), Op2(0b000),
@@ -744,6 +884,20 @@ static const struct sys_reg_desc cp14_64_regs[] = {
 	{ Op1( 0), CRm( 2), .access = trap_raz_wi },
 };
 
+/* Macro to expand the PMEVCNTR<n> register */
+#define PMU_PMEVCNTR(n)							\
+	/* PMEVCNTRn */							\
+	{  Op1( 0), CRn(14), 						\
+	  CRm((0b1000 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)),		\
+	  access_pmu_reg, reset_val, (c14_PMEVCNTR0 + (n)*4), 0 }
+
+/* Macro to expand the PMEVTYPER<n> register */
+#define PMU_PMEVTYPER(n)						\
+	/* PMEVTYPERn */						\
+	{ Op1( 0), CRn(14), 						\
+	  CRm((0b1100 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)),		\
+	  access_pmu_reg, reset_val, (c14_PMEVTYPR0 + (n)*4), 0 }
+
 /*
  * Trapped cp15 registers. TTBR0/TTBR1 get a double encoding,
  * depending on the way they are accessed (as a 32bit or a 64bit
@@ -771,12 +925,88 @@ static const struct sys_reg_desc cp15_regs[] = {
 
 	/* PMU */
 	{ Op1( 0), CRn( 9), CRm(12), Op2( 0), access_pmcr, NULL, c9_PMCR },
+	{ Op1( 0), CRn( 9), CRm(12), Op2( 1), access_pmu_setreg, NULL, c9_PMCNTENSET },
+	{ Op1( 0), CRn( 9), CRm(12), Op2( 2), access_pmu_clrreg, NULL, c9_PMCNTENSET },
+	{ Op1( 0), CRn( 9), CRm(12), Op2( 3), access_pmu_clrreg, NULL, c9_PMOVSSET },
+	{ Op1( 0), CRn( 9), CRm(12), Op2( 5), access_pmu_reg, NULL, c9_PMSELR },
+	{ Op1( 0), CRn( 9), CRm(13), Op2( 0), access_pmu_reg, NULL, c9_PMCCNTR },
+	{ Op1( 0), CRn( 9), CRm(13), Op2( 1), access_pmu_xreg, NULL, c14_PMEVTYPR0 },
+	{ Op1( 0), CRn( 9), CRm(13), Op2( 2), access_pmu_xreg, NULL, c14_PMEVCNTR0 },
+	{ Op1( 0), CRn( 9), CRm(14), Op2( 0), access_pmu_reg, NULL, c9_PMUSERENR },
+	{ Op1( 0), CRn( 9), CRm(14), Op2( 1), access_pmu_setreg, NULL, c9_PMINTENSET },
+	{ Op1( 0), CRn( 9), CRm(14), Op2( 2), access_pmu_clrreg, NULL, c9_PMINTENSET },
+	{ Op1( 0), CRn( 9), CRm(14), Op2( 3), access_pmu_setreg, NULL, c9_PMOVSSET },
 
 	{ Op1( 0), CRn(10), CRm( 2), Op2( 0), access_vm_reg, NULL, c10_PRRR },
 	{ Op1( 0), CRn(10), CRm( 2), Op2( 1), access_vm_reg, NULL, c10_NMRR },
 	{ Op1( 0), CRn(10), CRm( 3), Op2( 0), access_vm_reg, NULL, c10_AMAIR0 },
 	{ Op1( 0), CRn(10), CRm( 3), Op2( 1), access_vm_reg, NULL, c10_AMAIR1 },
 	{ Op1( 0), CRn(13), CRm( 0), Op2( 1), access_vm_reg, NULL, c13_CID },
+
+	/* PMU */
+	PMU_PMEVCNTR(0),
+	PMU_PMEVCNTR(1),
+	PMU_PMEVCNTR(2),
+	PMU_PMEVCNTR(3),
+	PMU_PMEVCNTR(4),
+	PMU_PMEVCNTR(5),
+	PMU_PMEVCNTR(6),
+	PMU_PMEVCNTR(7),
+	PMU_PMEVCNTR(8),
+	PMU_PMEVCNTR(9),
+	PMU_PMEVCNTR(10),
+	PMU_PMEVCNTR(11),
+	PMU_PMEVCNTR(12),
+	PMU_PMEVCNTR(13),
+	PMU_PMEVCNTR(14),
+	PMU_PMEVCNTR(15),
+	PMU_PMEVCNTR(16),
+	PMU_PMEVCNTR(17),
+	PMU_PMEVCNTR(18),
+	PMU_PMEVCNTR(19),
+	PMU_PMEVCNTR(20),
+	PMU_PMEVCNTR(21),
+	PMU_PMEVCNTR(22),
+	PMU_PMEVCNTR(23),
+	PMU_PMEVCNTR(24),
+	PMU_PMEVCNTR(25),
+	PMU_PMEVCNTR(26),
+	PMU_PMEVCNTR(27),
+	PMU_PMEVCNTR(28),
+	PMU_PMEVCNTR(29),
+	PMU_PMEVCNTR(30),
+	PMU_PMEVTYPER(0),
+	PMU_PMEVTYPER(1),
+	PMU_PMEVTYPER(2),
+	PMU_PMEVTYPER(3),
+	PMU_PMEVTYPER(4),
+	PMU_PMEVTYPER(5),
+	PMU_PMEVTYPER(6),
+	PMU_PMEVTYPER(7),
+	PMU_PMEVTYPER(8),
+	PMU_PMEVTYPER(9),
+	PMU_PMEVTYPER(10),
+	PMU_PMEVTYPER(11),
+	PMU_PMEVTYPER(12),
+	PMU_PMEVTYPER(13),
+	PMU_PMEVTYPER(14),
+	PMU_PMEVTYPER(15),
+	PMU_PMEVTYPER(16),
+	PMU_PMEVTYPER(17),
+	PMU_PMEVTYPER(18),
+	PMU_PMEVTYPER(19),
+	PMU_PMEVTYPER(20),
+	PMU_PMEVTYPER(21),
+	PMU_PMEVTYPER(22),
+	PMU_PMEVTYPER(23),
+	PMU_PMEVTYPER(24),
+	PMU_PMEVTYPER(25),
+	PMU_PMEVTYPER(26),
+	PMU_PMEVTYPER(27),
+	PMU_PMEVTYPER(28),
+	PMU_PMEVTYPER(29),
+	PMU_PMEVTYPER(30),
+	{ Op1( 0), CRn(14), CRm(15), Op2( 7), access_pmu_reg, NULL, c14_PMCCFILTR },
 };
 
 static const struct sys_reg_desc cp15_64_regs[] = {
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [RFC PATCH 6/6] ARM64: KVM: Upgrade to lazy context switch of PMU registers
@ 2014-08-05  9:24   ` Anup Patel
  0 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-08-05  9:24 UTC (permalink / raw)
  To: linux-arm-kernel

A full context switch of all PMU registers for both host and
guest can make the KVM world-switch very expensive.

This patch improves the current PMU context switch by implementing
a lazy context switch of the PMU registers.

To achieve this, we trap all PMU register accesses and use a
per-VCPU dirty flag to keep track of whether the guest has updated
the PMU registers. If the VCPU's PMU registers are dirty, or its
PMCR_EL0.E bit is set, then we do a full context switch for both
host and guest.
(This is very similar to the lazy world switch for debug registers:
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-July/271040.html)

Also, we always trap-n-emulate PMCR_EL0 to fake the number of event
counters available to the guest. For this PMCR_EL0 trap-n-emulation
to work correctly, we always save/restore PMCR_EL0 for both host and
guest, whereas the other PMU registers are saved/restored based on
the PMU dirty flag.
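
As an illustrative sketch (not part of the patch; the type and macro
names below are stand-ins for the actual kernel symbols), the
decision made on each world switch boils down to:

	#include <stdbool.h>
	#include <stdint.h>

	#define PMU_DIRTY	(1ULL << 0)	/* stand-in for KVM_ARM64_PMU_DIRTY */
	#define PMCR_E		(1ULL << 0)	/* stand-in for ARMV8_PMCR_E */

	/* Minimal stand-in for the relevant per-VCPU state. */
	struct vcpu_pmu_state {
		uint64_t pmu_flags;	/* dirty flag, set on trapped PMU writes */
		uint64_t pmcr_el0;	/* guest copy of PMCR_EL0 */
	};

	/* Full save/restore only if the guest touched a PMU register
	 * or counting is enabled; otherwise keep trapping accesses
	 * via MDCR_EL2.TPM and skip the PMU save/restore entirely. */
	static bool needs_full_pmu_switch(const struct vcpu_pmu_state *s)
	{
		return (s->pmu_flags & PMU_DIRTY) || (s->pmcr_el0 & PMCR_E);
	}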

Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
---
 arch/arm64/include/asm/kvm_asm.h  |    3 +
 arch/arm64/include/asm/kvm_host.h |    3 +
 arch/arm64/kernel/asm-offsets.c   |    1 +
 arch/arm64/kvm/hyp.S              |   63 ++++++++--
 arch/arm64/kvm/sys_regs.c         |  248 +++++++++++++++++++++++++++++++++++--
 5 files changed, 298 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 93be21f..47b7fcd 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -132,6 +132,9 @@
 #define KVM_ARM64_DEBUG_DIRTY_SHIFT	0
 #define KVM_ARM64_DEBUG_DIRTY		(1 << KVM_ARM64_DEBUG_DIRTY_SHIFT)
 
+#define KVM_ARM64_PMU_DIRTY_SHIFT	0
+#define KVM_ARM64_PMU_DIRTY		(1 << KVM_ARM64_PMU_DIRTY_SHIFT)
+
 #ifndef __ASSEMBLY__
 struct kvm;
 struct kvm_vcpu;
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index ae4cdb2..4dba2a3 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -117,6 +117,9 @@ struct kvm_vcpu_arch {
 	/* Timer state */
 	struct arch_timer_cpu timer_cpu;
 
+	/* PMU flags */
+	u64 pmu_flags;
+
 	/* PMU state */
 	struct pmu_cpu pmu_cpu;
 
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 053dc3e..4234794 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -140,6 +140,7 @@ int main(void)
   DEFINE(VGIC_CPU_NR_LR,	offsetof(struct vgic_cpu, nr_lr));
   DEFINE(KVM_VTTBR,		offsetof(struct kvm, arch.vttbr));
   DEFINE(KVM_VGIC_VCTRL,	offsetof(struct kvm, arch.vgic.vctrl_base));
+  DEFINE(VCPU_PMU_FLAGS,	offsetof(struct kvm_vcpu, arch.pmu_flags));
   DEFINE(VCPU_PMU_IRQ_PENDING,	offsetof(struct kvm_vcpu, arch.pmu_cpu.irq_pending));
 #endif
 #ifdef CONFIG_ARM64_CPU_SUSPEND
diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index 6b41c01..5f9ccee 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -443,6 +443,9 @@ __kvm_hyp_code_start:
 	and	x5, x4, #~(ARMV8_PMCR_E)// Clear PMCR_EL0.E
 	msr	pmcr_el0, x5		// This will stop all counters
 
+	ldr	x5, [x0, #VCPU_PMU_FLAGS] // Only save if dirty flag set
+	tbz	x5, #KVM_ARM64_PMU_DIRTY_SHIFT, 1f
+
 	mov	x3, #0
 	ubfx	x4, x4, #ARMV8_PMCR_N_SHIFT, #5	// Number of event counters
 	cmp	x4, #0			// Skip if no event counters
@@ -731,7 +734,7 @@ __kvm_hyp_code_start:
 	msr	mdccint_el1, x21
 .endm
 
-.macro restore_pmu
+.macro restore_pmu, is_vcpu_pmu
 	// x2: base address for cpu context
 	// x3: mask of counters allowed in EL0 & EL1
 	// x4: number of event counters allowed in EL0 & EL1
@@ -741,16 +744,19 @@ __kvm_hyp_code_start:
 	cmp	x5, #1			// Must be PMUv3 else skip
 	bne	1f
 
+	ldr	x5, [x0, #VCPU_PMU_FLAGS] // Only restore if dirty flag set
+	tbz	x5, #KVM_ARM64_PMU_DIRTY_SHIFT, 2f
+
 	mov	x3, #0
 	mrs	x4, pmcr_el0
 	ubfx	x4, x4, #ARMV8_PMCR_N_SHIFT, #5	// Number of event counters
 	cmp	x4, #0			// Skip if no event counters
-	beq	2f
+	beq	3f
 	sub	x4, x4, #1		// Last event counter is reserved
 	mov	x3, #1
 	lsl	x3, x3, x4
 	sub	x3, x3, #1
-2:	orr	x3, x3, #(1 << 31)	// Mask of event counters
+3:	orr	x3, x3, #(1 << 31)	// Mask of event counters
 
 	ldr	x5, [x2, #CPU_SYSREG_OFFSET(PMCCFILTR_EL0)]
 	msr	pmccfiltr_el0, x5	// Restore PMCCFILTR_EL0
@@ -772,15 +778,15 @@ __kvm_hyp_code_start:
 	lsl	x5, x4, #4
 	add	x5, x5, #CPU_SYSREG_OFFSET(PMEVCNTR0_EL0)
 	add	x5, x2, x5
-3:	cmp	x4, #0
-	beq	4f
+4:	cmp	x4, #0
+	beq	5f
 	sub	x4, x4, #1
 	ldp	x6, x7, [x5, #-16]!
 	msr	pmselr_el0, x4
 	msr	pmxevcntr_el0, x6	// Restore PMEVCNTR<n>_EL0
 	msr	pmxevtyper_el0, x7	// Restore PMEVTYPER<n>_EL0
-	b	3b
-4:
+	b	4b
+5:
 	ldr	x5, [x2, #CPU_SYSREG_OFFSET(PMSELR_EL0)]
 	msr	pmselr_el0, x5		// Restore PMSELR_EL0
 
@@ -792,6 +798,13 @@ __kvm_hyp_code_start:
 	and	x5, x5, x3
 	msr	pmovsset_el0, x5	// Restore PMOVSSET_EL0
 
+	.if \is_vcpu_pmu == 0
+	// Clear the dirty flag for the next run, as all the state has
+	// already been saved. Note that we nuke the whole 64bit word.
+	// If we ever add more flags, we'll have to be more careful...
+	str	xzr, [x0, #VCPU_PMU_FLAGS]
+	.endif
+2:
 	ldr	x5, [x2, #CPU_SYSREG_OFFSET(PMCR_EL0)]
 	msr	pmcr_el0, x5		// Restore PMCR_EL0
 1:
@@ -838,6 +851,23 @@ __kvm_hyp_code_start:
 9999:
 .endm
 
+.macro compute_pmu_state
+	// Compute PMU state: if PMCR_EL0.E is set then we do a
+	// full save/restore cycle and disable trapping
+	add	x25, x0, #VCPU_CONTEXT
+
+	// Check the state of PMCR_EL0.E bit
+	ldr	x26, [x25, #CPU_SYSREG_OFFSET(PMCR_EL0)]
+	and	x26, x26, #ARMV8_PMCR_E
+	cmp	x26, #0
+	b.eq	8887f
+
+	// If any interesting bits were set, we must set the flag
+	mov	x26, #KVM_ARM64_PMU_DIRTY
+	str	x26, [x0, #VCPU_PMU_FLAGS]
+8887:
+.endm
+
 .macro save_guest_32bit_state
 	skip_32bit_state x3, 1f
 
@@ -919,6 +949,12 @@ __kvm_hyp_code_start:
 	orr	x2, x2, #MDCR_EL2_TPMCR
 	orr	x2, x2, #(MDCR_EL2_TDRA | MDCR_EL2_TDOSA)
 
+	// Check for KVM_ARM64_PMU_DIRTY, and set the PMU to trap
+	// all PMU registers if the PMU state is not dirty.
+	ldr	x3, [x0, #VCPU_PMU_FLAGS]
+	tbnz	x3, #KVM_ARM64_PMU_DIRTY_SHIFT, 1f
+	orr	x2, x2, #MDCR_EL2_TPM
+1:
 	// Check for KVM_ARM64_DEBUG_DIRTY, and set debug to trap
 	// if not dirty.
 	ldr	x3, [x0, #VCPU_DEBUG_FLAGS]
@@ -1127,8 +1163,12 @@ __save_pmu_guest:
 	save_pmu 1
 	ret
 
-__restore_pmu:
-	restore_pmu
+__restore_pmu_host:
+	restore_pmu 0
+	ret
+
+__restore_pmu_guest:
+	restore_pmu 1
 	ret
 
 __save_fpsimd:
@@ -1160,6 +1200,7 @@ ENTRY(__kvm_vcpu_run)
 
 	save_host_regs
 
+	compute_pmu_state
 	bl __save_pmu_host
 
 	bl __save_fpsimd
@@ -1185,7 +1226,7 @@ ENTRY(__kvm_vcpu_run)
 1:
 	restore_guest_32bit_state
 
-	bl __restore_pmu
+	bl __restore_pmu_guest
 
 	restore_guest_regs
 
@@ -1232,7 +1273,7 @@ __kvm_vcpu_return:
 	str	xzr, [x0, #VCPU_DEBUG_FLAGS]
 	bl	__restore_debug
 1:
-	bl __restore_pmu
+	bl __restore_pmu_host
 
 	restore_host_regs
 
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 081f95e..cda6774 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -166,6 +166,130 @@ static bool access_sctlr(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+/* PMU reg accessor. Only called as long as MDCR_EL2.TPM is set. */
+static bool access_pmu_reg(struct kvm_vcpu *vcpu,
+			   const struct sys_reg_params *p,
+			   const struct sys_reg_desc *r)
+{
+	unsigned long val;
+
+	if (p->is_write) {
+		val = *vcpu_reg(vcpu, p->Rt);
+		if (!p->is_aarch32)
+			vcpu_sys_reg(vcpu, r->reg) = val;
+		else
+			vcpu_cp15(vcpu, r->reg) = val & 0xffffffffUL;
+		vcpu->arch.pmu_flags |= KVM_ARM64_PMU_DIRTY;
+	} else {
+		if (!p->is_aarch32)
+			val = vcpu_sys_reg(vcpu, r->reg);
+		else
+			val = vcpu_cp15(vcpu, r->reg);
+		*vcpu_reg(vcpu, p->Rt) = val;
+	}
+
+	return true;
+}
+
+/* PMU set reg accessor. Only called as long as MDCR_EL2.TPM is set. */
+static bool access_pmu_setreg(struct kvm_vcpu *vcpu,
+			      const struct sys_reg_params *p,
+			      const struct sys_reg_desc *r)
+{
+	unsigned long val;
+
+	if (p->is_write) {
+		val = *vcpu_reg(vcpu, p->Rt);
+		if (!p->is_aarch32)
+			vcpu_sys_reg(vcpu, r->reg) |= val;
+		else
+			vcpu_cp15(vcpu, r->reg) |= val & 0xffffffffUL;
+		vcpu->arch.pmu_flags |= KVM_ARM64_PMU_DIRTY;
+	} else {
+		if (!p->is_aarch32)
+			val = vcpu_sys_reg(vcpu, r->reg);
+		else
+			val = vcpu_cp15(vcpu, r->reg);
+		*vcpu_reg(vcpu, p->Rt) = val;
+	}
+
+	return true;
+}
+
+/* PMU clear reg accessor. Only called as long as MDCR_EL2.TPM is set. */
+static bool access_pmu_clrreg(struct kvm_vcpu *vcpu,
+			      const struct sys_reg_params *p,
+			      const struct sys_reg_desc *r)
+{
+	unsigned long val;
+
+	if (p->is_write) {
+		val = *vcpu_reg(vcpu, p->Rt);
+		if (!p->is_aarch32)
+			vcpu_sys_reg(vcpu, r->reg) &= ~val;
+		else
+			vcpu_cp15(vcpu, r->reg) &= ~(val & 0xffffffffUL);
+		vcpu->arch.pmu_flags |= KVM_ARM64_PMU_DIRTY;
+	} else {
+		if (!p->is_aarch32)
+			val = vcpu_sys_reg(vcpu, r->reg);
+		else
+			val = vcpu_cp15(vcpu, r->reg);
+		*vcpu_reg(vcpu, p->Rt) = val;
+	}
+
+	return true;
+}
+
+/* PMU extended reg accessor. Only called as long as MDCR_EL2.TPM is set. */
+static bool access_pmu_xreg(struct kvm_vcpu *vcpu,
+			    const struct sys_reg_params *p,
+			    const struct sys_reg_desc *r)
+{
+	unsigned long index, reg, val;
+
+	if (!p->is_aarch32)
+		index = vcpu_sys_reg(vcpu, PMSELR_EL0) & ARMV8_PMCR_N_MASK;
+	else
+		index = vcpu_cp15(vcpu, c9_PMSELR) & ARMV8_PMCR_N_MASK;
+
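+	/* A PMSELR SEL field of all-ones (0b11111) selects the cycle
+	 * counter; any other value indexes an event counter pair. */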
+	if (index == ARMV8_PMCR_N_MASK) {
+		if (!p->is_aarch32) {
+			if (r->reg == PMEVCNTR0_EL0)
+				reg = PMCCNTR_EL0;
+			else
+				reg = PMCCFILTR_EL0;
+		} else {
+			if (r->reg == c14_PMEVCNTR0)
+				reg = c9_PMCCNTR;
+			else
+				reg = c14_PMCCFILTR;
+		}
+	} else {
+		if (!p->is_aarch32)
+			reg = r->reg + 2*index;
+		else
+			reg = r->reg + 4*index;
+	}
+
+	if (p->is_write) {
+		val = *vcpu_reg(vcpu, p->Rt);
+		if (!p->is_aarch32)
+			vcpu_sys_reg(vcpu, reg) = val;
+		else
+			vcpu_cp15(vcpu, reg) = val & 0xffffffffUL;
+		vcpu->arch.pmu_flags |= KVM_ARM64_PMU_DIRTY;
+	} else {
+		if (!p->is_aarch32)
+			val = vcpu_sys_reg(vcpu, reg);
+		else
+			val = vcpu_cp15(vcpu, reg);
+		*vcpu_reg(vcpu, p->Rt) = val;
+	}
+
+	return true;
+}
+
 /* PMCR_EL0 accessor. Only called as long as MDCR_EL2.TPMCR is set. */
 static bool access_pmcr(struct kvm_vcpu *vcpu,
 			const struct sys_reg_params *p,
@@ -185,6 +309,7 @@ static bool access_pmcr(struct kvm_vcpu *vcpu,
 			vcpu_sys_reg(vcpu, r->reg) = val;
 		else
 			vcpu_cp15(vcpu, r->reg) = val;
+		vcpu->arch.pmu_flags |= KVM_ARM64_PMU_DIRTY;
 	} else {
 		/*
 		 * We reserve the last event counter for EL2-mode
@@ -318,14 +443,14 @@ static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
 	/* PMEVCNTRn_EL0 */						\
 	{ Op0(0b11), Op1(0b011), CRn(0b1110),				\
 	  CRm((0b1000 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)),		\
-	  NULL, reset_val, (PMEVCNTR0_EL0 + (n)*2), 0 }
+	  access_pmu_reg, reset_val, (PMEVCNTR0_EL0 + (n)*2), 0 }
 
 /* Macro to expand the PMEVTYPERn_EL0 register */
 #define PMU_PMEVTYPER_EL0(n)						\
 	/* PMEVTYPERn_EL0 */						\
 	{ Op0(0b11), Op1(0b011), CRn(0b1110),				\
 	  CRm((0b1100 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)),		\
-	  NULL, reset_val, (PMEVTYPER0_EL0 + (n)*2), 0 }
+	  access_pmu_reg, reset_val, (PMEVTYPER0_EL0 + (n)*2), 0 }
 
 /*
  * Architected system registers.
@@ -463,7 +588,10 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 
 	/* PMINTENSET_EL1 */
 	{ Op0(0b11), Op1(0b000), CRn(0b1001), CRm(0b1110), Op2(0b001),
-	  NULL, reset_val, PMINTENSET_EL1, 0 },
+	  access_pmu_setreg, reset_val, PMINTENSET_EL1, 0 },
+	/* PMINTENCLR_EL1 */
+	{ Op0(0b11), Op1(0b000), CRn(0b1001), CRm(0b1110), Op2(0b010),
+	  access_pmu_clrreg, reset_val, PMINTENSET_EL1, 0 },
 
 	/* MAIR_EL1 */
 	{ Op0(0b11), Op1(0b000), CRn(0b1010), CRm(0b0010), Op2(0b000),
@@ -495,19 +623,31 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	  access_pmcr, reset_val, PMCR_EL0, 0 },
 	/* PMCNTENSET_EL0 */
 	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b001),
-	  NULL, reset_val, PMCNTENSET_EL0, 0 },
+	  access_pmu_setreg, reset_val, PMCNTENSET_EL0, 0 },
+	/* PMCNTENCLR_EL0 */
+	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b010),
+	  access_pmu_clrreg, reset_val, PMCNTENSET_EL0, 0 },
+	/* PMOVSCLR_EL0 */
+	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b011),
+	  access_pmu_clrreg, reset_val, PMOVSSET_EL0, 0 },
 	/* PMSELR_EL0 */
 	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b101),
-	  NULL, reset_val, PMSELR_EL0 },
+	  access_pmu_reg, reset_val, PMSELR_EL0 },
 	/* PMCCNTR_EL0 */
 	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b000),
-	  NULL, reset_val, PMCCNTR_EL0, 0 },
+	  access_pmu_reg, reset_val, PMCCNTR_EL0, 0 },
+	/* PMXEVTYPER_EL0 */
+	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b001),
+	  access_pmu_xreg, reset_val, PMEVTYPER0_EL0, 0 },
+	/* PMXEVCNTR_EL0 */
+	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b010),
+	  access_pmu_xreg, reset_val, PMEVCNTR0_EL0, 0 },
 	/* PMUSERENR_EL0 */
 	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1110), Op2(0b000),
-	  NULL, reset_val, PMUSERENR_EL0, 0 },
+	  access_pmu_reg, reset_val, PMUSERENR_EL0, 0 },
 	/* PMOVSSET_EL0 */
 	{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1110), Op2(0b011),
-	  NULL, reset_val, PMOVSSET_EL0, 0 },
+	  access_pmu_setreg, reset_val, PMOVSSET_EL0, 0 },
 
 	/* TPIDR_EL0 */
 	{ Op0(0b11), Op1(0b011), CRn(0b1101), CRm(0b0000), Op2(0b010),
@@ -582,7 +722,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	PMU_PMEVTYPER_EL0(30),
 	/* PMCCFILTR_EL0 */
 	{ Op0(0b11), Op1(0b011), CRn(0b1110), CRm(0b1111), Op2(0b111),
-	  NULL, reset_val, PMCCFILTR_EL0, 0 },
+	  access_pmu_reg, reset_val, PMCCFILTR_EL0, 0 },
 
 	/* DACR32_EL2 */
 	{ Op0(0b11), Op1(0b100), CRn(0b0011), CRm(0b0000), Op2(0b000),
@@ -744,6 +884,20 @@ static const struct sys_reg_desc cp14_64_regs[] = {
 	{ Op1( 0), CRm( 2), .access = trap_raz_wi },
 };
 
+/* Macro to expand the PMEVCNTR<n> register */
+#define PMU_PMEVCNTR(n)							\
+	/* PMEVCNTRn */							\
+	{  Op1( 0), CRn(14), 						\
+	  CRm((0b1000 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)),		\
+	  access_pmu_reg, reset_val, (c14_PMEVCNTR0 + (n)*4), 0 }
+
+/* Macro to expand the PMEVTYPER<n> register */
+#define PMU_PMEVTYPER(n)						\
+	/* PMEVTYPERn */						\
+	{ Op1( 0), CRn(14), 						\
+	  CRm((0b1100 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)),		\
+	  access_pmu_reg, reset_val, (c14_PMEVTYPR0 + (n)*4), 0 }
+
 /*
  * Trapped cp15 registers. TTBR0/TTBR1 get a double encoding,
  * depending on the way they are accessed (as a 32bit or a 64bit
@@ -771,12 +925,88 @@ static const struct sys_reg_desc cp15_regs[] = {
 
 	/* PMU */
 	{ Op1( 0), CRn( 9), CRm(12), Op2( 0), access_pmcr, NULL, c9_PMCR },
+	{ Op1( 0), CRn( 9), CRm(12), Op2( 1), access_pmu_setreg, NULL, c9_PMCNTENSET },
+	{ Op1( 0), CRn( 9), CRm(12), Op2( 2), access_pmu_clrreg, NULL, c9_PMCNTENSET },
+	{ Op1( 0), CRn( 9), CRm(12), Op2( 3), access_pmu_clrreg, NULL, c9_PMOVSSET },
+	{ Op1( 0), CRn( 9), CRm(12), Op2( 5), access_pmu_reg, NULL, c9_PMSELR },
+	{ Op1( 0), CRn( 9), CRm(13), Op2( 0), access_pmu_reg, NULL, c9_PMCCNTR },
+	{ Op1( 0), CRn( 9), CRm(13), Op2( 1), access_pmu_xreg, NULL, c14_PMEVTYPR0 },
+	{ Op1( 0), CRn( 9), CRm(13), Op2( 2), access_pmu_xreg, NULL, c14_PMEVCNTR0 },
+	{ Op1( 0), CRn( 9), CRm(14), Op2( 0), access_pmu_reg, NULL, c9_PMUSERENR },
+	{ Op1( 0), CRn( 9), CRm(14), Op2( 1), access_pmu_setreg, NULL, c9_PMINTENSET },
+	{ Op1( 0), CRn( 9), CRm(14), Op2( 2), access_pmu_clrreg, NULL, c9_PMINTENSET },
+	{ Op1( 0), CRn( 9), CRm(14), Op2( 3), access_pmu_setreg, NULL, c9_PMOVSSET },
 
 	{ Op1( 0), CRn(10), CRm( 2), Op2( 0), access_vm_reg, NULL, c10_PRRR },
 	{ Op1( 0), CRn(10), CRm( 2), Op2( 1), access_vm_reg, NULL, c10_NMRR },
 	{ Op1( 0), CRn(10), CRm( 3), Op2( 0), access_vm_reg, NULL, c10_AMAIR0 },
 	{ Op1( 0), CRn(10), CRm( 3), Op2( 1), access_vm_reg, NULL, c10_AMAIR1 },
 	{ Op1( 0), CRn(13), CRm( 0), Op2( 1), access_vm_reg, NULL, c13_CID },
+
+	/* PMU */
+	PMU_PMEVCNTR(0),
+	PMU_PMEVCNTR(1),
+	PMU_PMEVCNTR(2),
+	PMU_PMEVCNTR(3),
+	PMU_PMEVCNTR(4),
+	PMU_PMEVCNTR(5),
+	PMU_PMEVCNTR(6),
+	PMU_PMEVCNTR(7),
+	PMU_PMEVCNTR(8),
+	PMU_PMEVCNTR(9),
+	PMU_PMEVCNTR(10),
+	PMU_PMEVCNTR(11),
+	PMU_PMEVCNTR(12),
+	PMU_PMEVCNTR(13),
+	PMU_PMEVCNTR(14),
+	PMU_PMEVCNTR(15),
+	PMU_PMEVCNTR(16),
+	PMU_PMEVCNTR(17),
+	PMU_PMEVCNTR(18),
+	PMU_PMEVCNTR(19),
+	PMU_PMEVCNTR(20),
+	PMU_PMEVCNTR(21),
+	PMU_PMEVCNTR(22),
+	PMU_PMEVCNTR(23),
+	PMU_PMEVCNTR(24),
+	PMU_PMEVCNTR(25),
+	PMU_PMEVCNTR(26),
+	PMU_PMEVCNTR(27),
+	PMU_PMEVCNTR(28),
+	PMU_PMEVCNTR(29),
+	PMU_PMEVCNTR(30),
+	PMU_PMEVTYPER(0),
+	PMU_PMEVTYPER(1),
+	PMU_PMEVTYPER(2),
+	PMU_PMEVTYPER(3),
+	PMU_PMEVTYPER(4),
+	PMU_PMEVTYPER(5),
+	PMU_PMEVTYPER(6),
+	PMU_PMEVTYPER(7),
+	PMU_PMEVTYPER(8),
+	PMU_PMEVTYPER(9),
+	PMU_PMEVTYPER(10),
+	PMU_PMEVTYPER(11),
+	PMU_PMEVTYPER(12),
+	PMU_PMEVTYPER(13),
+	PMU_PMEVTYPER(14),
+	PMU_PMEVTYPER(15),
+	PMU_PMEVTYPER(16),
+	PMU_PMEVTYPER(17),
+	PMU_PMEVTYPER(18),
+	PMU_PMEVTYPER(19),
+	PMU_PMEVTYPER(20),
+	PMU_PMEVTYPER(21),
+	PMU_PMEVTYPER(22),
+	PMU_PMEVTYPER(23),
+	PMU_PMEVTYPER(24),
+	PMU_PMEVTYPER(25),
+	PMU_PMEVTYPER(26),
+	PMU_PMEVTYPER(27),
+	PMU_PMEVTYPER(28),
+	PMU_PMEVTYPER(29),
+	PMU_PMEVTYPER(30),
+	{ Op1( 0), CRn(14), CRm(15), Op2( 7), access_pmu_reg, NULL, c14_PMCCFILTR },
 };
 
 static const struct sys_reg_desc cp15_64_regs[] = {
-- 
1.7.9.5
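
The accessors added to sys_regs.c above all follow one shadow-state
pattern: each architected SET/CLR register pair (for example
PMINTENSET_EL1 and PMINTENCLR_EL1) is backed by a single vcpu shadow
value, where writes to the SET encoding OR bits in, writes to the CLR
encoding clear them, and reads of either encoding return the same
state. PMXEVCNTR_EL0 and PMXEVTYPER_EL0 resolve indirectly through the
counter index held in PMSELR_EL0, with the all-ones index selecting
the cycle-counter registers, and the PMU_PMEVCNTR_EL0 and
PMU_PMEVTYPER_EL0 macros place counter n at CRm = (0b1000 | (n >> 3))
(0b1100 for the type registers) and Op2 = (n & 7). The standalone C
model below illustrates only the set/clear and PMSELR semantics; it is
a sketch, not kernel code, and every name in it is invented:

#include <stdint.h>
#include <stdio.h>

#define PMCR_N_MASK	0x1f	/* all-ones counter index */

static uint64_t pmintenset;	/* one shadow backs both SET and CLR */
static uint64_t pmselr;
static uint64_t pmevcntr[31];
static uint64_t pmccntr;

static void write_inten_set(uint64_t v) { pmintenset |=  v; }
static void write_inten_clr(uint64_t v) { pmintenset &= ~v; }

/* PMXEVCNTR resolves through PMSELR; index 31 selects the cycle counter. */
static uint64_t *xevcntr(void)
{
	uint64_t idx = pmselr & PMCR_N_MASK;

	return (idx == PMCR_N_MASK) ? &pmccntr : &pmevcntr[idx];
}

int main(void)
{
	write_inten_set(0x0f);	/* enable overflow irqs for counters 0-3 */
	write_inten_clr(0x05);	/* disable counters 0 and 2 again */

	pmselr = PMCR_N_MASK;	/* select the cycle counter */
	*xevcntr() = 1234;

	printf("inten=%#llx ccnt=%llu\n",
	       (unsigned long long)pmintenset,
	       (unsigned long long)pmccntr);	/* inten=0xa ccnt=1234 */
	return 0;
}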

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2014-08-05  9:24 ` Anup Patel
@ 2014-08-05  9:32   ` Anup Patel
  -1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-08-05  9:32 UTC (permalink / raw)
  To: Anup Patel
  Cc: Ian Campbell, kvm, Marc Zyngier, will.deacon, patches,
	linux-arm-kernel, kvmarm, Christoffer Dall,
	Pranavkumar Sawargaonkar

[-- Attachment #1: Type: text/plain, Size: 3386 bytes --]

On Tue, Aug 5, 2014 at 2:54 PM, Anup Patel <anup.patel@linaro.org> wrote:
> This patchset enables PMU virtualization in KVM ARM64. The
> Guest can now directly use PMU available on the host HW.
>
> The virtual PMU IRQ injection for Guest VCPUs is managed by
> small piece of code shared between KVM ARM and KVM ARM64. The
> virtual PMU IRQ number will be based on Guest machine model and
> user space will provide it using set device address vm ioctl.
>
> The second last patch of this series implements full context
> switch of PMU registers which will context switch all PMU
> registers on every KVM world-switch.
>
> The last patch implements a lazy context switch of PMU registers
> which is very similar to lazy debug context switch.
> (Refer, http://lists.infradead.org/pipermail/linux-arm-kernel/2014-July/271040.html)
>
> Also, we reserve last PMU event counter for EL2 mode which
> will not be accessible from Host and Guest EL1 mode. This
> reserved EL2 mode PMU event counter can be used for profiling
> KVM world-switch and other EL2 mode functions.
>
> All testing have been done using KVMTOOL on X-Gene Mustang and
> Foundation v8 Model for both Aarch32 and Aarch64 guest.
>
> Anup Patel (6):
>   ARM64: Move PMU register related defines to asm/pmu.h
>   ARM64: perf: Re-enable overflow interrupt from interrupt handler
>   ARM: perf: Re-enable overflow interrupt from interrupt handler
>   ARM/ARM64: KVM: Add common code PMU IRQ routing
>   ARM64: KVM: Implement full context switch of PMU registers
>   ARM64: KVM: Upgrade to lazy context switch of PMU registers
>
>  arch/arm/include/asm/kvm_host.h   |    9 +
>  arch/arm/include/uapi/asm/kvm.h   |    1 +
>  arch/arm/kernel/perf_event_v7.c   |    8 +
>  arch/arm/kvm/arm.c                |    6 +
>  arch/arm/kvm/reset.c              |    4 +
>  arch/arm64/include/asm/kvm_asm.h  |   39 +++-
>  arch/arm64/include/asm/kvm_host.h |   12 ++
>  arch/arm64/include/asm/pmu.h      |   44 +++++
>  arch/arm64/include/uapi/asm/kvm.h |    1 +
>  arch/arm64/kernel/asm-offsets.c   |    2 +
>  arch/arm64/kernel/perf_event.c    |   40 +---
>  arch/arm64/kvm/Kconfig            |    7 +
>  arch/arm64/kvm/Makefile           |    1 +
>  arch/arm64/kvm/hyp-init.S         |   15 ++
>  arch/arm64/kvm/hyp.S              |  209 +++++++++++++++++++-
>  arch/arm64/kvm/reset.c            |    4 +
>  arch/arm64/kvm/sys_regs.c         |  385 +++++++++++++++++++++++++++++++++----
>  include/kvm/arm_pmu.h             |   52 +++++
>  virt/kvm/arm/pmu.c                |  105 ++++++++++
>  19 files changed, 870 insertions(+), 74 deletions(-)
>  create mode 100644 include/kvm/arm_pmu.h
>  create mode 100644 virt/kvm/arm/pmu.c
>
> --
> 1.7.9.5
>
> CONFIDENTIALITY NOTICE: This e-mail message, including any attachments,
> is for the sole use of the intended recipient(s) and contains information
> that is confidential and proprietary to Applied Micro Circuits Corporation or its subsidiaries.
> It is to be used solely for the purpose of furthering the parties' business relationship.
> All unauthorized review, use, disclosure or distribution is prohibited.
> If you are not the intended recipient, please contact the sender by reply e-mail
> and destroy all copies of the original message.
>

Hi All,

Please apply the attached patch to KVMTOOL on top of my
recent KVMTOOL patchset to try this patchset using
KVMTOOL.

Regards,
Anup

[-- Attachment #2: 0001-kvmtool-ARM-ARM64-Add-PMU-node-to-generated-guest-DT.patch --]
[-- Type: text/x-patch, Size: 3994 bytes --]

From c16a3265992ba8159ab1da6d589026c0aa0914ba Mon Sep 17 00:00:00 2001
From: Anup Patel <anup.patel@linaro.org>
Date: Mon, 4 Aug 2014 16:45:44 +0530
Subject: [RFC PATCH] kvmtool: ARM/ARM64: Add PMU node to generated guest DTB.

This patch informs the KVM ARM/ARM64 in-kernel PMU virtualization
about the PMU irq number for each guest VCPU using the set device
address vm ioctl.

We also add a PMU node to the generated guest DTB to inform the
guest about the PMU irq numbers. For now, we assume PPI17 as the
PMU IRQ of the KVMTOOL guest.

Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Anup Patel <anup.patel@linaro.org>
---
 tools/kvm/Makefile                     |    3 ++-
 tools/kvm/arm/fdt.c                    |    4 +++
 tools/kvm/arm/include/arm-common/pmu.h |   10 +++++++
 tools/kvm/arm/pmu.c                    |   45 ++++++++++++++++++++++++++++++++
 4 files changed, 61 insertions(+), 1 deletion(-)
 create mode 100644 tools/kvm/arm/include/arm-common/pmu.h
 create mode 100644 tools/kvm/arm/pmu.c

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index fba60f1..59b75c4 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -158,7 +158,8 @@ endif
 
 # ARM
 OBJS_ARM_COMMON		:= arm/fdt.o arm/gic.o arm/ioport.o arm/irq.o \
-			   arm/kvm.o arm/kvm-cpu.o arm/pci.o arm/timer.o
+			   arm/kvm.o arm/kvm-cpu.o arm/pci.o arm/timer.o \
+			   arm/pmu.o
 HDRS_ARM_COMMON		:= arm/include
 ifeq ($(ARCH), arm)
 	DEFINES		+= -DCONFIG_ARM
diff --git a/tools/kvm/arm/fdt.c b/tools/kvm/arm/fdt.c
index 93849cf2..42b0a67 100644
--- a/tools/kvm/arm/fdt.c
+++ b/tools/kvm/arm/fdt.c
@@ -5,6 +5,7 @@
 #include "kvm/virtio-mmio.h"
 
 #include "arm-common/gic.h"
+#include "arm-common/pmu.h"
 #include "arm-common/pci.h"
 
 #include <stdbool.h>
@@ -142,6 +143,9 @@ static int setup_fdt(struct kvm *kvm)
 	if (generate_cpu_peripheral_fdt_nodes)
 		generate_cpu_peripheral_fdt_nodes(fdt, kvm, gic_phandle);
 
+	/* Performance monitoring unit */
+	pmu__generate_fdt_nodes(fdt, kvm);
+
 	/* Virtio MMIO devices */
 	dev_hdr = device__first_dev(DEVICE_BUS_MMIO);
 	while (dev_hdr) {
diff --git a/tools/kvm/arm/include/arm-common/pmu.h b/tools/kvm/arm/include/arm-common/pmu.h
new file mode 100644
index 0000000..49ec9a8
--- /dev/null
+++ b/tools/kvm/arm/include/arm-common/pmu.h
@@ -0,0 +1,10 @@
+#ifndef ARM_COMMON__PMU_H
+#define ARM_COMMON__PMU_H
+
+#define PMU_CPU_IRQ			17
+
+struct kvm;
+
+void pmu__generate_fdt_nodes(void *fdt, struct kvm *kvm);
+
+#endif /* ARM_COMMON__PMU_H */
diff --git a/tools/kvm/arm/pmu.c b/tools/kvm/arm/pmu.c
new file mode 100644
index 0000000..7731a4c
--- /dev/null
+++ b/tools/kvm/arm/pmu.c
@@ -0,0 +1,45 @@
+#include "kvm/devices.h"
+#include "kvm/fdt.h"
+#include "kvm/kvm.h"
+#include "kvm/kvm-cpu.h"
+
+#include "arm-common/gic.h"
+#include "arm-common/pmu.h"
+
+#include <linux/byteorder.h>
+#include <linux/kvm.h>
+
+void pmu__generate_fdt_nodes(void *fdt, struct kvm *kvm)
+{
+	int cpu, err;
+	const char compatible[] = "arm,armv8-pmuv3\0arm,cortex-a15-pmu";
+	u32 cpu_mask = (((1 << kvm->nrcpus) - 1) << GIC_FDT_IRQ_PPI_CPU_SHIFT) \
+		       & GIC_FDT_IRQ_PPI_CPU_MASK;
+	u32 irq_prop[] = {
+			cpu_to_fdt32(GIC_FDT_IRQ_TYPE_PPI),
+			cpu_to_fdt32(PMU_CPU_IRQ - 0x10),
+			cpu_to_fdt32(cpu_mask | GIC_FDT_IRQ_FLAGS_EDGE_LO_HI),
+			};
+	struct kvm_arm_device_addr pmu_addr = {
+		.id = KVM_ARM_DEVICE_PMU << KVM_ARM_DEVICE_ID_SHIFT,
+		.addr = PMU_CPU_IRQ,
+	};
+
+	for (cpu = 0; cpu < kvm->nrcpus; ++cpu) {
+		pmu_addr.id &= ~KVM_ARM_DEVICE_TYPE_MASK;
+		pmu_addr.id |= (cpu << KVM_ARM_DEVICE_TYPE_SHIFT) &
+						KVM_ARM_DEVICE_TYPE_MASK;
+		err = ioctl(kvm->vm_fd, KVM_ARM_SET_DEVICE_ADDR, &pmu_addr);
+		if (err) {
+			printf("%s: KVM_ARM_SET_DEVICE_ADDR failed for CPU%d\n",
+				__func__, cpu);
+		}
+	}
+
+	_FDT(fdt_begin_node(fdt, "pmu"));
+
+	_FDT(fdt_property(fdt, "compatible", compatible, sizeof(compatible)));
+	_FDT(fdt_property(fdt, "interrupts", irq_prop, sizeof(irq_prop)));
+
+	_FDT(fdt_end_node(fdt));
+}
-- 
1.7.9.5
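
For reference, the irq_prop[] array in the attached patch is the
standard three-cell GIC interrupt specifier: cell 0 is the interrupt
type (1 for a PPI), cell 1 is the PPI number relative to 16 (hence
PMU_CPU_IRQ - 0x10), and cell 2 carries the trigger flags plus a
per-CPU target mask in the upper bits. The sketch below shows how the
cells are derived, assuming the usual kvmtool gic.h values (CPU mask
shift of 8, rising-edge flag of 1) and leaving out the cpu_to_fdt32()
byte swapping; the helper name is invented:

#include <stdint.h>
#include <stdio.h>

#define FDT_IRQ_TYPE_PPI	1	/* assumed: cell 0 value for a PPI */
#define FDT_PPI_CPU_SHIFT	8	/* assumed: CPU mask in bits 8 and up */
#define FDT_IRQ_EDGE_LO_HI	1	/* assumed: rising-edge trigger flag */

/* Hypothetical helper mirroring irq_prop[] (before cpu_to_fdt32()). */
static void pmu_ppi_cells(uint32_t ppi, int nrcpus, uint32_t cells[3])
{
	uint32_t cpu_mask = ((1u << nrcpus) - 1) << FDT_PPI_CPU_SHIFT;

	cells[0] = FDT_IRQ_TYPE_PPI;
	cells[1] = ppi - 16;	/* DT encodes PPIs relative to 16 */
	cells[2] = cpu_mask | FDT_IRQ_EDGE_LO_HI;
}

int main(void)
{
	uint32_t cells[3];

	pmu_ppi_cells(17, 4, cells);	/* PMU_CPU_IRQ = 17, four VCPUs */
	printf("<%u %u 0x%x>\n", cells[0], cells[1], cells[2]);	/* <1 1 0xf01> */
	return 0;
}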



^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2014-08-05  9:32   ` Anup Patel
@ 2014-08-05  9:35     ` Anup Patel
  -1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-08-05  9:35 UTC (permalink / raw)
  To: Anup Patel
  Cc: kvmarm, linux-arm-kernel, kvm, patches, Marc Zyngier,
	Christoffer Dall, Will Deacon, Ian Campbell,
	Pranavkumar Sawargaonkar

On 5 August 2014 15:02, Anup Patel <apatel@apm.com> wrote:
> On Tue, Aug 5, 2014 at 2:54 PM, Anup Patel <anup.patel@linaro.org> wrote:
>> This patchset enables PMU virtualization in KVM ARM64. The
>> Guest can now directly use PMU available on the host HW.
>>
>> The virtual PMU IRQ injection for Guest VCPUs is managed by
>> small piece of code shared between KVM ARM and KVM ARM64. The
>> virtual PMU IRQ number will be based on Guest machine model and
>> user space will provide it using set device address vm ioctl.
>>
>> The second last patch of this series implements full context
>> switch of PMU registers which will context switch all PMU
>> registers on every KVM world-switch.
>>
>> The last patch implements a lazy context switch of PMU registers
>> which is very similar to lazy debug context switch.
>> (Refer, http://lists.infradead.org/pipermail/linux-arm-kernel/2014-July/271040.html)
>>
>> Also, we reserve last PMU event counter for EL2 mode which
>> will not be accessible from Host and Guest EL1 mode. This
>> reserved EL2 mode PMU event counter can be used for profiling
>> KVM world-switch and other EL2 mode functions.
>>
>> All testing have been done using KVMTOOL on X-Gene Mustang and
>> Foundation v8 Model for both Aarch32 and Aarch64 guest.
>>
>> Anup Patel (6):
>>   ARM64: Move PMU register related defines to asm/pmu.h
>>   ARM64: perf: Re-enable overflow interrupt from interrupt handler
>>   ARM: perf: Re-enable overflow interrupt from interrupt handler
>>   ARM/ARM64: KVM: Add common code PMU IRQ routing
>>   ARM64: KVM: Implement full context switch of PMU registers
>>   ARM64: KVM: Upgrade to lazy context switch of PMU registers
>>
>>  arch/arm/include/asm/kvm_host.h   |    9 +
>>  arch/arm/include/uapi/asm/kvm.h   |    1 +
>>  arch/arm/kernel/perf_event_v7.c   |    8 +
>>  arch/arm/kvm/arm.c                |    6 +
>>  arch/arm/kvm/reset.c              |    4 +
>>  arch/arm64/include/asm/kvm_asm.h  |   39 +++-
>>  arch/arm64/include/asm/kvm_host.h |   12 ++
>>  arch/arm64/include/asm/pmu.h      |   44 +++++
>>  arch/arm64/include/uapi/asm/kvm.h |    1 +
>>  arch/arm64/kernel/asm-offsets.c   |    2 +
>>  arch/arm64/kernel/perf_event.c    |   40 +---
>>  arch/arm64/kvm/Kconfig            |    7 +
>>  arch/arm64/kvm/Makefile           |    1 +
>>  arch/arm64/kvm/hyp-init.S         |   15 ++
>>  arch/arm64/kvm/hyp.S              |  209 +++++++++++++++++++-
>>  arch/arm64/kvm/reset.c            |    4 +
>>  arch/arm64/kvm/sys_regs.c         |  385 +++++++++++++++++++++++++++++++++----
>>  include/kvm/arm_pmu.h             |   52 +++++
>>  virt/kvm/arm/pmu.c                |  105 ++++++++++
>>  19 files changed, 870 insertions(+), 74 deletions(-)
>>  create mode 100644 include/kvm/arm_pmu.h
>>  create mode 100644 virt/kvm/arm/pmu.c
>>
>> --
>> 1.7.9.5
>>
>> CONFIDENTIALITY NOTICE: This e-mail message, including any attachments,
>> is for the sole use of the intended recipient(s) and contains information
>> that is confidential and proprietary to Applied Micro Circuits Corporation or its subsidiaries.
>> It is to be used solely for the purpose of furthering the parties' business relationship.
>> All unauthorized review, use, disclosure or distribution is prohibited.
>> If you are not the intended recipient, please contact the sender by reply e-mail
>> and destroy all copies of the original message.

Please ignore this notice; it accidentally sneaked in.

--
Anup

>>
>
> Hi All,
>
> Please apply attached patch to KVMTOOL on-top-of my
> recent KVMTOOL patchset for trying this patchset using
> KVMTOOL.
>
> Regards,
> Anup

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 2/6] ARM64: perf: Re-enable overflow interrupt from interrupt handler
  2014-08-05  9:24   ` Anup Patel
@ 2014-08-06 14:24     ` Will Deacon
  -1 siblings, 0 replies; 78+ messages in thread
From: Will Deacon @ 2014-08-06 14:24 UTC (permalink / raw)
  To: Anup Patel
  Cc: kvmarm, linux-arm-kernel, kvm, patches, Marc Zyngier,
	christoffer.dall, ian.campbell, pranavkumar

On Tue, Aug 05, 2014 at 10:24:11AM +0100, Anup Patel wrote:
> A hypervisor will typically mask the overflow interrupt before
> forwarding it to Guest Linux hence we need to re-enable the overflow
> interrupt after clearing it in Guest Linux. Also, this re-enabling
> of overflow interrupt does not harm in non-virtualized scenarios.
> 
> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
> Signed-off-by: Anup Patel <anup.patel@linaro.org>
> ---
>  arch/arm64/kernel/perf_event.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index 47dfb8b..19fb140 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -1076,6 +1076,14 @@ static irqreturn_t armv8pmu_handle_irq(int irq_num, void *dev)
>  		if (!armv8pmu_counter_has_overflowed(pmovsr, idx))
>  			continue;
>  
> +		/*
> +		 * If we are running under a hypervisor such as KVM then
> +		 * hypervisor will mask the interrupt before forwarding
> +		 * it to Guest Linux hence re-enable interrupt for the
> +		 * overflowed counter.
> +		 */
> +		armv8pmu_enable_intens(idx);
> +

Really? This is a giant bodge in the guest to work around short-comings in
the hypervisor. Why can't we fix this properly using something like Marc's
irq forwarding code?

Will

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 2/6] ARM64: perf: Re-enable overflow interrupt from interrupt handler
  2014-08-06 14:24     ` Will Deacon
@ 2014-08-07  9:03       ` Anup Patel
  -1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-08-07  9:03 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, linux-arm-kernel, kvm, patches, Marc Zyngier,
	christoffer.dall, ian.campbell, pranavkumar

On 6 August 2014 19:54, Will Deacon <will.deacon@arm.com> wrote:
> On Tue, Aug 05, 2014 at 10:24:11AM +0100, Anup Patel wrote:
>> A hypervisor will typically mask the overflow interrupt before
>> forwarding it to Guest Linux hence we need to re-enable the overflow
>> interrupt after clearing it in Guest Linux. Also, this re-enabling
>> of overflow interrupt does not harm in non-virtualized scenarios.
>>
>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>> ---
>>  arch/arm64/kernel/perf_event.c |    8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
>> index 47dfb8b..19fb140 100644
>> --- a/arch/arm64/kernel/perf_event.c
>> +++ b/arch/arm64/kernel/perf_event.c
>> @@ -1076,6 +1076,14 @@ static irqreturn_t armv8pmu_handle_irq(int irq_num, void *dev)
>>               if (!armv8pmu_counter_has_overflowed(pmovsr, idx))
>>                       continue;
>>
>> +             /*
>> +              * If we are running under a hypervisor such as KVM then
>> +              * hypervisor will mask the interrupt before forwarding
>> +              * it to Guest Linux hence re-enable interrupt for the
>> +              * overflowed counter.
>> +              */
>> +             armv8pmu_enable_intens(idx);
>> +
>
> Really? This is a giant bodge in the guest to work around short-comings in
> the hypervisor. Why can't we fix this properly using something like Marc's
> irq forwarding code?

This change is in accordance with our previous RFC thread about
PMU virtualization, where Marc Z had suggested doing an interrupt
mask/unmask dance similar to the arch-timer one.

I have not tried Marc's irq forwarding series. In the next revision
of this patchset, I will try to use Marc's irq forwarding approach.

>
> Will

--
Anup
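
The mask/unmask dance referred to here works roughly as follows: on a
physical counter overflow the hypervisor masks that counter's overflow
interrupt enable (so the line deasserts), injects a virtual PPI into
the guest, and relies on the guest handler to re-arm PMINTENSET once
it has cleared the overflow, which is what the armv8pmu_enable_intens()
call in the quoted hunk provides. A toy model of that round trip
(schematic only; none of these are real KVM or perf functions):

#include <stdbool.h>
#include <stdio.h>

static bool pminten = true;	/* overflow irq enable for one counter */
static bool pmovs;		/* overflow flag */

static void hyp_forward_overflow(void)
{
	pminten = false;	/* hypervisor masks before injecting the vIRQ */
	/* ...virtual PPI injected into the guest here... */
}

static void guest_overflow_handler(void)
{
	if (!pmovs)
		return;
	pmovs = false;		/* guest clears the overflow */
	pminten = true;		/* guest re-arms, as patch 2/6 adds */
}

int main(void)
{
	pmovs = true;		/* counter overflows while the guest runs */
	hyp_forward_overflow();
	guest_overflow_handler();
	printf("inten=%d ovs=%d\n", pminten, pmovs);	/* inten=1 ovs=0 */
	return 0;
}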

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 2/6] ARM64: perf: Re-enable overflow interrupt from interrupt handler
  2014-08-07  9:03       ` Anup Patel
@ 2014-08-07  9:06         ` Will Deacon
  -1 siblings, 0 replies; 78+ messages in thread
From: Will Deacon @ 2014-08-07  9:06 UTC (permalink / raw)
  To: Anup Patel
  Cc: kvmarm, linux-arm-kernel, kvm, patches, Marc Zyngier,
	christoffer.dall, ian.campbell, pranavkumar

On Thu, Aug 07, 2014 at 10:03:58AM +0100, Anup Patel wrote:
> On 6 August 2014 19:54, Will Deacon <will.deacon@arm.com> wrote:
> > On Tue, Aug 05, 2014 at 10:24:11AM +0100, Anup Patel wrote:
> >> A hypervisor will typically mask the overflow interrupt before
> >> forwarding it to Guest Linux hence we need to re-enable the overflow
> >> interrupt after clearing it in Guest Linux. Also, this re-enabling
> >> of overflow interrupt does not harm in non-virtualized scenarios.
> >>
> >> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
> >> Signed-off-by: Anup Patel <anup.patel@linaro.org>
> >> ---
> >>  arch/arm64/kernel/perf_event.c |    8 ++++++++
> >>  1 file changed, 8 insertions(+)
> >>
> >> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> >> index 47dfb8b..19fb140 100644
> >> --- a/arch/arm64/kernel/perf_event.c
> >> +++ b/arch/arm64/kernel/perf_event.c
> >> @@ -1076,6 +1076,14 @@ static irqreturn_t armv8pmu_handle_irq(int irq_num, void *dev)
> >>               if (!armv8pmu_counter_has_overflowed(pmovsr, idx))
> >>                       continue;
> >>
> >> +             /*
> >> +              * If we are running under a hypervisor such as KVM then
> >> +              * hypervisor will mask the interrupt before forwarding
> >> +              * it to Guest Linux hence re-enable interrupt for the
> >> +              * overflowed counter.
> >> +              */
> >> +             armv8pmu_enable_intens(idx);
> >> +
> >
> > Really? This is a giant bodge in the guest to work around short-comings in
> > the hypervisor. Why can't we fix this properly using something like Marc's
> > irq forwarding code?
> 
> This change is in accordance with our previous RFC thread about
> PMU virtualization where Marc Z had suggest to do interrupt
> mask/unmask dance similar to arch-timer.
> 
> I have not tried Marc'z irq forwarding series. In next revision of this
> patchset, I will try to use Marc's irq forwarding approach.

That would be good. Judging by the colour Marc went when he saw this patch,
I don't think he intended you to hack perf in this way :)

Will

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2014-08-05  9:24 ` Anup Patel
@ 2014-11-07 20:23   ` Christoffer Dall
  -1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2014-11-07 20:23 UTC (permalink / raw)
  To: Anup Patel
  Cc: kvmarm, linux-arm-kernel, kvm, patches, marc.zyngier,
	will.deacon, ian.campbell, pranavkumar

Hi Anup,

What are your plans in terms of follow-up on this one?

Should we review these patches and reply to anup _at_ brainfault.org or
are you looking for someone else to pick them up?

Thanks,
-Christoffer

On Tue, Aug 05, 2014 at 02:54:09PM +0530, Anup Patel wrote:
> This patchset enables PMU virtualization in KVM ARM64. The
> Guest can now directly use PMU available on the host HW.
> 
> The virtual PMU IRQ injection for Guest VCPUs is managed by
> small piece of code shared between KVM ARM and KVM ARM64. The
> virtual PMU IRQ number will be based on Guest machine model and
> user space will provide it using set device address vm ioctl.
> 
> The second last patch of this series implements full context
> switch of PMU registers which will context switch all PMU
> registers on every KVM world-switch.
> 
> The last patch implements a lazy context switch of PMU registers
> which is very similar to lazy debug context switch.
> (Refer, http://lists.infradead.org/pipermail/linux-arm-kernel/2014-July/271040.html)
> 
> Also, we reserve last PMU event counter for EL2 mode which
> will not be accessible from Host and Guest EL1 mode. This
> reserved EL2 mode PMU event counter can be used for profiling
> KVM world-switch and other EL2 mode functions.
> 
> All testing have been done using KVMTOOL on X-Gene Mustang and
> Foundation v8 Model for both Aarch32 and Aarch64 guest.
> 
> Anup Patel (6):
>   ARM64: Move PMU register related defines to asm/pmu.h
>   ARM64: perf: Re-enable overflow interrupt from interrupt handler
>   ARM: perf: Re-enable overflow interrupt from interrupt handler
>   ARM/ARM64: KVM: Add common code PMU IRQ routing
>   ARM64: KVM: Implement full context switch of PMU registers
>   ARM64: KVM: Upgrade to lazy context switch of PMU registers
> 
>  arch/arm/include/asm/kvm_host.h   |    9 +
>  arch/arm/include/uapi/asm/kvm.h   |    1 +
>  arch/arm/kernel/perf_event_v7.c   |    8 +
>  arch/arm/kvm/arm.c                |    6 +
>  arch/arm/kvm/reset.c              |    4 +
>  arch/arm64/include/asm/kvm_asm.h  |   39 +++-
>  arch/arm64/include/asm/kvm_host.h |   12 ++
>  arch/arm64/include/asm/pmu.h      |   44 +++++
>  arch/arm64/include/uapi/asm/kvm.h |    1 +
>  arch/arm64/kernel/asm-offsets.c   |    2 +
>  arch/arm64/kernel/perf_event.c    |   40 +---
>  arch/arm64/kvm/Kconfig            |    7 +
>  arch/arm64/kvm/Makefile           |    1 +
>  arch/arm64/kvm/hyp-init.S         |   15 ++
>  arch/arm64/kvm/hyp.S              |  209 +++++++++++++++++++-
>  arch/arm64/kvm/reset.c            |    4 +
>  arch/arm64/kvm/sys_regs.c         |  385 +++++++++++++++++++++++++++++++++----
>  include/kvm/arm_pmu.h             |   52 +++++
>  virt/kvm/arm/pmu.c                |  105 ++++++++++
>  19 files changed, 870 insertions(+), 74 deletions(-)
>  create mode 100644 include/kvm/arm_pmu.h
>  create mode 100644 virt/kvm/arm/pmu.c
> 
> -- 
> 1.7.9.5
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2014-08-05  9:24 ` Anup Patel
@ 2014-11-07 20:25   ` Christoffer Dall
  -1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2014-11-07 20:25 UTC (permalink / raw)
  To: Anup Patel
  Cc: kvmarm, linux-arm-kernel, kvm, patches, marc.zyngier,
	will.deacon, ian.campbell, pranavkumar

Hi Anup,

[This time to the new email]

What are your plans in terms of follow-up on this one?

Should we review these patches and reply to anup _at_ brainfault.org or
are you looking for someone else to pick them up?

Thanks,
-Christoffer

On Tue, Aug 05, 2014 at 02:54:09PM +0530, Anup Patel wrote:
> This patchset enables PMU virtualization in KVM ARM64. The
> Guest can now directly use PMU available on the host HW.
> 
> The virtual PMU IRQ injection for Guest VCPUs is managed by
> small piece of code shared between KVM ARM and KVM ARM64. The
> virtual PMU IRQ number will be based on Guest machine model and
> user space will provide it using set device address vm ioctl.
> 
> The second last patch of this series implements full context
> switch of PMU registers which will context switch all PMU
> registers on every KVM world-switch.
> 
> The last patch implements a lazy context switch of PMU registers
> which is very similar to lazy debug context switch.
> (Refer, http://lists.infradead.org/pipermail/linux-arm-kernel/2014-July/271040.html)
> 
> Also, we reserve last PMU event counter for EL2 mode which
> will not be accessible from Host and Guest EL1 mode. This
> reserved EL2 mode PMU event counter can be used for profiling
> KVM world-switch and other EL2 mode functions.
> 
> All testing have been done using KVMTOOL on X-Gene Mustang and
> Foundation v8 Model for both Aarch32 and Aarch64 guest.
> 
> Anup Patel (6):
>   ARM64: Move PMU register related defines to asm/pmu.h
>   ARM64: perf: Re-enable overflow interrupt from interrupt handler
>   ARM: perf: Re-enable overflow interrupt from interrupt handler
>   ARM/ARM64: KVM: Add common code PMU IRQ routing
>   ARM64: KVM: Implement full context switch of PMU registers
>   ARM64: KVM: Upgrade to lazy context switch of PMU registers
> 
>  arch/arm/include/asm/kvm_host.h   |    9 +
>  arch/arm/include/uapi/asm/kvm.h   |    1 +
>  arch/arm/kernel/perf_event_v7.c   |    8 +
>  arch/arm/kvm/arm.c                |    6 +
>  arch/arm/kvm/reset.c              |    4 +
>  arch/arm64/include/asm/kvm_asm.h  |   39 +++-
>  arch/arm64/include/asm/kvm_host.h |   12 ++
>  arch/arm64/include/asm/pmu.h      |   44 +++++
>  arch/arm64/include/uapi/asm/kvm.h |    1 +
>  arch/arm64/kernel/asm-offsets.c   |    2 +
>  arch/arm64/kernel/perf_event.c    |   40 +---
>  arch/arm64/kvm/Kconfig            |    7 +
>  arch/arm64/kvm/Makefile           |    1 +
>  arch/arm64/kvm/hyp-init.S         |   15 ++
>  arch/arm64/kvm/hyp.S              |  209 +++++++++++++++++++-
>  arch/arm64/kvm/reset.c            |    4 +
>  arch/arm64/kvm/sys_regs.c         |  385 +++++++++++++++++++++++++++++++++----
>  include/kvm/arm_pmu.h             |   52 +++++
>  virt/kvm/arm/pmu.c                |  105 ++++++++++
>  19 files changed, 870 insertions(+), 74 deletions(-)
>  create mode 100644 include/kvm/arm_pmu.h
>  create mode 100644 virt/kvm/arm/pmu.c
> 
> -- 
> 1.7.9.5
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2014-11-07 20:25   ` Christoffer Dall
@ 2014-11-08  9:36     ` Anup Patel
  -1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-11-08  9:36 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, linux-arm-kernel, KVM General, patches, Marc Zyngier,
	Will Deacon, Ian Campbell, Pranavkumar Sawargaonkar

Hi Christoffer,

On Sat, Nov 8, 2014 at 1:55 AM, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> Hi Anup,
>
> [This time to the new email]
>
> What are your plans in terms of follow-up on this one?

Actually, I am already working on RFC v2. I will send out
RFC v2 soon.

This RFC v2 will be RFC v1 rebased on top of Marc's IRQ
forwarding patchset.

I will try to address PMU context switching for KVM ARM
in RFC v3. Does this sound OK?

Regards,
Anup

>
> Should we review these patches and reply to anup _at_ brainfaul.org or
> are you looking for someone else to pick them up?
>
> Thanks,
> -Christoffer
>
> On Tue, Aug 05, 2014 at 02:54:09PM +0530, Anup Patel wrote:
>> This patchset enables PMU virtualization in KVM ARM64. The
>> Guest can now directly use PMU available on the host HW.
>>
>> The virtual PMU IRQ injection for Guest VCPUs is managed by
>> small piece of code shared between KVM ARM and KVM ARM64. The
>> virtual PMU IRQ number will be based on Guest machine model and
>> user space will provide it using set device address vm ioctl.
>>
>> The second last patch of this series implements full context
>> switch of PMU registers which will context switch all PMU
>> registers on every KVM world-switch.
>>
>> The last patch implements a lazy context switch of PMU registers
>> which is very similar to lazy debug context switch.
>> (Refer, http://lists.infradead.org/pipermail/linux-arm-kernel/2014-July/271040.html)
>>
>> Also, we reserve last PMU event counter for EL2 mode which
>> will not be accessible from Host and Guest EL1 mode. This
>> reserved EL2 mode PMU event counter can be used for profiling
>> KVM world-switch and other EL2 mode functions.
>>
>> All testing have been done using KVMTOOL on X-Gene Mustang and
>> Foundation v8 Model for both Aarch32 and Aarch64 guest.
>>
>> Anup Patel (6):
>>   ARM64: Move PMU register related defines to asm/pmu.h
>>   ARM64: perf: Re-enable overflow interrupt from interrupt handler
>>   ARM: perf: Re-enable overflow interrupt from interrupt handler
>>   ARM/ARM64: KVM: Add common code PMU IRQ routing
>>   ARM64: KVM: Implement full context switch of PMU registers
>>   ARM64: KVM: Upgrade to lazy context switch of PMU registers
>>
>>  arch/arm/include/asm/kvm_host.h   |    9 +
>>  arch/arm/include/uapi/asm/kvm.h   |    1 +
>>  arch/arm/kernel/perf_event_v7.c   |    8 +
>>  arch/arm/kvm/arm.c                |    6 +
>>  arch/arm/kvm/reset.c              |    4 +
>>  arch/arm64/include/asm/kvm_asm.h  |   39 +++-
>>  arch/arm64/include/asm/kvm_host.h |   12 ++
>>  arch/arm64/include/asm/pmu.h      |   44 +++++
>>  arch/arm64/include/uapi/asm/kvm.h |    1 +
>>  arch/arm64/kernel/asm-offsets.c   |    2 +
>>  arch/arm64/kernel/perf_event.c    |   40 +---
>>  arch/arm64/kvm/Kconfig            |    7 +
>>  arch/arm64/kvm/Makefile           |    1 +
>>  arch/arm64/kvm/hyp-init.S         |   15 ++
>>  arch/arm64/kvm/hyp.S              |  209 +++++++++++++++++++-
>>  arch/arm64/kvm/reset.c            |    4 +
>>  arch/arm64/kvm/sys_regs.c         |  385 +++++++++++++++++++++++++++++++++----
>>  include/kvm/arm_pmu.h             |   52 +++++
>>  virt/kvm/arm/pmu.c                |  105 ++++++++++
>>  19 files changed, 870 insertions(+), 74 deletions(-)
>>  create mode 100644 include/kvm/arm_pmu.h
>>  create mode 100644 virt/kvm/arm/pmu.c
>>
>> --
>> 1.7.9.5
>>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2014-11-08  9:36     ` Anup Patel
@ 2014-11-08 12:39       ` Christoffer Dall
  -1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2014-11-08 12:39 UTC (permalink / raw)
  To: Anup Patel
  Cc: kvmarm, linux-arm-kernel, KVM General, patches, Marc Zyngier,
	Will Deacon, Ian Campbell, Pranavkumar Sawargaonkar

Yes, sounds good.  I will review RFC v2 then.

-Christoffer

On Sat, Nov 8, 2014 at 10:36 AM, Anup Patel <anup@brainfault.org> wrote:
> Hi Christoffer,
>
> On Sat, Nov 8, 2014 at 1:55 AM, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
>> Hi Anup,
>>
>> [This time to the new email]
>>
>> What are your plans in terms of follow-up on this one?
>
> Actually, I am already working on RFC v2. I will send-out
> RFC v2 sometime next time.
>
> This RFC v2 will be RFC v1 based upon Marc's IRQ
> forwarding patchset.
>
> I will try to address PMU context switching for KVM ARM
> in RFC v3. Does this sound OK?
>
> Regards,
> Anup
>
>>
>> Should we review these patches and reply to anup _at_ brainfaul.org or
>> are you looking for someone else to pick them up?
>>
>> Thanks,
>> -Christoffer
>>
>> On Tue, Aug 05, 2014 at 02:54:09PM +0530, Anup Patel wrote:
>>> This patchset enables PMU virtualization in KVM ARM64. The
>>> Guest can now directly use PMU available on the host HW.
>>>
>>> The virtual PMU IRQ injection for Guest VCPUs is managed by
>>> small piece of code shared between KVM ARM and KVM ARM64. The
>>> virtual PMU IRQ number will be based on Guest machine model and
>>> user space will provide it using set device address vm ioctl.
>>>
>>> The second last patch of this series implements full context
>>> switch of PMU registers which will context switch all PMU
>>> registers on every KVM world-switch.
>>>
>>> The last patch implements a lazy context switch of PMU registers
>>> which is very similar to lazy debug context switch.
>>> (Refer, http://lists.infradead.org/pipermail/linux-arm-kernel/2014-July/271040.html)
>>>
>>> Also, we reserve last PMU event counter for EL2 mode which
>>> will not be accessible from Host and Guest EL1 mode. This
>>> reserved EL2 mode PMU event counter can be used for profiling
>>> KVM world-switch and other EL2 mode functions.
>>>
>>> All testing have been done using KVMTOOL on X-Gene Mustang and
>>> Foundation v8 Model for both Aarch32 and Aarch64 guest.
>>>
>>> Anup Patel (6):
>>>   ARM64: Move PMU register related defines to asm/pmu.h
>>>   ARM64: perf: Re-enable overflow interrupt from interrupt handler
>>>   ARM: perf: Re-enable overflow interrupt from interrupt handler
>>>   ARM/ARM64: KVM: Add common code PMU IRQ routing
>>>   ARM64: KVM: Implement full context switch of PMU registers
>>>   ARM64: KVM: Upgrade to lazy context switch of PMU registers
>>>
>>>  arch/arm/include/asm/kvm_host.h   |    9 +
>>>  arch/arm/include/uapi/asm/kvm.h   |    1 +
>>>  arch/arm/kernel/perf_event_v7.c   |    8 +
>>>  arch/arm/kvm/arm.c                |    6 +
>>>  arch/arm/kvm/reset.c              |    4 +
>>>  arch/arm64/include/asm/kvm_asm.h  |   39 +++-
>>>  arch/arm64/include/asm/kvm_host.h |   12 ++
>>>  arch/arm64/include/asm/pmu.h      |   44 +++++
>>>  arch/arm64/include/uapi/asm/kvm.h |    1 +
>>>  arch/arm64/kernel/asm-offsets.c   |    2 +
>>>  arch/arm64/kernel/perf_event.c    |   40 +---
>>>  arch/arm64/kvm/Kconfig            |    7 +
>>>  arch/arm64/kvm/Makefile           |    1 +
>>>  arch/arm64/kvm/hyp-init.S         |   15 ++
>>>  arch/arm64/kvm/hyp.S              |  209 +++++++++++++++++++-
>>>  arch/arm64/kvm/reset.c            |    4 +
>>>  arch/arm64/kvm/sys_regs.c         |  385 +++++++++++++++++++++++++++++++++----
>>>  include/kvm/arm_pmu.h             |   52 +++++
>>>  virt/kvm/arm/pmu.c                |  105 ++++++++++
>>>  19 files changed, 870 insertions(+), 74 deletions(-)
>>>  create mode 100644 include/kvm/arm_pmu.h
>>>  create mode 100644 virt/kvm/arm/pmu.c
>>>
>>> --
>>> 1.7.9.5
>>>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2014-11-08 12:39       ` Christoffer Dall
@ 2014-11-11  9:18         ` Anup Patel
  -1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-11-11  9:18 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, linux-arm-kernel, KVM General, patches, Marc Zyngier,
	Will Deacon, Ian Campbell, Pranavkumar Sawargaonkar

Hi All,

I have second thoughts about rebasing KVM PMU patches
to Marc's irq-forwarding patches.

The PMU IRQs (when virtualized by KVM) are not exactly
forwarded IRQs because they are shared between Host
and Guest.

Scenario1
-------------

We might have perf running on the Host and no KVM guest
running. In this scenario, we won't get interrupts on the Host
because the kvm_pmu_hyp_init() (similar to the function
kvm_timer_hyp_init() of Marc's IRQ-forwarding
implementation) has put all host PMU IRQs in forwarding
mode.

The only way to solve this problem is to not set forwarding
mode for PMU IRQs in kvm_pmu_hyp_init() and instead
have special routines to turn on and turn off the forwarding
mode of PMU IRQs. These routines will be called from
kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
forwarding state.
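
As a rough sketch (the helper names, the vcpu->arch.pmu_irq field,
and an irq_set_fwd_state() setter counterpart of the forwarding
patchset's irq_get_fwd_state() are all assumptions here), the
toggling would look like:

static void kvm_pmu_fwd_enable(struct kvm_vcpu *vcpu)
{
        /* put the host PMU IRQ of this core into forwarding mode */
        irq_set_fwd_state(vcpu->arch.pmu_irq, true);
}

static void kvm_pmu_fwd_disable(struct kvm_vcpu *vcpu)
{
        /* return the PMU IRQ to normal host handling */
        irq_set_fwd_state(vcpu->arch.pmu_irq, false);
}

with kvm_arch_vcpu_ioctl_run() calling kvm_pmu_fwd_enable() before
entering the guest and kvm_pmu_fwd_disable() after the PMU sync.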

Scenario2
-------------

We might have perf running on the Host and Guest simultaneously,
which means it is quite likely that the PMU HW triggers an IRQ meant
for the Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
and "kvm_pmu_sync_hwstate(vcpu);" (similar to the timer sync routine
of Marc's patchset, which is called before local_irq_enable()).

In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
will accidentally forward an IRQ meant for the Host to the Guest
unless we put additional checks in place to inspect the VCPU PMU state.

Am I missing any detail about IRQ forwarding for the above
scenarios?

If not, then can we consider the current mask/unmask approach
for forwarding PMU IRQs?

Marc?? Will??

Regards,
Anup

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2014-11-11  9:18         ` Anup Patel
@ 2014-11-18  3:24           ` Anup Patel
  -1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-11-18  3:24 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, linux-arm-kernel, KVM General, patches, Marc Zyngier,
	Will Deacon, Ian Campbell, Pranavkumar Sawargaonkar

On Tue, Nov 11, 2014 at 2:48 PM, Anup Patel <anup@brainfault.org> wrote:
> Hi All,
>
> I have second thoughts about rebasing KVM PMU patches
> to Marc's irq-forwarding patches.
>
> The PMU IRQs (when virtualized by KVM) are not exactly
> forwarded IRQs because they are shared between Host
> and Guest.
>
> Scenario1
> -------------
>
> We might have perf running on Host and no KVM guest
> running. In this scenario, we wont get interrupts on Host
> because the kvm_pmu_hyp_init() (similar to the function
> kvm_timer_hyp_init() of Marc's IRQ-forwarding
> implementation) has put all host PMU IRQs in forwarding
> mode.
>
> The only way solve this problem is to not set forwarding
> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
> have special routines to turn on and turn off the forwarding
> mode of PMU IRQs. These routines will be called from
> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
> forwarding state.
>
> Scenario2
> -------------
>
> We might have perf running on Host and Guest simultaneously
> which means it is quite likely that PMU HW trigger IRQ meant
> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
> of Marc's patchset which is called before local_irq_enable()).
>
> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
> will accidentally forward IRQ meant for Host to Guest unless
> we put additional checks to inspect VCPU PMU state.
>
> Am I missing any detail about IRQ forwarding for above
> scenarios?
>
> If not then can we consider current mask/unmask approach
> for forwarding PMU IRQs?
>
> Marc?? Will??
>
> Regards,
> Anup

Ping ???

--
Anup

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2014-11-11  9:18         ` Anup Patel
@ 2014-11-19 15:29           ` Christoffer Dall
  -1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2014-11-19 15:29 UTC (permalink / raw)
  To: Anup Patel
  Cc: Ian Campbell, KVM General, Marc Zyngier, patches, Will Deacon,
	kvmarm, linux-arm-kernel, Pranavkumar Sawargaonkar

On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
> Hi All,
> 
> I have second thoughts about rebasing KVM PMU patches
> to Marc's irq-forwarding patches.
> 
> The PMU IRQs (when virtualized by KVM) are not exactly
> forwarded IRQs because they are shared between Host
> and Guest.
> 
> Scenario1
> -------------
> 
> We might have perf running on Host and no KVM guest
> running. In this scenario, we wont get interrupts on Host
> because the kvm_pmu_hyp_init() (similar to the function
> kvm_timer_hyp_init() of Marc's IRQ-forwarding
> implementation) has put all host PMU IRQs in forwarding
> mode.
> 
> The only way solve this problem is to not set forwarding
> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
> have special routines to turn on and turn off the forwarding
> mode of PMU IRQs. These routines will be called from
> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
> forwarding state.
> 
> Scenario2
> -------------
> 
> We might have perf running on Host and Guest simultaneously
> which means it is quite likely that PMU HW trigger IRQ meant
> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
> of Marc's patchset which is called before local_irq_enable()).
> 
> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
> will accidentally forward IRQ meant for Host to Guest unless
> we put additional checks to inspect VCPU PMU state.
> 
> Am I missing any detail about IRQ forwarding for above
> scenarios?
> 
Hi Anup,

I briefly discussed this with Marc.  What I don't understand is how it
would be possible to get an interrupt for the host while running the
guest?

The rationale behind my question is that whenever you're running the
guest, the PMU should be programmed exclusively with guest state, and
since the PMU is per core, any interrupts should be for the guest, where
it would always be pending.

When migrating a VM with a pending PMU interrupt away from a CPU core, we
also capture the active state (the forwarding patches already handle
this), and obviously the PMU state along with it.

Does this address your concern?

-Christoffer

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2014-11-19 15:29           ` Christoffer Dall
@ 2014-11-20 14:47             ` Anup Patel
  -1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-11-20 14:47 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, linux-arm-kernel, KVM General, patches, Marc Zyngier,
	Will Deacon, Ian Campbell, Pranavkumar Sawargaonkar

On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
>> Hi All,
>>
>> I have second thoughts about rebasing KVM PMU patches
>> to Marc's irq-forwarding patches.
>>
>> The PMU IRQs (when virtualized by KVM) are not exactly
>> forwarded IRQs because they are shared between Host
>> and Guest.
>>
>> Scenario1
>> -------------
>>
>> We might have perf running on Host and no KVM guest
>> running. In this scenario, we wont get interrupts on Host
>> because the kvm_pmu_hyp_init() (similar to the function
>> kvm_timer_hyp_init() of Marc's IRQ-forwarding
>> implementation) has put all host PMU IRQs in forwarding
>> mode.
>>
>> The only way solve this problem is to not set forwarding
>> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
>> have special routines to turn on and turn off the forwarding
>> mode of PMU IRQs. These routines will be called from
>> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
>> forwarding state.
>>
>> Scenario2
>> -------------
>>
>> We might have perf running on Host and Guest simultaneously
>> which means it is quite likely that PMU HW trigger IRQ meant
>> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
>> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
>> of Marc's patchset which is called before local_irq_enable()).
>>
>> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
>> will accidentally forward IRQ meant for Host to Guest unless
>> we put additional checks to inspect VCPU PMU state.
>>
>> Am I missing any detail about IRQ forwarding for above
>> scenarios?
>>
> Hi Anup,

Hi Christoffer,

>
> I briefly discussed this with Marc.  What I don't understand is how it
> would be possible to get an interrupt for the host while running the
> guest?
>
> The rationale behind my question is that whenever you're running the
> guest, the PMU should be programmed exclusively with guest state, and
> since the PMU is per core, any interrupts should be for the guest, where
> it would always be pending.

Yes, that's right: the PMU is programmed exclusively for the guest
when the guest is running and for the host when the host is running.

Let us assume a situation (Scenario2 mentioned previously)
where both host and guest are using the PMU. When the guest is
running we come back to host mode due to a variety of reasons
(stage2 fault, guest IO, regular host interrupt, host interrupt
meant for guest, ....) which means we will return from the
"ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the
kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
At this point we would have restored the host PMU context, and
any PMU counter used by the host can trigger a PMU overflow
interrupt for the host. Now we will have "kvm_pmu_sync_hwstate(vcpu);"
in the kvm_arch_vcpu_ioctl_run() function (similar to the
kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset)
which will try to detect the PMU IRQ forwarding state in the GIC,
and hence can accidentally discover a PMU IRQ pending for the guest
while this PMU IRQ is actually meant for the host.
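
In code, the window is roughly (ordering sketch only, based on the
kvm_arch_vcpu_ioctl_run() flow described above):

        local_irq_disable();
        /* guest PMU state is live in HW only inside this call: */
        ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
        /*
         * Host PMU state has been restored by the world-switch, so
         * from here a host counter can overflow and mark the PMU IRQ
         * pending in the GIC even though local IRQs are disabled.
         */
        kvm_pmu_sync_hwstate(vcpu);     /* may read host-owned IRQ state */
        local_irq_enable();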

This above-mentioned situation does not happen for the timer
because virtual timer interrupts are exclusively used by the guest.
The exclusive use of the virtual timer interrupt for the guest
ensures that the function kvm_timer_sync_hwstate() will always see
the correct state of the virtual timer IRQ in the GIC.

>
> When migrating a VM with a pending PMU interrupt away for a CPU core, we
> also capture the active state (the forwarding patches already handle
> this), and obviously the PMU state along with it.

Yes, the migration of PMU state and PMU interrupt state is
quite clear.

>
> Does this address your concern?

I hope the above description gives you an idea of the concern
I raised.

>
> -Christoffer

Regards,
Anup

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2014-11-20 14:47             ` Anup Patel
@ 2014-11-21  9:59               ` Christoffer Dall
  -1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2014-11-21  9:59 UTC (permalink / raw)
  To: Anup Patel
  Cc: kvmarm, linux-arm-kernel, KVM General, patches, Marc Zyngier,
	Will Deacon, Ian Campbell, Pranavkumar Sawargaonkar

On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote:
> On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
> > On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
> >> Hi All,
> >>
> >> I have second thoughts about rebasing KVM PMU patches
> >> to Marc's irq-forwarding patches.
> >>
> >> The PMU IRQs (when virtualized by KVM) are not exactly
> >> forwarded IRQs because they are shared between Host
> >> and Guest.
> >>
> >> Scenario1
> >> -------------
> >>
> >> We might have perf running on Host and no KVM guest
> >> running. In this scenario, we wont get interrupts on Host
> >> because the kvm_pmu_hyp_init() (similar to the function
> >> kvm_timer_hyp_init() of Marc's IRQ-forwarding
> >> implementation) has put all host PMU IRQs in forwarding
> >> mode.
> >>
> >> The only way solve this problem is to not set forwarding
> >> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
> >> have special routines to turn on and turn off the forwarding
> >> mode of PMU IRQs. These routines will be called from
> >> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
> >> forwarding state.
> >>
> >> Scenario2
> >> -------------
> >>
> >> We might have perf running on Host and Guest simultaneously
> >> which means it is quite likely that PMU HW trigger IRQ meant
> >> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
> >> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
> >> of Marc's patchset which is called before local_irq_enable()).
> >>
> >> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
> >> will accidentally forward IRQ meant for Host to Guest unless
> >> we put additional checks to inspect VCPU PMU state.
> >>
> >> Am I missing any detail about IRQ forwarding for above
> >> scenarios?
> >>
> > Hi Anup,
> 
> Hi Christoffer,
> 
> >
> > I briefly discussed this with Marc.  What I don't understand is how it
> > would be possible to get an interrupt for the host while running the
> > guest?
> >
> > The rationale behind my question is that whenever you're running the
> > guest, the PMU should be programmed exclusively with guest state, and
> > since the PMU is per core, any interrupts should be for the guest, where
> > it would always be pending.
> 
> Yes, thats right PMU is programmed exclusively for guest when
> guest is running and for host when host is running.
> 
> Let us assume a situation (Scenario2 mentioned previously)
> where both host and guest are using PMU. When the guest is
> running we come back to host mode due to variety of reasons
> (stage2 fault, guest IO, regular host interrupt, host interrupt
> meant for guest, ....) which means we will return from the
> "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the
> kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
> At this point we would have restored back host PMU context and
> any PMU counter used by host can trigger PMU overflow interrup
> for host. Now we will be having "kvm_pmu_sync_hwstate(vcpu);"
> in the kvm_arch_vcpu_ioctl_run() function (similar to the
> kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset)
> which will try to detect PMU irq forwarding state in GIC hence it
> can accidentally discover PMU irq pending for guest while this
> PMU irq is actually meant for host.
> 
> This above mentioned situation does not happen for timer
> because virtual timer interrupts are exclusively used for guest.
> The exclusive use of virtual timer interrupt for guest ensures that
> the function kvm_timer_sync_hwstate() will always see correct
> state of virtual timer IRQ from GIC.
> 
I'm not quite following.

When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemptible section,
you would (1) capture the active state of the IRQ pertaining to the
guest and (2) deactivate the IRQ on the host, then (3) switch the state of
the PMU to the host state, and finally (4) re-enable IRQs on the CPU
you're running on.

If the host PMU state restored in (3) causes the PMU to raise an
interrupt, you'll take an interrupt after (4), which is for the host,
and you'll handle it on the host.

Whenever you schedule the guest VCPU again, you'll (a) disable
interrupts on the CPU, (b) restore the active state of the IRQ for the
guest, (c) restore the guest PMU state, (d) switch to the guest with
IRQs enabled on the CPU (potentially).

If the state in (c) causes an IRQ it will not fire on the host, because
it is marked as active in (b).
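
In code, the sequence I have in mind is roughly (the helper names
are made up here, only the ordering matters):

        /* exit path, IRQs disabled on this CPU: */
        kvm_pmu_sync_hwstate(vcpu);     /* (1) capture guest IRQ active
                                         * state, (2) deactivate it on
                                         * the host */
        pmu_switch_to_host(vcpu);       /* (3) */
        local_irq_enable();             /* (4) host PMU overflows now
                                         * fire as normal host IRQs */

        /* entry path, when this VCPU is scheduled again: */
        local_irq_disable();            /* (a) */
        kvm_pmu_flush_hwstate(vcpu);    /* (b) restore guest IRQ active
                                         * state */
        pmu_switch_to_guest(vcpu);      /* (c) an overflow here stays
                                         * with the guest because the
                                         * IRQ is active */
        kvm_call_hyp(__kvm_vcpu_run, vcpu);     /* (d) */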

Where does this break?

-Christoffer

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2014-11-21  9:59               ` Christoffer Dall
@ 2014-11-21 10:36                 ` Anup Patel
  -1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-11-21 10:36 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, linux-arm-kernel, KVM General, patches, Marc Zyngier,
	Will Deacon, Ian Campbell, Pranavkumar Sawargaonkar

Hi Christoffer,

On Fri, Nov 21, 2014 at 3:29 PM, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote:
>> On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall
>> <christoffer.dall@linaro.org> wrote:
>> > On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
>> >> Hi All,
>> >>
>> >> I have second thoughts about rebasing KVM PMU patches
>> >> to Marc's irq-forwarding patches.
>> >>
>> >> The PMU IRQs (when virtualized by KVM) are not exactly
>> >> forwarded IRQs because they are shared between Host
>> >> and Guest.
>> >>
>> >> Scenario1
>> >> -------------
>> >>
>> >> We might have perf running on Host and no KVM guest
>> >> running. In this scenario, we wont get interrupts on Host
>> >> because the kvm_pmu_hyp_init() (similar to the function
>> >> kvm_timer_hyp_init() of Marc's IRQ-forwarding
>> >> implementation) has put all host PMU IRQs in forwarding
>> >> mode.
>> >>
>> >> The only way solve this problem is to not set forwarding
>> >> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
>> >> have special routines to turn on and turn off the forwarding
>> >> mode of PMU IRQs. These routines will be called from
>> >> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
>> >> forwarding state.
>> >>
>> >> Scenario2
>> >> -------------
>> >>
>> >> We might have perf running on Host and Guest simultaneously
>> >> which means it is quite likely that PMU HW trigger IRQ meant
>> >> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
>> >> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
>> >> of Marc's patchset which is called before local_irq_enable()).
>> >>
>> >> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
>> >> will accidentally forward IRQ meant for Host to Guest unless
>> >> we put additional checks to inspect VCPU PMU state.
>> >>
>> >> Am I missing any detail about IRQ forwarding for above
>> >> scenarios?
>> >>
>> > Hi Anup,
>>
>> Hi Christoffer,
>>
>> >
>> > I briefly discussed this with Marc.  What I don't understand is how it
>> > would be possible to get an interrupt for the host while running the
>> > guest?
>> >
>> > The rationale behind my question is that whenever you're running the
>> > guest, the PMU should be programmed exclusively with guest state, and
>> > since the PMU is per core, any interrupts should be for the guest, where
>> > it would always be pending.
>>
>> Yes, thats right PMU is programmed exclusively for guest when
>> guest is running and for host when host is running.
>>
>> Let us assume a situation (Scenario2 mentioned previously)
>> where both host and guest are using PMU. When the guest is
>> running we come back to host mode due to variety of reasons
>> (stage2 fault, guest IO, regular host interrupt, host interrupt
>> meant for guest, ....) which means we will return from the
>> "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the
>> kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
>> At this point we would have restored back host PMU context and
>> any PMU counter used by host can trigger PMU overflow interrup
>> for host. Now we will be having "kvm_pmu_sync_hwstate(vcpu);"
>> in the kvm_arch_vcpu_ioctl_run() function (similar to the
>> kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset)
>> which will try to detect PMU irq forwarding state in GIC hence it
>> can accidentally discover PMU irq pending for guest while this
>> PMU irq is actually meant for host.
>>
>> This above mentioned situation does not happen for timer
>> because virtual timer interrupts are exclusively used for guest.
>> The exclusive use of virtual timer interrupt for guest ensures that
>> the function kvm_timer_sync_hwstate() will always see correct
>> state of virtual timer IRQ from GIC.
>>
> I'm not quite following.
>
> When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemtible section,
> you would (1) capture the active state of the IRQ pertaining to the
> guest and (2) deactive the IRQ on the host, then (3) switch the state of
> the PMU to the host state, and finally (4) re-enable IRQs on the CPU
> you're running on.
>
> If the host PMU state restored in (3) causes the PMU to raise an
> interrupt, you'll take an interrupt after (4), which is for the host,
> and you'll handle it on the host.
>
We only switch PMU state in assembly code, using
kvm_call_hyp(__kvm_vcpu_run, vcpu),
so whenever we are in kvm_arch_vcpu_ioctl_run() (i.e. host mode)
the current hardware PMU state is for the host. This means that
whenever we are in host mode the host PMU can change the state of
the PMU IRQ in the GIC even if local IRQs are disabled.

Whenever we inspect the active state of the PMU IRQ in the
kvm_pmu_sync_hwstate() function using the irq_get_fwd_state() API,
we are not guaranteed that the IRQ forwarding state returned by
that API is for the guest only.
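
Any such check would have to look something like this (a sketch
only; the irq_get_fwd_state() signature, the vpmu overflow flag and
the injection helper are all assumptions):

        bool pending, active;

        irq_get_fwd_state(vcpu->arch.pmu_irq, &pending, &active);
        /*
         * A pending PMU IRQ here may have been raised by a *host*
         * counter after the world-switch back, so the GIC state
         * alone is ambiguous; we must also consult the guest's
         * virtual PMU overflow status before forwarding.
         */
        if (pending && vcpu->arch.vpmu.overflow_pending)
                kvm_pmu_inject_irq(vcpu);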

The above situation does not manifest for the virtual timer because
the virtual timer registers are exclusively accessed by the Guest
and the virtual timer interrupt is only for the Guest (never used
by the Host).

> Whenever you schedule the guest VCPU again, you'll (a) disable
> interrupts on the CPU, (b) restore the active state of the IRQ for the
> guest, (c) restore the guest PMU state, (d) switch to the guest with
> IRQs enabled on the CPU (potentially).

Here too, while we are between step (a) and step (b), the PMU HW
context is the host's and any PMU counter can overflow. Step (b)
can then actually override a PMU IRQ meant for the Host.

>
> If the state in (c) causes an IRQ it will not fire on the host, because
> it is marked as active in (b).
>
> Where does this break?

Your explanation of IRQ forwarding is fine and fits well for devices
(such as the virtual timer and pass-through devices) where the device
is exclusively accessed by the Guest and the device IRQ, whenever
active, is only meant for the Guest.

>
> -Christoffer

Regards,
Anup

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2014-11-21 10:36                 ` Anup Patel
@ 2014-11-21 11:49                   ` Christoffer Dall
  -1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2014-11-21 11:49 UTC (permalink / raw)
  To: Anup Patel
  Cc: kvmarm, linux-arm-kernel, KVM General, patches, Marc Zyngier,
	Will Deacon, Ian Campbell, Pranavkumar Sawargaonkar

On Fri, Nov 21, 2014 at 04:06:05PM +0530, Anup Patel wrote:
> Hi Christoffer,
> 
> On Fri, Nov 21, 2014 at 3:29 PM, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
> > On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote:
> >> On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall
> >> <christoffer.dall@linaro.org> wrote:
> >> > On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
> >> >> Hi All,
> >> >>
> >> >> I have second thoughts about rebasing KVM PMU patches
> >> >> to Marc's irq-forwarding patches.
> >> >>
> >> >> The PMU IRQs (when virtualized by KVM) are not exactly
> >> >> forwarded IRQs because they are shared between Host
> >> >> and Guest.
> >> >>
> >> >> Scenario1
> >> >> -------------
> >> >>
> >> >> We might have perf running on Host and no KVM guest
> >> >> running. In this scenario, we wont get interrupts on Host
> >> >> because the kvm_pmu_hyp_init() (similar to the function
> >> >> kvm_timer_hyp_init() of Marc's IRQ-forwarding
> >> >> implementation) has put all host PMU IRQs in forwarding
> >> >> mode.
> >> >>
> >> >> The only way to solve this problem is to not set forwarding
> >> >> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
> >> >> have special routines to turn on and turn off the forwarding
> >> >> mode of PMU IRQs. These routines will be called from
> >> >> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
> >> >> forwarding state.
> >> >>
> >> >> Scenario2
> >> >> -------------
> >> >>
> >> >> We might have perf running on Host and Guest simultaneously
> >> >> which means it is quite likely that the PMU HW triggers an IRQ meant
> >> >> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
> >> >> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
> >> >> of Marc's patchset which is called before local_irq_enable()).
> >> >>
> >> >> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
> >> >> will accidentally forward IRQ meant for Host to Guest unless
> >> >> we put additional checks to inspect VCPU PMU state.
> >> >>
> >> >> Am I missing any detail about IRQ forwarding for above
> >> >> scenarios?
> >> >>
> >> > Hi Anup,
> >>
> >> Hi Christoffer,
> >>
> >> >
> >> > I briefly discussed this with Marc.  What I don't understand is how it
> >> > would be possible to get an interrupt for the host while running the
> >> > guest?
> >> >
> >> > The rationale behind my question is that whenever you're running the
> >> > guest, the PMU should be programmed exclusively with guest state, and
> >> > since the PMU is per core, any interrupts should be for the guest, where
> >> > it would always be pending.
> >>
> >> Yes, that's right, the PMU is programmed exclusively for guest when
> >> guest is running and for host when host is running.
> >>
> >> Let us assume a situation (Scenario2 mentioned previously)
> >> where both host and guest are using PMU. When the guest is
> >> running we come back to host mode due to variety of reasons
> >> (stage2 fault, guest IO, regular host interrupt, host interrupt
> >> meant for guest, ....) which means we will return from the
> >> "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the
> >> kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
> >> At this point we would have restored back host PMU context and
> >> any PMU counter used by host can trigger PMU overflow interrupt
> >> for host. Now we will be having "kvm_pmu_sync_hwstate(vcpu);"
> >> in the kvm_arch_vcpu_ioctl_run() function (similar to the
> >> kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset)
> >> which will try to detect PMU irq forwarding state in GIC hence it
> >> can accidentally discover PMU irq pending for guest while this
> >> PMU irq is actually meant for host.
> >>
> >> This above mentioned situation does not happen for timer
> >> because virtual timer interrupts are exclusively used for guest.
> >> The exclusive use of virtual timer interrupt for guest ensures that
> >> the function kvm_timer_sync_hwstate() will always see correct
> >> state of virtual timer IRQ from GIC.
> >>
> > I'm not quite following.
> >
> > When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemptible section,
> > you would (1) capture the active state of the IRQ pertaining to the
> > guest and (2) deactivate the IRQ on the host, then (3) switch the state of
> > the PMU to the host state, and finally (4) re-enable IRQs on the CPU
> > you're running on.
> >
> > If the host PMU state restored in (3) causes the PMU to raise an
> > interrupt, you'll take an interrupt after (4), which is for the host,
> > and you'll handle it on the host.
> >
> We only switch PMU state in assembly code using
> kvm_call_hyp(__kvm_vcpu_run, vcpu)
> so whenever we are in kvm_arch_vcpu_ioctl_run() (i.e. host mode)
> the current hardware PMU state is for host. This means whenever
> we are in host mode the host PMU can change state of PMU IRQ
> in GIC even if local IRQs are disabled.
> 
> We inspect the active state of the PMU IRQ in the
> kvm_pmu_sync_hwstate() function using the irq_get_fwd_state() API.
> Here we are not guaranteed that the IRQ forward state returned by the
> irq_get_fwd_state() API is for the guest only.
> 
> The above situation does not manifest for virtual timer because
> virtual timer registers are exclusively accessed by Guest and
> virtual timer interrupt is only for Guest (never used by Host).
> 
> > Whenever you schedule the guest VCPU again, you'll (a) disable
> > interrupts on the CPU, (b) restore the active state of the IRQ for the
> > guest, (c) restore the guest PMU state, (d) switch to the guest with
> > IRQs enabled on the CPU (potentially).
> 
> Here too, while we are between step (a) and step (b) the PMU HW
> context is the host's and any PMU counter can overflow. Step (b)
> can then actually override the PMU IRQ meant for the Host.
> 
Can you not simply switch the state from C-code after capturing the IRQ
state then?  Everything should be accessible from EL1, right?

-Christoffer

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2014-11-21 11:49                   ` Christoffer Dall
@ 2014-11-24  8:44                     ` Anup Patel
  -1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-11-24  8:44 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, linux-arm-kernel, KVM General, patches, Marc Zyngier,
	Will Deacon, Ian Campbell, Pranavkumar Sawargaonkar

On Fri, Nov 21, 2014 at 5:19 PM, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> On Fri, Nov 21, 2014 at 04:06:05PM +0530, Anup Patel wrote:
>> Hi Christoffer,
>>
>> On Fri, Nov 21, 2014 at 3:29 PM, Christoffer Dall
>> <christoffer.dall@linaro.org> wrote:
>> > On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote:
>> >> On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall
>> >> <christoffer.dall@linaro.org> wrote:
>> >> > On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
>> >> >> Hi All,
>> >> >>
>> >> >> I have second thoughts about rebasing KVM PMU patches
>> >> >> to Marc's irq-forwarding patches.
>> >> >>
>> >> >> The PMU IRQs (when virtualized by KVM) are not exactly
>> >> >> forwarded IRQs because they are shared between Host
>> >> >> and Guest.
>> >> >>
>> >> >> Scenario1
>> >> >> -------------
>> >> >>
>> >> >> We might have perf running on Host and no KVM guest
>> >> >> running. In this scenario, we won't get interrupts on Host
>> >> >> because the kvm_pmu_hyp_init() (similar to the function
>> >> >> kvm_timer_hyp_init() of Marc's IRQ-forwarding
>> >> >> implementation) has put all host PMU IRQs in forwarding
>> >> >> mode.
>> >> >>
>> >> >> The only way to solve this problem is to not set forwarding
>> >> >> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
>> >> >> have special routines to turn on and turn off the forwarding
>> >> >> mode of PMU IRQs. These routines will be called from
>> >> >> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
>> >> >> forwarding state.
>> >> >>
>> >> >> Scenario2
>> >> >> -------------
>> >> >>
>> >> >> We might have perf running on Host and Guest simultaneously
>> >> >> which means it is quite likely that the PMU HW triggers an IRQ meant
>> >> >> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
>> >> >> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
>> >> >> of Marc's patchset which is called before local_irq_enable()).
>> >> >>
>> >> >> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
>> >> >> will accidentally forward IRQ meant for Host to Guest unless
>> >> >> we put additional checks to inspect VCPU PMU state.
>> >> >>
>> >> >> Am I missing any detail about IRQ forwarding for above
>> >> >> scenarios?
>> >> >>
>> >> > Hi Anup,
>> >>
>> >> Hi Christoffer,
>> >>
>> >> >
>> >> > I briefly discussed this with Marc.  What I don't understand is how it
>> >> > would be possible to get an interrupt for the host while running the
>> >> > guest?
>> >> >
>> >> > The rationale behind my question is that whenever you're running the
>> >> > guest, the PMU should be programmed exclusively with guest state, and
>> >> > since the PMU is per core, any interrupts should be for the guest, where
>> >> > it would always be pending.
>> >>
>> >> Yes, that's right, the PMU is programmed exclusively for guest when
>> >> guest is running and for host when host is running.
>> >>
>> >> Let us assume a situation (Scenario2 mentioned previously)
>> >> where both host and guest are using PMU. When the guest is
>> >> running we come back to host mode due to variety of reasons
>> >> (stage2 fault, guest IO, regular host interrupt, host interrupt
>> >> meant for guest, ....) which means we will return from the
>> >> "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the
>> >> kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
>> >> At this point we would have restored back host PMU context and
>> >> any PMU counter used by host can trigger PMU overflow interrupt
>> >> for host. Now we will be having "kvm_pmu_sync_hwstate(vcpu);"
>> >> in the kvm_arch_vcpu_ioctl_run() function (similar to the
>> >> kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset)
>> >> which will try to detect PMU irq forwarding state in GIC hence it
>> >> can accidentally discover PMU irq pending for guest while this
>> >> PMU irq is actually meant for host.
>> >>
>> >> This above mentioned situation does not happen for timer
>> >> because virtual timer interrupts are exclusively used for guest.
>> >> The exclusive use of virtual timer interrupt for guest ensures that
>> >> the function kvm_timer_sync_hwstate() will always see correct
>> >> state of virtual timer IRQ from GIC.
>> >>
>> > I'm not quite following.
>> >
>> > When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemptible section,
>> > you would (1) capture the active state of the IRQ pertaining to the
>> > guest and (2) deactivate the IRQ on the host, then (3) switch the state of
>> > the PMU to the host state, and finally (4) re-enable IRQs on the CPU
>> > you're running on.
>> >
>> > If the host PMU state restored in (3) causes the PMU to raise an
>> > interrupt, you'll take an interrupt after (4), which is for the host,
>> > and you'll handle it on the host.
>> >
>> We only switch PMU state in assembly code using
>> kvm_call_hyp(__kvm_vcpu_run, vcpu)
>> so whenever we are in kvm_arch_vcpu_ioctl_run() (i.e. host mode)
>> the current hardware PMU state is for host. This means whenever
>> we are in host mode the host PMU can change state of PMU IRQ
>> in GIC even if local IRQs are disabled.
>>
>> We inspect the active state of the PMU IRQ in the
>> kvm_pmu_sync_hwstate() function using the irq_get_fwd_state() API.
>> Here we are not guaranteed that the IRQ forward state returned by the
>> irq_get_fwd_state() API is for the guest only.
>>
>> The above situation does not manifest for virtual timer because
>> virtual timer registers are exclusively accessed by Guest and
>> virtual timer interrupt is only for Guest (never used by Host).
>>
>> > Whenever you schedule the guest VCPU again, you'll (a) disable
>> > interrupts on the CPU, (b) restore the active state of the IRQ for the
>> > guest, (c) restore the guest PMU state, (d) switch to the guest with
>> > IRQs enabled on the CPU (potentially).
>>
>> Here too, while we are between step (a) and step (b) the PMU HW
>> context is the host's and any PMU counter can overflow. Step (b)
>> can then actually override the PMU IRQ meant for the Host.
>>
> Can you not simply switch the state from C-code after capturing the IRQ
> state then?  Everything should be accessible from EL1, right?

Yes, I think that would be the only option. This also means I will need
to re-implement context switching for doing it in C-code.
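
Something along these lines for the save half (very rough sketch from
EL1; the register list is incomplete and the helper names are made up):

	static inline u64 read_pmcr(void)
	{
		u64 val;

		asm volatile("mrs %0, pmcr_el0" : "=r" (val));
		return val;
	}

	static void kvm_pmu_save_state(u64 *ctx)
	{
		u64 pmcr = read_pmcr();

		/* stop the counters first so the switch itself is not counted */
		asm volatile("msr pmcr_el0, %0" : : "r" (pmcr & ~1UL)); /* E=0 */
		isb();

		ctx[0] = pmcr;
		asm volatile("mrs %0, pmcntenset_el0" : "=r" (ctx[1]));
		asm volatile("mrs %0, pmovsset_el0" : "=r" (ctx[2]));
		asm volatile("mrs %0, pmccntr_el0" : "=r" (ctx[3]));
		/* ... plus PMSELR_EL0/PMXEVCNTR_EL0/PMXEVTYPER_EL0 per counter,
		 * PMINTENSET_EL1, PMUSERENR_EL0; restore mirrors this through
		 * the corresponding SET/CLR register pairs */
	}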

What about the scenario1 which I had mentioned?

--
Anup

>
> -Christoffer

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2014-11-24  8:44                     ` Anup Patel
@ 2014-11-24 14:37                       ` Christoffer Dall
  -1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2014-11-24 14:37 UTC (permalink / raw)
  To: Anup Patel
  Cc: kvmarm, linux-arm-kernel, KVM General, patches, Marc Zyngier,
	Will Deacon, Ian Campbell, Pranavkumar Sawargaonkar

On Mon, Nov 24, 2014 at 02:14:48PM +0530, Anup Patel wrote:
> On Fri, Nov 21, 2014 at 5:19 PM, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
> > On Fri, Nov 21, 2014 at 04:06:05PM +0530, Anup Patel wrote:
> >> Hi Christoffer,
> >>
> >> On Fri, Nov 21, 2014 at 3:29 PM, Christoffer Dall
> >> <christoffer.dall@linaro.org> wrote:
> >> > On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote:
> >> >> On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall
> >> >> <christoffer.dall@linaro.org> wrote:
> >> >> > On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
> >> >> >> Hi All,
> >> >> >>
> >> >> >> I have second thoughts about rebasing KVM PMU patches
> >> >> >> to Marc's irq-forwarding patches.
> >> >> >>
> >> >> >> The PMU IRQs (when virtualized by KVM) are not exactly
> >> >> >> forwarded IRQs because they are shared between Host
> >> >> >> and Guest.
> >> >> >>
> >> >> >> Scenario1
> >> >> >> -------------
> >> >> >>
> >> >> >> We might have perf running on Host and no KVM guest
> >> >> >> running. In this scenario, we won't get interrupts on Host
> >> >> >> because the kvm_pmu_hyp_init() (similar to the function
> >> >> >> kvm_timer_hyp_init() of Marc's IRQ-forwarding
> >> >> >> implementation) has put all host PMU IRQs in forwarding
> >> >> >> mode.
> >> >> >>
> >> >> >> The only way to solve this problem is to not set forwarding
> >> >> >> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
> >> >> >> have special routines to turn on and turn off the forwarding
> >> >> >> mode of PMU IRQs. These routines will be called from
> >> >> >> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
> >> >> >> forwarding state.
> >> >> >>
> >> >> >> Scenario2
> >> >> >> -------------
> >> >> >>
> >> >> >> We might have perf running on Host and Guest simultaneously
> >> >> >> which means it is quite likely that the PMU HW triggers an IRQ meant
> >> >> >> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
> >> >> >> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
> >> >> >> of Marc's patchset which is called before local_irq_enable()).
> >> >> >>
> >> >> >> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
> >> >> >> will accidentally forward IRQ meant for Host to Guest unless
> >> >> >> we put additional checks to inspect VCPU PMU state.
> >> >> >>
> >> >> >> Am I missing any detail about IRQ forwarding for above
> >> >> >> scenarios?
> >> >> >>
> >> >> > Hi Anup,
> >> >>
> >> >> Hi Christoffer,
> >> >>
> >> >> >
> >> >> > I briefly discussed this with Marc.  What I don't understand is how it
> >> >> > would be possible to get an interrupt for the host while running the
> >> >> > guest?
> >> >> >
> >> >> > The rationale behind my question is that whenever you're running the
> >> >> > guest, the PMU should be programmed exclusively with guest state, and
> >> >> > since the PMU is per core, any interrupts should be for the guest, where
> >> >> > it would always be pending.
> >> >>
> >> >> Yes, that's right, the PMU is programmed exclusively for guest when
> >> >> guest is running and for host when host is running.
> >> >>
> >> >> Let us assume a situation (Scenario2 mentioned previously)
> >> >> where both host and guest are using PMU. When the guest is
> >> >> running we come back to host mode due to variety of reasons
> >> >> (stage2 fault, guest IO, regular host interrupt, host interrupt
> >> >> meant for guest, ....) which means we will return from the
> >> >> "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the
> >> >> kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
> >> >> At this point we would have restored back host PMU context and
> >> >> any PMU counter used by host can trigger PMU overflow interrupt
> >> >> for host. Now we will be having "kvm_pmu_sync_hwstate(vcpu);"
> >> >> in the kvm_arch_vcpu_ioctl_run() function (similar to the
> >> >> kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset)
> >> >> which will try to detect PMU irq forwarding state in GIC hence it
> >> >> can accidentally discover PMU irq pending for guest while this
> >> >> PMU irq is actually meant for host.
> >> >>
> >> >> This above mentioned situation does not happen for timer
> >> >> because virtual timer interrupts are exclusively used for guest.
> >> >> The exclusive use of virtual timer interrupt for guest ensures that
> >> >> the function kvm_timer_sync_hwstate() will always see correct
> >> >> state of virtual timer IRQ from GIC.
> >> >>
> >> > I'm not quite following.
> >> >
> >> > When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemptible section,
> >> > you would (1) capture the active state of the IRQ pertaining to the
> >> > guest and (2) deactivate the IRQ on the host, then (3) switch the state of
> >> > the PMU to the host state, and finally (4) re-enable IRQs on the CPU
> >> > you're running on.
> >> >
> >> > If the host PMU state restored in (3) causes the PMU to raise an
> >> > interrupt, you'll take an interrupt after (4), which is for the host,
> >> > and you'll handle it on the host.
> >> >
> >> We only switch PMU state in assembly code using
> >> kvm_call_hyp(__kvm_vcpu_run, vcpu)
> >> so whenever we are in kvm_arch_vcpu_ioctl_run() (i.e. host mode)
> >> the current hardware PMU state is for host. This means whenever
> >> we are in host mode the host PMU can change state of PMU IRQ
> >> in GIC even if local IRQs are disabled.
> >>
> >> We inspect the active state of the PMU IRQ in the
> >> kvm_pmu_sync_hwstate() function using the irq_get_fwd_state() API.
> >> Here we are not guaranteed that the IRQ forward state returned by the
> >> irq_get_fwd_state() API is for the guest only.
> >>
> >> The above situation does not manifest for virtual timer because
> >> virtual timer registers are exclusively accessed by Guest and
> >> virtual timer interrupt is only for Guest (never used by Host).
> >>
> >> > Whenever you schedule the guest VCPU again, you'll (a) disable
> >> > interrupts on the CPU, (b) restore the active state of the IRQ for the
> >> > guest, (c) restore the guest PMU state, (d) switch to the guest with
> >> > IRQs enabled on the CPU (potentially).
> >>
> >> Here too, while we are between step (a) and step (b) the PMU HW
> >> context is the host's and any PMU counter can overflow. Step (b)
> >> can then actually override the PMU IRQ meant for the Host.
> >>
> > Can you not simply switch the state from C-code after capturing the IRQ
> > state then?  Everything should be accessible from EL1, right?
> 
> Yes, I think that would be the only option. This also means I will need
> to re-implement context switching for doing it in C-code.

Yes, you'd add some inline assembly in the C-code to access the
registers I guess.  Only thing I thought about after writing my original
mail is whether you'll be counting events while context-switching and
running on the host, which you actually don't want to.  Not sure if
there's a better way to avoid that.
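
A partial mitigation would be to freeze the counters first thing on
the switch path, something like (rough sketch):

	u64 pmcr;

	asm volatile("mrs %0, pmcr_el0" : "=r" (pmcr));
	asm volatile("msr pmcr_el0, %0" : : "r" (pmcr & ~1UL)); /* E=0 */
	isb();
	/* the save/restore now runs with the counters stopped; the short
	 * window before this point is still counted, though */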

> 
> What about the scenario1 which I had mentioned?
> 

You have to consider that enabling/disabling forwarding and
setting/clearing the active state are part of the guest PMU state and
all of it has to be context-switched.
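
In other words, the guest "PMU register file" you context-switch would
also carry the IRQ state, something like (sketch; the number of saved
registers and the field names are illustrative):

	struct kvm_pmu_ctx {
		u64	regs[KVM_NR_PMU_REGS];	/* PMCR, PMCNTEN, PMOVS, ... */
		bool	irq_forwarded;		/* forwarding enabled for VCPU? */
		bool	irq_active;		/* captured active state in GIC */
	};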

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2014-11-24 14:37                       ` Christoffer Dall
@ 2014-11-25 12:47                         ` Anup Patel
  -1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-11-25 12:47 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, linux-arm-kernel, KVM General, patches, Marc Zyngier,
	Will Deacon, Ian Campbell, Pranavkumar Sawargaonkar

Hi Christoffer,

On Mon, Nov 24, 2014 at 8:07 PM, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> On Mon, Nov 24, 2014 at 02:14:48PM +0530, Anup Patel wrote:
>> On Fri, Nov 21, 2014 at 5:19 PM, Christoffer Dall
>> <christoffer.dall@linaro.org> wrote:
>> > On Fri, Nov 21, 2014 at 04:06:05PM +0530, Anup Patel wrote:
>> >> Hi Christoffer,
>> >>
>> >> On Fri, Nov 21, 2014 at 3:29 PM, Christoffer Dall
>> >> <christoffer.dall@linaro.org> wrote:
>> >> > On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote:
>> >> >> On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall
>> >> >> <christoffer.dall@linaro.org> wrote:
>> >> >> > On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
>> >> >> >> Hi All,
>> >> >> >>
>> >> >> >> I have second thoughts about rebasing KVM PMU patches
>> >> >> >> to Marc's irq-forwarding patches.
>> >> >> >>
>> >> >> >> The PMU IRQs (when virtualized by KVM) are not exactly
>> >> >> >> forwarded IRQs because they are shared between Host
>> >> >> >> and Guest.
>> >> >> >>
>> >> >> >> Scenario1
>> >> >> >> -------------
>> >> >> >>
>> >> >> >> We might have perf running on Host and no KVM guest
>> >> >> >> running. In this scenario, we won't get interrupts on Host
>> >> >> >> because the kvm_pmu_hyp_init() (similar to the function
>> >> >> >> kvm_timer_hyp_init() of Marc's IRQ-forwarding
>> >> >> >> implementation) has put all host PMU IRQs in forwarding
>> >> >> >> mode.
>> >> >> >>
>> >> >> >> The only way to solve this problem is to not set forwarding
>> >> >> >> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
>> >> >> >> have special routines to turn on and turn off the forwarding
>> >> >> >> mode of PMU IRQs. These routines will be called from
>> >> >> >> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
>> >> >> >> forwarding state.
>> >> >> >>
>> >> >> >> Scenario2
>> >> >> >> -------------
>> >> >> >>
>> >> >> >> We might have perf running on Host and Guest simultaneously
>> >> >> >> which means it is quite likely that the PMU HW triggers an IRQ meant
>> >> >> >> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
>> >> >> >> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
>> >> >> >> of Marc's patchset which is called before local_irq_enable()).
>> >> >> >>
>> >> >> >> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
>> >> >> >> will accidentally forward IRQ meant for Host to Guest unless
>> >> >> >> we put additional checks to inspect VCPU PMU state.
>> >> >> >>
>> >> >> >> Am I missing any detail about IRQ forwarding for above
>> >> >> >> scenarios?
>> >> >> >>
>> >> >> > Hi Anup,
>> >> >>
>> >> >> Hi Christoffer,
>> >> >>
>> >> >> >
>> >> >> > I briefly discussed this with Marc.  What I don't understand is how it
>> >> >> > would be possible to get an interrupt for the host while running the
>> >> >> > guest?
>> >> >> >
>> >> >> > The rationale behind my question is that whenever you're running the
>> >> >> > guest, the PMU should be programmed exclusively with guest state, and
>> >> >> > since the PMU is per core, any interrupts should be for the guest, where
>> >> >> > it would always be pending.
>> >> >>
>> >> >> Yes, that's right, the PMU is programmed exclusively for guest when
>> >> >> guest is running and for host when host is running.
>> >> >>
>> >> >> Let us assume a situation (Scenario2 mentioned previously)
>> >> >> where both host and guest are using PMU. When the guest is
>> >> >> running we come back to host mode due to variety of reasons
>> >> >> (stage2 fault, guest IO, regular host interrupt, host interrupt
>> >> >> meant for guest, ....) which means we will return from the
>> >> >> "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the
>> >> >> kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
>> >> >> At this point we would have restored back host PMU context and
>> >> >> any PMU counter used by host can trigger PMU overflow interrupt
>> >> >> for host. Now we will be having "kvm_pmu_sync_hwstate(vcpu);"
>> >> >> in the kvm_arch_vcpu_ioctl_run() function (similar to the
>> >> >> kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset)
>> >> >> which will try to detect PMU irq forwarding state in GIC hence it
>> >> >> can accidentally discover PMU irq pending for guest while this
>> >> >> PMU irq is actually meant for host.
>> >> >>
>> >> >> This above mentioned situation does not happen for timer
>> >> >> because virtual timer interrupts are exclusively used for guest.
>> >> >> The exclusive use of virtual timer interrupt for guest ensures that
>> >> >> the function kvm_timer_sync_hwstate() will always see correct
>> >> >> state of virtual timer IRQ from GIC.
>> >> >>
>> >> > I'm not quite following.
>> >> >
>> >> > When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemptible section,
>> >> > you would (1) capture the active state of the IRQ pertaining to the
>> >> > guest and (2) deactivate the IRQ on the host, then (3) switch the state of
>> >> > the PMU to the host state, and finally (4) re-enable IRQs on the CPU
>> >> > you're running on.
>> >> >
>> >> > If the host PMU state restored in (3) causes the PMU to raise an
>> >> > interrupt, you'll take an interrupt after (4), which is for the host,
>> >> > and you'll handle it on the host.
>> >> >
>> >> We only switch PMU state in assembly code using
>> >> kvm_call_hyp(__kvm_vcpu_run, vcpu)
>> >> so whenever we are in kvm_arch_vcpu_ioctl_run() (i.e. host mode)
>> >> the current hardware PMU state is for host. This means whenever
>> >> we are in host mode the host PMU can change state of PMU IRQ
>> >> in GIC even if local IRQs are disabled.
>> >>
>> >> We inspect the active state of the PMU IRQ in the
>> >> kvm_pmu_sync_hwstate() function using the irq_get_fwd_state() API.
>> >> Here we are not guaranteed that the IRQ forward state returned by the
>> >> irq_get_fwd_state() API is for the guest only.
>> >>
>> >> The above situation does not manifest for virtual timer because
>> >> virtual timer registers are exclusively accessed by Guest and
>> >> virtual timer interrupt is only for Guest (never used by Host).
>> >>
>> >> > Whenever you schedule the guest VCPU again, you'll (a) disable
>> >> > interrupts on the CPU, (b) restore the active state of the IRQ for the
>> >> > guest, (c) restore the guest PMU state, (d) switch to the guest with
>> >> > IRQs enabled on the CPU (potentially).
>> >>
>> >> Here too, while we are between step (a) and step (b) the PMU HW
>> >> context is the host's and any PMU counter can overflow. Step (b)
>> >> can then actually override the PMU IRQ meant for the Host.
>> >>
>> > Can you not simply switch the state from C-code after capturing the IRQ
>> > state then?  Everything should be accessible from EL1, right?
>>
>> Yes, I think that would be the only option. This also means I will need
>> to re-implement context switching for doing it in C-code.
>
> Yes, you'd add some inline assembly in the C-code to access the
> registers I guess.  Only thing I thought about after writing my original
> mail is whether you'll be counting events while context-switching and
> running on the host, which you actually don't want to.  Not sure if
> there's a better way to avoid that.
>
>>
>> What about the scenario1 which I had mentioned?
>>
>
> You have to consider that enabling/disabling forwarding and
> setting/clearing the active state are part of the guest PMU state and
> all of it has to be context-switched.

I found one more issue.

If the PMU IRQ is a PPI then enabling/disabling forwarding will not
work, because the irqd_set_irq_forwarded() function takes irq_data
as its argument, which is a member of irq_desc, and the irq_desc for
PPIs is not per_cpu. This means we cannot call irqd_set_irq_forwarded()
simultaneously from different host CPUs.
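
i.e. something like this (sketch; irqd_set_irq_forwarded() as in
Marc's series, host_pmu_ppi is illustrative):

	/* a PPI has a single irq_desc shared by all CPUs: */
	struct irq_desc *desc = irq_to_desc(host_pmu_ppi);

	/* so this flips forwarding for the PPI globally, and two host
	 * CPUs entering/leaving their guests concurrently would stomp
	 * on each other's state */
	irqd_set_irq_forwarded(&desc->irq_data);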

>
> Thanks,
> -Christoffer

Regards,
Anup

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2014-11-25 12:47                         ` Anup Patel
@ 2014-11-25 13:42                           ` Christoffer Dall
  -1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2014-11-25 13:42 UTC (permalink / raw)
  To: Anup Patel
  Cc: kvmarm, linux-arm-kernel, KVM General, patches, Marc Zyngier,
	Will Deacon, Ian Campbell, Pranavkumar Sawargaonkar

On Tue, Nov 25, 2014 at 06:17:03PM +0530, Anup Patel wrote:
> Hi Christoffer,
> 
> On Mon, Nov 24, 2014 at 8:07 PM, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
> > On Mon, Nov 24, 2014 at 02:14:48PM +0530, Anup Patel wrote:
> >> On Fri, Nov 21, 2014 at 5:19 PM, Christoffer Dall
> >> <christoffer.dall@linaro.org> wrote:
> >> > On Fri, Nov 21, 2014 at 04:06:05PM +0530, Anup Patel wrote:
> >> >> Hi Christoffer,
> >> >>
> >> >> On Fri, Nov 21, 2014 at 3:29 PM, Christoffer Dall
> >> >> <christoffer.dall@linaro.org> wrote:
> >> >> > On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote:
> >> >> >> On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall
> >> >> >> <christoffer.dall@linaro.org> wrote:
> >> >> >> > On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
> >> >> >> >> Hi All,
> >> >> >> >>
> >> >> >> >> I have second thoughts about rebasing KVM PMU patches
> >> >> >> >> to Marc's irq-forwarding patches.
> >> >> >> >>
> >> >> >> >> The PMU IRQs (when virtualized by KVM) are not exactly
> >> >> >> >> forwarded IRQs because they are shared between Host
> >> >> >> >> and Guest.
> >> >> >> >>
> >> >> >> >> Scenario1
> >> >> >> >> -------------
> >> >> >> >>
> >> >> >> >> We might have perf running on Host and no KVM guest
> >> >> >> >> running. In this scenario, we won't get interrupts on Host
> >> >> >> >> because the kvm_pmu_hyp_init() (similar to the function
> >> >> >> >> kvm_timer_hyp_init() of Marc's IRQ-forwarding
> >> >> >> >> implementation) has put all host PMU IRQs in forwarding
> >> >> >> >> mode.
> >> >> >> >>
> >> >> >> >> The only way to solve this problem is to not set forwarding
> >> >> >> >> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
> >> >> >> >> have special routines to turn on and turn off the forwarding
> >> >> >> >> mode of PMU IRQs. These routines will be called from
> >> >> >> >> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
> >> >> >> >> forwarding state.
> >> >> >> >>
> >> >> >> >> Scenario2
> >> >> >> >> -------------
> >> >> >> >>
> >> >> >> >> We might have perf running on Host and Guest simultaneously
> >> >> >> >> which means it is quite likely that the PMU HW triggers an IRQ meant
> >> >> >> >> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
> >> >> >> >> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
> >> >> >> >> of Marc's patchset which is called before local_irq_enable()).
> >> >> >> >>
> >> >> >> >> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
> >> >> >> >> will accidentally forward IRQ meant for Host to Guest unless
> >> >> >> >> we put additional checks to inspect VCPU PMU state.
> >> >> >> >>
> >> >> >> >> Am I missing any detail about IRQ forwarding for above
> >> >> >> >> scenarios?
> >> >> >> >>
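
The "special routines" idea from Scenario1 would roughly amount to the
sketch below (kvm_pmu_set_fwd() and vcpu->arch.pmu_irq are made-up names,
and irqd_clr_irq_forwarded() assumes a clear counterpart to
irqd_set_irq_forwarded()):

static void kvm_pmu_set_fwd(struct kvm_vcpu *vcpu, bool fwd)
{
    struct irq_data *d = irq_get_irq_data(vcpu->arch.pmu_irq);

    if (fwd)
        irqd_set_irq_forwarded(d);
    else
        irqd_clr_irq_forwarded(d);
}

/* In kvm_arch_vcpu_ioctl_run(), around the world switch: */
    kvm_pmu_set_fwd(vcpu, true);    /* forward only while the guest runs */
    ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
    kvm_pmu_set_fwd(vcpu, false);   /* host perf gets its IRQ back */
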
> >> >> >> > Hi Anup,
> >> >> >>
> >> >> >> Hi Christoffer,
> >> >> >>
> >> >> >> >
> >> >> >> > I briefly discussed this with Marc.  What I don't understand is how it
> >> >> >> > would be possible to get an interrupt for the host while running the
> >> >> >> > guest?
> >> >> >> >
> >> >> >> > The rationale behind my question is that whenever you're running the
> >> >> >> > guest, the PMU should be programmed exclusively with guest state, and
> >> >> >> > since the PMU is per core, any interrupts should be for the guest, where
> >> >> >> > it would always be pending.
> >> >> >>
> >> >> >> Yes, that's right, the PMU is programmed exclusively for guest when
> >> >> >> guest is running and for host when host is running.
> >> >> >>
> >> >> >> Let us assume a situation (Scenario2 mentioned previously)
> >> >> >> where both host and guest are using PMU. When the guest is
> >> >> >> running we come back to host mode due to a variety of reasons
> >> >> >> (stage2 fault, guest IO, regular host interrupt, host interrupt
> >> >> >> meant for guest, ....) which means we will return from the
> >> >> >> "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the
> >> >> >> kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
> >> >> >> At this point we would have restored back host PMU context and
> >> >> >> any PMU counter used by host can trigger a PMU overflow interrupt
> >> >> >> for host. Now we will have "kvm_pmu_sync_hwstate(vcpu);"
> >> >> >> in the kvm_arch_vcpu_ioctl_run() function (similar to the
> >> >> >> kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset)
> >> >> >> which will try to detect PMU irq forwarding state in GIC hence it
> >> >> >> can accidentally discover PMU irq pending for guest while this
> >> >> >> PMU irq is actually meant for host.
> >> >> >>
> >> >> >> This above mentioned situation does not happen for timer
> >> >> >> because virtual timer interrupts are exclusively used for guest.
> >> >> >> The exclusive use of virtual timer interrupt for guest ensures that
> >> >> >> the function kvm_timer_sync_hwstate() will always see correct
> >> >> >> state of virtual timer IRQ from GIC.
> >> >> >>
> >> >> > I'm not quite following.
> >> >> >
> >> >> > When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemptible section,
> >> >> > you would (1) capture the active state of the IRQ pertaining to the
> >> >> > guest and (2) deactivate the IRQ on the host, then (3) switch the state of
> >> >> > the PMU to the host state, and finally (4) re-enable IRQs on the CPU
> >> >> > you're running on.
> >> >> >
> >> >> > If the host PMU state restored in (3) causes the PMU to raise an
> >> >> > interrupt, you'll take an interrupt after (4), which is for the host,
> >> >> > and you'll handle it on the host.
> >> >> >
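
As a sketch, that sequence would look roughly like this (the helpers for
steps (2) and (3) are hypothetical; irq_get_fwd_state() is the API from
the forwarding series):

/* Runs with local IRQs still disabled after the world switch. */
static void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu)
{
    /* (1) capture the active state of the guest's PMU IRQ */
    vcpu->arch.pmu_irq_active = irq_get_fwd_state(vcpu->arch.pmu_irq);

    /* (2) deactivate the IRQ on the host's GIC */
    kvm_pmu_deactivate_irq(vcpu->arch.pmu_irq);

    /* (3) switch the PMU hardware back to the host context */
    kvm_pmu_restore_host_ctx(vcpu);

    /* (4) the caller then re-enables IRQs via local_irq_enable() */
}
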
> >> >> We only switch PMU state in assembly code using
> >> >> kvm_call_hyp(__kvm_vcpu_run, vcpu)
> >> >> so whenever we are in kvm_arch_vcpu_ioctl_run() (i.e. host mode)
> >> >> the current hardware PMU state is for host. This means whenever
> >> >> we are in host mode the host PMU can change state of PMU IRQ
> >> >> in GIC even if local IRQs are disabled.
> >> >>
> >> >> Whenever we inspect the active state of the PMU IRQ in the
> >> >> kvm_pmu_sync_hwstate() function using the irq_get_fwd_state() API,
> >> >> we are not guaranteed that the IRQ forward state returned by
> >> >> irq_get_fwd_state() is for the guest only.
> >> >>
> >> >> The above situation does not manifest for virtual timer because
> >> >> virtual timer registers are exclusively accessed by Guest and
> >> >> virtual timer interrupt is only for Guest (never used by Host).
> >> >>
> >> >> > Whenever you schedule the guest VCPU again, you'll (a) disable
> >> >> > interrupts on the CPU, (b) restore the active state of the IRQ for the
> >> >> > guest, (c) restore the guest PMU state, (d) switch to the guest with
> >> >> > IRQs enabled on the CPU (potentially).
> >> >>
> >> >> Here too, while we are between step (a) and step (b) the PMU HW
> >> >> context is for host and any PMU counter can overflow. The step (b)
> >> >> can actually override the PMU IRQ meant for Host.
> >> >>
> >> > Can you not simply switch the state from C-code after capturing the IRQ
> >> > state then?  Everything should be accessible from EL1, right?
> >>
> >> Yes, I think that would be the only option. This also means I will need
> >> to re-implement context switching for doing it in C-code.
> >
> > Yes, you'd add some inline assembly in the C-code to access the
> > registers I guess.  Only thing I thought about after writing my original
> > mail is whether you'll be counting events while context-switching and
> > running on the host, which you actually don't want to.  Not sure if
> > there's a better way to avoid that.
> >
> >>
> >> What about the scenario1 which I had mentioned?
> >>
> >
> > You have to consider that enabling/disabling forwarding and
> > setting/clearing the active state are part of the guest PMU state,
> > and all of it has to be context-switched.
> 
> I found one more issue.
> 
> If the PMU IRQ is a PPI, then enabling/disabling forwarding will not
> work, because the irqd_set_irq_forwarded() function takes irq_data
> as its argument, which is a member of irq_desc, and the irq_desc for
> PPIs is not per-CPU. This means we cannot call irqd_set_irq_forwarded()
> simultaneously from different host CPUs.
> 
I'll let Marc answer this one, and say whether it still applies to his
view of how the next version of the forwarding series will look.

-Christoffer

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2014-11-25 13:42                           ` Christoffer Dall
@ 2014-11-27 10:22                             ` Anup Patel
  -1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-11-27 10:22 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, linux-arm-kernel, KVM General, patches, Marc Zyngier,
	Will Deacon, Ian Campbell, Pranavkumar Sawargaonkar

On Tue, Nov 25, 2014 at 7:12 PM, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> On Tue, Nov 25, 2014 at 06:17:03PM +0530, Anup Patel wrote:
>> Hi Christoffer,
>>
>> On Mon, Nov 24, 2014 at 8:07 PM, Christoffer Dall
>> <christoffer.dall@linaro.org> wrote:
>> > On Mon, Nov 24, 2014 at 02:14:48PM +0530, Anup Patel wrote:
>> >> On Fri, Nov 21, 2014 at 5:19 PM, Christoffer Dall
>> >> <christoffer.dall@linaro.org> wrote:
>> >> > On Fri, Nov 21, 2014 at 04:06:05PM +0530, Anup Patel wrote:
>> >> >> Hi Christoffer,
>> >> >>
>> >> >> On Fri, Nov 21, 2014 at 3:29 PM, Christoffer Dall
>> >> >> <christoffer.dall@linaro.org> wrote:
>> >> >> > On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote:
>> >> >> >> On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall
>> >> >> >> <christoffer.dall@linaro.org> wrote:
>> >> >> >> > On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
>> >> >> >> >> Hi All,
>> >> >> >> >>
>> >> >> >> >> I have second thoughts about rebasing KVM PMU patches
>> >> >> >> >> to Marc's irq-forwarding patches.
>> >> >> >> >>
>> >> >> >> >> The PMU IRQs (when virtualized by KVM) are not exactly
>> >> >> >> >> forwarded IRQs because they are shared between Host
>> >> >> >> >> and Guest.
>> >> >> >> >>
>> >> >> >> >> Scenario1
>> >> >> >> >> -------------
>> >> >> >> >>
>> >> >> >> >> We might have perf running on Host and no KVM guest
>> >> >> >> >> running. In this scenario, we won't get interrupts on Host
>> >> >> >> >> because the kvm_pmu_hyp_init() (similar to the function
>> >> >> >> >> kvm_timer_hyp_init() of Marc's IRQ-forwarding
>> >> >> >> >> implementation) has put all host PMU IRQs in forwarding
>> >> >> >> >> mode.
>> >> >> >> >>
>> >> >> >> >> The only way to solve this problem is to not set forwarding
>> >> >> >> >> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
>> >> >> >> >> have special routines to turn on and turn off the forwarding
>> >> >> >> >> mode of PMU IRQs. These routines will be called from
>> >> >> >> >> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
>> >> >> >> >> forwarding state.
>> >> >> >> >>
>> >> >> >> >> Scenario2
>> >> >> >> >> -------------
>> >> >> >> >>
>> >> >> >> >> We might have perf running on Host and Guest simultaneously
>> >> >> >> >> which means it is quite likely that the PMU HW triggers an IRQ meant
>> >> >> >> >> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
>> >> >> >> >> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
>> >> >> >> >> of Marc's patchset which is called before local_irq_enable()).
>> >> >> >> >>
>> >> >> >> >> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
>> >> >> >> >> will accidentally forward IRQ meant for Host to Guest unless
>> >> >> >> >> we put additional checks to inspect VCPU PMU state.
>> >> >> >> >>
>> >> >> >> >> Am I missing any detail about IRQ forwarding for above
>> >> >> >> >> scenarios?
>> >> >> >> >>
>> >> >> >> > Hi Anup,
>> >> >> >>
>> >> >> >> Hi Christoffer,
>> >> >> >>
>> >> >> >> >
>> >> >> >> > I briefly discussed this with Marc.  What I don't understand is how it
>> >> >> >> > would be possible to get an interrupt for the host while running the
>> >> >> >> > guest?
>> >> >> >> >
>> >> >> >> > The rationale behind my question is that whenever you're running the
>> >> >> >> > guest, the PMU should be programmed exclusively with guest state, and
>> >> >> >> > since the PMU is per core, any interrupts should be for the guest, where
>> >> >> >> > it would always be pending.
>> >> >> >>
>> >> >> >> Yes, that's right, the PMU is programmed exclusively for guest when
>> >> >> >> guest is running and for host when host is running.
>> >> >> >>
>> >> >> >> Let us assume a situation (Scenario2 mentioned previously)
>> >> >> >> where both host and guest are using PMU. When the guest is
>> >> >> >> running we come back to host mode due to a variety of reasons
>> >> >> >> (stage2 fault, guest IO, regular host interrupt, host interrupt
>> >> >> >> meant for guest, ....) which means we will return from the
>> >> >> >> "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the
>> >> >> >> kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
>> >> >> >> At this point we would have restored back host PMU context and
>> >> >> >> any PMU counter used by host can trigger a PMU overflow interrupt
>> >> >> >> for host. Now we will have "kvm_pmu_sync_hwstate(vcpu);"
>> >> >> >> in the kvm_arch_vcpu_ioctl_run() function (similar to the
>> >> >> >> kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset)
>> >> >> >> which will try to detect PMU irq forwarding state in GIC hence it
>> >> >> >> can accidentally discover PMU irq pending for guest while this
>> >> >> >> PMU irq is actually meant for host.
>> >> >> >>
>> >> >> >> This above mentioned situation does not happen for timer
>> >> >> >> because virtual timer interrupts are exclusively used for guest.
>> >> >> >> The exclusive use of virtual timer interrupt for guest ensures that
>> >> >> >> the function kvm_timer_sync_hwstate() will always see correct
>> >> >> >> state of virtual timer IRQ from GIC.
>> >> >> >>
>> >> >> > I'm not quite following.
>> >> >> >
>> >> >> > When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemptible section,
>> >> >> > you would (1) capture the active state of the IRQ pertaining to the
>> >> >> > guest and (2) deactivate the IRQ on the host, then (3) switch the state of
>> >> >> > the PMU to the host state, and finally (4) re-enable IRQs on the CPU
>> >> >> > you're running on.
>> >> >> >
>> >> >> > If the host PMU state restored in (3) causes the PMU to raise an
>> >> >> > interrupt, you'll take an interrupt after (4), which is for the host,
>> >> >> > and you'll handle it on the host.
>> >> >> >
>> >> >> We only switch PMU state in assembly code using
>> >> >> kvm_call_hyp(__kvm_vcpu_run, vcpu)
>> >> >> so whenever we are in kvm_arch_vcpu_ioctl_run() (i.e. host mode)
>> >> >> the current hardware PMU state is for host. This means whenever
>> >> >> we are in host mode the host PMU can change state of PMU IRQ
>> >> >> in GIC even if local IRQs are disabled.
>> >> >>
>> >> >> Whenever we inspect the active state of the PMU IRQ in the
>> >> >> kvm_pmu_sync_hwstate() function using the irq_get_fwd_state() API,
>> >> >> we are not guaranteed that the IRQ forward state returned by
>> >> >> irq_get_fwd_state() is for the guest only.
>> >> >>
>> >> >> The above situation does not manifest for virtual timer because
>> >> >> virtual timer registers are exclusively accessed by Guest and
>> >> >> virtual timer interrupt is only for Guest (never used by Host).
>> >> >>
>> >> >> > Whenever you schedule the guest VCPU again, you'll (a) disable
>> >> >> > interrupts on the CPU, (b) restore the active state of the IRQ for the
>> >> >> > guest, (c) restore the guest PMU state, (d) switch to the guest with
>> >> >> > IRQs enabled on the CPU (potentially).
>> >> >>
>> >> >> Here too, while we are between step (a) and step (b) the PMU HW
>> >> >> context is for host and any PMU counter can overflow. The step (b)
>> >> >> can actually override the PMU IRQ meant for Host.
>> >> >>
>> >> > Can you not simply switch the state from C-code after capturing the IRQ
>> >> > state then?  Everything should be accessible from EL1, right?
>> >>
>> >> Yes, I think that would be the only option. This also means I will need
>> >> to re-implement context switching for doing it in C-code.
>> >
>> > Yes, you'd add some inline assembly in the C-code to access the
>> > registers I guess.  Only thing I thought about after writing my original
>> > mail is whether you'll be counting events while context-switching and
>> > running on the host, which you actually don't want to.  Not sure if
>> > there's a better way to avoid that.
>> >
>> >>
>> >> What about the scenario1 which I had mentioned?
>> >>
>> >
>> > You have to consider that enabling/disabling forwarding and
>> > setting/clearing the active state are part of the guest PMU state,
>> > and all of it has to be context-switched.
>>
>> I found one more issue.
>>
>> If the PMU IRQ is a PPI, then enabling/disabling forwarding will not
>> work, because the irqd_set_irq_forwarded() function takes irq_data
>> as its argument, which is a member of irq_desc, and the irq_desc for
>> PPIs is not per-CPU. This means we cannot call irqd_set_irq_forwarded()
>> simultaneously from different host CPUs.
>>

Hi Marc,

> I'll let Marc answer this one, and say whether it still applies to his
> view of how the next version of the forwarding series will look.

Ping??

>
> -Christoffer

--
Anup

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2014-11-27 10:22                             ` Anup Patel
@ 2014-11-27 10:40                               ` Marc Zyngier
  -1 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2014-11-27 10:40 UTC (permalink / raw)
  To: Anup Patel, Christoffer Dall
  Cc: kvmarm, linux-arm-kernel, KVM General, patches, Will Deacon,
	Ian.Campbell, Pranavkumar Sawargaonkar

On 27/11/14 10:22, Anup Patel wrote:
> On Tue, Nov 25, 2014 at 7:12 PM, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
>> On Tue, Nov 25, 2014 at 06:17:03PM +0530, Anup Patel wrote:
>>> Hi Christoffer,
>>>
>>> On Mon, Nov 24, 2014 at 8:07 PM, Christoffer Dall
>>> <christoffer.dall@linaro.org> wrote:
>>>> On Mon, Nov 24, 2014 at 02:14:48PM +0530, Anup Patel wrote:
>>>>> On Fri, Nov 21, 2014 at 5:19 PM, Christoffer Dall
>>>>> <christoffer.dall@linaro.org> wrote:
>>>>>> On Fri, Nov 21, 2014 at 04:06:05PM +0530, Anup Patel wrote:
>>>>>>> Hi Christoffer,
>>>>>>>
>>>>>>> On Fri, Nov 21, 2014 at 3:29 PM, Christoffer Dall
>>>>>>> <christoffer.dall@linaro.org> wrote:
>>>>>>>> On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote:
>>>>>>>>> On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall
>>>>>>>>> <christoffer.dall@linaro.org> wrote:
>>>>>>>>>> On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
>>>>>>>>>>> Hi All,
>>>>>>>>>>>
>>>>>>>>>>> I have second thoughts about rebasing KVM PMU patches
>>>>>>>>>>> to Marc's irq-forwarding patches.
>>>>>>>>>>>
>>>>>>>>>>> The PMU IRQs (when virtualized by KVM) are not exactly
>>>>>>>>>>> forwarded IRQs because they are shared between Host
>>>>>>>>>>> and Guest.
>>>>>>>>>>>
>>>>>>>>>>> Scenario1
>>>>>>>>>>> -------------
>>>>>>>>>>>
>>>>>>>>>>> We might have perf running on Host and no KVM guest
>>>>>>>>>>> running. In this scenario, we won't get interrupts on Host
>>>>>>>>>>> because the kvm_pmu_hyp_init() (similar to the function
>>>>>>>>>>> kvm_timer_hyp_init() of Marc's IRQ-forwarding
>>>>>>>>>>> implementation) has put all host PMU IRQs in forwarding
>>>>>>>>>>> mode.
>>>>>>>>>>>
>>>>>>>>>>> The only way to solve this problem is to not set forwarding
>>>>>>>>>>> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
>>>>>>>>>>> have special routines to turn on and turn off the forwarding
>>>>>>>>>>> mode of PMU IRQs. These routines will be called from
>>>>>>>>>>> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
>>>>>>>>>>> forwarding state.
>>>>>>>>>>>
>>>>>>>>>>> Scenario2
>>>>>>>>>>> -------------
>>>>>>>>>>>
>>>>>>>>>>> We might have perf running on Host and Guest simultaneously
>>>>>>>>>>> which means it is quite likely that the PMU HW triggers an IRQ meant
>>>>>>>>>>> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
>>>>>>>>>>> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
>>>>>>>>>>> of Marc's patchset which is called before local_irq_enable()).
>>>>>>>>>>>
>>>>>>>>>>> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
>>>>>>>>>>> will accidentally forward IRQ meant for Host to Guest unless
>>>>>>>>>>> we put additional checks to inspect VCPU PMU state.
>>>>>>>>>>>
>>>>>>>>>>> Am I missing any detail about IRQ forwarding for above
>>>>>>>>>>> scenarios?
>>>>>>>>>>>
>>>>>>>>>> Hi Anup,
>>>>>>>>>
>>>>>>>>> Hi Christoffer,
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I briefly discussed this with Marc.  What I don't understand is how it
>>>>>>>>>> would be possible to get an interrupt for the host while running the
>>>>>>>>>> guest?
>>>>>>>>>>
>>>>>>>>>> The rationale behind my question is that whenever you're running the
>>>>>>>>>> guest, the PMU should be programmed exclusively with guest state, and
>>>>>>>>>> since the PMU is per core, any interrupts should be for the guest, where
>>>>>>>>>> it would always be pending.
>>>>>>>>>
>>>>>>>>> Yes, that's right, the PMU is programmed exclusively for guest when
>>>>>>>>> guest is running and for host when host is running.
>>>>>>>>>
>>>>>>>>> Let us assume a situation (Scenario2 mentioned previously)
>>>>>>>>> where both host and guest are using PMU. When the guest is
>>>>>>>>> running we come back to host mode due to a variety of reasons
>>>>>>>>> (stage2 fault, guest IO, regular host interrupt, host interrupt
>>>>>>>>> meant for guest, ....) which means we will return from the
>>>>>>>>> "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the
>>>>>>>>> kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
>>>>>>>>> At this point we would have restored back host PMU context and
>>>>>>>>> any PMU counter used by host can trigger a PMU overflow interrupt
>>>>>>>>> for host. Now we will have "kvm_pmu_sync_hwstate(vcpu);"
>>>>>>>>> in the kvm_arch_vcpu_ioctl_run() function (similar to the
>>>>>>>>> kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset)
>>>>>>>>> which will try to detect PMU irq forwarding state in GIC hence it
>>>>>>>>> can accidentally discover PMU irq pending for guest while this
>>>>>>>>> PMU irq is actually meant for host.
>>>>>>>>>
>>>>>>>>> This above mentioned situation does not happen for timer
>>>>>>>>> because virtual timer interrupts are exclusively used for guest.
>>>>>>>>> The exclusive use of virtual timer interrupt for guest ensures that
>>>>>>>>> the function kvm_timer_sync_hwstate() will always see correct
>>>>>>>>> state of virtual timer IRQ from GIC.
>>>>>>>>>
>>>>>>>> I'm not quite following.
>>>>>>>>
>>>>>>>> When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemptible section,
>>>>>>>> you would (1) capture the active state of the IRQ pertaining to the
>>>>>>>> guest and (2) deactivate the IRQ on the host, then (3) switch the state of
>>>>>>>> the PMU to the host state, and finally (4) re-enable IRQs on the CPU
>>>>>>>> you're running on.
>>>>>>>>
>>>>>>>> If the host PMU state restored in (3) causes the PMU to raise an
>>>>>>>> interrupt, you'll take an interrupt after (4), which is for the host,
>>>>>>>> and you'll handle it on the host.
>>>>>>>>
>>>>>>> We only switch PMU state in assembly code using
>>>>>>> kvm_call_hyp(__kvm_vcpu_run, vcpu)
>>>>>>> so whenever we are in kvm_arch_vcpu_ioctl_run() (i.e. host mode)
>>>>>>> the current hardware PMU state is for host. This means whenever
>>>>>>> we are in host mode the host PMU can change state of PMU IRQ
>>>>>>> in GIC even if local IRQs are disabled.
>>>>>>>
>>>>>>> Whenever we inspect the active state of the PMU IRQ in the
>>>>>>> kvm_pmu_sync_hwstate() function using the irq_get_fwd_state() API,
>>>>>>> we are not guaranteed that the IRQ forward state returned by
>>>>>>> irq_get_fwd_state() is for the guest only.
>>>>>>>
>>>>>>> The above situation does not manifest for virtual timer because
>>>>>>> virtual timer registers are exclusively accessed by Guest and
>>>>>>> virtual timer interrupt is only for Guest (never used by Host).
>>>>>>>
>>>>>>>> Whenever you schedule the guest VCPU again, you'll (a) disable
>>>>>>>> interrupts on the CPU, (b) restore the active state of the IRQ for the
>>>>>>>> guest, (c) restore the guest PMU state, (d) switch to the guest with
>>>>>>>> IRQs enabled on the CPU (potentially).
>>>>>>>
>>>>>>> Here too, while we are between step (a) and step (b) the PMU HW
>>>>>>> context is for host and any PMU counter can overflow. The step (b)
>>>>>>> can actually override the PMU IRQ meant for Host.
>>>>>>>
>>>>>> Can you not simply switch the state from C-code after capturing the IRQ
>>>>>> state then?  Everything should be accessible from EL1, right?
>>>>>
>>>>> Yes, I think that would be the only option. This also means I will need
>>>>> to re-implement context switching for doing it in C-code.
>>>>
>>>> Yes, you'd add some inline assembly in the C-code to access the
>>>> registers I guess.  Only thing I thought about after writing my original
>>>> mail is whether you'll be counting events while context-switching and
>>>> running on the host, which you actually don't want to.  Not sure if
>>>> there's a better way to avoid that.
>>>>
>>>>>
>>>>> What about the scenario1 which I had mentioned?
>>>>>
>>>>
>>>> You have to consider that enabling/disabling forwarding and
>>>> setting/clearing the active state are part of the guest PMU state,
>>>> and all of it has to be context-switched.
>>>
>>> I found one more issue.
>>>
>>> If the PMU IRQ is a PPI, then enabling/disabling forwarding will not
>>> work, because the irqd_set_irq_forwarded() function takes irq_data
>>> as its argument, which is a member of irq_desc, and the irq_desc for
>>> PPIs is not per-CPU. This means we cannot call irqd_set_irq_forwarded()
>>> simultaneously from different host CPUs.
>>>
> 
> Hi Marc,
> 
>> I'll let Marc answer this one, and say whether it still applies to his
>> view of how the next version of the forwarding series will look.

I'm looking at it at the moment.

I'm inclined to say that we should fix the forwarding code to allow
individual PPIs to be forwarded. This is a bit harder than what we're
doing at the moment, but that's possible.

Of course, that complicates the code a bit, as we have to make sure
we're not preemptible at that time.
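
Something like the following sketch, perhaps (names illustrative), with
the forwarding state tracked per CPU and only ever updated locally:

static DEFINE_PER_CPU(bool, pmu_ppi_forwarded);

static void pmu_ppi_set_fwd_local(bool fwd)
{
    preempt_disable();      /* must not migrate mid-update */
    __this_cpu_write(pmu_ppi_forwarded, fwd);
    /* program the local GIC CPU interface for the PPI here */
    preempt_enable();
}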

What do you think?

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2014-11-27 10:40                               ` Marc Zyngier
@ 2014-11-27 10:54                                 ` Anup Patel
  -1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-11-27 10:54 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Christoffer Dall, kvmarm, linux-arm-kernel, KVM General, patches,
	Will Deacon, Ian.Campbell, Pranavkumar Sawargaonkar

On Thu, Nov 27, 2014 at 4:10 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 27/11/14 10:22, Anup Patel wrote:
>> On Tue, Nov 25, 2014 at 7:12 PM, Christoffer Dall
>> <christoffer.dall@linaro.org> wrote:
>>> On Tue, Nov 25, 2014 at 06:17:03PM +0530, Anup Patel wrote:
>>>> Hi Christoffer,
>>>>
>>>> On Mon, Nov 24, 2014 at 8:07 PM, Christoffer Dall
>>>> <christoffer.dall@linaro.org> wrote:
>>>>> On Mon, Nov 24, 2014 at 02:14:48PM +0530, Anup Patel wrote:
>>>>>> On Fri, Nov 21, 2014 at 5:19 PM, Christoffer Dall
>>>>>> <christoffer.dall@linaro.org> wrote:
>>>>>>> On Fri, Nov 21, 2014 at 04:06:05PM +0530, Anup Patel wrote:
>>>>>>>> Hi Christoffer,
>>>>>>>>
>>>>>>>> On Fri, Nov 21, 2014 at 3:29 PM, Christoffer Dall
>>>>>>>> <christoffer.dall@linaro.org> wrote:
>>>>>>>>> On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote:
>>>>>>>>>> On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall
>>>>>>>>>> <christoffer.dall@linaro.org> wrote:
>>>>>>>>>>> On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>
>>>>>>>>>>>> I have second thoughts about rebasing KVM PMU patches
>>>>>>>>>>>> to Marc's irq-forwarding patches.
>>>>>>>>>>>>
>>>>>>>>>>>> The PMU IRQs (when virtualized by KVM) are not exactly
>>>>>>>>>>>> forwarded IRQs because they are shared between Host
>>>>>>>>>>>> and Guest.
>>>>>>>>>>>>
>>>>>>>>>>>> Scenario1
>>>>>>>>>>>> -------------
>>>>>>>>>>>>
>>>>>>>>>>>> running. In this scenario, we won't get interrupts on Host
>>>>>>>>>>>> running. In this scenario, we wont get interrupts on Host
>>>>>>>>>>>> because the kvm_pmu_hyp_init() (similar to the function
>>>>>>>>>>>> kvm_timer_hyp_init() of Marc's IRQ-forwarding
>>>>>>>>>>>> implementation) has put all host PMU IRQs in forwarding
>>>>>>>>>>>> mode.
>>>>>>>>>>>>
>>>>>>>>>>>> The only way to solve this problem is to not set forwarding
>>>>>>>>>>>> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
>>>>>>>>>>>> have special routines to turn on and turn off the forwarding
>>>>>>>>>>>> mode of PMU IRQs. These routines will be called from
>>>>>>>>>>>> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
>>>>>>>>>>>> forwarding state.
>>>>>>>>>>>>
>>>>>>>>>>>> Scenario2
>>>>>>>>>>>> -------------
>>>>>>>>>>>>
>>>>>>>>>>>> We might have perf running on Host and Guest simultaneously
>>>>>>>>>>>> which means it is quite likely that the PMU HW triggers an IRQ meant
>>>>>>>>>>>> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
>>>>>>>>>>>> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
>>>>>>>>>>>> of Marc's patchset which is called before local_irq_enable()).
>>>>>>>>>>>>
>>>>>>>>>>>> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
>>>>>>>>>>>> will accidentally forward IRQ meant for Host to Guest unless
>>>>>>>>>>>> we put additional checks to inspect VCPU PMU state.
>>>>>>>>>>>>
>>>>>>>>>>>> Am I missing any detail about IRQ forwarding for above
>>>>>>>>>>>> scenarios?
>>>>>>>>>>>>
>>>>>>>>>>> Hi Anup,
>>>>>>>>>>
>>>>>>>>>> Hi Christoffer,
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I briefly discussed this with Marc.  What I don't understand is how it
>>>>>>>>>>> would be possible to get an interrupt for the host while running the
>>>>>>>>>>> guest?
>>>>>>>>>>>
>>>>>>>>>>> The rationale behind my question is that whenever you're running the
>>>>>>>>>>> guest, the PMU should be programmed exclusively with guest state, and
>>>>>>>>>>> since the PMU is per core, any interrupts should be for the guest, where
>>>>>>>>>>> it would always be pending.
>>>>>>>>>>
>>>>>>>>>> Yes, that's right, the PMU is programmed exclusively for guest when
>>>>>>>>>> guest is running and for host when host is running.
>>>>>>>>>>
>>>>>>>>>> Let us assume a situation (Scenario2 mentioned previously)
>>>>>>>>>> where both host and guest are using PMU. When the guest is
>>>>>>>>>> running we come back to host mode due to a variety of reasons
>>>>>>>>>> (stage2 fault, guest IO, regular host interrupt, host interrupt
>>>>>>>>>> meant for guest, ....) which means we will return from the
>>>>>>>>>> "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the
>>>>>>>>>> kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
>>>>>>>>>> At this point we would have restored back host PMU context and
>>>>>>>>>> any PMU counter used by host can trigger PMU overflow interrup
>>>>>>>>>> for host. Now we will be having "kvm_pmu_sync_hwstate(vcpu);"
>>>>>>>>>> in the kvm_arch_vcpu_ioctl_run() function (similar to the
>>>>>>>>>> kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset)
>>>>>>>>>> which will try to detect PMU irq forwarding state in GIC hence it
>>>>>>>>>> can accidentally discover PMU irq pending for guest while this
>>>>>>>>>> PMU irq is actually meant for host.
>>>>>>>>>>
>>>>>>>>>> The above-mentioned situation does not happen for the timer
>>>>>>>>>> because virtual timer interrupts are exclusively used for the
>>>>>>>>>> guest. The exclusive use of the virtual timer interrupt for the
>>>>>>>>>> guest ensures that the function kvm_timer_sync_hwstate() will
>>>>>>>>>> always see the correct state of the virtual timer IRQ from the GIC.
>>>>>>>>>>
>>>>>>>>> I'm not quite following.
>>>>>>>>>
>>>>>>>>> When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemptible
>>>>>>>>> section, you would (1) capture the active state of the IRQ
>>>>>>>>> pertaining to the guest and (2) deactivate the IRQ on the host,
>>>>>>>>> then (3) switch the state of the PMU to the host state, and
>>>>>>>>> finally (4) re-enable IRQs on the CPU you're running on.
>>>>>>>>>
>>>>>>>>> If the host PMU state restored in (3) causes the PMU to raise an
>>>>>>>>> interrupt, you'll take an interrupt after (4), which is for the host,
>>>>>>>>> and you'll handle it on the host.
>>>>>>>>>
>>>>>>>> We only switch PMU state in assembly code using
>>>>>>>> kvm_call_hyp(__kvm_vcpu_run, vcpu)
>>>>>>>> so whenever we are in kvm_arch_vcpu_ioctl_run() (i.e. host mode)
>>>>>>>> the current hardware PMU state is for the host. This means that
>>>>>>>> whenever we are in host mode the host PMU can change the state of
>>>>>>>> the PMU IRQ in the GIC even if local IRQs are disabled.
>>>>>>>>
>>>>>>>> We inspect the active state of the PMU IRQ in the
>>>>>>>> kvm_pmu_sync_hwstate() function using the irq_get_fwd_state() API.
>>>>>>>> Here we are not guaranteed that the IRQ forward state returned by
>>>>>>>> the irq_get_fwd_state() API is for the guest only.
>>>>>>>>
>>>>>>>> The above situation does not manifest for the virtual timer because
>>>>>>>> the virtual timer registers are exclusively accessed by the Guest
>>>>>>>> and the virtual timer interrupt is only for the Guest (never used
>>>>>>>> by the Host).
>>>>>>>>
>>>>>>>>> Whenever you schedule the guest VCPU again, you'll (a) disable
>>>>>>>>> interrupts on the CPU, (b) restore the active state of the IRQ for the
>>>>>>>>> guest, (c) restore the guest PMU state, (d) switch to the guest with
>>>>>>>>> IRQs enabled on the CPU (potentially).
>>>>>>>>
>>>>>>>> Here too, while we are between step (a) and step (b) the PMU HW
>>>>>>>> context is for the host and any PMU counter can overflow. Step (b)
>>>>>>>> can actually override the PMU IRQ meant for the Host.
>>>>>>>>
>>>>>>> Can you not simply switch the state from C-code after capturing the IRQ
>>>>>>> state then?  Everything should be accessible from EL1, right?
>>>>>>
>>>>>> Yes, I think that would be the only option. This also means I will
>>>>>> need to re-implement the context switching to do it in C-code.
>>>>>
>>>>> Yes, you'd add some inline assembly in the C-code to access the
>>>>> registers I guess.  The only thing I thought about after writing my
>>>>> original mail is whether you'll be counting events while
>>>>> context-switching and running on the host, which you actually don't
>>>>> want to.  Not sure if there's a better way to avoid that.
>>>>>
>>>>>>
>>>>>> What about scenario1, which I had mentioned?
>>>>>>
>>>>>
>>>>> You have to consider that enabling/disabling forwarding and
>>>>> setting/clearing the active state are part of the guest PMU state,
>>>>> and all of it has to be context-switched.
>>>>
>>>> I found one more issue.
>>>>
>>>> If the PMU irq is a PPI then enabling/disabling forwarding will not
>>>> work, because the irqd_set_irq_forwarded() function takes irq_data
>>>> as argument, which is a member of irq_desc, and irq_desc for PPIs
>>>> is not per_cpu. This means we cannot call irqd_set_irq_forwarded()
>>>> simultaneously from different host CPUs.
>>>>
>>
>> Hi Marc,
>>
>>> I'll let Marc answer this one, and whether this still applies to his
>>> view of how the next version of the forwarding series will look.
>
> I'm looking at it at the moment.
>
> I'm inclined to say that we should fix the forwarding code to allow
> individual PPIs to be forwarded. This is a bit harder than what we're
> doing at the moment, but that's possible.
>
> Of course, that complicates the code a bit, as we have to make sure
> we're not preemptible at that time.
>
> What do you think?

Currently, irqd_set_irq_forwarded() is lockless.

It would be great if we can update irqd_set_irq_forwarded() for PPIs
such that it remains lockless, so that we don't have much overhead
when we enable/disable the forwarding state.
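
To make this concrete, a lockless per-CPU scheme could look like the
sketch below (purely illustrative: irqd_set_irq_forwarded() comes from
Marc's not-yet-posted forwarding series, and the per-CPU bitmap and
helper names here are invented):

#include <linux/bitops.h>
#include <linux/percpu.h>

/*
 * Hypothetical sketch: per-CPU forwarding state for PPIs. Callers run
 * with preemption disabled, so each CPU only ever touches its own
 * copy and no locking is needed.
 */
static DEFINE_PER_CPU(unsigned long, ppi_fwd_state);

static inline void ppi_set_forwarded(unsigned int ppi)
{
        __set_bit(ppi, this_cpu_ptr(&ppi_fwd_state));
}

static inline void ppi_clear_forwarded(unsigned int ppi)
{
        __clear_bit(ppi, this_cpu_ptr(&ppi_fwd_state));
}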

>
> Thanks,
>
>         M.
> --
> Jazz is not dead. It just smells funny...

Regards,
Anup

^ permalink raw reply	[flat|nested] 78+ messages in thread


* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2014-11-27 10:54                                 ` Anup Patel
@ 2014-11-27 11:06                                   ` Marc Zyngier
  -1 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2014-11-27 11:06 UTC (permalink / raw)
  To: Anup Patel
  Cc: Christoffer Dall, kvmarm, linux-arm-kernel, KVM General, patches,
	Will Deacon, Ian.Campbell, Pranavkumar Sawargaonkar

On 27/11/14 10:54, Anup Patel wrote:
> On Thu, Nov 27, 2014 at 4:10 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>> On 27/11/14 10:22, Anup Patel wrote:
>>> On Tue, Nov 25, 2014 at 7:12 PM, Christoffer Dall
>>> <christoffer.dall@linaro.org> wrote:
>>>> On Tue, Nov 25, 2014 at 06:17:03PM +0530, Anup Patel wrote:
>>>>> I found one more issue.
>>>>>
>>>>> If the PMU irq is a PPI then enabling/disabling forwarding will not
>>>>> work, because the irqd_set_irq_forwarded() function takes irq_data
>>>>> as argument, which is a member of irq_desc, and irq_desc for PPIs
>>>>> is not per_cpu. This means we cannot call irqd_set_irq_forwarded()
>>>>> simultaneously from different host CPUs.
>>>>>
>>>
>>> Hi Marc,
>>>
>>>> I'll let Marc answer this one, and whether this still applies to his
>>>> view of how the next version of the forwarding series will look.
>>
>> I'm looking at it at the moment.
>>
>> I'm inclined to say that we should fix the forwarding code to allow
>> individual PPIs to be forwarded. This is a bit harder than what we're
>> doing at the moment, but that's possible.
>>
>> Of course, that complicates the code a bit, as we have to make sure
>> we're not preemptible at that time.
>>
>> What do you think?
> 
> Currently, irqd_set_irq_forwarded() is lockless.
> 
> It would be great if we can update irqd_set_irq_forwarded() for PPIs
> such that it remains lockless, so that we don't have much overhead
> when we enable/disable the forwarding state.

We probably need a separate API anyway, as you want to be able to
provide a cpumask to configure this. We can refine this as we go, and I
wouldn't worry about overhead just yet.
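
To make the shape of such an API concrete, declarations along these
lines would do (entirely hypothetical; neither function exists, this
only illustrates the cpumask-based interface being suggested):

#include <linux/cpumask.h>

/* Hypothetical sketch of a cpumask-aware forwarding API for PPIs. */
int irq_set_fwd_state_percpu(unsigned int irq, bool forwarded,
                             const struct cpumask *mask);
int irq_get_fwd_state_percpu(unsigned int irq, bool *forwarded,
                             const struct cpumask *mask);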

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 78+ messages in thread


* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2014-11-27 11:06                                   ` Marc Zyngier
@ 2014-12-30  5:49                                     ` Anup Patel
  -1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-12-30  5:49 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Christoffer Dall, kvmarm, linux-arm-kernel, KVM General, patches,
	Will Deacon, Ian.Campbell, Pranavkumar Sawargaonkar

(dropping previous conversation for easy reading)

Hi Marc/Christoffer,

I tried implementing the PMU context-switch via C code
in EL1 mode and in atomic context with irqs disabled.
The context switch itself works perfectly fine, but
irq forwarding is not clean for the PMU irq.
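
For reference, the EL1 C context switch amounts to a save/restore of
the PMU system registers via inline assembly, along the lines of the
sketch below (register list abbreviated; the kvm_pmu_regs layout is
invented):

#include <linux/types.h>

/* Hypothetical, abbreviated sketch: a real switch would also cover
 * PMCNTEN, PMINTEN, PMOVS and the event counter/type registers. */
struct kvm_pmu_regs {
        u64 pmcr_el0;
        u64 pmselr_el0;
        u64 pmuserenr_el0;
};

static void pmu_save_regs(struct kvm_pmu_regs *r)
{
        asm volatile("mrs %0, pmcr_el0" : "=r" (r->pmcr_el0));
        asm volatile("mrs %0, pmselr_el0" : "=r" (r->pmselr_el0));
        asm volatile("mrs %0, pmuserenr_el0" : "=r" (r->pmuserenr_el0));
}

static void pmu_restore_regs(struct kvm_pmu_regs *r)
{
        asm volatile("msr pmcr_el0, %0" : : "r" (r->pmcr_el0));
        asm volatile("msr pmselr_el0, %0" : : "r" (r->pmselr_el0));
        asm volatile("msr pmuserenr_el0, %0" : : "r" (r->pmuserenr_el0));
}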

I found another issue: the GIC only samples irq
lines if they are enabled. This means that for using
irq forwarding we will need to ensure that the host PMU
irq is enabled. The arch_timer code does this by
doing request_irq() for the host virtual timer interrupt.
For the PMU, we can either enable/disable the host PMU
irq in the context switch or we need to have a shared
irq handler between the kvm pmu and the host kernel pmu.

I have rethought our discussion so far. I
understand that we need KVM PMU virtualization
to meet the following criteria:
1. No modification in host PMU driver
2. No modification in guest PMU driver
3. No mask/unmask dance for sharing host PMU irq
4. Clean way to avoid infinite VM exits due to
PMU interrupt

I have discovered a new approach, which is as follows:
1. Context switch the PMU in atomic context (i.e. local_irq_disable())
2. Ensure that the host PMU irq is disabled when entering guest
mode and re-enable the host PMU irq when exiting guest mode if
it was enabled previously. This is to avoid infinite VM exits
due to the PMU interrupt because as per the new approach we
don't mask the PMU irq via the PMINTENSET_EL1 register.
3. Inject the virtual PMU irq at the time of entering guest mode if the
PMU overflow register is non-zero (i.e. PMOVSSET_EL0) in atomic
context (i.e. local_irq_disable()).
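
A minimal sketch of how steps 1-3 could hang off the vcpu run loop
(assuming a PPI-based PMU irq; disable_percpu_irq(),
enable_percpu_irq() and kvm_vgic_inject_irq() are existing kernel
functions, while the kvm_pmu fields are invented and the function
names just mirror the kvm_timer_*_hwstate() naming used earlier in
this thread):

#include <linux/interrupt.h>
#include <linux/irq.h>

/* Hypothetical: called with local IRQs disabled, just before
 * kvm_call_hyp(__kvm_vcpu_run, vcpu). */
static void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu)
{
        struct kvm_pmu *pmu = &vcpu->arch.pmu;

        /* Step 2: keep the host PMU irq from firing while the guest
         * PMU context is live, remembering its previous state. */
        if (pmu->host_irq_enabled)
                disable_percpu_irq(pmu->host_irq);

        /* Step 3: if the guest's saved overflow status (PMOVSSET_EL0)
         * is non-zero, raise the virtual PMU irq before guest entry. */
        if (pmu->guest_pmovsset)
                kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
                                    pmu->virt_irq, 1);
}

/* Hypothetical: called with local IRQs still disabled, right after
 * returning from the guest. */
static void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu)
{
        struct kvm_pmu *pmu = &vcpu->arch.pmu;

        /* Step 2, exit side: restore the host PMU irq enable state. */
        if (pmu->host_irq_enabled)
                enable_percpu_irq(pmu->host_irq, IRQ_TYPE_NONE);
}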

The only limitation of this new approach is that the virtual PMU irq
is injected at the time of entering guest mode. This means the guest
will receive the virtual PMU interrupt with a little delay after the
actual interrupt occurred. The PMU interrupts are only overflow events
and generally not used in any timing-critical applications. If we
can live with this limitation then this can be a good approach
for KVM PMU virtualization.

Regards,
Anup

^ permalink raw reply	[flat|nested] 78+ messages in thread


* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2014-12-30  5:49                                     ` Anup Patel
@ 2015-01-08  4:02                                       ` Anup Patel
  -1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2015-01-08  4:02 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Christoffer Dall, kvmarm, linux-arm-kernel, KVM General, patches,
	Will Deacon, Ian.Campbell, Pranavkumar Sawargaonkar

Hi Marc/Christoffer,

Ping??

Regards,
Anup

^ permalink raw reply	[flat|nested] 78+ messages in thread


* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2014-12-30  5:49                                     ` Anup Patel
@ 2015-01-11 19:11                                       ` Christoffer Dall
  -1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2015-01-11 19:11 UTC (permalink / raw)
  To: Anup Patel
  Cc: Marc Zyngier, kvmarm, linux-arm-kernel, KVM General, patches,
	Will Deacon, Ian.Campbell, Pranavkumar Sawargaonkar

On Tue, Dec 30, 2014 at 11:19:13AM +0530, Anup Patel wrote:
> (dropping previous conversation for easy reading)
> 
> Hi Marc/Christoffer,
> 
> I tried implementing the PMU context-switch via C code
> in EL1 mode and in atomic context with irqs disabled.
> The context switch itself works perfectly fine, but
> irq forwarding is not clean for the PMU irq.
> 
> I found another issue: the GIC only samples irq
> lines if they are enabled. This means that for using
> irq forwarding we will need to ensure that the host PMU
> irq is enabled. The arch_timer code does this by
> doing request_irq() for the host virtual timer interrupt.
> For the PMU, we can either enable/disable the host PMU
> irq in the context switch or we need to have a shared
> irq handler between the kvm pmu and the host kernel pmu.

could we simply require the host PMU driver to request the IRQ and have
the driver inject the corresponding IRQ to the VM via a mechanism
similar to VFIO using an eventfd and irqfds etc.?

(I haven't quite thought through if there's a way for the host PMU
driver to distinguish between an IRQ for itself and one for the guest,
though).

It does feel like we will need some sort of communication/coordination
between the host PMU driver and KVM...
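
For illustration, the kernel half of such plumbing might look roughly
like this (a sketch only; eventfd_signal() is real kernel API, but the
helper names are invented, and the ownership test is exactly the part
I haven't thought through):

#include <linux/eventfd.h>
#include <linux/interrupt.h>

/*
 * Hypothetical sketch: if an overflow belongs to the guest, signal an
 * eventfd that userspace has attached to the guest's PMU irq via
 * KVM_IRQFD; otherwise fall through to the normal host perf path.
 */
static irqreturn_t pmu_irq_handler(int irq, void *dev)
{
        if (overflow_belongs_to_guest(dev)) {           /* invented */
                eventfd_signal(pmu_guest_eventfd(dev), 1);
                return IRQ_HANDLED;
        }

        return host_pmu_handle_overflow(irq, dev);      /* invented */
}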

> 
> I have rethought our discussion so far. I
> understand that we need KVM PMU virtualization
> to meet the following criteria:
> 1. No modification in host PMU driver

is this really a strict requirement?  one of the advantages of KVM
should be that the rest of the kernel should be supportive of KVM.

> 2. No modification in guest PMU driver
> 3. No mask/unmask dance for sharing host PMU irq
> 4. Clean way to avoid infinite VM exits due to
> PMU interrupt
> 
> I have discovered a new approach, which is as follows:
> 1. Context switch the PMU in atomic context (i.e. local_irq_disable())
> 2. Ensure that the host PMU irq is disabled when entering guest
> mode and re-enable the host PMU irq when exiting guest mode if
> it was enabled previously.

How does this look software-engineering wise?  Would you be looking
up the IRQ number from the DT in the KVM code again?  How does KVM then
synchronize with the host PMU driver so they're not both requesting the
same IRQ at the same time?

> This is to avoid infinite VM exits
> due to the PMU interrupt because as per the new approach we
> don't mask the PMU irq via the PMINTENSET_EL1 register.
> 3. Inject the virtual PMU irq at the time of entering guest mode if the
> PMU overflow register is non-zero (i.e. PMOVSSET_EL0) in atomic
> context (i.e. local_irq_disable()).
> 
> The only limitation of this new approach is that the virtual PMU irq
> is injected at the time of entering guest mode. This means the guest
> will receive the virtual PMU interrupt with a little delay after the
> actual interrupt occurred.

it may never receive it in the case of a tickless configuration AFAICT,
so this doesn't sound like the right approach.

> The PMU interrupts are only overflow events
> and generally not used in any timing-critical applications. If we
> can live with this limitation then this can be a good approach
> for KVM PMU virtualization.
> 
Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 78+ messages in thread


* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2015-01-11 19:11                                       ` Christoffer Dall
@ 2015-01-12  4:19                                         ` Anup Patel
  -1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2015-01-12  4:19 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Ian.Campbell, KVM General, Marc Zyngier, patches, Will Deacon,
	kvmarm, linux-arm-kernel, Pranavkumar Sawargaonkar

On Mon, Jan 12, 2015 at 12:41 AM, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> On Tue, Dec 30, 2014 at 11:19:13AM +0530, Anup Patel wrote:
>> (dropping previous conversation for easy reading)
>>
>> Hi Marc/Christoffer,
>>
>> I tried implementing the PMU context-switch via C code
>> in EL1 mode and in atomic context with irqs disabled.
>> The context switch itself works perfectly fine, but
>> irq forwarding is not clean for the PMU irq.
>>
>> I found another issue: the GIC only samples irq
>> lines if they are enabled. This means that for using
>> irq forwarding we will need to ensure that the host PMU
>> irq is enabled. The arch_timer code does this by
>> doing request_irq() for the host virtual timer interrupt.
>> For the PMU, we can either enable/disable the host PMU
>> irq in the context switch or we need to have a shared
>> irq handler between the kvm pmu and the host kernel pmu.
>
> could we simply require the host PMU driver to request the IRQ and have
> the driver inject the corresponding IRQ to the VM via a mechanism
> similar to VFIO using an eventfd and irqfds etc.?

Currently, the host PMU driver does request_irq() only when
there is some event to be monitored. This means the host will do
request_irq() only when we run a perf application in host
user space.

Initially, I thought that we could simply pass IRQF_SHARED
for request_irq() in the host PMU driver and do the same for
request_irq() in the KVM PMU code, but the PMU irq can be
an SPI or a PPI. If the PMU irq is an SPI then the IRQF_SHARED
flag would be fine, but if it is a PPI then we have no way to
set the IRQF_SHARED flag because request_percpu_irq()
does not have an irq flags parameter.
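
The asymmetry is visible in the two request interfaces themselves
(signatures as in the kernel at this time; the handler and the irq
numbers are placeholders):

#include <linux/interrupt.h>

static irqreturn_t pmu_handler(int irq, void *dev);

static int pmu_request_example(unsigned int spi_irq, unsigned int ppi_irq,
                               void *dev, void __percpu *percpu_dev)
{
        int ret;

        /* An SPI can be requested as shared: */
        ret = request_irq(spi_irq, pmu_handler, IRQF_SHARED,
                          "arm-pmu", dev);
        if (ret)
                return ret;

        /*
         * request_percpu_irq() takes no flags argument at all, so a
         * PPI cannot be marked IRQF_SHARED:
         */
        return request_percpu_irq(ppi_irq, pmu_handler, "arm-pmu",
                                  percpu_dev);
}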

>
> (I haven't quite thought through if there's a way for the host PMU
> driver to distinguish between an IRQ for itself and one for the guest,
> though).
>
> It does feel like we will need some sort of communication/coordination
> between the host PMU driver and KVM...
>
>>
>> I have rethought our discussion so far. I
>> understand that we need KVM PMU virtualization
>> to meet the following criteria:
>> 1. No modification in host PMU driver
>
> is this really a strict requirement?  one of the advantages of KVM
> should be that the rest of the kernel should be supportive of KVM.

I guess so, because the host PMU driver should not do things
differently for host and guest. I think this is the reason why
we discarded the mask/unmask PMU irq approach which
I had implemented in RFC v1.

>
>> 2. No modification in guest PMU driver
>> 3. No mask/unmask dance for sharing host PMU irq
>> 4. Clean way to avoid infinite VM exits due to
>> PMU interrupt
>>
>> I have discovered a new approach, which is as follows:
>> 1. Context switch the PMU in atomic context (i.e. local_irq_disable())
>> 2. Ensure that the host PMU irq is disabled when entering guest
>> mode and re-enable the host PMU irq when exiting guest mode if
>> it was enabled previously.
>
> How does this look software-engineering wise?  Would you be looking
> up the IRQ number from the DT in the KVM code again?  How does KVM then
> synchronize with the host PMU driver so they're not both requesting the
> same IRQ at the same time?

We only look up the host PMU irq numbers from the DT at HYP init time.

During the context switch we know the host PMU irq number for
the current host CPU, so we can get the state of the host PMU irq
in the context-switch code.

If we go by the shared irq handler approach, then both KVM
and the host PMU driver will do request_irq() on the same host
PMU irq. In other words, there is no virtual PMU irq provided
by the HW for the guest.

>
>> This is to avoid infinite VM exits
>> due to the PMU interrupt because as per the new approach we
>> don't mask the PMU irq via the PMINTENSET_EL1 register.
>> 3. Inject the virtual PMU irq at the time of entering guest mode if the
>> PMU overflow register is non-zero (i.e. PMOVSSET_EL0) in atomic
>> context (i.e. local_irq_disable()).
>>
>> The only limitation of this new approach is that the virtual PMU irq
>> is injected at the time of entering guest mode. This means the guest
>> will receive the virtual PMU interrupt with a little delay after the
>> actual interrupt occurred.
>
> it may never receive it in the case of a tickless configuration AFAICT,
> so this doesn't sound like the right approach.

I think irrespective of the approach we take, we need a mechanism
to have shared irq handlers between the KVM PMU and the host PMU
driver for both PPIs and SPIs.
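
One shape such sharing could take is sketched below (all names are
invented; how to actually register a shared handler for a PPI is
precisely the open problem):

#include <linux/interrupt.h>
#include <linux/percpu.h>

/*
 * Hypothetical shared overflow handler: a per-CPU ownership flag,
 * flipped by KVM in the world-switch path, decides whether the
 * overflow goes to the guest or to host perf.
 */
static DEFINE_PER_CPU(bool, pmu_irq_owned_by_guest);

static irqreturn_t shared_pmu_irq_handler(int irq, void *dev)
{
        if (this_cpu_read(pmu_irq_owned_by_guest))
                return kvm_pmu_handle_overflow(dev);    /* invented */

        return host_pmu_handle_overflow(irq, dev);      /* invented */
}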

>
>> The PMU interrupts are only overflow events
>> and generally not used in any timing-critical applications. If we
>> can live with this limitation then this can be a good approach
>> for KVM PMU virtualization.
>>
> Thanks,
> -Christoffer

Regards,
Anup

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2015-01-11 19:11                                       ` Christoffer Dall
@ 2015-01-14  4:28                                         ` Anup Patel
  -1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2015-01-14  4:28 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Marc Zyngier, kvmarm, linux-arm-kernel, KVM General, patches,
	Will Deacon, Ian.Campbell, Pranavkumar Sawargaonkar

On Mon, Jan 12, 2015 at 12:41 AM, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> On Tue, Dec 30, 2014 at 11:19:13AM +0530, Anup Patel wrote:
>> (dropping previous conversation for easy reading)
>>
>> Hi Marc/Christoffer,
>>
>> I tried implementing PMU context-switch via C code
>> in EL1 mode and in atomic context with irqs disabled.
>> The context switch itself works perfectly fine but
>> irq forwarding is not clean for PMU irq.
>>
>> I found another issue that is GIC only samples irq
>> lines if they are enabled. This means for using
>> irq forwarding we will need to ensure that host PMU
>> irq is enabled.  The arch_timer code does this by
>> doing request_irq() for host virtual timer interrupt.
>> For PMU, we can either enable/disable host PMU
>> irq in context switch or we need to do have shared
>> irq handler between kvm pmu and host kernel pmu.
>
> could we simply require the host PMU driver to request the IRQ and have
> the driver inject the corresponding IRQ to the VM via a mechanism
> similar to VFIO using an eventfd and irqfds etc.?
>
> (I haven't quite thought through if there's a way for the host PMU
> driver to distinguish between an IRQ for itself and one for the guest,
> though).
>
> It does feel like we will need some sort of communication/coordination
> between the host PMU driver and KVM...
>
>>
>> I have rethought our discussion so far. I
>> understand that we need KVM PMU virtualization
>> to meet following criteria:
>> 1. No modification in host PMU driver
>
> is this really a strict requirement?  one of the advantages of KVM
> should be that the rest of the kernel should be supportive of KVM.
>
>> 2. No modification in guest PMU driver
>> 3. No mask/unmask dance for sharing host PMU irq
>> 4. Clean way to avoid infinite VM exits due to
>> PMU interrupt
>>
>> I have discovered new approach which is as follows:
>> 1. Context switch PMU in atomic context (i.e. local_irq_disable())
>> 2. Ensure that host PMU irq is disabled when entering guest
>> mode and re-enable host PMU irq when exiting guest mode if
>> it was enabled previously.
>
> How does this look like software-engineering wise?  Would you be looking
> up the IRQ number from the DT in the KVM code again?  How does KVM then
> synchronize with the host PMU driver so they're not both requesting the
> same IRQ at the same time?
>
>> This is to avoid infinite VM exits
>> due to PMU interrupt because as-per new approach we
>> don't mask the PMU irq via PMINTENSET_EL1 register.
>> 3. Inject virtual PMU irq at time of entering guest mode if PMU
>> overflow register is non-zero (i.e. PMOVSSET_EL0) in atomic
>> context (i.e. local_irq_disable()).
>>
>> The only limitation of this new approach is that virtual PMU irq
>> is injected at time of entering guest mode. This means guest
>> will receive virtual PMU interrupt with little delay after actual
>> interrupt occurred.
>
> it may never receive it in the case of a tickless configuration AFAICT,
> so this doesn't sound like the right approach.

The PMU interrupts are not similar to arch_timer interrupts. In fact,
they are overflow interrupts on event counters. The PMU events of a
Guest VCPU are only counted when that Guest VCPU is running. If the
Guest VCPU is scheduled out, or we are in Host mode, then the PMU
events are counted for the Host or for whichever other Guest is
currently running.

In my view, this does not break a tickless guest.

Also, the above fact applies irrespective of the approach we take
for PMU virtualization.
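For reference, a minimal sketch of the inject-at-entry check from
point 3 above (kvm_vgic_inject_irq() is the real vgic entry point of
that era; the PMOVSSET_EL0 shadow index and the pmu_irq number are
assumptions about where KVM would keep this state):

  /* On guest entry, with interrupts disabled. */
  static void kvm_pmu_update_vcpu_irq(struct kvm_vcpu *vcpu, int pmu_irq)
  {
          /* The guest's saved copy of PMOVSSET_EL0. */
          u64 overflow = vcpu_sys_reg(vcpu, PMOVSSET_EL0);

          /* Any set bit means a counter overflowed for the guest. */
          if (overflow)
                  kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
                                      pmu_irq, true);
  }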

Regards,
Anup

>
>> The PMU interrupts are only overflow events
>> and generally not used in any timing critical applications. If we
>> can live with this limitation then this can be a good approach
>> for KVM PMU virtualization.
>>
> Thanks,
> -Christoffer

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2015-01-12  4:19                                         ` Anup Patel
@ 2015-02-15 15:33                                           ` Christoffer Dall
  -1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2015-02-15 15:33 UTC (permalink / raw)
  To: Anup Patel
  Cc: Marc Zyngier, kvmarm, linux-arm-kernel, KVM General, patches,
	Will Deacon, Ian.Campbell, Pranavkumar Sawargaonkar

Hi Anup,

On Mon, Jan 12, 2015 at 09:49:13AM +0530, Anup Patel wrote:
> On Mon, Jan 12, 2015 at 12:41 AM, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
> > On Tue, Dec 30, 2014 at 11:19:13AM +0530, Anup Patel wrote:
> >> (dropping previous conversation for easy reading)
> >>
> >> Hi Marc/Christoffer,
> >>
> >> I tried implementing PMU context-switch via C code
> >> in EL1 mode and in atomic context with irqs disabled.
> >> The context switch itself works perfectly fine but
> >> irq forwarding is not clean for PMU irq.
> >>
> >> I found another issue that is GIC only samples irq
> >> lines if they are enabled. This means for using
> >> irq forwarding we will need to ensure that host PMU
> >> irq is enabled.  The arch_timer code does this by
> >> doing request_irq() for host virtual timer interrupt.
> >> For PMU, we can either enable/disable host PMU
> >> irq in context switch or we need to do have shared
> >> irq handler between kvm pmu and host kernel pmu.
> >
> > could we simply require the host PMU driver to request the IRQ and have
> > the driver inject the corresponding IRQ to the VM via a mechanism
> > similar to VFIO using an eventfd and irqfds etc.?
> 
> Currently, the host PMU driver does request_irq() only when
> there is some event to be monitored. This means the host will do
> request_irq() only when we run a perf application in host
> user space.
> 
> Initially, I thought that we could simply pass IRQF_SHARED
> to request_irq() in the host PMU driver and do the same for
> request_irq() in the KVM PMU code, but the PMU irq can be
> an SPI or a PPI. If the PMU irq is an SPI then the IRQF_SHARED
> flag would be fine, but if it's a PPI then we have no way to
> set the IRQF_SHARED flag because request_percpu_irq()
> does not take an irq flags parameter.
> 
> >
> > (I haven't quite thought through if there's a way for the host PMU
> > driver to distinguish between an IRQ for itself and one for the guest,
> > though).
> >
> > It does feel like we will need some sort of communication/coordination
> > between the host PMU driver and KVM...
> >
> >>
> >> I have rethought our discussion so far. I
> >> understand that we need KVM PMU virtualization
> >> to meet following criteria:
> >> 1. No modification in host PMU driver
> >
> > is this really a strict requirement?  one of the advantages of KVM
> > should be that the rest of the kernel should be supportive of KVM.
> 
> I guess so, because the host PMU driver should not do things
> differently for host and guest. I think this is the reason why
> we discarded the mask/unmask PMU irq approach which
> I had implemented in RFC v1.
> 
> >
> >> 2. No modification in guest PMU driver
> >> 3. No mask/unmask dance for sharing host PMU irq
> >> 4. Clean way to avoid infinite VM exits due to
> >> PMU interrupt
> >>
> >> I have discovered new approach which is as follows:
> >> 1. Context switch PMU in atomic context (i.e. local_irq_disable())
> >> 2. Ensure that host PMU irq is disabled when entering guest
> >> mode and re-enable host PMU irq when exiting guest mode if
> >> it was enabled previously.
> >
> > How does this look like software-engineering wise?  Would you be looking
> > up the IRQ number from the DT in the KVM code again?  How does KVM then
> > synchronize with the host PMU driver so they're not both requesting the
> > same IRQ at the same time?
> 
> We only look up host PMU irq numbers from DT at HYP init time.
> 
> During context switch we know the host PMU irq number for the
> current host CPU, so we can get the state of the host PMU irq in
> the context switch code.
> 
> If we go by the shared irq handler approach then both KVM
> and the host PMU driver will do request_irq() on the same host
> PMU irq. In other words, there is no virtual PMU irq provided
> by HW for the guest.
> 

Sorry for the *really* long delay in this response.

We had a chat about this subject with Will Deacon and Marc Zyngier
during connect, and basically we came to think of a number of problems
with the current approach:

1. As you pointed out, there is a need for a shared IRQ handler, and
   there is no immediately nice way to implement this without a more
   sophisticated perf/kvm interface, probably comprising eventfds or
   something similar.

2. Hijacking the counters for the VM without perf knowing about it
   basically makes it impossible to do system-wide event counting, an
   important use case for a virtualization host.

So the approach we will be taking now is the following:

First, implement a strict trap-and-emulate-in-software approach.  This
would allow any software relying on access to performance counters to
work, although potentially with slightly imprecise values.  This is the
approach taken by x86 and would be significantly simpler to support on
systems like big.LITTLE as well.
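A minimal sketch of what such a trap handler could look like, in the
style of arch/arm64/kvm/sys_regs.c of the time (access_pmu_reg() is
hypothetical, and a real implementation would back the counters with
host perf events rather than a plain shadow copy):

  static bool access_pmu_reg(struct kvm_vcpu *vcpu,
                             const struct sys_reg_params *p,
                             const struct sys_reg_desc *r)
  {
          /* Satisfy the trapped access from the vcpu's shadow copy. */
          if (p->is_write)
                  vcpu_sys_reg(vcpu, r->reg) = *vcpu_reg(vcpu, p->Rt);
          else
                  *vcpu_reg(vcpu, p->Rt) = vcpu_sys_reg(vcpu, r->reg);

          return true;    /* handled; the caller skips the instruction */
  }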

Second, if there are values obtained from within the guest that are so
skewed by the trap-and-emulate approach that we need to give the guest
access to counters, we should try to share the hardware by partitioning
the physical counters, but again, we need to coordinate with the host
perf system for this.  We would only be pursuing this approach if
absolutely necessary.
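For illustration only, the partitioning could be as crude as splitting
the N counters (N = PMCR_EL0.N, bits [15:11]) into two fixed masks;
everything below is hypothetical, and negotiating the split with the
host perf core is the hard part this paragraph alludes to:

  #include <linux/bitops.h>

  #define ARMV8_PMCR_N_SHIFT      11
  #define ARMV8_PMCR_N_MASK       0x1f

  /* Assumes 0 < nr_guest < N. */
  static void partition_counters(u64 pmcr, int nr_guest,
                                 u32 *host_mask, u32 *guest_mask)
  {
          int n = (pmcr >> ARMV8_PMCR_N_SHIFT) & ARMV8_PMCR_N_MASK;

          /* Low counters go to the guest, the rest stay with the host. */
          *guest_mask = GENMASK(nr_guest - 1, 0);
          *host_mask  = GENMASK(n - 1, nr_guest);
  }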

Apologies for the change in direction on this.

What are your thoughts?  Do you still have time/interest to work
on any of this?

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2015-02-15 15:33                                           ` Christoffer Dall
@ 2015-02-16 12:16                                             ` Anup Patel
  -1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2015-02-16 12:16 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Marc Zyngier, kvmarm, linux-arm-kernel, KVM General, patches,
	Will Deacon, Ian.Campbell, Pranavkumar Sawargaonkar

Hi Christoffer,

On Sun, Feb 15, 2015 at 9:03 PM, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> Hi Anup,
>
> On Mon, Jan 12, 2015 at 09:49:13AM +0530, Anup Patel wrote:
>> On Mon, Jan 12, 2015 at 12:41 AM, Christoffer Dall
>> <christoffer.dall@linaro.org> wrote:
>> > On Tue, Dec 30, 2014 at 11:19:13AM +0530, Anup Patel wrote:
>> >> (dropping previous conversation for easy reading)
>> >>
>> >> Hi Marc/Christoffer,
>> >>
>> >> I tried implementing PMU context-switch via C code
>> >> in EL1 mode and in atomic context with irqs disabled.
>> >> The context switch itself works perfectly fine but
>> >> irq forwarding is not clean for PMU irq.
>> >>
>> >> I found another issue that is GIC only samples irq
>> >> lines if they are enabled. This means for using
>> >> irq forwarding we will need to ensure that host PMU
>> >> irq is enabled.  The arch_timer code does this by
>> >> doing request_irq() for host virtual timer interrupt.
>> >> For PMU, we can either enable/disable host PMU
>> >> irq in context switch or we need to do have shared
>> >> irq handler between kvm pmu and host kernel pmu.
>> >
>> > could we simply require the host PMU driver to request the IRQ and have
>> > the driver inject the corresponding IRQ to the VM via a mechanism
>> > similar to VFIO using an eventfd and irqfds etc.?
>>
>> Currently, the host PMU driver does request_irq() only when
>> there is some event to be monitored. This means the host will do
>> request_irq() only when we run a perf application in host
>> user space.
>>
>> Initially, I thought that we could simply pass IRQF_SHARED
>> to request_irq() in the host PMU driver and do the same for
>> request_irq() in the KVM PMU code, but the PMU irq can be
>> an SPI or a PPI. If the PMU irq is an SPI then the IRQF_SHARED
>> flag would be fine, but if it's a PPI then we have no way to
>> set the IRQF_SHARED flag because request_percpu_irq()
>> does not take an irq flags parameter.
>>
>> >
>> > (I haven't quite thought through if there's a way for the host PMU
>> > driver to distinguish between an IRQ for itself and one for the guest,
>> > though).
>> >
>> > It does feel like we will need some sort of communication/coordination
>> > between the host PMU driver and KVM...
>> >
>> >>
>> >> I have rethought our discussion so far. I
>> >> understand that we need KVM PMU virtualization
>> >> to meet following criteria:
>> >> 1. No modification in host PMU driver
>> >
>> > is this really a strict requirement?  one of the advantages of KVM
>> > should be that the rest of the kernel should be supportive of KVM.
>>
>> I guess so, because the host PMU driver should not do things
>> differently for host and guest. I think this is the reason why
>> we discarded the mask/unmask PMU irq approach which
>> I had implemented in RFC v1.
>>
>> >
>> >> 2. No modification in guest PMU driver
>> >> 3. No mask/unmask dance for sharing host PMU irq
>> >> 4. Clean way to avoid infinite VM exits due to
>> >> PMU interrupt
>> >>
>> >> I have discovered new approach which is as follows:
>> >> 1. Context switch PMU in atomic context (i.e. local_irq_disable())
>> >> 2. Ensure that host PMU irq is disabled when entering guest
>> >> mode and re-enable host PMU irq when exiting guest mode if
>> >> it was enabled previously.
>> >
>> > How does this look like software-engineering wise?  Would you be looking
>> > up the IRQ number from the DT in the KVM code again?  How does KVM then
>> > synchronize with the host PMU driver so they're not both requesting the
>> > same IRQ at the same time?
>>
>> We only look up host PMU irq numbers from DT at HYP init time.
>>
>> During context switch we know the host PMU irq number for the
>> current host CPU, so we can get the state of the host PMU irq in
>> the context switch code.
>>
>> If we go by the shared irq handler approach then both KVM
>> and the host PMU driver will do request_irq() on the same host
>> PMU irq. In other words, there is no virtual PMU irq provided
>> by HW for the guest.
>>
>
> Sorry for the *really* long delay in this response.
>
> We had a chat about this subject with Will Deacon and Marc Zyngier
> during connect, and basically we came to think of a number of problems
> with the current approach:
>
> 1. As you pointed out, there is a need for a shared IRQ handler, and
>    there is no immediately nice way to implement this without a more
>    sophisticated perf/kvm interface, probably comprising eventfds or
>    something similar.
>
> 2. Hijacking the counters for the VM without perf knowing about it
>    basically makes it impossible to do system-wide event counting, an
>    important use case for a virtualization host.
>
> So the approach we will be taking now is the following:
>
> First, implement a strict trap-and-emulate-in-software approach.  This
> would allow any software relying on access to performance counters to
> work, although potentially with slightly imprecise values.  This is the
> approach taken by x86 and would be significantly simpler to support on
> systems like big.LITTLE as well.

Actually, trap-and-emulate would also help avoid additions to the
KVM world switch.

>
> Second, if there are values obtained from within the guest that are so
> skewed by the trap-and-emulate approach that we need to give the guest
> access to counters, we should try to share the hardware by partitioning
> the physical counters, but again, we need to coordinate with the host
> perf system for this.  We would only be pursuing this approach if
> absolutely necessary.

Yes, with trap-and-emulate we cannot accurately emulate all types
of hw counters (particularly cache misses and similar events).

>
> Apologies for the change in direction on this.
>
> What are your thoughts?  Do you still have time/interest to work
> on any of this?

It's a drastic change in direction.

Currently, I have taken up some different work (not related to KVM),
so for the next few months I won't be able to spend time on this.

It's better if Linaro takes up this work to avoid further delays.

Best Regards,
Anup

>
> Thanks,
> -Christoffer

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
  2015-02-16 12:16                                             ` Anup Patel
@ 2015-02-16 12:23                                               ` Christoffer Dall
  -1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2015-02-16 12:23 UTC (permalink / raw)
  To: Anup Patel
  Cc: Marc Zyngier, kvmarm, linux-arm-kernel, KVM General, patches,
	Will Deacon, Ian.Campbell, Pranavkumar Sawargaonkar

On Mon, Feb 16, 2015 at 05:46:54PM +0530, Anup Patel wrote:
> Hi Christoffer,
> 
> On Sun, Feb 15, 2015 at 9:03 PM, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
> > Hi Anup,
> >
> > On Mon, Jan 12, 2015 at 09:49:13AM +0530, Anup Patel wrote:
> >> On Mon, Jan 12, 2015 at 12:41 AM, Christoffer Dall
> >> <christoffer.dall@linaro.org> wrote:
> >> > On Tue, Dec 30, 2014 at 11:19:13AM +0530, Anup Patel wrote:
> >> >> (dropping previous conversation for easy reading)
> >> >>
> >> >> Hi Marc/Christoffer,
> >> >>
> >> >> I tried implementing PMU context-switch via C code
> >> >> in EL1 mode and in atomic context with irqs disabled.
> >> >> The context switch itself works perfectly fine but
> >> >> irq forwarding is not clean for PMU irq.
> >> >>
> >> >> I found another issue that is GIC only samples irq
> >> >> lines if they are enabled. This means for using
> >> >> irq forwarding we will need to ensure that host PMU
> >> >> irq is enabled.  The arch_timer code does this by
> >> >> doing request_irq() for host virtual timer interrupt.
> >> >> For PMU, we can either enable/disable host PMU
> >> >> irq in context switch or we need to do have shared
> >> >> irq handler between kvm pmu and host kernel pmu.
> >> >
> >> > could we simply require the host PMU driver to request the IRQ and have
> >> > the driver inject the corresponding IRQ to the VM via a mechanism
> >> > similar to VFIO using an eventfd and irqfds etc.?
> >>
> >> Currently, the host PMU driver does request_irq() only when
> >> there is some event to be monitored. This means the host will do
> >> request_irq() only when we run a perf application in host
> >> user space.
> >>
> >> Initially, I thought that we could simply pass IRQF_SHARED
> >> to request_irq() in the host PMU driver and do the same for
> >> request_irq() in the KVM PMU code, but the PMU irq can be
> >> an SPI or a PPI. If the PMU irq is an SPI then the IRQF_SHARED
> >> flag would be fine, but if it's a PPI then we have no way to
> >> set the IRQF_SHARED flag because request_percpu_irq()
> >> does not take an irq flags parameter.
> >>
> >> >
> >> > (I haven't quite thought through if there's a way for the host PMU
> >> > driver to distinguish between an IRQ for itself and one for the guest,
> >> > though).
> >> >
> >> > It does feel like we will need some sort of communication/coordination
> >> > between the host PMU driver and KVM...
> >> >
> >> >>
> >> >> I have rethought our discussion so far. I
> >> >> understand that we need KVM PMU virtualization
> >> >> to meet following criteria:
> >> >> 1. No modification in host PMU driver
> >> >
> >> > is this really a strict requirement?  one of the advantages of KVM
> >> > should be that the rest of the kernel should be supportive of KVM.
> >>
> >> I guess so, because the host PMU driver should not do things
> >> differently for host and guest. I think this is the reason why
> >> we discarded the mask/unmask PMU irq approach which
> >> I had implemented in RFC v1.
> >>
> >> >
> >> >> 2. No modification in guest PMU driver
> >> >> 3. No mask/unmask dance for sharing host PMU irq
> >> >> 4. Clean way to avoid infinite VM exits due to
> >> >> PMU interrupt
> >> >>
> >> >> I have discovered new approach which is as follows:
> >> >> 1. Context switch PMU in atomic context (i.e. local_irq_disable())
> >> >> 2. Ensure that host PMU irq is disabled when entering guest
> >> >> mode and re-enable host PMU irq when exiting guest mode if
> >> >> it was enabled previously.
> >> >
> >> > How does this look like software-engineering wise?  Would you be looking
> >> > up the IRQ number from the DT in the KVM code again?  How does KVM then
> >> > synchronize with the host PMU driver so they're not both requesting the
> >> > same IRQ at the same time?
> >>
> >> We only look up host PMU irq numbers from DT at HYP init time.
> >>
> >> During context switch we know the host PMU irq number for the
> >> current host CPU, so we can get the state of the host PMU irq in
> >> the context switch code.
> >>
> >> If we go by the shared irq handler approach then both KVM
> >> and the host PMU driver will do request_irq() on the same host
> >> PMU irq. In other words, there is no virtual PMU irq provided
> >> by HW for the guest.
> >>
> >
> > Sorry for the *really* long delay in this response.
> >
> > We had a chat about this subject with Will Deacon and Marc Zyngier
> > during connect, and basically we came to think of a number of problems
> > with the current approach:
> >
> > 1. As you pointed out, there is a need for a shared IRQ handler, and
> >    there is no immediately nice way to implement this without a more
> >    sophisticated perf/kvm interface, probably comprising eventfds or
> >    something similar.
> >
> > 2. Hijacking the counters for the VM without perf knowing about it
> >    basically makes it impossible to do system-wide event counting, an
> >    important use case for a virtualization host.
> >
> > So the approach we will be taking now is the following:
> >
> > First, implement a strict trap-and-emulate-in-software approach.  This
> > would allow any software relying on access to performance counters to
> > work, although potentially with slightly imprecise values.  This is the
> > approach taken by x86 and would be significantly simpler to support on
> > systems like big.LITTLE as well.
> 
> Actually, trap-and-emulate would also help avoid additions to the
> KVM world switch.
> 
> >
> > Second, if there are values obtained from within the guest that are so
> > skewed by the trap-and-emulate approach that we need to give the guest
> > access to counters, we should try to share the hardware by partitioning
> > the physical counters, but again, we need to coordinate with the host
> > perf system for this.  We would only be pursuing this approach if
> > absolutely necessary.
> 
> Yes, with trap-and-emulate we cannot accurately emulate all types
> of hw counters (particularly cache misses and similar events).
> 
> >
> > Apologies for the change in direction on this.
> >
> > What are your thoughts?  Do you still have time/interest to work
> > on any of this?
> 
> It's a drastic change in direction.
> 
> Currently, I have taken up some different work (not related to KVM),
> so for the next few months I won't be able to spend time on this.
> 
> It's better if Linaro takes up this work to avoid further delays.
> 
ok, will do, thanks for being responsive and putting in the effort so
far!

Best,
-Christoffer

^ permalink raw reply	[flat|nested] 78+ messages in thread

end of thread

Thread overview: 78+ messages
2014-08-05  9:24 [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support Anup Patel
2014-08-05  9:24 ` [RFC PATCH 1/6] ARM64: Move PMU register related defines to asm/pmu.h Anup Patel
2014-08-05  9:24 ` [RFC PATCH 2/6] ARM64: perf: Re-enable overflow interrupt from interrupt handler Anup Patel
2014-08-06 14:24   ` Will Deacon
2014-08-07  9:03     ` Anup Patel
2014-08-07  9:06       ` Will Deacon
2014-08-05  9:24 ` [RFC PATCH 3/6] ARM: " Anup Patel
2014-08-05  9:24 ` [RFC PATCH 4/6] ARM/ARM64: KVM: Add common code PMU IRQ routing Anup Patel
2014-08-05  9:24 ` [RFC PATCH 5/6] ARM64: KVM: Implement full context switch of PMU registers Anup Patel
2014-08-05  9:24 ` [RFC PATCH 6/6] ARM64: KVM: Upgrade to lazy " Anup Patel
2014-08-05  9:32 ` [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support Anup Patel
2014-08-05  9:35   ` Anup Patel
2014-11-07 20:23 ` Christoffer Dall
2014-11-07 20:25 ` Christoffer Dall
2014-11-08  9:36   ` Anup Patel
2014-11-08 12:39     ` Christoffer Dall
2014-11-11  9:18       ` Anup Patel
2014-11-18  3:24         ` Anup Patel
2014-11-19 15:29         ` Christoffer Dall
2014-11-20 14:47           ` Anup Patel
2014-11-21  9:59             ` Christoffer Dall
2014-11-21 10:36               ` Anup Patel
2014-11-21 11:49                 ` Christoffer Dall
2014-11-24  8:44                   ` Anup Patel
2014-11-24 14:37                     ` Christoffer Dall
2014-11-25 12:47                       ` Anup Patel
2014-11-25 13:42                         ` Christoffer Dall
2014-11-27 10:22                           ` Anup Patel
2014-11-27 10:40                             ` Marc Zyngier
2014-11-27 10:54                               ` Anup Patel
2014-11-27 11:06                                 ` Marc Zyngier
2014-12-30  5:49                                   ` Anup Patel
2015-01-08  4:02                                     ` Anup Patel
2015-01-11 19:11                                     ` Christoffer Dall
2015-01-12  4:19                                       ` Anup Patel
2015-02-15 15:33                                         ` Christoffer Dall
2015-02-16 12:16                                           ` Anup Patel
2015-02-16 12:23                                             ` Christoffer Dall
2015-01-14  4:28                                       ` Anup Patel