* [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
@ 2014-08-05 9:24 ` Anup Patel
0 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-08-05 9:24 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, kvm, patches, marc.zyngier, christoffer.dall,
will.deacon, ian.campbell, pranavkumar, Anup Patel
This patchset enables PMU virtualization in KVM ARM64. The
Guest can now directly use the PMU available on the host hardware.

The virtual PMU IRQ injection for Guest VCPUs is managed by a
small piece of code shared between KVM ARM and KVM ARM64. The
virtual PMU IRQ number is based on the Guest machine model, and
user space provides it using the set device address vm ioctl.

The second-to-last patch of this series implements a full context
switch of PMU registers, which context switches all PMU
registers on every KVM world-switch.

The last patch implements a lazy context switch of PMU registers,
which is very similar to the lazy debug context switch.
(Refer to http://lists.infradead.org/pipermail/linux-arm-kernel/2014-July/271040.html)

Also, we reserve the last PMU event counter for EL2 mode; it
will not be accessible from Host or Guest EL1 mode. This
reserved EL2 mode PMU event counter can be used for profiling
KVM world-switch and other EL2 mode functions.

All testing has been done using KVMTOOL on X-Gene Mustang and
the Foundation v8 Model, for both AArch32 and AArch64 guests.
Anup Patel (6):
ARM64: Move PMU register related defines to asm/pmu.h
ARM64: perf: Re-enable overflow interrupt from interrupt handler
ARM: perf: Re-enable overflow interrupt from interrupt handler
ARM/ARM64: KVM: Add common code PMU IRQ routing
ARM64: KVM: Implement full context switch of PMU registers
ARM64: KVM: Upgrade to lazy context switch of PMU registers
arch/arm/include/asm/kvm_host.h | 9 +
arch/arm/include/uapi/asm/kvm.h | 1 +
arch/arm/kernel/perf_event_v7.c | 8 +
arch/arm/kvm/arm.c | 6 +
arch/arm/kvm/reset.c | 4 +
arch/arm64/include/asm/kvm_asm.h | 39 +++-
arch/arm64/include/asm/kvm_host.h | 12 ++
arch/arm64/include/asm/pmu.h | 44 +++++
arch/arm64/include/uapi/asm/kvm.h | 1 +
arch/arm64/kernel/asm-offsets.c | 2 +
arch/arm64/kernel/perf_event.c | 40 +---
arch/arm64/kvm/Kconfig | 7 +
arch/arm64/kvm/Makefile | 1 +
arch/arm64/kvm/hyp-init.S | 15 ++
arch/arm64/kvm/hyp.S | 209 +++++++++++++++++++-
arch/arm64/kvm/reset.c | 4 +
arch/arm64/kvm/sys_regs.c | 385 +++++++++++++++++++++++++++++++++----
include/kvm/arm_pmu.h | 52 +++++
virt/kvm/arm/pmu.c | 105 ++++++++++
19 files changed, 870 insertions(+), 74 deletions(-)
create mode 100644 include/kvm/arm_pmu.h
create mode 100644 virt/kvm/arm/pmu.c
--
1.7.9.5
* [RFC PATCH 1/6] ARM64: Move PMU register related defines to asm/pmu.h
2014-08-05 9:24 ` Anup Patel
@ 2014-08-05 9:24 ` Anup Patel
1 sibling, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-08-05 9:24 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, kvm, patches, marc.zyngier, christoffer.dall,
will.deacon, ian.campbell, pranavkumar, Anup Patel
To use the ARMv8 PMU related register defines from the KVM code,
we move the relevant definitions to the asm/pmu.h include file.

We also guard the C-only declarations with #ifndef __ASSEMBLY__ so
that asm/pmu.h can be included from assembly code.
Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
---
arch/arm64/include/asm/pmu.h | 44 ++++++++++++++++++++++++++++++++++++++++
arch/arm64/kernel/perf_event.c | 32 -----------------------------
2 files changed, 44 insertions(+), 32 deletions(-)
diff --git a/arch/arm64/include/asm/pmu.h b/arch/arm64/include/asm/pmu.h
index e6f0878..f49cc72 100644
--- a/arch/arm64/include/asm/pmu.h
+++ b/arch/arm64/include/asm/pmu.h
@@ -19,6 +19,49 @@
#ifndef __ASM_PMU_H
#define __ASM_PMU_H
+/*
+ * Per-CPU PMCR: config reg
+ */
+#define ARMV8_PMCR_E (1 << 0) /* Enable all counters */
+#define ARMV8_PMCR_P (1 << 1) /* Reset all counters */
+#define ARMV8_PMCR_C (1 << 2) /* Cycle counter reset */
+#define ARMV8_PMCR_D (1 << 3) /* CCNT counts every 64th cpu cycle */
+#define ARMV8_PMCR_X (1 << 4) /* Export to ETM */
+#define ARMV8_PMCR_DP (1 << 5) /* Disable CCNT if non-invasive debug*/
+#define ARMV8_PMCR_N_SHIFT 11 /* Number of counters supported */
+#define ARMV8_PMCR_N_MASK 0x1f
+#define ARMV8_PMCR_MASK 0x3f /* Mask for writable bits */
+
+/*
+ * PMCNTEN: counters enable reg
+ */
+#define ARMV8_CNTEN_MASK 0xffffffff /* Mask for writable bits */
+
+/*
+ * PMINTEN: counters interrupt enable reg
+ */
+#define ARMV8_INTEN_MASK 0xffffffff /* Mask for writable bits */
+
+/*
+ * PMOVSR: counters overflow flag status reg
+ */
+#define ARMV8_OVSR_MASK 0xffffffff /* Mask for writable bits */
+#define ARMV8_OVERFLOWED_MASK ARMV8_OVSR_MASK
+
+/*
+ * PMXEVTYPER: Event selection reg
+ */
+#define ARMV8_EVTYPE_MASK 0xc80003ff /* Mask for writable bits */
+#define ARMV8_EVTYPE_EVENT 0x3ff /* Mask for EVENT bits */
+
+/*
+ * Event filters for PMUv3
+ */
+#define ARMV8_EXCLUDE_EL1 (1 << 31)
+#define ARMV8_EXCLUDE_EL0 (1 << 30)
+#define ARMV8_INCLUDE_EL2 (1 << 27)
+
+#ifndef __ASSEMBLY__
#ifdef CONFIG_HW_PERF_EVENTS
/* The events for a given PMU register set. */
@@ -79,4 +122,5 @@ int armpmu_event_set_period(struct perf_event *event,
int idx);
#endif /* CONFIG_HW_PERF_EVENTS */
+#endif /* __ASSEMBLY__ */
#endif /* __ASM_PMU_H */
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index baf5afb..47dfb8b 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -810,38 +810,6 @@ static const unsigned armv8_pmuv3_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
#define ARMV8_IDX_TO_COUNTER(x) \
(((x) - ARMV8_IDX_COUNTER0) & ARMV8_COUNTER_MASK)
-/*
- * Per-CPU PMCR: config reg
- */
-#define ARMV8_PMCR_E (1 << 0) /* Enable all counters */
-#define ARMV8_PMCR_P (1 << 1) /* Reset all counters */
-#define ARMV8_PMCR_C (1 << 2) /* Cycle counter reset */
-#define ARMV8_PMCR_D (1 << 3) /* CCNT counts every 64th cpu cycle */
-#define ARMV8_PMCR_X (1 << 4) /* Export to ETM */
-#define ARMV8_PMCR_DP (1 << 5) /* Disable CCNT if non-invasive debug*/
-#define ARMV8_PMCR_N_SHIFT 11 /* Number of counters supported */
-#define ARMV8_PMCR_N_MASK 0x1f
-#define ARMV8_PMCR_MASK 0x3f /* Mask for writable bits */
-
-/*
- * PMOVSR: counters overflow flag status reg
- */
-#define ARMV8_OVSR_MASK 0xffffffff /* Mask for writable bits */
-#define ARMV8_OVERFLOWED_MASK ARMV8_OVSR_MASK
-
-/*
- * PMXEVTYPER: Event selection reg
- */
-#define ARMV8_EVTYPE_MASK 0xc80003ff /* Mask for writable bits */
-#define ARMV8_EVTYPE_EVENT 0x3ff /* Mask for EVENT bits */
-
-/*
- * Event filters for PMUv3
- */
-#define ARMV8_EXCLUDE_EL1 (1 << 31)
-#define ARMV8_EXCLUDE_EL0 (1 << 30)
-#define ARMV8_INCLUDE_EL2 (1 << 27)
-
static inline u32 armv8pmu_pmcr_read(void)
{
u32 val;
--
1.7.9.5
* [RFC PATCH 2/6] ARM64: perf: Re-enable overflow interrupt from interrupt handler
2014-08-05 9:24 ` Anup Patel
@ 2014-08-05 9:24 ` Anup Patel
1 sibling, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-08-05 9:24 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, kvm, patches, marc.zyngier, christoffer.dall,
will.deacon, ian.campbell, pranavkumar, Anup Patel
A hypervisor will typically mask the overflow interrupt before
forwarding it to Guest Linux, hence we need to re-enable the
overflow interrupt after clearing it in Guest Linux. This
re-enabling of the overflow interrupt is harmless in
non-virtualized scenarios.
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Anup Patel <anup.patel@linaro.org>
---
arch/arm64/kernel/perf_event.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 47dfb8b..19fb140 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -1076,6 +1076,14 @@ static irqreturn_t armv8pmu_handle_irq(int irq_num, void *dev)
if (!armv8pmu_counter_has_overflowed(pmovsr, idx))
continue;
+ /*
+ * If we are running under a hypervisor such as KVM then
+ * hypervisor will mask the interrupt before forwarding
+ * it to Guest Linux hence re-enable interrupt for the
+ * overflowed counter.
+ */
+ armv8pmu_enable_intens(idx);
+
hwc = &event->hw;
armpmu_event_update(event, hwc, idx);
perf_sample_data_init(&data, 0, hwc->last_period);
--
1.7.9.5
* [RFC PATCH 3/6] ARM: perf: Re-enable overflow interrupt from interrupt handler
2014-08-05 9:24 ` Anup Patel
@ 2014-08-05 9:24 ` Anup Patel
1 sibling, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-08-05 9:24 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, kvm, patches, marc.zyngier, christoffer.dall,
will.deacon, ian.campbell, pranavkumar, Anup Patel
A hypervisor will typically mask the overflow interrupt before
forwarding it to Guest Linux, hence we need to re-enable the
overflow interrupt after clearing it in Guest Linux. This
re-enabling of the overflow interrupt is harmless in
non-virtualized scenarios.
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Anup Patel <anup.patel@linaro.org>
---
arch/arm/kernel/perf_event_v7.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index 1d37568..581cca5 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -1355,6 +1355,14 @@ static irqreturn_t armv7pmu_handle_irq(int irq_num, void *dev)
if (!armv7_pmnc_counter_has_overflowed(pmnc, idx))
continue;
+ /*
+ * If we are running under a hypervisor such as KVM then
+ * hypervisor will mask the interrupt before forwarding
+ * it to Guest Linux hence re-enable interrupt for the
+ * overflowed counter.
+ */
+ armv7_pmnc_enable_intens(idx);
+
hwc = &event->hw;
armpmu_event_update(event);
perf_sample_data_init(&data, 0, hwc->last_period);
--
1.7.9.5
* [RFC PATCH 4/6] ARM/ARM64: KVM: Add common code PMU IRQ routing
2014-08-05 9:24 ` Anup Patel
@ 2014-08-05 9:24 ` Anup Patel
1 sibling, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-08-05 9:24 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, kvm, patches, marc.zyngier, christoffer.dall,
will.deacon, ian.campbell, pranavkumar, Anup Patel
This patch introduces common PMU IRQ routing code for
KVM ARM and KVM ARM64 under the virt/kvm/arm directory.

The virtual PMU IRQ number for each Guest VCPU is
provided by user space using the set device address vm ioctl
with parameters:

dev_id = KVM_ARM_DEVICE_PMU
type   = VCPU number
addr   = PMU IRQ number for the VCPU

The low-level context switching code of KVM ARM/ARM64
determines the state of the VCPU PMU IRQ and stores it in
the "irq_pending" flag when saving PMU context for the VCPU.
The common PMU IRQ routing code injects the virtual PMU
IRQ based on the "irq_pending" flag and then clears
the flag.
Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
---
arch/arm/include/asm/kvm_host.h | 9 ++++
arch/arm/include/uapi/asm/kvm.h | 1 +
arch/arm/kvm/arm.c | 6 +++
arch/arm/kvm/reset.c | 4 ++
arch/arm64/include/asm/kvm_host.h | 9 ++++
arch/arm64/include/uapi/asm/kvm.h | 1 +
arch/arm64/kvm/Kconfig | 7 +++
arch/arm64/kvm/Makefile | 1 +
arch/arm64/kvm/reset.c | 4 ++
include/kvm/arm_pmu.h | 52 ++++++++++++++++++
virt/kvm/arm/pmu.c | 105 +++++++++++++++++++++++++++++++++++++
11 files changed, 199 insertions(+)
create mode 100644 include/kvm/arm_pmu.h
create mode 100644 virt/kvm/arm/pmu.c
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 193ceaf..a6a778f 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -24,6 +24,7 @@
#include <asm/kvm_mmio.h>
#include <asm/fpstate.h>
#include <kvm/arm_arch_timer.h>
+#include <kvm/arm_pmu.h>
#if defined(CONFIG_KVM_ARM_MAX_VCPUS)
#define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
@@ -53,6 +54,9 @@ struct kvm_arch {
/* Timer */
struct arch_timer_kvm timer;
+ /* PMU */
+ struct pmu_kvm pmu;
+
/*
* Anything that is not used directly from assembly code goes
* here.
@@ -118,8 +122,13 @@ struct kvm_vcpu_arch {
/* VGIC state */
struct vgic_cpu vgic_cpu;
+
+ /* Timer state */
struct arch_timer_cpu timer_cpu;
+ /* PMU state */
+ struct pmu_cpu pmu_cpu;
+
/*
* Anything that is not used directly from assembly code goes
* here.
diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index e6ebdd3..b21e6eb 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -75,6 +75,7 @@ struct kvm_regs {
/* Supported device IDs */
#define KVM_ARM_DEVICE_VGIC_V2 0
+#define KVM_ARM_DEVICE_PMU 1
/* Supported VGIC address types */
#define KVM_VGIC_V2_ADDR_TYPE_DIST 0
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 3c82b37..04130f5 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -140,6 +140,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
kvm_timer_init(kvm);
+ kvm_pmu_init(kvm);
+
/* Mark the initial VMID generation invalid */
kvm->arch.vmid_gen = 0;
@@ -567,6 +569,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
local_irq_enable();
kvm_timer_sync_hwstate(vcpu);
+ kvm_pmu_sync_hwstate(vcpu);
kvm_vgic_sync_hwstate(vcpu);
continue;
}
@@ -601,6 +604,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
*************************************************************/
kvm_timer_sync_hwstate(vcpu);
+ kvm_pmu_sync_hwstate(vcpu);
kvm_vgic_sync_hwstate(vcpu);
ret = handle_exit(vcpu, run, ret);
@@ -794,6 +798,8 @@ static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
if (!vgic_present)
return -ENXIO;
return kvm_vgic_addr(kvm, type, &dev_addr->addr, true);
+ case KVM_ARM_DEVICE_PMU:
+ return kvm_pmu_addr(kvm, type, &dev_addr->addr, true);
default:
return -ENODEV;
}
diff --git a/arch/arm/kvm/reset.c b/arch/arm/kvm/reset.c
index f558c07..42e6996 100644
--- a/arch/arm/kvm/reset.c
+++ b/arch/arm/kvm/reset.c
@@ -28,6 +28,7 @@
#include <asm/kvm_coproc.h>
#include <kvm/arm_arch_timer.h>
+#include <kvm/arm_pmu.h>
/******************************************************************************
* Cortex-A15 and Cortex-A7 Reset Values
@@ -79,5 +80,8 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
/* Reset arch_timer context */
kvm_timer_vcpu_reset(vcpu, cpu_vtimer_irq);
+ /* Reset pmu context */
+ kvm_pmu_vcpu_reset(vcpu);
+
return 0;
}
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 7592ddf..ae4cdb2 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -38,6 +38,7 @@
#include <kvm/arm_vgic.h>
#include <kvm/arm_arch_timer.h>
+#include <kvm/arm_pmu.h>
#define KVM_VCPU_MAX_FEATURES 3
@@ -63,6 +64,9 @@ struct kvm_arch {
/* Timer */
struct arch_timer_kvm timer;
+
+ /* PMU */
+ struct pmu_kvm pmu;
};
#define KVM_NR_MEM_OBJS 40
@@ -109,8 +113,13 @@ struct kvm_vcpu_arch {
/* VGIC state */
struct vgic_cpu vgic_cpu;
+
+ /* Timer state */
struct arch_timer_cpu timer_cpu;
+ /* PMU state */
+ struct pmu_cpu pmu_cpu;
+
/*
* Anything that is not used directly from assembly code goes
* here.
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index e633ff8..a7fed09 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -69,6 +69,7 @@ struct kvm_regs {
/* Supported device IDs */
#define KVM_ARM_DEVICE_VGIC_V2 0
+#define KVM_ARM_DEVICE_PMU 1
/* Supported VGIC address types */
#define KVM_VGIC_V2_ADDR_TYPE_DIST 0
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 8ba85e9..672213d 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -26,6 +26,7 @@ config KVM
select KVM_ARM_HOST
select KVM_ARM_VGIC
select KVM_ARM_TIMER
+ select KVM_ARM_PMU
---help---
Support hosting virtualized guest machines.
@@ -60,4 +61,10 @@ config KVM_ARM_TIMER
---help---
Adds support for the Architected Timers in virtual machines.
+config KVM_ARM_PMU
+ bool
+ depends on KVM_ARM_VGIC
+ ---help---
+ Adds support for the Performance Monitoring in virtual machines.
+
endif # VIRTUALIZATION
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 72a9fd5..6be68bc 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -21,3 +21,4 @@ kvm-$(CONFIG_KVM_ARM_HOST) += guest.o reset.o sys_regs.o sys_regs_generic_v8.o
kvm-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic.o
kvm-$(CONFIG_KVM_ARM_TIMER) += $(KVM)/arm/arch_timer.o
+kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 70a7816..27f4041 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -24,6 +24,7 @@
#include <linux/kvm.h>
#include <kvm/arm_arch_timer.h>
+#include <kvm/arm_pmu.h>
#include <asm/cputype.h>
#include <asm/ptrace.h>
@@ -108,5 +109,8 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
/* Reset timer */
kvm_timer_vcpu_reset(vcpu, cpu_vtimer_irq);
+ /* Reset pmu context */
+ kvm_pmu_vcpu_reset(vcpu);
+
return 0;
}
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
new file mode 100644
index 0000000..1e3aa44
--- /dev/null
+++ b/include/kvm/arm_pmu.h
@@ -0,0 +1,52 @@
+/*
+ * Copyright (C) 2014 Linaro Ltd.
+ * Author: Anup Patel <anup.patel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#ifndef __ASM_ARM_KVM_PMU_H
+#define __ASM_ARM_KVM_PMU_H
+
+struct pmu_kvm {
+#ifdef CONFIG_KVM_ARM_PMU
+ /* PMU IRQ Numbers */
+ unsigned int irq_num[CONFIG_KVM_ARM_MAX_VCPUS];
+#endif
+};
+
+struct pmu_cpu {
+#ifdef CONFIG_KVM_ARM_PMU
+ /* IRQ pending flag. Updated when registers are saved. */
+ u32 irq_pending;
+#endif
+};
+
+#ifdef CONFIG_KVM_ARM_PMU
+void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu);
+void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu);
+int kvm_pmu_addr(struct kvm *kvm, unsigned long cpu, u64 *irq, bool write);
+int kvm_pmu_init(struct kvm *kvm);
+#else
+static inline void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu) {}
+static inline void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu) {}
+static inline int kvm_pmu_addr(struct kvm *kvm,
+ unsigned long cpu, u64 *irq, bool write)
+{
+ return -ENXIO;
+}
+static inline int kvm_pmu_init(struct kvm *kvm) { return 0; }
+#endif
+
+#endif
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
new file mode 100644
index 0000000..98066ad
--- /dev/null
+++ b/virt/kvm/arm/pmu.c
@@ -0,0 +1,105 @@
+/*
+ * Copyright (C) 2014 Linaro Ltd.
+ * Author: Anup Patel <anup.patel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/cpu.h>
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+
+#include <kvm/arm_vgic.h>
+#include <kvm/arm_pmu.h>
+
+/**
+ * kvm_pmu_sync_hwstate - sync pmu state for cpu
+ * @vcpu: The vcpu pointer
+ *
+ * Inject virtual PMU IRQ if IRQ is pending for this cpu.
+ */
+void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu)
+{
+ struct pmu_cpu *pmu = &vcpu->arch.pmu_cpu;
+ struct pmu_kvm *kpmu = &vcpu->kvm->arch.pmu;
+
+ if (pmu->irq_pending) {
+ kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
+ kpmu->irq_num[vcpu->vcpu_id],
+ 1);
+ pmu->irq_pending = 0;
+ return;
+ }
+}
+
+/**
+ * kvm_pmu_vcpu_reset - reset pmu state for cpu
+ * @vcpu: The vcpu pointer
+ *
+ */
+void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu)
+{
+ struct pmu_cpu *pmu = &vcpu->arch.pmu_cpu;
+
+ pmu->irq_pending = 0;
+}
+
+/**
+ * kvm_pmu_addr - set or get PMU VM IRQ numbers
+ * @kvm: pointer to the vm struct
+ * @cpu: cpu number
+ * @irq: pointer to irq number value
+ * @write: if true set the irq number else read the irq number
+ *
+ * Set or get the PMU IRQ number for the given cpu number.
+ */
+int kvm_pmu_addr(struct kvm *kvm, unsigned long cpu, u64 *irq, bool write)
+{
+ struct pmu_kvm *kpmu = &kvm->arch.pmu;
+
+ if (CONFIG_KVM_ARM_MAX_VCPUS <= cpu)
+ return -ENODEV;
+
+ mutex_lock(&kvm->lock);
+
+ if (write) {
+ kpmu->irq_num[cpu] = *irq;
+ } else {
+ *irq = kpmu->irq_num[cpu];
+ }
+
+ mutex_unlock(&kvm->lock);
+
+ return 0;
+}
+
+/**
+ * kvm_pmu_init - Initialize global PMU state for a VM
+ * @kvm: pointer to the kvm struct
+ *
+ * Set all the PMU IRQ numbers to invalid value so that
+ * user space has to explicitly provide PMU IRQ numbers
+ * using set device address ioctl.
+ */
+int kvm_pmu_init(struct kvm *kvm)
+{
+ int i;
+ struct pmu_kvm *kpmu = &kvm->arch.pmu;
+
+ for (i = 0; i < CONFIG_KVM_ARM_MAX_VCPUS; i++) {
+ kpmu->irq_num[i] = UINT_MAX;
+ }
+
+ return 0;
+}
--
1.7.9.5
^ permalink raw reply related [flat|nested] 78+ messages in thread
* [RFC PATCH 4/6] ARM/ARM64: KVM: Add common code PMU IRQ routing
@ 2014-08-05 9:24 ` Anup Patel
0 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-08-05 9:24 UTC (permalink / raw)
To: linux-arm-kernel
This patch introduces common PMU IRQ routing code for
KVM ARM and KVM ARM64 under the virt/kvm/arm directory.
The virtual PMU IRQ number for each Guest VCPU will be
provided by user space using the set device address vm
ioctl with the parameters:
dev_id = KVM_ARM_DEVICE_PMU
type = VCPU number
addr = PMU IRQ number for the VCPU
The low-level context switching code of KVM ARM/ARM64
will determine the state of the VCPU PMU IRQ and store
it in the "irq_pending" flag when saving PMU context
for the VCPU. The common PMU IRQ routing code will
inject the virtual PMU IRQ based on the "irq_pending"
flag and then clear the flag.
Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
---
arch/arm/include/asm/kvm_host.h | 9 ++++
arch/arm/include/uapi/asm/kvm.h | 1 +
arch/arm/kvm/arm.c | 6 +++
arch/arm/kvm/reset.c | 4 ++
arch/arm64/include/asm/kvm_host.h | 9 ++++
arch/arm64/include/uapi/asm/kvm.h | 1 +
arch/arm64/kvm/Kconfig | 7 +++
arch/arm64/kvm/Makefile | 1 +
arch/arm64/kvm/reset.c | 4 ++
include/kvm/arm_pmu.h | 52 ++++++++++++++++++
virt/kvm/arm/pmu.c | 105 +++++++++++++++++++++++++++++++++++++
11 files changed, 199 insertions(+)
create mode 100644 include/kvm/arm_pmu.h
create mode 100644 virt/kvm/arm/pmu.c
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 193ceaf..a6a778f 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -24,6 +24,7 @@
#include <asm/kvm_mmio.h>
#include <asm/fpstate.h>
#include <kvm/arm_arch_timer.h>
+#include <kvm/arm_pmu.h>
#if defined(CONFIG_KVM_ARM_MAX_VCPUS)
#define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
@@ -53,6 +54,9 @@ struct kvm_arch {
/* Timer */
struct arch_timer_kvm timer;
+ /* PMU */
+ struct pmu_kvm pmu;
+
/*
* Anything that is not used directly from assembly code goes
* here.
@@ -118,8 +122,13 @@ struct kvm_vcpu_arch {
/* VGIC state */
struct vgic_cpu vgic_cpu;
+
+ /* Timer state */
struct arch_timer_cpu timer_cpu;
+ /* PMU state */
+ struct pmu_cpu pmu_cpu;
+
/*
* Anything that is not used directly from assembly code goes
* here.
diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index e6ebdd3..b21e6eb 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -75,6 +75,7 @@ struct kvm_regs {
/* Supported device IDs */
#define KVM_ARM_DEVICE_VGIC_V2 0
+#define KVM_ARM_DEVICE_PMU 1
/* Supported VGIC address types */
#define KVM_VGIC_V2_ADDR_TYPE_DIST 0
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 3c82b37..04130f5 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -140,6 +140,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
kvm_timer_init(kvm);
+ kvm_pmu_init(kvm);
+
/* Mark the initial VMID generation invalid */
kvm->arch.vmid_gen = 0;
@@ -567,6 +569,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
local_irq_enable();
kvm_timer_sync_hwstate(vcpu);
+ kvm_pmu_sync_hwstate(vcpu);
kvm_vgic_sync_hwstate(vcpu);
continue;
}
@@ -601,6 +604,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
*************************************************************/
kvm_timer_sync_hwstate(vcpu);
+ kvm_pmu_sync_hwstate(vcpu);
kvm_vgic_sync_hwstate(vcpu);
ret = handle_exit(vcpu, run, ret);
@@ -794,6 +798,8 @@ static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
if (!vgic_present)
return -ENXIO;
return kvm_vgic_addr(kvm, type, &dev_addr->addr, true);
+ case KVM_ARM_DEVICE_PMU:
+ return kvm_pmu_addr(kvm, type, &dev_addr->addr, true);
default:
return -ENODEV;
}
diff --git a/arch/arm/kvm/reset.c b/arch/arm/kvm/reset.c
index f558c07..42e6996 100644
--- a/arch/arm/kvm/reset.c
+++ b/arch/arm/kvm/reset.c
@@ -28,6 +28,7 @@
#include <asm/kvm_coproc.h>
#include <kvm/arm_arch_timer.h>
+#include <kvm/arm_pmu.h>
/******************************************************************************
* Cortex-A15 and Cortex-A7 Reset Values
@@ -79,5 +80,8 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
/* Reset arch_timer context */
kvm_timer_vcpu_reset(vcpu, cpu_vtimer_irq);
+ /* Reset pmu context */
+ kvm_pmu_vcpu_reset(vcpu);
+
return 0;
}
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 7592ddf..ae4cdb2 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -38,6 +38,7 @@
#include <kvm/arm_vgic.h>
#include <kvm/arm_arch_timer.h>
+#include <kvm/arm_pmu.h>
#define KVM_VCPU_MAX_FEATURES 3
@@ -63,6 +64,9 @@ struct kvm_arch {
/* Timer */
struct arch_timer_kvm timer;
+
+ /* PMU */
+ struct pmu_kvm pmu;
};
#define KVM_NR_MEM_OBJS 40
@@ -109,8 +113,13 @@ struct kvm_vcpu_arch {
/* VGIC state */
struct vgic_cpu vgic_cpu;
+
+ /* Timer state */
struct arch_timer_cpu timer_cpu;
+ /* PMU state */
+ struct pmu_cpu pmu_cpu;
+
/*
* Anything that is not used directly from assembly code goes
* here.
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index e633ff8..a7fed09 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -69,6 +69,7 @@ struct kvm_regs {
/* Supported device IDs */
#define KVM_ARM_DEVICE_VGIC_V2 0
+#define KVM_ARM_DEVICE_PMU 1
/* Supported VGIC address types */
#define KVM_VGIC_V2_ADDR_TYPE_DIST 0
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 8ba85e9..672213d 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -26,6 +26,7 @@ config KVM
select KVM_ARM_HOST
select KVM_ARM_VGIC
select KVM_ARM_TIMER
+ select KVM_ARM_PMU
---help---
Support hosting virtualized guest machines.
@@ -60,4 +61,10 @@ config KVM_ARM_TIMER
---help---
Adds support for the Architected Timers in virtual machines.
+config KVM_ARM_PMU
+ bool
+ depends on KVM_ARM_VGIC
+ ---help---
+ Adds support for the Performance Monitoring in virtual machines.
+
endif # VIRTUALIZATION
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 72a9fd5..6be68bc 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -21,3 +21,4 @@ kvm-$(CONFIG_KVM_ARM_HOST) += guest.o reset.o sys_regs.o sys_regs_generic_v8.o
kvm-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic.o
kvm-$(CONFIG_KVM_ARM_TIMER) += $(KVM)/arm/arch_timer.o
+kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 70a7816..27f4041 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -24,6 +24,7 @@
#include <linux/kvm.h>
#include <kvm/arm_arch_timer.h>
+#include <kvm/arm_pmu.h>
#include <asm/cputype.h>
#include <asm/ptrace.h>
@@ -108,5 +109,8 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
/* Reset timer */
kvm_timer_vcpu_reset(vcpu, cpu_vtimer_irq);
+ /* Reset pmu context */
+ kvm_pmu_vcpu_reset(vcpu);
+
return 0;
}
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
new file mode 100644
index 0000000..1e3aa44
--- /dev/null
+++ b/include/kvm/arm_pmu.h
@@ -0,0 +1,52 @@
+/*
+ * Copyright (C) 2014 Linaro Ltd.
+ * Author: Anup Patel <anup.patel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#ifndef __ASM_ARM_KVM_PMU_H
+#define __ASM_ARM_KVM_PMU_H
+
+struct pmu_kvm {
+#ifdef CONFIG_KVM_ARM_PMU
+ /* PMU IRQ Numbers */
+ unsigned int irq_num[CONFIG_KVM_ARM_MAX_VCPUS];
+#endif
+};
+
+struct pmu_cpu {
+#ifdef CONFIG_KVM_ARM_PMU
+ /* IRQ pending flag. Updated when registers are saved. */
+ u32 irq_pending;
+#endif
+};
+
+#ifdef CONFIG_KVM_ARM_PMU
+void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu);
+void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu);
+int kvm_pmu_addr(struct kvm *kvm, unsigned long cpu, u64 *irq, bool write);
+int kvm_pmu_init(struct kvm *kvm);
+#else
+static inline void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu) {}
+static inline void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu) {}
+static inline int kvm_pmu_addr(struct kvm *kvm,
+ unsigned long cpu, u64 *irq, bool write)
+{
+ return -ENXIO;
+}
+static inline int kvm_pmu_init(struct kvm *kvm) { return 0; }
+#endif
+
+#endif
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
new file mode 100644
index 0000000..98066ad
--- /dev/null
+++ b/virt/kvm/arm/pmu.c
@@ -0,0 +1,105 @@
+/*
+ * Copyright (C) 2014 Linaro Ltd.
+ * Author: Anup Patel <anup.patel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/cpu.h>
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+
+#include <kvm/arm_vgic.h>
+#include <kvm/arm_pmu.h>
+
+/**
+ * kvm_pmu_sync_hwstate - sync pmu state for cpu
+ * @vcpu: The vcpu pointer
+ *
+ * Inject virtual PMU IRQ if IRQ is pending for this cpu.
+ */
+void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu)
+{
+ struct pmu_cpu *pmu = &vcpu->arch.pmu_cpu;
+ struct pmu_kvm *kpmu = &vcpu->kvm->arch.pmu;
+
+ if (pmu->irq_pending) {
+ kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
+ kpmu->irq_num[vcpu->vcpu_id],
+ 1);
+ pmu->irq_pending = 0;
+ return;
+ }
+}
+
+/**
+ * kvm_pmu_vcpu_reset - reset pmu state for cpu
+ * @vcpu: The vcpu pointer
+ *
+ */
+void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu)
+{
+ struct pmu_cpu *pmu = &vcpu->arch.pmu_cpu;
+
+ pmu->irq_pending = 0;
+}
+
+/**
+ * kvm_pmu_addr - set or get PMU VM IRQ numbers
+ * @kvm: pointer to the vm struct
+ * @cpu: cpu number
+ * @irq: pointer to irq number value
+ * @write: if true set the irq number else read the irq number
+ *
+ * Set or get the PMU IRQ number for the given cpu number.
+ */
+int kvm_pmu_addr(struct kvm *kvm, unsigned long cpu, u64 *irq, bool write)
+{
+ struct pmu_kvm *kpmu = &kvm->arch.pmu;
+
+ if (CONFIG_KVM_ARM_MAX_VCPUS <= cpu)
+ return -ENODEV;
+
+ mutex_lock(&kvm->lock);
+
+ if (write) {
+ kpmu->irq_num[cpu] = *irq;
+ } else {
+ *irq = kpmu->irq_num[cpu];
+ }
+
+ mutex_unlock(&kvm->lock);
+
+ return 0;
+}
+
+/**
+ * kvm_pmu_init - Initialize global PMU state for a VM
+ * @kvm: pointer to the kvm struct
+ *
+ * Set all the PMU IRQ numbers to an invalid value so that
+ * user space has to explicitly provide PMU IRQ numbers
+ * using the set device address ioctl.
+ */
+int kvm_pmu_init(struct kvm *kvm)
+{
+ int i;
+ struct pmu_kvm *kpmu = &kvm->arch.pmu;
+
+ for (i = 0; i < CONFIG_KVM_ARM_MAX_VCPUS; i++) {
+ kpmu->irq_num[i] = UINT_MAX;
+ }
+
+ return 0;
+}
--
1.7.9.5
* [RFC PATCH 5/6] ARM64: KVM: Implement full context switch of PMU registers
2014-08-05 9:24 ` Anup Patel
@ 2014-08-05 9:24 ` Anup Patel
-1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-08-05 9:24 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, kvm, patches, marc.zyngier, christoffer.dall,
will.deacon, ian.campbell, pranavkumar, Anup Patel
This patch implements the following:
1. Save/restore all PMU registers for both Guest and Host in
the KVM world switch.
2. Reserve the last PMU event counter for performance analysis
in EL2 mode. To achieve this, we fake the number of event
counters available to the Guest by trapping PMCR_EL0 register
accesses and program MDCR_EL2.HPMN with the number of PMU event
counters minus one.
3. Clear and mask overflowed interrupts when saving PMU context
for the Guest. The Guest will re-enable overflowed interrupts
when processing the virtual PMU interrupt.
With this patch the Guest has direct access to all PMU registers,
and we only trap-and-emulate PMCR_EL0 accesses to fake the number
of PMU event counters presented to the Guest.
Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
---
arch/arm64/include/asm/kvm_asm.h | 36 ++++++--
arch/arm64/kernel/asm-offsets.c | 1 +
arch/arm64/kvm/hyp-init.S | 15 ++++
arch/arm64/kvm/hyp.S | 168 +++++++++++++++++++++++++++++++++++-
arch/arm64/kvm/sys_regs.c | 175 ++++++++++++++++++++++++++++----------
5 files changed, 343 insertions(+), 52 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 993a7db..93be21f 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -53,15 +53,27 @@
#define DBGWVR0_EL1 71 /* Debug Watchpoint Value Registers (0-15) */
#define DBGWVR15_EL1 86
#define MDCCINT_EL1 87 /* Monitor Debug Comms Channel Interrupt Enable Reg */
+#define PMCR_EL0 88 /* Performance Monitors Control Register */
+#define PMOVSSET_EL0 89 /* Performance Monitors Overflow Flag Status Set Register */
+#define PMCCNTR_EL0 90 /* Cycle Counter Register */
+#define PMSELR_EL0 91 /* Performance Monitors Event Counter Selection Register */
+#define PMEVCNTR0_EL0 92 /* Performance Monitors Event Counter Register (0-30) */
+#define PMEVTYPER0_EL0 93 /* Performance Monitors Event Type Register (0-30) */
+#define PMEVCNTR30_EL0 152
+#define PMEVTYPER30_EL0 153
+#define PMCNTENSET_EL0 154 /* Performance Monitors Count Enable Set Register */
+#define PMINTENSET_EL1 155 /* Performance Monitors Interrupt Enable Set Register */
+#define PMUSERENR_EL0 156 /* Performance Monitors User Enable Register */
+#define PMCCFILTR_EL0 157 /* Cycle Count Filter Register */
/* 32bit specific registers. Keep them at the end of the range */
-#define DACR32_EL2 88 /* Domain Access Control Register */
-#define IFSR32_EL2 89 /* Instruction Fault Status Register */
-#define FPEXC32_EL2 90 /* Floating-Point Exception Control Register */
-#define DBGVCR32_EL2 91 /* Debug Vector Catch Register */
-#define TEECR32_EL1 92 /* ThumbEE Configuration Register */
-#define TEEHBR32_EL1 93 /* ThumbEE Handler Base Register */
-#define NR_SYS_REGS 94
+#define DACR32_EL2 158 /* Domain Access Control Register */
+#define IFSR32_EL2 159 /* Instruction Fault Status Register */
+#define FPEXC32_EL2 160 /* Floating-Point Exception Control Register */
+#define DBGVCR32_EL2 161 /* Debug Vector Catch Register */
+#define TEECR32_EL1 162 /* ThumbEE Configuration Register */
+#define TEEHBR32_EL1 163 /* ThumbEE Handler Base Register */
+#define NR_SYS_REGS 164
/* 32bit mapping */
#define c0_MPIDR (MPIDR_EL1 * 2) /* MultiProcessor ID Register */
@@ -83,6 +95,13 @@
#define c6_IFAR (c6_DFAR + 1) /* Instruction Fault Address Register */
#define c7_PAR (PAR_EL1 * 2) /* Physical Address Register */
#define c7_PAR_high (c7_PAR + 1) /* PAR top 32 bits */
+#define c9_PMCR (PMCR_EL0 * 2) /* Performance Monitors Control Register */
+#define c9_PMOVSSET (PMOVSSET_EL0 * 2)
+#define c9_PMCCNTR (PMCCNTR_EL0 * 2)
+#define c9_PMSELR (PMSELR_EL0 * 2)
+#define c9_PMCNTENSET (PMCNTENSET_EL0 * 2)
+#define c9_PMINTENSET (PMINTENSET_EL1 * 2)
+#define c9_PMUSERENR (PMUSERENR_EL0 * 2)
#define c10_PRRR (MAIR_EL1 * 2) /* Primary Region Remap Register */
#define c10_NMRR (c10_PRRR + 1) /* Normal Memory Remap Register */
#define c12_VBAR (VBAR_EL1 * 2) /* Vector Base Address Register */
@@ -93,6 +112,9 @@
#define c10_AMAIR0 (AMAIR_EL1 * 2) /* Aux Memory Attr Indirection Reg */
#define c10_AMAIR1 (c10_AMAIR0 + 1)/* Aux Memory Attr Indirection Reg */
#define c14_CNTKCTL (CNTKCTL_EL1 * 2) /* Timer Control Register (PL1) */
+#define c14_PMEVCNTR0 (PMEVCNTR0_EL0 * 2)
+#define c14_PMEVTYPR0 (PMEVTYPER0_EL0 * 2)
+#define c14_PMCCFILTR (PMCCFILTR_EL0 * 2)
#define cp14_DBGDSCRext (MDSCR_EL1 * 2)
#define cp14_DBGBCR0 (DBGBCR0_EL1 * 2)
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index ae73a83..053dc3e 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -140,6 +140,7 @@ int main(void)
DEFINE(VGIC_CPU_NR_LR, offsetof(struct vgic_cpu, nr_lr));
DEFINE(KVM_VTTBR, offsetof(struct kvm, arch.vttbr));
DEFINE(KVM_VGIC_VCTRL, offsetof(struct kvm, arch.vgic.vctrl_base));
+ DEFINE(VCPU_PMU_IRQ_PENDING, offsetof(struct kvm_vcpu, arch.pmu_cpu.irq_pending));
#endif
#ifdef CONFIG_ARM64_CPU_SUSPEND
DEFINE(CPU_SUSPEND_SZ, sizeof(struct cpu_suspend_ctx));
diff --git a/arch/arm64/kvm/hyp-init.S b/arch/arm64/kvm/hyp-init.S
index d968796..b45556e 100644
--- a/arch/arm64/kvm/hyp-init.S
+++ b/arch/arm64/kvm/hyp-init.S
@@ -20,6 +20,7 @@
#include <asm/assembler.h>
#include <asm/kvm_arm.h>
#include <asm/kvm_mmu.h>
+#include <asm/pmu.h>
.text
.pushsection .hyp.idmap.text, "ax"
@@ -107,6 +108,20 @@ target: /* We're now in the trampoline code, switch page tables */
kern_hyp_va x3
msr vbar_el2, x3
+ /* Reserve last PMU event counter for EL2 */
+ mov x4, #0
+ mrs x5, id_aa64dfr0_el1
+ ubfx x5, x5, #8, #4 // Extract PMUver
+ cmp x5, #1 // Must be PMUv3 else skip
+ bne 1f
+ mrs x5, pmcr_el0
+ ubfx x5, x5, #ARMV8_PMCR_N_SHIFT, #5 // Number of event counters
+ cmp x5, #0 // Skip if no event counters
+ beq 1f
+ sub x4, x5, #1
+1:
+ msr mdcr_el2, x4
+
/* Hello, World! */
eret
ENDPROC(__kvm_hyp_init)
diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index d032132..6b41c01 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -23,6 +23,7 @@
#include <asm/asm-offsets.h>
#include <asm/debug-monitors.h>
#include <asm/fpsimdmacros.h>
+#include <asm/pmu.h>
#include <asm/kvm.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_arm.h>
@@ -426,6 +427,77 @@ __kvm_hyp_code_start:
str x21, [x2, #CPU_SYSREG_OFFSET(MDCCINT_EL1)]
.endm
+.macro save_pmu, is_vcpu_pmu
+ // x2: base address for cpu context
+ // x3: mask of counters allowed in EL0 & EL1
+ // x4: number of event counters allowed in EL0 & EL1
+
+ mrs x6, id_aa64dfr0_el1
+ ubfx x5, x6, #8, #4 // Extract PMUver
+ cmp x5, #1 // Must be PMUv3 else skip
+ bne 1f
+
+ mrs x4, pmcr_el0 // Save PMCR_EL0
+ str x4, [x2, #CPU_SYSREG_OFFSET(PMCR_EL0)]
+
+ and x5, x4, #~(ARMV8_PMCR_E)// Clear PMCR_EL0.E
+ msr pmcr_el0, x5 // This will stop all counters
+
+ mov x3, #0
+ ubfx x4, x4, #ARMV8_PMCR_N_SHIFT, #5 // Number of event counters
+ cmp x4, #0 // Skip if no event counters
+ beq 2f
+ sub x4, x4, #1 // Last event counter is reserved
+ mov x3, #1
+ lsl x3, x3, x4
+ sub x3, x3, #1
+2: orr x3, x3, #(1 << 31) // Mask of event counters
+
+ mrs x5, pmovsset_el0 // Save PMOVSSET_EL0
+ and x5, x5, x3
+ str x5, [x2, #CPU_SYSREG_OFFSET(PMOVSSET_EL0)]
+
+ .if \is_vcpu_pmu == 1
+ msr pmovsclr_el0, x5 // Clear HW interrupt line
+ msr pmintenclr_el1, x5 // Mask irq for overflowed counters
+ str w5, [x0, #VCPU_PMU_IRQ_PENDING] // Update irq pending flag
+ .endif
+
+ mrs x5, pmccntr_el0 // Save PMCCNTR_EL0
+ str x5, [x2, #CPU_SYSREG_OFFSET(PMCCNTR_EL0)]
+
+ mrs x5, pmselr_el0 // Save PMSELR_EL0
+ str x5, [x2, #CPU_SYSREG_OFFSET(PMSELR_EL0)]
+
+ lsl x5, x4, #4
+ add x5, x5, #CPU_SYSREG_OFFSET(PMEVCNTR0_EL0)
+ add x5, x2, x5
+3: cmp x4, #0
+ beq 4f
+ sub x4, x4, #1
+ msr pmselr_el0, x4
+ mrs x6, pmxevcntr_el0 // Save PMEVCNTR<n>_EL0
+ mrs x7, pmxevtyper_el0 // Save PMEVTYPER<n>_EL0
+ stp x6, x7, [x5, #-16]!
+ b 3b
+4:
+ mrs x5, pmcntenset_el0 // Save PMCNTENSET_EL0
+ and x5, x5, x3
+ str x5, [x2, #CPU_SYSREG_OFFSET(PMCNTENSET_EL0)]
+
+ mrs x5, pmintenset_el1 // Save PMINTENSET_EL1
+ and x5, x5, x3
+ str x5, [x2, #CPU_SYSREG_OFFSET(PMINTENSET_EL1)]
+
+ mrs x5, pmuserenr_el0 // Save PMUSERENR_EL0
+ and x5, x5, x3
+ str x5, [x2, #CPU_SYSREG_OFFSET(PMUSERENR_EL0)]
+
+ mrs x5, pmccfiltr_el0 // Save PMCCFILTR_EL0
+ str x5, [x2, #CPU_SYSREG_OFFSET(PMCCFILTR_EL0)]
+1:
+.endm
+
.macro restore_sysregs
// x2: base address for cpu context
// x3: tmp register
@@ -659,6 +731,72 @@ __kvm_hyp_code_start:
msr mdccint_el1, x21
.endm
+.macro restore_pmu
+ // x2: base address for cpu context
+ // x3: mask of counters allowed in EL0 & EL1
+ // x4: number of event counters allowed in EL0 & EL1
+
+ mrs x6, id_aa64dfr0_el1
+ ubfx x5, x6, #8, #4 // Extract PMUver
+ cmp x5, #1 // Must be PMUv3 else skip
+ bne 1f
+
+ mov x3, #0
+ mrs x4, pmcr_el0
+ ubfx x4, x4, #ARMV8_PMCR_N_SHIFT, #5 // Number of event counters
+ cmp x4, #0 // Skip if no event counters
+ beq 2f
+ sub x4, x4, #1 // Last event counter is reserved
+ mov x3, #1
+ lsl x3, x3, x4
+ sub x3, x3, #1
+2: orr x3, x3, #(1 << 31) // Mask of event counters
+
+ ldr x5, [x2, #CPU_SYSREG_OFFSET(PMCCFILTR_EL0)]
+ msr pmccfiltr_el0, x5 // Restore PMCCFILTR_EL0
+
+ ldr x5, [x2, #CPU_SYSREG_OFFSET(PMUSERENR_EL0)]
+ and x5, x5, x3
+ msr pmuserenr_el0, x5 // Restore PMUSERENR_EL0
+
+ msr pmintenclr_el1, x3
+ ldr x5, [x2, #CPU_SYSREG_OFFSET(PMINTENSET_EL1)]
+ and x5, x5, x3
+ msr pmintenset_el1, x5 // Restore PMINTENSET_EL1
+
+ msr pmcntenclr_el0, x3
+ ldr x5, [x2, #CPU_SYSREG_OFFSET(PMCNTENSET_EL0)]
+ and x5, x5, x3
+ msr pmcntenset_el0, x5 // Restore PMCNTENSET_EL0
+
+ lsl x5, x4, #4
+ add x5, x5, #CPU_SYSREG_OFFSET(PMEVCNTR0_EL0)
+ add x5, x2, x5
+3: cmp x4, #0
+ beq 4f
+ sub x4, x4, #1
+ ldp x6, x7, [x5, #-16]!
+ msr pmselr_el0, x4
+ msr pmxevcntr_el0, x6 // Restore PMEVCNTR<n>_EL0
+ msr pmxevtyper_el0, x7 // Restore PMEVTYPER<n>_EL0
+ b 3b
+4:
+ ldr x5, [x2, #CPU_SYSREG_OFFSET(PMSELR_EL0)]
+ msr pmselr_el0, x5 // Restore PMSELR_EL0
+
+ ldr x5, [x2, #CPU_SYSREG_OFFSET(PMCCNTR_EL0)]
+ msr pmccntr_el0, x5 // Restore PMCCNTR_EL0
+
+ msr pmovsclr_el0, x3
+ ldr x5, [x2, #CPU_SYSREG_OFFSET(PMOVSSET_EL0)]
+ and x5, x5, x3
+ msr pmovsset_el0, x5 // Restore PMOVSSET_EL0
+
+ ldr x5, [x2, #CPU_SYSREG_OFFSET(PMCR_EL0)]
+ msr pmcr_el0, x5 // Restore PMCR_EL0
+1:
+.endm
+
.macro skip_32bit_state tmp, target
// Skip 32bit state if not needed
mrs \tmp, hcr_el2
@@ -775,8 +913,10 @@ __kvm_hyp_code_start:
msr hstr_el2, x2
mrs x2, mdcr_el2
+ and x3, x2, #MDCR_EL2_HPME
and x2, x2, #MDCR_EL2_HPMN_MASK
- orr x2, x2, #(MDCR_EL2_TPM | MDCR_EL2_TPMCR)
+ orr x2, x2, x3
+ orr x2, x2, #MDCR_EL2_TPMCR
orr x2, x2, #(MDCR_EL2_TDRA | MDCR_EL2_TDOSA)
// Check for KVM_ARM64_DEBUG_DIRTY, and set debug to trap
@@ -795,7 +935,9 @@ __kvm_hyp_code_start:
msr hstr_el2, xzr
mrs x2, mdcr_el2
+ and x3, x2, #MDCR_EL2_HPME
and x2, x2, #MDCR_EL2_HPMN_MASK
+ orr x2, x2, x3
msr mdcr_el2, x2
.endm
@@ -977,6 +1119,18 @@ __restore_debug:
restore_debug
ret
+__save_pmu_host:
+ save_pmu 0
+ ret
+
+__save_pmu_guest:
+ save_pmu 1
+ ret
+
+__restore_pmu:
+ restore_pmu
+ ret
+
__save_fpsimd:
save_fpsimd
ret
@@ -1005,6 +1159,9 @@ ENTRY(__kvm_vcpu_run)
kern_hyp_va x2
save_host_regs
+
+ bl __save_pmu_host
+
bl __save_fpsimd
bl __save_sysregs
@@ -1027,6 +1184,9 @@ ENTRY(__kvm_vcpu_run)
bl __restore_debug
1:
restore_guest_32bit_state
+
+ bl __restore_pmu
+
restore_guest_regs
// That's it, no more messing around.
@@ -1040,12 +1200,16 @@ __kvm_vcpu_return:
add x2, x0, #VCPU_CONTEXT
save_guest_regs
+
+ bl __save_pmu_guest
+
bl __save_fpsimd
bl __save_sysregs
skip_debug_state x3, 1f
bl __save_debug
1:
+
save_guest_32bit_state
save_timer_state
@@ -1068,6 +1232,8 @@ __kvm_vcpu_return:
str xzr, [x0, #VCPU_DEBUG_FLAGS]
bl __restore_debug
1:
+ bl __restore_pmu
+
restore_host_regs
mov x0, x1
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 4a89ca2..081f95e 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -23,6 +23,7 @@
#include <linux/mm.h>
#include <linux/kvm_host.h>
#include <linux/uaccess.h>
+#include <linux/perf_event.h>
#include <asm/kvm_arm.h>
#include <asm/kvm_host.h>
#include <asm/kvm_emulate.h>
@@ -31,6 +32,7 @@
#include <asm/cacheflush.h>
#include <asm/cputype.h>
#include <asm/debug-monitors.h>
+#include <asm/pmu.h>
#include <trace/events/kvm.h>
#include "sys_regs.h"
@@ -164,6 +166,45 @@ static bool access_sctlr(struct kvm_vcpu *vcpu,
return true;
}
+/* PMCR_EL0 accessor. Only called as long as MDCR_EL2.TPMCR is set. */
+static bool access_pmcr(struct kvm_vcpu *vcpu,
+ const struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ unsigned long val, n;
+
+ if (p->is_write) {
+ /* Only update writeable bits of PMCR */
+ if (!p->is_aarch32)
+ val = vcpu_sys_reg(vcpu, r->reg);
+ else
+ val = vcpu_cp15(vcpu, r->reg);
+ val &= ~ARMV8_PMCR_MASK;
+ val |= *vcpu_reg(vcpu, p->Rt) & ARMV8_PMCR_MASK;
+ if (!p->is_aarch32)
+ vcpu_sys_reg(vcpu, r->reg) = val;
+ else
+ vcpu_cp15(vcpu, r->reg) = val;
+ } else {
+ /*
+ * We reserve the last event counter for EL2-mode
+ * performance analysis hence we show one less
+ * event counter to the guest.
+ */
+ if (!p->is_aarch32)
+ val = vcpu_sys_reg(vcpu, r->reg);
+ else
+ val = vcpu_cp15(vcpu, r->reg);
+ n = (val >> ARMV8_PMCR_N_SHIFT) & ARMV8_PMCR_N_MASK;
+ n = (n) ? n - 1 : 0;
+ val &= ~(ARMV8_PMCR_N_MASK << ARMV8_PMCR_N_SHIFT);
+ val |= (n & ARMV8_PMCR_N_MASK) << ARMV8_PMCR_N_SHIFT;
+ *vcpu_reg(vcpu, p->Rt) = val;
+ }
+
+ return true;
+}
+
static bool trap_raz_wi(struct kvm_vcpu *vcpu,
const struct sys_reg_params *p,
const struct sys_reg_desc *r)
@@ -272,6 +313,20 @@ static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
{ Op0(0b10), Op1(0b000), CRn(0b0000), CRm((n)), Op2(0b111), \
trap_debug_regs, reset_val, (DBGWCR0_EL1 + (n)), 0 }
+/* Macro to expand the PMEVCNTRn_EL0 register */
+#define PMU_PMEVCNTR_EL0(n) \
+ /* PMEVCNTRn_EL0 */ \
+ { Op0(0b11), Op1(0b011), CRn(0b1110), \
+ CRm((0b1000 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)), \
+ NULL, reset_val, (PMEVCNTR0_EL0 + (n)*2), 0 }
+
+/* Macro to expand the PMEVTYPERn_EL0 register */
+#define PMU_PMEVTYPER_EL0(n) \
+ /* PMEVTYPERn_EL0 */ \
+ { Op0(0b11), Op1(0b011), CRn(0b1110), \
+ CRm((0b1100 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)), \
+ NULL, reset_val, (PMEVTYPER0_EL0 + (n)*2), 0 }
+
/*
* Architected system registers.
* Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -408,10 +463,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
/* PMINTENSET_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b1001), CRm(0b1110), Op2(0b001),
- trap_raz_wi },
- /* PMINTENCLR_EL1 */
- { Op0(0b11), Op1(0b000), CRn(0b1001), CRm(0b1110), Op2(0b010),
- trap_raz_wi },
+ NULL, reset_val, PMINTENSET_EL1, 0 },
/* MAIR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b1010), CRm(0b0010), Op2(0b000),
@@ -440,43 +492,22 @@ static const struct sys_reg_desc sys_reg_descs[] = {
/* PMCR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b000),
- trap_raz_wi },
+ access_pmcr, reset_val, PMCR_EL0, 0 },
/* PMCNTENSET_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b001),
- trap_raz_wi },
- /* PMCNTENCLR_EL0 */
- { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b010),
- trap_raz_wi },
- /* PMOVSCLR_EL0 */
- { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b011),
- trap_raz_wi },
- /* PMSWINC_EL0 */
- { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b100),
- trap_raz_wi },
+ NULL, reset_val, PMCNTENSET_EL0, 0 },
/* PMSELR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b101),
- trap_raz_wi },
- /* PMCEID0_EL0 */
- { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b110),
- trap_raz_wi },
- /* PMCEID1_EL0 */
- { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b111),
- trap_raz_wi },
+ NULL, reset_val, PMSELR_EL0 },
/* PMCCNTR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b000),
- trap_raz_wi },
- /* PMXEVTYPER_EL0 */
- { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b001),
- trap_raz_wi },
- /* PMXEVCNTR_EL0 */
- { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b010),
- trap_raz_wi },
+ NULL, reset_val, PMCCNTR_EL0, 0 },
/* PMUSERENR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1110), Op2(0b000),
- trap_raz_wi },
+ NULL, reset_val, PMUSERENR_EL0, 0 },
/* PMOVSSET_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1110), Op2(0b011),
- trap_raz_wi },
+ NULL, reset_val, PMOVSSET_EL0, 0 },
/* TPIDR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1101), CRm(0b0000), Op2(0b010),
@@ -485,6 +516,74 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ Op0(0b11), Op1(0b011), CRn(0b1101), CRm(0b0000), Op2(0b011),
NULL, reset_unknown, TPIDRRO_EL0 },
+ /* PMEVCNTRn_EL0 */
+ PMU_PMEVCNTR_EL0(0),
+ PMU_PMEVCNTR_EL0(1),
+ PMU_PMEVCNTR_EL0(2),
+ PMU_PMEVCNTR_EL0(3),
+ PMU_PMEVCNTR_EL0(4),
+ PMU_PMEVCNTR_EL0(5),
+ PMU_PMEVCNTR_EL0(6),
+ PMU_PMEVCNTR_EL0(7),
+ PMU_PMEVCNTR_EL0(8),
+ PMU_PMEVCNTR_EL0(9),
+ PMU_PMEVCNTR_EL0(10),
+ PMU_PMEVCNTR_EL0(11),
+ PMU_PMEVCNTR_EL0(12),
+ PMU_PMEVCNTR_EL0(13),
+ PMU_PMEVCNTR_EL0(14),
+ PMU_PMEVCNTR_EL0(15),
+ PMU_PMEVCNTR_EL0(16),
+ PMU_PMEVCNTR_EL0(17),
+ PMU_PMEVCNTR_EL0(18),
+ PMU_PMEVCNTR_EL0(19),
+ PMU_PMEVCNTR_EL0(20),
+ PMU_PMEVCNTR_EL0(21),
+ PMU_PMEVCNTR_EL0(22),
+ PMU_PMEVCNTR_EL0(23),
+ PMU_PMEVCNTR_EL0(24),
+ PMU_PMEVCNTR_EL0(25),
+ PMU_PMEVCNTR_EL0(26),
+ PMU_PMEVCNTR_EL0(27),
+ PMU_PMEVCNTR_EL0(28),
+ PMU_PMEVCNTR_EL0(29),
+ PMU_PMEVCNTR_EL0(30),
+ /* PMEVTYPERn_EL0 */
+ PMU_PMEVTYPER_EL0(0),
+ PMU_PMEVTYPER_EL0(1),
+ PMU_PMEVTYPER_EL0(2),
+ PMU_PMEVTYPER_EL0(3),
+ PMU_PMEVTYPER_EL0(4),
+ PMU_PMEVTYPER_EL0(5),
+ PMU_PMEVTYPER_EL0(6),
+ PMU_PMEVTYPER_EL0(7),
+ PMU_PMEVTYPER_EL0(8),
+ PMU_PMEVTYPER_EL0(9),
+ PMU_PMEVTYPER_EL0(10),
+ PMU_PMEVTYPER_EL0(11),
+ PMU_PMEVTYPER_EL0(12),
+ PMU_PMEVTYPER_EL0(13),
+ PMU_PMEVTYPER_EL0(14),
+ PMU_PMEVTYPER_EL0(15),
+ PMU_PMEVTYPER_EL0(16),
+ PMU_PMEVTYPER_EL0(17),
+ PMU_PMEVTYPER_EL0(18),
+ PMU_PMEVTYPER_EL0(19),
+ PMU_PMEVTYPER_EL0(20),
+ PMU_PMEVTYPER_EL0(21),
+ PMU_PMEVTYPER_EL0(22),
+ PMU_PMEVTYPER_EL0(23),
+ PMU_PMEVTYPER_EL0(24),
+ PMU_PMEVTYPER_EL0(25),
+ PMU_PMEVTYPER_EL0(26),
+ PMU_PMEVTYPER_EL0(27),
+ PMU_PMEVTYPER_EL0(28),
+ PMU_PMEVTYPER_EL0(29),
+ PMU_PMEVTYPER_EL0(30),
+ /* PMCCFILTR_EL0 */
+ { Op0(0b11), Op1(0b011), CRn(0b1110), CRm(0b1111), Op2(0b111),
+ NULL, reset_val, PMCCFILTR_EL0, 0 },
+
/* DACR32_EL2 */
{ Op0(0b11), Op1(0b100), CRn(0b0011), CRm(0b0000), Op2(0b000),
NULL, reset_unknown, DACR32_EL2 },
@@ -671,19 +770,7 @@ static const struct sys_reg_desc cp15_regs[] = {
{ Op1( 0), CRn( 7), CRm(14), Op2( 2), access_dcsw },
/* PMU */
- { Op1( 0), CRn( 9), CRm(12), Op2( 0), trap_raz_wi },
- { Op1( 0), CRn( 9), CRm(12), Op2( 1), trap_raz_wi },
- { Op1( 0), CRn( 9), CRm(12), Op2( 2), trap_raz_wi },
- { Op1( 0), CRn( 9), CRm(12), Op2( 3), trap_raz_wi },
- { Op1( 0), CRn( 9), CRm(12), Op2( 5), trap_raz_wi },
- { Op1( 0), CRn( 9), CRm(12), Op2( 6), trap_raz_wi },
- { Op1( 0), CRn( 9), CRm(12), Op2( 7), trap_raz_wi },
- { Op1( 0), CRn( 9), CRm(13), Op2( 0), trap_raz_wi },
- { Op1( 0), CRn( 9), CRm(13), Op2( 1), trap_raz_wi },
- { Op1( 0), CRn( 9), CRm(13), Op2( 2), trap_raz_wi },
- { Op1( 0), CRn( 9), CRm(14), Op2( 0), trap_raz_wi },
- { Op1( 0), CRn( 9), CRm(14), Op2( 1), trap_raz_wi },
- { Op1( 0), CRn( 9), CRm(14), Op2( 2), trap_raz_wi },
+ { Op1( 0), CRn( 9), CRm(12), Op2( 0), access_pmcr, NULL, c9_PMCR },
{ Op1( 0), CRn(10), CRm( 2), Op2( 0), access_vm_reg, NULL, c10_PRRR },
{ Op1( 0), CRn(10), CRm( 2), Op2( 1), access_vm_reg, NULL, c10_NMRR },
--
1.7.9.5
-#define NR_SYS_REGS 94
+#define DACR32_EL2 158 /* Domain Access Control Register */
+#define IFSR32_EL2 159 /* Instruction Fault Status Register */
+#define FPEXC32_EL2 160 /* Floating-Point Exception Control Register */
+#define DBGVCR32_EL2 161 /* Debug Vector Catch Register */
+#define TEECR32_EL1 162 /* ThumbEE Configuration Register */
+#define TEEHBR32_EL1 163 /* ThumbEE Handler Base Register */
+#define NR_SYS_REGS 164
/* 32bit mapping */
#define c0_MPIDR (MPIDR_EL1 * 2) /* MultiProcessor ID Register */
@@ -83,6 +95,13 @@
#define c6_IFAR (c6_DFAR + 1) /* Instruction Fault Address Register */
#define c7_PAR (PAR_EL1 * 2) /* Physical Address Register */
#define c7_PAR_high (c7_PAR + 1) /* PAR top 32 bits */
+#define c9_PMCR (PMCR_EL0 * 2) /* Performance Monitors Control Register */
+#define c9_PMOVSSET (PMOVSSET_EL0 * 2)
+#define c9_PMCCNTR (PMCCNTR_EL0 * 2)
+#define c9_PMSELR (PMSELR_EL0 * 2)
+#define c9_PMCNTENSET (PMCNTENSET_EL0 * 2)
+#define c9_PMINTENSET (PMINTENSET_EL1 * 2)
+#define c9_PMUSERENR (PMUSERENR_EL0 * 2)
#define c10_PRRR (MAIR_EL1 * 2) /* Primary Region Remap Register */
#define c10_NMRR (c10_PRRR + 1) /* Normal Memory Remap Register */
#define c12_VBAR (VBAR_EL1 * 2) /* Vector Base Address Register */
@@ -93,6 +112,9 @@
#define c10_AMAIR0 (AMAIR_EL1 * 2) /* Aux Memory Attr Indirection Reg */
#define c10_AMAIR1 (c10_AMAIR0 + 1)/* Aux Memory Attr Indirection Reg */
#define c14_CNTKCTL (CNTKCTL_EL1 * 2) /* Timer Control Register (PL1) */
+#define c14_PMEVCNTR0 (PMEVCNTR0_EL0 * 2)
+#define c14_PMEVTYPR0 (PMEVTYPER0_EL0 * 2)
+#define c14_PMCCFILTR (PMCCFILTR_EL0 * 2)
#define cp14_DBGDSCRext (MDSCR_EL1 * 2)
#define cp14_DBGBCR0 (DBGBCR0_EL1 * 2)
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index ae73a83..053dc3e 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -140,6 +140,7 @@ int main(void)
DEFINE(VGIC_CPU_NR_LR, offsetof(struct vgic_cpu, nr_lr));
DEFINE(KVM_VTTBR, offsetof(struct kvm, arch.vttbr));
DEFINE(KVM_VGIC_VCTRL, offsetof(struct kvm, arch.vgic.vctrl_base));
+ DEFINE(VCPU_PMU_IRQ_PENDING, offsetof(struct kvm_vcpu, arch.pmu_cpu.irq_pending));
#endif
#ifdef CONFIG_ARM64_CPU_SUSPEND
DEFINE(CPU_SUSPEND_SZ, sizeof(struct cpu_suspend_ctx));
diff --git a/arch/arm64/kvm/hyp-init.S b/arch/arm64/kvm/hyp-init.S
index d968796..b45556e 100644
--- a/arch/arm64/kvm/hyp-init.S
+++ b/arch/arm64/kvm/hyp-init.S
@@ -20,6 +20,7 @@
#include <asm/assembler.h>
#include <asm/kvm_arm.h>
#include <asm/kvm_mmu.h>
+#include <asm/pmu.h>
.text
.pushsection .hyp.idmap.text, "ax"
@@ -107,6 +108,20 @@ target: /* We're now in the trampoline code, switch page tables */
kern_hyp_va x3
msr vbar_el2, x3
+ /* Reserve last PMU event counter for EL2 */
+ mov x4, #0
+ mrs x5, id_aa64dfr0_el1
+ ubfx x5, x5, #8, #4 // Extract PMUver
+ cmp x5, #1 // Must be PMUv3 else skip
+ bne 1f
+ mrs x5, pmcr_el0
+ ubfx x5, x5, #ARMV8_PMCR_N_SHIFT, #5 // Number of event counters
+ cmp x5, #0 // Skip if no event counters
+ beq 1f
+ sub x4, x5, #1
+1:
+ msr mdcr_el2, x4
+
/* Hello, World! */
eret
ENDPROC(__kvm_hyp_init)
diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index d032132..6b41c01 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -23,6 +23,7 @@
#include <asm/asm-offsets.h>
#include <asm/debug-monitors.h>
#include <asm/fpsimdmacros.h>
+#include <asm/pmu.h>
#include <asm/kvm.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_arm.h>
@@ -426,6 +427,77 @@ __kvm_hyp_code_start:
str x21, [x2, #CPU_SYSREG_OFFSET(MDCCINT_EL1)]
.endm
+.macro save_pmu, is_vcpu_pmu
+ // x2: base address for cpu context
+ // x3: mask of counters allowed in EL0 & EL1
+ // x4: number of event counters allowed in EL0 & EL1
+
+ mrs x6, id_aa64dfr0_el1
+ ubfx x5, x6, #8, #4 // Extract PMUver
+ cmp x5, #1 // Must be PMUv3 else skip
+ bne 1f
+
+ mrs x4, pmcr_el0 // Save PMCR_EL0
+ str x4, [x2, #CPU_SYSREG_OFFSET(PMCR_EL0)]
+
+ and x5, x4, #~(ARMV8_PMCR_E)// Clear PMCR_EL0.E
+ msr pmcr_el0, x5 // This will stop all counters
+
+ mov x3, #0
+ ubfx x4, x4, #ARMV8_PMCR_N_SHIFT, #5 // Number of event counters
+ cmp x4, #0 // Skip if no event counters
+ beq 2f
+ sub x4, x4, #1 // Last event counter is reserved
+ mov x3, #1
+ lsl x3, x3, x4
+ sub x3, x3, #1
+2: orr x3, x3, #(1 << 31) // Mask of event counters
+
+ mrs x5, pmovsset_el0 // Save PMOVSSET_EL0
+ and x5, x5, x3
+ str x5, [x2, #CPU_SYSREG_OFFSET(PMOVSSET_EL0)]
+
+ .if \is_vcpu_pmu == 1
+ msr pmovsclr_el0, x5 // Clear HW interrupt line
+ msr pmintenclr_el1, x5 // Mask irq for overflowed counters
+ str w5, [x0, #VCPU_PMU_IRQ_PENDING] // Update irq pending flag
+ .endif
+
+ mrs x5, pmccntr_el0 // Save PMCCNTR_EL0
+ str x5, [x2, #CPU_SYSREG_OFFSET(PMCCNTR_EL0)]
+
+ mrs x5, pmselr_el0 // Save PMSELR_EL0
+ str x5, [x2, #CPU_SYSREG_OFFSET(PMSELR_EL0)]
+
+ lsl x5, x4, #4
+ add x5, x5, #CPU_SYSREG_OFFSET(PMEVCNTR0_EL0)
+ add x5, x2, x5
+3: cmp x4, #0
+ beq 4f
+ sub x4, x4, #1
+ msr pmselr_el0, x4
+ mrs x6, pmxevcntr_el0 // Save PMEVCNTR<n>_EL0
+ mrs x7, pmxevtyper_el0 // Save PMEVTYPER<n>_EL0
+ stp x6, x7, [x5, #-16]!
+ b 3b
+4:
+ mrs x5, pmcntenset_el0 // Save PMCNTENSET_EL0
+ and x5, x5, x3
+ str x5, [x2, #CPU_SYSREG_OFFSET(PMCNTENSET_EL0)]
+
+ mrs x5, pmintenset_el1 // Save PMINTENSET_EL1
+ and x5, x5, x3
+ str x5, [x2, #CPU_SYSREG_OFFSET(PMINTENSET_EL1)]
+
+ mrs x5, pmuserenr_el0 // Save PMUSERENR_EL0
+ and x5, x5, x3
+ str x5, [x2, #CPU_SYSREG_OFFSET(PMUSERENR_EL0)]
+
+ mrs x5, pmccfiltr_el0 // Save PMCCFILTR_EL0
+ str x5, [x2, #CPU_SYSREG_OFFSET(PMCCFILTR_EL0)]
+1:
+.endm
+
.macro restore_sysregs
// x2: base address for cpu context
// x3: tmp register
@@ -659,6 +731,72 @@ __kvm_hyp_code_start:
msr mdccint_el1, x21
.endm
+.macro restore_pmu
+ // x2: base address for cpu context
+ // x3: mask of counters allowed in EL0 & EL1
+ // x4: number of event counters allowed in EL0 & EL1
+
+ mrs x6, id_aa64dfr0_el1
+ ubfx x5, x6, #8, #4 // Extract PMUver
+ cmp x5, #1 // Must be PMUv3 else skip
+ bne 1f
+
+ mov x3, #0
+ mrs x4, pmcr_el0
+ ubfx x4, x4, #ARMV8_PMCR_N_SHIFT, #5 // Number of event counters
+ cmp x4, #0 // Skip if no event counters
+ beq 2f
+ sub x4, x4, #1 // Last event counter is reserved
+ mov x3, #1
+ lsl x3, x3, x4
+ sub x3, x3, #1
+2: orr x3, x3, #(1 << 31) // Mask of event counters
+
+ ldr x5, [x2, #CPU_SYSREG_OFFSET(PMCCFILTR_EL0)]
+ msr pmccfiltr_el0, x5 // Restore PMCCFILTR_EL0
+
+ ldr x5, [x2, #CPU_SYSREG_OFFSET(PMUSERENR_EL0)]
+ and x5, x5, x3
+ msr pmuserenr_el0, x5 // Restore PMUSERENR_EL0
+
+ msr pmintenclr_el1, x3
+ ldr x5, [x2, #CPU_SYSREG_OFFSET(PMINTENSET_EL1)]
+ and x5, x5, x3
+ msr pmintenset_el1, x5 // Restore PMINTENSET_EL1
+
+ msr pmcntenclr_el0, x3
+ ldr x5, [x2, #CPU_SYSREG_OFFSET(PMCNTENSET_EL0)]
+ and x5, x5, x3
+ msr pmcntenset_el0, x5 // Restore PMCNTENSET_EL0
+
+ lsl x5, x4, #4
+ add x5, x5, #CPU_SYSREG_OFFSET(PMEVCNTR0_EL0)
+ add x5, x2, x5
+3: cmp x4, #0
+ beq 4f
+ sub x4, x4, #1
+ ldp x6, x7, [x5, #-16]!
+ msr pmselr_el0, x4
+ msr pmxevcntr_el0, x6 // Restore PMEVCNTR<n>_EL0
+ msr pmxevtyper_el0, x7 // Restore PMEVTYPER<n>_EL0
+ b 3b
+4:
+ ldr x5, [x2, #CPU_SYSREG_OFFSET(PMSELR_EL0)]
+ msr pmselr_el0, x5 // Restore PMSELR_EL0
+
+ ldr x5, [x2, #CPU_SYSREG_OFFSET(PMCCNTR_EL0)]
+ msr pmccntr_el0, x5 // Restore PMCCNTR_EL0
+
+ msr pmovsclr_el0, x3
+ ldr x5, [x2, #CPU_SYSREG_OFFSET(PMOVSSET_EL0)]
+ and x5, x5, x3
+ msr pmovsset_el0, x5 // Restore PMOVSSET_EL0
+
+ ldr x5, [x2, #CPU_SYSREG_OFFSET(PMCR_EL0)]
+ msr pmcr_el0, x5 // Restore PMCR_EL0
+1:
+.endm
+
.macro skip_32bit_state tmp, target
// Skip 32bit state if not needed
mrs \tmp, hcr_el2
@@ -775,8 +913,10 @@ __kvm_hyp_code_start:
msr hstr_el2, x2
mrs x2, mdcr_el2
+ and x3, x2, #MDCR_EL2_HPME
and x2, x2, #MDCR_EL2_HPMN_MASK
- orr x2, x2, #(MDCR_EL2_TPM | MDCR_EL2_TPMCR)
+ orr x2, x2, x3
+ orr x2, x2, #MDCR_EL2_TPMCR
orr x2, x2, #(MDCR_EL2_TDRA | MDCR_EL2_TDOSA)
// Check for KVM_ARM64_DEBUG_DIRTY, and set debug to trap
@@ -795,7 +935,9 @@ __kvm_hyp_code_start:
msr hstr_el2, xzr
mrs x2, mdcr_el2
+ and x3, x2, #MDCR_EL2_HPME
and x2, x2, #MDCR_EL2_HPMN_MASK
+ orr x2, x2, x3
msr mdcr_el2, x2
.endm
@@ -977,6 +1119,18 @@ __restore_debug:
restore_debug
ret
+__save_pmu_host:
+ save_pmu 0
+ ret
+
+__save_pmu_guest:
+ save_pmu 1
+ ret
+
+__restore_pmu:
+ restore_pmu
+ ret
+
__save_fpsimd:
save_fpsimd
ret
@@ -1005,6 +1159,9 @@ ENTRY(__kvm_vcpu_run)
kern_hyp_va x2
save_host_regs
+
+ bl __save_pmu_host
+
bl __save_fpsimd
bl __save_sysregs
@@ -1027,6 +1184,9 @@ ENTRY(__kvm_vcpu_run)
bl __restore_debug
1:
restore_guest_32bit_state
+
+ bl __restore_pmu
+
restore_guest_regs
// That's it, no more messing around.
@@ -1040,12 +1200,16 @@ __kvm_vcpu_return:
add x2, x0, #VCPU_CONTEXT
save_guest_regs
+
+ bl __save_pmu_guest
+
bl __save_fpsimd
bl __save_sysregs
skip_debug_state x3, 1f
bl __save_debug
1:
+
save_guest_32bit_state
save_timer_state
@@ -1068,6 +1232,8 @@ __kvm_vcpu_return:
str xzr, [x0, #VCPU_DEBUG_FLAGS]
bl __restore_debug
1:
+ bl __restore_pmu
+
restore_host_regs
mov x0, x1
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 4a89ca2..081f95e 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -23,6 +23,7 @@
#include <linux/mm.h>
#include <linux/kvm_host.h>
#include <linux/uaccess.h>
+#include <linux/perf_event.h>
#include <asm/kvm_arm.h>
#include <asm/kvm_host.h>
#include <asm/kvm_emulate.h>
@@ -31,6 +32,7 @@
#include <asm/cacheflush.h>
#include <asm/cputype.h>
#include <asm/debug-monitors.h>
+#include <asm/pmu.h>
#include <trace/events/kvm.h>
#include "sys_regs.h"
@@ -164,6 +166,45 @@ static bool access_sctlr(struct kvm_vcpu *vcpu,
return true;
}
+/* PMCR_EL0 accessor. Only called as long as MDCR_EL2.TPMCR is set. */
+static bool access_pmcr(struct kvm_vcpu *vcpu,
+ const struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ unsigned long val, n;
+
+ if (p->is_write) {
+ /* Only update writeable bits of PMCR */
+ if (!p->is_aarch32)
+ val = vcpu_sys_reg(vcpu, r->reg);
+ else
+ val = vcpu_cp15(vcpu, r->reg);
+ val &= ~ARMV8_PMCR_MASK;
+ val |= *vcpu_reg(vcpu, p->Rt) & ARMV8_PMCR_MASK;
+ if (!p->is_aarch32)
+ vcpu_sys_reg(vcpu, r->reg) = val;
+ else
+ vcpu_cp15(vcpu, r->reg) = val;
+ } else {
+ /*
+ * We reserve the last event counter for EL2-mode
+ * performance analysis hence we show one less
+ * event counter to the guest.
+ */
+ if (!p->is_aarch32)
+ val = vcpu_sys_reg(vcpu, r->reg);
+ else
+ val = vcpu_cp15(vcpu, r->reg);
+ n = (val >> ARMV8_PMCR_N_SHIFT) & ARMV8_PMCR_N_MASK;
+ n = (n) ? n - 1 : 0;
+ val &= ~(ARMV8_PMCR_N_MASK << ARMV8_PMCR_N_SHIFT);
+ val |= (n & ARMV8_PMCR_N_MASK) << ARMV8_PMCR_N_SHIFT;
+ *vcpu_reg(vcpu, p->Rt) = val;
+ }
+
+ return true;
+}
+
static bool trap_raz_wi(struct kvm_vcpu *vcpu,
const struct sys_reg_params *p,
const struct sys_reg_desc *r)
@@ -272,6 +313,20 @@ static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
{ Op0(0b10), Op1(0b000), CRn(0b0000), CRm((n)), Op2(0b111), \
trap_debug_regs, reset_val, (DBGWCR0_EL1 + (n)), 0 }
+/* Macro to expand the PMEVCNTRn_EL0 register */
+#define PMU_PMEVCNTR_EL0(n) \
+ /* PMEVCNTRn_EL0 */ \
+ { Op0(0b11), Op1(0b011), CRn(0b1110), \
+ CRm((0b1000 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)), \
+ NULL, reset_val, (PMEVCNTR0_EL0 + (n)*2), 0 }
+
+/* Macro to expand the PMEVTYPERn_EL0 register */
+#define PMU_PMEVTYPER_EL0(n) \
+ /* PMEVTYPERn_EL0 */ \
+ { Op0(0b11), Op1(0b011), CRn(0b1110), \
+ CRm((0b1100 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)), \
+ NULL, reset_val, (PMEVTYPER0_EL0 + (n)*2), 0 }
+
/*
* Architected system registers.
* Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -408,10 +463,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
/* PMINTENSET_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b1001), CRm(0b1110), Op2(0b001),
- trap_raz_wi },
- /* PMINTENCLR_EL1 */
- { Op0(0b11), Op1(0b000), CRn(0b1001), CRm(0b1110), Op2(0b010),
- trap_raz_wi },
+ NULL, reset_val, PMINTENSET_EL1, 0 },
/* MAIR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b1010), CRm(0b0010), Op2(0b000),
@@ -440,43 +492,22 @@ static const struct sys_reg_desc sys_reg_descs[] = {
/* PMCR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b000),
- trap_raz_wi },
+ access_pmcr, reset_val, PMCR_EL0, 0 },
/* PMCNTENSET_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b001),
- trap_raz_wi },
- /* PMCNTENCLR_EL0 */
- { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b010),
- trap_raz_wi },
- /* PMOVSCLR_EL0 */
- { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b011),
- trap_raz_wi },
- /* PMSWINC_EL0 */
- { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b100),
- trap_raz_wi },
+ NULL, reset_val, PMCNTENSET_EL0, 0 },
/* PMSELR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b101),
- trap_raz_wi },
- /* PMCEID0_EL0 */
- { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b110),
- trap_raz_wi },
- /* PMCEID1_EL0 */
- { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b111),
- trap_raz_wi },
+ NULL, reset_val, PMSELR_EL0 },
/* PMCCNTR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b000),
- trap_raz_wi },
- /* PMXEVTYPER_EL0 */
- { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b001),
- trap_raz_wi },
- /* PMXEVCNTR_EL0 */
- { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b010),
- trap_raz_wi },
+ NULL, reset_val, PMCCNTR_EL0, 0 },
/* PMUSERENR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1110), Op2(0b000),
- trap_raz_wi },
+ NULL, reset_val, PMUSERENR_EL0, 0 },
/* PMOVSSET_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1110), Op2(0b011),
- trap_raz_wi },
+ NULL, reset_val, PMOVSSET_EL0, 0 },
/* TPIDR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1101), CRm(0b0000), Op2(0b010),
@@ -485,6 +516,74 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ Op0(0b11), Op1(0b011), CRn(0b1101), CRm(0b0000), Op2(0b011),
NULL, reset_unknown, TPIDRRO_EL0 },
+ /* PMEVCNTRn_EL0 */
+ PMU_PMEVCNTR_EL0(0),
+ PMU_PMEVCNTR_EL0(1),
+ PMU_PMEVCNTR_EL0(2),
+ PMU_PMEVCNTR_EL0(3),
+ PMU_PMEVCNTR_EL0(4),
+ PMU_PMEVCNTR_EL0(5),
+ PMU_PMEVCNTR_EL0(6),
+ PMU_PMEVCNTR_EL0(7),
+ PMU_PMEVCNTR_EL0(8),
+ PMU_PMEVCNTR_EL0(9),
+ PMU_PMEVCNTR_EL0(10),
+ PMU_PMEVCNTR_EL0(11),
+ PMU_PMEVCNTR_EL0(12),
+ PMU_PMEVCNTR_EL0(13),
+ PMU_PMEVCNTR_EL0(14),
+ PMU_PMEVCNTR_EL0(15),
+ PMU_PMEVCNTR_EL0(16),
+ PMU_PMEVCNTR_EL0(17),
+ PMU_PMEVCNTR_EL0(18),
+ PMU_PMEVCNTR_EL0(19),
+ PMU_PMEVCNTR_EL0(20),
+ PMU_PMEVCNTR_EL0(21),
+ PMU_PMEVCNTR_EL0(22),
+ PMU_PMEVCNTR_EL0(23),
+ PMU_PMEVCNTR_EL0(24),
+ PMU_PMEVCNTR_EL0(25),
+ PMU_PMEVCNTR_EL0(26),
+ PMU_PMEVCNTR_EL0(27),
+ PMU_PMEVCNTR_EL0(28),
+ PMU_PMEVCNTR_EL0(29),
+ PMU_PMEVCNTR_EL0(30),
+ /* PMEVTYPERn_EL0 */
+ PMU_PMEVTYPER_EL0(0),
+ PMU_PMEVTYPER_EL0(1),
+ PMU_PMEVTYPER_EL0(2),
+ PMU_PMEVTYPER_EL0(3),
+ PMU_PMEVTYPER_EL0(4),
+ PMU_PMEVTYPER_EL0(5),
+ PMU_PMEVTYPER_EL0(6),
+ PMU_PMEVTYPER_EL0(7),
+ PMU_PMEVTYPER_EL0(8),
+ PMU_PMEVTYPER_EL0(9),
+ PMU_PMEVTYPER_EL0(10),
+ PMU_PMEVTYPER_EL0(11),
+ PMU_PMEVTYPER_EL0(12),
+ PMU_PMEVTYPER_EL0(13),
+ PMU_PMEVTYPER_EL0(14),
+ PMU_PMEVTYPER_EL0(15),
+ PMU_PMEVTYPER_EL0(16),
+ PMU_PMEVTYPER_EL0(17),
+ PMU_PMEVTYPER_EL0(18),
+ PMU_PMEVTYPER_EL0(19),
+ PMU_PMEVTYPER_EL0(20),
+ PMU_PMEVTYPER_EL0(21),
+ PMU_PMEVTYPER_EL0(22),
+ PMU_PMEVTYPER_EL0(23),
+ PMU_PMEVTYPER_EL0(24),
+ PMU_PMEVTYPER_EL0(25),
+ PMU_PMEVTYPER_EL0(26),
+ PMU_PMEVTYPER_EL0(27),
+ PMU_PMEVTYPER_EL0(28),
+ PMU_PMEVTYPER_EL0(29),
+ PMU_PMEVTYPER_EL0(30),
+ /* PMCCFILTR_EL0 */
+ { Op0(0b11), Op1(0b011), CRn(0b1110), CRm(0b1111), Op2(0b111),
+ NULL, reset_val, PMCCFILTR_EL0, 0 },
+
/* DACR32_EL2 */
{ Op0(0b11), Op1(0b100), CRn(0b0011), CRm(0b0000), Op2(0b000),
NULL, reset_unknown, DACR32_EL2 },
@@ -671,19 +770,7 @@ static const struct sys_reg_desc cp15_regs[] = {
{ Op1( 0), CRn( 7), CRm(14), Op2( 2), access_dcsw },
/* PMU */
- { Op1( 0), CRn( 9), CRm(12), Op2( 0), trap_raz_wi },
- { Op1( 0), CRn( 9), CRm(12), Op2( 1), trap_raz_wi },
- { Op1( 0), CRn( 9), CRm(12), Op2( 2), trap_raz_wi },
- { Op1( 0), CRn( 9), CRm(12), Op2( 3), trap_raz_wi },
- { Op1( 0), CRn( 9), CRm(12), Op2( 5), trap_raz_wi },
- { Op1( 0), CRn( 9), CRm(12), Op2( 6), trap_raz_wi },
- { Op1( 0), CRn( 9), CRm(12), Op2( 7), trap_raz_wi },
- { Op1( 0), CRn( 9), CRm(13), Op2( 0), trap_raz_wi },
- { Op1( 0), CRn( 9), CRm(13), Op2( 1), trap_raz_wi },
- { Op1( 0), CRn( 9), CRm(13), Op2( 2), trap_raz_wi },
- { Op1( 0), CRn( 9), CRm(14), Op2( 0), trap_raz_wi },
- { Op1( 0), CRn( 9), CRm(14), Op2( 1), trap_raz_wi },
- { Op1( 0), CRn( 9), CRm(14), Op2( 2), trap_raz_wi },
+ { Op1( 0), CRn( 9), CRm(12), Op2( 0), access_pmcr, NULL, c9_PMCR },
{ Op1( 0), CRn(10), CRm( 2), Op2( 0), access_vm_reg, NULL, c10_PRRR },
{ Op1( 0), CRn(10), CRm( 2), Op2( 1), access_vm_reg, NULL, c10_NMRR },
--
1.7.9.5
* [RFC PATCH 6/6] ARM64: KVM: Upgrade to lazy context switch of PMU registers
2014-08-05 9:24 ` Anup Patel
@ 2014-08-05 9:24 ` Anup Patel
-1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-08-05 9:24 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, kvm, patches, marc.zyngier, christoffer.dall,
will.deacon, ian.campbell, pranavkumar, Anup Patel
Full context switch of all PMU registers for both host and
guest can make KVM world-switch very expensive.
This patch improves the current PMU context switch by implementing
a lazy context switch of PMU registers.
To achieve this, we trap all PMU register accesses and use a
per-VCPU dirty flag to track whether the guest has updated any
PMU register. If the VCPU's PMU registers are dirty or its
PMCR_EL0.E bit is set, then we do a full context switch
for both host and guest.
(This is very similar to lazy world switch for debug registers:
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-July/271040.html)
Also, we always trap-and-emulate PMCR_EL0 to fake the number of
event counters available to the guest. For this PMCR_EL0
trap-and-emulate to work correctly, we always save/restore PMCR_EL0
for both host and guest, whereas the other PMU registers are
saved/restored based on the PMU dirty flag.
Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
---
arch/arm64/include/asm/kvm_asm.h | 3 +
arch/arm64/include/asm/kvm_host.h | 3 +
arch/arm64/kernel/asm-offsets.c | 1 +
arch/arm64/kvm/hyp.S | 63 ++++++++--
arch/arm64/kvm/sys_regs.c | 248 +++++++++++++++++++++++++++++++++++--
5 files changed, 298 insertions(+), 20 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 93be21f..47b7fcd 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -132,6 +132,9 @@
#define KVM_ARM64_DEBUG_DIRTY_SHIFT 0
#define KVM_ARM64_DEBUG_DIRTY (1 << KVM_ARM64_DEBUG_DIRTY_SHIFT)
+#define KVM_ARM64_PMU_DIRTY_SHIFT 0
+#define KVM_ARM64_PMU_DIRTY (1 << KVM_ARM64_PMU_DIRTY_SHIFT)
+
#ifndef __ASSEMBLY__
struct kvm;
struct kvm_vcpu;
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index ae4cdb2..4dba2a3 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -117,6 +117,9 @@ struct kvm_vcpu_arch {
/* Timer state */
struct arch_timer_cpu timer_cpu;
+ /* PMU flags */
+ u64 pmu_flags;
+
/* PMU state */
struct pmu_cpu pmu_cpu;
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 053dc3e..4234794 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -140,6 +140,7 @@ int main(void)
DEFINE(VGIC_CPU_NR_LR, offsetof(struct vgic_cpu, nr_lr));
DEFINE(KVM_VTTBR, offsetof(struct kvm, arch.vttbr));
DEFINE(KVM_VGIC_VCTRL, offsetof(struct kvm, arch.vgic.vctrl_base));
+ DEFINE(VCPU_PMU_FLAGS, offsetof(struct kvm_vcpu, arch.pmu_flags));
DEFINE(VCPU_PMU_IRQ_PENDING, offsetof(struct kvm_vcpu, arch.pmu_cpu.irq_pending));
#endif
#ifdef CONFIG_ARM64_CPU_SUSPEND
diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index 6b41c01..5f9ccee 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -443,6 +443,9 @@ __kvm_hyp_code_start:
and x5, x4, #~(ARMV8_PMCR_E)// Clear PMCR_EL0.E
msr pmcr_el0, x5 // This will stop all counters
+ ldr x5, [x0, #VCPU_PMU_FLAGS] // Only save if dirty flag set
+ tbz x5, #KVM_ARM64_PMU_DIRTY_SHIFT, 1f
+
mov x3, #0
ubfx x4, x4, #ARMV8_PMCR_N_SHIFT, #5 // Number of event counters
cmp x4, #0 // Skip if no event counters
@@ -731,7 +734,7 @@ __kvm_hyp_code_start:
msr mdccint_el1, x21
.endm
-.macro restore_pmu
+.macro restore_pmu, is_vcpu_pmu
// x2: base address for cpu context
// x3: mask of counters allowed in EL0 & EL1
// x4: number of event counters allowed in EL0 & EL1
@@ -741,16 +744,19 @@ __kvm_hyp_code_start:
cmp x5, #1 // Must be PMUv3 else skip
bne 1f
+ ldr x5, [x0, #VCPU_PMU_FLAGS] // Only restore if dirty flag set
+ tbz x5, #KVM_ARM64_PMU_DIRTY_SHIFT, 2f
+
mov x3, #0
mrs x4, pmcr_el0
ubfx x4, x4, #ARMV8_PMCR_N_SHIFT, #5 // Number of event counters
cmp x4, #0 // Skip if no event counters
- beq 2f
+ beq 3f
sub x4, x4, #1 // Last event counter is reserved
mov x3, #1
lsl x3, x3, x4
sub x3, x3, #1
-2: orr x3, x3, #(1 << 31) // Mask of event counters
+3: orr x3, x3, #(1 << 31) // Mask of event counters
ldr x5, [x2, #CPU_SYSREG_OFFSET(PMCCFILTR_EL0)]
msr pmccfiltr_el0, x5 // Restore PMCCFILTR_EL0
@@ -772,15 +778,15 @@ __kvm_hyp_code_start:
lsl x5, x4, #4
add x5, x5, #CPU_SYSREG_OFFSET(PMEVCNTR0_EL0)
add x5, x2, x5
-3: cmp x4, #0
- beq 4f
+4: cmp x4, #0
+ beq 5f
sub x4, x4, #1
ldp x6, x7, [x5, #-16]!
msr pmselr_el0, x4
msr pmxevcntr_el0, x6 // Restore PMEVCNTR<n>_EL0
msr pmxevtyper_el0, x7 // Restore PMEVTYPER<n>_EL0
- b 3b
-4:
+ b 4b
+5:
ldr x5, [x2, #CPU_SYSREG_OFFSET(PMSELR_EL0)]
msr pmselr_el0, x5 // Restore PMSELR_EL0
@@ -792,6 +798,13 @@ __kvm_hyp_code_start:
and x5, x5, x3
msr pmovsset_el0, x5 // Restore PMOVSSET_EL0
+ .if \is_vcpu_pmu == 0
+ // Clear the dirty flag for the next run, as all the state has
+ // already been saved. Note that we nuke the whole 64bit word.
+ // If we ever add more flags, we'll have to be more careful...
+ str xzr, [x0, #VCPU_PMU_FLAGS]
+ .endif
+2:
ldr x5, [x2, #CPU_SYSREG_OFFSET(PMCR_EL0)]
msr pmcr_el0, x5 // Restore PMCR_EL0
1:
@@ -838,6 +851,23 @@ __kvm_hyp_code_start:
9999:
.endm
+.macro compute_pmu_state
+ // Compute pmu state: If PMCR_EL0.E is set then
+ // we do full save/restore cycle and disable trapping
+ add x25, x0, #VCPU_CONTEXT
+
+ // Check the state of PMCR_EL0.E bit
+ ldr x26, [x25, #CPU_SYSREG_OFFSET(PMCR_EL0)]
+ and x26, x26, #ARMV8_PMCR_E
+ cmp x26, #0
+ b.eq 8887f
+
+ // If any interesting bits was set, we must set the flag
+ mov x26, #KVM_ARM64_PMU_DIRTY
+ str x26, [x0, #VCPU_PMU_FLAGS]
+8887:
+.endm
+
.macro save_guest_32bit_state
skip_32bit_state x3, 1f
@@ -919,6 +949,12 @@ __kvm_hyp_code_start:
orr x2, x2, #MDCR_EL2_TPMCR
orr x2, x2, #(MDCR_EL2_TDRA | MDCR_EL2_TDOSA)
+ // Check for KVM_ARM64_PMU_DIRTY, and set PMU to trap
+ // all PMU registers if PMU not dirty.
+ ldr x3, [x0, #VCPU_PMU_FLAGS]
+ tbnz x3, #KVM_ARM64_PMU_DIRTY_SHIFT, 1f
+ orr x2, x2, #MDCR_EL2_TPM
+1:
// Check for KVM_ARM64_DEBUG_DIRTY, and set debug to trap
// if not dirty.
ldr x3, [x0, #VCPU_DEBUG_FLAGS]
@@ -1127,8 +1163,12 @@ __save_pmu_guest:
save_pmu 1
ret
-__restore_pmu:
- restore_pmu
+__restore_pmu_host:
+ restore_pmu 0
+ ret
+
+__restore_pmu_guest:
+ restore_pmu 1
ret
__save_fpsimd:
@@ -1160,6 +1200,7 @@ ENTRY(__kvm_vcpu_run)
save_host_regs
+ compute_pmu_state
bl __save_pmu_host
bl __save_fpsimd
@@ -1185,7 +1226,7 @@ ENTRY(__kvm_vcpu_run)
1:
restore_guest_32bit_state
- bl __restore_pmu
+ bl __restore_pmu_guest
restore_guest_regs
@@ -1232,7 +1273,7 @@ __kvm_vcpu_return:
str xzr, [x0, #VCPU_DEBUG_FLAGS]
bl __restore_debug
1:
- bl __restore_pmu
+ bl __restore_pmu_host
restore_host_regs
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 081f95e..cda6774 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -166,6 +166,130 @@ static bool access_sctlr(struct kvm_vcpu *vcpu,
return true;
}
+/* PMU reg accessor. Only called as long as MDCR_EL2.TPMCR is set. */
+static bool access_pmu_reg(struct kvm_vcpu *vcpu,
+ const struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ unsigned long val;
+
+ if (p->is_write) {
+ val = *vcpu_reg(vcpu, p->Rt);
+ if (!p->is_aarch32)
+ vcpu_sys_reg(vcpu, r->reg) = val;
+ else
+ vcpu_cp15(vcpu, r->reg) = val & 0xffffffffUL;
+ vcpu->arch.pmu_flags |= KVM_ARM64_PMU_DIRTY;
+ } else {
+ if (!p->is_aarch32)
+ val = vcpu_sys_reg(vcpu, r->reg);
+ else
+ val = vcpu_cp15(vcpu, r->reg);
+ *vcpu_reg(vcpu, p->Rt) = val;
+ }
+
+ return true;
+}
+
+/* PMU set reg accessor. Only called as long as MDCR_EL2.TPM is set. */
+static bool access_pmu_setreg(struct kvm_vcpu *vcpu,
+ const struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ unsigned long val;
+
+ if (p->is_write) {
+ val = *vcpu_reg(vcpu, p->Rt);
+ if (!p->is_aarch32)
+ vcpu_sys_reg(vcpu, r->reg) |= val;
+ else
+ vcpu_cp15(vcpu, r->reg) |= val & 0xffffffffUL;
+ vcpu->arch.pmu_flags |= KVM_ARM64_PMU_DIRTY;
+ } else {
+ if (!p->is_aarch32)
+ val = vcpu_sys_reg(vcpu, r->reg);
+ else
+ val = vcpu_cp15(vcpu, r->reg);
+ *vcpu_reg(vcpu, p->Rt) = val;
+ }
+
+ return true;
+}
+
+/* PMU clear reg accessor. Only called as long as MDCR_EL2.TPM is set. */
+static bool access_pmu_clrreg(struct kvm_vcpu *vcpu,
+ const struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ unsigned long val;
+
+ if (p->is_write) {
+ val = *vcpu_reg(vcpu, p->Rt);
+ if (!p->is_aarch32)
+ vcpu_sys_reg(vcpu, r->reg) &= ~val;
+ else
+ vcpu_cp15(vcpu, r->reg) &= ~(val & 0xffffffffUL);
+ vcpu->arch.pmu_flags |= KVM_ARM64_PMU_DIRTY;
+ } else {
+ if (!p->is_aarch32)
+ val = vcpu_sys_reg(vcpu, r->reg);
+ else
+ val = vcpu_cp15(vcpu, r->reg);
+ *vcpu_reg(vcpu, p->Rt) = val;
+ }
+
+ return true;
+}
+
+/* PMU extended reg accessor. Only called as long as MDCR_EL2.TPM is set. */
+static bool access_pmu_xreg(struct kvm_vcpu *vcpu,
+ const struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ unsigned long index, reg, val;
+
+ if (!p->is_aarch32)
+ index = vcpu_sys_reg(vcpu, PMSELR_EL0) & ARMV8_PMCR_N_MASK;
+ else
+ index = vcpu_cp15(vcpu, c9_PMSELR) & ARMV8_PMCR_N_MASK;
+
+ if (index != ARMV8_PMCR_N_MASK) {
+ if (!p->is_aarch32) {
+ if (r->reg == PMEVCNTR0_EL0)
+ reg = PMCCNTR_EL0;
+ else
+ reg = PMCCFILTR_EL0;
+ } else {
+ if (r->reg == c14_PMEVCNTR0)
+ reg = c9_PMCCNTR;
+ else
+ reg = c14_PMCCFILTR;
+ }
+ } else {
+ if (!p->is_aarch32)
+ reg = r->reg + 2*index;
+ else
+ reg = r->reg + 4*index;
+ }
+
+ if (p->is_write) {
+ val = *vcpu_reg(vcpu, p->Rt);
+ if (!p->is_aarch32)
+ vcpu_sys_reg(vcpu, reg) = val;
+ else
+ vcpu_cp15(vcpu, reg) = val & 0xffffffffUL;
+ vcpu->arch.pmu_flags |= KVM_ARM64_PMU_DIRTY;
+ } else {
+ if (!p->is_aarch32)
+ val = vcpu_sys_reg(vcpu, reg);
+ else
+ val = vcpu_cp15(vcpu, reg);
+ *vcpu_reg(vcpu, p->Rt) = val;
+ }
+
+ return true;
+}
+
/* PMCR_EL0 accessor. Only called as long as MDCR_EL2.TPMCR is set. */
static bool access_pmcr(struct kvm_vcpu *vcpu,
const struct sys_reg_params *p,
@@ -185,6 +309,7 @@ static bool access_pmcr(struct kvm_vcpu *vcpu,
vcpu_sys_reg(vcpu, r->reg) = val;
else
vcpu_cp15(vcpu, r->reg) = val;
+ vcpu->arch.pmu_flags |= KVM_ARM64_PMU_DIRTY;
} else {
/*
* We reserve the last event counter for EL2-mode
@@ -318,14 +443,14 @@ static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
/* PMEVCNTRn_EL0 */ \
{ Op0(0b11), Op1(0b011), CRn(0b1110), \
CRm((0b1000 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)), \
- NULL, reset_val, (PMEVCNTR0_EL0 + (n)*2), 0 }
+ access_pmu_reg, reset_val, (PMEVCNTR0_EL0 + (n)*2), 0 }
/* Macro to expand the PMEVTYPERn_EL0 register */
#define PMU_PMEVTYPER_EL0(n) \
/* PMEVTYPERn_EL0 */ \
{ Op0(0b11), Op1(0b011), CRn(0b1110), \
CRm((0b1100 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)), \
- NULL, reset_val, (PMEVTYPER0_EL0 + (n)*2), 0 }
+ access_pmu_reg, reset_val, (PMEVTYPER0_EL0 + (n)*2), 0 }
/*
* Architected system registers.
@@ -463,7 +588,10 @@ static const struct sys_reg_desc sys_reg_descs[] = {
/* PMINTENSET_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b1001), CRm(0b1110), Op2(0b001),
- NULL, reset_val, PMINTENSET_EL1, 0 },
+ access_pmu_setreg, reset_val, PMINTENSET_EL1, 0 },
+ /* PMINTENCLR_EL1 */
+ { Op0(0b11), Op1(0b000), CRn(0b1001), CRm(0b1110), Op2(0b010),
+ access_pmu_clrreg, reset_val, PMINTENSET_EL1, 0 },
/* MAIR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b1010), CRm(0b0010), Op2(0b000),
@@ -495,19 +623,31 @@ static const struct sys_reg_desc sys_reg_descs[] = {
access_pmcr, reset_val, PMCR_EL0, 0 },
/* PMCNTENSET_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b001),
- NULL, reset_val, PMCNTENSET_EL0, 0 },
+ access_pmu_setreg, reset_val, PMCNTENSET_EL0, 0 },
+ /* PMCNTENCLR_EL0 */
+ { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b010),
+ access_pmu_clrreg, reset_val, PMCNTENSET_EL0, 0 },
+ /* PMOVSCLR_EL0 */
+ { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b011),
+ access_pmu_clrreg, reset_val, PMOVSSET_EL0, 0 },
/* PMSELR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b101),
- NULL, reset_val, PMSELR_EL0 },
+ access_pmu_reg, reset_val, PMSELR_EL0 },
/* PMCCNTR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b000),
- NULL, reset_val, PMCCNTR_EL0, 0 },
+ access_pmu_reg, reset_val, PMCCNTR_EL0, 0 },
+ /* PMXEVTYPER_EL0 */
+ { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b001),
+ access_pmu_xreg, reset_val, PMEVTYPER0_EL0, 0 },
+ /* PMXEVCNTR_EL0 */
+ { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b010),
+ access_pmu_xreg, reset_val, PMEVCNTR0_EL0, 0 },
/* PMUSERENR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1110), Op2(0b000),
- NULL, reset_val, PMUSERENR_EL0, 0 },
+ access_pmu_reg, reset_val, PMUSERENR_EL0, 0 },
/* PMOVSSET_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1110), Op2(0b011),
- NULL, reset_val, PMOVSSET_EL0, 0 },
+ access_pmu_setreg, reset_val, PMOVSSET_EL0, 0 },
/* TPIDR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1101), CRm(0b0000), Op2(0b010),
@@ -582,7 +722,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
PMU_PMEVTYPER_EL0(30),
/* PMCCFILTR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1110), CRm(0b1111), Op2(0b111),
- NULL, reset_val, PMCCFILTR_EL0, 0 },
+ access_pmu_reg, reset_val, PMCCFILTR_EL0, 0 },
/* DACR32_EL2 */
{ Op0(0b11), Op1(0b100), CRn(0b0011), CRm(0b0000), Op2(0b000),
@@ -744,6 +884,20 @@ static const struct sys_reg_desc cp14_64_regs[] = {
{ Op1( 0), CRm( 2), .access = trap_raz_wi },
};
+/* Macro to expand the PMEVCNTR<n> register */
+#define PMU_PMEVCNTR(n) \
+ /* PMEVCNTRn */ \
+ { Op1( 0), CRn(14), \
+ CRm((0b1000 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)), \
+ access_pmu_reg, reset_val, (c14_PMEVCNTR0 + (n)*4), 0 }
+
+/* Macro to expand the PMEVTYPER<n> register */
+#define PMU_PMEVTYPER(n) \
+ /* PMEVTYPERn_EL0 */ \
+ { Op1( 0), CRn(14), \
+ CRm((0b1100 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)), \
+ access_pmu_reg, reset_val, (c14_PMEVTYPR0 + (n)*4), 0 }
+
/*
* Trapped cp15 registers. TTBR0/TTBR1 get a double encoding,
* depending on the way they are accessed (as a 32bit or a 64bit
@@ -771,12 +925,88 @@ static const struct sys_reg_desc cp15_regs[] = {
/* PMU */
{ Op1( 0), CRn( 9), CRm(12), Op2( 0), access_pmcr, NULL, c9_PMCR },
+ { Op1( 0), CRn( 9), CRm(12), Op2( 1), access_pmu_setreg, NULL, c9_PMCNTENSET },
+ { Op1( 0), CRn( 9), CRm(12), Op2( 2), access_pmu_clrreg, NULL, c9_PMCNTENSET },
+ { Op1( 0), CRn( 9), CRm(12), Op2( 3), access_pmu_clrreg, NULL, c9_PMOVSSET },
+ { Op1( 0), CRn( 9), CRm(12), Op2( 5), access_pmu_reg, NULL, c9_PMSELR },
+ { Op1( 0), CRn( 9), CRm(13), Op2( 0), access_pmu_reg, NULL, c9_PMCCNTR },
+ { Op1( 0), CRn( 9), CRm(13), Op2( 1), access_pmu_xreg, NULL, c14_PMEVTYPR0 },
+ { Op1( 0), CRn( 9), CRm(13), Op2( 2), access_pmu_xreg, NULL, c14_PMEVCNTR0 },
+ { Op1( 0), CRn( 9), CRm(14), Op2( 0), access_pmu_reg, NULL, c9_PMUSERENR },
+ { Op1( 0), CRn( 9), CRm(14), Op2( 1), access_pmu_setreg, NULL, c9_PMINTENSET },
+ { Op1( 0), CRn( 9), CRm(14), Op2( 2), access_pmu_clrreg, NULL, c9_PMINTENSET },
+ { Op1( 0), CRn( 9), CRm(14), Op2( 3), access_pmu_setreg, NULL, c9_PMOVSSET },
{ Op1( 0), CRn(10), CRm( 2), Op2( 0), access_vm_reg, NULL, c10_PRRR },
{ Op1( 0), CRn(10), CRm( 2), Op2( 1), access_vm_reg, NULL, c10_NMRR },
{ Op1( 0), CRn(10), CRm( 3), Op2( 0), access_vm_reg, NULL, c10_AMAIR0 },
{ Op1( 0), CRn(10), CRm( 3), Op2( 1), access_vm_reg, NULL, c10_AMAIR1 },
{ Op1( 0), CRn(13), CRm( 0), Op2( 1), access_vm_reg, NULL, c13_CID },
+
+ /* PMU */
+ PMU_PMEVCNTR(0),
+ PMU_PMEVCNTR(1),
+ PMU_PMEVCNTR(2),
+ PMU_PMEVCNTR(3),
+ PMU_PMEVCNTR(4),
+ PMU_PMEVCNTR(5),
+ PMU_PMEVCNTR(6),
+ PMU_PMEVCNTR(7),
+ PMU_PMEVCNTR(8),
+ PMU_PMEVCNTR(9),
+ PMU_PMEVCNTR(10),
+ PMU_PMEVCNTR(11),
+ PMU_PMEVCNTR(12),
+ PMU_PMEVCNTR(13),
+ PMU_PMEVCNTR(14),
+ PMU_PMEVCNTR(15),
+ PMU_PMEVCNTR(16),
+ PMU_PMEVCNTR(17),
+ PMU_PMEVCNTR(18),
+ PMU_PMEVCNTR(19),
+ PMU_PMEVCNTR(20),
+ PMU_PMEVCNTR(21),
+ PMU_PMEVCNTR(22),
+ PMU_PMEVCNTR(23),
+ PMU_PMEVCNTR(24),
+ PMU_PMEVCNTR(25),
+ PMU_PMEVCNTR(26),
+ PMU_PMEVCNTR(27),
+ PMU_PMEVCNTR(28),
+ PMU_PMEVCNTR(29),
+ PMU_PMEVCNTR(30),
+ PMU_PMEVTYPER(0),
+ PMU_PMEVTYPER(1),
+ PMU_PMEVTYPER(2),
+ PMU_PMEVTYPER(3),
+ PMU_PMEVTYPER(4),
+ PMU_PMEVTYPER(5),
+ PMU_PMEVTYPER(6),
+ PMU_PMEVTYPER(7),
+ PMU_PMEVTYPER(8),
+ PMU_PMEVTYPER(9),
+ PMU_PMEVTYPER(10),
+ PMU_PMEVTYPER(11),
+ PMU_PMEVTYPER(12),
+ PMU_PMEVTYPER(13),
+ PMU_PMEVTYPER(14),
+ PMU_PMEVTYPER(15),
+ PMU_PMEVTYPER(16),
+ PMU_PMEVTYPER(17),
+ PMU_PMEVTYPER(18),
+ PMU_PMEVTYPER(19),
+ PMU_PMEVTYPER(20),
+ PMU_PMEVTYPER(21),
+ PMU_PMEVTYPER(22),
+ PMU_PMEVTYPER(23),
+ PMU_PMEVTYPER(24),
+ PMU_PMEVTYPER(25),
+ PMU_PMEVTYPER(26),
+ PMU_PMEVTYPER(27),
+ PMU_PMEVTYPER(28),
+ PMU_PMEVTYPER(29),
+ PMU_PMEVTYPER(30),
+ { Op1( 0), CRn(14), CRm(15), Op2( 7), access_pmu_reg, NULL, c14_PMCCFILTR },
};
static const struct sys_reg_desc cp15_64_regs[] = {
--
1.7.9.5
^ permalink raw reply related [flat|nested] 78+ messages in thread
* [RFC PATCH 6/6] ARM64: KVM: Upgrade to lazy context switch of PMU registers
@ 2014-08-05 9:24 ` Anup Patel
0 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-08-05 9:24 UTC (permalink / raw)
To: linux-arm-kernel
Full context switch of all PMU registers for both host and
guest can make the KVM world-switch very expensive.
This patch improves on the current PMU context switch by
implementing a lazy context switch of the PMU registers.
To achieve this, we trap all PMU register accesses and use a
per-VCPU dirty flag to track whether the guest has updated
any PMU register. If the VCPU's PMU registers are dirty, or
its PMCR_EL0.E bit is set, then we do a full context switch
for both host and guest.
(This is very similar to lazy world switch for debug registers:
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-July/271040.html)
Also, we always trap-and-emulate PMCR_EL0 to fake the number
of event counters available to the guest. For this PMCR_EL0
trap-and-emulation to work correctly, we always save/restore
PMCR_EL0 for both host and guest, whereas the other PMU
registers are saved/restored based on the PMU dirty flag.
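In C terms, the decision implemented in hyp.S (compute_pmu_state plus the dirty-flag guards in save_pmu/restore_pmu) boils down to the following sketch. The helper name is hypothetical and only illustrative; the real logic lives in assembly:

```c
#include <stdbool.h>
#include <stdint.h>

#define KVM_ARM64_PMU_DIRTY_SHIFT 0
#define KVM_ARM64_PMU_DIRTY (1UL << KVM_ARM64_PMU_DIRTY_SHIFT)
#define ARMV8_PMCR_E (1UL << 0)	/* PMCR_EL0.E: enable all counters */

/*
 * Decide whether this world-switch needs a full PMU save/restore.
 * We take the full path when the guest has touched a PMU register
 * (dirty flag set) or has the PMU enabled via PMCR_EL0.E; otherwise
 * we leave MDCR_EL2.TPM set to trap PMU accesses and skip the
 * expensive register shuffling entirely.
 */
static bool pmu_needs_full_switch(uint64_t pmu_flags, uint64_t pmcr_el0)
{
	return (pmu_flags & KVM_ARM64_PMU_DIRTY) || (pmcr_el0 & ARMV8_PMCR_E);
}
```

Either condition alone is enough to force the full save/restore cycle; PMCR_EL0 itself is always saved/restored regardless, as described above.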
Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
---
arch/arm64/include/asm/kvm_asm.h | 3 +
arch/arm64/include/asm/kvm_host.h | 3 +
arch/arm64/kernel/asm-offsets.c | 1 +
arch/arm64/kvm/hyp.S | 63 ++++++++--
arch/arm64/kvm/sys_regs.c | 248 +++++++++++++++++++++++++++++++++++--
5 files changed, 298 insertions(+), 20 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 93be21f..47b7fcd 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -132,6 +132,9 @@
#define KVM_ARM64_DEBUG_DIRTY_SHIFT 0
#define KVM_ARM64_DEBUG_DIRTY (1 << KVM_ARM64_DEBUG_DIRTY_SHIFT)
+#define KVM_ARM64_PMU_DIRTY_SHIFT 0
+#define KVM_ARM64_PMU_DIRTY (1 << KVM_ARM64_PMU_DIRTY_SHIFT)
+
#ifndef __ASSEMBLY__
struct kvm;
struct kvm_vcpu;
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index ae4cdb2..4dba2a3 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -117,6 +117,9 @@ struct kvm_vcpu_arch {
/* Timer state */
struct arch_timer_cpu timer_cpu;
+ /* PMU flags */
+ u64 pmu_flags;
+
/* PMU state */
struct pmu_cpu pmu_cpu;
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 053dc3e..4234794 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -140,6 +140,7 @@ int main(void)
DEFINE(VGIC_CPU_NR_LR, offsetof(struct vgic_cpu, nr_lr));
DEFINE(KVM_VTTBR, offsetof(struct kvm, arch.vttbr));
DEFINE(KVM_VGIC_VCTRL, offsetof(struct kvm, arch.vgic.vctrl_base));
+ DEFINE(VCPU_PMU_FLAGS, offsetof(struct kvm_vcpu, arch.pmu_flags));
DEFINE(VCPU_PMU_IRQ_PENDING, offsetof(struct kvm_vcpu, arch.pmu_cpu.irq_pending));
#endif
#ifdef CONFIG_ARM64_CPU_SUSPEND
diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index 6b41c01..5f9ccee 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -443,6 +443,9 @@ __kvm_hyp_code_start:
and x5, x4, #~(ARMV8_PMCR_E)// Clear PMCR_EL0.E
msr pmcr_el0, x5 // This will stop all counters
+ ldr x5, [x0, #VCPU_PMU_FLAGS] // Only save if dirty flag set
+ tbz x5, #KVM_ARM64_PMU_DIRTY_SHIFT, 1f
+
mov x3, #0
ubfx x4, x4, #ARMV8_PMCR_N_SHIFT, #5 // Number of event counters
cmp x4, #0 // Skip if no event counters
@@ -731,7 +734,7 @@ __kvm_hyp_code_start:
msr mdccint_el1, x21
.endm
-.macro restore_pmu
+.macro restore_pmu, is_vcpu_pmu
// x2: base address for cpu context
// x3: mask of counters allowed in EL0 & EL1
// x4: number of event counters allowed in EL0 & EL1
@@ -741,16 +744,19 @@ __kvm_hyp_code_start:
cmp x5, #1 // Must be PMUv3 else skip
bne 1f
+ ldr x5, [x0, #VCPU_PMU_FLAGS] // Only restore if dirty flag set
+ tbz x5, #KVM_ARM64_PMU_DIRTY_SHIFT, 2f
+
mov x3, #0
mrs x4, pmcr_el0
ubfx x4, x4, #ARMV8_PMCR_N_SHIFT, #5 // Number of event counters
cmp x4, #0 // Skip if no event counters
- beq 2f
+ beq 3f
sub x4, x4, #1 // Last event counter is reserved
mov x3, #1
lsl x3, x3, x4
sub x3, x3, #1
-2: orr x3, x3, #(1 << 31) // Mask of event counters
+3: orr x3, x3, #(1 << 31) // Mask of event counters
ldr x5, [x2, #CPU_SYSREG_OFFSET(PMCCFILTR_EL0)]
msr pmccfiltr_el0, x5 // Restore PMCCFILTR_EL0
@@ -772,15 +778,15 @@ __kvm_hyp_code_start:
lsl x5, x4, #4
add x5, x5, #CPU_SYSREG_OFFSET(PMEVCNTR0_EL0)
add x5, x2, x5
-3: cmp x4, #0
- beq 4f
+4: cmp x4, #0
+ beq 5f
sub x4, x4, #1
ldp x6, x7, [x5, #-16]!
msr pmselr_el0, x4
msr pmxevcntr_el0, x6 // Restore PMEVCNTR<n>_EL0
msr pmxevtyper_el0, x7 // Restore PMEVTYPER<n>_EL0
- b 3b
-4:
+ b 4b
+5:
ldr x5, [x2, #CPU_SYSREG_OFFSET(PMSELR_EL0)]
msr pmselr_el0, x5 // Restore PMSELR_EL0
@@ -792,6 +798,13 @@ __kvm_hyp_code_start:
and x5, x5, x3
msr pmovsset_el0, x5 // Restore PMOVSSET_EL0
+ .if \is_vcpu_pmu == 0
+ // Clear the dirty flag for the next run, as all the state has
+ // already been saved. Note that we nuke the whole 64bit word.
+ // If we ever add more flags, we'll have to be more careful...
+ str xzr, [x0, #VCPU_PMU_FLAGS]
+ .endif
+2:
ldr x5, [x2, #CPU_SYSREG_OFFSET(PMCR_EL0)]
msr pmcr_el0, x5 // Restore PMCR_EL0
1:
@@ -838,6 +851,23 @@ __kvm_hyp_code_start:
9999:
.endm
+.macro compute_pmu_state
+ // Compute the PMU state: if PMCR_EL0.E is set, then
+ // we do a full save/restore cycle and disable trapping
+ add x25, x0, #VCPU_CONTEXT
+
+ // Check the state of PMCR_EL0.E bit
+ ldr x26, [x25, #CPU_SYSREG_OFFSET(PMCR_EL0)]
+ and x26, x26, #ARMV8_PMCR_E
+ cmp x26, #0
+ b.eq 8887f
+
+ // If any interesting bits were set, we must set the flag
+ mov x26, #KVM_ARM64_PMU_DIRTY
+ str x26, [x0, #VCPU_PMU_FLAGS]
+8887:
+.endm
+
.macro save_guest_32bit_state
skip_32bit_state x3, 1f
@@ -919,6 +949,12 @@ __kvm_hyp_code_start:
orr x2, x2, #MDCR_EL2_TPMCR
orr x2, x2, #(MDCR_EL2_TDRA | MDCR_EL2_TDOSA)
+ // Check for KVM_ARM64_PMU_DIRTY, and set PMU to trap
+ // all PMU registers if PMU not dirty.
+ ldr x3, [x0, #VCPU_PMU_FLAGS]
+ tbnz x3, #KVM_ARM64_PMU_DIRTY_SHIFT, 1f
+ orr x2, x2, #MDCR_EL2_TPM
+1:
// Check for KVM_ARM64_DEBUG_DIRTY, and set debug to trap
// if not dirty.
ldr x3, [x0, #VCPU_DEBUG_FLAGS]
@@ -1127,8 +1163,12 @@ __save_pmu_guest:
save_pmu 1
ret
-__restore_pmu:
- restore_pmu
+__restore_pmu_host:
+ restore_pmu 0
+ ret
+
+__restore_pmu_guest:
+ restore_pmu 1
ret
__save_fpsimd:
@@ -1160,6 +1200,7 @@ ENTRY(__kvm_vcpu_run)
save_host_regs
+ compute_pmu_state
bl __save_pmu_host
bl __save_fpsimd
@@ -1185,7 +1226,7 @@ ENTRY(__kvm_vcpu_run)
1:
restore_guest_32bit_state
- bl __restore_pmu
+ bl __restore_pmu_guest
restore_guest_regs
@@ -1232,7 +1273,7 @@ __kvm_vcpu_return:
str xzr, [x0, #VCPU_DEBUG_FLAGS]
bl __restore_debug
1:
- bl __restore_pmu
+ bl __restore_pmu_host
restore_host_regs
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 081f95e..cda6774 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -166,6 +166,130 @@ static bool access_sctlr(struct kvm_vcpu *vcpu,
return true;
}
+/* PMU reg accessor. Only called as long as MDCR_EL2.TPM is set. */
+static bool access_pmu_reg(struct kvm_vcpu *vcpu,
+ const struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ unsigned long val;
+
+ if (p->is_write) {
+ val = *vcpu_reg(vcpu, p->Rt);
+ if (!p->is_aarch32)
+ vcpu_sys_reg(vcpu, r->reg) = val;
+ else
+ vcpu_cp15(vcpu, r->reg) = val & 0xffffffffUL;
+ vcpu->arch.pmu_flags |= KVM_ARM64_PMU_DIRTY;
+ } else {
+ if (!p->is_aarch32)
+ val = vcpu_sys_reg(vcpu, r->reg);
+ else
+ val = vcpu_cp15(vcpu, r->reg);
+ *vcpu_reg(vcpu, p->Rt) = val;
+ }
+
+ return true;
+}
+
+/* PMU set reg accessor. Only called as long as MDCR_EL2.TPM is set. */
+static bool access_pmu_setreg(struct kvm_vcpu *vcpu,
+ const struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ unsigned long val;
+
+ if (p->is_write) {
+ val = *vcpu_reg(vcpu, p->Rt);
+ if (!p->is_aarch32)
+ vcpu_sys_reg(vcpu, r->reg) |= val;
+ else
+ vcpu_cp15(vcpu, r->reg) |= val & 0xffffffffUL;
+ vcpu->arch.pmu_flags |= KVM_ARM64_PMU_DIRTY;
+ } else {
+ if (!p->is_aarch32)
+ val = vcpu_sys_reg(vcpu, r->reg);
+ else
+ val = vcpu_cp15(vcpu, r->reg);
+ *vcpu_reg(vcpu, p->Rt) = val;
+ }
+
+ return true;
+}
+
+/* PMU clear reg accessor. Only called as long as MDCR_EL2.TPM is set. */
+static bool access_pmu_clrreg(struct kvm_vcpu *vcpu,
+ const struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ unsigned long val;
+
+ if (p->is_write) {
+ val = *vcpu_reg(vcpu, p->Rt);
+ if (!p->is_aarch32)
+ vcpu_sys_reg(vcpu, r->reg) &= ~val;
+ else
+ vcpu_cp15(vcpu, r->reg) &= ~(val & 0xffffffffUL);
+ vcpu->arch.pmu_flags |= KVM_ARM64_PMU_DIRTY;
+ } else {
+ if (!p->is_aarch32)
+ val = vcpu_sys_reg(vcpu, r->reg);
+ else
+ val = vcpu_cp15(vcpu, r->reg);
+ *vcpu_reg(vcpu, p->Rt) = val;
+ }
+
+ return true;
+}
+
+/* PMU extended reg accessor. Only called as long as MDCR_EL2.TPM is set. */
+static bool access_pmu_xreg(struct kvm_vcpu *vcpu,
+ const struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ unsigned long index, reg, val;
+
+ if (!p->is_aarch32)
+ index = vcpu_sys_reg(vcpu, PMSELR_EL0) & ARMV8_PMCR_N_MASK;
+ else
+ index = vcpu_cp15(vcpu, c9_PMSELR) & ARMV8_PMCR_N_MASK;
+
+ /* A PMSELR SEL field of all-ones selects the cycle counter */
+ if (index == ARMV8_PMCR_N_MASK) {
+ if (!p->is_aarch32) {
+ if (r->reg == PMEVCNTR0_EL0)
+ reg = PMCCNTR_EL0;
+ else
+ reg = PMCCFILTR_EL0;
+ } else {
+ if (r->reg == c14_PMEVCNTR0)
+ reg = c9_PMCCNTR;
+ else
+ reg = c14_PMCCFILTR;
+ }
+ } else {
+ if (!p->is_aarch32)
+ reg = r->reg + 2*index;
+ else
+ reg = r->reg + 4*index;
+ }
+
+ if (p->is_write) {
+ val = *vcpu_reg(vcpu, p->Rt);
+ if (!p->is_aarch32)
+ vcpu_sys_reg(vcpu, reg) = val;
+ else
+ vcpu_cp15(vcpu, reg) = val & 0xffffffffUL;
+ vcpu->arch.pmu_flags |= KVM_ARM64_PMU_DIRTY;
+ } else {
+ if (!p->is_aarch32)
+ val = vcpu_sys_reg(vcpu, reg);
+ else
+ val = vcpu_cp15(vcpu, reg);
+ *vcpu_reg(vcpu, p->Rt) = val;
+ }
+
+ return true;
+}
+
/* PMCR_EL0 accessor. Only called as long as MDCR_EL2.TPMCR is set. */
static bool access_pmcr(struct kvm_vcpu *vcpu,
const struct sys_reg_params *p,
@@ -185,6 +309,7 @@ static bool access_pmcr(struct kvm_vcpu *vcpu,
vcpu_sys_reg(vcpu, r->reg) = val;
else
vcpu_cp15(vcpu, r->reg) = val;
+ vcpu->arch.pmu_flags |= KVM_ARM64_PMU_DIRTY;
} else {
/*
* We reserve the last event counter for EL2-mode
@@ -318,14 +443,14 @@ static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
/* PMEVCNTRn_EL0 */ \
{ Op0(0b11), Op1(0b011), CRn(0b1110), \
CRm((0b1000 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)), \
- NULL, reset_val, (PMEVCNTR0_EL0 + (n)*2), 0 }
+ access_pmu_reg, reset_val, (PMEVCNTR0_EL0 + (n)*2), 0 }
/* Macro to expand the PMEVTYPERn_EL0 register */
#define PMU_PMEVTYPER_EL0(n) \
/* PMEVTYPERn_EL0 */ \
{ Op0(0b11), Op1(0b011), CRn(0b1110), \
CRm((0b1100 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)), \
- NULL, reset_val, (PMEVTYPER0_EL0 + (n)*2), 0 }
+ access_pmu_reg, reset_val, (PMEVTYPER0_EL0 + (n)*2), 0 }
/*
* Architected system registers.
@@ -463,7 +588,10 @@ static const struct sys_reg_desc sys_reg_descs[] = {
/* PMINTENSET_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b1001), CRm(0b1110), Op2(0b001),
- NULL, reset_val, PMINTENSET_EL1, 0 },
+ access_pmu_setreg, reset_val, PMINTENSET_EL1, 0 },
+ /* PMINTENCLR_EL1 */
+ { Op0(0b11), Op1(0b000), CRn(0b1001), CRm(0b1110), Op2(0b010),
+ access_pmu_clrreg, reset_val, PMINTENSET_EL1, 0 },
/* MAIR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b1010), CRm(0b0010), Op2(0b000),
@@ -495,19 +623,31 @@ static const struct sys_reg_desc sys_reg_descs[] = {
access_pmcr, reset_val, PMCR_EL0, 0 },
/* PMCNTENSET_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b001),
- NULL, reset_val, PMCNTENSET_EL0, 0 },
+ access_pmu_setreg, reset_val, PMCNTENSET_EL0, 0 },
+ /* PMCNTENCLR_EL0 */
+ { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b010),
+ access_pmu_clrreg, reset_val, PMCNTENSET_EL0, 0 },
+ /* PMOVSCLR_EL0 */
+ { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b011),
+ access_pmu_clrreg, reset_val, PMOVSSET_EL0, 0 },
/* PMSELR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b101),
- NULL, reset_val, PMSELR_EL0 },
+ access_pmu_reg, reset_val, PMSELR_EL0 },
/* PMCCNTR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b000),
- NULL, reset_val, PMCCNTR_EL0, 0 },
+ access_pmu_reg, reset_val, PMCCNTR_EL0, 0 },
+ /* PMXEVTYPER_EL0 */
+ { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b001),
+ access_pmu_xreg, reset_val, PMEVTYPER0_EL0, 0 },
+ /* PMXEVCNTR_EL0 */
+ { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b010),
+ access_pmu_xreg, reset_val, PMEVCNTR0_EL0, 0 },
/* PMUSERENR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1110), Op2(0b000),
- NULL, reset_val, PMUSERENR_EL0, 0 },
+ access_pmu_reg, reset_val, PMUSERENR_EL0, 0 },
/* PMOVSSET_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1110), Op2(0b011),
- NULL, reset_val, PMOVSSET_EL0, 0 },
+ access_pmu_setreg, reset_val, PMOVSSET_EL0, 0 },
/* TPIDR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1101), CRm(0b0000), Op2(0b010),
@@ -582,7 +722,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
PMU_PMEVTYPER_EL0(30),
/* PMCCFILTR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1110), CRm(0b1111), Op2(0b111),
- NULL, reset_val, PMCCFILTR_EL0, 0 },
+ access_pmu_reg, reset_val, PMCCFILTR_EL0, 0 },
/* DACR32_EL2 */
{ Op0(0b11), Op1(0b100), CRn(0b0011), CRm(0b0000), Op2(0b000),
@@ -744,6 +884,20 @@ static const struct sys_reg_desc cp14_64_regs[] = {
{ Op1( 0), CRm( 2), .access = trap_raz_wi },
};
+/* Macro to expand the PMEVCNTR<n> register */
+#define PMU_PMEVCNTR(n) \
+ /* PMEVCNTRn */ \
+ { Op1( 0), CRn(14), \
+ CRm((0b1000 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)), \
+ access_pmu_reg, reset_val, (c14_PMEVCNTR0 + (n)*4), 0 }
+
+/* Macro to expand the PMEVTYPER<n> register */
+#define PMU_PMEVTYPER(n) \
+ /* PMEVTYPERn_EL0 */ \
+ { Op1( 0), CRn(14), \
+ CRm((0b1100 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)), \
+ access_pmu_reg, reset_val, (c14_PMEVTYPR0 + (n)*4), 0 }
+
/*
* Trapped cp15 registers. TTBR0/TTBR1 get a double encoding,
* depending on the way they are accessed (as a 32bit or a 64bit
@@ -771,12 +925,88 @@ static const struct sys_reg_desc cp15_regs[] = {
/* PMU */
{ Op1( 0), CRn( 9), CRm(12), Op2( 0), access_pmcr, NULL, c9_PMCR },
+ { Op1( 0), CRn( 9), CRm(12), Op2( 1), access_pmu_setreg, NULL, c9_PMCNTENSET },
+ { Op1( 0), CRn( 9), CRm(12), Op2( 2), access_pmu_clrreg, NULL, c9_PMCNTENSET },
+ { Op1( 0), CRn( 9), CRm(12), Op2( 3), access_pmu_clrreg, NULL, c9_PMOVSSET },
+ { Op1( 0), CRn( 9), CRm(12), Op2( 5), access_pmu_reg, NULL, c9_PMSELR },
+ { Op1( 0), CRn( 9), CRm(13), Op2( 0), access_pmu_reg, NULL, c9_PMCCNTR },
+ { Op1( 0), CRn( 9), CRm(13), Op2( 1), access_pmu_xreg, NULL, c14_PMEVTYPR0 },
+ { Op1( 0), CRn( 9), CRm(13), Op2( 2), access_pmu_xreg, NULL, c14_PMEVCNTR0 },
+ { Op1( 0), CRn( 9), CRm(14), Op2( 0), access_pmu_reg, NULL, c9_PMUSERENR },
+ { Op1( 0), CRn( 9), CRm(14), Op2( 1), access_pmu_setreg, NULL, c9_PMINTENSET },
+ { Op1( 0), CRn( 9), CRm(14), Op2( 2), access_pmu_clrreg, NULL, c9_PMINTENSET },
+ { Op1( 0), CRn( 9), CRm(14), Op2( 3), access_pmu_setreg, NULL, c9_PMOVSSET },
{ Op1( 0), CRn(10), CRm( 2), Op2( 0), access_vm_reg, NULL, c10_PRRR },
{ Op1( 0), CRn(10), CRm( 2), Op2( 1), access_vm_reg, NULL, c10_NMRR },
{ Op1( 0), CRn(10), CRm( 3), Op2( 0), access_vm_reg, NULL, c10_AMAIR0 },
{ Op1( 0), CRn(10), CRm( 3), Op2( 1), access_vm_reg, NULL, c10_AMAIR1 },
{ Op1( 0), CRn(13), CRm( 0), Op2( 1), access_vm_reg, NULL, c13_CID },
+
+ /* PMU */
+ PMU_PMEVCNTR(0),
+ PMU_PMEVCNTR(1),
+ PMU_PMEVCNTR(2),
+ PMU_PMEVCNTR(3),
+ PMU_PMEVCNTR(4),
+ PMU_PMEVCNTR(5),
+ PMU_PMEVCNTR(6),
+ PMU_PMEVCNTR(7),
+ PMU_PMEVCNTR(8),
+ PMU_PMEVCNTR(9),
+ PMU_PMEVCNTR(10),
+ PMU_PMEVCNTR(11),
+ PMU_PMEVCNTR(12),
+ PMU_PMEVCNTR(13),
+ PMU_PMEVCNTR(14),
+ PMU_PMEVCNTR(15),
+ PMU_PMEVCNTR(16),
+ PMU_PMEVCNTR(17),
+ PMU_PMEVCNTR(18),
+ PMU_PMEVCNTR(19),
+ PMU_PMEVCNTR(20),
+ PMU_PMEVCNTR(21),
+ PMU_PMEVCNTR(22),
+ PMU_PMEVCNTR(23),
+ PMU_PMEVCNTR(24),
+ PMU_PMEVCNTR(25),
+ PMU_PMEVCNTR(26),
+ PMU_PMEVCNTR(27),
+ PMU_PMEVCNTR(28),
+ PMU_PMEVCNTR(29),
+ PMU_PMEVCNTR(30),
+ PMU_PMEVTYPER(0),
+ PMU_PMEVTYPER(1),
+ PMU_PMEVTYPER(2),
+ PMU_PMEVTYPER(3),
+ PMU_PMEVTYPER(4),
+ PMU_PMEVTYPER(5),
+ PMU_PMEVTYPER(6),
+ PMU_PMEVTYPER(7),
+ PMU_PMEVTYPER(8),
+ PMU_PMEVTYPER(9),
+ PMU_PMEVTYPER(10),
+ PMU_PMEVTYPER(11),
+ PMU_PMEVTYPER(12),
+ PMU_PMEVTYPER(13),
+ PMU_PMEVTYPER(14),
+ PMU_PMEVTYPER(15),
+ PMU_PMEVTYPER(16),
+ PMU_PMEVTYPER(17),
+ PMU_PMEVTYPER(18),
+ PMU_PMEVTYPER(19),
+ PMU_PMEVTYPER(20),
+ PMU_PMEVTYPER(21),
+ PMU_PMEVTYPER(22),
+ PMU_PMEVTYPER(23),
+ PMU_PMEVTYPER(24),
+ PMU_PMEVTYPER(25),
+ PMU_PMEVTYPER(26),
+ PMU_PMEVTYPER(27),
+ PMU_PMEVTYPER(28),
+ PMU_PMEVTYPER(29),
+ PMU_PMEVTYPER(30),
+ { Op1( 0), CRn(14), CRm(15), Op2( 7), access_pmu_reg, NULL, c14_PMCCFILTR },
};
static const struct sys_reg_desc cp15_64_regs[] = {
--
1.7.9.5
^ permalink raw reply related [flat|nested] 78+ messages in thread
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2014-08-05 9:24 ` Anup Patel
@ 2014-08-05 9:32 ` Anup Patel
1 sibling, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-08-05 9:32 UTC (permalink / raw)
To: Anup Patel
Cc: Ian Campbell, kvm, Marc Zyngier, will.deacon, patches,
linux-arm-kernel, kvmarm, Christoffer Dall,
Pranavkumar Sawargaonkar
[-- Attachment #1: Type: text/plain, Size: 3386 bytes --]
On Tue, Aug 5, 2014 at 2:54 PM, Anup Patel <anup.patel@linaro.org> wrote:
> This patchset enables PMU virtualization in KVM ARM64. The
> Guest can now directly use PMU available on the host HW.
>
> The virtual PMU IRQ injection for Guest VCPUs is managed by
> small piece of code shared between KVM ARM and KVM ARM64. The
> virtual PMU IRQ number will be based on Guest machine model and
> user space will provide it using set device address vm ioctl.
>
> The second last patch of this series implements full context
> switch of PMU registers which will context switch all PMU
> registers on every KVM world-switch.
>
> The last patch implements a lazy context switch of PMU registers
> which is very similar to lazy debug context switch.
> (Refer, http://lists.infradead.org/pipermail/linux-arm-kernel/2014-July/271040.html)
>
> Also, we reserve last PMU event counter for EL2 mode which
> will not be accessible from Host and Guest EL1 mode. This
> reserved EL2 mode PMU event counter can be used for profiling
> KVM world-switch and other EL2 mode functions.
>
> All testing have been done using KVMTOOL on X-Gene Mustang and
> Foundation v8 Model for both Aarch32 and Aarch64 guest.
>
> Anup Patel (6):
> ARM64: Move PMU register related defines to asm/pmu.h
> ARM64: perf: Re-enable overflow interrupt from interrupt handler
> ARM: perf: Re-enable overflow interrupt from interrupt handler
> ARM/ARM64: KVM: Add common code PMU IRQ routing
> ARM64: KVM: Implement full context switch of PMU registers
> ARM64: KVM: Upgrade to lazy context switch of PMU registers
>
> arch/arm/include/asm/kvm_host.h | 9 +
> arch/arm/include/uapi/asm/kvm.h | 1 +
> arch/arm/kernel/perf_event_v7.c | 8 +
> arch/arm/kvm/arm.c | 6 +
> arch/arm/kvm/reset.c | 4 +
> arch/arm64/include/asm/kvm_asm.h | 39 +++-
> arch/arm64/include/asm/kvm_host.h | 12 ++
> arch/arm64/include/asm/pmu.h | 44 +++++
> arch/arm64/include/uapi/asm/kvm.h | 1 +
> arch/arm64/kernel/asm-offsets.c | 2 +
> arch/arm64/kernel/perf_event.c | 40 +---
> arch/arm64/kvm/Kconfig | 7 +
> arch/arm64/kvm/Makefile | 1 +
> arch/arm64/kvm/hyp-init.S | 15 ++
> arch/arm64/kvm/hyp.S | 209 +++++++++++++++++++-
> arch/arm64/kvm/reset.c | 4 +
> arch/arm64/kvm/sys_regs.c | 385 +++++++++++++++++++++++++++++++++----
> include/kvm/arm_pmu.h | 52 +++++
> virt/kvm/arm/pmu.c | 105 ++++++++++
> 19 files changed, 870 insertions(+), 74 deletions(-)
> create mode 100644 include/kvm/arm_pmu.h
> create mode 100644 virt/kvm/arm/pmu.c
>
> --
> 1.7.9.5
>
Hi All,
Please apply the attached patch to KVMTOOL on top of my
recent KVMTOOL patchset in order to try this patchset
using KVMTOOL.
Regards,
Anup
[-- Attachment #2: 0001-kvmtool-ARM-ARM64-Add-PMU-node-to-generated-guest-DT.patch --]
[-- Type: text/x-patch, Size: 3994 bytes --]
From c16a3265992ba8159ab1da6d589026c0aa0914ba Mon Sep 17 00:00:00 2001
From: Anup Patel <anup.patel@linaro.org>
Date: Mon, 4 Aug 2014 16:45:44 +0530
Subject: [RFC PATCH] kvmtool: ARM/ARM64: Add PMU node to generated guest DTB.
This patch informs the KVM ARM/ARM64 in-kernel PMU virtualization
about the PMU IRQ number of each guest VCPU using the set device
address vm ioctl.
It also adds a PMU node to the generated guest DTB to inform the
guest about the PMU IRQ numbers. For now, we have assumed PPI17 as
the PMU IRQ of the KVMTOOL guest.
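For context on the `PMU_CPU_IRQ - 0x10` expression in the patch below: the GIC device-tree binding encodes PPIs in the "interrupts" property relative to the PPI base (INTID 16), so PPI17 is emitted as cell value 1. A minimal sketch of that conversion (the helper name is hypothetical, not part of the patch):

```c
#include <stdint.h>

#define PMU_CPU_IRQ 17	/* PPI17, as assumed for the KVMTOOL guest */

/*
 * GIC DT bindings: for a PPI, the second interrupt cell holds the
 * interrupt number relative to INTID 16, so PPI17 -> cell value 1.
 */
static uint32_t ppi_to_fdt_cell(uint32_t ppi_intid)
{
	return ppi_intid - 16;
}
```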
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Anup Patel <anup.patel@linaro.org>
---
tools/kvm/Makefile | 3 ++-
tools/kvm/arm/fdt.c | 4 +++
tools/kvm/arm/include/arm-common/pmu.h | 10 +++++++
tools/kvm/arm/pmu.c | 45 ++++++++++++++++++++++++++++++++
4 files changed, 61 insertions(+), 1 deletion(-)
create mode 100644 tools/kvm/arm/include/arm-common/pmu.h
create mode 100644 tools/kvm/arm/pmu.c
diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index fba60f1..59b75c4 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -158,7 +158,8 @@ endif
# ARM
OBJS_ARM_COMMON := arm/fdt.o arm/gic.o arm/ioport.o arm/irq.o \
- arm/kvm.o arm/kvm-cpu.o arm/pci.o arm/timer.o
+ arm/kvm.o arm/kvm-cpu.o arm/pci.o arm/timer.o \
+ arm/pmu.o
HDRS_ARM_COMMON := arm/include
ifeq ($(ARCH), arm)
DEFINES += -DCONFIG_ARM
diff --git a/tools/kvm/arm/fdt.c b/tools/kvm/arm/fdt.c
index 93849cf2..42b0a67 100644
--- a/tools/kvm/arm/fdt.c
+++ b/tools/kvm/arm/fdt.c
@@ -5,6 +5,7 @@
#include "kvm/virtio-mmio.h"
#include "arm-common/gic.h"
+#include "arm-common/pmu.h"
#include "arm-common/pci.h"
#include <stdbool.h>
@@ -142,6 +143,9 @@ static int setup_fdt(struct kvm *kvm)
if (generate_cpu_peripheral_fdt_nodes)
generate_cpu_peripheral_fdt_nodes(fdt, kvm, gic_phandle);
+ /* Performance monitoring unit */
+ pmu__generate_fdt_nodes(fdt, kvm);
+
/* Virtio MMIO devices */
dev_hdr = device__first_dev(DEVICE_BUS_MMIO);
while (dev_hdr) {
diff --git a/tools/kvm/arm/include/arm-common/pmu.h b/tools/kvm/arm/include/arm-common/pmu.h
new file mode 100644
index 0000000..49ec9a8
--- /dev/null
+++ b/tools/kvm/arm/include/arm-common/pmu.h
@@ -0,0 +1,10 @@
+#ifndef ARM_COMMON__PMU_H
+#define ARM_COMMON__PMU_H
+
+#define PMU_CPU_IRQ 17
+
+struct kvm;
+
+void pmu__generate_fdt_nodes(void *fdt, struct kvm *kvm);
+
+#endif /* ARM_COMMON__PMU_H */
diff --git a/tools/kvm/arm/pmu.c b/tools/kvm/arm/pmu.c
new file mode 100644
index 0000000..7731a4c
--- /dev/null
+++ b/tools/kvm/arm/pmu.c
@@ -0,0 +1,45 @@
+#include "kvm/devices.h"
+#include "kvm/fdt.h"
+#include "kvm/kvm.h"
+#include "kvm/kvm-cpu.h"
+
+#include "arm-common/gic.h"
+#include "arm-common/pmu.h"
+
+#include <linux/byteorder.h>
+#include <linux/kvm.h>
+
+void pmu__generate_fdt_nodes(void *fdt, struct kvm *kvm)
+{
+ int cpu, err;
+ const char compatible[] = "arm,armv8-pmuv3\0arm,cortex-a15-pmu";
+ u32 cpu_mask = (((1 << kvm->nrcpus) - 1) << GIC_FDT_IRQ_PPI_CPU_SHIFT) \
+ & GIC_FDT_IRQ_PPI_CPU_MASK;
+ u32 irq_prop[] = {
+ cpu_to_fdt32(GIC_FDT_IRQ_TYPE_PPI),
+ cpu_to_fdt32(PMU_CPU_IRQ - 0x10),
+ cpu_to_fdt32(cpu_mask | GIC_FDT_IRQ_FLAGS_EDGE_LO_HI),
+ };
+ struct kvm_arm_device_addr pmu_addr = {
+ .id = KVM_ARM_DEVICE_PMU << KVM_ARM_DEVICE_ID_SHIFT,
+ .addr = PMU_CPU_IRQ,
+ };
+
+ for (cpu = 0; cpu < kvm->nrcpus; ++cpu) {
+ pmu_addr.id &= ~KVM_ARM_DEVICE_TYPE_MASK;
+ pmu_addr.id |= (cpu << KVM_ARM_DEVICE_TYPE_SHIFT) &
+ KVM_ARM_DEVICE_TYPE_MASK;
+ err = ioctl(kvm->vm_fd, KVM_ARM_SET_DEVICE_ADDR, &pmu_addr);
+ if (err) {
+ printf("%s: KVM_ARM_SET_DEVICE_ADDR failed for CPU%d\n",
+ __func__, cpu);
+ }
+ }
+
+ _FDT(fdt_begin_node(fdt, "pmu"));
+
+ _FDT(fdt_property(fdt, "compatible", compatible, sizeof(compatible)));
+ _FDT(fdt_property(fdt, "interrupts", irq_prop, sizeof(irq_prop)));
+
+ _FDT(fdt_end_node(fdt));
+}
--
1.7.9.5
[-- Attachment #3: Type: text/plain, Size: 176 bytes --]
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 78+ messages in thread
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2014-08-05 9:32 ` Anup Patel
@ 2014-08-05 9:35 ` Anup Patel
-1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-08-05 9:35 UTC (permalink / raw)
To: Anup Patel
Cc: kvmarm, linux-arm-kernel, kvm, patches, Marc Zyngier,
Christoffer Dall, Will Deacon, Ian Campbell,
Pranavkumar Sawargaonkar
On 5 August 2014 15:02, Anup Patel <apatel@apm.com> wrote:
> On Tue, Aug 5, 2014 at 2:54 PM, Anup Patel <anup.patel@linaro.org> wrote:
>> This patchset enables PMU virtualization in KVM ARM64. The
>> Guest can now directly use PMU available on the host HW.
>>
>> The virtual PMU IRQ injection for Guest VCPUs is managed by
>> small piece of code shared between KVM ARM and KVM ARM64. The
>> virtual PMU IRQ number will be based on Guest machine model and
>> user space will provide it using set device address vm ioctl.
>>
>> The second last patch of this series implements full context
>> switch of PMU registers which will context switch all PMU
>> registers on every KVM world-switch.
>>
>> The last patch implements a lazy context switch of PMU registers
>> which is very similar to lazy debug context switch.
>> (Refer, http://lists.infradead.org/pipermail/linux-arm-kernel/2014-July/271040.html)
>>
>> Also, we reserve last PMU event counter for EL2 mode which
>> will not be accessible from Host and Guest EL1 mode. This
>> reserved EL2 mode PMU event counter can be used for profiling
>> KVM world-switch and other EL2 mode functions.
>>
>> All testing have been done using KVMTOOL on X-Gene Mustang and
>> Foundation v8 Model for both Aarch32 and Aarch64 guest.
>>
>> Anup Patel (6):
>> ARM64: Move PMU register related defines to asm/pmu.h
>> ARM64: perf: Re-enable overflow interrupt from interrupt handler
>> ARM: perf: Re-enable overflow interrupt from interrupt handler
>> ARM/ARM64: KVM: Add common code PMU IRQ routing
>> ARM64: KVM: Implement full context switch of PMU registers
>> ARM64: KVM: Upgrade to lazy context switch of PMU registers
>>
>> arch/arm/include/asm/kvm_host.h | 9 +
>> arch/arm/include/uapi/asm/kvm.h | 1 +
>> arch/arm/kernel/perf_event_v7.c | 8 +
>> arch/arm/kvm/arm.c | 6 +
>> arch/arm/kvm/reset.c | 4 +
>> arch/arm64/include/asm/kvm_asm.h | 39 +++-
>> arch/arm64/include/asm/kvm_host.h | 12 ++
>> arch/arm64/include/asm/pmu.h | 44 +++++
>> arch/arm64/include/uapi/asm/kvm.h | 1 +
>> arch/arm64/kernel/asm-offsets.c | 2 +
>> arch/arm64/kernel/perf_event.c | 40 +---
>> arch/arm64/kvm/Kconfig | 7 +
>> arch/arm64/kvm/Makefile | 1 +
>> arch/arm64/kvm/hyp-init.S | 15 ++
>> arch/arm64/kvm/hyp.S | 209 +++++++++++++++++++-
>> arch/arm64/kvm/reset.c | 4 +
>> arch/arm64/kvm/sys_regs.c | 385 +++++++++++++++++++++++++++++++++----
>> include/kvm/arm_pmu.h | 52 +++++
>> virt/kvm/arm/pmu.c | 105 ++++++++++
>> 19 files changed, 870 insertions(+), 74 deletions(-)
>> create mode 100644 include/kvm/arm_pmu.h
>> create mode 100644 virt/kvm/arm/pmu.c
>>
>> --
>> 1.7.9.5
>>
>> CONFIDENTIALITY NOTICE: This e-mail message, including any attachments,
>> is for the sole use of the intended recipient(s) and contains information
>> that is confidential and proprietary to Applied Micro Circuits Corporation or its subsidiaries.
>> It is to be used solely for the purpose of furthering the parties' business relationship.
>> All unauthorized review, use, disclosure or distribution is prohibited.
>> If you are not the intended recipient, please contact the sender by reply e-mail
>> and destroy all copies of the original message.
Please ignore this notice; it accidentally sneaked in.
--
Anup
>>
>
> Hi All,
>
> Please apply attached patch to KVMTOOL on-top-of my
> recent KVMTOOL patchset for trying this patchset using
> KVMTOOL.
>
> Regards,
> Anup
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [RFC PATCH 2/6] ARM64: perf: Re-enable overflow interrupt from interrupt handler
2014-08-05 9:24 ` Anup Patel
@ 2014-08-06 14:24 ` Will Deacon
-1 siblings, 0 replies; 78+ messages in thread
From: Will Deacon @ 2014-08-06 14:24 UTC (permalink / raw)
To: Anup Patel
Cc: kvmarm, linux-arm-kernel, kvm, patches, Marc Zyngier,
christoffer.dall, ian.campbell, pranavkumar
On Tue, Aug 05, 2014 at 10:24:11AM +0100, Anup Patel wrote:
> A hypervisor will typically mask the overflow interrupt before
> forwarding it to Guest Linux hence we need to re-enable the overflow
> interrupt after clearing it in Guest Linux. Also, this re-enabling
> of overflow interrupt does not harm in non-virtualized scenarios.
>
> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
> Signed-off-by: Anup Patel <anup.patel@linaro.org>
> ---
> arch/arm64/kernel/perf_event.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index 47dfb8b..19fb140 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -1076,6 +1076,14 @@ static irqreturn_t armv8pmu_handle_irq(int irq_num, void *dev)
> if (!armv8pmu_counter_has_overflowed(pmovsr, idx))
> continue;
>
> + /*
> + * If we are running under a hypervisor such as KVM then
> + * hypervisor will mask the interrupt before forwarding
> + * it to Guest Linux hence re-enable interrupt for the
> + * overflowed counter.
> + */
> + armv8pmu_enable_intens(idx);
> +
Really? This is a giant bodge in the guest to work around short-comings in
the hypervisor. Why can't we fix this properly using something like Marc's
irq forwarding code?
Will
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [RFC PATCH 2/6] ARM64: perf: Re-enable overflow interrupt from interrupt handler
2014-08-06 14:24 ` Will Deacon
@ 2014-08-07 9:03 ` Anup Patel
-1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-08-07 9:03 UTC (permalink / raw)
To: Will Deacon
Cc: kvmarm, linux-arm-kernel, kvm, patches, Marc Zyngier,
christoffer.dall, ian.campbell, pranavkumar
On 6 August 2014 19:54, Will Deacon <will.deacon@arm.com> wrote:
> On Tue, Aug 05, 2014 at 10:24:11AM +0100, Anup Patel wrote:
>> A hypervisor will typically mask the overflow interrupt before
>> forwarding it to Guest Linux hence we need to re-enable the overflow
>> interrupt after clearing it in Guest Linux. Also, this re-enabling
>> of overflow interrupt does not harm in non-virtualized scenarios.
>>
>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>> ---
>> arch/arm64/kernel/perf_event.c | 8 ++++++++
>> 1 file changed, 8 insertions(+)
>>
>> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
>> index 47dfb8b..19fb140 100644
>> --- a/arch/arm64/kernel/perf_event.c
>> +++ b/arch/arm64/kernel/perf_event.c
>> @@ -1076,6 +1076,14 @@ static irqreturn_t armv8pmu_handle_irq(int irq_num, void *dev)
>> if (!armv8pmu_counter_has_overflowed(pmovsr, idx))
>> continue;
>>
>> + /*
>> + * If we are running under a hypervisor such as KVM then
>> + * hypervisor will mask the interrupt before forwarding
>> + * it to Guest Linux hence re-enable interrupt for the
>> + * overflowed counter.
>> + */
>> + armv8pmu_enable_intens(idx);
>> +
>
> Really? This is a giant bodge in the guest to work around short-comings in
> the hypervisor. Why can't we fix this properly using something like Marc's
> irq forwarding code?
This change is in accordance with our previous RFC thread about
PMU virtualization, where Marc Z had suggested doing an interrupt
mask/unmask dance similar to the arch-timer's.
I have not tried Marc's irq forwarding series. In the next revision
of this patchset, I will try to use Marc's irq forwarding approach.
>
> Will
--
Anup
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [RFC PATCH 2/6] ARM64: perf: Re-enable overflow interrupt from interrupt handler
2014-08-07 9:03 ` Anup Patel
@ 2014-08-07 9:06 ` Will Deacon
-1 siblings, 0 replies; 78+ messages in thread
From: Will Deacon @ 2014-08-07 9:06 UTC (permalink / raw)
To: Anup Patel
Cc: kvmarm, linux-arm-kernel, kvm, patches, Marc Zyngier,
christoffer.dall, ian.campbell, pranavkumar
On Thu, Aug 07, 2014 at 10:03:58AM +0100, Anup Patel wrote:
> On 6 August 2014 19:54, Will Deacon <will.deacon@arm.com> wrote:
> > On Tue, Aug 05, 2014 at 10:24:11AM +0100, Anup Patel wrote:
> >> A hypervisor will typically mask the overflow interrupt before
> >> forwarding it to Guest Linux hence we need to re-enable the overflow
> >> interrupt after clearing it in Guest Linux. Also, this re-enabling
> >> of overflow interrupt does not harm in non-virtualized scenarios.
> >>
> >> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
> >> Signed-off-by: Anup Patel <anup.patel@linaro.org>
> >> ---
> >> arch/arm64/kernel/perf_event.c | 8 ++++++++
> >> 1 file changed, 8 insertions(+)
> >>
> >> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> >> index 47dfb8b..19fb140 100644
> >> --- a/arch/arm64/kernel/perf_event.c
> >> +++ b/arch/arm64/kernel/perf_event.c
> >> @@ -1076,6 +1076,14 @@ static irqreturn_t armv8pmu_handle_irq(int irq_num, void *dev)
> >> if (!armv8pmu_counter_has_overflowed(pmovsr, idx))
> >> continue;
> >>
> >> + /*
> >> + * If we are running under a hypervisor such as KVM then
> >> + * hypervisor will mask the interrupt before forwarding
> >> + * it to Guest Linux hence re-enable interrupt for the
> >> + * overflowed counter.
> >> + */
> >> + armv8pmu_enable_intens(idx);
> >> +
> >
> > Really? This is a giant bodge in the guest to work around short-comings in
> > the hypervisor. Why can't we fix this properly using something like Marc's
> > irq forwarding code?
>
> This change is in accordance with our previous RFC thread about
> PMU virtualization where Marc Z had suggest to do interrupt
> mask/unmask dance similar to arch-timer.
>
> I have not tried Marc'z irq forwarding series. In next revision of this
> patchset, I will try to use Marc's irq forwarding approach.
That would be good. Judging by the colour Marc went when he saw this patch,
I don't think he intended you to hack perf in this way :)
Will
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2014-08-05 9:24 ` Anup Patel
@ 2014-11-07 20:23 ` Christoffer Dall
-1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2014-11-07 20:23 UTC (permalink / raw)
To: Anup Patel
Cc: kvmarm, linux-arm-kernel, kvm, patches, marc.zyngier,
will.deacon, ian.campbell, pranavkumar
Hi Anup,
What are your plans in terms of follow-up on this one?
Should we review these patches and reply to anup _at_ brainfaul.org or
are you looking for someone else to pick them up?
Thanks,
-Christoffer
On Tue, Aug 05, 2014 at 02:54:09PM +0530, Anup Patel wrote:
> This patchset enables PMU virtualization in KVM ARM64. The
> Guest can now directly use PMU available on the host HW.
>
> The virtual PMU IRQ injection for Guest VCPUs is managed by
> small piece of code shared between KVM ARM and KVM ARM64. The
> virtual PMU IRQ number will be based on Guest machine model and
> user space will provide it using set device address vm ioctl.
>
> The second last patch of this series implements full context
> switch of PMU registers which will context switch all PMU
> registers on every KVM world-switch.
>
> The last patch implements a lazy context switch of PMU registers
> which is very similar to lazy debug context switch.
> (Refer, http://lists.infradead.org/pipermail/linux-arm-kernel/2014-July/271040.html)
>
> Also, we reserve last PMU event counter for EL2 mode which
> will not be accessible from Host and Guest EL1 mode. This
> reserved EL2 mode PMU event counter can be used for profiling
> KVM world-switch and other EL2 mode functions.
>
> All testing have been done using KVMTOOL on X-Gene Mustang and
> Foundation v8 Model for both Aarch32 and Aarch64 guest.
>
> Anup Patel (6):
> ARM64: Move PMU register related defines to asm/pmu.h
> ARM64: perf: Re-enable overflow interrupt from interrupt handler
> ARM: perf: Re-enable overflow interrupt from interrupt handler
> ARM/ARM64: KVM: Add common code PMU IRQ routing
> ARM64: KVM: Implement full context switch of PMU registers
> ARM64: KVM: Upgrade to lazy context switch of PMU registers
>
> arch/arm/include/asm/kvm_host.h | 9 +
> arch/arm/include/uapi/asm/kvm.h | 1 +
> arch/arm/kernel/perf_event_v7.c | 8 +
> arch/arm/kvm/arm.c | 6 +
> arch/arm/kvm/reset.c | 4 +
> arch/arm64/include/asm/kvm_asm.h | 39 +++-
> arch/arm64/include/asm/kvm_host.h | 12 ++
> arch/arm64/include/asm/pmu.h | 44 +++++
> arch/arm64/include/uapi/asm/kvm.h | 1 +
> arch/arm64/kernel/asm-offsets.c | 2 +
> arch/arm64/kernel/perf_event.c | 40 +---
> arch/arm64/kvm/Kconfig | 7 +
> arch/arm64/kvm/Makefile | 1 +
> arch/arm64/kvm/hyp-init.S | 15 ++
> arch/arm64/kvm/hyp.S | 209 +++++++++++++++++++-
> arch/arm64/kvm/reset.c | 4 +
> arch/arm64/kvm/sys_regs.c | 385 +++++++++++++++++++++++++++++++++----
> include/kvm/arm_pmu.h | 52 +++++
> virt/kvm/arm/pmu.c | 105 ++++++++++
> 19 files changed, 870 insertions(+), 74 deletions(-)
> create mode 100644 include/kvm/arm_pmu.h
> create mode 100644 virt/kvm/arm/pmu.c
>
> --
> 1.7.9.5
>
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2014-08-05 9:24 ` Anup Patel
@ 2014-11-07 20:25 ` Christoffer Dall
-1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2014-11-07 20:25 UTC (permalink / raw)
To: Anup Patel
Cc: kvmarm, linux-arm-kernel, kvm, patches, marc.zyngier,
will.deacon, ian.campbell, pranavkumar
Hi Anup,
[This time to the new email]
What are your plans in terms of follow-up on this one?
Should we review these patches and reply to anup _at_ brainfaul.org or
are you looking for someone else to pick them up?
Thanks,
-Christoffer
On Tue, Aug 05, 2014 at 02:54:09PM +0530, Anup Patel wrote:
> This patchset enables PMU virtualization in KVM ARM64. The
> Guest can now directly use PMU available on the host HW.
>
> The virtual PMU IRQ injection for Guest VCPUs is managed by
> small piece of code shared between KVM ARM and KVM ARM64. The
> virtual PMU IRQ number will be based on Guest machine model and
> user space will provide it using set device address vm ioctl.
>
> The second last patch of this series implements full context
> switch of PMU registers which will context switch all PMU
> registers on every KVM world-switch.
>
> The last patch implements a lazy context switch of PMU registers
> which is very similar to lazy debug context switch.
> (Refer, http://lists.infradead.org/pipermail/linux-arm-kernel/2014-July/271040.html)
>
> Also, we reserve last PMU event counter for EL2 mode which
> will not be accessible from Host and Guest EL1 mode. This
> reserved EL2 mode PMU event counter can be used for profiling
> KVM world-switch and other EL2 mode functions.
>
> All testing have been done using KVMTOOL on X-Gene Mustang and
> Foundation v8 Model for both Aarch32 and Aarch64 guest.
>
> Anup Patel (6):
> ARM64: Move PMU register related defines to asm/pmu.h
> ARM64: perf: Re-enable overflow interrupt from interrupt handler
> ARM: perf: Re-enable overflow interrupt from interrupt handler
> ARM/ARM64: KVM: Add common code PMU IRQ routing
> ARM64: KVM: Implement full context switch of PMU registers
> ARM64: KVM: Upgrade to lazy context switch of PMU registers
>
> arch/arm/include/asm/kvm_host.h | 9 +
> arch/arm/include/uapi/asm/kvm.h | 1 +
> arch/arm/kernel/perf_event_v7.c | 8 +
> arch/arm/kvm/arm.c | 6 +
> arch/arm/kvm/reset.c | 4 +
> arch/arm64/include/asm/kvm_asm.h | 39 +++-
> arch/arm64/include/asm/kvm_host.h | 12 ++
> arch/arm64/include/asm/pmu.h | 44 +++++
> arch/arm64/include/uapi/asm/kvm.h | 1 +
> arch/arm64/kernel/asm-offsets.c | 2 +
> arch/arm64/kernel/perf_event.c | 40 +---
> arch/arm64/kvm/Kconfig | 7 +
> arch/arm64/kvm/Makefile | 1 +
> arch/arm64/kvm/hyp-init.S | 15 ++
> arch/arm64/kvm/hyp.S | 209 +++++++++++++++++++-
> arch/arm64/kvm/reset.c | 4 +
> arch/arm64/kvm/sys_regs.c | 385 +++++++++++++++++++++++++++++++++----
> include/kvm/arm_pmu.h | 52 +++++
> virt/kvm/arm/pmu.c | 105 ++++++++++
> 19 files changed, 870 insertions(+), 74 deletions(-)
> create mode 100644 include/kvm/arm_pmu.h
> create mode 100644 virt/kvm/arm/pmu.c
>
> --
> 1.7.9.5
>
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2014-11-07 20:25 ` Christoffer Dall
@ 2014-11-08 9:36 ` Anup Patel
-1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-11-08 9:36 UTC (permalink / raw)
To: Christoffer Dall
Cc: kvmarm, linux-arm-kernel, KVM General, patches, Marc Zyngier,
Will Deacon, Ian Campbell, Pranavkumar Sawargaonkar
Hi Christoffer,
On Sat, Nov 8, 2014 at 1:55 AM, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> Hi Anup,
>
> [This time to the new email]
>
> What are your plans in terms of follow-up on this one?
Actually, I am already working on RFC v2. I will send out
RFC v2 soon.
This RFC v2 will be RFC v1 rebased upon Marc's IRQ
forwarding patchset.
I will try to address PMU context switching for KVM ARM
in RFC v3. Does this sound OK?
Regards,
Anup
>
> Should we review these patches and reply to anup _at_ brainfaul.org or
> are you looking for someone else to pick them up?
>
> Thanks,
> -Christoffer
>
> On Tue, Aug 05, 2014 at 02:54:09PM +0530, Anup Patel wrote:
>> This patchset enables PMU virtualization in KVM ARM64. The
>> Guest can now directly use PMU available on the host HW.
>>
>> The virtual PMU IRQ injection for Guest VCPUs is managed by
>> small piece of code shared between KVM ARM and KVM ARM64. The
>> virtual PMU IRQ number will be based on Guest machine model and
>> user space will provide it using set device address vm ioctl.
>>
>> The second last patch of this series implements full context
>> switch of PMU registers which will context switch all PMU
>> registers on every KVM world-switch.
>>
>> The last patch implements a lazy context switch of PMU registers
>> which is very similar to lazy debug context switch.
>> (Refer, http://lists.infradead.org/pipermail/linux-arm-kernel/2014-July/271040.html)
>>
>> Also, we reserve last PMU event counter for EL2 mode which
>> will not be accessible from Host and Guest EL1 mode. This
>> reserved EL2 mode PMU event counter can be used for profiling
>> KVM world-switch and other EL2 mode functions.
>>
>> All testing have been done using KVMTOOL on X-Gene Mustang and
>> Foundation v8 Model for both Aarch32 and Aarch64 guest.
>>
>> Anup Patel (6):
>> ARM64: Move PMU register related defines to asm/pmu.h
>> ARM64: perf: Re-enable overflow interrupt from interrupt handler
>> ARM: perf: Re-enable overflow interrupt from interrupt handler
>> ARM/ARM64: KVM: Add common code PMU IRQ routing
>> ARM64: KVM: Implement full context switch of PMU registers
>> ARM64: KVM: Upgrade to lazy context switch of PMU registers
>>
>> arch/arm/include/asm/kvm_host.h | 9 +
>> arch/arm/include/uapi/asm/kvm.h | 1 +
>> arch/arm/kernel/perf_event_v7.c | 8 +
>> arch/arm/kvm/arm.c | 6 +
>> arch/arm/kvm/reset.c | 4 +
>> arch/arm64/include/asm/kvm_asm.h | 39 +++-
>> arch/arm64/include/asm/kvm_host.h | 12 ++
>> arch/arm64/include/asm/pmu.h | 44 +++++
>> arch/arm64/include/uapi/asm/kvm.h | 1 +
>> arch/arm64/kernel/asm-offsets.c | 2 +
>> arch/arm64/kernel/perf_event.c | 40 +---
>> arch/arm64/kvm/Kconfig | 7 +
>> arch/arm64/kvm/Makefile | 1 +
>> arch/arm64/kvm/hyp-init.S | 15 ++
>> arch/arm64/kvm/hyp.S | 209 +++++++++++++++++++-
>> arch/arm64/kvm/reset.c | 4 +
>> arch/arm64/kvm/sys_regs.c | 385 +++++++++++++++++++++++++++++++++----
>> include/kvm/arm_pmu.h | 52 +++++
>> virt/kvm/arm/pmu.c | 105 ++++++++++
>> 19 files changed, 870 insertions(+), 74 deletions(-)
>> create mode 100644 include/kvm/arm_pmu.h
>> create mode 100644 virt/kvm/arm/pmu.c
>>
>> --
>> 1.7.9.5
>>
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2014-11-08 9:36 ` Anup Patel
@ 2014-11-08 12:39 ` Christoffer Dall
-1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2014-11-08 12:39 UTC (permalink / raw)
To: Anup Patel
Cc: kvmarm, linux-arm-kernel, KVM General, patches, Marc Zyngier,
Will Deacon, Ian Campbell, Pranavkumar Sawargaonkar
Yes, sounds good. I will review RFC v2 then.
-Christoffer
On Sat, Nov 8, 2014 at 10:36 AM, Anup Patel <anup@brainfault.org> wrote:
> Hi Christoffer,
>
> On Sat, Nov 8, 2014 at 1:55 AM, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
>> Hi Anup,
>>
>> [This time to the new email]
>>
>> What are your plans in terms of follow-up on this one?
>
> Actually, I am already working on RFC v2. I will send-out
> RFC v2 sometime next time.
>
> This RFC v2 will be RFC v1 based upon Marc's IRQ
> forwarding patchset.
>
> I will try to address PMU context switching for KVM ARM
> in RFC v3. Does this sound OK?
>
> Regards,
> Anup
>
>>
>> Should we review these patches and reply to anup _at_ brainfaul.org or
>> are you looking for someone else to pick them up?
>>
>> Thanks,
>> -Christoffer
>>
>> On Tue, Aug 05, 2014 at 02:54:09PM +0530, Anup Patel wrote:
>>> This patchset enables PMU virtualization in KVM ARM64. The
>>> Guest can now directly use PMU available on the host HW.
>>>
>>> The virtual PMU IRQ injection for Guest VCPUs is managed by
>>> small piece of code shared between KVM ARM and KVM ARM64. The
>>> virtual PMU IRQ number will be based on Guest machine model and
>>> user space will provide it using set device address vm ioctl.
>>>
>>> The second last patch of this series implements full context
>>> switch of PMU registers which will context switch all PMU
>>> registers on every KVM world-switch.
>>>
>>> The last patch implements a lazy context switch of PMU registers
>>> which is very similar to lazy debug context switch.
>>> (Refer, http://lists.infradead.org/pipermail/linux-arm-kernel/2014-July/271040.html)
>>>
>>> Also, we reserve last PMU event counter for EL2 mode which
>>> will not be accessible from Host and Guest EL1 mode. This
>>> reserved EL2 mode PMU event counter can be used for profiling
>>> KVM world-switch and other EL2 mode functions.
>>>
>>> All testing have been done using KVMTOOL on X-Gene Mustang and
>>> Foundation v8 Model for both Aarch32 and Aarch64 guest.
>>>
>>> Anup Patel (6):
>>> ARM64: Move PMU register related defines to asm/pmu.h
>>> ARM64: perf: Re-enable overflow interrupt from interrupt handler
>>> ARM: perf: Re-enable overflow interrupt from interrupt handler
>>> ARM/ARM64: KVM: Add common code PMU IRQ routing
>>> ARM64: KVM: Implement full context switch of PMU registers
>>> ARM64: KVM: Upgrade to lazy context switch of PMU registers
>>>
>>> arch/arm/include/asm/kvm_host.h | 9 +
>>> arch/arm/include/uapi/asm/kvm.h | 1 +
>>> arch/arm/kernel/perf_event_v7.c | 8 +
>>> arch/arm/kvm/arm.c | 6 +
>>> arch/arm/kvm/reset.c | 4 +
>>> arch/arm64/include/asm/kvm_asm.h | 39 +++-
>>> arch/arm64/include/asm/kvm_host.h | 12 ++
>>> arch/arm64/include/asm/pmu.h | 44 +++++
>>> arch/arm64/include/uapi/asm/kvm.h | 1 +
>>> arch/arm64/kernel/asm-offsets.c | 2 +
>>> arch/arm64/kernel/perf_event.c | 40 +---
>>> arch/arm64/kvm/Kconfig | 7 +
>>> arch/arm64/kvm/Makefile | 1 +
>>> arch/arm64/kvm/hyp-init.S | 15 ++
>>> arch/arm64/kvm/hyp.S | 209 +++++++++++++++++++-
>>> arch/arm64/kvm/reset.c | 4 +
>>> arch/arm64/kvm/sys_regs.c | 385 +++++++++++++++++++++++++++++++++----
>>> include/kvm/arm_pmu.h | 52 +++++
>>> virt/kvm/arm/pmu.c | 105 ++++++++++
>>> 19 files changed, 870 insertions(+), 74 deletions(-)
>>> create mode 100644 include/kvm/arm_pmu.h
>>> create mode 100644 virt/kvm/arm/pmu.c
>>>
>>> --
>>> 1.7.9.5
>>>
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2014-11-08 12:39 ` Christoffer Dall
@ 2014-11-11 9:18 ` Anup Patel
-1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-11-11 9:18 UTC (permalink / raw)
To: Christoffer Dall
Cc: kvmarm, linux-arm-kernel, KVM General, patches, Marc Zyngier,
Will Deacon, Ian Campbell, Pranavkumar Sawargaonkar
Hi All,
I have second thoughts about rebasing KVM PMU patches
to Marc's irq-forwarding patches.
The PMU IRQs (when virtualized by KVM) are not exactly
forwarded IRQs because they are shared between Host
and Guest.
Scenario1
-------------
We might have perf running on Host and no KVM guest
running. In this scenario, we won't get interrupts on Host
because the kvm_pmu_hyp_init() (similar to the function
kvm_timer_hyp_init() of Marc's IRQ-forwarding
implementation) has put all host PMU IRQs in forwarding
mode.
The only way to solve this problem is to not set forwarding
mode for PMU IRQs in kvm_pmu_hyp_init() and instead
have special routines to turn on and turn off the forwarding
mode of PMU IRQs. These routines will be called from
kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
forwarding state.
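The toggling approach above can be sketched as a toy model in plain C.
This is only an illustration of the intended behaviour, not the real
KVM API: all function and variable names here are made up for the
sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Illustrative model: forwarding mode is active only while a VCPU is
 * actually running, so a host-only perf session (Scenario1) still
 * receives its PMU IRQs.  Names are hypothetical, not real KVM code. */
static bool pmu_irq_forwarded;

static void kvm_pmu_enable_forwarding(void)  { pmu_irq_forwarded = true;  }
static void kvm_pmu_disable_forwarding(void) { pmu_irq_forwarded = false; }

/* A PMU IRQ is routed to the guest only while forwarding mode is on. */
static const char *deliver_pmu_irq(void)
{
	return pmu_irq_forwarded ? "guest" : "host";
}

/* Simplified flow of one pass through kvm_arch_vcpu_ioctl_run()
 * with the proposed toggle routines. */
static const char *run_vcpu_once(void)
{
	const char *target;

	kvm_pmu_enable_forwarding();   /* before world-switch into guest */
	target = deliver_pmu_irq();    /* IRQ arriving while guest runs  */
	kvm_pmu_disable_forwarding();  /* after returning to the host    */
	return target;
}
```

With no guest running, deliver_pmu_irq() keeps routing to the host,
which is exactly what the always-on forwarding of kvm_pmu_hyp_init()
would break.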
Scenario2
-------------
We might have perf running on Host and Guest simultaneously
which means it is quite likely that the PMU HW triggers an IRQ meant
for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
of Marc's patchset which is called before local_irq_enable()).
In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
will accidentally forward IRQ meant for Host to Guest unless
we put additional checks to inspect VCPU PMU state.
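The extra check hinted at above can be modelled as follows. This is a
toy sketch, not KVM code: the struct, enum, and function names are all
hypothetical, and the point is only that pending state in the GIC alone
cannot attribute a shared PMU IRQ.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model of the Scenario2 race: after the world-switch back
 * to the host, the host PMU context is live again, so a PMU IRQ that is
 * pending in the GIC may belong to the host even though we are on the
 * guest exit path.  All names here are made up for the sketch. */

enum pmu_owner { PMU_OWNER_HOST, PMU_OWNER_GUEST };

struct pmu_irq_state {
	bool pending_in_gic;     /* PMU IRQ observed pending in the GIC   */
	enum pmu_owner live_ctx; /* whose counters are programmed right now */
};

/* Naive sync: forwards any pending PMU IRQ to the guest. */
static bool sync_forwards_to_guest_naive(const struct pmu_irq_state *s)
{
	return s->pending_in_gic;
}

/* Checked sync: forward only if the guest PMU context was live. */
static bool sync_forwards_to_guest_checked(const struct pmu_irq_state *s)
{
	return s->pending_in_gic && s->live_ctx == PMU_OWNER_GUEST;
}
```

The naive version mis-attributes a host IRQ that fires in the window
before the sync routine runs; the checked version keeps it on the host.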
Am I missing any detail about IRQ forwarding for above
scenarios?
If not, then can we consider the current mask/unmask approach
for forwarding PMU IRQs?
Marc?? Will??
Regards,
Anup
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2014-11-11 9:18 ` Anup Patel
@ 2014-11-18 3:24 ` Anup Patel
-1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-11-18 3:24 UTC (permalink / raw)
To: Christoffer Dall
Cc: kvmarm, linux-arm-kernel, KVM General, patches, Marc Zyngier,
Will Deacon, Ian Campbell, Pranavkumar Sawargaonkar
On Tue, Nov 11, 2014 at 2:48 PM, Anup Patel <anup@brainfault.org> wrote:
> Hi All,
>
> I have second thoughts about rebasing KVM PMU patches
> to Marc's irq-forwarding patches.
>
> The PMU IRQs (when virtualized by KVM) are not exactly
> forwarded IRQs because they are shared between Host
> and Guest.
>
> Scenario1
> -------------
>
> We might have perf running on Host and no KVM guest
> running. In this scenario, we wont get interrupts on Host
> because the kvm_pmu_hyp_init() (similar to the function
> kvm_timer_hyp_init() of Marc's IRQ-forwarding
> implementation) has put all host PMU IRQs in forwarding
> mode.
>
> The only way solve this problem is to not set forwarding
> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
> have special routines to turn on and turn off the forwarding
> mode of PMU IRQs. These routines will be called from
> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
> forwarding state.
>
> Scenario2
> -------------
>
> We might have perf running on Host and Guest simultaneously
> which means it is quite likely that PMU HW trigger IRQ meant
> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
> of Marc's patchset which is called before local_irq_enable()).
>
> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
> will accidentally forward IRQ meant for Host to Guest unless
> we put additional checks to inspect VCPU PMU state.
>
> Am I missing any detail about IRQ forwarding for above
> scenarios?
>
> If not then can we consider current mask/unmask approach
> for forwarding PMU IRQs?
>
> Marc?? Will??
>
> Regards,
> Anup
Ping ???
--
Anup
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2014-11-11 9:18 ` Anup Patel
@ 2014-11-19 15:29 ` Christoffer Dall
-1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2014-11-19 15:29 UTC (permalink / raw)
To: Anup Patel
Cc: Ian Campbell, KVM General, Marc Zyngier, patches, Will Deacon,
kvmarm, linux-arm-kernel, Pranavkumar Sawargaonkar
On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
> Hi All,
>
> I have second thoughts about rebasing KVM PMU patches
> to Marc's irq-forwarding patches.
>
> The PMU IRQs (when virtualized by KVM) are not exactly
> forwarded IRQs because they are shared between Host
> and Guest.
>
> Scenario1
> -------------
>
> We might have perf running on Host and no KVM guest
> running. In this scenario, we wont get interrupts on Host
> because the kvm_pmu_hyp_init() (similar to the function
> kvm_timer_hyp_init() of Marc's IRQ-forwarding
> implementation) has put all host PMU IRQs in forwarding
> mode.
>
> The only way solve this problem is to not set forwarding
> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
> have special routines to turn on and turn off the forwarding
> mode of PMU IRQs. These routines will be called from
> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
> forwarding state.
>
> Scenario2
> -------------
>
> We might have perf running on Host and Guest simultaneously
> which means it is quite likely that PMU HW trigger IRQ meant
> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
> of Marc's patchset which is called before local_irq_enable()).
>
> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
> will accidentally forward IRQ meant for Host to Guest unless
> we put additional checks to inspect VCPU PMU state.
>
> Am I missing any detail about IRQ forwarding for above
> scenarios?
>
Hi Anup,
I briefly discussed this with Marc. What I don't understand is how it
would be possible to get an interrupt for the host while running the
guest?
The rationale behind my question is that whenever you're running the
guest, the PMU should be programmed exclusively with guest state, and
since the PMU is per core, any interrupts should be for the guest, where
it would always be pending.
When migrating a VM with a pending PMU interrupt away from a CPU core, we
also capture the active state (the forwarding patches already handle
this), and obviously the PMU state along with it.
Does this address your concern?
-Christoffer
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2014-11-19 15:29 ` Christoffer Dall
@ 2014-11-20 14:47 ` Anup Patel
-1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-11-20 14:47 UTC (permalink / raw)
To: Christoffer Dall
Cc: kvmarm, linux-arm-kernel, KVM General, patches, Marc Zyngier,
Will Deacon, Ian Campbell, Pranavkumar Sawargaonkar
On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
>> Hi All,
>>
>> I have second thoughts about rebasing KVM PMU patches
>> to Marc's irq-forwarding patches.
>>
>> The PMU IRQs (when virtualized by KVM) are not exactly
>> forwarded IRQs because they are shared between Host
>> and Guest.
>>
>> Scenario1
>> -------------
>>
>> We might have perf running on Host and no KVM guest
>> running. In this scenario, we wont get interrupts on Host
>> because the kvm_pmu_hyp_init() (similar to the function
>> kvm_timer_hyp_init() of Marc's IRQ-forwarding
>> implementation) has put all host PMU IRQs in forwarding
>> mode.
>>
>> The only way solve this problem is to not set forwarding
>> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
>> have special routines to turn on and turn off the forwarding
>> mode of PMU IRQs. These routines will be called from
>> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
>> forwarding state.
>>
>> Scenario2
>> -------------
>>
>> We might have perf running on Host and Guest simultaneously
>> which means it is quite likely that PMU HW trigger IRQ meant
>> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
>> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
>> of Marc's patchset which is called before local_irq_enable()).
>>
>> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
>> will accidentally forward IRQ meant for Host to Guest unless
>> we put additional checks to inspect VCPU PMU state.
>>
>> Am I missing any detail about IRQ forwarding for above
>> scenarios?
>>
> Hi Anup,
Hi Christoffer,
>
> I briefly discussed this with Marc. What I don't understand is how it
> would be possible to get an interrupt for the host while running the
> guest?
>
> The rationale behind my question is that whenever you're running the
> guest, the PMU should be programmed exclusively with guest state, and
> since the PMU is per core, any interrupts should be for the guest, where
> it would always be pending.
Yes, that's right: the PMU is programmed exclusively for the guest
when the guest is running and for the host when the host is running.
Let us assume a situation (Scenario2 mentioned previously)
where both host and guest are using the PMU. When the guest is
running, we come back to host mode due to a variety of reasons
(stage2 fault, guest IO, regular host interrupt, host interrupt
meant for guest, ....) which means we will return from the
"ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the
kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
At this point we would have restored the host PMU context, and
any PMU counter used by the host can trigger a PMU overflow
interrupt for the host. We will then have "kvm_pmu_sync_hwstate(vcpu);"
in the kvm_arch_vcpu_ioctl_run() function (similar to the
kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset)
which will try to detect the PMU IRQ forwarding state in the GIC,
and hence can accidentally discover a PMU IRQ pending for the
guest while this PMU IRQ is actually meant for the host.
The above situation does not happen for the timer because
virtual timer interrupts are used exclusively by the guest.
This exclusive use ensures that the function
kvm_timer_sync_hwstate() will always see the correct state of
the virtual timer IRQ in the GIC.
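The contrast between the guest-exclusive timer IRQ and the shared PMU
IRQ can be stated as a minimal sketch. This is only an illustration of
the attribution problem, with made-up function names, not real KVM
code.

```c
#include <assert.h>
#include <stdbool.h>

/* The virtual timer IRQ is used exclusively by the guest, so a pending
 * IRQ in the GIC can always be attributed to the guest.  The PMU IRQ is
 * shared between host and guest, so pending state alone is ambiguous
 * and attribution needs extra per-VCPU state.  Names are hypothetical. */

static bool timer_irq_is_for_guest(bool pending_in_gic)
{
	return pending_in_gic;  /* exclusive IRQ: attribution is trivial */
}

static bool pmu_irq_is_for_guest(bool pending_in_gic, bool guest_pmu_live)
{
	/* shared IRQ: pending state must be combined with context info */
	return pending_in_gic && guest_pmu_live;
}
```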
>
> When migrating a VM with a pending PMU interrupt away for a CPU core, we
> also capture the active state (the forwarding patches already handle
> this), and obviously the PMU state along with it.
Yes, the migration of PMU state and PMU interrupt state is
quite clear.
>
> Does this address your concern?
I hope the above description gives you an idea of the concern
I raised.
>
> -Christoffer
Regards,
Anup
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2014-11-20 14:47 ` Anup Patel
@ 2014-11-21 9:59 ` Christoffer Dall
-1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2014-11-21 9:59 UTC (permalink / raw)
To: Anup Patel
Cc: kvmarm, linux-arm-kernel, KVM General, patches, Marc Zyngier,
Will Deacon, Ian Campbell, Pranavkumar Sawargaonkar
On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote:
> On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
> > On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
> >> Hi All,
> >>
> >> I have second thoughts about rebasing KVM PMU patches
> >> to Marc's irq-forwarding patches.
> >>
> >> The PMU IRQs (when virtualized by KVM) are not exactly
> >> forwarded IRQs because they are shared between Host
> >> and Guest.
> >>
> >> Scenario1
> >> -------------
> >>
> >> We might have perf running on Host and no KVM guest
> >> running. In this scenario, we won't get interrupts on Host
> >> because the kvm_pmu_hyp_init() (similar to the function
> >> kvm_timer_hyp_init() of Marc's IRQ-forwarding
> >> implementation) has put all host PMU IRQs in forwarding
> >> mode.
> >>
> >> The only way to solve this problem is to not set forwarding
> >> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
> >> have special routines to turn on and turn off the forwarding
> >> mode of PMU IRQs. These routines will be called from
> >> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
> >> forwarding state.
> >>
> >> Scenario2
> >> -------------
> >>
> >> We might have perf running on Host and Guest simultaneously
> >> which means it is quite likely that the PMU HW triggers an IRQ meant
> >> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
> >> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
> >> of Marc's patchset which is called before local_irq_enable()).
> >>
> >> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
> >> will accidentally forward IRQ meant for Host to Guest unless
> >> we put additional checks to inspect VCPU PMU state.
> >>
> >> Am I missing any detail about IRQ forwarding for above
> >> scenarios?
> >>
> > Hi Anup,
>
> Hi Christoffer,
>
> >
> > I briefly discussed this with Marc. What I don't understand is how it
> > would be possible to get an interrupt for the host while running the
> > guest?
> >
> > The rationale behind my question is that whenever you're running the
> > guest, the PMU should be programmed exclusively with guest state, and
> > since the PMU is per core, any interrupts should be for the guest, where
> > it would always be pending.
>
> Yes, that's right, the PMU is programmed exclusively for the guest
> when the guest is running and for the host when the host is running.
>
> Let us assume a situation (Scenario2 mentioned previously)
> where both host and guest are using PMU. When the guest is
> running we come back to host mode due to variety of reasons
> (stage2 fault, guest IO, regular host interrupt, host interrupt
> meant for guest, ....) which means we will return from the
> "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the
> kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
> At this point we would have restored back host PMU context and
> any PMU counter used by host can trigger PMU overflow interrupt
> for host. Now we will be having "kvm_pmu_sync_hwstate(vcpu);"
> in the kvm_arch_vcpu_ioctl_run() function (similar to the
> kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset)
> which will try to detect PMU irq forwarding state in GIC hence it
> can accidentally discover PMU irq pending for guest while this
> PMU irq is actually meant for host.
>
> The above mentioned situation does not happen for the timer
> because virtual timer interrupts are exclusively used for guest.
> The exclusive use of virtual timer interrupt for guest ensures that
> the function kvm_timer_sync_hwstate() will always see correct
> state of virtual timer IRQ from GIC.
>
I'm not quite following.
When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemptible section,
you would (1) capture the active state of the IRQ pertaining to the
guest and (2) deactivate the IRQ on the host, then (3) switch the state of
the PMU to the host state, and finally (4) re-enable IRQs on the CPU
you're running on.
If the host PMU state restored in (3) causes the PMU to raise an
interrupt, you'll take an interrupt after (4), which is for the host,
and you'll handle it on the host.
Whenever you schedule the guest VCPU again, you'll (a) disable
interrupts on the CPU, (b) restore the active state of the IRQ for the
guest, (c) restore the guest PMU state, (d) switch to the guest with
IRQs enabled on the CPU (potentially).
If the state in (c) causes an IRQ it will not fire on the host, because
it is marked as active in (b).
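The ordering described here can be sketched as a minimal model (hypothetical names; `active` stands in for the GIC active bit, and real code would of course manipulate GIC and PMU registers rather than a plain struct):

```c
#include <stdbool.h>

struct pmu_irq { bool active; };

/* Steps (1)+(2): capture the guest's active state, then deactivate
 * the IRQ so the host side starts clean. */
static bool pmu_sync_hwstate(struct pmu_irq *irq)
{
    bool guest_active = irq->active;  /* (1) capture */
    irq->active = false;              /* (2) deactivate on the host */
    return guest_active;
}

/* Steps (b)+(c) on the way back in: restore the captured active state
 * before entering the guest, so an IRQ raised by the restored guest
 * PMU state will not fire on the host. */
static void pmu_flush_hwstate(struct pmu_irq *irq, bool guest_active)
{
    irq->active = guest_active;       /* (b) restore */
}
```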
Where does this break?
-Christoffer
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2014-11-21 9:59 ` Christoffer Dall
@ 2014-11-21 10:36 ` Anup Patel
-1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-11-21 10:36 UTC (permalink / raw)
To: Christoffer Dall
Cc: kvmarm, linux-arm-kernel, KVM General, patches, Marc Zyngier,
Will Deacon, Ian Campbell, Pranavkumar Sawargaonkar
Hi Christoffer,
On Fri, Nov 21, 2014 at 3:29 PM, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote:
>> On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall
>> <christoffer.dall@linaro.org> wrote:
>> > On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
>> >> Hi All,
>> >>
>> >> I have second thoughts about rebasing KVM PMU patches
>> >> to Marc's irq-forwarding patches.
>> >>
>> >> The PMU IRQs (when virtualized by KVM) are not exactly
>> >> forwarded IRQs because they are shared between Host
>> >> and Guest.
>> >>
>> >> Scenario1
>> >> -------------
>> >>
>> >> We might have perf running on Host and no KVM guest
>> >> running. In this scenario, we won't get interrupts on Host
>> >> because the kvm_pmu_hyp_init() (similar to the function
>> >> kvm_timer_hyp_init() of Marc's IRQ-forwarding
>> >> implementation) has put all host PMU IRQs in forwarding
>> >> mode.
>> >>
>> >> The only way to solve this problem is to not set forwarding
>> >> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
>> >> have special routines to turn on and turn off the forwarding
>> >> mode of PMU IRQs. These routines will be called from
>> >> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
>> >> forwarding state.
>> >>
>> >> Scenario2
>> >> -------------
>> >>
>> >> We might have perf running on Host and Guest simultaneously
>> >> which means it is quite likely that the PMU HW triggers an IRQ meant
>> >> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
>> >> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
>> >> of Marc's patchset which is called before local_irq_enable()).
>> >>
>> >> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
>> >> will accidentally forward IRQ meant for Host to Guest unless
>> >> we put additional checks to inspect VCPU PMU state.
>> >>
>> >> Am I missing any detail about IRQ forwarding for above
>> >> scenarios?
>> >>
>> > Hi Anup,
>>
>> Hi Christoffer,
>>
>> >
>> > I briefly discussed this with Marc. What I don't understand is how it
>> > would be possible to get an interrupt for the host while running the
>> > guest?
>> >
>> > The rationale behind my question is that whenever you're running the
>> > guest, the PMU should be programmed exclusively with guest state, and
>> > since the PMU is per core, any interrupts should be for the guest, where
>> > it would always be pending.
>>
>> Yes, that's right, the PMU is programmed exclusively for the guest
>> when the guest is running and for the host when the host is running.
>>
>> Let us assume a situation (Scenario2 mentioned previously)
>> where both host and guest are using PMU. When the guest is
>> running we come back to host mode due to variety of reasons
>> (stage2 fault, guest IO, regular host interrupt, host interrupt
>> meant for guest, ....) which means we will return from the
>> "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the
>> kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
>> At this point we would have restored back host PMU context and
>> any PMU counter used by host can trigger PMU overflow interrupt
>> for host. Now we will be having "kvm_pmu_sync_hwstate(vcpu);"
>> in the kvm_arch_vcpu_ioctl_run() function (similar to the
>> kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset)
>> which will try to detect PMU irq forwarding state in GIC hence it
>> can accidentally discover PMU irq pending for guest while this
>> PMU irq is actually meant for host.
>>
>> The above mentioned situation does not happen for the timer
>> because virtual timer interrupts are exclusively used for guest.
>> The exclusive use of virtual timer interrupt for guest ensures that
>> the function kvm_timer_sync_hwstate() will always see correct
>> state of virtual timer IRQ from GIC.
>>
> I'm not quite following.
>
> When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemptible section,
> you would (1) capture the active state of the IRQ pertaining to the
> guest and (2) deactivate the IRQ on the host, then (3) switch the state of
> the PMU to the host state, and finally (4) re-enable IRQs on the CPU
> you're running on.
>
> If the host PMU state restored in (3) causes the PMU to raise an
> interrupt, you'll take an interrupt after (4), which is for the host,
> and you'll handle it on the host.
>
We only switch the PMU state in assembly code, using
kvm_call_hyp(__kvm_vcpu_run, vcpu),
so whenever we are in kvm_arch_vcpu_ioctl_run() (i.e. host mode)
the current hardware PMU state is the host's. This means that whenever
we are in host mode the host PMU can change the state of the PMU IRQ
in the GIC even if local IRQs are disabled.
We inspect the active state of the PMU IRQ in the
kvm_pmu_sync_hwstate() function using the irq_get_fwd_state() API.
Here we are not guaranteed that the IRQ forward state returned by the
irq_get_fwd_state() API is for the guest only.
The above situation does not manifest for virtual timer because
virtual timer registers are exclusively accessed by Guest and
virtual timer interrupt is only for Guest (never used by Host).
> Whenever you schedule the guest VCPU again, you'll (a) disable
> interrupts on the CPU, (b) restore the active state of the IRQ for the
> guest, (c) restore the guest PMU state, (d) switch to the guest with
> IRQs enabled on the CPU (potentially).
Here too, while we are between step (a) and step (b), the PMU HW
context is the host's and any PMU counter can overflow. Step (b)
can then actually override a PMU IRQ meant for the Host.
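That window can be shown with a small illustrative model (hypothetical names and deliberately simplified semantics; the GIC's real active/pending machinery is richer than a pair of booleans):

```c
#include <stdbool.h>

struct pmu_line { bool pending; bool active; };

/* Between (a) disabling local IRQs and (b) restoring the guest's
 * active state, the PMU still holds host context, so a host counter
 * can overflow and set the line pending -- that IRQ is the host's. */
static void host_overflow_in_window(struct pmu_line *line)
{
    line->pending = true;
}

/* Step (b) then marks the line active on behalf of the guest. */
static void restore_guest_active(struct pmu_line *line)
{
    line->active = true;
}

/* In this simplified model, a line that is both pending and active
 * no longer fires on the host: the host's interrupt has been
 * clobbered by the restore in (b). */
static bool host_irq_lost(const struct pmu_line *line)
{
    return line->pending && line->active;
}
```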
>
> If the state in (c) causes an IRQ it will not fire on the host, because
> it is marked as active in (b).
>
> Where does this break?
Your explanation of IRQ forwarding is fine and fits well for devices
(such as the virtual timer, and pass-through devices) where the device
is exclusively accessed by the Guest and the device IRQ, whenever
active, is only meant for the Guest.
>
> -Christoffer
Regards,
Anup
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2014-11-21 10:36 ` Anup Patel
@ 2014-11-21 11:49 ` Christoffer Dall
-1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2014-11-21 11:49 UTC (permalink / raw)
To: Anup Patel
Cc: kvmarm, linux-arm-kernel, KVM General, patches, Marc Zyngier,
Will Deacon, Ian Campbell, Pranavkumar Sawargaonkar
On Fri, Nov 21, 2014 at 04:06:05PM +0530, Anup Patel wrote:
> Hi Christoffer,
>
> On Fri, Nov 21, 2014 at 3:29 PM, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
> > On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote:
> >> On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall
> >> <christoffer.dall@linaro.org> wrote:
> >> > On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
> >> >> Hi All,
> >> >>
> >> >> I have second thoughts about rebasing KVM PMU patches
> >> >> to Marc's irq-forwarding patches.
> >> >>
> >> >> The PMU IRQs (when virtualized by KVM) are not exactly
> >> >> forwarded IRQs because they are shared between Host
> >> >> and Guest.
> >> >>
> >> >> Scenario1
> >> >> -------------
> >> >>
> >> >> We might have perf running on Host and no KVM guest
> >> running. In this scenario, we won't get interrupts on Host
> >> >> because the kvm_pmu_hyp_init() (similar to the function
> >> >> kvm_timer_hyp_init() of Marc's IRQ-forwarding
> >> >> implementation) has put all host PMU IRQs in forwarding
> >> >> mode.
> >> >>
> >> The only way to solve this problem is to not set forwarding
> >> >> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
> >> >> have special routines to turn on and turn off the forwarding
> >> >> mode of PMU IRQs. These routines will be called from
> >> >> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
> >> >> forwarding state.
> >> >>
> >> >> Scenario2
> >> >> -------------
> >> >>
> >> >> We might have perf running on Host and Guest simultaneously
> >> which means it is quite likely that the PMU HW triggers an IRQ meant
> >> >> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
> >> >> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
> >> >> of Marc's patchset which is called before local_irq_enable()).
> >> >>
> >> >> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
> >> >> will accidentally forward IRQ meant for Host to Guest unless
> >> >> we put additional checks to inspect VCPU PMU state.
> >> >>
> >> >> Am I missing any detail about IRQ forwarding for above
> >> >> scenarios?
> >> >>
> >> > Hi Anup,
> >>
> >> Hi Christoffer,
> >>
> >> >
> >> > I briefly discussed this with Marc. What I don't understand is how it
> >> > would be possible to get an interrupt for the host while running the
> >> > guest?
> >> >
> >> > The rationale behind my question is that whenever you're running the
> >> > guest, the PMU should be programmed exclusively with guest state, and
> >> > since the PMU is per core, any interrupts should be for the guest, where
> >> > it would always be pending.
> >>
> >> Yes, that's right, the PMU is programmed exclusively for the guest
> >> when the guest is running and for the host when the host is running.
> >>
> >> Let us assume a situation (Scenario2 mentioned previously)
> >> where both host and guest are using PMU. When the guest is
> >> running we come back to host mode due to variety of reasons
> >> (stage2 fault, guest IO, regular host interrupt, host interrupt
> >> meant for guest, ....) which means we will return from the
> >> "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the
> >> kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
> >> At this point we would have restored back host PMU context and
> >> any PMU counter used by host can trigger PMU overflow interrupt
> >> for host. Now we will be having "kvm_pmu_sync_hwstate(vcpu);"
> >> in the kvm_arch_vcpu_ioctl_run() function (similar to the
> >> kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset)
> >> which will try to detect PMU irq forwarding state in GIC hence it
> >> can accidentally discover PMU irq pending for guest while this
> >> PMU irq is actually meant for host.
> >>
> >> The above mentioned situation does not happen for the timer
> >> because virtual timer interrupts are exclusively used for guest.
> >> The exclusive use of virtual timer interrupt for guest ensures that
> >> the function kvm_timer_sync_hwstate() will always see correct
> >> state of virtual timer IRQ from GIC.
> >>
> > I'm not quite following.
> >
> > When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemptible section,
> > you would (1) capture the active state of the IRQ pertaining to the
> > guest and (2) deactivate the IRQ on the host, then (3) switch the state of
> > the PMU to the host state, and finally (4) re-enable IRQs on the CPU
> > you're running on.
> >
> > If the host PMU state restored in (3) causes the PMU to raise an
> > interrupt, you'll take an interrupt after (4), which is for the host,
> > and you'll handle it on the host.
> >
> We only switch PMU state in assembly code using
> kvm_call_hyp(__kvm_vcpu_run, vcpu)
> so whenever we are in kvm_arch_vcpu_ioctl_run() (i.e. host mode)
> the current hardware PMU state is for host. This means whenever
> we are in host mode the host PMU can change state of PMU IRQ
> in GIC even if local IRQs are disabled.
>
> We inspect the active state of the PMU IRQ in the
> kvm_pmu_sync_hwstate() function using the irq_get_fwd_state() API.
> Here we are not guaranteed that the IRQ forward state returned by the
> irq_get_fwd_state() API is for the guest only.
>
> The above situation does not manifest for virtual timer because
> virtual timer registers are exclusively accessed by Guest and
> virtual timer interrupt is only for Guest (never used by Host).
>
> > Whenever you schedule the guest VCPU again, you'll (a) disable
> > interrupts on the CPU, (b) restore the active state of the IRQ for the
> > guest, (c) restore the guest PMU state, (d) switch to the guest with
> > IRQs enabled on the CPU (potentially).
>
> Here too, while we are between step (a) and step (b) the PMU HW
> context is for host and any PMU counter can overflow. The step (b)
> can actually override the PMU IRQ meant for Host.
>
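The race Anup describes between steps (a) and (b) can be made concrete with a toy model (all names hypothetical): a host PMU overflow that marks the IRQ active is silently overwritten when the saved guest state is written back:

```c
#include <stdbool.h>

struct mock_gic { bool pmu_irq_active; };

/* Host PMU counter overflows while we still hold the host context. */
static void host_pmu_overflow(struct mock_gic *gic)
{
	gic->pmu_irq_active = true;
}

/* Step (b): restoring the guest's saved active state clobbers whatever
 * the host set in the meantime -- the host's interrupt is lost. */
static void restore_guest_irq_state(struct mock_gic *gic, bool guest_active)
{
	gic->pmu_irq_active = guest_active;
}
```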
Can you not simply switch the state from C-code after capturing the IRQ
state then? Everything should be accessible from EL1, right?
-Christoffer
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2014-11-21 11:49 ` Christoffer Dall
@ 2014-11-24 8:44 ` Anup Patel
-1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-11-24 8:44 UTC (permalink / raw)
To: Christoffer Dall
Cc: kvmarm, linux-arm-kernel, KVM General, patches, Marc Zyngier,
Will Deacon, Ian Campbell, Pranavkumar Sawargaonkar
On Fri, Nov 21, 2014 at 5:19 PM, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> On Fri, Nov 21, 2014 at 04:06:05PM +0530, Anup Patel wrote:
>> Hi Christoffer,
>>
>> On Fri, Nov 21, 2014 at 3:29 PM, Christoffer Dall
>> <christoffer.dall@linaro.org> wrote:
>> > On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote:
>> >> On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall
>> >> <christoffer.dall@linaro.org> wrote:
>> >> > On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
>> >> >> Hi All,
>> >> >>
>> >> >> I have second thoughts about rebasing KVM PMU patches
>> >> >> to Marc's irq-forwarding patches.
>> >> >>
>> >> >> The PMU IRQs (when virtualized by KVM) are not exactly
>> >> >> forwarded IRQs because they are shared between Host
>> >> >> and Guest.
>> >> >>
>> >> >> Scenario1
>> >> >> -------------
>> >> >>
>> >> >> We might have perf running on Host and no KVM guest
>> >> >> running. In this scenario, we won't get interrupts on Host
>> >> >> because the kvm_pmu_hyp_init() (similar to the function
>> >> >> kvm_timer_hyp_init() of Marc's IRQ-forwarding
>> >> >> implementation) has put all host PMU IRQs in forwarding
>> >> >> mode.
>> >> >>
>> >> >> The only way to solve this problem is to not set forwarding
>> >> >> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
>> >> >> have special routines to turn on and turn off the forwarding
>> >> >> mode of PMU IRQs. These routines will be called from
>> >> >> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
>> >> >> forwarding state.
>> >> >>
>> >> >> Scenario2
>> >> >> -------------
>> >> >>
>> >> >> We might have perf running on Host and Guest simultaneously
>> >> >> which means it is quite likely that the PMU HW triggers an IRQ meant
>> >> >> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
>> >> >> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
>> >> >> of Marc's patchset which is called before local_irq_enable()).
>> >> >>
>> >> >> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
>> >> >> will accidentally forward IRQ meant for Host to Guest unless
>> >> >> we put additional checks to inspect VCPU PMU state.
>> >> >>
>> >> >> Am I missing any detail about IRQ forwarding for above
>> >> >> scenarios?
>> >> >>
>> >> > Hi Anup,
>> >>
>> >> Hi Christoffer,
>> >>
>> >> >
>> >> > I briefly discussed this with Marc. What I don't understand is how it
>> >> > would be possible to get an interrupt for the host while running the
>> >> > guest?
>> >> >
>> >> > The rationale behind my question is that whenever you're running the
>> >> > guest, the PMU should be programmed exclusively with guest state, and
>> >> > since the PMU is per core, any interrupts should be for the guest, where
>> >> > it would always be pending.
>> >>
>> >> Yes, that's right, the PMU is programmed exclusively for guest when
>> >> guest is running and for host when host is running.
>> >>
>> >> Let us assume a situation (Scenario2 mentioned previously)
>> >> where both host and guest are using PMU. When the guest is
>> >> running we come back to host mode due to a variety of reasons
>> >> (stage2 fault, guest IO, regular host interrupt, host interrupt
>> >> meant for guest, ....) which means we will return from the
>> >> "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the
>> >> kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
>> >> At this point we would have restored back host PMU context and
>> >> any PMU counter used by host can trigger a PMU overflow interrupt
>> >> for host. Now we will be having "kvm_pmu_sync_hwstate(vcpu);"
>> >> in the kvm_arch_vcpu_ioctl_run() function (similar to the
>> >> kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset)
>> >> which will try to detect the PMU IRQ forwarding state in the GIC, and
>> >> hence can accidentally discover a PMU IRQ pending for the guest while
>> >> this PMU IRQ is actually meant for the host.
>> >>
>> >> This above mentioned situation does not happen for timer
>> >> because virtual timer interrupts are exclusively used for guest.
>> >> The exclusive use of virtual timer interrupt for guest ensures that
>> >> the function kvm_timer_sync_hwstate() will always see correct
>> >> state of virtual timer IRQ from GIC.
>> >>
>> > I'm not quite following.
>> >
>> > When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemptible section,
>> > you would (1) capture the active state of the IRQ pertaining to the
>> > guest and (2) deactivate the IRQ on the host, then (3) switch the state of
>> > the PMU to the host state, and finally (4) re-enable IRQs on the CPU
>> > you're running on.
>> >
>> > If the host PMU state restored in (3) causes the PMU to raise an
>> > interrupt, you'll take an interrupt after (4), which is for the host,
>> > and you'll handle it on the host.
>> >
>> We only switch PMU state in assembly code using
>> kvm_call_hyp(__kvm_vcpu_run, vcpu)
>> so whenever we are in kvm_arch_vcpu_ioctl_run() (i.e. host mode)
>> the current hardware PMU state is for host. This means whenever
>> we are in host mode the host PMU can change state of PMU IRQ
>> in GIC even if local IRQs are disabled.
>>
>> Whenever we inspect the active state of the PMU IRQ in the
>> kvm_pmu_sync_hwstate() function using the irq_get_fwd_state() API,
>> we are not guaranteed that the IRQ forward state returned by
>> irq_get_fwd_state() is for the guest only.
>>
>> The above situation does not manifest for virtual timer because
>> virtual timer registers are exclusively accessed by Guest and
>> virtual timer interrupt is only for Guest (never used by Host).
>>
>> > Whenever you schedule the guest VCPU again, you'll (a) disable
>> > interrupts on the CPU, (b) restore the active state of the IRQ for the
>> > guest, (c) restore the guest PMU state, (d) switch to the guest with
>> > IRQs enabled on the CPU (potentially).
>>
>> Here too, while we are between step (a) and step (b) the PMU HW
>> context is for host and any PMU counter can overflow. The step (b)
>> can actually override the PMU IRQ meant for Host.
>>
> Can you not simply switch the state from C-code after capturing the IRQ
> state then? Everything should be accessible from EL1, right?
Yes, I think that would be the only option. This also means I will need
to re-implement context switching for doing it in C-code.
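Moving the switch out of the hyp assembly could look roughly like this. The hardware is mocked with a plain array; a real EL1 version would use system register accessors on PMCR_EL0, PMCNTENSET_EL0, PMEVCNTRn_EL0 and friends instead, so treat this purely as a shape sketch:

```c
#include <stdint.h>
#include <string.h>

#define NR_PMU_REGS 8

static uint64_t hw_pmu_regs[NR_PMU_REGS];	/* stand-in for the PMU HW */

struct pmu_context {
	uint64_t regs[NR_PMU_REGS];
};

static void pmu_save_context(struct pmu_context *ctx)
{
	memcpy(ctx->regs, hw_pmu_regs, sizeof(ctx->regs));
}

static void pmu_restore_context(const struct pmu_context *ctx)
{
	memcpy(hw_pmu_regs, ctx->regs, sizeof(ctx->regs));
}
```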
What about Scenario1, which I mentioned earlier?
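For Scenario1 (host perf running, no guest), the fix sketched earlier in the thread is to toggle forwarding around each guest run instead of enabling it once at init. kvm_pmu_set_fwd_mode() below is a hypothetical helper, not an existing kernel function:

```c
#include <stdbool.h>

static bool pmu_irq_forwarded;	/* mock of the GIC forwarding state */

static void kvm_pmu_set_fwd_mode(bool on)
{
	pmu_irq_forwarded = on;
}

/* Hypothetical shape of the run-loop body in kvm_arch_vcpu_ioctl_run() */
static int vcpu_run_once(void)
{
	kvm_pmu_set_fwd_mode(true);	/* forward PMU IRQ only while guest runs */
	/* ... kvm_call_hyp(__kvm_vcpu_run, vcpu) would go here ... */
	kvm_pmu_set_fwd_mode(false);	/* host perf gets its IRQs back */
	return 0;
}
```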
--
Anup
>
> -Christoffer
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2014-11-24 8:44 ` Anup Patel
@ 2014-11-24 14:37 ` Christoffer Dall
-1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2014-11-24 14:37 UTC (permalink / raw)
To: Anup Patel
Cc: kvmarm, linux-arm-kernel, KVM General, patches, Marc Zyngier,
Will Deacon, Ian Campbell, Pranavkumar Sawargaonkar
On Mon, Nov 24, 2014 at 02:14:48PM +0530, Anup Patel wrote:
> On Fri, Nov 21, 2014 at 5:19 PM, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
> > On Fri, Nov 21, 2014 at 04:06:05PM +0530, Anup Patel wrote:
> >> Hi Christoffer,
> >>
> >> On Fri, Nov 21, 2014 at 3:29 PM, Christoffer Dall
> >> <christoffer.dall@linaro.org> wrote:
> >> > On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote:
> >> >> On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall
> >> >> <christoffer.dall@linaro.org> wrote:
> >> >> > On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
> >> >> >> Hi All,
> >> >> >>
> >> >> >> I have second thoughts about rebasing KVM PMU patches
> >> >> >> to Marc's irq-forwarding patches.
> >> >> >>
> >> >> >> The PMU IRQs (when virtualized by KVM) are not exactly
> >> >> >> forwarded IRQs because they are shared between Host
> >> >> >> and Guest.
> >> >> >>
> >> >> >> Scenario1
> >> >> >> -------------
> >> >> >>
> >> >> >> We might have perf running on Host and no KVM guest
> >> >> >> running. In this scenario, we won't get interrupts on Host
> >> >> >> because the kvm_pmu_hyp_init() (similar to the function
> >> >> >> kvm_timer_hyp_init() of Marc's IRQ-forwarding
> >> >> >> implementation) has put all host PMU IRQs in forwarding
> >> >> >> mode.
> >> >> >>
> >> >> >> The only way to solve this problem is to not set forwarding
> >> >> >> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
> >> >> >> have special routines to turn on and turn off the forwarding
> >> >> >> mode of PMU IRQs. These routines will be called from
> >> >> >> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
> >> >> >> forwarding state.
> >> >> >>
> >> >> >> Scenario2
> >> >> >> -------------
> >> >> >>
> >> >> >> We might have perf running on Host and Guest simultaneously
> >> >> >> which means it is quite likely that the PMU HW triggers an IRQ meant
> >> >> >> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
> >> >> >> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
> >> >> >> of Marc's patchset which is called before local_irq_enable()).
> >> >> >>
> >> >> >> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
> >> >> >> will accidentally forward IRQ meant for Host to Guest unless
> >> >> >> we put additional checks to inspect VCPU PMU state.
> >> >> >>
> >> >> >> Am I missing any detail about IRQ forwarding for above
> >> >> >> scenarios?
> >> >> >>
> >> >> > Hi Anup,
> >> >>
> >> >> Hi Christoffer,
> >> >>
> >> >> >
> >> >> > I briefly discussed this with Marc. What I don't understand is how it
> >> >> > would be possible to get an interrupt for the host while running the
> >> >> > guest?
> >> >> >
> >> >> > The rationale behind my question is that whenever you're running the
> >> >> > guest, the PMU should be programmed exclusively with guest state, and
> >> >> > since the PMU is per core, any interrupts should be for the guest, where
> >> >> > it would always be pending.
> >> >>
> >> >> Yes, that's right, the PMU is programmed exclusively for guest when
> >> >> guest is running and for host when host is running.
> >> >>
> >> >> Let us assume a situation (Scenario2 mentioned previously)
> >> >> where both host and guest are using PMU. When the guest is
> >> >> running we come back to host mode due to a variety of reasons
> >> >> (stage2 fault, guest IO, regular host interrupt, host interrupt
> >> >> meant for guest, ....) which means we will return from the
> >> >> "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the
> >> >> kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
> >> >> At this point we would have restored back host PMU context and
> >> >> any PMU counter used by host can trigger a PMU overflow interrupt
> >> >> for host. Now we will be having "kvm_pmu_sync_hwstate(vcpu);"
> >> >> in the kvm_arch_vcpu_ioctl_run() function (similar to the
> >> >> kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset)
> >> >> which will try to detect the PMU IRQ forwarding state in the GIC, and
> >> >> hence can accidentally discover a PMU IRQ pending for the guest while
> >> >> this PMU IRQ is actually meant for the host.
> >> >>
> >> >> This above mentioned situation does not happen for timer
> >> >> because virtual timer interrupts are exclusively used for guest.
> >> >> The exclusive use of virtual timer interrupt for guest ensures that
> >> >> the function kvm_timer_sync_hwstate() will always see correct
> >> >> state of virtual timer IRQ from GIC.
> >> >>
> >> > I'm not quite following.
> >> >
> >> > When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemptible section,
> >> > you would (1) capture the active state of the IRQ pertaining to the
> >> > guest and (2) deactivate the IRQ on the host, then (3) switch the state of
> >> > the PMU to the host state, and finally (4) re-enable IRQs on the CPU
> >> > you're running on.
> >> >
> >> > If the host PMU state restored in (3) causes the PMU to raise an
> >> > interrupt, you'll take an interrupt after (4), which is for the host,
> >> > and you'll handle it on the host.
> >> >
> >> We only switch PMU state in assembly code using
> >> kvm_call_hyp(__kvm_vcpu_run, vcpu)
> >> so whenever we are in kvm_arch_vcpu_ioctl_run() (i.e. host mode)
> >> the current hardware PMU state is for host. This means whenever
> >> we are in host mode the host PMU can change state of PMU IRQ
> >> in GIC even if local IRQs are disabled.
> >>
> >> Whenever we inspect the active state of the PMU IRQ in the
> >> kvm_pmu_sync_hwstate() function using the irq_get_fwd_state() API,
> >> we are not guaranteed that the IRQ forward state returned by
> >> irq_get_fwd_state() is for the guest only.
> >>
> >> The above situation does not manifest for virtual timer because
> >> virtual timer registers are exclusively accessed by Guest and
> >> virtual timer interrupt is only for Guest (never used by Host).
> >>
> >> > Whenever you schedule the guest VCPU again, you'll (a) disable
> >> > interrupts on the CPU, (b) restore the active state of the IRQ for the
> >> > guest, (c) restore the guest PMU state, (d) switch to the guest with
> >> > IRQs enabled on the CPU (potentially).
> >>
> >> Here too, while we are between step (a) and step (b) the PMU HW
> >> context is for host and any PMU counter can overflow. The step (b)
> >> can actually override the PMU IRQ meant for Host.
> >>
> > Can you not simply switch the state from C-code after capturing the IRQ
> > state then? Everything should be accessible from EL1, right?
>
> Yes, I think that would be the only option. This also means I will need
> to re-implement context switching for doing it in C-code.
Yes, you'd add some inline assembly in the C-code to access the
registers, I guess. The only thing I thought about after writing my original
mail is whether you'll be counting events while context-switching and
running on the host, which you actually don't want to. Not sure if
there's a better way to avoid that.
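One hedged way to avoid counting events during the switch itself is to globally stop the counters first and restore the enable bit last. On real hardware that would be the PMCR_EL0.E bit; it is modelled by a plain flag here:

```c
#include <stdbool.h>

static bool pmu_counting;	/* mock of PMCR_EL0.E */

static bool pmu_stop(void)	/* returns the previous enable state */
{
	bool was = pmu_counting;
	pmu_counting = false;	/* real code: clear PMCR_EL0.E first */
	return was;
}

static void pmu_start(bool enable)
{
	pmu_counting = enable;	/* real code: restore PMCR_EL0.E last */
}
```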
>
> What about the scenario1 which I had mentioned?
>
You have to consider enabling/disabling forwarding and setting/clearing
the active state as part of the guest PMU state; all of it has to be
context-switched.
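Read literally, that means the per-VCPU PMU context would carry the forwarding and active-state bits alongside the counter registers. A hypothetical layout (field names are illustrative, not from any posted patch):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-VCPU PMU context; field names are illustrative. */
struct vcpu_pmu_state {
	uint64_t counters[8];	/* saved PMEVCNTRn_EL0 values */
	uint64_t control;	/* saved PMCR_EL0 */
	bool	 irq_forwarded;	/* forwarding enabled for this VCPU? */
	bool	 irq_active;	/* saved GIC active state of the PMU IRQ */
};
```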
Thanks,
-Christoffer
^ permalink raw reply [flat|nested] 78+ messages in thread
* [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
@ 2014-11-24 14:37 ` Christoffer Dall
0 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2014-11-24 14:37 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Nov 24, 2014 at 02:14:48PM +0530, Anup Patel wrote:
> On Fri, Nov 21, 2014 at 5:19 PM, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
> > On Fri, Nov 21, 2014 at 04:06:05PM +0530, Anup Patel wrote:
> >> Hi Christoffer,
> >>
> >> On Fri, Nov 21, 2014 at 3:29 PM, Christoffer Dall
> >> <christoffer.dall@linaro.org> wrote:
> >> > On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote:
> >> >> On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall
> >> >> <christoffer.dall@linaro.org> wrote:
> >> >> > On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
> >> >> >> Hi All,
> >> >> >>
> >> >> >> I have second thoughts about rebasing KVM PMU patches
> >> >> >> to Marc's irq-forwarding patches.
> >> >> >>
> >> >> >> The PMU IRQs (when virtualized by KVM) are not exactly
> >> >> >> forwarded IRQs because they are shared between Host
> >> >> >> and Guest.
> >> >> >>
> >> >> >> Scenario1
> >> >> >> -------------
> >> >> >>
> >> >> >> We might have perf running on Host and no KVM guest
> >> >> >> running. In this scenario, we wont get interrupts on Host
> >> >> >> because the kvm_pmu_hyp_init() (similar to the function
> >> >> >> kvm_timer_hyp_init() of Marc's IRQ-forwarding
> >> >> >> implementation) has put all host PMU IRQs in forwarding
> >> >> >> mode.
> >> >> >>
> >> >> >> The only way solve this problem is to not set forwarding
> >> >> >> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
> >> >> >> have special routines to turn on and turn off the forwarding
> >> >> >> mode of PMU IRQs. These routines will be called from
> >> >> >> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
> >> >> >> forwarding state.
> >> >> >>
> >> >> >> Scenario2
> >> >> >> -------------
> >> >> >>
> >> >> >> We might have perf running on Host and Guest simultaneously
> >> >> >> which means it is quite likely that PMU HW trigger IRQ meant
> >> >> >> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
> >> >> >> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
> >> >> >> of Marc's patchset which is called before local_irq_enable()).
> >> >> >>
> >> >> >> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
> >> >> >> will accidentally forward IRQ meant for Host to Guest unless
> >> >> >> we put additional checks to inspect VCPU PMU state.
> >> >> >>
> >> >> >> Am I missing any detail about IRQ forwarding for above
> >> >> >> scenarios?
> >> >> >>
> >> >> > Hi Anup,
> >> >>
> >> >> Hi Christoffer,
> >> >>
> >> >> >
> >> >> > I briefly discussed this with Marc. What I don't understand is how it
> >> >> > would be possible to get an interrupt for the host while running the
> >> >> > guest?
> >> >> >
> >> >> > The rationale behind my question is that whenever you're running the
> >> >> > guest, the PMU should be programmed exclusively with guest state, and
> >> >> > since the PMU is per core, any interrupts should be for the guest, where
> >> >> > it would always be pending.
> >> >>
> >> >> Yes, thats right PMU is programmed exclusively for guest when
> >> >> guest is running and for host when host is running.
> >> >>
> >> >> Let us assume a situation (Scenario2 mentioned previously)
> >> >> where both host and guest are using PMU. When the guest is
> >> >> running we come back to host mode due to variety of reasons
> >> >> (stage2 fault, guest IO, regular host interrupt, host interrupt
> >> >> meant for guest, ....) which means we will return from the
> >> >> "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the
> >> >> kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
> >> >> At this point we would have restored back host PMU context and
> >> >> any PMU counter used by host can trigger PMU overflow interrup
> >> >> for host. Now we will be having "kvm_pmu_sync_hwstate(vcpu);"
> >> >> in the kvm_arch_vcpu_ioctl_run() function (similar to the
> >> >> kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset)
> >> >> which will try to detect PMU irq forwarding state in GIC hence it
> >> >> can accidentally discover PMU irq pending for guest while this
> >> >> PMU irq is actually meant for host.
> >> >>
> >> >> This above mentioned situation does not happen for timer
> >> >> because virtual timer interrupts are exclusively used for guest.
> >> >> The exclusive use of virtual timer interrupt for guest ensures that
> >> >> the function kvm_timer_sync_hwstate() will always see correct
> >> >> state of virtual timer IRQ from GIC.
> >> >>
> >> > I'm not quite following.
> >> >
> >> > When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemptible section,
> >> > you would (1) capture the active state of the IRQ pertaining to the
> >> > guest and (2) deactivate the IRQ on the host, then (3) switch the state of
> >> > the PMU to the host state, and finally (4) re-enable IRQs on the CPU
> >> > you're running on.
> >> >
> >> > If the host PMU state restored in (3) causes the PMU to raise an
> >> > interrupt, you'll take an interrupt after (4), which is for the host,
> >> > and you'll handle it on the host.
> >> >
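The (1)-(4) ordering above can be sketched as a toy model. All types and helper names below are hypothetical stand-ins for the GIC/PMU accessors under discussion, not real kernel APIs; the point is only that the guest's active state is captured before the host PMU context can raise new interrupts.

```c
#include <stdbool.h>

/* Hypothetical model of the per-CPU PMU IRQ state discussed above. */
struct pmu_irq_model {
    bool irq_active;          /* PMU IRQ active state in the GIC      */
    bool guest_state_loaded;  /* true while guest PMU context is live */
    bool saved_guest_active;  /* snapshot captured for the VCPU       */
};

/* kvm_pmu_sync_hwstate() sketch: capture the guest state *before*
 * restoring the host PMU context, so a later host overflow cannot
 * be misattributed to the guest. */
void pmu_sync_hwstate(struct pmu_irq_model *m)
{
    /* (1) capture the active state for the guest */
    m->saved_guest_active = m->irq_active;
    /* (2) deactivate the IRQ on the host */
    m->irq_active = false;
    /* (3) switch the PMU context back to the host */
    m->guest_state_loaded = false;
    /* (4) IRQs are re-enabled by the caller; a host PMU overflow
     * from here on sets irq_active again, but the guest snapshot
     * is already safe. */
}
```

With this ordering, a host overflow arriving after step (4) can no longer corrupt the guest's captured state.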
> >> We only switch PMU state in assembly code using
> >> kvm_call_hyp(__kvm_vcpu_run, vcpu)
> >> so whenever we are in kvm_arch_vcpu_ioctl_run() (i.e. host mode)
> >> the current hardware PMU state is for host. This means whenever
> >> we are in host mode the host PMU can change state of PMU IRQ
> >> in GIC even if local IRQs are disabled.
> >>
> >> We inspect the active state of the PMU IRQ in the
> >> kvm_pmu_sync_hwstate() function using the irq_get_fwd_state() API,
> >> but we are not guaranteed that the IRQ forward state returned by the
> >> irq_get_fwd_state() API is for the guest only.
> >>
> >> The above situation does not manifest for virtual timer because
> >> virtual timer registers are exclusively accessed by Guest and
> >> virtual timer interrupt is only for Guest (never used by Host).
> >>
> >> > Whenever you schedule the guest VCPU again, you'll (a) disable
> >> > interrupts on the CPU, (b) restore the active state of the IRQ for the
> >> > guest, (c) restore the guest PMU state, (d) switch to the guest with
> >> > IRQs enabled on the CPU (potentially).
> >>
> >> Here too, while we are between step (a) and step (b) the PMU HW
> >> context is for host and any PMU counter can overflow. The step (b)
> >> can actually override the PMU IRQ meant for Host.
> >>
> > Can you not simply switch the state from C-code after capturing the IRQ
> > state then? Everything should be accessible from EL1, right?
>
> Yes, I think that would be the only option. This also means I will need
> to re-implement context switching for doing it in C-code.
Yes, you'd add some inline assembly in the C-code to access the
registers I guess. The only thing I thought about after writing my original
mail is whether you'll be counting events while context-switching and
running on the host, which you actually don't want. Not sure if
there's a better way to avoid that.
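One common way to avoid counting host events mid-switch is to clear the PMU's global enable bit before saving or restoring anything else. The sketch below is a toy model of that ordering, assuming (as on ARMv8, where PMCR_EL0.E gates all counters) that clearing the enable bit freezes the counters; the struct and helpers are illustrative, not hardware or kernel interfaces.

```c
#include <stdint.h>
#include <stdbool.h>

#define PMCR_E (1u << 0)  /* models PMCR_EL0.E: global counter enable */

/* Toy PMU: the counter advances only while the enable bit is set. */
struct pmu_model {
    uint32_t pmcr;
    uint64_t counter;
};

void pmu_event(struct pmu_model *p)
{
    if (p->pmcr & PMCR_E)
        p->counter++;
}

/* Context-switch sketch: stop the PMU first, so events occurring
 * during the switch itself are not counted, then snapshot state. */
uint64_t pmu_save_context(struct pmu_model *p)
{
    p->pmcr &= ~PMCR_E;   /* freeze counters first      */
    return p->counter;    /* then snapshot their values */
}
```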
>
> What about the scenario1 which I had mentioned?
>
You have to consider that enabling/disabling forwarding and setting/clearing
the active state are part of the guest PMU state, and all of it has to be
context-switched.
Thanks,
-Christoffer
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2014-11-24 14:37 ` Christoffer Dall
@ 2014-11-25 12:47 ` Anup Patel
-1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-11-25 12:47 UTC (permalink / raw)
To: Christoffer Dall
Cc: kvmarm, linux-arm-kernel, KVM General, patches, Marc Zyngier,
Will Deacon, Ian Campbell, Pranavkumar Sawargaonkar
Hi Christoffer,
On Mon, Nov 24, 2014 at 8:07 PM, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> On Mon, Nov 24, 2014 at 02:14:48PM +0530, Anup Patel wrote:
>> On Fri, Nov 21, 2014 at 5:19 PM, Christoffer Dall
>> <christoffer.dall@linaro.org> wrote:
>> > On Fri, Nov 21, 2014 at 04:06:05PM +0530, Anup Patel wrote:
>> >> Hi Christoffer,
>> >>
>> >> On Fri, Nov 21, 2014 at 3:29 PM, Christoffer Dall
>> >> <christoffer.dall@linaro.org> wrote:
>> >> > On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote:
>> >> >> On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall
>> >> >> <christoffer.dall@linaro.org> wrote:
>> >> >> > On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
>> >> >> >> Hi All,
>> >> >> >>
>> >> >> >> I have second thoughts about rebasing KVM PMU patches
>> >> >> >> to Marc's irq-forwarding patches.
>> >> >> >>
>> >> >> >> The PMU IRQs (when virtualized by KVM) are not exactly
>> >> >> >> forwarded IRQs because they are shared between Host
>> >> >> >> and Guest.
>> >> >> >>
>> >> >> >> Scenario1
>> >> >> >> -------------
>> >> >> >>
>> >> >> >> We might have perf running on the Host and no KVM guest
>> >> >> >> running. In this scenario, we won't get interrupts on the Host
>> >> >> >> because kvm_pmu_hyp_init() (similar to the function
>> >> >> >> kvm_timer_hyp_init() of Marc's IRQ-forwarding
>> >> >> >> implementation) has put all host PMU IRQs in forwarding
>> >> >> >> mode.
>> >> >> >>
>> >> >> >> The only way to solve this problem is to not set forwarding
>> >> >> >> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
>> >> >> >> have special routines to turn on and turn off the forwarding
>> >> >> >> mode of PMU IRQs. These routines will be called from
>> >> >> >> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
>> >> >> >> forwarding state.
>> >> >> >>
>> >> >> >> Scenario2
>> >> >> >> -------------
>> >> >> >>
>> >> >> >> We might have perf running on the Host and Guest simultaneously,
>> >> >> >> which means it is quite likely that the PMU HW triggers an IRQ meant
>> >> >> >> for the Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
>> >> >> >> and "kvm_pmu_sync_hwstate(vcpu);" (similar to the timer sync routine
>> >> >> >> of Marc's patchset, which is called before local_irq_enable()).
>> >> >> >>
>> >> >> >> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
>> >> >> >> will accidentally forward IRQ meant for Host to Guest unless
>> >> >> >> we put additional checks to inspect VCPU PMU state.
>> >> >> >>
>> >> >> >> Am I missing any detail about IRQ forwarding for above
>> >> >> >> scenarios?
>> >> >> >>
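The toggling proposed for Scenario1 above — keep forwarding off except around an actual guest entry — can be sketched as a toy model. Every name below is a hypothetical stand-in modeled on the shape of kvm_arch_vcpu_ioctl_run(), not a real kernel interface.

```c
#include <stdbool.h>

/* Scenario1 model: if forwarding is left on while only the host
 * uses the PMU, the host's own overflow IRQ is never delivered. */
struct cpu_model {
    bool fwd_enabled;   /* PMU IRQ forwarding mode               */
    bool host_got_irq;  /* host handler saw its overflow IRQ     */
};

void host_pmu_overflow(struct cpu_model *c)
{
    /* With forwarding on, the IRQ would be routed for the guest
     * and the host handler would never run: Scenario1's bug. */
    if (!c->fwd_enabled)
        c->host_got_irq = true;
}

/* Run-loop sketch: forwarding is enabled only around the actual
 * guest entry and disabled again before returning to host code. */
void vcpu_run_once(struct cpu_model *c)
{
    c->fwd_enabled = true;   /* before the world switch to the guest */
    /* ... guest runs, then exits back to the host ... */
    c->fwd_enabled = false;  /* after returning to host mode         */
}
```

This way, a host-only perf session (no guest running) still receives its overflow interrupts, because forwarding is never left enabled between guest runs.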
>> >> >> > Hi Anup,
>> >> >>
>> >> >> Hi Christoffer,
>> >> >>
>> >> >> >
>> >> >> > I briefly discussed this with Marc. What I don't understand is how it
>> >> >> > would be possible to get an interrupt for the host while running the
>> >> >> > guest?
>> >> >> >
>> >> >> > The rationale behind my question is that whenever you're running the
>> >> >> > guest, the PMU should be programmed exclusively with guest state, and
>> >> >> > since the PMU is per core, any interrupts should be for the guest, where
>> >> >> > it would always be pending.
>> >> >>
>> >> >> Yes, that's right: the PMU is programmed exclusively for the guest
>> >> >> when the guest is running and for the host when the host is running.
>> >> >>
>> >> >> Let us assume a situation (Scenario2 mentioned previously)
>> >> >> where both host and guest are using the PMU. When the guest is
>> >> >> running we come back to host mode for a variety of reasons
>> >> >> (stage2 fault, guest IO, regular host interrupt, host interrupt
>> >> >> meant for guest, ....), which means we will return from the
>> >> >> "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the
>> >> >> kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
>> >> >> At this point we would have restored the host PMU context, and
>> >> >> any PMU counter used by the host can trigger a PMU overflow interrupt
>> >> >> for the host. Now we will have "kvm_pmu_sync_hwstate(vcpu);"
>> >> >> in the kvm_arch_vcpu_ioctl_run() function (similar to the
>> >> >> kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset)
>> >> >> which will try to detect the PMU IRQ forwarding state in the GIC, so it
>> >> >> can accidentally discover a PMU IRQ pending for the guest while this
>> >> >> PMU IRQ is actually meant for the host.
>> >> >>
>> >> >> This above mentioned situation does not happen for timer
>> >> >> because virtual timer interrupts are exclusively used for guest.
>> >> >> The exclusive use of virtual timer interrupt for guest ensures that
>> >> >> the function kvm_timer_sync_hwstate() will always see correct
>> >> >> state of virtual timer IRQ from GIC.
>> >> >>
>> >> > I'm not quite following.
>> >> >
>> >> > When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemptible section,
>> >> > you would (1) capture the active state of the IRQ pertaining to the
>> >> > guest and (2) deactivate the IRQ on the host, then (3) switch the state of
>> >> > the PMU to the host state, and finally (4) re-enable IRQs on the CPU
>> >> > you're running on.
>> >> >
>> >> > If the host PMU state restored in (3) causes the PMU to raise an
>> >> > interrupt, you'll take an interrupt after (4), which is for the host,
>> >> > and you'll handle it on the host.
>> >> >
>> >> We only switch PMU state in assembly code using
>> >> kvm_call_hyp(__kvm_vcpu_run, vcpu)
>> >> so whenever we are in kvm_arch_vcpu_ioctl_run() (i.e. host mode)
>> >> the current hardware PMU state is for host. This means whenever
>> >> we are in host mode the host PMU can change state of PMU IRQ
>> >> in GIC even if local IRQs are disabled.
>> >>
>> >> We inspect the active state of the PMU IRQ in the
>> >> kvm_pmu_sync_hwstate() function using the irq_get_fwd_state() API,
>> >> but we are not guaranteed that the IRQ forward state returned by the
>> >> irq_get_fwd_state() API is for the guest only.
>> >>
>> >> The above situation does not manifest for virtual timer because
>> >> virtual timer registers are exclusively accessed by Guest and
>> >> virtual timer interrupt is only for Guest (never used by Host).
>> >>
>> >> > Whenever you schedule the guest VCPU again, you'll (a) disable
>> >> > interrupts on the CPU, (b) restore the active state of the IRQ for the
>> >> > guest, (c) restore the guest PMU state, (d) switch to the guest with
>> >> > IRQs enabled on the CPU (potentially).
>> >>
>> >> Here too, while we are between step (a) and step (b) the PMU HW
>> >> context is for host and any PMU counter can overflow. The step (b)
>> >> can actually override the PMU IRQ meant for Host.
>> >>
>> > Can you not simply switch the state from C-code after capturing the IRQ
>> > state then? Everything should be accessible from EL1, right?
>>
>> Yes, I think that would be the only option. This also means I will need
>> to re-implement context switching for doing it in C-code.
>
> Yes, you'd add some inline assembly in the C-code to access the
> registers I guess. The only thing I thought about after writing my original
> mail is whether you'll be counting events while context-switching and
> running on the host, which you actually don't want. Not sure if
> there's a better way to avoid that.
>
>>
>> What about the scenario1 which I had mentioned?
>>
>
> You have to consider that enabling/disabling forwarding and setting/clearing
> the active state are part of the guest PMU state, and all of it has to be
> context-switched.
I found one more issue.
If the PMU IRQ is a PPI, then enabling/disabling forwarding will not
work, because the irqd_set_irq_forwarded() function takes irq_data
as an argument, which is a member of irq_desc, and irq_desc for PPIs
is not per-CPU. This means we cannot call irqd_set_irq_forwarded()
simultaneously from different host CPUs.
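The PPI problem above comes down to state granularity: a PPI has one shared irq_desc, so a single "forwarded" flag in its irq_data cannot represent the independent per-CPU forwarding state that each host CPU needs. A minimal sketch of the distinction, with all names purely illustrative (not kernel interfaces):

```c
#include <stdbool.h>

#define NR_CPUS_MODEL 4

/* What the shared irq_desc gives us today: one flag for all CPUs.
 * Two CPUs toggling it concurrently clobber each other's state. */
struct ppi_shared {
    bool forwarded;
};

/* What a PPI actually needs: independent forwarding state per CPU,
 * analogous to how percpu_devid interrupts keep per-CPU data. */
struct ppi_percpu {
    bool forwarded[NR_CPUS_MODEL];
};

void percpu_set_forwarded(struct ppi_percpu *p, int cpu, bool on)
{
    p->forwarded[cpu] = on;  /* only this CPU's slot changes */
}
```

With the per-CPU layout, one CPU entering a guest and another running host perf no longer race on the same flag.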
>
> Thanks,
> -Christoffer
Regards,
Anup
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2014-11-25 12:47 ` Anup Patel
@ 2014-11-25 13:42 ` Christoffer Dall
-1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2014-11-25 13:42 UTC (permalink / raw)
To: Anup Patel
Cc: kvmarm, linux-arm-kernel, KVM General, patches, Marc Zyngier,
Will Deacon, Ian Campbell, Pranavkumar Sawargaonkar
On Tue, Nov 25, 2014 at 06:17:03PM +0530, Anup Patel wrote:
> Hi Christoffer,
>
> On Mon, Nov 24, 2014 at 8:07 PM, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
> > On Mon, Nov 24, 2014 at 02:14:48PM +0530, Anup Patel wrote:
> >> On Fri, Nov 21, 2014 at 5:19 PM, Christoffer Dall
> >> <christoffer.dall@linaro.org> wrote:
> >> > On Fri, Nov 21, 2014 at 04:06:05PM +0530, Anup Patel wrote:
> >> >> Hi Christoffer,
> >> >>
> >> >> On Fri, Nov 21, 2014 at 3:29 PM, Christoffer Dall
> >> >> <christoffer.dall@linaro.org> wrote:
> >> >> > On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote:
> >> >> >> On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall
> >> >> >> <christoffer.dall@linaro.org> wrote:
> >> >> >> > On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
> >> >> >> >> Hi All,
> >> >> >> >>
> >> >> >> >> I have second thoughts about rebasing KVM PMU patches
> >> >> >> >> to Marc's irq-forwarding patches.
> >> >> >> >>
> >> >> >> >> The PMU IRQs (when virtualized by KVM) are not exactly
> >> >> >> >> forwarded IRQs because they are shared between Host
> >> >> >> >> and Guest.
> >> >> >> >>
> >> >> >> >> Scenario1
> >> >> >> >> -------------
> >> >> >> >>
> >> >> >> >> We might have perf running on the Host and no KVM guest
> >> >> >> >> running. In this scenario, we won't get interrupts on the Host
> >> >> >> >> because kvm_pmu_hyp_init() (similar to the function
> >> >> >> >> kvm_timer_hyp_init() of Marc's IRQ-forwarding
> >> >> >> >> implementation) has put all host PMU IRQs in forwarding
> >> >> >> >> mode.
> >> >> >> >>
> >> >> >> >> The only way to solve this problem is to not set forwarding
> >> >> >> >> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
> >> >> >> >> have special routines to turn on and turn off the forwarding
> >> >> >> >> mode of PMU IRQs. These routines will be called from
> >> >> >> >> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
> >> >> >> >> forwarding state.
> >> >> >> >>
> >> >> >> >> Scenario2
> >> >> >> >> -------------
> >> >> >> >>
> >> >> >> >> We might have perf running on the Host and Guest simultaneously,
> >> >> >> >> which means it is quite likely that the PMU HW triggers an IRQ meant
> >> >> >> >> for the Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
> >> >> >> >> and "kvm_pmu_sync_hwstate(vcpu);" (similar to the timer sync routine
> >> >> >> >> of Marc's patchset, which is called before local_irq_enable()).
> >> >> >> >>
> >> >> >> >> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
> >> >> >> >> will accidentally forward IRQ meant for Host to Guest unless
> >> >> >> >> we put additional checks to inspect VCPU PMU state.
> >> >> >> >>
> >> >> >> >> Am I missing any detail about IRQ forwarding for above
> >> >> >> >> scenarios?
> >> >> >> >>
> >> >> >> > Hi Anup,
> >> >> >>
> >> >> >> Hi Christoffer,
> >> >> >>
> >> >> >> >
> >> >> >> > I briefly discussed this with Marc. What I don't understand is how it
> >> >> >> > would be possible to get an interrupt for the host while running the
> >> >> >> > guest?
> >> >> >> >
> >> >> >> > The rationale behind my question is that whenever you're running the
> >> >> >> > guest, the PMU should be programmed exclusively with guest state, and
> >> >> >> > since the PMU is per core, any interrupts should be for the guest, where
> >> >> >> > it would always be pending.
> >> >> >>
> >> >> >> Yes, that's right: the PMU is programmed exclusively for the guest
> >> >> >> when the guest is running and for the host when the host is running.
> >> >> >>
> >> >> >> Let us assume a situation (Scenario2 mentioned previously)
> >> >> >> where both host and guest are using the PMU. When the guest is
> >> >> >> running we come back to host mode for a variety of reasons
> >> >> >> (stage2 fault, guest IO, regular host interrupt, host interrupt
> >> >> >> meant for guest, ....), which means we will return from the
> >> >> >> "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the
> >> >> >> kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
> >> >> >> At this point we would have restored the host PMU context, and
> >> >> >> any PMU counter used by the host can trigger a PMU overflow interrupt
> >> >> >> for the host. Now we will have "kvm_pmu_sync_hwstate(vcpu);"
> >> >> >> in the kvm_arch_vcpu_ioctl_run() function (similar to the
> >> >> >> kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset)
> >> >> >> which will try to detect the PMU IRQ forwarding state in the GIC, so it
> >> >> >> can accidentally discover a PMU IRQ pending for the guest while this
> >> >> >> PMU IRQ is actually meant for the host.
> >> >> >>
> >> >> >> This above mentioned situation does not happen for timer
> >> >> >> because virtual timer interrupts are exclusively used for guest.
> >> >> >> The exclusive use of virtual timer interrupt for guest ensures that
> >> >> >> the function kvm_timer_sync_hwstate() will always see correct
> >> >> >> state of virtual timer IRQ from GIC.
> >> >> >>
> >> >> > I'm not quite following.
> >> >> >
> >> >> > When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemptible section,
> >> >> > you would (1) capture the active state of the IRQ pertaining to the
> >> >> > guest and (2) deactivate the IRQ on the host, then (3) switch the state of
> >> >> > the PMU to the host state, and finally (4) re-enable IRQs on the CPU
> >> >> > you're running on.
> >> >> >
> >> >> > If the host PMU state restored in (3) causes the PMU to raise an
> >> >> > interrupt, you'll take an interrupt after (4), which is for the host,
> >> >> > and you'll handle it on the host.
> >> >> >
> >> >> We only switch PMU state in assembly code using
> >> >> kvm_call_hyp(__kvm_vcpu_run, vcpu)
> >> >> so whenever we are in kvm_arch_vcpu_ioctl_run() (i.e. host mode)
> >> >> the current hardware PMU state is for host. This means whenever
> >> >> we are in host mode the host PMU can change state of PMU IRQ
> >> >> in GIC even if local IRQs are disabled.
> >> >>
> >> >> We inspect the active state of the PMU IRQ in the
> >> >> kvm_pmu_sync_hwstate() function using the irq_get_fwd_state() API,
> >> >> but we are not guaranteed that the IRQ forward state returned by the
> >> >> irq_get_fwd_state() API is for the guest only.
> >> >>
> >> >> The above situation does not manifest for virtual timer because
> >> >> virtual timer registers are exclusively accessed by Guest and
> >> >> virtual timer interrupt is only for Guest (never used by Host).
> >> >>
> >> >> > Whenever you schedule the guest VCPU again, you'll (a) disable
> >> >> > interrupts on the CPU, (b) restore the active state of the IRQ for the
> >> >> > guest, (c) restore the guest PMU state, (d) switch to the guest with
> >> >> > IRQs enabled on the CPU (potentially).
> >> >>
> >> >> Here too, while we are between step (a) and step (b) the PMU HW
> >> >> context is for host and any PMU counter can overflow. The step (b)
> >> >> can actually override the PMU IRQ meant for Host.
> >> >>
> >> > Can you not simply switch the state from C-code after capturing the IRQ
> >> > state then? Everything should be accessible from EL1, right?
> >>
> >> Yes, I think that would be the only option. This also means I will need
> >> to re-implement context switching for doing it in C-code.
> >
> > Yes, you'd add some inline assembly in the C-code to access the
> > registers I guess. The only thing I thought about after writing my original
> > mail is whether you'll be counting events while context-switching and
> > running on the host, which you actually don't want. Not sure if
> > there's a better way to avoid that.
> >
> >>
> >> What about the scenario1 which I had mentioned?
> >>
> >
> > You have to consider that enabling/disabling forwarding and setting/clearing
> > the active state are part of the guest PMU state, and all of it has to be
> > context-switched.
>
> I found one more issue.
>
> If the PMU IRQ is a PPI, then enabling/disabling forwarding will not
> work, because the irqd_set_irq_forwarded() function takes irq_data
> as an argument, which is a member of irq_desc, and irq_desc for PPIs
> is not per-CPU. This means we cannot call irqd_set_irq_forwarded()
> simultaneously from different host CPUs.
>
I'll let Marc answer this one and if this still applies to his view of
how the next version of the forwarding series will look like.
-Christoffer
^ permalink raw reply [flat|nested] 78+ messages in thread
* [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
@ 2014-11-25 13:42 ` Christoffer Dall
0 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2014-11-25 13:42 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Nov 25, 2014 at 06:17:03PM +0530, Anup Patel wrote:
> Hi Christoffer,
>
> On Mon, Nov 24, 2014 at 8:07 PM, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
> > On Mon, Nov 24, 2014 at 02:14:48PM +0530, Anup Patel wrote:
> >> On Fri, Nov 21, 2014 at 5:19 PM, Christoffer Dall
> >> <christoffer.dall@linaro.org> wrote:
> >> > On Fri, Nov 21, 2014 at 04:06:05PM +0530, Anup Patel wrote:
> >> >> Hi Christoffer,
> >> >>
> >> >> On Fri, Nov 21, 2014 at 3:29 PM, Christoffer Dall
> >> >> <christoffer.dall@linaro.org> wrote:
> >> >> > On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote:
> >> >> >> On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall
> >> >> >> <christoffer.dall@linaro.org> wrote:
> >> >> >> > On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
> >> >> >> >> Hi All,
> >> >> >> >>
> >> >> >> >> I have second thoughts about rebasing KVM PMU patches
> >> >> >> >> to Marc's irq-forwarding patches.
> >> >> >> >>
> >> >> >> >> The PMU IRQs (when virtualized by KVM) are not exactly
> >> >> >> >> forwarded IRQs because they are shared between Host
> >> >> >> >> and Guest.
> >> >> >> >>
> >> >> >> >> Scenario1
> >> >> >> >> -------------
> >> >> >> >>
> >> >> >> >> We might have perf running on Host and no KVM guest
> >> >> >> >> running. In this scenario, we wont get interrupts on Host
> >> >> >> >> because the kvm_pmu_hyp_init() (similar to the function
> >> >> >> >> kvm_timer_hyp_init() of Marc's IRQ-forwarding
> >> >> >> >> implementation) has put all host PMU IRQs in forwarding
> >> >> >> >> mode.
> >> >> >> >>
> >> >> >> >> The only way solve this problem is to not set forwarding
> >> >> >> >> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
> >> >> >> >> have special routines to turn on and turn off the forwarding
> >> >> >> >> mode of PMU IRQs. These routines will be called from
> >> >> >> >> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
> >> >> >> >> forwarding state.
> >> >> >> >>
> >> >> >> >> Scenario2
> >> >> >> >> -------------
> >> >> >> >>
> >> >> >> >> We might have perf running on Host and Guest simultaneously
> >> >> >> >> which means it is quite likely that PMU HW trigger IRQ meant
> >> >> >> >> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
> >> >> >> >> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
> >> >> >> >> of Marc's patchset which is called before local_irq_enable()).
> >> >> >> >>
> >> >> >> >> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
> >> >> >> >> will accidentally forward IRQ meant for Host to Guest unless
> >> >> >> >> we put additional checks to inspect VCPU PMU state.
> >> >> >> >>
> >> >> >> >> Am I missing any detail about IRQ forwarding for above
> >> >> >> >> scenarios?
> >> >> >> >>
> >> >> >> > Hi Anup,
> >> >> >>
> >> >> >> Hi Christoffer,
> >> >> >>
> >> >> >> >
> >> >> >> > I briefly discussed this with Marc. What I don't understand is how it
> >> >> >> > would be possible to get an interrupt for the host while running the
> >> >> >> > guest?
> >> >> >> >
> >> >> >> > The rationale behind my question is that whenever you're running the
> >> >> >> > guest, the PMU should be programmed exclusively with guest state, and
> >> >> >> > since the PMU is per core, any interrupts should be for the guest, where
> >> >> >> > it would always be pending.
> >> >> >>
> >> >> >> Yes, thats right PMU is programmed exclusively for guest when
> >> >> >> guest is running and for host when host is running.
> >> >> >>
> >> >> >> Let us assume a situation (Scenario2 mentioned previously)
> >> >> >> where both host and guest are using PMU. When the guest is
> >> >> >> running we come back to host mode due to variety of reasons
> >> >> >> (stage2 fault, guest IO, regular host interrupt, host interrupt
> >> >> >> meant for guest, ....) which means we will return from the
> >> >> >> "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the
> >> >> >> kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
> >> >> >> At this point we would have restored back host PMU context and
> >> >> >> any PMU counter used by host can trigger PMU overflow interrup
> >> >> >> for host. Now we will be having "kvm_pmu_sync_hwstate(vcpu);"
> >> >> >> in the kvm_arch_vcpu_ioctl_run() function (similar to the
> >> >> >> kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset)
> >> >> >> which will try to detect PMU irq forwarding state in GIC hence it
> >> >> >> can accidentally discover PMU irq pending for guest while this
> >> >> >> PMU irq is actually meant for host.
> >> >> >>
> >> >> >> This above mentioned situation does not happen for timer
> >> >> >> because virtual timer interrupts are exclusively used for guest.
> >> >> >> The exclusive use of virtual timer interrupt for guest ensures that
> >> >> >> the function kvm_timer_sync_hwstate() will always see correct
> >> >> >> state of virtual timer IRQ from GIC.
> >> >> >>
> >> >> > I'm not quite following.
> >> >> >
> >> >> > When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemptible section,
> >> >> > you would (1) capture the active state of the IRQ pertaining to the
> >> >> > guest and (2) deactivate the IRQ on the host, then (3) switch the state of
> >> >> > the PMU to the host state, and finally (4) re-enable IRQs on the CPU
> >> >> > you're running on.
> >> >> >
> >> >> > If the host PMU state restored in (3) causes the PMU to raise an
> >> >> > interrupt, you'll take an interrupt after (4), which is for the host,
> >> >> > and you'll handle it on the host.
> >> >> >
> >> >> We only switch PMU state in assembly code using
> >> >> kvm_call_hyp(__kvm_vcpu_run, vcpu)
> >> >> so whenever we are in kvm_arch_vcpu_ioctl_run() (i.e. host mode)
> >> >> the current hardware PMU state is for host. This means whenever
> >> >> we are in host mode the host PMU can change state of PMU IRQ
> >> >> in GIC even if local IRQs are disabled.
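[Editor's note: to make the window concrete, the flow described above could be sketched roughly as follows. This is hypothetical control flow assembled from the function names used in this thread, not actual kernel code.]

```c
/* Sketch only: the race window described above (names from the thread). */
local_irq_disable();
ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);  /* world-switch in assembly; host
                                            * PMU state is restored before
                                            * control returns here */
/*
 * Window: local IRQs are still masked, but the hardware PMU already
 * holds host state.  A host counter can overflow here and set the
 * shared PMU IRQ pending/active in the GIC with no guest involvement.
 */
kvm_pmu_sync_hwstate(vcpu);                /* could misread that host-
                                            * originated IRQ as guest state */
local_irq_enable();
```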
> >> >>
> >> >> When we inspect the active state of the PMU IRQ in the
> >> >> kvm_pmu_sync_hwstate() function using the irq_get_fwd_state() API,
> >> >> we are not guaranteed that the IRQ forward state returned by the
> >> >> irq_get_fwd_state() API is for the guest only.
> >> >>
> >> >> The above situation does not manifest for virtual timer because
> >> >> virtual timer registers are exclusively accessed by Guest and
> >> >> virtual timer interrupt is only for Guest (never used by Host).
> >> >>
> >> >> > Whenever you schedule the guest VCPU again, you'll (a) disable
> >> >> > interrupts on the CPU, (b) restore the active state of the IRQ for the
> >> >> > guest, (c) restore the guest PMU state, (d) switch to the guest with
> >> >> > IRQs enabled on the CPU (potentially).
> >> >>
> >> >> Here too, while we are between step (a) and step (b) the PMU HW
> >> >> context is for host and any PMU counter can overflow. The step (b)
> >> >> can actually override the PMU IRQ meant for Host.
> >> >>
> >> > Can you not simply switch the state from C-code after capturing the IRQ
> >> > state then? Everything should be accessible from EL1, right?
> >>
> >> Yes, I think that would be the only option. This also means I will need
> >> to re-implement context switching for doing it in C-code.
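[Editor's note: a C-level PMU context switch along these lines might look like the sketch below. The struct and field names are invented for illustration; the system registers are the standard AArch64 PMU registers, which are accessible from EL1.]

```c
/* Hypothetical C-code PMU save using inline assembly (AArch64 sketch). */
struct pmu_host_state {                /* invented name */
	u64 pmcr, pmcnten, pmovs;
};

static void pmu_save_state(struct pmu_host_state *s)
{
	asm volatile("mrs %0, pmcr_el0"       : "=r" (s->pmcr));
	asm volatile("mrs %0, pmcntenset_el0" : "=r" (s->pmcnten));
	asm volatile("mrs %0, pmovsset_el0"   : "=r" (s->pmovs));
	/* ...event counter and event-type registers would follow... */
}
```

As noted below, the counters keep running while this executes on the host, so events counted during the switch itself would be attributed to the wrong context.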
> >
> > Yes, you'd add some inline assembly in the C-code to access the
> > registers I guess. Only thing I thought about after writing my original
> > mail is whether you'll be counting events while context-switching and
> > running on the host, which you actually don't want to. Not sure if
> > there's a better way to avoid that.
> >
> >>
> >> What about the scenario1 which I had mentioned?
> >>
> >
> > You have to consider enabling/disabling forwarding and setting/clearing
> > the active state is part of the guest PMU state and all of it has to be
> > context-switched.
>
> I found one more issue.
>
> If the PMU irq is a PPI then enabling/disabling forwarding will not
> work, because the irqd_set_irq_forwarded() function takes irq_data
> as its argument, which is a member of irq_desc, and the irq_desc for
> PPIs is not per-CPU. This means we cannot call irqd_set_irq_forwarded()
> simultaneously from different host CPUs.
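[Editor's note: the PPI problem can be seen from the lookup path. This is illustrative only; irqd_set_irq_forwarded() comes from Marc's RFC forwarding series, not mainline, and pmu_ppi_num is a placeholder.]

```c
/* Why per-CPU forwarding of a PPI fails with the RFC API (sketch). */
struct irq_desc *desc = irq_to_desc(pmu_ppi_num);
/*
 * A PPI has a single irq_desc -- and thus a single irq_data -- shared
 * by all CPUs, even though each CPU has its own instance of the
 * interrupt.  Setting the forwarded flag here is therefore global:
 */
irqd_set_irq_forwarded(&desc->irq_data);
/*
 * Two host CPUs entering/leaving their guests concurrently would race
 * on this one flag; the forwarding state would need to be per-CPU.
 */
```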
>
I'll let Marc answer this one, and say whether it still applies to his
view of how the next version of the forwarding series will look.
-Christoffer
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2014-11-25 13:42 ` Christoffer Dall
@ 2014-11-27 10:22 ` Anup Patel
-1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-11-27 10:22 UTC (permalink / raw)
To: Christoffer Dall
Cc: kvmarm, linux-arm-kernel, KVM General, patches, Marc Zyngier,
Will Deacon, Ian Campbell, Pranavkumar Sawargaonkar
On Tue, Nov 25, 2014 at 7:12 PM, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> On Tue, Nov 25, 2014 at 06:17:03PM +0530, Anup Patel wrote:
>> Hi Christoffer,
>>
>> On Mon, Nov 24, 2014 at 8:07 PM, Christoffer Dall
>> <christoffer.dall@linaro.org> wrote:
>> > On Mon, Nov 24, 2014 at 02:14:48PM +0530, Anup Patel wrote:
>> >> On Fri, Nov 21, 2014 at 5:19 PM, Christoffer Dall
>> >> <christoffer.dall@linaro.org> wrote:
>> >> > On Fri, Nov 21, 2014 at 04:06:05PM +0530, Anup Patel wrote:
>> >> >> Hi Christoffer,
>> >> >>
>> >> >> On Fri, Nov 21, 2014 at 3:29 PM, Christoffer Dall
>> >> >> <christoffer.dall@linaro.org> wrote:
>> >> >> > On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote:
>> >> >> >> On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall
>> >> >> >> <christoffer.dall@linaro.org> wrote:
>> >> >> >> > On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
>> >> >> >> >> Hi All,
>> >> >> >> >>
>> >> >> >> >> I have second thoughts about rebasing KVM PMU patches
>> >> >> >> >> to Marc's irq-forwarding patches.
>> >> >> >> >>
>> >> >> >> >> The PMU IRQs (when virtualized by KVM) are not exactly
>> >> >> >> >> forwarded IRQs because they are shared between Host
>> >> >> >> >> and Guest.
>> >> >> >> >>
>> >> >> >> >> Scenario1
>> >> >> >> >> -------------
>> >> >> >> >>
>> >> >> >> >> We might have perf running on Host and no KVM guest
> >> >> >> >> >> running. In this scenario, we won't get interrupts on Host
>> >> >> >> >> because the kvm_pmu_hyp_init() (similar to the function
>> >> >> >> >> kvm_timer_hyp_init() of Marc's IRQ-forwarding
>> >> >> >> >> implementation) has put all host PMU IRQs in forwarding
>> >> >> >> >> mode.
>> >> >> >> >>
> >> >> >> >> >> The only way to solve this problem is to not set forwarding
>> >> >> >> >> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
>> >> >> >> >> have special routines to turn on and turn off the forwarding
>> >> >> >> >> mode of PMU IRQs. These routines will be called from
>> >> >> >> >> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
>> >> >> >> >> forwarding state.
>> >> >> >> >>
>> >> >> >> >> Scenario2
>> >> >> >> >> -------------
>> >> >> >> >>
>> >> >> >> >> We might have perf running on Host and Guest simultaneously
> >> >> >> >> >> which means it is quite likely that the PMU HW triggers an IRQ meant
>> >> >> >> >> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
>> >> >> >> >> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
>> >> >> >> >> of Marc's patchset which is called before local_irq_enable()).
>> >> >> >> >>
>> >> >> >> >> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
>> >> >> >> >> will accidentally forward IRQ meant for Host to Guest unless
>> >> >> >> >> we put additional checks to inspect VCPU PMU state.
>> >> >> >> >>
>> >> >> >> >> Am I missing any detail about IRQ forwarding for above
>> >> >> >> >> scenarios?
>> >> >> >> >>
>> >> >> >> > Hi Anup,
>> >> >> >>
>> >> >> >> Hi Christoffer,
>> >> >> >>
>> >> >> >> >
>> >> >> >> > I briefly discussed this with Marc. What I don't understand is how it
>> >> >> >> > would be possible to get an interrupt for the host while running the
>> >> >> >> > guest?
>> >> >> >> >
>> >> >> >> > The rationale behind my question is that whenever you're running the
>> >> >> >> > guest, the PMU should be programmed exclusively with guest state, and
>> >> >> >> > since the PMU is per core, any interrupts should be for the guest, where
>> >> >> >> > it would always be pending.
>> >> >> >>
> >> >> >> >> Yes, that's right, the PMU is programmed exclusively for the guest
> >> >> >> >> when the guest is running and for the host when the host is running.
>> >> >> >>
>> >> >> >> Let us assume a situation (Scenario2 mentioned previously)
>> >> >> >> where both host and guest are using PMU. When the guest is
>> >> >> >> running we come back to host mode due to variety of reasons
>> >> >> >> (stage2 fault, guest IO, regular host interrupt, host interrupt
>> >> >> >> meant for guest, ....) which means we will return from the
>> >> >> >> "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the
>> >> >> >> kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
>> >> >> >> At this point we would have restored back host PMU context and
> >> >> >> >> any PMU counter used by the host can trigger a PMU overflow interrupt
>> >> >> >> for host. Now we will be having "kvm_pmu_sync_hwstate(vcpu);"
>> >> >> >> in the kvm_arch_vcpu_ioctl_run() function (similar to the
>> >> >> >> kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset)
>> >> >> >> which will try to detect PMU irq forwarding state in GIC hence it
>> >> >> >> can accidentally discover PMU irq pending for guest while this
>> >> >> >> PMU irq is actually meant for host.
>> >> >> >>
>> >> >> >> This above mentioned situation does not happen for timer
>> >> >> >> because virtual timer interrupts are exclusively used for guest.
>> >> >> >> The exclusive use of virtual timer interrupt for guest ensures that
>> >> >> >> the function kvm_timer_sync_hwstate() will always see correct
>> >> >> >> state of virtual timer IRQ from GIC.
>> >> >> >>
>> >> >> > I'm not quite following.
>> >> >> >
> >> >> >> > When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemptible section,
> >> >> >> > you would (1) capture the active state of the IRQ pertaining to the
> >> >> >> > guest and (2) deactivate the IRQ on the host, then (3) switch the state of
>> >> >> > the PMU to the host state, and finally (4) re-enable IRQs on the CPU
>> >> >> > you're running on.
>> >> >> >
>> >> >> > If the host PMU state restored in (3) causes the PMU to raise an
>> >> >> > interrupt, you'll take an interrupt after (4), which is for the host,
>> >> >> > and you'll handle it on the host.
>> >> >> >
>> >> >> We only switch PMU state in assembly code using
>> >> >> kvm_call_hyp(__kvm_vcpu_run, vcpu)
>> >> >> so whenever we are in kvm_arch_vcpu_ioctl_run() (i.e. host mode)
>> >> >> the current hardware PMU state is for host. This means whenever
>> >> >> we are in host mode the host PMU can change state of PMU IRQ
>> >> >> in GIC even if local IRQs are disabled.
>> >> >>
> >> >> >> When we inspect the active state of the PMU IRQ in the
> >> >> >> kvm_pmu_sync_hwstate() function using the irq_get_fwd_state() API,
> >> >> >> we are not guaranteed that the IRQ forward state returned by the
> >> >> >> irq_get_fwd_state() API is for the guest only.
>> >> >>
>> >> >> The above situation does not manifest for virtual timer because
>> >> >> virtual timer registers are exclusively accessed by Guest and
>> >> >> virtual timer interrupt is only for Guest (never used by Host).
>> >> >>
>> >> >> > Whenever you schedule the guest VCPU again, you'll (a) disable
>> >> >> > interrupts on the CPU, (b) restore the active state of the IRQ for the
>> >> >> > guest, (c) restore the guest PMU state, (d) switch to the guest with
>> >> >> > IRQs enabled on the CPU (potentially).
>> >> >>
>> >> >> Here too, while we are between step (a) and step (b) the PMU HW
>> >> >> context is for host and any PMU counter can overflow. The step (b)
>> >> >> can actually override the PMU IRQ meant for Host.
>> >> >>
>> >> > Can you not simply switch the state from C-code after capturing the IRQ
>> >> > state then? Everything should be accessible from EL1, right?
>> >>
>> >> Yes, I think that would be the only option. This also means I will need
>> >> to re-implement context switching for doing it in C-code.
>> >
>> > Yes, you'd add some inline assembly in the C-code to access the
>> > registers I guess. Only thing I thought about after writing my original
> >> > mail is whether you'll be counting events while context-switching and
>> > running on the host, which you actually don't want to. Not sure if
>> > there's a better way to avoid that.
>> >
>> >>
>> >> What about the scenario1 which I had mentioned?
>> >>
>> >
>> > You have to consider enabling/disabling forwarding and setting/clearing
>> > the active state is part of the guest PMU state and all of it has to be
>> > context-switched.
>>
>> I found one more issue.
>>
>> If the PMU irq is a PPI then enabling/disabling forwarding will not
>> work, because the irqd_set_irq_forwarded() function takes irq_data
>> as its argument, which is a member of irq_desc, and the irq_desc for
>> PPIs is not per-CPU. This means we cannot call irqd_set_irq_forwarded()
>> simultaneously from different host CPUs.
>>
Hi Marc,
> I'll let Marc answer this one, and say whether it still applies to his
> view of how the next version of the forwarding series will look.
Ping??
>
> -Christoffer
--
Anup
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2014-11-27 10:22 ` Anup Patel
@ 2014-11-27 10:40 ` Marc Zyngier
-1 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2014-11-27 10:40 UTC (permalink / raw)
To: Anup Patel, Christoffer Dall
Cc: kvmarm, linux-arm-kernel, KVM General, patches, Will Deacon,
Ian.Campbell, Pranavkumar Sawargaonkar
On 27/11/14 10:22, Anup Patel wrote:
> On Tue, Nov 25, 2014 at 7:12 PM, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
>> On Tue, Nov 25, 2014 at 06:17:03PM +0530, Anup Patel wrote:
>>> Hi Christoffer,
>>>
>>> On Mon, Nov 24, 2014 at 8:07 PM, Christoffer Dall
>>> <christoffer.dall@linaro.org> wrote:
>>>> On Mon, Nov 24, 2014 at 02:14:48PM +0530, Anup Patel wrote:
>>>>> On Fri, Nov 21, 2014 at 5:19 PM, Christoffer Dall
>>>>> <christoffer.dall@linaro.org> wrote:
>>>>>> On Fri, Nov 21, 2014 at 04:06:05PM +0530, Anup Patel wrote:
>>>>>>> Hi Christoffer,
>>>>>>>
>>>>>>> On Fri, Nov 21, 2014 at 3:29 PM, Christoffer Dall
>>>>>>> <christoffer.dall@linaro.org> wrote:
>>>>>>>> On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote:
>>>>>>>>> On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall
>>>>>>>>> <christoffer.dall@linaro.org> wrote:
>>>>>>>>>> On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
>>>>>>>>>>> Hi All,
>>>>>>>>>>>
>>>>>>>>>>> I have second thoughts about rebasing KVM PMU patches
>>>>>>>>>>> to Marc's irq-forwarding patches.
>>>>>>>>>>>
>>>>>>>>>>> The PMU IRQs (when virtualized by KVM) are not exactly
>>>>>>>>>>> forwarded IRQs because they are shared between Host
>>>>>>>>>>> and Guest.
>>>>>>>>>>>
>>>>>>>>>>> Scenario1
>>>>>>>>>>> -------------
>>>>>>>>>>>
>>>>>>>>>>> We might have perf running on Host and no KVM guest
>>>>>>>>>>> running. In this scenario, we won't get interrupts on Host
>>>>>>>>>>> because the kvm_pmu_hyp_init() (similar to the function
>>>>>>>>>>> kvm_timer_hyp_init() of Marc's IRQ-forwarding
>>>>>>>>>>> implementation) has put all host PMU IRQs in forwarding
>>>>>>>>>>> mode.
>>>>>>>>>>>
>>>>>>>>>>> The only way to solve this problem is to not set forwarding
>>>>>>>>>>> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
>>>>>>>>>>> have special routines to turn on and turn off the forwarding
>>>>>>>>>>> mode of PMU IRQs. These routines will be called from
>>>>>>>>>>> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
>>>>>>>>>>> forwarding state.
>>>>>>>>>>>
>>>>>>>>>>> Scenario2
>>>>>>>>>>> -------------
>>>>>>>>>>>
>>>>>>>>>>> We might have perf running on Host and Guest simultaneously
>>>>>>>>>>> which means it is quite likely that the PMU HW triggers an IRQ meant
>>>>>>>>>>> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
>>>>>>>>>>> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
>>>>>>>>>>> of Marc's patchset which is called before local_irq_enable()).
>>>>>>>>>>>
>>>>>>>>>>> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
>>>>>>>>>>> will accidentally forward IRQ meant for Host to Guest unless
>>>>>>>>>>> we put additional checks to inspect VCPU PMU state.
>>>>>>>>>>>
>>>>>>>>>>> Am I missing any detail about IRQ forwarding for above
>>>>>>>>>>> scenarios?
>>>>>>>>>>>
>>>>>>>>>> Hi Anup,
>>>>>>>>>
>>>>>>>>> Hi Christoffer,
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I briefly discussed this with Marc. What I don't understand is how it
>>>>>>>>>> would be possible to get an interrupt for the host while running the
>>>>>>>>>> guest?
>>>>>>>>>>
>>>>>>>>>> The rationale behind my question is that whenever you're running the
>>>>>>>>>> guest, the PMU should be programmed exclusively with guest state, and
>>>>>>>>>> since the PMU is per core, any interrupts should be for the guest, where
>>>>>>>>>> it would always be pending.
>>>>>>>>>
>>>>>>>>> Yes, that's right, the PMU is programmed exclusively for the guest
>>>>>>>>> when the guest is running and for the host when the host is running.
>>>>>>>>>
>>>>>>>>> Let us assume a situation (Scenario2 mentioned previously)
>>>>>>>>> where both host and guest are using PMU. When the guest is
>>>>>>>>> running we come back to host mode due to variety of reasons
>>>>>>>>> (stage2 fault, guest IO, regular host interrupt, host interrupt
>>>>>>>>> meant for guest, ....) which means we will return from the
>>>>>>>>> "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the
>>>>>>>>> kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
>>>>>>>>> At this point we would have restored back host PMU context and
>>>>>>>>> any PMU counter used by the host can trigger a PMU overflow interrupt
>>>>>>>>> for host. Now we will be having "kvm_pmu_sync_hwstate(vcpu);"
>>>>>>>>> in the kvm_arch_vcpu_ioctl_run() function (similar to the
>>>>>>>>> kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset)
>>>>>>>>> which will try to detect PMU irq forwarding state in GIC hence it
>>>>>>>>> can accidentally discover PMU irq pending for guest while this
>>>>>>>>> PMU irq is actually meant for host.
>>>>>>>>>
>>>>>>>>> This above mentioned situation does not happen for timer
>>>>>>>>> because virtual timer interrupts are exclusively used for guest.
>>>>>>>>> The exclusive use of virtual timer interrupt for guest ensures that
>>>>>>>>> the function kvm_timer_sync_hwstate() will always see correct
>>>>>>>>> state of virtual timer IRQ from GIC.
>>>>>>>>>
>>>>>>>> I'm not quite following.
>>>>>>>>
>>>>>>>> When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemptible section,
>>>>>>>> you would (1) capture the active state of the IRQ pertaining to the
>>>>>>>> guest and (2) deactivate the IRQ on the host, then (3) switch the state of
>>>>>>>> the PMU to the host state, and finally (4) re-enable IRQs on the CPU
>>>>>>>> you're running on.
>>>>>>>>
>>>>>>>> If the host PMU state restored in (3) causes the PMU to raise an
>>>>>>>> interrupt, you'll take an interrupt after (4), which is for the host,
>>>>>>>> and you'll handle it on the host.
>>>>>>>>
>>>>>>> We only switch PMU state in assembly code using
>>>>>>> kvm_call_hyp(__kvm_vcpu_run, vcpu)
>>>>>>> so whenever we are in kvm_arch_vcpu_ioctl_run() (i.e. host mode)
>>>>>>> the current hardware PMU state is for host. This means whenever
>>>>>>> we are in host mode the host PMU can change state of PMU IRQ
>>>>>>> in GIC even if local IRQs are disabled.
>>>>>>>
>>>>>>> When we inspect the active state of the PMU IRQ in the
>>>>>>> kvm_pmu_sync_hwstate() function using the irq_get_fwd_state() API,
>>>>>>> we are not guaranteed that the IRQ forward state returned by the
>>>>>>> irq_get_fwd_state() API is for the guest only.
>>>>>>>
>>>>>>> The above situation does not manifest for virtual timer because
>>>>>>> virtual timer registers are exclusively accessed by Guest and
>>>>>>> virtual timer interrupt is only for Guest (never used by Host).
>>>>>>>
>>>>>>>> Whenever you schedule the guest VCPU again, you'll (a) disable
>>>>>>>> interrupts on the CPU, (b) restore the active state of the IRQ for the
>>>>>>>> guest, (c) restore the guest PMU state, (d) switch to the guest with
>>>>>>>> IRQs enabled on the CPU (potentially).
>>>>>>>
>>>>>>> Here too, while we are between step (a) and step (b) the PMU HW
>>>>>>> context is for host and any PMU counter can overflow. The step (b)
>>>>>>> can actually override the PMU IRQ meant for Host.
>>>>>>>
>>>>>> Can you not simply switch the state from C-code after capturing the IRQ
>>>>>> state then? Everything should be accessible from EL1, right?
>>>>>
>>>>> Yes, I think that would be the only option. This also means I will need
>>>>> to re-implement context switching for doing it in C-code.
>>>>
>>>> Yes, you'd add some inline assembly in the C-code to access the
>>>> registers I guess. Only thing I thought about after writing my original
>>>> mail is whether you'll be counting events while context-switching and
>>>> running on the host, which you actually don't want to. Not sure if
>>>> there's a better way to avoid that.
>>>>
>>>>>
>>>>> What about the scenario1 which I had mentioned?
>>>>>
>>>>
>>>> You have to consider enabling/disabling forwarding and setting/clearing
>>>> the active state is part of the guest PMU state and all of it has to be
>>>> context-switched.
>>>
>>> I found one more issue.
>>>
>>> If the PMU irq is a PPI then enabling/disabling forwarding will not
>>> work, because the irqd_set_irq_forwarded() function takes irq_data
>>> as its argument, which is a member of irq_desc, and the irq_desc for
>>> PPIs is not per-CPU. This means we cannot call irqd_set_irq_forwarded()
>>> simultaneously from different host CPUs.
>>>
>
> Hi Marc,
>
>> I'll let Marc answer this one, and say whether it still applies to his
>> view of how the next version of the forwarding series will look.
I'm looking at it at the moment.
I'm inclined to say that we should fix the forwarding code to allow
individual PPIs to be forwarded. This is a bit harder than what we're
doing at the moment, but that's possible.
Of course, that complicates the code a bit, as we have to make sure
we're not preemptible at that time.
What do you think?
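[Editor's note: one way to picture what forwarding individual PPIs would require is the per-CPU state sketched below. This is purely illustrative; none of these names exist in the kernel.]

```c
/* Hypothetical per-CPU forwarding state for a PPI. */
static DEFINE_PER_CPU(bool, pmu_ppi_forwarded);

static void pmu_ppi_set_forwarded(bool on)
{
	/*
	 * Must run with preemption disabled: the PPI instance -- and
	 * hence this flag -- belongs to the current CPU only.
	 */
	preempt_disable();
	__this_cpu_write(pmu_ppi_forwarded, on);
	/* ...plus programming this CPU's GIC interface accordingly... */
	preempt_enable();
}
```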
Thanks,
M.
--
Jazz is not dead. It just smells funny...
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2014-11-27 10:40 ` Marc Zyngier
@ 2014-11-27 10:54 ` Anup Patel
-1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-11-27 10:54 UTC (permalink / raw)
To: Marc Zyngier
Cc: Christoffer Dall, kvmarm, linux-arm-kernel, KVM General, patches,
Will Deacon, Ian.Campbell, Pranavkumar Sawargaonkar
On Thu, Nov 27, 2014 at 4:10 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 27/11/14 10:22, Anup Patel wrote:
>> On Tue, Nov 25, 2014 at 7:12 PM, Christoffer Dall
>> <christoffer.dall@linaro.org> wrote:
>>> On Tue, Nov 25, 2014 at 06:17:03PM +0530, Anup Patel wrote:
>>>> Hi Christoffer,
>>>>
>>>> On Mon, Nov 24, 2014 at 8:07 PM, Christoffer Dall
>>>> <christoffer.dall@linaro.org> wrote:
>>>>> On Mon, Nov 24, 2014 at 02:14:48PM +0530, Anup Patel wrote:
>>>>>> On Fri, Nov 21, 2014 at 5:19 PM, Christoffer Dall
>>>>>> <christoffer.dall@linaro.org> wrote:
>>>>>>> On Fri, Nov 21, 2014 at 04:06:05PM +0530, Anup Patel wrote:
>>>>>>>> Hi Christoffer,
>>>>>>>>
>>>>>>>> On Fri, Nov 21, 2014 at 3:29 PM, Christoffer Dall
>>>>>>>> <christoffer.dall@linaro.org> wrote:
>>>>>>>>> On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote:
>>>>>>>>>> On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall
>>>>>>>>>> <christoffer.dall@linaro.org> wrote:
>>>>>>>>>>> On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>
>>>>>>>>>>>> I have second thoughts about rebasing KVM PMU patches
>>>>>>>>>>>> to Marc's irq-forwarding patches.
>>>>>>>>>>>>
>>>>>>>>>>>> The PMU IRQs (when virtualized by KVM) are not exactly
>>>>>>>>>>>> forwarded IRQs because they are shared between Host
>>>>>>>>>>>> and Guest.
>>>>>>>>>>>>
>>>>>>>>>>>> Scenario1
>>>>>>>>>>>> -------------
>>>>>>>>>>>>
>>>>>>>>>>>> We might have perf running on Host and no KVM guest
>>>>>>>>>>>> running. In this scenario, we won't get interrupts on Host
>>>>>>>>>>>> because the kvm_pmu_hyp_init() (similar to the function
>>>>>>>>>>>> kvm_timer_hyp_init() of Marc's IRQ-forwarding
>>>>>>>>>>>> implementation) has put all host PMU IRQs in forwarding
>>>>>>>>>>>> mode.
>>>>>>>>>>>>
>>>>>>>>>>>> The only way to solve this problem is to not set forwarding
>>>>>>>>>>>> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
>>>>>>>>>>>> have special routines to turn on and turn off the forwarding
>>>>>>>>>>>> mode of PMU IRQs. These routines will be called from
>>>>>>>>>>>> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
>>>>>>>>>>>> forwarding state.
>>>>>>>>>>>>
>>>>>>>>>>>> Scenario2
>>>>>>>>>>>> -------------
>>>>>>>>>>>>
>>>>>>>>>>>> We might have perf running on Host and Guest simultaneously
>>>>>>>>>>>> which means it is quite likely that the PMU HW triggers an IRQ meant
>>>>>>>>>>>> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
>>>>>>>>>>>> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
>>>>>>>>>>>> of Marc's patchset which is called before local_irq_enable()).
>>>>>>>>>>>>
>>>>>>>>>>>> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
>>>>>>>>>>>> will accidentally forward IRQ meant for Host to Guest unless
>>>>>>>>>>>> we put additional checks to inspect VCPU PMU state.
>>>>>>>>>>>>
>>>>>>>>>>>> Am I missing any detail about IRQ forwarding for above
>>>>>>>>>>>> scenarios?
>>>>>>>>>>>>
>>>>>>>>>>> Hi Anup,
>>>>>>>>>>
>>>>>>>>>> Hi Christoffer,
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I briefly discussed this with Marc. What I don't understand is how it
>>>>>>>>>>> would be possible to get an interrupt for the host while running the
>>>>>>>>>>> guest?
>>>>>>>>>>>
>>>>>>>>>>> The rationale behind my question is that whenever you're running the
>>>>>>>>>>> guest, the PMU should be programmed exclusively with guest state, and
>>>>>>>>>>> since the PMU is per core, any interrupts should be for the guest, where
>>>>>>>>>>> it would always be pending.
>>>>>>>>>>
>>>>>>>>>> Yes, that's right, the PMU is programmed exclusively for guest when
>>>>>>>>>> guest is running and for host when host is running.
>>>>>>>>>>
>>>>>>>>>> Let us assume a situation (Scenario2 mentioned previously)
>>>>>>>>>> where both host and guest are using PMU. When the guest is
>>>>>>>>>> running we come back to host mode due to a variety of reasons
>>>>>>>>>> (stage2 fault, guest IO, regular host interrupt, host interrupt
>>>>>>>>>> meant for guest, ....) which means we will return from the
>>>>>>>>>> "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the
>>>>>>>>>> kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
>>>>>>>>>> At this point we would have restored back host PMU context and
>>>>>>>>>> any PMU counter used by host can trigger PMU overflow interrupt
>>>>>>>>>> for host. Now we will be having "kvm_pmu_sync_hwstate(vcpu);"
>>>>>>>>>> in the kvm_arch_vcpu_ioctl_run() function (similar to the
>>>>>>>>>> kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset)
>>>>>>>>>> which will try to detect PMU irq forwarding state in GIC hence it
>>>>>>>>>> can accidentally discover PMU irq pending for guest while this
>>>>>>>>>> PMU irq is actually meant for host.
>>>>>>>>>>
>>>>>>>>>> This above mentioned situation does not happen for timer
>>>>>>>>>> because virtual timer interrupts are exclusively used for guest.
>>>>>>>>>> The exclusive use of virtual timer interrupt for guest ensures that
>>>>>>>>>> the function kvm_timer_sync_hwstate() will always see correct
>>>>>>>>>> state of virtual timer IRQ from GIC.
>>>>>>>>>>
>>>>>>>>> I'm not quite following.
>>>>>>>>>
>>>>>>>>> When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemptible section,
>>>>>>>>> you would (1) capture the active state of the IRQ pertaining to the
>>>>>>>>> guest and (2) deactivate the IRQ on the host, then (3) switch the state of
>>>>>>>>> the PMU to the host state, and finally (4) re-enable IRQs on the CPU
>>>>>>>>> you're running on.
>>>>>>>>>
>>>>>>>>> If the host PMU state restored in (3) causes the PMU to raise an
>>>>>>>>> interrupt, you'll take an interrupt after (4), which is for the host,
>>>>>>>>> and you'll handle it on the host.
>>>>>>>>>
>>>>>>>> We only switch PMU state in assembly code using
>>>>>>>> kvm_call_hyp(__kvm_vcpu_run, vcpu)
>>>>>>>> so whenever we are in kvm_arch_vcpu_ioctl_run() (i.e. host mode)
>>>>>>>> the current hardware PMU state is for host. This means whenever
>>>>>>>> we are in host mode the host PMU can change state of PMU IRQ
>>>>>>>> in GIC even if local IRQs are disabled.
>>>>>>>>
>>>>>>>> We inspect the active state of the PMU IRQ in the
>>>>>>>> kvm_pmu_sync_hwstate() function using the irq_get_fwd_state() API.
>>>>>>>> Here we are not guaranteed that IRQ forward state returned by the
>>>>>>>> irq_get_fwd_state() API is for guest only.
>>>>>>>>
>>>>>>>> The above situation does not manifest for virtual timer because
>>>>>>>> virtual timer registers are exclusively accessed by Guest and
>>>>>>>> virtual timer interrupt is only for Guest (never used by Host).
>>>>>>>>
>>>>>>>>> Whenever you schedule the guest VCPU again, you'll (a) disable
>>>>>>>>> interrupts on the CPU, (b) restore the active state of the IRQ for the
>>>>>>>>> guest, (c) restore the guest PMU state, (d) switch to the guest with
>>>>>>>>> IRQs enabled on the CPU (potentially).
>>>>>>>>
>>>>>>>> Here too, while we are between step (a) and step (b) the PMU HW
>>>>>>>> context is for host and any PMU counter can overflow. The step (b)
>>>>>>>> can actually override the PMU IRQ meant for Host.
>>>>>>>>
>>>>>>> Can you not simply switch the state from C-code after capturing the IRQ
>>>>>>> state then? Everything should be accessible from EL1, right?
>>>>>>
>>>>>> Yes, I think that would be the only option. This also means I will need
>>>>>> to re-implement context switching for doing it in C-code.
>>>>>
>>>>> Yes, you'd add some inline assembly in the C-code to access the
>>>>> registers I guess. Only thing I thought about after writing my original
>>>>> mail is whether you'll be counting events while context-switching and
>>>>> running on the host, which you actually don't want to. Not sure if
>>>>> there's a better way to avoid that.
>>>>>
>>>>>>
>>>>>> What about the scenario1 which I had mentioned?
>>>>>>
>>>>>
>>>>> You have to consider enabling/disabling forwarding and setting/clearing
>>>>> the active state is part of the guest PMU state and all of it has to be
>>>>> context-switched.
>>>>
>>>> I found one more issue.
>>>>
>>>> If PMU irq is PPI then enabling/disabling forwarding will not
>>>> work because irqd_set_irq_forwarded() function takes irq_data
>>>> as argument which is member of irq_desc and irq_desc for PPIs
>>>> is not per_cpu. This means we cannot call irqd_set_irq_forwarded()
>>>> simultaneously from different host CPUs.
>>>>
>>
>> Hi Marc,
>>
>>> I'll let Marc answer this one and if this still applies to his view of
>>> how the next version of the forwarding series will look like.
>
> I'm looking at it at the moment.
>
> I'm inclined to say that we should fix the forwarding code to allow
> individual PPIs to be forwarded. This is a bit harder than what we're
> doing at the moment, but that's possible.
>
> Of course, that complicates the code a bit, as we have to make sure
> we're not preemptable at that time.
>
> What do you think?
Currently, irqd_set_irq_forwarded() is lockless.
It would be great if we could update irqd_set_irq_forwarded() for PPIs
such that it remains lockless, so that we don't add much overhead
when we enable/disable the forwarding state.
>
> Thanks,
>
> M.
> --
> Jazz is not dead. It just smells funny...
Regards,
Anup
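[Editorial note: the (1)-(4) ordering Christoffer describes in the quoted exchange can be modeled with stubs, purely to show the sequencing; every name below is a stand-in for the real KVM/GIC call, not the actual API. The sketch shows the ordering only; the thread above debates whether it is sufficient when the PMU switch happens in the world-switch assembly instead.]

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Event log so the ordering of the four steps can be checked. */
static char seq[8];

/* Stubs standing in for: (1) reading the IRQ active state from the
 * GIC, (2) deactivating it on the host, (3) restoring the host PMU
 * context, (4) re-enabling CPU interrupts. */
static bool stub_irq_get_active(void)   { strcat(seq, "1"); return true; }
static void stub_irq_deactivate(void)   { strcat(seq, "2"); }
static void stub_pmu_restore_host(void) { strcat(seq, "3"); }
static void stub_local_irq_enable(void) { strcat(seq, "4"); }

struct vcpu { bool guest_irq_active; };

/* Capture the guest's IRQ state and move to the host PMU context
 * before interrupts come back on, so an overflow taken after step
 * (4) is unambiguously a host interrupt. */
static void pmu_sync_hwstate(struct vcpu *v)
{
    v->guest_irq_active = stub_irq_get_active(); /* (1) */
    stub_irq_deactivate();                       /* (2) */
    stub_pmu_restore_host();                     /* (3) */
    stub_local_irq_enable();                     /* (4) */
}
```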
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2014-11-27 10:54 ` Anup Patel
@ 2014-11-27 11:06 ` Marc Zyngier
-1 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2014-11-27 11:06 UTC (permalink / raw)
To: Anup Patel
Cc: Christoffer Dall, kvmarm, linux-arm-kernel, KVM General, patches,
Will Deacon, Ian.Campbell, Pranavkumar Sawargaonkar
On 27/11/14 10:54, Anup Patel wrote:
> On Thu, Nov 27, 2014 at 4:10 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>> On 27/11/14 10:22, Anup Patel wrote:
>>> On Tue, Nov 25, 2014 at 7:12 PM, Christoffer Dall
>>> <christoffer.dall@linaro.org> wrote:
>>>> On Tue, Nov 25, 2014 at 06:17:03PM +0530, Anup Patel wrote:
>>>>> Hi Christoffer,
>>>>>
>>>>> On Mon, Nov 24, 2014 at 8:07 PM, Christoffer Dall
>>>>> <christoffer.dall@linaro.org> wrote:
>>>>>> On Mon, Nov 24, 2014 at 02:14:48PM +0530, Anup Patel wrote:
>>>>>>> On Fri, Nov 21, 2014 at 5:19 PM, Christoffer Dall
>>>>>>> <christoffer.dall@linaro.org> wrote:
>>>>>>>> On Fri, Nov 21, 2014 at 04:06:05PM +0530, Anup Patel wrote:
>>>>>>>>> Hi Christoffer,
>>>>>>>>>
>>>>>>>>> On Fri, Nov 21, 2014 at 3:29 PM, Christoffer Dall
>>>>>>>>> <christoffer.dall@linaro.org> wrote:
>>>>>>>>>> On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote:
>>>>>>>>>>> On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall
>>>>>>>>>>> <christoffer.dall@linaro.org> wrote:
>>>>>>>>>>>> On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have second thoughts about rebasing KVM PMU patches
>>>>>>>>>>>>> to Marc's irq-forwarding patches.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The PMU IRQs (when virtualized by KVM) are not exactly
>>>>>>>>>>>>> forwarded IRQs because they are shared between Host
>>>>>>>>>>>>> and Guest.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Scenario1
>>>>>>>>>>>>> -------------
>>>>>>>>>>>>>
>>>>>>>>>>>>> We might have perf running on Host and no KVM guest
>>>>>>>>>>>>> running. In this scenario, we won't get interrupts on Host
>>>>>>>>>>>>> because the kvm_pmu_hyp_init() (similar to the function
>>>>>>>>>>>>> kvm_timer_hyp_init() of Marc's IRQ-forwarding
>>>>>>>>>>>>> implementation) has put all host PMU IRQs in forwarding
>>>>>>>>>>>>> mode.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The only way to solve this problem is to not set forwarding
>>>>>>>>>>>>> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
>>>>>>>>>>>>> have special routines to turn on and turn off the forwarding
>>>>>>>>>>>>> mode of PMU IRQs. These routines will be called from
>>>>>>>>>>>>> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
>>>>>>>>>>>>> forwarding state.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Scenario2
>>>>>>>>>>>>> -------------
>>>>>>>>>>>>>
>>>>>>>>>>>>> We might have perf running on Host and Guest simultaneously
>>>>>>>>>>>>> which means it is quite likely that the PMU HW triggers an IRQ meant
>>>>>>>>>>>>> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
>>>>>>>>>>>>> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
>>>>>>>>>>>>> of Marc's patchset which is called before local_irq_enable()).
>>>>>>>>>>>>>
>>>>>>>>>>>>> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
>>>>>>>>>>>>> will accidentally forward IRQ meant for Host to Guest unless
>>>>>>>>>>>>> we put additional checks to inspect VCPU PMU state.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Am I missing any detail about IRQ forwarding for above
>>>>>>>>>>>>> scenarios?
>>>>>>>>>>>>>
>>>>>>>>>>>> Hi Anup,
>>>>>>>>>>>
>>>>>>>>>>> Hi Christoffer,
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I briefly discussed this with Marc. What I don't understand is how it
>>>>>>>>>>>> would be possible to get an interrupt for the host while running the
>>>>>>>>>>>> guest?
>>>>>>>>>>>>
>>>>>>>>>>>> The rationale behind my question is that whenever you're running the
>>>>>>>>>>>> guest, the PMU should be programmed exclusively with guest state, and
>>>>>>>>>>>> since the PMU is per core, any interrupts should be for the guest, where
>>>>>>>>>>>> it would always be pending.
>>>>>>>>>>>
>>>>>>>>>>> Yes, that's right, the PMU is programmed exclusively for guest when
>>>>>>>>>>> guest is running and for host when host is running.
>>>>>>>>>>>
>>>>>>>>>>> Let us assume a situation (Scenario2 mentioned previously)
>>>>>>>>>>> where both host and guest are using PMU. When the guest is
>>>>>>>>>>> running we come back to host mode due to a variety of reasons
>>>>>>>>>>> (stage2 fault, guest IO, regular host interrupt, host interrupt
>>>>>>>>>>> meant for guest, ....) which means we will return from the
>>>>>>>>>>> "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the
>>>>>>>>>>> kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
>>>>>>>>>>> At this point we would have restored back host PMU context and
>>>>>>>>>>> any PMU counter used by host can trigger PMU overflow interrupt
>>>>>>>>>>> for host. Now we will be having "kvm_pmu_sync_hwstate(vcpu);"
>>>>>>>>>>> in the kvm_arch_vcpu_ioctl_run() function (similar to the
>>>>>>>>>>> kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset)
>>>>>>>>>>> which will try to detect PMU irq forwarding state in GIC hence it
>>>>>>>>>>> can accidentally discover PMU irq pending for guest while this
>>>>>>>>>>> PMU irq is actually meant for host.
>>>>>>>>>>>
>>>>>>>>>>> The above-mentioned situation does not happen for the timer
>>>>>>>>>>> because virtual timer interrupts are exclusively used by the guest.
>>>>>>>>>>> The exclusive use of the virtual timer interrupt for the guest ensures
>>>>>>>>>>> that the function kvm_timer_sync_hwstate() will always see the correct
>>>>>>>>>>> state of the virtual timer IRQ from the GIC.
>>>>>>>>>>>
>>>>>>>>>> I'm not quite following.
>>>>>>>>>>
>>>>>>>>>> When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemptible section,
>>>>>>>>>> you would (1) capture the active state of the IRQ pertaining to the
>>>>>>>>>> guest and (2) deactivate the IRQ on the host, then (3) switch the state of
>>>>>>>>>> the PMU to the host state, and finally (4) re-enable IRQs on the CPU
>>>>>>>>>> you're running on.
>>>>>>>>>>
>>>>>>>>>> If the host PMU state restored in (3) causes the PMU to raise an
>>>>>>>>>> interrupt, you'll take an interrupt after (4), which is for the host,
>>>>>>>>>> and you'll handle it on the host.
>>>>>>>>>>
>>>>>>>>> We only switch PMU state in assembly code using
>>>>>>>>> kvm_call_hyp(__kvm_vcpu_run, vcpu)
>>>>>>>>> so whenever we are in kvm_arch_vcpu_ioctl_run() (i.e. host mode)
>>>>>>>>> the current hardware PMU state is the host's. This means that whenever
>>>>>>>>> we are in host mode, the host PMU can change the state of the PMU IRQ
>>>>>>>>> in the GIC even if local IRQs are disabled.
>>>>>>>>>
>>>>>>>>> We inspect the active state of the PMU IRQ in the
>>>>>>>>> kvm_pmu_sync_hwstate() function using the irq_get_fwd_state() API,
>>>>>>>>> but we are not guaranteed that the IRQ forward state returned by
>>>>>>>>> the irq_get_fwd_state() API is for the guest only.
>>>>>>>>>
>>>>>>>>> The above situation does not manifest for virtual timer because
>>>>>>>>> virtual timer registers are exclusively accessed by Guest and
>>>>>>>>> virtual timer interrupt is only for Guest (never used by Host).
>>>>>>>>>
>>>>>>>>>> Whenever you schedule the guest VCPU again, you'll (a) disable
>>>>>>>>>> interrupts on the CPU, (b) restore the active state of the IRQ for the
>>>>>>>>>> guest, (c) restore the guest PMU state, (d) switch to the guest with
>>>>>>>>>> IRQs enabled on the CPU (potentially).
>>>>>>>>>
>>>>>>>>> Here too, while we are between step (a) and step (b) the PMU HW
>>>>>>>>> context is the host's and any PMU counter can overflow. Step (b)
>>>>>>>>> can then actually overwrite the active state of a PMU IRQ meant for the Host.
>>>>>>>>>
>>>>>>>> Can you not simply switch the state from C-code after capturing the IRQ
>>>>>>>> state then? Everything should be accessible from EL1, right?
>>>>>>>
>>>>>>> Yes, I think that would be the only option. This also means I will need
>>>>>>> to re-implement context switching for doing it in C-code.
>>>>>>
>>>>>> Yes, you'd add some inline assembly in the C-code to access the
>>>>>> registers I guess. Only thing I thought about after writing my original
>>>>>> mail is whether you'll be counting events while context-switching and
>>>>>> running on the host, which you actually don't want to. Not sure if
>>>>>> there's a better way to avoid that.
>>>>>>
>>>>>>>
>>>>>>> What about the scenario1 which I had mentioned?
>>>>>>>
>>>>>>
>>>>>> You have to consider that enabling/disabling forwarding and setting/clearing
>>>>>> the active state are part of the guest PMU state, and all of it has to be
>>>>>> context-switched.
>>>>>
>>>>> I found one more issue.
>>>>>
>>>>> If the PMU irq is a PPI then enabling/disabling forwarding will not
>>>>> work, because the irqd_set_irq_forwarded() function takes irq_data
>>>>> as argument, which is a member of irq_desc, and irq_desc for PPIs
>>>>> is not per-CPU. This means we cannot call irqd_set_irq_forwarded()
>>>>> simultaneously from different host CPUs.
>>>>>
>>>
>>> Hi Marc,
>>>
>>>> I'll let Marc answer this one and if this still applies to his view of
>>>> how the next version of the forwarding series will look like.
>>
>> I'm looking at it at the moment.
>>
>> I'm inclined to say that we should fix the forwarding code to allow
>> individual PPIs to be forwarded. This is a bit harder than what we're
>> doing at the moment, but that's possible.
>>
>> Of course, that complicates the code a bit, as we have to make sure
>> we're not preemptable at that time.
>>
>> What do you think?
>
> Currently, irqd_set_irq_forwarded() is lockless.
>
> It would be great if we can update irqd_set_irq_forwarded() for PPIs
> such that it remains lockless, so that we don't have much overhead
> when we enable/disable the forwarding state.
We probably need a separate API anyway, as you want to be able to
provide a cpumask to configure this. We can refine this as we go, and I
wouldn't worry about overhead just yet.
Thanks,
M.
--
Jazz is not dead. It just smells funny...
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2014-11-27 11:06 ` Marc Zyngier
@ 2014-12-30 5:49 ` Anup Patel
-1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2014-12-30 5:49 UTC (permalink / raw)
To: Marc Zyngier
Cc: Christoffer Dall, kvmarm, linux-arm-kernel, KVM General, patches,
Will Deacon, Ian.Campbell, Pranavkumar Sawargaonkar
(dropping previous conversation for easy reading)
Hi Marc/Christoffer,
I tried implementing PMU context switch via C code
in EL1 mode, in atomic context with irqs disabled.
The context switch itself works perfectly fine, but
irq forwarding is not clean for the PMU irq.
I found another issue: the GIC only samples irq
lines if they are enabled. This means that for using
irq forwarding we will need to ensure that the host PMU
irq is enabled. The arch_timer code does this by
doing request_irq() for the host virtual timer interrupt.
For the PMU, we can either enable/disable the host PMU
irq in the context switch, or we need to have a shared
irq handler between the KVM PMU and the host kernel PMU.
I have rethought our discussion so far. I
understand that we need KVM PMU virtualization
to meet the following criteria:
1. No modification in the host PMU driver
2. No modification in the guest PMU driver
3. No mask/unmask dance for sharing the host PMU irq
4. A clean way to avoid infinite VM exits due to
the PMU interrupt
I have come up with a new approach, which is as follows:
1. Context switch the PMU in atomic context (i.e. local_irq_disable())
2. Ensure that the host PMU irq is disabled when entering guest
mode, and re-enable the host PMU irq when exiting guest mode if
it was enabled previously. This is to avoid infinite VM exits
due to the PMU interrupt, because as per the new approach we
don't mask the PMU irq via the PMINTENSET_EL1 register.
3. Inject a virtual PMU irq at the time of entering guest mode if the
PMU overflow register (i.e. PMOVSSET_EL0) is non-zero, in atomic
context (i.e. local_irq_disable()).
The only limitation of this new approach is that the virtual PMU irq
is injected at the time of entering guest mode. This means the guest
will receive the virtual PMU interrupt with a little delay after the actual
interrupt occurred. The PMU interrupts are only overflow events
and generally not used in timing-critical applications. If we
can live with this limitation then this can be a good approach
for KVM PMU virtualization.
Regards,
Anup
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2014-12-30 5:49 ` Anup Patel
@ 2015-01-08 4:02 ` Anup Patel
-1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2015-01-08 4:02 UTC (permalink / raw)
To: Marc Zyngier
Cc: Christoffer Dall, kvmarm, linux-arm-kernel, KVM General, patches,
Will Deacon, Ian.Campbell, Pranavkumar Sawargaonkar
On Tue, Dec 30, 2014 at 11:19 AM, Anup Patel <anup@brainfault.org> wrote:
> (dropping previous conversation for easy reading)
>
> Hi Marc/Christoffer,
>
> I tried implementing PMU context-switch via C code
> in EL1 mode and in atomic context with irqs disabled.
> The context switch itself works perfectly fine but
> irq forwarding is not clean for PMU irq.
>
> I found another issue that is GIC only samples irq
> lines if they are enabled. This means for using
> irq forwarding we will need to ensure that host PMU
> irq is enabled. The arch_timer code does this by
> doing request_irq() for host virtual timer interrupt.
> For PMU, we can either enable/disable host PMU
> irq in context switch or we need to do have shared
> irq handler between kvm pmu and host kernel pmu.
>
> I have rethinked about our discussion so far. I
> understand that we need KVM PMU virtualization
> to meet following criteria:
> 1. No modification in host PMU driver
> 2. No modification in guest PMU driver
> 3. No mask/unmask dance for sharing host PMU irq
> 4. Clean way to avoid infinite VM exits due to
> PMU interrupt
>
> I have discovered new approach which is as follows:
> 1. Context switch PMU in atomic context (i.e. local_irq_disable())
> 2. Ensure that host PMU irq is disabled when entering guest
> mode and re-enable host PMU irq when exiting guest mode if
> it was enabled previously. This is to avoid infinite VM exits
> due to PMU interrupt because as-per new approach we
> don't mask the PMU irq via PMINTENSET_EL1 register.
> 3. Inject virtual PMU irq at time of entering guest mode if PMU
> overflow register is non-zero (i.e. PMOVSSET_EL0) in atomic
> context (i.e. local_irq_disable()).
>
> The only limitation of this new approach is that virtual PMU irq
> is injected at time of entering guest mode. This means guest
> will receive virtual PMU interrupt with little delay after actual
> interrupt occurred. The PMU interrupts are only overflow events
> and generally not used in any timing critical applications. If we
> can live with this limitation then this can be a good approach
> for KVM PMU virtualization.
>
> Regards,
> Anup
Hi Marc/Christoffer,
Ping??
Regards,
Anup
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2014-12-30 5:49 ` Anup Patel
@ 2015-01-11 19:11 ` Christoffer Dall
-1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2015-01-11 19:11 UTC (permalink / raw)
To: Anup Patel
Cc: Marc Zyngier, kvmarm, linux-arm-kernel, KVM General, patches,
Will Deacon, Ian.Campbell, Pranavkumar Sawargaonkar
On Tue, Dec 30, 2014 at 11:19:13AM +0530, Anup Patel wrote:
> (dropping previous conversation for easy reading)
>
> Hi Marc/Christoffer,
>
> I tried implementing PMU context-switch via C code
> in EL1 mode and in atomic context with irqs disabled.
> The context switch itself works perfectly fine but
> irq forwarding is not clean for PMU irq.
>
> I found another issue that is GIC only samples irq
> lines if they are enabled. This means for using
> irq forwarding we will need to ensure that host PMU
> irq is enabled. The arch_timer code does this by
> doing request_irq() for host virtual timer interrupt.
> For PMU, we can either enable/disable host PMU
> irq in context switch or we need to do have shared
> irq handler between kvm pmu and host kernel pmu.
could we simply require the host PMU driver to request the IRQ and have
the driver inject the corresponding IRQ to the VM via a mechanism
similar to VFIO using an eventfd and irqfds etc.?
(I haven't quite thought through if there's a way for the host PMU
driver to distinguish between an IRQ for itself and one for the guest,
though).
It does feel like we will need some sort of communication/coordination
between the host PMU driver and KVM...
>
> I have rethinked about our discussion so far. I
> understand that we need KVM PMU virtualization
> to meet following criteria:
> 1. No modification in host PMU driver
is this really a strict requirement? one of the advantages of KVM
should be that the rest of the kernel should be supportive of KVM.
> 2. No modification in guest PMU driver
> 3. No mask/unmask dance for sharing host PMU irq
> 4. Clean way to avoid infinite VM exits due to
> PMU interrupt
>
> I have discovered new approach which is as follows:
> 1. Context switch PMU in atomic context (i.e. local_irq_disable())
> 2. Ensure that host PMU irq is disabled when entering guest
> mode and re-enable host PMU irq when exiting guest mode if
> it was enabled previously.
What does this look like software-engineering wise? Would you be looking
up the IRQ number from the DT in the KVM code again? How does KVM then
synchronize with the host PMU driver so they're not both requesting the
same IRQ at the same time?
> This is to avoid infinite VM exits
> due to PMU interrupt because as-per new approach we
> don't mask the PMU irq via PMINTENSET_EL1 register.
> 3. Inject virtual PMU irq at time of entering guest mode if PMU
> overflow register is non-zero (i.e. PMOVSSET_EL0) in atomic
> context (i.e. local_irq_disable()).
>
> The only limitation of this new approach is that virtual PMU irq
> is injected at time of entering guest mode. This means guest
> will receive virtual PMU interrupt with little delay after actual
> interrupt occurred.
it may never receive it in the case of a tickless configuration AFAICT,
so this doesn't sound like the right approach.
> The PMU interrupts are only overflow events
> and generally not used in any timing critical applications. If we
> can live with this limitation then this can be a good approach
> for KVM PMU virtualization.
>
Thanks,
-Christoffer
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2015-01-11 19:11 ` Christoffer Dall
@ 2015-01-12 4:19 ` Anup Patel
-1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2015-01-12 4:19 UTC (permalink / raw)
To: Christoffer Dall
Cc: Ian.Campbell, KVM General, Marc Zyngier, patches, Will Deacon,
kvmarm, linux-arm-kernel, Pranavkumar Sawargaonkar
On Mon, Jan 12, 2015 at 12:41 AM, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> On Tue, Dec 30, 2014 at 11:19:13AM +0530, Anup Patel wrote:
>> (dropping previous conversation for easy reading)
>>
>> Hi Marc/Christoffer,
>>
>> I tried implementing PMU context-switch via C code
>> in EL1 mode and in atomic context with irqs disabled.
>> The context switch itself works perfectly fine but
>> irq forwarding is not clean for PMU irq.
>>
>> I found another issue that is GIC only samples irq
>> lines if they are enabled. This means for using
>> irq forwarding we will need to ensure that host PMU
>> irq is enabled. The arch_timer code does this by
>> doing request_irq() for host virtual timer interrupt.
>> For PMU, we can either enable/disable host PMU
>> irq in context switch or we need to do have shared
>> irq handler between kvm pmu and host kernel pmu.
>
> could we simply require the host PMU driver to request the IRQ and have
> the driver inject the corresponding IRQ to the VM via a mechanism
> similar to VFIO using an eventfd and irqfds etc.?
Currently, the host PMU driver does request_irq() only when
there is some event to be monitored. This means the host will do
request_irq() only when we run a perf application in host
user space.
Initially, I thought that we could simply pass IRQF_SHARED
to request_irq() in the host PMU driver and do the same for
request_irq() in the KVM PMU code, but the PMU irq can be an
SPI or a PPI. If the PMU irq is an SPI then the IRQF_SHARED
flag would be fine, but if it is a PPI then we have no way to
set the IRQF_SHARED flag because request_percpu_irq()
does not take an irq flags parameter.
>
> (I haven't quite thought through if there's a way for the host PMU
> driver to distinguish between an IRQ for itself and one for the guest,
> though).
>
> It does feel like we will need some sort of communication/coordination
> between the host PMU driver and KVM...
>
>>
>> I have rethinked about our discussion so far. I
>> understand that we need KVM PMU virtualization
>> to meet following criteria:
>> 1. No modification in host PMU driver
>
> is this really a strict requirement? one of the advantages of KVM
> should be that the rest of the kernel should be supportive of KVM.
I guess so, because the host PMU driver should not do things
differently for host and guest. I think this is the reason why
we discarded the mask/unmask PMU irq approach that
I had implemented in RFC v1.
>
>> 2. No modification in guest PMU driver
>> 3. No mask/unmask dance for sharing host PMU irq
>> 4. Clean way to avoid infinite VM exits due to
>> PMU interrupt
>>
>> I have discovered new approach which is as follows:
>> 1. Context switch PMU in atomic context (i.e. local_irq_disable())
>> 2. Ensure that host PMU irq is disabled when entering guest
>> mode and re-enable host PMU irq when exiting guest mode if
>> it was enabled previously.
>
> How does this look like software-engineering wise? Would you be looking
> up the IRQ number from the DT in the KVM code again? How does KVM then
> synchronize with the host PMU driver so they're not both requesting the
> same IRQ at the same time?
We only look up host PMU irq numbers from DT at HYP init time.
During context switch we know the host PMU irq number for the
current host CPU, so we can get the state of the host PMU irq in
the context switch code.
If we go by the shared irq handler approach then both KVM
and the host PMU driver will do request_irq() on the same host
PMU irq. In other words, there is no virtual PMU irq provided
by HW for the guest.
>
>> This is to avoid infinite VM exits
>> due to PMU interrupt because as-per new approach we
>> don't mask the PMU irq via PMINTENSET_EL1 register.
>> 3. Inject virtual PMU irq at time of entering guest mode if PMU
>> overflow register is non-zero (i.e. PMOVSSET_EL0) in atomic
>> context (i.e. local_irq_disable()).
>>
>> The only limitation of this new approach is that virtual PMU irq
>> is injected at time of entering guest mode. This means guest
>> will receive virtual PMU interrupt with little delay after actual
>> interrupt occurred.
>
> it may never receive it in the case of a tickless configuration AFAICT,
> so this doesn't sound like the right approach.
I think that, irrespective of which approach we take, we need a
mechanism for shared irq handlers between the KVM PMU and the
host PMU driver for both PPI and SPI.
>
>> The PMU interrupts are only overflow events
>> and generally not used in any timing critical applications. If we
>> can live with this limitation then this can be a good approach
>> for KVM PMU virtualization.
>>
> Thanks,
> -Christoffer
Regards,
Anup
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2015-01-11 19:11 ` Christoffer Dall
@ 2015-01-14 4:28 ` Anup Patel
-1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2015-01-14 4:28 UTC (permalink / raw)
To: Christoffer Dall
Cc: Marc Zyngier, kvmarm, linux-arm-kernel, KVM General, patches,
Will Deacon, Ian.Campbell, Pranavkumar Sawargaonkar
On Mon, Jan 12, 2015 at 12:41 AM, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> On Tue, Dec 30, 2014 at 11:19:13AM +0530, Anup Patel wrote:
>> (dropping previous conversation for easy reading)
>>
>> Hi Marc/Christoffer,
>>
>> I tried implementing PMU context-switch via C code
>> in EL1 mode and in atomic context with irqs disabled.
>> The context switch itself works perfectly fine but
>> irq forwarding is not clean for PMU irq.
>>
>> I found another issue that is GIC only samples irq
>> lines if they are enabled. This means for using
>> irq forwarding we will need to ensure that host PMU
>> irq is enabled. The arch_timer code does this by
>> doing request_irq() for host virtual timer interrupt.
>> For PMU, we can either enable/disable host PMU
>> irq in context switch or we need to do have shared
>> irq handler between kvm pmu and host kernel pmu.
>
> could we simply require the host PMU driver to request the IRQ and have
> the driver inject the corresponding IRQ to the VM via a mechanism
> similar to VFIO using an eventfd and irqfds etc.?
>
> (I haven't quite thought through if there's a way for the host PMU
> driver to distinguish between an IRQ for itself and one for the guest,
> though).
>
> It does feel like we will need some sort of communication/coordination
> between the host PMU driver and KVM...
>
>>
>> I have rethinked about our discussion so far. I
>> understand that we need KVM PMU virtualization
>> to meet following criteria:
>> 1. No modification in host PMU driver
>
> is this really a strict requirement? one of the advantages of KVM
> should be that the rest of the kernel should be supportive of KVM.
>
>> 2. No modification in guest PMU driver
>> 3. No mask/unmask dance for sharing host PMU irq
>> 4. Clean way to avoid infinite VM exits due to
>> PMU interrupt
>>
>> I have discovered new approach which is as follows:
>> 1. Context switch PMU in atomic context (i.e. local_irq_disable())
>> 2. Ensure that host PMU irq is disabled when entering guest
>> mode and re-enable host PMU irq when exiting guest mode if
>> it was enabled previously.
>
> How does this look like software-engineering wise? Would you be looking
> up the IRQ number from the DT in the KVM code again? How does KVM then
> synchronize with the host PMU driver so they're not both requesting the
> same IRQ at the same time?
>
>> This is to avoid infinite VM exits
>> due to PMU interrupt because as-per new approach we
>> don't mask the PMU irq via PMINTENSET_EL1 register.
>> 3. Inject virtual PMU irq at time of entering guest mode if PMU
>> overflow register is non-zero (i.e. PMOVSSET_EL0) in atomic
>> context (i.e. local_irq_disable()).
>>
>> The only limitation of this new approach is that virtual PMU irq
>> is injected at time of entering guest mode. This means guest
>> will receive virtual PMU interrupt with little delay after actual
>> interrupt occurred.
>
> it may never receive it in the case of a tickless configuration AFAICT,
> so this doesn't sound like the right approach.
The PMU interrupts are not like arch_timer interrupts; they are
overflow interrupts on event counters. The PMU events of a Guest
VCPU are only counted while that Guest VCPU is running.
If the Guest VCPU is scheduled out, or we are in Host mode, then
the PMU events are counted for the Host or for whichever other
Guest is currently running.
In my view, this does not break a tickless guest.
Also, the above fact applies irrespective of which approach we take
for PMU virtualization.
Regards,
Anup
>
>> The PMU interrupts are only overflow events
>> and generally not used in any timing critical applications. If we
>> can live with this limitation then this can be a good approach
>> for KVM PMU virtualization.
>>
> Thanks,
> -Christoffer
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2015-01-12 4:19 ` Anup Patel
@ 2015-02-15 15:33 ` Christoffer Dall
-1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2015-02-15 15:33 UTC (permalink / raw)
To: Anup Patel
Cc: Marc Zyngier, kvmarm, linux-arm-kernel, KVM General, patches,
Will Deacon, Ian.Campbell, Pranavkumar Sawargaonkar
Hi Anup,
On Mon, Jan 12, 2015 at 09:49:13AM +0530, Anup Patel wrote:
> On Mon, Jan 12, 2015 at 12:41 AM, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
> > On Tue, Dec 30, 2014 at 11:19:13AM +0530, Anup Patel wrote:
> >> (dropping previous conversation for easy reading)
> >>
> >> Hi Marc/Christoffer,
> >>
> >> I tried implementing PMU context-switch via C code
> >> in EL1 mode and in atomic context with irqs disabled.
> >> The context switch itself works perfectly fine but
> >> irq forwarding is not clean for PMU irq.
> >>
> >> I found another issue that is GIC only samples irq
> >> lines if they are enabled. This means for using
> >> irq forwarding we will need to ensure that host PMU
> >> irq is enabled. The arch_timer code does this by
> >> doing request_irq() for host virtual timer interrupt.
> >> For PMU, we can either enable/disable host PMU
> >> irq in context switch or we need to do have shared
> >> irq handler between kvm pmu and host kernel pmu.
> >
> > could we simply require the host PMU driver to request the IRQ and have
> > the driver inject the corresponding IRQ to the VM via a mechanism
> > similar to VFIO using an eventfd and irqfds etc.?
>
> Currently, the host PMU driver does request_irq() only when
> there is some event to be monitored. This means host will do
> request_irq() only when we run perf application on host
> user space.
>
> Initially, I though that we could simply pass IRQF_SHARED
> for request_irq() in host PMU driver and do the same for
> reqest_irq() in KVM PMU code but the PMU irq can be
> SPI or PPI. If the PMU irq is SPI then IRQF_SHARED
> flag would fine but if its PPI then we have no way to
> set IRQF_SHARED flag because request_percpu_irq()
> does not have irq flags parameter.
>
> >
> > (I haven't quite thought through if there's a way for the host PMU
> > driver to distinguish between an IRQ for itself and one for the guest,
> > though).
> >
> > It does feel like we will need some sort of communication/coordination
> > between the host PMU driver and KVM...
> >
> >>
> >> I have rethinked about our discussion so far. I
> >> understand that we need KVM PMU virtualization
> >> to meet following criteria:
> >> 1. No modification in host PMU driver
> >
> > is this really a strict requirement? one of the advantages of KVM
> > should be that the rest of the kernel should be supportive of KVM.
>
> I guess so because host PMU driver should not do things
> differently for host and guest. I think this the reason why
> we discarded the mask/unmask PMU irq approach which
> I had implemented in RFC v1.
>
> >
> >> 2. No modification in guest PMU driver
> >> 3. No mask/unmask dance for sharing host PMU irq
> >> 4. Clean way to avoid infinite VM exits due to
> >> PMU interrupt
> >>
> >> I have discovered new approach which is as follows:
> >> 1. Context switch PMU in atomic context (i.e. local_irq_disable())
> >> 2. Ensure that host PMU irq is disabled when entering guest
> >> mode and re-enable host PMU irq when exiting guest mode if
> >> it was enabled previously.
> >
> > How does this look like software-engineering wise? Would you be looking
> > up the IRQ number from the DT in the KVM code again? How does KVM then
> > synchronize with the host PMU driver so they're not both requesting the
> > same IRQ at the same time?
>
> We only lookup host PMU irq numbers from DT at HYP init time.
>
> During context switch we know the host PMU irq number for
> current host CPU so we can get state of host PMU irq in
> context switch code.
>
> If we go by the shard irq handler approach then both KVM
> and host PMU driver will do request_irq() on same host
> PMU irq. In other words, there is no virtual PMU irq provided
> by HW for guest.
>
Sorry for the *really* long delay in this response.
We had a chat about this subject with Will Deacon and Marc Zyngier
during connect, and basically we came to think of a number of problems
with the current approach:
1. As you pointed out, there is a need for a shared IRQ handler, and
there is no immediately nice way to implement this without a more
sophisticated perf/kvm interface, probably comprising eventfds or
something similar.
2. Hijacking the counters for the VM without perf knowing about it
basically makes it impossible to do system-wide event counting, an
important use case for a virtualization host.
So the approach we will be taking now is to:
First, implement a strict trap-and-emulate approach in software. This
would allow any software relying on access to performance counters to
work, although potentially with slightly imprecise values. This is the
approach taken on x86 and would be significantly simpler to support on
systems like big.LITTLE as well.
Second, if there are values obtained from within the guest that are so
skewed by the trap-and-emulate approach that we need to give the guest
access to counters, we should try to share the hardware by partitioning
the physical counters, but again, we need to coordinate with the host
perf system for this. We would only be pursuing this approach if
absolutely necessary.
Apologies for the change in direction on this.
What are your thoughts? Do you still have time/interest to work
on any of this?
Thanks,
-Christoffer
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2015-02-15 15:33 ` Christoffer Dall
@ 2015-02-16 12:16 ` Anup Patel
-1 siblings, 0 replies; 78+ messages in thread
From: Anup Patel @ 2015-02-16 12:16 UTC (permalink / raw)
To: Christoffer Dall
Cc: Marc Zyngier, kvmarm, linux-arm-kernel, KVM General, patches,
Will Deacon, Ian.Campbell, Pranavkumar Sawargaonkar
Hi Christoffer,
On Sun, Feb 15, 2015 at 9:03 PM, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> Hi Anup,
>
> On Mon, Jan 12, 2015 at 09:49:13AM +0530, Anup Patel wrote:
>> On Mon, Jan 12, 2015 at 12:41 AM, Christoffer Dall
>> <christoffer.dall@linaro.org> wrote:
>> > On Tue, Dec 30, 2014 at 11:19:13AM +0530, Anup Patel wrote:
>> >> (dropping previous conversation for easy reading)
>> >>
>> >> Hi Marc/Christoffer,
>> >>
>> >> I tried implementing PMU context-switch via C code
>> >> in EL1 mode and in atomic context with irqs disabled.
>> >> The context switch itself works perfectly fine but
>> >> irq forwarding is not clean for PMU irq.
>> >>
>> >> I found another issue that is GIC only samples irq
>> >> lines if they are enabled. This means for using
>> >> irq forwarding we will need to ensure that host PMU
>> >> irq is enabled. The arch_timer code does this by
>> >> doing request_irq() for host virtual timer interrupt.
>> >> For PMU, we can either enable/disable host PMU
>> >> irq in context switch or we need to do have shared
>> >> irq handler between kvm pmu and host kernel pmu.
>> >
>> > could we simply require the host PMU driver to request the IRQ and have
>> > the driver inject the corresponding IRQ to the VM via a mechanism
>> > similar to VFIO using an eventfd and irqfds etc.?
>>
>> Currently, the host PMU driver does request_irq() only when
>> there is some event to be monitored. This means host will do
>> request_irq() only when we run perf application on host
>> user space.
>>
>> Initially, I though that we could simply pass IRQF_SHARED
>> for request_irq() in host PMU driver and do the same for
>> reqest_irq() in KVM PMU code but the PMU irq can be
>> SPI or PPI. If the PMU irq is SPI then IRQF_SHARED
>> flag would fine but if its PPI then we have no way to
>> set IRQF_SHARED flag because request_percpu_irq()
>> does not have irq flags parameter.
>>
>> >
>> > (I haven't quite thought through if there's a way for the host PMU
>> > driver to distinguish between an IRQ for itself and one for the guest,
>> > though).
>> >
>> > It does feel like we will need some sort of communication/coordination
>> > between the host PMU driver and KVM...
>> >
>> >>
>> >> I have rethinked about our discussion so far. I
>> >> understand that we need KVM PMU virtualization
>> >> to meet following criteria:
>> >> 1. No modification in host PMU driver
>> >
>> > is this really a strict requirement? one of the advantages of KVM
>> > should be that the rest of the kernel should be supportive of KVM.
>>
>> I guess so because host PMU driver should not do things
>> differently for host and guest. I think this the reason why
>> we discarded the mask/unmask PMU irq approach which
>> I had implemented in RFC v1.
>>
>> >
>> >> 2. No modification in guest PMU driver
>> >> 3. No mask/unmask dance for sharing host PMU irq
>> >> 4. Clean way to avoid infinite VM exits due to
>> >> PMU interrupt
>> >>
>> >> I have discovered new approach which is as follows:
>> >> 1. Context switch PMU in atomic context (i.e. local_irq_disable())
>> >> 2. Ensure that host PMU irq is disabled when entering guest
>> >> mode and re-enable host PMU irq when exiting guest mode if
>> >> it was enabled previously.
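[Editorial note: the two steps above could be sketched roughly as below. All kvm_pmu_* helpers and host_pmu_irq_enabled() are hypothetical names; disable_percpu_irq()/enable_percpu_irq() are the existing per-CPU IRQ primitives, assuming the host PMU IRQ is a PPI:]

```c
#include <linux/interrupt.h>

/* Hypothetical sketch of steps 1 and 2 above; not actual KVM code. */
static bool host_pmu_irq_was_enabled;

static void kvm_pmu_guest_enter(int host_pmu_irq)
{
	local_irq_disable();			/* step 1: atomic context */

	/* step 2: mask the host PMU IRQ while the guest runs,
	 * remembering whether it was enabled beforehand. */
	host_pmu_irq_was_enabled = host_pmu_irq_enabled(host_pmu_irq);
	if (host_pmu_irq_was_enabled)
		disable_percpu_irq(host_pmu_irq);

	kvm_pmu_switch_to_guest();		/* context switch PMU state */
}

static void kvm_pmu_guest_exit(int host_pmu_irq)
{
	kvm_pmu_switch_to_host();

	if (host_pmu_irq_was_enabled)
		enable_percpu_irq(host_pmu_irq, 0);	/* restore state */

	local_irq_enable();
}
```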
>> >
>> > What does this look like, software-engineering wise? Would you be looking
>> > up the IRQ number from the DT in the KVM code again? How does KVM then
>> > synchronize with the host PMU driver so they're not both requesting the
>> > same IRQ at the same time?
>>
>> We only look up host PMU irq numbers from DT at HYP init time.
>>
>> During context switch we know the host PMU irq number for
>> the current host CPU, so we can get the state of the host PMU irq
>> in the context switch code.
>>
>> If we go by the shared irq handler approach then both KVM
>> and host PMU driver will do request_irq() on same host
>> PMU irq. In other words, there is no virtual PMU irq provided
>> by HW for guest.
>>
>
> Sorry for the *really* long delay in this response.
>
> We had a chat about this subject with Will Deacon and Marc Zyngier
> during connect, and basically we came to think of a number of problems
> with the current approach:
>
> 1. As you pointed out, there is a need for a shared IRQ handler, and
> there is no immediately nice way to implement this without a more
> sophisticated perf/kvm interface, probably comprising eventfds or
> something similar.
>
> 2. Hijacking the counters for the VM without perf knowing about it
> basically makes it impossible to do system-wide event counting, an
> important use case for a virtualization host.
>
> So the approach we will be taking now would be to:
>
> First, implement a strictly trap-and-emulate in software approach. This
> would allow any software relying on access to performance counters to
> work, although potentially with slightly imprecise values. This is the
> approach taken by x86 and would be significantly simpler to support on
> systems like big.LITTLE as well.
Actually, trap-and-emulate would also help avoid additions to the
KVM world switch.
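[Editorial note: a trap-and-emulate PMU keeps a software shadow of the guest's PMU registers and services every trapped access against that shadow, so the hardware counters stay under host perf ownership. A rough concept sketch, with all names hypothetical rather than the eventual KVM implementation:]

```c
#include <linux/types.h>

struct kvm_vcpu;	/* kernel-context sketch; declaration elided */

/* Software shadow of guest PMU state; the guest never touches the
 * real counters, so host perf keeps full ownership of the hardware. */
struct guest_pmu_shadow {
	u64 pmcr;	/* emulated PMCR_EL0 control register */
	u64 ccnt;	/* emulated cycle counter value */
};

/* Handle a trapped guest read of PMCCNTR_EL0: return the emulated
 * value instead of the hardware counter. vcpu_set_reg() stands for
 * writing the destination GPR of the trapped instruction. */
static bool emulate_pmccntr_read(struct kvm_vcpu *vcpu,
				 struct guest_pmu_shadow *pmu, int rt)
{
	vcpu_set_reg(vcpu, rt, pmu->ccnt);
	return true;	/* handled: advance guest PC past the access */
}
```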
>
> Second, if there are values obtained from within the guest that are so
> skewed by the trap-and-emulate approach that we need to give the guest
> access to counters, we should try to share the hardware by partitioning
> the physical counters, but again, we need to coordinate with the host
> perf system for this. We would only be pursuing this approach if
> absolutely necessary.
Yes, with trap-and-emulate we cannot accurately emulate all types
of hw counters (particularly cache misses and similar events).
>
> Apologies for the change in direction on this.
>
> What are your thoughts? Do you still have time/interest to work
> on any of this?
It's a drastic change in direction.
Currently, I have taken up some different work (not related to KVM),
so for the next few months I won't be able to spend time on this.
It would be better if Linaro takes this work forward to avoid any further delays.
Best Regards,
Anup
>
> Thanks,
> -Christoffer
* Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
2015-02-16 12:16 ` Anup Patel
@ 2015-02-16 12:23 ` Christoffer Dall
-1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2015-02-16 12:23 UTC (permalink / raw)
To: Anup Patel
Cc: Marc Zyngier, kvmarm, linux-arm-kernel, KVM General, patches,
Will Deacon, Ian.Campbell, Pranavkumar Sawargaonkar
On Mon, Feb 16, 2015 at 05:46:54PM +0530, Anup Patel wrote:
> Hi Christoffer,
>
> On Sun, Feb 15, 2015 at 9:03 PM, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
> > Hi Anup,
> >
> > On Mon, Jan 12, 2015 at 09:49:13AM +0530, Anup Patel wrote:
> >> On Mon, Jan 12, 2015 at 12:41 AM, Christoffer Dall
> >> <christoffer.dall@linaro.org> wrote:
> >> > On Tue, Dec 30, 2014 at 11:19:13AM +0530, Anup Patel wrote:
> >> >> (dropping previous conversation for easy reading)
> >> >>
> >> >> Hi Marc/Christoffer,
> >> >>
> >> >> I tried implementing PMU context-switch via C code
> >> >> in EL1 mode and in atomic context with irqs disabled.
> >> >> The context switch itself works perfectly fine but
> >> >> irq forwarding is not clean for PMU irq.
> >> >>
> >> >> I found another issue that is GIC only samples irq
> >> >> lines if they are enabled. This means for using
> >> >> irq forwarding we will need to ensure that host PMU
> >> >> irq is enabled. The arch_timer code does this by
> >> >> doing request_irq() for host virtual timer interrupt.
> >> >> For PMU, we can either enable/disable host PMU
> >> >> irq in context switch or we need to do have shared
> >> >> irq handler between kvm pmu and host kernel pmu.
> >> >
> >> > could we simply require the host PMU driver to request the IRQ and have
> >> > the driver inject the corresponding IRQ to the VM via a mechanism
> >> > similar to VFIO using an eventfd and irqfds etc.?
> >>
> >> Currently, the host PMU driver does request_irq() only when
> >> there is some event to be monitored. This means host will do
> >> request_irq() only when we run perf application on host
> >> user space.
> >>
> >> Initially, I though that we could simply pass IRQF_SHARED
> >> for request_irq() in host PMU driver and do the same for
> >> reqest_irq() in KVM PMU code but the PMU irq can be
> >> SPI or PPI. If the PMU irq is SPI then IRQF_SHARED
> >> flag would fine but if its PPI then we have no way to
> >> set IRQF_SHARED flag because request_percpu_irq()
> >> does not have irq flags parameter.
> >>
> >> >
> >> > (I haven't quite thought through if there's a way for the host PMU
> >> > driver to distinguish between an IRQ for itself and one for the guest,
> >> > though).
> >> >
> >> > It does feel like we will need some sort of communication/coordination
> >> > between the host PMU driver and KVM...
> >> >
> >> >>
> >> >> I have rethinked about our discussion so far. I
> >> >> understand that we need KVM PMU virtualization
> >> >> to meet following criteria:
> >> >> 1. No modification in host PMU driver
> >> >
> >> > is this really a strict requirement? one of the advantages of KVM
> >> > should be that the rest of the kernel should be supportive of KVM.
> >>
> >> I guess so because host PMU driver should not do things
> >> differently for host and guest. I think this the reason why
> >> we discarded the mask/unmask PMU irq approach which
> >> I had implemented in RFC v1.
> >>
> >> >
> >> >> 2. No modification in guest PMU driver
> >> >> 3. No mask/unmask dance for sharing host PMU irq
> >> >> 4. Clean way to avoid infinite VM exits due to
> >> >> PMU interrupt
> >> >>
> >> >> I have discovered new approach which is as follows:
> >> >> 1. Context switch PMU in atomic context (i.e. local_irq_disable())
> >> >> 2. Ensure that host PMU irq is disabled when entering guest
> >> >> mode and re-enable host PMU irq when exiting guest mode if
> >> >> it was enabled previously.
> >> >
> >> > How does this look like software-engineering wise? Would you be looking
> >> > up the IRQ number from the DT in the KVM code again? How does KVM then
> >> > synchronize with the host PMU driver so they're not both requesting the
> >> > same IRQ at the same time?
> >>
> >> We only lookup host PMU irq numbers from DT at HYP init time.
> >>
> >> During context switch we know the host PMU irq number for
> >> current host CPU so we can get state of host PMU irq in
> >> context switch code.
> >>
> >> If we go by the shard irq handler approach then both KVM
> >> and host PMU driver will do request_irq() on same host
> >> PMU irq. In other words, there is no virtual PMU irq provided
> >> by HW for guest.
> >>
> >
> > Sorry for the *really* long delay in this response.
> >
> > We had a chat about this subject with Will Deacon and Marc Zyngier
> > during connect, and basically we came to think of a number of problems
> > with the current approach:
> >
> > 1. As you pointed out, there is a need for a shared IRQ handler, and
> > there is no immediately nice way to implement this without a more
> > sophisticated perf/kvm interface, probably comprising eventfds or
> > something similar.
> >
> > 2. Hijacking the counters for the VM without perf knowing about it
> > basically makes it impossible to do system-wide event counting, an
> > important use case for a virtualization host.
> >
> > So the approach we will be taking now would be to:
> >
> > First, implement a strictly trap-and-emulate in software approach. This
> > would allow any software relying on access to performance counters to
> > work, although potentially with slightly unprecise values. This is the
> > approach taken by x86 and would be significantly simpler to support on
> > systems like big.LITTLE as well.
>
> Actually, trap-and-emulate would also help avoid additions to the
> KVM world switch.
>
> >
> > Second, if there are values obtained from within the guest that are so
> > skewed by the trap-and-emulate approach that we need to give the guest
> > access to counters, we should try to share the hardware by partitioning
> > the physical counters, but again, we need to coordinate with the host
> > perf system for this. We would only be pursuing this approach if
> > absolutely necessary.
>
> Yes, with trap-and-emulate we cannot accurately emulate all types
> of hw counters (particularly cache misses and similar events).
>
> >
> > Apologies for the change in direction on this.
> >
> > What are your thoughts? Do you still have time/interest to work
> > on any of this?
>
> Its a drastic change in direction.
>
> Currently, I have taken up some different work (not related to KVM)
> so for next few months I wont be able spend time on this.
>
> Its better if Linaro takes this work to avoid any further delays.
>
ok, will do, thanks for being responsive and putting in the efforts so
far!
Best,
-Christoffer
end of thread, other threads:[~2015-02-16 12:23 UTC | newest]
Thread overview: 78+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-05 9:24 [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support Anup Patel
2014-08-05 9:24 ` [RFC PATCH 1/6] ARM64: Move PMU register related defines to asm/pmu.h Anup Patel
2014-08-05 9:24 ` [RFC PATCH 2/6] ARM64: perf: Re-enable overflow interrupt from interrupt handler Anup Patel
2014-08-06 14:24 ` Will Deacon
2014-08-07 9:03 ` Anup Patel
2014-08-07 9:06 ` Will Deacon
2014-08-05 9:24 ` [RFC PATCH 3/6] ARM: " Anup Patel
2014-08-05 9:24 ` [RFC PATCH 4/6] ARM/ARM64: KVM: Add common code PMU IRQ routing Anup Patel
2014-08-05 9:24 ` [RFC PATCH 5/6] ARM64: KVM: Implement full context switch of PMU registers Anup Patel
2014-08-05 9:24 ` [RFC PATCH 6/6] ARM64: KVM: Upgrade to lazy " Anup Patel
2014-08-05 9:32 ` [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support Anup Patel
2014-08-05 9:35 ` Anup Patel
2014-11-07 20:23 ` Christoffer Dall
2014-11-07 20:25 ` Christoffer Dall
2014-11-08 9:36 ` Anup Patel
2014-11-08 12:39 ` Christoffer Dall
2014-11-11 9:18 ` Anup Patel
2014-11-18 3:24 ` Anup Patel
2014-11-19 15:29 ` Christoffer Dall
2014-11-20 14:47 ` Anup Patel
2014-11-21 9:59 ` Christoffer Dall
2014-11-21 10:36 ` Anup Patel
2014-11-21 11:49 ` Christoffer Dall
2014-11-24 8:44 ` Anup Patel
2014-11-24 14:37 ` Christoffer Dall
2014-11-25 12:47 ` Anup Patel
2014-11-25 13:42 ` Christoffer Dall
2014-11-27 10:22 ` Anup Patel
2014-11-27 10:40 ` Marc Zyngier
2014-11-27 10:54 ` Anup Patel
2014-11-27 11:06 ` Marc Zyngier
2014-12-30 5:49 ` Anup Patel
2015-01-08 4:02 ` Anup Patel
2015-01-11 19:11 ` Christoffer Dall
2015-01-12 4:19 ` Anup Patel
2015-02-15 15:33 ` Christoffer Dall
2015-02-16 12:16 ` Anup Patel
2015-02-16 12:23 ` Christoffer Dall
2015-01-14 4:28 ` Anup Patel