* [PATCH 1/4] perf: Update the ctx->pmu for a hybrid system
2021-06-16 18:55 [PATCH 0/4] perf: Fix the ctx->pmu for a hybrid system kan.liang
@ 2021-06-16 18:55 ` kan.liang
2021-06-16 18:55 ` [PATCH 2/4] perf/x86: Fix the x86_pmu_start WARNING on " kan.liang
` (3 subsequent siblings)
4 siblings, 0 replies; 11+ messages in thread
From: kan.liang @ 2021-06-16 18:55 UTC (permalink / raw)
To: peterz, mingo, linux-kernel
Cc: acme, mark.rutland, ak, alexander.shishkin, namhyung, jolsa, Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
The current generic perf code is not hybrid-friendly. A task perf
context assumes that there is only one perf_hw_context PMU, which is
stored in ctx->pmu when the perf context is allocated. During a
context switch, the corresponding PMU has to be disabled before any
events are scheduled. But on a hybrid system, the task may be scheduled
to a CPU whose PMU is different from ctx->pmu, and the PMU of the new
CPU may never be disabled.
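To illustrate, here is a simplified sketch of the sched-in path that
goes wrong (not verbatim kernel code; the helper name is made up):

	/*
	 * Sketch: ctx->pmu is fixed at context-allocation time, so after
	 * a migration it may still point at the previous CPU type's PMU,
	 * and the PMU of the current CPU is never disabled.
	 */
	static void sched_in_sketch(struct perf_event_context *ctx)
	{
		struct pmu *pmu = ctx->pmu;	/* may be the big-core PMU ... */

		perf_pmu_disable(pmu);	/* ... while running on a small core */
		/* ... schedule the context's events on this CPU ... */
		perf_pmu_enable(pmu);
	}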
Add a supported_cpus field to struct pmu to indicate the CPU mask
supported by the current PMU. When a new task is scheduled in, check
whether the current CPU is in the supported mask of ctx->pmu, and
update ctx->pmu if it is not. For a non-hybrid system, or when the
architecture-specific code does not populate supported_cpus yet, the
behavior is unchanged.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
include/linux/perf_event.h | 7 +++++++
kernel/events/core.c | 29 ++++++++++++++++++++++++++++-
2 files changed, 35 insertions(+), 1 deletion(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index f5a6a2f..111b1c1 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -301,6 +301,13 @@ struct pmu {
unsigned int nr_addr_filters;
/*
+ * For hybrid systems with PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
+ * @supported_cpus: The supported CPU mask of the current PMU.
+ * Empty means a non-hybrid system or not implemented.
+ */
+ cpumask_t supported_cpus;
+
+ /*
* Fully disable/enable this PMU, can be used to protect from the PMI
* as well as for lazy/batch writing of the MSRs.
*/
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6fee4a7..b172f54 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3817,11 +3817,38 @@ static void cpu_ctx_sched_in(struct perf_cpu_context *cpuctx,
ctx_sched_in(ctx, cpuctx, event_type, task);
}
+/*
+ * Update the ctx->pmu if the new task is scheduled on a CPU
+ * which has a different type of PMU.
+ */
+static inline void perf_event_context_update_hybrid(struct perf_event_context *ctx)
+{
+ struct pmu *pmu = ctx->pmu;
+
+ if (cpumask_empty(&pmu->supported_cpus) || cpumask_test_cpu(smp_processor_id(), &pmu->supported_cpus))
+ return;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(pmu, &pmus, entry) {
+ if ((pmu->task_ctx_nr == perf_hw_context) &&
+ cpumask_test_cpu(smp_processor_id(), &pmu->supported_cpus)) {
+ ctx->pmu = pmu;
+ break;
+ }
+ }
+ rcu_read_unlock();
+ WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &ctx->pmu->supported_cpus));
+}
+
static void perf_event_context_sched_in(struct perf_event_context *ctx,
struct task_struct *task)
{
struct perf_cpu_context *cpuctx;
- struct pmu *pmu = ctx->pmu;
+ struct pmu *pmu;
+
+ if (ctx->pmu->capabilities & PERF_PMU_CAP_HETEROGENEOUS_CPUS)
+ perf_event_context_update_hybrid(ctx);
+ pmu = ctx->pmu;
cpuctx = __get_cpu_context(ctx);
if (cpuctx->task_ctx == ctx) {
--
2.7.4
* [PATCH 2/4] perf/x86: Fix the x86_pmu_start WARNING on a hybrid system
2021-06-16 18:55 [PATCH 0/4] perf: Fix the ctx->pmu for a hybrid system kan.liang
2021-06-16 18:55 ` [PATCH 1/4] perf: Update " kan.liang
@ 2021-06-16 18:55 ` kan.liang
2021-06-16 18:55 ` [PATCH 3/4] perf: Check the supported CPU of an event kan.liang
` (2 subsequent siblings)
4 siblings, 0 replies; 11+ messages in thread
From: kan.liang @ 2021-06-16 18:55 UTC (permalink / raw)
To: peterz, mingo, linux-kernel
Cc: acme, mark.rutland, ak, alexander.shishkin, namhyung, jolsa, Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
The WARNING below may be triggered when a user enables per-task
monitoring with all available perf_hw_context PMUs on a hybrid system.
WARNING: CPU: 8 PID: 37107 at arch/x86/events/core.c:1505
x86_pmu_start+0x77/0x90
Call Trace:
x86_pmu_enable+0x111/0x2f0
event_sched_in+0x167/0x230
merge_sched_in+0x1a7/0x3d0
visit_groups_merge.constprop.0.isra.0+0x16f/0x450
? x86_pmu_del+0x42/0x190
ctx_sched_in+0xb8/0x170
perf_event_sched_in+0x61/0x90
__perf_event_task_sched_in+0x20b/0x2a0
finish_task_switch.isra.0+0x16a/0x290
__schedule+0x2fd/0x970
? free_inode_nonrcu+0x18/0x20
schedule+0x4f/0xc0
do_wait+0x176/0x2f0
kernel_wait4+0xaf/0x150
The WARNING comes from this check in x86_pmu_start():
if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
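For context, a paraphrased sketch of the surrounding function (an
event must be in the PERF_HES_STOPPED state before it may be
restarted; the rest of the body is elided):

	static void x86_pmu_start(struct perf_event *event, int flags)
	{
		/* Restarting an event that was never stopped is a bug. */
		if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
			return;

		/* ... otherwise (re)program and enable the counter ... */
	}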
The current generic perf code doesn't know the supported CPU mask of a
specific PMU, so it cannot update ctx->pmu when the task is scheduled
on a CPU whose PMU type differs from the previous CPU's. If many events
are scheduled during the context switch and the perf scheduler tries to
move an earlier event to a new counter, the WARNING is triggered,
because the corresponding PMU has not been disabled and the earlier
events are probably already running.
Populate the supported_cpus in struct pmu on a hybrid system, so that
the generic perf code understands the supported CPU mask of a specific
PMU. Since supported_cpus is now tracked in struct pmu, remove it from
struct x86_hybrid_pmu.
Fixes: f83d2f91d259 ("perf/x86/intel: Add Alder Lake Hybrid support")
Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
arch/x86/events/core.c | 12 ++++--------
arch/x86/events/intel/core.c | 13 +++++--------
arch/x86/events/perf_event.h | 1 -
3 files changed, 9 insertions(+), 17 deletions(-)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 8f71dd7..ecebc66 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2060,6 +2060,7 @@ void x86_pmu_update_cpu_context(struct pmu *pmu, int cpu)
cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
cpuctx->ctx.pmu = pmu;
+ cpumask_set_cpu(cpu, &pmu->supported_cpus);
}
static int __init init_hw_perf_events(void)
@@ -2331,12 +2332,9 @@ static struct cpu_hw_events *allocate_fake_cpuc(struct pmu *event_pmu)
cpuc->is_fake = 1;
if (is_hybrid()) {
- struct x86_hybrid_pmu *h_pmu;
-
- h_pmu = hybrid_pmu(event_pmu);
- if (cpumask_empty(&h_pmu->supported_cpus))
+ if (cpumask_empty(&event_pmu->supported_cpus))
goto error;
- cpu = cpumask_first(&h_pmu->supported_cpus);
+ cpu = cpumask_first(&event_pmu->supported_cpus);
} else
cpu = raw_smp_processor_id();
cpuc->pmu = event_pmu;
@@ -2441,7 +2439,6 @@ static int validate_group(struct perf_event *event)
static int x86_pmu_event_init(struct perf_event *event)
{
- struct x86_hybrid_pmu *pmu = NULL;
int err;
if ((event->attr.type != event->pmu->type) &&
@@ -2450,8 +2447,7 @@ static int x86_pmu_event_init(struct perf_event *event)
return -ENOENT;
if (is_hybrid() && (event->cpu != -1)) {
- pmu = hybrid_pmu(event->pmu);
- if (!cpumask_test_cpu(event->cpu, &pmu->supported_cpus))
+ if (!cpumask_test_cpu(event->cpu, &event->pmu->supported_cpus))
return -ENOENT;
}
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index e288922..03ba014 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4311,7 +4311,7 @@ static bool init_hybrid_pmu(int cpu)
}
/* Only check and dump the PMU information for the first CPU */
- if (!cpumask_empty(&pmu->supported_cpus))
+ if (!cpumask_empty(&pmu->pmu.supported_cpus))
goto end;
if (!check_hw_exists(&pmu->pmu, pmu->num_counters, pmu->num_counters_fixed))
@@ -4328,9 +4328,7 @@ static bool init_hybrid_pmu(int cpu)
pmu->intel_ctrl);
end:
- cpumask_set_cpu(cpu, &pmu->supported_cpus);
cpuc->pmu = &pmu->pmu;
-
x86_pmu_update_cpu_context(&pmu->pmu, cpu);
return true;
@@ -4463,7 +4461,7 @@ static void intel_pmu_cpu_dead(int cpu)
intel_cpuc_finish(cpuc);
if (is_hybrid() && cpuc->pmu)
- cpumask_clear_cpu(cpu, &hybrid_pmu(cpuc->pmu)->supported_cpus);
+ cpumask_clear_cpu(cpu, &cpuc->pmu->supported_cpus);
}
static void intel_pmu_sched_task(struct perf_event_context *ctx,
@@ -4494,10 +4492,9 @@ static int intel_pmu_aux_output_match(struct perf_event *event)
static int intel_pmu_filter_match(struct perf_event *event)
{
- struct x86_hybrid_pmu *pmu = hybrid_pmu(event->pmu);
unsigned int cpu = smp_processor_id();
- return cpumask_test_cpu(cpu, &pmu->supported_cpus);
+ return cpumask_test_cpu(cpu, &event->pmu->supported_cpus);
}
PMU_FORMAT_ATTR(offcore_rsp, "config1:0-63");
@@ -5299,7 +5296,7 @@ static umode_t hybrid_events_is_visible(struct kobject *kobj,
static inline int hybrid_find_supported_cpu(struct x86_hybrid_pmu *pmu)
{
- int cpu = cpumask_first(&pmu->supported_cpus);
+ int cpu = cpumask_first(&pmu->pmu.supported_cpus);
return (cpu >= nr_cpu_ids) ? -1 : cpu;
}
@@ -5355,7 +5352,7 @@ static ssize_t intel_hybrid_get_attr_cpus(struct device *dev,
struct x86_hybrid_pmu *pmu =
container_of(dev_get_drvdata(dev), struct x86_hybrid_pmu, pmu);
- return cpumap_print_to_pagebuf(true, buf, &pmu->supported_cpus);
+ return cpumap_print_to_pagebuf(true, buf, &pmu->pmu.supported_cpus);
}
static DEVICE_ATTR(cpus, S_IRUGO, intel_hybrid_get_attr_cpus, NULL);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index ad87cb3..02abcac 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -636,7 +636,6 @@ struct x86_hybrid_pmu {
struct pmu pmu;
const char *name;
u8 cpu_type;
- cpumask_t supported_cpus;
union perf_capabilities intel_cap;
u64 intel_ctrl;
int max_pebs_events;
--
2.7.4
* [PATCH 3/4] perf: Check the supported CPU of an event
2021-06-16 18:55 [PATCH 0/4] perf: Fix the ctx->pmu for a hybrid system kan.liang
2021-06-16 18:55 ` [PATCH 1/4] perf: Update " kan.liang
2021-06-16 18:55 ` [PATCH 2/4] perf/x86: Fix the x86_pmu_start WARNING on " kan.liang
@ 2021-06-16 18:55 ` kan.liang
2021-06-16 18:55 ` [PATCH 4/4] perf/x86: Remove filter_match callback kan.liang
2021-06-17 10:23 ` [PATCH 0/4] perf: Fix the ctx->pmu for a hybrid system Peter Zijlstra
4 siblings, 0 replies; 11+ messages in thread
From: kan.liang @ 2021-06-16 18:55 UTC (permalink / raw)
To: peterz, mingo, linux-kernel
Cc: acme, mark.rutland, ak, alexander.shishkin, namhyung, jolsa, Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
All architectures that implement the filter_match callback use it to
check whether an event is schedulable on the current CPU. Since
supported_cpus has been moved to struct pmu, the check can be done in
the generic code, and the filter_match callback can be avoided on some
architectures, e.g., x86.
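For reference, the driver-side check being lifted into the generic code
looks roughly like this (a simplified sketch modeled on the Arm
big.LITTLE PMU driver; details vary by kernel version):

	static int armpmu_filter_match(struct perf_event *event)
	{
		/* Schedulable only if this CPU belongs to the PMU. */
		struct arm_pmu *armpmu = to_arm_pmu(event->pmu);

		return cpumask_test_cpu(smp_processor_id(),
					&armpmu->supported_cpus);
	}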
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
kernel/events/core.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index b172f54..7140f40 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2225,9 +2225,19 @@ static bool is_orphaned_event(struct perf_event *event)
return event->state == PERF_EVENT_STATE_DEAD;
}
+static __always_inline bool pmu_can_sched_on_this_cpu(struct pmu *pmu)
+{
+ return cpumask_empty(&pmu->supported_cpus) ||
+ cpumask_test_cpu(smp_processor_id(), &pmu->supported_cpus);
+}
+
static inline int __pmu_filter_match(struct perf_event *event)
{
struct pmu *pmu = event->pmu;
+
+ if (!pmu_can_sched_on_this_cpu(pmu))
+ return 0;
+
return pmu->filter_match ? pmu->filter_match(event) : 1;
}
@@ -3825,7 +3835,7 @@ static inline void perf_event_context_update_hybrid(struct perf_event_context *c
{
struct pmu *pmu = ctx->pmu;
- if (cpumask_empty(&pmu->supported_cpus) || cpumask_test_cpu(smp_processor_id(), &pmu->supported_cpus))
+ if (pmu_can_sched_on_this_cpu(pmu))
return;
rcu_read_lock();
--
2.7.4
* [PATCH 4/4] perf/x86: Remove filter_match callback
2021-06-16 18:55 [PATCH 0/4] perf: Fix the ctx->pmu for a hybrid system kan.liang
` (2 preceding siblings ...)
2021-06-16 18:55 ` [PATCH 3/4] perf: Check the supported CPU of an event kan.liang
@ 2021-06-16 18:55 ` kan.liang
2021-06-17 10:23 ` [PATCH 0/4] perf: Fix the ctx->pmu for a hybrid system Peter Zijlstra
4 siblings, 0 replies; 11+ messages in thread
From: kan.liang @ 2021-06-16 18:55 UTC (permalink / raw)
To: peterz, mingo, linux-kernel
Cc: acme, mark.rutland, ak, alexander.shishkin, namhyung, jolsa, Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
The filter_match callback checks whether an event is schedulable on
the current CPU. This check is now done in the generic code, so the
x86 filter_match callback is no longer needed.
Remove the filter_match callback.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
arch/x86/events/core.c | 10 ----------
arch/x86/events/intel/core.c | 8 --------
arch/x86/events/perf_event.h | 1 -
3 files changed, 19 deletions(-)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index ecebc66..4965171 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2640,14 +2640,6 @@ static int x86_pmu_aux_output_match(struct perf_event *event)
return 0;
}
-static int x86_pmu_filter_match(struct perf_event *event)
-{
- if (x86_pmu.filter_match)
- return x86_pmu.filter_match(event);
-
- return 1;
-}
-
static struct pmu pmu = {
.pmu_enable = x86_pmu_enable,
.pmu_disable = x86_pmu_disable,
@@ -2675,8 +2667,6 @@ static struct pmu pmu = {
.check_period = x86_pmu_check_period,
.aux_output_match = x86_pmu_aux_output_match,
-
- .filter_match = x86_pmu_filter_match,
};
void arch_perf_update_userpage(struct perf_event *event,
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 03ba014..17b86d2 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4490,13 +4490,6 @@ static int intel_pmu_aux_output_match(struct perf_event *event)
return is_intel_pt_event(event);
}
-static int intel_pmu_filter_match(struct perf_event *event)
-{
- unsigned int cpu = smp_processor_id();
-
- return cpumask_test_cpu(cpu, &event->pmu->supported_cpus);
-}
-
PMU_FORMAT_ATTR(offcore_rsp, "config1:0-63");
PMU_FORMAT_ATTR(ldlat, "config1:0-15");
@@ -6131,7 +6124,6 @@ __init int intel_pmu_init(void)
x86_pmu.update_topdown_event = adl_update_topdown_event;
x86_pmu.set_topdown_event_period = adl_set_topdown_event_period;
- x86_pmu.filter_match = intel_pmu_filter_match;
x86_pmu.get_event_constraints = adl_get_event_constraints;
x86_pmu.hw_config = adl_hw_config;
x86_pmu.limit_period = spr_limit_period;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 02abcac..9b6c9d8 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -883,7 +883,6 @@ struct x86_pmu {
int (*aux_output_match) (struct perf_event *event);
- int (*filter_match)(struct perf_event *event);
/*
* Hybrid support
*
--
2.7.4
* Re: [PATCH 0/4] perf: Fix the ctx->pmu for a hybrid system
2021-06-16 18:55 [PATCH 0/4] perf: Fix the ctx->pmu for a hybrid system kan.liang
` (3 preceding siblings ...)
2021-06-16 18:55 ` [PATCH 4/4] perf/x86: Remove filter_match callback kan.liang
@ 2021-06-17 10:23 ` Peter Zijlstra
2021-06-17 11:33 ` Peter Zijlstra
4 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2021-06-17 10:23 UTC (permalink / raw)
To: kan.liang
Cc: mingo, linux-kernel, acme, mark.rutland, ak, alexander.shishkin,
namhyung, jolsa
On Wed, Jun 16, 2021 at 11:55:30AM -0700, kan.liang@linux.intel.com wrote:
> To fix the issue, the generic perf code has to understand the
> supported CPU mask of a specific hybrid PMU, so that it can update
> ctx->pmu accordingly when a task is scheduled on a CPU which has a
> different type of PMU from the previous CPU. The supported_cpus
> therefore has to be moved to struct pmu.
Urghh.. I so hate this :-/
I *did* point you to:
https://lore.kernel.org/lkml/20181010104559.GO5728@hirez.programming.kicks-ass.net/
when you started this whole hybrid crud, and I think that's still the
correct thing to do.
Still, let me consider if there's a workable short-term kludge I hate
less.
* Re: [PATCH 0/4] perf: Fix the ctx->pmu for a hybrid system
2021-06-17 10:23 ` [PATCH 0/4] perf: Fix the ctx->pmu for a hybrid system Peter Zijlstra
@ 2021-06-17 11:33 ` Peter Zijlstra
2021-06-17 14:10 ` Liang, Kan
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Peter Zijlstra @ 2021-06-17 11:33 UTC (permalink / raw)
To: kan.liang
Cc: mingo, linux-kernel, acme, mark.rutland, ak, alexander.shishkin,
namhyung, jolsa
On Thu, Jun 17, 2021 at 12:23:06PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 16, 2021 at 11:55:30AM -0700, kan.liang@linux.intel.com wrote:
>
> > To fix the issue, the generic perf code has to understand the
> > supported CPU mask of a specific hybrid PMU, so that it can update
> > ctx->pmu accordingly when a task is scheduled on a CPU which has a
> > different type of PMU from the previous CPU. The supported_cpus
> > therefore has to be moved to struct pmu.
>
> Urghh.. I so hate this :-/
>
> I *did* point you to:
>
> https://lore.kernel.org/lkml/20181010104559.GO5728@hirez.programming.kicks-ass.net/
>
> when you started this whole hybrid crud, and I think that's still the
> correct thing to do.
>
> Still, let me consider if there's a workable short-term kludge I hate
> less.
How's this? We already have x86_pmu_update_cpu_context() setting the
'correct' pmu in the cpuctx, so we can simply fold that back into the
task context.
For normal use this is a no-op.
Now I need to go audit all ctx->pmu usage :-(
---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index db4604c4c502..6a496c29ef00 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3822,9 +3822,16 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
struct task_struct *task)
{
struct perf_cpu_context *cpuctx;
- struct pmu *pmu = ctx->pmu;
+ struct pmu *pmu;
cpuctx = __get_cpu_context(ctx);
+
+ /*
+ * HACK; for HETEROGENOUS the task context might have switched to a
+ * different PMU, don't bother gating this.
+ */
+ pmu = ctx->pmu = cpuctx->ctx.pmu;
+
if (cpuctx->task_ctx == ctx) {
if (cpuctx->sched_cb_usage)
__perf_pmu_sched_task(cpuctx, true);
* Re: [PATCH 0/4] perf: Fix the ctx->pmu for a hybrid system
2021-06-17 11:33 ` Peter Zijlstra
@ 2021-06-17 14:10 ` Liang, Kan
2021-06-17 19:32 ` Peter Zijlstra
2021-06-18 13:54 ` Liang, Kan
2021-06-24 7:09 ` [tip: perf/core] perf: Fix task context PMU for Hetero tip-bot2 for Peter Zijlstra
2 siblings, 1 reply; 11+ messages in thread
From: Liang, Kan @ 2021-06-17 14:10 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, acme, mark.rutland, ak, alexander.shishkin,
namhyung, jolsa
On 6/17/2021 7:33 AM, Peter Zijlstra wrote:
> On Thu, Jun 17, 2021 at 12:23:06PM +0200, Peter Zijlstra wrote:
>> On Wed, Jun 16, 2021 at 11:55:30AM -0700, kan.liang@linux.intel.com wrote:
>>
>>> To fix the issue, the generic perf code has to understand the
>>> supported CPU mask of a specific hybrid PMU, so that it can update
>>> ctx->pmu accordingly when a task is scheduled on a CPU which has a
>>> different type of PMU from the previous CPU. The supported_cpus
>>> therefore has to be moved to struct pmu.
>>
>> Urghh.. I so hate this :-/
>>
>> I *did* point you to:
>>
>> https://lore.kernel.org/lkml/20181010104559.GO5728@hirez.programming.kicks-ass.net/
>>
>> when you started this whole hybrid crud
Yes, to work around the hybrid case, I updated the PMU for the CPU
context accordingly, but not for the task context. :( This issue was
found by a stress test that was not ready at that time. Sorry for that.
>>, and I think that's still the
>> correct thing to do.
>>
>> Still, let me consider if there's a workable short-term kludge I hate
>> less.
>
> How's this? We already have x86_pmu_update_cpu_context() setting the
> 'correct' pmu in the cpuctx, so we can simply fold that back into the
> task context.
>
> For normal use this is a no-op.
>
> Now I need to go audit all ctx->pmu usage :-(
>
> ---
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index db4604c4c502..6a496c29ef00 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3822,9 +3822,16 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
> struct task_struct *task)
> {
> struct perf_cpu_context *cpuctx;
> - struct pmu *pmu = ctx->pmu;
> + struct pmu *pmu;
>
> cpuctx = __get_cpu_context(ctx);
> +
> + /*
> + * HACK; for HETEROGENOUS the task context might have switched to a
> + * different PMU, don't bother gating this.
> + */
> + pmu = ctx->pmu = cpuctx->ctx.pmu;
> +
I think all the perf_sw_context PMUs share the same pmu_cpu_context, so
cpuctx->ctx.pmu should always be the first registered perf_sw_context
PMU, which is perf_swevent. The ctx->pmu could be another software PMU.
In theory, the perf_sw_context PMUs have a similar issue. If the
events come from different perf_sw_context PMUs, we should
perf_pmu_disable() all of those PMUs before scheduling them, but
ctx->pmu only tracks the first one.
I don't have a good way to fix the perf_sw_context PMUs. I think we
would have to go through the event list and find all the PMUs, but I
don't think it's worth doing.
Maybe we should only apply the change to the hybrid PMUs, and leave
the other PMUs as is.
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6fee4a7..df9cce6 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3821,9 +3821,19 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
struct task_struct *task)
{
struct perf_cpu_context *cpuctx;
- struct pmu *pmu = ctx->pmu;
+ struct pmu *pmu;
cpuctx = __get_cpu_context(ctx);
+
+ if (ctx->pmu->capabilities & PERF_PMU_CAP_HETEROGENEOUS_CPUS) {
+ /*
+ * HACK; for HETEROGENOUS the task context might have switched to a
+ * different PMU, don't bother gating this.
+ */
+ pmu = ctx->pmu = cpuctx->ctx.pmu;
+ } else
+ pmu = ctx->pmu;
+
if (cpuctx->task_ctx == ctx) {
if (cpuctx->sched_cb_usage)
__perf_pmu_sched_task(cpuctx, true);
Thanks,
Kan
* Re: [PATCH 0/4] perf: Fix the ctx->pmu for a hybrid system
2021-06-17 14:10 ` Liang, Kan
@ 2021-06-17 19:32 ` Peter Zijlstra
0 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2021-06-17 19:32 UTC (permalink / raw)
To: Liang, Kan
Cc: mingo, linux-kernel, acme, mark.rutland, ak, alexander.shishkin,
namhyung, jolsa
On Thu, Jun 17, 2021 at 10:10:37AM -0400, Liang, Kan wrote:
> I think all the perf_sw_context PMUs share the same pmu_cpu_context, so
> cpuctx->ctx.pmu should always be the first registered perf_sw_context PMU,
> which is perf_swevent. The ctx->pmu could be another software PMU.
Is there actually anything that relies on that? IIRC the sw pmus only
use event->pmu->foo() methods (exactly because the ctx->pmu is
unreliable for them).
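A minimal sketch of what I mean (paraphrased from event_sched_in() in
kernel/events/core.c): the per-event operations dispatch through the
event's own pmu pointer, so a stale ctx->pmu doesn't change which
callbacks a software event hits.

	/* Paraphrased: per-event ops use event->pmu, never ctx->pmu. */
	static int event_add_sketch(struct perf_event *event)
	{
		return event->pmu->add(event, PERF_EF_START);
	}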
> In theory, the perf_sw_context PMUs have a similar issue. If the
> events come from different perf_sw_context PMUs, we should perf_pmu_disable()
> all of those PMUs before scheduling them, but ctx->pmu only tracks the first
> one.
>
> I don't have a good way to fix the perf_sw_context PMUs. I think we would
> have to go through the event list and find all the PMUs, but I don't think
> it's worth doing.
Yeah, the software PMUs are miserable; they're one of the things I wish
I'd done differently. Cleaning that up is *somewhere* on the TODO list.
So I *think* it should work as is and we can avoid the extra check, but
let me know what actual testing does.
* Re: [PATCH 0/4] perf: Fix the ctx->pmu for a hybrid system
2021-06-17 11:33 ` Peter Zijlstra
2021-06-17 14:10 ` Liang, Kan
@ 2021-06-18 13:54 ` Liang, Kan
2021-06-24 7:09 ` [tip: perf/core] perf: Fix task context PMU for Hetero tip-bot2 for Peter Zijlstra
2 siblings, 0 replies; 11+ messages in thread
From: Liang, Kan @ 2021-06-18 13:54 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, acme, mark.rutland, ak, alexander.shishkin,
namhyung, jolsa
On 6/17/2021 7:33 AM, Peter Zijlstra wrote:
> On Thu, Jun 17, 2021 at 12:23:06PM +0200, Peter Zijlstra wrote:
>> On Wed, Jun 16, 2021 at 11:55:30AM -0700, kan.liang@linux.intel.com wrote:
>>
>>> To fix the issue, the generic perf code has to understand the
>>> supported CPU mask of a specific hybrid PMU, so that it can update
>>> ctx->pmu accordingly when a task is scheduled on a CPU which has a
>>> different type of PMU from the previous CPU. The supported_cpus
>>> therefore has to be moved to struct pmu.
>>
>> Urghh.. I so hate this :-/
>>
>> I *did* point you to:
>>
>> https://lore.kernel.org/lkml/20181010104559.GO5728@hirez.programming.kicks-ass.net/
>>
>> when you started this whole hybrid crud, and I think that's still the
>> correct thing to do.
>>
>> Still, let me consider if there's a workable short-term kludge I hate
>> less.
>
> How's this? We already have x86_pmu_update_cpu_context() setting the
> 'correct' pmu in the cpuctx, so we can simply fold that back into the
> task context.
>
> For normal use this is a no-op.
>
> Now I need to go audit all ctx->pmu usage :-(
>
> ---
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index db4604c4c502..6a496c29ef00 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3822,9 +3822,16 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
> struct task_struct *task)
> {
> struct perf_cpu_context *cpuctx;
> - struct pmu *pmu = ctx->pmu;
> + struct pmu *pmu;
>
> cpuctx = __get_cpu_context(ctx);
> +
> + /*
> + * HACK; for HETEROGENOUS the task context might have switched to a
%s/HETEROGENOUS/HETEROGENEOUS/
> + * different PMU, don't bother gating this.
> + */
> + pmu = ctx->pmu = cpuctx->ctx.pmu;
> +
> if (cpuctx->task_ctx == ctx) {
> if (cpuctx->sched_cb_usage)
> __perf_pmu_sched_task(cpuctx, true);
>
With the patch, the issue is gone.
We've also run a full test on ADL and SPR. No other issues were found
with the patch.
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Thanks,
Kan
* [tip: perf/core] perf: Fix task context PMU for Hetero
2021-06-17 11:33 ` Peter Zijlstra
2021-06-17 14:10 ` Liang, Kan
2021-06-18 13:54 ` Liang, Kan
@ 2021-06-24 7:09 ` tip-bot2 for Peter Zijlstra
2 siblings, 0 replies; 11+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-06-24 7:09 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Kan Liang, Peter Zijlstra (Intel), x86, linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: 012669c740e6e2afa8bdb95394d06676f933dd2d
Gitweb: https://git.kernel.org/tip/012669c740e6e2afa8bdb95394d06676f933dd2d
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Tue, 22 Jun 2021 16:21:01 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 23 Jun 2021 18:30:56 +02:00
perf: Fix task context PMU for Hetero
On HETEROGENEOUS hardware (ARM big.LITTLE, Intel Alder Lake, etc.) each
CPU might have a different hardware PMU. Each such PMU is represented
by a different struct pmu, but we only have a single HW task context.
That means that the task context needs to switch PMU type when it
switches CPUs.
Not doing this means that ctx->pmu calls (pmu_{dis,en}able(),
{start,commit,cancel}_txn() etc.) are called against the wrong PMU and
things will go wobbly.
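For example, group scheduling runs a counter-allocation transaction
against the context's PMU; a paraphrased sketch of group_sched_in()
from kernel/events/core.c:

	static int group_sched_in_sketch(struct perf_event *group_event,
					 struct perf_event_context *ctx)
	{
		struct pmu *pmu = ctx->pmu;	/* stale on hetero systems */

		pmu->start_txn(pmu, PERF_PMU_TXN_ADD);

		/* ... event_sched_in() the leader and each sibling ... */

		if (!pmu->commit_txn(pmu))
			return 0;		/* whole group scheduled */

		pmu->cancel_txn(pmu);		/* roll back partial group */
		return -EAGAIN;
	}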
Fixes: f83d2f91d259 ("perf/x86/intel: Add Alder Lake Hybrid support")
Reported-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lkml.kernel.org/r/YMsy7BuGT8nBTspT@hirez.programming.kicks-ass.net
---
kernel/events/core.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6c964de..0e125ae 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3822,9 +3822,16 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
struct task_struct *task)
{
struct perf_cpu_context *cpuctx;
- struct pmu *pmu = ctx->pmu;
+ struct pmu *pmu;
cpuctx = __get_cpu_context(ctx);
+
+ /*
+ * HACK: for HETEROGENEOUS the task context might have switched to a
+ * different PMU, force (re)set the context,
+ */
+ pmu = ctx->pmu = cpuctx->ctx.pmu;
+
if (cpuctx->task_ctx == ctx) {
if (cpuctx->sched_cb_usage)
__perf_pmu_sched_task(cpuctx, true);