* [PATCH 0/4] perf: Fix the ctx->pmu for a hybrid system
@ 2021-06-16 18:55 kan.liang
  2021-06-16 18:55 ` [PATCH 1/4] perf: Update " kan.liang
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: kan.liang @ 2021-06-16 18:55 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: acme, mark.rutland, ak, alexander.shishkin, namhyung, jolsa, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

This patchset fixes the below WARNING, which is triggered on an Alder
Lake (ADL) machine when a user enables per-task monitoring with all
available perf_hw_context PMUs.

WARNING: CPU: 8 PID: 37107 at arch/x86/events/core.c:1505
x86_pmu_start+0x77/0x90
Call Trace:
x86_pmu_enable+0x111/0x2f0
event_sched_in+0x167/0x230
merge_sched_in+0x1a7/0x3d0
visit_groups_merge.constprop.0.isra.0+0x16f/0x450
? x86_pmu_del+0x42/0x190
ctx_sched_in+0xb8/0x170
perf_event_sched_in+0x61/0x90
__perf_event_task_sched_in+0x20b/0x2a0
finish_task_switch.isra.0+0x16a/0x290
__schedule+0x2fd/0x970
? free_inode_nonrcu+0x18/0x20
schedule+0x4f/0xc0
do_wait+0x176/0x2f0
kernel_wait4+0xaf/0x150

Here is the line that triggers the WARNING:
        if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
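
For reference, the check sits at the top of x86_pmu_start(). Roughly
(trimmed, not the full function):

        static void x86_pmu_start(struct perf_event *event, int flags)
        {
                ...
                /* The event must be stopped before it is (re)started. */
                if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
                        return;
                ...
        }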

To fix the issue, the generic perf code has to understand the
supported CPU mask of a specific hybrid PMU, so that it can update
ctx->pmu accordingly when a task is scheduled on a CPU with a
different type of PMU from the previous CPU. The supported_cpus mask
has to be moved into struct pmu.

Moving it also brings another improvement. All hybrid architectures,
including x86 and Arm, check whether an event is schedulable on the
current CPU via the filter_match callback. Since supported_cpus now
lives in struct pmu, the check can be done in the generic code and the
filter_match callback can be avoided. This patchset implements the
improvement only for x86; Arm may implement a similar improvement
separately later.

Kan Liang (4):
  perf: Update the ctx->pmu for a hybrid system
  perf/x86: Fix the x86_pmu_start WARNING on a hybrid system
  perf: Check the supported CPU of an event
  perf/x86: Remove filter_match callback

 arch/x86/events/core.c       | 22 ++++------------------
 arch/x86/events/intel/core.c | 19 ++++---------------
 arch/x86/events/perf_event.h |  2 --
 include/linux/perf_event.h   |  7 +++++++
 kernel/events/core.c         | 39 ++++++++++++++++++++++++++++++++++++++-
 5 files changed, 53 insertions(+), 36 deletions(-)

-- 
2.7.4



* [PATCH 1/4] perf: Update the ctx->pmu for a hybrid system
  2021-06-16 18:55 [PATCH 0/4] perf: Fix the ctx->pmu for a hybrid system kan.liang
@ 2021-06-16 18:55 ` kan.liang
  2021-06-16 18:55 ` [PATCH 2/4] perf/x86: Fix the x86_pmu_start WARNING on " kan.liang
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: kan.liang @ 2021-06-16 18:55 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: acme, mark.rutland, ak, alexander.shishkin, namhyung, jolsa, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The current generic perf code is not hybrid-friendly. A task perf
context assumes that there is only one perf_hw_context PMU. The PMU is
stored in ctx->pmu when the perf context is allocated. On a context
switch, the corresponding PMU has to be disabled before any events are
scheduled. But on a hybrid system, the task may be scheduled on a CPU
whose PMU differs from ctx->pmu, and the PMU of the new CPU may never
be disabled.

Add a supported_cpus mask to struct pmu to indicate the CPUs supported
by the current PMU. When a new task is scheduled in, check whether
ctx->pmu supports the current CPU, and update ctx->pmu if it doesn't.

For a non-hybrid system, or when the architecture-specific code does
not populate supported_cpus yet, the behavior is unchanged.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 include/linux/perf_event.h |  7 +++++++
 kernel/events/core.c       | 29 ++++++++++++++++++++++++++++-
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index f5a6a2f..111b1c1 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -301,6 +301,13 @@ struct pmu {
 	unsigned int			nr_addr_filters;
 
 	/*
+	 * For hybrid systems with PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
+	 * @supported_cpus: The supported CPU mask of the current PMU.
+	 *		    Empty means a non-hybrid system or not implemented.
+	 */
+	cpumask_t			supported_cpus;
+
+	/*
 	 * Fully disable/enable this PMU, can be used to protect from the PMI
 	 * as well as for lazy/batch writing of the MSRs.
 	 */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6fee4a7..b172f54 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3817,11 +3817,38 @@ static void cpu_ctx_sched_in(struct perf_cpu_context *cpuctx,
 	ctx_sched_in(ctx, cpuctx, event_type, task);
 }
 
+/*
+ * Update ctx->pmu if the new task is scheduled on a CPU
+ * which has a different type of PMU.
+ */
+static inline void perf_event_context_update_hybrid(struct perf_event_context *ctx)
+{
+	struct pmu *pmu = ctx->pmu;
+
+	if (cpumask_empty(&pmu->supported_cpus) || cpumask_test_cpu(smp_processor_id(), &pmu->supported_cpus))
+		return;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(pmu, &pmus, entry) {
+		if ((pmu->task_ctx_nr == perf_hw_context) &&
+		    cpumask_test_cpu(smp_processor_id(), &pmu->supported_cpus)) {
+			ctx->pmu = pmu;
+			break;
+		}
+	}
+	rcu_read_unlock();
+	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &ctx->pmu->supported_cpus));
+}
+
 static void perf_event_context_sched_in(struct perf_event_context *ctx,
 					struct task_struct *task)
 {
 	struct perf_cpu_context *cpuctx;
-	struct pmu *pmu = ctx->pmu;
+	struct pmu *pmu;
+
+	if (ctx->pmu->capabilities & PERF_PMU_CAP_HETEROGENEOUS_CPUS)
+		perf_event_context_update_hybrid(ctx);
+	pmu = ctx->pmu;
 
 	cpuctx = __get_cpu_context(ctx);
 	if (cpuctx->task_ctx == ctx) {
-- 
2.7.4



* [PATCH 2/4] perf/x86: Fix the x86_pmu_start WARNING on a hybrid system
  2021-06-16 18:55 [PATCH 0/4] perf: Fix the ctx->pmu for a hybrid system kan.liang
  2021-06-16 18:55 ` [PATCH 1/4] perf: Update " kan.liang
@ 2021-06-16 18:55 ` kan.liang
  2021-06-16 18:55 ` [PATCH 3/4] perf: Check the supported CPU of an event kan.liang
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: kan.liang @ 2021-06-16 18:55 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: acme, mark.rutland, ak, alexander.shishkin, namhyung, jolsa, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The below WARNING may be triggered when a user enables per-task
monitoring with all available perf_hw_context PMUs on a hybrid system.

WARNING: CPU: 8 PID: 37107 at arch/x86/events/core.c:1505
x86_pmu_start+0x77/0x90
Call Trace:
x86_pmu_enable+0x111/0x2f0
event_sched_in+0x167/0x230
merge_sched_in+0x1a7/0x3d0
visit_groups_merge.constprop.0.isra.0+0x16f/0x450
? x86_pmu_del+0x42/0x190
ctx_sched_in+0xb8/0x170
perf_event_sched_in+0x61/0x90
__perf_event_task_sched_in+0x20b/0x2a0
finish_task_switch.isra.0+0x16a/0x290
__schedule+0x2fd/0x970
? free_inode_nonrcu+0x18/0x20
schedule+0x4f/0xc0
do_wait+0x176/0x2f0
kernel_wait4+0xaf/0x150

Here is the line that triggers the WARNING:
        if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))

The current generic perf code doesn't know the supported CPU mask of a
specific PMU, so it cannot update ctx->pmu when a task is scheduled on
a CPU which has a different type of PMU from the previous CPU. If many
events are scheduled during the context switch and the perf scheduler
tries to move an earlier event to a new counter, the WARNING is
triggered, because the corresponding PMU was never disabled and the
earlier events are probably already running.

Update the supported_cpus mask in struct pmu for a hybrid system, so
that the generic perf code understands the supported CPU mask of a
specific PMU. Since supported_cpus is now tracked in struct pmu,
remove it from struct x86_hybrid_pmu.

Fixes: f83d2f91d259 ("perf/x86/intel: Add Alder Lake Hybrid support")
Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/core.c       | 12 ++++--------
 arch/x86/events/intel/core.c | 13 +++++--------
 arch/x86/events/perf_event.h |  1 -
 3 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 8f71dd7..ecebc66 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2060,6 +2060,7 @@ void x86_pmu_update_cpu_context(struct pmu *pmu, int cpu)
 
 	cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
 	cpuctx->ctx.pmu = pmu;
+	cpumask_set_cpu(cpu, &pmu->supported_cpus);
 }
 
 static int __init init_hw_perf_events(void)
@@ -2331,12 +2332,9 @@ static struct cpu_hw_events *allocate_fake_cpuc(struct pmu *event_pmu)
 	cpuc->is_fake = 1;
 
 	if (is_hybrid()) {
-		struct x86_hybrid_pmu *h_pmu;
-
-		h_pmu = hybrid_pmu(event_pmu);
-		if (cpumask_empty(&h_pmu->supported_cpus))
+		if (cpumask_empty(&event_pmu->supported_cpus))
 			goto error;
-		cpu = cpumask_first(&h_pmu->supported_cpus);
+		cpu = cpumask_first(&event_pmu->supported_cpus);
 	} else
 		cpu = raw_smp_processor_id();
 	cpuc->pmu = event_pmu;
@@ -2441,7 +2439,6 @@ static int validate_group(struct perf_event *event)
 
 static int x86_pmu_event_init(struct perf_event *event)
 {
-	struct x86_hybrid_pmu *pmu = NULL;
 	int err;
 
 	if ((event->attr.type != event->pmu->type) &&
@@ -2450,8 +2447,7 @@ static int x86_pmu_event_init(struct perf_event *event)
 		return -ENOENT;
 
 	if (is_hybrid() && (event->cpu != -1)) {
-		pmu = hybrid_pmu(event->pmu);
-		if (!cpumask_test_cpu(event->cpu, &pmu->supported_cpus))
+		if (!cpumask_test_cpu(event->cpu, &event->pmu->supported_cpus))
 			return -ENOENT;
 	}
 
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index e288922..03ba014 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4311,7 +4311,7 @@ static bool init_hybrid_pmu(int cpu)
 	}
 
 	/* Only check and dump the PMU information for the first CPU */
-	if (!cpumask_empty(&pmu->supported_cpus))
+	if (!cpumask_empty(&pmu->pmu.supported_cpus))
 		goto end;
 
 	if (!check_hw_exists(&pmu->pmu, pmu->num_counters, pmu->num_counters_fixed))
@@ -4328,9 +4328,7 @@ static bool init_hybrid_pmu(int cpu)
 			     pmu->intel_ctrl);
 
 end:
-	cpumask_set_cpu(cpu, &pmu->supported_cpus);
 	cpuc->pmu = &pmu->pmu;
-
 	x86_pmu_update_cpu_context(&pmu->pmu, cpu);
 
 	return true;
@@ -4463,7 +4461,7 @@ static void intel_pmu_cpu_dead(int cpu)
 	intel_cpuc_finish(cpuc);
 
 	if (is_hybrid() && cpuc->pmu)
-		cpumask_clear_cpu(cpu, &hybrid_pmu(cpuc->pmu)->supported_cpus);
+		cpumask_clear_cpu(cpu, &cpuc->pmu->supported_cpus);
 }
 
 static void intel_pmu_sched_task(struct perf_event_context *ctx,
@@ -4494,10 +4492,9 @@ static int intel_pmu_aux_output_match(struct perf_event *event)
 
 static int intel_pmu_filter_match(struct perf_event *event)
 {
-	struct x86_hybrid_pmu *pmu = hybrid_pmu(event->pmu);
 	unsigned int cpu = smp_processor_id();
 
-	return cpumask_test_cpu(cpu, &pmu->supported_cpus);
+	return cpumask_test_cpu(cpu, &event->pmu->supported_cpus);
 }
 
 PMU_FORMAT_ATTR(offcore_rsp, "config1:0-63");
@@ -5299,7 +5296,7 @@ static umode_t hybrid_events_is_visible(struct kobject *kobj,
 
 static inline int hybrid_find_supported_cpu(struct x86_hybrid_pmu *pmu)
 {
-	int cpu = cpumask_first(&pmu->supported_cpus);
+	int cpu = cpumask_first(&pmu->pmu.supported_cpus);
 
 	return (cpu >= nr_cpu_ids) ? -1 : cpu;
 }
@@ -5355,7 +5352,7 @@ static ssize_t intel_hybrid_get_attr_cpus(struct device *dev,
 	struct x86_hybrid_pmu *pmu =
 		container_of(dev_get_drvdata(dev), struct x86_hybrid_pmu, pmu);
 
-	return cpumap_print_to_pagebuf(true, buf, &pmu->supported_cpus);
+	return cpumap_print_to_pagebuf(true, buf, &pmu->pmu.supported_cpus);
 }
 
 static DEVICE_ATTR(cpus, S_IRUGO, intel_hybrid_get_attr_cpus, NULL);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index ad87cb3..02abcac 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -636,7 +636,6 @@ struct x86_hybrid_pmu {
 	struct pmu			pmu;
 	const char			*name;
 	u8				cpu_type;
-	cpumask_t			supported_cpus;
 	union perf_capabilities		intel_cap;
 	u64				intel_ctrl;
 	int				max_pebs_events;
-- 
2.7.4



* [PATCH 3/4] perf: Check the supported CPU of an event
  2021-06-16 18:55 [PATCH 0/4] perf: Fix the ctx->pmu for a hybrid system kan.liang
  2021-06-16 18:55 ` [PATCH 1/4] perf: Update " kan.liang
  2021-06-16 18:55 ` [PATCH 2/4] perf/x86: Fix the x86_pmu_start WARNING on " kan.liang
@ 2021-06-16 18:55 ` kan.liang
  2021-06-16 18:55 ` [PATCH 4/4] perf/x86: Remove filter_match callback kan.liang
  2021-06-17 10:23 ` [PATCH 0/4] perf: Fix the ctx->pmu for a hybrid system Peter Zijlstra
  4 siblings, 0 replies; 11+ messages in thread
From: kan.liang @ 2021-06-16 18:55 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: acme, mark.rutland, ak, alexander.shishkin, namhyung, jolsa, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

All architectures that implement the filter_match callback use it to
check whether an event is schedulable on the current CPU. Since
supported_cpus has been moved into struct pmu, the check can be done
in the generic code, and the filter_match callback can be avoided on
some architectures, e.g., x86.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 kernel/events/core.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index b172f54..7140f40 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2225,9 +2225,19 @@ static bool is_orphaned_event(struct perf_event *event)
 	return event->state == PERF_EVENT_STATE_DEAD;
 }
 
+static __always_inline bool pmu_can_sched_on_this_cpu(struct pmu *pmu)
+{
+	return cpumask_empty(&pmu->supported_cpus) ||
+	       cpumask_test_cpu(smp_processor_id(), &pmu->supported_cpus);
+}
+
 static inline int __pmu_filter_match(struct perf_event *event)
 {
 	struct pmu *pmu = event->pmu;
+
+	if (!pmu_can_sched_on_this_cpu(pmu))
+		return 0;
+
 	return pmu->filter_match ? pmu->filter_match(event) : 1;
 }
 
@@ -3825,7 +3835,7 @@ static inline void perf_event_context_update_hybrid(struct perf_event_context *c
 {
 	struct pmu *pmu = ctx->pmu;
 
-	if (cpumask_empty(&pmu->supported_cpus) || cpumask_test_cpu(smp_processor_id(), &pmu->supported_cpus))
+	if (pmu_can_sched_on_this_cpu(pmu))
 		return;
 
 	rcu_read_lock();
-- 
2.7.4



* [PATCH 4/4] perf/x86: Remove filter_match callback
  2021-06-16 18:55 [PATCH 0/4] perf: Fix the ctx->pmu for a hybrid system kan.liang
                   ` (2 preceding siblings ...)
  2021-06-16 18:55 ` [PATCH 3/4] perf: Check the supported CPU of an event kan.liang
@ 2021-06-16 18:55 ` kan.liang
  2021-06-17 10:23 ` [PATCH 0/4] perf: Fix the ctx->pmu for a hybrid system Peter Zijlstra
  4 siblings, 0 replies; 11+ messages in thread
From: kan.liang @ 2021-06-16 18:55 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: acme, mark.rutland, ak, alexander.shishkin, namhyung, jolsa, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The filter_match callback checks whether an event is schedulable on
the current CPU, which is now handled in the generic code.

Remove the filter_match callback.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/core.c       | 10 ----------
 arch/x86/events/intel/core.c |  8 --------
 arch/x86/events/perf_event.h |  1 -
 3 files changed, 19 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index ecebc66..4965171 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2640,14 +2640,6 @@ static int x86_pmu_aux_output_match(struct perf_event *event)
 	return 0;
 }
 
-static int x86_pmu_filter_match(struct perf_event *event)
-{
-	if (x86_pmu.filter_match)
-		return x86_pmu.filter_match(event);
-
-	return 1;
-}
-
 static struct pmu pmu = {
 	.pmu_enable		= x86_pmu_enable,
 	.pmu_disable		= x86_pmu_disable,
@@ -2675,8 +2667,6 @@ static struct pmu pmu = {
 	.check_period		= x86_pmu_check_period,
 
 	.aux_output_match	= x86_pmu_aux_output_match,
-
-	.filter_match		= x86_pmu_filter_match,
 };
 
 void arch_perf_update_userpage(struct perf_event *event,
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 03ba014..17b86d2 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4490,13 +4490,6 @@ static int intel_pmu_aux_output_match(struct perf_event *event)
 	return is_intel_pt_event(event);
 }
 
-static int intel_pmu_filter_match(struct perf_event *event)
-{
-	unsigned int cpu = smp_processor_id();
-
-	return cpumask_test_cpu(cpu, &event->pmu->supported_cpus);
-}
-
 PMU_FORMAT_ATTR(offcore_rsp, "config1:0-63");
 
 PMU_FORMAT_ATTR(ldlat, "config1:0-15");
@@ -6131,7 +6124,6 @@ __init int intel_pmu_init(void)
 		x86_pmu.update_topdown_event = adl_update_topdown_event;
 		x86_pmu.set_topdown_event_period = adl_set_topdown_event_period;
 
-		x86_pmu.filter_match = intel_pmu_filter_match;
 		x86_pmu.get_event_constraints = adl_get_event_constraints;
 		x86_pmu.hw_config = adl_hw_config;
 		x86_pmu.limit_period = spr_limit_period;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 02abcac..9b6c9d8 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -883,7 +883,6 @@ struct x86_pmu {
 
 	int (*aux_output_match) (struct perf_event *event);
 
-	int (*filter_match)(struct perf_event *event);
 	/*
 	 * Hybrid support
 	 *
-- 
2.7.4



* Re: [PATCH 0/4] perf: Fix the ctx->pmu for a hybrid system
  2021-06-16 18:55 [PATCH 0/4] perf: Fix the ctx->pmu for a hybrid system kan.liang
                   ` (3 preceding siblings ...)
  2021-06-16 18:55 ` [PATCH 4/4] perf/x86: Remove filter_match callback kan.liang
@ 2021-06-17 10:23 ` Peter Zijlstra
  2021-06-17 11:33   ` Peter Zijlstra
  4 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2021-06-17 10:23 UTC (permalink / raw)
  To: kan.liang
  Cc: mingo, linux-kernel, acme, mark.rutland, ak, alexander.shishkin,
	namhyung, jolsa

On Wed, Jun 16, 2021 at 11:55:30AM -0700, kan.liang@linux.intel.com wrote:

> To fix the issue, the generic perf code has to understand the
> supported CPU mask of a specific hybrid PMU, so that it can update
> ctx->pmu accordingly when a task is scheduled on a CPU with a
> different type of PMU from the previous CPU. The supported_cpus mask
> has to be moved into struct pmu.

Urghh.. I so hate this :-/

I *did* point you to:

  https://lore.kernel.org/lkml/20181010104559.GO5728@hirez.programming.kicks-ass.net/

when you started this whole hybrid crud, and I think that's still the
correct thing to do.

Still, let me consider if there's a workable short-term cludge I hate
less.


* Re: [PATCH 0/4] perf: Fix the ctx->pmu for a hybrid system
  2021-06-17 10:23 ` [PATCH 0/4] perf: Fix the ctx->pmu for a hybrid system Peter Zijlstra
@ 2021-06-17 11:33   ` Peter Zijlstra
  2021-06-17 14:10     ` Liang, Kan
                       ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Peter Zijlstra @ 2021-06-17 11:33 UTC (permalink / raw)
  To: kan.liang
  Cc: mingo, linux-kernel, acme, mark.rutland, ak, alexander.shishkin,
	namhyung, jolsa

On Thu, Jun 17, 2021 at 12:23:06PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 16, 2021 at 11:55:30AM -0700, kan.liang@linux.intel.com wrote:
> 
> > To fix the issue, the generic perf code has to understand the
> > supported CPU mask of a specific hybrid PMU, so that it can update
> > ctx->pmu accordingly when a task is scheduled on a CPU with a
> > different type of PMU from the previous CPU. The supported_cpus mask
> > has to be moved into struct pmu.
> 
> Urghh.. I so hate this :-/
> 
> I *did* point you to:
> 
>   https://lore.kernel.org/lkml/20181010104559.GO5728@hirez.programming.kicks-ass.net/
> 
> when you started this whole hybrid crud, and I think that's still the
> correct thing to do.
> 
> Still, let me consider if there's a workable short-term cludge I hate
> less.

How's this? We already have x86_pmu_update_cpu_context() setting the
'correct' pmu in the cpuctx, so we can simply fold that back into the
task context.

For normal use this is a no-op.

Now I need to go audit all ctx->pmu usage :-(

---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index db4604c4c502..6a496c29ef00 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3822,9 +3822,16 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
 					struct task_struct *task)
 {
 	struct perf_cpu_context *cpuctx;
-	struct pmu *pmu = ctx->pmu;
+	struct pmu *pmu;
 
 	cpuctx = __get_cpu_context(ctx);
+
+	/*
+	 * HACK; for HETEROGENOUS the task context might have switched to a
+	 * different PMU, don't bother gating this.
+	 */
+	pmu = ctx->pmu = cpuctx->ctx.pmu;
+
 	if (cpuctx->task_ctx == ctx) {
 		if (cpuctx->sched_cb_usage)
 			__perf_pmu_sched_task(cpuctx, true);


* Re: [PATCH 0/4] perf: Fix the ctx->pmu for a hybrid system
  2021-06-17 11:33   ` Peter Zijlstra
@ 2021-06-17 14:10     ` Liang, Kan
  2021-06-17 19:32       ` Peter Zijlstra
  2021-06-18 13:54     ` Liang, Kan
  2021-06-24  7:09     ` [tip: perf/core] perf: Fix task context PMU for Hetero tip-bot2 for Peter Zijlstra
  2 siblings, 1 reply; 11+ messages in thread
From: Liang, Kan @ 2021-06-17 14:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, acme, mark.rutland, ak, alexander.shishkin,
	namhyung, jolsa



On 6/17/2021 7:33 AM, Peter Zijlstra wrote:
> On Thu, Jun 17, 2021 at 12:23:06PM +0200, Peter Zijlstra wrote:
>> On Wed, Jun 16, 2021 at 11:55:30AM -0700, kan.liang@linux.intel.com wrote:
>>
>>> To fix the issue, the generic perf code has to understand the
>>> supported CPU mask of a specific hybrid PMU, so that it can update
>>> ctx->pmu accordingly when a task is scheduled on a CPU with a
>>> different type of PMU from the previous CPU. The supported_cpus mask
>>> has to be moved into struct pmu.
>>
>> Urghh.. I so hate this :-/
>>
>> I *did* point you to:
>>
>>    https://lore.kernel.org/lkml/20181010104559.GO5728@hirez.programming.kicks-ass.net/
>>
>> when you started this whole hybrid crud

Yes, to work around the hybrid issue, I updated the PMU for the CPU
context accordingly, but not the task context. :( This issue was found
in a stress test that was not ready at that time. Sorry for that.

>>, and I think that's still the
>> correct thing to do.
>>
>> Still, let me consider if there's a workable short-term cludge I hate
>> less.
> 
> How's this? We already have x86_pmu_update_cpu_context() setting the
> 'correct' pmu in the cpuctx, so we can simply fold that back into the
> task context.
> 
> For normal use this is a no-op.
> 
> Now I need to go audit all ctx->pmu usage :-(
> 
> ---
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index db4604c4c502..6a496c29ef00 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3822,9 +3822,16 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
>   					struct task_struct *task)
>   {
>   	struct perf_cpu_context *cpuctx;
> -	struct pmu *pmu = ctx->pmu;
> +	struct pmu *pmu;
>   
>   	cpuctx = __get_cpu_context(ctx);
> +
> +	/*
> +	 * HACK; for HETEROGENOUS the task context might have switched to a
> +	 * different PMU, don't bother gating this.
> +	 */
> +	pmu = ctx->pmu = cpuctx->ctx.pmu;
> +

I think all the perf_sw_context PMUs share the same pmu_cpu_context,
so cpuctx->ctx.pmu should always be the first registered
perf_sw_context PMU, which is perf_swevent. The ctx->pmu could be
another software PMU.
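
That sharing is set up at PMU registration time by find_pmu_context().
Roughly, from my reading of kernel/events/core.c:

	static void *find_pmu_context(int ctxn)
	{
		struct pmu *pmu;

		if (ctxn < 0)
			return NULL;

		/* Reuse the context of the first PMU with this task_ctx_nr. */
		list_for_each_entry(pmu, &pmus, entry) {
			if (pmu->task_ctx_nr == ctxn)
				return pmu->pmu_cpu_context;
		}
		return NULL;
	}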

In theory, the perf_sw_context PMUs should have a similar issue. If
the events are from different perf_sw_context PMUs, we should
perf_pmu_disable() all of the PMUs before scheduling them, but
ctx->pmu only tracks the first one.

I don't have a good way to fix the perf_sw_context PMUs. I think we
would have to go through the event list and find all the PMUs, but I
don't think it's worth doing. A rough sketch is below.
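
Hypothetical and untested, just to show the shape of it (note it may
also disable the same PMU several times):

	struct perf_event *event;

	list_for_each_entry(event, &ctx->event_list, event_entry)
		perf_pmu_disable(event->pmu);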

Maybe we should only apply the change for the hybrid PMUs, and leave 
other PMUs as is.

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6fee4a7..df9cce6 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3821,9 +3821,19 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
 					struct task_struct *task)
 {
 	struct perf_cpu_context *cpuctx;
-	struct pmu *pmu = ctx->pmu;
+	struct pmu *pmu;
 
 	cpuctx = __get_cpu_context(ctx);
+
+	if (ctx->pmu->capabilities & PERF_PMU_CAP_HETEROGENEOUS_CPUS) {
+		/*
+		 * HACK; for HETEROGENOUS the task context might have switched to a
+		 * different PMU, don't bother gating this.
+		 */
+		pmu = ctx->pmu = cpuctx->ctx.pmu;
+	} else
+		pmu = ctx->pmu;
+
 	if (cpuctx->task_ctx == ctx) {
 		if (cpuctx->sched_cb_usage)
 			__perf_pmu_sched_task(cpuctx, true);



Thanks,
Kan


* Re: [PATCH 0/4] perf: Fix the ctx->pmu for a hybrid system
  2021-06-17 14:10     ` Liang, Kan
@ 2021-06-17 19:32       ` Peter Zijlstra
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2021-06-17 19:32 UTC (permalink / raw)
  To: Liang, Kan
  Cc: mingo, linux-kernel, acme, mark.rutland, ak, alexander.shishkin,
	namhyung, jolsa

On Thu, Jun 17, 2021 at 10:10:37AM -0400, Liang, Kan wrote:
> I think all the perf_sw_context PMUs share the same pmu_cpu_context. so the
> cpuctx->ctx.pmu should be always the first registered perf_sw_context PMU
> which is perf_swevent. The ctx->pmu could be another software PMU.

Is there actually anything that relies on that? IIRC the sw pmus only
use event->pmu->foo() methods (exactly because the ctx->pmu is
unreliable for them).

> In theory, the perf_sw_context PMUs should have a similar issue. If the
> events are from different perf_sw_context PMUs, we should perf_pmu_disable()
> all of the PMUs before schedule them, but the ctx->pmu only tracks the first
> one.
> 
> I don't have a good way to fix the perf_sw_context PMUs. I think we have to
> go through the event list and find all PMUs. But I don't think it's worth
> doing.

Yeah, the software PMUs are miserable; they're one of the things I wish
I'd done differently. Cleaning that up is *somewhere* on the TODO list.

So I *think* it should work as is and we can avoid the extra check, but
let me know what actual testing does.


* Re: [PATCH 0/4] perf: Fix the ctx->pmu for a hybrid system
  2021-06-17 11:33   ` Peter Zijlstra
  2021-06-17 14:10     ` Liang, Kan
@ 2021-06-18 13:54     ` Liang, Kan
  2021-06-24  7:09     ` [tip: perf/core] perf: Fix task context PMU for Hetero tip-bot2 for Peter Zijlstra
  2 siblings, 0 replies; 11+ messages in thread
From: Liang, Kan @ 2021-06-18 13:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, acme, mark.rutland, ak, alexander.shishkin,
	namhyung, jolsa



On 6/17/2021 7:33 AM, Peter Zijlstra wrote:
> On Thu, Jun 17, 2021 at 12:23:06PM +0200, Peter Zijlstra wrote:
>> On Wed, Jun 16, 2021 at 11:55:30AM -0700, kan.liang@linux.intel.com wrote:
>>
>>> To fix the issue, the generic perf code has to understand the
>>> supported CPU mask of a specific hybrid PMU, so that it can update
>>> ctx->pmu accordingly when a task is scheduled on a CPU with a
>>> different type of PMU from the previous CPU. The supported_cpus mask
>>> has to be moved into struct pmu.
>>
>> Urghh.. I so hate this :-/
>>
>> I *did* point you to:
>>
>>    https://lore.kernel.org/lkml/20181010104559.GO5728@hirez.programming.kicks-ass.net/
>>
>> when you started this whole hybrid crud, and I think that's still the
>> correct thing to do.
>>
>> Still, let me consider if there's a workable short-term cludge I hate
>> less.
> 
> How's this? We already have x86_pmu_update_cpu_context() setting the
> 'correct' pmu in the cpuctx, so we can simply fold that back into the
> task context.
> 
> For normal use this is a no-op.
> 
> Now I need to go audit all ctx->pmu usage :-(
> 
> ---
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index db4604c4c502..6a496c29ef00 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3822,9 +3822,16 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
>   					struct task_struct *task)
>   {
>   	struct perf_cpu_context *cpuctx;
> -	struct pmu *pmu = ctx->pmu;
> +	struct pmu *pmu;
>   
>   	cpuctx = __get_cpu_context(ctx);
> +
> +	/*
> +	 * HACK; for HETEROGENOUS the task context might have switched to a

%s/HETEROGENOUS/HETEROGENEOUS/

> +	 * different PMU, don't bother gating this.
> +	 */
> +	pmu = ctx->pmu = cpuctx->ctx.pmu;
> +
>   	if (cpuctx->task_ctx == ctx) {
>   		if (cpuctx->sched_cb_usage)
>   			__perf_pmu_sched_task(cpuctx, true);
> 

With the patch, the issue is gone.
We've also run a full test on ADL and SPR. No other issues were found
with the patch.

Tested-by: Kan Liang <kan.liang@linux.intel.com>

Thanks,
Kan


* [tip: perf/core] perf: Fix task context PMU for Hetero
  2021-06-17 11:33   ` Peter Zijlstra
  2021-06-17 14:10     ` Liang, Kan
  2021-06-18 13:54     ` Liang, Kan
@ 2021-06-24  7:09     ` tip-bot2 for Peter Zijlstra
  2 siblings, 0 replies; 11+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-06-24  7:09 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Kan Liang, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     012669c740e6e2afa8bdb95394d06676f933dd2d
Gitweb:        https://git.kernel.org/tip/012669c740e6e2afa8bdb95394d06676f933dd2d
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 22 Jun 2021 16:21:01 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 23 Jun 2021 18:30:56 +02:00

perf: Fix task context PMU for Hetero

On HETEROGENEOUS hardware (ARM big.LITTLE, Intel Alder Lake, etc.)
each CPU might have a different hardware PMU. Each such PMU is
represented by a different struct pmu, but we only have a single HW
task context.

That means that the task context needs to switch PMU type when it
switches CPUs.

Not doing this means that ctx->pmu calls (pmu_{dis,en}able(),
{start,commit,cancel}_txn() etc.) are called against the wrong PMU and
things will go wobbly.

Fixes: f83d2f91d259 ("perf/x86/intel: Add Alder Lake Hybrid support")
Reported-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lkml.kernel.org/r/YMsy7BuGT8nBTspT@hirez.programming.kicks-ass.net
---
 kernel/events/core.c |  9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6c964de..0e125ae 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3822,9 +3822,16 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
 					struct task_struct *task)
 {
 	struct perf_cpu_context *cpuctx;
-	struct pmu *pmu = ctx->pmu;
+	struct pmu *pmu;
 
 	cpuctx = __get_cpu_context(ctx);
+
+	/*
+	 * HACK: for HETEROGENEOUS the task context might have switched to a
+	 * different PMU, force (re)set the context.
+	 */
+	pmu = ctx->pmu = cpuctx->ctx.pmu;
+
 	if (cpuctx->task_ctx == ctx) {
 		if (cpuctx->sched_cb_usage)
 			__perf_pmu_sched_task(cpuctx, true);

