linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] perf/core: install cgroup events to correct cpuctx
@ 2020-01-22 19:50 Song Liu
  2020-01-24  9:15 ` Peter Zijlstra
  2020-01-29 11:32 ` [tip: perf/urgent] perf/cgroups: Install cgroup events to correct cpuctx tip-bot2 for Song Liu
  0 siblings, 2 replies; 9+ messages in thread
From: Song Liu @ 2020-01-22 19:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Song Liu, Andi Kleen, Peter Zijlstra,
	Alexander Shishkin, Arnaldo Carvalho de Melo, Jiri Olsa,
	Namhyung Kim, Thomas Gleixner

cgroup events are always installed in the cpuctx. However, when it is not
installed via IPI, list_update_cgroup_event() adds it to cpuctx of current
CPU, which triggers the following with CONFIG_DEBUG_LIST:

[   31.776974] ------------[ cut here ]------------
[   31.777570] list_add double add: new=ffff888ff7cf0db0, prev=ffff888ff7ce82f0, next=ffff888ff7cf0db0.
[   31.778737] WARNING: CPU: 3 PID: 1186 at lib/list_debug.c:31 __list_add_valid+0x67/0x70
[   31.779745] Modules linked in:
[   31.780138] CPU: 3 PID: 1186 Comm: perf Tainted: G        W         5.5.0-rc6+ #3962
[   31.781125] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
[   31.782199] RIP: 0010:__list_add_valid+0x67/0x70
[   31.782774] Code: c1 4c 89 c6 48 c7 c7 f8 cd 57 82 e8 43 a0 a4 ff 0f 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 48 ce 57 82 e8 29 a0 a4 ff <0f> 0b 31 c0 c3 0f 1f 40 00 48 b9 00 01 00 00 00 00 ad de 48 8b 07
[   31.785066] RSP: 0018:ffffc900013ffdb8 EFLAGS: 00010086
[   31.785713] RAX: 0000000000000000 RBX: ffff888d5bb4a000 RCX: 0000000000000007
[   31.786596] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff888ff7cd8870
[   31.787471] RBP: ffff888ff7db0c40 R08: 0000000000001b3b R09: 0000000000000067
[   31.788352] R10: ffff888ff7db0c90 R11: ffffc900013ffc65 R12: ffff888ff7cf0c40
[   31.789229] R13: ffff888ff7ce82f0 R14: ffff888ff7cf0db0 R15: ffff888ff7cf0db0
[   31.790115] FS:  00007f14cd1557c0(0000) GS:ffff888ff7cc0000(0000) knlGS:0000000000000000
[   31.791111] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   31.791824] CR2: 00007f675e3ace60 CR3: 0000000d55e76006 CR4: 00000000003606e0
[   31.792703] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   31.793583] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   31.794461] Call Trace:
[   31.794776]  list_add_event+0xe5/0x230
[   31.795247]  perf_install_in_context+0x155/0x1f0
[   31.795819]  ? anon_inode_getfile+0x7f/0xd0
[   31.796342]  __do_sys_perf_event_open+0x323/0xd60
[   31.796921]  do_syscall_64+0x55/0x1c0
[   31.797384]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   31.798008] RIP: 0033:0x7f14ca4a33e9
[   31.798460] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6f 9a 2c 00 f7 d8 64 89 01 48
[   31.800729] RSP: 002b:00007ffeaf16e5a8 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
[   31.801655] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f14ca4a33e9
[   31.802536] RDX: 0000000000000006 RSI: 0000000000000003 RDI: 0000000001053fd8
[   31.803412] RBP: 00007ffeaf16e670 R08: 000000000000000c R09: 000000000000000c
[   31.804286] R10: 00000000ffffffff R11: 0000000000000246 R12: 00000000ffffffff
[   31.805160] R13: 0000000000000006 R14: 0000000001053af0 R15: 0000000001053fc0
[   31.806031] ---[ end trace ef48f280582d1897 ]---

To reproduce this, we can simply run:
  perf stat -e cs -a &
  perf stat -e cs -G anycgroup

Fix this by installing it to cpuctx that contains event->ctx, and the
proper cgrp_cpuctx_list.

Fixes: db0503e4f675 ("perf/core: Optimize perf_install_in_event()")
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Song Liu <songliubraving@fb.com>
---
 kernel/events/core.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index d25f2de45996..2248a6090a5f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -951,9 +951,9 @@ list_update_cgroup_event(struct perf_event *event,
 
 	/*
 	 * Because cgroup events are always per-cpu events,
-	 * this will always be called from the right CPU.
+	 * @ctx == &cpuctx->ctx.
 	 */
-	cpuctx = __get_cpu_context(ctx);
+	cpuctx = container_of(ctx, struct perf_cpu_context, ctx);
 
 	/*
 	 * Since setting cpuctx->cgrp is conditional on the current @cgrp
@@ -979,7 +979,8 @@ list_update_cgroup_event(struct perf_event *event,
 
 	cpuctx_entry = &cpuctx->cgrp_cpuctx_entry;
 	if (add)
-		list_add(cpuctx_entry, this_cpu_ptr(&cgrp_cpuctx_list));
+		list_add(cpuctx_entry,
+			 per_cpu_ptr(&cgrp_cpuctx_list, event->cpu));
 	else
 		list_del(cpuctx_entry);
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] perf/core: install cgroup events to correct cpuctx
  2020-01-22 19:50 [PATCH v2] perf/core: install cgroup events to correct cpuctx Song Liu
@ 2020-01-24  9:15 ` Peter Zijlstra
  2020-03-06  7:48   ` Song Liu
  2020-01-29 11:32 ` [tip: perf/urgent] perf/cgroups: Install cgroup events to correct cpuctx tip-bot2 for Song Liu
  1 sibling, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2020-01-24  9:15 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-kernel, kernel-team, Andi Kleen, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim,
	Thomas Gleixner

On Wed, Jan 22, 2020 at 11:50:27AM -0800, Song Liu wrote:
> cgroup events are always installed in the cpuctx. However, when it is not
> installed via IPI, list_update_cgroup_event() adds it to cpuctx of current
> CPU, which triggers the following with CONFIG_DEBUG_LIST:
> 

> [   31.777570] list_add double add: new=ffff888ff7cf0db0, prev=ffff888ff7ce82f0, next=ffff888ff7cf0db0.

> To reproduce this, we can simply run:
>   perf stat -e cs -a &
>   perf stat -e cs -G anycgroup
> 
> Fix this by installing it to cpuctx that contains event->ctx, and the
> proper cgrp_cpuctx_list.
> 
> Fixes: db0503e4f675 ("perf/core: Optimize perf_install_in_event()")
> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Andi Kleen <andi@firstfloor.org>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Song Liu <songliubraving@fb.com>

Thanks!

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [tip: perf/urgent] perf/cgroups: Install cgroup events to correct cpuctx
  2020-01-22 19:50 [PATCH v2] perf/core: install cgroup events to correct cpuctx Song Liu
  2020-01-24  9:15 ` Peter Zijlstra
@ 2020-01-29 11:32 ` tip-bot2 for Song Liu
  1 sibling, 0 replies; 9+ messages in thread
From: tip-bot2 for Song Liu @ 2020-01-29 11:32 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Song Liu, Ingo Molnar, stable, x86, LKML

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     07c5972951f088094776038006a0592a46d14bbc
Gitweb:        https://git.kernel.org/tip/07c5972951f088094776038006a0592a46d14bbc
Author:        Song Liu <songliubraving@fb.com>
AuthorDate:    Wed, 22 Jan 2020 11:50:27 -08:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 28 Jan 2020 21:20:19 +01:00

perf/cgroups: Install cgroup events to correct cpuctx

cgroup events are always installed in the cpuctx. However, when it is not
installed via IPI, list_update_cgroup_event() adds it to cpuctx of current
CPU, which triggers list corruption:

  [] list_add double add: new=ffff888ff7cf0db0, prev=ffff888ff7ce82f0, next=ffff888ff7cf0db0.

To reproduce this, we can simply run:

  # perf stat -e cs -a &
  # perf stat -e cs -G anycgroup

Fix this by installing it to cpuctx that contains event->ctx, and the
proper cgrp_cpuctx_list.

Fixes: db0503e4f675 ("perf/core: Optimize perf_install_in_event()")
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200122195027.2112449-1-songliubraving@fb.com
---
 kernel/events/core.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2d9aeba..fdb7f7e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -951,9 +951,9 @@ list_update_cgroup_event(struct perf_event *event,
 
 	/*
 	 * Because cgroup events are always per-cpu events,
-	 * this will always be called from the right CPU.
+	 * @ctx == &cpuctx->ctx.
 	 */
-	cpuctx = __get_cpu_context(ctx);
+	cpuctx = container_of(ctx, struct perf_cpu_context, ctx);
 
 	/*
 	 * Since setting cpuctx->cgrp is conditional on the current @cgrp
@@ -979,7 +979,8 @@ list_update_cgroup_event(struct perf_event *event,
 
 	cpuctx_entry = &cpuctx->cgrp_cpuctx_entry;
 	if (add)
-		list_add(cpuctx_entry, this_cpu_ptr(&cgrp_cpuctx_list));
+		list_add(cpuctx_entry,
+			 per_cpu_ptr(&cgrp_cpuctx_list, event->cpu));
 	else
 		list_del(cpuctx_entry);
 }

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] perf/core: install cgroup events to correct cpuctx
  2020-01-24  9:15 ` Peter Zijlstra
@ 2020-03-06  7:48   ` Song Liu
  2020-03-18  7:07     ` Song Liu
  0 siblings, 1 reply; 9+ messages in thread
From: Song Liu @ 2020-03-06  7:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Kernel Team, Andi Kleen, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim,
	Thomas Gleixner



> On Jan 24, 2020, at 1:15 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Wed, Jan 22, 2020 at 11:50:27AM -0800, Song Liu wrote:
>> cgroup events are always installed in the cpuctx. However, when it is not
>> installed via IPI, list_update_cgroup_event() adds it to cpuctx of current
>> CPU, which triggers the following with CONFIG_DEBUG_LIST:
>> 
> 
>> [   31.777570] list_add double add: new=ffff888ff7cf0db0, prev=ffff888ff7ce82f0, next=ffff888ff7cf0db0.
> 
>> To reproduce this, we can simply run:
>>  perf stat -e cs -a &
>>  perf stat -e cs -G anycgroup
>> 
>> Fix this by installing it to cpuctx that contains event->ctx, and the
>> proper cgrp_cpuctx_list.
>> 
>> Fixes: db0503e4f675 ("perf/core: Optimize perf_install_in_event()")
>> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>> Cc: Andi Kleen <andi@firstfloor.org>
>> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
>> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
>> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
>> Cc: Jiri Olsa <jolsa@redhat.com>
>> Cc: Namhyung Kim <namhyung@kernel.org>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Signed-off-by: Song Liu <songliubraving@fb.com>
> 
> Thanks!

I just realized this won't fully fix the problem, because later in 
list_update_cgroup_event() we use "current":

	struct perf_cgroup *cgrp = perf_cgroup_from_task(current, ctx);

I don't have a good idea to fix this cleanly. How about we just use IPI 
to install cgroup events (like v1):

diff --git a/kernel/events/core.c b/kernel/events/core.c
index a1f8bde19b56..36e8fe27e2a1 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2682,14 +2682,18 @@ perf_install_in_context(struct perf_event_context *ctx,
	smp_store_release(&event->ctx, ctx);

	/*
-	 * perf_event_attr::disabled events will not run and can be initialized
-	 * without IPI. Except when this is the first event for the context, in
-	 * that case we need the magic of the IPI to set ctx->is_active.
+	 * perf_event_attr::disabled events will not run and can be
+	 * initialized without IPI. Except:
+	 *   1. when this is the first event for the context, in that case
+	 *      we need the magic of the IPI to set ctx->is_active;
+	 *   2. cgroup event in OFF state, because it is installed in the
+	 *      cpuctx.
	 *
	 * The IOC_ENABLE that is sure to follow the creation of a disabled
	 * event will issue the IPI and reprogram the hardware.
	 */
-	if (__perf_effective_state(event) == PERF_EVENT_STATE_OFF && ctx->nr_events) {
+	if (__perf_effective_state(event) == PERF_EVENT_STATE_OFF &&
+	    !is_cgroup_event(event) && ctx->nr_events) {
		raw_spin_lock_irq(&ctx->lock);
		if (ctx->task == TASK_TOMBSTONE) {
			raw_spin_unlock_irq(&ctx->lock);

Thanks,
Song




^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] perf/core: install cgroup events to correct cpuctx
  2020-03-06  7:48   ` Song Liu
@ 2020-03-18  7:07     ` Song Liu
  2020-03-18 18:05       ` Peter Zijlstra
  0 siblings, 1 reply; 9+ messages in thread
From: Song Liu @ 2020-03-18  7:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Kernel Team, Andi Kleen, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim,
	Thomas Gleixner

Hi Peter, 

> On Mar 5, 2020, at 11:48 PM, Song Liu <songliubraving@fb.com> wrote:
> 
> 
> 
>> On Jan 24, 2020, at 1:15 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> 
>> On Wed, Jan 22, 2020 at 11:50:27AM -0800, Song Liu wrote:
>>> cgroup events are always installed in the cpuctx. However, when it is not
>>> installed via IPI, list_update_cgroup_event() adds it to cpuctx of current
>>> CPU, which triggers the following with CONFIG_DEBUG_LIST:
>>> 
>> 
>>> [   31.777570] list_add double add: new=ffff888ff7cf0db0, prev=ffff888ff7ce82f0, next=ffff888ff7cf0db0.
>> 
>>> To reproduce this, we can simply run:
>>> perf stat -e cs -a &
>>> perf stat -e cs -G anycgroup
>>> 
>>> Fix this by installing it to cpuctx that contains event->ctx, and the
>>> proper cgrp_cpuctx_list.
>>> 
>>> Fixes: db0503e4f675 ("perf/core: Optimize perf_install_in_event()")
>>> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>>> Cc: Andi Kleen <andi@firstfloor.org>
>>> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
>>> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
>>> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
>>> Cc: Jiri Olsa <jolsa@redhat.com>
>>> Cc: Namhyung Kim <namhyung@kernel.org>
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Signed-off-by: Song Liu <songliubraving@fb.com>
>> 
>> Thanks!
> 
> I just realized this won't fully fix the problem, because later in 
> list_update_cgroup_event() we use "current":
> 
> 	struct perf_cgroup *cgrp = perf_cgroup_from_task(current, ctx);

Could you please share your thoughts on this? I think we cannot use current
in list_update_cgroup_event(), unless we call it on the target CPU. 

Thanks,
Song


> 
> I don't have a good idea to fix this cleanly. How about we just use IPI 
> to install cgroup events (like v1):
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index a1f8bde19b56..36e8fe27e2a1 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2682,14 +2682,18 @@ perf_install_in_context(struct perf_event_context *ctx,
> 	smp_store_release(&event->ctx, ctx);
> 
> 	/*
> -	 * perf_event_attr::disabled events will not run and can be initialized
> -	 * without IPI. Except when this is the first event for the context, in
> -	 * that case we need the magic of the IPI to set ctx->is_active.
> +	 * perf_event_attr::disabled events will not run and can be
> +	 * initialized without IPI. Except:
> +	 *   1. when this is the first event for the context, in that case
> +	 *      we need the magic of the IPI to set ctx->is_active;
> +	 *   2. cgroup event in OFF state, because it is installed in the
> +	 *      cpuctx.
> 	 *
> 	 * The IOC_ENABLE that is sure to follow the creation of a disabled
> 	 * event will issue the IPI and reprogram the hardware.
> 	 */
> -	if (__perf_effective_state(event) == PERF_EVENT_STATE_OFF && ctx->nr_events) {
> +	if (__perf_effective_state(event) == PERF_EVENT_STATE_OFF &&
> +	    !is_cgroup_event(event) && ctx->nr_events) {
> 		raw_spin_lock_irq(&ctx->lock);
> 		if (ctx->task == TASK_TOMBSTONE) {
> 			raw_spin_unlock_irq(&ctx->lock);
> 
> Thanks,
> Song
> 
> 
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] perf/core: install cgroup events to correct cpuctx
  2020-03-18  7:07     ` Song Liu
@ 2020-03-18 18:05       ` Peter Zijlstra
  2020-03-18 19:33         ` Peter Zijlstra
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2020-03-18 18:05 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-kernel, Kernel Team, Andi Kleen, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim,
	Thomas Gleixner

On Wed, Mar 18, 2020 at 07:07:29AM +0000, Song Liu wrote:
> Hi Peter, 
> 
> > On Mar 5, 2020, at 11:48 PM, Song Liu <songliubraving@fb.com> wrote:
> > 
> > 
> > 
> >> On Jan 24, 2020, at 1:15 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >> 
> >> On Wed, Jan 22, 2020 at 11:50:27AM -0800, Song Liu wrote:
> >>> cgroup events are always installed in the cpuctx. However, when it is not
> >>> installed via IPI, list_update_cgroup_event() adds it to cpuctx of current
> >>> CPU, which triggers the following with CONFIG_DEBUG_LIST:
> >>> 
> >> 
> >>> [   31.777570] list_add double add: new=ffff888ff7cf0db0, prev=ffff888ff7ce82f0, next=ffff888ff7cf0db0.
> >> 
> >>> To reproduce this, we can simply run:
> >>> perf stat -e cs -a &
> >>> perf stat -e cs -G anycgroup
> >>> 
> >>> Fix this by installing it to cpuctx that contains event->ctx, and the
> >>> proper cgrp_cpuctx_list.
> >>> 
> >>> Fixes: db0503e4f675 ("perf/core: Optimize perf_install_in_event()")
> >>> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> >>> Cc: Andi Kleen <andi@firstfloor.org>
> >>> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> >>> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> >>> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> >>> Cc: Jiri Olsa <jolsa@redhat.com>
> >>> Cc: Namhyung Kim <namhyung@kernel.org>
> >>> Cc: Thomas Gleixner <tglx@linutronix.de>
> >>> Signed-off-by: Song Liu <songliubraving@fb.com>
> >> 
> >> Thanks!
> > 
> > I just realized this won't fully fix the problem, because later in 
> > list_update_cgroup_event() we use "current":
> > 
> > 	struct perf_cgroup *cgrp = perf_cgroup_from_task(current, ctx);
> 
> Could you please share your thoughts on this? I think we cannot use current
> in list_update_cgroup_event(), unless we call it on the target CPU. 

Bah, that cgroup crap is 'wrong'. It's pointless to track the
cpuctx->cgrp state for disabled events.

Let me see if I can unravel that crud.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] perf/core: install cgroup events to correct cpuctx
  2020-03-18 18:05       ` Peter Zijlstra
@ 2020-03-18 19:33         ` Peter Zijlstra
  2020-03-18 22:37           ` Song Liu
  2020-04-08 12:20           ` [tip: perf/urgent] perf/core: Fix event cgroup tracking tip-bot2 for Peter Zijlstra
  0 siblings, 2 replies; 9+ messages in thread
From: Peter Zijlstra @ 2020-03-18 19:33 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-kernel, Kernel Team, Andi Kleen, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim,
	Thomas Gleixner

On Wed, Mar 18, 2020 at 07:05:35PM +0100, Peter Zijlstra wrote:
> On Wed, Mar 18, 2020 at 07:07:29AM +0000, Song Liu wrote:

> > Could you please share your thoughts on this? I think we cannot use current
> > in list_update_cgroup_event(), unless we call it on the target CPU. 
> 
> Bah, that cgroup crap is 'wrong'. It's pointless to track the
> cpuctx->cgrp state for disabled events.
> 
> Let me see if I can unravel that crud.

This compiles, but I've no clue how to operate cgroups. Please test.

---
 kernel/events/core.c | 70 ++++++++++++++++++++++++++++++++--------------------
 1 file changed, 43 insertions(+), 27 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index ccf8d4fc6374..9f0713292cba 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -981,16 +981,10 @@ perf_cgroup_set_shadow_time(struct perf_event *event, u64 now)
 	event->shadow_ctx_time = now - t->timestamp;
 }
 
-/*
- * Update cpuctx->cgrp so that it is set when first cgroup event is added and
- * cleared when last cgroup event is removed.
- */
 static inline void
-list_update_cgroup_event(struct perf_event *event,
-			 struct perf_event_context *ctx, bool add)
+perf_cgroup_event_enable(struct perf_event *event, struct perf_event_context *ctx)
 {
 	struct perf_cpu_context *cpuctx;
-	struct list_head *cpuctx_entry;
 
 	if (!is_cgroup_event(event))
 		return;
@@ -1007,28 +1001,41 @@ list_update_cgroup_event(struct perf_event *event,
 	 * because if the first would mismatch, the second would not try again
 	 * and we would leave cpuctx->cgrp unset.
 	 */
-	if (add && !cpuctx->cgrp) {
+	if (ctx->is_active && !cpuctx->cgrp) {
 		struct perf_cgroup *cgrp = perf_cgroup_from_task(current, ctx);
 
 		if (cgroup_is_descendant(cgrp->css.cgroup, event->cgrp->css.cgroup))
 			cpuctx->cgrp = cgrp;
 	}
 
-	if (add && ctx->nr_cgroups++)
+	if (ctx->nr_cgroups++)
 		return;
-	else if (!add && --ctx->nr_cgroups)
+
+	list_add(&cpuctx->cgrp_cpuctx_entry,
+			per_cpu_ptr(&cgrp_cpuctx_list, event->cpu));
+}
+
+static inline void
+perf_cgroup_event_disable(struct perf_event *event, struct perf_event_context *ctx)
+{
+	struct perf_cpu_context *cpuctx;
+
+	if (!is_cgroup_event(event))
 		return;
 
-	/* no cgroup running */
-	if (!add)
+	/*
+	 * Because cgroup events are always per-cpu events,
+	 * @ctx == &cpuctx->ctx.
+	 */
+	cpuctx = container_of(ctx, struct perf_cpu_context, ctx);
+
+	if (--ctx->nr_cgroups)
+		return;
+
+	if (ctx->is_active && cpuctx->cgrp)
 		cpuctx->cgrp = NULL;
 
-	cpuctx_entry = &cpuctx->cgrp_cpuctx_entry;
-	if (add)
-		list_add(cpuctx_entry,
-			 per_cpu_ptr(&cgrp_cpuctx_list, event->cpu));
-	else
-		list_del(cpuctx_entry);
+	list_del(&cpuctx->cgrp_cpuctx_entry);
 }
 
 #else /* !CONFIG_CGROUP_PERF */
@@ -1094,11 +1101,14 @@ static inline u64 perf_cgroup_event_time(struct perf_event *event)
 }
 
 static inline void
-list_update_cgroup_event(struct perf_event *event,
-			 struct perf_event_context *ctx, bool add)
+perf_cgroup_event_enable(struct perf_event *event, struct perf_event_context *ctx)
 {
 }
 
+static inline void
+perf_cgroup_event_disable(struct perf_event *event, struct perf_event_context *ctx)
+{
+}
 #endif
 
 /*
@@ -1789,13 +1799,14 @@ list_add_event(struct perf_event *event, struct perf_event_context *ctx)
 		add_event_to_groups(event, ctx);
 	}
 
-	list_update_cgroup_event(event, ctx, true);
-
 	list_add_rcu(&event->event_entry, &ctx->event_list);
 	ctx->nr_events++;
 	if (event->attr.inherit_stat)
 		ctx->nr_stat++;
 
+	if (event->state > PERF_EVENT_STATE_OFF)
+		perf_cgroup_event_enable(event, ctx);
+
 	ctx->generation++;
 }
 
@@ -1971,8 +1982,6 @@ list_del_event(struct perf_event *event, struct perf_event_context *ctx)
 
 	event->attach_state &= ~PERF_ATTACH_CONTEXT;
 
-	list_update_cgroup_event(event, ctx, false);
-
 	ctx->nr_events--;
 	if (event->attr.inherit_stat)
 		ctx->nr_stat--;
@@ -1989,8 +1998,10 @@ list_del_event(struct perf_event *event, struct perf_event_context *ctx)
 	 * of error state is by explicit re-enabling
 	 * of the event
 	 */
-	if (event->state > PERF_EVENT_STATE_OFF)
+	if (event->state > PERF_EVENT_STATE_OFF) {
+		perf_cgroup_event_disable(event, ctx);
 		perf_event_set_state(event, PERF_EVENT_STATE_OFF);
+	}
 
 	ctx->generation++;
 }
@@ -2221,6 +2232,7 @@ event_sched_out(struct perf_event *event,
 
 	if (READ_ONCE(event->pending_disable) >= 0) {
 		WRITE_ONCE(event->pending_disable, -1);
+		perf_cgroup_event_disable(event, ctx);
 		state = PERF_EVENT_STATE_OFF;
 	}
 	perf_event_set_state(event, state);
@@ -2357,6 +2369,7 @@ static void __perf_event_disable(struct perf_event *event,
 		event_sched_out(event, cpuctx, ctx);
 
 	perf_event_set_state(event, PERF_EVENT_STATE_OFF);
+	perf_cgroup_event_disable(event, ctx);
 }
 
 /*
@@ -2740,7 +2753,7 @@ static int  __perf_install_in_context(void *info)
 	}
 
 #ifdef CONFIG_CGROUP_PERF
-	if (is_cgroup_event(event)) {
+	if (event->state > PERF_EVENT_STATE_OFF && is_cgroup_event(event)) {
 		/*
 		 * If the current cgroup doesn't match the event's
 		 * cgroup, we should not try to schedule it.
@@ -2900,6 +2913,7 @@ static void __perf_event_enable(struct perf_event *event,
 		ctx_sched_out(ctx, cpuctx, EVENT_TIME);
 
 	perf_event_set_state(event, PERF_EVENT_STATE_INACTIVE);
+	perf_cgroup_event_enable(event, ctx);
 
 	if (!ctx->is_active)
 		return;
@@ -3609,8 +3623,10 @@ static int merge_sched_in(struct perf_event *event, void *data)
 	}
 
 	if (event->state == PERF_EVENT_STATE_INACTIVE) {
-		if (event->attr.pinned)
+		if (event->attr.pinned) {
+			perf_cgroup_event_disable(event, ctx);
 			perf_event_set_state(event, PERF_EVENT_STATE_ERROR);
+		}
 
 		*can_add_hw = 0;
 		ctx->rotate_necessary = 1;

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] perf/core: install cgroup events to correct cpuctx
  2020-03-18 19:33         ` Peter Zijlstra
@ 2020-03-18 22:37           ` Song Liu
  2020-04-08 12:20           ` [tip: perf/urgent] perf/core: Fix event cgroup tracking tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 9+ messages in thread
From: Song Liu @ 2020-03-18 22:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Kernel Team, Andi Kleen, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim,
	Thomas Gleixner



> On Mar 18, 2020, at 12:33 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Wed, Mar 18, 2020 at 07:05:35PM +0100, Peter Zijlstra wrote:
>> On Wed, Mar 18, 2020 at 07:07:29AM +0000, Song Liu wrote:
> 
>>> Could you please share your thoughts on this? I think we cannot use current
>>> in list_update_cgroup_event(), unless we call it on the target CPU. 
>> 
>> Bah, that cgroup crap is 'wrong'. It's pointless to track the
>> cpuctx->cgrp state for disabled events.
>> 
>> Let me see if I can unravel that crud.
> 
> This compiles, but I've no clue how to operate cgroups. Please test.

Thanks Peter! This fixes the issue I saw. And the code looks good too. 

Reviewed-and-tested-by: Song Liu <songliubraving@fb.com>

[...]


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [tip: perf/urgent] perf/core: Fix event cgroup tracking
  2020-03-18 19:33         ` Peter Zijlstra
  2020-03-18 22:37           ` Song Liu
@ 2020-04-08 12:20           ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 9+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2020-04-08 12:20 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: stable, Song Liu, Peter Zijlstra (Intel), Ingo Molnar, x86, LKML

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     33238c50451596be86db1505ab65fee5172844d0
Gitweb:        https://git.kernel.org/tip/33238c50451596be86db1505ab65fee5172844d0
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Wed, 18 Mar 2020 20:33:37 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 08 Apr 2020 11:33:44 +02:00

perf/core: Fix event cgroup tracking

Song reports that installing cgroup events is broken since:

  db0503e4f675 ("perf/core: Optimize perf_install_in_event()")

The problem being that cgroup events try to track cpuctx->cgrp even
for disabled events, which is pointless and actively harmful since the
above commit. Rework the code to have explicit enable/disable hooks
for cgroup events, such that we can limit cgroup tracking to active
events.

More specifically, since the above commit disabled events are no
longer added to their context from the 'right' CPU, and we can't
access things like the current cgroup for a remote CPU.

Cc: <stable@vger.kernel.org> # v5.5+
Fixes: db0503e4f675 ("perf/core: Optimize perf_install_in_event()")
Reported-by: Song Liu <songliubraving@fb.com>
Tested-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200318193337.GB20760@hirez.programming.kicks-ass.net
---
 kernel/events/core.c | 70 ++++++++++++++++++++++++++-----------------
 1 file changed, 43 insertions(+), 27 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 55e4441..7afd0b5 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -983,16 +983,10 @@ perf_cgroup_set_shadow_time(struct perf_event *event, u64 now)
 	event->shadow_ctx_time = now - t->timestamp;
 }
 
-/*
- * Update cpuctx->cgrp so that it is set when first cgroup event is added and
- * cleared when last cgroup event is removed.
- */
 static inline void
-list_update_cgroup_event(struct perf_event *event,
-			 struct perf_event_context *ctx, bool add)
+perf_cgroup_event_enable(struct perf_event *event, struct perf_event_context *ctx)
 {
 	struct perf_cpu_context *cpuctx;
-	struct list_head *cpuctx_entry;
 
 	if (!is_cgroup_event(event))
 		return;
@@ -1009,28 +1003,41 @@ list_update_cgroup_event(struct perf_event *event,
 	 * because if the first would mismatch, the second would not try again
 	 * and we would leave cpuctx->cgrp unset.
 	 */
-	if (add && !cpuctx->cgrp) {
+	if (ctx->is_active && !cpuctx->cgrp) {
 		struct perf_cgroup *cgrp = perf_cgroup_from_task(current, ctx);
 
 		if (cgroup_is_descendant(cgrp->css.cgroup, event->cgrp->css.cgroup))
 			cpuctx->cgrp = cgrp;
 	}
 
-	if (add && ctx->nr_cgroups++)
+	if (ctx->nr_cgroups++)
 		return;
-	else if (!add && --ctx->nr_cgroups)
+
+	list_add(&cpuctx->cgrp_cpuctx_entry,
+			per_cpu_ptr(&cgrp_cpuctx_list, event->cpu));
+}
+
+static inline void
+perf_cgroup_event_disable(struct perf_event *event, struct perf_event_context *ctx)
+{
+	struct perf_cpu_context *cpuctx;
+
+	if (!is_cgroup_event(event))
 		return;
 
-	/* no cgroup running */
-	if (!add)
+	/*
+	 * Because cgroup events are always per-cpu events,
+	 * @ctx == &cpuctx->ctx.
+	 */
+	cpuctx = container_of(ctx, struct perf_cpu_context, ctx);
+
+	if (--ctx->nr_cgroups)
+		return;
+
+	if (ctx->is_active && cpuctx->cgrp)
 		cpuctx->cgrp = NULL;
 
-	cpuctx_entry = &cpuctx->cgrp_cpuctx_entry;
-	if (add)
-		list_add(cpuctx_entry,
-			 per_cpu_ptr(&cgrp_cpuctx_list, event->cpu));
-	else
-		list_del(cpuctx_entry);
+	list_del(&cpuctx->cgrp_cpuctx_entry);
 }
 
 #else /* !CONFIG_CGROUP_PERF */
@@ -1096,11 +1103,14 @@ static inline u64 perf_cgroup_event_time(struct perf_event *event)
 }
 
 static inline void
-list_update_cgroup_event(struct perf_event *event,
-			 struct perf_event_context *ctx, bool add)
+perf_cgroup_event_enable(struct perf_event *event, struct perf_event_context *ctx)
 {
 }
 
+static inline void
+perf_cgroup_event_disable(struct perf_event *event, struct perf_event_context *ctx)
+{
+}
 #endif
 
 /*
@@ -1791,13 +1801,14 @@ list_add_event(struct perf_event *event, struct perf_event_context *ctx)
 		add_event_to_groups(event, ctx);
 	}
 
-	list_update_cgroup_event(event, ctx, true);
-
 	list_add_rcu(&event->event_entry, &ctx->event_list);
 	ctx->nr_events++;
 	if (event->attr.inherit_stat)
 		ctx->nr_stat++;
 
+	if (event->state > PERF_EVENT_STATE_OFF)
+		perf_cgroup_event_enable(event, ctx);
+
 	ctx->generation++;
 }
 
@@ -1976,8 +1987,6 @@ list_del_event(struct perf_event *event, struct perf_event_context *ctx)
 
 	event->attach_state &= ~PERF_ATTACH_CONTEXT;
 
-	list_update_cgroup_event(event, ctx, false);
-
 	ctx->nr_events--;
 	if (event->attr.inherit_stat)
 		ctx->nr_stat--;
@@ -1994,8 +2003,10 @@ list_del_event(struct perf_event *event, struct perf_event_context *ctx)
 	 * of error state is by explicit re-enabling
 	 * of the event
 	 */
-	if (event->state > PERF_EVENT_STATE_OFF)
+	if (event->state > PERF_EVENT_STATE_OFF) {
+		perf_cgroup_event_disable(event, ctx);
 		perf_event_set_state(event, PERF_EVENT_STATE_OFF);
+	}
 
 	ctx->generation++;
 }
@@ -2226,6 +2237,7 @@ event_sched_out(struct perf_event *event,
 
 	if (READ_ONCE(event->pending_disable) >= 0) {
 		WRITE_ONCE(event->pending_disable, -1);
+		perf_cgroup_event_disable(event, ctx);
 		state = PERF_EVENT_STATE_OFF;
 	}
 	perf_event_set_state(event, state);
@@ -2363,6 +2375,7 @@ static void __perf_event_disable(struct perf_event *event,
 		event_sched_out(event, cpuctx, ctx);
 
 	perf_event_set_state(event, PERF_EVENT_STATE_OFF);
+	perf_cgroup_event_disable(event, ctx);
 }
 
 /*
@@ -2746,7 +2759,7 @@ static int  __perf_install_in_context(void *info)
 	}
 
 #ifdef CONFIG_CGROUP_PERF
-	if (is_cgroup_event(event)) {
+	if (event->state > PERF_EVENT_STATE_OFF && is_cgroup_event(event)) {
 		/*
 		 * If the current cgroup doesn't match the event's
 		 * cgroup, we should not try to schedule it.
@@ -2906,6 +2919,7 @@ static void __perf_event_enable(struct perf_event *event,
 		ctx_sched_out(ctx, cpuctx, EVENT_TIME);
 
 	perf_event_set_state(event, PERF_EVENT_STATE_INACTIVE);
+	perf_cgroup_event_enable(event, ctx);
 
 	if (!ctx->is_active)
 		return;
@@ -3616,8 +3630,10 @@ static int merge_sched_in(struct perf_event *event, void *data)
 	}
 
 	if (event->state == PERF_EVENT_STATE_INACTIVE) {
-		if (event->attr.pinned)
+		if (event->attr.pinned) {
+			perf_cgroup_event_disable(event, ctx);
 			perf_event_set_state(event, PERF_EVENT_STATE_ERROR);
+		}
 
 		*can_add_hw = 0;
 		ctx->rotate_necessary = 1;

^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-04-08 12:20 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-22 19:50 [PATCH v2] perf/core: install cgroup events to correct cpuctx Song Liu
2020-01-24  9:15 ` Peter Zijlstra
2020-03-06  7:48   ` Song Liu
2020-03-18  7:07     ` Song Liu
2020-03-18 18:05       ` Peter Zijlstra
2020-03-18 19:33         ` Peter Zijlstra
2020-03-18 22:37           ` Song Liu
2020-04-08 12:20           ` [tip: perf/urgent] perf/core: Fix event cgroup tracking tip-bot2 for Peter Zijlstra
2020-01-29 11:32 ` [tip: perf/urgent] perf/cgroups: Install cgroup events to correct cpuctx tip-bot2 for Song Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).