* perf: bisected sampling bug in Linux 4.11-rc1
@ 2017-07-14 18:05 Vince Weaver
2017-07-14 18:25 ` Vince Weaver
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Vince Weaver @ 2017-07-14 18:05 UTC (permalink / raw)
To: linux-kernel
Cc: Alexander Shishkin, Ingo Molnar, Peter Zijlstra, Stephane Eranian
I was tracking down some regressions in my perf_event_test testsuite.
Some of the tests broke in the 4.11-rc1 timeframe.
I've bisected one of them, this report is about
tests/overflow/simul_oneshot_group_overflow
This test creates an event group containing two sampling events, set
to overflow to a signal handler (which disables and then refreshes the
event).
On a good kernel you get the following:
Event perf::instructions with period 1000000
Event perf::instructions with period 2000000
fd 3 overflows: 946 (perf::instructions/1000000)
fd 4 overflows: 473 (perf::instructions/2000000)
Ending counts:
Count 0: 946379875
Count 1: 946365218
With the broken kernels you get:
Event perf::instructions with period 1000000
Event perf::instructions with period 2000000
fd 3 overflows: 938 (perf::instructions/1000000)
fd 4 overflows: 318 (perf::instructions/2000000)
Ending counts:
Count 0: 946373080
Count 1: 653373058
487f05e18aa4efacee6357480f293a5afe6593b5 is the first bad commit
commit 487f05e18aa4efacee6357480f293a5afe6593b5
Author: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Date: Thu Jan 19 18:43:30 2017 +0200
perf/core: Optimize event rescheduling on active contexts
When new events are added to an active context, we go and reschedule
all cpu groups and all task groups in order to preserve the priority
(cpu pinned, task pinned, cpu flexible, task flexible), but in
reality we only need to reschedule groups of the same priority as
that of the events being added, and below.
This patch changes the behavior so that only groups that need to be
rescheduled are rescheduled.
Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20170119164330.22887-3-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: perf: bisected sampling bug in Linux 4.11-rc1
2017-07-14 18:05 perf: bisected sampling bug in Linux 4.11-rc1 Vince Weaver
@ 2017-07-14 18:25 ` Vince Weaver
2017-07-14 20:07 ` Alexander Shishkin
` (2 subsequent siblings)
3 siblings, 0 replies; 12+ messages in thread
From: Vince Weaver @ 2017-07-14 18:25 UTC (permalink / raw)
To: linux-kernel
Cc: Alexander Shishkin, Ingo Molnar, Peter Zijlstra, Stephane Eranian
On Fri, 14 Jul 2017, Vince Weaver wrote:
> On a good kernel you get the following:
> Event perf::instructions with period 1000000
> Event perf::instructions with period 2000000
> fd 3 overflows: 946 (perf::instructions/1000000)
> fd 4 overflows: 473 (perf::instructions/2000000)
> Ending counts:
> Count 0: 946379875
> Count 1: 946365218
>
> With the broken kernels you get:
> Event perf::instructions with period 1000000
> Event perf::instructions with period 2000000
> fd 3 overflows: 938 (perf::instructions/1000000)
> fd 4 overflows: 318 (perf::instructions/2000000)
> Ending counts:
> Count 0: 946373080
> Count 1: 653373058
additional relevant detail:
in the failing case, the group leader of the event set has
.pinned=1
If I change that to .pinned=0 then the test passes.
Vince
* Re: perf: bisected sampling bug in Linux 4.11-rc1
2017-07-14 18:05 perf: bisected sampling bug in Linux 4.11-rc1 Vince Weaver
2017-07-14 18:25 ` Vince Weaver
@ 2017-07-14 20:07 ` Alexander Shishkin
2017-07-14 20:14 ` Vince Weaver
2017-07-15 11:00 ` Ingo Molnar
2017-07-18 11:08 ` perf: bisected sampling bug in Linux 4.11-rc1 Alexander Shishkin
3 siblings, 1 reply; 12+ messages in thread
From: Alexander Shishkin @ 2017-07-14 20:07 UTC (permalink / raw)
To: Vince Weaver, linux-kernel; +Cc: Ingo Molnar, Peter Zijlstra, Stephane Eranian
Vince Weaver <vincent.weaver@maine.edu> writes:
> I was tracking down some regressions in my perf_event_test testsuite.
> Some of the tests broke in the 4.11-rc1 timeframe.
>
> I've bisected one of them, this report is about
> tests/overflow/simul_oneshot_group_overflow
> This test creates an event group containing two sampling events, set
> to overflow to a signal handler (which disables and then refreshes the
> event).
>
> On a good kernel you get the following:
> Event perf::instructions with period 1000000
> Event perf::instructions with period 2000000
> fd 3 overflows: 946 (perf::instructions/1000000)
> fd 4 overflows: 473 (perf::instructions/2000000)
> Ending counts:
> Count 0: 946379875
> Count 1: 946365218
>
> With the broken kernels you get:
> Event perf::instructions with period 1000000
> Event perf::instructions with period 2000000
> fd 3 overflows: 938 (perf::instructions/1000000)
> fd 4 overflows: 318 (perf::instructions/2000000)
> Ending counts:
> Count 0: 946373080
> Count 1: 653373058
I'm not sure I'm seeing it (granted, it's a Friday evening): is it the
difference in overflow counts?
Also, are they cpu or task bound?
Regards,
--
Alex
* Re: perf: bisected sampling bug in Linux 4.11-rc1
2017-07-14 20:07 ` Alexander Shishkin
@ 2017-07-14 20:14 ` Vince Weaver
0 siblings, 0 replies; 12+ messages in thread
From: Vince Weaver @ 2017-07-14 20:14 UTC (permalink / raw)
To: Alexander Shishkin
Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Stephane Eranian
On Fri, 14 Jul 2017, Alexander Shishkin wrote:
> Vince Weaver <vincent.weaver@maine.edu> writes:
>
> > I was tracking down some regressions in my perf_event_test testsuite.
> > Some of the tests broke in the 4.11-rc1 timeframe.
> >
> > I've bisected one of them, this report is about
> > tests/overflow/simul_oneshot_group_overflow
> > This test creates an event group containing two sampling events, set
> > to overflow to a signal handler (which disables and then refreshes the
> > event).
> >
> > On a good kernel you get the following:
> > Event perf::instructions with period 1000000
> > Event perf::instructions with period 2000000
> > fd 3 overflows: 946 (perf::instructions/1000000)
> > fd 4 overflows: 473 (perf::instructions/2000000)
> > Ending counts:
> > Count 0: 946379875
> > Count 1: 946365218
> >
> > With the broken kernels you get:
> > Event perf::instructions with period 1000000
> > Event perf::instructions with period 2000000
> > fd 3 overflows: 938 (perf::instructions/1000000)
> > fd 4 overflows: 318 (perf::instructions/2000000)
> > Ending counts:
> > Count 0: 946373080
> > Count 1: 653373058
>
> I'm not sure I'm seeing it (granted, it's a Friday evening): is it the
> difference in overflow counts?
It's two things.
The test creates a grouped event, with both events being
perf::instructions.
1. The total count at the end should be the same for both
(on the failing kernels it is not)
2. The overflow count for both events should be roughly
total_events/sample_freq.
(on the failing kernels it is not)
> Also, are they cpu or task bound?
The open looks like this:
perf_event_open(&pe,0,-1,-1,0);
On the failing case, the group leader is pinned.
The source code for the test is here:
https://github.com/deater/perf_event_tests/blob/master/tests/overflow/simul_oneshot_group_overflow.c
Vince
* Re: perf: bisected sampling bug in Linux 4.11-rc1
2017-07-14 18:05 perf: bisected sampling bug in Linux 4.11-rc1 Vince Weaver
2017-07-14 18:25 ` Vince Weaver
2017-07-14 20:07 ` Alexander Shishkin
@ 2017-07-15 11:00 ` Ingo Molnar
2017-07-15 14:37 ` Vince Weaver
2017-07-18 10:39 ` [tip:perf/urgent] Revert "perf/core: Optimize event rescheduling on active contexts" tip-bot for Ingo Molnar
2017-07-18 11:08 ` perf: bisected sampling bug in Linux 4.11-rc1 Alexander Shishkin
3 siblings, 2 replies; 12+ messages in thread
From: Ingo Molnar @ 2017-07-15 11:00 UTC (permalink / raw)
To: Vince Weaver
Cc: linux-kernel, Alexander Shishkin, Peter Zijlstra,
Stephane Eranian, Jiri Olsa, Arnaldo Carvalho de Melo,
Thomas Gleixner
* Vince Weaver <vincent.weaver@maine.edu> wrote:
> 487f05e18aa4efacee6357480f293a5afe6593b5 is the first bad commit
>
> commit 487f05e18aa4efacee6357480f293a5afe6593b5
> Author: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Date: Thu Jan 19 18:43:30 2017 +0200
>
> perf/core: Optimize event rescheduling on active contexts
BTW., just to prepare for the eventuality: below is a (completely untested...)
revert of this commit, against recent kernels, with conflicts fixed up.
Does this fix your testcase?
Thanks,
Ingo
From fe0deecf2a8e9f5097013bcf89a9ef4b80715be1 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@kernel.org>
Date: Sat, 15 Jul 2017 12:57:51 +0200
Subject: [PATCH] Revert "perf/core: Optimize event rescheduling on active contexts"
This reverts commit 487f05e18aa4efacee6357480f293a5afe6593b5.
---
kernel/events/core.c | 80 ++++++++--------------------------------------------
1 file changed, 11 insertions(+), 69 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9747e422ab20..778aa2548142 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -359,8 +359,6 @@ enum event_type_t {
EVENT_FLEXIBLE = 0x1,
EVENT_PINNED = 0x2,
EVENT_TIME = 0x4,
- /* see ctx_resched() for details */
- EVENT_CPU = 0x8,
EVENT_ALL = EVENT_FLEXIBLE | EVENT_PINNED,
};
@@ -1445,20 +1443,6 @@ static void update_group_times(struct perf_event *leader)
update_event_times(event);
}
-static enum event_type_t get_event_type(struct perf_event *event)
-{
- struct perf_event_context *ctx = event->ctx;
- enum event_type_t event_type;
-
- lockdep_assert_held(&ctx->lock);
-
- event_type = event->attr.pinned ? EVENT_PINNED : EVENT_FLEXIBLE;
- if (!ctx->task)
- event_type |= EVENT_CPU;
-
- return event_type;
-}
-
static struct list_head *
ctx_group_list(struct perf_event *event, struct perf_event_context *ctx)
{
@@ -2232,8 +2216,7 @@ ctx_sched_in(struct perf_event_context *ctx,
struct task_struct *task);
static void task_ctx_sched_out(struct perf_cpu_context *cpuctx,
- struct perf_event_context *ctx,
- enum event_type_t event_type)
+ struct perf_event_context *ctx)
{
if (!cpuctx->task_ctx)
return;
@@ -2241,7 +2224,7 @@ static void task_ctx_sched_out(struct perf_cpu_context *cpuctx,
if (WARN_ON_ONCE(ctx != cpuctx->task_ctx))
return;
- ctx_sched_out(ctx, cpuctx, event_type);
+ ctx_sched_out(ctx, cpuctx, EVENT_ALL);
}
static void perf_event_sched_in(struct perf_cpu_context *cpuctx,
@@ -2256,51 +2239,13 @@ static void perf_event_sched_in(struct perf_cpu_context *cpuctx,
ctx_sched_in(ctx, cpuctx, EVENT_FLEXIBLE, task);
}
-/*
- * We want to maintain the following priority of scheduling:
- * - CPU pinned (EVENT_CPU | EVENT_PINNED)
- * - task pinned (EVENT_PINNED)
- * - CPU flexible (EVENT_CPU | EVENT_FLEXIBLE)
- * - task flexible (EVENT_FLEXIBLE).
- *
- * In order to avoid unscheduling and scheduling back in everything every
- * time an event is added, only do it for the groups of equal priority and
- * below.
- *
- * This can be called after a batch operation on task events, in which case
- * event_type is a bit mask of the types of events involved. For CPU events,
- * event_type is only either EVENT_PINNED or EVENT_FLEXIBLE.
- */
static void ctx_resched(struct perf_cpu_context *cpuctx,
- struct perf_event_context *task_ctx,
- enum event_type_t event_type)
+ struct perf_event_context *task_ctx)
{
- enum event_type_t ctx_event_type = event_type & EVENT_ALL;
- bool cpu_event = !!(event_type & EVENT_CPU);
-
- /*
- * If pinned groups are involved, flexible groups also need to be
- * scheduled out.
- */
- if (event_type & EVENT_PINNED)
- event_type |= EVENT_FLEXIBLE;
-
perf_pmu_disable(cpuctx->ctx.pmu);
if (task_ctx)
- task_ctx_sched_out(cpuctx, task_ctx, event_type);
-
- /*
- * Decide which cpu ctx groups to schedule out based on the types
- * of events that caused rescheduling:
- * - EVENT_CPU: schedule out corresponding groups;
- * - EVENT_PINNED task events: schedule out EVENT_FLEXIBLE groups;
- * - otherwise, do nothing more.
- */
- if (cpu_event)
- cpu_ctx_sched_out(cpuctx, ctx_event_type);
- else if (ctx_event_type & EVENT_PINNED)
- cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
-
+ task_ctx_sched_out(cpuctx, task_ctx);
+ cpu_ctx_sched_out(cpuctx, EVENT_ALL);
perf_event_sched_in(cpuctx, task_ctx, current);
perf_pmu_enable(cpuctx->ctx.pmu);
}
@@ -2347,7 +2292,7 @@ static int __perf_install_in_context(void *info)
if (reprogram) {
ctx_sched_out(ctx, cpuctx, EVENT_TIME);
add_event_to_ctx(event, ctx);
- ctx_resched(cpuctx, task_ctx, get_event_type(event));
+ ctx_resched(cpuctx, task_ctx);
} else {
add_event_to_ctx(event, ctx);
}
@@ -2514,7 +2459,7 @@ static void __perf_event_enable(struct perf_event *event,
if (ctx->task)
WARN_ON_ONCE(task_ctx != ctx);
- ctx_resched(cpuctx, task_ctx, get_event_type(event));
+ ctx_resched(cpuctx, task_ctx);
}
/*
@@ -2941,7 +2886,7 @@ static void perf_event_context_sched_out(struct task_struct *task, int ctxn,
if (do_switch) {
raw_spin_lock(&ctx->lock);
- task_ctx_sched_out(cpuctx, ctx, EVENT_ALL);
+ task_ctx_sched_out(cpuctx, ctx);
raw_spin_unlock(&ctx->lock);
}
}
@@ -3498,7 +3443,6 @@ static int event_enable_on_exec(struct perf_event *event,
static void perf_event_enable_on_exec(int ctxn)
{
struct perf_event_context *ctx, *clone_ctx = NULL;
- enum event_type_t event_type = 0;
struct perf_cpu_context *cpuctx;
struct perf_event *event;
unsigned long flags;
@@ -3512,17 +3456,15 @@ static void perf_event_enable_on_exec(int ctxn)
cpuctx = __get_cpu_context(ctx);
perf_ctx_lock(cpuctx, ctx);
ctx_sched_out(ctx, cpuctx, EVENT_TIME);
- list_for_each_entry(event, &ctx->event_list, event_entry) {
+ list_for_each_entry(event, &ctx->event_list, event_entry)
enabled |= event_enable_on_exec(event, ctx);
- event_type |= get_event_type(event);
- }
/*
* Unclone and reschedule this context if we enabled any event.
*/
if (enabled) {
clone_ctx = unclone_ctx(ctx);
- ctx_resched(cpuctx, ctx, event_type);
+ ctx_resched(cpuctx, ctx);
} else {
ctx_sched_in(ctx, cpuctx, EVENT_TIME, current);
}
@@ -10466,7 +10408,7 @@ static void perf_event_exit_task_context(struct task_struct *child, int ctxn)
* in.
*/
raw_spin_lock_irq(&child_ctx->lock);
- task_ctx_sched_out(__get_cpu_context(child_ctx), child_ctx, EVENT_ALL);
+ task_ctx_sched_out(__get_cpu_context(child_ctx), child_ctx);
/*
* Now that the context is inactive, destroy the task <-> ctx relation
* Re: perf: bisected sampling bug in Linux 4.11-rc1
2017-07-15 11:00 ` Ingo Molnar
@ 2017-07-15 14:37 ` Vince Weaver
2017-07-16 3:03 ` Vince Weaver
2017-07-18 10:39 ` [tip:perf/urgent] Revert "perf/core: Optimize event rescheduling on active contexts" tip-bot for Ingo Molnar
1 sibling, 1 reply; 12+ messages in thread
From: Vince Weaver @ 2017-07-15 14:37 UTC (permalink / raw)
To: Ingo Molnar
Cc: Vince Weaver, linux-kernel, Alexander Shishkin, Peter Zijlstra,
Stephane Eranian, Jiri Olsa, Arnaldo Carvalho de Melo,
Thomas Gleixner
On Sat, 15 Jul 2017, Ingo Molnar wrote:
>
> * Vince Weaver <vincent.weaver@maine.edu> wrote:
>
> > 487f05e18aa4efacee6357480f293a5afe6593b5 is the first bad commit
> >
> > commit 487f05e18aa4efacee6357480f293a5afe6593b5
> > Author: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> > Date: Thu Jan 19 18:43:30 2017 +0200
> >
> > perf/core: Optimize event rescheduling on active contexts
>
> BTW., just to prepare for the eventuality: below is a (completely untested...)
> revert of this commit, against recent kernels, with conflicts fixed up.
>
> Does this fix your testcase?
Yes, applying this to current git fixes the testcase and doesn't seem to
break anything else.
Although there is a separate issue also introduced in 4.11-rc1 that still
fails a different testcase. I'm in the middle of bisecting that one and
probably won't have the result of the bisect until Monday.
Vince
* Re: perf: bisected sampling bug in Linux 4.11-rc1
2017-07-15 14:37 ` Vince Weaver
@ 2017-07-16 3:03 ` Vince Weaver
0 siblings, 0 replies; 12+ messages in thread
From: Vince Weaver @ 2017-07-16 3:03 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Alexander Shishkin, Peter Zijlstra,
Stephane Eranian, Jiri Olsa, Arnaldo Carvalho de Melo,
Thomas Gleixner
On Sat, 15 Jul 2017, Vince Weaver wrote:
> Although there is a separate issue also introduced in 4.11-rc1 that still
> fails a different testcase. I'm in the middle of bisecting that one and
> probably won't have the result of the bisect until Monday.
I went and bisected the other issue anyway. It wasn't in 4.11-rc1, but in
4.12-rc4. Yes, I have a test that triggered the
c1582c231ea041 ("perf/core: Drop kernel samples even though :u is specified")
issue.
The test was actually losing 7% of its samples, so not just a single
sample here or there. The test had two events sampling, one at 100k and
one at 200k so I guess every other sample would have two samples
immediately back-to-back which must make it more likely to stray into the
kernel.
Vince
* [tip:perf/urgent] Revert "perf/core: Optimize event rescheduling on active contexts"
2017-07-15 11:00 ` Ingo Molnar
2017-07-15 14:37 ` Vince Weaver
@ 2017-07-18 10:39 ` tip-bot for Ingo Molnar
1 sibling, 0 replies; 12+ messages in thread
From: tip-bot for Ingo Molnar @ 2017-07-18 10:39 UTC (permalink / raw)
To: linux-tip-commits
Cc: acme, alexander.shishkin, peterz, eranian, mingo, linux-kernel,
jolsa, torvalds, acme, hpa, tglx, vincent.weaver
Commit-ID: 770f8eb8a990a8904bfd8a6849be147b40b6e1aa
Gitweb: http://git.kernel.org/tip/770f8eb8a990a8904bfd8a6849be147b40b6e1aa
Author: Ingo Molnar <mingo@kernel.org>
AuthorDate: Sat, 15 Jul 2017 13:00:49 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 18 Jul 2017 10:44:47 +0200
Revert "perf/core: Optimize event rescheduling on active contexts"
This reverts commit 487f05e18aa4efacee6357480f293a5afe6593b5.
Vince Weaver reported that it breaks a testcase for pinned events:
| I've bisected one of them, this report is about:
|
| tests/overflow/simul_oneshot_group_overflow
|
| This test creates an event group containing two sampling events, set
| to overflow to a signal handler (which disables and then refreshes the
| event).
|
| On a good kernel you get the following:
| Event perf::instructions with period 1000000
| Event perf::instructions with period 2000000
| fd 3 overflows: 946 (perf::instructions/1000000)
| fd 4 overflows: 473 (perf::instructions/2000000)
| Ending counts:
| Count 0: 946379875
| Count 1: 946365218
|
| With the broken kernels you get:
| Event perf::instructions with period 1000000
| Event perf::instructions with period 2000000
| fd 3 overflows: 938 (perf::instructions/1000000)
| fd 4 overflows: 318 (perf::instructions/2000000)
| Ending counts:
| Count 0: 946373080
| Count 1: 653373058
...
| additional relevant detail:
| in the failing case, the group leader of the event set has
| .pinned=1
| If I change that to .pinned=0 then the test passes.
As it's an optimization we can revert it for now until the root cause is found.
Adrian Hunter <adrian.hunter@intel.com>
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170715110049.36jvxnidy2flh6ll@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/events/core.c | 80 ++++++++--------------------------------------------
1 file changed, 11 insertions(+), 69 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9747e42..778aa25 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -359,8 +359,6 @@ enum event_type_t {
EVENT_FLEXIBLE = 0x1,
EVENT_PINNED = 0x2,
EVENT_TIME = 0x4,
- /* see ctx_resched() for details */
- EVENT_CPU = 0x8,
EVENT_ALL = EVENT_FLEXIBLE | EVENT_PINNED,
};
@@ -1445,20 +1443,6 @@ static void update_group_times(struct perf_event *leader)
update_event_times(event);
}
-static enum event_type_t get_event_type(struct perf_event *event)
-{
- struct perf_event_context *ctx = event->ctx;
- enum event_type_t event_type;
-
- lockdep_assert_held(&ctx->lock);
-
- event_type = event->attr.pinned ? EVENT_PINNED : EVENT_FLEXIBLE;
- if (!ctx->task)
- event_type |= EVENT_CPU;
-
- return event_type;
-}
-
static struct list_head *
ctx_group_list(struct perf_event *event, struct perf_event_context *ctx)
{
@@ -2232,8 +2216,7 @@ ctx_sched_in(struct perf_event_context *ctx,
struct task_struct *task);
static void task_ctx_sched_out(struct perf_cpu_context *cpuctx,
- struct perf_event_context *ctx,
- enum event_type_t event_type)
+ struct perf_event_context *ctx)
{
if (!cpuctx->task_ctx)
return;
@@ -2241,7 +2224,7 @@ static void task_ctx_sched_out(struct perf_cpu_context *cpuctx,
if (WARN_ON_ONCE(ctx != cpuctx->task_ctx))
return;
- ctx_sched_out(ctx, cpuctx, event_type);
+ ctx_sched_out(ctx, cpuctx, EVENT_ALL);
}
static void perf_event_sched_in(struct perf_cpu_context *cpuctx,
@@ -2256,51 +2239,13 @@ static void perf_event_sched_in(struct perf_cpu_context *cpuctx,
ctx_sched_in(ctx, cpuctx, EVENT_FLEXIBLE, task);
}
-/*
- * We want to maintain the following priority of scheduling:
- * - CPU pinned (EVENT_CPU | EVENT_PINNED)
- * - task pinned (EVENT_PINNED)
- * - CPU flexible (EVENT_CPU | EVENT_FLEXIBLE)
- * - task flexible (EVENT_FLEXIBLE).
- *
- * In order to avoid unscheduling and scheduling back in everything every
- * time an event is added, only do it for the groups of equal priority and
- * below.
- *
- * This can be called after a batch operation on task events, in which case
- * event_type is a bit mask of the types of events involved. For CPU events,
- * event_type is only either EVENT_PINNED or EVENT_FLEXIBLE.
- */
static void ctx_resched(struct perf_cpu_context *cpuctx,
- struct perf_event_context *task_ctx,
- enum event_type_t event_type)
+ struct perf_event_context *task_ctx)
{
- enum event_type_t ctx_event_type = event_type & EVENT_ALL;
- bool cpu_event = !!(event_type & EVENT_CPU);
-
- /*
- * If pinned groups are involved, flexible groups also need to be
- * scheduled out.
- */
- if (event_type & EVENT_PINNED)
- event_type |= EVENT_FLEXIBLE;
-
perf_pmu_disable(cpuctx->ctx.pmu);
if (task_ctx)
- task_ctx_sched_out(cpuctx, task_ctx, event_type);
-
- /*
- * Decide which cpu ctx groups to schedule out based on the types
- * of events that caused rescheduling:
- * - EVENT_CPU: schedule out corresponding groups;
- * - EVENT_PINNED task events: schedule out EVENT_FLEXIBLE groups;
- * - otherwise, do nothing more.
- */
- if (cpu_event)
- cpu_ctx_sched_out(cpuctx, ctx_event_type);
- else if (ctx_event_type & EVENT_PINNED)
- cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
-
+ task_ctx_sched_out(cpuctx, task_ctx);
+ cpu_ctx_sched_out(cpuctx, EVENT_ALL);
perf_event_sched_in(cpuctx, task_ctx, current);
perf_pmu_enable(cpuctx->ctx.pmu);
}
@@ -2347,7 +2292,7 @@ static int __perf_install_in_context(void *info)
if (reprogram) {
ctx_sched_out(ctx, cpuctx, EVENT_TIME);
add_event_to_ctx(event, ctx);
- ctx_resched(cpuctx, task_ctx, get_event_type(event));
+ ctx_resched(cpuctx, task_ctx);
} else {
add_event_to_ctx(event, ctx);
}
@@ -2514,7 +2459,7 @@ static void __perf_event_enable(struct perf_event *event,
if (ctx->task)
WARN_ON_ONCE(task_ctx != ctx);
- ctx_resched(cpuctx, task_ctx, get_event_type(event));
+ ctx_resched(cpuctx, task_ctx);
}
/*
@@ -2941,7 +2886,7 @@ unlock:
if (do_switch) {
raw_spin_lock(&ctx->lock);
- task_ctx_sched_out(cpuctx, ctx, EVENT_ALL);
+ task_ctx_sched_out(cpuctx, ctx);
raw_spin_unlock(&ctx->lock);
}
}
@@ -3498,7 +3443,6 @@ static int event_enable_on_exec(struct perf_event *event,
static void perf_event_enable_on_exec(int ctxn)
{
struct perf_event_context *ctx, *clone_ctx = NULL;
- enum event_type_t event_type = 0;
struct perf_cpu_context *cpuctx;
struct perf_event *event;
unsigned long flags;
@@ -3512,17 +3456,15 @@ static void perf_event_enable_on_exec(int ctxn)
cpuctx = __get_cpu_context(ctx);
perf_ctx_lock(cpuctx, ctx);
ctx_sched_out(ctx, cpuctx, EVENT_TIME);
- list_for_each_entry(event, &ctx->event_list, event_entry) {
+ list_for_each_entry(event, &ctx->event_list, event_entry)
enabled |= event_enable_on_exec(event, ctx);
- event_type |= get_event_type(event);
- }
/*
* Unclone and reschedule this context if we enabled any event.
*/
if (enabled) {
clone_ctx = unclone_ctx(ctx);
- ctx_resched(cpuctx, ctx, event_type);
+ ctx_resched(cpuctx, ctx);
} else {
ctx_sched_in(ctx, cpuctx, EVENT_TIME, current);
}
@@ -10466,7 +10408,7 @@ static void perf_event_exit_task_context(struct task_struct *child, int ctxn)
* in.
*/
raw_spin_lock_irq(&child_ctx->lock);
- task_ctx_sched_out(__get_cpu_context(child_ctx), child_ctx, EVENT_ALL);
+ task_ctx_sched_out(__get_cpu_context(child_ctx), child_ctx);
/*
* Now that the context is inactive, destroy the task <-> ctx relation
* Re: perf: bisected sampling bug in Linux 4.11-rc1
2017-07-14 18:05 perf: bisected sampling bug in Linux 4.11-rc1 Vince Weaver
` (2 preceding siblings ...)
2017-07-15 11:00 ` Ingo Molnar
@ 2017-07-18 11:08 ` Alexander Shishkin
2017-07-18 12:13 ` Ingo Molnar
2017-07-20 8:34 ` [tip:perf/urgent] perf/core: Fix scheduling regression of pinned groups tip-bot for Alexander Shishkin
3 siblings, 2 replies; 12+ messages in thread
From: Alexander Shishkin @ 2017-07-18 11:08 UTC (permalink / raw)
To: Vince Weaver, linux-kernel; +Cc: Ingo Molnar, Peter Zijlstra, Stephane Eranian
Vince Weaver <vincent.weaver@maine.edu> writes:
> I was tracking down some regressions in my perf_event_test testsuite.
> Some of the tests broke in the 4.11-rc1 timeframe.
>
> I've bisected one of them, this report is about
> tests/overflow/simul_oneshot_group_overflow
> This test creates an event group containing two sampling events, set
> to overflow to a signal handler (which disables and then refreshes the
> event).
>
> On a good kernel you get the following:
> Event perf::instructions with period 1000000
> Event perf::instructions with period 2000000
> fd 3 overflows: 946 (perf::instructions/1000000)
> fd 4 overflows: 473 (perf::instructions/2000000)
> Ending counts:
> Count 0: 946379875
> Count 1: 946365218
>
> With the broken kernels you get:
> Event perf::instructions with period 1000000
> Event perf::instructions with period 2000000
> fd 3 overflows: 938 (perf::instructions/1000000)
> fd 4 overflows: 318 (perf::instructions/2000000)
> Ending counts:
> Count 0: 946373080
> Count 1: 653373058
>
>
> 487f05e18aa4efacee6357480f293a5afe6593b5 is the first bad commit
>
> commit 487f05e18aa4efacee6357480f293a5afe6593b5
> Author: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Date: Thu Jan 19 18:43:30 2017 +0200
Ok, there was a bug there indeed. This patch should take care of it and
should also be backportable in case it's stable-worthy.
From 187d67c9908cb126656c34546772089c17a8e6c5 Mon Sep 17 00:00:00 2001
From: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Date: Tue, 18 Jul 2017 13:53:01 +0300
Subject: [PATCH] perf: Fix scheduling regression of pinned groups
Commit 487f05e18a ("perf/core: Optimize event rescheduling on active
contexts") erroneously assumed that an event's 'pinned' setting determines
whether the event belongs to a pinned group or not, but in fact, it's
the group leader's pinned state that matters. This was discovered by
Vince in a test case where two instruction counters are grouped, the
group leader is pinned, but the other event is not; in the regressed
case the counters were off by 33% (the difference between events'
periods), but should be the same within the error margin.
This fixes the problem by looking at the group leader's pinning.
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Fixes: 487f05e18a ("perf/core: Optimize event rescheduling on active contexts")
Cc: stable@vger.kernel.org
---
kernel/events/core.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index bc63f8db1b..1edbaf94dd 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1451,6 +1451,13 @@ static enum event_type_t get_event_type(struct perf_event *event)
lockdep_assert_held(&ctx->lock);
+ /*
+ * It's 'group type', really, because if our group leader is
+ * pinned, so are we.
+ */
+ if (event->group_leader != event)
+ event = event->group_leader;
+
event_type = event->attr.pinned ? EVENT_PINNED : EVENT_FLEXIBLE;
if (!ctx->task)
event_type |= EVENT_CPU;
--
2.11.0
* Re: perf: bisected sampling bug in Linux 4.11-rc1
2017-07-18 11:08 ` perf: bisected sampling bug in Linux 4.11-rc1 Alexander Shishkin
@ 2017-07-18 12:13 ` Ingo Molnar
2017-07-20 3:30 ` Vince Weaver
2017-07-20 8:34 ` [tip:perf/urgent] perf/core: Fix scheduling regression of pinned groups tip-bot for Alexander Shishkin
1 sibling, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2017-07-18 12:13 UTC (permalink / raw)
To: Alexander Shishkin
Cc: Vince Weaver, linux-kernel, Peter Zijlstra, Stephane Eranian
* Alexander Shishkin <alexander.shishkin@linux.intel.com> wrote:
> Vince Weaver <vincent.weaver@maine.edu> writes:
>
> > I was tracking down some regressions in my perf_event_test testsuite.
> > Some of the tests broke in the 4.11-rc1 timeframe.
> >
> > I've bisected one of them, this report is about
> > tests/overflow/simul_oneshot_group_overflow
> > This test creates an event group containing two sampling events, set
> > to overflow to a signal handler (which disables and then refreshes the
> > event).
> >
> > On a good kernel you get the following:
> > Event perf::instructions with period 1000000
> > Event perf::instructions with period 2000000
> > fd 3 overflows: 946 (perf::instructions/1000000)
> > fd 4 overflows: 473 (perf::instructions/2000000)
> > Ending counts:
> > Count 0: 946379875
> > Count 1: 946365218
> >
> > With the broken kernels you get:
> > Event perf::instructions with period 1000000
> > Event perf::instructions with period 2000000
> > fd 3 overflows: 938 (perf::instructions/1000000)
> > fd 4 overflows: 318 (perf::instructions/2000000)
> > Ending counts:
> > Count 0: 946373080
> > Count 1: 653373058
> >
> >
> > 487f05e18aa4efacee6357480f293a5afe6593b5 is the first bad commit
> >
> > commit 487f05e18aa4efacee6357480f293a5afe6593b5
> > Author: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> > Date: Thu Jan 19 18:43:30 2017 +0200
>
> Ok, there was a bug there indeed. This patch should take care of it and
> should also be backportable in case it's stable-worthy.
>
> From 187d67c9908cb126656c34546772089c17a8e6c5 Mon Sep 17 00:00:00 2001
> From: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Date: Tue, 18 Jul 2017 13:53:01 +0300
> Subject: [PATCH] perf: Fix scheduling regression of pinned groups
Ok, great - if this works then I'll pick up this fix instead of the revert that
I've queued up earlier today.
Thanks,
Ingo
* Re: perf: bisected sampling bug in Linux 4.11-rc1
2017-07-18 12:13 ` Ingo Molnar
@ 2017-07-20 3:30 ` Vince Weaver
0 siblings, 0 replies; 12+ messages in thread
From: Vince Weaver @ 2017-07-20 3:30 UTC (permalink / raw)
To: Ingo Molnar
Cc: Alexander Shishkin, Vince Weaver, linux-kernel, Peter Zijlstra,
Stephane Eranian
On Tue, 18 Jul 2017, Ingo Molnar wrote:
>
> Ok, great - if this works then I'll pick up this fix instead of the revert that
> I've queued up earlier today.
sorry for the delay, I was out of town for a few days.
I've tried the patch and it looks like it fixes the test in question and
doesn't seem to add any other regressions.
Vince
* [tip:perf/urgent] perf/core: Fix scheduling regression of pinned groups
2017-07-18 11:08 ` perf: bisected sampling bug in Linux 4.11-rc1 Alexander Shishkin
2017-07-18 12:13 ` Ingo Molnar
@ 2017-07-20 8:34 ` tip-bot for Alexander Shishkin
1 sibling, 0 replies; 12+ messages in thread
From: tip-bot for Alexander Shishkin @ 2017-07-20 8:34 UTC (permalink / raw)
To: linux-tip-commits
Cc: peterz, torvalds, alexander.shishkin, linux-kernel, jolsa, tglx,
acme, vincent.weaver, mingo, hpa, eranian
Commit-ID: 3bda69c1c3993a2bddbae01397d12bfef6054011
Gitweb: http://git.kernel.org/tip/3bda69c1c3993a2bddbae01397d12bfef6054011
Author: Alexander Shishkin <alexander.shishkin@linux.intel.com>
AuthorDate: Tue, 18 Jul 2017 14:08:34 +0300
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 20 Jul 2017 09:43:02 +0200
perf/core: Fix scheduling regression of pinned groups
Vince Weaver reported:
> I was tracking down some regressions in my perf_event_test testsuite.
> Some of the tests broke in the 4.11-rc1 timeframe.
>
> I've bisected one of them, this report is about
> tests/overflow/simul_oneshot_group_overflow
> This test creates an event group containing two sampling events, set
> to overflow to a signal handler (which disables and then refreshes the
> event).
>
> On a good kernel you get the following:
> Event perf::instructions with period 1000000
> Event perf::instructions with period 2000000
> fd 3 overflows: 946 (perf::instructions/1000000)
> fd 4 overflows: 473 (perf::instructions/2000000)
> Ending counts:
> Count 0: 946379875
> Count 1: 946365218
>
> With the broken kernels you get:
> Event perf::instructions with period 1000000
> Event perf::instructions with period 2000000
> fd 3 overflows: 938 (perf::instructions/1000000)
> fd 4 overflows: 318 (perf::instructions/2000000)
> Ending counts:
> Count 0: 946373080
> Count 1: 653373058
The root cause of the bug is that the following commit:
487f05e18a ("perf/core: Optimize event rescheduling on active contexts")
erroneously assumed that an event's 'pinned' setting determines whether the
event belongs to a pinned group or not, but in fact it's the group
leader's pinned state that matters.
This was discovered by Vince in the test case described above, where two instruction
counters are grouped, the group leader is pinned, but the other event is not;
in the regressed case the counters were off by 33% (the difference between events'
periods), but should be the same within the error margin.
Fix the problem by looking at the group leader's pinning.
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: 487f05e18a ("perf/core: Optimize event rescheduling on active contexts")
Link: http://lkml.kernel.org/r/87lgnmvw7h.fsf@ashishki-desk.ger.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/events/core.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9747e42..c9cdbd3 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1452,6 +1452,13 @@ static enum event_type_t get_event_type(struct perf_event *event)
 
 	lockdep_assert_held(&ctx->lock);
 
+	/*
+	 * It's 'group type', really, because if our group leader is
+	 * pinned, so are we.
+	 */
+	if (event->group_leader != event)
+		event = event->group_leader;
+
 	event_type = event->attr.pinned ? EVENT_PINNED : EVENT_FLEXIBLE;
 	if (!ctx->task)
 		event_type |= EVENT_CPU;
end of thread, other threads:[~2017-07-20 8:40 UTC | newest]
Thread overview: 12+ messages
2017-07-14 18:05 perf: bisected sampling bug in Linux 4.11-rc1 Vince Weaver
2017-07-14 18:25 ` Vince Weaver
2017-07-14 20:07 ` Alexander Shishkin
2017-07-14 20:14 ` Vince Weaver
2017-07-15 11:00 ` Ingo Molnar
2017-07-15 14:37 ` Vince Weaver
2017-07-16 3:03 ` Vince Weaver
2017-07-18 10:39 ` [tip:perf/urgent] Revert "perf/core: Optimize event rescheduling on active contexts" tip-bot for Ingo Molnar
2017-07-18 11:08 ` perf: bisected sampling bug in Linux 4.11-rc1 Alexander Shishkin
2017-07-18 12:13 ` Ingo Molnar
2017-07-20 3:30 ` Vince Weaver
2017-07-20 8:34 ` [tip:perf/urgent] perf/core: Fix scheduling regression of pinned groups tip-bot for Alexander Shishkin