linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] perf/core: what is exclude_idle supposed to do
@ 2018-04-16 22:04 Stephane Eranian
  2018-04-17  6:20 ` Jiri Olsa
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Stephane Eranian @ 2018-04-16 22:04 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Jiri Olsa, Arnaldo Carvalho de Melo, mingo,
	Andi Kleen, Vince Weaver

Hi,

I am trying to understand what the exclude_idle event attribute is supposed
to accomplish.
As per the definition in the header file:

    exclude_idle   :  1, /* don't count when idle */

Naively, I thought it would simply stop the event when running in the
context of the idle task (swapper, pid 0) on any CPU. That would seem to
match the succinct description.

However, running a simple:

$ perf record -a -e cycles:I sleep 5
perf_event_attr:
    sample_type                      IP|TID|TIME|CPU|PERIOD
   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID
   exclude_idle                     1

on an idle machine, showed me that this is not what is actually happening:
$ perf  script
          swapper     0 [000] 1132634.287442:          1 cycles:I:
  ffffffff8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms])
          swapper     0 [001] 1132634.287498:          1 cycles:I:
  ffffffff8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms])


After looking at the code, it all made sense, it seems to current
implementation is only relevant for sw events. I don't understand why.

I am left wondering what is the goal of exclude_idle?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] perf/core: what is exclude_idle supposed to do
  2018-04-16 22:04 [RFC] perf/core: what is exclude_idle supposed to do Stephane Eranian
@ 2018-04-17  6:20 ` Jiri Olsa
  2018-04-18 15:10   ` Vince Weaver
  2018-04-17 13:40 ` Arnaldo Carvalho de Melo
  2018-04-20  8:35 ` Peter Zijlstra
  2 siblings, 1 reply; 9+ messages in thread
From: Jiri Olsa @ 2018-04-17  6:20 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: LKML, Peter Zijlstra, Arnaldo Carvalho de Melo, mingo,
	Andi Kleen, Vince Weaver

On Mon, Apr 16, 2018 at 10:04:53PM +0000, Stephane Eranian wrote:
> Hi,
> 
> I am trying to understand what the exclude_idle event attribute is supposed
> to accomplish.
> As per the definition in the header file:
> 
>     exclude_idle   :  1, /* don't count when idle */
> 
> Naively, I thought it would simply stop the event when running in the
> context of the idle task (swapper, pid 0) on any CPU. That would seem to
> match the succinct description.
> 
> However, running a simple:
> 
> $ perf record -a -e cycles:I sleep 5
> perf_event_attr:
>     sample_type                      IP|TID|TIME|CPU|PERIOD
>    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID
>    exclude_idle                     1
> 
> on an idle machine, showed me that this is not what is actually happening:
> $ perf  script
>           swapper     0 [000] 1132634.287442:          1 cycles:I:
>   ffffffff8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms])
>           swapper     0 [001] 1132634.287498:          1 cycles:I:
>   ffffffff8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms])
> 
> 
> After looking at the code, it all made sense, it seems to current
> implementation is only relevant for sw events. I don't understand why.

AFAICS it's not implemented

Peter suggested some time ago change for cpu-clock (attached)
I still have it in my queue, because it gives funny numbers
sometimes.. and I haven't figured it out so far

the idea so far was to use cpu-clock,cpu-clock:I events to
count and display idle % in perf top/record and stat metrics

jirka


---
>From 7f20b047cf56f8086d79007175592633798656ce Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz@infradead.org>
Date: Thu, 23 Nov 2017 16:15:36 +0100
Subject: [PATCH] perf: Add support to monitor idle time on cpu-clock

Adding support to use perf_event_attr::exclude_idle
(the :I modifierr in perf tools)to monitor idle time
on cpu-clock event.

Link: http://lkml.kernel.org/n/tip-jw45kl6ydrhzqx4uxws0qsqb@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/events/core.c     | 22 ++++++++++++++++++----
 kernel/sched/idle_task.c | 17 +++++++++++++++++
 2 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2989e3b0fe8b..62c0dc75ed11 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9391,19 +9391,33 @@ static void perf_swevent_init_hrtimer(struct perf_event *event)
  * Software event: cpu wall time clock
  */
 
+static u64 cpu_clock_count(struct perf_event *event)
+{
+	u64 now = local_clock();
+
+	if (event->attr.exclude_idle)
+		now -= idle_task(event->oncpu)->se.sum_exec_runtime;
+
+	return now;
+}
+
 static void cpu_clock_event_update(struct perf_event *event)
 {
-	s64 prev;
+	s64 prev, delta;
 	u64 now;
 
-	now = local_clock();
+	now = cpu_clock_count(event);
 	prev = local64_xchg(&event->hw.prev_count, now);
-	local64_add(now - prev, &event->count);
+	delta = now - prev;
+	if (delta > 0)
+		local64_add(delta, &event->count);
 }
 
 static void cpu_clock_event_start(struct perf_event *event, int flags)
 {
-	local64_set(&event->hw.prev_count, local_clock());
+	u64 now = cpu_clock_count(event);
+
+	local64_set(&event->hw.prev_count, now);
 	perf_swevent_start_hrtimer(event);
 }
 
diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index d518664cce4f..e360ce5dd9a2 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -27,9 +27,14 @@ static void check_preempt_curr_idle(struct rq *rq, struct task_struct *p, int fl
 static struct task_struct *
 pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
+	struct task_struct *idle = rq->idle;
+
 	put_prev_task(rq, prev);
 	update_idle_core(rq);
 	schedstat_inc(rq->sched_goidle);
+
+	idle->se.exec_start = rq_clock_task(rq);
+
 	return rq->idle;
 }
 
@@ -48,6 +53,15 @@ dequeue_task_idle(struct rq *rq, struct task_struct *p, int flags)
 
 static void put_prev_task_idle(struct rq *rq, struct task_struct *prev)
 {
+	struct task_struct *idle = rq->idle;
+	u64 delta, now;
+
+	now = rq_clock_task(rq);
+	delta = now - idle->se.exec_start;
+	if (likely((s64)delta > 0))
+		idle->se.sum_exec_runtime += delta;
+	idle->se.exec_start = now;
+
 	rq_last_tick_reset(rq);
 }
 
@@ -57,6 +71,9 @@ static void task_tick_idle(struct rq *rq, struct task_struct *curr, int queued)
 
 static void set_curr_task_idle(struct rq *rq)
 {
+	struct task_struct *idle = rq->idle;
+
+	idle->se.exec_start = rq_clock_task(rq);
 }
 
 static void switched_to_idle(struct rq *rq, struct task_struct *p)
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC] perf/core: what is exclude_idle supposed to do
  2018-04-16 22:04 [RFC] perf/core: what is exclude_idle supposed to do Stephane Eranian
  2018-04-17  6:20 ` Jiri Olsa
@ 2018-04-17 13:40 ` Arnaldo Carvalho de Melo
  2018-04-20  8:35 ` Peter Zijlstra
  2 siblings, 0 replies; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-04-17 13:40 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: LKML, Peter Zijlstra, Jiri Olsa, Ingo Molnar, Andi Kleen,
	Vince Weaver, acme

Em Mon, Apr 16, 2018 at 10:04:53PM +0000, Stephane Eranian escreveu:
> Hi,
> 
> I am trying to understand what the exclude_idle event attribute is supposed
> to accomplish.
> As per the definition in the header file:
> 
>     exclude_idle   :  1, /* don't count when idle */
> 
> Naively, I thought it would simply stop the event when running in the
> context of the idle task (swapper, pid 0) on any CPU. That would seem to
> match the succinct description.
> 
> However, running a simple:
> 
> $ perf record -a -e cycles:I sleep 5
> perf_event_attr:
>     sample_type                      IP|TID|TIME|CPU|PERIOD
>    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID
>    exclude_idle                     1
> 
> on an idle machine, showed me that this is not what is actually happening:
> $ perf  script
>           swapper     0 [000] 1132634.287442:          1 cycles:I:
>   ffffffff8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms])
>           swapper     0 [001] 1132634.287498:          1 cycles:I:
>   ffffffff8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms])
> 
> 
> After looking at the code, it all made sense, it seems to current
> implementation is only relevant for sw events. I don't understand why.
> 
> I am left wondering what is the goal of exclude_idle?

To do something it is not doing, i.e. to do what you expected it to do.

There were messages exchanged recently where PeterZ said that this is
just that needs fixing.

- Arnaldo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] perf/core: what is exclude_idle supposed to do
  2018-04-17  6:20 ` Jiri Olsa
@ 2018-04-18 15:10   ` Vince Weaver
  2018-04-20  8:36     ` Peter Zijlstra
  0 siblings, 1 reply; 9+ messages in thread
From: Vince Weaver @ 2018-04-18 15:10 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Stephane Eranian, LKML, Peter Zijlstra, Arnaldo Carvalho de Melo,
	mingo, Andi Kleen, Vince Weaver

On Tue, 17 Apr 2018, Jiri Olsa wrote:

> On Mon, Apr 16, 2018 at 10:04:53PM +0000, Stephane Eranian wrote:
> > Hi,
> > 
> > I am trying to understand what the exclude_idle event attribute is supposed
> > to accomplish.
> > As per the definition in the header file:
> > 
> >     exclude_idle   :  1, /* don't count when idle */
> 
> AFAICS it's not implemented

so just to be completely clear hear, we're saying that the "exclude_idle" 
modifier has never done anything useful and still doesn't?

If so I should update the perf_event_open manpage to spell this out.

Vince

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] perf/core: what is exclude_idle supposed to do
  2018-04-16 22:04 [RFC] perf/core: what is exclude_idle supposed to do Stephane Eranian
  2018-04-17  6:20 ` Jiri Olsa
  2018-04-17 13:40 ` Arnaldo Carvalho de Melo
@ 2018-04-20  8:35 ` Peter Zijlstra
  2 siblings, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2018-04-20  8:35 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: LKML, Jiri Olsa, Arnaldo Carvalho de Melo, mingo, Andi Kleen,
	Vince Weaver

On Mon, Apr 16, 2018 at 10:04:53PM +0000, Stephane Eranian wrote:
> Hi,
> 
> I am trying to understand what the exclude_idle event attribute is supposed
> to accomplish.
> As per the definition in the header file:
> 
>     exclude_idle   :  1, /* don't count when idle */
> 
> Naively, I thought it would simply stop the event when running in the
> context of the idle task (swapper, pid 0) on any CPU. That would seem to
> match the succinct description.
> 
> However, running a simple:
> 
> $ perf record -a -e cycles:I sleep 5
> perf_event_attr:
>     sample_type                      IP|TID|TIME|CPU|PERIOD
>    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID
>    exclude_idle                     1
> 
> on an idle machine, showed me that this is not what is actually happening:
> $ perf  script
>           swapper     0 [000] 1132634.287442:          1 cycles:I:
>   ffffffff8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms])
>           swapper     0 [001] 1132634.287498:          1 cycles:I:
>   ffffffff8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms])
> 
> 
> After looking at the code, it all made sense, it seems to current
> implementation is only relevant for sw events. I don't understand why.
> 
> I am left wondering what is the goal of exclude_idle?

A "git grep exclude_idle" seems to suggest powerpc/arm have it
immplemented for their PMU. If we then look at commit:

  2743a5b0fa6f ("perfcounters: provide expansion room in the ABI")

It was Paul who introduced the bit.

So I'm thinking that if x86 doesn't implement it, we should at least
error out on it. Of course, so far we've allowed it, so who knows what
all will break if we start asserting that :/

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] perf/core: what is exclude_idle supposed to do
  2018-04-18 15:10   ` Vince Weaver
@ 2018-04-20  8:36     ` Peter Zijlstra
  2018-04-20 14:18       ` Vince Weaver
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2018-04-20  8:36 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Jiri Olsa, Stephane Eranian, LKML, Arnaldo Carvalho de Melo,
	mingo, Andi Kleen

On Wed, Apr 18, 2018 at 11:10:20AM -0400, Vince Weaver wrote:
> On Tue, 17 Apr 2018, Jiri Olsa wrote:
> 
> > On Mon, Apr 16, 2018 at 10:04:53PM +0000, Stephane Eranian wrote:
> > > Hi,
> > > 
> > > I am trying to understand what the exclude_idle event attribute is supposed
> > > to accomplish.
> > > As per the definition in the header file:
> > > 
> > >     exclude_idle   :  1, /* don't count when idle */
> > 
> > AFAICS it's not implemented
> 
> so just to be completely clear hear, we're saying that the "exclude_idle" 
> modifier has never done anything useful and still doesn't?

AFAICT it works on Power and possibly ARM.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] perf/core: what is exclude_idle supposed to do
  2018-04-20  8:36     ` Peter Zijlstra
@ 2018-04-20 14:18       ` Vince Weaver
  2018-04-20 16:51         ` Vince Weaver
  0 siblings, 1 reply; 9+ messages in thread
From: Vince Weaver @ 2018-04-20 14:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, Jiri Olsa, Stephane Eranian, LKML,
	Arnaldo Carvalho de Melo, mingo, Andi Kleen

On Fri, 20 Apr 2018, Peter Zijlstra wrote:

> On Wed, Apr 18, 2018 at 11:10:20AM -0400, Vince Weaver wrote:
> > On Tue, 17 Apr 2018, Jiri Olsa wrote:
> > 
> > > On Mon, Apr 16, 2018 at 10:04:53PM +0000, Stephane Eranian wrote:
> > > > Hi,
> > > > 
> > > > I am trying to understand what the exclude_idle event attribute is supposed
> > > > to accomplish.
> > > > As per the definition in the header file:
> > > > 
> > > >     exclude_idle   :  1, /* don't count when idle */
> > > 
> > > AFAICS it's not implemented
> > 
> > so just to be completely clear hear, we're saying that the "exclude_idle" 
> > modifier has never done anything useful and still doesn't?
> 
> AFAICT it works on Power and possibly ARM.

at least some ARMs are a bit more honest about it than x86

ivybridge:
	Performance counter stats for '/bin/ls':
	1,368,162      instructions
	1,368,162      instructions:I

pi2/ARM cortex-A7
	Performance counter stats for '/bin/ls':
	1,910,083      instructions
	<not supported>      instructions:I

I'd fire up my Power8 machine to see but not sure it's worth the hassle 
and/or having to get out the ear protection.

Vince

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] perf/core: what is exclude_idle supposed to do
  2018-04-20 14:18       ` Vince Weaver
@ 2018-04-20 16:51         ` Vince Weaver
  2018-04-20 18:19           ` Stephane Eranian
  0 siblings, 1 reply; 9+ messages in thread
From: Vince Weaver @ 2018-04-20 16:51 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, Jiri Olsa, Stephane Eranian, LKML,
	Arnaldo Carvalho de Melo, mingo, Andi Kleen

On Fri, 20 Apr 2018, Vince Weaver wrote:

> > AFAICT it works on Power and possibly ARM.
> 
> at least some ARMs are a bit more honest about it than x86
> 
> ivybridge:
> 	Performance counter stats for '/bin/ls':
> 	1,368,162      instructions
> 	1,368,162      instructions:I
> 
> pi2/ARM cortex-A7
> 	Performance counter stats for '/bin/ls':
> 	1,910,083      instructions
> 	<not supported>      instructions:I
> 
> I'd fire up my Power8 machine to see but not sure it's worth the hassle 
> and/or having to get out the ear protection.

I did power up the Power8 machine in the end:

power8:
	perf stat -e cycles,cycles:I sleep 5
	Performance counter stats for 'sleep 5':
        14,271,273      cycles
        14,271,273      cycles:I

???

But then if I try again on power8

	perf stat -a -e cycles,cycles:I sleep 5
	 Performance counter stats for 'system wide':
	1,238,772,322,327      cycles
 	1,238,674,771,713      cycles:I   

there is a difference.

But then on ivybridge

	perf stat -a -e cycles,cycles:I sleep 5
	Performance counter stats for 'system wide':
	589,598,104      cycles
	589,537,190      cycles:I

there is also a different in system wide mode.

So maybe exclude_idle does do something on x86?  Or am I completely 
misunderstanding what the flag is supposed to be indicating?

Vince

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] perf/core: what is exclude_idle supposed to do
  2018-04-20 16:51         ` Vince Weaver
@ 2018-04-20 18:19           ` Stephane Eranian
  0 siblings, 0 replies; 9+ messages in thread
From: Stephane Eranian @ 2018-04-20 18:19 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, Jiri Olsa, LKML, Arnaldo Carvalho de Melo, mingo,
	Andi Kleen

On Fri, Apr 20, 2018 at 9:51 AM Vince Weaver <vincent.weaver@maine.edu>
wrote:

> On Fri, 20 Apr 2018, Vince Weaver wrote:

> > > AFAICT it works on Power and possibly ARM.
> >
> > at least some ARMs are a bit more honest about it than x86
> >
> > ivybridge:
> >       Performance counter stats for '/bin/ls':
> >       1,368,162      instructions
> >       1,368,162      instructions:I
> >
> > pi2/ARM cortex-A7
> >       Performance counter stats for '/bin/ls':
> >       1,910,083      instructions
> >       <not supported>      instructions:I
> >
> > I'd fire up my Power8 machine to see but not sure it's worth the hassle
> > and/or having to get out the ear protection.

> I did power up the Power8 machine in the end:

> power8:
>          perf stat -e cycles,cycles:I sleep 5
>          Performance counter stats for 'sleep 5':
>          14,271,273      cycles
>          14,271,273      cycles:I

> ???

> But then if I try again on power8

>          perf stat -a -e cycles,cycles:I sleep 5
>           Performance counter stats for 'system wide':
>          1,238,772,322,327      cycles
>          1,238,674,771,713      cycles:I

> there is a difference.

> But then on ivybridge

>          perf stat -a -e cycles,cycles:I sleep 5
>          Performance counter stats for 'system wide':
>          589,598,104      cycles
>          589,537,190      cycles:I

This may be noise.

The way the flag is named leads me to believe its goal is to not count when
executing in the context of the idle task (pid 0 on each CPU).
However, it does not seem to be implemented that way. If you were to
implement this, then in system wide mode you'd have to check
the incoming task on ctxsw, very much like we do in cgroup monitoring. So
it would not be totally free. One can argue, in sampling mode
you can eliminate the samples coming from PID=0 in the tool. But there
would be nothing to cover counting mode.

Interestingly, there is also code in perf tool to exclude known idle
routines from reporting. But this is targeted to only some routines that the
idle task may end up executing. so it is not quite the same.


> So maybe exclude_idle does do something on x86?  Or am I completely
> misunderstanding what the flag is supposed to be indicating?

> Vince

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-04-20 18:19 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-16 22:04 [RFC] perf/core: what is exclude_idle supposed to do Stephane Eranian
2018-04-17  6:20 ` Jiri Olsa
2018-04-18 15:10   ` Vince Weaver
2018-04-20  8:36     ` Peter Zijlstra
2018-04-20 14:18       ` Vince Weaver
2018-04-20 16:51         ` Vince Weaver
2018-04-20 18:19           ` Stephane Eranian
2018-04-17 13:40 ` Arnaldo Carvalho de Melo
2018-04-20  8:35 ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).