linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHSET 00/10] perf sched timehist: Introduce --idle-hist option (v1)
@ 2016-12-06  3:40 Namhyung Kim
  2016-12-06  3:40 ` [PATCH 01/10] perf sched: Cleanup option processing Namhyung Kim
                   ` (10 more replies)
  0 siblings, 11 replies; 28+ messages in thread
From: Namhyung Kim @ 2016-12-06  3:40 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Andi Kleen, Minchan Kim

Hi,

This patchset implements the idle hist feature which analyze reason of system
idle.  Sometimes I need to investigate what makes CPUs to go idle even though
I have jobs to do.  It may be due to I/O, waiting on lock or whatever.

To identify the reasons it only accounts events related to idle task.  Also it
shows callchains when entering to idle and time how long it's in the idle.
Although it's not perfect, it works well to help finding the reasons.

With --idle-hist option, only idle task's runtime is accounted to previous
task (which makes the cpu go to idle).  With --summary(-only) option, you can
see the total idle stat by task or callchains.

The example output looks like below:

  Idle-time summary
                            comm  parent  sched-out    idle-time   min-idle    avg-idle    max-idle  stddev  migrations
                                            (count)       (msec)     (msec)      (msec)      (msec)       %
  ---------------------------------------------------------------------------------------------------------------------
                  rcu_preempt[7]       2         95      550.872      0.011       5.798      23.146    7.63       0
                 migration/1[16]       2          1       15.558     15.558      15.558      15.558    0.00       0
                  khugepaged[39]       2          1        3.062      3.062       3.062       3.062    0.00       0
               kworker/0:1H[124]       2          2        4.728      0.611       2.364       4.116   74.12       0
            systemd-journal[167]       1          1        4.510      4.510       4.510       4.510    0.00       0
              kworker/u16:3[558]       2         13       74.737      0.080       5.749      12.960   21.96       0
             irq/34-iwlwifi[628]       2         21      118.403      0.032       5.638      23.990   24.00       0
              kworker/u17:0[673]       2          1        3.523      3.523       3.523       3.523    0.00       0
                dbus-daemon[722]       1          1        6.743      6.743       6.743       6.743    0.00       0
                    ifplugd[741]       1          1       58.826     58.826      58.826      58.826    0.00       0
            wpa_supplicant[1490]       1          1       13.302     13.302      13.302      13.302    0.00       0
               wpa_actiond[1492]       1          2        4.064      0.168       2.032       3.896   91.72       0
                   dockerd[1500]       1          1        0.055      0.055       0.055       0.055    0.00       0
              dockerd[1516/1500]    1500          1        5.732      5.732       5.732       5.732    0.00       0
              dockerd[1571/1500]    1500          1        0.020      0.020       0.020       0.020    0.00       0
  ...
  
  Idle stats:
      CPU  0 idle for    902.195  msec
      CPU  1 idle for    899.932  msec
      CPU  2 idle for    941.218  msec
      CPU  3 idle for    822.453  msec
  
  Idle stats by callchain:
    CPU  0:   902.195  msec
    Idle time (msec)     Count  Callchains
    ----------------  --------  --------------------------------------------------
             370.589        69  futex_wait_queue_me <- futex_wait <- do_futex <- sys_futex <- entry_SYSCALL_64_fastpath
             178.799        17  worker_thread <- kthread <- ret_from_fork
             128.352        17  schedule_timeout <- rcu_gp_kthread <- kthread <- ret_from_fork
             125.111        19  schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_select <- core_sys_select
              71.599        50  schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- sys_poll
              23.146         1  rcu_gp_kthread <- kthread <- ret_from_fork
               4.510         1  schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- ep_poll <- sys_epoll_wait <- do_syscall_64
               0.085         1  schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- do_restart_poll
  ...


The patch 1-5 are cleanup and preparations so they can be applied separately.
The patch 6-10 implements the -I/--idle-hist option.

The code is available at 'perf/timehist-idle-v1' branch in my tree:

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git 

Any comments are welcome!

Thanks,
Namhyung


Namhyung Kim (10):
  perf sched: Cleanup option processing
  perf tools: Introduce callchain_cursor__copy()
  perf sched timehist: Handle zero sample->tid properly
  perf sched timehist: Split is_idle_sample()
  perf sched timehist: Cleanup idle_max_cpu handling
  perf sched timehist: Introduce struct idle_time_data
  perf sched timehist: Save callchain when entering idle
  perf sched timehist: Skip non-idle events when necessary
  perf sched timehist: Add -I/--idle-hist option
  perf sched timehist: Show callchains for idle stat

 tools/perf/Documentation/perf-sched.txt |   4 +
 tools/perf/builtin-sched.c              | 264 +++++++++++++++++++++++++++-----
 tools/perf/util/callchain.c             |  27 ++++
 tools/perf/util/callchain.h             |   3 +
 4 files changed, 256 insertions(+), 42 deletions(-)

-- 
2.10.1

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 01/10] perf sched: Cleanup option processing
  2016-12-06  3:40 [PATCHSET 00/10] perf sched timehist: Introduce --idle-hist option (v1) Namhyung Kim
@ 2016-12-06  3:40 ` Namhyung Kim
  2016-12-06  3:58   ` David Ahern
  2016-12-07 18:24   ` [tip:perf/core] " tip-bot for Namhyung Kim
  2016-12-06  3:40 ` [PATCH 02/10] perf tools: Introduce callchain_cursor__copy() Namhyung Kim
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 28+ messages in thread
From: Namhyung Kim @ 2016-12-06  3:40 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Andi Kleen, Minchan Kim

The -D/--dump-raw-trace option is in the parent option so no need to
repeat it.  Also move -f/--force option to parent as it's common to
handle data file.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-sched.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 870d94cd20ba..6991686bcaa5 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -2583,6 +2583,7 @@ static int perf_sched__timehist(struct perf_sched *sched)
 	struct perf_data_file file = {
 		.path = input_name,
 		.mode = PERF_DATA_MODE_READ,
+		.force = sched->force,
 	};
 
 	struct perf_session *session;
@@ -2984,6 +2985,7 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 		    "be more verbose (show symbol address, etc)"),
 	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
 		    "dump raw trace in ASCII"),
+	OPT_BOOLEAN('f', "force", &sched.force, "don't complain, do it"),
 	OPT_END()
 	};
 	const struct option latency_options[] = {
@@ -2991,8 +2993,6 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 		   "sort by key(s): runtime, switch, avg, max"),
 	OPT_INTEGER('C', "CPU", &sched.profile_cpu,
 		    "CPU to profile on"),
-	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
-		    "dump raw trace in ASCII"),
 	OPT_BOOLEAN('p', "pids", &sched.skip_merge,
 		    "latency stats per pid instead of per comm"),
 	OPT_PARENT(sched_options)
@@ -3000,9 +3000,6 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 	const struct option replay_options[] = {
 	OPT_UINTEGER('r', "repeat", &sched.replay_repeat,
 		     "repeat the workload replay N times (-1: infinite)"),
-	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
-		    "dump raw trace in ASCII"),
-	OPT_BOOLEAN('f', "force", &sched.force, "don't complain, do it"),
 	OPT_PARENT(sched_options)
 	};
 	const struct option map_options[] = {
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 02/10] perf tools: Introduce callchain_cursor__copy()
  2016-12-06  3:40 [PATCHSET 00/10] perf sched timehist: Introduce --idle-hist option (v1) Namhyung Kim
  2016-12-06  3:40 ` [PATCH 01/10] perf sched: Cleanup option processing Namhyung Kim
@ 2016-12-06  3:40 ` Namhyung Kim
  2016-12-07 18:25   ` [tip:perf/core] perf callchain: " tip-bot for Namhyung Kim
  2016-12-06  3:40 ` [PATCH 03/10] perf sched timehist: Handle zero sample->tid properly Namhyung Kim
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Namhyung Kim @ 2016-12-06  3:40 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Andi Kleen, Minchan Kim

The callchain_cursor__copy() function is to save current callchain
captured by a cursor.  It'll be used to keep callchains when switching
to idle task for each cpu.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/callchain.c | 27 +++++++++++++++++++++++++++
 tools/perf/util/callchain.h |  3 +++
 2 files changed, 30 insertions(+)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 823befd8209a..42922512c1c6 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1234,3 +1234,30 @@ int callchain_node__make_parent_list(struct callchain_node *node)
 	}
 	return -ENOMEM;
 }
+
+int callchain_cursor__copy(struct callchain_cursor *dst,
+			   struct callchain_cursor *src)
+{
+	int rc = 0;
+
+	callchain_cursor_reset(dst);
+	callchain_cursor_commit(src);
+
+	while (true) {
+		struct callchain_cursor_node *node;
+
+		node = callchain_cursor_current(src);
+		if (node == NULL)
+			break;
+
+		rc = callchain_cursor_append(dst, node->ip, node->map, node->sym,
+					     node->branch, &node->branch_flags,
+					     node->nr_loop_iter, node->samples);
+		if (rc)
+			break;
+
+		callchain_cursor_advance(src);
+	}
+
+	return rc;
+}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index d9c70dccf06a..35c8e379530f 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -216,6 +216,9 @@ static inline void callchain_cursor_advance(struct callchain_cursor *cursor)
 	cursor->pos++;
 }
 
+int callchain_cursor__copy(struct callchain_cursor *dst,
+			   struct callchain_cursor *src);
+
 struct option;
 struct hist_entry;
 
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 03/10] perf sched timehist: Handle zero sample->tid properly
  2016-12-06  3:40 [PATCHSET 00/10] perf sched timehist: Introduce --idle-hist option (v1) Namhyung Kim
  2016-12-06  3:40 ` [PATCH 01/10] perf sched: Cleanup option processing Namhyung Kim
  2016-12-06  3:40 ` [PATCH 02/10] perf tools: Introduce callchain_cursor__copy() Namhyung Kim
@ 2016-12-06  3:40 ` Namhyung Kim
  2016-12-06  3:52   ` David Ahern
  2016-12-07 18:25   ` [tip:perf/core] " tip-bot for Namhyung Kim
  2016-12-06  3:40 ` [PATCH 04/10] perf sched timehist: Split is_idle_sample() Namhyung Kim
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 28+ messages in thread
From: Namhyung Kim @ 2016-12-06  3:40 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Andi Kleen, Minchan Kim

Sometimes samples have tid of 0 but non-0 pid.  It ends up having a
new thread of 0 tid/pid (instead of referring idle task) since tid is
used to search matching task.  But I guess it's wrong to use 0 as a
tid when pid is set.  This patch uses tid only if it has a non-zero
value or same as pid (of 0).

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-sched.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 6991686bcaa5..e34a71166b4a 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -2118,7 +2118,9 @@ static struct thread *timehist_get_thread(struct perf_sched *sched,
 			pr_err("Failed to get idle thread for cpu %d.\n", sample->cpu);
 
 	} else {
-		thread = machine__findnew_thread(machine, sample->pid, sample->tid);
+		/* there were samples with tid 0 but non-zero pid */
+		thread = machine__findnew_thread(machine, sample->pid,
+						 sample->tid ?: sample->pid);
 		if (thread == NULL) {
 			pr_debug("Failed to get thread for tid %d. skipping sample.\n",
 				 sample->tid);
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 04/10] perf sched timehist: Split is_idle_sample()
  2016-12-06  3:40 [PATCHSET 00/10] perf sched timehist: Introduce --idle-hist option (v1) Namhyung Kim
                   ` (2 preceding siblings ...)
  2016-12-06  3:40 ` [PATCH 03/10] perf sched timehist: Handle zero sample->tid properly Namhyung Kim
@ 2016-12-06  3:40 ` Namhyung Kim
  2016-12-06  3:57   ` David Ahern
  2016-12-06  3:40 ` [PATCH 05/10] perf sched timehist: Cleanup idle_max_cpu handling Namhyung Kim
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Namhyung Kim @ 2016-12-06  3:40 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Andi Kleen, Minchan Kim

The is_idle_sample() function actually does more than determining
whether sample come from idle task.  Split the callchain part into
save_task_callchain() to make it clearer.  Also checking prev_pid from
trace data looks unnecessary since it should be always same as
sample->pid.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-sched.c | 35 ++++++++++++++++-------------------
 1 file changed, 16 insertions(+), 19 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index e34a71166b4a..5f69c9b9c78f 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -1939,39 +1939,36 @@ static void timehist_update_runtime_stats(struct thread_runtime *r,
 	r->total_run_time += r->dt_run;
 }
 
-static bool is_idle_sample(struct perf_sched *sched,
-			   struct perf_sample *sample,
-			   struct perf_evsel *evsel,
-			   struct machine *machine)
+static bool is_idle_sample(struct perf_sample *sample)
 {
-	struct thread *thread;
-	struct callchain_cursor *cursor = &callchain_cursor;
-
 	/* pid 0 == swapper == idle task */
-	if (sample->pid == 0)
-		return true;
+	return (sample->pid == 0);
+}
 
-	if (strcmp(perf_evsel__name(evsel), "sched:sched_switch") == 0) {
-		if (perf_evsel__intval(evsel, sample, "prev_pid") == 0)
-			return true;
-	}
+static void save_task_callchain(struct perf_sched *sched,
+				struct perf_sample *sample,
+				struct perf_evsel *evsel,
+				struct machine *machine)
+{
+	struct callchain_cursor *cursor = &callchain_cursor;
+	struct thread *thread;
 
 	/* want main thread for process - has maps */
 	thread = machine__findnew_thread(machine, sample->pid, sample->pid);
 	if (thread == NULL) {
 		pr_debug("Failed to get thread for pid %d.\n", sample->pid);
-		return false;
+		return;
 	}
 
 	if (!symbol_conf.use_callchain || sample->callchain == NULL)
-		return false;
+		return;
 
 	if (thread__resolve_callchain(thread, cursor, evsel, sample,
 				      NULL, NULL, sched->max_stack + 2) != 0) {
 		if (verbose)
 			error("Failed to resolve callchain. Skipping\n");
 
-		return false;
+		return;
 	}
 
 	callchain_cursor_commit(cursor);
@@ -1994,8 +1991,6 @@ static bool is_idle_sample(struct perf_sched *sched,
 
 		callchain_cursor_advance(cursor);
 	}
-
-	return false;
 }
 
 /*
@@ -2112,7 +2107,7 @@ static struct thread *timehist_get_thread(struct perf_sched *sched,
 {
 	struct thread *thread;
 
-	if (is_idle_sample(sched, sample, evsel, machine)) {
+	if (is_idle_sample(sample)) {
 		thread = get_idle_thread(sample->cpu);
 		if (thread == NULL)
 			pr_err("Failed to get idle thread for cpu %d.\n", sample->cpu);
@@ -2125,6 +2120,8 @@ static struct thread *timehist_get_thread(struct perf_sched *sched,
 			pr_debug("Failed to get thread for tid %d. skipping sample.\n",
 				 sample->tid);
 		}
+
+		save_task_callchain(sched, sample, evsel, machine);
 	}
 
 	return thread;
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 05/10] perf sched timehist: Cleanup idle_max_cpu handling
  2016-12-06  3:40 [PATCHSET 00/10] perf sched timehist: Introduce --idle-hist option (v1) Namhyung Kim
                   ` (3 preceding siblings ...)
  2016-12-06  3:40 ` [PATCH 04/10] perf sched timehist: Split is_idle_sample() Namhyung Kim
@ 2016-12-06  3:40 ` Namhyung Kim
  2016-12-06  4:00   ` David Ahern
  2016-12-07 18:26   ` [tip:perf/core] " tip-bot for Namhyung Kim
  2016-12-06  3:40 ` [PATCH 06/10] perf sched timehist: Introduce struct idle_time_data Namhyung Kim
                   ` (5 subsequent siblings)
  10 siblings, 2 replies; 28+ messages in thread
From: Namhyung Kim @ 2016-12-06  3:40 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Andi Kleen, Minchan Kim

It treats the idle_max_cpu little bit confusingly IMHO.  Let's make it
more straight forward.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-sched.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 5f69c9b9c78f..26efa99567b8 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -2005,7 +2005,7 @@ static int init_idle_threads(int ncpu)
 	if (!idle_threads)
 		return -ENOMEM;
 
-	idle_max_cpu = ncpu - 1;
+	idle_max_cpu = ncpu;
 
 	/* allocate the actual thread struct if needed */
 	for (i = 0; i < ncpu; ++i) {
@@ -2026,7 +2026,7 @@ static void free_idle_threads(void)
 	if (idle_threads == NULL)
 		return;
 
-	for (i = 0; i <= idle_max_cpu; ++i) {
+	for (i = 0; i < idle_max_cpu; ++i) {
 		if ((idle_threads[i]))
 			thread__delete(idle_threads[i]);
 	}
@@ -2049,8 +2049,7 @@ static struct thread *get_idle_thread(int cpu)
 			return NULL;
 
 		idle_threads = (struct thread **) p;
-		i = idle_max_cpu ? idle_max_cpu + 1 : 0;
-		for (; i < j; ++i)
+		for (i = idle_max_cpu; i < j; ++i)
 			idle_threads[i] = NULL;
 
 		idle_max_cpu = j;
@@ -2492,7 +2491,7 @@ static void timehist_print_summary(struct perf_sched *sched,
 		return;
 
 	printf("\nIdle stats:\n");
-	for (i = 0; i <= idle_max_cpu; ++i) {
+	for (i = 0; i < idle_max_cpu; ++i) {
 		t = idle_threads[i];
 		if (!t)
 			continue;
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 06/10] perf sched timehist: Introduce struct idle_time_data
  2016-12-06  3:40 [PATCHSET 00/10] perf sched timehist: Introduce --idle-hist option (v1) Namhyung Kim
                   ` (4 preceding siblings ...)
  2016-12-06  3:40 ` [PATCH 05/10] perf sched timehist: Cleanup idle_max_cpu handling Namhyung Kim
@ 2016-12-06  3:40 ` Namhyung Kim
  2016-12-06  4:07   ` David Ahern
  2016-12-06  3:40 ` [PATCH 07/10] perf sched timehist: Save callchain when entering idle Namhyung Kim
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Namhyung Kim @ 2016-12-06  3:40 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Andi Kleen, Minchan Kim

The struct idle_time_data is to keep idle stats with callchains entering
to the idle task.  The normal thread_runtime calculation is done
transparently since it extends the struct thread_runtime.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-sched.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 26efa99567b8..d3ee814ce77f 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -230,6 +230,15 @@ struct evsel_runtime {
 	u32 ncpu;       /* highest cpu slot allocated */
 };
 
+/* per cpu idle time data */
+struct idle_thread_runtime {
+	struct thread_runtime tr;
+	struct thread *last_thread;
+	struct rb_root sorted_root;
+	struct callchain_root callchain;
+	struct callchain_cursor cursor;
+};
+
 /* track idle times per cpu */
 static struct thread **idle_threads;
 static int idle_max_cpu;
@@ -2009,11 +2018,22 @@ static int init_idle_threads(int ncpu)
 
 	/* allocate the actual thread struct if needed */
 	for (i = 0; i < ncpu; ++i) {
+		struct idle_thread_runtime *itr;
+
 		idle_threads[i] = thread__new(0, 0);
 		if (idle_threads[i] == NULL)
 			return -ENOMEM;
 
 		thread__set_comm(idle_threads[i], idle_comm, 0);
+
+		itr = zalloc(sizeof(*itr));
+		if (itr == NULL)
+			return -ENOMEM;
+
+		init_stats(&itr->tr.run_stats);
+		callchain_init(&itr->callchain);
+		callchain_cursor_reset(&itr->cursor);
+		thread__set_priv(idle_threads[i], itr);
 	}
 
 	return 0;
@@ -2060,8 +2080,19 @@ static struct thread *get_idle_thread(int cpu)
 		idle_threads[cpu] = thread__new(0, 0);
 
 		if (idle_threads[cpu]) {
+			struct idle_thread_runtime *itr;
+
 			idle_threads[cpu]->tid = 0;
 			thread__set_comm(idle_threads[cpu], idle_comm, 0);
+
+			itr = zalloc(sizeof(*itr));
+			if (itr == NULL)
+				return NULL;
+
+			init_stats(&itr->tr.run_stats);
+			callchain_init(&itr->callchain);
+			callchain_cursor_reset(&itr->cursor);
+			thread__set_priv(idle_threads[cpu], itr);
 		}
 	}
 
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 07/10] perf sched timehist: Save callchain when entering idle
  2016-12-06  3:40 [PATCHSET 00/10] perf sched timehist: Introduce --idle-hist option (v1) Namhyung Kim
                   ` (5 preceding siblings ...)
  2016-12-06  3:40 ` [PATCH 06/10] perf sched timehist: Introduce struct idle_time_data Namhyung Kim
@ 2016-12-06  3:40 ` Namhyung Kim
  2016-12-06  3:40 ` [PATCH 08/10] perf sched timehist: Skip non-idle events when necessary Namhyung Kim
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2016-12-06  3:40 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Andi Kleen, Minchan Kim

In order to investigate reason of idle, it needs to keep the callchains
when entering to idle.  This can be identified sched_switch event having
next_pid as 0.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-sched.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index d3ee814ce77f..058cb549cf3e 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -200,6 +200,7 @@ struct perf_sched {
 	/* options for timehist command */
 	bool		summary;
 	bool		summary_only;
+	bool		idle_hist;
 	bool		show_callchain;
 	unsigned int	max_stack;
 	bool		show_cpu_visual;
@@ -2099,6 +2100,29 @@ static struct thread *get_idle_thread(int cpu)
 	return idle_threads[cpu];
 }
 
+static void save_idle_callchain(struct thread *thread,
+				struct perf_sample *sample)
+{
+	struct thread *idle;
+	struct idle_thread_runtime *itr;
+
+	if (!symbol_conf.use_callchain || sample->callchain == NULL)
+		return;
+
+	idle = get_idle_thread(sample->cpu);
+	if (idle == NULL) {
+		pr_err("Failed to get idle thread for cpu %d.\n", sample->cpu);
+		return;
+	}
+
+	itr = thread__priv(idle);
+	if (itr == NULL)
+		return;
+
+	itr->last_thread = thread;
+	callchain_cursor__copy(&itr->cursor, &callchain_cursor);
+}
+
 /*
  * handle runtime stats saved per thread
  */
@@ -2152,6 +2176,11 @@ static struct thread *timehist_get_thread(struct perf_sched *sched,
 		}
 
 		save_task_callchain(sched, sample, evsel, machine);
+		if (sched->idle_hist) {
+			/* copy task callchain when entering to idle */
+			if (perf_evsel__intval(evsel, sample, "next_pid") == 0)
+				save_idle_callchain(thread, sample);
+		}
 	}
 
 	return thread;
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 08/10] perf sched timehist: Skip non-idle events when necessary
  2016-12-06  3:40 [PATCHSET 00/10] perf sched timehist: Introduce --idle-hist option (v1) Namhyung Kim
                   ` (6 preceding siblings ...)
  2016-12-06  3:40 ` [PATCH 07/10] perf sched timehist: Save callchain when entering idle Namhyung Kim
@ 2016-12-06  3:40 ` Namhyung Kim
  2016-12-06  3:40 ` [PATCH 09/10] perf sched timehist: Add -I/--idle-hist option Namhyung Kim
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2016-12-06  3:40 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Andi Kleen, Minchan Kim

Sometimes it only focuses on idle-related events like upcoming idle-hist
feature.  In this case we don't want to see other event to reduce noise.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-sched.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 058cb549cf3e..8df562ff3a43 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -2187,7 +2187,9 @@ static struct thread *timehist_get_thread(struct perf_sched *sched,
 }
 
 static bool timehist_skip_sample(struct perf_sched *sched,
-				 struct thread *thread)
+				 struct thread *thread,
+				 struct perf_evsel *evsel,
+				 struct perf_sample *sample)
 {
 	bool rc = false;
 
@@ -2196,10 +2198,19 @@ static bool timehist_skip_sample(struct perf_sched *sched,
 		sched->skipped_samples++;
 	}
 
+	if (sched->idle_hist) {
+		if (strcmp(perf_evsel__name(evsel), "sched:sched_switch"))
+			rc = true;
+		else if (perf_evsel__intval(evsel, sample, "prev_pid") != 0 &&
+			 perf_evsel__intval(evsel, sample, "next_pid") != 0)
+			rc = true;
+	}
+
 	return rc;
 }
 
 static void timehist_print_wakeup_event(struct perf_sched *sched,
+					struct perf_evsel *evsel,
 					struct perf_sample *sample,
 					struct machine *machine,
 					struct thread *awakened)
@@ -2212,8 +2223,8 @@ static void timehist_print_wakeup_event(struct perf_sched *sched,
 		return;
 
 	/* show wakeup unless both awakee and awaker are filtered */
-	if (timehist_skip_sample(sched, thread) &&
-	    timehist_skip_sample(sched, awakened)) {
+	if (timehist_skip_sample(sched, thread, evsel, sample) &&
+	    timehist_skip_sample(sched, awakened, evsel, sample)) {
 		return;
 	}
 
@@ -2258,7 +2269,7 @@ static int timehist_sched_wakeup_event(struct perf_tool *tool,
 	/* show wakeups if requested */
 	if (sched->show_wakeups &&
 	    !perf_time__skip_sample(&sched->ptime, sample->time))
-		timehist_print_wakeup_event(sched, sample, machine, thread);
+		timehist_print_wakeup_event(sched, evsel, sample, machine, thread);
 
 	return 0;
 }
@@ -2285,8 +2296,8 @@ static void timehist_print_migration_event(struct perf_sched *sched,
 	if (thread == NULL)
 		return;
 
-	if (timehist_skip_sample(sched, thread) &&
-	    timehist_skip_sample(sched, migrated)) {
+	if (timehist_skip_sample(sched, thread, evsel, sample) &&
+	    timehist_skip_sample(sched, migrated, evsel, sample)) {
 		return;
 	}
 
@@ -2371,7 +2382,7 @@ static int timehist_sched_change_event(struct perf_tool *tool,
 		goto out;
 	}
 
-	if (timehist_skip_sample(sched, thread))
+	if (timehist_skip_sample(sched, thread, evsel, sample))
 		goto out;
 
 	tr = thread__get_runtime(thread);
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 09/10] perf sched timehist: Add -I/--idle-hist option
  2016-12-06  3:40 [PATCHSET 00/10] perf sched timehist: Introduce --idle-hist option (v1) Namhyung Kim
                   ` (7 preceding siblings ...)
  2016-12-06  3:40 ` [PATCH 08/10] perf sched timehist: Skip non-idle events when necessary Namhyung Kim
@ 2016-12-06  3:40 ` Namhyung Kim
  2016-12-06  3:40 ` [PATCH 10/10] perf sched timehist: Show callchains for idle stat Namhyung Kim
  2016-12-06 19:44 ` [PATCHSET 00/10] perf sched timehist: Introduce --idle-hist option (v1) Arnaldo Carvalho de Melo
  10 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2016-12-06  3:40 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Andi Kleen, Minchan Kim

The --idle-hist option is to analyze system idle state so which process
makes cpu to go idle.  If this option is specified, non-idle events will
be skipped and processes switching to/from idle will be shown.

This option is mostly useful when used with --summary(-only) option.  In
the idle-time summary view, idle time is accounted to previous thread
which is run before idle task.

The example output looks like following:

  Idle-time summary
                          comm  parent  sched-out    idle-time   min-idle    avg-idle    max-idle  stddev  migrations
                                          (count)       (msec)     (msec)      (msec)      (msec)       %
  -------------------------------------------------------------------------------------------------------------------
                rcu_preempt[7]       2         95      550.872      0.011       5.798      23.146    7.63       0
               migration/1[16]       2          1       15.558     15.558      15.558      15.558    0.00       0
                khugepaged[39]       2          1        3.062      3.062       3.062       3.062    0.00       0
             kworker/0:1H[124]       2          2        4.728      0.611       2.364       4.116   74.12       0
          systemd-journal[167]       1          1        4.510      4.510       4.510       4.510    0.00       0
            kworker/u16:3[558]       2         13       74.737      0.080       5.749      12.960   21.96       0
           irq/34-iwlwifi[628]       2         21      118.403      0.032       5.638      23.990   24.00       0
            kworker/u17:0[673]       2          1        3.523      3.523       3.523       3.523    0.00       0
              dbus-daemon[722]       1          1        6.743      6.743       6.743       6.743    0.00       0
                  ifplugd[741]       1          1       58.826     58.826      58.826      58.826    0.00       0
          wpa_supplicant[1490]       1          1       13.302     13.302      13.302      13.302    0.00       0
             wpa_actiond[1492]       1          2        4.064      0.168       2.032       3.896   91.72       0
                 dockerd[1500]       1          1        0.055      0.055       0.055       0.055    0.00       0
  ...

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-sched.txt |  4 ++++
 tools/perf/builtin-sched.c              | 38 ++++++++++++++++++++++++++++-----
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/tools/perf/Documentation/perf-sched.txt b/tools/perf/Documentation/perf-sched.txt
index 7775b1eb2bee..76173969ab80 100644
--- a/tools/perf/Documentation/perf-sched.txt
+++ b/tools/perf/Documentation/perf-sched.txt
@@ -132,6 +132,10 @@ OPTIONS for 'perf sched timehist'
 --migrations::
 	Show migration events.
 
+-I::
+--idle-hist::
+	Show idle-related events only.
+
 --time::
 	Only analyze samples within given time window: <start>,<stop>. Times
 	have the format seconds.microseconds. If start is not given (i.e., time
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 8df562ff3a43..7a5a497d9fa2 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -2418,7 +2418,28 @@ static int timehist_sched_change_event(struct perf_tool *tool,
 			t = ptime->end;
 	}
 
-	timehist_update_runtime_stats(tr, t, tprev);
+	if (!sched->idle_hist || thread->tid == 0) {
+		timehist_update_runtime_stats(tr, t, tprev);
+
+		if (sched->idle_hist) {
+			struct idle_thread_runtime *itr = (void *)tr;
+			struct thread_runtime *last_tr;
+
+			BUG_ON(thread->tid != 0);
+
+			if (itr->last_thread == NULL)
+				goto out;
+
+			/* add current idle time as last thread's runtime */
+			last_tr = thread__get_runtime(itr->last_thread);
+			if (last_tr == NULL)
+				goto out;
+
+			timehist_update_runtime_stats(last_tr, t, tprev);
+
+			itr->last_thread = NULL;
+		}
+	}
 
 	if (!sched->summary_only)
 		timehist_print_sample(sched, sample, &al, thread, t);
@@ -2540,9 +2561,15 @@ static void timehist_print_summary(struct perf_sched *sched,
 	if (comm_width < 30)
 		comm_width = 30;
 
-	printf("\nRuntime summary\n");
-	printf("%*s  parent   sched-in  ", comm_width, "comm");
-	printf("   run-time    min-run     avg-run     max-run  stddev  migrations\n");
+	if (sched->idle_hist) {
+		printf("\nIdle-time summary\n");
+		printf("%*s  parent  sched-out  ", comm_width, "comm");
+		printf("  idle-time   min-idle    avg-idle    max-idle  stddev  migrations\n");
+	} else {
+		printf("\nRuntime summary\n");
+		printf("%*s  parent   sched-in  ", comm_width, "comm");
+		printf("   run-time    min-run     avg-run     max-run  stddev  migrations\n");
+	}
 	printf("%*s            (count)  ", comm_width, "");
 	printf("     (msec)     (msec)      (msec)      (msec)       %%\n");
 	printf("%.117s\n", graph_dotted_line);
@@ -2558,7 +2585,7 @@ static void timehist_print_summary(struct perf_sched *sched,
 		printf("<no terminated tasks>\n");
 
 	/* CPU idle stats not tracked when samples were skipped */
-	if (sched->skipped_samples)
+	if (sched->skipped_samples && !sched->idle_hist)
 		return;
 
 	printf("\nIdle stats:\n");
@@ -3100,6 +3127,7 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_BOOLEAN('w', "wakeups", &sched.show_wakeups, "Show wakeup events"),
 	OPT_BOOLEAN('M', "migrations", &sched.show_migrations, "Show migration events"),
 	OPT_BOOLEAN('V', "cpu-visual", &sched.show_cpu_visual, "Add CPU visual"),
+	OPT_BOOLEAN('I', "idle-hist", &sched.idle_hist, "Show idle events only"),
 	OPT_STRING(0, "time", &sched.time_str, "str",
 		   "Time span for analysis (start,stop)"),
 	OPT_PARENT(sched_options)
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 10/10] perf sched timehist: Show callchains for idle stat
  2016-12-06  3:40 [PATCHSET 00/10] perf sched timehist: Introduce --idle-hist option (v1) Namhyung Kim
                   ` (8 preceding siblings ...)
  2016-12-06  3:40 ` [PATCH 09/10] perf sched timehist: Add -I/--idle-hist option Namhyung Kim
@ 2016-12-06  3:40 ` Namhyung Kim
  2016-12-06 19:44 ` [PATCHSET 00/10] perf sched timehist: Introduce --idle-hist option (v1) Arnaldo Carvalho de Melo
  10 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2016-12-06  3:40 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Andi Kleen, Minchan Kim

When --idle-hist option is used with --summary, it now shows idle stats
with callchains like below:

  Idle stats by callchain:
    CPU  0:   902.195  msec
    Idle time (msec)     Count  Callchains
    ----------------  --------  --------------------------------------------------
             370.589        69  futex_wait_queue_me <- futex_wait <- do_futex <- sys_futex <- entry_SYSCALL_64_fastpath
             178.799        17  worker_thread <- kthread <- ret_from_fork
             128.352        17  schedule_timeout <- rcu_gp_kthread <- kthread <- ret_from_fork
             125.111        19  schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_select <- core_sys_select
              71.599        50  schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- sys_poll
              23.146         1  rcu_gp_kthread <- kthread <- ret_from_fork
               4.510         1  schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- ep_poll <- sys_epoll_wait <- do_syscall_64
               0.085         1  schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- do_restart_poll
  ...

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-sched.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 86 insertions(+)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 7a5a497d9fa2..acd8ed5af438 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -2436,6 +2436,9 @@ static int timehist_sched_change_event(struct perf_tool *tool,
 				goto out;
 
 			timehist_update_runtime_stats(last_tr, t, tprev);
+			if (itr->cursor.nr)
+				callchain_append(&itr->callchain, &itr->cursor,
+						 t - tprev);
 
 			itr->last_thread = NULL;
 		}
@@ -2546,6 +2549,60 @@ static int show_deadthread_runtime(struct thread *t, void *priv)
 	return __show_thread_runtime(t, priv);
 }
 
+static size_t callchain__fprintf_folded(FILE *fp, struct callchain_node *node)
+{
+	const char *sep = " <- ";
+	struct callchain_list *chain;
+	size_t ret = 0;
+	char bf[1024];
+	bool first;
+
+	if (node == NULL)
+		return 0;
+
+	ret = callchain__fprintf_folded(fp, node->parent);
+	first = (ret == 0);
+
+	list_for_each_entry(chain, &node->val, list) {
+		if (chain->ip >= PERF_CONTEXT_MAX)
+			continue;
+		if (chain->ms.sym && chain->ms.sym->ignore)
+			continue;
+		ret += fprintf(fp, "%s%s", first ? "" : sep,
+			       callchain_list__sym_name(chain, bf, sizeof(bf),
+							false));
+		first = false;
+	}
+
+	return ret;
+}
+
+static size_t timehist_print_idlehist_callchain(struct rb_root *root)
+{
+	size_t ret = 0;
+	FILE *fp = stdout;
+	struct callchain_node *chain;
+	struct rb_node *rb_node = rb_first(root);
+
+	printf("  %16s  %8s  %s\n", "Idle time (msec)", "Count", "Callchains");
+	printf("  %.16s  %.8s  %.50s\n", graph_dotted_line, graph_dotted_line,
+	       graph_dotted_line);
+
+	while (rb_node) {
+		chain = rb_entry(rb_node, struct callchain_node, rb_node);
+		rb_node = rb_next(rb_node);
+
+		ret += fprintf(fp, "  ");
+		print_sched_time(chain->hit, 12);
+		ret += 16;  /* print_sched_time returns 2nd arg + 4 */
+		ret += fprintf(fp, " %8d  ", chain->count);
+		ret += callchain__fprintf_folded(fp, chain);
+		ret += fprintf(fp, "\n");
+	}
+
+	return ret;
+}
+
 static void timehist_print_summary(struct perf_sched *sched,
 				   struct perf_session *session)
 {
@@ -2604,6 +2661,35 @@ static void timehist_print_summary(struct perf_sched *sched,
 			printf("    CPU %2d idle entire time window\n", i);
 	}
 
+	if (sched->idle_hist && symbol_conf.use_callchain) {
+		callchain_param.mode  = CHAIN_FOLDED;
+		callchain_param.value = CCVAL_PERIOD;
+
+		callchain_register_param(&callchain_param);
+
+		printf("\nIdle stats by callchain:\n");
+		for (i = 0; i < idle_max_cpu; ++i) {
+			struct idle_thread_runtime *itr;
+
+			t = idle_threads[i];
+			if (!t)
+				continue;
+
+			itr = thread__priv(t);
+			if (itr == NULL)
+				continue;
+
+			callchain_param.sort(&itr->sorted_root, &itr->callchain,
+					     0, &callchain_param);
+
+			printf("  CPU %2d:", i);
+			print_sched_time(itr->tr.total_run_time, 6);
+			printf(" msec\n");
+			timehist_print_idlehist_callchain(&itr->sorted_root);
+			printf("\n");
+		}
+	}
+
 	printf("\n"
 	       "    Total number of unique tasks: %" PRIu64 "\n"
 	       "Total number of context switches: %" PRIu64 "\n"
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 03/10] perf sched timehist: Handle zero sample->tid properly
  2016-12-06  3:40 ` [PATCH 03/10] perf sched timehist: Handle zero sample->tid properly Namhyung Kim
@ 2016-12-06  3:52   ` David Ahern
  2016-12-06  3:59     ` Namhyung Kim
  2016-12-07  2:06     ` Namhyung Kim
  2016-12-07 18:25   ` [tip:perf/core] " tip-bot for Namhyung Kim
  1 sibling, 2 replies; 28+ messages in thread
From: David Ahern @ 2016-12-06  3:52 UTC (permalink / raw)
  To: Namhyung Kim, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Andi Kleen, Minchan Kim

On 12/5/16 7:40 PM, Namhyung Kim wrote:
> Sometimes samples have tid of 0 but non-0 pid.  It ends up having a

Any idea how that happens?

> new thread of 0 tid/pid (instead of referring idle task) since tid is
> used to search matching task.  But I guess it's wrong to use 0 as a
> tid when pid is set.  This patch uses tid only if it has a non-zero
> value or same as pid (of 0).

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 04/10] perf sched timehist: Split is_idle_sample()
  2016-12-06  3:40 ` [PATCH 04/10] perf sched timehist: Split is_idle_sample() Namhyung Kim
@ 2016-12-06  3:57   ` David Ahern
  2016-12-07  1:26     ` Namhyung Kim
  0 siblings, 1 reply; 28+ messages in thread
From: David Ahern @ 2016-12-06  3:57 UTC (permalink / raw)
  To: Namhyung Kim, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Andi Kleen, Minchan Kim

On 12/5/16 7:40 PM, Namhyung Kim wrote:
> The is_idle_sample() function actually does more than determining
> whether sample come from idle task.  Split the callchain part into
> save_task_callchain() to make it clearer.  Also checking prev_pid from
> trace data looks unnecessary since it should be always same as
> sample->pid.

The timehist command is 3-4 years old now, and over that time it has been used on a variety of OS'es -- RHEL6, WRL5, WRL6, various Fedora and debian releases, etc.

The idle_sample checks were added for a reason. Given the range of kernels that have sched tracepoints I'd be wary of just removing it regardless of how silly it seems against top of tree kernels. 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 01/10] perf sched: Cleanup option processing
  2016-12-06  3:40 ` [PATCH 01/10] perf sched: Cleanup option processing Namhyung Kim
@ 2016-12-06  3:58   ` David Ahern
  2016-12-07 18:24   ` [tip:perf/core] " tip-bot for Namhyung Kim
  1 sibling, 0 replies; 28+ messages in thread
From: David Ahern @ 2016-12-06  3:58 UTC (permalink / raw)
  To: Namhyung Kim, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Andi Kleen, Minchan Kim

On 12/5/16 7:40 PM, Namhyung Kim wrote:
> The -D/--dump-raw-trace option is in the parent option so no need to
> repeat it.  Also move -f/--force option to parent as it's common to
> handle data file.
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/builtin-sched.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 

Acked-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 03/10] perf sched timehist: Handle zero sample->tid properly
  2016-12-06  3:52   ` David Ahern
@ 2016-12-06  3:59     ` Namhyung Kim
  2016-12-06  4:01       ` David Ahern
  2016-12-07  2:06     ` Namhyung Kim
  1 sibling, 1 reply; 28+ messages in thread
From: Namhyung Kim @ 2016-12-06  3:59 UTC (permalink / raw)
  To: David Ahern
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, Jiri Olsa,
	LKML, Andi Kleen, Minchan Kim

Hi David,

On Tue, Dec 6, 2016 at 12:52 PM, David Ahern <dsahern@gmail.com> wrote:
> On 12/5/16 7:40 PM, Namhyung Kim wrote:
>> Sometimes samples have tid of 0 but non-0 pid.  It ends up having a
>
> Any idea how that happens?

No, I didn't investigate it yet.  Looking at the original code, you
seemed to have same issue and workaround like checking prev_pid or
callchains, right?

Thanks,
Namhyung

>
>> new thread of 0 tid/pid (instead of referring idle task) since tid is
>> used to search matching task.  But I guess it's wrong to use 0 as a
>> tid when pid is set.  This patch uses tid only if it has a non-zero
>> value or same as pid (of 0).

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 05/10] perf sched timehist: Cleanup idle_max_cpu handling
  2016-12-06  3:40 ` [PATCH 05/10] perf sched timehist: Cleanup idle_max_cpu handling Namhyung Kim
@ 2016-12-06  4:00   ` David Ahern
  2016-12-07 18:26   ` [tip:perf/core] " tip-bot for Namhyung Kim
  1 sibling, 0 replies; 28+ messages in thread
From: David Ahern @ 2016-12-06  4:00 UTC (permalink / raw)
  To: Namhyung Kim, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Andi Kleen, Minchan Kim

On 12/5/16 7:40 PM, Namhyung Kim wrote:
> It treats the idle_max_cpu little bit confusingly IMHO.  Let's make it
> more straight forward.
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/builtin-sched.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 

Seems correct. I've lost track of how idle_max_cpu ended up like that.

Acked-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 03/10] perf sched timehist: Handle zero sample->tid properly
  2016-12-06  3:59     ` Namhyung Kim
@ 2016-12-06  4:01       ` David Ahern
  0 siblings, 0 replies; 28+ messages in thread
From: David Ahern @ 2016-12-06  4:01 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, Jiri Olsa,
	LKML, Andi Kleen, Minchan Kim

On 12/5/16 7:59 PM, Namhyung Kim wrote:
> No, I didn't investigate it yet.  Looking at the original code, you
> seemed to have same issue and workaround like checking prev_pid or
> callchains, right?

most likely. As I responded on another patch, the sched timehist command has been used for years on a range of OS'es and kernel versions. Most of the oddities you see are (sometime quick) fixes to strange differences in kernel versions.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 06/10] perf sched timehist: Introduce struct idle_time_data
  2016-12-06  3:40 ` [PATCH 06/10] perf sched timehist: Introduce struct idle_time_data Namhyung Kim
@ 2016-12-06  4:07   ` David Ahern
  2016-12-07  2:12     ` Namhyung Kim
  0 siblings, 1 reply; 28+ messages in thread
From: David Ahern @ 2016-12-06  4:07 UTC (permalink / raw)
  To: Namhyung Kim, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Andi Kleen, Minchan Kim

On 12/5/16 7:40 PM, Namhyung Kim wrote:
> The struct idle_time_data is to keep idle stats with callchains entering
> to the idle task.  The normal thread_runtime calculation is done
> transparently since it extends the struct thread_runtime.
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/builtin-sched.c | 31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
> 
> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> index 26efa99567b8..d3ee814ce77f 100644
> --- a/tools/perf/builtin-sched.c
> +++ b/tools/perf/builtin-sched.c
> @@ -230,6 +230,15 @@ struct evsel_runtime {
>  	u32 ncpu;       /* highest cpu slot allocated */
>  };
>  
> +/* per cpu idle time data */
> +struct idle_thread_runtime {
> +	struct thread_runtime tr;
> +	struct thread *last_thread;
> +	struct rb_root sorted_root;
> +	struct callchain_root callchain;
> +	struct callchain_cursor cursor;
> +};
> +
>  /* track idle times per cpu */
>  static struct thread **idle_threads;
>  static int idle_max_cpu;
> @@ -2009,11 +2018,22 @@ static int init_idle_threads(int ncpu)
>  
>  	/* allocate the actual thread struct if needed */
>  	for (i = 0; i < ncpu; ++i) {
> +		struct idle_thread_runtime *itr;
> +
>  		idle_threads[i] = thread__new(0, 0);
>  		if (idle_threads[i] == NULL)
>  			return -ENOMEM;
>  
>  		thread__set_comm(idle_threads[i], idle_comm, 0);
> +
> +		itr = zalloc(sizeof(*itr));
> +		if (itr == NULL)
> +			return -ENOMEM;
> +
> +		init_stats(&itr->tr.run_stats);
> +		callchain_init(&itr->callchain);
> +		callchain_cursor_reset(&itr->cursor);
> +		thread__set_priv(idle_threads[i], itr);

perhaps the common lines above should be put into init_idle_thread() and called in both places to init a single idle thread instance.


>  	}
>  
>  	return 0;
> @@ -2060,8 +2080,19 @@ static struct thread *get_idle_thread(int cpu)
>  		idle_threads[cpu] = thread__new(0, 0);
>  
>  		if (idle_threads[cpu]) {
> +			struct idle_thread_runtime *itr;
> +
>  			idle_threads[cpu]->tid = 0;
>  			thread__set_comm(idle_threads[cpu], idle_comm, 0);
> +
> +			itr = zalloc(sizeof(*itr));
> +			if (itr == NULL)
> +				return NULL;
> +
> +			init_stats(&itr->tr.run_stats);
> +			callchain_init(&itr->callchain);
> +			callchain_cursor_reset(&itr->cursor);
> +			thread__set_priv(idle_threads[cpu], itr);
>  		}
>  	}
>  
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCHSET 00/10] perf sched timehist: Introduce --idle-hist option (v1)
  2016-12-06  3:40 [PATCHSET 00/10] perf sched timehist: Introduce --idle-hist option (v1) Namhyung Kim
                   ` (9 preceding siblings ...)
  2016-12-06  3:40 ` [PATCH 10/10] perf sched timehist: Show callchains for idle stat Namhyung Kim
@ 2016-12-06 19:44 ` Arnaldo Carvalho de Melo
  2016-12-06 21:31   ` David Ahern
  10 siblings, 1 reply; 28+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-12-06 19:44 UTC (permalink / raw)
  To: David Ahern
  Cc: Namhyung Kim, Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML,
	Andi Kleen, Minchan Kim

Em Tue, Dec 06, 2016 at 12:40:00PM +0900, Namhyung Kim escreveu:
> Hi,
> 
> This patchset implements the idle hist feature which analyze reason of system
> idle.  Sometimes I need to investigate what makes CPUs to go idle even though
> I have jobs to do.  It may be due to I/O, waiting on lock or whatever.

So, on a first round, I've applied the ones David acked, will try
pushing to Ingo today.

David, I still haven't myself reviewed the patches, before that, do you
have any strong objection to the other patches? Will you continue
reviewing them?

- Arnaldo
 
> To identify the reasons it only accounts events related to idle task.  Also it
> shows callchains when entering to idle and time how long it's in the idle.
> Although it's not perfect, it works well to help finding the reasons.
> 
> With --idle-hist option, only idle task's runtime is accounted to previous
> task (which makes the cpu go to idle).  With --summary(-only) option, you can
> see the total idle stat by task or callchains.
> 
> The example output looks like below:
> 
>   Idle-time summary
>                             comm  parent  sched-out    idle-time   min-idle    avg-idle    max-idle  stddev  migrations
>                                             (count)       (msec)     (msec)      (msec)      (msec)       %
>   ---------------------------------------------------------------------------------------------------------------------
>                   rcu_preempt[7]       2         95      550.872      0.011       5.798      23.146    7.63       0
>                  migration/1[16]       2          1       15.558     15.558      15.558      15.558    0.00       0
>                   khugepaged[39]       2          1        3.062      3.062       3.062       3.062    0.00       0
>                kworker/0:1H[124]       2          2        4.728      0.611       2.364       4.116   74.12       0
>             systemd-journal[167]       1          1        4.510      4.510       4.510       4.510    0.00       0
>               kworker/u16:3[558]       2         13       74.737      0.080       5.749      12.960   21.96       0
>              irq/34-iwlwifi[628]       2         21      118.403      0.032       5.638      23.990   24.00       0
>               kworker/u17:0[673]       2          1        3.523      3.523       3.523       3.523    0.00       0
>                 dbus-daemon[722]       1          1        6.743      6.743       6.743       6.743    0.00       0
>                     ifplugd[741]       1          1       58.826     58.826      58.826      58.826    0.00       0
>             wpa_supplicant[1490]       1          1       13.302     13.302      13.302      13.302    0.00       0
>                wpa_actiond[1492]       1          2        4.064      0.168       2.032       3.896   91.72       0
>                    dockerd[1500]       1          1        0.055      0.055       0.055       0.055    0.00       0
>               dockerd[1516/1500]    1500          1        5.732      5.732       5.732       5.732    0.00       0
>               dockerd[1571/1500]    1500          1        0.020      0.020       0.020       0.020    0.00       0
>   ...
>   
>   Idle stats:
>       CPU  0 idle for    902.195  msec
>       CPU  1 idle for    899.932  msec
>       CPU  2 idle for    941.218  msec
>       CPU  3 idle for    822.453  msec
>   
>   Idle stats by callchain:
>     CPU  0:   902.195  msec
>     Idle time (msec)     Count  Callchains
>     ----------------  --------  --------------------------------------------------
>              370.589        69  futex_wait_queue_me <- futex_wait <- do_futex <- sys_futex <- entry_SYSCALL_64_fastpath
>              178.799        17  worker_thread <- kthread <- ret_from_fork
>              128.352        17  schedule_timeout <- rcu_gp_kthread <- kthread <- ret_from_fork
>              125.111        19  schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_select <- core_sys_select
>               71.599        50  schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- sys_poll
>               23.146         1  rcu_gp_kthread <- kthread <- ret_from_fork
>                4.510         1  schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- ep_poll <- sys_epoll_wait <- do_syscall_64
>                0.085         1  schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- do_restart_poll
>   ...
> 
> 
> The patch 1-5 are cleanup and preparations so they can be applied separately.
> The patch 6-10 implements the -I/--idle-hist option.
> 
> The code is available at 'perf/timehist-idle-v1' branch in my tree:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git 
> 
> Any comments are welcome!
> 
> Thanks,
> Namhyung
> 
> 
> Namhyung Kim (10):
>   perf sched: Cleanup option processing
>   perf tools: Introduce callchain_cursor__copy()
>   perf sched timehist: Handle zero sample->tid properly
>   perf sched timehist: Split is_idle_sample()
>   perf sched timehist: Cleanup idle_max_cpu handling
>   perf sched timehist: Introduce struct idle_time_data
>   perf sched timehist: Save callchain when entering idle
>   perf sched timehist: Skip non-idle events when necessary
>   perf sched timehist: Add -I/--idle-hist option
>   perf sched timehist: Show callchains for idle stat
> 
>  tools/perf/Documentation/perf-sched.txt |   4 +
>  tools/perf/builtin-sched.c              | 264 +++++++++++++++++++++++++++-----
>  tools/perf/util/callchain.c             |  27 ++++
>  tools/perf/util/callchain.h             |   3 +
>  4 files changed, 256 insertions(+), 42 deletions(-)
> 
> -- 
> 2.10.1

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCHSET 00/10] perf sched timehist: Introduce --idle-hist option (v1)
  2016-12-06 19:44 ` [PATCHSET 00/10] perf sched timehist: Introduce --idle-hist option (v1) Arnaldo Carvalho de Melo
@ 2016-12-06 21:31   ` David Ahern
  0 siblings, 0 replies; 28+ messages in thread
From: David Ahern @ 2016-12-06 21:31 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Namhyung Kim, Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML,
	Andi Kleen, Minchan Kim

On 12/6/16 11:44 AM, Arnaldo Carvalho de Melo wrote:
> Em Tue, Dec 06, 2016 at 12:40:00PM +0900, Namhyung Kim escreveu:
>> Hi,
>>
>> This patchset implements the idle hist feature which analyze reason of system
>> idle.  Sometimes I need to investigate what makes CPUs to go idle even though
>> I have jobs to do.  It may be due to I/O, waiting on lock or whatever.
> So, on a first round, I've applied the ones David acked, will try
> pushing to Ingo today.
> 
> David, I still haven't myself reviewed the patches, before that, do you
> have any strong objection to the other patches? Will you continue
> reviewing them?

patches 7-10 were fine. It's an interesting feature. My main concern was the changes to detecting idle_sample. If it works properly on RHEL6, RHEL7, and top of tree then it should be fine.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 04/10] perf sched timehist: Split is_idle_sample()
  2016-12-06  3:57   ` David Ahern
@ 2016-12-07  1:26     ` Namhyung Kim
  0 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2016-12-07  1:26 UTC (permalink / raw)
  To: David Ahern
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, Jiri Olsa,
	LKML, Andi Kleen, Minchan Kim

Hi David,

On Mon, Dec 05, 2016 at 07:57:43PM -0800, David Ahern wrote:
> On 12/5/16 7:40 PM, Namhyung Kim wrote:
> > The is_idle_sample() function actually does more than determining
> > whether sample come from idle task.  Split the callchain part into
> > save_task_callchain() to make it clearer.  Also checking prev_pid from
> > trace data looks unnecessary since it should be always same as
> > sample->pid.
> 
> The timehist command is 3-4 years old now, and over that time it has
> been used on a variety of OS'es -- RHEL6, WRL5, WRL6, various Fedora
> and debian releases, etc. 
> 
> The idle_sample checks were added for a reason. Given the range of
> kernels that have sched tracepoints I'd be wary of just removing it
> regardless of how silly it seems against top of tree kernels.

Hmm.. ok.  Maybe I need to add a comment.

What do you think about moving callchain part then?

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 03/10] perf sched timehist: Handle zero sample->tid properly
  2016-12-06  3:52   ` David Ahern
  2016-12-06  3:59     ` Namhyung Kim
@ 2016-12-07  2:06     ` Namhyung Kim
  2016-12-08 14:19       ` Namhyung Kim
  1 sibling, 1 reply; 28+ messages in thread
From: Namhyung Kim @ 2016-12-07  2:06 UTC (permalink / raw)
  To: David Ahern
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, Jiri Olsa,
	LKML, Andi Kleen, Minchan Kim

On Mon, Dec 05, 2016 at 07:52:57PM -0800, David Ahern wrote:
> On 12/5/16 7:40 PM, Namhyung Kim wrote:
> > Sometimes samples have tid of 0 but non-0 pid.  It ends up having a
> 
> Any idea how that happens?

It seems that an exiting task wakes up its parent and the parent might
call wait(2) concurrently.  So at the time of calling last schedule(),
its pid (and tgid) link might be unhashed by the parent and can have 0
sample->tid and/or sample->pid depending on timing IMHO.  Not sure
anything guarantees that the sample tid/pid is preserved during the
event.  From a quick look I couldn't find..

If that's true we need to somehow make sure that sample->tid of 0 is
actually from idle task or not.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 06/10] perf sched timehist: Introduce struct idle_time_data
  2016-12-06  4:07   ` David Ahern
@ 2016-12-07  2:12     ` Namhyung Kim
  0 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2016-12-07  2:12 UTC (permalink / raw)
  To: David Ahern
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, Jiri Olsa,
	LKML, Andi Kleen, Minchan Kim

On Mon, Dec 05, 2016 at 08:07:09PM -0800, David Ahern wrote:
> On 12/5/16 7:40 PM, Namhyung Kim wrote:
> > The struct idle_time_data is to keep idle stats with callchains entering
> > to the idle task.  The normal thread_runtime calculation is done
> > transparently since it extends the struct thread_runtime.
> > 
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  tools/perf/builtin-sched.c | 31 +++++++++++++++++++++++++++++++
> >  1 file changed, 31 insertions(+)
> > 
> > diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> > index 26efa99567b8..d3ee814ce77f 100644
> > --- a/tools/perf/builtin-sched.c
> > +++ b/tools/perf/builtin-sched.c
> > @@ -230,6 +230,15 @@ struct evsel_runtime {
> >  	u32 ncpu;       /* highest cpu slot allocated */
> >  };
> >  
> > +/* per cpu idle time data */
> > +struct idle_thread_runtime {
> > +	struct thread_runtime tr;
> > +	struct thread *last_thread;
> > +	struct rb_root sorted_root;
> > +	struct callchain_root callchain;
> > +	struct callchain_cursor cursor;
> > +};
> > +
> >  /* track idle times per cpu */
> >  static struct thread **idle_threads;
> >  static int idle_max_cpu;
> > @@ -2009,11 +2018,22 @@ static int init_idle_threads(int ncpu)
> >  
> >  	/* allocate the actual thread struct if needed */
> >  	for (i = 0; i < ncpu; ++i) {
> > +		struct idle_thread_runtime *itr;
> > +
> >  		idle_threads[i] = thread__new(0, 0);
> >  		if (idle_threads[i] == NULL)
> >  			return -ENOMEM;
> >  
> >  		thread__set_comm(idle_threads[i], idle_comm, 0);
> > +
> > +		itr = zalloc(sizeof(*itr));
> > +		if (itr == NULL)
> > +			return -ENOMEM;
> > +
> > +		init_stats(&itr->tr.run_stats);
> > +		callchain_init(&itr->callchain);
> > +		callchain_cursor_reset(&itr->cursor);
> > +		thread__set_priv(idle_threads[i], itr);
> 
> perhaps the common lines above should be put into init_idle_thread()
> and called in both places to init a single idle thread instance.

Will change.

Thanks,
Namhyung

> 
> 
> >  	}
> >  
> >  	return 0;
> > @@ -2060,8 +2080,19 @@ static struct thread *get_idle_thread(int cpu)
> >  		idle_threads[cpu] = thread__new(0, 0);
> >  
> >  		if (idle_threads[cpu]) {
> > +			struct idle_thread_runtime *itr;
> > +
> >  			idle_threads[cpu]->tid = 0;
> >  			thread__set_comm(idle_threads[cpu], idle_comm, 0);
> > +
> > +			itr = zalloc(sizeof(*itr));
> > +			if (itr == NULL)
> > +				return NULL;
> > +
> > +			init_stats(&itr->tr.run_stats);
> > +			callchain_init(&itr->callchain);
> > +			callchain_cursor_reset(&itr->cursor);
> > +			thread__set_priv(idle_threads[cpu], itr);
> >  		}
> >  	}
> >  
> > 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [tip:perf/core] perf sched: Cleanup option processing
  2016-12-06  3:40 ` [PATCH 01/10] perf sched: Cleanup option processing Namhyung Kim
  2016-12-06  3:58   ` David Ahern
@ 2016-12-07 18:24   ` tip-bot for Namhyung Kim
  1 sibling, 0 replies; 28+ messages in thread
From: tip-bot for Namhyung Kim @ 2016-12-07 18:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jolsa, acme, tglx, dsahern, andi, namhyung, linux-kernel, hpa,
	peterz, minchan, mingo

Commit-ID:  6fa94258ce2673adc707b2ec5668464f2cf83ed3
Gitweb:     http://git.kernel.org/tip/6fa94258ce2673adc707b2ec5668464f2cf83ed3
Author:     Namhyung Kim <namhyung@kernel.org>
AuthorDate: Tue, 6 Dec 2016 12:40:01 +0900
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 7 Dec 2016 12:00:33 -0300

perf sched: Cleanup option processing

The -D/--dump-raw-trace option is in the parent option so no need to
repeat it.  Also move -f/--force option to parent as it's common to
handle data file.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20161206034010.6499-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-sched.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 84c39f1..634d8cf 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -2583,6 +2583,7 @@ static int perf_sched__timehist(struct perf_sched *sched)
 	struct perf_data_file file = {
 		.path = input_name,
 		.mode = PERF_DATA_MODE_READ,
+		.force = sched->force,
 	};
 
 	struct perf_session *session;
@@ -2988,6 +2989,7 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 		    "be more verbose (show symbol address, etc)"),
 	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
 		    "dump raw trace in ASCII"),
+	OPT_BOOLEAN('f', "force", &sched.force, "don't complain, do it"),
 	OPT_END()
 	};
 	const struct option latency_options[] = {
@@ -2995,8 +2997,6 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 		   "sort by key(s): runtime, switch, avg, max"),
 	OPT_INTEGER('C', "CPU", &sched.profile_cpu,
 		    "CPU to profile on"),
-	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
-		    "dump raw trace in ASCII"),
 	OPT_BOOLEAN('p', "pids", &sched.skip_merge,
 		    "latency stats per pid instead of per comm"),
 	OPT_PARENT(sched_options)
@@ -3004,9 +3004,6 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 	const struct option replay_options[] = {
 	OPT_UINTEGER('r', "repeat", &sched.replay_repeat,
 		     "repeat the workload replay N times (-1: infinite)"),
-	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
-		    "dump raw trace in ASCII"),
-	OPT_BOOLEAN('f', "force", &sched.force, "don't complain, do it"),
 	OPT_PARENT(sched_options)
 	};
 	const struct option map_options[] = {

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip:perf/core] perf callchain: Introduce callchain_cursor__copy()
  2016-12-06  3:40 ` [PATCH 02/10] perf tools: Introduce callchain_cursor__copy() Namhyung Kim
@ 2016-12-07 18:25   ` tip-bot for Namhyung Kim
  0 siblings, 0 replies; 28+ messages in thread
From: tip-bot for Namhyung Kim @ 2016-12-07 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: minchan, peterz, acme, linux-kernel, namhyung, andi, dsahern,
	jolsa, tglx, hpa, mingo

Commit-ID:  571f1eb9b967a52732d2e1f41f1b62e27c900325
Gitweb:     http://git.kernel.org/tip/571f1eb9b967a52732d2e1f41f1b62e27c900325
Author:     Namhyung Kim <namhyung@kernel.org>
AuthorDate: Tue, 6 Dec 2016 12:40:02 +0900
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 7 Dec 2016 12:00:33 -0300

perf callchain: Introduce callchain_cursor__copy()

The callchain_cursor__copy() function is to save current callchain
captured by a cursor.  It'll be used to keep callchains when switching
to idle task for each cpu.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20161206034010.6499-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/callchain.c | 27 +++++++++++++++++++++++++++
 tools/perf/util/callchain.h |  3 +++
 2 files changed, 30 insertions(+)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 823befd..4292251 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1234,3 +1234,30 @@ out:
 	}
 	return -ENOMEM;
 }
+
+int callchain_cursor__copy(struct callchain_cursor *dst,
+			   struct callchain_cursor *src)
+{
+	int rc = 0;
+
+	callchain_cursor_reset(dst);
+	callchain_cursor_commit(src);
+
+	while (true) {
+		struct callchain_cursor_node *node;
+
+		node = callchain_cursor_current(src);
+		if (node == NULL)
+			break;
+
+		rc = callchain_cursor_append(dst, node->ip, node->map, node->sym,
+					     node->branch, &node->branch_flags,
+					     node->nr_loop_iter, node->samples);
+		if (rc)
+			break;
+
+		callchain_cursor_advance(src);
+	}
+
+	return rc;
+}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index d9c70dc..35c8e37 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -216,6 +216,9 @@ static inline void callchain_cursor_advance(struct callchain_cursor *cursor)
 	cursor->pos++;
 }
 
+int callchain_cursor__copy(struct callchain_cursor *dst,
+			   struct callchain_cursor *src);
+
 struct option;
 struct hist_entry;
 

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip:perf/core] perf sched timehist: Handle zero sample->tid properly
  2016-12-06  3:40 ` [PATCH 03/10] perf sched timehist: Handle zero sample->tid properly Namhyung Kim
  2016-12-06  3:52   ` David Ahern
@ 2016-12-07 18:25   ` tip-bot for Namhyung Kim
  1 sibling, 0 replies; 28+ messages in thread
From: tip-bot for Namhyung Kim @ 2016-12-07 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, minchan, hpa, jolsa, peterz, linux-kernel, mingo, acme,
	dsahern, andi, namhyung

Commit-ID:  5d92d96a947a5d0d83710d11750bb29a0c1b985d
Gitweb:     http://git.kernel.org/tip/5d92d96a947a5d0d83710d11750bb29a0c1b985d
Author:     Namhyung Kim <namhyung@kernel.org>
AuthorDate: Tue, 6 Dec 2016 12:40:03 +0900
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 7 Dec 2016 12:00:34 -0300

perf sched timehist: Handle zero sample->tid properly

Sometimes samples have tid of 0 but non-0 pid.  It ends up having a new
thread of 0 tid/pid (instead of referring idle task) since tid is used
to search matching task.  But I guess it's wrong to use 0 as a tid when
pid is set.  This patch uses tid only if it has a non-zero value or same
as pid (of 0).

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20161206034010.6499-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-sched.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 634d8cf..c8b3e6c 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -2118,7 +2118,9 @@ static struct thread *timehist_get_thread(struct perf_sched *sched,
 			pr_err("Failed to get idle thread for cpu %d.\n", sample->cpu);
 
 	} else {
-		thread = machine__findnew_thread(machine, sample->pid, sample->tid);
+		/* there were samples with tid 0 but non-zero pid */
+		thread = machine__findnew_thread(machine, sample->pid,
+						 sample->tid ?: sample->pid);
 		if (thread == NULL) {
 			pr_debug("Failed to get thread for tid %d. skipping sample.\n",
 				 sample->tid);

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip:perf/core] perf sched timehist: Cleanup idle_max_cpu handling
  2016-12-06  3:40 ` [PATCH 05/10] perf sched timehist: Cleanup idle_max_cpu handling Namhyung Kim
  2016-12-06  4:00   ` David Ahern
@ 2016-12-07 18:26   ` tip-bot for Namhyung Kim
  1 sibling, 0 replies; 28+ messages in thread
From: tip-bot for Namhyung Kim @ 2016-12-07 18:26 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: minchan, linux-kernel, namhyung, peterz, dsahern, acme, mingo,
	andi, hpa, tglx, jolsa

Commit-ID:  b336352b417edd8c058c9e09d136728b6e2ff56d
Gitweb:     http://git.kernel.org/tip/b336352b417edd8c058c9e09d136728b6e2ff56d
Author:     Namhyung Kim <namhyung@kernel.org>
AuthorDate: Tue, 6 Dec 2016 12:40:05 +0900
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 7 Dec 2016 12:00:34 -0300

perf sched timehist: Cleanup idle_max_cpu handling

It treats the idle_max_cpu little bit confusingly IMHO.  Let's make it
more straight forward.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20161206034010.6499-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-sched.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index c8b3e6c..1a3f1be 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -2010,7 +2010,7 @@ static int init_idle_threads(int ncpu)
 	if (!idle_threads)
 		return -ENOMEM;
 
-	idle_max_cpu = ncpu - 1;
+	idle_max_cpu = ncpu;
 
 	/* allocate the actual thread struct if needed */
 	for (i = 0; i < ncpu; ++i) {
@@ -2031,7 +2031,7 @@ static void free_idle_threads(void)
 	if (idle_threads == NULL)
 		return;
 
-	for (i = 0; i <= idle_max_cpu; ++i) {
+	for (i = 0; i < idle_max_cpu; ++i) {
 		if ((idle_threads[i]))
 			thread__delete(idle_threads[i]);
 	}
@@ -2054,8 +2054,7 @@ static struct thread *get_idle_thread(int cpu)
 			return NULL;
 
 		idle_threads = (struct thread **) p;
-		i = idle_max_cpu ? idle_max_cpu + 1 : 0;
-		for (; i < j; ++i)
+		for (i = idle_max_cpu; i < j; ++i)
 			idle_threads[i] = NULL;
 
 		idle_max_cpu = j;
@@ -2495,7 +2494,7 @@ static void timehist_print_summary(struct perf_sched *sched,
 		return;
 
 	printf("\nIdle stats:\n");
-	for (i = 0; i <= idle_max_cpu; ++i) {
+	for (i = 0; i < idle_max_cpu; ++i) {
 		t = idle_threads[i];
 		if (!t)
 			continue;

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 03/10] perf sched timehist: Handle zero sample->tid properly
  2016-12-07  2:06     ` Namhyung Kim
@ 2016-12-08 14:19       ` Namhyung Kim
  0 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2016-12-08 14:19 UTC (permalink / raw)
  To: David Ahern
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, Jiri Olsa,
	LKML, Andi Kleen, Minchan Kim

On Wed, Dec 07, 2016 at 11:06:29AM +0900, Namhyung Kim wrote:
> On Mon, Dec 05, 2016 at 07:52:57PM -0800, David Ahern wrote:
> > On 12/5/16 7:40 PM, Namhyung Kim wrote:
> > > Sometimes samples have tid of 0 but non-0 pid.  It ends up having a
> > 
> > Any idea how that happens?
> 
> It seems that an exiting task wakes up its parent and the parent might
> call wait(2) concurrently.  So at the time of calling last schedule(),
> its pid (and tgid) link might be unhashed by the parent and can have 0
> sample->tid and/or sample->pid depending on timing IMHO.  Not sure
> anything guarantees that the sample tid/pid is preserved during the
> event.  From a quick look I couldn't find..

I found following line in my data file.

  $ perf script --time 52460.536907,
  swapper     0 [000] 52460.536907:       sched:sched_switch: pool:3775 [120] x ==> at-spi2-registr:1961 [120]
                    7f3963 __schedule (/lib/modules/4.8.3-1-ARCH/build/vmlinux)
                    7f3d7c schedule (/lib/modules/4.8.3-1-ARCH/build/vmlinux)
                    280b14 do_exit (/lib/modules/4.8.3-1-ARCH/build/vmlinux)
                    280fa7 [unknown] (/lib/modules/4.8.3-1-ARCH/build/vmlinux)
                    7f7cf2 entry_SYSCALL_64_fastpath (/lib/modules/4.8.3-1-ARCH/build/vmlinux)
  ...

As you can see task 3775 called schedule() during exit but sample tid
was 0.  In this case prev_pid is still correct (though it's always pid
in the root namespace) since its number was saved in task_struct at the
time of creation and not changed.

I'll change to check prev_pid for idle task.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2016-12-08 14:20 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-06  3:40 [PATCHSET 00/10] perf sched timehist: Introduce --idle-hist option (v1) Namhyung Kim
2016-12-06  3:40 ` [PATCH 01/10] perf sched: Cleanup option processing Namhyung Kim
2016-12-06  3:58   ` David Ahern
2016-12-07 18:24   ` [tip:perf/core] " tip-bot for Namhyung Kim
2016-12-06  3:40 ` [PATCH 02/10] perf tools: Introduce callchain_cursor__copy() Namhyung Kim
2016-12-07 18:25   ` [tip:perf/core] perf callchain: " tip-bot for Namhyung Kim
2016-12-06  3:40 ` [PATCH 03/10] perf sched timehist: Handle zero sample->tid properly Namhyung Kim
2016-12-06  3:52   ` David Ahern
2016-12-06  3:59     ` Namhyung Kim
2016-12-06  4:01       ` David Ahern
2016-12-07  2:06     ` Namhyung Kim
2016-12-08 14:19       ` Namhyung Kim
2016-12-07 18:25   ` [tip:perf/core] " tip-bot for Namhyung Kim
2016-12-06  3:40 ` [PATCH 04/10] perf sched timehist: Split is_idle_sample() Namhyung Kim
2016-12-06  3:57   ` David Ahern
2016-12-07  1:26     ` Namhyung Kim
2016-12-06  3:40 ` [PATCH 05/10] perf sched timehist: Cleanup idle_max_cpu handling Namhyung Kim
2016-12-06  4:00   ` David Ahern
2016-12-07 18:26   ` [tip:perf/core] " tip-bot for Namhyung Kim
2016-12-06  3:40 ` [PATCH 06/10] perf sched timehist: Introduce struct idle_time_data Namhyung Kim
2016-12-06  4:07   ` David Ahern
2016-12-07  2:12     ` Namhyung Kim
2016-12-06  3:40 ` [PATCH 07/10] perf sched timehist: Save callchain when entering idle Namhyung Kim
2016-12-06  3:40 ` [PATCH 08/10] perf sched timehist: Skip non-idle events when necessary Namhyung Kim
2016-12-06  3:40 ` [PATCH 09/10] perf sched timehist: Add -I/--idle-hist option Namhyung Kim
2016-12-06  3:40 ` [PATCH 10/10] perf sched timehist: Show callchains for idle stat Namhyung Kim
2016-12-06 19:44 ` [PATCHSET 00/10] perf sched timehist: Introduce --idle-hist option (v1) Arnaldo Carvalho de Melo
2016-12-06 21:31   ` David Ahern

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).