linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHSET 0/7] perf sched: Introduce timehist command, again (v1)
@ 2016-11-14 16:12 Namhyung Kim
  2016-11-14 16:12 ` [PATCH RESEND 1/7] perf symbol: Print symbol offsets conditionally Namhyung Kim
                   ` (8 more replies)
  0 siblings, 9 replies; 15+ messages in thread
From: Namhyung Kim @ 2016-11-14 16:12 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, David Ahern
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Stephane Eranian,
	Andi Kleen

Hello,

This patchset is a rebased version of David's sched timehist work [1].
I plan to improve perf sched command more and think that having
timehist command before the work looks good.  It seems David is busy
these days, so I'm retrying it by myself.

This implements only basic feature and a few options.  I just split
the patch to make it easier to review and did some cosmetic changes.
More patches will come later.

The below is from the David's original description:

------------------------8<-------------------------
'perf sched timehist' provides an analysis of scheduling events.

Example usage:
    perf sched record -- sleep 1
    perf sched timehist

By default it shows the individual schedule events, including the time between
sched-in events for the task, the task scheduling delay (time between wakeup
and actually running) and run time for the task:

           time cpu  task name[tid/pid]    b/n time sch delay  run time
  ------------- ---- -------------------- --------- --------- ---------
   79371.874569 [11] gcc[31949]               0.014     0.000     1.148
   79371.874591 [10] gcc[31951]               0.000     0.000     0.024
   79371.874603 [10] migration/10[59]         3.350     0.004     0.011
   79371.874604 [11] <idle>                   1.148     0.000     0.035
   79371.874723 [05] <idle>                   0.016     0.000     1.383
   79371.874746 [05] gcc[31949]               0.153     0.078     0.022
...

Times are in msec.usec.

If callchains were recorded they are appended to the line with a default stack depth of 5:

   79371.874569 [11] gcc[31949]                  0.000014  0.000000  0.001148  wait_for_completion_killable do_fork sys_vfork stub_vfork __vfork
   79371.874591 [10] gcc[31951]                  0.000000  0.000000  0.000024  __cond_resched _cond_resched wait_for_completion stop_one_cpu sched_exec
   79371.874603 [10] migration/10[59]            0.003350  0.000004  0.000011  smpboot_thread_fn kthread ret_from_fork
   79371.874604 [11] <idle>                      0.001148  0.000000  0.000035  cpu_startup_entry start_secondary
   79371.874723 [05] <idle>                      0.000016  0.000000  0.001383  cpu_startup_entry start_secondary
   79371.874746 [05] gcc[31949]                  0.000153  0.000078  0.000022  do_wait sys_wait4 system_call_fastpath __GI___waitpid

--no-call-graph can be used to not show the callchains. --max-stack is used
to control the number of frames shown (default of 5). -x/--excl options can
be used to collapse redundant callchains to get more relevant data on screen.

Similar to perf-trace -s and -S can be used to dump a statistical summary
without or with events (respectively). Statistics include min run time,
average run time and max run time. Stats are also shown for run time by
cpu.

The cpu-visual option provides a visual aid for sched switches by cpu:
...
   79371.874569 [11]            s      gcc[31949]                  0.000014  0.000000  0.001148
   79371.874591 [10]           s       gcc[31951]                  0.000000  0.000000  0.000024
   79371.874603 [10]           s       migration/10[59]            0.003350  0.000004  0.000011
   79371.874604 [11]            i      <idle>                      0.001148  0.000000  0.000035
   79371.874723 [05]      i            <idle>                      0.000016  0.000000  0.001383
   79371.874746 [05]      s            gcc[31949]                  0.000153  0.000078  0.000022
...

------------------------8<-------------------------

This code is available at 'perf/timehist-v1' branch in my tree

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Any feedback is welcomed, David?

Thanks,
Namhyung


[1] https://lkml.org/lkml/2013/12/1/129

David Ahern (6):
  perf sched timehist: Introduce timehist command
  perf sched timehist: Add summary options
  perf sched timehist: Add -w/--wakeups option
  perf sched timehist: Add call graph options
  perf sched timehist: Add -V/--cpu-visual option
  perf sched: Add documentation for timehist options

Namhyung Kim (1):
  perf symbol: Print symbol offsets conditionally

 tools/perf/Documentation/perf-sched.txt |  46 +-
 tools/perf/builtin-sched.c              | 912 +++++++++++++++++++++++++++++++-
 tools/perf/util/evsel_fprintf.c         |   6 +-
 tools/perf/util/symbol.h                |   3 +-
 tools/perf/util/symbol_fprintf.c        |  11 +-
 5 files changed, 963 insertions(+), 15 deletions(-)

-- 
2.10.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH RESEND 1/7] perf symbol: Print symbol offsets conditionally
  2016-11-14 16:12 [PATCHSET 0/7] perf sched: Introduce timehist command, again (v1) Namhyung Kim
@ 2016-11-14 16:12 ` Namhyung Kim
  2016-11-14 16:12 ` [PATCH 2/7] perf sched timehist: Introduce timehist command Namhyung Kim
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Namhyung Kim @ 2016-11-14 16:12 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, David Ahern
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Stephane Eranian,
	Andi Kleen

The __symbol__fprintf_symname_offs() always shows symbol offsets.  So
there's no difference between 'perf script -F ip,sym' and 'perf script
-F ip,sym,symoff'.  I don't think it's a desired behavior..

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/evsel_fprintf.c  |  6 ++++--
 tools/perf/util/symbol.h         |  3 ++-
 tools/perf/util/symbol_fprintf.c | 11 ++++++-----
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/evsel_fprintf.c b/tools/perf/util/evsel_fprintf.c
index 662a0a6182e7..ccb602397b60 100644
--- a/tools/perf/util/evsel_fprintf.c
+++ b/tools/perf/util/evsel_fprintf.c
@@ -137,7 +137,8 @@ int sample__fprintf_callchain(struct perf_sample *sample, int left_alignment,
 
 				if (print_symoffset) {
 					printed += __symbol__fprintf_symname_offs(node->sym, &node_al,
-										  print_unknown_as_addr, fp);
+										  print_unknown_as_addr,
+										  true, fp);
 				} else {
 					printed += __symbol__fprintf_symname(node->sym, &node_al,
 									     print_unknown_as_addr, fp);
@@ -188,7 +189,8 @@ int sample__fprintf_sym(struct perf_sample *sample, struct addr_location *al,
 			printed += fprintf(fp, " ");
 			if (print_symoffset) {
 				printed += __symbol__fprintf_symname_offs(al->sym, al,
-									  print_unknown_as_addr, fp);
+									  print_unknown_as_addr,
+									  true, fp);
 			} else {
 				printed += __symbol__fprintf_symname(al->sym, al,
 								     print_unknown_as_addr, fp);
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index d964844eb314..18f63f65607a 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -281,7 +281,8 @@ int symbol__annotation_init(void);
 struct symbol *symbol__new(u64 start, u64 len, u8 binding, const char *name);
 size_t __symbol__fprintf_symname_offs(const struct symbol *sym,
 				      const struct addr_location *al,
-				      bool unknown_as_addr, FILE *fp);
+				      bool unknown_as_addr,
+				      bool print_offsets, FILE *fp);
 size_t symbol__fprintf_symname_offs(const struct symbol *sym,
 				    const struct addr_location *al, FILE *fp);
 size_t __symbol__fprintf_symname(const struct symbol *sym,
diff --git a/tools/perf/util/symbol_fprintf.c b/tools/perf/util/symbol_fprintf.c
index a680bdaa65dc..7c6b33e8e2d2 100644
--- a/tools/perf/util/symbol_fprintf.c
+++ b/tools/perf/util/symbol_fprintf.c
@@ -15,14 +15,15 @@ size_t symbol__fprintf(struct symbol *sym, FILE *fp)
 
 size_t __symbol__fprintf_symname_offs(const struct symbol *sym,
 				      const struct addr_location *al,
-				      bool unknown_as_addr, FILE *fp)
+				      bool unknown_as_addr,
+				      bool print_offsets, FILE *fp)
 {
 	unsigned long offset;
 	size_t length;
 
 	if (sym && sym->name) {
 		length = fprintf(fp, "%s", sym->name);
-		if (al) {
+		if (al && print_offsets) {
 			if (al->addr < sym->end)
 				offset = al->addr - sym->start;
 			else
@@ -40,19 +41,19 @@ size_t symbol__fprintf_symname_offs(const struct symbol *sym,
 				    const struct addr_location *al,
 				    FILE *fp)
 {
-	return __symbol__fprintf_symname_offs(sym, al, false, fp);
+	return __symbol__fprintf_symname_offs(sym, al, false, true, fp);
 }
 
 size_t __symbol__fprintf_symname(const struct symbol *sym,
 				 const struct addr_location *al,
 				 bool unknown_as_addr, FILE *fp)
 {
-	return __symbol__fprintf_symname_offs(sym, al, unknown_as_addr, fp);
+	return __symbol__fprintf_symname_offs(sym, al, unknown_as_addr, false, fp);
 }
 
 size_t symbol__fprintf_symname(const struct symbol *sym, FILE *fp)
 {
-	return __symbol__fprintf_symname_offs(sym, NULL, false, fp);
+	return __symbol__fprintf_symname_offs(sym, NULL, false, false, fp);
 }
 
 size_t dso__fprintf_symbols_by_name(struct dso *dso,
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/7] perf sched timehist: Introduce timehist command
  2016-11-14 16:12 [PATCHSET 0/7] perf sched: Introduce timehist command, again (v1) Namhyung Kim
  2016-11-14 16:12 ` [PATCH RESEND 1/7] perf symbol: Print symbol offsets conditionally Namhyung Kim
@ 2016-11-14 16:12 ` Namhyung Kim
  2016-11-14 16:12 ` [PATCH 3/7] perf sched timehist: Add summary options Namhyung Kim
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Namhyung Kim @ 2016-11-14 16:12 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, David Ahern
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Stephane Eranian,
	Andi Kleen

From: David Ahern <dsahern@gmail.com>

'perf sched timehist' provides an analysis of scheduling events.

Example usage:
    perf sched record -- sleep 1
    perf sched timehist

By default it shows the individual schedule events, including the time between
sched-in events for the task, the task scheduling delay (time between wakeup
and actually running) and run time for the task:

            time    cpu  task name              b/n time  sch delay   run time
                         [tid/pid]                (msec)     (msec)     (msec)
  -------------- ------  --------------------  ---------  ---------  ---------
    79371.874569 [0011]  gcc[31949]                0.014      0.000      1.148
    79371.874591 [0010]  gcc[31951]                0.000      0.000      0.024
    79371.874603 [0010]  migration/10[59]          3.350      0.004      0.011
    79371.874604 [0011]  <idle>                    1.148      0.000      0.035
    79371.874723 [0005]  <idle>                    0.016      0.000      1.383
    79371.874746 [0005]  gcc[31949]                0.153      0.078      0.022
...

Times are in msec.usec.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-sched.c | 595 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 590 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index fb3441211e4b..f55900c833dd 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -13,12 +13,14 @@
 #include "util/cloexec.h"
 #include "util/thread_map.h"
 #include "util/color.h"
+#include "util/stat.h"
 
 #include <subcmd/parse-options.h>
 #include "util/trace-event.h"
 
 #include "util/debug.h"
 
+#include <linux/log2.h>
 #include <sys/prctl.h>
 #include <sys/resource.h>
 
@@ -194,6 +196,29 @@ struct perf_sched {
 	struct perf_sched_map map;
 };
 
+/* per thread run time data */
+struct thread_runtime {
+	u64 last_time;      /* time of previous sched in/out event */
+	u64 dt_run;         /* run time */
+	u64 dt_between;     /* time between CPU access (off cpu) */
+	u64 dt_delay;       /* time between wakeup and sched-in */
+	u64 ready_to_run;   /* time of wakeup */
+
+	struct stats run_stats;
+	u64 total_run_time;
+};
+
+/* per event run time data */
+struct evsel_runtime {
+	u64 *last_time; /* time this event was last seen per cpu */
+	u32 ncpu;       /* highest cpu slot allocated */
+};
+
+/* track idle times per cpu */
+static struct thread **idle_threads;
+static int idle_max_cpu;
+static char idle_comm[] = "<idle>";
+
 static u64 get_nsecs(void)
 {
 	struct timespec ts;
@@ -1654,6 +1679,547 @@ static int perf_sched__read_events(struct perf_sched *sched)
 	return rc;
 }
 
+/*
+ * scheduling times are printed as msec.usec
+ */
+static inline void print_sched_time(unsigned long long nsecs, int width)
+{
+	unsigned long msecs;
+	unsigned long usecs;
+
+	msecs  = nsecs / NSEC_PER_MSEC;
+	nsecs -= msecs * NSEC_PER_MSEC;
+	usecs  = nsecs / NSEC_PER_USEC;
+	printf("%*lu.%03lu ", width, msecs, usecs);
+}
+
+/*
+ * returns runtime data for event, allocating memory for it the
+ * first time it is used.
+ */
+static struct evsel_runtime *perf_evsel__get_runtime(struct perf_evsel *evsel)
+{
+	struct evsel_runtime *r = evsel->priv;
+
+	if (r == NULL) {
+		r = zalloc(sizeof(struct evsel_runtime));
+		evsel->priv = r;
+	}
+
+	return r;
+}
+
+/*
+ * save last time event was seen per cpu
+ */
+static void perf_evsel__save_time(struct perf_evsel *evsel,
+				  u64 timestamp, u32 cpu)
+{
+	struct evsel_runtime *r = perf_evsel__get_runtime(evsel);
+
+	if (r == NULL)
+		return;
+
+	if ((cpu >= r->ncpu) || (r->last_time == NULL)) {
+		int i, n = __roundup_pow_of_two(cpu+1);
+		void *p = r->last_time;
+
+		p = realloc(r->last_time, n * sizeof(u64));
+		if (!p)
+			return;
+
+		r->last_time = p;
+		i = r->ncpu ? r->ncpu + 1 : 0;
+		for (; i < n; ++i)
+			r->last_time[i] = (u64) 0;
+
+		r->ncpu = n;
+	}
+
+	r->last_time[cpu] = timestamp;
+}
+
+/* returns last time this event was seen on the given cpu */
+static u64 perf_evsel__get_time(struct perf_evsel *evsel, u32 cpu)
+{
+	struct evsel_runtime *r = perf_evsel__get_runtime(evsel);
+
+	if ((r == NULL) || (r->last_time == NULL) || (cpu > r->ncpu))
+		return 0;
+
+	return r->last_time[cpu];
+}
+
+static int comm_width = 20;
+
+static char *timehist_get_commstr(struct thread *thread)
+{
+	static char str[32];
+	const char *comm = thread__comm_str(thread);
+	pid_t tid = thread->tid;
+	pid_t pid = thread->pid_;
+	int n;
+
+	if (pid == 0)
+		n = scnprintf(str, sizeof(str), "%s", comm);
+
+	else if (tid != pid)
+		n = scnprintf(str, sizeof(str), "%s[%d/%d]", comm, tid, pid);
+
+	else
+		n = scnprintf(str, sizeof(str), "%s[%d]", comm, tid);
+
+	if (n > comm_width)
+		comm_width = n;
+
+	return str;
+}
+
+static void timehist_header(void)
+{
+	printf("%15s %6s ", "time", "cpu");
+
+	printf(" %-20s  %9s  %9s  %9s",
+		"task name", "b/n time", "sch delay", "run time");
+
+	printf("\n");
+
+	/*
+	 * units row
+	 */
+	printf("%15s %-6s ", "", "");
+
+	printf(" %-20s  %9s  %9s  %9s\n", "[tid/pid]", "(msec)", "(msec)", "(msec)");
+
+	/*
+	 * separator
+	 */
+	printf("%.15s %.6s ", graph_dotted_line, graph_dotted_line);
+
+	printf(" %.20s  %.9s  %.9s  %.9s",
+		graph_dotted_line, graph_dotted_line, graph_dotted_line,
+		graph_dotted_line);
+
+	printf("\n");
+}
+
+static void timehist_print_sample(struct perf_sample *sample,
+				  struct thread *thread)
+{
+	struct thread_runtime *tr = thread__priv(thread);
+	char tstr[64];
+
+	timestamp__scnprintf_usec(sample->time, tstr, sizeof(tstr));
+	printf("%15s [%04d] ", tstr, sample->cpu);
+
+	printf(" %-*s ", comm_width, timehist_get_commstr(thread));
+
+	print_sched_time(tr->dt_between, 6);
+	print_sched_time(tr->dt_delay, 6);
+	print_sched_time(tr->dt_run, 6);
+	printf("\n");
+}
+
+/*
+ * Explanation of delta-time stats:
+ *
+ *            t = time of current schedule out event
+ *        tprev = time of previous sched out event
+ *                also time of schedule-in event for current task
+ *    last_time = time of last sched change event for current task
+ *                (i.e, time process was last scheduled out)
+ * ready_to_run = time of wakeup for current task
+ *
+ * -----|------------|------------|------------|------
+ *    last         ready        tprev          t
+ *    time         to run
+ *
+ *      |------- dt_between ------|
+ *                   |- dt_delay -|-- dt_run --|
+ *
+ *     dt_run = run time of current task
+ * dt_between = time between last schedule out event for task and tprev
+ *              represents time spent off the cpu
+ *   dt_delay = time between wakeup and schedule-in of task
+ */
+
+static void timehist_update_runtime_stats(struct thread_runtime *r,
+					 u64 t, u64 tprev)
+{
+	r->dt_delay   = 0;
+	r->dt_between = 0;
+	r->dt_run     = 0;
+	if (tprev) {
+		r->dt_run = t - tprev;
+		if (r->ready_to_run) {
+			if (r->ready_to_run > tprev)
+				pr_debug("time travel: wakeup time for task > previous sched_switch event\n");
+			else
+				r->dt_delay = tprev - r->ready_to_run;
+		}
+
+		if (r->last_time > tprev)
+			pr_debug("time travel: last sched out time for task > previous sched_switch event\n");
+		else if (r->last_time)
+			r->dt_between = tprev - r->last_time;
+	}
+
+	update_stats(&r->run_stats, r->dt_run);
+	r->total_run_time += r->dt_run;
+}
+
+static bool is_idle_sample(struct perf_sample *sample,
+			   struct perf_evsel *evsel)
+{
+	/* pid 0 == swapper == idle task */
+	if (sample->pid == 0)
+		return true;
+
+	if (strcmp(perf_evsel__name(evsel), "sched:sched_switch") == 0) {
+		if (perf_evsel__intval(evsel, sample, "prev_pid") == 0)
+			return true;
+	}
+	return false;
+}
+
+/*
+ * Track idle stats per cpu by maintaining a local thread
+ * struct for the idle task on each cpu.
+ */
+static int init_idle_threads(int ncpu)
+{
+	int i;
+
+	idle_threads = zalloc(ncpu * sizeof(struct thread *));
+	if (!idle_threads)
+		return -ENOMEM;
+
+	idle_max_cpu = ncpu - 1;
+
+	/* allocate the actual thread struct if needed */
+	for (i = 0; i < ncpu; ++i) {
+		idle_threads[i] = thread__new(0, 0);
+		if (idle_threads[i] == NULL)
+			return -ENOMEM;
+
+		thread__set_comm(idle_threads[i], idle_comm, 0);
+	}
+
+	return 0;
+}
+
+static void free_idle_threads(void)
+{
+	int i;
+
+	if (idle_threads == NULL)
+		return;
+
+	for (i = 0; i <= idle_max_cpu; ++i) {
+		if ((idle_threads[i]))
+			thread__delete(idle_threads[i]);
+	}
+
+	free(idle_threads);
+}
+
+static struct thread *get_idle_thread(int cpu)
+{
+	/*
+	 * expand/allocate array of pointers to local thread
+	 * structs if needed
+	 */
+	if ((cpu >= idle_max_cpu) || (idle_threads == NULL)) {
+		int i, j = __roundup_pow_of_two(cpu+1);
+		void *p;
+
+		p = realloc(idle_threads, j * sizeof(struct thread *));
+		if (!p)
+			return NULL;
+
+		idle_threads = (struct thread **) p;
+		i = idle_max_cpu ? idle_max_cpu + 1 : 0;
+		for (; i < j; ++i)
+			idle_threads[i] = NULL;
+
+		idle_max_cpu = j;
+	}
+
+	/* allocate a new thread struct if needed */
+	if (idle_threads[cpu] == NULL) {
+		idle_threads[cpu] = thread__new(0, 0);
+
+		if (idle_threads[cpu]) {
+			idle_threads[cpu]->tid = 0;
+			thread__set_comm(idle_threads[cpu], idle_comm, 0);
+		}
+	}
+
+	return idle_threads[cpu];
+}
+
+/*
+ * handle runtime stats saved per thread
+ */
+static struct thread_runtime *thread__init_runtime(struct thread *thread)
+{
+	struct thread_runtime *r;
+
+	r = zalloc(sizeof(struct thread_runtime));
+	if (!r)
+		return NULL;
+
+	init_stats(&r->run_stats);
+	thread__set_priv(thread, r);
+
+	return r;
+}
+
+static struct thread_runtime *thread__get_runtime(struct thread *thread)
+{
+	struct thread_runtime *tr;
+
+	tr = thread__priv(thread);
+	if (tr == NULL) {
+		tr = thread__init_runtime(thread);
+		if (tr == NULL)
+			pr_debug("Failed to malloc memory for runtime data.\n");
+	}
+
+	return tr;
+}
+
+static struct thread *timehist_get_thread(struct perf_sample *sample,
+					  struct machine *machine,
+					  struct perf_evsel *evsel)
+{
+	struct thread *thread;
+
+	if (is_idle_sample(sample, evsel)) {
+		thread = get_idle_thread(sample->cpu);
+		if (thread == NULL)
+			pr_err("Failed to get idle thread for cpu %d.\n", sample->cpu);
+
+	} else {
+		thread = machine__findnew_thread(machine, sample->pid, sample->tid);
+		if (thread == NULL) {
+			pr_debug("Failed to get thread for tid %d. skipping sample.\n",
+				 sample->tid);
+		}
+	}
+
+	return thread;
+}
+
+static bool timehist_skip_sample(struct thread *thread)
+{
+	bool rc = false;
+
+	if (thread__is_filtered(thread))
+		rc = true;
+
+	return rc;
+}
+
+static int timehist_sched_wakeup_event(struct perf_tool *tool __maybe_unused,
+				       union perf_event *event __maybe_unused,
+				       struct perf_evsel *evsel,
+				       struct perf_sample *sample,
+				       struct machine *machine)
+{
+	struct thread *thread;
+	struct thread_runtime *tr = NULL;
+	/* want pid of awakened task not pid in sample */
+	const u32 pid = perf_evsel__intval(evsel, sample, "pid");
+
+	thread = machine__findnew_thread(machine, 0, pid);
+	if (thread == NULL)
+		return -1;
+
+	tr = thread__get_runtime(thread);
+	if (tr == NULL)
+		return -1;
+
+	if (tr->ready_to_run == 0)
+		tr->ready_to_run = sample->time;
+
+	return 0;
+}
+
+static int timehist_sched_change_event(struct perf_tool *tool __maybe_unused,
+				       union perf_event *event,
+				       struct perf_evsel *evsel,
+				       struct perf_sample *sample,
+				       struct machine *machine)
+{
+	struct addr_location al;
+	struct thread *thread;
+	struct thread_runtime *tr = NULL;
+	u64 tprev;
+	int rc = 0;
+
+	if (machine__resolve(machine, &al, sample) < 0) {
+		pr_err("problem processing %d event. skipping it\n",
+		       event->header.type);
+		rc = -1;
+		goto out;
+	}
+
+	thread = timehist_get_thread(sample, machine, evsel);
+	if (thread == NULL) {
+		rc = -1;
+		goto out;
+	}
+
+	if (timehist_skip_sample(thread))
+		goto out;
+
+	tr = thread__get_runtime(thread);
+	if (tr == NULL) {
+		rc = -1;
+		goto out;
+	}
+
+	tprev = perf_evsel__get_time(evsel, sample->cpu);
+
+	timehist_update_runtime_stats(tr, sample->time, tprev);
+	timehist_print_sample(sample, thread);
+
+out:
+	if (tr) {
+		/* time of this sched_switch event becomes last time task seen */
+		tr->last_time = sample->time;
+
+		/* sched out event for task so reset ready to run time */
+		tr->ready_to_run = 0;
+	}
+
+	perf_evsel__save_time(evsel, sample->time, sample->cpu);
+
+	return rc;
+}
+
+static int timehist_sched_switch_event(struct perf_tool *tool,
+			     union perf_event *event,
+			     struct perf_evsel *evsel,
+			     struct perf_sample *sample,
+			     struct machine *machine __maybe_unused)
+{
+	return timehist_sched_change_event(tool, event, evsel, sample, machine);
+}
+
+static int process_lost(struct perf_tool *tool __maybe_unused,
+			union perf_event *event,
+			struct perf_sample *sample,
+			struct machine *machine __maybe_unused)
+{
+	char tstr[64];
+
+	timestamp__scnprintf_usec(sample->time, tstr, sizeof(tstr));
+	printf("%15s ", tstr);
+	printf("lost %" PRIu64 " events on cpu %d\n", event->lost.lost, sample->cpu);
+
+	return 0;
+}
+
+
+typedef int (*sched_handler)(struct perf_tool *tool,
+			  union perf_event *event,
+			  struct perf_evsel *evsel,
+			  struct perf_sample *sample,
+			  struct machine *machine);
+
+static int perf_timehist__process_sample(struct perf_tool *tool,
+					 union perf_event *event,
+					 struct perf_sample *sample,
+					 struct perf_evsel *evsel,
+					 struct machine *machine)
+{
+	struct perf_sched *sched = container_of(tool, struct perf_sched, tool);
+	int err = 0;
+	int this_cpu = sample->cpu;
+
+	if (this_cpu > sched->max_cpu)
+		sched->max_cpu = this_cpu;
+
+	if (evsel->handler != NULL) {
+		sched_handler f = evsel->handler;
+
+		err = f(tool, event, evsel, sample, machine);
+	}
+
+	return err;
+}
+
+static int perf_sched__timehist(struct perf_sched *sched)
+{
+	const struct perf_evsel_str_handler handlers[] = {
+		{ "sched:sched_switch",       timehist_sched_switch_event, },
+		{ "sched:sched_wakeup",	      timehist_sched_wakeup_event, },
+		{ "sched:sched_wakeup_new",   timehist_sched_wakeup_event, },
+	};
+	struct perf_data_file file = {
+		.path = input_name,
+		.mode = PERF_DATA_MODE_READ,
+	};
+
+	struct perf_session *session;
+	int err = -1;
+
+	/*
+	 * event handlers for timehist option
+	 */
+	sched->tool.sample	 = perf_timehist__process_sample;
+	sched->tool.mmap	 = perf_event__process_mmap;
+	sched->tool.comm	 = perf_event__process_comm;
+	sched->tool.exit	 = perf_event__process_exit;
+	sched->tool.fork	 = perf_event__process_fork;
+	sched->tool.lost	 = process_lost;
+	sched->tool.attr	 = perf_event__process_attr;
+	sched->tool.tracing_data = perf_event__process_tracing_data;
+	sched->tool.build_id	 = perf_event__process_build_id;
+
+	sched->tool.ordered_events = true;
+	sched->tool.ordering_requires_timestamps = true;
+
+	session = perf_session__new(&file, false, &sched->tool);
+	if (session == NULL)
+		return -ENOMEM;
+
+	symbol__init(&session->header.env);
+
+	setup_pager();
+
+	/* setup per-evsel handlers */
+	if (perf_session__set_tracepoints_handlers(session, handlers))
+		goto out;
+
+	if (!perf_session__has_traces(session, "record -R"))
+		goto out;
+
+	/* pre-allocate struct for per-CPU idle stats */
+	sched->max_cpu = session->header.env.nr_cpus_online;
+	if (sched->max_cpu == 0)
+		sched->max_cpu = 4;
+	if (init_idle_threads(sched->max_cpu))
+		goto out;
+
+	timehist_header();
+
+	err = perf_session__process_events(session);
+	if (err) {
+		pr_err("Failed to process events, error %d", err);
+		goto out;
+	}
+
+out:
+	free_idle_threads();
+	perf_session__delete(session);
+
+	return err;
+}
+
+
 static void print_bad_events(struct perf_sched *sched)
 {
 	if (sched->nr_unordered_timestamps && sched->nr_timestamps) {
@@ -1970,8 +2536,6 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 	const struct option latency_options[] = {
 	OPT_STRING('s', "sort", &sched.sort_order, "key[,key2...]",
 		   "sort by key(s): runtime, switch, avg, max"),
-	OPT_INCR('v', "verbose", &verbose,
-		    "be more verbose (show symbol address, etc)"),
 	OPT_INTEGER('C', "CPU", &sched.profile_cpu,
 		    "CPU to profile on"),
 	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
@@ -1983,8 +2547,6 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 	const struct option replay_options[] = {
 	OPT_UINTEGER('r', "repeat", &sched.replay_repeat,
 		     "repeat the workload replay N times (-1: infinite)"),
-	OPT_INCR('v', "verbose", &verbose,
-		    "be more verbose (show symbol address, etc)"),
 	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
 		    "dump raw trace in ASCII"),
 	OPT_BOOLEAN('f', "force", &sched.force, "don't complain, do it"),
@@ -2001,6 +2563,16 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
                     "display given CPUs in map"),
 	OPT_PARENT(sched_options)
 	};
+	const struct option timehist_options[] = {
+	OPT_STRING('k', "vmlinux", &symbol_conf.vmlinux_name,
+		   "file", "vmlinux pathname"),
+	OPT_STRING(0, "kallsyms", &symbol_conf.kallsyms_name,
+		   "file", "kallsyms pathname"),
+	OPT_STRING(0, "symfs", &symbol_conf.symfs, "directory",
+		    "Look for files with symbols relative to this directory"),
+	OPT_PARENT(sched_options)
+	};
+
 	const char * const latency_usage[] = {
 		"perf sched latency [<options>]",
 		NULL
@@ -2013,8 +2585,13 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 		"perf sched map [<options>]",
 		NULL
 	};
+	const char * const timehist_usage[] = {
+		"perf sched timehist [<options>]",
+		NULL
+	};
 	const char *const sched_subcommands[] = { "record", "latency", "map",
-						  "replay", "script", NULL };
+						  "replay", "script",
+						  "timehist", NULL };
 	const char *sched_usage[] = {
 		NULL,
 		NULL
@@ -2077,6 +2654,14 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 				usage_with_options(replay_usage, replay_options);
 		}
 		return perf_sched__replay(&sched);
+	} else if (!strcmp(argv[0], "timehist")) {
+		if (argc) {
+			argc = parse_options(argc, argv, timehist_options,
+					     timehist_usage, 0);
+			if (argc)
+				usage_with_options(timehist_usage, timehist_options);
+		}
+		return perf_sched__timehist(&sched);
 	} else {
 		usage_with_options(sched_usage, sched_options);
 	}
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/7] perf sched timehist: Add summary options
  2016-11-14 16:12 [PATCHSET 0/7] perf sched: Introduce timehist command, again (v1) Namhyung Kim
  2016-11-14 16:12 ` [PATCH RESEND 1/7] perf symbol: Print symbol offsets conditionally Namhyung Kim
  2016-11-14 16:12 ` [PATCH 2/7] perf sched timehist: Introduce timehist command Namhyung Kim
@ 2016-11-14 16:12 ` Namhyung Kim
  2016-11-14 16:12 ` [PATCH 4/7] perf sched timehist: Add -w/--wakeups option Namhyung Kim
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Namhyung Kim @ 2016-11-14 16:12 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, David Ahern
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Stephane Eranian,
	Andi Kleen

From: David Ahern <dsahern@gmail.com>

The -s/--summary option is to show process runtime statistics.  And the
 -S/--with-summary option is to show the stats with the normal output.

  $ perf sched timehist -s

  Runtime summary
                            comm  parent   sched-in     run-time    min-run     avg-run     max-run  stddev
                                            (count)       (msec)     (msec)      (msec)      (msec)       %
  ---------------------------------------------------------------------------------------------------------
                  ksoftirqd/0[3]       2          2        0.011      0.004       0.005       0.006   14.87
                  rcu_preempt[7]       2         11        0.071      0.002       0.006       0.017   20.23
                  watchdog/0[11]       2          1        0.002      0.002       0.002       0.002    0.00
                  watchdog/1[12]       2          1        0.004      0.004       0.004       0.004    0.00
  ...

  Terminated tasks:
                     sleep[7220]    7219          3        0.770      0.087       0.256       0.576   62.28

  Idle stats:
      CPU  0 idle for   2352.006  msec
      CPU  1 idle for   2764.497  msec
      CPU  2 idle for   2998.229  msec
      CPU  3 idle for   2967.800  msec

      Total number of unique tasks: 52
  Total number of context switches: 2532
             Total run time (msec): 218.036

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-sched.c | 166 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 160 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index f55900c833dd..4b38f1411444 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -194,6 +194,11 @@ struct perf_sched {
 	bool force;
 	bool skip_merge;
 	struct perf_sched_map map;
+
+	/* options for timehist command */
+	bool		summary;
+	bool		summary_only;
+	u64		skipped_samples;
 };
 
 /* per thread run time data */
@@ -2011,12 +2016,15 @@ static struct thread *timehist_get_thread(struct perf_sample *sample,
 	return thread;
 }
 
-static bool timehist_skip_sample(struct thread *thread)
+static bool timehist_skip_sample(struct perf_sched *sched,
+				 struct thread *thread)
 {
 	bool rc = false;
 
-	if (thread__is_filtered(thread))
+	if (thread__is_filtered(thread)) {
 		rc = true;
+		sched->skipped_samples++;
+	}
 
 	return rc;
 }
@@ -2046,7 +2054,7 @@ static int timehist_sched_wakeup_event(struct perf_tool *tool __maybe_unused,
 	return 0;
 }
 
-static int timehist_sched_change_event(struct perf_tool *tool __maybe_unused,
+static int timehist_sched_change_event(struct perf_tool *tool,
 				       union perf_event *event,
 				       struct perf_evsel *evsel,
 				       struct perf_sample *sample,
@@ -2057,6 +2065,7 @@ static int timehist_sched_change_event(struct perf_tool *tool __maybe_unused,
 	struct thread_runtime *tr = NULL;
 	u64 tprev;
 	int rc = 0;
+	struct perf_sched *sched = container_of(tool, struct perf_sched, tool);
 
 	if (machine__resolve(machine, &al, sample) < 0) {
 		pr_err("problem processing %d event. skipping it\n",
@@ -2071,7 +2080,7 @@ static int timehist_sched_change_event(struct perf_tool *tool __maybe_unused,
 		goto out;
 	}
 
-	if (timehist_skip_sample(thread))
+	if (timehist_skip_sample(sched, thread))
 		goto out;
 
 	tr = thread__get_runtime(thread);
@@ -2083,7 +2092,8 @@ static int timehist_sched_change_event(struct perf_tool *tool __maybe_unused,
 	tprev = perf_evsel__get_time(evsel, sample->cpu);
 
 	timehist_update_runtime_stats(tr, sample->time, tprev);
-	timehist_print_sample(sample, thread);
+	if (!sched->summary_only)
+		timehist_print_sample(sample, thread);
 
 out:
 	if (tr) {
@@ -2123,6 +2133,131 @@ static int process_lost(struct perf_tool *tool __maybe_unused,
 }
 
 
+static void print_thread_runtime(struct thread *t,
+				 struct thread_runtime *r)
+{
+	double mean = avg_stats(&r->run_stats);
+	float stddev;
+
+	printf("%*s   %5d  %9" PRIu64 " ",
+	       comm_width, timehist_get_commstr(t), t->ppid,
+	       (u64) r->run_stats.n);
+
+	print_sched_time(r->total_run_time, 8);
+	stddev = rel_stddev_stats(stddev_stats(&r->run_stats), mean);
+	print_sched_time(r->run_stats.min, 6);
+	printf(" ");
+	print_sched_time((u64) mean, 6);
+	printf(" ");
+	print_sched_time(r->run_stats.max, 6);
+	printf("  ");
+	printf("%5.2f", stddev);
+	printf("\n");
+}
+
+struct total_run_stats {
+	u64  sched_count;
+	u64  task_count;
+	u64  total_run_time;
+};
+
+static int __show_thread_runtime(struct thread *t, void *priv)
+{
+	struct total_run_stats *stats = priv;
+	struct thread_runtime *r;
+
+	if (thread__is_filtered(t))
+		return 0;
+
+	r = thread__priv(t);
+	if (r && r->run_stats.n) {
+		stats->task_count++;
+		stats->sched_count += r->run_stats.n;
+		stats->total_run_time += r->total_run_time;
+		print_thread_runtime(t, r);
+	}
+
+	return 0;
+}
+
+static int show_thread_runtime(struct thread *t, void *priv)
+{
+	if (t->dead)
+		return 0;
+
+	return __show_thread_runtime(t, priv);
+}
+
+static int show_deadthread_runtime(struct thread *t, void *priv)
+{
+	if (!t->dead)
+		return 0;
+
+	return __show_thread_runtime(t, priv);
+}
+
+static void timehist_print_summary(struct perf_sched *sched,
+				   struct perf_session *session)
+{
+	struct machine *m = &session->machines.host;
+	struct total_run_stats totals;
+	u64 task_count;
+	struct thread *t;
+	struct thread_runtime *r;
+	int i;
+
+	memset(&totals, 0, sizeof(totals));
+
+	if (comm_width < 30)
+		comm_width = 30;
+
+	printf("\nRuntime summary\n");
+	printf("%*s  parent   sched-in  ", comm_width, "comm");
+	printf("   run-time    min-run     avg-run     max-run  stddev\n");
+	printf("%*s            (count)  ", comm_width, "");
+	printf("     (msec)     (msec)      (msec)      (msec)       %%\n");
+	printf("%.105s\n", graph_dotted_line);
+
+	machine__for_each_thread(m, show_thread_runtime, &totals);
+	task_count = totals.task_count;
+	if (!task_count)
+		printf("<no still running tasks>\n");
+
+	printf("\nTerminated tasks:\n");
+	machine__for_each_thread(m, show_deadthread_runtime, &totals);
+	if (task_count == totals.task_count)
+		printf("<no terminated tasks>\n");
+
+	/* CPU idle stats not tracked when samples were skipped */
+	if (sched->skipped_samples)
+		return;
+
+	printf("\nIdle stats:\n");
+	for (i = 0; i <= idle_max_cpu; ++i) {
+		t = idle_threads[i];
+		if (!t)
+			continue;
+
+		r = thread__priv(t);
+		if (r && r->run_stats.n) {
+			totals.sched_count += r->run_stats.n;
+			printf("    CPU %2d idle for ", i);
+			print_sched_time(r->total_run_time, 6);
+			printf(" msec\n");
+		} else
+			printf("    CPU %2d idle entire time window\n", i);
+	}
+
+	printf("\n"
+	       "    Total number of unique tasks: %" PRIu64 "\n"
+	       "Total number of context switches: %" PRIu64 "\n"
+	       "           Total run time (msec): ",
+	       totals.task_count, totals.sched_count);
+
+	print_sched_time(totals.total_run_time, 2);
+	printf("\n");
+}
+
 typedef int (*sched_handler)(struct perf_tool *tool,
 			  union perf_event *event,
 			  struct perf_evsel *evsel,
@@ -2164,6 +2299,7 @@ static int perf_sched__timehist(struct perf_sched *sched)
 	};
 
 	struct perf_session *session;
+	struct perf_evlist *evlist;
 	int err = -1;
 
 	/*
@@ -2186,6 +2322,8 @@ static int perf_sched__timehist(struct perf_sched *sched)
 	if (session == NULL)
 		return -ENOMEM;
 
+	evlist = session->evlist;
+
 	symbol__init(&session->header.env);
 
 	setup_pager();
@@ -2204,7 +2342,12 @@ static int perf_sched__timehist(struct perf_sched *sched)
 	if (init_idle_threads(sched->max_cpu))
 		goto out;
 
-	timehist_header();
+	/* summary_only implies summary option, but don't overwrite summary if set */
+	if (sched->summary_only)
+		sched->summary = sched->summary_only;
+
+	if (!sched->summary_only)
+		timehist_header();
 
 	err = perf_session__process_events(session);
 	if (err) {
@@ -2212,6 +2355,13 @@ static int perf_sched__timehist(struct perf_sched *sched)
 		goto out;
 	}
 
+	sched->nr_events      = evlist->stats.nr_events[0];
+	sched->nr_lost_events = evlist->stats.total_lost;
+	sched->nr_lost_chunks = evlist->stats.nr_events[PERF_RECORD_LOST];
+
+	if (sched->summary)
+		timehist_print_summary(sched, session);
+
 out:
 	free_idle_threads();
 	perf_session__delete(session);
@@ -2570,6 +2720,10 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 		   "file", "kallsyms pathname"),
 	OPT_STRING(0, "symfs", &symbol_conf.symfs, "directory",
 		    "Look for files with symbols relative to this directory"),
+	OPT_BOOLEAN('s', "summary", &sched.summary_only,
+		    "Show only syscall summary with statistics"),
+	OPT_BOOLEAN('S', "with-summary", &sched.summary,
+		    "Show all syscalls and summary with statistics"),
 	OPT_PARENT(sched_options)
 	};
 
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/7] perf sched timehist: Add -w/--wakeups option
  2016-11-14 16:12 [PATCHSET 0/7] perf sched: Introduce timehist command, again (v1) Namhyung Kim
                   ` (2 preceding siblings ...)
  2016-11-14 16:12 ` [PATCH 3/7] perf sched timehist: Add summary options Namhyung Kim
@ 2016-11-14 16:12 ` Namhyung Kim
  2016-11-14 16:12 ` [PATCH 5/7] perf sched timehist: Add call graph options Namhyung Kim
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Namhyung Kim @ 2016-11-14 16:12 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, David Ahern
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Stephane Eranian,
	Andi Kleen

From: David Ahern <dsahern@gmail.com>

The -w option is to show wakeup events with timehist.

  $ perf sched timehist -w
             time    cpu  task name              b/n time  sch delay   run time
                          [tid/pid]                (msec)     (msec)     (msec)
  --------------- ------  --------------------  ---------  ---------  ---------
   2412598.429689 [0002]  perf[7219]                                             awakened: perf[7220]
   2412598.429696 [0009]  <idle>                    0.000      0.000      0.000
   2412598.429767 [0002]  perf[7219]                0.000      0.000      0.000
   2412598.429780 [0009]  perf[7220]                                             awakened: migration/9[53]
  ...

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-sched.c | 58 ++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 54 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 4b38f1411444..e59cd7574d0f 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -198,6 +198,7 @@ struct perf_sched {
 	/* options for timehist command */
 	bool		summary;
 	bool		summary_only;
+	bool		show_wakeups;
 	u64		skipped_samples;
 };
 
@@ -1808,7 +1809,8 @@ static void timehist_header(void)
 	printf("\n");
 }
 
-static void timehist_print_sample(struct perf_sample *sample,
+static void timehist_print_sample(struct perf_sched *sched,
+				  struct perf_sample *sample,
 				  struct thread *thread)
 {
 	struct thread_runtime *tr = thread__priv(thread);
@@ -1822,6 +1824,10 @@ static void timehist_print_sample(struct perf_sample *sample,
 	print_sched_time(tr->dt_between, 6);
 	print_sched_time(tr->dt_delay, 6);
 	print_sched_time(tr->dt_run, 6);
+
+	if (sched->show_wakeups)
+		printf("  %-*s", comm_width, "");
+
 	printf("\n");
 }
 
@@ -2029,12 +2035,44 @@ static bool timehist_skip_sample(struct perf_sched *sched,
 	return rc;
 }
 
-static int timehist_sched_wakeup_event(struct perf_tool *tool __maybe_unused,
+static void timehist_print_wakeup_event(struct perf_sched *sched,
+					struct perf_sample *sample,
+					struct machine *machine,
+					struct thread *awakened)
+{
+	struct thread *thread;
+	char tstr[64];
+
+	thread = machine__findnew_thread(machine, sample->pid, sample->tid);
+	if (thread == NULL)
+		return;
+
+	/* show wakeup unless both awakee and awaker are filtered */
+	if (timehist_skip_sample(sched, thread) &&
+	    timehist_skip_sample(sched, awakened)) {
+		return;
+	}
+
+	timestamp__scnprintf_usec(sample->time, tstr, sizeof(tstr));
+	printf("%15s [%04d] ", tstr, sample->cpu);
+
+	printf(" %-*s ", comm_width, timehist_get_commstr(thread));
+
+	/* dt spacer */
+	printf("  %9s  %9s  %9s ", "", "", "");
+
+	printf("awakened: %s", timehist_get_commstr(awakened));
+
+	printf("\n");
+}
+
+static int timehist_sched_wakeup_event(struct perf_tool *tool,
 				       union perf_event *event __maybe_unused,
 				       struct perf_evsel *evsel,
 				       struct perf_sample *sample,
 				       struct machine *machine)
 {
+	struct perf_sched *sched = container_of(tool, struct perf_sched, tool);
 	struct thread *thread;
 	struct thread_runtime *tr = NULL;
 	/* want pid of awakened task not pid in sample */
@@ -2051,6 +2089,10 @@ static int timehist_sched_wakeup_event(struct perf_tool *tool __maybe_unused,
 	if (tr->ready_to_run == 0)
 		tr->ready_to_run = sample->time;
 
+	/* show wakeups if requested */
+	if (sched->show_wakeups)
+		timehist_print_wakeup_event(sched, sample, machine, thread);
+
 	return 0;
 }
 
@@ -2060,12 +2102,12 @@ static int timehist_sched_change_event(struct perf_tool *tool,
 				       struct perf_sample *sample,
 				       struct machine *machine)
 {
+	struct perf_sched *sched = container_of(tool, struct perf_sched, tool);
 	struct addr_location al;
 	struct thread *thread;
 	struct thread_runtime *tr = NULL;
 	u64 tprev;
 	int rc = 0;
-	struct perf_sched *sched = container_of(tool, struct perf_sched, tool);
 
 	if (machine__resolve(machine, &al, sample) < 0) {
 		pr_err("problem processing %d event. skipping it\n",
@@ -2093,7 +2135,7 @@ static int timehist_sched_change_event(struct perf_tool *tool,
 
 	timehist_update_runtime_stats(tr, sample->time, tprev);
 	if (!sched->summary_only)
-		timehist_print_sample(sample, thread);
+		timehist_print_sample(sched, sample, thread);
 
 out:
 	if (tr) {
@@ -2724,6 +2766,7 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 		    "Show only syscall summary with statistics"),
 	OPT_BOOLEAN('S', "with-summary", &sched.summary,
 		    "Show all syscalls and summary with statistics"),
+	OPT_BOOLEAN('w', "wakeups", &sched.show_wakeups, "Show wakeup events"),
 	OPT_PARENT(sched_options)
 	};
 
@@ -2815,6 +2858,13 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 			if (argc)
 				usage_with_options(timehist_usage, timehist_options);
 		}
+		if (sched.show_wakeups && sched.summary_only) {
+			pr_err(" Error: -s and -w are mutually exclusive.\n");
+			parse_options_usage(timehist_usage, timehist_options, "s", true);
+			parse_options_usage(NULL, timehist_options, "w", true);
+			return -EINVAL;
+		}
+
 		return perf_sched__timehist(&sched);
 	} else {
 		usage_with_options(sched_usage, sched_options);
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 5/7] perf sched timehist: Add call graph options
  2016-11-14 16:12 [PATCHSET 0/7] perf sched: Introduce timehist command, again (v1) Namhyung Kim
                   ` (3 preceding siblings ...)
  2016-11-14 16:12 ` [PATCH 4/7] perf sched timehist: Add -w/--wakeups option Namhyung Kim
@ 2016-11-14 16:12 ` Namhyung Kim
  2016-11-14 16:12 ` [PATCH 6/7] perf sched timehist: Add -V/--cpu-visual option Namhyung Kim
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Namhyung Kim @ 2016-11-14 16:12 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, David Ahern
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Stephane Eranian,
	Andi Kleen

From: David Ahern <dsahern@gmail.com>

If callchains were recorded they are appended to the line with a default stack depth of 5:

   79371.874569 [0011]  gcc[31949]                  0.000014  0.000000  0.001148  wait_for_completion_killable do_fork sys_vfork stub_vfork __vfork
   79371.874591 [0010]  gcc[31951]                  0.000000  0.000000  0.000024  __cond_resched _cond_resched wait_for_completion stop_one_cpu sched_exec
   79371.874603 [0010]  migration/10[59]            0.003350  0.000004  0.000011  smpboot_thread_fn kthread ret_from_fork
   79371.874604 [0011]  <idle>                      0.001148  0.000000  0.000035  cpu_startup_entry start_secondary
   79371.874723 [0005]  <idle>                      0.000016  0.000000  0.001383  cpu_startup_entry start_secondary
   79371.874746 [0005]  gcc[31949]                  0.000153  0.000078  0.000022  do_wait sys_wait4 system_call_fastpath __GI___waitpid

 --no-call-graph can be used to not show the callchains. --max-stack is used
to control the number of frames shown (default of 5). -x/--excl options can
be used to collapse redundant callchains to get more relevant data on screen.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-sched.c | 79 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 76 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index e59cd7574d0f..69a575b59d05 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -14,6 +14,7 @@
 #include "util/thread_map.h"
 #include "util/color.h"
 #include "util/stat.h"
+#include "util/callchain.h"
 
 #include <subcmd/parse-options.h>
 #include "util/trace-event.h"
@@ -198,6 +199,8 @@ struct perf_sched {
 	/* options for timehist command */
 	bool		summary;
 	bool		summary_only;
+	bool		show_callchain;
+	unsigned int	max_stack;
 	bool		show_wakeups;
 	u64		skipped_samples;
 };
@@ -1811,6 +1814,7 @@ static void timehist_header(void)
 
 static void timehist_print_sample(struct perf_sched *sched,
 				  struct perf_sample *sample,
+				  struct addr_location *al,
 				  struct thread *thread)
 {
 	struct thread_runtime *tr = thread__priv(thread);
@@ -1828,6 +1832,17 @@ static void timehist_print_sample(struct perf_sched *sched,
 	if (sched->show_wakeups)
 		printf("  %-*s", comm_width, "");
 
+	if (thread->tid == 0)
+		goto out;
+
+	if (sched->show_callchain)
+		printf("  ");
+
+	sample__fprintf_sym(sample, al, 0,
+			    EVSEL__PRINT_SYM | EVSEL__PRINT_ONELINE,
+			    &callchain_cursor, stdout);
+
+out:
 	printf("\n");
 }
 
@@ -1880,8 +1895,12 @@ static void timehist_update_runtime_stats(struct thread_runtime *r,
 }
 
 static bool is_idle_sample(struct perf_sample *sample,
-			   struct perf_evsel *evsel)
+			   struct perf_evsel *evsel,
+			   struct machine *machine)
 {
+	struct thread *thread;
+	struct callchain_cursor *cursor = &callchain_cursor;
+
 	/* pid 0 == swapper == idle task */
 	if (sample->pid == 0)
 		return true;
@@ -1890,6 +1909,25 @@ static bool is_idle_sample(struct perf_sample *sample,
 		if (perf_evsel__intval(evsel, sample, "prev_pid") == 0)
 			return true;
 	}
+
+	/* want main thread for process - has maps */
+	thread = machine__findnew_thread(machine, sample->pid, sample->pid);
+	if (thread == NULL) {
+		pr_debug("Failed to get thread for pid %d.\n", sample->pid);
+		return false;
+	}
+
+	if (!symbol_conf.use_callchain || sample->callchain == NULL)
+		return false;
+
+	if (thread__resolve_callchain(thread, cursor, evsel, sample,
+				      NULL, NULL, PERF_MAX_STACK_DEPTH) != 0) {
+		if (verbose)
+			error("Failed to resolve callchain. Skipping\n");
+
+		return false;
+	}
+	callchain_cursor_commit(cursor);
 	return false;
 }
 
@@ -2006,7 +2044,7 @@ static struct thread *timehist_get_thread(struct perf_sample *sample,
 {
 	struct thread *thread;
 
-	if (is_idle_sample(sample, evsel)) {
+	if (is_idle_sample(sample, evsel, machine)) {
 		thread = get_idle_thread(sample->cpu);
 		if (thread == NULL)
 			pr_err("Failed to get idle thread for cpu %d.\n", sample->cpu);
@@ -2135,7 +2173,7 @@ static int timehist_sched_change_event(struct perf_tool *tool,
 
 	timehist_update_runtime_stats(tr, sample->time, tprev);
 	if (!sched->summary_only)
-		timehist_print_sample(sched, sample, thread);
+		timehist_print_sample(sched, sample, &al, thread);
 
 out:
 	if (tr) {
@@ -2328,6 +2366,30 @@ static int perf_timehist__process_sample(struct perf_tool *tool,
 	return err;
 }
 
+static int timehist_check_attr(struct perf_sched *sched,
+			       struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel;
+	struct evsel_runtime *er;
+
+	list_for_each_entry(evsel, &evlist->entries, node) {
+		er = perf_evsel__get_runtime(evsel);
+		if (er == NULL) {
+			pr_err("Failed to allocate memory for evsel runtime data\n");
+			return -1;
+		}
+
+		if (sched->show_callchain &&
+		    !(evsel->attr.sample_type & PERF_SAMPLE_CALLCHAIN)) {
+			pr_info("Samples do not have callchains.\n");
+			sched->show_callchain = 0;
+			symbol_conf.use_callchain = 0;
+		}
+	}
+
+	return 0;
+}
+
 static int perf_sched__timehist(struct perf_sched *sched)
 {
 	const struct perf_evsel_str_handler handlers[] = {
@@ -2360,6 +2422,8 @@ static int perf_sched__timehist(struct perf_sched *sched)
 	sched->tool.ordered_events = true;
 	sched->tool.ordering_requires_timestamps = true;
 
+	symbol_conf.use_callchain = sched->show_callchain;
+
 	session = perf_session__new(&file, false, &sched->tool);
 	if (session == NULL)
 		return -ENOMEM;
@@ -2368,6 +2432,9 @@ static int perf_sched__timehist(struct perf_sched *sched)
 
 	symbol__init(&session->header.env);
 
+	if (timehist_check_attr(sched, evlist) != 0)
+		goto out;
+
 	setup_pager();
 
 	/* setup per-evsel handlers */
@@ -2715,6 +2782,8 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 		.next_shortname1      = 'A',
 		.next_shortname2      = '0',
 		.skip_merge           = 0,
+		.show_callchain	      = 1,
+		.max_stack            = 5,
 	};
 	const struct option sched_options[] = {
 	OPT_STRING('i', "input", &input_name, "file",
@@ -2760,6 +2829,10 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 		   "file", "vmlinux pathname"),
 	OPT_STRING(0, "kallsyms", &symbol_conf.kallsyms_name,
 		   "file", "kallsyms pathname"),
+	OPT_BOOLEAN('g', "call-graph", &sched.show_callchain,
+		    "Display call chains if present (default on)"),
+	OPT_UINTEGER(0, "max-stack", &sched.max_stack,
+		   "Maximum number of functions to display backtrace."),
 	OPT_STRING(0, "symfs", &symbol_conf.symfs, "directory",
 		    "Look for files with symbols relative to this directory"),
 	OPT_BOOLEAN('s', "summary", &sched.summary_only,
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 6/7] perf sched timehist: Add -V/--cpu-visual option
  2016-11-14 16:12 [PATCHSET 0/7] perf sched: Introduce timehist command, again (v1) Namhyung Kim
                   ` (4 preceding siblings ...)
  2016-11-14 16:12 ` [PATCH 5/7] perf sched timehist: Add call graph options Namhyung Kim
@ 2016-11-14 16:12 ` Namhyung Kim
  2016-11-14 16:12 ` [PATCH 7/7] perf sched: Add documentation for timehist options Namhyung Kim
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Namhyung Kim @ 2016-11-14 16:12 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, David Ahern
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Stephane Eranian,
	Andi Kleen

From: David Ahern <dsahern@gmail.com>

The -V option provides a visual aid for sched switches by cpu:

  $ perf sched timehist -V
             time    cpu  0123456789abc  task name              b/n time  sch delay   run time
                                         [tid/pid]                (msec)     (msec)     (msec)
  --------------- ------  -------------  --------------------  ---------  ---------  ---------
  ...
   2412598.429696 [0009]           i     <idle>                    0.000      0.000      0.000
   2412598.429767 [0002]    s            perf[7219]                0.000      0.000      0.000
   2412598.429783 [0009]           s     perf[7220]                0.000      0.006      0.087
   2412598.429794 [0010]            i    <idle>                    0.000      0.000      0.000
   2412598.429795 [0009]           s     migration/9[53]           0.000      0.003      0.011
   2412598.430370 [0010]            s    sleep[7220]               0.011      0.000      0.576
   2412598.432584 [0003]     i           <idle>                    0.000      0.000      0.000
  ...

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-sched.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 69a575b59d05..7b7d74b37ef1 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -201,6 +201,7 @@ struct perf_sched {
 	bool		summary_only;
 	bool		show_callchain;
 	unsigned int	max_stack;
+	bool		show_cpu_visual;
 	bool		show_wakeups;
 	u64		skipped_samples;
 };
@@ -1784,10 +1785,23 @@ static char *timehist_get_commstr(struct thread *thread)
 	return str;
 }
 
-static void timehist_header(void)
+static void timehist_header(struct perf_sched *sched)
 {
+	u32 ncpus = sched->max_cpu + 1;
+	u32 i, j;
+
 	printf("%15s %6s ", "time", "cpu");
 
+	if (sched->show_cpu_visual) {
+		printf(" ");
+		for (i = 0, j = 0; i < ncpus; ++i) {
+			printf("%x", j++);
+			if (j > 15)
+				j = 0;
+		}
+		printf(" ");
+	}
+
 	printf(" %-20s  %9s  %9s  %9s",
 		"task name", "b/n time", "sch delay", "run time");
 
@@ -1798,6 +1812,9 @@ static void timehist_header(void)
 	 */
 	printf("%15s %-6s ", "", "");
 
+	if (sched->show_cpu_visual)
+		printf(" %*s ", ncpus, "");
+
 	printf(" %-20s  %9s  %9s  %9s\n", "[tid/pid]", "(msec)", "(msec)", "(msec)");
 
 	/*
@@ -1805,6 +1822,9 @@ static void timehist_header(void)
 	 */
 	printf("%.15s %.6s ", graph_dotted_line, graph_dotted_line);
 
+	if (sched->show_cpu_visual)
+		printf(" %.*s ", ncpus, graph_dotted_line);
+
 	printf(" %.20s  %.9s  %.9s  %.9s",
 		graph_dotted_line, graph_dotted_line, graph_dotted_line,
 		graph_dotted_line);
@@ -1818,11 +1838,28 @@ static void timehist_print_sample(struct perf_sched *sched,
 				  struct thread *thread)
 {
 	struct thread_runtime *tr = thread__priv(thread);
+	u32 max_cpus = sched->max_cpu + 1;
 	char tstr[64];
 
 	timestamp__scnprintf_usec(sample->time, tstr, sizeof(tstr));
 	printf("%15s [%04d] ", tstr, sample->cpu);
 
+	if (sched->show_cpu_visual) {
+		u32 i;
+		char c;
+
+		printf(" ");
+		for (i = 0; i < max_cpus; ++i) {
+			/* flag idle times with 'i'; others are sched events */
+			if (i == sample->cpu)
+				c = (thread->tid == 0) ? 'i' : 's';
+			else
+				c = ' ';
+			printf("%c", c);
+		}
+		printf(" ");
+	}
+
 	printf(" %-*s ", comm_width, timehist_get_commstr(thread));
 
 	print_sched_time(tr->dt_between, 6);
@@ -2093,6 +2130,8 @@ static void timehist_print_wakeup_event(struct perf_sched *sched,
 
 	timestamp__scnprintf_usec(sample->time, tstr, sizeof(tstr));
 	printf("%15s [%04d] ", tstr, sample->cpu);
+	if (sched->show_cpu_visual)
+		printf(" %*s ", sched->max_cpu + 1, "");
 
 	printf(" %-*s ", comm_width, timehist_get_commstr(thread));
 
@@ -2456,7 +2495,7 @@ static int perf_sched__timehist(struct perf_sched *sched)
 		sched->summary = sched->summary_only;
 
 	if (!sched->summary_only)
-		timehist_header();
+		timehist_header(sched);
 
 	err = perf_session__process_events(session);
 	if (err) {
@@ -2840,6 +2879,7 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_BOOLEAN('S', "with-summary", &sched.summary,
 		    "Show all syscalls and summary with statistics"),
 	OPT_BOOLEAN('w', "wakeups", &sched.show_wakeups, "Show wakeup events"),
+	OPT_BOOLEAN('V', "cpu-visual", &sched.show_cpu_visual, "Add CPU visual"),
 	OPT_PARENT(sched_options)
 	};
 
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 7/7] perf sched: Add documentation for timehist options
  2016-11-14 16:12 [PATCHSET 0/7] perf sched: Introduce timehist command, again (v1) Namhyung Kim
                   ` (5 preceding siblings ...)
  2016-11-14 16:12 ` [PATCH 6/7] perf sched timehist: Add -V/--cpu-visual option Namhyung Kim
@ 2016-11-14 16:12 ` Namhyung Kim
  2016-11-14 16:26 ` [PATCHSET 0/7] perf sched: Introduce timehist command, again (v1) David Ahern
  2016-11-15  6:42 ` Ingo Molnar
  8 siblings, 0 replies; 15+ messages in thread
From: Namhyung Kim @ 2016-11-14 16:12 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, David Ahern
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Stephane Eranian,
	Andi Kleen

From: David Ahern <dsahern@gmail.com>

Add entry to perf-sched documentation for timehist command and its
options.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-sched.txt | 46 +++++++++++++++++++++++++++++++--
 1 file changed, 44 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-sched.txt b/tools/perf/Documentation/perf-sched.txt
index 1cc08cc47ac5..a0344643f008 100644
--- a/tools/perf/Documentation/perf-sched.txt
+++ b/tools/perf/Documentation/perf-sched.txt
@@ -8,11 +8,11 @@ perf-sched - Tool to trace/measure scheduler properties (latencies)
 SYNOPSIS
 --------
 [verse]
-'perf sched' {record|latency|map|replay|script}
+'perf sched' {record|latency|map|replay|script|timehist}
 
 DESCRIPTION
 -----------
-There are five variants of perf sched:
+There are several variants of perf sched:
 
   'perf sched record <command>' to record the scheduling events
   of an arbitrary workload.
@@ -36,6 +36,11 @@ There are five variants of perf sched:
   are running on a CPU. A '*' denotes the CPU that had the event, and
   a dot signals an idle CPU.
 
+  'perf sched timehist' to show context-switching analysis with times
+  between schedule-in, schedule delay, and run time. If callchains are
+  present stack trace is dumped at the end of the line. A summary of
+  run times can be shown as well.
+
 OPTIONS
 -------
 -i::
@@ -66,6 +71,43 @@ OPTIONS for 'perf sched map'
 --color-pids::
 	Highlight the given pids.
 
+OPTIONS for 'perf sched timehist'
+---------------------------------
+-k::
+--vmlinux=<file>::
+    vmlinux pathname
+
+--kallsyms=<file>::
+    kallsyms pathname
+
+-s::
+--summary::
+    Show only a summary of scheduling by thread with min, max, and average
+    run times (in sec) and relative stddev.
+
+-S::
+--with-summary::
+    Show all scheduling events followed by a summary by thread with min,
+    max, and average run times (in sec) and relative stddev.
+
+-w::
+--wakeups::
+    Show wakeup events.
+
+--call-graph::
+        Display call chains. Default is on. Use --no-call-graph to suppress
+
+--max-stack::
+    Set the stack depth limit when showing the callchains. Default: 5
+
+--symfs=<directory>::
+    Look for files with symbols relative to this directory.
+
+-V::
+--cpu-visual::
+    Add a visual that better emphasizes activity by cpu. Idle times
+    are denoted with 'i'; schedule events with an 's'.
+
 SEE ALSO
 --------
 linkperf:perf-record[1]
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCHSET 0/7] perf sched: Introduce timehist command, again (v1)
  2016-11-14 16:12 [PATCHSET 0/7] perf sched: Introduce timehist command, again (v1) Namhyung Kim
                   ` (6 preceding siblings ...)
  2016-11-14 16:12 ` [PATCH 7/7] perf sched: Add documentation for timehist options Namhyung Kim
@ 2016-11-14 16:26 ` David Ahern
  2016-11-15  6:42 ` Ingo Molnar
  8 siblings, 0 replies; 15+ messages in thread
From: David Ahern @ 2016-11-14 16:26 UTC (permalink / raw)
  To: Namhyung Kim, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Stephane Eranian,
	Andi Kleen

On 11/14/16 9:12 AM, Namhyung Kim wrote:
> Hello,
> 
> This patchset is a rebased version of David's sched timehist work [1].
> I plan to improve perf sched command more and think that having
> timehist command before the work looks good.  It seems David is busy
> these days, so I'm retrying it by myself.
> 
> This implements only basic feature and a few options.  I just split
> the patch to make it easier to review and did some cosmetic changes.
> More patches will come later.
> 
> The below is from the David's original description:
> 

-----8<-----

> 
> This code is available at 'perf/timehist-v1' branch in my tree
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
> 
> Any feedback is welcomed, David?
> 

Love to see this go in. Thanks for reviving the patches.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHSET 0/7] perf sched: Introduce timehist command, again (v1)
  2016-11-14 16:12 [PATCHSET 0/7] perf sched: Introduce timehist command, again (v1) Namhyung Kim
                   ` (7 preceding siblings ...)
  2016-11-14 16:26 ` [PATCHSET 0/7] perf sched: Introduce timehist command, again (v1) David Ahern
@ 2016-11-15  6:42 ` Ingo Molnar
  2016-11-15  6:55   ` Namhyung Kim
  8 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2016-11-15  6:42 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, David Ahern, Peter Zijlstra, Jiri Olsa,
	LKML, Stephane Eranian, Andi Kleen


* Namhyung Kim <namhyung@kernel.org> wrote:

> Hello,
> 
> This patchset is a rebased version of David's sched timehist work [1].
> I plan to improve perf sched command more and think that having
> timehist command before the work looks good.  It seems David is busy
> these days, so I'm retrying it by myself.
> 
> This implements only basic feature and a few options.  I just split
> the patch to make it easier to review and did some cosmetic changes.
> More patches will come later.
> 
> The below is from the David's original description:
> 
> ------------------------8<-------------------------
> 'perf sched timehist' provides an analysis of scheduling events.
> 
> Example usage:
>     perf sched record -- sleep 1
>     perf sched timehist


Cool, very nice!

> By default it shows the individual schedule events, including the time between
> sched-in events for the task, the task scheduling delay (time between wakeup
> and actually running) and run time for the task:
> 
>            time cpu  task name[tid/pid]    b/n time sch delay  run time
>   ------------- ---- -------------------- --------- --------- ---------
>    79371.874569 [11] gcc[31949]               0.014     0.000     1.148
>    79371.874591 [10] gcc[31951]               0.000     0.000     0.024
>    79371.874603 [10] migration/10[59]         3.350     0.004     0.011
>    79371.874604 [11] <idle>                   1.148     0.000     0.035
>    79371.874723 [05] <idle>                   0.016     0.000     1.383
>    79371.874746 [05] gcc[31949]               0.153     0.078     0.022
> ...

What does the 'b/n' abbreviation stand for? 'Between'? Could we call the column 
'sch wait' instead, or so?


> Times are in msec.usec.
> 
> If callchains were recorded they are appended to the line with a default stack depth of 5:
> 
>    79371.874569 [11] gcc[31949]                  0.000014  0.000000  0.001148  wait_for_completion_killable do_fork sys_vfork stub_vfork __vfork
>    79371.874591 [10] gcc[31951]                  0.000000  0.000000  0.000024  __cond_resched _cond_resched wait_for_completion stop_one_cpu sched_exec
>    79371.874603 [10] migration/10[59]            0.003350  0.000004  0.000011  smpboot_thread_fn kthread ret_from_fork
>    79371.874604 [11] <idle>                      0.001148  0.000000  0.000035  cpu_startup_entry start_secondary
>    79371.874723 [05] <idle>                      0.000016  0.000000  0.001383  cpu_startup_entry start_secondary
>    79371.874746 [05] gcc[31949]                  0.000153  0.000078  0.000022  do_wait sys_wait4 system_call_fastpath __GI___waitpid

So when I first saw this it was hard for me to disambiguate individual function 
names. Wouldn't this be a bit more readable:

>    79371.874569 [11] gcc[31949]                  0.000014  0.000000  0.001148  wait_for_completion_killable() <- do_fork sys_vfork stub_vfork() <- __vfork()
>    79371.874591 [10] gcc[31951]                  0.000000  0.000000  0.000024  __cond_resched() <- _cond_resched() <- wait_for_completion() <- stop_one_cpu() <- sched_exec()
>    79371.874603 [10] migration/10[59]            0.003350  0.000004  0.000011  smpboot_thread_fn() <- kthread() <- ret_from_fork()
>    79371.874604 [11] <idle>                      0.001148  0.000000  0.000035  cpu_startup_entry() <- start_secondary()
>    79371.874723 [05] <idle>                      0.000016  0.000000  0.001383  cpu_startup_entry() <- start_secondary()
>    79371.874746 [05] gcc[31949]                  0.000153  0.000078  0.000022  do_wait() <- sys_wait4() <- system_call_fastpath() <- __GI___waitpid()

Or:

>    79371.874569 [11] gcc[31949]                  0.000014  0.000000  0.001148  wait_for_completion_killable()		<- do_fork sys_vfork stub_vfork() <- __vfork()
>    79371.874591 [10] gcc[31951]                  0.000000  0.000000  0.000024  __cond_resched()			<- _cond_resched() <- wait_for_completion() <- stop_one_cpu() <- sched_exec()
>    79371.874603 [10] migration/10[59]            0.003350  0.000004  0.000011  smpboot_thread_fn()			<- kthread() <- ret_from_fork()
>    79371.874604 [11] <idle>                      0.001148  0.000000  0.000035  cpu_startup_entry()			<- start_secondary()
>    79371.874723 [05] <idle>                      0.000016  0.000000  0.001383  cpu_startup_entry()			<- start_secondary()
>    79371.874746 [05] gcc[31949]                  0.000153  0.000078  0.000022  do_wait()				<- sys_wait4() <- system_call_fastpath() <- __GI___waitpid()

(i.e. visually separate the first entry - and list the rest.)

Or maybe it could be ASCII color coded so that the different entries are easier to 
separate: for example the functions could be printed in alternating white/grey 
color?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHSET 0/7] perf sched: Introduce timehist command, again (v1)
  2016-11-15  6:42 ` Ingo Molnar
@ 2016-11-15  6:55   ` Namhyung Kim
  2016-11-15  7:34     ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Namhyung Kim @ 2016-11-15  6:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, David Ahern, Peter Zijlstra, Jiri Olsa,
	LKML, Stephane Eranian, Andi Kleen

Hi Ingo,

On Tue, Nov 15, 2016 at 07:42:14AM +0100, Ingo Molnar wrote:
> 
> * Namhyung Kim <namhyung@kernel.org> wrote:
> 
> > Hello,
> > 
> > This patchset is a rebased version of David's sched timehist work [1].
> > I plan to improve perf sched command more and think that having
> > timehist command before the work looks good.  It seems David is busy
> > these days, so I'm retrying it by myself.
> > 
> > This implements only basic feature and a few options.  I just split
> > the patch to make it easier to review and did some cosmetic changes.
> > More patches will come later.
> > 
> > The below is from the David's original description:
> > 
> > ------------------------8<-------------------------
> > 'perf sched timehist' provides an analysis of scheduling events.
> > 
> > Example usage:
> >     perf sched record -- sleep 1
> >     perf sched timehist
> 
> 
> Cool, very nice!

:)

> 
> > By default it shows the individual schedule events, including the time between
> > sched-in events for the task, the task scheduling delay (time between wakeup
> > and actually running) and run time for the task:
> > 
> >            time cpu  task name[tid/pid]    b/n time sch delay  run time
> >   ------------- ---- -------------------- --------- --------- ---------
> >    79371.874569 [11] gcc[31949]               0.014     0.000     1.148
> >    79371.874591 [10] gcc[31951]               0.000     0.000     0.024
> >    79371.874603 [10] migration/10[59]         3.350     0.004     0.011
> >    79371.874604 [11] <idle>                   1.148     0.000     0.035
> >    79371.874723 [05] <idle>                   0.016     0.000     1.383
> >    79371.874746 [05] gcc[31949]               0.153     0.078     0.022
> > ...
> 
> What does the 'b/n' abbreviation stand for? 'Between'? Could we call the column 
> 'sch wait' instead, or so?

Looks better, or what about 'wait time'?

> 
> 
> > Times are in msec.usec.
> > 
> > If callchains were recorded they are appended to the line with a default stack depth of 5:
> > 
> >    79371.874569 [11] gcc[31949]                  0.000014  0.000000  0.001148  wait_for_completion_killable do_fork sys_vfork stub_vfork __vfork
> >    79371.874591 [10] gcc[31951]                  0.000000  0.000000  0.000024  __cond_resched _cond_resched wait_for_completion stop_one_cpu sched_exec
> >    79371.874603 [10] migration/10[59]            0.003350  0.000004  0.000011  smpboot_thread_fn kthread ret_from_fork
> >    79371.874604 [11] <idle>                      0.001148  0.000000  0.000035  cpu_startup_entry start_secondary
> >    79371.874723 [05] <idle>                      0.000016  0.000000  0.001383  cpu_startup_entry start_secondary
> >    79371.874746 [05] gcc[31949]                  0.000153  0.000078  0.000022  do_wait sys_wait4 system_call_fastpath __GI___waitpid
> 
> So when I first saw this it was hard for me to disambiguate individual function 
> names. Wouldn't this be a bit more readable:
> 
> >    79371.874569 [11] gcc[31949]                  0.000014  0.000000  0.001148  wait_for_completion_killable() <- do_fork sys_vfork stub_vfork() <- __vfork()
> >    79371.874591 [10] gcc[31951]                  0.000000  0.000000  0.000024  __cond_resched() <- _cond_resched() <- wait_for_completion() <- stop_one_cpu() <- sched_exec()
> >    79371.874603 [10] migration/10[59]            0.003350  0.000004  0.000011  smpboot_thread_fn() <- kthread() <- ret_from_fork()
> >    79371.874604 [11] <idle>                      0.001148  0.000000  0.000035  cpu_startup_entry() <- start_secondary()
> >    79371.874723 [05] <idle>                      0.000016  0.000000  0.001383  cpu_startup_entry() <- start_secondary()
> >    79371.874746 [05] gcc[31949]                  0.000153  0.000078  0.000022  do_wait() <- sys_wait4() <- system_call_fastpath() <- __GI___waitpid()
> 
> Or:
> 
> >    79371.874569 [11] gcc[31949]                  0.000014  0.000000  0.001148  wait_for_completion_killable()		<- do_fork sys_vfork stub_vfork() <- __vfork()
> >    79371.874591 [10] gcc[31951]                  0.000000  0.000000  0.000024  __cond_resched()			<- _cond_resched() <- wait_for_completion() <- stop_one_cpu() <- sched_exec()
> >    79371.874603 [10] migration/10[59]            0.003350  0.000004  0.000011  smpboot_thread_fn()			<- kthread() <- ret_from_fork()
> >    79371.874604 [11] <idle>                      0.001148  0.000000  0.000035  cpu_startup_entry()			<- start_secondary()
> >    79371.874723 [05] <idle>                      0.000016  0.000000  0.001383  cpu_startup_entry()			<- start_secondary()
> >    79371.874746 [05] gcc[31949]                  0.000153  0.000078  0.000022  do_wait()				<- sys_wait4() <- system_call_fastpath() <- __GI___waitpid()
> 
> (i.e. visually separate the first entry - and list the rest.)
> 
> Or maybe it could be ASCII color coded so that the different entries are easier to 
> separate: for example the functions could be printed in alternating white/grey 
> color?

I'd go with the first option - simply adding arrows.  It's good enough
to identify each function IMHO.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHSET 0/7] perf sched: Introduce timehist command, again (v1)
  2016-11-15  6:55   ` Namhyung Kim
@ 2016-11-15  7:34     ` Ingo Molnar
  2016-11-15 15:14       ` David Ahern
  0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2016-11-15  7:34 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, David Ahern, Peter Zijlstra, Jiri Olsa,
	LKML, Stephane Eranian, Andi Kleen


* Namhyung Kim <namhyung@kernel.org> wrote:

> > > By default it shows the individual schedule events, including the time between
> > > sched-in events for the task, the task scheduling delay (time between wakeup
> > > and actually running) and run time for the task:
> > > 
> > >            time cpu  task name[tid/pid]    b/n time sch delay  run time
> > >   ------------- ---- -------------------- --------- --------- ---------
> > >    79371.874569 [11] gcc[31949]               0.014     0.000     1.148
> > >    79371.874591 [10] gcc[31951]               0.000     0.000     0.024
> > >    79371.874603 [10] migration/10[59]         3.350     0.004     0.011
> > >    79371.874604 [11] <idle>                   1.148     0.000     0.035
> > >    79371.874723 [05] <idle>                   0.016     0.000     1.383
> > >    79371.874746 [05] gcc[31949]               0.153     0.078     0.022
> > > ...
> > 
> > What does the 'b/n' abbreviation stand for? 'Between'? Could we call the column 
> > 'sch wait' instead, or so?
> 
> Looks better, or what about 'wait time'?

Works for me!

> I'd go with the first option - simply adding arrows.  It's good enough to 
> identify each function IMHO.

Ok!

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHSET 0/7] perf sched: Introduce timehist command, again (v1)
  2016-11-15  7:34     ` Ingo Molnar
@ 2016-11-15 15:14       ` David Ahern
  2016-11-15 15:32         ` Namhyung Kim
  0 siblings, 1 reply; 15+ messages in thread
From: David Ahern @ 2016-11-15 15:14 UTC (permalink / raw)
  To: Ingo Molnar, Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Jiri Olsa, LKML,
	Stephane Eranian, Andi Kleen

On 11/15/16 12:34 AM, Ingo Molnar wrote:
> 
> * Namhyung Kim <namhyung@kernel.org> wrote:
> 
>>>> By default it shows the individual schedule events, including the time between
>>>> sched-in events for the task, the task scheduling delay (time between wakeup
>>>> and actually running) and run time for the task:
>>>>
>>>>            time cpu  task name[tid/pid]    b/n time sch delay  run time
>>>>   ------------- ---- -------------------- --------- --------- ---------
>>>>    79371.874569 [11] gcc[31949]               0.014     0.000     1.148
>>>>    79371.874591 [10] gcc[31951]               0.000     0.000     0.024
>>>>    79371.874603 [10] migration/10[59]         3.350     0.004     0.011
>>>>    79371.874604 [11] <idle>                   1.148     0.000     0.035
>>>>    79371.874723 [05] <idle>                   0.016     0.000     1.383
>>>>    79371.874746 [05] gcc[31949]               0.153     0.078     0.022
>>>> ...
>>>
>>> What does the 'b/n' abbreviation stand for? 'Between'? Could we call the column 
>>> 'sch wait' instead, or so?
>>
>> Looks better, or what about 'wait time'?
> 
> Works for me!

That column generically is time not running -- time between the last sched out and the current sched in. It could be expected (sleep, select, read, ...), waiting for a resource (disk I/O, mutex) or preemption.

> 
>> I'd go with the first option - simply adding arrows.  It's good enough to 
>> identify each function IMHO.
> 
> Ok!

I'd prefer the arrows too for a default. Color can be an add-on option.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHSET 0/7] perf sched: Introduce timehist command, again (v1)
  2016-11-15 15:14       ` David Ahern
@ 2016-11-15 15:32         ` Namhyung Kim
  2016-11-15 18:25           ` David Ahern
  0 siblings, 1 reply; 15+ messages in thread
From: Namhyung Kim @ 2016-11-15 15:32 UTC (permalink / raw)
  To: David Ahern
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Peter Zijlstra, Jiri Olsa,
	LKML, Stephane Eranian, Andi Kleen

Hi David,

On Tue, Nov 15, 2016 at 08:14:24AM -0700, David Ahern wrote:
> On 11/15/16 12:34 AM, Ingo Molnar wrote:
> > 
> > * Namhyung Kim <namhyung@kernel.org> wrote:
> > 
> >>>> By default it shows the individual schedule events, including the time between
> >>>> sched-in events for the task, the task scheduling delay (time between wakeup
> >>>> and actually running) and run time for the task:
> >>>>
> >>>>            time cpu  task name[tid/pid]    b/n time sch delay  run time
> >>>>   ------------- ---- -------------------- --------- --------- ---------
> >>>>    79371.874569 [11] gcc[31949]               0.014     0.000     1.148
> >>>>    79371.874591 [10] gcc[31951]               0.000     0.000     0.024
> >>>>    79371.874603 [10] migration/10[59]         3.350     0.004     0.011
> >>>>    79371.874604 [11] <idle>                   1.148     0.000     0.035
> >>>>    79371.874723 [05] <idle>                   0.016     0.000     1.383
> >>>>    79371.874746 [05] gcc[31949]               0.153     0.078     0.022
> >>>> ...
> >>>
> >>> What does the 'b/n' abbreviation stand for? 'Between'? Could we call the column 
> >>> 'sch wait' instead, or so?
> >>
> >> Looks better, or what about 'wait time'?
> > 
> > Works for me!
> 
> That column generically is time not running -- time between the last
> sched out and the current sched in. It could be expected (sleep,
> select, read, ...), waiting for a resource (disk I/O, mutex) or
> preemption.

Right.  Maybe it'd be better to show the prev_state as well to
identify the reason (roughly).

So, are you ok with the name 'wait time'?  My thinking is that they
are all waiting for something - timer, resource or cpu.

Thanks,
Namhyung

> 
> > 
> >> I'd go with the first option - simply adding arrows.  It's good enough to 
> >> identify each function IMHO.
> > 
> > Ok!
> 
> I'd prefer the arrows too for a default. Color can be an add-on option.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHSET 0/7] perf sched: Introduce timehist command, again (v1)
  2016-11-15 15:32         ` Namhyung Kim
@ 2016-11-15 18:25           ` David Ahern
  0 siblings, 0 replies; 15+ messages in thread
From: David Ahern @ 2016-11-15 18:25 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Peter Zijlstra, Jiri Olsa,
	LKML, Stephane Eranian, Andi Kleen

On 11/15/16 8:32 AM, Namhyung Kim wrote:
> So, are you ok with the name 'wait time'?  My thinking is that they
> are all waiting for something - timer, resource or cpu.

sure. 

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-11-15 18:25 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-14 16:12 [PATCHSET 0/7] perf sched: Introduce timehist command, again (v1) Namhyung Kim
2016-11-14 16:12 ` [PATCH RESEND 1/7] perf symbol: Print symbol offsets conditionally Namhyung Kim
2016-11-14 16:12 ` [PATCH 2/7] perf sched timehist: Introduce timehist command Namhyung Kim
2016-11-14 16:12 ` [PATCH 3/7] perf sched timehist: Add summary options Namhyung Kim
2016-11-14 16:12 ` [PATCH 4/7] perf sched timehist: Add -w/--wakeups option Namhyung Kim
2016-11-14 16:12 ` [PATCH 5/7] perf sched timehist: Add call graph options Namhyung Kim
2016-11-14 16:12 ` [PATCH 6/7] perf sched timehist: Add -V/--cpu-visual option Namhyung Kim
2016-11-14 16:12 ` [PATCH 7/7] perf sched: Add documentation for timehist options Namhyung Kim
2016-11-14 16:26 ` [PATCHSET 0/7] perf sched: Introduce timehist command, again (v1) David Ahern
2016-11-15  6:42 ` Ingo Molnar
2016-11-15  6:55   ` Namhyung Kim
2016-11-15  7:34     ` Ingo Molnar
2016-11-15 15:14       ` David Ahern
2016-11-15 15:32         ` Namhyung Kim
2016-11-15 18:25           ` David Ahern

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).