linux-kernel.vger.kernel.org archive mirror
* [PATCH V2 0/6] Freq/CPU%/CORE_BUSY% support
@ 2015-07-24 13:48 kan.liang
  2015-07-24 13:48 ` [PATCH V2 1/6] perf,tools: save cpu max freq in perf header kan.liang
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: kan.liang @ 2015-07-24 13:48 UTC (permalink / raw)
  To: a.p.zijlstra, acme
  Cc: luto, mingo, eranian, ak, mark.rutland, adrian.hunter, jolsa,
	namhyung, linux-kernel, Kan Liang

From: Kan Liang <kan.liang@intel.com>

This patch set adds per-sample freq/CPU%/CORE_BUSY% output to perf
report -D and --stdio.
To print this information, the perf.data file must have been recorded
with group read, using the special events cycles, ref-cycles, msr/tsc/,
msr/aperf/ or msr/mperf/.

 - Freq (MHz): The frequency during the sample interval. Needs the cycles
   and ref-cycles events.
 - CPU%: CPU utilization during the sample interval. Needs ref-cycles and
   msr/tsc/ events.
 - CORE_BUSY%: actual percent performance (APERF/MPERF%) during the
   sample interval. Needs msr/aperf/ and msr/mperf/ events.

To print CPU% and CORE_BUSY%, the following kernel patch must also be applied:
http://marc.info/?l=linux-kernel&m=143747254926369&w=2

Here is an example:

$ perf record -e '{cycles,ref-cycles,msr/tsc/,msr/mperf/,msr/aperf/}:S' ~/tchain_edit

$ perf report --stdio --group --show-freq-perf

                                 Overhead   FREQ MHz   CPU%  CORE_BUSY%  Command      Shared Object     Symbol
 ........................................  .........  .....  ..........  ...........  ................  ......................

    99.54%  99.54%  99.53%  99.53%  99.53%       2301     96         99  tchain_edit  tchain_edit       [.] f3
     0.20%   0.20%   0.20%   0.20%   0.20%       2301     98         99  tchain_edit  tchain_edit       [.] f2
     0.05%   0.05%   0.05%   0.05%   0.05%       2300     98         99  tchain_edit  [kernel.vmlinux]  [k] read_tsc

Changes since V1:
 - Save cpu max freq to header when recording
 - Read cpu max freq and msr type from header when reporting

Kan Liang (6):
  perf,tools: save cpu max freq in perf header
  perf,tools: read cpu max freq and msr type from header
  perf,tools: Dump per-sample freq/CPU%/CORE_BUSY% in report -D
  perf,tools: save misc sample read value in struct perf_sample
  perf,tools: caculate and save freq/CPU%/CORE_BUSY% in he_stat
  perf,tools: Show freq/CPU%/CORE_BUSY% in perf report --stdio

 tools/perf/Documentation/perf-report.txt | 12 ++++++
 tools/perf/builtin-annotate.c            |  2 +-
 tools/perf/builtin-diff.c                |  2 +-
 tools/perf/builtin-report.c              | 22 ++++++++++
 tools/perf/perf.h                        |  1 +
 tools/perf/tests/hists_link.c            |  4 +-
 tools/perf/ui/hist.c                     | 71 +++++++++++++++++++++++++++++---
 tools/perf/util/cpumap.c                 | 32 ++++++++++++++
 tools/perf/util/cpumap.h                 |  2 +
 tools/perf/util/event.h                  | 11 +++++
 tools/perf/util/header.c                 | 56 +++++++++++++++++++++++++
 tools/perf/util/header.h                 |  4 ++
 tools/perf/util/hist.c                   | 51 ++++++++++++++++++++---
 tools/perf/util/hist.h                   |  5 +++
 tools/perf/util/session.c                | 49 +++++++++++++++++++---
 tools/perf/util/session.h                | 29 +++++++++++++
 tools/perf/util/sort.c                   |  3 ++
 tools/perf/util/sort.h                   |  3 ++
 tools/perf/util/symbol.h                 |  9 +++-
 tools/perf/util/util.c                   |  2 +
 20 files changed, 349 insertions(+), 21 deletions(-)

-- 
1.8.3.1



* [PATCH V2 1/6] perf,tools: save cpu max freq in perf header
  2015-07-24 13:48 [PATCH V2 0/6] Freq/CPU%/CORE_BUSY% support kan.liang
@ 2015-07-24 13:48 ` kan.liang
  2015-07-26 16:31   ` Jiri Olsa
  2015-07-26 16:32   ` Jiri Olsa
  2015-07-24 13:48 ` [PATCH V2 2/6] perf,tools: read cpu max freq and msr type from header kan.liang
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 22+ messages in thread
From: kan.liang @ 2015-07-24 13:48 UTC (permalink / raw)
  To: a.p.zijlstra, acme
  Cc: luto, mingo, eranian, ak, mark.rutland, adrian.hunter, jolsa,
	namhyung, linux-kernel, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Get the maximum CPU frequency from the first online CPU and save it, in
MHz, in the perf.data header.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/util/cpumap.c | 32 ++++++++++++++++++++++++++++++++
 tools/perf/util/cpumap.h |  1 +
 tools/perf/util/header.c | 35 +++++++++++++++++++++++++++++++++++
 tools/perf/util/header.h |  2 ++
 4 files changed, 70 insertions(+)

diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c
index 3667e21..548ef13 100644
--- a/tools/perf/util/cpumap.c
+++ b/tools/perf/util/cpumap.c
@@ -499,3 +499,35 @@ int cpu__setup_cpunode_map(void)
 	closedir(dir1);
 	return 0;
 }
+
+unsigned int get_cpu_max_freq(void)
+{
+	const char *mnt;
+	char path[PATH_MAX], tmp;
+	FILE *fp;
+	unsigned int freq;
+	int cpu = 0;
+	int ret;
+
+	mnt = sysfs__mountpoint();
+	if (!mnt)
+		return 0;
+
+	snprintf(path, PATH_MAX, "%s/devices/system/cpu/online", mnt);
+	fp = fopen(path, "r");
+	if (fp) {
+		ret = fscanf(fp, "%u%c", &cpu, &tmp);
+		fclose(fp);
+		if (ret < 1)
+			return 0;
+	}
+
+	snprintf(path, PATH_MAX, "%s/devices/system/cpu/cpu%d/cpufreq/cpuinfo_max_freq", mnt, cpu);
+	fp = fopen(path, "r");
+	if (!fp)
+		return 0;
+	ret = fscanf(fp, "%u", &freq);
+	fclose(fp);
+
+	return (ret == 1) ? freq : 0;
+}
diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h
index 0af9cec..6784677 100644
--- a/tools/perf/util/cpumap.h
+++ b/tools/perf/util/cpumap.h
@@ -58,6 +58,7 @@ int max_node_num;
 int *cpunode_map;
 
 int cpu__setup_cpunode_map(void);
+unsigned int get_cpu_max_freq(void);
 
 static inline int cpu__max_node(void)
 {
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 03ace57..287a488 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -862,6 +862,16 @@ write_it:
 	return do_write_string(fd, buffer);
 }
 
+static int write_cpu_max_freq(int fd, struct perf_header *h __maybe_unused,
+			      struct perf_evlist *evlist __maybe_unused)
+{
+	u32 freq;
+
+	freq = get_cpu_max_freq() / 1000;
+
+	return do_write(fd, &freq, sizeof(freq));
+}
+
 static int write_branch_stack(int fd __maybe_unused,
 			      struct perf_header *h __maybe_unused,
 		       struct perf_evlist *evlist __maybe_unused)
@@ -1158,6 +1168,11 @@ static void print_cpuid(struct perf_header *ph, int fd __maybe_unused, FILE *fp)
 	fprintf(fp, "# cpuid : %s\n", ph->env.cpuid);
 }
 
+static void print_cpu_max_freq(struct perf_header *ph, int fd __maybe_unused, FILE *fp)
+{
+	fprintf(fp, "# CPU max frequency : %u MHz\n", ph->env.cpu_max_freq);
+}
+
 static void print_branch_stack(struct perf_header *ph __maybe_unused,
 			       int fd __maybe_unused, FILE *fp)
 {
@@ -1471,6 +1486,25 @@ static int process_cpuid(struct perf_file_section *section __maybe_unused,
 	return ph->env.cpuid ? 0 : -ENOMEM;
 }
 
+static int process_cpu_max_freq(struct perf_file_section *section __maybe_unused,
+				struct perf_header *ph, int fd,
+				void *data __maybe_unused)
+{
+	ssize_t ret;
+	u32 nr;
+
+	ret = readn(fd, &nr, sizeof(nr));
+	if (ret != sizeof(nr))
+		return -1;
+
+	if (ph->needs_swap)
+		nr = bswap_32(nr);
+
+	ph->env.cpu_max_freq = nr;
+
+	return 0;
+}
+
 static int process_total_mem(struct perf_file_section *section __maybe_unused,
 			     struct perf_header *ph, int fd,
 			     void *data __maybe_unused)
@@ -1885,6 +1919,7 @@ static const struct feature_ops feat_ops[HEADER_LAST_FEATURE] = {
 	FEAT_OPP(HEADER_NRCPUS,		nrcpus),
 	FEAT_OPP(HEADER_CPUDESC,	cpudesc),
 	FEAT_OPP(HEADER_CPUID,		cpuid),
+	FEAT_OPP(HEADER_CPU_MAX_FREQ,	cpu_max_freq),
 	FEAT_OPP(HEADER_TOTAL_MEM,	total_mem),
 	FEAT_OPP(HEADER_EVENT_DESC,	event_desc),
 	FEAT_OPP(HEADER_CMDLINE,	cmdline),
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index d4d5796..a646025 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -22,6 +22,7 @@ enum {
 	HEADER_NRCPUS,
 	HEADER_CPUDESC,
 	HEADER_CPUID,
+	HEADER_CPU_MAX_FREQ,
 	HEADER_TOTAL_MEM,
 	HEADER_CMDLINE,
 	HEADER_EVENT_DESC,
@@ -75,6 +76,7 @@ struct perf_session_env {
 	int			nr_cpus_avail;
 	char			*cpu_desc;
 	char			*cpuid;
+	int			cpu_max_freq;
 	unsigned long long	total_mem;
 
 	int			nr_cmdline;
-- 
1.8.3.1
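The two small steps the patch above relies on can be sketched in isolation: taking the first CPU number from a sysfs "online"-style list, and converting the kHz value reported by cpuinfo_max_freq to MHz. This is only an illustrative sketch; first_online_cpu() and khz_to_mhz() are hypothetical helper names and are not part of the patch.

```c
#include <stdlib.h>

/*
 * Return the first CPU number in a list such as "0-7" or "2,4-5",
 * or -1 if the string does not start with a decimal number.
 * (Hypothetical helper, for illustration only.)
 */
static int first_online_cpu(const char *list)
{
	char *end;
	long cpu = strtol(list, &end, 10);

	if (end == list || cpu < 0)
		return -1;
	return (int)cpu;
}

/* cpuinfo_max_freq is reported in kHz; the header stores MHz. */
static unsigned int khz_to_mhz(unsigned int khz)
{
	return khz / 1000;
}
```

For example, first_online_cpu("2,4-5\n") yields 2 and khz_to_mhz(2301000) yields 2301. Unlike the fscanf() call in the patch, this variant parses into a signed long, so the conversion type matches the variable it is stored in.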



* [PATCH V2 2/6] perf,tools: read cpu max freq and msr type from header
  2015-07-24 13:48 [PATCH V2 0/6] Freq/CPU%/CORE_BUSY% support kan.liang
  2015-07-24 13:48 ` [PATCH V2 1/6] perf,tools: save cpu max freq in perf header kan.liang
@ 2015-07-24 13:48 ` kan.liang
  2015-07-24 13:48 ` [PATCH V2 3/6] perf,tools: Dump per-sample freq/CPU%/CORE_BUSY% in report -D kan.liang
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: kan.liang @ 2015-07-24 13:48 UTC (permalink / raw)
  To: a.p.zijlstra, acme
  Cc: luto, mingo, eranian, ak, mark.rutland, adrian.hunter, jolsa,
	namhyung, linux-kernel, Kan Liang

From: Kan Liang <kan.liang@intel.com>

In perf report, read the CPU max frequency and the msr PMU type from the
perf.data header.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/builtin-report.c |  3 +++
 tools/perf/util/cpumap.h    |  1 +
 tools/perf/util/header.c    | 21 +++++++++++++++++++++
 tools/perf/util/header.h    |  2 ++
 tools/perf/util/session.h   |  1 +
 5 files changed, 28 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 95a4771..2a32e9a 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -818,6 +818,9 @@ repeat:
 		symbol_conf.cumulate_callchain = false;
 	}
 
+	cpu_max_freq = session->header.env.cpu_max_freq;
+	msr_pmu_type = perf_header_find_pmu_type(session, "msr");
+
 	if (setup_sorting() < 0) {
 		if (sort_order)
 			parse_options_usage(report_usage, options, "s", 1);
diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h
index 6784677..70ac686 100644
--- a/tools/perf/util/cpumap.h
+++ b/tools/perf/util/cpumap.h
@@ -56,6 +56,7 @@ static inline bool cpu_map__empty(const struct cpu_map *map)
 int max_cpu_num;
 int max_node_num;
 int *cpunode_map;
+unsigned int cpu_max_freq;
 
 int cpu__setup_cpunode_map(void);
 unsigned int get_cpu_max_freq(void);
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 287a488..e83681b 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -2847,3 +2847,24 @@ int perf_event__process_build_id(struct perf_tool *tool __maybe_unused,
 				 session);
 	return 0;
 }
+
+int perf_header_find_pmu_type(struct perf_session *session, const char *name)
+{
+	u32 pmu_num = session->header.env.nr_pmu_mappings;
+	u32 type;
+	char *str, *tmp;
+
+	str = session->header.env.pmu_mappings;
+	while (pmu_num--) {
+		type = strtoul(str, &tmp, 0);
+		if (*tmp != ':') {
+			pr_err("unable to read pmu mappings from head\n");
+			return -1;
+		}
+		str = tmp + 1;
+		if (!strcmp(str, name))
+			return type;
+		str += strlen(str) + 1;
+	}
+	return -1;
+}
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index a646025..233644b 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -150,6 +150,8 @@ int perf_event__process_build_id(struct perf_tool *tool,
 				 struct perf_session *session);
 bool is_perf_magic(u64 magic);
 
+int perf_header_find_pmu_type(struct perf_session *s, const char *name);
+
 #define NAME_ALIGN 64
 
 int write_padded(int fd, const void *bf, size_t count, size_t count_aligned);
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index b44afc7..a339338 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -17,6 +17,7 @@ struct thread;
 
 struct auxtrace;
 struct itrace_synth_opts;
+unsigned int msr_pmu_type;
 
 struct perf_session {
 	struct perf_header	header;
-- 
1.8.3.1
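The lookup in perf_header_find_pmu_type() walks a buffer of back-to-back, NUL-terminated "type:name" entries, as implied by the str += strlen(str) + 1 step in the patch above. A standalone sketch of the same walk, under that layout assumption, might look like this; find_pmu_type() is a hypothetical name and takes the buffer and entry count directly instead of a perf_session.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Look up `name` in a buffer holding `count` consecutive,
 * NUL-terminated "type:name" entries. Returns the numeric type of
 * the matching entry, or -1 if it is missing or an entry is malformed.
 */
static int find_pmu_type(const char *buf, unsigned int count, const char *name)
{
	while (count--) {
		char *end;
		unsigned long type = strtoul(buf, &end, 0);

		if (end == buf || *end != ':')
			return -1;	/* no number, or no ':' after it */
		if (strcmp(end + 1, name) == 0)
			return (int)type;
		/* skip the name and its terminating NUL */
		buf = end + 1 + strlen(end + 1) + 1;
	}
	return -1;
}
```

With the buffer "4:cpu\0" "1:software\0" "9:msr" and a count of 3, looking up "msr" returns 9. The extra end == buf check rejects an entry that has no leading number at all, a case the ':' test alone does not catch when the entry starts with ':'.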



* [PATCH V2 3/6] perf,tools: Dump per-sample freq/CPU%/CORE_BUSY% in report -D
  2015-07-24 13:48 [PATCH V2 0/6] Freq/CPU%/CORE_BUSY% support kan.liang
  2015-07-24 13:48 ` [PATCH V2 1/6] perf,tools: save cpu max freq in perf header kan.liang
  2015-07-24 13:48 ` [PATCH V2 2/6] perf,tools: read cpu max freq and msr type from header kan.liang
@ 2015-07-24 13:48 ` kan.liang
  2015-07-26 16:31   ` Jiri Olsa
  2015-07-24 13:48 ` [PATCH V2 4/6] perf,tools: save misc sample read value in struct perf_sample kan.liang
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: kan.liang @ 2015-07-24 13:48 UTC (permalink / raw)
  To: a.p.zijlstra, acme
  Cc: luto, mingo, eranian, ak, mark.rutland, adrian.hunter, jolsa,
	namhyung, linux-kernel, Kan Liang

From: Kan Liang <kan.liang@intel.com>

The group read results from the cycles, ref-cycles, TSC, APERF and MPERF
events can be used to calculate the frequency, CPU utilization and percent
performance during each sampling period.
This patch shows them in report -D.

Here is an example:

$ perf record -e '{cycles,ref-cycles,msr/tsc/,msr/mperf/,msr/aperf/}:S' ~/tchain_edit

Here is one sample from perf report -D

1972044565107 0x3498 [0x88]: PERF_RECORD_SAMPLE(IP, 0x2): 10608/10608: 0x4005fd period: 564686 addr: 0
... sample_read:
.... group nr 5
..... id 0000000000000012, value 0000000002143901
..... id 0000000000000052, value 0000000002143896
..... id 0000000000000094, value 00000000021e443d
..... id 00000000000000d4, value 00000000021db984
..... id 0000000000000114, value 00000000021db964
..... Freq 2301 MHz
..... CPU% 98%
..... CORE_BUSY% 99%
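
The three derived values in the sample above can be checked by hand with the integer formulas used in this patch. The sketch below makes two assumptions that the sample itself does not state: the five values are listed in the order of the record command (cycles, ref-cycles, tsc, mperf, aperf), and the max frequency stored in the header is 2301 MHz. struct freq_perf and freq_perf_calc() are hypothetical names.

```c
#include <stdint.h>

struct freq_perf {
	uint64_t freq_mhz;	/* cycles * max_mhz / ref_cycles */
	uint64_t cpu_util;	/* 100 * ref_cycles / tsc */
	uint64_t core_busy;	/* 100 * aperf / mperf */
};

/* Each metric is left at 0 when its divisor is 0. */
static struct freq_perf freq_perf_calc(uint64_t cycles, uint64_t ref_cycles,
				       uint64_t tsc, uint64_t aperf,
				       uint64_t mperf, uint64_t max_mhz)
{
	struct freq_perf r = { 0, 0, 0 };

	if (ref_cycles)
		r.freq_mhz = cycles * max_mhz / ref_cycles;
	if (tsc)
		r.cpu_util = 100 * ref_cycles / tsc;
	if (mperf)
		r.core_busy = 100 * aperf / mperf;
	return r;
}
```

Calling freq_perf_calc(0x2143901, 0x2143896, 0x21e443d, 0x21db964, 0x21db984, 2301) gives 2301 MHz, 98 and 99, matching the Freq, CPU% and CORE_BUSY% lines of the sample. All divisions truncate, so 98.1% is shown as 98%.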

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/util/session.c | 33 ++++++++++++++++++++++++++++-----
 tools/perf/util/session.h | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index ed9dc25..7f628d9 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -851,8 +851,14 @@ static void perf_evlist__print_tstamp(struct perf_evlist *evlist,
 		printf("%" PRIu64 " ", sample->time);
 }
 
-static void sample_read__printf(struct perf_sample *sample, u64 read_format)
+static void sample_read__printf(struct perf_evlist *evlist,
+				struct perf_sample *sample,
+				u64 read_format)
 {
+	struct perf_evsel *evsel;
+	struct perf_sample_id *sid;
+	u64 data[FREQ_PERF_MAX] = { 0 };
+
 	printf("... sample_read:\n");
 
 	if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
@@ -875,10 +881,26 @@ static void sample_read__printf(struct perf_sample *sample, u64 read_format)
 			printf("..... id %016" PRIx64
 			       ", value %016" PRIx64 "\n",
 			       value->id, value->value);
+
+			sid = perf_evlist__id2sid(evlist, value->id);
+			evsel = sid->evsel;
+			if (evsel != NULL)
+				SET_FREQ_PERF_VALUE(evsel, data,
+						    value->value);
 		}
 	} else
 		printf("..... id %016" PRIx64 ", value %016" PRIx64 "\n",
 			sample->read.one.id, sample->read.one.value);
+
+	if (HAS_FREQ(data))
+		printf("..... Freq %lu MHz\n",
+		       (data[FREQ_PERF_CYCLES] * cpu_max_freq) / data[FREQ_PERF_REF_CYCLES]);
+	if (HAS_CPU_U(data))
+		printf("..... CPU%% %lu%%\n",
+		       (100 * data[FREQ_PERF_REF_CYCLES]) / data[FREQ_PERF_TSC]);
+	if (HAS_CORE_BUSY(data))
+		printf("..... CORE_BUSY%% %lu%%\n",
+		       (100 * data[FREQ_PERF_APERF]) / data[FREQ_PERF_MPERF]);
 }
 
 static void dump_event(struct perf_evlist *evlist, union perf_event *event,
@@ -899,8 +921,8 @@ static void dump_event(struct perf_evlist *evlist, union perf_event *event,
 	       event->header.size, perf_event__name(event->header.type));
 }
 
-static void dump_sample(struct perf_evsel *evsel, union perf_event *event,
-			struct perf_sample *sample)
+static void dump_sample(struct perf_evlist *evlist, struct perf_evsel *evsel,
+			union perf_event *event, struct perf_sample *sample)
 {
 	u64 sample_type;
 
@@ -938,7 +960,7 @@ static void dump_sample(struct perf_evsel *evsel, union perf_event *event,
 		printf("... transaction: %" PRIx64 "\n", sample->transaction);
 
 	if (sample_type & PERF_SAMPLE_READ)
-		sample_read__printf(sample, evsel->attr.read_format);
+		sample_read__printf(evlist, sample, evsel->attr.read_format);
 }
 
 static struct machine *machines__find_for_cpumode(struct machines *machines,
@@ -1053,11 +1075,12 @@ static int machines__deliver_event(struct machines *machines,
 
 	switch (event->header.type) {
 	case PERF_RECORD_SAMPLE:
-		dump_sample(evsel, event, sample);
 		if (evsel == NULL) {
 			++evlist->stats.nr_unknown_id;
 			return 0;
 		}
+		dump_sample(evlist, evsel, event, sample);
+
 		if (machine == NULL) {
 			++evlist->stats.nr_unprocessable_samples;
 			return 0;
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index a339338..db218ba 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -43,6 +43,44 @@ struct perf_session {
 #define PRINT_IP_OPT_ONELINE	(1<<4)
 #define PRINT_IP_OPT_SRCLINE	(1<<5)
 
+#define PERF_MSR_TSC		0
+#define PERF_MSR_APERF		1
+#define PERF_MSR_MPERF		2
+
+enum perf_freq_perf_index {
+	FREQ_PERF_TSC		= 0,
+	FREQ_PERF_APERF		= 1,
+	FREQ_PERF_MPERF		= 2,
+	FREQ_PERF_CYCLES	= 3,
+	FREQ_PERF_REF_CYCLES	= 4,
+
+	FREQ_PERF_MAX
+};
+
+#define SET_FREQ_PERF_VALUE(event, array, value)			\
+{									\
+	if (event->attr.type == msr_pmu_type) {				\
+		if (event->attr.config == PERF_MSR_TSC)			\
+			array[FREQ_PERF_TSC] = value;			\
+		if (event->attr.config == PERF_MSR_APERF)		\
+			array[FREQ_PERF_APERF] = value;			\
+		if (event->attr.config == PERF_MSR_MPERF)		\
+			array[FREQ_PERF_MPERF] = value;			\
+	}								\
+	if (event->attr.type == PERF_TYPE_HARDWARE) {			\
+		if (event->attr.config == PERF_COUNT_HW_CPU_CYCLES)	\
+			array[FREQ_PERF_CYCLES] = value;		\
+		if (event->attr.config == PERF_COUNT_HW_REF_CPU_CYCLES)	\
+			array[FREQ_PERF_REF_CYCLES] = value;		\
+	}								\
+}
+
+#define HAS_FREQ(array)							\
+	((array[FREQ_PERF_CYCLES] > 0) && (array[FREQ_PERF_REF_CYCLES] > 0))
+#define HAS_CPU_U(array)						\
+	((array[FREQ_PERF_TSC] > 0) && (array[FREQ_PERF_REF_CYCLES] > 0))
+#define HAS_CORE_BUSY(array)						\
+	((array[FREQ_PERF_APERF] > 0) && (array[FREQ_PERF_MPERF] > 0))
 struct perf_tool;
 
 struct perf_session *perf_session__new(struct perf_data_file *file,
-- 
1.8.3.1
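The classification done by SET_FREQ_PERF_VALUE can also be written as a plain function that maps an event's (type, config) pair to an array slot. The following is a sketch of that alternative, not what the patch does. The HW_* constants mirror the values of PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES and PERF_COUNT_HW_REF_CPU_CYCLES in linux/perf_event.h and are redefined here only to keep the example standalone; the MSR_* values mirror the PERF_MSR_* defines above; freq_perf_slot() and the SLOT_* names are hypothetical.

```c
#include <stdint.h>

enum { SLOT_TSC, SLOT_APERF, SLOT_MPERF, SLOT_CYCLES, SLOT_REF_CYCLES, SLOT_MAX };

/* Local copies of the perf ABI values, to keep the sketch standalone. */
enum { HW_TYPE = 0, HW_CPU_CYCLES = 0, HW_REF_CPU_CYCLES = 9 };
enum { MSR_TSC = 0, MSR_APERF = 1, MSR_MPERF = 2 };

/*
 * Return the slot for an event, or -1 if the event does not feed
 * any of the freq/CPU%/CORE_BUSY% calculations.
 * msr_type is the dynamic PMU type of the msr PMU read from the header.
 */
static int freq_perf_slot(uint32_t type, uint64_t config, uint32_t msr_type)
{
	if (type == msr_type) {
		switch (config) {
		case MSR_TSC:	return SLOT_TSC;
		case MSR_APERF:	return SLOT_APERF;
		case MSR_MPERF:	return SLOT_MPERF;
		}
	} else if (type == HW_TYPE) {
		switch (config) {
		case HW_CPU_CYCLES:	return SLOT_CYCLES;
		case HW_REF_CPU_CYCLES:	return SLOT_REF_CYCLES;
		}
	}
	return -1;
}
```

A caller would then write something like slot = freq_perf_slot(...); if (slot >= 0) data[slot] = value;. Compared with a brace-block macro, a function evaluates its arguments once and can be used safely inside an unbraced if/else.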



* [PATCH V2 4/6] perf,tools: save misc sample read value in struct perf_sample
  2015-07-24 13:48 [PATCH V2 0/6] Freq/CPU%/CORE_BUSY% support kan.liang
                   ` (2 preceding siblings ...)
  2015-07-24 13:48 ` [PATCH V2 3/6] perf,tools: Dump per-sample freq/CPU%/CORE_BUSY% in report -D kan.liang
@ 2015-07-24 13:48 ` kan.liang
  2015-07-26 16:31   ` Jiri Olsa
  2015-07-24 13:48 ` [PATCH V2 5/6] perf,tools: caculate and save freq/CPU%/CORE_BUSY% in he_stat kan.liang
  2015-07-24 13:48 ` [PATCH V2 6/6] perf,tools: Show freq/CPU%/CORE_BUSY% in perf report --stdio kan.liang
  5 siblings, 1 reply; 22+ messages in thread
From: kan.liang @ 2015-07-24 13:48 UTC (permalink / raw)
  To: a.p.zijlstra, acme
  Cc: luto, mingo, eranian, ak, mark.rutland, adrian.hunter, jolsa,
	namhyung, linux-kernel, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Save the group read results from the cycles, ref-cycles, TSC, APERF and
MPERF events in struct perf_sample. The sample processing functions that
follow can then use them to calculate freq/CPU%/CORE_BUSY% and add them
to hists.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/util/event.h   | 11 +++++++++++
 tools/perf/util/session.c | 16 ++++++++++++++++
 tools/perf/util/session.h | 10 ----------
 3 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index c53f363..f7aabe3 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -176,6 +176,16 @@ enum {
 	PERF_IP_FLAG_TRACE_BEGIN	|\
 	PERF_IP_FLAG_TRACE_END)
 
+enum perf_freq_perf_index {
+	FREQ_PERF_TSC		= 0,
+	FREQ_PERF_APERF		= 1,
+	FREQ_PERF_MPERF		= 2,
+	FREQ_PERF_CYCLES	= 3,
+	FREQ_PERF_REF_CYCLES	= 4,
+
+	FREQ_PERF_MAX
+};
+
 struct perf_sample {
 	u64 ip;
 	u32 pid, tid;
@@ -191,6 +201,7 @@ struct perf_sample {
 	u64 data_src;
 	u32 flags;
 	u16 insn_len;
+	u64 freq_perf_data[FREQ_PERF_MAX];
 	void *raw_data;
 	struct ip_callchain *callchain;
 	struct branch_stack *branch_stack;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 7f628d9..7da36b1 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -998,6 +998,8 @@ static int deliver_sample_value(struct perf_evlist *evlist,
 				struct machine *machine)
 {
 	struct perf_sample_id *sid = perf_evlist__id2sid(evlist, v->id);
+	struct perf_evsel *evsel;
+	u64 nr = 0;
 
 	if (sid) {
 		sample->id     = v->id;
@@ -1010,6 +1012,20 @@ static int deliver_sample_value(struct perf_evlist *evlist,
 		return 0;
 	}
 
+	if (perf_evsel__is_group_leader(sid->evsel)) {
+		evsel = sid->evsel;
+		SET_FREQ_PERF_VALUE(evsel, sample->freq_perf_data,
+				    sample->read.group.values[nr].value);
+		evlist__for_each_continue(evlist, evsel) {
+			if ((evsel->leader != sid->evsel) ||
+			    (++nr >= sample->read.group.nr))
+				break;
+
+			SET_FREQ_PERF_VALUE(evsel, sample->freq_perf_data,
+					    sample->read.group.values[nr].value);
+		}
+	}
+
 	return tool->sample(tool, event, sample, sid->evsel, machine);
 }
 
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index db218ba..ee23872 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -47,16 +47,6 @@ struct perf_session {
 #define PERF_MSR_APERF		1
 #define PERF_MSR_MPERF		2
 
-enum perf_freq_perf_index {
-	FREQ_PERF_TSC		= 0,
-	FREQ_PERF_APERF		= 1,
-	FREQ_PERF_MPERF		= 2,
-	FREQ_PERF_CYCLES	= 3,
-	FREQ_PERF_REF_CYCLES	= 4,
-
-	FREQ_PERF_MAX
-};
-
 #define SET_FREQ_PERF_VALUE(event, array, value)			\
 {									\
 	if (event->attr.type == msr_pmu_type) {				\
-- 
1.8.3.1
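The loop added to deliver_sample_value() pairs the n-th group read value with the n-th event of the group, starting at the leader and stopping at the first event that belongs to another group. A simplified model of that walk, using a flat array instead of the evlist, could look like the sketch below. struct group_ev and scatter_group() are hypothetical, and the sketch assumes, as the patch does, that values arrive in the same order as the events.

```c
#include <stddef.h>
#include <stdint.h>

struct group_ev {
	size_t leader;	/* index of this event's group leader */
	int slot;	/* destination slot in out[], or -1 to ignore */
};

/*
 * Starting at evs[leader], copy values[0..] into out[] for each
 * consecutive event that belongs to the same group. Returns the
 * number of values consumed.
 */
static size_t scatter_group(const struct group_ev *evs, size_t n_evs,
			    size_t leader, const uint64_t *values,
			    size_t n_values, uint64_t *out)
{
	size_t used = 0;

	for (size_t i = leader; i < n_evs && used < n_values; i++) {
		if (evs[i].leader != leader)
			break;	/* first event of another group */
		if (evs[i].slot >= 0)
			out[evs[i].slot] = values[used];
		used++;
	}
	return used;
}
```

With three events led by index 0 and mapped to slots 3, 4 and 0, followed by an unrelated event, the values {10, 20, 30} end up in out[3], out[4] and out[0], and the function returns 3.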



* [PATCH V2 5/6] perf,tools: caculate and save freq/CPU%/CORE_BUSY% in he_stat
  2015-07-24 13:48 [PATCH V2 0/6] Freq/CPU%/CORE_BUSY% support kan.liang
                   ` (3 preceding siblings ...)
  2015-07-24 13:48 ` [PATCH V2 4/6] perf,tools: save misc sample read value in struct perf_sample kan.liang
@ 2015-07-24 13:48 ` kan.liang
  2015-07-26 16:31   ` Jiri Olsa
  2015-07-26 16:31   ` Jiri Olsa
  2015-07-24 13:48 ` [PATCH V2 6/6] perf,tools: Show freq/CPU%/CORE_BUSY% in perf report --stdio kan.liang
  5 siblings, 2 replies; 22+ messages in thread
From: kan.liang @ 2015-07-24 13:48 UTC (permalink / raw)
  To: a.p.zijlstra, acme
  Cc: luto, mingo, eranian, ak, mark.rutland, adrian.hunter, jolsa,
	namhyung, linux-kernel, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Introduce a new hist_iter ops (hist_iter_freq_perf) to calculate
freq/CPU%/CORE_BUSY% when processing samples, and save them in
hist_entry.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/builtin-annotate.c |  2 +-
 tools/perf/builtin-diff.c     |  2 +-
 tools/perf/tests/hists_link.c |  4 ++--
 tools/perf/util/hist.c        | 51 ++++++++++++++++++++++++++++++++++++++-----
 tools/perf/util/hist.h        |  2 ++
 tools/perf/util/sort.h        |  3 +++
 tools/perf/util/symbol.h      |  6 +++++
 7 files changed, 60 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 2c1bec3..06e2f87 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -71,7 +71,7 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel,
 		return 0;
 	}
 
-	he = __hists__add_entry(hists, al, NULL, NULL, NULL, 1, 1, 0, true);
+	he = __hists__add_entry(hists, al, NULL, NULL, NULL, 1, 1, 0, NULL, true);
 	if (he == NULL)
 		return -ENOMEM;
 
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index daaa7dc..2fffcc4 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -315,7 +315,7 @@ static int hists__add_entry(struct hists *hists,
 			    u64 weight, u64 transaction)
 {
 	if (__hists__add_entry(hists, al, NULL, NULL, NULL, period, weight,
-			       transaction, true) != NULL)
+			       transaction, NULL, true) != NULL)
 		return 0;
 	return -ENOMEM;
 }
diff --git a/tools/perf/tests/hists_link.c b/tools/perf/tests/hists_link.c
index 8c102b0..5d9f9e3 100644
--- a/tools/perf/tests/hists_link.c
+++ b/tools/perf/tests/hists_link.c
@@ -90,7 +90,7 @@ static int add_hist_entries(struct perf_evlist *evlist, struct machine *machine)
 				goto out;
 
 			he = __hists__add_entry(hists, &al, NULL,
-						NULL, NULL, 1, 1, 0, true);
+						NULL, NULL, 1, 1, 0, NULL, true);
 			if (he == NULL) {
 				addr_location__put(&al);
 				goto out;
@@ -116,7 +116,7 @@ static int add_hist_entries(struct perf_evlist *evlist, struct machine *machine)
 				goto out;
 
 			he = __hists__add_entry(hists, &al, NULL,
-						NULL, NULL, 1, 1, 0, true);
+						NULL, NULL, 1, 1, 0, NULL, true);
 			if (he == NULL) {
 				addr_location__put(&al);
 				goto out;
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 6f28d53..26b8eea 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -436,7 +436,9 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
 				      struct symbol *sym_parent,
 				      struct branch_info *bi,
 				      struct mem_info *mi,
-				      u64 period, u64 weight, u64 transaction,
+				      u64 period, u64 weight,
+				      u64 transaction,
+				      struct freq_perf_info *info,
 				      bool sample_self)
 {
 	struct hist_entry entry = {
@@ -454,6 +456,9 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
 			.nr_events = 1,
 			.period	= period,
 			.weight = weight,
+			.freq = (info != NULL) ? info->freq : 0,
+			.cpu_u = (info != NULL) ? info->cpu_u : 0,
+			.core_busy = (info != NULL) ? info->core_busy : 0,
 		},
 		.parent = sym_parent,
 		.filtered = symbol__parent_filter(sym_parent) | al->filtered,
@@ -481,6 +486,32 @@ iter_add_next_nop_entry(struct hist_entry_iter *iter __maybe_unused,
 }
 
 static int
+iter_add_single_freq_perf_entry(struct hist_entry_iter *iter, struct addr_location *al)
+{
+	struct perf_evsel *evsel = iter->evsel;
+	struct perf_sample *sample = iter->sample;
+	struct hist_entry *he;
+	struct freq_perf_info info = {0};
+	u64 *data = sample->freq_perf_data;
+
+	if (data[FREQ_PERF_REF_CYCLES] > 0)
+		info.freq = (data[FREQ_PERF_CYCLES] * cpu_max_freq) / data[FREQ_PERF_REF_CYCLES];
+	if (data[FREQ_PERF_TSC] > 0)
+		info.cpu_u = (100 * data[FREQ_PERF_REF_CYCLES]) / data[FREQ_PERF_TSC];
+	if (data[FREQ_PERF_MPERF] > 0)
+		info.core_busy = (100 * data[FREQ_PERF_APERF]) / data[FREQ_PERF_MPERF];
+
+	he = __hists__add_entry(evsel__hists(evsel), al, iter->parent, NULL, NULL,
+				sample->period, sample->weight,
+				sample->transaction, &info, true);
+	if (he == NULL)
+		return -ENOMEM;
+
+	iter->he = he;
+	return 0;
+}
+
+static int
 iter_prepare_mem_entry(struct hist_entry_iter *iter, struct addr_location *al)
 {
 	struct perf_sample *sample = iter->sample;
@@ -517,7 +548,7 @@ iter_add_single_mem_entry(struct hist_entry_iter *iter, struct addr_location *al
 	 * and the he_stat__add_period() function.
 	 */
 	he = __hists__add_entry(hists, al, iter->parent, NULL, mi,
-				cost, cost, 0, true);
+				cost, cost, 0, NULL, true);
 	if (!he)
 		return -ENOMEM;
 
@@ -618,7 +649,7 @@ iter_add_next_branch_entry(struct hist_entry_iter *iter, struct addr_location *a
 	 * and not events sampled. Thus we use a pseudo period of 1.
 	 */
 	he = __hists__add_entry(hists, al, iter->parent, &bi[i], NULL,
-				1, 1, 0, true);
+				1, 1, 0, NULL, true);
 	if (he == NULL)
 		return -ENOMEM;
 
@@ -656,7 +687,7 @@ iter_add_single_normal_entry(struct hist_entry_iter *iter, struct addr_location
 
 	he = __hists__add_entry(evsel__hists(evsel), al, iter->parent, NULL, NULL,
 				sample->period, sample->weight,
-				sample->transaction, true);
+				sample->transaction, NULL, true);
 	if (he == NULL)
 		return -ENOMEM;
 
@@ -718,7 +749,7 @@ iter_add_single_cumulative_entry(struct hist_entry_iter *iter,
 
 	he = __hists__add_entry(hists, al, iter->parent, NULL, NULL,
 				sample->period, sample->weight,
-				sample->transaction, true);
+				sample->transaction, NULL, true);
 	if (he == NULL)
 		return -ENOMEM;
 
@@ -791,7 +822,7 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
 
 	he = __hists__add_entry(evsel__hists(evsel), al, iter->parent, NULL, NULL,
 				sample->period, sample->weight,
-				sample->transaction, false);
+				sample->transaction, NULL, false);
 	if (he == NULL)
 		return -ENOMEM;
 
@@ -813,6 +844,14 @@ iter_finish_cumulative_entry(struct hist_entry_iter *iter,
 	return 0;
 }
 
+const struct hist_iter_ops hist_iter_freq_perf = {
+	.prepare_entry		= iter_prepare_normal_entry,
+	.add_single_entry	= iter_add_single_freq_perf_entry,
+	.next_entry		= iter_next_nop_entry,
+	.add_next_entry		= iter_add_next_nop_entry,
+	.finish_entry		= iter_finish_normal_entry,
+};
+
 const struct hist_iter_ops hist_iter_mem = {
 	.prepare_entry 		= iter_prepare_mem_entry,
 	.add_single_entry 	= iter_add_single_mem_entry,
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 5ed8d9c..70bd557 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -102,6 +102,7 @@ extern const struct hist_iter_ops hist_iter_normal;
 extern const struct hist_iter_ops hist_iter_branch;
 extern const struct hist_iter_ops hist_iter_mem;
 extern const struct hist_iter_ops hist_iter_cumulative;
+extern const struct hist_iter_ops hist_iter_freq_perf;
 
 struct hist_entry *__hists__add_entry(struct hists *hists,
 				      struct addr_location *al,
@@ -109,6 +110,7 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
 				      struct branch_info *bi,
 				      struct mem_info *mi, u64 period,
 				      u64 weight, u64 transaction,
+				      struct freq_perf_info *freq_perf,
 				      bool sample_self);
 int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al,
 			 int max_stack_depth, void *arg);
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index e97cd47..90422ed 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -54,6 +54,9 @@ struct he_stat {
 	u64			period_guest_us;
 	u64			weight;
 	u32			nr_events;
+	u64			freq;
+	u64			cpu_u;
+	u64			core_busy;
 };
 
 struct hist_entry_diff {
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index b98ce51..fa0ccf3 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -180,6 +180,12 @@ struct mem_info {
 	union perf_mem_data_src data_src;
 };
 
+struct freq_perf_info {
+	u64	freq;
+	u64	cpu_u;
+	u64	core_busy;
+};
+
 struct addr_location {
 	struct machine *machine;
 	struct thread *thread;
-- 
1.8.3.1



* [PATCH V2 6/6] perf,tools: Show freq/CPU%/CORE_BUSY% in perf report --stdio
  2015-07-24 13:48 [PATCH V2 0/6] Freq/CPU%/CORE_BUSY% support kan.liang
                   ` (4 preceding siblings ...)
  2015-07-24 13:48 ` [PATCH V2 5/6] perf,tools: caculate and save freq/CPU%/CORE_BUSY% in he_stat kan.liang
@ 2015-07-24 13:48 ` kan.liang
  2015-07-26 16:32   ` Jiri Olsa
  5 siblings, 1 reply; 22+ messages in thread
From: kan.liang @ 2015-07-24 13:48 UTC (permalink / raw)
  To: a.p.zijlstra, acme
  Cc: luto, mingo, eranian, ak, mark.rutland, adrian.hunter, jolsa,
	namhyung, linux-kernel, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Show frequency, CPU utilization and percent performance for each symbol
in perf report with --stdio --show-freq-perf.

In a sampling group, only the group leader samples, so with --group only
the group leader's freq needs to be printed.

Here is an example.

$ perf report --stdio --group --show-freq-perf

                                 Overhead   FREQ MHz   CPU%  CORE_BUSY%
Command      Shared Object     Symbol
 ........................................  .........  .....  ..........
...........  ................  ......................

    99.54%  99.54%  99.53%  99.53%  99.53%       2301     96         99
tchain_edit  tchain_edit       [.] f3
     0.20%   0.20%   0.20%   0.20%   0.20%       2301     98         99
tchain_edit  tchain_edit       [.] f2
     0.05%   0.05%   0.05%   0.05%   0.05%       2300     98         99
tchain_edit  [kernel.vmlinux]  [k] read_tsc
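
The three extra columns can be read as simple ratios of the counter deltas
in one sample interval. A minimal sketch, assuming the formulas used in
patch 5/6 of this series; struct sample_counts and the function names are
hypothetical and not part of the patch:

```c
#include <stdint.h>

/* Hypothetical container for the counter deltas of one sample interval. */
struct sample_counts {
	uint64_t cycles;
	uint64_t ref_cycles;
	uint64_t tsc;
	uint64_t aperf;
	uint64_t mperf;
};

/* FREQ MHz: cycles / ref-cycles scaled by the max frequency (in MHz). */
static uint64_t freq_mhz(const struct sample_counts *c, uint64_t max_freq_mhz)
{
	return c->ref_cycles ? (c->cycles * max_freq_mhz) / c->ref_cycles : 0;
}

/* CPU%: ref-cycles as a percentage of TSC ticks. */
static uint64_t cpu_util_pct(const struct sample_counts *c)
{
	return c->tsc ? (100 * c->ref_cycles) / c->tsc : 0;
}

/* CORE_BUSY%: APERF as a percentage of MPERF. */
static uint64_t core_busy_pct(const struct sample_counts *c)
{
	return c->mperf ? (100 * c->aperf) / c->mperf : 0;
}
```

Each helper returns 0 when its denominator is missing, which mirrors the
"> 0" guards in the patch.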

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/Documentation/perf-report.txt | 12 ++++++
 tools/perf/builtin-report.c              | 19 +++++++++
 tools/perf/perf.h                        |  1 +
 tools/perf/ui/hist.c                     | 71 +++++++++++++++++++++++++++++---
 tools/perf/util/hist.h                   |  3 ++
 tools/perf/util/session.c                |  2 +-
 tools/perf/util/sort.c                   |  3 ++
 tools/perf/util/symbol.h                 |  3 +-
 tools/perf/util/util.c                   |  2 +
 9 files changed, 109 insertions(+), 7 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index c33b69f..faa8825 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -303,6 +303,18 @@ OPTIONS
 	special event -e cpu/mem-loads/ or -e cpu/mem-stores/. See
 	'perf mem' for simpler access.
 
+--show-freq-perf::
+	Show CPU frequency and performance results from sample read.
+	To generate the frequency and performance output, the perf.data file
+	must have been obtained by group read, using the special events cycles,
+	ref-cycles, msr/tsc/, msr/aperf/ or msr/mperf/.
+	Freq MHz: The frequency during the sample interval. Needs cycles and
+		  ref-cycles events.
+	CPU%: CPU utilization during the sample interval. Needs ref-cycles and
+	      msr/tsc/ events.
+	CORE_BUSY%: actual percent performance (APERF/MPERF%) during the
+		    sample interval. Needs msr/aperf/ and msr/mperf/ events.
+
 --percent-limit::
 	Do not show entries which have an overhead under that percent.
 	(Default: 0).
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 2a32e9a..f52c63d 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -164,6 +164,8 @@ static int process_sample_event(struct perf_tool *tool,
 		iter.ops = &hist_iter_mem;
 	else if (symbol_conf.cumulate_callchain)
 		iter.ops = &hist_iter_cumulative;
+	else if (symbol_conf.show_freq_perf)
+		iter.ops = &hist_iter_freq_perf;
 	else
 		iter.ops = &hist_iter_normal;
 
@@ -721,6 +723,8 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_BOOLEAN(0, "demangle-kernel", &symbol_conf.demangle_kernel,
 		    "Enable kernel symbol demangling"),
 	OPT_BOOLEAN(0, "mem-mode", &report.mem_mode, "mem access profile"),
+	OPT_BOOLEAN(0, "show-freq-perf", &symbol_conf.show_freq_perf,
+		    "show CPU frequency and performance info"),
 	OPT_CALLBACK(0, "percent-limit", &report, "percent",
 		     "Don't show entries under that percent", parse_percent_limit),
 	OPT_CALLBACK(0, "percentage", NULL, "relative|absolute",
@@ -733,7 +737,9 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 	struct perf_data_file file = {
 		.mode  = PERF_DATA_MODE_READ,
 	};
+	struct perf_evsel *pos;
 	int ret = hists__init();
+	bool freq_perf_info[FREQ_PERF_MAX] = {0};
 
 	if (ret < 0)
 		return ret;
@@ -821,6 +827,19 @@ repeat:
 	cpu_max_freq = session->header.env.cpu_max_freq;
 	msr_pmu_type = perf_header_find_pmu_type(session, "msr");
 
+	if (symbol_conf.show_freq_perf) {
+		perf_freq = perf_cpu_u = perf_core_busy = false;
+		evlist__for_each(session->evlist, pos) {
+			SET_FREQ_PERF_VALUE(pos, freq_perf_info, true);
+		}
+		if (HAS_FREQ(freq_perf_info))
+			perf_freq = true;
+		if (HAS_CPU_U(freq_perf_info))
+			perf_cpu_u = true;
+		if (HAS_CORE_BUSY(freq_perf_info))
+			perf_core_busy = true;
+	}
+
 	if (setup_sorting() < 0) {
 		if (sort_order)
 			parse_options_usage(report_usage, options, "s", 1);
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 937b16a..87daab8 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -33,6 +33,7 @@ static inline unsigned long long rdclock(void)
 
 extern const char *input_name;
 extern bool perf_host, perf_guest;
+extern bool perf_freq, perf_cpu_u, perf_core_busy;
 extern const char perf_version_string[];
 
 void pthread__unblock_sigwinch(void);
diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index 25d6083..949bbf2 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -17,7 +17,7 @@
 
 static int __hpp__fmt(struct perf_hpp *hpp, struct hist_entry *he,
 		      hpp_field_fn get_field, const char *fmt, int len,
-		      hpp_snprint_fn print_fn, bool fmt_percent)
+		      hpp_snprint_fn print_fn, bool fmt_percent, bool single)
 {
 	int ret;
 	struct hists *hists = he->hists;
@@ -36,7 +36,7 @@ static int __hpp__fmt(struct perf_hpp *hpp, struct hist_entry *he,
 	} else
 		ret = hpp__call_print_fn(hpp, print_fn, fmt, len, get_field(he));
 
-	if (perf_evsel__is_group_event(evsel)) {
+	if (perf_evsel__is_group_event(evsel) && !single) {
 		int prev_idx, idx_delta;
 		struct hist_entry *pair;
 		int nr_members = evsel->nr_members;
@@ -109,10 +109,17 @@ int hpp__fmt(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
 	     const char *fmtstr, hpp_snprint_fn print_fn, bool fmt_percent)
 {
 	int len = fmt->user_len ?: fmt->len;
+	bool single = false;
+
+	if (symbol_conf.show_freq_perf &&
+	    ((fmt == &perf_hpp__format[PERF_HPP__FREQ]) ||
+	     (fmt == &perf_hpp__format[PERF_HPP__CPU_U]) ||
+	     (fmt == &perf_hpp__format[PERF_HPP__CORE_BUSY])))
+		single = true;
 
 	if (symbol_conf.field_sep) {
 		return __hpp__fmt(hpp, he, get_field, fmtstr, 1,
-				  print_fn, fmt_percent);
+				  print_fn, fmt_percent, single);
 	}
 
 	if (fmt_percent)
@@ -120,7 +127,7 @@ int hpp__fmt(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
 	else
 		len -= 1;
 
-	return  __hpp__fmt(hpp, he, get_field, fmtstr, len, print_fn, fmt_percent);
+	return  __hpp__fmt(hpp, he, get_field, fmtstr, len, print_fn, fmt_percent, single);
 }
 
 int hpp__fmt_acc(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
@@ -234,6 +241,30 @@ static int hpp__header_fn(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
 	return scnprintf(hpp->buf, hpp->size, "%*s", len, fmt->name);
 }
 
+static int hpp__single_width_fn(struct perf_hpp_fmt *fmt,
+			 struct perf_hpp *hpp __maybe_unused,
+			 struct perf_evsel *evsel)
+{
+	int len = fmt->user_len ?: fmt->len;
+
+	if (symbol_conf.event_group && !symbol_conf.show_freq_perf)
+		len = max(len, evsel->nr_members * fmt->len);
+
+	if (len < (int)strlen(fmt->name))
+		len = strlen(fmt->name);
+
+	return len;
+}
+
+static int hpp__single_header_fn(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+			  struct perf_evsel *evsel)
+{
+	int len = hpp__single_width_fn(fmt, hpp, evsel);
+
+	return scnprintf(hpp->buf, hpp->size, "%*s", len, fmt->name);
+}
+
+
 static int hpp_color_scnprintf(struct perf_hpp *hpp, const char *fmt, ...)
 {
 	va_list args;
@@ -363,6 +394,9 @@ HPP_PERCENT_ACC_FNS(overhead_acc, period)
 
 HPP_RAW_FNS(samples, nr_events)
 HPP_RAW_FNS(period, period)
+HPP_RAW_FNS(freq, freq)
+HPP_RAW_FNS(cpu_u, cpu_u)
+HPP_RAW_FNS(core_busy, core_busy)
 
 static int64_t hpp__nop_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
 			    struct hist_entry *a __maybe_unused,
@@ -395,6 +429,17 @@ static int64_t hpp__nop_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
 		.sort	= hpp__sort_ ## _fn,		\
 	}
 
+#define HPP__SINGLE_PRINT_FNS(_name, _fn)		\
+	{						\
+		.name   = _name,			\
+		.header	= hpp__single_header_fn,	\
+		.width	= hpp__single_width_fn,		\
+		.entry	= hpp__entry_ ## _fn,		\
+		.cmp	= hpp__nop_cmp,			\
+		.collapse = hpp__nop_cmp,		\
+		.sort	= hpp__sort_ ## _fn,		\
+	}
+
 #define HPP__PRINT_FNS(_name, _fn)			\
 	{						\
 		.name   = _name,			\
@@ -414,7 +459,10 @@ struct perf_hpp_fmt perf_hpp__format[] = {
 	HPP__COLOR_PRINT_FNS("guest usr", overhead_guest_us),
 	HPP__COLOR_ACC_PRINT_FNS("Children", overhead_acc),
 	HPP__PRINT_FNS("Samples", samples),
-	HPP__PRINT_FNS("Period", period)
+	HPP__PRINT_FNS("Period", period),
+	HPP__SINGLE_PRINT_FNS("FREQ MHz", freq),
+	HPP__SINGLE_PRINT_FNS("CPU%", cpu_u),
+	HPP__SINGLE_PRINT_FNS("CORE_BUSY%", core_busy)
 };
 
 LIST_HEAD(perf_hpp__list);
@@ -485,6 +533,14 @@ void perf_hpp__init(void)
 	if (symbol_conf.show_total_period)
 		perf_hpp__column_enable(PERF_HPP__PERIOD);
 
+	if (symbol_conf.show_freq_perf) {
+		if (perf_freq)
+			perf_hpp__column_enable(PERF_HPP__FREQ);
+		if (perf_cpu_u)
+			perf_hpp__column_enable(PERF_HPP__CPU_U);
+		if (perf_core_busy)
+			perf_hpp__column_enable(PERF_HPP__CORE_BUSY);
+	}
 	/* prepend overhead field for backward compatiblity.  */
 	list = &perf_hpp__format[PERF_HPP__OVERHEAD].sort_list;
 	if (list_empty(list))
@@ -652,6 +708,9 @@ void perf_hpp__reset_width(struct perf_hpp_fmt *fmt, struct hists *hists)
 		return;
 
 	switch (idx) {
+	case PERF_HPP__CPU_U:
+		fmt->len = 5;
+		break;
 	case PERF_HPP__OVERHEAD:
 	case PERF_HPP__OVERHEAD_SYS:
 	case PERF_HPP__OVERHEAD_US:
@@ -661,6 +720,8 @@ void perf_hpp__reset_width(struct perf_hpp_fmt *fmt, struct hists *hists)
 
 	case PERF_HPP__OVERHEAD_GUEST_SYS:
 	case PERF_HPP__OVERHEAD_GUEST_US:
+	case PERF_HPP__FREQ:
+	case PERF_HPP__CORE_BUSY:
 		fmt->len = 9;
 		break;
 
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 70bd557..ec64234 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -237,6 +237,9 @@ enum {
 	PERF_HPP__OVERHEAD_ACC,
 	PERF_HPP__SAMPLES,
 	PERF_HPP__PERIOD,
+	PERF_HPP__FREQ,
+	PERF_HPP__CPU_U,
+	PERF_HPP__CORE_BUSY,
 
 	PERF_HPP__MAX_INDEX
 };
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 7da36b1..c5010f5 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1012,7 +1012,7 @@ static int deliver_sample_value(struct perf_evlist *evlist,
 		return 0;
 	}
 
-	if (perf_evsel__is_group_leader(sid->evsel)) {
+	if (symbol_conf.show_freq_perf && perf_evsel__is_group_leader(sid->evsel)) {
 		evsel = sid->evsel;
 		SET_FREQ_PERF_VALUE(evsel, sample->freq_perf_data,
 				    sample->read.group.values[nr].value);
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 4c65a14..690e173 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -1225,6 +1225,9 @@ static struct hpp_dimension hpp_sort_dimensions[] = {
 	DIM(PERF_HPP__OVERHEAD_ACC, "overhead_children"),
 	DIM(PERF_HPP__SAMPLES, "sample"),
 	DIM(PERF_HPP__PERIOD, "period"),
+	DIM(PERF_HPP__FREQ, "freq"),
+	DIM(PERF_HPP__CPU_U, "cpu_u"),
+	DIM(PERF_HPP__CORE_BUSY, "core_busy"),
 };
 
 #undef DIM
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index fa0ccf3..7d70c89 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -106,7 +106,8 @@ struct symbol_conf {
 			filter_relative,
 			show_hist_headers,
 			branch_callstack,
-			has_filter;
+			has_filter,
+			show_freq_perf;
 	const char	*vmlinux_name,
 			*kallsyms_name,
 			*source_prefix,
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index edc2d63..648b307 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -34,6 +34,8 @@ bool test_attr__enabled;
 bool perf_host  = true;
 bool perf_guest = false;
 
+bool perf_freq, perf_cpu_u, perf_core_busy;
+
 char tracing_events_path[PATH_MAX + 1] = "/sys/kernel/debug/tracing/events";
 
 void event_attr_init(struct perf_event_attr *attr)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH V2 5/6] perf,tools: caculate and save freq/CPU%/CORE_BUSY% in he_stat
  2015-07-24 13:48 ` [PATCH V2 5/6] perf,tools: caculate and save freq/CPU%/CORE_BUSY% in he_stat kan.liang
@ 2015-07-26 16:31   ` Jiri Olsa
  2015-07-27 17:19     ` Liang, Kan
  2015-07-27 22:34     ` Liang, Kan
  2015-07-26 16:31   ` Jiri Olsa
  1 sibling, 2 replies; 22+ messages in thread
From: Jiri Olsa @ 2015-07-26 16:31 UTC (permalink / raw)
  To: kan.liang
  Cc: a.p.zijlstra, acme, luto, mingo, eranian, ak, mark.rutland,
	adrian.hunter, jolsa, namhyung, linux-kernel

On Fri, Jul 24, 2015 at 09:48:31AM -0400, kan.liang@intel.com wrote:

SNIP

>  		},
>  		.parent = sym_parent,
>  		.filtered = symbol__parent_filter(sym_parent) | al->filtered,
> @@ -481,6 +486,32 @@ iter_add_next_nop_entry(struct hist_entry_iter *iter __maybe_unused,
>  }
>  
>  static int
> +iter_add_single_freq_perf_entry(struct hist_entry_iter *iter, struct addr_location *al)
> +{
> +	struct perf_evsel *evsel = iter->evsel;
> +	struct perf_sample *sample = iter->sample;
> +	struct hist_entry *he;
> +	struct freq_perf_info info = {0};
> +	u64 *data = sample->freq_perf_data;
> +
> +	if (data[FREQ_PERF_REF_CYCLES] > 0)
> +		info.freq = (data[FREQ_PERF_CYCLES] * cpu_max_freq) / data[FREQ_PERF_REF_CYCLES];
> +	if (data[FREQ_PERF_TSC] > 0)
> +		info.cpu_u = (100 * data[FREQ_PERF_REF_CYCLES]) / data[FREQ_PERF_TSC];
> +	if (data[FREQ_PERF_MPERF] > 0)
> +		info.core_busy = (100 * data[FREQ_PERF_APERF]) / data[FREQ_PERF_MPERF];

it seems to me a new iterator is too big a gun for this,
why not initialize 'struct freq_perf_info' in
iter_prepare_normal_entry?

jirka

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V2 5/6] perf,tools: caculate and save freq/CPU%/CORE_BUSY% in he_stat
  2015-07-24 13:48 ` [PATCH V2 5/6] perf,tools: caculate and save freq/CPU%/CORE_BUSY% in he_stat kan.liang
  2015-07-26 16:31   ` Jiri Olsa
@ 2015-07-26 16:31   ` Jiri Olsa
  2015-07-27 17:11     ` Liang, Kan
  1 sibling, 1 reply; 22+ messages in thread
From: Jiri Olsa @ 2015-07-26 16:31 UTC (permalink / raw)
  To: kan.liang
  Cc: a.p.zijlstra, acme, luto, mingo, eranian, ak, mark.rutland,
	adrian.hunter, jolsa, namhyung, linux-kernel

On Fri, Jul 24, 2015 at 09:48:31AM -0400, kan.liang@intel.com wrote:

SNIP

>  				      struct addr_location *al,
> @@ -109,6 +110,7 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
>  				      struct branch_info *bi,
>  				      struct mem_info *mi, u64 period,
>  				      u64 weight, u64 transaction,
> +				      struct freq_perf_info *freq_perf,
>  				      bool sample_self);
>  int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al,
>  			 int max_stack_depth, void *arg);
> diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
> index e97cd47..90422ed 100644
> --- a/tools/perf/util/sort.h
> +++ b/tools/perf/util/sort.h
> @@ -54,6 +54,9 @@ struct he_stat {
>  	u64			period_guest_us;
>  	u64			weight;
>  	u32			nr_events;
> +	u64			freq;
> +	u64			cpu_u;

could cpu_u have a more descriptive name?

thanks,
jirka

> +	u64			core_busy;
>  };
>  

SNIP

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V2 3/6] perf,tools: Dump per-sample freq/CPU%/CORE_BUSY% in report -D
  2015-07-24 13:48 ` [PATCH V2 3/6] perf,tools: Dump per-sample freq/CPU%/CORE_BUSY% in report -D kan.liang
@ 2015-07-26 16:31   ` Jiri Olsa
  0 siblings, 0 replies; 22+ messages in thread
From: Jiri Olsa @ 2015-07-26 16:31 UTC (permalink / raw)
  To: kan.liang
  Cc: a.p.zijlstra, acme, luto, mingo, eranian, ak, mark.rutland,
	adrian.hunter, jolsa, namhyung, linux-kernel

On Fri, Jul 24, 2015 at 09:48:29AM -0400, kan.liang@intel.com wrote:

SNIP

> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index ed9dc25..7f628d9 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -851,8 +851,14 @@ static void perf_evlist__print_tstamp(struct perf_evlist *evlist,
>  		printf("%" PRIu64 " ", sample->time);
>  }
>  
> -static void sample_read__printf(struct perf_sample *sample, u64 read_format)
> +static void sample_read__printf(struct perf_evlist *evlist,
> +				struct perf_sample *sample,
> +				u64 read_format)
>  {
> +	struct perf_evsel *evsel;
> +	struct perf_sample_id *sid;
> +	u64 data[FREQ_PERF_MAX] = { 0 };
> +
>  	printf("... sample_read:\n");
>  
>  	if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
> @@ -875,10 +881,26 @@ static void sample_read__printf(struct perf_sample *sample, u64 read_format)
>  			printf("..... id %016" PRIx64
>  			       ", value %016" PRIx64 "\n",
>  			       value->id, value->value);
> +
> +			sid = perf_evlist__id2sid(evlist, value->id);
> +			evsel = sid->evsel;
> +			if (evsel != NULL)
> +				SET_FREQ_PERF_VALUE(evsel, data,
> +						    value->value);
>  		}
>  	} else
>  		printf("..... id %016" PRIx64 ", value %016" PRIx64 "\n",
>  			sample->read.one.id, sample->read.one.value);
> +
> +	if (HAS_FREQ(data))
> +		printf("..... Freq %lu MHz\n",
> +		       (data[FREQ_PERF_CYCLES] * cpu_max_freq) / data[FREQ_PERF_REF_CYCLES]);

I don't like having those numbers (cpu_max_freq, msr_pmu_type) pushed
into global variables.

I think you should pass those session values down through the whole
dump_sample code path.

Maybe make them both part of session::perf_session_env and pass
this pointer along the dump_sample call stack.
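
A minimal sketch of that idea, assuming a trimmed-down environment struct;
struct session_env and format_sample_freq() are hypothetical names, not
perf's actual API. The point is only that the callee receives the values
through a pointer instead of reading globals:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical subset of the session environment. */
struct session_env {
	uint32_t cpu_max_freq;	/* MHz */
	int msr_pmu_type;
};

/*
 * Format the per-sample frequency line into buf. Returns the number of
 * characters written, or 0 when env or ref_cycles is missing.
 */
static int format_sample_freq(char *buf, size_t size,
			      const struct session_env *env,
			      uint64_t cycles, uint64_t ref_cycles)
{
	if (!env || !ref_cycles)
		return 0;

	return snprintf(buf, size, "..... Freq %llu MHz\n",
			(unsigned long long)((cycles * env->cpu_max_freq) / ref_cycles));
}
```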

jirka

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V2 1/6] perf,tools: save cpu max freq in perf header
  2015-07-24 13:48 ` [PATCH V2 1/6] perf,tools: save cpu max freq in perf header kan.liang
@ 2015-07-26 16:31   ` Jiri Olsa
  2015-07-26 16:32   ` Jiri Olsa
  1 sibling, 0 replies; 22+ messages in thread
From: Jiri Olsa @ 2015-07-26 16:31 UTC (permalink / raw)
  To: kan.liang
  Cc: a.p.zijlstra, acme, luto, mingo, eranian, ak, mark.rutland,
	adrian.hunter, jolsa, namhyung, linux-kernel

On Fri, Jul 24, 2015 at 09:48:27AM -0400, kan.liang@intel.com wrote:

SNIP

>  			      struct perf_header *h __maybe_unused,
>  		       struct perf_evlist *evlist __maybe_unused)
> @@ -1158,6 +1168,11 @@ static void print_cpuid(struct perf_header *ph, int fd __maybe_unused, FILE *fp)
>  	fprintf(fp, "# cpuid : %s\n", ph->env.cpuid);
>  }
>  
> +static void print_cpu_max_freq(struct perf_header *ph, int fd __maybe_unused, FILE *fp)
> +{
> +	fprintf(fp, "# CPU max frequency : %u MHz\n", ph->env.cpu_max_freq);
> +}
> +
>  static void print_branch_stack(struct perf_header *ph __maybe_unused,
>  			       int fd __maybe_unused, FILE *fp)
>  {
> @@ -1471,6 +1486,25 @@ static int process_cpuid(struct perf_file_section *section __maybe_unused,
>  	return ph->env.cpuid ? 0 : -ENOMEM;
>  }
>  
> +static int process_cpu_max_freq(struct perf_file_section *section __maybe_unused,
> +				struct perf_header *ph, int fd,
> +				void *data __maybe_unused)
> +{
> +	ssize_t ret;
> +	u32 nr;
> +
> +	ret = readn(fd, &nr, sizeof(nr));
> +	if (ret != sizeof(nr))
> +		return -1;
> +
> +	if (ph->needs_swap)
> +		nr = bswap_32(nr);
> +
> +	ph->env.cpu_max_freq = nr;
> +
> +	return 0;
> +}
> +
>  static int process_total_mem(struct perf_file_section *section __maybe_unused,
>  			     struct perf_header *ph, int fd,
>  			     void *data __maybe_unused)
> @@ -1885,6 +1919,7 @@ static const struct feature_ops feat_ops[HEADER_LAST_FEATURE] = {
>  	FEAT_OPP(HEADER_NRCPUS,		nrcpus),
>  	FEAT_OPP(HEADER_CPUDESC,	cpudesc),
>  	FEAT_OPP(HEADER_CPUID,		cpuid),
> +	FEAT_OPP(HEADER_CPU_MAX_FREQ,	cpu_max_freq),

I wonder whether we should start a generic FEAT for CPU-related data,
stored in some expandable way (tag-value or such)
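
One possible shape for such a tag-value store, purely as an illustration;
the tags, struct cpu_attr and cpu_attr_find() are all hypothetical and do
not exist in perf:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tags; new CPU attributes would get new tag numbers. */
enum cpu_attr_tag {
	CPU_ATTR_MAX_FREQ_MHZ = 1,
	CPU_ATTR_MIN_FREQ_MHZ = 2,
};

/* One tag-value record as it might be stored in the header section. */
struct cpu_attr {
	uint32_t tag;
	uint64_t value;
};

/*
 * Look up a tag. Returns 1 and stores the value when the tag is present,
 * 0 otherwise. Records with unknown tags are simply skipped, so an older
 * reader can still parse a file written by a newer tool.
 */
static int cpu_attr_find(const struct cpu_attr *attrs, size_t nr,
			 uint32_t tag, uint64_t *value)
{
	for (size_t i = 0; i < nr; i++) {
		if (attrs[i].tag == tag) {
			*value = attrs[i].value;
			return 1;
		}
	}
	return 0;
}
```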

jirka

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V2 4/6] perf,tools: save misc sample read value in struct perf_sample
  2015-07-24 13:48 ` [PATCH V2 4/6] perf,tools: save misc sample read value in struct perf_sample kan.liang
@ 2015-07-26 16:31   ` Jiri Olsa
  2015-07-27 22:24     ` Liang, Kan
  0 siblings, 1 reply; 22+ messages in thread
From: Jiri Olsa @ 2015-07-26 16:31 UTC (permalink / raw)
  To: kan.liang
  Cc: a.p.zijlstra, acme, luto, mingo, eranian, ak, mark.rutland,
	adrian.hunter, jolsa, namhyung, linux-kernel

On Fri, Jul 24, 2015 at 09:48:30AM -0400, kan.liang@intel.com wrote:

SNIP

> +	u64 nr = 0;
>  
>  	if (sid) {
>  		sample->id     = v->id;
> @@ -1010,6 +1012,20 @@ static int deliver_sample_value(struct perf_evlist *evlist,
>  		return 0;
>  	}
>  
> +	if (perf_evsel__is_group_leader(sid->evsel)) {
> +		evsel = sid->evsel;
> +		SET_FREQ_PERF_VALUE(evsel, sample->freq_perf_data,
> +				    sample->read.group.values[nr].value);
> +		evlist__for_each_continue(evlist, evsel) {
> +			if ((evsel->leader != sid->evsel) ||
> +			    (++nr >= sample->read.group.nr))
> +				break;
> +
> +			SET_FREQ_PERF_VALUE(evsel, sample->freq_perf_data,
> +					    sample->read.group.values[nr].value);

I think this should be in an upper layer. Why not do this within
iter_prepare_normal_entry as well, like the rest of the calculations
I suggested in my other email?

jirka

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V2 1/6] perf,tools: save cpu max freq in perf header
  2015-07-24 13:48 ` [PATCH V2 1/6] perf,tools: save cpu max freq in perf header kan.liang
  2015-07-26 16:31   ` Jiri Olsa
@ 2015-07-26 16:32   ` Jiri Olsa
  1 sibling, 0 replies; 22+ messages in thread
From: Jiri Olsa @ 2015-07-26 16:32 UTC (permalink / raw)
  To: kan.liang
  Cc: a.p.zijlstra, acme, luto, mingo, eranian, ak, mark.rutland,
	adrian.hunter, jolsa, namhyung, linux-kernel

On Fri, Jul 24, 2015 at 09:48:27AM -0400, kan.liang@intel.com wrote:

SNIP

> diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h
> index 0af9cec..6784677 100644
> --- a/tools/perf/util/cpumap.h
> +++ b/tools/perf/util/cpumap.h
> @@ -58,6 +58,7 @@ int max_node_num;
>  int *cpunode_map;
>  
>  int cpu__setup_cpunode_map(void);
> +unsigned int get_cpu_max_freq(void);
>  
>  static inline int cpu__max_node(void)
>  {
> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index 03ace57..287a488 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
> @@ -862,6 +862,16 @@ write_it:
>  	return do_write_string(fd, buffer);
>  }
>  
> +static int write_cpu_max_freq(int fd, struct perf_header *h __maybe_unused,
> +			      struct perf_evlist *evlist __maybe_unused)
> +{
> +	u32 freq;
> +
> +	freq = get_cpu_max_freq() / 1000;

why don't you store it as u64?

jirka

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V2 6/6] perf,tools: Show freq/CPU%/CORE_BUSY% in perf report --stdio
  2015-07-24 13:48 ` [PATCH V2 6/6] perf,tools: Show freq/CPU%/CORE_BUSY% in perf report --stdio kan.liang
@ 2015-07-26 16:32   ` Jiri Olsa
  0 siblings, 0 replies; 22+ messages in thread
From: Jiri Olsa @ 2015-07-26 16:32 UTC (permalink / raw)
  To: kan.liang
  Cc: a.p.zijlstra, acme, luto, mingo, eranian, ak, mark.rutland,
	adrian.hunter, jolsa, namhyung, linux-kernel

On Fri, Jul 24, 2015 at 09:48:32AM -0400, kan.liang@intel.com wrote:

SNIP

> diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
> index fa0ccf3..7d70c89 100644
> --- a/tools/perf/util/symbol.h
> +++ b/tools/perf/util/symbol.h
> @@ -106,7 +106,8 @@ struct symbol_conf {
>  			filter_relative,
>  			show_hist_headers,
>  			branch_callstack,
> -			has_filter;
> +			has_filter,
> +			show_freq_perf;
>  	const char	*vmlinux_name,
>  			*kallsyms_name,
>  			*source_prefix,
> diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
> index edc2d63..648b307 100644
> --- a/tools/perf/util/util.c
> +++ b/tools/perf/util/util.c
> @@ -34,6 +34,8 @@ bool test_attr__enabled;
>  bool perf_host  = true;
>  bool perf_guest = false;
>  
> +bool perf_freq, perf_cpu_u, perf_core_busy;

how about a 'symbol_conf.freq_perf_type' with the display info encoded in it?
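
A minimal sketch of what such an encoding could look like, assuming a
plain bitmask; the flag names and freq_perf_type_from_events() are
hypothetical. The event requirements follow the --show-freq-perf
documentation in this patch:

```c
#include <stdbool.h>

/* Hypothetical flags replacing perf_freq, perf_cpu_u and perf_core_busy. */
enum freq_perf_type {
	FREQ_PERF_SHOW_FREQ      = 1u << 0,
	FREQ_PERF_SHOW_CPU_UTIL  = 1u << 1,
	FREQ_PERF_SHOW_CORE_BUSY = 1u << 2,
};

/* Derive the displayable columns from the events found in the evlist. */
static unsigned int freq_perf_type_from_events(bool has_cycles,
					       bool has_ref_cycles,
					       bool has_tsc,
					       bool has_aperf,
					       bool has_mperf)
{
	unsigned int type = 0;

	if (has_cycles && has_ref_cycles)
		type |= FREQ_PERF_SHOW_FREQ;
	if (has_ref_cycles && has_tsc)
		type |= FREQ_PERF_SHOW_CPU_UTIL;
	if (has_aperf && has_mperf)
		type |= FREQ_PERF_SHOW_CORE_BUSY;

	return type;
}
```

A zero value would then mean that none of the columns can be displayed.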

thanks,
jirka

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH V2 5/6] perf,tools: caculate and save freq/CPU%/CORE_BUSY% in he_stat
  2015-07-26 16:31   ` Jiri Olsa
@ 2015-07-27 17:11     ` Liang, Kan
  2015-07-27 17:16       ` Jiri Olsa
  0 siblings, 1 reply; 22+ messages in thread
From: Liang, Kan @ 2015-07-27 17:11 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: a.p.zijlstra, acme, luto, mingo, eranian, ak, mark.rutland,
	Hunter, Adrian, jolsa, namhyung, linux-kernel

> 
> On Fri, Jul 24, 2015 at 09:48:31AM -0400, kan.liang@intel.com wrote:
> 
> SNIP
> 
> >  				      struct addr_location *al,
> > @@ -109,6 +110,7 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
> >  				      struct branch_info *bi,
> >  				      struct mem_info *mi, u64 period,
> >  				      u64 weight, u64 transaction,
> > +				      struct freq_perf_info *freq_perf,
> >  				      bool sample_self);
> >  int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al,
> >  			 int max_stack_depth, void *arg);
> > diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
> > index e97cd47..90422ed 100644
> > --- a/tools/perf/util/sort.h
> > +++ b/tools/perf/util/sort.h
> > @@ -54,6 +54,9 @@ struct he_stat {
> >  	u64			period_guest_us;
> >  	u64			weight;
> >  	u32			nr_events;
> > +	u64			freq;
> > +	u64			cpu_u;
> 
> could the cpu_u have more descriptive name?

How about "cpu_util" or "cpu_utilization"?

Thanks,
Kan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V2 5/6] perf,tools: caculate and save freq/CPU%/CORE_BUSY% in he_stat
  2015-07-27 17:11     ` Liang, Kan
@ 2015-07-27 17:16       ` Jiri Olsa
  0 siblings, 0 replies; 22+ messages in thread
From: Jiri Olsa @ 2015-07-27 17:16 UTC (permalink / raw)
  To: Liang, Kan
  Cc: a.p.zijlstra, acme, luto, mingo, eranian, ak, mark.rutland,
	Hunter, Adrian, jolsa, namhyung, linux-kernel

On Mon, Jul 27, 2015 at 05:11:13PM +0000, Liang, Kan wrote:
> > 
> > On Fri, Jul 24, 2015 at 09:48:31AM -0400, kan.liang@intel.com wrote:
> > 
> > SNIP
> > 
> > >  				      struct addr_location *al,
> > > @@ -109,6 +110,7 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
> > >  				      struct branch_info *bi,
> > >  				      struct mem_info *mi, u64 period,
> > >  				      u64 weight, u64 transaction,
> > > +				      struct freq_perf_info *freq_perf,
> > >  				      bool sample_self);
> > >  int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al,
> > >  			 int max_stack_depth, void *arg);
> > > diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
> > > index e97cd47..90422ed 100644
> > > --- a/tools/perf/util/sort.h
> > > +++ b/tools/perf/util/sort.h
> > > @@ -54,6 +54,9 @@ struct he_stat {
> > >  	u64			period_guest_us;
> > >  	u64			weight;
> > >  	u32			nr_events;
> > > +	u64			freq;
> > > +	u64			cpu_u;
> > 
> > could the cpu_u have more descriptive name?
> 
> How about "cpu_util" or "cpu_utilization"?

cpu_util sounds good to me

thanks,
jirka

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH V2 5/6] perf,tools: caculate and save freq/CPU%/CORE_BUSY% in he_stat
  2015-07-26 16:31   ` Jiri Olsa
@ 2015-07-27 17:19     ` Liang, Kan
  2015-07-27 17:27       ` Jiri Olsa
  2015-07-27 22:34     ` Liang, Kan
  1 sibling, 1 reply; 22+ messages in thread
From: Liang, Kan @ 2015-07-27 17:19 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: a.p.zijlstra, acme, luto, mingo, eranian, ak, mark.rutland,
	Hunter, Adrian, jolsa, namhyung, linux-kernel



> > @@ -481,6 +486,32 @@ iter_add_next_nop_entry(struct hist_entry_iter *iter __maybe_unused,
> >  }
> >
> >  static int
> > +iter_add_single_freq_perf_entry(struct hist_entry_iter *iter, struct addr_location *al)
> > +{
> > +	struct perf_evsel *evsel = iter->evsel;
> > +	struct perf_sample *sample = iter->sample;
> > +	struct hist_entry *he;
> > +	struct freq_perf_info info = {0};
> > +	u64 *data = sample->freq_perf_data;
> > +
> > +	if (data[FREQ_PERF_REF_CYCLES] > 0)
> > +		info.freq = (data[FREQ_PERF_CYCLES] * cpu_max_freq) / data[FREQ_PERF_REF_CYCLES];
> > +	if (data[FREQ_PERF_TSC] > 0)
> > +		info.cpu_u = (100 * data[FREQ_PERF_REF_CYCLES]) / data[FREQ_PERF_TSC];
> > +	if (data[FREQ_PERF_MPERF] > 0)
> > +		info.core_busy = (100 * data[FREQ_PERF_APERF]) / data[FREQ_PERF_MPERF];
> 
> seems to me the new iterator is too big gun for this, why not initialize
> 'struct freq_perf_info' in iter_prepare_normal_entry ?
> 

Yes, we can initialize it in the normal path.
But we only need freq_perf_info when "--show-freq-perf" is set.
In most cases, calculating the unused information for each sample
wastes time, and that could slow down sample processing.

Thanks,
Kan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V2 5/6] perf,tools: caculate and save freq/CPU%/CORE_BUSY% in he_stat
  2015-07-27 17:19     ` Liang, Kan
@ 2015-07-27 17:27       ` Jiri Olsa
  0 siblings, 0 replies; 22+ messages in thread
From: Jiri Olsa @ 2015-07-27 17:27 UTC (permalink / raw)
  To: Liang, Kan
  Cc: a.p.zijlstra, acme, luto, mingo, eranian, ak, mark.rutland,
	Hunter, Adrian, jolsa, namhyung, linux-kernel

On Mon, Jul 27, 2015 at 05:19:23PM +0000, Liang, Kan wrote:
> 
> 
> > > @@ -481,6 +486,32 @@ iter_add_next_nop_entry(struct hist_entry_iter *iter __maybe_unused,
> > >  }
> > >
> > >  static int
> > > +iter_add_single_freq_perf_entry(struct hist_entry_iter *iter, struct addr_location *al)
> > > +{
> > > +	struct perf_evsel *evsel = iter->evsel;
> > > +	struct perf_sample *sample = iter->sample;
> > > +	struct hist_entry *he;
> > > +	struct freq_perf_info info = {0};
> > > +	u64 *data = sample->freq_perf_data;
> > > +
> > > +	if (data[FREQ_PERF_REF_CYCLES] > 0)
> > > +		info.freq = (data[FREQ_PERF_CYCLES] * cpu_max_freq) / data[FREQ_PERF_REF_CYCLES];
> > > +	if (data[FREQ_PERF_TSC] > 0)
> > > +		info.cpu_u = (100 * data[FREQ_PERF_REF_CYCLES]) / data[FREQ_PERF_TSC];
> > > +	if (data[FREQ_PERF_MPERF] > 0)
> > > +		info.core_busy = (100 * data[FREQ_PERF_APERF]) / data[FREQ_PERF_MPERF];
> > 
> > seems to me the new iterator is too big gun for this, why not initialize
> > 'struct freq_perf_info' in iter_prepare_normal_entry ?
> > 
> 
> Yes, we can initialize it in normal process.
> But we only need the freq_perf_info when "--show-freq-perf" is set.
> For most of the cases, it wastes time to calculate the unused information
> for each sample. That could impact the performance for processing. 

hum, it could fill in freq_perf_info only when --show-freq-perf is set,
and pass NULL otherwise
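
A minimal sketch of that gating, assuming a trimmed freq_perf_info with a
single field; maybe_fill_freq_perf() and add_entry() are hypothetical
stand-ins, not functions from the patch:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Trimmed to one field for illustration. */
struct freq_perf_info {
	uint64_t freq;
};

/*
 * Fill *out and return it only when --show-freq-perf is set; otherwise
 * return NULL so that no per-sample arithmetic is done.
 */
static struct freq_perf_info *
maybe_fill_freq_perf(bool show_freq_perf, uint64_t cycles, uint64_t ref_cycles,
		     uint64_t max_freq_mhz, struct freq_perf_info *out)
{
	if (!show_freq_perf)
		return NULL;

	out->freq = ref_cycles ? (cycles * max_freq_mhz) / ref_cycles : 0;
	return out;
}

/* Stand-in for the add-entry path: NULL means "nothing to record". */
static void add_entry(uint64_t *freq_sum, const struct freq_perf_info *fp)
{
	if (fp)
		*freq_sum += fp->freq;
}
```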

jirka

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH V2 4/6] perf,tools: save misc sample read value in struct perf_sample
  2015-07-26 16:31   ` Jiri Olsa
@ 2015-07-27 22:24     ` Liang, Kan
  2015-07-28 13:55       ` Jiri Olsa
  0 siblings, 1 reply; 22+ messages in thread
From: Liang, Kan @ 2015-07-27 22:24 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: a.p.zijlstra, acme, luto, mingo, eranian, ak, mark.rutland,
	Hunter, Adrian, jolsa, namhyung, linux-kernel



> -----Original Message-----
> From: Jiri Olsa [mailto:jolsa@redhat.com]
> Sent: Sunday, July 26, 2015 12:32 PM
> To: Liang, Kan
> Cc: a.p.zijlstra@chello.nl; acme@kernel.org; luto@kernel.org;
> mingo@redhat.com; eranian@google.com; ak@linux.intel.com;
> mark.rutland@arm.com; Hunter, Adrian; jolsa@kernel.org;
> namhyung@kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH V2 4/6] perf,tools: save misc sample read value in
> struct perf_sample
> 
> On Fri, Jul 24, 2015 at 09:48:30AM -0400, kan.liang@intel.com wrote:
> 
> SNIP
> 
> > +	u64 nr = 0;
> >
> >  	if (sid) {
> >  		sample->id     = v->id;
> > @@ -1010,6 +1012,20 @@ static int deliver_sample_value(struct perf_evlist *evlist,
> >  		return 0;
> >  	}
> >
> > +	if (perf_evsel__is_group_leader(sid->evsel)) {
> > +		evsel = sid->evsel;
> > +		SET_FREQ_PERF_VALUE(evsel, sample->freq_perf_data,
> > +				    sample->read.group.values[nr].value);
> > +		evlist__for_each_continue(evlist, evsel) {
> > +			if ((evsel->leader != sid->evsel) ||
> > +			    (++nr >= sample->read.group.nr))
> > +				break;
> > +
> > +			SET_FREQ_PERF_VALUE(evsel, sample->freq_perf_data,
> > +					    sample->read.group.values[nr].value);
> 
> I think this should be in upper layer..

OK, I'll move it to deliver_sample_group.

> why not do this also within
> iter_prepare_normal_entry as for the rest of the calculations I suggested
> in my other email

Because we cannot get the evlist in the iter_add/prepare functions, so we
cannot walk through all the group members' values.
Passing the evlist to the iter_add/prepare functions is too complex (it
would require changing many interfaces).

Thanks,
Kan
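
A rough sketch of why the evlist is needed here, under simplifying
assumptions: the event list is modelled as a plain array, each event stores
the index of its group leader (the leader points at itself), and each event
is tagged with the FREQ_PERF_* slot it feeds, or -1 for none. struct
mock_evsel and collect_group_values() are hypothetical and are not the perf
data structures. The loop mirrors the quoted hunk: start at the leader,
continue through the list, and stop at the first event from another group or
when the group read values run out.

```c
#include <stddef.h>
#include <stdint.h>

/* Slot names follow the patch; their order here is an assumption. */
enum { FREQ_PERF_CYCLES, FREQ_PERF_REF_CYCLES, FREQ_PERF_MAX };

/* Hypothetical stand-in for an event in the evlist. */
struct mock_evsel {
	size_t leader;    /* index of this event's group leader */
	int    freq_perf; /* FREQ_PERF_* slot fed by this event, or -1 */
};

/*
 * Copy the group read values of one group into freq_perf_data[].
 * Returns the number of group values consumed.
 */
static size_t
collect_group_values(const struct mock_evsel *evlist, size_t nr_evsels,
		     size_t leader, const uint64_t *values, size_t nr_values,
		     uint64_t *freq_perf_data)
{
	size_t nr = 0;

	for (size_t i = leader; i < nr_evsels && nr < nr_values; i++, nr++) {
		/* The next event belongs to another group: stop. */
		if (evlist[i].leader != leader)
			break;
		if (evlist[i].freq_perf >= 0)
			freq_perf_data[evlist[i].freq_perf] = values[nr];
	}
	return nr;
}
```

The walk needs both the ordered event list and the per-sample value array,
which is the information the iter_add/prepare functions do not have.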


* RE: [PATCH V2 5/6] perf,tools: caculate and save freq/CPU%/CORE_BUSY% in he_stat
  2015-07-26 16:31   ` Jiri Olsa
  2015-07-27 17:19     ` Liang, Kan
@ 2015-07-27 22:34     ` Liang, Kan
  2015-07-28 13:50       ` Jiri Olsa
  1 sibling, 1 reply; 22+ messages in thread
From: Liang, Kan @ 2015-07-27 22:34 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: a.p.zijlstra, acme, luto, mingo, eranian, ak, mark.rutland,
	Hunter, Adrian, jolsa, namhyung, linux-kernel


> 
> On Fri, Jul 24, 2015 at 09:48:31AM -0400, kan.liang@intel.com wrote:
> 
> SNIP
> 
> >  		},
> >  		.parent = sym_parent,
> >  		.filtered = symbol__parent_filter(sym_parent) | al->filtered,
> > @@ -481,6 +486,32 @@ iter_add_next_nop_entry(struct hist_entry_iter *iter __maybe_unused,
> >  }
> >
> >  static int
> > +iter_add_single_freq_perf_entry(struct hist_entry_iter *iter, struct addr_location *al)
> > +{
> > +	struct perf_evsel *evsel = iter->evsel;
> > +	struct perf_sample *sample = iter->sample;
> > +	struct hist_entry *he;
> > +	struct freq_perf_info info = {0};
> > +	u64 *data = sample->freq_perf_data;
> > +
> > +	if (data[FREQ_PERF_REF_CYCLES] > 0)
> > +		info.freq = (data[FREQ_PERF_CYCLES] * cpu_max_freq) / data[FREQ_PERF_REF_CYCLES];
> > +	if (data[FREQ_PERF_TSC] > 0)
> > +		info.cpu_u = (100 * data[FREQ_PERF_REF_CYCLES]) / data[FREQ_PERF_TSC];
> > +	if (data[FREQ_PERF_MPERF] > 0)
> > +		info.core_busy = (100 * data[FREQ_PERF_APERF]) / data[FREQ_PERF_MPERF];
> 
> seems to me the new iterator is too big gun for this, why not initialize
> 'struct freq_perf_info' in iter_prepare_normal_entry ?
> 
How about initializing in iter_add_single_normal_entry?

We only use freq_perf_info in iter_add_single_normal_entry.
If we initialize it in iter_prepare_normal_entry, we would have to save
the freq_perf_info in hist_entry_iter.

Thanks,
Kan
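
For reference, the three calculations in the quoted hunk can be written as
one standalone helper. This is a sketch, not the patch itself: the field and
slot names come from the quoted code, but the enum order, the helper name
compute_freq_perf_info() and the use of MHz for cpu_max_freq are
assumptions. Each division is guarded so that a missing event (a zero
denominator) leaves the field at 0.

```c
#include <stdint.h>

/* Slot names follow the patch; their order here is an assumption. */
enum {
	FREQ_PERF_CYCLES,
	FREQ_PERF_REF_CYCLES,
	FREQ_PERF_TSC,
	FREQ_PERF_APERF,
	FREQ_PERF_MPERF,
	FREQ_PERF_MAX
};

struct freq_perf_info {
	uint64_t freq;      /* cycles / ref-cycles scaled by max frequency */
	uint64_t cpu_u;     /* ref-cycles as a percentage of TSC */
	uint64_t core_busy; /* APERF as a percentage of MPERF */
};

/* Hypothetical helper wrapping the arithmetic from the quoted hunk. */
static struct freq_perf_info
compute_freq_perf_info(const uint64_t *data, uint64_t cpu_max_freq)
{
	struct freq_perf_info info = { 0 };

	if (data[FREQ_PERF_REF_CYCLES] > 0)
		info.freq = (data[FREQ_PERF_CYCLES] * cpu_max_freq) /
			    data[FREQ_PERF_REF_CYCLES];
	if (data[FREQ_PERF_TSC] > 0)
		info.cpu_u = (100 * data[FREQ_PERF_REF_CYCLES]) /
			     data[FREQ_PERF_TSC];
	if (data[FREQ_PERF_MPERF] > 0)
		info.core_busy = (100 * data[FREQ_PERF_APERF]) /
				 data[FREQ_PERF_MPERF];
	return info;
}
```

Because the result is returned by value, a caller such as
iter_add_single_normal_entry could keep it on its own stack without adding
a field to hist_entry_iter.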


* Re: [PATCH V2 5/6] perf,tools: caculate and save freq/CPU%/CORE_BUSY% in he_stat
  2015-07-27 22:34     ` Liang, Kan
@ 2015-07-28 13:50       ` Jiri Olsa
  0 siblings, 0 replies; 22+ messages in thread
From: Jiri Olsa @ 2015-07-28 13:50 UTC (permalink / raw)
  To: Liang, Kan
  Cc: a.p.zijlstra, acme, luto, mingo, eranian, ak, mark.rutland,
	Hunter, Adrian, jolsa, namhyung, linux-kernel

On Mon, Jul 27, 2015 at 10:34:04PM +0000, Liang, Kan wrote:
> 
> > 
> > On Fri, Jul 24, 2015 at 09:48:31AM -0400, kan.liang@intel.com wrote:
> > 
> > SNIP
> > 
> > >  		},
> > >  		.parent = sym_parent,
> > >  		.filtered = symbol__parent_filter(sym_parent) | al->filtered,
> > > @@ -481,6 +486,32 @@ iter_add_next_nop_entry(struct hist_entry_iter *iter __maybe_unused,
> > >  }
> > >
> > >  static int
> > > +iter_add_single_freq_perf_entry(struct hist_entry_iter *iter, struct addr_location *al)
> > > +{
> > > +	struct perf_evsel *evsel = iter->evsel;
> > > +	struct perf_sample *sample = iter->sample;
> > > +	struct hist_entry *he;
> > > +	struct freq_perf_info info = {0};
> > > +	u64 *data = sample->freq_perf_data;
> > > +
> > > +	if (data[FREQ_PERF_REF_CYCLES] > 0)
> > > +		info.freq = (data[FREQ_PERF_CYCLES] * cpu_max_freq) / data[FREQ_PERF_REF_CYCLES];
> > > +	if (data[FREQ_PERF_TSC] > 0)
> > > +		info.cpu_u = (100 * data[FREQ_PERF_REF_CYCLES]) / data[FREQ_PERF_TSC];
> > > +	if (data[FREQ_PERF_MPERF] > 0)
> > > +		info.core_busy = (100 * data[FREQ_PERF_APERF]) / data[FREQ_PERF_MPERF];
> > 
> > seems to me the new iterator is too big gun for this, why not initialize
> > 'struct freq_perf_info' in iter_prepare_normal_entry ?
> > 
> How about initializing in iter_add_single_normal_entry?
> 
> We only use freq_perf_info in iter_add_single_normal_entry.
> If initializing in iter_prepare_normal_entry, we have to save the
> freq_perf_info in hist_entry_iter.

sounds good

jirka


* Re: [PATCH V2 4/6] perf,tools: save misc sample read value in struct perf_sample
  2015-07-27 22:24     ` Liang, Kan
@ 2015-07-28 13:55       ` Jiri Olsa
  0 siblings, 0 replies; 22+ messages in thread
From: Jiri Olsa @ 2015-07-28 13:55 UTC (permalink / raw)
  To: Liang, Kan
  Cc: a.p.zijlstra, acme, luto, mingo, eranian, ak, mark.rutland,
	Hunter, Adrian, jolsa, namhyung, linux-kernel

On Mon, Jul 27, 2015 at 10:24:49PM +0000, Liang, Kan wrote:

SNIP

> > >freq_perf_data,
> > > +					    sample->read.group.values[nr].value);
> > 
> > I think this should be in upper layer..
> 
> OK, I'll move it to deliver_sample_group.
> 
> > why not do this also within
> > iter_prepare_normal_entry as for the rest of the calculations I suggested
> > in my other email
> 
> Because we cannot get evlist in iter_add/prepare function. We cannot go
> through all group members' value. 
> It's too complex (need to change many interfaces) to pass the evlist to
> iter_add/prepare function

hum, I remember Namhyung added iter->add_entry_cb, but it seems it's called
after the entry is added. How about adding a pre-addition callback?

It can reach the report structure with the perf_evlist.

jirka
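
One possible shape for such a pre-addition callback, as a sketch only. The
types are simplified stand-ins, and the pre_add_entry_cb field, the callback
signature and every name ending in _sketch are hypothetical; the real
hist_entry_iter and its add_entry_cb take more arguments. The point is the
ordering: the hook runs before the entry is added, receives the same opaque
argument as the existing callback (which could be the report structure), and
can abort the addition by returning an error.

```c
#include <stddef.h>

struct hist_entry_iter_sketch;

typedef int (*iter_cb_t)(struct hist_entry_iter_sketch *iter, void *arg);

/* Simplified stand-in for struct hist_entry_iter. */
struct hist_entry_iter_sketch {
	iter_cb_t pre_add_entry_cb; /* hypothetical: runs before the add */
	iter_cb_t add_entry_cb;     /* runs after the entry is added */
	int nr_entries;             /* stand-in for the hists being filled */
};

/* Stand-in for the report structure that the callback argument points to. */
struct report_sketch {
	int pre_calls;
	int entries_at_pre; /* entries present when the pre hook ran */
};

/* Example pre hook: this is where per-sample data could be prepared. */
static int note_pre_add(struct hist_entry_iter_sketch *iter, void *arg)
{
	struct report_sketch *rep = arg;

	rep->pre_calls++;
	rep->entries_at_pre = iter->nr_entries;
	return 0;
}

static int iter_add_entry_sketch(struct hist_entry_iter_sketch *iter, void *arg)
{
	int err;

	if (iter->pre_add_entry_cb) {
		err = iter->pre_add_entry_cb(iter, arg);
		if (err)
			return err; /* nothing is added on failure */
	}

	iter->nr_entries++; /* stand-in for adding the hist entry */

	if (iter->add_entry_cb)
		return iter->add_entry_cb(iter, arg);
	return 0;
}
```

Keeping the argument opaque means none of the iter_add/prepare interfaces
would need an extra evlist parameter.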


end of thread, other threads:[~2015-07-28 13:55 UTC | newest]

Thread overview: 22+ messages
2015-07-24 13:48 [PATCH V2 0/6] Freq/CPU%/CORE_BUSY% support kan.liang
2015-07-24 13:48 ` [PATCH V2 1/6] perf,tools: save cpu max freq in perf header kan.liang
2015-07-26 16:31   ` Jiri Olsa
2015-07-26 16:32   ` Jiri Olsa
2015-07-24 13:48 ` [PATCH V2 2/6] perf,tools: read cpu max freq and msr type from header kan.liang
2015-07-24 13:48 ` [PATCH V2 3/6] perf,tools: Dump per-sample freq/CPU%/CORE_BUSY% in report -D kan.liang
2015-07-26 16:31   ` Jiri Olsa
2015-07-24 13:48 ` [PATCH V2 4/6] perf,tools: save misc sample read value in struct perf_sample kan.liang
2015-07-26 16:31   ` Jiri Olsa
2015-07-27 22:24     ` Liang, Kan
2015-07-28 13:55       ` Jiri Olsa
2015-07-24 13:48 ` [PATCH V2 5/6] perf,tools: caculate and save freq/CPU%/CORE_BUSY% in he_stat kan.liang
2015-07-26 16:31   ` Jiri Olsa
2015-07-27 17:19     ` Liang, Kan
2015-07-27 17:27       ` Jiri Olsa
2015-07-27 22:34     ` Liang, Kan
2015-07-28 13:50       ` Jiri Olsa
2015-07-26 16:31   ` Jiri Olsa
2015-07-27 17:11     ` Liang, Kan
2015-07-27 17:16       ` Jiri Olsa
2015-07-24 13:48 ` [PATCH V2 6/6] perf,tools: Show freq/CPU%/CORE_BUSY% in perf report --stdio kan.liang
2015-07-26 16:32   ` Jiri Olsa
