linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] perf tools: Fix for cases where cpumode is incorrect or insufficient
@ 2018-10-31  9:10 Adrian Hunter
  2018-10-31  9:10 ` [PATCH 1/5] perf tools: Add fallback functions for cases where cpumode is insufficient Adrian Hunter
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Adrian Hunter @ 2018-10-31  9:10 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Andi Kleen, linux-kernel, leo.yan, David Miller,
	Mathieu Poirier

Hi

These patches probably deal with most cases except the one fixed by David
Miller's "perf callchain: Honour the ordering of
PERF_CONTEXT_{USER,KERNEL,etc}" patch, and also cat_backtrace() which looks
like it has the same problem, and the cs-etm fix.


Adrian Hunter (5):
      perf tools: Add fallback functions for cases where cpumode is insufficient
      perf tools: Use fallback for sample_addr_correlates_sym() cases
      perf tools: Use fallbacks for branch stacks
      perf intel-pt: Insert callchain context into synthesized callchains
      perf intel-pt/bts: Calculate cpumode for synthesized samples

 tools/perf/builtin-script.c                        | 12 +++---
 tools/perf/util/event.c                            | 29 +++++++++++++-
 tools/perf/util/intel-bts.c                        | 17 ++++++---
 tools/perf/util/intel-pt.c                         | 28 ++++++++------
 tools/perf/util/machine.c                          | 40 ++++++++++++++++++++
 tools/perf/util/machine.h                          |  2 +
 .../util/scripting-engines/trace-event-python.c    | 16 ++++----
 tools/perf/util/thread-stack.c                     | 44 +++++++++++++++++-----
 tools/perf/util/thread-stack.h                     |  2 +-
 tools/perf/util/thread.h                           |  4 ++
 10 files changed, 153 insertions(+), 41 deletions(-)


Regards
Adrian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/5] perf tools: Add fallback functions for cases where cpumode is insufficient
  2018-10-31  9:10 [PATCH 0/5] perf tools: Fix for cases where cpumode is incorrect or insufficient Adrian Hunter
@ 2018-10-31  9:10 ` Adrian Hunter
  2018-11-05 17:29   ` Arnaldo Carvalho de Melo
  2018-10-31  9:10 ` [PATCH 2/5] perf tools: Use fallback for sample_addr_correlates_sym() cases Adrian Hunter
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Adrian Hunter @ 2018-10-31  9:10 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Andi Kleen, linux-kernel, leo.yan, David Miller,
	Mathieu Poirier

For branch stacks or branch samples, the sample cpumode might not be
correct because it applies only to the sample 'ip' and not necessary to
'addr' or branch stack addresses. Add fallback functions that can be used
to deal with those cases

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org # 4.19
---
 tools/perf/util/event.c   | 27 ++++++++++++++++++++++++++
 tools/perf/util/machine.c | 40 +++++++++++++++++++++++++++++++++++++++
 tools/perf/util/machine.h |  2 ++
 tools/perf/util/thread.h  |  4 ++++
 4 files changed, 73 insertions(+)

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index bc646185f8d9..52b24b8ce29e 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -1576,6 +1576,24 @@ struct map *thread__find_map(struct thread *thread, u8 cpumode, u64 addr,
 	return al->map;
 }
 
+/*
+ * For branch stacks or branch samples, the sample cpumode might not be correct
+ * because it applies only to the sample 'ip' and not necessary to 'addr' or
+ * branch stack addresses. If possible, use a fallback to deal with those cases.
+ */
+struct map *thread__find_map_fallback(struct thread *thread, u8 cpumode,
+				      u64 addr, struct addr_location *al)
+{
+	struct map *map = thread__find_map(thread, cpumode, addr, al);
+	struct machine *machine = thread->mg->machine;
+	u8 addr_cpumode = machine__addr_cpumode(machine, cpumode, addr);
+
+	if (map || addr_cpumode == cpumode)
+		return map;
+
+	return thread__find_map(thread, addr_cpumode, addr, al);
+}
+
 struct symbol *thread__find_symbol(struct thread *thread, u8 cpumode,
 				   u64 addr, struct addr_location *al)
 {
@@ -1585,6 +1603,15 @@ struct symbol *thread__find_symbol(struct thread *thread, u8 cpumode,
 	return al->sym;
 }
 
+struct symbol *thread__find_symbol_fb(struct thread *thread, u8 cpumode,
+				      u64 addr, struct addr_location *al)
+{
+	al->sym = NULL;
+	if (thread__find_map_fallback(thread, cpumode, addr, al))
+		al->sym = map__find_symbol(al->map, al->addr);
+	return al->sym;
+}
+
 /*
  * Callers need to drop the reference to al->thread, obtained in
  * machine__findnew_thread()
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 111ae858cbcb..04edc0eac376 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2542,6 +2542,46 @@ int machine__get_kernel_start(struct machine *machine)
 	return err;
 }
 
+/*
+ * machine__single_ku_as - Machine has same address space for kernel and user.
+ * @machine: machine object
+ *
+ * Some architectures have a single address space for kernel and user addresses,
+ * which makes it possible to determine if an address is in kernel space or user
+ * space.
+ */
+static bool machine__single_ku_as(struct machine *machine)
+{
+	return strcmp(perf_env__arch(machine->env), "sparc");
+}
+
+u8 machine__addr_cpumode(struct machine *machine, u8 cpumode, u64 addr)
+{
+	u8 addr_cpumode = cpumode;
+	bool kernel_ip;
+
+	if (!machine__single_ku_as(machine))
+		goto out;
+
+	kernel_ip = machine__kernel_ip(machine, addr);
+	switch (cpumode) {
+	case PERF_RECORD_MISC_KERNEL:
+	case PERF_RECORD_MISC_USER:
+		addr_cpumode = kernel_ip ? PERF_RECORD_MISC_KERNEL :
+					   PERF_RECORD_MISC_USER;
+		break;
+	case PERF_RECORD_MISC_GUEST_KERNEL:
+	case PERF_RECORD_MISC_GUEST_USER:
+		addr_cpumode = kernel_ip ? PERF_RECORD_MISC_GUEST_KERNEL :
+					   PERF_RECORD_MISC_GUEST_USER;
+		break;
+	default:
+		break;
+	}
+out:
+	return addr_cpumode;
+}
+
 struct dso *machine__findnew_dso(struct machine *machine, const char *filename)
 {
 	return dsos__findnew(&machine->dsos, filename);
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index d856b85862e2..20d464135ed1 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -99,6 +99,8 @@ static inline bool machine__kernel_ip(struct machine *machine, u64 ip)
 	return ip >= kernel_start;
 }
 
+u8 machine__addr_cpumode(struct machine *machine, u8 cpumode, u64 addr);
+
 struct thread *machine__find_thread(struct machine *machine, pid_t pid,
 				    pid_t tid);
 struct comm *machine__thread_exec_comm(struct machine *machine,
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 36c09a9904e6..e28f9dc1110a 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -96,9 +96,13 @@ struct thread *thread__main_thread(struct machine *machine, struct thread *threa
 
 struct map *thread__find_map(struct thread *thread, u8 cpumode, u64 addr,
 			     struct addr_location *al);
+struct map *thread__find_map_fallback(struct thread *thread, u8 cpumode,
+				      u64 addr, struct addr_location *al);
 
 struct symbol *thread__find_symbol(struct thread *thread, u8 cpumode,
 				   u64 addr, struct addr_location *al);
+struct symbol *thread__find_symbol_fb(struct thread *thread, u8 cpumode,
+				      u64 addr, struct addr_location *al);
 
 void thread__find_cpumode_addr_location(struct thread *thread, u64 addr,
 					struct addr_location *al);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/5] perf tools: Use fallback for sample_addr_correlates_sym() cases
  2018-10-31  9:10 [PATCH 0/5] perf tools: Fix for cases where cpumode is incorrect or insufficient Adrian Hunter
  2018-10-31  9:10 ` [PATCH 1/5] perf tools: Add fallback functions for cases where cpumode is insufficient Adrian Hunter
@ 2018-10-31  9:10 ` Adrian Hunter
  2018-11-03 17:46   ` Hunter, Adrian
  2018-10-31  9:10 ` [PATCH 3/5] perf tools: Use fallbacks for branch stacks Adrian Hunter
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Adrian Hunter @ 2018-10-31  9:10 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Andi Kleen, linux-kernel, leo.yan, David Miller,
	Mathieu Poirier

thread__resolve() is used in the sample_addr_correlates_sym() cases where
'addr' is a destination of a branch which does not necessarily have the
same cpumode as the 'ip'. Use the fallback function in that case.

This patch depends on patch "perf tools: Add fallback functions for cases
where cpumode is insufficient".

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org # 4.19
---
 tools/perf/util/event.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 52b24b8ce29e..316f62ffa27b 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -1705,7 +1705,7 @@ bool sample_addr_correlates_sym(struct perf_event_attr *attr)
 void thread__resolve(struct thread *thread, struct addr_location *al,
 		     struct perf_sample *sample)
 {
-	thread__find_map(thread, sample->cpumode, sample->addr, al);
+	thread__find_map_fallback(thread, sample->cpumode, sample->addr, al);
 
 	al->cpu = sample->cpu;
 	al->sym = NULL;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/5] perf tools: Use fallbacks for branch stacks
  2018-10-31  9:10 [PATCH 0/5] perf tools: Fix for cases where cpumode is incorrect or insufficient Adrian Hunter
  2018-10-31  9:10 ` [PATCH 1/5] perf tools: Add fallback functions for cases where cpumode is insufficient Adrian Hunter
  2018-10-31  9:10 ` [PATCH 2/5] perf tools: Use fallback for sample_addr_correlates_sym() cases Adrian Hunter
@ 2018-10-31  9:10 ` Adrian Hunter
  2018-11-05 17:56   ` Arnaldo Carvalho de Melo
  2018-10-31  9:10 ` [PATCH 4/5] perf intel-pt: Insert callchain context into synthesized callchains Adrian Hunter
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Adrian Hunter @ 2018-10-31  9:10 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Andi Kleen, linux-kernel, leo.yan, David Miller,
	Mathieu Poirier

Branch stacks do not necessarily have the same cpumode as the 'ip'. Use the
fallback functions in those cases.

This patch depends on patch "perf tools: Add fallback functions for cases
where cpumode is insufficient".

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org # 4.19
---
 tools/perf/builtin-script.c                      | 12 ++++++------
 .../util/scripting-engines/trace-event-python.c  | 16 ++++++++--------
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index b5bc85bd0bbe..996317d8e183 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -728,8 +728,8 @@ static int perf_sample__fprintf_brstack(struct perf_sample *sample,
 		if (PRINT_FIELD(DSO)) {
 			memset(&alf, 0, sizeof(alf));
 			memset(&alt, 0, sizeof(alt));
-			thread__find_map(thread, sample->cpumode, from, &alf);
-			thread__find_map(thread, sample->cpumode, to, &alt);
+			thread__find_map_fallback(thread, sample->cpumode, from, &alf);
+			thread__find_map_fallback(thread, sample->cpumode, to, &alt);
 		}
 
 		printed += fprintf(fp, " 0x%"PRIx64, from);
@@ -775,8 +775,8 @@ static int perf_sample__fprintf_brstacksym(struct perf_sample *sample,
 		from = br->entries[i].from;
 		to   = br->entries[i].to;
 
-		thread__find_symbol(thread, sample->cpumode, from, &alf);
-		thread__find_symbol(thread, sample->cpumode, to, &alt);
+		thread__find_symbol_fb(thread, sample->cpumode, from, &alf);
+		thread__find_symbol_fb(thread, sample->cpumode, to, &alt);
 
 		printed += symbol__fprintf_symname_offs(alf.sym, &alf, fp);
 		if (PRINT_FIELD(DSO)) {
@@ -820,11 +820,11 @@ static int perf_sample__fprintf_brstackoff(struct perf_sample *sample,
 		from = br->entries[i].from;
 		to   = br->entries[i].to;
 
-		if (thread__find_map(thread, sample->cpumode, from, &alf) &&
+		if (thread__find_map_fallback(thread, sample->cpumode, from, &alf) &&
 		    !alf.map->dso->adjust_symbols)
 			from = map__map_ip(alf.map, from);
 
-		if (thread__find_map(thread, sample->cpumode, to, &alt) &&
+		if (thread__find_map_fallback(thread, sample->cpumode, to, &alt) &&
 		    !alt.map->dso->adjust_symbols)
 			to = map__map_ip(alt.map, to);
 
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index 69aa93d4ee99..244aeeee9c7a 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -494,14 +494,14 @@ static PyObject *python_process_brstack(struct perf_sample *sample,
 		pydict_set_item_string_decref(pyelem, "cycles",
 		    PyLong_FromUnsignedLongLong(br->entries[i].flags.cycles));
 
-		thread__find_map(thread, sample->cpumode,
-				 br->entries[i].from, &al);
+		thread__find_map_fallback(thread, sample->cpumode,
+					  br->entries[i].from, &al);
 		dsoname = get_dsoname(al.map);
 		pydict_set_item_string_decref(pyelem, "from_dsoname",
 					      _PyUnicode_FromString(dsoname));
 
-		thread__find_map(thread, sample->cpumode,
-				 br->entries[i].to, &al);
+		thread__find_map_fallback(thread, sample->cpumode,
+					  br->entries[i].to, &al);
 		dsoname = get_dsoname(al.map);
 		pydict_set_item_string_decref(pyelem, "to_dsoname",
 					      _PyUnicode_FromString(dsoname));
@@ -576,14 +576,14 @@ static PyObject *python_process_brstacksym(struct perf_sample *sample,
 		if (!pyelem)
 			Py_FatalError("couldn't create Python dictionary");
 
-		thread__find_symbol(thread, sample->cpumode,
-				    br->entries[i].from, &al);
+		thread__find_symbol_fb(thread, sample->cpumode,
+				       br->entries[i].from, &al);
 		get_symoff(al.sym, &al, true, bf, sizeof(bf));
 		pydict_set_item_string_decref(pyelem, "from",
 					      _PyUnicode_FromString(bf));
 
-		thread__find_symbol(thread, sample->cpumode,
-				    br->entries[i].to, &al);
+		thread__find_symbol_fb(thread, sample->cpumode,
+				       br->entries[i].to, &al);
 		get_symoff(al.sym, &al, true, bf, sizeof(bf));
 		pydict_set_item_string_decref(pyelem, "to",
 					      _PyUnicode_FromString(bf));
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/5] perf intel-pt: Insert callchain context into synthesized callchains
  2018-10-31  9:10 [PATCH 0/5] perf tools: Fix for cases where cpumode is incorrect or insufficient Adrian Hunter
                   ` (2 preceding siblings ...)
  2018-10-31  9:10 ` [PATCH 3/5] perf tools: Use fallbacks for branch stacks Adrian Hunter
@ 2018-10-31  9:10 ` Adrian Hunter
  2018-10-31 13:28   ` Arnaldo Carvalho de Melo
  2018-10-31  9:10 ` [PATCH 5/5] perf intel-pt/bts: Calculate cpumode for synthesized samples Adrian Hunter
  2018-10-31 13:35 ` [PATCH 0/5] perf tools: Fix for cases where cpumode is incorrect or insufficient Jiri Olsa
  5 siblings, 1 reply; 25+ messages in thread
From: Adrian Hunter @ 2018-10-31  9:10 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Andi Kleen, linux-kernel, leo.yan, David Miller,
	Mathieu Poirier

In the absence of a fallback, callchains must encode also the callchain
context. Do that now there is no fallback.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org # 4.19
---
 tools/perf/util/intel-pt.c     |  6 +++--
 tools/perf/util/thread-stack.c | 44 +++++++++++++++++++++++++++-------
 tools/perf/util/thread-stack.h |  2 +-
 3 files changed, 40 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index ffa385a029b3..60732213d16a 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -759,7 +759,8 @@ static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt,
 	if (pt->synth_opts.callchain) {
 		size_t sz = sizeof(struct ip_callchain);
 
-		sz += pt->synth_opts.callchain_sz * sizeof(u64);
+		/* Add 1 to callchain_sz for callchain context */
+		sz += (pt->synth_opts.callchain_sz + 1) * sizeof(u64);
 		ptq->chain = zalloc(sz);
 		if (!ptq->chain)
 			goto out_free;
@@ -1160,7 +1161,8 @@ static void intel_pt_prep_sample(struct intel_pt *pt,
 
 	if (pt->synth_opts.callchain) {
 		thread_stack__sample(ptq->thread, ptq->chain,
-				     pt->synth_opts.callchain_sz, sample->ip);
+				     pt->synth_opts.callchain_sz + 1,
+				     sample->ip, pt->kernel_start);
 		sample->callchain = ptq->chain;
 	}
 
diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
index c091635bf7dc..afdf36852ac8 100644
--- a/tools/perf/util/thread-stack.c
+++ b/tools/perf/util/thread-stack.c
@@ -310,20 +310,46 @@ void thread_stack__free(struct thread *thread)
 	}
 }
 
+static inline u64 callchain_context(u64 ip, u64 kernel_start)
+{
+	return ip < kernel_start ? PERF_CONTEXT_USER : PERF_CONTEXT_KERNEL;
+}
+
 void thread_stack__sample(struct thread *thread, struct ip_callchain *chain,
-			  size_t sz, u64 ip)
+			  size_t sz, u64 ip, u64 kernel_start)
 {
-	size_t i;
+	u64 context = callchain_context(ip, kernel_start);
+	u64 last_context;
+	size_t i, j;
 
-	if (!thread || !thread->ts)
-		chain->nr = 1;
-	else
-		chain->nr = min(sz, thread->ts->cnt + 1);
+	if (sz < 2) {
+		chain->nr = 0;
+		return;
+	}
 
-	chain->ips[0] = ip;
+	chain->ips[0] = context;
+	chain->ips[1] = ip;
+
+	if (!thread || !thread->ts) {
+		chain->nr = 2;
+		return;
+	}
+
+	last_context = context;
+
+	for (i = 2, j = 0; i < sz && j < thread->ts->cnt; i++, j++) {
+		ip = thread->ts->stack[thread->ts->cnt - j].ret_addr;
+		context = callchain_context(ip, kernel_start);
+		if (context != last_context) {
+			if (i >= sz - 1)
+				break;
+			chain->ips[i++] = context;
+			last_context = context;
+		}
+		chain->ips[i] = ip;
+	}
 
-	for (i = 1; i < chain->nr; i++)
-		chain->ips[i] = thread->ts->stack[thread->ts->cnt - i].ret_addr;
+	chain->nr = i;
 }
 
 struct call_return_processor *
diff --git a/tools/perf/util/thread-stack.h b/tools/perf/util/thread-stack.h
index b7e41c4ebfdd..f97c00a8c251 100644
--- a/tools/perf/util/thread-stack.h
+++ b/tools/perf/util/thread-stack.h
@@ -84,7 +84,7 @@ int thread_stack__event(struct thread *thread, u32 flags, u64 from_ip,
 			u64 to_ip, u16 insn_len, u64 trace_nr);
 void thread_stack__set_trace_nr(struct thread *thread, u64 trace_nr);
 void thread_stack__sample(struct thread *thread, struct ip_callchain *chain,
-			  size_t sz, u64 ip);
+			  size_t sz, u64 ip, u64 kernel_start);
 int thread_stack__flush(struct thread *thread);
 void thread_stack__free(struct thread *thread);
 size_t thread_stack__depth(struct thread *thread);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 5/5] perf intel-pt/bts: Calculate cpumode for synthesized samples
  2018-10-31  9:10 [PATCH 0/5] perf tools: Fix for cases where cpumode is incorrect or insufficient Adrian Hunter
                   ` (3 preceding siblings ...)
  2018-10-31  9:10 ` [PATCH 4/5] perf intel-pt: Insert callchain context into synthesized callchains Adrian Hunter
@ 2018-10-31  9:10 ` Adrian Hunter
  2018-10-31 22:13   ` [tip:perf/urgent] " tip-bot for Adrian Hunter
  2018-10-31 13:35 ` [PATCH 0/5] perf tools: Fix for cases where cpumode is incorrect or insufficient Jiri Olsa
  5 siblings, 1 reply; 25+ messages in thread
From: Adrian Hunter @ 2018-10-31  9:10 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Andi Kleen, linux-kernel, leo.yan, David Miller,
	Mathieu Poirier

In the absence of a fallback, samples must provide a correct cpumode for
the 'ip'. Do that now there is no fallback.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org # 4.19
---
 tools/perf/util/intel-bts.c | 17 ++++++++++++-----
 tools/perf/util/intel-pt.c  | 22 +++++++++++++---------
 2 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c
index 3b3a3d55dca1..7b27d77306c2 100644
--- a/tools/perf/util/intel-bts.c
+++ b/tools/perf/util/intel-bts.c
@@ -269,6 +269,13 @@ static int intel_bts_do_fix_overlap(struct auxtrace_queue *queue,
 	return 0;
 }
 
+static inline u8 intel_bts_cpumode(struct intel_bts *bts, uint64_t ip)
+{
+	return machine__kernel_ip(bts->machine, ip) ?
+	       PERF_RECORD_MISC_KERNEL :
+	       PERF_RECORD_MISC_USER;
+}
+
 static int intel_bts_synth_branch_sample(struct intel_bts_queue *btsq,
 					 struct branch *branch)
 {
@@ -281,12 +288,8 @@ static int intel_bts_synth_branch_sample(struct intel_bts_queue *btsq,
 	    bts->num_events++ <= bts->synth_opts.initial_skip)
 		return 0;
 
-	event.sample.header.type = PERF_RECORD_SAMPLE;
-	event.sample.header.misc = PERF_RECORD_MISC_USER;
-	event.sample.header.size = sizeof(struct perf_event_header);
-
-	sample.cpumode = PERF_RECORD_MISC_USER;
 	sample.ip = le64_to_cpu(branch->from);
+	sample.cpumode = intel_bts_cpumode(bts, sample.ip);
 	sample.pid = btsq->pid;
 	sample.tid = btsq->tid;
 	sample.addr = le64_to_cpu(branch->to);
@@ -298,6 +301,10 @@ static int intel_bts_synth_branch_sample(struct intel_bts_queue *btsq,
 	sample.insn_len = btsq->intel_pt_insn.length;
 	memcpy(sample.insn, btsq->intel_pt_insn.buf, INTEL_PT_INSN_BUF_SZ);
 
+	event.sample.header.type = PERF_RECORD_SAMPLE;
+	event.sample.header.misc = sample.cpumode;
+	event.sample.header.size = sizeof(struct perf_event_header);
+
 	if (bts->synth_opts.inject) {
 		event.sample.header.size = bts->branches_event_size;
 		ret = perf_event__synthesize_sample(&event,
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 60732213d16a..86cc9a64e982 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -407,6 +407,13 @@ intel_pt_cache_lookup(struct dso *dso, struct machine *machine, u64 offset)
 	return auxtrace_cache__lookup(dso->auxtrace_cache, offset);
 }
 
+static inline u8 intel_pt_cpumode(struct intel_pt *pt, uint64_t ip)
+{
+	return ip >= pt->kernel_start ?
+	       PERF_RECORD_MISC_KERNEL :
+	       PERF_RECORD_MISC_USER;
+}
+
 static int intel_pt_walk_next_insn(struct intel_pt_insn *intel_pt_insn,
 				   uint64_t *insn_cnt_ptr, uint64_t *ip,
 				   uint64_t to_ip, uint64_t max_insn_cnt,
@@ -429,10 +436,7 @@ static int intel_pt_walk_next_insn(struct intel_pt_insn *intel_pt_insn,
 	if (to_ip && *ip == to_ip)
 		goto out_no_cache;
 
-	if (*ip >= ptq->pt->kernel_start)
-		cpumode = PERF_RECORD_MISC_KERNEL;
-	else
-		cpumode = PERF_RECORD_MISC_USER;
+	cpumode = intel_pt_cpumode(ptq->pt, *ip);
 
 	thread = ptq->thread;
 	if (!thread) {
@@ -1059,15 +1063,11 @@ static void intel_pt_prep_b_sample(struct intel_pt *pt,
 				   union perf_event *event,
 				   struct perf_sample *sample)
 {
-	event->sample.header.type = PERF_RECORD_SAMPLE;
-	event->sample.header.misc = PERF_RECORD_MISC_USER;
-	event->sample.header.size = sizeof(struct perf_event_header);
-
 	if (!pt->timeless_decoding)
 		sample->time = tsc_to_perf_time(ptq->timestamp, &pt->tc);
 
-	sample->cpumode = PERF_RECORD_MISC_USER;
 	sample->ip = ptq->state->from_ip;
+	sample->cpumode = intel_pt_cpumode(pt, sample->ip);
 	sample->pid = ptq->pid;
 	sample->tid = ptq->tid;
 	sample->addr = ptq->state->to_ip;
@@ -1076,6 +1076,10 @@ static void intel_pt_prep_b_sample(struct intel_pt *pt,
 	sample->flags = ptq->flags;
 	sample->insn_len = ptq->insn_len;
 	memcpy(sample->insn, ptq->insn, INTEL_PT_INSN_BUF_SZ);
+
+	event->sample.header.type = PERF_RECORD_SAMPLE;
+	event->sample.header.misc = sample->cpumode;
+	event->sample.header.size = sizeof(struct perf_event_header);
 }
 
 static int intel_pt_inject_event(union perf_event *event,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/5] perf intel-pt: Insert callchain context into synthesized callchains
  2018-10-31  9:10 ` [PATCH 4/5] perf intel-pt: Insert callchain context into synthesized callchains Adrian Hunter
@ 2018-10-31 13:28   ` Arnaldo Carvalho de Melo
  2018-10-31 14:15     ` Adrian Hunter
  0 siblings, 1 reply; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-10-31 13:28 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Jiri Olsa, Andi Kleen, linux-kernel, leo.yan, David Miller,
	Mathieu Poirier

Em Wed, Oct 31, 2018 at 11:10:42AM +0200, Adrian Hunter escreveu:
> In the absence of a fallback, callchains must encode also the callchain
> context. Do that now there is no fallback.

So, this one is independent of the first 3 patches, right? Ok, applying
it first, I'll relook the first ones next.

- Arnaldo
 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> Cc: stable@vger.kernel.org # 4.19
> ---
>  tools/perf/util/intel-pt.c     |  6 +++--
>  tools/perf/util/thread-stack.c | 44 +++++++++++++++++++++++++++-------
>  tools/perf/util/thread-stack.h |  2 +-
>  3 files changed, 40 insertions(+), 12 deletions(-)
> 
> diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
> index ffa385a029b3..60732213d16a 100644
> --- a/tools/perf/util/intel-pt.c
> +++ b/tools/perf/util/intel-pt.c
> @@ -759,7 +759,8 @@ static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt,
>  	if (pt->synth_opts.callchain) {
>  		size_t sz = sizeof(struct ip_callchain);
>  
> -		sz += pt->synth_opts.callchain_sz * sizeof(u64);
> +		/* Add 1 to callchain_sz for callchain context */
> +		sz += (pt->synth_opts.callchain_sz + 1) * sizeof(u64);
>  		ptq->chain = zalloc(sz);
>  		if (!ptq->chain)
>  			goto out_free;
> @@ -1160,7 +1161,8 @@ static void intel_pt_prep_sample(struct intel_pt *pt,
>  
>  	if (pt->synth_opts.callchain) {
>  		thread_stack__sample(ptq->thread, ptq->chain,
> -				     pt->synth_opts.callchain_sz, sample->ip);
> +				     pt->synth_opts.callchain_sz + 1,
> +				     sample->ip, pt->kernel_start);
>  		sample->callchain = ptq->chain;
>  	}
>  
> diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
> index c091635bf7dc..afdf36852ac8 100644
> --- a/tools/perf/util/thread-stack.c
> +++ b/tools/perf/util/thread-stack.c
> @@ -310,20 +310,46 @@ void thread_stack__free(struct thread *thread)
>  	}
>  }
>  
> +static inline u64 callchain_context(u64 ip, u64 kernel_start)
> +{
> +	return ip < kernel_start ? PERF_CONTEXT_USER : PERF_CONTEXT_KERNEL;
> +}
> +
>  void thread_stack__sample(struct thread *thread, struct ip_callchain *chain,
> -			  size_t sz, u64 ip)
> +			  size_t sz, u64 ip, u64 kernel_start)
>  {
> -	size_t i;
> +	u64 context = callchain_context(ip, kernel_start);
> +	u64 last_context;
> +	size_t i, j;
>  
> -	if (!thread || !thread->ts)
> -		chain->nr = 1;
> -	else
> -		chain->nr = min(sz, thread->ts->cnt + 1);
> +	if (sz < 2) {
> +		chain->nr = 0;
> +		return;
> +	}
>  
> -	chain->ips[0] = ip;
> +	chain->ips[0] = context;
> +	chain->ips[1] = ip;
> +
> +	if (!thread || !thread->ts) {
> +		chain->nr = 2;
> +		return;
> +	}
> +
> +	last_context = context;
> +
> +	for (i = 2, j = 0; i < sz && j < thread->ts->cnt; i++, j++) {
> +		ip = thread->ts->stack[thread->ts->cnt - j].ret_addr;
> +		context = callchain_context(ip, kernel_start);
> +		if (context != last_context) {
> +			if (i >= sz - 1)
> +				break;
> +			chain->ips[i++] = context;
> +			last_context = context;
> +		}
> +		chain->ips[i] = ip;
> +	}
>  
> -	for (i = 1; i < chain->nr; i++)
> -		chain->ips[i] = thread->ts->stack[thread->ts->cnt - i].ret_addr;
> +	chain->nr = i;
>  }
>  
>  struct call_return_processor *
> diff --git a/tools/perf/util/thread-stack.h b/tools/perf/util/thread-stack.h
> index b7e41c4ebfdd..f97c00a8c251 100644
> --- a/tools/perf/util/thread-stack.h
> +++ b/tools/perf/util/thread-stack.h
> @@ -84,7 +84,7 @@ int thread_stack__event(struct thread *thread, u32 flags, u64 from_ip,
>  			u64 to_ip, u16 insn_len, u64 trace_nr);
>  void thread_stack__set_trace_nr(struct thread *thread, u64 trace_nr);
>  void thread_stack__sample(struct thread *thread, struct ip_callchain *chain,
> -			  size_t sz, u64 ip);
> +			  size_t sz, u64 ip, u64 kernel_start);
>  int thread_stack__flush(struct thread *thread);
>  void thread_stack__free(struct thread *thread);
>  size_t thread_stack__depth(struct thread *thread);
> -- 
> 2.17.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/5] perf tools: Fix for cases where cpumode is incorrect or insufficient
  2018-10-31  9:10 [PATCH 0/5] perf tools: Fix for cases where cpumode is incorrect or insufficient Adrian Hunter
                   ` (4 preceding siblings ...)
  2018-10-31  9:10 ` [PATCH 5/5] perf intel-pt/bts: Calculate cpumode for synthesized samples Adrian Hunter
@ 2018-10-31 13:35 ` Jiri Olsa
  5 siblings, 0 replies; 25+ messages in thread
From: Jiri Olsa @ 2018-10-31 13:35 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Andi Kleen, linux-kernel, leo.yan,
	David Miller, Mathieu Poirier

On Wed, Oct 31, 2018 at 11:10:38AM +0200, Adrian Hunter wrote:
> Hi
> 
> These patches probably deal with most cases except the one fixed by David
> Miller's "perf callchain: Honour the ordering of
> PERF_CONTEXT_{USER,KERNEL,etc}" patch, and also cat_backtrace() which looks
> like it has the same problem, and the cs-etm fix.
> 
> 
> Adrian Hunter (5):
>       perf tools: Add fallback functions for cases where cpumode is insufficient
>       perf tools: Use fallback for sample_addr_correlates_sym() cases
>       perf tools: Use fallbacks for branch stacks
>       perf intel-pt: Insert callchain context into synthesized callchains
>       perf intel-pt/bts: Calculate cpumode for synthesized samples

Reviewed-by: Jiri Olsa <jolsa@kernel.org>

thanks,
jirka

> 
>  tools/perf/builtin-script.c                        | 12 +++---
>  tools/perf/util/event.c                            | 29 +++++++++++++-
>  tools/perf/util/intel-bts.c                        | 17 ++++++---
>  tools/perf/util/intel-pt.c                         | 28 ++++++++------
>  tools/perf/util/machine.c                          | 40 ++++++++++++++++++++
>  tools/perf/util/machine.h                          |  2 +
>  .../util/scripting-engines/trace-event-python.c    | 16 ++++----
>  tools/perf/util/thread-stack.c                     | 44 +++++++++++++++++-----
>  tools/perf/util/thread-stack.h                     |  2 +-
>  tools/perf/util/thread.h                           |  4 ++
>  10 files changed, 153 insertions(+), 41 deletions(-)
> 
> 
> Regards
> Adrian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/5] perf intel-pt: Insert callchain context into synthesized callchains
  2018-10-31 13:28   ` Arnaldo Carvalho de Melo
@ 2018-10-31 14:15     ` Adrian Hunter
  2018-10-31 14:20       ` Adrian Hunter
  2018-10-31 14:26       ` [PATCH 4/5] " Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 25+ messages in thread
From: Adrian Hunter @ 2018-10-31 14:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Andi Kleen, linux-kernel, leo.yan, David Miller,
	Mathieu Poirier

On 31/10/18 3:28 PM, Arnaldo Carvalho de Melo wrote:
> Em Wed, Oct 31, 2018 at 11:10:42AM +0200, Adrian Hunter escreveu:
>> In the absence of a fallback, callchains must encode also the callchain
>> context. Do that now there is no fallback.
> 
> So, this one is independent of the first 3 patches, right?

Yes.  I was just going to test it separately when I noticed I had
screwed up my earlier testing.  When I re-tested I discovered this patch
has an off-by-one error:

diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
index afdf36852ac8..61a4286a74dc 100644
--- a/tools/perf/util/thread-stack.c
+++ b/tools/perf/util/thread-stack.c
@@ -337,7 +337,7 @@ void thread_stack__sample(struct thread *thread, struct ip_callchain *chain,
 
 	last_context = context;
 
-	for (i = 2, j = 0; i < sz && j < thread->ts->cnt; i++, j++) {
+	for (i = 2, j = 1; i < sz && j <= thread->ts->cnt; i++, j++) {
 		ip = thread->ts->stack[thread->ts->cnt - j].ret_addr;
 		context = callchain_context(ip, kernel_start);
 		if (context != last_context) {

Shall I send V2?

> So, this one is independent of the first 3 patches, right? Ok, applying
> it first, I'll relook the first ones next.
> 
> - Arnaldo
>  
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>> Cc: stable@vger.kernel.org # 4.19
>> ---
>>  tools/perf/util/intel-pt.c     |  6 +++--
>>  tools/perf/util/thread-stack.c | 44 +++++++++++++++++++++++++++-------
>>  tools/perf/util/thread-stack.h |  2 +-
>>  3 files changed, 40 insertions(+), 12 deletions(-)
>>
>> diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
>> index ffa385a029b3..60732213d16a 100644
>> --- a/tools/perf/util/intel-pt.c
>> +++ b/tools/perf/util/intel-pt.c
>> @@ -759,7 +759,8 @@ static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt,
>>  	if (pt->synth_opts.callchain) {
>>  		size_t sz = sizeof(struct ip_callchain);
>>  
>> -		sz += pt->synth_opts.callchain_sz * sizeof(u64);
>> +		/* Add 1 to callchain_sz for callchain context */
>> +		sz += (pt->synth_opts.callchain_sz + 1) * sizeof(u64);
>>  		ptq->chain = zalloc(sz);
>>  		if (!ptq->chain)
>>  			goto out_free;
>> @@ -1160,7 +1161,8 @@ static void intel_pt_prep_sample(struct intel_pt *pt,
>>  
>>  	if (pt->synth_opts.callchain) {
>>  		thread_stack__sample(ptq->thread, ptq->chain,
>> -				     pt->synth_opts.callchain_sz, sample->ip);
>> +				     pt->synth_opts.callchain_sz + 1,
>> +				     sample->ip, pt->kernel_start);
>>  		sample->callchain = ptq->chain;
>>  	}
>>  
>> diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
>> index c091635bf7dc..afdf36852ac8 100644
>> --- a/tools/perf/util/thread-stack.c
>> +++ b/tools/perf/util/thread-stack.c
>> @@ -310,20 +310,46 @@ void thread_stack__free(struct thread *thread)
>>  	}
>>  }
>>  
>> +static inline u64 callchain_context(u64 ip, u64 kernel_start)
>> +{
>> +	return ip < kernel_start ? PERF_CONTEXT_USER : PERF_CONTEXT_KERNEL;
>> +}
>> +
>>  void thread_stack__sample(struct thread *thread, struct ip_callchain *chain,
>> -			  size_t sz, u64 ip)
>> +			  size_t sz, u64 ip, u64 kernel_start)
>>  {
>> -	size_t i;
>> +	u64 context = callchain_context(ip, kernel_start);
>> +	u64 last_context;
>> +	size_t i, j;
>>  
>> -	if (!thread || !thread->ts)
>> -		chain->nr = 1;
>> -	else
>> -		chain->nr = min(sz, thread->ts->cnt + 1);
>> +	if (sz < 2) {
>> +		chain->nr = 0;
>> +		return;
>> +	}
>>  
>> -	chain->ips[0] = ip;
>> +	chain->ips[0] = context;
>> +	chain->ips[1] = ip;
>> +
>> +	if (!thread || !thread->ts) {
>> +		chain->nr = 2;
>> +		return;
>> +	}
>> +
>> +	last_context = context;
>> +
>> +	for (i = 2, j = 0; i < sz && j < thread->ts->cnt; i++, j++) {
>> +		ip = thread->ts->stack[thread->ts->cnt - j].ret_addr;
>> +		context = callchain_context(ip, kernel_start);
>> +		if (context != last_context) {
>> +			if (i >= sz - 1)
>> +				break;
>> +			chain->ips[i++] = context;
>> +			last_context = context;
>> +		}
>> +		chain->ips[i] = ip;
>> +	}
>>  
>> -	for (i = 1; i < chain->nr; i++)
>> -		chain->ips[i] = thread->ts->stack[thread->ts->cnt - i].ret_addr;
>> +	chain->nr = i;
>>  }
>>  
>>  struct call_return_processor *
>> diff --git a/tools/perf/util/thread-stack.h b/tools/perf/util/thread-stack.h
>> index b7e41c4ebfdd..f97c00a8c251 100644
>> --- a/tools/perf/util/thread-stack.h
>> +++ b/tools/perf/util/thread-stack.h
>> @@ -84,7 +84,7 @@ int thread_stack__event(struct thread *thread, u32 flags, u64 from_ip,
>>  			u64 to_ip, u16 insn_len, u64 trace_nr);
>>  void thread_stack__set_trace_nr(struct thread *thread, u64 trace_nr);
>>  void thread_stack__sample(struct thread *thread, struct ip_callchain *chain,
>> -			  size_t sz, u64 ip);
>> +			  size_t sz, u64 ip, u64 kernel_start);
>>  int thread_stack__flush(struct thread *thread);
>>  void thread_stack__free(struct thread *thread);
>>  size_t thread_stack__depth(struct thread *thread);
>> -- 
>> 2.17.1
> 


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/5] perf intel-pt: Insert callchain context into synthesized callchains
  2018-10-31 14:15     ` Adrian Hunter
@ 2018-10-31 14:20       ` Adrian Hunter
  2018-10-31 14:27         ` Arnaldo Carvalho de Melo
  2018-10-31 22:12         ` [tip:perf/urgent] " tip-bot for Adrian Hunter
  2018-10-31 14:26       ` [PATCH 4/5] " Arnaldo Carvalho de Melo
  1 sibling, 2 replies; 25+ messages in thread
From: Adrian Hunter @ 2018-10-31 14:20 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Andi Kleen, linux-kernel, leo.yan, David Miller,
	Mathieu Poirier

[-- Attachment #1: Type: text/plain, Size: 1186 bytes --]

On 31/10/18 4:15 PM, Adrian Hunter wrote:
> On 31/10/18 3:28 PM, Arnaldo Carvalho de Melo wrote:
>> Em Wed, Oct 31, 2018 at 11:10:42AM +0200, Adrian Hunter escreveu:
>>> In the absence of a fallback, callchains must encode also the callchain
>>> context. Do that now there is no fallback.
>>
>> So, this one is independent of the first 3 patches, right?
> 
> Yes.  I was just going to test it separately when I noticed I had
> screwed up my earlier testing.  When I re-tested I discovered this patch
> has an off-by-one error:
> 
> diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
> index afdf36852ac8..61a4286a74dc 100644
> --- a/tools/perf/util/thread-stack.c
> +++ b/tools/perf/util/thread-stack.c
> @@ -337,7 +337,7 @@ void thread_stack__sample(struct thread *thread, struct ip_callchain *chain,
>  
>  	last_context = context;
>  
> -	for (i = 2, j = 0; i < sz && j < thread->ts->cnt; i++, j++) {
> +	for (i = 2, j = 1; i < sz && j <= thread->ts->cnt; i++, j++) {
>  		ip = thread->ts->stack[thread->ts->cnt - j].ret_addr;
>  		context = callchain_context(ip, kernel_start);
>  		if (context != last_context) {
> 
> Shall I send V2?

I have attached it

[-- Attachment #2: 0004-perf-intel-pt-Insert-callchain-context-into-synthesi.patch --]
[-- Type: text/x-patch, Size: 3796 bytes --]

From 19a9fbdbd3729c7f14c9c1f0ac79c18cce91ff60 Mon Sep 17 00:00:00 2001
From: Adrian Hunter <adrian.hunter@intel.com>
Date: Wed, 31 Oct 2018 09:31:25 +0200
Subject: [PATCH V2 4/5] perf intel-pt: Insert callchain context into
 synthesized callchains

In the absence of a fallback, callchains must encode also the callchain
context. Do that now there is no fallback.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org # 4.19
---


Changes in V2:

	Fix off-by-one error


 tools/perf/util/intel-pt.c     |  6 +++--
 tools/perf/util/thread-stack.c | 44 +++++++++++++++++++++++++++-------
 tools/perf/util/thread-stack.h |  2 +-
 3 files changed, 40 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index ffa385a029b3..60732213d16a 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -759,7 +759,8 @@ static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt,
 	if (pt->synth_opts.callchain) {
 		size_t sz = sizeof(struct ip_callchain);
 
-		sz += pt->synth_opts.callchain_sz * sizeof(u64);
+		/* Add 1 to callchain_sz for callchain context */
+		sz += (pt->synth_opts.callchain_sz + 1) * sizeof(u64);
 		ptq->chain = zalloc(sz);
 		if (!ptq->chain)
 			goto out_free;
@@ -1160,7 +1161,8 @@ static void intel_pt_prep_sample(struct intel_pt *pt,
 
 	if (pt->synth_opts.callchain) {
 		thread_stack__sample(ptq->thread, ptq->chain,
-				     pt->synth_opts.callchain_sz, sample->ip);
+				     pt->synth_opts.callchain_sz + 1,
+				     sample->ip, pt->kernel_start);
 		sample->callchain = ptq->chain;
 	}
 
diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
index c091635bf7dc..61a4286a74dc 100644
--- a/tools/perf/util/thread-stack.c
+++ b/tools/perf/util/thread-stack.c
@@ -310,20 +310,46 @@ void thread_stack__free(struct thread *thread)
 	}
 }
 
+static inline u64 callchain_context(u64 ip, u64 kernel_start)
+{
+	return ip < kernel_start ? PERF_CONTEXT_USER : PERF_CONTEXT_KERNEL;
+}
+
 void thread_stack__sample(struct thread *thread, struct ip_callchain *chain,
-			  size_t sz, u64 ip)
+			  size_t sz, u64 ip, u64 kernel_start)
 {
-	size_t i;
+	u64 context = callchain_context(ip, kernel_start);
+	u64 last_context;
+	size_t i, j;
 
-	if (!thread || !thread->ts)
-		chain->nr = 1;
-	else
-		chain->nr = min(sz, thread->ts->cnt + 1);
+	if (sz < 2) {
+		chain->nr = 0;
+		return;
+	}
 
-	chain->ips[0] = ip;
+	chain->ips[0] = context;
+	chain->ips[1] = ip;
+
+	if (!thread || !thread->ts) {
+		chain->nr = 2;
+		return;
+	}
+
+	last_context = context;
+
+	for (i = 2, j = 1; i < sz && j <= thread->ts->cnt; i++, j++) {
+		ip = thread->ts->stack[thread->ts->cnt - j].ret_addr;
+		context = callchain_context(ip, kernel_start);
+		if (context != last_context) {
+			if (i >= sz - 1)
+				break;
+			chain->ips[i++] = context;
+			last_context = context;
+		}
+		chain->ips[i] = ip;
+	}
 
-	for (i = 1; i < chain->nr; i++)
-		chain->ips[i] = thread->ts->stack[thread->ts->cnt - i].ret_addr;
+	chain->nr = i;
 }
 
 struct call_return_processor *
diff --git a/tools/perf/util/thread-stack.h b/tools/perf/util/thread-stack.h
index b7e41c4ebfdd..f97c00a8c251 100644
--- a/tools/perf/util/thread-stack.h
+++ b/tools/perf/util/thread-stack.h
@@ -84,7 +84,7 @@ int thread_stack__event(struct thread *thread, u32 flags, u64 from_ip,
 			u64 to_ip, u16 insn_len, u64 trace_nr);
 void thread_stack__set_trace_nr(struct thread *thread, u64 trace_nr);
 void thread_stack__sample(struct thread *thread, struct ip_callchain *chain,
-			  size_t sz, u64 ip);
+			  size_t sz, u64 ip, u64 kernel_start);
 int thread_stack__flush(struct thread *thread);
 void thread_stack__free(struct thread *thread);
 size_t thread_stack__depth(struct thread *thread);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/5] perf intel-pt: Insert callchain context into synthesized callchains
  2018-10-31 14:15     ` Adrian Hunter
  2018-10-31 14:20       ` Adrian Hunter
@ 2018-10-31 14:26       ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-10-31 14:26 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Jiri Olsa, Andi Kleen, linux-kernel, leo.yan, David Miller,
	Mathieu Poirier

Em Wed, Oct 31, 2018 at 04:15:26PM +0200, Adrian Hunter escreveu:
> On 31/10/18 3:28 PM, Arnaldo Carvalho de Melo wrote:
> > Em Wed, Oct 31, 2018 at 11:10:42AM +0200, Adrian Hunter escreveu:
> >> In the absence of a fallback, callchains must encode also the callchain
> >> context. Do that now there is no fallback.
> > 
> > So, this one is independent of the first 3 patches, right?
> 
> Yes.  I was just going to test it separately when I noticed I had
> screwed up my earlier testing.  When I re-tested I discovered this patch
> has an off-by-one error:
> 
> diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
> index afdf36852ac8..61a4286a74dc 100644
> --- a/tools/perf/util/thread-stack.c
> +++ b/tools/perf/util/thread-stack.c
> @@ -337,7 +337,7 @@ void thread_stack__sample(struct thread *thread, struct ip_callchain *chain,
>  
>  	last_context = context;
>  
> -	for (i = 2, j = 0; i < sz && j < thread->ts->cnt; i++, j++) {
> +	for (i = 2, j = 1; i < sz && j <= thread->ts->cnt; i++, j++) {
>  		ip = thread->ts->stack[thread->ts->cnt - j].ret_addr;
>  		context = callchain_context(ip, kernel_start);
>  		if (context != last_context) {
> 
> Shall I send V2?

No need, I can fix this here
 
> > So, this one is independent of the first 3 patches, right? Ok, applying
> > it first, I'll relook the first ones next.
> > 
> > - Arnaldo
> >  
> >> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> >> Cc: stable@vger.kernel.org # 4.19
> >> ---
> >>  tools/perf/util/intel-pt.c     |  6 +++--
> >>  tools/perf/util/thread-stack.c | 44 +++++++++++++++++++++++++++-------
> >>  tools/perf/util/thread-stack.h |  2 +-
> >>  3 files changed, 40 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
> >> index ffa385a029b3..60732213d16a 100644
> >> --- a/tools/perf/util/intel-pt.c
> >> +++ b/tools/perf/util/intel-pt.c
> >> @@ -759,7 +759,8 @@ static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt,
> >>  	if (pt->synth_opts.callchain) {
> >>  		size_t sz = sizeof(struct ip_callchain);
> >>  
> >> -		sz += pt->synth_opts.callchain_sz * sizeof(u64);
> >> +		/* Add 1 to callchain_sz for callchain context */
> >> +		sz += (pt->synth_opts.callchain_sz + 1) * sizeof(u64);
> >>  		ptq->chain = zalloc(sz);
> >>  		if (!ptq->chain)
> >>  			goto out_free;
> >> @@ -1160,7 +1161,8 @@ static void intel_pt_prep_sample(struct intel_pt *pt,
> >>  
> >>  	if (pt->synth_opts.callchain) {
> >>  		thread_stack__sample(ptq->thread, ptq->chain,
> >> -				     pt->synth_opts.callchain_sz, sample->ip);
> >> +				     pt->synth_opts.callchain_sz + 1,
> >> +				     sample->ip, pt->kernel_start);
> >>  		sample->callchain = ptq->chain;
> >>  	}
> >>  
> >> diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
> >> index c091635bf7dc..afdf36852ac8 100644
> >> --- a/tools/perf/util/thread-stack.c
> >> +++ b/tools/perf/util/thread-stack.c
> >> @@ -310,20 +310,46 @@ void thread_stack__free(struct thread *thread)
> >>  	}
> >>  }
> >>  
> >> +static inline u64 callchain_context(u64 ip, u64 kernel_start)
> >> +{
> >> +	return ip < kernel_start ? PERF_CONTEXT_USER : PERF_CONTEXT_KERNEL;
> >> +}
> >> +
> >>  void thread_stack__sample(struct thread *thread, struct ip_callchain *chain,
> >> -			  size_t sz, u64 ip)
> >> +			  size_t sz, u64 ip, u64 kernel_start)
> >>  {
> >> -	size_t i;
> >> +	u64 context = callchain_context(ip, kernel_start);
> >> +	u64 last_context;
> >> +	size_t i, j;
> >>  
> >> -	if (!thread || !thread->ts)
> >> -		chain->nr = 1;
> >> -	else
> >> -		chain->nr = min(sz, thread->ts->cnt + 1);
> >> +	if (sz < 2) {
> >> +		chain->nr = 0;
> >> +		return;
> >> +	}
> >>  
> >> -	chain->ips[0] = ip;
> >> +	chain->ips[0] = context;
> >> +	chain->ips[1] = ip;
> >> +
> >> +	if (!thread || !thread->ts) {
> >> +		chain->nr = 2;
> >> +		return;
> >> +	}
> >> +
> >> +	last_context = context;
> >> +
> >> +	for (i = 2, j = 0; i < sz && j < thread->ts->cnt; i++, j++) {
> >> +		ip = thread->ts->stack[thread->ts->cnt - j].ret_addr;
> >> +		context = callchain_context(ip, kernel_start);
> >> +		if (context != last_context) {
> >> +			if (i >= sz - 1)
> >> +				break;
> >> +			chain->ips[i++] = context;
> >> +			last_context = context;
> >> +		}
> >> +		chain->ips[i] = ip;
> >> +	}
> >>  
> >> -	for (i = 1; i < chain->nr; i++)
> >> -		chain->ips[i] = thread->ts->stack[thread->ts->cnt - i].ret_addr;
> >> +	chain->nr = i;
> >>  }
> >>  
> >>  struct call_return_processor *
> >> diff --git a/tools/perf/util/thread-stack.h b/tools/perf/util/thread-stack.h
> >> index b7e41c4ebfdd..f97c00a8c251 100644
> >> --- a/tools/perf/util/thread-stack.h
> >> +++ b/tools/perf/util/thread-stack.h
> >> @@ -84,7 +84,7 @@ int thread_stack__event(struct thread *thread, u32 flags, u64 from_ip,
> >>  			u64 to_ip, u16 insn_len, u64 trace_nr);
> >>  void thread_stack__set_trace_nr(struct thread *thread, u64 trace_nr);
> >>  void thread_stack__sample(struct thread *thread, struct ip_callchain *chain,
> >> -			  size_t sz, u64 ip);
> >> +			  size_t sz, u64 ip, u64 kernel_start);
> >>  int thread_stack__flush(struct thread *thread);
> >>  void thread_stack__free(struct thread *thread);
> >>  size_t thread_stack__depth(struct thread *thread);
> >> -- 
> >> 2.17.1
> > 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/5] perf intel-pt: Insert callchain context into synthesized callchains
  2018-10-31 14:20       ` Adrian Hunter
@ 2018-10-31 14:27         ` Arnaldo Carvalho de Melo
  2018-10-31 22:12         ` [tip:perf/urgent] " tip-bot for Adrian Hunter
  1 sibling, 0 replies; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-10-31 14:27 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Jiri Olsa, Andi Kleen, linux-kernel, leo.yan, David Miller,
	Mathieu Poirier

Em Wed, Oct 31, 2018 at 04:20:45PM +0200, Adrian Hunter escreveu:
> On 31/10/18 4:15 PM, Adrian Hunter wrote:
> > On 31/10/18 3:28 PM, Arnaldo Carvalho de Melo wrote:
> >> Em Wed, Oct 31, 2018 at 11:10:42AM +0200, Adrian Hunter escreveu:
> >>> In the absence of a fallback, callchains must encode also the callchain
> >>> context. Do that now there is no fallback.
> >>
> >> So, this one is independent of the first 3 patches, right?
> > 
> > Yes.  I was just going to test it separately when I noticed I had
> > screwed up my earlier testing.  When I re-tested I discovered this patch
> > has an off-by-one error:
> > 
> > diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
> > index afdf36852ac8..61a4286a74dc 100644
> > --- a/tools/perf/util/thread-stack.c
> > +++ b/tools/perf/util/thread-stack.c
> > @@ -337,7 +337,7 @@ void thread_stack__sample(struct thread *thread, struct ip_callchain *chain,
> >  
> >  	last_context = context;
> >  
> > -	for (i = 2, j = 0; i < sz && j < thread->ts->cnt; i++, j++) {
> > +	for (i = 2, j = 1; i < sz && j <= thread->ts->cnt; i++, j++) {
> >  		ip = thread->ts->stack[thread->ts->cnt - j].ret_addr;
> >  		context = callchain_context(ip, kernel_start);
> >  		if (context != last_context) {
> > 
> > Shall I send V2?
> 
> I have attached it

Thanks!

> >From 19a9fbdbd3729c7f14c9c1f0ac79c18cce91ff60 Mon Sep 17 00:00:00 2001
> From: Adrian Hunter <adrian.hunter@intel.com>
> Date: Wed, 31 Oct 2018 09:31:25 +0200
> Subject: [PATCH V2 4/5] perf intel-pt: Insert callchain context into
>  synthesized callchains
> 
> In the absence of a fallback, callchains must encode also the callchain
> context. Do that now there is no fallback.
> 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> Cc: stable@vger.kernel.org # 4.19
> ---
> 
> 
> Changes in V2:
> 
> 	Fix off-by-one error
> 
> 
>  tools/perf/util/intel-pt.c     |  6 +++--
>  tools/perf/util/thread-stack.c | 44 +++++++++++++++++++++++++++-------
>  tools/perf/util/thread-stack.h |  2 +-
>  3 files changed, 40 insertions(+), 12 deletions(-)
> 
> diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
> index ffa385a029b3..60732213d16a 100644
> --- a/tools/perf/util/intel-pt.c
> +++ b/tools/perf/util/intel-pt.c
> @@ -759,7 +759,8 @@ static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt,
>  	if (pt->synth_opts.callchain) {
>  		size_t sz = sizeof(struct ip_callchain);
>  
> -		sz += pt->synth_opts.callchain_sz * sizeof(u64);
> +		/* Add 1 to callchain_sz for callchain context */
> +		sz += (pt->synth_opts.callchain_sz + 1) * sizeof(u64);
>  		ptq->chain = zalloc(sz);
>  		if (!ptq->chain)
>  			goto out_free;
> @@ -1160,7 +1161,8 @@ static void intel_pt_prep_sample(struct intel_pt *pt,
>  
>  	if (pt->synth_opts.callchain) {
>  		thread_stack__sample(ptq->thread, ptq->chain,
> -				     pt->synth_opts.callchain_sz, sample->ip);
> +				     pt->synth_opts.callchain_sz + 1,
> +				     sample->ip, pt->kernel_start);
>  		sample->callchain = ptq->chain;
>  	}
>  
> diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
> index c091635bf7dc..61a4286a74dc 100644
> --- a/tools/perf/util/thread-stack.c
> +++ b/tools/perf/util/thread-stack.c
> @@ -310,20 +310,46 @@ void thread_stack__free(struct thread *thread)
>  	}
>  }
>  
> +static inline u64 callchain_context(u64 ip, u64 kernel_start)
> +{
> +	return ip < kernel_start ? PERF_CONTEXT_USER : PERF_CONTEXT_KERNEL;
> +}
> +
>  void thread_stack__sample(struct thread *thread, struct ip_callchain *chain,
> -			  size_t sz, u64 ip)
> +			  size_t sz, u64 ip, u64 kernel_start)
>  {
> -	size_t i;
> +	u64 context = callchain_context(ip, kernel_start);
> +	u64 last_context;
> +	size_t i, j;
>  
> -	if (!thread || !thread->ts)
> -		chain->nr = 1;
> -	else
> -		chain->nr = min(sz, thread->ts->cnt + 1);
> +	if (sz < 2) {
> +		chain->nr = 0;
> +		return;
> +	}
>  
> -	chain->ips[0] = ip;
> +	chain->ips[0] = context;
> +	chain->ips[1] = ip;
> +
> +	if (!thread || !thread->ts) {
> +		chain->nr = 2;
> +		return;
> +	}
> +
> +	last_context = context;
> +
> +	for (i = 2, j = 1; i < sz && j <= thread->ts->cnt; i++, j++) {
> +		ip = thread->ts->stack[thread->ts->cnt - j].ret_addr;
> +		context = callchain_context(ip, kernel_start);
> +		if (context != last_context) {
> +			if (i >= sz - 1)
> +				break;
> +			chain->ips[i++] = context;
> +			last_context = context;
> +		}
> +		chain->ips[i] = ip;
> +	}
>  
> -	for (i = 1; i < chain->nr; i++)
> -		chain->ips[i] = thread->ts->stack[thread->ts->cnt - i].ret_addr;
> +	chain->nr = i;
>  }
>  
>  struct call_return_processor *
> diff --git a/tools/perf/util/thread-stack.h b/tools/perf/util/thread-stack.h
> index b7e41c4ebfdd..f97c00a8c251 100644
> --- a/tools/perf/util/thread-stack.h
> +++ b/tools/perf/util/thread-stack.h
> @@ -84,7 +84,7 @@ int thread_stack__event(struct thread *thread, u32 flags, u64 from_ip,
>  			u64 to_ip, u16 insn_len, u64 trace_nr);
>  void thread_stack__set_trace_nr(struct thread *thread, u64 trace_nr);
>  void thread_stack__sample(struct thread *thread, struct ip_callchain *chain,
> -			  size_t sz, u64 ip);
> +			  size_t sz, u64 ip, u64 kernel_start);
>  int thread_stack__flush(struct thread *thread);
>  void thread_stack__free(struct thread *thread);
>  size_t thread_stack__depth(struct thread *thread);
> -- 
> 2.17.1
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [tip:perf/urgent] perf intel-pt: Insert callchain context into synthesized callchains
  2018-10-31 14:20       ` Adrian Hunter
  2018-10-31 14:27         ` Arnaldo Carvalho de Melo
@ 2018-10-31 22:12         ` tip-bot for Adrian Hunter
  1 sibling, 0 replies; 25+ messages in thread
From: tip-bot for Adrian Hunter @ 2018-10-31 22:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, tglx, jolsa, adrian.hunter, mingo, leo.yan, ak, hpa, davem,
	linux-kernel, mathieu.poirier

Commit-ID:  242483068b4b9ad02f1653819b6e683577681e0e
Gitweb:     https://git.kernel.org/tip/242483068b4b9ad02f1653819b6e683577681e0e
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Wed, 31 Oct 2018 11:10:42 +0200
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 31 Oct 2018 12:54:27 -0300

perf intel-pt: Insert callchain context into synthesized callchains

In the absence of a fallback, callchains must encode also the callchain
context. Do that now there is no fallback.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: stable@vger.kernel.org # 4.19
Link: http://lkml.kernel.org/r/100ea2ec-ed14-b56d-d810-e0a6d2f4b069@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/intel-pt.c     |  6 ++++--
 tools/perf/util/thread-stack.c | 44 +++++++++++++++++++++++++++++++++---------
 tools/perf/util/thread-stack.h |  2 +-
 3 files changed, 40 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index ffa385a029b3..60732213d16a 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -759,7 +759,8 @@ static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt,
 	if (pt->synth_opts.callchain) {
 		size_t sz = sizeof(struct ip_callchain);
 
-		sz += pt->synth_opts.callchain_sz * sizeof(u64);
+		/* Add 1 to callchain_sz for callchain context */
+		sz += (pt->synth_opts.callchain_sz + 1) * sizeof(u64);
 		ptq->chain = zalloc(sz);
 		if (!ptq->chain)
 			goto out_free;
@@ -1160,7 +1161,8 @@ static void intel_pt_prep_sample(struct intel_pt *pt,
 
 	if (pt->synth_opts.callchain) {
 		thread_stack__sample(ptq->thread, ptq->chain,
-				     pt->synth_opts.callchain_sz, sample->ip);
+				     pt->synth_opts.callchain_sz + 1,
+				     sample->ip, pt->kernel_start);
 		sample->callchain = ptq->chain;
 	}
 
diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
index c091635bf7dc..61a4286a74dc 100644
--- a/tools/perf/util/thread-stack.c
+++ b/tools/perf/util/thread-stack.c
@@ -310,20 +310,46 @@ void thread_stack__free(struct thread *thread)
 	}
 }
 
+static inline u64 callchain_context(u64 ip, u64 kernel_start)
+{
+	return ip < kernel_start ? PERF_CONTEXT_USER : PERF_CONTEXT_KERNEL;
+}
+
 void thread_stack__sample(struct thread *thread, struct ip_callchain *chain,
-			  size_t sz, u64 ip)
+			  size_t sz, u64 ip, u64 kernel_start)
 {
-	size_t i;
+	u64 context = callchain_context(ip, kernel_start);
+	u64 last_context;
+	size_t i, j;
 
-	if (!thread || !thread->ts)
-		chain->nr = 1;
-	else
-		chain->nr = min(sz, thread->ts->cnt + 1);
+	if (sz < 2) {
+		chain->nr = 0;
+		return;
+	}
 
-	chain->ips[0] = ip;
+	chain->ips[0] = context;
+	chain->ips[1] = ip;
+
+	if (!thread || !thread->ts) {
+		chain->nr = 2;
+		return;
+	}
+
+	last_context = context;
+
+	for (i = 2, j = 1; i < sz && j <= thread->ts->cnt; i++, j++) {
+		ip = thread->ts->stack[thread->ts->cnt - j].ret_addr;
+		context = callchain_context(ip, kernel_start);
+		if (context != last_context) {
+			if (i >= sz - 1)
+				break;
+			chain->ips[i++] = context;
+			last_context = context;
+		}
+		chain->ips[i] = ip;
+	}
 
-	for (i = 1; i < chain->nr; i++)
-		chain->ips[i] = thread->ts->stack[thread->ts->cnt - i].ret_addr;
+	chain->nr = i;
 }
 
 struct call_return_processor *
diff --git a/tools/perf/util/thread-stack.h b/tools/perf/util/thread-stack.h
index b7e41c4ebfdd..f97c00a8c251 100644
--- a/tools/perf/util/thread-stack.h
+++ b/tools/perf/util/thread-stack.h
@@ -84,7 +84,7 @@ int thread_stack__event(struct thread *thread, u32 flags, u64 from_ip,
 			u64 to_ip, u16 insn_len, u64 trace_nr);
 void thread_stack__set_trace_nr(struct thread *thread, u64 trace_nr);
 void thread_stack__sample(struct thread *thread, struct ip_callchain *chain,
-			  size_t sz, u64 ip);
+			  size_t sz, u64 ip, u64 kernel_start);
 int thread_stack__flush(struct thread *thread);
 void thread_stack__free(struct thread *thread);
 size_t thread_stack__depth(struct thread *thread);

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip:perf/urgent] perf intel-pt/bts: Calculate cpumode for synthesized samples
  2018-10-31  9:10 ` [PATCH 5/5] perf intel-pt/bts: Calculate cpumode for synthesized samples Adrian Hunter
@ 2018-10-31 22:13   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot for Adrian Hunter @ 2018-10-31 22:13 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: leo.yan, tglx, adrian.hunter, davem, linux-kernel, jolsa, acme,
	hpa, mingo, ak, mathieu.poirier

Commit-ID:  5d4f0edaa3ac4f1844ed7c64cd2bae6f1912bac5
Gitweb:     https://git.kernel.org/tip/5d4f0edaa3ac4f1844ed7c64cd2bae6f1912bac5
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Wed, 31 Oct 2018 11:10:43 +0200
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 31 Oct 2018 12:56:26 -0300

perf intel-pt/bts: Calculate cpumode for synthesized samples

In the absence of a fallback, samples must provide a correct cpumode for
the 'ip'. Do that now there is no fallback.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: stable@vger.kernel.org # 4.19
Link: http://lkml.kernel.org/r/20181031091043.23465-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/intel-bts.c | 17 ++++++++++++-----
 tools/perf/util/intel-pt.c  | 22 +++++++++++++---------
 2 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c
index 3b3a3d55dca1..7b27d77306c2 100644
--- a/tools/perf/util/intel-bts.c
+++ b/tools/perf/util/intel-bts.c
@@ -269,6 +269,13 @@ static int intel_bts_do_fix_overlap(struct auxtrace_queue *queue,
 	return 0;
 }
 
+static inline u8 intel_bts_cpumode(struct intel_bts *bts, uint64_t ip)
+{
+	return machine__kernel_ip(bts->machine, ip) ?
+	       PERF_RECORD_MISC_KERNEL :
+	       PERF_RECORD_MISC_USER;
+}
+
 static int intel_bts_synth_branch_sample(struct intel_bts_queue *btsq,
 					 struct branch *branch)
 {
@@ -281,12 +288,8 @@ static int intel_bts_synth_branch_sample(struct intel_bts_queue *btsq,
 	    bts->num_events++ <= bts->synth_opts.initial_skip)
 		return 0;
 
-	event.sample.header.type = PERF_RECORD_SAMPLE;
-	event.sample.header.misc = PERF_RECORD_MISC_USER;
-	event.sample.header.size = sizeof(struct perf_event_header);
-
-	sample.cpumode = PERF_RECORD_MISC_USER;
 	sample.ip = le64_to_cpu(branch->from);
+	sample.cpumode = intel_bts_cpumode(bts, sample.ip);
 	sample.pid = btsq->pid;
 	sample.tid = btsq->tid;
 	sample.addr = le64_to_cpu(branch->to);
@@ -298,6 +301,10 @@ static int intel_bts_synth_branch_sample(struct intel_bts_queue *btsq,
 	sample.insn_len = btsq->intel_pt_insn.length;
 	memcpy(sample.insn, btsq->intel_pt_insn.buf, INTEL_PT_INSN_BUF_SZ);
 
+	event.sample.header.type = PERF_RECORD_SAMPLE;
+	event.sample.header.misc = sample.cpumode;
+	event.sample.header.size = sizeof(struct perf_event_header);
+
 	if (bts->synth_opts.inject) {
 		event.sample.header.size = bts->branches_event_size;
 		ret = perf_event__synthesize_sample(&event,
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 60732213d16a..86cc9a64e982 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -407,6 +407,13 @@ intel_pt_cache_lookup(struct dso *dso, struct machine *machine, u64 offset)
 	return auxtrace_cache__lookup(dso->auxtrace_cache, offset);
 }
 
+static inline u8 intel_pt_cpumode(struct intel_pt *pt, uint64_t ip)
+{
+	return ip >= pt->kernel_start ?
+	       PERF_RECORD_MISC_KERNEL :
+	       PERF_RECORD_MISC_USER;
+}
+
 static int intel_pt_walk_next_insn(struct intel_pt_insn *intel_pt_insn,
 				   uint64_t *insn_cnt_ptr, uint64_t *ip,
 				   uint64_t to_ip, uint64_t max_insn_cnt,
@@ -429,10 +436,7 @@ static int intel_pt_walk_next_insn(struct intel_pt_insn *intel_pt_insn,
 	if (to_ip && *ip == to_ip)
 		goto out_no_cache;
 
-	if (*ip >= ptq->pt->kernel_start)
-		cpumode = PERF_RECORD_MISC_KERNEL;
-	else
-		cpumode = PERF_RECORD_MISC_USER;
+	cpumode = intel_pt_cpumode(ptq->pt, *ip);
 
 	thread = ptq->thread;
 	if (!thread) {
@@ -1059,15 +1063,11 @@ static void intel_pt_prep_b_sample(struct intel_pt *pt,
 				   union perf_event *event,
 				   struct perf_sample *sample)
 {
-	event->sample.header.type = PERF_RECORD_SAMPLE;
-	event->sample.header.misc = PERF_RECORD_MISC_USER;
-	event->sample.header.size = sizeof(struct perf_event_header);
-
 	if (!pt->timeless_decoding)
 		sample->time = tsc_to_perf_time(ptq->timestamp, &pt->tc);
 
-	sample->cpumode = PERF_RECORD_MISC_USER;
 	sample->ip = ptq->state->from_ip;
+	sample->cpumode = intel_pt_cpumode(pt, sample->ip);
 	sample->pid = ptq->pid;
 	sample->tid = ptq->tid;
 	sample->addr = ptq->state->to_ip;
@@ -1076,6 +1076,10 @@ static void intel_pt_prep_b_sample(struct intel_pt *pt,
 	sample->flags = ptq->flags;
 	sample->insn_len = ptq->insn_len;
 	memcpy(sample->insn, ptq->insn, INTEL_PT_INSN_BUF_SZ);
+
+	event->sample.header.type = PERF_RECORD_SAMPLE;
+	event->sample.header.misc = sample->cpumode;
+	event->sample.header.size = sizeof(struct perf_event_header);
 }
 
 static int intel_pt_inject_event(union perf_event *event,

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* RE: [PATCH 2/5] perf tools: Use fallback for sample_addr_correlates_sym() cases
  2018-10-31  9:10 ` [PATCH 2/5] perf tools: Use fallback for sample_addr_correlates_sym() cases Adrian Hunter
@ 2018-11-03 17:46   ` Hunter, Adrian
  0 siblings, 0 replies; 25+ messages in thread
From: Hunter, Adrian @ 2018-11-03 17:46 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Andi Kleen, linux-kernel, leo.yan, David Miller,
	Mathieu Poirier

> -----Original Message-----
> From: Hunter, Adrian
> Sent: Wednesday, October 31, 2018 11:11 AM
> To: Arnaldo Carvalho de Melo <acme@kernel.org>
> Cc: Jiri Olsa <jolsa@redhat.com>; Andi Kleen <ak@linux.intel.com>; linux-
> kernel@vger.kernel.org; leo.yan@linaro.org; David Miller
> <davem@davemloft.net>; Mathieu Poirier <mathieu.poirier@linaro.org>
> Subject: [PATCH 2/5] perf tools: Use fallback for
> sample_addr_correlates_sym() cases
> 
> thread__resolve() is used in the sample_addr_correlates_sym() cases where
> 'addr' is a destination of a branch which does not necessarily have the
> same cpumode as the 'ip'. Use the fallback function in that case.
> 
> This patch depends on patch "perf tools: Add fallback functions for cases
> where cpumode is insufficient".
> 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> Cc: stable@vger.kernel.org # 4.19

Any comments?

> ---
>  tools/perf/util/event.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
> index 52b24b8ce29e..316f62ffa27b 100644
> --- a/tools/perf/util/event.c
> +++ b/tools/perf/util/event.c
> @@ -1705,7 +1705,7 @@ bool sample_addr_correlates_sym(struct
> perf_event_attr *attr)
>  void thread__resolve(struct thread *thread, struct addr_location *al,
>  		     struct perf_sample *sample)
>  {
> -	thread__find_map(thread, sample->cpumode, sample->addr, al);
> +	thread__find_map_fallback(thread, sample->cpumode, sample-
> >addr, al);
> 
>  	al->cpu = sample->cpu;
>  	al->sym = NULL;
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] perf tools: Add fallback functions for cases where cpumode is insufficient
  2018-10-31  9:10 ` [PATCH 1/5] perf tools: Add fallback functions for cases where cpumode is insufficient Adrian Hunter
@ 2018-11-05 17:29   ` Arnaldo Carvalho de Melo
  2018-11-05 18:19     ` David Miller
  2018-11-05 19:21     ` Hunter, Adrian
  0 siblings, 2 replies; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-11-05 17:29 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Jiri Olsa, Andi Kleen, linux-kernel, leo.yan, David Miller,
	Mathieu Poirier

Em Wed, Oct 31, 2018 at 11:10:39AM +0200, Adrian Hunter escreveu:
> For branch stacks or branch samples, the sample cpumode might not be
> correct because it applies only to the sample 'ip' and not necessary to
> 'addr' or branch stack addresses. Add fallback functions that can be used
> to deal with those cases
> 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> Cc: stable@vger.kernel.org # 4.19
> ---
>  tools/perf/util/event.c   | 27 ++++++++++++++++++++++++++
>  tools/perf/util/machine.c | 40 +++++++++++++++++++++++++++++++++++++++
>  tools/perf/util/machine.h |  2 ++
>  tools/perf/util/thread.h  |  4 ++++
>  4 files changed, 73 insertions(+)
> 
> diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
> index bc646185f8d9..52b24b8ce29e 100644
> --- a/tools/perf/util/event.c
> +++ b/tools/perf/util/event.c
> @@ -1576,6 +1576,24 @@ struct map *thread__find_map(struct thread *thread, u8 cpumode, u64 addr,
>  	return al->map;
>  }
>  
> +/*
> + * For branch stacks or branch samples, the sample cpumode might not be correct
> + * because it applies only to the sample 'ip' and not necessary to 'addr' or
> + * branch stack addresses. If possible, use a fallback to deal with those cases.
> + */
> +struct map *thread__find_map_fallback(struct thread *thread, u8 cpumode,
> +				      u64 addr, struct addr_location *al)
> +{

You named one as _fallback...

> +	struct map *map = thread__find_map(thread, cpumode, addr, al);
> +	struct machine *machine = thread->mg->machine;
> +	u8 addr_cpumode = machine__addr_cpumode(machine, cpumode, addr);
> +
> +	if (map || addr_cpumode == cpumode)
> +		return map;
> +
> +	return thread__find_map(thread, addr_cpumode, addr, al);
> +}
> +
>  struct symbol *thread__find_symbol(struct thread *thread, u8 cpumode,
>  				   u64 addr, struct addr_location *al)
>  {
> @@ -1585,6 +1603,15 @@ struct symbol *thread__find_symbol(struct thread *thread, u8 cpumode,
>  	return al->sym;
>  }
>  
> +struct symbol *thread__find_symbol_fb(struct thread *thread, u8 cpumode,
> +				      u64 addr, struct addr_location *al)

... and the other as _fb, make it consistent, please.

> +{
> +	al->sym = NULL;
> +	if (thread__find_map_fallback(thread, cpumode, addr, al))
> +		al->sym = map__find_symbol(al->map, al->addr);
> +	return al->sym;
> +}
> +
>  /*
>   * Callers need to drop the reference to al->thread, obtained in
>   * machine__findnew_thread()
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 111ae858cbcb..04edc0eac376 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -2542,6 +2542,46 @@ int machine__get_kernel_start(struct machine *machine)
>  	return err;
>  }
>  
> +/*
> + * machine__single_ku_as - Machine has same address space for kernel and user.
> + * @machine: machine object
> + *
> + * Some architectures have a single address space for kernel and user addresses,
> + * which makes it possible to determine if an address is in kernel space or user
> + * space.
> + */
> +static bool machine__single_ku_as(struct machine *machine)
> +{
> +	return strcmp(perf_env__arch(machine->env), "sparc");
> +}

Can we avoid having this strcmp be done repeatedly? I.e. just make this
a boolean initialized at session start, when machine->env is setup, so
we'd have:

   machine->single_address_space

Instead of a function?

Also have you considered making this fallback to be performed only from
code that is that arch specific?

I.e. the code that supports branch samples/stacks is x86_86 specific at
this point and thus only in that case we would call the routines that
perform the fallback, which, in turn, wouldn't need to check for
"sparc"?

> +
> +u8 machine__addr_cpumode(struct machine *machine, u8 cpumode, u64 addr)
> +{
> +	u8 addr_cpumode = cpumode;
> +	bool kernel_ip;
> +
> +	if (!machine__single_ku_as(machine))
> +		goto out;
> +
> +	kernel_ip = machine__kernel_ip(machine, addr);
> +	switch (cpumode) {
> +	case PERF_RECORD_MISC_KERNEL:
> +	case PERF_RECORD_MISC_USER:
> +		addr_cpumode = kernel_ip ? PERF_RECORD_MISC_KERNEL :
> +					   PERF_RECORD_MISC_USER;
> +		break;
> +	case PERF_RECORD_MISC_GUEST_KERNEL:
> +	case PERF_RECORD_MISC_GUEST_USER:
> +		addr_cpumode = kernel_ip ? PERF_RECORD_MISC_GUEST_KERNEL :
> +					   PERF_RECORD_MISC_GUEST_USER;
> +		break;
> +	default:
> +		break;
> +	}
> +out:
> +	return addr_cpumode;
> +}
> +
>  struct dso *machine__findnew_dso(struct machine *machine, const char *filename)
>  {
>  	return dsos__findnew(&machine->dsos, filename);
> diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
> index d856b85862e2..20d464135ed1 100644
> --- a/tools/perf/util/machine.h
> +++ b/tools/perf/util/machine.h
> @@ -99,6 +99,8 @@ static inline bool machine__kernel_ip(struct machine *machine, u64 ip)
>  	return ip >= kernel_start;
>  }
>  
> +u8 machine__addr_cpumode(struct machine *machine, u8 cpumode, u64 addr);
> +
>  struct thread *machine__find_thread(struct machine *machine, pid_t pid,
>  				    pid_t tid);
>  struct comm *machine__thread_exec_comm(struct machine *machine,
> diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
> index 36c09a9904e6..e28f9dc1110a 100644
> --- a/tools/perf/util/thread.h
> +++ b/tools/perf/util/thread.h
> @@ -96,9 +96,13 @@ struct thread *thread__main_thread(struct machine *machine, struct thread *threa
>  
>  struct map *thread__find_map(struct thread *thread, u8 cpumode, u64 addr,
>  			     struct addr_location *al);
> +struct map *thread__find_map_fallback(struct thread *thread, u8 cpumode,
> +				      u64 addr, struct addr_location *al);
>  
>  struct symbol *thread__find_symbol(struct thread *thread, u8 cpumode,
>  				   u64 addr, struct addr_location *al);
> +struct symbol *thread__find_symbol_fb(struct thread *thread, u8 cpumode,
> +				      u64 addr, struct addr_location *al);
>  
>  void thread__find_cpumode_addr_location(struct thread *thread, u64 addr,
>  					struct addr_location *al);
> -- 
> 2.17.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/5] perf tools: Use fallbacks for branch stacks
  2018-10-31  9:10 ` [PATCH 3/5] perf tools: Use fallbacks for branch stacks Adrian Hunter
@ 2018-11-05 17:56   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-11-05 17:56 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Jiri Olsa, Andi Kleen, linux-kernel, leo.yan, David Miller,
	Mathieu Poirier

Em Wed, Oct 31, 2018 at 11:10:41AM +0200, Adrian Hunter escreveu:
> Branch stacks do not necessarily have the same cpumode as the 'ip'. Use the
> fallback functions in those cases.
> 
> This patch depends on patch "perf tools: Add fallback functions for cases
>> where cpumode is insufficient".

This one is ok, i.e. only in places where we need a fallback we do it.

- Arnaldo
 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> Cc: stable@vger.kernel.org # 4.19
> ---
>  tools/perf/builtin-script.c                      | 12 ++++++------
>  .../util/scripting-engines/trace-event-python.c  | 16 ++++++++--------
>  2 files changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index b5bc85bd0bbe..996317d8e183 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -728,8 +728,8 @@ static int perf_sample__fprintf_brstack(struct perf_sample *sample,
>  		if (PRINT_FIELD(DSO)) {
>  			memset(&alf, 0, sizeof(alf));
>  			memset(&alt, 0, sizeof(alt));
> -			thread__find_map(thread, sample->cpumode, from, &alf);
> -			thread__find_map(thread, sample->cpumode, to, &alt);
> +			thread__find_map_fallback(thread, sample->cpumode, from, &alf);
> +			thread__find_map_fallback(thread, sample->cpumode, to, &alt);
>  		}
>  
>  		printed += fprintf(fp, " 0x%"PRIx64, from);
> @@ -775,8 +775,8 @@ static int perf_sample__fprintf_brstacksym(struct perf_sample *sample,
>  		from = br->entries[i].from;
>  		to   = br->entries[i].to;
>  
> -		thread__find_symbol(thread, sample->cpumode, from, &alf);
> -		thread__find_symbol(thread, sample->cpumode, to, &alt);
> +		thread__find_symbol_fb(thread, sample->cpumode, from, &alf);
> +		thread__find_symbol_fb(thread, sample->cpumode, to, &alt);
>  
>  		printed += symbol__fprintf_symname_offs(alf.sym, &alf, fp);
>  		if (PRINT_FIELD(DSO)) {
> @@ -820,11 +820,11 @@ static int perf_sample__fprintf_brstackoff(struct perf_sample *sample,
>  		from = br->entries[i].from;
>  		to   = br->entries[i].to;
>  
> -		if (thread__find_map(thread, sample->cpumode, from, &alf) &&
> +		if (thread__find_map_fallback(thread, sample->cpumode, from, &alf) &&
>  		    !alf.map->dso->adjust_symbols)
>  			from = map__map_ip(alf.map, from);
>  
> -		if (thread__find_map(thread, sample->cpumode, to, &alt) &&
> +		if (thread__find_map_fallback(thread, sample->cpumode, to, &alt) &&
>  		    !alt.map->dso->adjust_symbols)
>  			to = map__map_ip(alt.map, to);
>  
> diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
> index 69aa93d4ee99..244aeeee9c7a 100644
> --- a/tools/perf/util/scripting-engines/trace-event-python.c
> +++ b/tools/perf/util/scripting-engines/trace-event-python.c
> @@ -494,14 +494,14 @@ static PyObject *python_process_brstack(struct perf_sample *sample,
>  		pydict_set_item_string_decref(pyelem, "cycles",
>  		    PyLong_FromUnsignedLongLong(br->entries[i].flags.cycles));
>  
> -		thread__find_map(thread, sample->cpumode,
> -				 br->entries[i].from, &al);
> +		thread__find_map_fallback(thread, sample->cpumode,
> +					  br->entries[i].from, &al);
>  		dsoname = get_dsoname(al.map);
>  		pydict_set_item_string_decref(pyelem, "from_dsoname",
>  					      _PyUnicode_FromString(dsoname));
>  
> -		thread__find_map(thread, sample->cpumode,
> -				 br->entries[i].to, &al);
> +		thread__find_map_fallback(thread, sample->cpumode,
> +					  br->entries[i].to, &al);
>  		dsoname = get_dsoname(al.map);
>  		pydict_set_item_string_decref(pyelem, "to_dsoname",
>  					      _PyUnicode_FromString(dsoname));
> @@ -576,14 +576,14 @@ static PyObject *python_process_brstacksym(struct perf_sample *sample,
>  		if (!pyelem)
>  			Py_FatalError("couldn't create Python dictionary");
>  
> -		thread__find_symbol(thread, sample->cpumode,
> -				    br->entries[i].from, &al);
> +		thread__find_symbol_fb(thread, sample->cpumode,
> +				       br->entries[i].from, &al);
>  		get_symoff(al.sym, &al, true, bf, sizeof(bf));
>  		pydict_set_item_string_decref(pyelem, "from",
>  					      _PyUnicode_FromString(bf));
>  
> -		thread__find_symbol(thread, sample->cpumode,
> -				    br->entries[i].to, &al);
> +		thread__find_symbol_fb(thread, sample->cpumode,
> +				       br->entries[i].to, &al);
>  		get_symoff(al.sym, &al, true, bf, sizeof(bf));
>  		pydict_set_item_string_decref(pyelem, "to",
>  					      _PyUnicode_FromString(bf));
> -- 
> 2.17.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] perf tools: Add fallback functions for cases where cpumode is insufficient
  2018-11-05 17:29   ` Arnaldo Carvalho de Melo
@ 2018-11-05 18:19     ` David Miller
  2018-11-05 19:21     ` Hunter, Adrian
  1 sibling, 0 replies; 25+ messages in thread
From: David Miller @ 2018-11-05 18:19 UTC (permalink / raw)
  To: acme; +Cc: adrian.hunter, jolsa, ak, linux-kernel, leo.yan, mathieu.poirier

From: Arnaldo Carvalho de Melo <acme@kernel.org>
Date: Mon, 5 Nov 2018 14:29:46 -0300

>> @@ -2542,6 +2542,46 @@ int machine__get_kernel_start(struct machine *machine)
>>  	return err;
>>  }
>>  
>> +/*
>> + * machine__single_ku_as - Machine has same address space for kernel and user.
>> + * @machine: machine object
>> + *
>> + * Some architectures have a single address space for kernel and user addresses,
>> + * which makes it possible to determine if an address is in kernel space or user
>> + * space.
>> + */
>> +static bool machine__single_ku_as(struct machine *machine)
>> +{
>> +	return strcmp(perf_env__arch(machine->env), "sparc");
>> +}
> 
> Can we avoid having this strcmp be done repeatedly? I.e. just make this
> a boolean initialized at session start, when machine->env is setup, so
> we'd have:
> 
>    machine->single_address_space
> 
> Instead of a function?

Agreed, doing this every time is wasteful.

We could also make it a define in some arch/foo file.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH 1/5] perf tools: Add fallback functions for cases where cpumode is insufficient
  2018-11-05 17:29   ` Arnaldo Carvalho de Melo
  2018-11-05 18:19     ` David Miller
@ 2018-11-05 19:21     ` Hunter, Adrian
  2018-11-05 19:36       ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 25+ messages in thread
From: Hunter, Adrian @ 2018-11-05 19:21 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Andi Kleen, linux-kernel, leo.yan, David Miller,
	Mathieu Poirier

> -----Original Message-----
> From: Arnaldo Carvalho de Melo [mailto:acme@kernel.org]
> Sent: Monday, November 5, 2018 7:30 PM
> To: Hunter, Adrian <adrian.hunter@intel.com>
> Cc: Jiri Olsa <jolsa@redhat.com>; Andi Kleen <ak@linux.intel.com>; linux-
> kernel@vger.kernel.org; leo.yan@linaro.org; David Miller
> <davem@davemloft.net>; Mathieu Poirier <mathieu.poirier@linaro.org>
> Subject: Re: [PATCH 1/5] perf tools: Add fallback functions for cases where
> cpumode is insufficient
> 
> Em Wed, Oct 31, 2018 at 11:10:39AM +0200, Adrian Hunter escreveu:
> > For branch stacks or branch samples, the sample cpumode might not be
> > correct because it applies only to the sample 'ip' and not necessary
> > to 'addr' or branch stack addresses. Add fallback functions that can
> > be used to deal with those cases
> >
> > Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> > Cc: stable@vger.kernel.org # 4.19
> > ---
> >  tools/perf/util/event.c   | 27 ++++++++++++++++++++++++++
> >  tools/perf/util/machine.c | 40
> > +++++++++++++++++++++++++++++++++++++++
> >  tools/perf/util/machine.h |  2 ++
> >  tools/perf/util/thread.h  |  4 ++++
> >  4 files changed, 73 insertions(+)
> >
> > diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c index
> > bc646185f8d9..52b24b8ce29e 100644
> > --- a/tools/perf/util/event.c
> > +++ b/tools/perf/util/event.c
> > @@ -1576,6 +1576,24 @@ struct map *thread__find_map(struct thread
> *thread, u8 cpumode, u64 addr,
> >  	return al->map;
> >  }
> >
> > +/*
> > + * For branch stacks or branch samples, the sample cpumode might not
> > +be correct
> > + * because it applies only to the sample 'ip' and not necessary to
> > +'addr' or
> > + * branch stack addresses. If possible, use a fallback to deal with those
> cases.
> > + */
> > +struct map *thread__find_map_fallback(struct thread *thread, u8
> cpumode,
> > +				      u64 addr, struct addr_location *al) {
> 
> You named one as _fallback...
> 
> > +	struct map *map = thread__find_map(thread, cpumode, addr, al);
> > +	struct machine *machine = thread->mg->machine;
> > +	u8 addr_cpumode = machine__addr_cpumode(machine, cpumode,
> addr);
> > +
> > +	if (map || addr_cpumode == cpumode)
> > +		return map;
> > +
> > +	return thread__find_map(thread, addr_cpumode, addr, al); }
> > +
> >  struct symbol *thread__find_symbol(struct thread *thread, u8 cpumode,
> >  				   u64 addr, struct addr_location *al)  { @@ -
> 1585,6 +1603,15 @@
> > struct symbol *thread__find_symbol(struct thread *thread, u8 cpumode,
> >  	return al->sym;
> >  }
> >
> > +struct symbol *thread__find_symbol_fb(struct thread *thread, u8
> cpumode,
> > +				      u64 addr, struct addr_location *al)
> 
> ... and the other as _fb, make it consistent, please.

ok

> 
> > +{
> > +	al->sym = NULL;
> > +	if (thread__find_map_fallback(thread, cpumode, addr, al))
> > +		al->sym = map__find_symbol(al->map, al->addr);
> > +	return al->sym;
> > +}
> > +
> >  /*
> >   * Callers need to drop the reference to al->thread, obtained in
> >   * machine__findnew_thread()
> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > index 111ae858cbcb..04edc0eac376 100644
> > --- a/tools/perf/util/machine.c
> > +++ b/tools/perf/util/machine.c
> > @@ -2542,6 +2542,46 @@ int machine__get_kernel_start(struct machine
> *machine)
> >  	return err;
> >  }
> >
> > +/*
> > + * machine__single_ku_as - Machine has same address space for kernel
> and user.
> > + * @machine: machine object
> > + *
> > + * Some architectures have a single address space for kernel and user
> > +addresses,
> > + * which makes it possible to determine if an address is in kernel
> > +space or user
> > + * space.
> > + */
> > +static bool machine__single_ku_as(struct machine *machine) {
> > +	return strcmp(perf_env__arch(machine->env), "sparc"); }
> 
> Can we avoid having this strcmp be done repeatedly?

It is only done if there are mapping errors

 > Can we avoid having this strcmp be done repeatedly? I.e. just make this a
> boolean initialized at session start, when machine->env is setup, so we'd
> have:
> 
>    machine->single_address_space
> 
> Instead of a function?

Sure  thing.

> 
> Also have you considered making this fallback to be performed only from
> code that is that arch specific?
> 
> I.e. the code that supports branch samples/stacks is x86_86 specific at this
> point and thus only in that case we would call the routines that perform the
> fallback, which, in turn, wouldn't need to check for "sparc"?

I will look at it, but theoretically someone could be processing x86 data but
doing it on a machine of a different architecture.

> 
> > +
> > +u8 machine__addr_cpumode(struct machine *machine, u8 cpumode, u64
> > +addr) {
> > +	u8 addr_cpumode = cpumode;
> > +	bool kernel_ip;
> > +
> > +	if (!machine__single_ku_as(machine))
> > +		goto out;
> > +
> > +	kernel_ip = machine__kernel_ip(machine, addr);
> > +	switch (cpumode) {
> > +	case PERF_RECORD_MISC_KERNEL:
> > +	case PERF_RECORD_MISC_USER:
> > +		addr_cpumode = kernel_ip ? PERF_RECORD_MISC_KERNEL :
> > +					   PERF_RECORD_MISC_USER;
> > +		break;
> > +	case PERF_RECORD_MISC_GUEST_KERNEL:
> > +	case PERF_RECORD_MISC_GUEST_USER:
> > +		addr_cpumode = kernel_ip ?
> PERF_RECORD_MISC_GUEST_KERNEL :
> > +					   PERF_RECORD_MISC_GUEST_USER;
> > +		break;
> > +	default:
> > +		break;
> > +	}
> > +out:
> > +	return addr_cpumode;
> > +}
> > +
> >  struct dso *machine__findnew_dso(struct machine *machine, const char
> > *filename)  {
> >  	return dsos__findnew(&machine->dsos, filename); diff --git
> > a/tools/perf/util/machine.h b/tools/perf/util/machine.h index
> > d856b85862e2..20d464135ed1 100644
> > --- a/tools/perf/util/machine.h
> > +++ b/tools/perf/util/machine.h
> > @@ -99,6 +99,8 @@ static inline bool machine__kernel_ip(struct machine
> *machine, u64 ip)
> >  	return ip >= kernel_start;
> >  }
> >
> > +u8 machine__addr_cpumode(struct machine *machine, u8 cpumode, u64
> > +addr);
> > +
> >  struct thread *machine__find_thread(struct machine *machine, pid_t pid,
> >  				    pid_t tid);
> >  struct comm *machine__thread_exec_comm(struct machine *machine,
> diff
> > --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h index
> > 36c09a9904e6..e28f9dc1110a 100644
> > --- a/tools/perf/util/thread.h
> > +++ b/tools/perf/util/thread.h
> > @@ -96,9 +96,13 @@ struct thread *thread__main_thread(struct machine
> > *machine, struct thread *threa
> >
> >  struct map *thread__find_map(struct thread *thread, u8 cpumode, u64
> addr,
> >  			     struct addr_location *al);
> > +struct map *thread__find_map_fallback(struct thread *thread, u8
> cpumode,
> > +				      u64 addr, struct addr_location *al);
> >
> >  struct symbol *thread__find_symbol(struct thread *thread, u8 cpumode,
> >  				   u64 addr, struct addr_location *al);
> > +struct symbol *thread__find_symbol_fb(struct thread *thread, u8
> cpumode,
> > +				      u64 addr, struct addr_location *al);
> >
> >  void thread__find_cpumode_addr_location(struct thread *thread, u64
> addr,
> >  					struct addr_location *al);
> > --
> > 2.17.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] perf tools: Add fallback functions for cases where cpumode is insufficient
  2018-11-05 19:21     ` Hunter, Adrian
@ 2018-11-05 19:36       ` Arnaldo Carvalho de Melo
  2018-11-05 19:53         ` Hunter, Adrian
  0 siblings, 1 reply; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-11-05 19:36 UTC (permalink / raw)
  To: Hunter, Adrian
  Cc: Jiri Olsa, Andi Kleen, linux-kernel, leo.yan, David Miller,
	Mathieu Poirier

Em Mon, Nov 05, 2018 at 07:21:44PM +0000, Hunter, Adrian escreveu:
> > In Monday, November 5, 2018 7:30 PM, Arnaldo Carvalho de Melo wrote
<SNIP>
> > > +struct map *thread__find_map_fallback(struct thread *thread, u8
> > cpumode,
> > > +				      u64 addr, struct addr_location *al) {

> > You named one as _fallback...

> > > +	struct map *map = thread__find_map(thread, cpumode, addr, al);
> > > +	struct machine *machine = thread->mg->machine;
> > > +	u8 addr_cpumode = machine__addr_cpumode(machine, cpumode,
> > addr);
> > > +
> > > +	if (map || addr_cpumode == cpumode)
> > > +		return map;
> > > +
> > > +	return thread__find_map(thread, addr_cpumode, addr, al); }
> > > +
> > >  struct symbol *thread__find_symbol(struct thread *thread, u8 cpumode,
> > >  				   u64 addr, struct addr_location *al)  { @@ -
> > 1585,6 +1603,15 @@
> > > struct symbol *thread__find_symbol(struct thread *thread, u8 cpumode,
> > >  	return al->sym;
> > >  }

> > > +struct symbol *thread__find_symbol_fb(struct thread *thread, u8
> > cpumode,
> > > +				      u64 addr, struct addr_location *al)

> > ... and the other as _fb, make it consistent, please.

> ok

> > > +{
> > > +	al->sym = NULL;
> > > +	if (thread__find_map_fallback(thread, cpumode, addr, al))
> > > +		al->sym = map__find_symbol(al->map, al->addr);
> > > +	return al->sym;
> > > +}

> > >  /*
> > >   * Callers need to drop the reference to al->thread, obtained in
> > >   * machine__findnew_thread()
> > > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > > index 111ae858cbcb..04edc0eac376 100644
> > > --- a/tools/perf/util/machine.c
> > > +++ b/tools/perf/util/machine.c
> > > @@ -2542,6 +2542,46 @@ int machine__get_kernel_start(struct machine
> > *machine)
> > >  	return err;
> > >  }

> > > +/*
> > > + * machine__single_ku_as - Machine has same address space for kernel
> > and user.
> > > + * @machine: machine object
> > > + *
> > > + * Some architectures have a single address space for kernel and user
> > > +addresses,
> > > + * which makes it possible to determine if an address is in kernel
> > > +space or user
> > > + * space.
> > > + */
> > > +static bool machine__single_ku_as(struct machine *machine) {
> > > +	return strcmp(perf_env__arch(machine->env), "sparc"); }

> > Can we avoid having this strcmp be done repeatedly?
 
> It is only done if there are mapping errors
 
>  > Can we avoid having this strcmp be done repeatedly? I.e. just make this a
> > boolean initialized at session start, when machine->env is setup, so we'd
> > have:

> >    machine->single_address_space

> > Instead of a function?
 
> Sure  thing.

Thanks
 
> > 
> > Also have you considered making this fallback to be performed only from
> > code that is that arch specific?
> > 
> > I.e. the code that supports branch samples/stacks is x86_86 specific at this
> > point and thus only in that case we would call the routines that perform the
> > fallback, which, in turn, wouldn't need to check for "sparc"?
 
> I will look at it, but theoretically someone could be processing x86 data but
> doing it on a machine of a different architecture.

Right, that should be supported, yes. What I meant was that when
processing perf.data file with samples where the cpumode can't be
inferred, we should use the fallback routines.

It is super unfortunate that we have addresses without a accompanying
cpumode :-\ Don't you think those coulde be fixed somehow? If this comes
from things synthesized from Intel PT traces, then we can use the
address ranges for kernel/userspace to derive that before hitting the
core code, that would be fed with addr/cpumode pairs, just like we have
hdr.misc & USER/KERNEL and the PERF_CONTEXT_ markers in callchains.

- Arnaldo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH 1/5] perf tools: Add fallback functions for cases where cpumode is insufficient
  2018-11-05 19:36       ` Arnaldo Carvalho de Melo
@ 2018-11-05 19:53         ` Hunter, Adrian
  2018-11-05 20:31           ` Arnaldo Carvalho de Melo
  2018-11-26 19:02           ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 25+ messages in thread
From: Hunter, Adrian @ 2018-11-05 19:53 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Andi Kleen, linux-kernel, leo.yan, David Miller,
	Mathieu Poirier

> -----Original Message-----
> From: Arnaldo Carvalho de Melo [mailto:acme@kernel.org]
> Sent: Monday, November 5, 2018 9:36 PM
> To: Hunter, Adrian <adrian.hunter@intel.com>
> Cc: Jiri Olsa <jolsa@redhat.com>; Andi Kleen <ak@linux.intel.com>; linux-
> kernel@vger.kernel.org; leo.yan@linaro.org; David Miller
> <davem@davemloft.net>; Mathieu Poirier <mathieu.poirier@linaro.org>
> Subject: Re: [PATCH 1/5] perf tools: Add fallback functions for cases where
> cpumode is insufficient
> 
> Em Mon, Nov 05, 2018 at 07:21:44PM +0000, Hunter, Adrian escreveu:
> > > In Monday, November 5, 2018 7:30 PM, Arnaldo Carvalho de Melo wrote
> <SNIP>
> > > > +struct map *thread__find_map_fallback(struct thread *thread, u8
> > > cpumode,
> > > > +				      u64 addr, struct addr_location *al) {
> 
> > > You named one as _fallback...
> 
> > > > +	struct map *map = thread__find_map(thread, cpumode, addr, al);
> > > > +	struct machine *machine = thread->mg->machine;
> > > > +	u8 addr_cpumode = machine__addr_cpumode(machine, cpumode,
> > > addr);
> > > > +
> > > > +	if (map || addr_cpumode == cpumode)
> > > > +		return map;
> > > > +
> > > > +	return thread__find_map(thread, addr_cpumode, addr, al); }
> > > > +
> > > >  struct symbol *thread__find_symbol(struct thread *thread, u8
> cpumode,
> > > >  				   u64 addr, struct addr_location *al)  { @@ -
> > > 1585,6 +1603,15 @@
> > > > struct symbol *thread__find_symbol(struct thread *thread, u8
> cpumode,
> > > >  	return al->sym;
> > > >  }
> 
> > > > +struct symbol *thread__find_symbol_fb(struct thread *thread, u8
> > > cpumode,
> > > > +				      u64 addr, struct addr_location *al)
> 
> > > ... and the other as _fb, make it consistent, please.
> 
> > ok
> 
> > > > +{
> > > > +	al->sym = NULL;
> > > > +	if (thread__find_map_fallback(thread, cpumode, addr, al))
> > > > +		al->sym = map__find_symbol(al->map, al->addr);
> > > > +	return al->sym;
> > > > +}
> 
> > > >  /*
> > > >   * Callers need to drop the reference to al->thread, obtained in
> > > >   * machine__findnew_thread()
> > > > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > > > index 111ae858cbcb..04edc0eac376 100644
> > > > --- a/tools/perf/util/machine.c
> > > > +++ b/tools/perf/util/machine.c
> > > > @@ -2542,6 +2542,46 @@ int machine__get_kernel_start(struct
> > > > machine
> > > *machine)
> > > >  	return err;
> > > >  }
> 
> > > > +/*
> > > > + * machine__single_ku_as - Machine has same address space for
> > > > +kernel
> > > and user.
> > > > + * @machine: machine object
> > > > + *
> > > > + * Some architectures have a single address space for kernel and
> > > > +user addresses,
> > > > + * which makes it possible to determine if an address is in
> > > > +kernel space or user
> > > > + * space.
> > > > + */
> > > > +static bool machine__single_ku_as(struct machine *machine) {
> > > > +	return strcmp(perf_env__arch(machine->env), "sparc"); }
> 
> > > Can we avoid having this strcmp be done repeatedly?
> 
> > It is only done if there are mapping errors
> 
> >  > Can we avoid having this strcmp be done repeatedly? I.e. just make
> > this a
> > > boolean initialized at session start, when machine->env is setup, so
> > > we'd
> > > have:
> 
> > >    machine->single_address_space
> 
> > > Instead of a function?
> 
> > Sure  thing.
> 
> Thanks
> 
> > >
> > > Also have you considered making this fallback to be performed only
> > > from code that is that arch specific?
> > >
> > > I.e. the code that supports branch samples/stacks is x86_86 specific
> > > at this point and thus only in that case we would call the routines
> > > that perform the fallback, which, in turn, wouldn't need to check for
> "sparc"?
> 
> > I will look at it, but theoretically someone could be processing x86
> > data but doing it on a machine of a different architecture.
> 
> Right, that should be supported, yes. What I meant was that when
> processing perf.data file with samples where the cpumode can't be inferred,
> we should use the fallback routines.
> 
> It is super unfortunate that we have addresses without a accompanying
> cpumode :-\ Don't you think those coulde be fixed somehow? If this comes
> from things synthesized from Intel PT traces, then we can use the address
> ranges for kernel/userspace to derive that before hitting the core code, that
> would be fed with addr/cpumode pairs, just like we have hdr.misc &
> USER/KERNEL and the PERF_CONTEXT_ markers in callchains.

Yes we will probably need to look at that, but at the moment I would like a fix for stable.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] perf tools: Add fallback functions for cases where cpumode is insufficient
  2018-11-05 19:53         ` Hunter, Adrian
@ 2018-11-05 20:31           ` Arnaldo Carvalho de Melo
  2018-11-26 19:02           ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-11-05 20:31 UTC (permalink / raw)
  To: Hunter, Adrian
  Cc: Jiri Olsa, Andi Kleen, linux-kernel, leo.yan, David Miller,
	Mathieu Poirier

Em Mon, Nov 05, 2018 at 07:53:17PM +0000, Hunter, Adrian escreveu:
> > -----Original Message-----
> > From: Arnaldo Carvalho de Melo [mailto:acme@kernel.org]
> > Sent: Monday, November 5, 2018 9:36 PM
> > To: Hunter, Adrian <adrian.hunter@intel.com>
> > Cc: Jiri Olsa <jolsa@redhat.com>; Andi Kleen <ak@linux.intel.com>; linux-
> > kernel@vger.kernel.org; leo.yan@linaro.org; David Miller
> > <davem@davemloft.net>; Mathieu Poirier <mathieu.poirier@linaro.org>
> > Subject: Re: [PATCH 1/5] perf tools: Add fallback functions for cases where
> > cpumode is insufficient
> > 
> > Em Mon, Nov 05, 2018 at 07:21:44PM +0000, Hunter, Adrian escreveu:
> > > > In Monday, November 5, 2018 7:30 PM, Arnaldo Carvalho de Melo wrote
> > > > Also have you considered making this fallback to be performed only
> > > > from code that is that arch specific?

> > > > I.e. the code that supports branch samples/stacks is x86_86 specific
> > > > at this point and thus only in that case we would call the routines
> > > > that perform the fallback, which, in turn, wouldn't need to check for
> > "sparc"?
> > 
> > > I will look at it, but theoretically someone could be processing x86
> > > data but doing it on a machine of a different architecture.
> > 
> > Right, that should be supported, yes. What I meant was that when
> > processing perf.data file with samples where the cpumode can't be inferred,
> > we should use the fallback routines.
> > 
> > It is super unfortunate that we have addresses without a accompanying
> > cpumode :-\ Don't you think those coulde be fixed somehow? If this comes
> > from things synthesized from Intel PT traces, then we can use the address
> > ranges for kernel/userspace to derive that before hitting the core code, that
> > would be fed with addr/cpumode pairs, just like we have hdr.misc &
> > USER/KERNEL and the PERF_CONTEXT_ markers in callchains.
> 
> Yes we will probably need to look at that, but at the moment I would like a fix for stable.

Ok, with that check for archs like sparc, fair enough, and its great
that you consider doing the better fix on top of it, later, thanks!

- Arnaldo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] perf tools: Add fallback functions for cases where cpumode is insufficient
  2018-11-05 19:53         ` Hunter, Adrian
  2018-11-05 20:31           ` Arnaldo Carvalho de Melo
@ 2018-11-26 19:02           ` Arnaldo Carvalho de Melo
  2018-11-26 19:41             ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-11-26 19:02 UTC (permalink / raw)
  To: Hunter, Adrian
  Cc: Jiri Olsa, Andi Kleen, linux-kernel, leo.yan, David Miller,
	Mathieu Poirier

Em Mon, Nov 05, 2018 at 07:53:17PM +0000, Hunter, Adrian escreveu:
> > -----Original Message-----
> > From: Arnaldo Carvalho de Melo [mailto:acme@kernel.org]
> > Sent: Monday, November 5, 2018 9:36 PM
> > To: Hunter, Adrian <adrian.hunter@intel.com>
> > Cc: Jiri Olsa <jolsa@redhat.com>; Andi Kleen <ak@linux.intel.com>; linux-
> > kernel@vger.kernel.org; leo.yan@linaro.org; David Miller
> > <davem@davemloft.net>; Mathieu Poirier <mathieu.poirier@linaro.org>
> > Subject: Re: [PATCH 1/5] perf tools: Add fallback functions for cases where
> > cpumode is insufficient
> > 
> > Em Mon, Nov 05, 2018 at 07:21:44PM +0000, Hunter, Adrian escreveu:
> > > > In Monday, November 5, 2018 7:30 PM, Arnaldo Carvalho de Melo wrote
> > <SNIP>
> > > > > +struct map *thread__find_map_fallback(struct thread *thread, u8
> > > > cpumode,
> > > > > +				      u64 addr, struct addr_location *al) {
> > 
> > > > You named one as _fallback...
> > 
> > > > > +	struct map *map = thread__find_map(thread, cpumode, addr, al);
> > > > > +	struct machine *machine = thread->mg->machine;
> > > > > +	u8 addr_cpumode = machine__addr_cpumode(machine, cpumode,
> > > > addr);
> > > > > +
> > > > > +	if (map || addr_cpumode == cpumode)
> > > > > +		return map;
> > > > > +
> > > > > +	return thread__find_map(thread, addr_cpumode, addr, al); }
> > > > > +
> > > > >  struct symbol *thread__find_symbol(struct thread *thread, u8
> > cpumode,
> > > > >  				   u64 addr, struct addr_location *al)  { @@ -
> > > > 1585,6 +1603,15 @@
> > > > > struct symbol *thread__find_symbol(struct thread *thread, u8
> > cpumode,
> > > > >  	return al->sym;
> > > > >  }
> > 
> > > > > +struct symbol *thread__find_symbol_fb(struct thread *thread, u8
> > > > cpumode,
> > > > > +				      u64 addr, struct addr_location *al)
> > 
> > > > ... and the other as _fb, make it consistent, please.
> > 
> > > ok
> > 
> > > > > +{
> > > > > +	al->sym = NULL;
> > > > > +	if (thread__find_map_fallback(thread, cpumode, addr, al))
> > > > > +		al->sym = map__find_symbol(al->map, al->addr);
> > > > > +	return al->sym;
> > > > > +}
> > 
> > > > >  /*
> > > > >   * Callers need to drop the reference to al->thread, obtained in
> > > > >   * machine__findnew_thread()
> > > > > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > > > > index 111ae858cbcb..04edc0eac376 100644
> > > > > --- a/tools/perf/util/machine.c
> > > > > +++ b/tools/perf/util/machine.c
> > > > > @@ -2542,6 +2542,46 @@ int machine__get_kernel_start(struct
> > > > > machine
> > > > *machine)
> > > > >  	return err;
> > > > >  }
> > 
> > > > > +/*
> > > > > + * machine__single_ku_as - Machine has same address space for
> > > > > +kernel
> > > > and user.
> > > > > + * @machine: machine object
> > > > > + *
> > > > > + * Some architectures have a single address space for kernel and
> > > > > +user addresses,
> > > > > + * which makes it possible to determine if an address is in
> > > > > +kernel space or user
> > > > > + * space.
> > > > > + */
> > > > > +static bool machine__single_ku_as(struct machine *machine) {
> > > > > +	return strcmp(perf_env__arch(machine->env), "sparc"); }
> > 
> > > > Can we avoid having this strcmp be done repeatedly?
> > 
> > > It is only done if there are mapping errors
> > 
> > >  > Can we avoid having this strcmp be done repeatedly? I.e. just make
> > > this a
> > > > boolean initialized at session start, when machine->env is setup, so
> > > > we'd
> > > > have:
> > 
> > > >    machine->single_address_space
> > 
> > > > Instead of a function?
> > 
> > > Sure  thing.

Have you sent an update to this patch series addressing the concerns? I
think that renaming that _fb to __fallback and having this
machine->single_address_space is what is missing, I'll try to take a
stab at this now and have it in my perf/core branch soon.

But you have this already done, please send it.

- Arnaldo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] perf tools: Add fallback functions for cases where cpumode is insufficient
  2018-11-26 19:02           ` Arnaldo Carvalho de Melo
@ 2018-11-26 19:41             ` Arnaldo Carvalho de Melo
  2018-11-26 19:44               ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-11-26 19:41 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Jiri Olsa, Andi Kleen, linux-kernel, Leo Yan, David Miller,
	Mathieu Poirier

Em Mon, Nov 26, 2018 at 04:02:17PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Mon, Nov 05, 2018 at 07:53:17PM +0000, Hunter, Adrian escreveu:
> > > From: Arnaldo Carvalho de Melo [mailto:acme@kernel.org]
> > > Em Mon, Nov 05, 2018 at 07:21:44PM +0000, Hunter, Adrian escreveu:
> > > > > > +static bool machine__single_ku_as(struct machine *machine) {
> > > > > > +	return strcmp(perf_env__arch(machine->env), "sparc"); }

> > > > > Can we avoid having this strcmp be done repeatedly?

> > > > It is only done if there are mapping errors

> > > > > Can we avoid having this strcmp be done repeatedly? I.e. just
> > > > > make this a boolean initialized at session start, when
> > > > > machine->env is setup, so we'd have:

> > > > >    machine->single_address_space

> > > > > Instead of a function?

> > > > Sure  thing.
> 
> Have you sent an update to this patch series addressing the concerns? I
> think that renaming that _fb to __fallback and having this
> machine->single_address_space is what is missing, I'll try to take a
> stab at this now and have it in my perf/core branch soon.
 
> But you have this already done, please send it.

So, here is the patch on top of yours and at the end a patch that I'm
inserting just before yours, please check and ack, thanks:

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index da523d99be4b..4506aa7fa557 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2592,25 +2592,13 @@ int machine__get_kernel_start(struct machine *machine)
 	return err;
 }
 
-/*
- * machine__single_ku_as - Machine has same address space for kernel and user.
- * @machine: machine object
- *
- * Some architectures have a single address space for kernel and user addresses,
- * which makes it possible to determine if an address is in kernel space or user
- * space.
- */
-static bool machine__single_ku_as(struct machine *machine)
-{
-	return strcmp(perf_env__arch(machine->env), "sparc");
-}
 
 u8 machine__addr_cpumode(struct machine *machine, u8 cpumode, u64 addr)
 {
 	u8 addr_cpumode = cpumode;
 	bool kernel_ip;
 
-	if (!machine__single_ku_as(machine))
+	if (!machine->env->single_address_space)
 		goto out;
 
 	kernel_ip = machine__kernel_ip(machine, addr);


commit 5aabc18da787f7f6785b4027c63a89592e4e2c03
Author: Arnaldo Carvalho de Melo <acme@redhat.com>
Date:   Mon Nov 26 16:25:25 2018 -0300

    perf env: Record if a arch has a single user/kernel address space
    
    Some architectures have a single address space for kernel and user
    addresses, which makes it possible to determine if an address is in
    kernel space or user space. Some don't, e.g.: sparc.
    
    Cache that info in perf_env so that, for instance, code needing to
    fallback failed symbol lookups at the kernel space in single address
    space arches can lookup at userspace.
    
    Cc: Adrian Hunter <adrian.hunter@intel.com>
    Cc: Andi Kleen <ak@linux.intel.com>
    Cc: David Ahern <dsahern@gmail.com>
    Cc: David Miller <davem@davemloft.net>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Leo Yan <leo.yan@linaro.org>
    Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Wang Nan <wangnan0@huawei.com>
    Link: https://lkml.kernel.org/n/tip-4jb2rusqml7utgebiaf51k77@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 59f38c7693f8..d1cae06a2dfc 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -8,6 +8,16 @@
 
 struct perf_env perf_env;
 
+/*
+ * Some architectures have a single address space for kernel and user
+ * addresses, which makes it possible to determine if an address is in kernel
+ * space or user space. Some don't, e.g.: sparc
+ */
+void perf_env__init_single_address_space(struct perf_env *env)
+{
+	env->single_address_space = strcmp(perf_env__arch(env), "sparc");
+}
+
 void perf_env__exit(struct perf_env *env)
 {
 	int i;
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index d01b8355f4ca..a79898f6417d 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -60,6 +60,12 @@ struct perf_env {
 	struct cpu_topology_map	*cpu;
 	struct cpu_cache_level	*caches;
 	int			 caches_cnt;
+	/*
+         * Some architectures have a single address space for kernel and user
+         * addresses, which makes it possible to determine if an address is in
+         * kernel space or user space. Some don't, e.g.: sparc
+         */
+	bool			single_address_space;
 	struct numa_node	*numa_nodes;
 	struct memory_node	*memory_nodes;
 	unsigned long long	 memory_bsize;
@@ -72,6 +78,8 @@ void perf_env__exit(struct perf_env *env);
 
 int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[]);
 
+void perf_env__init_single_address_space(struct perf_env *env);
+
 int perf_env__read_cpu_topology_map(struct perf_env *env);
 
 void cpu_cache_level__free(struct cpu_cache_level *cache);
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 7d2c8ce6cfad..164d56229135 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -150,6 +150,8 @@ struct perf_session *perf_session__new(struct perf_data *data,
 		session->machines.host.env = &perf_env;
 	}
 
+	perf_env__init_single_address_space(session->machines.host.env);
+
 	if (!data || perf_data__is_write(data)) {
 		/*
 		 * In O_RDONLY mode this will be performed when reading the

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] perf tools: Add fallback functions for cases where cpumode is insufficient
  2018-11-26 19:41             ` Arnaldo Carvalho de Melo
@ 2018-11-26 19:44               ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-11-26 19:44 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Jiri Olsa, Andi Kleen, linux-kernel, Leo Yan, David Miller,
	Mathieu Poirier

Em Mon, Nov 26, 2018 at 04:41:45PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Mon, Nov 26, 2018 at 04:02:17PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Mon, Nov 05, 2018 at 07:53:17PM +0000, Hunter, Adrian escreveu:
> > > > From: Arnaldo Carvalho de Melo [mailto:acme@kernel.org]
> > > > Em Mon, Nov 05, 2018 at 07:21:44PM +0000, Hunter, Adrian escreveu:
> > > > > > > +static bool machine__single_ku_as(struct machine *machine) {
> > > > > > > +	return strcmp(perf_env__arch(machine->env), "sparc"); }
> 
> > > > > > Can we avoid having this strcmp be done repeatedly?
> 
> > > > > It is only done if there are mapping errors
> 
> > > > > > Can we avoid having this strcmp be done repeatedly? I.e. just
> > > > > > make this a boolean initialized at session start, when
> > > > > > machine->env is setup, so we'd have:
> 
> > > > > >    machine->single_address_space
> 
> > > > > > Instead of a function?
> 
> > > > > Sure  thing.
> > 
> > Have you sent an update to this patch series addressing the concerns? I
> > think that renaming that _fb to __fallback and having this
> > machine->single_address_space is what is missing, I'll try to take a
> > stab at this now and have it in my perf/core branch soon.
>  
> > But you have this already done, please send it.
> 
> So, here is the patch on top of yours and at the end a patch that I'm
> inserting just before yours, please check and ack, thanks:


Argh, nevermind, just saw your V2, /me goes to read it ;-\

- Arnaldo

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2018-11-27 12:36 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-31  9:10 [PATCH 0/5] perf tools: Fix for cases where cpumode is incorrect or insufficient Adrian Hunter
2018-10-31  9:10 ` [PATCH 1/5] perf tools: Add fallback functions for cases where cpumode is insufficient Adrian Hunter
2018-11-05 17:29   ` Arnaldo Carvalho de Melo
2018-11-05 18:19     ` David Miller
2018-11-05 19:21     ` Hunter, Adrian
2018-11-05 19:36       ` Arnaldo Carvalho de Melo
2018-11-05 19:53         ` Hunter, Adrian
2018-11-05 20:31           ` Arnaldo Carvalho de Melo
2018-11-26 19:02           ` Arnaldo Carvalho de Melo
2018-11-26 19:41             ` Arnaldo Carvalho de Melo
2018-11-26 19:44               ` Arnaldo Carvalho de Melo
2018-10-31  9:10 ` [PATCH 2/5] perf tools: Use fallback for sample_addr_correlates_sym() cases Adrian Hunter
2018-11-03 17:46   ` Hunter, Adrian
2018-10-31  9:10 ` [PATCH 3/5] perf tools: Use fallbacks for branch stacks Adrian Hunter
2018-11-05 17:56   ` Arnaldo Carvalho de Melo
2018-10-31  9:10 ` [PATCH 4/5] perf intel-pt: Insert callchain context into synthesized callchains Adrian Hunter
2018-10-31 13:28   ` Arnaldo Carvalho de Melo
2018-10-31 14:15     ` Adrian Hunter
2018-10-31 14:20       ` Adrian Hunter
2018-10-31 14:27         ` Arnaldo Carvalho de Melo
2018-10-31 22:12         ` [tip:perf/urgent] " tip-bot for Adrian Hunter
2018-10-31 14:26       ` [PATCH 4/5] " Arnaldo Carvalho de Melo
2018-10-31  9:10 ` [PATCH 5/5] perf intel-pt/bts: Calculate cpumode for synthesized samples Adrian Hunter
2018-10-31 22:13   ` [tip:perf/urgent] " tip-bot for Adrian Hunter
2018-10-31 13:35 ` [PATCH 0/5] perf tools: Fix for cases where cpumode is incorrect or insufficient Jiri Olsa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).