* [PATCH] perf cs-etm: Correct CPU mode for samples @ 2018-10-30 7:18 ` Leo Yan 0 siblings, 0 replies; 13+ messages in thread From: Leo Yan @ 2018-10-30 7:18 UTC (permalink / raw) To: Arnaldo Carvalho de Melo, Mathieu Poirier, Peter Zijlstra, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel, Coresight ML Cc: Leo Yan Since commit 9042f5e3539e ("perf tools: Stop fallbacking to kallsyms for vdso symbols lookup"), the kernel address cannot be properly parsed to kernel symbol with command 'perf script -k vmlinux'. The reason is CoreSight samples is always to set CPU mode as PERF_RECORD_MISC_USER, thus it fails to find corresponding map/dso in below flows: process_sample_event() `-> machine__resolve() `-> thread__find_map(thread, sample->cpumode, sample->ip, al); In this flow it needs to pass argument 'sample->cpumode' to tell what's the CPU mode, before it always passed PERF_RECORD_MISC_USER but without any failure until the commit 9042f5e3539e ("perf tools: Stop fallbacking to kallsyms for vdso symbols lookup") has been merged. The reason is even with the wrong CPU mode the function thread__find_map() firstly fails to find map but it will rollback to find kernel map for vdso symbols lookup. In the latest code it has removed the fallback code, thus if CPU mode is PERF_RECORD_MISC_USER then it cannot find map anymore with kernel address. This patch is to correct samples CPU mode setting, it creates a new helper function cs_etm__cpu_mode() to tell what's the CPU mode based on the address with the info from machine structure; this patch has a bit extension to check not only kernel and user mode, but also check for host/guest and hypervisor mode. Finally this patch uses the function in instruction and branch samples and also apply in cs_etm__mem_access() for a minor polishing. Signed-off-by: Leo Yan <leo.yan@linaro.org> --- tools/perf/util/cs-etm.c | 39 ++++++++++++++++++++++++++++++--------- 1 file changed, 30 insertions(+), 9 deletions(-) diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 3b37d66..73430b7 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -244,6 +244,27 @@ static void cs_etm__free(struct perf_session *session) zfree(&aux); } +static u8 cs_etm__cpu_mode(struct cs_etm_queue *etmq, u64 address) +{ + struct machine *machine; + + machine = etmq->etm->machine; + + if (address >= etmq->etm->kernel_start) { + if (machine__is_host(machine)) + return PERF_RECORD_MISC_KERNEL; + else + return PERF_RECORD_MISC_GUEST_KERNEL; + } else { + if (machine__is_host(machine)) + return PERF_RECORD_MISC_USER; + else if (perf_guest) + return PERF_RECORD_MISC_GUEST_USER; + else + return PERF_RECORD_MISC_HYPERVISOR; + } +} + static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u64 address, size_t size, u8 *buffer) { @@ -258,10 +279,7 @@ static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u64 address, return -1; machine = etmq->etm->machine; - if (address >= etmq->etm->kernel_start) - cpumode = PERF_RECORD_MISC_KERNEL; - else - cpumode = PERF_RECORD_MISC_USER; + cpumode = cs_etm__cpu_mode(etmq, address); thread = etmq->thread; if (!thread) { @@ -653,7 +671,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, struct perf_sample sample = {.ip = 0,}; event->sample.header.type = PERF_RECORD_SAMPLE; - event->sample.header.misc = PERF_RECORD_MISC_USER; + event->sample.header.misc = cs_etm__cpu_mode(etmq, addr); event->sample.header.size = sizeof(struct perf_event_header); sample.ip = addr; @@ -665,7 +683,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, sample.cpu = etmq->packet->cpu; sample.flags = 0; sample.insn_len = 1; - sample.cpumode = event->header.misc; + sample.cpumode = event->sample.header.misc; if (etm->synth_opts.last_branch) { cs_etm__copy_last_branch_rb(etmq); @@ -706,12 +724,15 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq) u64 nr; struct branch_entry entries; } dummy_bs; + u64 ip; + + ip = cs_etm__last_executed_instr(etmq->prev_packet); event->sample.header.type = PERF_RECORD_SAMPLE; - event->sample.header.misc = PERF_RECORD_MISC_USER; + event->sample.header.misc = cs_etm__cpu_mode(etmq, ip); event->sample.header.size = sizeof(struct perf_event_header); - sample.ip = cs_etm__last_executed_instr(etmq->prev_packet); + sample.ip = ip; sample.pid = etmq->pid; sample.tid = etmq->tid; sample.addr = cs_etm__first_executed_instr(etmq->packet); @@ -720,7 +741,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq) sample.period = 1; sample.cpu = etmq->packet->cpu; sample.flags = 0; - sample.cpumode = PERF_RECORD_MISC_USER; + sample.cpumode = event->sample.header.misc; /* * perf report cannot handle events without a branch stack -- 2.7.4 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH] perf cs-etm: Correct CPU mode for samples @ 2018-10-30 7:18 ` Leo Yan 0 siblings, 0 replies; 13+ messages in thread From: Leo Yan @ 2018-10-30 7:18 UTC (permalink / raw) To: linux-arm-kernel Since commit 9042f5e3539e ("perf tools: Stop fallbacking to kallsyms for vdso symbols lookup"), the kernel address cannot be properly parsed to kernel symbol with command 'perf script -k vmlinux'. The reason is CoreSight samples is always to set CPU mode as PERF_RECORD_MISC_USER, thus it fails to find corresponding map/dso in below flows: process_sample_event() `-> machine__resolve() `-> thread__find_map(thread, sample->cpumode, sample->ip, al); In this flow it needs to pass argument 'sample->cpumode' to tell what's the CPU mode, before it always passed PERF_RECORD_MISC_USER but without any failure until the commit 9042f5e3539e ("perf tools: Stop fallbacking to kallsyms for vdso symbols lookup") has been merged. The reason is even with the wrong CPU mode the function thread__find_map() firstly fails to find map but it will rollback to find kernel map for vdso symbols lookup. In the latest code it has removed the fallback code, thus if CPU mode is PERF_RECORD_MISC_USER then it cannot find map anymore with kernel address. This patch is to correct samples CPU mode setting, it creates a new helper function cs_etm__cpu_mode() to tell what's the CPU mode based on the address with the info from machine structure; this patch has a bit extension to check not only kernel and user mode, but also check for host/guest and hypervisor mode. Finally this patch uses the function in instruction and branch samples and also apply in cs_etm__mem_access() for a minor polishing. Signed-off-by: Leo Yan <leo.yan@linaro.org> --- tools/perf/util/cs-etm.c | 39 ++++++++++++++++++++++++++++++--------- 1 file changed, 30 insertions(+), 9 deletions(-) diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 3b37d66..73430b7 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -244,6 +244,27 @@ static void cs_etm__free(struct perf_session *session) zfree(&aux); } +static u8 cs_etm__cpu_mode(struct cs_etm_queue *etmq, u64 address) +{ + struct machine *machine; + + machine = etmq->etm->machine; + + if (address >= etmq->etm->kernel_start) { + if (machine__is_host(machine)) + return PERF_RECORD_MISC_KERNEL; + else + return PERF_RECORD_MISC_GUEST_KERNEL; + } else { + if (machine__is_host(machine)) + return PERF_RECORD_MISC_USER; + else if (perf_guest) + return PERF_RECORD_MISC_GUEST_USER; + else + return PERF_RECORD_MISC_HYPERVISOR; + } +} + static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u64 address, size_t size, u8 *buffer) { @@ -258,10 +279,7 @@ static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u64 address, return -1; machine = etmq->etm->machine; - if (address >= etmq->etm->kernel_start) - cpumode = PERF_RECORD_MISC_KERNEL; - else - cpumode = PERF_RECORD_MISC_USER; + cpumode = cs_etm__cpu_mode(etmq, address); thread = etmq->thread; if (!thread) { @@ -653,7 +671,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, struct perf_sample sample = {.ip = 0,}; event->sample.header.type = PERF_RECORD_SAMPLE; - event->sample.header.misc = PERF_RECORD_MISC_USER; + event->sample.header.misc = cs_etm__cpu_mode(etmq, addr); event->sample.header.size = sizeof(struct perf_event_header); sample.ip = addr; @@ -665,7 +683,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, sample.cpu = etmq->packet->cpu; sample.flags = 0; sample.insn_len = 1; - sample.cpumode = event->header.misc; + sample.cpumode = event->sample.header.misc; if (etm->synth_opts.last_branch) { cs_etm__copy_last_branch_rb(etmq); @@ -706,12 +724,15 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq) u64 nr; struct branch_entry entries; } dummy_bs; + u64 ip; + + ip = cs_etm__last_executed_instr(etmq->prev_packet); event->sample.header.type = PERF_RECORD_SAMPLE; - event->sample.header.misc = PERF_RECORD_MISC_USER; + event->sample.header.misc = cs_etm__cpu_mode(etmq, ip); event->sample.header.size = sizeof(struct perf_event_header); - sample.ip = cs_etm__last_executed_instr(etmq->prev_packet); + sample.ip = ip; sample.pid = etmq->pid; sample.tid = etmq->tid; sample.addr = cs_etm__first_executed_instr(etmq->packet); @@ -720,7 +741,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq) sample.period = 1; sample.cpu = etmq->packet->cpu; sample.flags = 0; - sample.cpumode = PERF_RECORD_MISC_USER; + sample.cpumode = event->sample.header.misc; /* * perf report cannot handle events without a branch stack -- 2.7.4 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] perf cs-etm: Correct CPU mode for samples 2018-10-30 7:18 ` Leo Yan @ 2018-10-30 14:32 ` Arnaldo Carvalho de Melo -1 siblings, 0 replies; 13+ messages in thread From: Arnaldo Carvalho de Melo @ 2018-10-30 14:32 UTC (permalink / raw) To: Leo Yan Cc: Mathieu Poirier, Peter Zijlstra, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel, Coresight ML Em Tue, Oct 30, 2018 at 03:18:28PM +0800, Leo Yan escreveu: > Since commit 9042f5e3539e ("perf tools: Stop fallbacking to kallsyms > for vdso symbols lookup"), the kernel address cannot be properly parsed > to kernel symbol with command 'perf script -k vmlinux'. The reason is > CoreSight samples is always to set CPU mode as PERF_RECORD_MISC_USER, > thus it fails to find corresponding map/dso in below flows: > > process_sample_event() > `-> machine__resolve() > `-> thread__find_map(thread, sample->cpumode, sample->ip, al); > > In this flow it needs to pass argument 'sample->cpumode' to tell what's > the CPU mode, before it always passed PERF_RECORD_MISC_USER but without > any failure until the commit 9042f5e3539e ("perf tools: Stop fallbacking > to kallsyms for vdso symbols lookup") has been merged. The reason is > even with the wrong CPU mode the function thread__find_map() firstly > fails to find map but it will rollback to find kernel map for vdso > symbols lookup. In the latest code it has removed the fallback code, > thus if CPU mode is PERF_RECORD_MISC_USER then it cannot find map > anymore with kernel address. > > This patch is to correct samples CPU mode setting, it creates a new > helper function cs_etm__cpu_mode() to tell what's the CPU mode based on > the address with the info from machine structure; this patch has a bit > extension to check not only kernel and user mode, but also check for > host/guest and hypervisor mode. Finally this patch uses the function > in instruction and branch samples and also apply in cs_etm__mem_access() > for a minor polishing. Mathieu, can I have your Acked-by, please? Leo, thanks for acting so quickly on this one! Now processing coresight traces should be faster, less lookups :-) - Arnaldo > Signed-off-by: Leo Yan <leo.yan@linaro.org> > --- > tools/perf/util/cs-etm.c | 39 ++++++++++++++++++++++++++++++--------- > 1 file changed, 30 insertions(+), 9 deletions(-) > > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c > index 3b37d66..73430b7 100644 > --- a/tools/perf/util/cs-etm.c > +++ b/tools/perf/util/cs-etm.c > @@ -244,6 +244,27 @@ static void cs_etm__free(struct perf_session *session) > zfree(&aux); > } > > +static u8 cs_etm__cpu_mode(struct cs_etm_queue *etmq, u64 address) > +{ > + struct machine *machine; > + > + machine = etmq->etm->machine; > + > + if (address >= etmq->etm->kernel_start) { > + if (machine__is_host(machine)) > + return PERF_RECORD_MISC_KERNEL; > + else > + return PERF_RECORD_MISC_GUEST_KERNEL; > + } else { > + if (machine__is_host(machine)) > + return PERF_RECORD_MISC_USER; > + else if (perf_guest) > + return PERF_RECORD_MISC_GUEST_USER; > + else > + return PERF_RECORD_MISC_HYPERVISOR; > + } > +} > + > static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u64 address, > size_t size, u8 *buffer) > { > @@ -258,10 +279,7 @@ static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u64 address, > return -1; > > machine = etmq->etm->machine; > - if (address >= etmq->etm->kernel_start) > - cpumode = PERF_RECORD_MISC_KERNEL; > - else > - cpumode = PERF_RECORD_MISC_USER; > + cpumode = cs_etm__cpu_mode(etmq, address); > > thread = etmq->thread; > if (!thread) { > @@ -653,7 +671,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, > struct perf_sample sample = {.ip = 0,}; > > event->sample.header.type = PERF_RECORD_SAMPLE; > - event->sample.header.misc = PERF_RECORD_MISC_USER; > + event->sample.header.misc = cs_etm__cpu_mode(etmq, addr); > event->sample.header.size = sizeof(struct perf_event_header); > > sample.ip = addr; > @@ -665,7 +683,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, > sample.cpu = etmq->packet->cpu; > sample.flags = 0; > sample.insn_len = 1; > - sample.cpumode = event->header.misc; > + sample.cpumode = event->sample.header.misc; > > if (etm->synth_opts.last_branch) { > cs_etm__copy_last_branch_rb(etmq); > @@ -706,12 +724,15 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq) > u64 nr; > struct branch_entry entries; > } dummy_bs; > + u64 ip; > + > + ip = cs_etm__last_executed_instr(etmq->prev_packet); > > event->sample.header.type = PERF_RECORD_SAMPLE; > - event->sample.header.misc = PERF_RECORD_MISC_USER; > + event->sample.header.misc = cs_etm__cpu_mode(etmq, ip); > event->sample.header.size = sizeof(struct perf_event_header); > > - sample.ip = cs_etm__last_executed_instr(etmq->prev_packet); > + sample.ip = ip; > sample.pid = etmq->pid; > sample.tid = etmq->tid; > sample.addr = cs_etm__first_executed_instr(etmq->packet); > @@ -720,7 +741,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq) > sample.period = 1; > sample.cpu = etmq->packet->cpu; > sample.flags = 0; > - sample.cpumode = PERF_RECORD_MISC_USER; > + sample.cpumode = event->sample.header.misc; > > /* > * perf report cannot handle events without a branch stack > -- > 2.7.4 ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] perf cs-etm: Correct CPU mode for samples @ 2018-10-30 14:32 ` Arnaldo Carvalho de Melo 0 siblings, 0 replies; 13+ messages in thread From: Arnaldo Carvalho de Melo @ 2018-10-30 14:32 UTC (permalink / raw) To: linux-arm-kernel Em Tue, Oct 30, 2018 at 03:18:28PM +0800, Leo Yan escreveu: > Since commit 9042f5e3539e ("perf tools: Stop fallbacking to kallsyms > for vdso symbols lookup"), the kernel address cannot be properly parsed > to kernel symbol with command 'perf script -k vmlinux'. The reason is > CoreSight samples is always to set CPU mode as PERF_RECORD_MISC_USER, > thus it fails to find corresponding map/dso in below flows: > > process_sample_event() > `-> machine__resolve() > `-> thread__find_map(thread, sample->cpumode, sample->ip, al); > > In this flow it needs to pass argument 'sample->cpumode' to tell what's > the CPU mode, before it always passed PERF_RECORD_MISC_USER but without > any failure until the commit 9042f5e3539e ("perf tools: Stop fallbacking > to kallsyms for vdso symbols lookup") has been merged. The reason is > even with the wrong CPU mode the function thread__find_map() firstly > fails to find map but it will rollback to find kernel map for vdso > symbols lookup. In the latest code it has removed the fallback code, > thus if CPU mode is PERF_RECORD_MISC_USER then it cannot find map > anymore with kernel address. > > This patch is to correct samples CPU mode setting, it creates a new > helper function cs_etm__cpu_mode() to tell what's the CPU mode based on > the address with the info from machine structure; this patch has a bit > extension to check not only kernel and user mode, but also check for > host/guest and hypervisor mode. Finally this patch uses the function > in instruction and branch samples and also apply in cs_etm__mem_access() > for a minor polishing. Mathieu, can I have your Acked-by, please? Leo, thanks for acting so quickly on this one! Now processing coresight traces should be faster, less lookups :-) - Arnaldo > Signed-off-by: Leo Yan <leo.yan@linaro.org> > --- > tools/perf/util/cs-etm.c | 39 ++++++++++++++++++++++++++++++--------- > 1 file changed, 30 insertions(+), 9 deletions(-) > > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c > index 3b37d66..73430b7 100644 > --- a/tools/perf/util/cs-etm.c > +++ b/tools/perf/util/cs-etm.c > @@ -244,6 +244,27 @@ static void cs_etm__free(struct perf_session *session) > zfree(&aux); > } > > +static u8 cs_etm__cpu_mode(struct cs_etm_queue *etmq, u64 address) > +{ > + struct machine *machine; > + > + machine = etmq->etm->machine; > + > + if (address >= etmq->etm->kernel_start) { > + if (machine__is_host(machine)) > + return PERF_RECORD_MISC_KERNEL; > + else > + return PERF_RECORD_MISC_GUEST_KERNEL; > + } else { > + if (machine__is_host(machine)) > + return PERF_RECORD_MISC_USER; > + else if (perf_guest) > + return PERF_RECORD_MISC_GUEST_USER; > + else > + return PERF_RECORD_MISC_HYPERVISOR; > + } > +} > + > static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u64 address, > size_t size, u8 *buffer) > { > @@ -258,10 +279,7 @@ static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u64 address, > return -1; > > machine = etmq->etm->machine; > - if (address >= etmq->etm->kernel_start) > - cpumode = PERF_RECORD_MISC_KERNEL; > - else > - cpumode = PERF_RECORD_MISC_USER; > + cpumode = cs_etm__cpu_mode(etmq, address); > > thread = etmq->thread; > if (!thread) { > @@ -653,7 +671,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, > struct perf_sample sample = {.ip = 0,}; > > event->sample.header.type = PERF_RECORD_SAMPLE; > - event->sample.header.misc = PERF_RECORD_MISC_USER; > + event->sample.header.misc = cs_etm__cpu_mode(etmq, addr); > event->sample.header.size = sizeof(struct perf_event_header); > > sample.ip = addr; > @@ -665,7 +683,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, > sample.cpu = etmq->packet->cpu; > sample.flags = 0; > sample.insn_len = 1; > - sample.cpumode = event->header.misc; > + sample.cpumode = event->sample.header.misc; > > if (etm->synth_opts.last_branch) { > cs_etm__copy_last_branch_rb(etmq); > @@ -706,12 +724,15 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq) > u64 nr; > struct branch_entry entries; > } dummy_bs; > + u64 ip; > + > + ip = cs_etm__last_executed_instr(etmq->prev_packet); > > event->sample.header.type = PERF_RECORD_SAMPLE; > - event->sample.header.misc = PERF_RECORD_MISC_USER; > + event->sample.header.misc = cs_etm__cpu_mode(etmq, ip); > event->sample.header.size = sizeof(struct perf_event_header); > > - sample.ip = cs_etm__last_executed_instr(etmq->prev_packet); > + sample.ip = ip; > sample.pid = etmq->pid; > sample.tid = etmq->tid; > sample.addr = cs_etm__first_executed_instr(etmq->packet); > @@ -720,7 +741,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq) > sample.period = 1; > sample.cpu = etmq->packet->cpu; > sample.flags = 0; > - sample.cpumode = PERF_RECORD_MISC_USER; > + sample.cpumode = event->sample.header.misc; > > /* > * perf report cannot handle events without a branch stack > -- > 2.7.4 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] perf cs-etm: Correct CPU mode for samples 2018-10-30 14:32 ` Arnaldo Carvalho de Melo @ 2018-10-30 15:04 ` leo.yan at linaro.org -1 siblings, 0 replies; 13+ messages in thread From: leo.yan @ 2018-10-30 15:04 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Mathieu Poirier, Peter Zijlstra, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel, Coresight ML Hi Arnaldo, On Tue, Oct 30, 2018 at 11:32:26AM -0300, Arnaldo Carvalho de Melo wrote: > Em Tue, Oct 30, 2018 at 03:18:28PM +0800, Leo Yan escreveu: > > Since commit 9042f5e3539e ("perf tools: Stop fallbacking to kallsyms > > for vdso symbols lookup"), the kernel address cannot be properly parsed > > to kernel symbol with command 'perf script -k vmlinux'. The reason is > > CoreSight samples is always to set CPU mode as PERF_RECORD_MISC_USER, > > thus it fails to find corresponding map/dso in below flows: > > > > process_sample_event() > > `-> machine__resolve() > > `-> thread__find_map(thread, sample->cpumode, sample->ip, al); > > > > In this flow it needs to pass argument 'sample->cpumode' to tell what's > > the CPU mode, before it always passed PERF_RECORD_MISC_USER but without > > any failure until the commit 9042f5e3539e ("perf tools: Stop fallbacking > > to kallsyms for vdso symbols lookup") has been merged. The reason is > > even with the wrong CPU mode the function thread__find_map() firstly > > fails to find map but it will rollback to find kernel map for vdso > > symbols lookup. In the latest code it has removed the fallback code, > > thus if CPU mode is PERF_RECORD_MISC_USER then it cannot find map > > anymore with kernel address. > > > > This patch is to correct samples CPU mode setting, it creates a new > > helper function cs_etm__cpu_mode() to tell what's the CPU mode based on > > the address with the info from machine structure; this patch has a bit > > extension to check not only kernel and user mode, but also check for > > host/guest and hypervisor mode. Finally this patch uses the function > > in instruction and branch samples and also apply in cs_etm__mem_access() > > for a minor polishing. > > Mathieu, can I have your Acked-by, please? Leo, thanks for acting so > quickly on this one! Thanks for reivewing. Yeah, let's wait for Mathieu reviewing as well, as I know he is travelling so might be delay a bit. Just remind, we might need the similiar change for util/intel-pt.c and util/intel-bts.c when generate samples, otherwise they might have the same regression for kernel symbols. I am not the best person to change these two files, but bring up this for attention. > Now processing coresight traces should be faster, less lookups :-) Thanks! [...] Thanks, Leo Yan ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] perf cs-etm: Correct CPU mode for samples @ 2018-10-30 15:04 ` leo.yan at linaro.org 0 siblings, 0 replies; 13+ messages in thread From: leo.yan at linaro.org @ 2018-10-30 15:04 UTC (permalink / raw) To: linux-arm-kernel Hi Arnaldo, On Tue, Oct 30, 2018 at 11:32:26AM -0300, Arnaldo Carvalho de Melo wrote: > Em Tue, Oct 30, 2018 at 03:18:28PM +0800, Leo Yan escreveu: > > Since commit 9042f5e3539e ("perf tools: Stop fallbacking to kallsyms > > for vdso symbols lookup"), the kernel address cannot be properly parsed > > to kernel symbol with command 'perf script -k vmlinux'. The reason is > > CoreSight samples is always to set CPU mode as PERF_RECORD_MISC_USER, > > thus it fails to find corresponding map/dso in below flows: > > > > process_sample_event() > > `-> machine__resolve() > > `-> thread__find_map(thread, sample->cpumode, sample->ip, al); > > > > In this flow it needs to pass argument 'sample->cpumode' to tell what's > > the CPU mode, before it always passed PERF_RECORD_MISC_USER but without > > any failure until the commit 9042f5e3539e ("perf tools: Stop fallbacking > > to kallsyms for vdso symbols lookup") has been merged. The reason is > > even with the wrong CPU mode the function thread__find_map() firstly > > fails to find map but it will rollback to find kernel map for vdso > > symbols lookup. In the latest code it has removed the fallback code, > > thus if CPU mode is PERF_RECORD_MISC_USER then it cannot find map > > anymore with kernel address. > > > > This patch is to correct samples CPU mode setting, it creates a new > > helper function cs_etm__cpu_mode() to tell what's the CPU mode based on > > the address with the info from machine structure; this patch has a bit > > extension to check not only kernel and user mode, but also check for > > host/guest and hypervisor mode. Finally this patch uses the function > > in instruction and branch samples and also apply in cs_etm__mem_access() > > for a minor polishing. > > Mathieu, can I have your Acked-by, please? Leo, thanks for acting so > quickly on this one! Thanks for reivewing. Yeah, let's wait for Mathieu reviewing as well, as I know he is travelling so might be delay a bit. Just remind, we might need the similiar change for util/intel-pt.c and util/intel-bts.c when generate samples, otherwise they might have the same regression for kernel symbols. I am not the best person to change these two files, but bring up this for attention. > Now processing coresight traces should be faster, less lookups :-) Thanks! [...] Thanks, Leo Yan ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] perf cs-etm: Correct CPU mode for samples 2018-10-30 15:04 ` leo.yan at linaro.org @ 2018-10-30 15:11 ` Arnaldo Carvalho de Melo -1 siblings, 0 replies; 13+ messages in thread From: Arnaldo Carvalho de Melo @ 2018-10-30 15:11 UTC (permalink / raw) To: leo.yan, Adrian Hunter Cc: Mathieu Poirier, Peter Zijlstra, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel, Coresight ML Em Tue, Oct 30, 2018 at 11:04:49PM +0800, leo.yan@linaro.org escreveu: > Hi Arnaldo, > > On Tue, Oct 30, 2018 at 11:32:26AM -0300, Arnaldo Carvalho de Melo wrote: > > Em Tue, Oct 30, 2018 at 03:18:28PM +0800, Leo Yan escreveu: > > > Since commit 9042f5e3539e ("perf tools: Stop fallbacking to kallsyms > > > for vdso symbols lookup"), the kernel address cannot be properly parsed > > > to kernel symbol with command 'perf script -k vmlinux'. The reason is > > > CoreSight samples is always to set CPU mode as PERF_RECORD_MISC_USER, > > > thus it fails to find corresponding map/dso in below flows: > > > > > > process_sample_event() > > > `-> machine__resolve() > > > `-> thread__find_map(thread, sample->cpumode, sample->ip, al); > > > > > > In this flow it needs to pass argument 'sample->cpumode' to tell what's > > > the CPU mode, before it always passed PERF_RECORD_MISC_USER but without > > > any failure until the commit 9042f5e3539e ("perf tools: Stop fallbacking > > > to kallsyms for vdso symbols lookup") has been merged. The reason is > > > even with the wrong CPU mode the function thread__find_map() firstly > > > fails to find map but it will rollback to find kernel map for vdso > > > symbols lookup. In the latest code it has removed the fallback code, > > > thus if CPU mode is PERF_RECORD_MISC_USER then it cannot find map > > > anymore with kernel address. > > > > > > This patch is to correct samples CPU mode setting, it creates a new > > > helper function cs_etm__cpu_mode() to tell what's the CPU mode based on > > > the address with the info from machine structure; this patch has a bit > > > extension to check not only kernel and user mode, but also check for > > > host/guest and hypervisor mode. Finally this patch uses the function > > > in instruction and branch samples and also apply in cs_etm__mem_access() > > > for a minor polishing. > > > > Mathieu, can I have your Acked-by, please? Leo, thanks for acting so > > quickly on this one! > > Thanks for reivewing. Yeah, let's wait for Mathieu reviewing as well, > as I know he is travelling so might be delay a bit. I'm tentatively applying the patch, as this needs fixing ASAP, and I take that you have tested it and it cured the problem for you, so should be a good indication for the acceptance of the patch. We can always fix some detail later. > Just remind, we might need the similiar change for util/intel-pt.c and > util/intel-bts.c when generate samples, otherwise they might have the > same regression for kernel symbols. I am not the best person to change > these two files, but bring up this for attention. Right, I think Adrian is working on it, Adrian? > > Now processing coresight traces should be faster, less lookups :-) > > Thanks! - Arnaldo ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] perf cs-etm: Correct CPU mode for samples @ 2018-10-30 15:11 ` Arnaldo Carvalho de Melo 0 siblings, 0 replies; 13+ messages in thread From: Arnaldo Carvalho de Melo @ 2018-10-30 15:11 UTC (permalink / raw) To: linux-arm-kernel Em Tue, Oct 30, 2018 at 11:04:49PM +0800, leo.yan at linaro.org escreveu: > Hi Arnaldo, > > On Tue, Oct 30, 2018 at 11:32:26AM -0300, Arnaldo Carvalho de Melo wrote: > > Em Tue, Oct 30, 2018 at 03:18:28PM +0800, Leo Yan escreveu: > > > Since commit 9042f5e3539e ("perf tools: Stop fallbacking to kallsyms > > > for vdso symbols lookup"), the kernel address cannot be properly parsed > > > to kernel symbol with command 'perf script -k vmlinux'. The reason is > > > CoreSight samples is always to set CPU mode as PERF_RECORD_MISC_USER, > > > thus it fails to find corresponding map/dso in below flows: > > > > > > process_sample_event() > > > `-> machine__resolve() > > > `-> thread__find_map(thread, sample->cpumode, sample->ip, al); > > > > > > In this flow it needs to pass argument 'sample->cpumode' to tell what's > > > the CPU mode, before it always passed PERF_RECORD_MISC_USER but without > > > any failure until the commit 9042f5e3539e ("perf tools: Stop fallbacking > > > to kallsyms for vdso symbols lookup") has been merged. The reason is > > > even with the wrong CPU mode the function thread__find_map() firstly > > > fails to find map but it will rollback to find kernel map for vdso > > > symbols lookup. In the latest code it has removed the fallback code, > > > thus if CPU mode is PERF_RECORD_MISC_USER then it cannot find map > > > anymore with kernel address. > > > > > > This patch is to correct samples CPU mode setting, it creates a new > > > helper function cs_etm__cpu_mode() to tell what's the CPU mode based on > > > the address with the info from machine structure; this patch has a bit > > > extension to check not only kernel and user mode, but also check for > > > host/guest and hypervisor mode. Finally this patch uses the function > > > in instruction and branch samples and also apply in cs_etm__mem_access() > > > for a minor polishing. > > > > Mathieu, can I have your Acked-by, please? Leo, thanks for acting so > > quickly on this one! > > Thanks for reivewing. Yeah, let's wait for Mathieu reviewing as well, > as I know he is travelling so might be delay a bit. I'm tentatively applying the patch, as this needs fixing ASAP, and I take that you have tested it and it cured the problem for you, so should be a good indication for the acceptance of the patch. We can always fix some detail later. > Just remind, we might need the similiar change for util/intel-pt.c and > util/intel-bts.c when generate samples, otherwise they might have the > same regression for kernel symbols. I am not the best person to change > these two files, but bring up this for attention. Right, I think Adrian is working on it, Adrian? > > Now processing coresight traces should be faster, less lookups :-) > > Thanks! - Arnaldo ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] perf cs-etm: Correct CPU mode for samples 2018-10-30 15:11 ` Arnaldo Carvalho de Melo @ 2018-10-30 15:30 ` Adrian Hunter -1 siblings, 0 replies; 13+ messages in thread From: Adrian Hunter @ 2018-10-30 15:30 UTC (permalink / raw) To: Arnaldo Carvalho de Melo, leo.yan Cc: Mathieu Poirier, Peter Zijlstra, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel, Coresight ML On 30/10/18 5:11 PM, Arnaldo Carvalho de Melo wrote: > Em Tue, Oct 30, 2018 at 11:04:49PM +0800, leo.yan@linaro.org escreveu: >> Hi Arnaldo, >> >> On Tue, Oct 30, 2018 at 11:32:26AM -0300, Arnaldo Carvalho de Melo wrote: >>> Em Tue, Oct 30, 2018 at 03:18:28PM +0800, Leo Yan escreveu: >>>> Since commit 9042f5e3539e ("perf tools: Stop fallbacking to kallsyms >>>> for vdso symbols lookup"), the kernel address cannot be properly parsed >>>> to kernel symbol with command 'perf script -k vmlinux'. The reason is >>>> CoreSight samples is always to set CPU mode as PERF_RECORD_MISC_USER, >>>> thus it fails to find corresponding map/dso in below flows: >>>> >>>> process_sample_event() >>>> `-> machine__resolve() >>>> `-> thread__find_map(thread, sample->cpumode, sample->ip, al); >>>> >>>> In this flow it needs to pass argument 'sample->cpumode' to tell what's >>>> the CPU mode, before it always passed PERF_RECORD_MISC_USER but without >>>> any failure until the commit 9042f5e3539e ("perf tools: Stop fallbacking >>>> to kallsyms for vdso symbols lookup") has been merged. The reason is >>>> even with the wrong CPU mode the function thread__find_map() firstly >>>> fails to find map but it will rollback to find kernel map for vdso >>>> symbols lookup. In the latest code it has removed the fallback code, >>>> thus if CPU mode is PERF_RECORD_MISC_USER then it cannot find map >>>> anymore with kernel address. >>>> >>>> This patch is to correct samples CPU mode setting, it creates a new >>>> helper function cs_etm__cpu_mode() to tell what's the CPU mode based on >>>> the address with the info from machine structure; this patch has a bit >>>> extension to check not only kernel and user mode, but also check for >>>> host/guest and hypervisor mode. Finally this patch uses the function >>>> in instruction and branch samples and also apply in cs_etm__mem_access() >>>> for a minor polishing. >>> >>> Mathieu, can I have your Acked-by, please? Leo, thanks for acting so >>> quickly on this one! >> >> Thanks for reivewing. Yeah, let's wait for Mathieu reviewing as well, >> as I know he is travelling so might be delay a bit. > > I'm tentatively applying the patch, as this needs fixing ASAP, and I > take that you have tested it and it cured the problem for you, so should > be a good indication for the acceptance of the patch. > > We can always fix some detail later. > >> Just remind, we might need the similiar change for util/intel-pt.c and >> util/intel-bts.c when generate samples, otherwise they might have the >> same regression for kernel symbols. I am not the best person to change >> these two files, but bring up this for attention. > > Right, I think Adrian is working on it, Adrian? Yes although I am more concerned with branches from user space to kernel space and vice versa, which this patch doesn't deal with. Also there are many cpumode issues in perf tools, and really the only way to deal with them simply at the moment is to put back the fallback for arches other than sparc i.e. diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c index bc646185f8d9..b7a79cf490ad 100644 --- a/tools/perf/util/event.c +++ b/tools/perf/util/event.c @@ -1561,9 +1561,18 @@ struct map *thread__find_map(struct thread *thread, u8 cpumode, u64 addr, return NULL; } - +try_again: al->map = map_groups__find(mg, al->addr); - if (al->map != NULL) { + if (al->map == NULL) { + if (cpumode == PERF_RECORD_MISC_USER && machine && + strcmp(perf_env__arch(machine->env), "sparc") && + mg != &machine->kmaps && + machine__kernel_ip(machine, al->addr)) { + mg = &machine->kmaps; + load_map = true; + goto try_again; + } + } else { /* * Kernel maps might be changed when loading symbols so loading * must be done prior to using kernel maps. ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH] perf cs-etm: Correct CPU mode for samples @ 2018-10-30 15:30 ` Adrian Hunter 0 siblings, 0 replies; 13+ messages in thread From: Adrian Hunter @ 2018-10-30 15:30 UTC (permalink / raw) To: linux-arm-kernel On 30/10/18 5:11 PM, Arnaldo Carvalho de Melo wrote: > Em Tue, Oct 30, 2018 at 11:04:49PM +0800, leo.yan at linaro.org escreveu: >> Hi Arnaldo, >> >> On Tue, Oct 30, 2018 at 11:32:26AM -0300, Arnaldo Carvalho de Melo wrote: >>> Em Tue, Oct 30, 2018 at 03:18:28PM +0800, Leo Yan escreveu: >>>> Since commit 9042f5e3539e ("perf tools: Stop fallbacking to kallsyms >>>> for vdso symbols lookup"), the kernel address cannot be properly parsed >>>> to kernel symbol with command 'perf script -k vmlinux'. The reason is >>>> CoreSight samples is always to set CPU mode as PERF_RECORD_MISC_USER, >>>> thus it fails to find corresponding map/dso in below flows: >>>> >>>> process_sample_event() >>>> `-> machine__resolve() >>>> `-> thread__find_map(thread, sample->cpumode, sample->ip, al); >>>> >>>> In this flow it needs to pass argument 'sample->cpumode' to tell what's >>>> the CPU mode, before it always passed PERF_RECORD_MISC_USER but without >>>> any failure until the commit 9042f5e3539e ("perf tools: Stop fallbacking >>>> to kallsyms for vdso symbols lookup") has been merged. The reason is >>>> even with the wrong CPU mode the function thread__find_map() firstly >>>> fails to find map but it will rollback to find kernel map for vdso >>>> symbols lookup. In the latest code it has removed the fallback code, >>>> thus if CPU mode is PERF_RECORD_MISC_USER then it cannot find map >>>> anymore with kernel address. >>>> >>>> This patch is to correct samples CPU mode setting, it creates a new >>>> helper function cs_etm__cpu_mode() to tell what's the CPU mode based on >>>> the address with the info from machine structure; this patch has a bit >>>> extension to check not only kernel and user mode, but also check for >>>> host/guest and hypervisor mode. Finally this patch uses the function >>>> in instruction and branch samples and also apply in cs_etm__mem_access() >>>> for a minor polishing. >>> >>> Mathieu, can I have your Acked-by, please? Leo, thanks for acting so >>> quickly on this one! >> >> Thanks for reivewing. Yeah, let's wait for Mathieu reviewing as well, >> as I know he is travelling so might be delay a bit. > > I'm tentatively applying the patch, as this needs fixing ASAP, and I > take that you have tested it and it cured the problem for you, so should > be a good indication for the acceptance of the patch. > > We can always fix some detail later. > >> Just remind, we might need the similiar change for util/intel-pt.c and >> util/intel-bts.c when generate samples, otherwise they might have the >> same regression for kernel symbols. I am not the best person to change >> these two files, but bring up this for attention. > > Right, I think Adrian is working on it, Adrian? Yes although I am more concerned with branches from user space to kernel space and vice versa, which this patch doesn't deal with. Also there are many cpumode issues in perf tools, and really the only way to deal with them simply at the moment is to put back the fallback for arches other than sparc i.e. diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c index bc646185f8d9..b7a79cf490ad 100644 --- a/tools/perf/util/event.c +++ b/tools/perf/util/event.c @@ -1561,9 +1561,18 @@ struct map *thread__find_map(struct thread *thread, u8 cpumode, u64 addr, return NULL; } - +try_again: al->map = map_groups__find(mg, al->addr); - if (al->map != NULL) { + if (al->map == NULL) { + if (cpumode == PERF_RECORD_MISC_USER && machine && + strcmp(perf_env__arch(machine->env), "sparc") && + mg != &machine->kmaps && + machine__kernel_ip(machine, al->addr)) { + mg = &machine->kmaps; + load_map = true; + goto try_again; + } + } else { /* * Kernel maps might be changed when loading symbols so loading * must be done prior to using kernel maps. ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] perf cs-etm: Correct CPU mode for samples 2018-10-30 15:30 ` Adrian Hunter @ 2018-10-30 15:45 ` Arnaldo Carvalho de Melo -1 siblings, 0 replies; 13+ messages in thread From: Arnaldo Carvalho de Melo @ 2018-10-30 15:45 UTC (permalink / raw) To: Adrian Hunter Cc: leo.yan, Mathieu Poirier, Peter Zijlstra, David Miller, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel, Coresight ML Em Tue, Oct 30, 2018 at 05:30:55PM +0200, Adrian Hunter escreveu: > On 30/10/18 5:11 PM, Arnaldo Carvalho de Melo wrote: > > Em Tue, Oct 30, 2018 at 11:04:49PM +0800, leo.yan@linaro.org escreveu: > >> Hi Arnaldo, > >> > >> On Tue, Oct 30, 2018 at 11:32:26AM -0300, Arnaldo Carvalho de Melo wrote: > >>> Em Tue, Oct 30, 2018 at 03:18:28PM +0800, Leo Yan escreveu: > >>>> Since commit 9042f5e3539e ("perf tools: Stop fallbacking to kallsyms > >>>> for vdso symbols lookup"), the kernel address cannot be properly parsed > >>>> to kernel symbol with command 'perf script -k vmlinux'. The reason is > >>>> CoreSight samples is always to set CPU mode as PERF_RECORD_MISC_USER, > >>>> thus it fails to find corresponding map/dso in below flows: > >>>> > >>>> process_sample_event() > >>>> `-> machine__resolve() > >>>> `-> thread__find_map(thread, sample->cpumode, sample->ip, al); > >>>> > >>>> In this flow it needs to pass argument 'sample->cpumode' to tell what's > >>>> the CPU mode, before it always passed PERF_RECORD_MISC_USER but without > >>>> any failure until the commit 9042f5e3539e ("perf tools: Stop fallbacking > >>>> to kallsyms for vdso symbols lookup") has been merged. The reason is > >>>> even with the wrong CPU mode the function thread__find_map() firstly > >>>> fails to find map but it will rollback to find kernel map for vdso > >>>> symbols lookup. In the latest code it has removed the fallback code, > >>>> thus if CPU mode is PERF_RECORD_MISC_USER then it cannot find map > >>>> anymore with kernel address. > >>>> > >>>> This patch is to correct samples CPU mode setting, it creates a new > >>>> helper function cs_etm__cpu_mode() to tell what's the CPU mode based on > >>>> the address with the info from machine structure; this patch has a bit > >>>> extension to check not only kernel and user mode, but also check for > >>>> host/guest and hypervisor mode. Finally this patch uses the function > >>>> in instruction and branch samples and also apply in cs_etm__mem_access() > >>>> for a minor polishing. > >>> > >>> Mathieu, can I have your Acked-by, please? Leo, thanks for acting so > >>> quickly on this one! > >> > >> Thanks for reivewing. Yeah, let's wait for Mathieu reviewing as well, > >> as I know he is travelling so might be delay a bit. > > > > I'm tentatively applying the patch, as this needs fixing ASAP, and I > > take that you have tested it and it cured the problem for you, so should > > be a good indication for the acceptance of the patch. > > > > We can always fix some detail later. > > > >> Just remind, we might need the similiar change for util/intel-pt.c and > >> util/intel-bts.c when generate samples, otherwise they might have the > >> same regression for kernel symbols. I am not the best person to change > >> these two files, but bring up this for attention. > > Right, I think Adrian is working on it, Adrian? > Yes although I am more concerned with branches from user space to kernel > space and vice versa, which this patch doesn't deal with. Also there are > many cpumode issues in perf tools, and really the only way to deal with > them simply at the moment is to put back the fallback for arches other > than sparc i.e. We can do that, sure, but the opportunity to fix at least some of those problems and then make this fallback not to take place looks reasonable, for instance, I have this patch now in my local tree, from David, that posted just as a way to show part of the problem, parts of the commit message I wrote, parts are from David: commit 54ac9007e6cb77c37ebf4f5e3debb23075cc90f0 Author: David S. Miller <davem@davemloft.net> Date: Tue Oct 30 12:12:26 2018 -0300 perf callchain: Honour the ordering of PERF_CONTEXT_{USER,KERNEL,etc} When processing using 'perf report -g caller', which is the default, we ended up reverting the callchain entries received from the kernel, but simply reverting trows away the information that tells that from a point onwards the addresses are for userspace, kernel, guest kernel, guest user, hypervisor. The idea is that if we are walking backwards, for each cluster of non-cpumode entries we have to first scan backwards for the next one and use that for the cluster. This seems silly and more expensive than it needs to be but it is enough for a initial fix. The code here is really complicated because it is intimately intertwined with the lbr and branch handling, as well as this callchain order, further fixes will be needed to properly take into account the cpumode in those cases. Another problem with ORDER_CALLER is that the NULL "0" IP that is at the end of most callchains shows up at the top of the histogram because every callchain contains it and with ORDER_CALLER it is the first entry. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Souvik Banerjee <souvik1997@gmail.com> Cc: Wang Nan <wangnan0@huawei.com> Cc: stable@vger.kernel.org # 4.19 Link: https://lkml.kernel.org/n/tip-2wt3ayp6j2y2f2xowixa8y6y@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c index 111ae858cbcb..8ee8ab39d8ac 100644 --- a/tools/perf/util/machine.c +++ b/tools/perf/util/machine.c @@ -2140,6 +2140,27 @@ static int resolve_lbr_callchain_sample(struct thread *thread, return 0; } +static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread, + struct callchain_cursor *cursor, + struct symbol **parent, + struct addr_location *root_al, + u8 *cpumode, int ent) +{ + int err = 0; + + while (--ent >= 0) { + u64 ip = chain->ips[ent]; + + if (ip >= PERF_CONTEXT_MAX) { + err = add_callchain_ip(thread, cursor, parent, + root_al, cpumode, ip, + false, NULL, NULL, 0); + break; + } + } + return err; +} + static int thread__resolve_callchain_sample(struct thread *thread, struct callchain_cursor *cursor, struct perf_evsel *evsel, @@ -2246,6 +2267,12 @@ static int thread__resolve_callchain_sample(struct thread *thread, } check_calls: + if (callchain_param.order != ORDER_CALLEE) { + err = find_prev_cpumode(chain, thread, cursor, parent, root_al, + &cpumode, chain->nr - first_call); + if (err) + return (err < 0) ? err : 0; + } for (i = first_call, nr_entries = 0; i < chain_nr && nr_entries < max_stack; i++) { u64 ip; @@ -2260,9 +2287,15 @@ static int thread__resolve_callchain_sample(struct thread *thread, continue; #endif ip = chain->ips[j]; - if (ip < PERF_CONTEXT_MAX) ++nr_entries; + else if (callchain_param.order != ORDER_CALLEE) { + err = find_prev_cpumode(chain, thread, cursor, parent, + root_al, &cpumode, j); + if (err) + return (err < 0) ? err : 0; + continue; + } err = add_callchain_ip(thread, cursor, parent, root_al, &cpumode, ip, ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH] perf cs-etm: Correct CPU mode for samples @ 2018-10-30 15:45 ` Arnaldo Carvalho de Melo 0 siblings, 0 replies; 13+ messages in thread From: Arnaldo Carvalho de Melo @ 2018-10-30 15:45 UTC (permalink / raw) To: linux-arm-kernel Em Tue, Oct 30, 2018 at 05:30:55PM +0200, Adrian Hunter escreveu: > On 30/10/18 5:11 PM, Arnaldo Carvalho de Melo wrote: > > Em Tue, Oct 30, 2018 at 11:04:49PM +0800, leo.yan at linaro.org escreveu: > >> Hi Arnaldo, > >> > >> On Tue, Oct 30, 2018 at 11:32:26AM -0300, Arnaldo Carvalho de Melo wrote: > >>> Em Tue, Oct 30, 2018 at 03:18:28PM +0800, Leo Yan escreveu: > >>>> Since commit 9042f5e3539e ("perf tools: Stop fallbacking to kallsyms > >>>> for vdso symbols lookup"), the kernel address cannot be properly parsed > >>>> to kernel symbol with command 'perf script -k vmlinux'. The reason is > >>>> CoreSight samples is always to set CPU mode as PERF_RECORD_MISC_USER, > >>>> thus it fails to find corresponding map/dso in below flows: > >>>> > >>>> process_sample_event() > >>>> `-> machine__resolve() > >>>> `-> thread__find_map(thread, sample->cpumode, sample->ip, al); > >>>> > >>>> In this flow it needs to pass argument 'sample->cpumode' to tell what's > >>>> the CPU mode, before it always passed PERF_RECORD_MISC_USER but without > >>>> any failure until the commit 9042f5e3539e ("perf tools: Stop fallbacking > >>>> to kallsyms for vdso symbols lookup") has been merged. The reason is > >>>> even with the wrong CPU mode the function thread__find_map() firstly > >>>> fails to find map but it will rollback to find kernel map for vdso > >>>> symbols lookup. In the latest code it has removed the fallback code, > >>>> thus if CPU mode is PERF_RECORD_MISC_USER then it cannot find map > >>>> anymore with kernel address. > >>>> > >>>> This patch is to correct samples CPU mode setting, it creates a new > >>>> helper function cs_etm__cpu_mode() to tell what's the CPU mode based on > >>>> the address with the info from machine structure; this patch has a bit > >>>> extension to check not only kernel and user mode, but also check for > >>>> host/guest and hypervisor mode. Finally this patch uses the function > >>>> in instruction and branch samples and also apply in cs_etm__mem_access() > >>>> for a minor polishing. > >>> > >>> Mathieu, can I have your Acked-by, please? Leo, thanks for acting so > >>> quickly on this one! > >> > >> Thanks for reivewing. Yeah, let's wait for Mathieu reviewing as well, > >> as I know he is travelling so might be delay a bit. > > > > I'm tentatively applying the patch, as this needs fixing ASAP, and I > > take that you have tested it and it cured the problem for you, so should > > be a good indication for the acceptance of the patch. > > > > We can always fix some detail later. > > > >> Just remind, we might need the similiar change for util/intel-pt.c and > >> util/intel-bts.c when generate samples, otherwise they might have the > >> same regression for kernel symbols. I am not the best person to change > >> these two files, but bring up this for attention. > > Right, I think Adrian is working on it, Adrian? > Yes although I am more concerned with branches from user space to kernel > space and vice versa, which this patch doesn't deal with. Also there are > many cpumode issues in perf tools, and really the only way to deal with > them simply at the moment is to put back the fallback for arches other > than sparc i.e. We can do that, sure, but the opportunity to fix at least some of those problems and then make this fallback not to take place looks reasonable, for instance, I have this patch now in my local tree, from David, that posted just as a way to show part of the problem, parts of the commit message I wrote, parts are from David: commit 54ac9007e6cb77c37ebf4f5e3debb23075cc90f0 Author: David S. Miller <davem@davemloft.net> Date: Tue Oct 30 12:12:26 2018 -0300 perf callchain: Honour the ordering of PERF_CONTEXT_{USER,KERNEL,etc} When processing using 'perf report -g caller', which is the default, we ended up reverting the callchain entries received from the kernel, but simply reverting trows away the information that tells that from a point onwards the addresses are for userspace, kernel, guest kernel, guest user, hypervisor. The idea is that if we are walking backwards, for each cluster of non-cpumode entries we have to first scan backwards for the next one and use that for the cluster. This seems silly and more expensive than it needs to be but it is enough for a initial fix. The code here is really complicated because it is intimately intertwined with the lbr and branch handling, as well as this callchain order, further fixes will be needed to properly take into account the cpumode in those cases. Another problem with ORDER_CALLER is that the NULL "0" IP that is at the end of most callchains shows up at the top of the histogram because every callchain contains it and with ORDER_CALLER it is the first entry. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Souvik Banerjee <souvik1997@gmail.com> Cc: Wang Nan <wangnan0@huawei.com> Cc: stable at vger.kernel.org # 4.19 Link: https://lkml.kernel.org/n/tip-2wt3ayp6j2y2f2xowixa8y6y at git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c index 111ae858cbcb..8ee8ab39d8ac 100644 --- a/tools/perf/util/machine.c +++ b/tools/perf/util/machine.c @@ -2140,6 +2140,27 @@ static int resolve_lbr_callchain_sample(struct thread *thread, return 0; } +static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread, + struct callchain_cursor *cursor, + struct symbol **parent, + struct addr_location *root_al, + u8 *cpumode, int ent) +{ + int err = 0; + + while (--ent >= 0) { + u64 ip = chain->ips[ent]; + + if (ip >= PERF_CONTEXT_MAX) { + err = add_callchain_ip(thread, cursor, parent, + root_al, cpumode, ip, + false, NULL, NULL, 0); + break; + } + } + return err; +} + static int thread__resolve_callchain_sample(struct thread *thread, struct callchain_cursor *cursor, struct perf_evsel *evsel, @@ -2246,6 +2267,12 @@ static int thread__resolve_callchain_sample(struct thread *thread, } check_calls: + if (callchain_param.order != ORDER_CALLEE) { + err = find_prev_cpumode(chain, thread, cursor, parent, root_al, + &cpumode, chain->nr - first_call); + if (err) + return (err < 0) ? err : 0; + } for (i = first_call, nr_entries = 0; i < chain_nr && nr_entries < max_stack; i++) { u64 ip; @@ -2260,9 +2287,15 @@ static int thread__resolve_callchain_sample(struct thread *thread, continue; #endif ip = chain->ips[j]; - if (ip < PERF_CONTEXT_MAX) ++nr_entries; + else if (callchain_param.order != ORDER_CALLEE) { + err = find_prev_cpumode(chain, thread, cursor, parent, + root_al, &cpumode, j); + if (err) + return (err < 0) ? err : 0; + continue; + } err = add_callchain_ip(thread, cursor, parent, root_al, &cpumode, ip, ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [tip:perf/urgent] perf cs-etm: Correct CPU mode for samples 2018-10-30 7:18 ` Leo Yan (?) (?) @ 2018-10-31 22:04 ` tip-bot for Leo Yan -1 siblings, 0 replies; 13+ messages in thread From: tip-bot for Leo Yan @ 2018-10-31 22:04 UTC (permalink / raw) To: linux-tip-commits Cc: acme, tglx, davem, alexander.shishkin, adrian.hunter, leo.yan, linux-kernel, jolsa, peterz, mathieu.poirier, hpa, mingo, namhyung Commit-ID: d6c9c05fe1eb4b213b183d8a1e79416256dc833a Gitweb: https://git.kernel.org/tip/d6c9c05fe1eb4b213b183d8a1e79416256dc833a Author: Leo Yan <leo.yan@linaro.org> AuthorDate: Tue, 30 Oct 2018 15:18:28 +0800 Committer: Arnaldo Carvalho de Melo <acme@redhat.com> CommitDate: Wed, 31 Oct 2018 09:57:50 -0300 perf cs-etm: Correct CPU mode for samples Since commit edeb0c90df35 ("perf tools: Stop fallbacking to kallsyms for vdso symbols lookup"), the kernel address cannot be properly parsed to kernel symbol with command 'perf script -k vmlinux'. The reason is CoreSight samples is always to set CPU mode as PERF_RECORD_MISC_USER, thus it fails to find corresponding map/dso in below flows: process_sample_event() `-> machine__resolve() `-> thread__find_map(thread, sample->cpumode, sample->ip, al); In this flow it needs to pass argument 'sample->cpumode' to tell what's the CPU mode, before it always passed PERF_RECORD_MISC_USER but without any failure until the commit edeb0c90df35 ("perf tools: Stop fallbacking to kallsyms for vdso symbols lookup") has been merged. The reason is even with the wrong CPU mode the function thread__find_map() firstly fails to find map but it will rollback to find kernel map for vdso symbols lookup. In the latest code it has removed the fallback code, thus if CPU mode is PERF_RECORD_MISC_USER then it cannot find map anymore with kernel address. This patch is to correct samples CPU mode setting, it creates a new helper function cs_etm__cpu_mode() to tell what's the CPU mode based on the address with the info from machine structure; this patch has a bit extension to check not only kernel and user mode, but also check for host/guest and hypervisor mode. Finally this patch uses the function in instruction and branch samples and also apply in cs_etm__mem_access() for a minor polishing. Signed-off-by: Leo Yan <leo.yan@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Miller <davem@davemloft.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: stable@kernel.org # v4.19 Link: http://lkml.kernel.org/r/1540883908-17018-1-git-send-email-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/cs-etm.c | 39 ++++++++++++++++++++++++++++++--------- 1 file changed, 30 insertions(+), 9 deletions(-) diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 3b37d66dc533..73430b73570d 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -244,6 +244,27 @@ static void cs_etm__free(struct perf_session *session) zfree(&aux); } +static u8 cs_etm__cpu_mode(struct cs_etm_queue *etmq, u64 address) +{ + struct machine *machine; + + machine = etmq->etm->machine; + + if (address >= etmq->etm->kernel_start) { + if (machine__is_host(machine)) + return PERF_RECORD_MISC_KERNEL; + else + return PERF_RECORD_MISC_GUEST_KERNEL; + } else { + if (machine__is_host(machine)) + return PERF_RECORD_MISC_USER; + else if (perf_guest) + return PERF_RECORD_MISC_GUEST_USER; + else + return PERF_RECORD_MISC_HYPERVISOR; + } +} + static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u64 address, size_t size, u8 *buffer) { @@ -258,10 +279,7 @@ static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u64 address, return -1; machine = etmq->etm->machine; - if (address >= etmq->etm->kernel_start) - cpumode = PERF_RECORD_MISC_KERNEL; - else - cpumode = PERF_RECORD_MISC_USER; + cpumode = cs_etm__cpu_mode(etmq, address); thread = etmq->thread; if (!thread) { @@ -653,7 +671,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, struct perf_sample sample = {.ip = 0,}; event->sample.header.type = PERF_RECORD_SAMPLE; - event->sample.header.misc = PERF_RECORD_MISC_USER; + event->sample.header.misc = cs_etm__cpu_mode(etmq, addr); event->sample.header.size = sizeof(struct perf_event_header); sample.ip = addr; @@ -665,7 +683,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, sample.cpu = etmq->packet->cpu; sample.flags = 0; sample.insn_len = 1; - sample.cpumode = event->header.misc; + sample.cpumode = event->sample.header.misc; if (etm->synth_opts.last_branch) { cs_etm__copy_last_branch_rb(etmq); @@ -706,12 +724,15 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq) u64 nr; struct branch_entry entries; } dummy_bs; + u64 ip; + + ip = cs_etm__last_executed_instr(etmq->prev_packet); event->sample.header.type = PERF_RECORD_SAMPLE; - event->sample.header.misc = PERF_RECORD_MISC_USER; + event->sample.header.misc = cs_etm__cpu_mode(etmq, ip); event->sample.header.size = sizeof(struct perf_event_header); - sample.ip = cs_etm__last_executed_instr(etmq->prev_packet); + sample.ip = ip; sample.pid = etmq->pid; sample.tid = etmq->tid; sample.addr = cs_etm__first_executed_instr(etmq->packet); @@ -720,7 +741,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq) sample.period = 1; sample.cpu = etmq->packet->cpu; sample.flags = 0; - sample.cpumode = PERF_RECORD_MISC_USER; + sample.cpumode = event->sample.header.misc; /* * perf report cannot handle events without a branch stack ^ permalink raw reply related [flat|nested] 13+ messages in thread
end of thread, other threads:[~2018-10-31 22:05 UTC | newest] Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-10-30 7:18 [PATCH] perf cs-etm: Correct CPU mode for samples Leo Yan 2018-10-30 7:18 ` Leo Yan 2018-10-30 14:32 ` Arnaldo Carvalho de Melo 2018-10-30 14:32 ` Arnaldo Carvalho de Melo 2018-10-30 15:04 ` leo.yan 2018-10-30 15:04 ` leo.yan at linaro.org 2018-10-30 15:11 ` Arnaldo Carvalho de Melo 2018-10-30 15:11 ` Arnaldo Carvalho de Melo 2018-10-30 15:30 ` Adrian Hunter 2018-10-30 15:30 ` Adrian Hunter 2018-10-30 15:45 ` Arnaldo Carvalho de Melo 2018-10-30 15:45 ` Arnaldo Carvalho de Melo 2018-10-31 22:04 ` [tip:perf/urgent] " tip-bot for Leo Yan
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.