* [RFC] perf arm-spe: Track task context switch for cpu-mode events @ 2021-09-16 0:17 Namhyung Kim 2021-09-16 13:54 ` Leo Yan 0 siblings, 1 reply; 27+ messages in thread From: Namhyung Kim @ 2021-09-16 0:17 UTC (permalink / raw) To: Arnaldo Carvalho de Melo, Jiri Olsa Cc: Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Stephane Eranian, Leo Yan, James Clark, Tan Xiaojun, Adrian Hunter When perf report synthesize events from ARM SPE data, it refers to current cpu, pid and tid in the machine. But there's no place to set them in the ARM SPE decoder. I'm seeing all pid/tid is set to -1 and user symbols are not resolved in the output. # perf record -a -e arm_spe_0/ts_enable=1/ sleep 1 # perf report -q | head 8.77% 8.77% :-1 [kernel.kallsyms] [k] format_decode 7.02% 7.02% :-1 [kernel.kallsyms] [k] seq_printf 7.02% 7.02% :-1 [unknown] [.] 0x0000ffff9f687c34 5.26% 5.26% :-1 [kernel.kallsyms] [k] vsnprintf 3.51% 3.51% :-1 [kernel.kallsyms] [k] string 3.51% 3.51% :-1 [unknown] [.] 0x0000ffff9f66ae20 3.51% 3.51% :-1 [unknown] [.] 0x0000ffff9f670b3c 3.51% 3.51% :-1 [unknown] [.] 0x0000ffff9f67c040 1.75% 1.75% :-1 [kernel.kallsyms] [k] ___cache_free 1.75% 1.75% :-1 [kernel.kallsyms] [k] __count_memcg_events Like Intel PT, add context switch records to track task info. As ARM SPE support was added later than PERF_RECORD_SWITCH_CPU_WIDE, I think we can safely set the attr.context_switch bit and use it. Cc: Leo Yan <leo.yan@linaro.org> Cc: James Clark <james.clark@arm.com> Cc: Tan Xiaojun <tanxiaojun@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> --- tools/perf/arch/arm64/util/arm-spe.c | 6 +++++- tools/perf/util/arm-spe.c | 24 ++++++++++++++++++++++++ 2 files changed, 29 insertions(+), 1 deletion(-) diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c index a4420d4df503..58ba8d15c573 100644 --- a/tools/perf/arch/arm64/util/arm-spe.c +++ b/tools/perf/arch/arm64/util/arm-spe.c @@ -166,8 +166,12 @@ static int arm_spe_recording_options(struct auxtrace_record *itr, tracking_evsel->core.attr.sample_period = 1; /* In per-cpu case, always need the time of mmap events etc */ - if (!perf_cpu_map__empty(cpus)) + if (!perf_cpu_map__empty(cpus)) { evsel__set_sample_bit(tracking_evsel, TIME); + evsel__set_sample_bit(tracking_evsel, CPU); + /* also track task context switch */ + tracking_evsel->core.attr.context_switch = 1; + } return 0; } diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c index 58b7069c5a5f..0acac0431c48 100644 --- a/tools/perf/util/arm-spe.c +++ b/tools/perf/util/arm-spe.c @@ -681,6 +681,25 @@ static int arm_spe_process_timeless_queues(struct arm_spe *spe, pid_t tid, return 0; } +static int arm_spe_context_switch(struct arm_spe *spe, union perf_event *event, + struct perf_sample *sample) +{ + pid_t pid, tid; + int cpu; + + if (!(event->header.misc & PERF_RECORD_MISC_SWITCH_OUT)) + return 0; + + pid = event->context_switch.next_prev_pid; + tid = event->context_switch.next_prev_tid; + cpu = sample->cpu; + + if (tid == -1) + pr_warn("context_switch event has no tid\n"); + + return machine__set_current_tid(spe->machine, cpu, pid, tid); +} + static int arm_spe_process_event(struct perf_session *session, union perf_event *event, struct perf_sample *sample, @@ -718,6 +737,11 @@ static int arm_spe_process_event(struct perf_session *session, } } else if (timestamp) { err = arm_spe_process_queues(spe, timestamp); + if (err) + return err; + + if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE) + err = arm_spe_context_switch(spe, event, sample); } return err; -- 2.33.0.464.g1972c5931b-goog ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events 2021-09-16 0:17 [RFC] perf arm-spe: Track task context switch for cpu-mode events Namhyung Kim @ 2021-09-16 13:54 ` Leo Yan 2021-09-16 21:01 ` Namhyung Kim 0 siblings, 1 reply; 27+ messages in thread From: Leo Yan @ 2021-09-16 13:54 UTC (permalink / raw) To: Namhyung Kim Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Stephane Eranian, James Clark, Tan Xiaojun, Adrian Hunter Hi Namhyung, On Wed, Sep 15, 2021 at 05:17:48PM -0700, Namhyung Kim wrote: > When perf report synthesize events from ARM SPE data, it refers to > current cpu, pid and tid in the machine. But there's no place to set > them in the ARM SPE decoder. I'm seeing all pid/tid is set to -1 and > user symbols are not resolved in the output. > > # perf record -a -e arm_spe_0/ts_enable=1/ sleep 1 > > # perf report -q | head > 8.77% 8.77% :-1 [kernel.kallsyms] [k] format_decode > 7.02% 7.02% :-1 [kernel.kallsyms] [k] seq_printf > 7.02% 7.02% :-1 [unknown] [.] 0x0000ffff9f687c34 > 5.26% 5.26% :-1 [kernel.kallsyms] [k] vsnprintf > 3.51% 3.51% :-1 [kernel.kallsyms] [k] string > 3.51% 3.51% :-1 [unknown] [.] 0x0000ffff9f66ae20 > 3.51% 3.51% :-1 [unknown] [.] 0x0000ffff9f670b3c > 3.51% 3.51% :-1 [unknown] [.] 0x0000ffff9f67c040 > 1.75% 1.75% :-1 [kernel.kallsyms] [k] ___cache_free > 1.75% 1.75% :-1 [kernel.kallsyms] [k] __count_memcg_events > > Like Intel PT, add context switch records to track task info. As ARM > SPE support was added later than PERF_RECORD_SWITCH_CPU_WIDE, I think > we can safely set the attr.context_switch bit and use it. Thanks for the patch. Before we had discussion for enabling PID/TID for SPE samples; in the patch set [1], patches 07, 08 set sample's pid/tid based on the Arm SPE context packets. To enable hardware tracing context ID, you also needs to enable kernel config CONFIG_PID_IN_CONTEXTIDR. At that time, there have a concern is the hardware context ID might introduce confusion for non-root namespace. We also considered to use PERF_RECORD_SWITCH_CPU_WIDE event for setting pid/tid, the Intel PT implementation uses two things to set sample's pid/tid: one is PERF_RECORD_SWITCH_CPU_WIDE event and another is to detect the branch instruction is the symbol "__switch_to". Since the trace event PERF_RECORD_SWITCH_CPU_WIDE is coarse, so it only uses the new pid/tid after the branch instruction for "__switch_to". Arm SPE is 'statistical', thus it cannot promise the trace data must contain the branch instruction for "__switch_to", please see details [2]. I think the feasible way is to use CONTEXTIDR to trace PID/TID _only_ for root namespace, and the perf tool uses context packet to set pid/tid for samples. So except we need patches 07 and 08, we also need a change in Arm SPE driver as below: diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c index d44bcc29d99c..2553d53d3772 100644 --- a/drivers/perf/arm_spe_pmu.c +++ b/drivers/perf/arm_spe_pmu.c @@ -272,7 +272,9 @@ static u64 arm_spe_event_to_pmscr(struct perf_event *event) if (!attr->exclude_kernel) reg |= BIT(SYS_PMSCR_EL1_E1SPE_SHIFT); - if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable()) + /* Only enable context ID tracing for root namespace */ + if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable() && + (task_active_pid_ns(current) == &init_pid_ns)) reg |= BIT(SYS_PMSCR_EL1_CX_SHIFT); return reg; Could you confirm if this works for you? If it's okay for you, I will sync with James for upstreaming the changes. Thanks, Leo [1] https://lore.kernel.org/lkml/20210119144658.793-8-james.clark@arm.com/ [2] https://lore.kernel.org/lkml/20210204102734.GA4737@leoy-ThinkPad-X240s/ ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events 2021-09-16 13:54 ` Leo Yan @ 2021-09-16 21:01 ` Namhyung Kim 2021-09-23 14:23 ` Leo Yan 0 siblings, 1 reply; 27+ messages in thread From: Namhyung Kim @ 2021-09-16 21:01 UTC (permalink / raw) To: Leo Yan Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Stephane Eranian, James Clark, Adrian Hunter Hi Leo, (Removing Tan as the email bounced) On Thu, Sep 16, 2021 at 6:54 AM Leo Yan <leo.yan@linaro.org> wrote: > > Hi Namhyung, > > On Wed, Sep 15, 2021 at 05:17:48PM -0700, Namhyung Kim wrote: > > When perf report synthesize events from ARM SPE data, it refers to > > current cpu, pid and tid in the machine. But there's no place to set > > them in the ARM SPE decoder. I'm seeing all pid/tid is set to -1 and > > user symbols are not resolved in the output. > > > > # perf record -a -e arm_spe_0/ts_enable=1/ sleep 1 > > > > # perf report -q | head > > 8.77% 8.77% :-1 [kernel.kallsyms] [k] format_decode > > 7.02% 7.02% :-1 [kernel.kallsyms] [k] seq_printf > > 7.02% 7.02% :-1 [unknown] [.] 0x0000ffff9f687c34 > > 5.26% 5.26% :-1 [kernel.kallsyms] [k] vsnprintf > > 3.51% 3.51% :-1 [kernel.kallsyms] [k] string > > 3.51% 3.51% :-1 [unknown] [.] 0x0000ffff9f66ae20 > > 3.51% 3.51% :-1 [unknown] [.] 0x0000ffff9f670b3c > > 3.51% 3.51% :-1 [unknown] [.] 0x0000ffff9f67c040 > > 1.75% 1.75% :-1 [kernel.kallsyms] [k] ___cache_free > > 1.75% 1.75% :-1 [kernel.kallsyms] [k] __count_memcg_events > > > > Like Intel PT, add context switch records to track task info. As ARM > > SPE support was added later than PERF_RECORD_SWITCH_CPU_WIDE, I think > > we can safely set the attr.context_switch bit and use it. > > Thanks for the patch. > > Before we had discussion for enabling PID/TID for SPE samples; in the patch > set [1], patches 07, 08 set sample's pid/tid based on the Arm SPE context > packets. To enable hardware tracing context ID, you also needs to enable > kernel config CONFIG_PID_IN_CONTEXTIDR. Thanks for sharing this. Yeah I also look at the context info but having a dependency on a kconfig looks limiting its functionality. Also the kconfig says it has some overhead in the critical path (even if perf is not running, right?) - but not sure how much it can add. config PID_IN_CONTEXTIDR bool "Write the current PID to the CONTEXTIDR register" help Enabling this option causes the kernel to write the current PID to the CONTEXTIDR register, at the expense of some additional instructions during context switch. Say Y here only if you are planning to use hardware trace tools with this kernel. > > At that time, there have a concern is the hardware context ID might > introduce confusion for non-root namespace. Sounds like a problem. > > We also considered to use PERF_RECORD_SWITCH_CPU_WIDE event for setting > pid/tid, the Intel PT implementation uses two things to set sample's > pid/tid: one is PERF_RECORD_SWITCH_CPU_WIDE event and another is to detect > the branch instruction is the symbol "__switch_to". Since the trace > event PERF_RECORD_SWITCH_CPU_WIDE is coarse, so it only uses the new > pid/tid after the branch instruction for "__switch_to". Arm SPE is > 'statistical', thus it cannot promise the trace data must contain the > branch instruction for "__switch_to", please see details [2]. I can see the need in the Intel PT as it needs to trace all (branch) instructions, but is it really needed for ARM SPE too? Maybe I am missing something, but it seems enough to have a coarse-grained context switch for sampling events.. > > I think the feasible way is to use CONTEXTIDR to trace PID/TID _only_ > for root namespace, and the perf tool uses context packet to set > pid/tid for samples. So except we need patches 07 and 08, we also > need a change in Arm SPE driver as below: > > diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c > index d44bcc29d99c..2553d53d3772 100644 > --- a/drivers/perf/arm_spe_pmu.c > +++ b/drivers/perf/arm_spe_pmu.c > @@ -272,7 +272,9 @@ static u64 arm_spe_event_to_pmscr(struct perf_event *event) > if (!attr->exclude_kernel) > reg |= BIT(SYS_PMSCR_EL1_E1SPE_SHIFT); > > - if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable()) > + /* Only enable context ID tracing for root namespace */ > + if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable() && > + (task_active_pid_ns(current) == &init_pid_ns)) > reg |= BIT(SYS_PMSCR_EL1_CX_SHIFT); > > return reg; > > Could you confirm if this works for you? If it's okay for you, I will > sync with James for upstreaming the changes. Let me think about this more.. Thanks, Namhyung > > Thanks, > Leo > > [1] https://lore.kernel.org/lkml/20210119144658.793-8-james.clark@arm.com/ > [2] https://lore.kernel.org/lkml/20210204102734.GA4737@leoy-ThinkPad-X240s/ ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events 2021-09-16 21:01 ` Namhyung Kim @ 2021-09-23 14:23 ` Leo Yan 2021-09-23 16:01 ` Namhyung Kim 2021-09-30 15:08 ` James Clark 0 siblings, 2 replies; 27+ messages in thread From: Leo Yan @ 2021-09-23 14:23 UTC (permalink / raw) To: Namhyung Kim Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Stephane Eranian, James Clark, Adrian Hunter Hi Namhyung, On Thu, Sep 16, 2021 at 02:01:21PM -0700, Namhyung Kim wrote: [...] > > Before we had discussion for enabling PID/TID for SPE samples; in the patch > > set [1], patches 07, 08 set sample's pid/tid based on the Arm SPE context > > packets. To enable hardware tracing context ID, you also needs to enable > > kernel config CONFIG_PID_IN_CONTEXTIDR. > > Thanks for sharing this. > > Yeah I also look at the context info but having a dependency on a kconfig > looks limiting its functionality. Also the kconfig says it has some overhead > in the critical path (even if perf is not running, right?) - but not sure how > much it can add. Yes, after enabled config PID_IN_CONTEXTIDR, the kernel will always write PID into the system register CONTEXTIDR during process context switching. Please see the flow: __switch_to() (arch/arm64/kernel/process.c) `-> contextidr_thread_switch(next) > config PID_IN_CONTEXTIDR > bool "Write the current PID to the CONTEXTIDR register" > help > Enabling this option causes the kernel to write the current PID to > the CONTEXTIDR register, at the expense of some additional > instructions during context switch. Say Y here only if you are > planning to use hardware trace tools with this kernel. > > > > > At that time, there have a concern is the hardware context ID might > > introduce confusion for non-root namespace. > > Sounds like a problem. > > > > > We also considered to use PERF_RECORD_SWITCH_CPU_WIDE event for setting > > pid/tid, the Intel PT implementation uses two things to set sample's > > pid/tid: one is PERF_RECORD_SWITCH_CPU_WIDE event and another is to detect > > the branch instruction is the symbol "__switch_to". Since the trace > > event PERF_RECORD_SWITCH_CPU_WIDE is coarse, so it only uses the new > > pid/tid after the branch instruction for "__switch_to". Arm SPE is > > 'statistical', thus it cannot promise the trace data must contain the > > branch instruction for "__switch_to", please see details [2]. > > I can see the need in the Intel PT as it needs to trace all (branch) > instructions, but is it really needed for ARM SPE too? > Maybe I am missing something, but it seems enough to have a > coarse-grained context switch for sampling events.. The issue is that the coarse-grained context switch if introduces any inaccuracy in the reported result. If we can run some workloads and prove the coarse-grained context switch doesn't cause significant bias, it will be great and can give us the confidence for this approach. Even enabling PERF_RECORD_SWITCH_CPU_WIDE event, I think it's good to give priority for hardware PID tracing in Arm SPE trace data, if detects the hardware PID tracing is enabled, then we can rollback to use context packets from hardware trace data to set sample's PID. How about you think for this? Thanks, Leo ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events 2021-09-23 14:23 ` Leo Yan @ 2021-09-23 16:01 ` Namhyung Kim 2021-09-30 18:47 ` Stephane Eranian 2021-09-30 15:08 ` James Clark 1 sibling, 1 reply; 27+ messages in thread From: Namhyung Kim @ 2021-09-23 16:01 UTC (permalink / raw) To: Leo Yan Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Stephane Eranian, James Clark, Adrian Hunter Hi Leo, On Thu, Sep 23, 2021 at 7:23 AM Leo Yan <leo.yan@linaro.org> wrote: > > Hi Namhyung, > > On Thu, Sep 16, 2021 at 02:01:21PM -0700, Namhyung Kim wrote: > > [...] > > > > Before we had discussion for enabling PID/TID for SPE samples; in the patch > > > set [1], patches 07, 08 set sample's pid/tid based on the Arm SPE context > > > packets. To enable hardware tracing context ID, you also needs to enable > > > kernel config CONFIG_PID_IN_CONTEXTIDR. > > > > Thanks for sharing this. > > > > Yeah I also look at the context info but having a dependency on a kconfig > > looks limiting its functionality. Also the kconfig says it has some overhead > > in the critical path (even if perf is not running, right?) - but not sure how > > much it can add. > > Yes, after enabled config PID_IN_CONTEXTIDR, the kernel will always > write PID into the system register CONTEXTIDR during process context > switching. Please see the flow: > > __switch_to() (arch/arm64/kernel/process.c) > `-> contextidr_thread_switch(next) Thanks for the info. I assume it's a light-weight operation. > > > We also considered to use PERF_RECORD_SWITCH_CPU_WIDE event for setting > > > pid/tid, the Intel PT implementation uses two things to set sample's > > > pid/tid: one is PERF_RECORD_SWITCH_CPU_WIDE event and another is to detect > > > the branch instruction is the symbol "__switch_to". Since the trace > > > event PERF_RECORD_SWITCH_CPU_WIDE is coarse, so it only uses the new > > > pid/tid after the branch instruction for "__switch_to". Arm SPE is > > > 'statistical', thus it cannot promise the trace data must contain the > > > branch instruction for "__switch_to", please see details [2]. > > > > I can see the need in the Intel PT as it needs to trace all (branch) > > instructions, but is it really needed for ARM SPE too? > > Maybe I am missing something, but it seems enough to have a > > coarse-grained context switch for sampling events.. > > The issue is that the coarse-grained context switch if introduces any > inaccuracy in the reported result. If we can run some workloads and > prove the coarse-grained context switch doesn't cause significant bias, > it will be great and can give us the confidence for this approach. > > Even enabling PERF_RECORD_SWITCH_CPU_WIDE event, I think it's good to > give priority for hardware PID tracing in Arm SPE trace data, if detects > the hardware PID tracing is enabled, then we can rollback to use > context packets from hardware trace data to set sample's PID. > > How about you think for this? I think it's good as long as it has a fallback when the context info is not available. Thanks, Namhyung ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events 2021-09-23 16:01 ` Namhyung Kim @ 2021-09-30 18:47 ` Stephane Eranian 2021-10-01 10:44 ` James Clark 0 siblings, 1 reply; 27+ messages in thread From: Stephane Eranian @ 2021-09-30 18:47 UTC (permalink / raw) To: Namhyung Kim Cc: Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, James Clark, Adrian Hunter On Thu, Sep 23, 2021 at 9:02 AM Namhyung Kim <namhyung@kernel.org> wrote: > > Hi Leo, > > On Thu, Sep 23, 2021 at 7:23 AM Leo Yan <leo.yan@linaro.org> wrote: > > > > Hi Namhyung, > > > > On Thu, Sep 16, 2021 at 02:01:21PM -0700, Namhyung Kim wrote: > > > > [...] > > > > > > Before we had discussion for enabling PID/TID for SPE samples; in the patch > > > > set [1], patches 07, 08 set sample's pid/tid based on the Arm SPE context > > > > packets. To enable hardware tracing context ID, you also needs to enable > > > > kernel config CONFIG_PID_IN_CONTEXTIDR. > > > > > > Thanks for sharing this. > > > > > > Yeah I also look at the context info but having a dependency on a kconfig > > > looks limiting its functionality. Also the kconfig says it has some overhead > > > in the critical path (even if perf is not running, right?) - but not sure how > > > much it can add. > > > > Yes, after enabled config PID_IN_CONTEXTIDR, the kernel will always > > write PID into the system register CONTEXTIDR during process context > > switching. Please see the flow: > > > > __switch_to() (arch/arm64/kernel/process.c) > > `-> contextidr_thread_switch(next) > > Thanks for the info. I assume it's a light-weight operation. > > I'd like to understand why it was believed that having SPE record to PID could be too expensive vs. what I am seeing with all the tracking of context switches and the volume of data this generates. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events 2021-09-30 18:47 ` Stephane Eranian @ 2021-10-01 10:44 ` James Clark 2021-10-01 18:22 ` Stephane Eranian 0 siblings, 1 reply; 27+ messages in thread From: James Clark @ 2021-10-01 10:44 UTC (permalink / raw) To: Stephane Eranian, Namhyung Kim Cc: Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Adrian Hunter On 30/09/2021 19:47, Stephane Eranian wrote: > On Thu, Sep 23, 2021 at 9:02 AM Namhyung Kim <namhyung@kernel.org> wrote: >> >> Hi Leo, >> >> On Thu, Sep 23, 2021 at 7:23 AM Leo Yan <leo.yan@linaro.org> wrote: >>> >>> Hi Namhyung, >>> >>> On Thu, Sep 16, 2021 at 02:01:21PM -0700, Namhyung Kim wrote: >>> >>> [...] >>> >>>>> Before we had discussion for enabling PID/TID for SPE samples; in the patch >>>>> set [1], patches 07, 08 set sample's pid/tid based on the Arm SPE context >>>>> packets. To enable hardware tracing context ID, you also needs to enable >>>>> kernel config CONFIG_PID_IN_CONTEXTIDR. >>>> >>>> Thanks for sharing this. >>>> >>>> Yeah I also look at the context info but having a dependency on a kconfig >>>> looks limiting its functionality. Also the kconfig says it has some overhead >>>> in the critical path (even if perf is not running, right?) - but not sure how >>>> much it can add. >>> >>> Yes, after enabled config PID_IN_CONTEXTIDR, the kernel will always >>> write PID into the system register CONTEXTIDR during process context >>> switching. Please see the flow: >>> >>> __switch_to() (arch/arm64/kernel/process.c) >>> `-> contextidr_thread_switch(next) >> >> Thanks for the info. I assume it's a light-weight operation. >> >> > I'd like to understand why it was believed that having SPE record to > PID could be too expensive > vs. what I am seeing with all the tracking of context switches and the > volume of data this generates. > I think the justification about it being expensive is that when PID_IN_CONTEXTIDR is set, there is an extra few instructions to write that value on every context switch, whether SPE is enabled or not. So it has a system wide impact. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events 2021-10-01 10:44 ` James Clark @ 2021-10-01 18:22 ` Stephane Eranian 2021-10-04 15:19 ` Leo Yan 0 siblings, 1 reply; 27+ messages in thread From: Stephane Eranian @ 2021-10-01 18:22 UTC (permalink / raw) To: James Clark Cc: Namhyung Kim, Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Adrian Hunter On Fri, Oct 1, 2021 at 3:44 AM James Clark <james.clark@arm.com> wrote: > > > > On 30/09/2021 19:47, Stephane Eranian wrote: > > On Thu, Sep 23, 2021 at 9:02 AM Namhyung Kim <namhyung@kernel.org> wrote: > >> > >> Hi Leo, > >> > >> On Thu, Sep 23, 2021 at 7:23 AM Leo Yan <leo.yan@linaro.org> wrote: > >>> > >>> Hi Namhyung, > >>> > >>> On Thu, Sep 16, 2021 at 02:01:21PM -0700, Namhyung Kim wrote: > >>> > >>> [...] > >>> > >>>>> Before we had discussion for enabling PID/TID for SPE samples; in the patch > >>>>> set [1], patches 07, 08 set sample's pid/tid based on the Arm SPE context > >>>>> packets. To enable hardware tracing context ID, you also needs to enable > >>>>> kernel config CONFIG_PID_IN_CONTEXTIDR. > >>>> > >>>> Thanks for sharing this. > >>>> > >>>> Yeah I also look at the context info but having a dependency on a kconfig > >>>> looks limiting its functionality. Also the kconfig says it has some overhead > >>>> in the critical path (even if perf is not running, right?) - but not sure how > >>>> much it can add. > >>> > >>> Yes, after enabled config PID_IN_CONTEXTIDR, the kernel will always > >>> write PID into the system register CONTEXTIDR during process context > >>> switching. Please see the flow: > >>> > >>> __switch_to() (arch/arm64/kernel/process.c) > >>> `-> contextidr_thread_switch(next) > >> > >> Thanks for the info. I assume it's a light-weight operation. > >> > >> > > I'd like to understand why it was believed that having SPE record to > > PID could be too expensive > > vs. what I am seeing with all the tracking of context switches and the > > volume of data this generates. > > > > I think the justification about it being expensive is that when PID_IN_CONTEXTIDR > is set, there is an extra few instructions to write that value on every context > switch, whether SPE is enabled or not. So it has a system wide impact. You could use a static key to make this conditional to having SPE running on the CPU like it is done for other PMU features on other architectures. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events 2021-10-01 18:22 ` Stephane Eranian @ 2021-10-04 15:19 ` Leo Yan 0 siblings, 0 replies; 27+ messages in thread From: Leo Yan @ 2021-10-04 15:19 UTC (permalink / raw) To: Stephane Eranian, Will Deacon, Catalin Marinas Cc: James Clark, Namhyung Kim, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Adrian Hunter Hi all, On Fri, Oct 01, 2021 at 11:22:33AM -0700, Stephane Eranian wrote: > On Fri, Oct 1, 2021 at 3:44 AM James Clark <james.clark@arm.com> wrote: > > On 30/09/2021 19:47, Stephane Eranian wrote: > > > On Thu, Sep 23, 2021 at 9:02 AM Namhyung Kim <namhyung@kernel.org> wrote: > > >> On Thu, Sep 23, 2021 at 7:23 AM Leo Yan <leo.yan@linaro.org> wrote: [...] > > >>> Yes, after enabled config PID_IN_CONTEXTIDR, the kernel will always > > >>> write PID into the system register CONTEXTIDR during process context > > >>> switching. Please see the flow: > > >>> > > >>> __switch_to() (arch/arm64/kernel/process.c) > > >>> `-> contextidr_thread_switch(next) > > >> > > >> Thanks for the info. I assume it's a light-weight operation. > > >> > > > I'd like to understand why it was believed that having SPE record to > > > PID could be too expensive > > > vs. what I am seeing with all the tracking of context switches and the > > > volume of data this generates. > > > > I think the justification about it being expensive is that when PID_IN_CONTEXTIDR > > is set, there is an extra few instructions to write that value on every context > > switch, whether SPE is enabled or not. So it has a system wide impact. > > You could use a static key to make this conditional to having SPE > running on the CPU like > it is done for other PMU features on other architectures. Good point for using static key to dynamically enable and disable CONTEXTIDR rather than using config PID_IN_CONTEXTIDR. For curious, I did a rough testing to compare the performance for using CONTEXTIDR, there have four testing configurations: - 'dis': Use the mainline kernel with disabling PID_IN_CONTEXTIDR. - 'enb': Use the mainline kernel with enablng PID_IN_CONTEXTIDR. - 'true': Use static key='TRUE' to store PID into CONTEXTIDR. - 'false': Use static key='FALSE' so don't store PID into CONTEXTIDR. I used the command 'perf bench sched messaging -t -g 20 -l 1000' as the testing case, the main reason is this case have total 800 threads for sending and receiving messages, so it should have huge times for context switching. The testing iterates for 5 loops for every configuration, and get the result for run time (in seconds): dis enb true false ------------------------------ #1 26.568 26.786 26.056 26.197 #2 26.442 26.457 26.458 26.846 #3 26.719 26.701 27.119 26.281 #4 26.448 27.595 26.953 27.043 #5 27.017 27.263 26.638 26.933 ------------------------------ avg. 26.638 26.960 26.644 26.66 ------------------------------ delta pct. 1.21% 0.02% 0.08% Compared with the base configuration ('dis' with disabling PID_IN_CONTEXTIDR), we can see the performance is only minor downgradation in other configurations ('enb': 1.21%, Enable static key: 0.02%, Disable static key: 0.08%). So seems to me, it's feasible to use static key to dynamically control the path for PID_IN_CONTEXTIDR. @Catalin, @Will, could you confirm if it is the right direction to use static key to replace PID_IN_CONTEXTIDR? Or have any concern for using the static key, like any secuirty reason? Also pasted the code for using static key for CONTEXTIDR in the bottom. Thanks, Leo ---8<--- diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h index f4ba93d4ffeb..857e35a8624a 100644 --- a/arch/arm64/include/asm/mmu_context.h +++ b/arch/arm64/include/asm/mmu_context.h @@ -26,9 +26,11 @@ extern bool rodata_full; +DECLARE_STATIC_KEY_FALSE(contextidr_used); + static inline void contextidr_thread_switch(struct task_struct *next) { - if (!IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR)) + if (!static_branch_unlikely(&contextidr_used)) return; write_sysreg(task_pid_nr(next), contextidr_el1); diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index 40adb8cdbf5a..f6b6e73fac9d 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -487,6 +487,22 @@ void update_sctlr_el1(u64 sctlr) isb(); } +DEFINE_STATIC_KEY_FALSE(contextidr_used); + +static int __init contextidr_enable(char *str) +{ + unsigned int val = 0; + + get_option(&str, &val); + + if (val) + static_branch_enable(&contextidr_used); + + return 0; +} +early_param("contextidr_enable", contextidr_enable); + /* * Thread switching. */ ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events 2021-09-23 14:23 ` Leo Yan 2021-09-23 16:01 ` Namhyung Kim @ 2021-09-30 15:08 ` James Clark 2021-10-04 6:26 ` Leo Yan 1 sibling, 1 reply; 27+ messages in thread From: James Clark @ 2021-09-30 15:08 UTC (permalink / raw) To: Leo Yan, Namhyung Kim Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Stephane Eranian, Adrian Hunter, german.gomez On 23/09/2021 15:23, Leo Yan wrote: > Hi Namhyung, > > On Thu, Sep 16, 2021 at 02:01:21PM -0700, Namhyung Kim wrote: > > [...] > >>> Before we had discussion for enabling PID/TID for SPE samples; in the patch >>> set [1], patches 07, 08 set sample's pid/tid based on the Arm SPE context >>> packets. To enable hardware tracing context ID, you also needs to enable >>> kernel config CONFIG_PID_IN_CONTEXTIDR. >> >> Thanks for sharing this. >> >> Yeah I also look at the context info but having a dependency on a kconfig >> looks limiting its functionality. Also the kconfig says it has some overhead >> in the critical path (even if perf is not running, right?) - but not sure how >> much it can add. > > Yes, after enabled config PID_IN_CONTEXTIDR, the kernel will always > write PID into the system register CONTEXTIDR during process context > switching. Please see the flow: > > __switch_to() (arch/arm64/kernel/process.c) > `-> contextidr_thread_switch(next) > >> config PID_IN_CONTEXTIDR >> bool "Write the current PID to the CONTEXTIDR register" >> help >> Enabling this option causes the kernel to write the current PID to >> the CONTEXTIDR register, at the expense of some additional >> instructions during context switch. Say Y here only if you are >> planning to use hardware trace tools with this kernel. >> >>> >>> At that time, there have a concern is the hardware context ID might >>> introduce confusion for non-root namespace. >> >> Sounds like a problem. >> >>> >>> We also considered to use PERF_RECORD_SWITCH_CPU_WIDE event for setting >>> pid/tid, the Intel PT implementation uses two things to set sample's >>> pid/tid: one is PERF_RECORD_SWITCH_CPU_WIDE event and another is to detect >>> the branch instruction is the symbol "__switch_to". Since the trace >>> event PERF_RECORD_SWITCH_CPU_WIDE is coarse, so it only uses the new >>> pid/tid after the branch instruction for "__switch_to". Arm SPE is >>> 'statistical', thus it cannot promise the trace data must contain the >>> branch instruction for "__switch_to", please see details [2]. >> >> I can see the need in the Intel PT as it needs to trace all (branch) >> instructions, but is it really needed for ARM SPE too? >> Maybe I am missing something, but it seems enough to have a >> coarse-grained context switch for sampling events.. > > The issue is that the coarse-grained context switch if introduces any > inaccuracy in the reported result. If we can run some workloads and > prove the coarse-grained context switch doesn't cause significant bias, > it will be great and can give us the confidence for this approach. It sounds like it's worth testing. Do you think the inaccuracy would only apply to code in the kernel around the time of the switch? Or do you think it could affect userspace as well? It seems to me that the switch event would have a timestamp that would precede _all_ userspace code, but I'm not 100% sure on that. I suppose it's easy to test. German Gomez actually starting looking into the switch events for SPE at the same time, so I've CCd him and maybe he can do some testing of the patch. Thanks James > > Even enabling PERF_RECORD_SWITCH_CPU_WIDE event, I think it's good to > give priority for hardware PID tracing in Arm SPE trace data, if detects > the hardware PID tracing is enabled, then we can rollback to use > context packets from hardware trace data to set sample's PID. > > How about you think for this? > > Thanks, > Leo > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events 2021-09-30 15:08 ` James Clark @ 2021-10-04 6:26 ` Leo Yan 2021-10-05 10:06 ` German Gomez 0 siblings, 1 reply; 27+ messages in thread From: Leo Yan @ 2021-10-04 6:26 UTC (permalink / raw) To: James Clark Cc: Namhyung Kim, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Stephane Eranian, Adrian Hunter, german.gomez Hi James, On Thu, Sep 30, 2021 at 04:08:52PM +0100, James Clark wrote: > On 23/09/2021 15:23, Leo Yan wrote: > > On Thu, Sep 16, 2021 at 02:01:21PM -0700, Namhyung Kim wrote: [...] > >>> We also considered to use PERF_RECORD_SWITCH_CPU_WIDE event for setting > >>> pid/tid, the Intel PT implementation uses two things to set sample's > >>> pid/tid: one is PERF_RECORD_SWITCH_CPU_WIDE event and another is to detect > >>> the branch instruction is the symbol "__switch_to". Since the trace > >>> event PERF_RECORD_SWITCH_CPU_WIDE is coarse, so it only uses the new > >>> pid/tid after the branch instruction for "__switch_to". Arm SPE is > >>> 'statistical', thus it cannot promise the trace data must contain the > >>> branch instruction for "__switch_to", please see details [2]. > >> > >> I can see the need in the Intel PT as it needs to trace all (branch) > >> instructions, but is it really needed for ARM SPE too? > >> Maybe I am missing something, but it seems enough to have a > >> coarse-grained context switch for sampling events.. > > > > The issue is that the coarse-grained context switch if introduces any > > inaccuracy in the reported result. If we can run some workloads and > > prove the coarse-grained context switch doesn't cause significant bias, > > it will be great and can give us the confidence for this approach. > > It sounds like it's worth testing. Do you think the inaccuracy would only > apply to code in the kernel around the time of the switch? Or do you think > it could affect userspace as well? The inaccuracy should only apply to the kernel code. There would be some samples will be wrongly accounted for the next task between the function prepare_task_switch() and switch_to(). > It seems to me that the switch event > would have a timestamp that would precede _all_ userspace code, but I'm not > 100% sure on that. Yes, the switch event is generated in the scheduler which precede exiting to userspace: __schedule() `> context_switch() `> prepare_task_switch() `> perf_event_task_sched_out() > I suppose it's easy to test. I'd like to use the comparison method for the test: We should enable PID tracing and capture in the perf.data, when decode the trace data, we can based on context packet and based on the switch events to generate out two results, so we can check how the difference between these results. > German Gomez actually starting looking into the switch events for SPE at the > same time, so I've CCd him and maybe he can do some testing of the patch. Cool! German is welcome to continue the related work; since I am in holiday this week, I will try this as well, if I have any conclusion will get back to you guys. If the test result shows good enough, I personally think we need finish below items: - Enable PID tracing and decode with context packets; - Provide interface to user space so perf tool knows if should use hardware PID or rollback to context switch events; - Merge Namhyung's patch for using switch events for samples. Thanks, Leo ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events 2021-10-04 6:26 ` Leo Yan @ 2021-10-05 10:06 ` German Gomez 2021-10-06 9:36 ` Leo Yan 2021-10-06 14:06 ` James Clark 0 siblings, 2 replies; 27+ messages in thread From: German Gomez @ 2021-10-05 10:06 UTC (permalink / raw) To: Leo Yan, James Clark Cc: Namhyung Kim, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Stephane Eranian, Adrian Hunter Hi Leo, On 04/10/2021 07:26, Leo Yan wrote: > Hi James, > > On Thu, Sep 30, 2021 at 04:08:52PM +0100, James Clark wrote: >> On 23/09/2021 15:23, Leo Yan wrote: >>> On Thu, Sep 16, 2021 at 02:01:21PM -0700, Namhyung Kim wrote: > [...] > I'd like to use the comparison method for the test: > We should enable PID tracing and capture in the perf.data, when decode > the trace data, we can based on context packet and based on the switch > events to generate out two results, so we can check how the difference > between these results. Yesterday we did some testing and found that there seems to be an exact match between using context packets and switch events. However this only applies when tracing in userspace (by adding the 'u' suffix to the perf event). Otherwise we still see as much as 2% of events having the wrong PID around the time of the switch. In order to measure this I applied Namhyung's patch and James's patch from [1]. Then added a printf line to the function arm_spe_prep_sample where I have access to both PID values, as a quick way to compare them later in a perf-report run. This is the diff of the printf patch: diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c index 41385ab96fbc..591985c66ac4 100644 --- a/tools/perf/util/arm-spe.c +++ b/tools/perf/util/arm-spe.c @@ -247,6 +247,8 @@ static void arm_spe_prep_sample(struct arm_spe *spe, event->sample.header.type = PERF_RECORD_SAMPLE; event->sample.header.misc = sample->cpumode; event->sample.header.size = sizeof(struct perf_event_header); + + printf(">>>>>> %d / %lu\n", speq->tid, record->context_id & 0x7fff); } The differences obtained as error % were obtained by running the following perf-record commands for different configurations: $ sudo ./perf record -e arm_spe/ts_enable=1,load_filter=1,store_filter=1/u -a -- sleep 60 $ sudo ./perf report --stdio \ | grep ">>>>>>" \ | awk '{total++; if($2!=$4) miss++} END {print "Error: " (100*miss/total) "% out of " total " samples"}' Error: 0% out of 11839328 samples $ sudo ./perf record -e arm_spe/ts_enable=1,load_filter=1,store_filter=1/ -a -- sleep 10 $ sudo ./perf report --stdio \ | grep ">>>>>>" \ | awk '{total++; if($2!=$4) miss++} END {print "Error: " (100*miss/total) "% out of " total " samples"}' Error: 1.30624% out of 3418731 samples >> German Gomez actually starting looking into the switch events for SPE at the >> same time, so I've CCd him and maybe he can do some testing of the patch. > Cool! German is welcome to continue the related work; since I am in > holiday this week, I will try this as well, if I have any conclusion > will get back to you guys. > > If the test result shows good enough, I personally think we need finish > below items: > > - Enable PID tracing and decode with context packets; > - Provide interface to user space so perf tool knows if should use > hardware PID or rollback to context switch events; > - Merge Namhyung's patch for using switch events for samples. > > Thanks, > Leo I think the fallback to using switch when we can't use the CONTEXTIDR register is a viable option for userspace events, but maybe not so much for non-userspace. Thanks, German [1] https://www.spinics.net/lists/linux-perf-users/msg12543.html ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events 2021-10-05 10:06 ` German Gomez @ 2021-10-06 9:36 ` Leo Yan 2021-10-06 16:09 ` Namhyung Kim 2021-10-06 14:06 ` James Clark 1 sibling, 1 reply; 27+ messages in thread From: Leo Yan @ 2021-10-06 9:36 UTC (permalink / raw) To: German Gomez Cc: James Clark, Namhyung Kim, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Stephane Eranian, Adrian Hunter Hi German, On Tue, Oct 05, 2021 at 11:06:12AM +0100, German Gomez wrote: [...] > Yesterday we did some testing and found that there seems to be an exact > match between using context packets and switch events. However this only > applies when tracing in userspace (by adding the 'u' suffix to the perf > event). Otherwise we still see as much as 2% of events having the wrong > PID around the time of the switch. This result sounds reasonable for me, if we only trace the userspace, the result should have no any difference between using context packets and switch events. It's a bit high deviation with switch events (1.30% as shown in below result) after enable kernel tracing. > In order to measure this I applied Namhyung's patch and James's patch > from [1]. Then added a printf line to the function arm_spe_prep_sample > where I have access to both PID values, as a quick way to compare them > later in a perf-report run. This is the diff of the printf patch: > > diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c > index 41385ab96fbc..591985c66ac4 100644 > --- a/tools/perf/util/arm-spe.c > +++ b/tools/perf/util/arm-spe.c > @@ -247,6 +247,8 @@ static void arm_spe_prep_sample(struct arm_spe *spe, > event->sample.header.type = PERF_RECORD_SAMPLE; > event->sample.header.misc = sample->cpumode; > event->sample.header.size = sizeof(struct perf_event_header); > + > + printf(">>>>>> %d / %lu\n", speq->tid, record->context_id & 0x7fff); > } > > The differences obtained as error % were obtained by running the > following perf-record commands for different configurations: > > $ sudo ./perf record -e arm_spe/ts_enable=1,load_filter=1,store_filter=1/u -a -- sleep 60 > $ sudo ./perf report --stdio \ > | grep ">>>>>>" \ > | awk '{total++; if($2!=$4) miss++} END {print "Error: " (100*miss/total) "% out of " total " samples"}' > > Error: 0% out of 11839328 samples > > $ sudo ./perf record -e arm_spe/ts_enable=1,load_filter=1,store_filter=1/ -a -- sleep 10 > $ sudo ./perf report --stdio \ > | grep ">>>>>>" \ > | awk '{total++; if($2!=$4) miss++} END {print "Error: " (100*miss/total) "% out of " total " samples"}' > > Error: 1.30624% out of 3418731 samples Thanks for sharing this! > I think the fallback to using switch when we can't use the CONTEXTIDR > register is a viable option for userspace events, but maybe not so much > for non-userspace. Agreed. If so, it's good to check the variable 'evsel->core.attr.exclude_kernel' when decode Arm SPE trace data, and only use context switch event when 'exclude_kernel' is set. Here should note one thing is the perf tool needs to have knowledge to decide if the bit 3 'CX' (macro 'SYS_PMSCR_EL1_CX_SHIFT' in kernel) has been set in register PMSCR or not. AFAIK, Arm SPE driver doesn't expose any interface (or config) to userspace for the context tracing, so one method is to add an extra config in Arm SPE driver for this bit, e.g. 'ATTR_CFG_FLD_cx_enable_CFG' can be added in Arm SPE driver. Alternatively, rather than adding new config, I am just wandering we simply use two flags in perf's decoding: 'use_switch_event_for_pid' and 'use_ctx_packet_for_pid', the first variable will be set if detects the tracing is userspace only, the second varaible will be set when detects the hardware tracing containing context packet. So if the variable 'use_ctx_packet_for_pid' has been set, then the decoder will always use context packet for sample's PID, otherwise, it falls back to check 'use_switch_event_for_pid' and set sample PID based on switch events. If have any other idea, please feel free bring up. Thanks, Leo ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events 2021-10-06 9:36 ` Leo Yan @ 2021-10-06 16:09 ` Namhyung Kim 2021-10-08 11:07 ` German Gomez 0 siblings, 1 reply; 27+ messages in thread From: Namhyung Kim @ 2021-10-06 16:09 UTC (permalink / raw) To: Leo Yan Cc: German Gomez, James Clark, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Stephane Eranian, Adrian Hunter Hi Leo and German, On Wed, Oct 6, 2021 at 2:36 AM Leo Yan <leo.yan@linaro.org> wrote: > > Hi German, > > On Tue, Oct 05, 2021 at 11:06:12AM +0100, German Gomez wrote: > > [...] > > > Yesterday we did some testing and found that there seems to be an exact > > match between using context packets and switch events. However this only > > applies when tracing in userspace (by adding the 'u' suffix to the perf > > event). Otherwise we still see as much as 2% of events having the wrong > > PID around the time of the switch. > > This result sounds reasonable for me, if we only trace the userspace, > the result should have no any difference between using context packets > and switch events. > > It's a bit high deviation with switch events (1.30% as shown in below > result) after enable kernel tracing. Yes, it's bigger than I expected, but it'd be workload specific. > > > In order to measure this I applied Namhyung's patch and James's patch > > from [1]. Then added a printf line to the function arm_spe_prep_sample > > where I have access to both PID values, as a quick way to compare them > > later in a perf-report run. This is the diff of the printf patch: > > > > diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c > > index 41385ab96fbc..591985c66ac4 100644 > > --- a/tools/perf/util/arm-spe.c > > +++ b/tools/perf/util/arm-spe.c > > @@ -247,6 +247,8 @@ static void arm_spe_prep_sample(struct arm_spe *spe, > > event->sample.header.type = PERF_RECORD_SAMPLE; > > event->sample.header.misc = sample->cpumode; > > event->sample.header.size = sizeof(struct perf_event_header); > > + > > + printf(">>>>>> %d / %lu\n", speq->tid, record->context_id & 0x7fff); > > } > > > > The differences obtained as error % were obtained by running the > > following perf-record commands for different configurations: > > > > $ sudo ./perf record -e arm_spe/ts_enable=1,load_filter=1,store_filter=1/u -a -- sleep 60 > > $ sudo ./perf report --stdio \ > > | grep ">>>>>>" \ > > | awk '{total++; if($2!=$4) miss++} END {print "Error: " (100*miss/total) "% out of " total " samples"}' > > > > Error: 0% out of 11839328 samples > > > > $ sudo ./perf record -e arm_spe/ts_enable=1,load_filter=1,store_filter=1/ -a -- sleep 10 > > $ sudo ./perf report --stdio \ > > | grep ">>>>>>" \ > > | awk '{total++; if($2!=$4) miss++} END {print "Error: " (100*miss/total) "% out of " total " samples"}' > > > > Error: 1.30624% out of 3418731 samples > > Thanks for sharing this! > > > I think the fallback to using switch when we can't use the CONTEXTIDR > > register is a viable option for userspace events, but maybe not so much > > for non-userspace. > > Agreed. > > If so, it's good to check the variable > 'evsel->core.attr.exclude_kernel' when decode Arm SPE trace data, and > only use context switch event when 'exclude_kernel' is set. I think it'd be better to check it in perf record and not set evsel->core.attr.context_switch if possible. Or it can ignore the context switch once it sees a context packet. > > Here should note one thing is the perf tool needs to have knowledge to > decide if the bit 3 'CX' (macro 'SYS_PMSCR_EL1_CX_SHIFT' in kernel) has > been set in register PMSCR or not. AFAIK, Arm SPE driver doesn't > expose any interface (or config) to userspace for the context tracing, > so one method is to add an extra config in Arm SPE driver for this > bit, e.g. 'ATTR_CFG_FLD_cx_enable_CFG' can be added in Arm SPE driver. > > Alternatively, rather than adding new config, I am just wandering we > simply use two flags in perf's decoding: 'use_switch_event_for_pid' and > 'use_ctx_packet_for_pid', the first variable will be set if detects > the tracing is userspace only, the second varaible will be set when > detects the hardware tracing containing context packet. So if the > variable 'use_ctx_packet_for_pid' has been set, then the decoder will > always use context packet for sample's PID, otherwise, it falls back > to check 'use_switch_event_for_pid' and set sample PID based on switch > events. > > If have any other idea, please feel free bring up. If it's just kernel config, we can check /proc/config.gz or /boot/config-$(uname -r). When it knows for sure it can just use the context packet, otherwise it needs the context switch. Thanks, Namhyung ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events 2021-10-06 16:09 ` Namhyung Kim @ 2021-10-08 11:07 ` German Gomez 2021-10-09 0:12 ` Namhyung Kim 0 siblings, 1 reply; 27+ messages in thread From: German Gomez @ 2021-10-08 11:07 UTC (permalink / raw) To: Namhyung Kim, Leo Yan Cc: James Clark, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Stephane Eranian, Adrian Hunter Hi Leo, Namhyung, On 06/10/2021 17:09, Namhyung Kim wrote: > Hi Leo and German, > > [...] > > I think it'd be better to check it in perf record and not set > evsel->core.attr.context_switch if possible. > > Or it can ignore the context switch once it sees a context packet. > >> Here should note one thing is the perf tool needs to have knowledge to >> decide if the bit 3 'CX' (macro 'SYS_PMSCR_EL1_CX_SHIFT' in kernel) has >> been set in register PMSCR or not. AFAIK, Arm SPE driver doesn't >> expose any interface (or config) to userspace for the context tracing, >> so one method is to add an extra config in Arm SPE driver for this >> bit, e.g. 'ATTR_CFG_FLD_cx_enable_CFG' can be added in Arm SPE driver. >> >> Alternatively, rather than adding new config, I am just wandering we >> simply use two flags in perf's decoding: 'use_switch_event_for_pid' and >> 'use_ctx_packet_for_pid', the first variable will be set if detects >> the tracing is userspace only, the second varaible will be set when >> detects the hardware tracing containing context packet. So if the >> variable 'use_ctx_packet_for_pid' has been set, then the decoder will >> always use context packet for sample's PID, otherwise, it falls back >> to check 'use_switch_event_for_pid' and set sample PID based on switch >> events. >> >> If have any other idea, please feel free bring up. > If it's just kernel config, we can check /proc/config.gz or > /boot/config-$(uname -r). When it knows for sure it can just use > the context packet, otherwise it needs the context switch. > > Thanks, > Namhyung Please correct me if I'm wrong, after disabling the PID_IN_CONTEXTIDR feature in the kernel, I don't see any context packets in the auxtraces. I think after applying the patch from [1], it should be sufficient to determine if pid tracing should fall back to use --switch-events when context_id from that patch has a value of -1. If the patch at the end of this message is applied on top of Namhyuna's and [1], I think it can work. Also, if the pmu driver is patched to disable the 'CX' bit when the pid is not in the root namespace [2] (unfortunately I haven't been able to set up an environment to properly test Leo's patch yet) tracing could also fall back to context-switch for userspace tracing. What do you think? Thanks, German [1] https://www.spinics.net/lists/linux-perf-users/msg12543.html [2] https://lore.kernel.org/lkml/20210916135418.GA383600@leoy-ThinkPad-X240s/ diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c index 708323d..e224665 100644 --- a/tools/perf/util/arm-spe.c +++ b/tools/perf/util/arm-spe.c @@ -71,6 +71,12 @@ u64 kernel_start; unsigned long num_events; + + /* + * Used for PID tracing. + */ + u8 use_context_id_pkt; + u8 use_context_switch_event; }; struct arm_spe_queue { @@ -586,13 +592,30 @@ return timeless_decoding; } +static bool arm_spe__is_exclude_kernel(struct arm_spe *spe) { + struct evsel *evsel; + struct evlist *evlist = spe->session->evlist; + + evlist__for_each_entry(evlist, evsel) { + if (evsel->core.attr.type == spe->pmu_type && evsel->core.attr.exclude_kernel) + return true; + } + + return false; +} + static void arm_spe_set_pid_tid_cpu(struct arm_spe *spe, struct auxtrace_queue *queue) { struct arm_spe_queue *speq = queue->priv; - pid_t tid; + pid_t tid = -1; - tid = machine__get_current_tid(spe->machine, speq->cpu); + if (spe->use_context_id_pkt) + tid = speq->decoder->record.context_id; + + if (tid == -1 && spe->use_context_switch_event) + tid = machine__get_current_tid(spe->machine, speq->cpu); + if (tid != -1) { speq->tid = tid; thread__zput(speq->thread); @@ -1084,6 +1107,15 @@ spe->timeless_decoding = arm_spe__is_timeless_decoding(spe); /* + * Always try to use context packet by default for pid tracing. + * + * If it's not enabled in the pmu driver, it will always have a value of -1 and we can try + * to fall back to using context-switch events instead. + */ + spe->use_context_id_pkt = true; + spe->use_context_switch_event = arm_spe__is_exclude_kernel(spe); + + /* * The synthesized event PERF_RECORD_TIME_CONV has been handled ahead * and the parameters for hardware clock are stored in the session * context. Passes these parameters to the struct perf_tsc_conversion @@ -1141,4 +1173,4 @@ err_free: free(spe); return err; -} +} \ No newline at end of file ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events 2021-10-08 11:07 ` German Gomez @ 2021-10-09 0:12 ` Namhyung Kim 2021-10-11 13:58 ` German Gomez 0 siblings, 1 reply; 27+ messages in thread From: Namhyung Kim @ 2021-10-09 0:12 UTC (permalink / raw) To: German Gomez Cc: Leo Yan, James Clark, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Stephane Eranian, Adrian Hunter Hi German, On Fri, Oct 8, 2021 at 4:08 AM German Gomez <german.gomez@arm.com> wrote: > > Hi Leo, Namhyung, > > On 06/10/2021 17:09, Namhyung Kim wrote: > > Hi Leo and German, > > > > [...] > > > > I think it'd be better to check it in perf record and not set > > evsel->core.attr.context_switch if possible. > > > > Or it can ignore the context switch once it sees a context packet. > > > >> Here should note one thing is the perf tool needs to have knowledge to > >> decide if the bit 3 'CX' (macro 'SYS_PMSCR_EL1_CX_SHIFT' in kernel) has > >> been set in register PMSCR or not. AFAIK, Arm SPE driver doesn't > >> expose any interface (or config) to userspace for the context tracing, > >> so one method is to add an extra config in Arm SPE driver for this > >> bit, e.g. 'ATTR_CFG_FLD_cx_enable_CFG' can be added in Arm SPE driver. > >> > >> Alternatively, rather than adding new config, I am just wandering we > >> simply use two flags in perf's decoding: 'use_switch_event_for_pid' and > >> 'use_ctx_packet_for_pid', the first variable will be set if detects > >> the tracing is userspace only, the second varaible will be set when > >> detects the hardware tracing containing context packet. So if the > >> variable 'use_ctx_packet_for_pid' has been set, then the decoder will > >> always use context packet for sample's PID, otherwise, it falls back > >> to check 'use_switch_event_for_pid' and set sample PID based on switch > >> events. > >> > >> If have any other idea, please feel free bring up. > > If it's just kernel config, we can check /proc/config.gz or > > /boot/config-$(uname -r). When it knows for sure it can just use > > the context packet, otherwise it needs the context switch. > > > > Thanks, > > Namhyung > > Please correct me if I'm wrong, after disabling the PID_IN_CONTEXTIDR > feature in the kernel, I don't see any context packets in the auxtraces. > I think after applying the patch from [1], it should be sufficient to > determine if pid tracing should fall back to use --switch-events when > context_id from that patch has a value of -1. > > If the patch at the end of this message is applied on top of Namhyuna's > and [1], I think it can work. Also, if the pmu driver is patched to > disable the 'CX' bit when the pid is not in the root namespace [2] > (unfortunately I haven't been able to set up an environment to properly > test Leo's patch yet) tracing could also fall back to context-switch for > userspace tracing. What do you think? I think we should use context-switch even for kernel samples, but only if the context packets are not available. Thanks, Namhyung > > Thanks, > German > > [1] https://www.spinics.net/lists/linux-perf-users/msg12543.html > [2] https://lore.kernel.org/lkml/20210916135418.GA383600@leoy-ThinkPad-X240s/ > > diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c > index 708323d..e224665 100644 > --- a/tools/perf/util/arm-spe.c > +++ b/tools/perf/util/arm-spe.c > @@ -71,6 +71,12 @@ > u64 kernel_start; > > unsigned long num_events; > + > + /* > + * Used for PID tracing. > + */ > + u8 use_context_id_pkt; > + u8 use_context_switch_event; > }; > > struct arm_spe_queue { > @@ -586,13 +592,30 @@ > return timeless_decoding; > } > > +static bool arm_spe__is_exclude_kernel(struct arm_spe *spe) { > + struct evsel *evsel; > + struct evlist *evlist = spe->session->evlist; > + > + evlist__for_each_entry(evlist, evsel) { > + if (evsel->core.attr.type == spe->pmu_type && evsel->core.attr.exclude_kernel) > + return true; > + } > + > + return false; > +} > + > static void arm_spe_set_pid_tid_cpu(struct arm_spe *spe, > struct auxtrace_queue *queue) > { > struct arm_spe_queue *speq = queue->priv; > - pid_t tid; > + pid_t tid = -1; > > - tid = machine__get_current_tid(spe->machine, speq->cpu); > + if (spe->use_context_id_pkt) > + tid = speq->decoder->record.context_id; > + > + if (tid == -1 && spe->use_context_switch_event) > + tid = machine__get_current_tid(spe->machine, speq->cpu); > + > if (tid != -1) { > speq->tid = tid; > thread__zput(speq->thread); > @@ -1084,6 +1107,15 @@ > spe->timeless_decoding = arm_spe__is_timeless_decoding(spe); > > /* > + * Always try to use context packet by default for pid tracing. > + * > + * If it's not enabled in the pmu driver, it will always have a value of -1 and we can try > + * to fall back to using context-switch events instead. > + */ > + spe->use_context_id_pkt = true; > + spe->use_context_switch_event = arm_spe__is_exclude_kernel(spe); > + > + /* > * The synthesized event PERF_RECORD_TIME_CONV has been handled ahead > * and the parameters for hardware clock are stored in the session > * context. Passes these parameters to the struct perf_tsc_conversion > @@ -1141,4 +1173,4 @@ > err_free: > free(spe); > return err; > -} > +} > \ No newline at end of file ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events 2021-10-09 0:12 ` Namhyung Kim @ 2021-10-11 13:58 ` German Gomez 2021-10-11 14:29 ` Leo Yan 0 siblings, 1 reply; 27+ messages in thread From: German Gomez @ 2021-10-11 13:58 UTC (permalink / raw) To: Namhyung Kim Cc: Leo Yan, James Clark, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Stephane Eranian, Adrian Hunter Hi Namhyung, On 09/10/2021 01:12, Namhyung Kim wrote: > Hi German, > > On Fri, Oct 8, 2021 at 4:08 AM German Gomez <german.gomez@arm.com> wrote: > > [...] > > I think we should use context-switch even for kernel samples, but > only if the context packets are not available. > > Thanks, > Namhyung I think I agree with that you say. If --switch-events is not enabled by default like you mention, an user could opt-in to using the fallback if there's no better option for kernel tracing yet. @Leo, what are your thoughts on this? Perhaps adding a warning message to tell the user to please enable context packets, otherwise the results will have workload-dependant inaccuracies, could be a good enough compromise? Thanks, German ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events 2021-10-11 13:58 ` German Gomez @ 2021-10-11 14:29 ` Leo Yan 2021-10-12 11:07 ` German Gomez 0 siblings, 1 reply; 27+ messages in thread From: Leo Yan @ 2021-10-11 14:29 UTC (permalink / raw) To: German Gomez Cc: Namhyung Kim, James Clark, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Stephane Eranian, Adrian Hunter Hi German, On Mon, Oct 11, 2021 at 02:58:40PM +0100, German Gomez wrote: > Hi Namhyung, > > On 09/10/2021 01:12, Namhyung Kim wrote: > > > Hi German, > > > > On Fri, Oct 8, 2021 at 4:08 AM German Gomez <german.gomez@arm.com> wrote: > > > > [...] > > > > I think we should use context-switch even for kernel samples, but > > only if the context packets are not available. > > > > Thanks, > > Namhyung > > I think I agree with that you say. If --switch-events is not enabled by > default like you mention, an user could opt-in to using the fallback if > there's no better option for kernel tracing yet. Actually I took time to try to find some way to enable switch events conditionally. As Namhyung suggested, we can enable the switch events in the perf tool (should do this in arm_spe_recording_options()), I am just wandering if perf tool can enable switch event only when it runs in the non-root namespace. I looked the code util/namespaces.c but still fail to find any approach to confirm the perf is running in the root namespace... anyway, this is not critical for this work. Welcome if anyone has idea for this. > @Leo, what are your thoughts on this? Perhaps adding a warning message > to tell the user to please enable context packets, otherwise the results > will have workload-dependant inaccuracies, could be a good enough > compromise? Yeah, this is exactly what I think. It's good to give a warning so users have knowledge for the potential inaccuracies. Thanks, Leo ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events 2021-10-11 14:29 ` Leo Yan @ 2021-10-12 11:07 ` German Gomez 2021-10-18 11:01 ` German Gomez 0 siblings, 1 reply; 27+ messages in thread From: German Gomez @ 2021-10-12 11:07 UTC (permalink / raw) To: Leo Yan, Namhyung Kim Cc: James Clark, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Stephane Eranian, Adrian Hunter Hi, Leo and Namhyung, I want to make sure I'm on the same page as you regarding this topic. On 11/10/2021 15:29, Leo Yan wrote: > Hi German, > > On Mon, Oct 11, 2021 at 02:58:40PM +0100, German Gomez wrote: >> Hi Namhyung, >> >> On 09/10/2021 01:12, Namhyung Kim wrote: >> >>> Hi German, >>> >>> On Fri, Oct 8, 2021 at 4:08 AM German Gomez <german.gomez@arm.com> wrote: >>> >>> [...] >>> >>> I think we should use context-switch even for kernel samples, but >>> only if the context packets are not available. Do you think we should use them also when tracing outside of the root namespace? I'm no sure if we are considering the driver patch to disable context packets in non-root ns from earlier. >> [...] >> Actually I took time to try to find some way to enable switch events >> conditionally. As Namhyung suggested, we can enable the switch events >> in the perf tool (should do this in arm_spe_recording_options()), I am >> just wandering if perf tool can enable switch event only when it runs >> in the non-root namespace. I looked the code util/namespaces.c but >> still fail to find any approach to confirm the perf is running in >> the root namespace... anyway, this is not critical for this work. >> >> Welcome if anyone has idea for this. Thanks, Leo. We'll let you know if we come up with something too. > >> @Leo, what are your thoughts on this? Perhaps adding a warning message >> to tell the user to please enable context packets, otherwise the results >> will have workload-dependant inaccuracies, could be a good enough >> compromise? > Yeah, this is exactly what I think. It's good to give a warning so > users have knowledge for the potential inaccuracies. > > Thanks, > Leo If we are not considering patching the driver at this stage, so we allow hardware tracing on non-root namespaces. I think we could proceed like this: - For userspace, always use context-switch events as they are accurate and consistent with namespaces. - For kernel tracing, if context packets are enabled, use them, but warn the user that the PIDs correspond to the root namespace. - Otherwise, use context-switch events and warn the user of the time inaccuracies. Later, if the driver is patched to disable context packets outside the root namespace, kernel tracing could fall back to using context-switch events and warn the user with a single message about the time inaccuracies. If we are aligned, we could collect your feedback and share an updated patch that considers the warnings. Many thanks Best regards ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events 2021-10-12 11:07 ` German Gomez @ 2021-10-18 11:01 ` German Gomez 2021-10-18 13:23 ` Leo Yan 0 siblings, 1 reply; 27+ messages in thread From: German Gomez @ 2021-10-18 11:01 UTC (permalink / raw) To: Leo Yan, Namhyung Kim, James Clark Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Stephane Eranian, Adrian Hunter Hi, What do you thing of the patch below? PERF_RECORD_SWITCH events are also included for tracing forks. The patch would sit on top of Namhyung's. Thanks, German On 12/10/2021 12:07, German Gomez wrote: > Hi, Leo and Namhyung, > > I want to make sure I'm on the same page as you regarding this topic. > > [...] > > If we are not considering patching the driver at this stage, so we allow > hardware tracing on non-root namespaces. I think we could proceed like > this: > > - For userspace, always use context-switch events as they are > accurate and consistent with namespaces. > - For kernel tracing, if context packets are enabled, use them, but > warn the user that the PIDs correspond to the root namespace. > - Otherwise, use context-switch events and warn the user of the time > inaccuracies. > > Later, if the driver is patched to disable context packets outside the > root namespace, kernel tracing could fall back to using context-switch > events and warn the user with a single message about the time > inaccuracies. > > If we are aligned, we could collect your feedback and share an updated > patch that considers the warnings. > > Many thanks > Best regards --- tools/perf/util/arm-spe.c | 66 +++++++++++++++++++++++++++++++++++++-- 1 file changed, 63 insertions(+), 3 deletions(-) diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c index 708323d7c93c..6a2f7a484a80 100644 --- a/tools/perf/util/arm-spe.c +++ b/tools/perf/util/arm-spe.c @@ -71,6 +71,17 @@ struct arm_spe { u64 kernel_start; unsigned long num_events; + + /* + * Used for PID tracing. + */ + u8 exclude_kernel; + + /* + * Warning messages. + */ + u8 warn_context_pkt_namesapce; + u8 warn_context_switch_ev_accuracy; }; struct arm_spe_queue { @@ -586,11 +597,42 @@ static bool arm_spe__is_timeless_decoding(struct arm_spe *spe) return timeless_decoding; } +static bool arm_spe__is_exclude_kernel(struct arm_spe *spe) { + struct evsel *evsel; + struct evlist *evlist = spe->session->evlist; + + evlist__for_each_entry(evlist, evsel) { + if (evsel->core.attr.type == spe->pmu_type && evsel->core.attr.exclude_kernel) + return true; + } + + return false; +} + static void arm_spe_set_pid_tid_cpu(struct arm_spe *spe, struct auxtrace_queue *queue) { struct arm_spe_queue *speq = queue->priv; - pid_t tid; + pid_t tid = machine__get_current_tid(spe->machine, speq->cpu); + u64 context_id = speq->decoder->record.context_id; + + /* + * We're tracing the kernel. + */ + if (!spe->exclude_kernel) { + /* + * Use CONTEXT packets in kernel tracing if available and warn the user of the + * values correspond to the root PID namespace. + * + * If CONTEXT packets aren't available but context-switch events are, warn the user + * of the time inaccuracies. + */ + if (context_id != (u64) -1) { + tid = speq->decoder->record.context_id; + spe->warn_context_pkt_namesapce = true; + } else if (tid != -1 && context_id == (u64) -1) + spe->warn_context_switch_ev_accuracy = true; + } tid = machine__get_current_tid(spe->machine, speq->cpu); if (tid != -1) { @@ -740,7 +782,8 @@ static int arm_spe_process_event(struct perf_session *session, if (err) return err; - if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE) + if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE || + event->header.type == PERF_RECORD_SWITCH) err = arm_spe_context_switch(spe, event, sample); } @@ -807,7 +850,20 @@ static int arm_spe_flush(struct perf_session *session __maybe_unused, return arm_spe_process_timeless_queues(spe, -1, MAX_TIMESTAMP - 1); - return arm_spe_process_queues(spe, MAX_TIMESTAMP); + ret = arm_spe_process_queues(spe, MAX_TIMESTAMP); + + if (spe->warn_context_pkt_namesapce) + ui__warning( + "Arm SPE CONTEXT packets used for PID/TID tracing.\n\n" + "PID values correspond to the root PID namespace.\n\n"); + + if (spe->warn_context_switch_ev_accuracy) + ui__warning( + "No Arm SPE CONTEXT packets found within traces.\n\n" + "Fallback to PERF_RECORD_SWITCH events for PID/TID tracing will have\n" + "workload-dependant timing inaccuracies.\n\n"); + + return ret; } static void arm_spe_free_queue(void *priv) @@ -1083,6 +1139,10 @@ int arm_spe_process_auxtrace_info(union perf_event *event, spe->timeless_decoding = arm_spe__is_timeless_decoding(spe); + spe->exclude_kernel = arm_spe__is_exclude_kernel(spe); + spe->warn_context_pkt_namesapce = false; + spe->warn_context_switch_ev_accuracy = false; + /* * The synthesized event PERF_RECORD_TIME_CONV has been handled ahead * and the parameters for hardware clock are stored in the session -- 2.17.1 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events 2021-10-18 11:01 ` German Gomez @ 2021-10-18 13:23 ` Leo Yan 2021-10-19 12:21 ` German Gomez 0 siblings, 1 reply; 27+ messages in thread From: Leo Yan @ 2021-10-18 13:23 UTC (permalink / raw) To: German Gomez Cc: Namhyung Kim, James Clark, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Stephane Eranian, Adrian Hunter Hi German, On Mon, Oct 18, 2021 at 12:01:27PM +0100, German Gomez wrote: > Hi, > > What do you thing of the patch below? PERF_RECORD_SWITCH events are also > included for tracing forks. The patch would sit on top of Namhyung's. Yeah, it's good to add PERF_RECORD_SWITCH. > On 12/10/2021 12:07, German Gomez wrote: > > Hi, Leo and Namhyung, > > > > I want to make sure I'm on the same page as you regarding this topic. > > > > [...] > > > > If we are not considering patching the driver at this stage, so we allow > > hardware tracing on non-root namespaces. I think we could proceed like > > this: > > > > - For userspace, always use context-switch events as they are > > accurate and consistent with namespaces. I don't think you can always use context-switch events for userspace samples. The underlying mechanism is when there have context-switch event or context packet is coming, it will invoke the function machine__set_current_tid() to set current pid/tid; afterwards, we can retrieve the current pid/tid with the function arm_spe_set_pid_tid_cpu(). The question is that if we want to use the tid/pid info at the same time for both context-switch events and context packets, then it's hard to maintain. E.g. we need to create multiple thread context, one is used to track pid info coming from context-switch events and another context is to track pid info from context packet. To simplify the code, I still think we give context packet priority and use it if it's avalible. And we rollback to use context-switch events for pid/tid when context packet is not avaliable. > > - For kernel tracing, if context packets are enabled, use them, but > > warn the user that the PIDs correspond to the root namespace. > > - Otherwise, use context-switch events and warn the user of the time > > inaccuracies. > > > > Later, if the driver is patched to disable context packets outside the > > root namespace, kernel tracing could fall back to using context-switch > > events and warn the user with a single message about the time > > inaccuracies. > > > > If we are aligned, we could collect your feedback and share an updated > > patch that considers the warnings. > > > > Many thanks > > Best regards > > --- > tools/perf/util/arm-spe.c | 66 +++++++++++++++++++++++++++++++++++++-- > 1 file changed, 63 insertions(+), 3 deletions(-) > > diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c > index 708323d7c93c..6a2f7a484a80 100644 > --- a/tools/perf/util/arm-spe.c > +++ b/tools/perf/util/arm-spe.c > @@ -71,6 +71,17 @@ struct arm_spe { > u64 kernel_start; > > unsigned long num_events; > + > + /* > + * Used for PID tracing. > + */ > + u8 exclude_kernel; > + > + /* > + * Warning messages. > + */ > + u8 warn_context_pkt_namesapce; > + u8 warn_context_switch_ev_accuracy; > }; > > struct arm_spe_queue { > @@ -586,11 +597,42 @@ static bool arm_spe__is_timeless_decoding(struct arm_spe *spe) > return timeless_decoding; > } > > +static bool arm_spe__is_exclude_kernel(struct arm_spe *spe) { > + struct evsel *evsel; > + struct evlist *evlist = spe->session->evlist; > + > + evlist__for_each_entry(evlist, evsel) { > + if (evsel->core.attr.type == spe->pmu_type && evsel->core.attr.exclude_kernel) > + return true; > + } > + > + return false; > +} > + > static void arm_spe_set_pid_tid_cpu(struct arm_spe *spe, > struct auxtrace_queue *queue) > { > struct arm_spe_queue *speq = queue->priv; > - pid_t tid; > + pid_t tid = machine__get_current_tid(spe->machine, speq->cpu); > + u64 context_id = speq->decoder->record.context_id; > + > + /* > + * We're tracing the kernel. > + */ > + if (!spe->exclude_kernel) { This is incorrect ... 'exclude_kernel' is a global variable and if it's set then perf will always run below code. I think here you want to avoid using contect packet for user space samples, but checking 'exclude_kernel' cannot help for this purpose since 'exclude_kernel' cannot be used to decide sample mode (kernel mode or user mode). Thanks, Leo > + /* > + * Use CONTEXT packets in kernel tracing if available and warn the user of the > + * values correspond to the root PID namespace. > + * > + * If CONTEXT packets aren't available but context-switch events are, warn the user > + * of the time inaccuracies. > + */ > + if (context_id != (u64) -1) { > + tid = speq->decoder->record.context_id; > + spe->warn_context_pkt_namesapce = true; > + } else if (tid != -1 && context_id == (u64) -1) > + spe->warn_context_switch_ev_accuracy = true; > + } > > tid = machine__get_current_tid(spe->machine, speq->cpu); > if (tid != -1) { > @@ -740,7 +782,8 @@ static int arm_spe_process_event(struct perf_session *session, > if (err) > return err; > > - if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE) > + if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE || > + event->header.type == PERF_RECORD_SWITCH) > err = arm_spe_context_switch(spe, event, sample); > } > > @@ -807,7 +850,20 @@ static int arm_spe_flush(struct perf_session *session __maybe_unused, > return arm_spe_process_timeless_queues(spe, -1, > MAX_TIMESTAMP - 1); > > - return arm_spe_process_queues(spe, MAX_TIMESTAMP); > + ret = arm_spe_process_queues(spe, MAX_TIMESTAMP); > + > + if (spe->warn_context_pkt_namesapce) > + ui__warning( > + "Arm SPE CONTEXT packets used for PID/TID tracing.\n\n" > + "PID values correspond to the root PID namespace.\n\n"); > + > + if (spe->warn_context_switch_ev_accuracy) > + ui__warning( > + "No Arm SPE CONTEXT packets found within traces.\n\n" > + "Fallback to PERF_RECORD_SWITCH events for PID/TID tracing will have\n" > + "workload-dependant timing inaccuracies.\n\n"); > + > + return ret; > } > > static void arm_spe_free_queue(void *priv) > @@ -1083,6 +1139,10 @@ int arm_spe_process_auxtrace_info(union perf_event *event, > > spe->timeless_decoding = arm_spe__is_timeless_decoding(spe); > > + spe->exclude_kernel = arm_spe__is_exclude_kernel(spe); > + spe->warn_context_pkt_namesapce = false; > + spe->warn_context_switch_ev_accuracy = false; > + > /* > * The synthesized event PERF_RECORD_TIME_CONV has been handled ahead > * and the parameters for hardware clock are stored in the session > -- > 2.17.1 ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events 2021-10-18 13:23 ` Leo Yan @ 2021-10-19 12:21 ` German Gomez 2021-10-29 10:51 ` German Gomez 0 siblings, 1 reply; 27+ messages in thread From: German Gomez @ 2021-10-19 12:21 UTC (permalink / raw) To: Leo Yan Cc: Namhyung Kim, James Clark, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Stephane Eranian, Adrian Hunter Hi Leo, Many thanks for you comments as always and sorry for the rushed patch. On 18/10/2021 14:23, Leo Yan wrote: > Hi German, > > On Mon, Oct 18, 2021 at 12:01:27PM +0100, German Gomez wrote: >> Hi, >> >> What do you thing of the patch below? PERF_RECORD_SWITCH events are also >> included for tracing forks. The patch would sit on top of Namhyung's. > Yeah, it's good to add PERF_RECORD_SWITCH. > >> On 12/10/2021 12:07, German Gomez wrote: >>> Hi, Leo and Namhyung, >>> >>> I want to make sure I'm on the same page as you regarding this topic. >>> >>> [...] >>> >>> If we are not considering patching the driver at this stage, so we allow >>> hardware tracing on non-root namespaces. I think we could proceed like >>> this: >>> >>> � - For userspace, always use context-switch events as they are >>> ��� accurate and consistent with namespaces. > I don't think you can always use context-switch events for userspace > samples. The underlying mechanism is when there have context-switch > event or context packet is coming, it will invoke the function > machine__set_current_tid() to set current pid/tid; afterwards, we > can retrieve the current pid/tid with the function > arm_spe_set_pid_tid_cpu(). > > The question is that if we want to use the tid/pid info at the same > time for both context-switch events and context packets, then it's > hard to maintain. E.g. we need to create multiple thread context, one > is used to track pid info coming from context-switch events and > another context is to track pid info from context packet. My thinking was to use only one of the methods for the entire run, but the code below is not expressive enough I'm afraid and I agree it could become hard to maintain. I need to polish it up. > > To simplify the code, I still think we give context packet priority and > use it if it's avalible. And we rollback to use context-switch events > for pid/tid when context packet is not avaliable. OK if it simplifies things. I think context-pkt availability can be determined before any events are processed by looking at the top record in the auxtrace_heap, o any of the auxtrace_queues. >>> � - For kernel tracing, if context packets are enabled, use them, but >>> ��� warn the user that the PIDs correspond to the root namespace. >>> � - Otherwise, use context-switch events and warn the user of the time >>> ��� inaccuracies. >>> >>> Later, if the driver is patched to disable context packets outside the >>> root namespace, kernel tracing could fall back to using context-switch >>> events and warn the user with a single message about the time >>> inaccuracies. >>> >>> If we are aligned, we could collect your feedback and share an updated >>> patch that considers the warnings. >>> >>> Many thanks >>> Best regards >> --- >> �tools/perf/util/arm-spe.c | 66 +++++++++++++++++++++++++++++++++++++-- >> �1 file changed, 63 insertions(+), 3 deletions(-) >> >> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c >> index 708323d7c93c..6a2f7a484a80 100644 >> --- a/tools/perf/util/arm-spe.c >> +++ b/tools/perf/util/arm-spe.c >> @@ -71,6 +71,17 @@ struct arm_spe { >> ���� u64��� ��� ��� ��� kernel_start; >> � >> ���� unsigned long��� ��� ��� num_events; >> + >> +��� /* >> +��� �* Used for PID tracing. >> +��� �*/ >> +��� u8��� ��� ��� ��� exclude_kernel; >> + >> +��� /* >> +��� �* Warning messages. >> +��� �*/ >> +��� u8��� ��� ��� ��� warn_context_pkt_namesapce; >> +��� u8��� ��� ��� ��� warn_context_switch_ev_accuracy; >> �}; >> � >> �struct arm_spe_queue { >> @@ -586,11 +597,42 @@ static bool arm_spe__is_timeless_decoding(struct arm_spe *spe) >> ���� return timeless_decoding; >> �} >> � >> +static bool arm_spe__is_exclude_kernel(struct arm_spe *spe) { >> +��� struct evsel *evsel; >> +��� struct evlist *evlist = spe->session->evlist; >> + >> +��� evlist__for_each_entry(evlist, evsel) { >> +��� if (evsel->core.attr.type == spe->pmu_type && evsel->core.attr.exclude_kernel) >> +��� ��� return true; >> +��� } >> + >> +��� return false; >> +} >> + >> �static void arm_spe_set_pid_tid_cpu(struct arm_spe *spe, >> ���� ��� ��� ��� ��� struct auxtrace_queue *queue) >> �{ >> ���� struct arm_spe_queue *speq = queue->priv; >> -��� pid_t tid; >> +��� pid_t tid = machine__get_current_tid(spe->machine, speq->cpu); >> +��� u64 context_id = speq->decoder->record.context_id; >> + >> +��� /* >> +��� * We're tracing the kernel. >> +��� */ >> +��� if (!spe->exclude_kernel) { > This is incorrect ... 'exclude_kernel' is a global variable and if > it's set then perf will always run below code. > > I think here you want to avoid using contect packet for user space > samples, but checking 'exclude_kernel' cannot help for this purpose > since 'exclude_kernel' cannot be used to decide sample mode (kernel > mode or user mode). > > Thanks, > Leo > >> +��� ��� /* >> +��� ��� �* Use CONTEXT packets in kernel tracing if available and warn the user of the >> +��� ��� �* values correspond to the root PID namespace. >> +��� ��� �* >> +��� ��� �* If CONTEXT packets aren't available but context-switch events are, warn the user >> +��� ��� �* of the time inaccuracies. >> +��� ��� �*/ >> +��� ��� if (context_id != (u64) -1) { >> +��� ��� ��� tid = speq->decoder->record.context_id; >> +��� ��� ��� spe->warn_context_pkt_namesapce = true; >> +��� ��� } else if (tid != -1 && context_id == (u64) -1) >> +��� ��� ��� spe->warn_context_switch_ev_accuracy = true; >> +��� } >> � >> ���� tid = machine__get_current_tid(spe->machine, speq->cpu); >> ���� if (tid != -1) { >> @@ -740,7 +782,8 @@ static int arm_spe_process_event(struct perf_session *session, >> ���� ��� if (err) >> ���� ��� ��� return err; >> � >> -��� ��� if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE) >> +��� ��� if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE || >> +��� ��� ��� event->header.type == PERF_RECORD_SWITCH) >> ���� ��� ��� err = arm_spe_context_switch(spe, event, sample); >> ���� } >> � >> @@ -807,7 +850,20 @@ static int arm_spe_flush(struct perf_session *session __maybe_unused, >> ���� ��� return arm_spe_process_timeless_queues(spe, -1, >> ���� ��� ��� ��� MAX_TIMESTAMP - 1); >> � >> -��� return arm_spe_process_queues(spe, MAX_TIMESTAMP); >> +��� ret = arm_spe_process_queues(spe, MAX_TIMESTAMP); >> + >> +��� if (spe->warn_context_pkt_namesapce) >> +��� ��� ui__warning( >> +��� ��� ��� "Arm SPE CONTEXT packets used for PID/TID tracing.\n\n" >> +��� ��� ��� "PID values correspond to the root PID namespace.\n\n"); >> + >> +��� if (spe->warn_context_switch_ev_accuracy) >> +��� ��� ui__warning( >> +��� ��� ��� "No Arm SPE CONTEXT packets found within traces.\n\n" >> +��� ��� ��� "Fallback to PERF_RECORD_SWITCH events for PID/TID tracing will have\n" >> +��� ��� ��� "workload-dependant timing inaccuracies.\n\n"); >> + >> +��� return ret; >> �} >> � >> �static void arm_spe_free_queue(void *priv) >> @@ -1083,6 +1139,10 @@ int arm_spe_process_auxtrace_info(union perf_event *event, >> � >> ���� spe->timeless_decoding = arm_spe__is_timeless_decoding(spe); >> � >> +��� spe->exclude_kernel = arm_spe__is_exclude_kernel(spe); >> +��� spe->warn_context_pkt_namesapce = false; >> +��� spe->warn_context_switch_ev_accuracy = false; >> + >> ���� /* >> ���� �* The synthesized event PERF_RECORD_TIME_CONV has been handled ahead >> ���� �* and the parameters for hardware clock are stored in the session >> -- >> 2.17.1 Thanks, German ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events 2021-10-19 12:21 ` German Gomez @ 2021-10-29 10:51 ` German Gomez 2021-11-01 15:11 ` Leo Yan 0 siblings, 1 reply; 27+ messages in thread From: German Gomez @ 2021-10-29 10:51 UTC (permalink / raw) To: Leo Yan Cc: Namhyung Kim, James Clark, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Stephane Eranian, Adrian Hunter Hi Leo, The current plan is to define a global flag in the `struct arm_spe` to select the method of pid tracing (context pkt, or switch events): struct arm_spe { /* ... */ u8 use_ctx_pkt_for_pid; } The method could be determined by peeking at the top element of the `struct auxtrace_heap` at the beginning of the perf-report. If ctx packets have been collected, the first one should have a context_id != -1. We could then tweak this part of Namhyung patch slightly: if (!spe->use_ctx_pkt_for_pid && (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE || event->header.type == PERF_RECORD_SWITCH)) err = arm_spe_context_switch(spe, event, sample); Then we could apply patch [1] which wasn't fully merged in the end, including similar `if (spe->use_ctx_pkt_for_pid)` to collect the pid/tid from the context packets. What do you think? Thanks, German [1] https://lore.kernel.org/lkml/20210119144658.793-8-james.clark@arm.com/ On 19/10/2021 13:21, German Gomez wrote: > Hi Leo, > > Many thanks for you comments as always and sorry for the rushed patch. > > On 18/10/2021 14:23, Leo Yan wrote: >> Hi German, >> >> On Mon, Oct 18, 2021 at 12:01:27PM +0100, German Gomez wrote: >>> Hi, >>> >>> What do you thing of the patch below? PERF_RECORD_SWITCH events are also >>> included for tracing forks. The patch would sit on top of Namhyung's. >> Yeah, it's good to add PERF_RECORD_SWITCH. >> >>> On 12/10/2021 12:07, German Gomez wrote: >>>> Hi, Leo and Namhyung, >>>> >>>> I want to make sure I'm on the same page as you regarding this topic. >>>> >>>> [...] >>>> >>>> If we are not considering patching the driver at this stage, so we allow >>>> hardware tracing on non-root namespaces. I think we could proceed like >>>> this: >>>> >>>> � - For userspace, always use context-switch events as they are >>>> ��� accurate and consistent with namespaces. >> I don't think you can always use context-switch events for userspace >> samples. The underlying mechanism is when there have context-switch >> event or context packet is coming, it will invoke the function >> machine__set_current_tid() to set current pid/tid; afterwards, we >> can retrieve the current pid/tid with the function >> arm_spe_set_pid_tid_cpu(). >> >> The question is that if we want to use the tid/pid info at the same >> time for both context-switch events and context packets, then it's >> hard to maintain. E.g. we need to create multiple thread context, one >> is used to track pid info coming from context-switch events and >> another context is to track pid info from context packet. > My thinking was to use only one of the methods for the entire run, but > the code below is not expressive enough I'm afraid and I agree it could > become hard to maintain. I need to polish it up. > >> To simplify the code, I still think we give context packet priority and >> use it if it's avalible. And we rollback to use context-switch events >> for pid/tid when context packet is not avaliable. > OK if it simplifies things. I think context-pkt availability can be > determined before any events are processed by looking at the top record > in the auxtrace_heap, o any of the auxtrace_queues. > >>>> � - For kernel tracing, if context packets are enabled, use them, but >>>> ��� warn the user that the PIDs correspond to the root namespace. >>>> � - Otherwise, use context-switch events and warn the user of the time >>>> ��� inaccuracies. >>>> >>>> Later, if the driver is patched to disable context packets outside the >>>> root namespace, kernel tracing could fall back to using context-switch >>>> events and warn the user with a single message about the time >>>> inaccuracies. >>>> >>>> If we are aligned, we could collect your feedback and share an updated >>>> patch that considers the warnings. >>>> >>>> Many thanks >>>> Best regards >>> --- >>> �tools/perf/util/arm-spe.c | 66 +++++++++++++++++++++++++++++++++++++-- >>> �1 file changed, 63 insertions(+), 3 deletions(-) >>> >>> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c >>> index 708323d7c93c..6a2f7a484a80 100644 >>> --- a/tools/perf/util/arm-spe.c >>> +++ b/tools/perf/util/arm-spe.c >>> @@ -71,6 +71,17 @@ struct arm_spe { >>> ���� u64��� ��� ��� ��� kernel_start; >>> � >>> ���� unsigned long��� ��� ��� num_events; >>> + >>> +��� /* >>> +��� �* Used for PID tracing. >>> +��� �*/ >>> +��� u8��� ��� ��� ��� exclude_kernel; >>> + >>> +��� /* >>> +��� �* Warning messages. >>> +��� �*/ >>> +��� u8��� ��� ��� ��� warn_context_pkt_namesapce; >>> +��� u8��� ��� ��� ��� warn_context_switch_ev_accuracy; >>> �}; >>> � >>> �struct arm_spe_queue { >>> @@ -586,11 +597,42 @@ static bool arm_spe__is_timeless_decoding(struct arm_spe *spe) >>> ���� return timeless_decoding; >>> �} >>> � >>> +static bool arm_spe__is_exclude_kernel(struct arm_spe *spe) { >>> +��� struct evsel *evsel; >>> +��� struct evlist *evlist = spe->session->evlist; >>> + >>> +��� evlist__for_each_entry(evlist, evsel) { >>> +��� if (evsel->core.attr.type == spe->pmu_type && evsel->core.attr.exclude_kernel) >>> +��� ��� return true; >>> +��� } >>> + >>> +��� return false; >>> +} >>> + >>> �static void arm_spe_set_pid_tid_cpu(struct arm_spe *spe, >>> ���� ��� ��� ��� ��� struct auxtrace_queue *queue) >>> �{ >>> ���� struct arm_spe_queue *speq = queue->priv; >>> -��� pid_t tid; >>> +��� pid_t tid = machine__get_current_tid(spe->machine, speq->cpu); >>> +��� u64 context_id = speq->decoder->record.context_id; >>> + >>> +��� /* >>> +��� * We're tracing the kernel. >>> +��� */ >>> +��� if (!spe->exclude_kernel) { >> This is incorrect ... 'exclude_kernel' is a global variable and if >> it's set then perf will always run below code. >> >> I think here you want to avoid using contect packet for user space >> samples, but checking 'exclude_kernel' cannot help for this purpose >> since 'exclude_kernel' cannot be used to decide sample mode (kernel >> mode or user mode). >> >> Thanks, >> Leo >> >>> +��� ��� /* >>> +��� ��� �* Use CONTEXT packets in kernel tracing if available and warn the user of the >>> +��� ��� �* values correspond to the root PID namespace. >>> +��� ��� �* >>> +��� ��� �* If CONTEXT packets aren't available but context-switch events are, warn the user >>> +��� ��� �* of the time inaccuracies. >>> +��� ��� �*/ >>> +��� ��� if (context_id != (u64) -1) { >>> +��� ��� ��� tid = speq->decoder->record.context_id; >>> +��� ��� ��� spe->warn_context_pkt_namesapce = true; >>> +��� ��� } else if (tid != -1 && context_id == (u64) -1) >>> +��� ��� ��� spe->warn_context_switch_ev_accuracy = true; >>> +��� } >>> � >>> ���� tid = machine__get_current_tid(spe->machine, speq->cpu); >>> ���� if (tid != -1) { >>> @@ -740,7 +782,8 @@ static int arm_spe_process_event(struct perf_session *session, >>> ���� ��� if (err) >>> ���� ��� ��� return err; >>> � >>> -��� ��� if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE) >>> +��� ��� if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE || >>> +��� ��� ��� event->header.type == PERF_RECORD_SWITCH) >>> ���� ��� ��� err = arm_spe_context_switch(spe, event, sample); >>> ���� } >>> � >>> @@ -807,7 +850,20 @@ static int arm_spe_flush(struct perf_session *session __maybe_unused, >>> ���� ��� return arm_spe_process_timeless_queues(spe, -1, >>> ���� ��� ��� ��� MAX_TIMESTAMP - 1); >>> � >>> -��� return arm_spe_process_queues(spe, MAX_TIMESTAMP); >>> +��� ret = arm_spe_process_queues(spe, MAX_TIMESTAMP); >>> + >>> +��� if (spe->warn_context_pkt_namesapce) >>> +��� ��� ui__warning( >>> +��� ��� ��� "Arm SPE CONTEXT packets used for PID/TID tracing.\n\n" >>> +��� ��� ��� "PID values correspond to the root PID namespace.\n\n"); >>> + >>> +��� if (spe->warn_context_switch_ev_accuracy) >>> +��� ��� ui__warning( >>> +��� ��� ��� "No Arm SPE CONTEXT packets found within traces.\n\n" >>> +��� ��� ��� "Fallback to PERF_RECORD_SWITCH events for PID/TID tracing will have\n" >>> +��� ��� ��� "workload-dependant timing inaccuracies.\n\n"); >>> + >>> +��� return ret; >>> �} >>> � >>> �static void arm_spe_free_queue(void *priv) >>> @@ -1083,6 +1139,10 @@ int arm_spe_process_auxtrace_info(union perf_event *event, >>> � >>> ���� spe->timeless_decoding = arm_spe__is_timeless_decoding(spe); >>> � >>> +��� spe->exclude_kernel = arm_spe__is_exclude_kernel(spe); >>> +��� spe->warn_context_pkt_namesapce = false; >>> +��� spe->warn_context_switch_ev_accuracy = false; >>> + >>> ���� /* >>> ���� �* The synthesized event PERF_RECORD_TIME_CONV has been handled ahead >>> ���� �* and the parameters for hardware clock are stored in the session >>> -- >>> 2.17.1 > > Thanks, > German > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events 2021-10-29 10:51 ` German Gomez @ 2021-11-01 15:11 ` Leo Yan 2021-11-01 15:36 ` German Gomez 0 siblings, 1 reply; 27+ messages in thread From: Leo Yan @ 2021-11-01 15:11 UTC (permalink / raw) To: German Gomez Cc: Namhyung Kim, James Clark, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Stephane Eranian, Adrian Hunter Hi German, On Fri, Oct 29, 2021 at 11:51:16AM +0100, German Gomez wrote: > Hi Leo, > > The current plan is to define a global flag in the `struct arm_spe` to > select the method of pid tracing (context pkt, or switch events): > > struct arm_spe { > /* ... */ > u8 use_ctx_pkt_for_pid; > } > > The method could be determined by peeking at the top element of the > `struct auxtrace_heap` at the beginning of the perf-report. If ctx > packets have been collected, the first one should have a context_id != > -1. We could then tweak this part of Namhyung patch slightly: Have one concern: if cannot find the context packet, will the decoder drop the SPE packets until it find the first context packet? If this is the case, I am concern the decoder will run out for all packets and doesn't generate any samples if the SPE trace data doesn't contain any context packet. > > if (!spe->use_ctx_pkt_for_pid && > (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE || > event->header.type == PERF_RECORD_SWITCH)) > err = arm_spe_context_switch(spe, event, sample); > > Then we could apply patch [1] which wasn't fully merged in the end, > including similar `if (spe->use_ctx_pkt_for_pid)` to collect the pid/tid > from the context packets. > > What do you think? Except the above concern, the solution looks good to me. Thanks, Leo ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events 2021-11-01 15:11 ` Leo Yan @ 2021-11-01 15:36 ` German Gomez 2021-11-01 15:42 ` German Gomez 0 siblings, 1 reply; 27+ messages in thread From: German Gomez @ 2021-11-01 15:36 UTC (permalink / raw) To: Leo Yan Cc: Namhyung Kim, James Clark, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Stephane Eranian, Adrian Hunter Hi Leo, On 01/11/2021 15:11, Leo Yan wrote: > Hi German, > > On Fri, Oct 29, 2021 at 11:51:16AM +0100, German Gomez wrote: > [...] > Have one concern: if cannot find the context packet, will the decoder > drop the SPE packets until it find the first context packet? If this > is the case, I am concern the decoder will run out for all packets > and doesn't generate any samples if the SPE trace data doesn't contain > any context packet. Not really. It will only peek at the first decoded packet without dropping it. I couldn't think of a corner case where the decoder might miss a context packet for the first records (I also haven't seen any -1 values so far). >> ��� if (!spe->use_ctx_pkt_for_pid && >> ������� (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE || >> �������� event->header.type == PERF_RECORD_SWITCH)) >> ����������� err = arm_spe_context_switch(spe, event, sample); >> >> Then we could apply patch [1] which wasn't fully merged in the end, >> including similar `if (spe->use_ctx_pkt_for_pid)` to collect the pid/tid >> from the context packets. >> >> What do you think? > Except the above concern, the solution looks good to me. I realized I cannot use the heap for it will not work in timeless decoding. We can still use the queues though. By the way, is this return statement in the arm_spe__setup_queue() function misplaced? if (spe->timeless_decoding) return 0; Judging by the long comment in the arm_spe_run_decoder() function, it seems like it should be placed somewhere below the call to "ret = arm_spe_decode(...)", otherwise arm_spe_run_decoder() will begin with an uninitialized record? Thanks, German > > Thanks, > Leo ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events 2021-11-01 15:36 ` German Gomez @ 2021-11-01 15:42 ` German Gomez 0 siblings, 0 replies; 27+ messages in thread From: German Gomez @ 2021-11-01 15:42 UTC (permalink / raw) To: Leo Yan Cc: Namhyung Kim, James Clark, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Stephane Eranian, Adrian Hunter On 01/11/2021 15:36, German Gomez wrote: > [...] > Not really. It will only peek at the first decoded packet without Slight correction for clarity: s/decoded packet/decoded record/. I would be peeking at the first record in the auxtrace queue. The context packet is stored in the context_id field of the record. > dropping it. I couldn't think of a corner case where the decoder might > miss a context packet for the first records (I also haven't seen any -1 > values so far). > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events 2021-10-05 10:06 ` German Gomez 2021-10-06 9:36 ` Leo Yan @ 2021-10-06 14:06 ` James Clark 1 sibling, 0 replies; 27+ messages in thread From: James Clark @ 2021-10-06 14:06 UTC (permalink / raw) To: German Gomez, Leo Yan Cc: Namhyung Kim, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Stephane Eranian, Adrian Hunter On 05/10/2021 11:06, German Gomez wrote: > Hi Leo, > > On 04/10/2021 07:26, Leo Yan wrote: >> Hi James, >> >> On Thu, Sep 30, 2021 at 04:08:52PM +0100, James Clark wrote: >>> On 23/09/2021 15:23, Leo Yan wrote: >>>> On Thu, Sep 16, 2021 at 02:01:21PM -0700, Namhyung Kim wrote: >> [...] >> I'd like to use the comparison method for the test: >> We should enable PID tracing and capture in the perf.data, when decode >> the trace data, we can based on context packet and based on the switch >> events to generate out two results, so we can check how the difference >> between these results. > > Yesterday we did some testing and found that there seems to be an exact > match between using context packets and switch events. However this only > applies when tracing in userspace (by adding the 'u' suffix to the perf > event). Otherwise we still see as much as 2% of events having the wrong > PID around the time of the switch. > > In order to measure this I applied Namhyung's patch and James's patch > from [1]. I thought that this had been applied already so I need to follow this up. James [...] > > [1] https://www.spinics.net/lists/linux-perf-users/msg12543.html > ^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2021-11-01 15:42 UTC | newest] Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-09-16 0:17 [RFC] perf arm-spe: Track task context switch for cpu-mode events Namhyung Kim 2021-09-16 13:54 ` Leo Yan 2021-09-16 21:01 ` Namhyung Kim 2021-09-23 14:23 ` Leo Yan 2021-09-23 16:01 ` Namhyung Kim 2021-09-30 18:47 ` Stephane Eranian 2021-10-01 10:44 ` James Clark 2021-10-01 18:22 ` Stephane Eranian 2021-10-04 15:19 ` Leo Yan 2021-09-30 15:08 ` James Clark 2021-10-04 6:26 ` Leo Yan 2021-10-05 10:06 ` German Gomez 2021-10-06 9:36 ` Leo Yan 2021-10-06 16:09 ` Namhyung Kim 2021-10-08 11:07 ` German Gomez 2021-10-09 0:12 ` Namhyung Kim 2021-10-11 13:58 ` German Gomez 2021-10-11 14:29 ` Leo Yan 2021-10-12 11:07 ` German Gomez 2021-10-18 11:01 ` German Gomez 2021-10-18 13:23 ` Leo Yan 2021-10-19 12:21 ` German Gomez 2021-10-29 10:51 ` German Gomez 2021-11-01 15:11 ` Leo Yan 2021-11-01 15:36 ` German Gomez 2021-11-01 15:42 ` German Gomez 2021-10-06 14:06 ` James Clark
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).