All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] perf arm-spe: Track task context switch for cpu-mode events
@ 2021-09-16  0:17 Namhyung Kim
  2021-09-16 13:54 ` Leo Yan
  0 siblings, 1 reply; 27+ messages in thread
From: Namhyung Kim @ 2021-09-16  0:17 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers,
	Stephane Eranian, Leo Yan, James Clark, Tan Xiaojun,
	Adrian Hunter

When perf report synthesize events from ARM SPE data, it refers to
current cpu, pid and tid in the machine.  But there's no place to set
them in the ARM SPE decoder.  I'm seeing all pid/tid is set to -1 and
user symbols are not resolved in the output.

  # perf record -a -e arm_spe_0/ts_enable=1/ sleep 1

  # perf report -q | head
     8.77%     8.77%  :-1      [kernel.kallsyms]  [k] format_decode
     7.02%     7.02%  :-1      [kernel.kallsyms]  [k] seq_printf
     7.02%     7.02%  :-1      [unknown]          [.] 0x0000ffff9f687c34
     5.26%     5.26%  :-1      [kernel.kallsyms]  [k] vsnprintf
     3.51%     3.51%  :-1      [kernel.kallsyms]  [k] string
     3.51%     3.51%  :-1      [unknown]          [.] 0x0000ffff9f66ae20
     3.51%     3.51%  :-1      [unknown]          [.] 0x0000ffff9f670b3c
     3.51%     3.51%  :-1      [unknown]          [.] 0x0000ffff9f67c040
     1.75%     1.75%  :-1      [kernel.kallsyms]  [k] ___cache_free
     1.75%     1.75%  :-1      [kernel.kallsyms]  [k] __count_memcg_events

Like Intel PT, add context switch records to track task info.  As ARM
SPE support was added later than PERF_RECORD_SWITCH_CPU_WIDE, I think
we can safely set the attr.context_switch bit and use it.

Cc: Leo Yan <leo.yan@linaro.org>
Cc: James Clark <james.clark@arm.com>
Cc: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/arch/arm64/util/arm-spe.c |  6 +++++-
 tools/perf/util/arm-spe.c            | 24 ++++++++++++++++++++++++
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
index a4420d4df503..58ba8d15c573 100644
--- a/tools/perf/arch/arm64/util/arm-spe.c
+++ b/tools/perf/arch/arm64/util/arm-spe.c
@@ -166,8 +166,12 @@ static int arm_spe_recording_options(struct auxtrace_record *itr,
 	tracking_evsel->core.attr.sample_period = 1;
 
 	/* In per-cpu case, always need the time of mmap events etc */
-	if (!perf_cpu_map__empty(cpus))
+	if (!perf_cpu_map__empty(cpus)) {
 		evsel__set_sample_bit(tracking_evsel, TIME);
+		evsel__set_sample_bit(tracking_evsel, CPU);
+		/* also track task context switch */
+		tracking_evsel->core.attr.context_switch = 1;
+	}
 
 	return 0;
 }
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 58b7069c5a5f..0acac0431c48 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -681,6 +681,25 @@ static int arm_spe_process_timeless_queues(struct arm_spe *spe, pid_t tid,
 	return 0;
 }
 
+static int arm_spe_context_switch(struct arm_spe *spe, union perf_event *event,
+				  struct perf_sample *sample)
+{
+	pid_t pid, tid;
+	int cpu;
+
+	if (!(event->header.misc & PERF_RECORD_MISC_SWITCH_OUT))
+		return 0;
+
+	pid = event->context_switch.next_prev_pid;
+	tid = event->context_switch.next_prev_tid;
+	cpu = sample->cpu;
+
+	if (tid == -1)
+		pr_warn("context_switch event has no tid\n");
+
+	return machine__set_current_tid(spe->machine, cpu, pid, tid);
+}
+
 static int arm_spe_process_event(struct perf_session *session,
 				 union perf_event *event,
 				 struct perf_sample *sample,
@@ -718,6 +737,11 @@ static int arm_spe_process_event(struct perf_session *session,
 		}
 	} else if (timestamp) {
 		err = arm_spe_process_queues(spe, timestamp);
+		if (err)
+			return err;
+
+		if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE)
+			err = arm_spe_context_switch(spe, event, sample);
 	}
 
 	return err;
-- 
2.33.0.464.g1972c5931b-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events
  2021-09-16  0:17 [RFC] perf arm-spe: Track task context switch for cpu-mode events Namhyung Kim
@ 2021-09-16 13:54 ` Leo Yan
  2021-09-16 21:01   ` Namhyung Kim
  0 siblings, 1 reply; 27+ messages in thread
From: Leo Yan @ 2021-09-16 13:54 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Andi Kleen, Ian Rogers, Stephane Eranian, James Clark,
	Tan Xiaojun, Adrian Hunter

Hi Namhyung,

On Wed, Sep 15, 2021 at 05:17:48PM -0700, Namhyung Kim wrote:
> When perf report synthesize events from ARM SPE data, it refers to
> current cpu, pid and tid in the machine.  But there's no place to set
> them in the ARM SPE decoder.  I'm seeing all pid/tid is set to -1 and
> user symbols are not resolved in the output.
> 
>   # perf record -a -e arm_spe_0/ts_enable=1/ sleep 1
> 
>   # perf report -q | head
>      8.77%     8.77%  :-1      [kernel.kallsyms]  [k] format_decode
>      7.02%     7.02%  :-1      [kernel.kallsyms]  [k] seq_printf
>      7.02%     7.02%  :-1      [unknown]          [.] 0x0000ffff9f687c34
>      5.26%     5.26%  :-1      [kernel.kallsyms]  [k] vsnprintf
>      3.51%     3.51%  :-1      [kernel.kallsyms]  [k] string
>      3.51%     3.51%  :-1      [unknown]          [.] 0x0000ffff9f66ae20
>      3.51%     3.51%  :-1      [unknown]          [.] 0x0000ffff9f670b3c
>      3.51%     3.51%  :-1      [unknown]          [.] 0x0000ffff9f67c040
>      1.75%     1.75%  :-1      [kernel.kallsyms]  [k] ___cache_free
>      1.75%     1.75%  :-1      [kernel.kallsyms]  [k] __count_memcg_events
> 
> Like Intel PT, add context switch records to track task info.  As ARM
> SPE support was added later than PERF_RECORD_SWITCH_CPU_WIDE, I think
> we can safely set the attr.context_switch bit and use it.

Thanks for the patch.

Before we had discussion for enabling PID/TID for SPE samples; in the patch
set [1], patches 07, 08 set sample's pid/tid based on the Arm SPE context
packets.  To enable hardware tracing context ID, you also needs to enable
kernel config CONFIG_PID_IN_CONTEXTIDR.

At that time, there have a concern is the hardware context ID might
introduce confusion for non-root namespace.

We also considered to use PERF_RECORD_SWITCH_CPU_WIDE event for setting
pid/tid, the Intel PT implementation uses two things to set sample's
pid/tid: one is PERF_RECORD_SWITCH_CPU_WIDE event and another is to detect
the branch instruction is the symbol "__switch_to".  Since the trace
event PERF_RECORD_SWITCH_CPU_WIDE is coarse, so it only uses the new
pid/tid after the branch instruction for "__switch_to".  Arm SPE is
'statistical', thus it cannot promise the trace data must contain the
branch instruction for "__switch_to", please see details [2].

I think the feasible way is to use CONTEXTIDR to trace PID/TID _only_
for root namespace, and the perf tool uses context packet to set
pid/tid for samples.  So except we need patches 07 and 08, we also
need a change in Arm SPE driver as below:

diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index d44bcc29d99c..2553d53d3772 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -272,7 +272,9 @@ static u64 arm_spe_event_to_pmscr(struct perf_event *event)
        if (!attr->exclude_kernel)
                reg |= BIT(SYS_PMSCR_EL1_E1SPE_SHIFT);
 
-       if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable())
+       /* Only enable context ID tracing for root namespace */
+       if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable() &&
+           (task_active_pid_ns(current) == &init_pid_ns))
                reg |= BIT(SYS_PMSCR_EL1_CX_SHIFT);
 
        return reg;

Could you confirm if this works for you?  If it's okay for you, I will
sync with James for upstreaming the changes.

Thanks,
Leo

[1] https://lore.kernel.org/lkml/20210119144658.793-8-james.clark@arm.com/
[2] https://lore.kernel.org/lkml/20210204102734.GA4737@leoy-ThinkPad-X240s/

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events
  2021-09-16 13:54 ` Leo Yan
@ 2021-09-16 21:01   ` Namhyung Kim
  2021-09-23 14:23     ` Leo Yan
  0 siblings, 1 reply; 27+ messages in thread
From: Namhyung Kim @ 2021-09-16 21:01 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Andi Kleen, Ian Rogers, Stephane Eranian, James Clark,
	Adrian Hunter

Hi Leo,

(Removing Tan as the email bounced)

On Thu, Sep 16, 2021 at 6:54 AM Leo Yan <leo.yan@linaro.org> wrote:
>
> Hi Namhyung,
>
> On Wed, Sep 15, 2021 at 05:17:48PM -0700, Namhyung Kim wrote:
> > When perf report synthesize events from ARM SPE data, it refers to
> > current cpu, pid and tid in the machine.  But there's no place to set
> > them in the ARM SPE decoder.  I'm seeing all pid/tid is set to -1 and
> > user symbols are not resolved in the output.
> >
> >   # perf record -a -e arm_spe_0/ts_enable=1/ sleep 1
> >
> >   # perf report -q | head
> >      8.77%     8.77%  :-1      [kernel.kallsyms]  [k] format_decode
> >      7.02%     7.02%  :-1      [kernel.kallsyms]  [k] seq_printf
> >      7.02%     7.02%  :-1      [unknown]          [.] 0x0000ffff9f687c34
> >      5.26%     5.26%  :-1      [kernel.kallsyms]  [k] vsnprintf
> >      3.51%     3.51%  :-1      [kernel.kallsyms]  [k] string
> >      3.51%     3.51%  :-1      [unknown]          [.] 0x0000ffff9f66ae20
> >      3.51%     3.51%  :-1      [unknown]          [.] 0x0000ffff9f670b3c
> >      3.51%     3.51%  :-1      [unknown]          [.] 0x0000ffff9f67c040
> >      1.75%     1.75%  :-1      [kernel.kallsyms]  [k] ___cache_free
> >      1.75%     1.75%  :-1      [kernel.kallsyms]  [k] __count_memcg_events
> >
> > Like Intel PT, add context switch records to track task info.  As ARM
> > SPE support was added later than PERF_RECORD_SWITCH_CPU_WIDE, I think
> > we can safely set the attr.context_switch bit and use it.
>
> Thanks for the patch.
>
> Before we had discussion for enabling PID/TID for SPE samples; in the patch
> set [1], patches 07, 08 set sample's pid/tid based on the Arm SPE context
> packets.  To enable hardware tracing context ID, you also needs to enable
> kernel config CONFIG_PID_IN_CONTEXTIDR.

Thanks for sharing this.

Yeah I also look at the context info but having a dependency on a kconfig
looks limiting its functionality.  Also the kconfig says it has some overhead
in the critical path (even if perf is not running, right?) - but not sure how
much it can add.

config PID_IN_CONTEXTIDR
    bool "Write the current PID to the CONTEXTIDR register"
    help
      Enabling this option causes the kernel to write the current PID to
      the CONTEXTIDR register, at the expense of some additional
      instructions during context switch. Say Y here only if you are
      planning to use hardware trace tools with this kernel.

>
> At that time, there have a concern is the hardware context ID might
> introduce confusion for non-root namespace.

Sounds like a problem.

>
> We also considered to use PERF_RECORD_SWITCH_CPU_WIDE event for setting
> pid/tid, the Intel PT implementation uses two things to set sample's
> pid/tid: one is PERF_RECORD_SWITCH_CPU_WIDE event and another is to detect
> the branch instruction is the symbol "__switch_to".  Since the trace
> event PERF_RECORD_SWITCH_CPU_WIDE is coarse, so it only uses the new
> pid/tid after the branch instruction for "__switch_to".  Arm SPE is
> 'statistical', thus it cannot promise the trace data must contain the
> branch instruction for "__switch_to", please see details [2].

I can see the need in the Intel PT as it needs to trace all (branch)
instructions, but is it really needed for ARM SPE too?
Maybe I am missing something, but it seems enough to have a
coarse-grained context switch for sampling events..

>
> I think the feasible way is to use CONTEXTIDR to trace PID/TID _only_
> for root namespace, and the perf tool uses context packet to set
> pid/tid for samples.  So except we need patches 07 and 08, we also
> need a change in Arm SPE driver as below:
>
> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> index d44bcc29d99c..2553d53d3772 100644
> --- a/drivers/perf/arm_spe_pmu.c
> +++ b/drivers/perf/arm_spe_pmu.c
> @@ -272,7 +272,9 @@ static u64 arm_spe_event_to_pmscr(struct perf_event *event)
>         if (!attr->exclude_kernel)
>                 reg |= BIT(SYS_PMSCR_EL1_E1SPE_SHIFT);
>
> -       if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable())
> +       /* Only enable context ID tracing for root namespace */
> +       if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable() &&
> +           (task_active_pid_ns(current) == &init_pid_ns))
>                 reg |= BIT(SYS_PMSCR_EL1_CX_SHIFT);
>
>         return reg;
>
> Could you confirm if this works for you?  If it's okay for you, I will
> sync with James for upstreaming the changes.

Let me think about this more..

Thanks,
Namhyung


>
> Thanks,
> Leo
>
> [1] https://lore.kernel.org/lkml/20210119144658.793-8-james.clark@arm.com/
> [2] https://lore.kernel.org/lkml/20210204102734.GA4737@leoy-ThinkPad-X240s/

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events
  2021-09-16 21:01   ` Namhyung Kim
@ 2021-09-23 14:23     ` Leo Yan
  2021-09-23 16:01       ` Namhyung Kim
  2021-09-30 15:08       ` James Clark
  0 siblings, 2 replies; 27+ messages in thread
From: Leo Yan @ 2021-09-23 14:23 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Andi Kleen, Ian Rogers, Stephane Eranian, James Clark,
	Adrian Hunter

Hi Namhyung,

On Thu, Sep 16, 2021 at 02:01:21PM -0700, Namhyung Kim wrote:

[...]

> > Before we had discussion for enabling PID/TID for SPE samples; in the patch
> > set [1], patches 07, 08 set sample's pid/tid based on the Arm SPE context
> > packets.  To enable hardware tracing context ID, you also needs to enable
> > kernel config CONFIG_PID_IN_CONTEXTIDR.
> 
> Thanks for sharing this.
> 
> Yeah I also look at the context info but having a dependency on a kconfig
> looks limiting its functionality.  Also the kconfig says it has some overhead
> in the critical path (even if perf is not running, right?) - but not sure how
> much it can add.

Yes, after enabled config PID_IN_CONTEXTIDR, the kernel will always
write PID into the system register CONTEXTIDR during process context
switching.  Please see the flow:

  __switch_to() (arch/arm64/kernel/process.c)
    `-> contextidr_thread_switch(next)

> config PID_IN_CONTEXTIDR
>     bool "Write the current PID to the CONTEXTIDR register"
>     help
>       Enabling this option causes the kernel to write the current PID to
>       the CONTEXTIDR register, at the expense of some additional
>       instructions during context switch. Say Y here only if you are
>       planning to use hardware trace tools with this kernel.
> 
> >
> > At that time, there have a concern is the hardware context ID might
> > introduce confusion for non-root namespace.
> 
> Sounds like a problem.
> 
> >
> > We also considered to use PERF_RECORD_SWITCH_CPU_WIDE event for setting
> > pid/tid, the Intel PT implementation uses two things to set sample's
> > pid/tid: one is PERF_RECORD_SWITCH_CPU_WIDE event and another is to detect
> > the branch instruction is the symbol "__switch_to".  Since the trace
> > event PERF_RECORD_SWITCH_CPU_WIDE is coarse, so it only uses the new
> > pid/tid after the branch instruction for "__switch_to".  Arm SPE is
> > 'statistical', thus it cannot promise the trace data must contain the
> > branch instruction for "__switch_to", please see details [2].
> 
> I can see the need in the Intel PT as it needs to trace all (branch)
> instructions, but is it really needed for ARM SPE too?
> Maybe I am missing something, but it seems enough to have a
> coarse-grained context switch for sampling events..

The issue is that the coarse-grained context switch if introduces any
inaccuracy in the reported result.  If we can run some workloads and
prove the coarse-grained context switch doesn't cause significant bias,
it will be great and can give us the confidence for this approach.

Even enabling PERF_RECORD_SWITCH_CPU_WIDE event, I think it's good to
give priority for hardware PID tracing in Arm SPE trace data, if detects
the hardware PID tracing is enabled, then we can rollback to use
context packets from hardware trace data to set sample's PID.

How about you think for this?

Thanks,
Leo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events
  2021-09-23 14:23     ` Leo Yan
@ 2021-09-23 16:01       ` Namhyung Kim
  2021-09-30 18:47         ` Stephane Eranian
  2021-09-30 15:08       ` James Clark
  1 sibling, 1 reply; 27+ messages in thread
From: Namhyung Kim @ 2021-09-23 16:01 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Andi Kleen, Ian Rogers, Stephane Eranian, James Clark,
	Adrian Hunter

Hi Leo,

On Thu, Sep 23, 2021 at 7:23 AM Leo Yan <leo.yan@linaro.org> wrote:
>
> Hi Namhyung,
>
> On Thu, Sep 16, 2021 at 02:01:21PM -0700, Namhyung Kim wrote:
>
> [...]
>
> > > Before we had discussion for enabling PID/TID for SPE samples; in the patch
> > > set [1], patches 07, 08 set sample's pid/tid based on the Arm SPE context
> > > packets.  To enable hardware tracing context ID, you also needs to enable
> > > kernel config CONFIG_PID_IN_CONTEXTIDR.
> >
> > Thanks for sharing this.
> >
> > Yeah I also look at the context info but having a dependency on a kconfig
> > looks limiting its functionality.  Also the kconfig says it has some overhead
> > in the critical path (even if perf is not running, right?) - but not sure how
> > much it can add.
>
> Yes, after enabled config PID_IN_CONTEXTIDR, the kernel will always
> write PID into the system register CONTEXTIDR during process context
> switching.  Please see the flow:
>
>   __switch_to() (arch/arm64/kernel/process.c)
>     `-> contextidr_thread_switch(next)

Thanks for the info.  I assume it's a light-weight operation.


> > > We also considered to use PERF_RECORD_SWITCH_CPU_WIDE event for setting
> > > pid/tid, the Intel PT implementation uses two things to set sample's
> > > pid/tid: one is PERF_RECORD_SWITCH_CPU_WIDE event and another is to detect
> > > the branch instruction is the symbol "__switch_to".  Since the trace
> > > event PERF_RECORD_SWITCH_CPU_WIDE is coarse, so it only uses the new
> > > pid/tid after the branch instruction for "__switch_to".  Arm SPE is
> > > 'statistical', thus it cannot promise the trace data must contain the
> > > branch instruction for "__switch_to", please see details [2].
> >
> > I can see the need in the Intel PT as it needs to trace all (branch)
> > instructions, but is it really needed for ARM SPE too?
> > Maybe I am missing something, but it seems enough to have a
> > coarse-grained context switch for sampling events..
>
> The issue is that the coarse-grained context switch if introduces any
> inaccuracy in the reported result.  If we can run some workloads and
> prove the coarse-grained context switch doesn't cause significant bias,
> it will be great and can give us the confidence for this approach.
>
> Even enabling PERF_RECORD_SWITCH_CPU_WIDE event, I think it's good to
> give priority for hardware PID tracing in Arm SPE trace data, if detects
> the hardware PID tracing is enabled, then we can rollback to use
> context packets from hardware trace data to set sample's PID.
>
> How about you think for this?

I think it's good as long as it has a fallback when the context info
is not available.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events
  2021-09-23 14:23     ` Leo Yan
  2021-09-23 16:01       ` Namhyung Kim
@ 2021-09-30 15:08       ` James Clark
  2021-10-04  6:26         ` Leo Yan
  1 sibling, 1 reply; 27+ messages in thread
From: James Clark @ 2021-09-30 15:08 UTC (permalink / raw)
  To: Leo Yan, Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Andi Kleen, Ian Rogers, Stephane Eranian, Adrian Hunter,
	german.gomez



On 23/09/2021 15:23, Leo Yan wrote:
> Hi Namhyung,
> 
> On Thu, Sep 16, 2021 at 02:01:21PM -0700, Namhyung Kim wrote:
> 
> [...]
> 
>>> Before we had discussion for enabling PID/TID for SPE samples; in the patch
>>> set [1], patches 07, 08 set sample's pid/tid based on the Arm SPE context
>>> packets.  To enable hardware tracing context ID, you also needs to enable
>>> kernel config CONFIG_PID_IN_CONTEXTIDR.
>>
>> Thanks for sharing this.
>>
>> Yeah I also look at the context info but having a dependency on a kconfig
>> looks limiting its functionality.  Also the kconfig says it has some overhead
>> in the critical path (even if perf is not running, right?) - but not sure how
>> much it can add.
> 
> Yes, after enabled config PID_IN_CONTEXTIDR, the kernel will always
> write PID into the system register CONTEXTIDR during process context
> switching.  Please see the flow:
> 
>   __switch_to() (arch/arm64/kernel/process.c)
>     `-> contextidr_thread_switch(next)
> 
>> config PID_IN_CONTEXTIDR
>>     bool "Write the current PID to the CONTEXTIDR register"
>>     help
>>       Enabling this option causes the kernel to write the current PID to
>>       the CONTEXTIDR register, at the expense of some additional
>>       instructions during context switch. Say Y here only if you are
>>       planning to use hardware trace tools with this kernel.
>>
>>>
>>> At that time, there have a concern is the hardware context ID might
>>> introduce confusion for non-root namespace.
>>
>> Sounds like a problem.
>>
>>>
>>> We also considered to use PERF_RECORD_SWITCH_CPU_WIDE event for setting
>>> pid/tid, the Intel PT implementation uses two things to set sample's
>>> pid/tid: one is PERF_RECORD_SWITCH_CPU_WIDE event and another is to detect
>>> the branch instruction is the symbol "__switch_to".  Since the trace
>>> event PERF_RECORD_SWITCH_CPU_WIDE is coarse, so it only uses the new
>>> pid/tid after the branch instruction for "__switch_to".  Arm SPE is
>>> 'statistical', thus it cannot promise the trace data must contain the
>>> branch instruction for "__switch_to", please see details [2].
>>
>> I can see the need in the Intel PT as it needs to trace all (branch)
>> instructions, but is it really needed for ARM SPE too?
>> Maybe I am missing something, but it seems enough to have a
>> coarse-grained context switch for sampling events..
> 
> The issue is that the coarse-grained context switch if introduces any
> inaccuracy in the reported result.  If we can run some workloads and
> prove the coarse-grained context switch doesn't cause significant bias,
> it will be great and can give us the confidence for this approach.

It sounds like it's worth testing. Do you think the inaccuracy would only
apply to code in the kernel around the time of the switch? Or do you think
it could affect userspace as well? It seems to me that the switch event
would have a timestamp that would precede _all_ userspace code, but I'm not
100% sure on that. I suppose it's easy to test.

German Gomez actually starting looking into the switch events for SPE at the
same time, so I've CCd him and maybe he can do some testing of the patch.

Thanks
James

> 
> Even enabling PERF_RECORD_SWITCH_CPU_WIDE event, I think it's good to
> give priority for hardware PID tracing in Arm SPE trace data, if detects
> the hardware PID tracing is enabled, then we can rollback to use
> context packets from hardware trace data to set sample's PID.
> 
> How about you think for this?
> 
> Thanks,
> Leo
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events
  2021-09-23 16:01       ` Namhyung Kim
@ 2021-09-30 18:47         ` Stephane Eranian
  2021-10-01 10:44           ` James Clark
  0 siblings, 1 reply; 27+ messages in thread
From: Stephane Eranian @ 2021-09-30 18:47 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar,
	Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, James Clark,
	Adrian Hunter

On Thu, Sep 23, 2021 at 9:02 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hi Leo,
>
> On Thu, Sep 23, 2021 at 7:23 AM Leo Yan <leo.yan@linaro.org> wrote:
> >
> > Hi Namhyung,
> >
> > On Thu, Sep 16, 2021 at 02:01:21PM -0700, Namhyung Kim wrote:
> >
> > [...]
> >
> > > > Before we had discussion for enabling PID/TID for SPE samples; in the patch
> > > > set [1], patches 07, 08 set sample's pid/tid based on the Arm SPE context
> > > > packets.  To enable hardware tracing context ID, you also needs to enable
> > > > kernel config CONFIG_PID_IN_CONTEXTIDR.
> > >
> > > Thanks for sharing this.
> > >
> > > Yeah I also look at the context info but having a dependency on a kconfig
> > > looks limiting its functionality.  Also the kconfig says it has some overhead
> > > in the critical path (even if perf is not running, right?) - but not sure how
> > > much it can add.
> >
> > Yes, after enabled config PID_IN_CONTEXTIDR, the kernel will always
> > write PID into the system register CONTEXTIDR during process context
> > switching.  Please see the flow:
> >
> >   __switch_to() (arch/arm64/kernel/process.c)
> >     `-> contextidr_thread_switch(next)
>
> Thanks for the info.  I assume it's a light-weight operation.
>
>
I'd like to understand why it was believed that having SPE record to
PID could be too expensive
vs. what I am seeing with all the tracking of context switches and the
volume of data this generates.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events
  2021-09-30 18:47         ` Stephane Eranian
@ 2021-10-01 10:44           ` James Clark
  2021-10-01 18:22             ` Stephane Eranian
  0 siblings, 1 reply; 27+ messages in thread
From: James Clark @ 2021-10-01 10:44 UTC (permalink / raw)
  To: Stephane Eranian, Namhyung Kim
  Cc: Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar,
	Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Adrian Hunter



On 30/09/2021 19:47, Stephane Eranian wrote:
> On Thu, Sep 23, 2021 at 9:02 AM Namhyung Kim <namhyung@kernel.org> wrote:
>>
>> Hi Leo,
>>
>> On Thu, Sep 23, 2021 at 7:23 AM Leo Yan <leo.yan@linaro.org> wrote:
>>>
>>> Hi Namhyung,
>>>
>>> On Thu, Sep 16, 2021 at 02:01:21PM -0700, Namhyung Kim wrote:
>>>
>>> [...]
>>>
>>>>> Before we had discussion for enabling PID/TID for SPE samples; in the patch
>>>>> set [1], patches 07, 08 set sample's pid/tid based on the Arm SPE context
>>>>> packets.  To enable hardware tracing context ID, you also needs to enable
>>>>> kernel config CONFIG_PID_IN_CONTEXTIDR.
>>>>
>>>> Thanks for sharing this.
>>>>
>>>> Yeah I also look at the context info but having a dependency on a kconfig
>>>> looks limiting its functionality.  Also the kconfig says it has some overhead
>>>> in the critical path (even if perf is not running, right?) - but not sure how
>>>> much it can add.
>>>
>>> Yes, after enabled config PID_IN_CONTEXTIDR, the kernel will always
>>> write PID into the system register CONTEXTIDR during process context
>>> switching.  Please see the flow:
>>>
>>>   __switch_to() (arch/arm64/kernel/process.c)
>>>     `-> contextidr_thread_switch(next)
>>
>> Thanks for the info.  I assume it's a light-weight operation.
>>
>>
> I'd like to understand why it was believed that having SPE record to
> PID could be too expensive
> vs. what I am seeing with all the tracking of context switches and the
> volume of data this generates.
> 

I think the justification about it being expensive is that when PID_IN_CONTEXTIDR
is set, there is an extra few instructions to write that value on every context
switch, whether SPE is enabled or not. So it has a system wide impact.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events
  2021-10-01 10:44           ` James Clark
@ 2021-10-01 18:22             ` Stephane Eranian
  2021-10-04 15:19               ` Leo Yan
  0 siblings, 1 reply; 27+ messages in thread
From: Stephane Eranian @ 2021-10-01 18:22 UTC (permalink / raw)
  To: James Clark
  Cc: Namhyung Kim, Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa,
	Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers,
	Adrian Hunter

On Fri, Oct 1, 2021 at 3:44 AM James Clark <james.clark@arm.com> wrote:
>
>
>
> On 30/09/2021 19:47, Stephane Eranian wrote:
> > On Thu, Sep 23, 2021 at 9:02 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >>
> >> Hi Leo,
> >>
> >> On Thu, Sep 23, 2021 at 7:23 AM Leo Yan <leo.yan@linaro.org> wrote:
> >>>
> >>> Hi Namhyung,
> >>>
> >>> On Thu, Sep 16, 2021 at 02:01:21PM -0700, Namhyung Kim wrote:
> >>>
> >>> [...]
> >>>
> >>>>> Before we had discussion for enabling PID/TID for SPE samples; in the patch
> >>>>> set [1], patches 07, 08 set sample's pid/tid based on the Arm SPE context
> >>>>> packets.  To enable hardware tracing context ID, you also needs to enable
> >>>>> kernel config CONFIG_PID_IN_CONTEXTIDR.
> >>>>
> >>>> Thanks for sharing this.
> >>>>
> >>>> Yeah I also look at the context info but having a dependency on a kconfig
> >>>> looks limiting its functionality.  Also the kconfig says it has some overhead
> >>>> in the critical path (even if perf is not running, right?) - but not sure how
> >>>> much it can add.
> >>>
> >>> Yes, after enabled config PID_IN_CONTEXTIDR, the kernel will always
> >>> write PID into the system register CONTEXTIDR during process context
> >>> switching.  Please see the flow:
> >>>
> >>>   __switch_to() (arch/arm64/kernel/process.c)
> >>>     `-> contextidr_thread_switch(next)
> >>
> >> Thanks for the info.  I assume it's a light-weight operation.
> >>
> >>
> > I'd like to understand why it was believed that having SPE record to
> > PID could be too expensive
> > vs. what I am seeing with all the tracking of context switches and the
> > volume of data this generates.
> >
>
> I think the justification about it being expensive is that when PID_IN_CONTEXTIDR
> is set, there is an extra few instructions to write that value on every context
> switch, whether SPE is enabled or not. So it has a system wide impact.

You could use a static key to make this conditional to having SPE
running on the CPU like
it is done for other PMU features on other architectures.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events
  2021-09-30 15:08       ` James Clark
@ 2021-10-04  6:26         ` Leo Yan
  2021-10-05 10:06           ` German Gomez
  0 siblings, 1 reply; 27+ messages in thread
From: Leo Yan @ 2021-10-04  6:26 UTC (permalink / raw)
  To: James Clark
  Cc: Namhyung Kim, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar,
	Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Stephane Eranian,
	Adrian Hunter, german.gomez

Hi James,

On Thu, Sep 30, 2021 at 04:08:52PM +0100, James Clark wrote:
> On 23/09/2021 15:23, Leo Yan wrote:
> > On Thu, Sep 16, 2021 at 02:01:21PM -0700, Namhyung Kim wrote:

[...]

> >>> We also considered to use PERF_RECORD_SWITCH_CPU_WIDE event for setting
> >>> pid/tid, the Intel PT implementation uses two things to set sample's
> >>> pid/tid: one is PERF_RECORD_SWITCH_CPU_WIDE event and another is to detect
> >>> the branch instruction is the symbol "__switch_to".  Since the trace
> >>> event PERF_RECORD_SWITCH_CPU_WIDE is coarse, so it only uses the new
> >>> pid/tid after the branch instruction for "__switch_to".  Arm SPE is
> >>> 'statistical', thus it cannot promise the trace data must contain the
> >>> branch instruction for "__switch_to", please see details [2].
> >>
> >> I can see the need in the Intel PT as it needs to trace all (branch)
> >> instructions, but is it really needed for ARM SPE too?
> >> Maybe I am missing something, but it seems enough to have a
> >> coarse-grained context switch for sampling events..
> > 
> > The issue is that the coarse-grained context switch if introduces any
> > inaccuracy in the reported result.  If we can run some workloads and
> > prove the coarse-grained context switch doesn't cause significant bias,
> > it will be great and can give us the confidence for this approach.
> 
> It sounds like it's worth testing. Do you think the inaccuracy would only
> apply to code in the kernel around the time of the switch? Or do you think
> it could affect userspace as well?

The inaccuracy should only apply to the kernel code.  There would be
some samples will be wrongly accounted for the next task between the
function prepare_task_switch() and switch_to().

> It seems to me that the switch event
> would have a timestamp that would precede _all_ userspace code, but I'm not
> 100% sure on that.

Yes, the switch event is generated in the scheduler which precede
exiting to userspace:

  __schedule()
    `> context_switch()
         `> prepare_task_switch()
              `> perf_event_task_sched_out()

> I suppose it's easy to test.

I'd like to use the comparison method for the test:
We should enable PID tracing and capture in the perf.data, when decode
the trace data, we can based on context packet and based on the switch
events to generate out two results, so we can check how the difference
between these results.

> German Gomez actually starting looking into the switch events for SPE at the
> same time, so I've CCd him and maybe he can do some testing of the patch.

Cool!  German is welcome to continue the related work; since I am in
holiday this week, I will try this as well, if I have any conclusion
will get back to you guys.

If the test result shows good enough, I personally think we need finish
below items:

- Enable PID tracing and decode with context packets;
- Provide interface to user space so perf tool knows if should use
  hardware PID or rollback to context switch events;
- Merge Namhyung's patch for using switch events for samples.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events
  2021-10-01 18:22             ` Stephane Eranian
@ 2021-10-04 15:19               ` Leo Yan
  0 siblings, 0 replies; 27+ messages in thread
From: Leo Yan @ 2021-10-04 15:19 UTC (permalink / raw)
  To: Stephane Eranian, Will Deacon, Catalin Marinas
  Cc: James Clark, Namhyung Kim, Arnaldo Carvalho de Melo, Jiri Olsa,
	Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers,
	Adrian Hunter

Hi all,

On Fri, Oct 01, 2021 at 11:22:33AM -0700, Stephane Eranian wrote:
> On Fri, Oct 1, 2021 at 3:44 AM James Clark <james.clark@arm.com> wrote:
> > On 30/09/2021 19:47, Stephane Eranian wrote:
> > > On Thu, Sep 23, 2021 at 9:02 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > >> On Thu, Sep 23, 2021 at 7:23 AM Leo Yan <leo.yan@linaro.org> wrote:

[...]

> > >>> Yes, after enabled config PID_IN_CONTEXTIDR, the kernel will always
> > >>> write PID into the system register CONTEXTIDR during process context
> > >>> switching.  Please see the flow:
> > >>>
> > >>>   __switch_to() (arch/arm64/kernel/process.c)
> > >>>     `-> contextidr_thread_switch(next)
> > >>
> > >> Thanks for the info.  I assume it's a light-weight operation.
> > >>
> > > I'd like to understand why it was believed that having SPE record to
> > > PID could be too expensive
> > > vs. what I am seeing with all the tracking of context switches and the
> > > volume of data this generates.
> >
> > I think the justification about it being expensive is that when PID_IN_CONTEXTIDR
> > is set, there is an extra few instructions to write that value on every context
> > switch, whether SPE is enabled or not. So it has a system wide impact.
> 
> You could use a static key to make this conditional to having SPE
> running on the CPU like
> it is done for other PMU features on other architectures.

Good point for using static key to dynamically enable and disable
CONTEXTIDR rather than using config PID_IN_CONTEXTIDR.

For curious, I did a rough testing to compare the performance for using
CONTEXTIDR, there have four testing configurations:
- 'dis': Use the mainline kernel with disabling PID_IN_CONTEXTIDR.
- 'enb': Use the mainline kernel with enablng PID_IN_CONTEXTIDR.
- 'true': Use static key='TRUE' to store PID into CONTEXTIDR.
- 'false': Use static key='FALSE' so don't store PID into CONTEXTIDR.

I used the command 'perf bench sched messaging -t -g 20 -l 1000' as
the testing case, the main reason is this case have total 800 threads
for sending and receiving messages, so it should have huge times for
context switching.

The testing iterates for 5 loops for every configuration, and get the
result for run time (in seconds):

     dis     enb     true    false
     ------------------------------
#1   26.568  26.786  26.056  26.197
#2   26.442  26.457  26.458  26.846
#3   26.719  26.701  27.119  26.281
#4   26.448  27.595  26.953  27.043
#5   27.017  27.263  26.638  26.933
     ------------------------------
avg. 26.638  26.960  26.644   26.66
     ------------------------------
delta pct.    1.21%   0.02%   0.08%

Compared with the base configuration ('dis' with disabling
PID_IN_CONTEXTIDR), we can see the performance is only minor
downgradation in other configurations ('enb': 1.21%, Enable
static key: 0.02%, Disable static key: 0.08%).  So seems to me,
it's feasible to use static key to dynamically control the path for
PID_IN_CONTEXTIDR.

@Catalin, @Will, could you confirm if it is the right direction to
use static key to replace PID_IN_CONTEXTIDR?  Or have any concern for
using the static key, like any secuirty reason?

Also pasted the code for using static key for CONTEXTIDR in the
bottom.

Thanks,
Leo

---8<---

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index f4ba93d4ffeb..857e35a8624a 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -26,9 +26,11 @@
 
 extern bool rodata_full;
 
+DECLARE_STATIC_KEY_FALSE(contextidr_used);
+
 static inline void contextidr_thread_switch(struct task_struct *next)
 {
-	if (!IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR))
+	if (!static_branch_unlikely(&contextidr_used))
 		return;
 
 	write_sysreg(task_pid_nr(next), contextidr_el1);
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 40adb8cdbf5a..f6b6e73fac9d 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -487,6 +487,22 @@ void update_sctlr_el1(u64 sctlr)
 	isb();
 }
 
+DEFINE_STATIC_KEY_FALSE(contextidr_used);
+
+static int __init contextidr_enable(char *str)
+{
+	unsigned int val = 0;
+
+	get_option(&str, &val);
+
+        if (val)
+                static_branch_enable(&contextidr_used);
+
+        return 0;
+}
+early_param("contextidr_enable", contextidr_enable);
+
 /*
  * Thread switching.
  */

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events
  2021-10-04  6:26         ` Leo Yan
@ 2021-10-05 10:06           ` German Gomez
  2021-10-06  9:36             ` Leo Yan
  2021-10-06 14:06             ` James Clark
  0 siblings, 2 replies; 27+ messages in thread
From: German Gomez @ 2021-10-05 10:06 UTC (permalink / raw)
  To: Leo Yan, James Clark
  Cc: Namhyung Kim, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar,
	Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Stephane Eranian,
	Adrian Hunter

Hi Leo,

On 04/10/2021 07:26, Leo Yan wrote:
> Hi James,
>
> On Thu, Sep 30, 2021 at 04:08:52PM +0100, James Clark wrote:
>> On 23/09/2021 15:23, Leo Yan wrote:
>>> On Thu, Sep 16, 2021 at 02:01:21PM -0700, Namhyung Kim wrote:
> [...]
> I'd like to use the comparison method for the test:
> We should enable PID tracing and capture in the perf.data, when decode
> the trace data, we can based on context packet and based on the switch
> events to generate out two results, so we can check how the difference
> between these results.

Yesterday we did some testing and found that there seems to be an exact
match between using context packets and switch events. However this only
applies when tracing in userspace (by adding the 'u' suffix to the perf
event). Otherwise we still see as much as 2% of events having the wrong
PID around the time of the switch.

In order to measure this I applied Namhyung's patch and James's patch
from [1]. Then added a printf line to the function arm_spe_prep_sample
where I have access to both PID values, as a quick way to compare them
later in a perf-report run. This is the diff of the printf patch:

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 41385ab96fbc..591985c66ac4 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -247,6 +247,8 @@ static void arm_spe_prep_sample(struct arm_spe *spe,
    event->sample.header.type = PERF_RECORD_SAMPLE;
    event->sample.header.misc = sample->cpumode;
    event->sample.header.size = sizeof(struct perf_event_header);
+
+       printf(">>>>>> %d / %lu\n", speq->tid, record->context_id & 0x7fff);
 }

The differences obtained as error % were obtained by running the
following perf-record commands for different configurations:

$ sudo ./perf record -e arm_spe/ts_enable=1,load_filter=1,store_filter=1/u -a -- sleep 60
$ sudo ./perf report --stdio \
    | grep ">>>>>>" \
    | awk '{total++; if($2!=$4) miss++} END {print "Error: " (100*miss/total) "% out of " total " samples"}'

Error: 0% out of 11839328 samples

$ sudo ./perf record -e arm_spe/ts_enable=1,load_filter=1,store_filter=1/ -a -- sleep 10
$ sudo ./perf report --stdio \
    | grep ">>>>>>" \
    | awk '{total++; if($2!=$4) miss++} END {print "Error: " (100*miss/total) "% out of " total " samples"}'

Error: 1.30624% out of 3418731 samples

>> German Gomez actually starting looking into the switch events for SPE at the
>> same time, so I've CCd him and maybe he can do some testing of the patch.
> Cool!  German is welcome to continue the related work; since I am in
> holiday this week, I will try this as well, if I have any conclusion
> will get back to you guys.
>
> If the test result shows good enough, I personally think we need finish
> below items:
>
> - Enable PID tracing and decode with context packets;
> - Provide interface to user space so perf tool knows if should use
>   hardware PID or rollback to context switch events;
> - Merge Namhyung's patch for using switch events for samples.
>
> Thanks,
> Leo

I think the fallback to using switch when we can't use the CONTEXTIDR
register is a viable option for userspace events, but maybe not so much
for non-userspace.

Thanks,
German

[1] https://www.spinics.net/lists/linux-perf-users/msg12543.html

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events
  2021-10-05 10:06           ` German Gomez
@ 2021-10-06  9:36             ` Leo Yan
  2021-10-06 16:09               ` Namhyung Kim
  2021-10-06 14:06             ` James Clark
  1 sibling, 1 reply; 27+ messages in thread
From: Leo Yan @ 2021-10-06  9:36 UTC (permalink / raw)
  To: German Gomez
  Cc: James Clark, Namhyung Kim, Arnaldo Carvalho de Melo, Jiri Olsa,
	Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers,
	Stephane Eranian, Adrian Hunter

Hi German,

On Tue, Oct 05, 2021 at 11:06:12AM +0100, German Gomez wrote:

[...]

> Yesterday we did some testing and found that there seems to be an exact
> match between using context packets and switch events. However this only
> applies when tracing in userspace (by adding the 'u' suffix to the perf
> event). Otherwise we still see as much as 2% of events having the wrong
> PID around the time of the switch.

This result sounds reasonable for me, if we only trace the userspace,
the result should have no any difference between using context packets
and switch events.

It's a bit high deviation with switch events (1.30% as shown in below
result) after enable kernel tracing.

> In order to measure this I applied Namhyung's patch and James's patch
> from [1]. Then added a printf line to the function arm_spe_prep_sample
> where I have access to both PID values, as a quick way to compare them
> later in a perf-report run. This is the diff of the printf patch:
> 
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> index 41385ab96fbc..591985c66ac4 100644
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
> @@ -247,6 +247,8 @@ static void arm_spe_prep_sample(struct arm_spe *spe,
>     event->sample.header.type = PERF_RECORD_SAMPLE;
>     event->sample.header.misc = sample->cpumode;
>     event->sample.header.size = sizeof(struct perf_event_header);
> +
> +       printf(">>>>>> %d / %lu\n", speq->tid, record->context_id & 0x7fff);
>  }
> 
> The differences obtained as error % were obtained by running the
> following perf-record commands for different configurations:
> 
> $ sudo ./perf record -e arm_spe/ts_enable=1,load_filter=1,store_filter=1/u -a -- sleep 60
> $ sudo ./perf report --stdio \
>     | grep ">>>>>>" \
>     | awk '{total++; if($2!=$4) miss++} END {print "Error: " (100*miss/total) "% out of " total " samples"}'
> 
> Error: 0% out of 11839328 samples
> 
> $ sudo ./perf record -e arm_spe/ts_enable=1,load_filter=1,store_filter=1/ -a -- sleep 10
> $ sudo ./perf report --stdio \
>     | grep ">>>>>>" \
>     | awk '{total++; if($2!=$4) miss++} END {print "Error: " (100*miss/total) "% out of " total " samples"}'
> 
> Error: 1.30624% out of 3418731 samples

Thanks for sharing this!

> I think the fallback to using switch when we can't use the CONTEXTIDR
> register is a viable option for userspace events, but maybe not so much
> for non-userspace.

Agreed.

If so, it's good to check the variable
'evsel->core.attr.exclude_kernel' when decode Arm SPE trace data, and
only use context switch event when 'exclude_kernel' is set.

Here should note one thing is the perf tool needs to have knowledge to
decide if the bit 3 'CX' (macro 'SYS_PMSCR_EL1_CX_SHIFT' in kernel) has
been set in register PMSCR or not.  AFAIK, Arm SPE driver doesn't
expose any interface (or config) to userspace for the context tracing,
so one method is to add an extra config in Arm SPE driver for this
bit, e.g. 'ATTR_CFG_FLD_cx_enable_CFG' can be added in Arm SPE driver.

Alternatively, rather than adding new config, I am just wandering we
simply use two flags in perf's decoding: 'use_switch_event_for_pid' and
'use_ctx_packet_for_pid', the first variable will be set if detects
the tracing is userspace only, the second varaible will be set when
detects the hardware tracing containing context packet.  So if the
variable 'use_ctx_packet_for_pid' has been set, then the decoder will
always use context packet for sample's PID, otherwise, it falls back
to check 'use_switch_event_for_pid' and set sample PID based on switch
events.

If have any other idea, please feel free bring up.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events
  2021-10-05 10:06           ` German Gomez
  2021-10-06  9:36             ` Leo Yan
@ 2021-10-06 14:06             ` James Clark
  1 sibling, 0 replies; 27+ messages in thread
From: James Clark @ 2021-10-06 14:06 UTC (permalink / raw)
  To: German Gomez, Leo Yan
  Cc: Namhyung Kim, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar,
	Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Stephane Eranian,
	Adrian Hunter



On 05/10/2021 11:06, German Gomez wrote:
> Hi Leo,
> 
> On 04/10/2021 07:26, Leo Yan wrote:
>> Hi James,
>>
>> On Thu, Sep 30, 2021 at 04:08:52PM +0100, James Clark wrote:
>>> On 23/09/2021 15:23, Leo Yan wrote:
>>>> On Thu, Sep 16, 2021 at 02:01:21PM -0700, Namhyung Kim wrote:
>> [...]
>> I'd like to use the comparison method for the test:
>> We should enable PID tracing and capture in the perf.data, when decode
>> the trace data, we can based on context packet and based on the switch
>> events to generate out two results, so we can check how the difference
>> between these results.
> 
> Yesterday we did some testing and found that there seems to be an exact
> match between using context packets and switch events. However this only
> applies when tracing in userspace (by adding the 'u' suffix to the perf
> event). Otherwise we still see as much as 2% of events having the wrong
> PID around the time of the switch.
> 
> In order to measure this I applied Namhyung's patch and James's patch
> from [1]. 

I thought that this had been applied already so I need to follow this up.

James

[...]
> 
> [1] https://www.spinics.net/lists/linux-perf-users/msg12543.html
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events
  2021-10-06  9:36             ` Leo Yan
@ 2021-10-06 16:09               ` Namhyung Kim
  2021-10-08 11:07                 ` German Gomez
  0 siblings, 1 reply; 27+ messages in thread
From: Namhyung Kim @ 2021-10-06 16:09 UTC (permalink / raw)
  To: Leo Yan
  Cc: German Gomez, James Clark, Arnaldo Carvalho de Melo, Jiri Olsa,
	Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers,
	Stephane Eranian, Adrian Hunter

Hi Leo and German,

On Wed, Oct 6, 2021 at 2:36 AM Leo Yan <leo.yan@linaro.org> wrote:
>
> Hi German,
>
> On Tue, Oct 05, 2021 at 11:06:12AM +0100, German Gomez wrote:
>
> [...]
>
> > Yesterday we did some testing and found that there seems to be an exact
> > match between using context packets and switch events. However this only
> > applies when tracing in userspace (by adding the 'u' suffix to the perf
> > event). Otherwise we still see as much as 2% of events having the wrong
> > PID around the time of the switch.
>
> This result sounds reasonable for me, if we only trace the userspace,
> the result should have no any difference between using context packets
> and switch events.
>
> It's a bit high deviation with switch events (1.30% as shown in below
> result) after enable kernel tracing.

Yes, it's bigger than I expected, but it'd be workload specific.

>
> > In order to measure this I applied Namhyung's patch and James's patch
> > from [1]. Then added a printf line to the function arm_spe_prep_sample
> > where I have access to both PID values, as a quick way to compare them
> > later in a perf-report run. This is the diff of the printf patch:
> >
> > diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> > index 41385ab96fbc..591985c66ac4 100644
> > --- a/tools/perf/util/arm-spe.c
> > +++ b/tools/perf/util/arm-spe.c
> > @@ -247,6 +247,8 @@ static void arm_spe_prep_sample(struct arm_spe *spe,
> >     event->sample.header.type = PERF_RECORD_SAMPLE;
> >     event->sample.header.misc = sample->cpumode;
> >     event->sample.header.size = sizeof(struct perf_event_header);
> > +
> > +       printf(">>>>>> %d / %lu\n", speq->tid, record->context_id & 0x7fff);
> >  }
> >
> > The differences obtained as error % were obtained by running the
> > following perf-record commands for different configurations:
> >
> > $ sudo ./perf record -e arm_spe/ts_enable=1,load_filter=1,store_filter=1/u -a -- sleep 60
> > $ sudo ./perf report --stdio \
> >     | grep ">>>>>>" \
> >     | awk '{total++; if($2!=$4) miss++} END {print "Error: " (100*miss/total) "% out of " total " samples"}'
> >
> > Error: 0% out of 11839328 samples
> >
> > $ sudo ./perf record -e arm_spe/ts_enable=1,load_filter=1,store_filter=1/ -a -- sleep 10
> > $ sudo ./perf report --stdio \
> >     | grep ">>>>>>" \
> >     | awk '{total++; if($2!=$4) miss++} END {print "Error: " (100*miss/total) "% out of " total " samples"}'
> >
> > Error: 1.30624% out of 3418731 samples
>
> Thanks for sharing this!
>
> > I think the fallback to using switch when we can't use the CONTEXTIDR
> > register is a viable option for userspace events, but maybe not so much
> > for non-userspace.
>
> Agreed.
>
> If so, it's good to check the variable
> 'evsel->core.attr.exclude_kernel' when decode Arm SPE trace data, and
> only use context switch event when 'exclude_kernel' is set.

I think it'd be better to check it in perf record and not set
evsel->core.attr.context_switch if possible.

Or it can ignore the context switch once it sees a context packet.

>
> Here should note one thing is the perf tool needs to have knowledge to
> decide if the bit 3 'CX' (macro 'SYS_PMSCR_EL1_CX_SHIFT' in kernel) has
> been set in register PMSCR or not.  AFAIK, Arm SPE driver doesn't
> expose any interface (or config) to userspace for the context tracing,
> so one method is to add an extra config in Arm SPE driver for this
> bit, e.g. 'ATTR_CFG_FLD_cx_enable_CFG' can be added in Arm SPE driver.
>
> Alternatively, rather than adding new config, I am just wandering we
> simply use two flags in perf's decoding: 'use_switch_event_for_pid' and
> 'use_ctx_packet_for_pid', the first variable will be set if detects
> the tracing is userspace only, the second varaible will be set when
> detects the hardware tracing containing context packet.  So if the
> variable 'use_ctx_packet_for_pid' has been set, then the decoder will
> always use context packet for sample's PID, otherwise, it falls back
> to check 'use_switch_event_for_pid' and set sample PID based on switch
> events.
>
> If have any other idea, please feel free bring up.

If it's just kernel config, we can check /proc/config.gz or
/boot/config-$(uname -r).  When it knows for sure it can just use
the context packet, otherwise it needs the context switch.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events
  2021-10-06 16:09               ` Namhyung Kim
@ 2021-10-08 11:07                 ` German Gomez
  2021-10-09  0:12                   ` Namhyung Kim
  0 siblings, 1 reply; 27+ messages in thread
From: German Gomez @ 2021-10-08 11:07 UTC (permalink / raw)
  To: Namhyung Kim, Leo Yan
  Cc: James Clark, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar,
	Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Stephane Eranian,
	Adrian Hunter

Hi Leo, Namhyung,

On 06/10/2021 17:09, Namhyung Kim wrote:
> Hi Leo and German,
>
> [...]
>
> I think it'd be better to check it in perf record and not set
> evsel->core.attr.context_switch if possible.
>
> Or it can ignore the context switch once it sees a context packet.
>
>> Here should note one thing is the perf tool needs to have knowledge to
>> decide if the bit 3 'CX' (macro 'SYS_PMSCR_EL1_CX_SHIFT' in kernel) has
>> been set in register PMSCR or not.  AFAIK, Arm SPE driver doesn't
>> expose any interface (or config) to userspace for the context tracing,
>> so one method is to add an extra config in Arm SPE driver for this
>> bit, e.g. 'ATTR_CFG_FLD_cx_enable_CFG' can be added in Arm SPE driver.
>>
>> Alternatively, rather than adding new config, I am just wandering we
>> simply use two flags in perf's decoding: 'use_switch_event_for_pid' and
>> 'use_ctx_packet_for_pid', the first variable will be set if detects
>> the tracing is userspace only, the second varaible will be set when
>> detects the hardware tracing containing context packet.  So if the
>> variable 'use_ctx_packet_for_pid' has been set, then the decoder will
>> always use context packet for sample's PID, otherwise, it falls back
>> to check 'use_switch_event_for_pid' and set sample PID based on switch
>> events.
>>
>> If have any other idea, please feel free bring up.
> If it's just kernel config, we can check /proc/config.gz or
> /boot/config-$(uname -r).  When it knows for sure it can just use
> the context packet, otherwise it needs the context switch.
>
> Thanks,
> Namhyung

Please correct me if I'm wrong, after disabling the PID_IN_CONTEXTIDR
feature in the kernel, I don't see any context packets in the auxtraces.
I think after applying the patch from [1], it should be sufficient to
determine if pid tracing should fall back to use --switch-events when
context_id from that patch has a value of -1.

If the patch at the end of this message is applied on top of Namhyuna's
and [1], I think it can work. Also, if the pmu driver is patched to
disable the 'CX' bit when the pid is not in the root namespace [2]
(unfortunately I haven't been able to set up an environment to properly
test Leo's patch yet) tracing could also fall back to context-switch for
userspace tracing. What do you think?

Thanks,
German

[1] https://www.spinics.net/lists/linux-perf-users/msg12543.html
[2] https://lore.kernel.org/lkml/20210916135418.GA383600@leoy-ThinkPad-X240s/

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 708323d..e224665 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -71,6 +71,12 @@
     u64                kernel_start;
 
     unsigned long            num_events;
+
+    /*
+     * Used for PID tracing.
+     */
+    u8                use_context_id_pkt;
+    u8                use_context_switch_event;
 };
 
 struct arm_spe_queue {
@@ -586,13 +592,30 @@
     return timeless_decoding;
 }
 
+static bool arm_spe__is_exclude_kernel(struct arm_spe *spe) {
+    struct evsel *evsel;
+    struct evlist *evlist = spe->session->evlist;
+
+    evlist__for_each_entry(evlist, evsel) {
+        if (evsel->core.attr.type == spe->pmu_type && evsel->core.attr.exclude_kernel)
+            return true;
+    }
+
+    return false;
+}
+
 static void arm_spe_set_pid_tid_cpu(struct arm_spe *spe,
                     struct auxtrace_queue *queue)
 {
     struct arm_spe_queue *speq = queue->priv;
-    pid_t tid;
+    pid_t tid = -1;
 
-    tid = machine__get_current_tid(spe->machine, speq->cpu);
+    if (spe->use_context_id_pkt)
+        tid = speq->decoder->record.context_id;
+
+    if (tid == -1 && spe->use_context_switch_event)
+        tid = machine__get_current_tid(spe->machine, speq->cpu);
+
     if (tid != -1) {
         speq->tid = tid;
         thread__zput(speq->thread);
@@ -1084,6 +1107,15 @@
     spe->timeless_decoding = arm_spe__is_timeless_decoding(spe);
 
     /*
+     * Always try to use context packet by default for pid tracing.
+     *
+     * If it's not enabled in the pmu driver, it will always have a value of -1 and we can try
+     * to fall back to using context-switch events instead.
+     */
+    spe->use_context_id_pkt = true;
+    spe->use_context_switch_event = arm_spe__is_exclude_kernel(spe);
+
+    /*
      * The synthesized event PERF_RECORD_TIME_CONV has been handled ahead
      * and the parameters for hardware clock are stored in the session
      * context.  Passes these parameters to the struct perf_tsc_conversion
@@ -1141,4 +1173,4 @@
 err_free:
     free(spe);
     return err;
-}
+}
\ No newline at end of file

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events
  2021-10-08 11:07                 ` German Gomez
@ 2021-10-09  0:12                   ` Namhyung Kim
  2021-10-11 13:58                     ` German Gomez
  0 siblings, 1 reply; 27+ messages in thread
From: Namhyung Kim @ 2021-10-09  0:12 UTC (permalink / raw)
  To: German Gomez
  Cc: Leo Yan, James Clark, Arnaldo Carvalho de Melo, Jiri Olsa,
	Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers,
	Stephane Eranian, Adrian Hunter

Hi German,

On Fri, Oct 8, 2021 at 4:08 AM German Gomez <german.gomez@arm.com> wrote:
>
> Hi Leo, Namhyung,
>
> On 06/10/2021 17:09, Namhyung Kim wrote:
> > Hi Leo and German,
> >
> > [...]
> >
> > I think it'd be better to check it in perf record and not set
> > evsel->core.attr.context_switch if possible.
> >
> > Or it can ignore the context switch once it sees a context packet.
> >
> >> Here should note one thing is the perf tool needs to have knowledge to
> >> decide if the bit 3 'CX' (macro 'SYS_PMSCR_EL1_CX_SHIFT' in kernel) has
> >> been set in register PMSCR or not.  AFAIK, Arm SPE driver doesn't
> >> expose any interface (or config) to userspace for the context tracing,
> >> so one method is to add an extra config in Arm SPE driver for this
> >> bit, e.g. 'ATTR_CFG_FLD_cx_enable_CFG' can be added in Arm SPE driver.
> >>
> >> Alternatively, rather than adding new config, I am just wandering we
> >> simply use two flags in perf's decoding: 'use_switch_event_for_pid' and
> >> 'use_ctx_packet_for_pid', the first variable will be set if detects
> >> the tracing is userspace only, the second varaible will be set when
> >> detects the hardware tracing containing context packet.  So if the
> >> variable 'use_ctx_packet_for_pid' has been set, then the decoder will
> >> always use context packet for sample's PID, otherwise, it falls back
> >> to check 'use_switch_event_for_pid' and set sample PID based on switch
> >> events.
> >>
> >> If have any other idea, please feel free bring up.
> > If it's just kernel config, we can check /proc/config.gz or
> > /boot/config-$(uname -r).  When it knows for sure it can just use
> > the context packet, otherwise it needs the context switch.
> >
> > Thanks,
> > Namhyung
>
> Please correct me if I'm wrong, after disabling the PID_IN_CONTEXTIDR
> feature in the kernel, I don't see any context packets in the auxtraces.
> I think after applying the patch from [1], it should be sufficient to
> determine if pid tracing should fall back to use --switch-events when
> context_id from that patch has a value of -1.
>
> If the patch at the end of this message is applied on top of Namhyuna's
> and [1], I think it can work. Also, if the pmu driver is patched to
> disable the 'CX' bit when the pid is not in the root namespace [2]
> (unfortunately I haven't been able to set up an environment to properly
> test Leo's patch yet) tracing could also fall back to context-switch for
> userspace tracing. What do you think?

I think we should use context-switch even for kernel samples, but
only if the context packets are not available.

Thanks,
Namhyung

>
> Thanks,
> German
>
> [1] https://www.spinics.net/lists/linux-perf-users/msg12543.html
> [2] https://lore.kernel.org/lkml/20210916135418.GA383600@leoy-ThinkPad-X240s/
>
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> index 708323d..e224665 100644
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
> @@ -71,6 +71,12 @@
>      u64                kernel_start;
>
>      unsigned long            num_events;
> +
> +    /*
> +     * Used for PID tracing.
> +     */
> +    u8                use_context_id_pkt;
> +    u8                use_context_switch_event;
>  };
>
>  struct arm_spe_queue {
> @@ -586,13 +592,30 @@
>      return timeless_decoding;
>  }
>
> +static bool arm_spe__is_exclude_kernel(struct arm_spe *spe) {
> +    struct evsel *evsel;
> +    struct evlist *evlist = spe->session->evlist;
> +
> +    evlist__for_each_entry(evlist, evsel) {
> +        if (evsel->core.attr.type == spe->pmu_type && evsel->core.attr.exclude_kernel)
> +            return true;
> +    }
> +
> +    return false;
> +}
> +
>  static void arm_spe_set_pid_tid_cpu(struct arm_spe *spe,
>                      struct auxtrace_queue *queue)
>  {
>      struct arm_spe_queue *speq = queue->priv;
> -    pid_t tid;
> +    pid_t tid = -1;
>
> -    tid = machine__get_current_tid(spe->machine, speq->cpu);
> +    if (spe->use_context_id_pkt)
> +        tid = speq->decoder->record.context_id;
> +
> +    if (tid == -1 && spe->use_context_switch_event)
> +        tid = machine__get_current_tid(spe->machine, speq->cpu);
> +
>      if (tid != -1) {
>          speq->tid = tid;
>          thread__zput(speq->thread);
> @@ -1084,6 +1107,15 @@
>      spe->timeless_decoding = arm_spe__is_timeless_decoding(spe);
>
>      /*
> +     * Always try to use context packet by default for pid tracing.
> +     *
> +     * If it's not enabled in the pmu driver, it will always have a value of -1 and we can try
> +     * to fall back to using context-switch events instead.
> +     */
> +    spe->use_context_id_pkt = true;
> +    spe->use_context_switch_event = arm_spe__is_exclude_kernel(spe);
> +
> +    /*
>       * The synthesized event PERF_RECORD_TIME_CONV has been handled ahead
>       * and the parameters for hardware clock are stored in the session
>       * context.  Passes these parameters to the struct perf_tsc_conversion
> @@ -1141,4 +1173,4 @@
>  err_free:
>      free(spe);
>      return err;
> -}
> +}
> \ No newline at end of file

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events
  2021-10-09  0:12                   ` Namhyung Kim
@ 2021-10-11 13:58                     ` German Gomez
  2021-10-11 14:29                       ` Leo Yan
  0 siblings, 1 reply; 27+ messages in thread
From: German Gomez @ 2021-10-11 13:58 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Leo Yan, James Clark, Arnaldo Carvalho de Melo, Jiri Olsa,
	Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers,
	Stephane Eranian, Adrian Hunter

Hi Namhyung,

On 09/10/2021 01:12, Namhyung Kim wrote:

> Hi German,
>
> On Fri, Oct 8, 2021 at 4:08 AM German Gomez <german.gomez@arm.com> wrote:
>
> [...]
>
> I think we should use context-switch even for kernel samples, but
> only if the context packets are not available.
>
> Thanks,
> Namhyung

I think I agree with that you say. If --switch-events is not enabled by
default like you mention, an user could opt-in to using the fallback if
there's no better option for kernel tracing yet.

@Leo, what are your thoughts on this? Perhaps adding a warning message
to tell the user to please enable context packets, otherwise the results
will have workload-dependant inaccuracies, could be a good enough
compromise?

Thanks,
German

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events
  2021-10-11 13:58                     ` German Gomez
@ 2021-10-11 14:29                       ` Leo Yan
  2021-10-12 11:07                         ` German Gomez
  0 siblings, 1 reply; 27+ messages in thread
From: Leo Yan @ 2021-10-11 14:29 UTC (permalink / raw)
  To: German Gomez
  Cc: Namhyung Kim, James Clark, Arnaldo Carvalho de Melo, Jiri Olsa,
	Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers,
	Stephane Eranian, Adrian Hunter

Hi German,

On Mon, Oct 11, 2021 at 02:58:40PM +0100, German Gomez wrote:
> Hi Namhyung,
> 
> On 09/10/2021 01:12, Namhyung Kim wrote:
> 
> > Hi German,
> >
> > On Fri, Oct 8, 2021 at 4:08 AM German Gomez <german.gomez@arm.com> wrote:
> >
> > [...]
> >
> > I think we should use context-switch even for kernel samples, but
> > only if the context packets are not available.
> >
> > Thanks,
> > Namhyung
> 
> I think I agree with that you say. If --switch-events is not enabled by
> default like you mention, an user could opt-in to using the fallback if
> there's no better option for kernel tracing yet.

Actually I took time to try to find some way to enable switch events
conditionally.  As Namhyung suggested, we can enable the switch events
in the perf tool (should do this in arm_spe_recording_options()), I am
just wandering if perf tool can enable switch event only when it runs
in the non-root namespace.  I looked the code util/namespaces.c but
still fail to find any approach to confirm the perf is running in
the root namespace...  anyway, this is not critical for this work.

Welcome if anyone has idea for this.

> @Leo, what are your thoughts on this? Perhaps adding a warning message
> to tell the user to please enable context packets, otherwise the results
> will have workload-dependant inaccuracies, could be a good enough
> compromise?

Yeah, this is exactly what I think.  It's good to give a warning so
users have knowledge for the potential inaccuracies.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events
  2021-10-11 14:29                       ` Leo Yan
@ 2021-10-12 11:07                         ` German Gomez
  2021-10-18 11:01                           ` German Gomez
  0 siblings, 1 reply; 27+ messages in thread
From: German Gomez @ 2021-10-12 11:07 UTC (permalink / raw)
  To: Leo Yan, Namhyung Kim
  Cc: James Clark, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar,
	Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Stephane Eranian,
	Adrian Hunter

Hi, Leo and Namhyung,

I want to make sure I'm on the same page as you regarding this topic.

On 11/10/2021 15:29, Leo Yan wrote:
> Hi German,
>
> On Mon, Oct 11, 2021 at 02:58:40PM +0100, German Gomez wrote:
>> Hi Namhyung,
>>
>> On 09/10/2021 01:12, Namhyung Kim wrote:
>>
>>> Hi German,
>>>
>>> On Fri, Oct 8, 2021 at 4:08 AM German Gomez <german.gomez@arm.com> wrote:
>>>
>>> [...]
>>>
>>> I think we should use context-switch even for kernel samples, but
>>> only if the context packets are not available.

Do you think we should use them also when tracing outside of the root
namespace? I'm no sure if we are considering the driver patch to disable
context packets in non-root ns from earlier.

>> [...]
>> Actually I took time to try to find some way to enable switch events
>> conditionally.  As Namhyung suggested, we can enable the switch events
>> in the perf tool (should do this in arm_spe_recording_options()), I am
>> just wandering if perf tool can enable switch event only when it runs
>> in the non-root namespace.  I looked the code util/namespaces.c but
>> still fail to find any approach to confirm the perf is running in
>> the root namespace...  anyway, this is not critical for this work.
>>
>> Welcome if anyone has idea for this.

Thanks, Leo. We'll let you know if we come up with something too.

>
>> @Leo, what are your thoughts on this? Perhaps adding a warning message
>> to tell the user to please enable context packets, otherwise the results
>> will have workload-dependant inaccuracies, could be a good enough
>> compromise?
> Yeah, this is exactly what I think.  It's good to give a warning so
> users have knowledge for the potential inaccuracies.
>
> Thanks,
> Leo

If we are not considering patching the driver at this stage, so we allow
hardware tracing on non-root namespaces. I think we could proceed like
this:

  - For userspace, always use context-switch events as they are
    accurate and consistent with namespaces.
  - For kernel tracing, if context packets are enabled, use them, but
    warn the user that the PIDs correspond to the root namespace.
  - Otherwise, use context-switch events and warn the user of the time
    inaccuracies.

Later, if the driver is patched to disable context packets outside the
root namespace, kernel tracing could fall back to using context-switch
events and warn the user with a single message about the time
inaccuracies.

If we are aligned, we could collect your feedback and share an updated
patch that considers the warnings.

Many thanks
Best regards

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events
  2021-10-12 11:07                         ` German Gomez
@ 2021-10-18 11:01                           ` German Gomez
  2021-10-18 13:23                             ` Leo Yan
  0 siblings, 1 reply; 27+ messages in thread
From: German Gomez @ 2021-10-18 11:01 UTC (permalink / raw)
  To: Leo Yan, Namhyung Kim, James Clark
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Andi Kleen, Ian Rogers, Stephane Eranian, Adrian Hunter

Hi,

What do you thing of the patch below? PERF_RECORD_SWITCH events are also
included for tracing forks. The patch would sit on top of Namhyung's.

Thanks,
German

On 12/10/2021 12:07, German Gomez wrote:
> Hi, Leo and Namhyung,
>
> I want to make sure I'm on the same page as you regarding this topic.
>
> [...]
>
> If we are not considering patching the driver at this stage, so we allow
> hardware tracing on non-root namespaces. I think we could proceed like
> this:
>
>   - For userspace, always use context-switch events as they are
>     accurate and consistent with namespaces.
>   - For kernel tracing, if context packets are enabled, use them, but
>     warn the user that the PIDs correspond to the root namespace.
>   - Otherwise, use context-switch events and warn the user of the time
>     inaccuracies.
>
> Later, if the driver is patched to disable context packets outside the
> root namespace, kernel tracing could fall back to using context-switch
> events and warn the user with a single message about the time
> inaccuracies.
>
> If we are aligned, we could collect your feedback and share an updated
> patch that considers the warnings.
>
> Many thanks
> Best regards

---
 tools/perf/util/arm-spe.c | 66 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 63 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 708323d7c93c..6a2f7a484a80 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -71,6 +71,17 @@ struct arm_spe {
     u64                kernel_start;
 
     unsigned long            num_events;
+
+    /*
+     * Used for PID tracing.
+     */
+    u8                exclude_kernel;
+
+    /*
+     * Warning messages.
+     */
+    u8                warn_context_pkt_namesapce;
+    u8                warn_context_switch_ev_accuracy;
 };
 
 struct arm_spe_queue {
@@ -586,11 +597,42 @@ static bool arm_spe__is_timeless_decoding(struct arm_spe *spe)
     return timeless_decoding;
 }
 
+static bool arm_spe__is_exclude_kernel(struct arm_spe *spe) {
+    struct evsel *evsel;
+    struct evlist *evlist = spe->session->evlist;
+
+    evlist__for_each_entry(evlist, evsel) {
+    if (evsel->core.attr.type == spe->pmu_type && evsel->core.attr.exclude_kernel)
+        return true;
+    }
+
+    return false;
+}
+
 static void arm_spe_set_pid_tid_cpu(struct arm_spe *spe,
                     struct auxtrace_queue *queue)
 {
     struct arm_spe_queue *speq = queue->priv;
-    pid_t tid;
+    pid_t tid = machine__get_current_tid(spe->machine, speq->cpu);
+    u64 context_id = speq->decoder->record.context_id;
+
+    /*
+    * We're tracing the kernel.
+    */
+    if (!spe->exclude_kernel) {
+        /*
+         * Use CONTEXT packets in kernel tracing if available and warn the user of the
+         * values correspond to the root PID namespace.
+         *
+         * If CONTEXT packets aren't available but context-switch events are, warn the user
+         * of the time inaccuracies.
+         */
+        if (context_id != (u64) -1) {
+            tid = speq->decoder->record.context_id;
+            spe->warn_context_pkt_namesapce = true;
+        } else if (tid != -1 && context_id == (u64) -1)
+            spe->warn_context_switch_ev_accuracy = true;
+    }
 
     tid = machine__get_current_tid(spe->machine, speq->cpu);
     if (tid != -1) {
@@ -740,7 +782,8 @@ static int arm_spe_process_event(struct perf_session *session,
         if (err)
             return err;
 
-        if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE)
+        if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE ||
+            event->header.type == PERF_RECORD_SWITCH)
             err = arm_spe_context_switch(spe, event, sample);
     }
 
@@ -807,7 +850,20 @@ static int arm_spe_flush(struct perf_session *session __maybe_unused,
         return arm_spe_process_timeless_queues(spe, -1,
                 MAX_TIMESTAMP - 1);
 
-    return arm_spe_process_queues(spe, MAX_TIMESTAMP);
+    ret = arm_spe_process_queues(spe, MAX_TIMESTAMP);
+
+    if (spe->warn_context_pkt_namesapce)
+        ui__warning(
+            "Arm SPE CONTEXT packets used for PID/TID tracing.\n\n"
+            "PID values correspond to the root PID namespace.\n\n");
+
+    if (spe->warn_context_switch_ev_accuracy)
+        ui__warning(
+            "No Arm SPE CONTEXT packets found within traces.\n\n"
+            "Fallback to PERF_RECORD_SWITCH events for PID/TID tracing will have\n"
+            "workload-dependant timing inaccuracies.\n\n");
+
+    return ret;
 }
 
 static void arm_spe_free_queue(void *priv)
@@ -1083,6 +1139,10 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
 
     spe->timeless_decoding = arm_spe__is_timeless_decoding(spe);
 
+    spe->exclude_kernel = arm_spe__is_exclude_kernel(spe);
+    spe->warn_context_pkt_namesapce = false;
+    spe->warn_context_switch_ev_accuracy = false;
+
     /*
      * The synthesized event PERF_RECORD_TIME_CONV has been handled ahead
      * and the parameters for hardware clock are stored in the session
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events
  2021-10-18 11:01                           ` German Gomez
@ 2021-10-18 13:23                             ` Leo Yan
  2021-10-19 12:21                               ` German Gomez
  0 siblings, 1 reply; 27+ messages in thread
From: Leo Yan @ 2021-10-18 13:23 UTC (permalink / raw)
  To: German Gomez
  Cc: Namhyung Kim, James Clark, Arnaldo Carvalho de Melo, Jiri Olsa,
	Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers,
	Stephane Eranian, Adrian Hunter

Hi German,

On Mon, Oct 18, 2021 at 12:01:27PM +0100, German Gomez wrote:
> Hi,
> 
> What do you thing of the patch below? PERF_RECORD_SWITCH events are also
> included for tracing forks. The patch would sit on top of Namhyung's.

Yeah, it's good to add PERF_RECORD_SWITCH.

> On 12/10/2021 12:07, German Gomez wrote:
> > Hi, Leo and Namhyung,
> >
> > I want to make sure I'm on the same page as you regarding this topic.
> >
> > [...]
> >
> > If we are not considering patching the driver at this stage, so we allow
> > hardware tracing on non-root namespaces. I think we could proceed like
> > this:
> >
> >   - For userspace, always use context-switch events as they are
> >     accurate and consistent with namespaces.

I don't think you can always use context-switch events for userspace
samples.  The underlying mechanism is when there have context-switch
event or context packet is coming, it will invoke the function
machine__set_current_tid() to set current pid/tid; afterwards, we
can retrieve the current pid/tid with the function
arm_spe_set_pid_tid_cpu().

The question is that if we want to use the tid/pid info at the same
time for both context-switch events and context packets, then it's
hard to maintain.  E.g. we need to create multiple thread context, one
is used to track pid info coming from context-switch events and
another context is to track pid info from context packet.

To simplify the code, I still think we give context packet priority and
use it if it's avalible.  And we rollback to use context-switch events
for pid/tid when context packet is not avaliable.

> >   - For kernel tracing, if context packets are enabled, use them, but
> >     warn the user that the PIDs correspond to the root namespace.
> >   - Otherwise, use context-switch events and warn the user of the time
> >     inaccuracies.
> >
> > Later, if the driver is patched to disable context packets outside the
> > root namespace, kernel tracing could fall back to using context-switch
> > events and warn the user with a single message about the time
> > inaccuracies.
> >
> > If we are aligned, we could collect your feedback and share an updated
> > patch that considers the warnings.
> >
> > Many thanks
> > Best regards
> 
> ---
>  tools/perf/util/arm-spe.c | 66 +++++++++++++++++++++++++++++++++++++--
>  1 file changed, 63 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> index 708323d7c93c..6a2f7a484a80 100644
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
> @@ -71,6 +71,17 @@ struct arm_spe {
>      u64                kernel_start;
>  
>      unsigned long            num_events;
> +
> +    /*
> +     * Used for PID tracing.
> +     */
> +    u8                exclude_kernel;
> +
> +    /*
> +     * Warning messages.
> +     */
> +    u8                warn_context_pkt_namesapce;
> +    u8                warn_context_switch_ev_accuracy;
>  };
>  
>  struct arm_spe_queue {
> @@ -586,11 +597,42 @@ static bool arm_spe__is_timeless_decoding(struct arm_spe *spe)
>      return timeless_decoding;
>  }
>  
> +static bool arm_spe__is_exclude_kernel(struct arm_spe *spe) {
> +    struct evsel *evsel;
> +    struct evlist *evlist = spe->session->evlist;
> +
> +    evlist__for_each_entry(evlist, evsel) {
> +    if (evsel->core.attr.type == spe->pmu_type && evsel->core.attr.exclude_kernel)
> +        return true;
> +    }
> +
> +    return false;
> +}
> +
>  static void arm_spe_set_pid_tid_cpu(struct arm_spe *spe,
>                      struct auxtrace_queue *queue)
>  {
>      struct arm_spe_queue *speq = queue->priv;
> -    pid_t tid;
> +    pid_t tid = machine__get_current_tid(spe->machine, speq->cpu);
> +    u64 context_id = speq->decoder->record.context_id;
> +
> +    /*
> +    * We're tracing the kernel.
> +    */
> +    if (!spe->exclude_kernel) {

This is incorrect ... 'exclude_kernel' is a global variable and if
it's set then perf will always run below code.

I think here you want to avoid using contect packet for user space
samples, but checking 'exclude_kernel' cannot help for this purpose
since 'exclude_kernel' cannot be used to decide sample mode (kernel
mode or user mode).

Thanks,
Leo

> +        /*
> +         * Use CONTEXT packets in kernel tracing if available and warn the user of the
> +         * values correspond to the root PID namespace.
> +         *
> +         * If CONTEXT packets aren't available but context-switch events are, warn the user
> +         * of the time inaccuracies.
> +         */
> +        if (context_id != (u64) -1) {
> +            tid = speq->decoder->record.context_id;
> +            spe->warn_context_pkt_namesapce = true;
> +        } else if (tid != -1 && context_id == (u64) -1)
> +            spe->warn_context_switch_ev_accuracy = true;
> +    }
>  
>      tid = machine__get_current_tid(spe->machine, speq->cpu);
>      if (tid != -1) {
> @@ -740,7 +782,8 @@ static int arm_spe_process_event(struct perf_session *session,
>          if (err)
>              return err;
>  
> -        if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE)
> +        if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE ||
> +            event->header.type == PERF_RECORD_SWITCH)
>              err = arm_spe_context_switch(spe, event, sample);
>      }
>  
> @@ -807,7 +850,20 @@ static int arm_spe_flush(struct perf_session *session __maybe_unused,
>          return arm_spe_process_timeless_queues(spe, -1,
>                  MAX_TIMESTAMP - 1);
>  
> -    return arm_spe_process_queues(spe, MAX_TIMESTAMP);
> +    ret = arm_spe_process_queues(spe, MAX_TIMESTAMP);
> +
> +    if (spe->warn_context_pkt_namesapce)
> +        ui__warning(
> +            "Arm SPE CONTEXT packets used for PID/TID tracing.\n\n"
> +            "PID values correspond to the root PID namespace.\n\n");
> +
> +    if (spe->warn_context_switch_ev_accuracy)
> +        ui__warning(
> +            "No Arm SPE CONTEXT packets found within traces.\n\n"
> +            "Fallback to PERF_RECORD_SWITCH events for PID/TID tracing will have\n"
> +            "workload-dependant timing inaccuracies.\n\n");
> +
> +    return ret;
>  }
>  
>  static void arm_spe_free_queue(void *priv)
> @@ -1083,6 +1139,10 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
>  
>      spe->timeless_decoding = arm_spe__is_timeless_decoding(spe);
>  
> +    spe->exclude_kernel = arm_spe__is_exclude_kernel(spe);
> +    spe->warn_context_pkt_namesapce = false;
> +    spe->warn_context_switch_ev_accuracy = false;
> +
>      /*
>       * The synthesized event PERF_RECORD_TIME_CONV has been handled ahead
>       * and the parameters for hardware clock are stored in the session
> -- 
> 2.17.1

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events
  2021-10-18 13:23                             ` Leo Yan
@ 2021-10-19 12:21                               ` German Gomez
  2021-10-29 10:51                                 ` German Gomez
  0 siblings, 1 reply; 27+ messages in thread
From: German Gomez @ 2021-10-19 12:21 UTC (permalink / raw)
  To: Leo Yan
  Cc: Namhyung Kim, James Clark, Arnaldo Carvalho de Melo, Jiri Olsa,
	Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers,
	Stephane Eranian, Adrian Hunter

Hi Leo,

Many thanks for you comments as always and sorry for the rushed patch.

On 18/10/2021 14:23, Leo Yan wrote:
> Hi German,
>
> On Mon, Oct 18, 2021 at 12:01:27PM +0100, German Gomez wrote:
>> Hi,
>>
>> What do you thing of the patch below? PERF_RECORD_SWITCH events are also
>> included for tracing forks. The patch would sit on top of Namhyung's.
> Yeah, it's good to add PERF_RECORD_SWITCH.
>
>> On 12/10/2021 12:07, German Gomez wrote:
>>> Hi, Leo and Namhyung,
>>>
>>> I want to make sure I'm on the same page as you regarding this topic.
>>>
>>> [...]
>>>
>>> If we are not considering patching the driver at this stage, so we allow
>>> hardware tracing on non-root namespaces. I think we could proceed like
>>> this:
>>>
>>> � - For userspace, always use context-switch events as they are
>>> ��� accurate and consistent with namespaces.
> I don't think you can always use context-switch events for userspace
> samples.  The underlying mechanism is when there have context-switch
> event or context packet is coming, it will invoke the function
> machine__set_current_tid() to set current pid/tid; afterwards, we
> can retrieve the current pid/tid with the function
> arm_spe_set_pid_tid_cpu().
>
> The question is that if we want to use the tid/pid info at the same
> time for both context-switch events and context packets, then it's
> hard to maintain.  E.g. we need to create multiple thread context, one
> is used to track pid info coming from context-switch events and
> another context is to track pid info from context packet.

My thinking was to use only one of the methods for the entire run, but
the code below is not expressive enough I'm afraid and I agree it could
become hard to maintain. I need to polish it up.

>
> To simplify the code, I still think we give context packet priority and
> use it if it's avalible.  And we rollback to use context-switch events
> for pid/tid when context packet is not avaliable.

OK if it simplifies things. I think context-pkt availability can be
determined before any events are processed by looking at the top record
in the auxtrace_heap, o any of the auxtrace_queues.

>>> � - For kernel tracing, if context packets are enabled, use them, but
>>> ��� warn the user that the PIDs correspond to the root namespace.
>>> � - Otherwise, use context-switch events and warn the user of the time
>>> ��� inaccuracies.
>>>
>>> Later, if the driver is patched to disable context packets outside the
>>> root namespace, kernel tracing could fall back to using context-switch
>>> events and warn the user with a single message about the time
>>> inaccuracies.
>>>
>>> If we are aligned, we could collect your feedback and share an updated
>>> patch that considers the warnings.
>>>
>>> Many thanks
>>> Best regards
>> ---
>> �tools/perf/util/arm-spe.c | 66 +++++++++++++++++++++++++++++++++++++--
>> �1 file changed, 63 insertions(+), 3 deletions(-)
>>
>> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
>> index 708323d7c93c..6a2f7a484a80 100644
>> --- a/tools/perf/util/arm-spe.c
>> +++ b/tools/perf/util/arm-spe.c
>> @@ -71,6 +71,17 @@ struct arm_spe {
>> ���� u64��� ��� ��� ��� kernel_start;
>> �
>> ���� unsigned long��� ��� ��� num_events;
>> +
>> +��� /*
>> +��� �* Used for PID tracing.
>> +��� �*/
>> +��� u8��� ��� ��� ��� exclude_kernel;
>> +
>> +��� /*
>> +��� �* Warning messages.
>> +��� �*/
>> +��� u8��� ��� ��� ��� warn_context_pkt_namesapce;
>> +��� u8��� ��� ��� ��� warn_context_switch_ev_accuracy;
>> �};
>> �
>> �struct arm_spe_queue {
>> @@ -586,11 +597,42 @@ static bool arm_spe__is_timeless_decoding(struct arm_spe *spe)
>> ���� return timeless_decoding;
>> �}
>> �
>> +static bool arm_spe__is_exclude_kernel(struct arm_spe *spe) {
>> +��� struct evsel *evsel;
>> +��� struct evlist *evlist = spe->session->evlist;
>> +
>> +��� evlist__for_each_entry(evlist, evsel) {
>> +��� if (evsel->core.attr.type == spe->pmu_type && evsel->core.attr.exclude_kernel)
>> +��� ��� return true;
>> +��� }
>> +
>> +��� return false;
>> +}
>> +
>> �static void arm_spe_set_pid_tid_cpu(struct arm_spe *spe,
>> ���� ��� ��� ��� ��� struct auxtrace_queue *queue)
>> �{
>> ���� struct arm_spe_queue *speq = queue->priv;
>> -��� pid_t tid;
>> +��� pid_t tid = machine__get_current_tid(spe->machine, speq->cpu);
>> +��� u64 context_id = speq->decoder->record.context_id;
>> +
>> +��� /*
>> +��� * We're tracing the kernel.
>> +��� */
>> +��� if (!spe->exclude_kernel) {
> This is incorrect ... 'exclude_kernel' is a global variable and if
> it's set then perf will always run below code.
>
> I think here you want to avoid using contect packet for user space
> samples, but checking 'exclude_kernel' cannot help for this purpose
> since 'exclude_kernel' cannot be used to decide sample mode (kernel
> mode or user mode).
>
> Thanks,
> Leo
>
>> +��� ��� /*
>> +��� ��� �* Use CONTEXT packets in kernel tracing if available and warn the user of the
>> +��� ��� �* values correspond to the root PID namespace.
>> +��� ��� �*
>> +��� ��� �* If CONTEXT packets aren't available but context-switch events are, warn the user
>> +��� ��� �* of the time inaccuracies.
>> +��� ��� �*/
>> +��� ��� if (context_id != (u64) -1) {
>> +��� ��� ��� tid = speq->decoder->record.context_id;
>> +��� ��� ��� spe->warn_context_pkt_namesapce = true;
>> +��� ��� } else if (tid != -1 && context_id == (u64) -1)
>> +��� ��� ��� spe->warn_context_switch_ev_accuracy = true;
>> +��� }
>> �
>> ���� tid = machine__get_current_tid(spe->machine, speq->cpu);
>> ���� if (tid != -1) {
>> @@ -740,7 +782,8 @@ static int arm_spe_process_event(struct perf_session *session,
>> ���� ��� if (err)
>> ���� ��� ��� return err;
>> �
>> -��� ��� if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE)
>> +��� ��� if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE ||
>> +��� ��� ��� event->header.type == PERF_RECORD_SWITCH)
>> ���� ��� ��� err = arm_spe_context_switch(spe, event, sample);
>> ���� }
>> �
>> @@ -807,7 +850,20 @@ static int arm_spe_flush(struct perf_session *session __maybe_unused,
>> ���� ��� return arm_spe_process_timeless_queues(spe, -1,
>> ���� ��� ��� ��� MAX_TIMESTAMP - 1);
>> �
>> -��� return arm_spe_process_queues(spe, MAX_TIMESTAMP);
>> +��� ret = arm_spe_process_queues(spe, MAX_TIMESTAMP);
>> +
>> +��� if (spe->warn_context_pkt_namesapce)
>> +��� ��� ui__warning(
>> +��� ��� ��� "Arm SPE CONTEXT packets used for PID/TID tracing.\n\n"
>> +��� ��� ��� "PID values correspond to the root PID namespace.\n\n");
>> +
>> +��� if (spe->warn_context_switch_ev_accuracy)
>> +��� ��� ui__warning(
>> +��� ��� ��� "No Arm SPE CONTEXT packets found within traces.\n\n"
>> +��� ��� ��� "Fallback to PERF_RECORD_SWITCH events for PID/TID tracing will have\n"
>> +��� ��� ��� "workload-dependant timing inaccuracies.\n\n");
>> +
>> +��� return ret;
>> �}
>> �
>> �static void arm_spe_free_queue(void *priv)
>> @@ -1083,6 +1139,10 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
>> �
>> ���� spe->timeless_decoding = arm_spe__is_timeless_decoding(spe);
>> �
>> +��� spe->exclude_kernel = arm_spe__is_exclude_kernel(spe);
>> +��� spe->warn_context_pkt_namesapce = false;
>> +��� spe->warn_context_switch_ev_accuracy = false;
>> +
>> ���� /*
>> ���� �* The synthesized event PERF_RECORD_TIME_CONV has been handled ahead
>> ���� �* and the parameters for hardware clock are stored in the session
>> -- 
>> 2.17.1


Thanks,
German


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events
  2021-10-19 12:21                               ` German Gomez
@ 2021-10-29 10:51                                 ` German Gomez
  2021-11-01 15:11                                   ` Leo Yan
  0 siblings, 1 reply; 27+ messages in thread
From: German Gomez @ 2021-10-29 10:51 UTC (permalink / raw)
  To: Leo Yan
  Cc: Namhyung Kim, James Clark, Arnaldo Carvalho de Melo, Jiri Olsa,
	Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers,
	Stephane Eranian, Adrian Hunter

Hi Leo,

The current plan is to define a global flag in the `struct arm_spe` to
select the method of pid tracing (context pkt, or switch events):

    struct arm_spe {
       /* ... */
       u8        use_ctx_pkt_for_pid;
    }

The method could be determined by peeking at the top element of the
`struct auxtrace_heap` at the beginning of the perf-report. If ctx
packets have been collected, the first one should have a context_id !=
-1. We could then tweak this part of Namhyung patch slightly:

    if (!spe->use_ctx_pkt_for_pid &&
        (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE ||
         event->header.type == PERF_RECORD_SWITCH))
            err = arm_spe_context_switch(spe, event, sample);

Then we could apply patch [1] which wasn't fully merged in the end,
including similar `if (spe->use_ctx_pkt_for_pid)` to collect the pid/tid
from the context packets.

What do you think?

Thanks,
German

[1] https://lore.kernel.org/lkml/20210119144658.793-8-james.clark@arm.com/

On 19/10/2021 13:21, German Gomez wrote:
> Hi Leo,
>
> Many thanks for you comments as always and sorry for the rushed patch.
>
> On 18/10/2021 14:23, Leo Yan wrote:
>> Hi German,
>>
>> On Mon, Oct 18, 2021 at 12:01:27PM +0100, German Gomez wrote:
>>> Hi,
>>>
>>> What do you thing of the patch below? PERF_RECORD_SWITCH events are also
>>> included for tracing forks. The patch would sit on top of Namhyung's.
>> Yeah, it's good to add PERF_RECORD_SWITCH.
>>
>>> On 12/10/2021 12:07, German Gomez wrote:
>>>> Hi, Leo and Namhyung,
>>>>
>>>> I want to make sure I'm on the same page as you regarding this topic.
>>>>
>>>> [...]
>>>>
>>>> If we are not considering patching the driver at this stage, so we allow
>>>> hardware tracing on non-root namespaces. I think we could proceed like
>>>> this:
>>>>
>>>> � - For userspace, always use context-switch events as they are
>>>> ��� accurate and consistent with namespaces.
>> I don't think you can always use context-switch events for userspace
>> samples.  The underlying mechanism is when there have context-switch
>> event or context packet is coming, it will invoke the function
>> machine__set_current_tid() to set current pid/tid; afterwards, we
>> can retrieve the current pid/tid with the function
>> arm_spe_set_pid_tid_cpu().
>>
>> The question is that if we want to use the tid/pid info at the same
>> time for both context-switch events and context packets, then it's
>> hard to maintain.  E.g. we need to create multiple thread context, one
>> is used to track pid info coming from context-switch events and
>> another context is to track pid info from context packet.
> My thinking was to use only one of the methods for the entire run, but
> the code below is not expressive enough I'm afraid and I agree it could
> become hard to maintain. I need to polish it up.
>
>> To simplify the code, I still think we give context packet priority and
>> use it if it's avalible.  And we rollback to use context-switch events
>> for pid/tid when context packet is not avaliable.
> OK if it simplifies things. I think context-pkt availability can be
> determined before any events are processed by looking at the top record
> in the auxtrace_heap, o any of the auxtrace_queues.
>
>>>> � - For kernel tracing, if context packets are enabled, use them, but
>>>> ��� warn the user that the PIDs correspond to the root namespace.
>>>> � - Otherwise, use context-switch events and warn the user of the time
>>>> ��� inaccuracies.
>>>>
>>>> Later, if the driver is patched to disable context packets outside the
>>>> root namespace, kernel tracing could fall back to using context-switch
>>>> events and warn the user with a single message about the time
>>>> inaccuracies.
>>>>
>>>> If we are aligned, we could collect your feedback and share an updated
>>>> patch that considers the warnings.
>>>>
>>>> Many thanks
>>>> Best regards
>>> ---
>>> �tools/perf/util/arm-spe.c | 66 +++++++++++++++++++++++++++++++++++++--
>>> �1 file changed, 63 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
>>> index 708323d7c93c..6a2f7a484a80 100644
>>> --- a/tools/perf/util/arm-spe.c
>>> +++ b/tools/perf/util/arm-spe.c
>>> @@ -71,6 +71,17 @@ struct arm_spe {
>>> ���� u64��� ��� ��� ��� kernel_start;
>>> �
>>> ���� unsigned long��� ��� ��� num_events;
>>> +
>>> +��� /*
>>> +��� �* Used for PID tracing.
>>> +��� �*/
>>> +��� u8��� ��� ��� ��� exclude_kernel;
>>> +
>>> +��� /*
>>> +��� �* Warning messages.
>>> +��� �*/
>>> +��� u8��� ��� ��� ��� warn_context_pkt_namesapce;
>>> +��� u8��� ��� ��� ��� warn_context_switch_ev_accuracy;
>>> �};
>>> �
>>> �struct arm_spe_queue {
>>> @@ -586,11 +597,42 @@ static bool arm_spe__is_timeless_decoding(struct arm_spe *spe)
>>> ���� return timeless_decoding;
>>> �}
>>> �
>>> +static bool arm_spe__is_exclude_kernel(struct arm_spe *spe) {
>>> +��� struct evsel *evsel;
>>> +��� struct evlist *evlist = spe->session->evlist;
>>> +
>>> +��� evlist__for_each_entry(evlist, evsel) {
>>> +��� if (evsel->core.attr.type == spe->pmu_type && evsel->core.attr.exclude_kernel)
>>> +��� ��� return true;
>>> +��� }
>>> +
>>> +��� return false;
>>> +}
>>> +
>>> �static void arm_spe_set_pid_tid_cpu(struct arm_spe *spe,
>>> ���� ��� ��� ��� ��� struct auxtrace_queue *queue)
>>> �{
>>> ���� struct arm_spe_queue *speq = queue->priv;
>>> -��� pid_t tid;
>>> +��� pid_t tid = machine__get_current_tid(spe->machine, speq->cpu);
>>> +��� u64 context_id = speq->decoder->record.context_id;
>>> +
>>> +��� /*
>>> +��� * We're tracing the kernel.
>>> +��� */
>>> +��� if (!spe->exclude_kernel) {
>> This is incorrect ... 'exclude_kernel' is a global variable and if
>> it's set then perf will always run below code.
>>
>> I think here you want to avoid using contect packet for user space
>> samples, but checking 'exclude_kernel' cannot help for this purpose
>> since 'exclude_kernel' cannot be used to decide sample mode (kernel
>> mode or user mode).
>>
>> Thanks,
>> Leo
>>
>>> +��� ��� /*
>>> +��� ��� �* Use CONTEXT packets in kernel tracing if available and warn the user of the
>>> +��� ��� �* values correspond to the root PID namespace.
>>> +��� ��� �*
>>> +��� ��� �* If CONTEXT packets aren't available but context-switch events are, warn the user
>>> +��� ��� �* of the time inaccuracies.
>>> +��� ��� �*/
>>> +��� ��� if (context_id != (u64) -1) {
>>> +��� ��� ��� tid = speq->decoder->record.context_id;
>>> +��� ��� ��� spe->warn_context_pkt_namesapce = true;
>>> +��� ��� } else if (tid != -1 && context_id == (u64) -1)
>>> +��� ��� ��� spe->warn_context_switch_ev_accuracy = true;
>>> +��� }
>>> �
>>> ���� tid = machine__get_current_tid(spe->machine, speq->cpu);
>>> ���� if (tid != -1) {
>>> @@ -740,7 +782,8 @@ static int arm_spe_process_event(struct perf_session *session,
>>> ���� ��� if (err)
>>> ���� ��� ��� return err;
>>> �
>>> -��� ��� if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE)
>>> +��� ��� if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE ||
>>> +��� ��� ��� event->header.type == PERF_RECORD_SWITCH)
>>> ���� ��� ��� err = arm_spe_context_switch(spe, event, sample);
>>> ���� }
>>> �
>>> @@ -807,7 +850,20 @@ static int arm_spe_flush(struct perf_session *session __maybe_unused,
>>> ���� ��� return arm_spe_process_timeless_queues(spe, -1,
>>> ���� ��� ��� ��� MAX_TIMESTAMP - 1);
>>> �
>>> -��� return arm_spe_process_queues(spe, MAX_TIMESTAMP);
>>> +��� ret = arm_spe_process_queues(spe, MAX_TIMESTAMP);
>>> +
>>> +��� if (spe->warn_context_pkt_namesapce)
>>> +��� ��� ui__warning(
>>> +��� ��� ��� "Arm SPE CONTEXT packets used for PID/TID tracing.\n\n"
>>> +��� ��� ��� "PID values correspond to the root PID namespace.\n\n");
>>> +
>>> +��� if (spe->warn_context_switch_ev_accuracy)
>>> +��� ��� ui__warning(
>>> +��� ��� ��� "No Arm SPE CONTEXT packets found within traces.\n\n"
>>> +��� ��� ��� "Fallback to PERF_RECORD_SWITCH events for PID/TID tracing will have\n"
>>> +��� ��� ��� "workload-dependant timing inaccuracies.\n\n");
>>> +
>>> +��� return ret;
>>> �}
>>> �
>>> �static void arm_spe_free_queue(void *priv)
>>> @@ -1083,6 +1139,10 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
>>> �
>>> ���� spe->timeless_decoding = arm_spe__is_timeless_decoding(spe);
>>> �
>>> +��� spe->exclude_kernel = arm_spe__is_exclude_kernel(spe);
>>> +��� spe->warn_context_pkt_namesapce = false;
>>> +��� spe->warn_context_switch_ev_accuracy = false;
>>> +
>>> ���� /*
>>> ���� �* The synthesized event PERF_RECORD_TIME_CONV has been handled ahead
>>> ���� �* and the parameters for hardware clock are stored in the session
>>> -- 
>>> 2.17.1
>
> Thanks,
> German
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events
  2021-10-29 10:51                                 ` German Gomez
@ 2021-11-01 15:11                                   ` Leo Yan
  2021-11-01 15:36                                     ` German Gomez
  0 siblings, 1 reply; 27+ messages in thread
From: Leo Yan @ 2021-11-01 15:11 UTC (permalink / raw)
  To: German Gomez
  Cc: Namhyung Kim, James Clark, Arnaldo Carvalho de Melo, Jiri Olsa,
	Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers,
	Stephane Eranian, Adrian Hunter

Hi German,

On Fri, Oct 29, 2021 at 11:51:16AM +0100, German Gomez wrote:
> Hi Leo,
> 
> The current plan is to define a global flag in the `struct arm_spe` to
> select the method of pid tracing (context pkt, or switch events):
> 
>     struct arm_spe {
>        /* ... */
>        u8        use_ctx_pkt_for_pid;
>     }
> 
> The method could be determined by peeking at the top element of the
> `struct auxtrace_heap` at the beginning of the perf-report. If ctx
> packets have been collected, the first one should have a context_id !=
> -1. We could then tweak this part of Namhyung patch slightly:

Have one concern: if cannot find the context packet, will the decoder
drop the SPE packets until it find the first context packet?  If this
is the case, I am concern the decoder will run out for all packets
and doesn't generate any samples if the SPE trace data doesn't contain
any context packet.

> 
>     if (!spe->use_ctx_pkt_for_pid &&
>         (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE ||
>          event->header.type == PERF_RECORD_SWITCH))
>             err = arm_spe_context_switch(spe, event, sample);
> 
> Then we could apply patch [1] which wasn't fully merged in the end,
> including similar `if (spe->use_ctx_pkt_for_pid)` to collect the pid/tid
> from the context packets.
> 
> What do you think?

Except the above concern, the solution looks good to me.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events
  2021-11-01 15:11                                   ` Leo Yan
@ 2021-11-01 15:36                                     ` German Gomez
  2021-11-01 15:42                                       ` German Gomez
  0 siblings, 1 reply; 27+ messages in thread
From: German Gomez @ 2021-11-01 15:36 UTC (permalink / raw)
  To: Leo Yan
  Cc: Namhyung Kim, James Clark, Arnaldo Carvalho de Melo, Jiri Olsa,
	Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers,
	Stephane Eranian, Adrian Hunter

Hi Leo,

On 01/11/2021 15:11, Leo Yan wrote:
> Hi German,
>
> On Fri, Oct 29, 2021 at 11:51:16AM +0100, German Gomez wrote:
> [...]
> Have one concern: if cannot find the context packet, will the decoder
> drop the SPE packets until it find the first context packet?  If this
> is the case, I am concern the decoder will run out for all packets
> and doesn't generate any samples if the SPE trace data doesn't contain
> any context packet.

Not really. It will only peek at the first decoded packet without
dropping it. I couldn't think of a corner case where the decoder might
miss a context packet for the first records (I also haven't seen any -1
values so far).

>> ��� if (!spe->use_ctx_pkt_for_pid &&
>> ������� (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE ||
>> �������� event->header.type == PERF_RECORD_SWITCH))
>> ����������� err = arm_spe_context_switch(spe, event, sample);
>>
>> Then we could apply patch [1] which wasn't fully merged in the end,
>> including similar `if (spe->use_ctx_pkt_for_pid)` to collect the pid/tid
>> from the context packets.
>>
>> What do you think?
> Except the above concern, the solution looks good to me.

I realized I cannot use the heap for it will not work in timeless
decoding. We can still use the queues though. By the way, is this return
statement in the arm_spe__setup_queue() function misplaced?

    if (spe->timeless_decoding)
            return 0;

Judging by the long comment in the arm_spe_run_decoder() function, it
seems like it should be placed somewhere below the call to "ret =
arm_spe_decode(...)", otherwise arm_spe_run_decoder() will begin with an
uninitialized record?

Thanks,
German

>
> Thanks,
> Leo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events
  2021-11-01 15:36                                     ` German Gomez
@ 2021-11-01 15:42                                       ` German Gomez
  0 siblings, 0 replies; 27+ messages in thread
From: German Gomez @ 2021-11-01 15:42 UTC (permalink / raw)
  To: Leo Yan
  Cc: Namhyung Kim, James Clark, Arnaldo Carvalho de Melo, Jiri Olsa,
	Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers,
	Stephane Eranian, Adrian Hunter


On 01/11/2021 15:36, German Gomez wrote:
> [...]
> Not really. It will only peek at the first decoded packet without

Slight correction for clarity: s/decoded packet/decoded record/. I would
be peeking at the first record in the auxtrace queue. The context packet
is stored in the context_id field of the record.

> dropping it. I couldn't think of a corner case where the decoder might
> miss a context packet for the first records (I also haven't seen any -1
> values so far).
>


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2021-11-01 15:42 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-16  0:17 [RFC] perf arm-spe: Track task context switch for cpu-mode events Namhyung Kim
2021-09-16 13:54 ` Leo Yan
2021-09-16 21:01   ` Namhyung Kim
2021-09-23 14:23     ` Leo Yan
2021-09-23 16:01       ` Namhyung Kim
2021-09-30 18:47         ` Stephane Eranian
2021-10-01 10:44           ` James Clark
2021-10-01 18:22             ` Stephane Eranian
2021-10-04 15:19               ` Leo Yan
2021-09-30 15:08       ` James Clark
2021-10-04  6:26         ` Leo Yan
2021-10-05 10:06           ` German Gomez
2021-10-06  9:36             ` Leo Yan
2021-10-06 16:09               ` Namhyung Kim
2021-10-08 11:07                 ` German Gomez
2021-10-09  0:12                   ` Namhyung Kim
2021-10-11 13:58                     ` German Gomez
2021-10-11 14:29                       ` Leo Yan
2021-10-12 11:07                         ` German Gomez
2021-10-18 11:01                           ` German Gomez
2021-10-18 13:23                             ` Leo Yan
2021-10-19 12:21                               ` German Gomez
2021-10-29 10:51                                 ` German Gomez
2021-11-01 15:11                                   ` Leo Yan
2021-11-01 15:36                                     ` German Gomez
2021-11-01 15:42                                       ` German Gomez
2021-10-06 14:06             ` James Clark

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.