linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/2] perf: arm_spe: Fix consistency of CONTEXT packets in SPE driver
@ 2022-01-17 12:44 German Gomez
  2022-01-17 12:44 ` [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX German Gomez
  2022-01-17 12:44 ` [RFC PATCH 2/2] perf: arm_spe: Enable CONTEXT packets in SPE traces if the profiler runs in CPU mode German Gomez
  0 siblings, 2 replies; 18+ messages in thread
From: German Gomez @ 2022-01-17 12:44 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel, will, mark.rutland, james.clark, leo.yan
  Cc: German Gomez

Applies a couple of small changes to the arm_spe_pmu driver.

We are seeing context packets in an inconsistent number of SPE records
even  when the perf-tool runs without the needed capabilities. This is
fixed in [1/2].

We're also allowing CONTEXT packets to be collected in per-cpu events in
[2/2].

I'm sending as an RFC because it's the first time I change driver code.
Also I'm not 100% sure of the approach in [2/2] (from a security and/or
implementation standpoint).

Thanks,
German

- [PATCH 1/2] Fixes the consistency issue with the context packets.
- [PATCH 2/2] Enables context packets in per-cpu events.

German Gomez (2):
  perf: arm_spe: Fix consistency of PMSCR register bit CX
  perf: arm_spe: Enable CONTEXT packets in SPE traces if the profiler
    runs in CPU mode.

 drivers/perf/arm_spe_pmu.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX
  2022-01-17 12:44 [RFC PATCH 0/2] perf: arm_spe: Fix consistency of CONTEXT packets in SPE driver German Gomez
@ 2022-01-17 12:44 ` German Gomez
  2022-01-18 10:07   ` Will Deacon
  2022-02-05 15:39   ` Leo Yan
  2022-01-17 12:44 ` [RFC PATCH 2/2] perf: arm_spe: Enable CONTEXT packets in SPE traces if the profiler runs in CPU mode German Gomez
  1 sibling, 2 replies; 18+ messages in thread
From: German Gomez @ 2022-01-17 12:44 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel, will, mark.rutland, james.clark, leo.yan
  Cc: German Gomez

The arm_spe_pmu driver will enable the CX bit of the PMSCR register in
order to add CONTEXT packets into the traces if the owner of the perf
event runs with capabilities CAP_PERFMON or CAP_SYS_ADMIN.

The value of the bit is computed in the arm_spe_event_to_pmscr function
[1], and the check for capabilities happens in [2] in the pmu init
callback. This suggests that the value of the CX bit should remain
consistent for the duration of the perf session.

However, the function arm_spe_event_to_pmscr may be called later during
the start callback [3] when the "current" process is not the owner of
the perf session, so the CX bit is currently not consistent. Consider
the following example:

  1. Run a process in the background with capability CAP_SYS_ADMIN in CPU0.

    $ taskset --cpu-list 0 sudo dd if=/dev/random of=/dev/null &
    [3] 3806

  2. Begin a perf session _without_ capabilities (we shouldn't see CONTEXT packets).

    $ perf record -e arm_spe_0// -C0 -- sleep 1
    $ perf report -D | grep CONTEXT
    .  0000000e:  65 df 0e 00 00                                  CONTEXT 0xedf el2
    .  0000004e:  65 df 0e 00 00                                  CONTEXT 0xedf el2
    .  0000008e:  65 df 0e 00 00                                  CONTEXT 0xedf el2
    [...]

As can be seen, the traces begin showing CONTEXT packets when the pid is
0xedf (3807). This happens because the pmu start callback is run when
the current process is not the owner of the perf session, so the CX
register bit is set.

One way to fix this is by caching the value of the CX bit during the
initialization of the PMU event, so that it remains consistent for the
duration of the session.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/arm_spe_pmu.c?h=v5.16#n275
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/arm_spe_pmu.c?h=v5.16#n713
[3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/arm_spe_pmu.c?h=v5.16#n751

Signed-off-by: German Gomez <german.gomez@arm.com>
---
 drivers/perf/arm_spe_pmu.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index d44bcc29d..8515bf85c 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -57,6 +57,7 @@ struct arm_spe_pmu {
 	u16					pmsver;
 	u16					min_period;
 	u16					counter_sz;
+	bool					pmscr_cx;
 
 #define SPE_PMU_FEAT_FILT_EVT			(1UL << 0)
 #define SPE_PMU_FEAT_FILT_TYP			(1UL << 1)
@@ -260,6 +261,7 @@ static const struct attribute_group *arm_spe_pmu_attr_groups[] = {
 static u64 arm_spe_event_to_pmscr(struct perf_event *event)
 {
 	struct perf_event_attr *attr = &event->attr;
+	struct arm_spe_pmu *spe_pmu = to_spe_pmu(event->pmu);
 	u64 reg = 0;
 
 	reg |= ATTR_CFG_GET_FLD(attr, ts_enable) << SYS_PMSCR_EL1_TS_SHIFT;
@@ -272,7 +274,7 @@ static u64 arm_spe_event_to_pmscr(struct perf_event *event)
 	if (!attr->exclude_kernel)
 		reg |= BIT(SYS_PMSCR_EL1_E1SPE_SHIFT);
 
-	if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable())
+	if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && spe_pmu->pmscr_cx)
 		reg |= BIT(SYS_PMSCR_EL1_CX_SHIFT);
 
 	return reg;
@@ -709,10 +711,10 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
 	    !(spe_pmu->features & SPE_PMU_FEAT_FILT_LAT))
 		return -EOPNOTSUPP;
 
+	spe_pmu->pmscr_cx = perfmon_capable();
 	reg = arm_spe_event_to_pmscr(event);
 	if (!perfmon_capable() &&
 	    (reg & (BIT(SYS_PMSCR_EL1_PA_SHIFT) |
-		    BIT(SYS_PMSCR_EL1_CX_SHIFT) |
 		    BIT(SYS_PMSCR_EL1_PCT_SHIFT))))
 		return -EACCES;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH 2/2] perf: arm_spe: Enable CONTEXT packets in SPE traces if the profiler runs in CPU mode.
  2022-01-17 12:44 [RFC PATCH 0/2] perf: arm_spe: Fix consistency of CONTEXT packets in SPE driver German Gomez
  2022-01-17 12:44 ` [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX German Gomez
@ 2022-01-17 12:44 ` German Gomez
  2022-01-17 14:04   ` German Gomez
  2022-01-18  9:52   ` Will Deacon
  1 sibling, 2 replies; 18+ messages in thread
From: German Gomez @ 2022-01-17 12:44 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel, will, mark.rutland, james.clark, leo.yan
  Cc: German Gomez

Enable CONTEXT packets in SPE traces if the profiler runs in CPU mode.
This is no less permissive than the existing behavior for the following
reason:

If perf_event_paranoid <= 0, then non perfmon_capable() users can open
a per-CPU event. With a per-CPU event, unpriviledged users are allowed
to profile _all_ processes, even ones owned by root.

Without this change, users could see kernel addresses, root processes,
etc, but not gather the PIDs of those processes. The PID is probably the
least sensitive of all the information.

It would be more idiomatic to check the perf_event_paranoid level with
perf_allow_cpu(), but this function is not exported so cannot be used
from a module. Looking for cpu != -1 is the indirect way of checking
the same thing as it could never get to arm_spe_pmu_event_init() without
perf_event_paranoid <= 0.

Co-authored-by: James Clark <james.clark@arm.com>
Signed-off-by: German Gomez <german.gomez@arm.com>
---
 drivers/perf/arm_spe_pmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index 8515bf85c..7d9a7fa4f 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -711,7 +711,7 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
 	    !(spe_pmu->features & SPE_PMU_FEAT_FILT_LAT))
 		return -EOPNOTSUPP;
 
-	spe_pmu->pmscr_cx = perfmon_capable();
+	spe_pmu->pmscr_cx = perfmon_capable() || (event->cpu != -1);
 	reg = arm_spe_event_to_pmscr(event);
 	if (!perfmon_capable() &&
 	    (reg & (BIT(SYS_PMSCR_EL1_PA_SHIFT) |
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 2/2] perf: arm_spe: Enable CONTEXT packets in SPE traces if the profiler runs in CPU mode.
  2022-01-17 12:44 ` [RFC PATCH 2/2] perf: arm_spe: Enable CONTEXT packets in SPE traces if the profiler runs in CPU mode German Gomez
@ 2022-01-17 14:04   ` German Gomez
  2022-01-18  9:52   ` Will Deacon
  1 sibling, 0 replies; 18+ messages in thread
From: German Gomez @ 2022-01-17 14:04 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel, will, mark.rutland, james.clark, leo.yan


On 17/01/2022 12:44, German Gomez wrote:
> Enable CONTEXT packets in SPE traces if the profiler runs in CPU mode.
> This is no less permissive than the existing behavior for the following
> reason:
>
> If perf_event_paranoid <= 0, then non perfmon_capable() users can open
> a per-CPU event. With a per-CPU event, unpriviledged users are allowed
> to profile _all_ processes, even ones owned by root.
>
> Without this change, users could see kernel addresses, root processes,
> etc, but not gather the PIDs of those processes. The PID is probably the
> least sensitive of all the information.
>
> It would be more idiomatic to check the perf_event_paranoid level with
> perf_allow_cpu(), but this function is not exported so cannot be used
> from a module. Looking for cpu != -1 is the indirect way of checking
> the same thing as it could never get to arm_spe_pmu_event_init() without

Reconsidering this bit (comment below)

> perf_event_paranoid <= 0.
>
> Co-authored-by: James Clark <james.clark@arm.com>
> Signed-off-by: German Gomez <german.gomez@arm.com>
> ---
>  drivers/perf/arm_spe_pmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> index 8515bf85c..7d9a7fa4f 100644
> --- a/drivers/perf/arm_spe_pmu.c
> +++ b/drivers/perf/arm_spe_pmu.c
> @@ -711,7 +711,7 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
>  	    !(spe_pmu->features & SPE_PMU_FEAT_FILT_LAT))
>  		return -EOPNOTSUPP;
>  
> -	spe_pmu->pmscr_cx = perfmon_capable();
> +	spe_pmu->pmscr_cx = perfmon_capable() || (event->cpu != -1);

The perf_event_open(2) manpage states:

       pid == -1 and cpu >= 0
              This measures all processes/threads on the specified CPU.
              This requires CAP_PERFMON (since Linux 5.8) or
              CAP_SYS_ADMIN capability or a
              /proc/sys/kernel/perf_event_paranoid value of less than 1.

So perhaps it's more accurate to (still implicitly) check the paranoid level with "pid == -1 && event->cpu > 0" ?

If so, I think I have to dig deeper into perf_event. I don't immediately see the pid argument. Any hints?

Thanks,
German

>  	reg = arm_spe_event_to_pmscr(event);
>  	if (!perfmon_capable() &&
>  	    (reg & (BIT(SYS_PMSCR_EL1_PA_SHIFT) |

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 2/2] perf: arm_spe: Enable CONTEXT packets in SPE traces if the profiler runs in CPU mode.
  2022-01-17 12:44 ` [RFC PATCH 2/2] perf: arm_spe: Enable CONTEXT packets in SPE traces if the profiler runs in CPU mode German Gomez
  2022-01-17 14:04   ` German Gomez
@ 2022-01-18  9:52   ` Will Deacon
  2022-01-18 14:13     ` German Gomez
  1 sibling, 1 reply; 18+ messages in thread
From: Will Deacon @ 2022-01-18  9:52 UTC (permalink / raw)
  To: German Gomez
  Cc: linux-arm-kernel, linux-kernel, mark.rutland, james.clark, leo.yan

On Mon, Jan 17, 2022 at 12:44:32PM +0000, German Gomez wrote:
> Enable CONTEXT packets in SPE traces if the profiler runs in CPU mode.
> This is no less permissive than the existing behavior for the following
> reason:
> 
> If perf_event_paranoid <= 0, then non perfmon_capable() users can open
> a per-CPU event. With a per-CPU event, unpriviledged users are allowed
> to profile _all_ processes, even ones owned by root.
> 
> Without this change, users could see kernel addresses, root processes,
> etc, but not gather the PIDs of those processes. The PID is probably the
> least sensitive of all the information.
> 
> It would be more idiomatic to check the perf_event_paranoid level with
> perf_allow_cpu(), but this function is not exported so cannot be used
> from a module. Looking for cpu != -1 is the indirect way of checking
> the same thing as it could never get to arm_spe_pmu_event_init() without
> perf_event_paranoid <= 0.

perf_allow_cpu() is a static inline so there's no need to export it. What's
missing?

Will

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX
  2022-01-17 12:44 ` [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX German Gomez
@ 2022-01-18 10:07   ` Will Deacon
  2022-01-18 14:04     ` German Gomez
  2022-01-18 16:28     ` James Clark
  2022-02-05 15:39   ` Leo Yan
  1 sibling, 2 replies; 18+ messages in thread
From: Will Deacon @ 2022-01-18 10:07 UTC (permalink / raw)
  To: German Gomez
  Cc: linux-arm-kernel, linux-kernel, mark.rutland, james.clark, leo.yan

On Mon, Jan 17, 2022 at 12:44:31PM +0000, German Gomez wrote:
> The arm_spe_pmu driver will enable the CX bit of the PMSCR register in
> order to add CONTEXT packets into the traces if the owner of the perf
> event runs with capabilities CAP_PERFMON or CAP_SYS_ADMIN.
> 
> The value of the bit is computed in the arm_spe_event_to_pmscr function
> [1], and the check for capabilities happens in [2] in the pmu init
> callback. This suggests that the value of the CX bit should remain
> consistent for the duration of the perf session.
> 
> However, the function arm_spe_event_to_pmscr may be called later during
> the start callback [3] when the "current" process is not the owner of
> the perf session, so the CX bit is currently not consistent. Consider
> the following example:
> 
>   1. Run a process in the background with capability CAP_SYS_ADMIN in CPU0.
> 
>     $ taskset --cpu-list 0 sudo dd if=/dev/random of=/dev/null &
>     [3] 3806
> 
>   2. Begin a perf session _without_ capabilities (we shouldn't see CONTEXT packets).
> 
>     $ perf record -e arm_spe_0// -C0 -- sleep 1
>     $ perf report -D | grep CONTEXT
>     .  0000000e:  65 df 0e 00 00                                  CONTEXT 0xedf el2
>     .  0000004e:  65 df 0e 00 00                                  CONTEXT 0xedf el2
>     .  0000008e:  65 df 0e 00 00                                  CONTEXT 0xedf el2
>     [...]
> 
> As can be seen, the traces begin showing CONTEXT packets when the pid is
> 0xedf (3807).

So to be clear: we shouldn't be reporting these packets because 'perf'
doesn't have the right capabilities, but we evaluate that in the context of
'dd' (running as root) and so incorrectly grant the permission. Correct?

> This happens because the pmu start callback is run when
> the current process is not the owner of the perf session, so the CX
> register bit is set.

This doesn't really seem SPE-specific to me -- the perf_allow_*() helpers
also operate implicitly on the current task. How do other PMU drivers avoid
falling into this trap?

> One way to fix this is by caching the value of the CX bit during the
> initialization of the PMU event, so that it remains consistent for the
> duration of the session.

It doesn't feel right to stash this in 'struct arm_spe_pmu' during event
initialisation -- wouldn't that allow perf to continue creating new events
with CX set, even if the paranoid sysctl was changed dynamically? Instead,
I think it would be better if the capabilities were stash in the event
itself somehow at initialisation time.

Will

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX
  2022-01-18 10:07   ` Will Deacon
@ 2022-01-18 14:04     ` German Gomez
  2022-01-19 11:27       ` German Gomez
  2022-01-18 16:28     ` James Clark
  1 sibling, 1 reply; 18+ messages in thread
From: German Gomez @ 2022-01-18 14:04 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-kernel, mark.rutland, james.clark, leo.yan

Hi Will,

Many thanks for your comments

On 18/01/2022 10:07, Will Deacon wrote:
> On Mon, Jan 17, 2022 at 12:44:31PM +0000, German Gomez wrote:
>> [...]
>>
>>   1. Run a process in the background with capability CAP_SYS_ADMIN in CPU0.
>>
>>     $ taskset --cpu-list 0 sudo dd if=/dev/random of=/dev/null &
>>     [3] 3806
>>
>>   2. Begin a perf session _without_ capabilities (we shouldn't see CONTEXT packets).
>>
>>     $ perf record -e arm_spe_0// -C0 -- sleep 1
>>     $ perf report -D | grep CONTEXT
>>     .  0000000e:  65 df 0e 00 00                                  CONTEXT 0xedf el2
>>     .  0000004e:  65 df 0e 00 00                                  CONTEXT 0xedf el2
>>     .  0000008e:  65 df 0e 00 00                                  CONTEXT 0xedf el2
>>     [...]
>>
>> As can be seen, the traces begin showing CONTEXT packets when the pid is
>> 0xedf (3807).
> So to be clear: we shouldn't be reporting these packets because 'perf'
> doesn't have the right capabilities, but we evaluate that in the context of
> 'dd' (running as root) and so incorrectly grant the permission. Correct?

Yes, correct. My guess was that "perfmon_capable()" was being called
under the assumption that it would always be evaluated in the context of
'perf'. Is that correct?

>
>> This happens because the pmu start callback is run when
>> the current process is not the owner of the perf session, so the CX
>> register bit is set.
> This doesn't really seem SPE-specific to me -- the perf_allow_*() helpers
> also operate implicitly on the current task. How do other PMU drivers avoid
> falling into this trap?

I'm not as familiar with the other PMU drivers. I quickly tried grepping
something related in the cs_etm drivers as they use CONTEXTIDR as well,
but couldn't find references to perfmon_capable() or similar checks.

Grepping for "perf_allow_" inside of drivers doesn't yield results.
There's some gpu driver that has similar perfmon_capable() checks but
unlike spe, they error out if they don't pass (drivers/gpu/drm/i915/i915_perf.c).

>
>> One way to fix this is by caching the value of the CX bit during the
>> initialization of the PMU event, so that it remains consistent for the
>> duration of the session.
> It doesn't feel right to stash this in 'struct arm_spe_pmu' during event
> initialisation -- wouldn't that allow perf to continue creating new events
> with CX set, even if the paranoid sysctl was changed dynamically? Instead,
> I think it would be better if the capabilities were stash in the event
> itself somehow at initialisation time.

I hadn't considered this. Makes more sense to store in the perf_event
or via some type of mapping in the struct spe_pmu if not possible. Do
you have any idea for the former? Or an idiomatic structure from the
kernel for the later?

Thanks,
German

>
> Will

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 2/2] perf: arm_spe: Enable CONTEXT packets in SPE traces if the profiler runs in CPU mode.
  2022-01-18  9:52   ` Will Deacon
@ 2022-01-18 14:13     ` German Gomez
  0 siblings, 0 replies; 18+ messages in thread
From: German Gomez @ 2022-01-18 14:13 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-kernel, mark.rutland, james.clark, leo.yan


On 18/01/2022 09:52, Will Deacon wrote:
> On Mon, Jan 17, 2022 at 12:44:32PM +0000, German Gomez wrote:
>> Enable CONTEXT packets in SPE traces if the profiler runs in CPU mode.
>> This is no less permissive than the existing behavior for the following
>> reason:
>>
>> If perf_event_paranoid <= 0, then non perfmon_capable() users can open
>> a per-CPU event. With a per-CPU event, unpriviledged users are allowed
>> to profile _all_ processes, even ones owned by root.
>>
>> Without this change, users could see kernel addresses, root processes,
>> etc, but not gather the PIDs of those processes. The PID is probably the
>> least sensitive of all the information.
>>
>> It would be more idiomatic to check the perf_event_paranoid level with
>> perf_allow_cpu(), but this function is not exported so cannot be used
>> from a module. Looking for cpu != -1 is the indirect way of checking
>> the same thing as it could never get to arm_spe_pmu_event_init() without
>> perf_event_paranoid <= 0.
> perf_allow_cpu() is a static inline so there's no need to export it. What's
> missing?

We were still running into build errors:

ERROR: modpost: "security_perf_event_open" [drivers/perf/arm_spe_pmu.ko] undefined!
ERROR: modpost: "sysctl_perf_event_paranoid" [drivers/perf/arm_spe_pmu.ko] undefined

>
> Will

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX
  2022-01-18 10:07   ` Will Deacon
  2022-01-18 14:04     ` German Gomez
@ 2022-01-18 16:28     ` James Clark
  1 sibling, 0 replies; 18+ messages in thread
From: James Clark @ 2022-01-18 16:28 UTC (permalink / raw)
  To: Will Deacon, German Gomez
  Cc: linux-arm-kernel, linux-kernel, mark.rutland, leo.yan



On 18/01/2022 10:07, Will Deacon wrote:
> On Mon, Jan 17, 2022 at 12:44:31PM +0000, German Gomez wrote:
>> The arm_spe_pmu driver will enable the CX bit of the PMSCR register in
>> order to add CONTEXT packets into the traces if the owner of the perf
>> event runs with capabilities CAP_PERFMON or CAP_SYS_ADMIN.
>>
>> The value of the bit is computed in the arm_spe_event_to_pmscr function
>> [1], and the check for capabilities happens in [2] in the pmu init
>> callback. This suggests that the value of the CX bit should remain
>> consistent for the duration of the perf session.
>>
>> However, the function arm_spe_event_to_pmscr may be called later during
>> the start callback [3] when the "current" process is not the owner of
>> the perf session, so the CX bit is currently not consistent. Consider
>> the following example:
>>
>>   1. Run a process in the background with capability CAP_SYS_ADMIN in CPU0.
>>
>>     $ taskset --cpu-list 0 sudo dd if=/dev/random of=/dev/null &
>>     [3] 3806
>>
>>   2. Begin a perf session _without_ capabilities (we shouldn't see CONTEXT packets).
>>
>>     $ perf record -e arm_spe_0// -C0 -- sleep 1
>>     $ perf report -D | grep CONTEXT
>>     .  0000000e:  65 df 0e 00 00                                  CONTEXT 0xedf el2
>>     .  0000004e:  65 df 0e 00 00                                  CONTEXT 0xedf el2
>>     .  0000008e:  65 df 0e 00 00                                  CONTEXT 0xedf el2
>>     [...]
>>
>> As can be seen, the traces begin showing CONTEXT packets when the pid is
>> 0xedf (3807).
> 
> So to be clear: we shouldn't be reporting these packets because 'perf'
> doesn't have the right capabilities, but we evaluate that in the context of
> 'dd' (running as root) and so incorrectly grant the permission. Correct?

In my opinion we should be reporting context packets in this case. The only reason
a normal user is allowed to profile a root process is if they have
perf_event_paranoid <= 0. So from that perspective perf does have the right
capabilities. But a call to only perfmon_capable() doesn't look at the paranoid
value, only CAP_SYS_ADMIN and CAP_SYS_ADMIN.

> 
>> This happens because the pmu start callback is run when
>> the current process is not the owner of the perf session, so the CX
>> register bit is set.
> 
> This doesn't really seem SPE-specific to me -- the perf_allow_*() helpers
> also operate implicitly on the current task. How do other PMU drivers avoid
> falling into this trap?> 
>> One way to fix this is by caching the value of the CX bit during the
>> initialization of the PMU event, so that it remains consistent for the
>> duration of the session.
> 
> It doesn't feel right to stash this in 'struct arm_spe_pmu' during event
> initialisation -- wouldn't that allow perf to continue creating new events
> with CX set, even if the paranoid sysctl was changed dynamically? Instead,
> I think it would be better if the capabilities were stash in the event
> itself somehow at initialisation time.

If stashing is the issue then the change could be to check the right thing
every time, as in do something like the second patch in this set which
also checks perf_event_paranoid <= 0 (indirectly).

James

> 
> Will
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX
  2022-01-18 14:04     ` German Gomez
@ 2022-01-19 11:27       ` German Gomez
  0 siblings, 0 replies; 18+ messages in thread
From: German Gomez @ 2022-01-19 11:27 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-kernel, mark.rutland, james.clark, leo.yan


On 18/01/2022 14:04, German Gomez wrote:
> Hi Will,
>
> Many thanks for your comments
>
> On 18/01/2022 10:07, Will Deacon wrote:
>> On Mon, Jan 17, 2022 at 12:44:31PM +0000, German Gomez wrote:
>>> [...]
>>>
>>>   1. Run a process in the background with capability CAP_SYS_ADMIN in CPU0.
>>>
>>>     $ taskset --cpu-list 0 sudo dd if=/dev/random of=/dev/null &
>>>     [3] 3806
>>>
>>>   2. Begin a perf session _without_ capabilities (we shouldn't see CONTEXT packets).
>>>
>>>     $ perf record -e arm_spe_0// -C0 -- sleep 1
>>>     $ perf report -D | grep CONTEXT
>>>     .  0000000e:  65 df 0e 00 00                                  CONTEXT 0xedf el2
>>>     .  0000004e:  65 df 0e 00 00                                  CONTEXT 0xedf el2
>>>     .  0000008e:  65 df 0e 00 00                                  CONTEXT 0xedf el2
>>>     [...]
>>>
>>> As can be seen, the traces begin showing CONTEXT packets when the pid is
>>> 0xedf (3807).
>> So to be clear: we shouldn't be reporting these packets because 'perf'
>> doesn't have the right capabilities, but we evaluate that in the context of
>> 'dd' (running as root) and so incorrectly grant the permission. Correct?
> Yes, correct. My guess was that "perfmon_capable()" was being called
> under the assumption that it would always be evaluated in the context of
> 'perf'. Is that correct?
>
>>> This happens because the pmu start callback is run when
>>> the current process is not the owner of the perf session, so the CX
>>> register bit is set.
>> This doesn't really seem SPE-specific to me -- the perf_allow_*() helpers
>> also operate implicitly on the current task. How do other PMU drivers avoid
>> falling into this trap?
> I'm not as familiar with the other PMU drivers. I quickly tried grepping
> something related in the cs_etm drivers as they use CONTEXTIDR as well,
> but couldn't find references to perfmon_capable() or similar checks.
>
> Grepping for "perf_allow_" inside of drivers doesn't yield results.
> There's some gpu driver that has similar perfmon_capable() checks but
> unlike spe, they error out if they don't pass (drivers/gpu/drm/i915/i915_perf.c).

Just to expand a bit more on this (I missed grepping the other directories)

./arch/powerpc/perf/imc-pmu.c => perfmon_capable() only called on init
./kernel/events/core.c        => perfmon_capable() only called on init
./arch/x86/events/core.c      => perf_allow_*() function only called on init

As far as I see, currently spe seems to be the only PMU event that
checks capabilities in the start callback. 'perf' may not be the current
task in this callback.

>
>>> One way to fix this is by caching the value of the CX bit during the
>>> initialization of the PMU event, so that it remains consistent for the
>>> duration of the session.
>> It doesn't feel right to stash this in 'struct arm_spe_pmu' during event
>> initialisation -- wouldn't that allow perf to continue creating new events
>> with CX set, even if the paranoid sysctl was changed dynamically? Instead,
>> I think it would be better if the capabilities were stash in the event
>> itself somehow at initialisation time.
> I hadn't considered this. Makes more sense to store in the perf_event
> or via some type of mapping in the struct spe_pmu if not possible. Do
> you have any idea for the former? Or an idiomatic structure from the
> kernel for the later?
>
> Thanks,
> German
>
>> Will

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX
  2022-01-17 12:44 ` [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX German Gomez
  2022-01-18 10:07   ` Will Deacon
@ 2022-02-05 15:39   ` Leo Yan
  2022-02-07 12:06     ` German Gomez
  1 sibling, 1 reply; 18+ messages in thread
From: Leo Yan @ 2022-02-05 15:39 UTC (permalink / raw)
  To: German Gomez
  Cc: linux-arm-kernel, linux-kernel, will, mark.rutland, james.clark

Hi German,

On Mon, Jan 17, 2022 at 12:44:31PM +0000, German Gomez wrote:
> The arm_spe_pmu driver will enable the CX bit of the PMSCR register in
> order to add CONTEXT packets into the traces if the owner of the perf
> event runs with capabilities CAP_PERFMON or CAP_SYS_ADMIN.
> 
> The value of the bit is computed in the arm_spe_event_to_pmscr function
> [1], and the check for capabilities happens in [2] in the pmu init
> callback. This suggests that the value of the CX bit should remain
> consistent for the duration of the perf session.
> 
> However, the function arm_spe_event_to_pmscr may be called later during
> the start callback [3] when the "current" process is not the owner of
> the perf session, so the CX bit is currently not consistent. Consider
> the following example:
> 
>   1. Run a process in the background with capability CAP_SYS_ADMIN in CPU0.
> 
>     $ taskset --cpu-list 0 sudo dd if=/dev/random of=/dev/null &
>     [3] 3806
> 
>   2. Begin a perf session _without_ capabilities (we shouldn't see CONTEXT packets).
> 
>     $ perf record -e arm_spe_0// -C0 -- sleep 1
>     $ perf report -D | grep CONTEXT
>     .  0000000e:  65 df 0e 00 00                                  CONTEXT 0xedf el2
>     .  0000004e:  65 df 0e 00 00                                  CONTEXT 0xedf el2
>     .  0000008e:  65 df 0e 00 00                                  CONTEXT 0xedf el2
>     [...]

Could you confirm if you still can reproduce this issue on the latest
mainline kernel?  I cannot reproduce this issue on the latest mainline
kernel, I suspect this is impacted by recent refactoring evlist
patches from Ian Rogers (though I am not for this).

> As can be seen, the traces begin showing CONTEXT packets when the pid is
> 0xedf (3807). This happens because the pmu start callback is run when
> the current process is not the owner of the perf session, so the CX
> register bit is set.

I can image a potential issue is: the "dd" program running in background
with capability CAP_SYS_ADMIN on CPU0, and then perf session sends an
IPI remotely from any other CPU to CPU0, the dd process (on CPU0) is
interrupted to handle ioctl PERF_EVENT_IOC_ENABLE, thus perfmon_capable()
returns the capability of dd process, finally it leads to the wrong
setting for PMSCR.

I reviewed the code and also traced the backtrace for the function
arm_spe_pmu_start(), I can confirm that every time perf session will
execute below flow:

  evlist__enable()
    __evlist__enable()
      evlist__for_each_cpu() {  -> call affinity__set()
        evsel__enable_cpu()
      }

We can see the macro evlist__for_each_cpu() will extend to invoke
evlist__cpu_begin() and affinity__set(); affinity__set() will set CPU
affinity to the target CPU, thus perf process will firstly migrate to
the target CPU and enable event on the target CPU.  This means perf
will not send remote IPI and it directly runs on target CPU, and the
dd program will not interfere capabilities for perf session.

Thanks,
Leo

> One way to fix this is by caching the value of the CX bit during the
> initialization of the PMU event, so that it remains consistent for the
> duration of the session.
> 
> [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/arm_spe_pmu.c?h=v5.16#n275
> [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/arm_spe_pmu.c?h=v5.16#n713
> [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/arm_spe_pmu.c?h=v5.16#n751
> 
> Signed-off-by: German Gomez <german.gomez@arm.com>
> ---
>  drivers/perf/arm_spe_pmu.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> index d44bcc29d..8515bf85c 100644
> --- a/drivers/perf/arm_spe_pmu.c
> +++ b/drivers/perf/arm_spe_pmu.c
> @@ -57,6 +57,7 @@ struct arm_spe_pmu {
>  	u16					pmsver;
>  	u16					min_period;
>  	u16					counter_sz;
> +	bool					pmscr_cx;
>  
>  #define SPE_PMU_FEAT_FILT_EVT			(1UL << 0)
>  #define SPE_PMU_FEAT_FILT_TYP			(1UL << 1)
> @@ -260,6 +261,7 @@ static const struct attribute_group *arm_spe_pmu_attr_groups[] = {
>  static u64 arm_spe_event_to_pmscr(struct perf_event *event)
>  {
>  	struct perf_event_attr *attr = &event->attr;
> +	struct arm_spe_pmu *spe_pmu = to_spe_pmu(event->pmu);
>  	u64 reg = 0;
>  
>  	reg |= ATTR_CFG_GET_FLD(attr, ts_enable) << SYS_PMSCR_EL1_TS_SHIFT;
> @@ -272,7 +274,7 @@ static u64 arm_spe_event_to_pmscr(struct perf_event *event)
>  	if (!attr->exclude_kernel)
>  		reg |= BIT(SYS_PMSCR_EL1_E1SPE_SHIFT);
>  
> -	if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable())
> +	if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && spe_pmu->pmscr_cx)
>  		reg |= BIT(SYS_PMSCR_EL1_CX_SHIFT);
>  
>  	return reg;
> @@ -709,10 +711,10 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
>  	    !(spe_pmu->features & SPE_PMU_FEAT_FILT_LAT))
>  		return -EOPNOTSUPP;
>  
> +	spe_pmu->pmscr_cx = perfmon_capable();
>  	reg = arm_spe_event_to_pmscr(event);
>  	if (!perfmon_capable() &&
>  	    (reg & (BIT(SYS_PMSCR_EL1_PA_SHIFT) |
> -		    BIT(SYS_PMSCR_EL1_CX_SHIFT) |
>  		    BIT(SYS_PMSCR_EL1_PCT_SHIFT))))
>  		return -EACCES;
>  
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX
  2022-02-05 15:39   ` Leo Yan
@ 2022-02-07 12:06     ` German Gomez
  2022-02-08 13:00       ` Leo Yan
  0 siblings, 1 reply; 18+ messages in thread
From: German Gomez @ 2022-02-07 12:06 UTC (permalink / raw)
  To: Leo Yan; +Cc: linux-arm-kernel, linux-kernel, will, mark.rutland, james.clark

Hi Leo,

On 05/02/2022 15:39, Leo Yan wrote:
> Hi German,
>
> On Mon, Jan 17, 2022 at 12:44:31PM +0000, German Gomez wrote:
>> [...]
>>
>>   1. Run a process in the background with capability CAP_SYS_ADMIN in CPU0.
>>
>>     $ taskset --cpu-list 0 sudo dd if=/dev/random of=/dev/null &
>>     [3] 3806
>>
>>   2. Begin a perf session _without_ capabilities (we shouldn't see CONTEXT packets).
>>
>>     $ perf record -e arm_spe_0// -C0 -- sleep 1
>>     $ perf report -D | grep CONTEXT
>>     .  0000000e:  65 df 0e 00 00                                  CONTEXT 0xedf el2
>>     .  0000004e:  65 df 0e 00 00                                  CONTEXT 0xedf el2
>>     .  0000008e:  65 df 0e 00 00                                  CONTEXT 0xedf el2
>>     [...]
> Could you confirm if you still can reproduce this issue on the latest
> mainline kernel?  I cannot reproduce this issue on the latest mainline
> kernel, I suspect this is impacted by recent refactoring evlist
> patches from Ian Rogers (though I am not for this).

Are you referring to running on the latest kernel or the latest perf
tool? I can still reproduce using perf from v5.17-rc3, on a kernel and
SPE driver from 5.17.0-rc1 (commands below).

>
>> As can be seen, the traces begin showing CONTEXT packets when the pid is
>> 0xedf (3807). This happens because the pmu start callback is run when
>> the current process is not the owner of the perf session, so the CX
>> register bit is set.
> I can image a potential issue is: the "dd" program running in background
> with capability CAP_SYS_ADMIN on CPU0, and then perf session sends an
> IPI remotely from any other CPU to CPU0, the dd process (on CPU0) is
> interrupted to handle ioctl PERF_EVENT_IOC_ENABLE, thus perfmon_capable()
> returns the capability of dd process, finally it leads to the wrong
> setting for PMSCR.
>
> I reviewed the code and also traced the backtrace for the function
> arm_spe_pmu_start(), I can confirm that every time perf session will
> execute below flow:
>
>   evlist__enable()
>     __evlist__enable()
>       evlist__for_each_cpu() {  -> call affinity__set()
>         evsel__enable_cpu()
>       }
>
> We can see the macro evlist__for_each_cpu() will extend to invoke
> evlist__cpu_begin() and affinity__set(); affinity__set() will set CPU
> affinity to the target CPU, thus perf process will firstly migrate to
> the target CPU and enable event on the target CPU.  This means perf
> will not send remote IPI and it directly runs on target CPU, and the
> dd program will not interfere capabilities for perf session.
Thank you for looking at this,

I re-tested on the N1SDP (previously I was using a graviton2 instance).
I had to adjust the command slightly with "-m,2" to get it consistently
this time:

$ taskset --cpu-list 0 sudo dd if=/dev/random of=/dev/null &
$ perf record -e arm_spe_0// -C0 -m,2 -- sleep 1
$ perf report -D | grep CONTEXT | head
.  0000000e:  65 b5 6e 00 00                                  CONTEXT 0x6eb5 el2
.  0000004e:  65 b5 6e 00 00                                  CONTEXT 0x6eb5 el2
.  0000008e:  65 b5 6e 00 00                                  CONTEXT 0x6eb5 el2
[...]

>
> Thanks,
> Leo
>
>> One way to fix this is by caching the value of the CX bit during the
>> initialization of the PMU event, so that it remains consistent for the
>> duration of the session.
>>
>> [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/arm_spe_pmu.c?h=v5.16#n275
>> [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/arm_spe_pmu.c?h=v5.16#n713
>> [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/arm_spe_pmu.c?h=v5.16#n751
>>
>> Signed-off-by: German Gomez <german.gomez@arm.com>
>> ---
>>  drivers/perf/arm_spe_pmu.c | 6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
>> index d44bcc29d..8515bf85c 100644
>> --- a/drivers/perf/arm_spe_pmu.c
>> +++ b/drivers/perf/arm_spe_pmu.c
>> @@ -57,6 +57,7 @@ struct arm_spe_pmu {
>>  	u16					pmsver;
>>  	u16					min_period;
>>  	u16					counter_sz;
>> +	bool					pmscr_cx;
>>  
>>  #define SPE_PMU_FEAT_FILT_EVT			(1UL << 0)
>>  #define SPE_PMU_FEAT_FILT_TYP			(1UL << 1)
>> @@ -260,6 +261,7 @@ static const struct attribute_group *arm_spe_pmu_attr_groups[] = {
>>  static u64 arm_spe_event_to_pmscr(struct perf_event *event)
>>  {
>>  	struct perf_event_attr *attr = &event->attr;
>> +	struct arm_spe_pmu *spe_pmu = to_spe_pmu(event->pmu);
>>  	u64 reg = 0;
>>  
>>  	reg |= ATTR_CFG_GET_FLD(attr, ts_enable) << SYS_PMSCR_EL1_TS_SHIFT;
>> @@ -272,7 +274,7 @@ static u64 arm_spe_event_to_pmscr(struct perf_event *event)
>>  	if (!attr->exclude_kernel)
>>  		reg |= BIT(SYS_PMSCR_EL1_E1SPE_SHIFT);
>>  
>> -	if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable())
>> +	if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && spe_pmu->pmscr_cx)
>>  		reg |= BIT(SYS_PMSCR_EL1_CX_SHIFT);
>>  
>>  	return reg;
>> @@ -709,10 +711,10 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
>>  	    !(spe_pmu->features & SPE_PMU_FEAT_FILT_LAT))
>>  		return -EOPNOTSUPP;
>>  
>> +	spe_pmu->pmscr_cx = perfmon_capable();
>>  	reg = arm_spe_event_to_pmscr(event);
>>  	if (!perfmon_capable() &&
>>  	    (reg & (BIT(SYS_PMSCR_EL1_PA_SHIFT) |
>> -		    BIT(SYS_PMSCR_EL1_CX_SHIFT) |
>>  		    BIT(SYS_PMSCR_EL1_PCT_SHIFT))))
>>  		return -EACCES;
>>  
>> -- 
>> 2.25.1
>>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX
  2022-02-07 12:06     ` German Gomez
@ 2022-02-08 13:00       ` Leo Yan
  2022-02-10 17:23         ` German Gomez
  0 siblings, 1 reply; 18+ messages in thread
From: Leo Yan @ 2022-02-08 13:00 UTC (permalink / raw)
  To: German Gomez
  Cc: linux-arm-kernel, linux-kernel, will, mark.rutland, james.clark

Hi German,

On Mon, Feb 07, 2022 at 12:06:14PM +0000, German Gomez wrote:

[...]

> > I reviewed the code and also traced the backtrace for the function
> > arm_spe_pmu_start(), I can confirm that every time perf session will
> > execute below flow:
> >
> >   evlist__enable()
> >     __evlist__enable()
> >       evlist__for_each_cpu() {  -> call affinity__set()
> >         evsel__enable_cpu()
> >       }
> >
> > We can see the macro evlist__for_each_cpu() will extend to invoke
> > evlist__cpu_begin() and affinity__set(); affinity__set() will set CPU
> > affinity to the target CPU, thus perf process will firstly migrate to
> > the target CPU and enable event on the target CPU.  This means perf
> > will not send remote IPI and it directly runs on target CPU, and the
> > dd program will not interfere capabilities for perf session.
>
> Thank you for looking at this,
> 
> I re-tested on the N1SDP (previously I was using a graviton2 instance).
> I had to adjust the command slightly with "-m,2" to get it consistently
> this time:
> 
> $ taskset --cpu-list 0 sudo dd if=/dev/random of=/dev/null &
> $ perf record -e arm_spe_0// -C0 -m,2 -- sleep 1
> $ perf report -D | grep CONTEXT | head
> .  0000000e:  65 b5 6e 00 00                                  CONTEXT 0x6eb5 el2
> .  0000004e:  65 b5 6e 00 00                                  CONTEXT 0x6eb5 el2
> .  0000008e:  65 b5 6e 00 00                                  CONTEXT 0x6eb5 el2
> [...]

Indeed!  I can reproduce the issue now.  And I can capture backtrace
for arm_spe_pmu_start() with below commands:

# cd /home/leoy/linux/tools/perf
# ./perf probe --add "arm_spe_pmu_start" -s /home/leoy/linux/ -k /home/leoy/linux/vmlinux
# echo 1 > /sys/kernel/debug/tracing/events/probe/arm_spe_pmu_start/enable
# echo stacktrace > /sys/kernel/debug/tracing/events/probe/arm_spe_pmu_start/trigger

... run your commands with non-root user ...

# cat /sys/kernel/debug/tracing/trace

             dd-7697    [000] d.h2.   506.068700: arm_spe_pmu_start: (arm_spe_pmu_start+0x8/0xe0)
             dd-7697    [000] d.h3.   506.068701: <stack trace>
=> kprobe_dispatcher
=> kprobe_breakpoint_handler
=> call_break_hook
=> brk_handler
=> do_debug_exception
=> el1_dbg
=> el1h_64_sync_handler
=> el1h_64_sync
=> arm_spe_pmu_start
=> event_sched_in.isra.0
=> merge_sched_in
=> visit_groups_merge.constprop.0
=> ctx_sched_in
=> perf_event_sched_in
=> ctx_resched
=> __perf_event_enable
=> event_function
=> remote_function
=> flush_smp_call_function_queue
=> generic_smp_call_function_single_interrupt
=> ipi_handler
=> handle_percpu_devid_irq
=> generic_handle_domain_irq
=> gic_handle_irq
=> call_on_irq_stack
=> do_interrupt_handler
=> el1_interrupt
=> el1h_64_irq_handler
=> el1h_64_irq
=> _raw_spin_unlock_irqrestore
=> urandom_read_nowarn.isra.0
=> random_read
=> vfs_read
=> ksys_read
=> __arm64_sys_read
=> invoke_syscall
=> el0_svc_common.constprop.0
=> do_el0_svc
=> el0_svc
=> el0t_64_sync_handler
=> el0t_64_sync

The backtrace clearly shows the function arm_spe_pmu_start() is
invoked in the 'dd' process (dd-7697); the flow is:
- perf program sends IPI to CPU0;
- 'dd' process is running on CPU0 and it's interrupted to handle IPI;
- 'dd' process has root capabilities, so it will enable context
  tracing for non-root perf session.

> >> One way to fix this is by caching the value of the CX bit during the
> >> initialization of the PMU event, so that it remains consistent for the
> >> duration of the session.
> >>
> >> [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/arm_spe_pmu.c?h=v5.16#n275
> >> [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/arm_spe_pmu.c?h=v5.16#n713
> >> [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/arm_spe_pmu.c?h=v5.16#n751
> >>
> >> Signed-off-by: German Gomez <german.gomez@arm.com>
> >> ---
> >>  drivers/perf/arm_spe_pmu.c | 6 ++++--
> >>  1 file changed, 4 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> >> index d44bcc29d..8515bf85c 100644
> >> --- a/drivers/perf/arm_spe_pmu.c
> >> +++ b/drivers/perf/arm_spe_pmu.c
> >> @@ -57,6 +57,7 @@ struct arm_spe_pmu {
> >>  	u16					pmsver;
> >>  	u16					min_period;
> >>  	u16					counter_sz;
> >> +	bool					pmscr_cx;

So the patch makes sense to me.  Just a minor comment:

Here we can define a u64 for recording pmscr value rather than a
bool value.

struct arm_spe_pmu {
    ...
    u64 pmscr;
};

> >>  
> >>  #define SPE_PMU_FEAT_FILT_EVT			(1UL << 0)
> >>  #define SPE_PMU_FEAT_FILT_TYP			(1UL << 1)
> >> @@ -260,6 +261,7 @@ static const struct attribute_group *arm_spe_pmu_attr_groups[] = {
> >>  static u64 arm_spe_event_to_pmscr(struct perf_event *event)
> >>  {
> >>  	struct perf_event_attr *attr = &event->attr;
> >> +	struct arm_spe_pmu *spe_pmu = to_spe_pmu(event->pmu);
> >>  	u64 reg = 0;
> >>  
> >>  	reg |= ATTR_CFG_GET_FLD(attr, ts_enable) << SYS_PMSCR_EL1_TS_SHIFT;
> >> @@ -272,7 +274,7 @@ static u64 arm_spe_event_to_pmscr(struct perf_event *event)
> >>  	if (!attr->exclude_kernel)
> >>  		reg |= BIT(SYS_PMSCR_EL1_E1SPE_SHIFT);
> >>  
> >> -	if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable())
> >> +	if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && spe_pmu->pmscr_cx)
> >>  		reg |= BIT(SYS_PMSCR_EL1_CX_SHIFT);
> >>  
> >>  	return reg;
> >> @@ -709,10 +711,10 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
> >>  	    !(spe_pmu->features & SPE_PMU_FEAT_FILT_LAT))
> >>  		return -EOPNOTSUPP;
> >>  
> >> +	spe_pmu->pmscr_cx = perfmon_capable();
> >>  	reg = arm_spe_event_to_pmscr(event);

Thus here we can change as:

  spe_pmu->pmscr = arm_spe_event_to_pmscr(event);

And then in the function arm_spe_pmu_start(), we can skip calling
arm_spe_event_to_pmscr() and directly set PMSCR register:

static void arm_spe_pmu_start(struct perf_event *event, int flags)
{
    ...

    isb();
    write_sysreg_s(spe_pmu->pmscr, SYS_PMSCR_EL1);
}

Thanks,
Leo

> >>  	if (!perfmon_capable() &&
> >>  	    (reg & (BIT(SYS_PMSCR_EL1_PA_SHIFT) |
> >> -		    BIT(SYS_PMSCR_EL1_CX_SHIFT) |
> >>  		    BIT(SYS_PMSCR_EL1_PCT_SHIFT))))
> >>  		return -EACCES;
> >>  
> >> -- 
> >> 2.25.1
> >>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX
  2022-02-08 13:00       ` Leo Yan
@ 2022-02-10 17:23         ` German Gomez
  2022-02-11 10:45           ` Leo Yan
  0 siblings, 1 reply; 18+ messages in thread
From: German Gomez @ 2022-02-10 17:23 UTC (permalink / raw)
  To: Leo Yan; +Cc: linux-arm-kernel, linux-kernel, will, mark.rutland, james.clark

Hi Leo,

On 08/02/2022 13:00, Leo Yan wrote:
> Hi German,
>
> On Mon, Feb 07, 2022 at 12:06:14PM +0000, German Gomez wrote:
>
> [...]
> Indeed!  I can reproduce the issue now.  And I can capture backtrace
> for arm_spe_pmu_start() with below commands:
>
> # cd /home/leoy/linux/tools/perf
> # ./perf probe --add "arm_spe_pmu_start" -s /home/leoy/linux/ -k /home/leoy/linux/vmlinux
> # echo 1 > /sys/kernel/debug/tracing/events/probe/arm_spe_pmu_start/enable
> # echo stacktrace > /sys/kernel/debug/tracing/events/probe/arm_spe_pmu_start/trigger
>
> ... run your commands with non-root user ...
>
> # cat /sys/kernel/debug/tracing/trace
>
>              dd-7697    [000] d.h2.   506.068700: arm_spe_pmu_start: (arm_spe_pmu_start+0x8/0xe0)
>              dd-7697    [000] d.h3.   506.068701: <stack trace>
> => kprobe_dispatcher
> => kprobe_breakpoint_handler
> => call_break_hook
> [...]
> => do_el0_svc
> => el0_svc
> => el0t_64_sync_handler
> => el0t_64_sync
>
> The backtrace clearly shows the function arm_spe_pmu_start() is
> invoked in the 'dd' process (dd-7697); the flow is:
> - perf program sends IPI to CPU0;
> - 'dd' process is running on CPU0 and it's interrupted to handle IPI;
> - 'dd' process has root capabilities, so it will enable context
>   tracing for non-root perf session.

Thanks for testing, and sharing the commands in your replies!

>
>>>> One way to fix this is by caching the value of the CX bit during the
>>>> initialization of the PMU event, so that it remains consistent for the
>>>> duration of the session.
>>>>
>>>> [...]
> So the patch makes sense to me.  Just a minor comment:
>
> Here we can define a u64 for recording pmscr value rather than a
> bool value.
>
> struct arm_spe_pmu {
>     ...
>     u64 pmscr;
> };

I agree with the comment from Will that it makes more sense to store the
value of the register in the perf_event somehow (due to misunderstanding
from my side, I thought arm_spe_pmu struct was local to the session).

What about perf_event's void *pmu_private?

Thanks,
German

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX
  2022-02-10 17:23         ` German Gomez
@ 2022-02-11 10:45           ` Leo Yan
  2022-02-15 14:29             ` German Gomez
  0 siblings, 1 reply; 18+ messages in thread
From: Leo Yan @ 2022-02-11 10:45 UTC (permalink / raw)
  To: German Gomez
  Cc: linux-arm-kernel, linux-kernel, will, mark.rutland, james.clark

Hi German,

On Thu, Feb 10, 2022 at 05:23:50PM +0000, German Gomez wrote:

[...]

> >>>> One way to fix this is by caching the value of the CX bit during the
> >>>> initialization of the PMU event, so that it remains consistent for the
> >>>> duration of the session.
> >>>>
> >>>> [...]
> > So the patch makes sense to me.  Just a minor comment:
> >
> > Here we can define a u64 for recording pmscr value rather than a
> > bool value.
> >
> > struct arm_spe_pmu {
> >     ...
> >     u64 pmscr;
> > };
> 
> I agree with the comment from Will that it makes more sense to store the
> value of the register in the perf_event somehow (due to misunderstanding
> from my side, I thought arm_spe_pmu struct was local to the session).

It's shame that I miss this point :) As you said, struct arm_spe_pmu is
a data structure for Arm SPE device driver instance and it's not
allocated for perf session.

> What about perf_event's void *pmu_private?

Before we use perf_event::pmu_private, could you check the data
structure arm_spe_pmu_buf firstly?  This data structure is allocated
when setup AUX ring buffer (so it's allocated for perf session).
IIUC, the function arm_spe_pmu_setup_aux() will be invoked in the perf
process, so it's good for us to initialize pmscr in this function.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX
  2022-02-11 10:45           ` Leo Yan
@ 2022-02-15 14:29             ` German Gomez
  2022-02-16 13:22               ` Leo Yan
  0 siblings, 1 reply; 18+ messages in thread
From: German Gomez @ 2022-02-15 14:29 UTC (permalink / raw)
  To: Leo Yan; +Cc: linux-arm-kernel, linux-kernel, will, mark.rutland, james.clark


On 11/02/2022 10:45, Leo Yan wrote:
> Hi German,
>
> On Thu, Feb 10, 2022 at 05:23:50PM +0000, German Gomez wrote:
>
> [...]
>
>>>>>> One way to fix this is by caching the value of the CX bit during the
>>>>>> initialization of the PMU event, so that it remains consistent for the
>>>>>> duration of the session.
>>>>>>
>>>>>> [...]
>>> So the patch makes sense to me.  Just a minor comment:
>>>
>>> Here we can define a u64 for recording pmscr value rather than a
>>> bool value.
>>>
>>> struct arm_spe_pmu {
>>>     ...
>>>     u64 pmscr;
>>> };
>> I agree with the comment from Will that it makes more sense to store the
>> value of the register in the perf_event somehow (due to misunderstanding
>> from my side, I thought arm_spe_pmu struct was local to the session).
> It's shame that I miss this point :) As you said, struct arm_spe_pmu is
> a data structure for Arm SPE device driver instance and it's not
> allocated for perf session.
>
>> What about perf_event's void *pmu_private?
> Before we use perf_event::pmu_private, could you check the data
> structure arm_spe_pmu_buf firstly?  This data structure is allocated
> when setup AUX ring buffer (so it's allocated for perf session).
> IIUC, the function arm_spe_pmu_setup_aux() will be invoked in the perf
> process, so it's good for us to initialize pmscr in this function.
Thanks for the suggestion. I recorded the following stacktrace:

 perf-323841 [052] d.... 3996.528812: arm_spe_pmu_setup_aux: (arm_spe_pmu_setup_aux+0x60/0x1c0 [arm_spe_pmu])
 perf-323841 [052] d.... 3996.528813: <stack trace>
 => kprobe_dispatcher
 => kprobe_breakpoint_handler
 => call_break_hook
 => brk_handler
 => do_debug_exception
 => el1_dbg
 => el1h_64_sync_handler
 => el1h_64_sync
 => arm_spe_pmu_setup_aux
 => perf_mmap
 => mmap_region
 => do_mmap
 => vm_mmap_pgoff
 => ksys_mmap_pgoff
 => __arm64_sys_mmap
 => invoke_syscall
 => el0_svc_common.constprop.0
 => do_el0_svc
 => el0_svc
 => el0t_64_sync_handler
 => el0t_64_sync

So for a v2 I may include something like this:

diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index d44bcc29d..aadec5a0e 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -45,6 +45,7 @@ struct arm_spe_pmu_buf {
     int                    nr_pages;
     bool                    snapshot;
     void                    *base;
+    u64                    pmscr;
 };
 
 struct arm_spe_pmu {
@@ -748,7 +749,7 @@ static void arm_spe_pmu_start(struct perf_event *event, int flags)
         write_sysreg_s(reg, SYS_PMSICR_EL1);
     }
 
-    reg = arm_spe_event_to_pmscr(event);
+    reg = ((struct arm_spe_pmu_buf *) perf_get_aux(handle))->pmscr;
     isb();
     write_sysreg_s(reg, SYS_PMSCR_EL1);
 }
@@ -855,6 +856,8 @@ static void *arm_spe_pmu_setup_aux(struct perf_event *event, void **pages,
     if (!pglist)
         goto out_free_buf;
 
+    buf->pmscr = arm_spe_event_to_pmscr(event);
+
     for (i = 0; i < nr_pages; ++i)
         pglist[i] = virt_to_page(pages[i]);

>
> Thanks,
> Leo

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX
  2022-02-15 14:29             ` German Gomez
@ 2022-02-16 13:22               ` Leo Yan
  2022-02-16 15:16                 ` German Gomez
  0 siblings, 1 reply; 18+ messages in thread
From: Leo Yan @ 2022-02-16 13:22 UTC (permalink / raw)
  To: German Gomez
  Cc: linux-arm-kernel, linux-kernel, will, mark.rutland, james.clark

On Tue, Feb 15, 2022 at 02:29:27PM +0000, German Gomez wrote:

[...]

> Thanks for the suggestion. I recorded the following stacktrace:
> 
>  perf-323841 [052] d.... 3996.528812: arm_spe_pmu_setup_aux: (arm_spe_pmu_setup_aux+0x60/0x1c0 [arm_spe_pmu])
>  perf-323841 [052] d.... 3996.528813: <stack trace>

Yeah, this show arm_spe_pmu_setup_aux() is called in perf process.

>  => kprobe_dispatcher
>  => kprobe_breakpoint_handler
>  => call_break_hook
>  => brk_handler
>  => do_debug_exception
>  => el1_dbg
>  => el1h_64_sync_handler
>  => el1h_64_sync
>  => arm_spe_pmu_setup_aux
>  => perf_mmap
>  => mmap_region
>  => do_mmap
>  => vm_mmap_pgoff
>  => ksys_mmap_pgoff
>  => __arm64_sys_mmap
>  => invoke_syscall
>  => el0_svc_common.constprop.0
>  => do_el0_svc
>  => el0_svc
>  => el0t_64_sync_handler
>  => el0t_64_sync
> 
> So for a v2 I may include something like this:

The change looks good to me, please see below minor comment.

> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> index d44bcc29d..aadec5a0e 100644
> --- a/drivers/perf/arm_spe_pmu.c
> +++ b/drivers/perf/arm_spe_pmu.c
> @@ -45,6 +45,7 @@ struct arm_spe_pmu_buf {
>      int                    nr_pages;
>      bool                    snapshot;
>      void                    *base;
> +    u64                    pmscr;
>  };
>  
>  struct arm_spe_pmu {
> @@ -748,7 +749,7 @@ static void arm_spe_pmu_start(struct perf_event *event, int flags)
>          write_sysreg_s(reg, SYS_PMSICR_EL1);
>      }
>  
> -    reg = arm_spe_event_to_pmscr(event);
> +    reg = ((struct arm_spe_pmu_buf *) perf_get_aux(handle))->pmscr;
>      isb();
>      write_sysreg_s(reg, SYS_PMSCR_EL1);

Just nitpick (or it's just my preferring coding style), we can define
a local pointer variable 'buf':

  struct arm_spe_pmu_buf *buf = perf_get_aux(handle);

  ...

  isb();
  write_sysreg_s(buf->pmscr, SYS_PMSCR_EL1);

Thanks,
Leo

>  }
> @@ -855,6 +856,8 @@ static void *arm_spe_pmu_setup_aux(struct perf_event *event, void **pages,
>      if (!pglist)
>          goto out_free_buf;
>  
> +    buf->pmscr = arm_spe_event_to_pmscr(event);
> +
>      for (i = 0; i < nr_pages; ++i)
>          pglist[i] = virt_to_page(pages[i]);
> 
> >
> > Thanks,
> > Leo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX
  2022-02-16 13:22               ` Leo Yan
@ 2022-02-16 15:16                 ` German Gomez
  0 siblings, 0 replies; 18+ messages in thread
From: German Gomez @ 2022-02-16 15:16 UTC (permalink / raw)
  To: Leo Yan; +Cc: linux-arm-kernel, linux-kernel, will, mark.rutland, james.clark

Hi Leo,

Thanks for the review

On 16/02/2022 13:22, Leo Yan wrote:
> On Tue, Feb 15, 2022 at 02:29:27PM +0000, German Gomez wrote:
>
> [...]
>
>> Thanks for the suggestion. I recorded the following stacktrace:
>>
>>  perf-323841 [052] d.... 3996.528812: arm_spe_pmu_setup_aux: (arm_spe_pmu_setup_aux+0x60/0x1c0 [arm_spe_pmu])
>>  perf-323841 [052] d.... 3996.528813: <stack trace>
> Yeah, this show arm_spe_pmu_setup_aux() is called in perf process.
>
>>  => kprobe_dispatcher
>>  => kprobe_breakpoint_handler
>>  => call_break_hook
>>  => brk_handler
>>  => do_debug_exception
>>  => el1_dbg
>>  => el1h_64_sync_handler
>>  => el1h_64_sync
>>  => arm_spe_pmu_setup_aux
>>  => perf_mmap
>>  => mmap_region
>>  => do_mmap
>>  => vm_mmap_pgoff
>>  => ksys_mmap_pgoff
>>  => __arm64_sys_mmap
>>  => invoke_syscall
>>  => el0_svc_common.constprop.0
>>  => do_el0_svc
>>  => el0_svc
>>  => el0t_64_sync_handler
>>  => el0t_64_sync
>>
>> So for a v2 I may include something like this:
> The change looks good to me, please see below minor comment.
>
>> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
>> index d44bcc29d..aadec5a0e 100644
>> --- a/drivers/perf/arm_spe_pmu.c
>> +++ b/drivers/perf/arm_spe_pmu.c
>> @@ -45,6 +45,7 @@ struct arm_spe_pmu_buf {
>>      int                    nr_pages;
>>      bool                    snapshot;
>>      void                    *base;
>> +    u64                    pmscr;
>>  };
>>  
>>  struct arm_spe_pmu {
>> @@ -748,7 +749,7 @@ static void arm_spe_pmu_start(struct perf_event *event, int flags)
>>          write_sysreg_s(reg, SYS_PMSICR_EL1);
>>      }
>>  
>> -    reg = arm_spe_event_to_pmscr(event);
>> +    reg = ((struct arm_spe_pmu_buf *) perf_get_aux(handle))->pmscr;
>>      isb();
>>      write_sysreg_s(reg, SYS_PMSCR_EL1);
> Just nitpick (or it's just my preferring coding style), we can define
> a local pointer variable 'buf':
>
>   struct arm_spe_pmu_buf *buf = perf_get_aux(handle);

I need to make sure perf_get_aux(..) is called between perf_aux_output_begin and *_end though (so, after arm_spe_perf_aux_output_begin(..)):

 buf = perf_get_aux(handle);
 reg = buf->pmscr;
 isb();
 write_sysreg_s(buf, SYS_PMSCR_EL1);

Alternatively, we set the register inside of perf_aux_output_begin. It might be confusing for casual readers because the function handles a case where perf_get_aux(..) returns NULL.

Alternatively, we could also wrap perf_get_aux(..) in a static inline function that returns the correct type and do:

 reg = arm_spe_get_aux(handle)->pmscr;
 isb();
 write_sysreg_s(reg, SYS_PMSCR_EL1);

So that it looks cleaner.

>
>   ...
>
>   isb();
>   write_sysreg_s(buf->pmscr, SYS_PMSCR_EL1);
>
> Thanks,
> Leo
>
>>  }
>> @@ -855,6 +856,8 @@ static void *arm_spe_pmu_setup_aux(struct perf_event *event, void **pages,
>>      if (!pglist)
>>          goto out_free_buf;
>>  
>> +    buf->pmscr = arm_spe_event_to_pmscr(event);
>> +
>>      for (i = 0; i < nr_pages; ++i)
>>          pglist[i] = virt_to_page(pages[i]);
>>
>>> Thanks,
>>> Leo

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-02-16 15:16 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-17 12:44 [RFC PATCH 0/2] perf: arm_spe: Fix consistency of CONTEXT packets in SPE driver German Gomez
2022-01-17 12:44 ` [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX German Gomez
2022-01-18 10:07   ` Will Deacon
2022-01-18 14:04     ` German Gomez
2022-01-19 11:27       ` German Gomez
2022-01-18 16:28     ` James Clark
2022-02-05 15:39   ` Leo Yan
2022-02-07 12:06     ` German Gomez
2022-02-08 13:00       ` Leo Yan
2022-02-10 17:23         ` German Gomez
2022-02-11 10:45           ` Leo Yan
2022-02-15 14:29             ` German Gomez
2022-02-16 13:22               ` Leo Yan
2022-02-16 15:16                 ` German Gomez
2022-01-17 12:44 ` [RFC PATCH 2/2] perf: arm_spe: Enable CONTEXT packets in SPE traces if the profiler runs in CPU mode German Gomez
2022-01-17 14:04   ` German Gomez
2022-01-18  9:52   ` Will Deacon
2022-01-18 14:13     ` German Gomez

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).