* [RFC PATCH 0/2] perf: arm_spe: Fix consistency of CONTEXT packets in SPE driver @ 2022-01-17 12:44 German Gomez 2022-01-17 12:44 ` [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX German Gomez 2022-01-17 12:44 ` [RFC PATCH 2/2] perf: arm_spe: Enable CONTEXT packets in SPE traces if the profiler runs in CPU mode German Gomez 0 siblings, 2 replies; 18+ messages in thread From: German Gomez @ 2022-01-17 12:44 UTC (permalink / raw) To: linux-arm-kernel, linux-kernel, will, mark.rutland, james.clark, leo.yan Cc: German Gomez Applies a couple of small changes to the arm_spe_pmu driver. We are seeing context packets in an inconsistent number of SPE records even when the perf-tool runs without the needed capabilities. This is fixed in [1/2]. We're also allowing CONTEXT packets to be collected in per-cpu events in [2/2]. I'm sending as an RFC because it's the first time I change driver code. Also I'm not 100% sure of the approach in [2/2] (from a security and/or implementation standpoint). Thanks, German - [PATCH 1/2] Fixes the consistency issue with the context packets. - [PATCH 2/2] Enables context packets in per-cpu events. German Gomez (2): perf: arm_spe: Fix consistency of PMSCR register bit CX perf: arm_spe: Enable CONTEXT packets in SPE traces if the profiler runs in CPU mode. drivers/perf/arm_spe_pmu.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) -- 2.25.1 ^ permalink raw reply [flat|nested] 18+ messages in thread
* [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX 2022-01-17 12:44 [RFC PATCH 0/2] perf: arm_spe: Fix consistency of CONTEXT packets in SPE driver German Gomez @ 2022-01-17 12:44 ` German Gomez 2022-01-18 10:07 ` Will Deacon 2022-02-05 15:39 ` Leo Yan 2022-01-17 12:44 ` [RFC PATCH 2/2] perf: arm_spe: Enable CONTEXT packets in SPE traces if the profiler runs in CPU mode German Gomez 1 sibling, 2 replies; 18+ messages in thread From: German Gomez @ 2022-01-17 12:44 UTC (permalink / raw) To: linux-arm-kernel, linux-kernel, will, mark.rutland, james.clark, leo.yan Cc: German Gomez The arm_spe_pmu driver will enable the CX bit of the PMSCR register in order to add CONTEXT packets into the traces if the owner of the perf event runs with capabilities CAP_PERFMON or CAP_SYS_ADMIN. The value of the bit is computed in the arm_spe_event_to_pmscr function [1], and the check for capabilities happens in [2] in the pmu init callback. This suggests that the value of the CX bit should remain consistent for the duration of the perf session. However, the function arm_spe_event_to_pmscr may be called later during the start callback [3] when the "current" process is not the owner of the perf session, so the CX bit is currently not consistent. Consider the following example: 1. Run a process in the background with capability CAP_SYS_ADMIN in CPU0. $ taskset --cpu-list 0 sudo dd if=/dev/random of=/dev/null & [3] 3806 2. Begin a perf session _without_ capabilities (we shouldn't see CONTEXT packets). $ perf record -e arm_spe_0// -C0 -- sleep 1 $ perf report -D | grep CONTEXT . 0000000e: 65 df 0e 00 00 CONTEXT 0xedf el2 . 0000004e: 65 df 0e 00 00 CONTEXT 0xedf el2 . 0000008e: 65 df 0e 00 00 CONTEXT 0xedf el2 [...] As can be seen, the traces begin showing CONTEXT packets when the pid is 0xedf (3807). This happens because the pmu start callback is run when the current process is not the owner of the perf session, so the CX register bit is set. One way to fix this is by caching the value of the CX bit during the initialization of the PMU event, so that it remains consistent for the duration of the session. [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/arm_spe_pmu.c?h=v5.16#n275 [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/arm_spe_pmu.c?h=v5.16#n713 [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/arm_spe_pmu.c?h=v5.16#n751 Signed-off-by: German Gomez <german.gomez@arm.com> --- drivers/perf/arm_spe_pmu.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c index d44bcc29d..8515bf85c 100644 --- a/drivers/perf/arm_spe_pmu.c +++ b/drivers/perf/arm_spe_pmu.c @@ -57,6 +57,7 @@ struct arm_spe_pmu { u16 pmsver; u16 min_period; u16 counter_sz; + bool pmscr_cx; #define SPE_PMU_FEAT_FILT_EVT (1UL << 0) #define SPE_PMU_FEAT_FILT_TYP (1UL << 1) @@ -260,6 +261,7 @@ static const struct attribute_group *arm_spe_pmu_attr_groups[] = { static u64 arm_spe_event_to_pmscr(struct perf_event *event) { struct perf_event_attr *attr = &event->attr; + struct arm_spe_pmu *spe_pmu = to_spe_pmu(event->pmu); u64 reg = 0; reg |= ATTR_CFG_GET_FLD(attr, ts_enable) << SYS_PMSCR_EL1_TS_SHIFT; @@ -272,7 +274,7 @@ static u64 arm_spe_event_to_pmscr(struct perf_event *event) if (!attr->exclude_kernel) reg |= BIT(SYS_PMSCR_EL1_E1SPE_SHIFT); - if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable()) + if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && spe_pmu->pmscr_cx) reg |= BIT(SYS_PMSCR_EL1_CX_SHIFT); return reg; @@ -709,10 +711,10 @@ static int arm_spe_pmu_event_init(struct perf_event *event) !(spe_pmu->features & SPE_PMU_FEAT_FILT_LAT)) return -EOPNOTSUPP; + spe_pmu->pmscr_cx = perfmon_capable(); reg = arm_spe_event_to_pmscr(event); if (!perfmon_capable() && (reg & (BIT(SYS_PMSCR_EL1_PA_SHIFT) | - BIT(SYS_PMSCR_EL1_CX_SHIFT) | BIT(SYS_PMSCR_EL1_PCT_SHIFT)))) return -EACCES; -- 2.25.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX 2022-01-17 12:44 ` [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX German Gomez @ 2022-01-18 10:07 ` Will Deacon 2022-01-18 14:04 ` German Gomez 2022-01-18 16:28 ` James Clark 2022-02-05 15:39 ` Leo Yan 1 sibling, 2 replies; 18+ messages in thread From: Will Deacon @ 2022-01-18 10:07 UTC (permalink / raw) To: German Gomez Cc: linux-arm-kernel, linux-kernel, mark.rutland, james.clark, leo.yan On Mon, Jan 17, 2022 at 12:44:31PM +0000, German Gomez wrote: > The arm_spe_pmu driver will enable the CX bit of the PMSCR register in > order to add CONTEXT packets into the traces if the owner of the perf > event runs with capabilities CAP_PERFMON or CAP_SYS_ADMIN. > > The value of the bit is computed in the arm_spe_event_to_pmscr function > [1], and the check for capabilities happens in [2] in the pmu init > callback. This suggests that the value of the CX bit should remain > consistent for the duration of the perf session. > > However, the function arm_spe_event_to_pmscr may be called later during > the start callback [3] when the "current" process is not the owner of > the perf session, so the CX bit is currently not consistent. Consider > the following example: > > 1. Run a process in the background with capability CAP_SYS_ADMIN in CPU0. > > $ taskset --cpu-list 0 sudo dd if=/dev/random of=/dev/null & > [3] 3806 > > 2. Begin a perf session _without_ capabilities (we shouldn't see CONTEXT packets). > > $ perf record -e arm_spe_0// -C0 -- sleep 1 > $ perf report -D | grep CONTEXT > . 0000000e: 65 df 0e 00 00 CONTEXT 0xedf el2 > . 0000004e: 65 df 0e 00 00 CONTEXT 0xedf el2 > . 0000008e: 65 df 0e 00 00 CONTEXT 0xedf el2 > [...] > > As can be seen, the traces begin showing CONTEXT packets when the pid is > 0xedf (3807). So to be clear: we shouldn't be reporting these packets because 'perf' doesn't have the right capabilities, but we evaluate that in the context of 'dd' (running as root) and so incorrectly grant the permission. Correct? > This happens because the pmu start callback is run when > the current process is not the owner of the perf session, so the CX > register bit is set. This doesn't really seem SPE-specific to me -- the perf_allow_*() helpers also operate implicitly on the current task. How do other PMU drivers avoid falling into this trap? > One way to fix this is by caching the value of the CX bit during the > initialization of the PMU event, so that it remains consistent for the > duration of the session. It doesn't feel right to stash this in 'struct arm_spe_pmu' during event initialisation -- wouldn't that allow perf to continue creating new events with CX set, even if the paranoid sysctl was changed dynamically? Instead, I think it would be better if the capabilities were stash in the event itself somehow at initialisation time. Will ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX 2022-01-18 10:07 ` Will Deacon @ 2022-01-18 14:04 ` German Gomez 2022-01-19 11:27 ` German Gomez 2022-01-18 16:28 ` James Clark 1 sibling, 1 reply; 18+ messages in thread From: German Gomez @ 2022-01-18 14:04 UTC (permalink / raw) To: Will Deacon Cc: linux-arm-kernel, linux-kernel, mark.rutland, james.clark, leo.yan Hi Will, Many thanks for your comments On 18/01/2022 10:07, Will Deacon wrote: > On Mon, Jan 17, 2022 at 12:44:31PM +0000, German Gomez wrote: >> [...] >> >> 1. Run a process in the background with capability CAP_SYS_ADMIN in CPU0. >> >> $ taskset --cpu-list 0 sudo dd if=/dev/random of=/dev/null & >> [3] 3806 >> >> 2. Begin a perf session _without_ capabilities (we shouldn't see CONTEXT packets). >> >> $ perf record -e arm_spe_0// -C0 -- sleep 1 >> $ perf report -D | grep CONTEXT >> . 0000000e: 65 df 0e 00 00 CONTEXT 0xedf el2 >> . 0000004e: 65 df 0e 00 00 CONTEXT 0xedf el2 >> . 0000008e: 65 df 0e 00 00 CONTEXT 0xedf el2 >> [...] >> >> As can be seen, the traces begin showing CONTEXT packets when the pid is >> 0xedf (3807). > So to be clear: we shouldn't be reporting these packets because 'perf' > doesn't have the right capabilities, but we evaluate that in the context of > 'dd' (running as root) and so incorrectly grant the permission. Correct? Yes, correct. My guess was that "perfmon_capable()" was being called under the assumption that it would always be evaluated in the context of 'perf'. Is that correct? > >> This happens because the pmu start callback is run when >> the current process is not the owner of the perf session, so the CX >> register bit is set. > This doesn't really seem SPE-specific to me -- the perf_allow_*() helpers > also operate implicitly on the current task. How do other PMU drivers avoid > falling into this trap? I'm not as familiar with the other PMU drivers. I quickly tried grepping something related in the cs_etm drivers as they use CONTEXTIDR as well, but couldn't find references to perfmon_capable() or similar checks. Grepping for "perf_allow_" inside of drivers doesn't yield results. There's some gpu driver that has similar perfmon_capable() checks but unlike spe, they error out if they don't pass (drivers/gpu/drm/i915/i915_perf.c). > >> One way to fix this is by caching the value of the CX bit during the >> initialization of the PMU event, so that it remains consistent for the >> duration of the session. > It doesn't feel right to stash this in 'struct arm_spe_pmu' during event > initialisation -- wouldn't that allow perf to continue creating new events > with CX set, even if the paranoid sysctl was changed dynamically? Instead, > I think it would be better if the capabilities were stash in the event > itself somehow at initialisation time. I hadn't considered this. Makes more sense to store in the perf_event or via some type of mapping in the struct spe_pmu if not possible. Do you have any idea for the former? Or an idiomatic structure from the kernel for the later? Thanks, German > > Will ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX 2022-01-18 14:04 ` German Gomez @ 2022-01-19 11:27 ` German Gomez 0 siblings, 0 replies; 18+ messages in thread From: German Gomez @ 2022-01-19 11:27 UTC (permalink / raw) To: Will Deacon Cc: linux-arm-kernel, linux-kernel, mark.rutland, james.clark, leo.yan On 18/01/2022 14:04, German Gomez wrote: > Hi Will, > > Many thanks for your comments > > On 18/01/2022 10:07, Will Deacon wrote: >> On Mon, Jan 17, 2022 at 12:44:31PM +0000, German Gomez wrote: >>> [...] >>> >>> 1. Run a process in the background with capability CAP_SYS_ADMIN in CPU0. >>> >>> $ taskset --cpu-list 0 sudo dd if=/dev/random of=/dev/null & >>> [3] 3806 >>> >>> 2. Begin a perf session _without_ capabilities (we shouldn't see CONTEXT packets). >>> >>> $ perf record -e arm_spe_0// -C0 -- sleep 1 >>> $ perf report -D | grep CONTEXT >>> . 0000000e: 65 df 0e 00 00 CONTEXT 0xedf el2 >>> . 0000004e: 65 df 0e 00 00 CONTEXT 0xedf el2 >>> . 0000008e: 65 df 0e 00 00 CONTEXT 0xedf el2 >>> [...] >>> >>> As can be seen, the traces begin showing CONTEXT packets when the pid is >>> 0xedf (3807). >> So to be clear: we shouldn't be reporting these packets because 'perf' >> doesn't have the right capabilities, but we evaluate that in the context of >> 'dd' (running as root) and so incorrectly grant the permission. Correct? > Yes, correct. My guess was that "perfmon_capable()" was being called > under the assumption that it would always be evaluated in the context of > 'perf'. Is that correct? > >>> This happens because the pmu start callback is run when >>> the current process is not the owner of the perf session, so the CX >>> register bit is set. >> This doesn't really seem SPE-specific to me -- the perf_allow_*() helpers >> also operate implicitly on the current task. How do other PMU drivers avoid >> falling into this trap? > I'm not as familiar with the other PMU drivers. I quickly tried grepping > something related in the cs_etm drivers as they use CONTEXTIDR as well, > but couldn't find references to perfmon_capable() or similar checks. > > Grepping for "perf_allow_" inside of drivers doesn't yield results. > There's some gpu driver that has similar perfmon_capable() checks but > unlike spe, they error out if they don't pass (drivers/gpu/drm/i915/i915_perf.c). Just to expand a bit more on this (I missed grepping the other directories) ./arch/powerpc/perf/imc-pmu.c => perfmon_capable() only called on init ./kernel/events/core.c => perfmon_capable() only called on init ./arch/x86/events/core.c => perf_allow_*() function only called on init As far as I see, currently spe seems to be the only PMU event that checks capabilities in the start callback. 'perf' may not be the current task in this callback. > >>> One way to fix this is by caching the value of the CX bit during the >>> initialization of the PMU event, so that it remains consistent for the >>> duration of the session. >> It doesn't feel right to stash this in 'struct arm_spe_pmu' during event >> initialisation -- wouldn't that allow perf to continue creating new events >> with CX set, even if the paranoid sysctl was changed dynamically? Instead, >> I think it would be better if the capabilities were stash in the event >> itself somehow at initialisation time. > I hadn't considered this. Makes more sense to store in the perf_event > or via some type of mapping in the struct spe_pmu if not possible. Do > you have any idea for the former? Or an idiomatic structure from the > kernel for the later? > > Thanks, > German > >> Will ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX 2022-01-18 10:07 ` Will Deacon 2022-01-18 14:04 ` German Gomez @ 2022-01-18 16:28 ` James Clark 1 sibling, 0 replies; 18+ messages in thread From: James Clark @ 2022-01-18 16:28 UTC (permalink / raw) To: Will Deacon, German Gomez Cc: linux-arm-kernel, linux-kernel, mark.rutland, leo.yan On 18/01/2022 10:07, Will Deacon wrote: > On Mon, Jan 17, 2022 at 12:44:31PM +0000, German Gomez wrote: >> The arm_spe_pmu driver will enable the CX bit of the PMSCR register in >> order to add CONTEXT packets into the traces if the owner of the perf >> event runs with capabilities CAP_PERFMON or CAP_SYS_ADMIN. >> >> The value of the bit is computed in the arm_spe_event_to_pmscr function >> [1], and the check for capabilities happens in [2] in the pmu init >> callback. This suggests that the value of the CX bit should remain >> consistent for the duration of the perf session. >> >> However, the function arm_spe_event_to_pmscr may be called later during >> the start callback [3] when the "current" process is not the owner of >> the perf session, so the CX bit is currently not consistent. Consider >> the following example: >> >> 1. Run a process in the background with capability CAP_SYS_ADMIN in CPU0. >> >> $ taskset --cpu-list 0 sudo dd if=/dev/random of=/dev/null & >> [3] 3806 >> >> 2. Begin a perf session _without_ capabilities (we shouldn't see CONTEXT packets). >> >> $ perf record -e arm_spe_0// -C0 -- sleep 1 >> $ perf report -D | grep CONTEXT >> . 0000000e: 65 df 0e 00 00 CONTEXT 0xedf el2 >> . 0000004e: 65 df 0e 00 00 CONTEXT 0xedf el2 >> . 0000008e: 65 df 0e 00 00 CONTEXT 0xedf el2 >> [...] >> >> As can be seen, the traces begin showing CONTEXT packets when the pid is >> 0xedf (3807). > > So to be clear: we shouldn't be reporting these packets because 'perf' > doesn't have the right capabilities, but we evaluate that in the context of > 'dd' (running as root) and so incorrectly grant the permission. Correct? In my opinion we should be reporting context packets in this case. The only reason a normal user is allowed to profile a root process is if they have perf_event_paranoid <= 0. So from that perspective perf does have the right capabilities. But a call to only perfmon_capable() doesn't look at the paranoid value, only CAP_SYS_ADMIN and CAP_SYS_ADMIN. > >> This happens because the pmu start callback is run when >> the current process is not the owner of the perf session, so the CX >> register bit is set. > > This doesn't really seem SPE-specific to me -- the perf_allow_*() helpers > also operate implicitly on the current task. How do other PMU drivers avoid > falling into this trap?> >> One way to fix this is by caching the value of the CX bit during the >> initialization of the PMU event, so that it remains consistent for the >> duration of the session. > > It doesn't feel right to stash this in 'struct arm_spe_pmu' during event > initialisation -- wouldn't that allow perf to continue creating new events > with CX set, even if the paranoid sysctl was changed dynamically? Instead, > I think it would be better if the capabilities were stash in the event > itself somehow at initialisation time. If stashing is the issue then the change could be to check the right thing every time, as in do something like the second patch in this set which also checks perf_event_paranoid <= 0 (indirectly). James > > Will > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX 2022-01-17 12:44 ` [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX German Gomez 2022-01-18 10:07 ` Will Deacon @ 2022-02-05 15:39 ` Leo Yan 2022-02-07 12:06 ` German Gomez 1 sibling, 1 reply; 18+ messages in thread From: Leo Yan @ 2022-02-05 15:39 UTC (permalink / raw) To: German Gomez Cc: linux-arm-kernel, linux-kernel, will, mark.rutland, james.clark Hi German, On Mon, Jan 17, 2022 at 12:44:31PM +0000, German Gomez wrote: > The arm_spe_pmu driver will enable the CX bit of the PMSCR register in > order to add CONTEXT packets into the traces if the owner of the perf > event runs with capabilities CAP_PERFMON or CAP_SYS_ADMIN. > > The value of the bit is computed in the arm_spe_event_to_pmscr function > [1], and the check for capabilities happens in [2] in the pmu init > callback. This suggests that the value of the CX bit should remain > consistent for the duration of the perf session. > > However, the function arm_spe_event_to_pmscr may be called later during > the start callback [3] when the "current" process is not the owner of > the perf session, so the CX bit is currently not consistent. Consider > the following example: > > 1. Run a process in the background with capability CAP_SYS_ADMIN in CPU0. > > $ taskset --cpu-list 0 sudo dd if=/dev/random of=/dev/null & > [3] 3806 > > 2. Begin a perf session _without_ capabilities (we shouldn't see CONTEXT packets). > > $ perf record -e arm_spe_0// -C0 -- sleep 1 > $ perf report -D | grep CONTEXT > . 0000000e: 65 df 0e 00 00 CONTEXT 0xedf el2 > . 0000004e: 65 df 0e 00 00 CONTEXT 0xedf el2 > . 0000008e: 65 df 0e 00 00 CONTEXT 0xedf el2 > [...] Could you confirm if you still can reproduce this issue on the latest mainline kernel? I cannot reproduce this issue on the latest mainline kernel, I suspect this is impacted by recent refactoring evlist patches from Ian Rogers (though I am not for this). > As can be seen, the traces begin showing CONTEXT packets when the pid is > 0xedf (3807). This happens because the pmu start callback is run when > the current process is not the owner of the perf session, so the CX > register bit is set. I can image a potential issue is: the "dd" program running in background with capability CAP_SYS_ADMIN on CPU0, and then perf session sends an IPI remotely from any other CPU to CPU0, the dd process (on CPU0) is interrupted to handle ioctl PERF_EVENT_IOC_ENABLE, thus perfmon_capable() returns the capability of dd process, finally it leads to the wrong setting for PMSCR. I reviewed the code and also traced the backtrace for the function arm_spe_pmu_start(), I can confirm that every time perf session will execute below flow: evlist__enable() __evlist__enable() evlist__for_each_cpu() { -> call affinity__set() evsel__enable_cpu() } We can see the macro evlist__for_each_cpu() will extend to invoke evlist__cpu_begin() and affinity__set(); affinity__set() will set CPU affinity to the target CPU, thus perf process will firstly migrate to the target CPU and enable event on the target CPU. This means perf will not send remote IPI and it directly runs on target CPU, and the dd program will not interfere capabilities for perf session. Thanks, Leo > One way to fix this is by caching the value of the CX bit during the > initialization of the PMU event, so that it remains consistent for the > duration of the session. > > [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/arm_spe_pmu.c?h=v5.16#n275 > [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/arm_spe_pmu.c?h=v5.16#n713 > [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/arm_spe_pmu.c?h=v5.16#n751 > > Signed-off-by: German Gomez <german.gomez@arm.com> > --- > drivers/perf/arm_spe_pmu.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c > index d44bcc29d..8515bf85c 100644 > --- a/drivers/perf/arm_spe_pmu.c > +++ b/drivers/perf/arm_spe_pmu.c > @@ -57,6 +57,7 @@ struct arm_spe_pmu { > u16 pmsver; > u16 min_period; > u16 counter_sz; > + bool pmscr_cx; > > #define SPE_PMU_FEAT_FILT_EVT (1UL << 0) > #define SPE_PMU_FEAT_FILT_TYP (1UL << 1) > @@ -260,6 +261,7 @@ static const struct attribute_group *arm_spe_pmu_attr_groups[] = { > static u64 arm_spe_event_to_pmscr(struct perf_event *event) > { > struct perf_event_attr *attr = &event->attr; > + struct arm_spe_pmu *spe_pmu = to_spe_pmu(event->pmu); > u64 reg = 0; > > reg |= ATTR_CFG_GET_FLD(attr, ts_enable) << SYS_PMSCR_EL1_TS_SHIFT; > @@ -272,7 +274,7 @@ static u64 arm_spe_event_to_pmscr(struct perf_event *event) > if (!attr->exclude_kernel) > reg |= BIT(SYS_PMSCR_EL1_E1SPE_SHIFT); > > - if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable()) > + if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && spe_pmu->pmscr_cx) > reg |= BIT(SYS_PMSCR_EL1_CX_SHIFT); > > return reg; > @@ -709,10 +711,10 @@ static int arm_spe_pmu_event_init(struct perf_event *event) > !(spe_pmu->features & SPE_PMU_FEAT_FILT_LAT)) > return -EOPNOTSUPP; > > + spe_pmu->pmscr_cx = perfmon_capable(); > reg = arm_spe_event_to_pmscr(event); > if (!perfmon_capable() && > (reg & (BIT(SYS_PMSCR_EL1_PA_SHIFT) | > - BIT(SYS_PMSCR_EL1_CX_SHIFT) | > BIT(SYS_PMSCR_EL1_PCT_SHIFT)))) > return -EACCES; > > -- > 2.25.1 > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX 2022-02-05 15:39 ` Leo Yan @ 2022-02-07 12:06 ` German Gomez 2022-02-08 13:00 ` Leo Yan 0 siblings, 1 reply; 18+ messages in thread From: German Gomez @ 2022-02-07 12:06 UTC (permalink / raw) To: Leo Yan; +Cc: linux-arm-kernel, linux-kernel, will, mark.rutland, james.clark Hi Leo, On 05/02/2022 15:39, Leo Yan wrote: > Hi German, > > On Mon, Jan 17, 2022 at 12:44:31PM +0000, German Gomez wrote: >> [...] >> >> 1. Run a process in the background with capability CAP_SYS_ADMIN in CPU0. >> >> $ taskset --cpu-list 0 sudo dd if=/dev/random of=/dev/null & >> [3] 3806 >> >> 2. Begin a perf session _without_ capabilities (we shouldn't see CONTEXT packets). >> >> $ perf record -e arm_spe_0// -C0 -- sleep 1 >> $ perf report -D | grep CONTEXT >> . 0000000e: 65 df 0e 00 00 CONTEXT 0xedf el2 >> . 0000004e: 65 df 0e 00 00 CONTEXT 0xedf el2 >> . 0000008e: 65 df 0e 00 00 CONTEXT 0xedf el2 >> [...] > Could you confirm if you still can reproduce this issue on the latest > mainline kernel? I cannot reproduce this issue on the latest mainline > kernel, I suspect this is impacted by recent refactoring evlist > patches from Ian Rogers (though I am not for this). Are you referring to running on the latest kernel or the latest perf tool? I can still reproduce using perf from v5.17-rc3, on a kernel and SPE driver from 5.17.0-rc1 (commands below). > >> As can be seen, the traces begin showing CONTEXT packets when the pid is >> 0xedf (3807). This happens because the pmu start callback is run when >> the current process is not the owner of the perf session, so the CX >> register bit is set. > I can image a potential issue is: the "dd" program running in background > with capability CAP_SYS_ADMIN on CPU0, and then perf session sends an > IPI remotely from any other CPU to CPU0, the dd process (on CPU0) is > interrupted to handle ioctl PERF_EVENT_IOC_ENABLE, thus perfmon_capable() > returns the capability of dd process, finally it leads to the wrong > setting for PMSCR. > > I reviewed the code and also traced the backtrace for the function > arm_spe_pmu_start(), I can confirm that every time perf session will > execute below flow: > > evlist__enable() > __evlist__enable() > evlist__for_each_cpu() { -> call affinity__set() > evsel__enable_cpu() > } > > We can see the macro evlist__for_each_cpu() will extend to invoke > evlist__cpu_begin() and affinity__set(); affinity__set() will set CPU > affinity to the target CPU, thus perf process will firstly migrate to > the target CPU and enable event on the target CPU. This means perf > will not send remote IPI and it directly runs on target CPU, and the > dd program will not interfere capabilities for perf session. Thank you for looking at this, I re-tested on the N1SDP (previously I was using a graviton2 instance). I had to adjust the command slightly with "-m,2" to get it consistently this time: $ taskset --cpu-list 0 sudo dd if=/dev/random of=/dev/null & $ perf record -e arm_spe_0// -C0 -m,2 -- sleep 1 $ perf report -D | grep CONTEXT | head . 0000000e: 65 b5 6e 00 00 CONTEXT 0x6eb5 el2 . 0000004e: 65 b5 6e 00 00 CONTEXT 0x6eb5 el2 . 0000008e: 65 b5 6e 00 00 CONTEXT 0x6eb5 el2 [...] > > Thanks, > Leo > >> One way to fix this is by caching the value of the CX bit during the >> initialization of the PMU event, so that it remains consistent for the >> duration of the session. >> >> [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/arm_spe_pmu.c?h=v5.16#n275 >> [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/arm_spe_pmu.c?h=v5.16#n713 >> [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/arm_spe_pmu.c?h=v5.16#n751 >> >> Signed-off-by: German Gomez <german.gomez@arm.com> >> --- >> drivers/perf/arm_spe_pmu.c | 6 ++++-- >> 1 file changed, 4 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c >> index d44bcc29d..8515bf85c 100644 >> --- a/drivers/perf/arm_spe_pmu.c >> +++ b/drivers/perf/arm_spe_pmu.c >> @@ -57,6 +57,7 @@ struct arm_spe_pmu { >> u16 pmsver; >> u16 min_period; >> u16 counter_sz; >> + bool pmscr_cx; >> >> #define SPE_PMU_FEAT_FILT_EVT (1UL << 0) >> #define SPE_PMU_FEAT_FILT_TYP (1UL << 1) >> @@ -260,6 +261,7 @@ static const struct attribute_group *arm_spe_pmu_attr_groups[] = { >> static u64 arm_spe_event_to_pmscr(struct perf_event *event) >> { >> struct perf_event_attr *attr = &event->attr; >> + struct arm_spe_pmu *spe_pmu = to_spe_pmu(event->pmu); >> u64 reg = 0; >> >> reg |= ATTR_CFG_GET_FLD(attr, ts_enable) << SYS_PMSCR_EL1_TS_SHIFT; >> @@ -272,7 +274,7 @@ static u64 arm_spe_event_to_pmscr(struct perf_event *event) >> if (!attr->exclude_kernel) >> reg |= BIT(SYS_PMSCR_EL1_E1SPE_SHIFT); >> >> - if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable()) >> + if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && spe_pmu->pmscr_cx) >> reg |= BIT(SYS_PMSCR_EL1_CX_SHIFT); >> >> return reg; >> @@ -709,10 +711,10 @@ static int arm_spe_pmu_event_init(struct perf_event *event) >> !(spe_pmu->features & SPE_PMU_FEAT_FILT_LAT)) >> return -EOPNOTSUPP; >> >> + spe_pmu->pmscr_cx = perfmon_capable(); >> reg = arm_spe_event_to_pmscr(event); >> if (!perfmon_capable() && >> (reg & (BIT(SYS_PMSCR_EL1_PA_SHIFT) | >> - BIT(SYS_PMSCR_EL1_CX_SHIFT) | >> BIT(SYS_PMSCR_EL1_PCT_SHIFT)))) >> return -EACCES; >> >> -- >> 2.25.1 >> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX 2022-02-07 12:06 ` German Gomez @ 2022-02-08 13:00 ` Leo Yan 2022-02-10 17:23 ` German Gomez 0 siblings, 1 reply; 18+ messages in thread From: Leo Yan @ 2022-02-08 13:00 UTC (permalink / raw) To: German Gomez Cc: linux-arm-kernel, linux-kernel, will, mark.rutland, james.clark Hi German, On Mon, Feb 07, 2022 at 12:06:14PM +0000, German Gomez wrote: [...] > > I reviewed the code and also traced the backtrace for the function > > arm_spe_pmu_start(), I can confirm that every time perf session will > > execute below flow: > > > > evlist__enable() > > __evlist__enable() > > evlist__for_each_cpu() { -> call affinity__set() > > evsel__enable_cpu() > > } > > > > We can see the macro evlist__for_each_cpu() will extend to invoke > > evlist__cpu_begin() and affinity__set(); affinity__set() will set CPU > > affinity to the target CPU, thus perf process will firstly migrate to > > the target CPU and enable event on the target CPU. This means perf > > will not send remote IPI and it directly runs on target CPU, and the > > dd program will not interfere capabilities for perf session. > > Thank you for looking at this, > > I re-tested on the N1SDP (previously I was using a graviton2 instance). > I had to adjust the command slightly with "-m,2" to get it consistently > this time: > > $ taskset --cpu-list 0 sudo dd if=/dev/random of=/dev/null & > $ perf record -e arm_spe_0// -C0 -m,2 -- sleep 1 > $ perf report -D | grep CONTEXT | head > . 0000000e: 65 b5 6e 00 00 CONTEXT 0x6eb5 el2 > . 0000004e: 65 b5 6e 00 00 CONTEXT 0x6eb5 el2 > . 0000008e: 65 b5 6e 00 00 CONTEXT 0x6eb5 el2 > [...] Indeed! I can reproduce the issue now. And I can capture backtrace for arm_spe_pmu_start() with below commands: # cd /home/leoy/linux/tools/perf # ./perf probe --add "arm_spe_pmu_start" -s /home/leoy/linux/ -k /home/leoy/linux/vmlinux # echo 1 > /sys/kernel/debug/tracing/events/probe/arm_spe_pmu_start/enable # echo stacktrace > /sys/kernel/debug/tracing/events/probe/arm_spe_pmu_start/trigger ... run your commands with non-root user ... # cat /sys/kernel/debug/tracing/trace dd-7697 [000] d.h2. 506.068700: arm_spe_pmu_start: (arm_spe_pmu_start+0x8/0xe0) dd-7697 [000] d.h3. 506.068701: <stack trace> => kprobe_dispatcher => kprobe_breakpoint_handler => call_break_hook => brk_handler => do_debug_exception => el1_dbg => el1h_64_sync_handler => el1h_64_sync => arm_spe_pmu_start => event_sched_in.isra.0 => merge_sched_in => visit_groups_merge.constprop.0 => ctx_sched_in => perf_event_sched_in => ctx_resched => __perf_event_enable => event_function => remote_function => flush_smp_call_function_queue => generic_smp_call_function_single_interrupt => ipi_handler => handle_percpu_devid_irq => generic_handle_domain_irq => gic_handle_irq => call_on_irq_stack => do_interrupt_handler => el1_interrupt => el1h_64_irq_handler => el1h_64_irq => _raw_spin_unlock_irqrestore => urandom_read_nowarn.isra.0 => random_read => vfs_read => ksys_read => __arm64_sys_read => invoke_syscall => el0_svc_common.constprop.0 => do_el0_svc => el0_svc => el0t_64_sync_handler => el0t_64_sync The backtrace clearly shows the function arm_spe_pmu_start() is invoked in the 'dd' process (dd-7697); the flow is: - perf program sends IPI to CPU0; - 'dd' process is running on CPU0 and it's interrupted to handle IPI; - 'dd' process has root capabilities, so it will enable context tracing for non-root perf session. > >> One way to fix this is by caching the value of the CX bit during the > >> initialization of the PMU event, so that it remains consistent for the > >> duration of the session. > >> > >> [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/arm_spe_pmu.c?h=v5.16#n275 > >> [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/arm_spe_pmu.c?h=v5.16#n713 > >> [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/arm_spe_pmu.c?h=v5.16#n751 > >> > >> Signed-off-by: German Gomez <german.gomez@arm.com> > >> --- > >> drivers/perf/arm_spe_pmu.c | 6 ++++-- > >> 1 file changed, 4 insertions(+), 2 deletions(-) > >> > >> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c > >> index d44bcc29d..8515bf85c 100644 > >> --- a/drivers/perf/arm_spe_pmu.c > >> +++ b/drivers/perf/arm_spe_pmu.c > >> @@ -57,6 +57,7 @@ struct arm_spe_pmu { > >> u16 pmsver; > >> u16 min_period; > >> u16 counter_sz; > >> + bool pmscr_cx; So the patch makes sense to me. Just a minor comment: Here we can define a u64 for recording pmscr value rather than a bool value. struct arm_spe_pmu { ... u64 pmscr; }; > >> > >> #define SPE_PMU_FEAT_FILT_EVT (1UL << 0) > >> #define SPE_PMU_FEAT_FILT_TYP (1UL << 1) > >> @@ -260,6 +261,7 @@ static const struct attribute_group *arm_spe_pmu_attr_groups[] = { > >> static u64 arm_spe_event_to_pmscr(struct perf_event *event) > >> { > >> struct perf_event_attr *attr = &event->attr; > >> + struct arm_spe_pmu *spe_pmu = to_spe_pmu(event->pmu); > >> u64 reg = 0; > >> > >> reg |= ATTR_CFG_GET_FLD(attr, ts_enable) << SYS_PMSCR_EL1_TS_SHIFT; > >> @@ -272,7 +274,7 @@ static u64 arm_spe_event_to_pmscr(struct perf_event *event) > >> if (!attr->exclude_kernel) > >> reg |= BIT(SYS_PMSCR_EL1_E1SPE_SHIFT); > >> > >> - if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable()) > >> + if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && spe_pmu->pmscr_cx) > >> reg |= BIT(SYS_PMSCR_EL1_CX_SHIFT); > >> > >> return reg; > >> @@ -709,10 +711,10 @@ static int arm_spe_pmu_event_init(struct perf_event *event) > >> !(spe_pmu->features & SPE_PMU_FEAT_FILT_LAT)) > >> return -EOPNOTSUPP; > >> > >> + spe_pmu->pmscr_cx = perfmon_capable(); > >> reg = arm_spe_event_to_pmscr(event); Thus here we can change as: spe_pmu->pmscr = arm_spe_event_to_pmscr(event); And then in the function arm_spe_pmu_start(), we can skip calling arm_spe_event_to_pmscr() and directly set PMSCR register: static void arm_spe_pmu_start(struct perf_event *event, int flags) { ... isb(); write_sysreg_s(spe_pmu->pmscr, SYS_PMSCR_EL1); } Thanks, Leo > >> if (!perfmon_capable() && > >> (reg & (BIT(SYS_PMSCR_EL1_PA_SHIFT) | > >> - BIT(SYS_PMSCR_EL1_CX_SHIFT) | > >> BIT(SYS_PMSCR_EL1_PCT_SHIFT)))) > >> return -EACCES; > >> > >> -- > >> 2.25.1 > >> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX 2022-02-08 13:00 ` Leo Yan @ 2022-02-10 17:23 ` German Gomez 2022-02-11 10:45 ` Leo Yan 0 siblings, 1 reply; 18+ messages in thread From: German Gomez @ 2022-02-10 17:23 UTC (permalink / raw) To: Leo Yan; +Cc: linux-arm-kernel, linux-kernel, will, mark.rutland, james.clark Hi Leo, On 08/02/2022 13:00, Leo Yan wrote: > Hi German, > > On Mon, Feb 07, 2022 at 12:06:14PM +0000, German Gomez wrote: > > [...] > Indeed! I can reproduce the issue now. And I can capture backtrace > for arm_spe_pmu_start() with below commands: > > # cd /home/leoy/linux/tools/perf > # ./perf probe --add "arm_spe_pmu_start" -s /home/leoy/linux/ -k /home/leoy/linux/vmlinux > # echo 1 > /sys/kernel/debug/tracing/events/probe/arm_spe_pmu_start/enable > # echo stacktrace > /sys/kernel/debug/tracing/events/probe/arm_spe_pmu_start/trigger > > ... run your commands with non-root user ... > > # cat /sys/kernel/debug/tracing/trace > > dd-7697 [000] d.h2. 506.068700: arm_spe_pmu_start: (arm_spe_pmu_start+0x8/0xe0) > dd-7697 [000] d.h3. 506.068701: <stack trace> > => kprobe_dispatcher > => kprobe_breakpoint_handler > => call_break_hook > [...] > => do_el0_svc > => el0_svc > => el0t_64_sync_handler > => el0t_64_sync > > The backtrace clearly shows the function arm_spe_pmu_start() is > invoked in the 'dd' process (dd-7697); the flow is: > - perf program sends IPI to CPU0; > - 'dd' process is running on CPU0 and it's interrupted to handle IPI; > - 'dd' process has root capabilities, so it will enable context > tracing for non-root perf session. Thanks for testing, and sharing the commands in your replies! > >>>> One way to fix this is by caching the value of the CX bit during the >>>> initialization of the PMU event, so that it remains consistent for the >>>> duration of the session. >>>> >>>> [...] > So the patch makes sense to me. Just a minor comment: > > Here we can define a u64 for recording pmscr value rather than a > bool value. > > struct arm_spe_pmu { > ... > u64 pmscr; > }; I agree with the comment from Will that it makes more sense to store the value of the register in the perf_event somehow (due to misunderstanding from my side, I thought arm_spe_pmu struct was local to the session). What about perf_event's void *pmu_private? Thanks, German ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX 2022-02-10 17:23 ` German Gomez @ 2022-02-11 10:45 ` Leo Yan 2022-02-15 14:29 ` German Gomez 0 siblings, 1 reply; 18+ messages in thread From: Leo Yan @ 2022-02-11 10:45 UTC (permalink / raw) To: German Gomez Cc: linux-arm-kernel, linux-kernel, will, mark.rutland, james.clark Hi German, On Thu, Feb 10, 2022 at 05:23:50PM +0000, German Gomez wrote: [...] > >>>> One way to fix this is by caching the value of the CX bit during the > >>>> initialization of the PMU event, so that it remains consistent for the > >>>> duration of the session. > >>>> > >>>> [...] > > So the patch makes sense to me. Just a minor comment: > > > > Here we can define a u64 for recording pmscr value rather than a > > bool value. > > > > struct arm_spe_pmu { > > ... > > u64 pmscr; > > }; > > I agree with the comment from Will that it makes more sense to store the > value of the register in the perf_event somehow (due to misunderstanding > from my side, I thought arm_spe_pmu struct was local to the session). It's shame that I miss this point :) As you said, struct arm_spe_pmu is a data structure for Arm SPE device driver instance and it's not allocated for perf session. > What about perf_event's void *pmu_private? Before we use perf_event::pmu_private, could you check the data structure arm_spe_pmu_buf firstly? This data structure is allocated when setup AUX ring buffer (so it's allocated for perf session). IIUC, the function arm_spe_pmu_setup_aux() will be invoked in the perf process, so it's good for us to initialize pmscr in this function. Thanks, Leo ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX 2022-02-11 10:45 ` Leo Yan @ 2022-02-15 14:29 ` German Gomez 2022-02-16 13:22 ` Leo Yan 0 siblings, 1 reply; 18+ messages in thread From: German Gomez @ 2022-02-15 14:29 UTC (permalink / raw) To: Leo Yan; +Cc: linux-arm-kernel, linux-kernel, will, mark.rutland, james.clark On 11/02/2022 10:45, Leo Yan wrote: > Hi German, > > On Thu, Feb 10, 2022 at 05:23:50PM +0000, German Gomez wrote: > > [...] > >>>>>> One way to fix this is by caching the value of the CX bit during the >>>>>> initialization of the PMU event, so that it remains consistent for the >>>>>> duration of the session. >>>>>> >>>>>> [...] >>> So the patch makes sense to me. Just a minor comment: >>> >>> Here we can define a u64 for recording pmscr value rather than a >>> bool value. >>> >>> struct arm_spe_pmu { >>> ... >>> u64 pmscr; >>> }; >> I agree with the comment from Will that it makes more sense to store the >> value of the register in the perf_event somehow (due to misunderstanding >> from my side, I thought arm_spe_pmu struct was local to the session). > It's shame that I miss this point :) As you said, struct arm_spe_pmu is > a data structure for Arm SPE device driver instance and it's not > allocated for perf session. > >> What about perf_event's void *pmu_private? > Before we use perf_event::pmu_private, could you check the data > structure arm_spe_pmu_buf firstly? This data structure is allocated > when setup AUX ring buffer (so it's allocated for perf session). > IIUC, the function arm_spe_pmu_setup_aux() will be invoked in the perf > process, so it's good for us to initialize pmscr in this function. Thanks for the suggestion. I recorded the following stacktrace: perf-323841 [052] d.... 3996.528812: arm_spe_pmu_setup_aux: (arm_spe_pmu_setup_aux+0x60/0x1c0 [arm_spe_pmu]) perf-323841 [052] d.... 3996.528813: <stack trace> => kprobe_dispatcher => kprobe_breakpoint_handler => call_break_hook => brk_handler => do_debug_exception => el1_dbg => el1h_64_sync_handler => el1h_64_sync => arm_spe_pmu_setup_aux => perf_mmap => mmap_region => do_mmap => vm_mmap_pgoff => ksys_mmap_pgoff => __arm64_sys_mmap => invoke_syscall => el0_svc_common.constprop.0 => do_el0_svc => el0_svc => el0t_64_sync_handler => el0t_64_sync So for a v2 I may include something like this: diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c index d44bcc29d..aadec5a0e 100644 --- a/drivers/perf/arm_spe_pmu.c +++ b/drivers/perf/arm_spe_pmu.c @@ -45,6 +45,7 @@ struct arm_spe_pmu_buf { int nr_pages; bool snapshot; void *base; + u64 pmscr; }; struct arm_spe_pmu { @@ -748,7 +749,7 @@ static void arm_spe_pmu_start(struct perf_event *event, int flags) write_sysreg_s(reg, SYS_PMSICR_EL1); } - reg = arm_spe_event_to_pmscr(event); + reg = ((struct arm_spe_pmu_buf *) perf_get_aux(handle))->pmscr; isb(); write_sysreg_s(reg, SYS_PMSCR_EL1); } @@ -855,6 +856,8 @@ static void *arm_spe_pmu_setup_aux(struct perf_event *event, void **pages, if (!pglist) goto out_free_buf; + buf->pmscr = arm_spe_event_to_pmscr(event); + for (i = 0; i < nr_pages; ++i) pglist[i] = virt_to_page(pages[i]); > > Thanks, > Leo ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX 2022-02-15 14:29 ` German Gomez @ 2022-02-16 13:22 ` Leo Yan 2022-02-16 15:16 ` German Gomez 0 siblings, 1 reply; 18+ messages in thread From: Leo Yan @ 2022-02-16 13:22 UTC (permalink / raw) To: German Gomez Cc: linux-arm-kernel, linux-kernel, will, mark.rutland, james.clark On Tue, Feb 15, 2022 at 02:29:27PM +0000, German Gomez wrote: [...] > Thanks for the suggestion. I recorded the following stacktrace: > > perf-323841 [052] d.... 3996.528812: arm_spe_pmu_setup_aux: (arm_spe_pmu_setup_aux+0x60/0x1c0 [arm_spe_pmu]) > perf-323841 [052] d.... 3996.528813: <stack trace> Yeah, this show arm_spe_pmu_setup_aux() is called in perf process. > => kprobe_dispatcher > => kprobe_breakpoint_handler > => call_break_hook > => brk_handler > => do_debug_exception > => el1_dbg > => el1h_64_sync_handler > => el1h_64_sync > => arm_spe_pmu_setup_aux > => perf_mmap > => mmap_region > => do_mmap > => vm_mmap_pgoff > => ksys_mmap_pgoff > => __arm64_sys_mmap > => invoke_syscall > => el0_svc_common.constprop.0 > => do_el0_svc > => el0_svc > => el0t_64_sync_handler > => el0t_64_sync > > So for a v2 I may include something like this: The change looks good to me, please see below minor comment. > diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c > index d44bcc29d..aadec5a0e 100644 > --- a/drivers/perf/arm_spe_pmu.c > +++ b/drivers/perf/arm_spe_pmu.c > @@ -45,6 +45,7 @@ struct arm_spe_pmu_buf { > int nr_pages; > bool snapshot; > void *base; > + u64 pmscr; > }; > > struct arm_spe_pmu { > @@ -748,7 +749,7 @@ static void arm_spe_pmu_start(struct perf_event *event, int flags) > write_sysreg_s(reg, SYS_PMSICR_EL1); > } > > - reg = arm_spe_event_to_pmscr(event); > + reg = ((struct arm_spe_pmu_buf *) perf_get_aux(handle))->pmscr; > isb(); > write_sysreg_s(reg, SYS_PMSCR_EL1); Just nitpick (or it's just my preferring coding style), we can define a local pointer variable 'buf': struct arm_spe_pmu_buf *buf = perf_get_aux(handle); ... isb(); write_sysreg_s(buf->pmscr, SYS_PMSCR_EL1); Thanks, Leo > } > @@ -855,6 +856,8 @@ static void *arm_spe_pmu_setup_aux(struct perf_event *event, void **pages, > if (!pglist) > goto out_free_buf; > > + buf->pmscr = arm_spe_event_to_pmscr(event); > + > for (i = 0; i < nr_pages; ++i) > pglist[i] = virt_to_page(pages[i]); > > > > > Thanks, > > Leo ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX 2022-02-16 13:22 ` Leo Yan @ 2022-02-16 15:16 ` German Gomez 0 siblings, 0 replies; 18+ messages in thread From: German Gomez @ 2022-02-16 15:16 UTC (permalink / raw) To: Leo Yan; +Cc: linux-arm-kernel, linux-kernel, will, mark.rutland, james.clark Hi Leo, Thanks for the review On 16/02/2022 13:22, Leo Yan wrote: > On Tue, Feb 15, 2022 at 02:29:27PM +0000, German Gomez wrote: > > [...] > >> Thanks for the suggestion. I recorded the following stacktrace: >> >> perf-323841 [052] d.... 3996.528812: arm_spe_pmu_setup_aux: (arm_spe_pmu_setup_aux+0x60/0x1c0 [arm_spe_pmu]) >> perf-323841 [052] d.... 3996.528813: <stack trace> > Yeah, this show arm_spe_pmu_setup_aux() is called in perf process. > >> => kprobe_dispatcher >> => kprobe_breakpoint_handler >> => call_break_hook >> => brk_handler >> => do_debug_exception >> => el1_dbg >> => el1h_64_sync_handler >> => el1h_64_sync >> => arm_spe_pmu_setup_aux >> => perf_mmap >> => mmap_region >> => do_mmap >> => vm_mmap_pgoff >> => ksys_mmap_pgoff >> => __arm64_sys_mmap >> => invoke_syscall >> => el0_svc_common.constprop.0 >> => do_el0_svc >> => el0_svc >> => el0t_64_sync_handler >> => el0t_64_sync >> >> So for a v2 I may include something like this: > The change looks good to me, please see below minor comment. > >> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c >> index d44bcc29d..aadec5a0e 100644 >> --- a/drivers/perf/arm_spe_pmu.c >> +++ b/drivers/perf/arm_spe_pmu.c >> @@ -45,6 +45,7 @@ struct arm_spe_pmu_buf { >> int nr_pages; >> bool snapshot; >> void *base; >> + u64 pmscr; >> }; >> >> struct arm_spe_pmu { >> @@ -748,7 +749,7 @@ static void arm_spe_pmu_start(struct perf_event *event, int flags) >> write_sysreg_s(reg, SYS_PMSICR_EL1); >> } >> >> - reg = arm_spe_event_to_pmscr(event); >> + reg = ((struct arm_spe_pmu_buf *) perf_get_aux(handle))->pmscr; >> isb(); >> write_sysreg_s(reg, SYS_PMSCR_EL1); > Just nitpick (or it's just my preferring coding style), we can define > a local pointer variable 'buf': > > struct arm_spe_pmu_buf *buf = perf_get_aux(handle); I need to make sure perf_get_aux(..) is called between perf_aux_output_begin and *_end though (so, after arm_spe_perf_aux_output_begin(..)): buf = perf_get_aux(handle); reg = buf->pmscr; isb(); write_sysreg_s(buf, SYS_PMSCR_EL1); Alternatively, we set the register inside of perf_aux_output_begin. It might be confusing for casual readers because the function handles a case where perf_get_aux(..) returns NULL. Alternatively, we could also wrap perf_get_aux(..) in a static inline function that returns the correct type and do: reg = arm_spe_get_aux(handle)->pmscr; isb(); write_sysreg_s(reg, SYS_PMSCR_EL1); So that it looks cleaner. > > ... > > isb(); > write_sysreg_s(buf->pmscr, SYS_PMSCR_EL1); > > Thanks, > Leo > >> } >> @@ -855,6 +856,8 @@ static void *arm_spe_pmu_setup_aux(struct perf_event *event, void **pages, >> if (!pglist) >> goto out_free_buf; >> >> + buf->pmscr = arm_spe_event_to_pmscr(event); >> + >> for (i = 0; i < nr_pages; ++i) >> pglist[i] = virt_to_page(pages[i]); >> >>> Thanks, >>> Leo ^ permalink raw reply [flat|nested] 18+ messages in thread
* [RFC PATCH 2/2] perf: arm_spe: Enable CONTEXT packets in SPE traces if the profiler runs in CPU mode. 2022-01-17 12:44 [RFC PATCH 0/2] perf: arm_spe: Fix consistency of CONTEXT packets in SPE driver German Gomez 2022-01-17 12:44 ` [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX German Gomez @ 2022-01-17 12:44 ` German Gomez 2022-01-17 14:04 ` German Gomez 2022-01-18 9:52 ` Will Deacon 1 sibling, 2 replies; 18+ messages in thread From: German Gomez @ 2022-01-17 12:44 UTC (permalink / raw) To: linux-arm-kernel, linux-kernel, will, mark.rutland, james.clark, leo.yan Cc: German Gomez Enable CONTEXT packets in SPE traces if the profiler runs in CPU mode. This is no less permissive than the existing behavior for the following reason: If perf_event_paranoid <= 0, then non perfmon_capable() users can open a per-CPU event. With a per-CPU event, unpriviledged users are allowed to profile _all_ processes, even ones owned by root. Without this change, users could see kernel addresses, root processes, etc, but not gather the PIDs of those processes. The PID is probably the least sensitive of all the information. It would be more idiomatic to check the perf_event_paranoid level with perf_allow_cpu(), but this function is not exported so cannot be used from a module. Looking for cpu != -1 is the indirect way of checking the same thing as it could never get to arm_spe_pmu_event_init() without perf_event_paranoid <= 0. Co-authored-by: James Clark <james.clark@arm.com> Signed-off-by: German Gomez <german.gomez@arm.com> --- drivers/perf/arm_spe_pmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c index 8515bf85c..7d9a7fa4f 100644 --- a/drivers/perf/arm_spe_pmu.c +++ b/drivers/perf/arm_spe_pmu.c @@ -711,7 +711,7 @@ static int arm_spe_pmu_event_init(struct perf_event *event) !(spe_pmu->features & SPE_PMU_FEAT_FILT_LAT)) return -EOPNOTSUPP; - spe_pmu->pmscr_cx = perfmon_capable(); + spe_pmu->pmscr_cx = perfmon_capable() || (event->cpu != -1); reg = arm_spe_event_to_pmscr(event); if (!perfmon_capable() && (reg & (BIT(SYS_PMSCR_EL1_PA_SHIFT) | -- 2.25.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 2/2] perf: arm_spe: Enable CONTEXT packets in SPE traces if the profiler runs in CPU mode. 2022-01-17 12:44 ` [RFC PATCH 2/2] perf: arm_spe: Enable CONTEXT packets in SPE traces if the profiler runs in CPU mode German Gomez @ 2022-01-17 14:04 ` German Gomez 2022-01-18 9:52 ` Will Deacon 1 sibling, 0 replies; 18+ messages in thread From: German Gomez @ 2022-01-17 14:04 UTC (permalink / raw) To: linux-arm-kernel, linux-kernel, will, mark.rutland, james.clark, leo.yan On 17/01/2022 12:44, German Gomez wrote: > Enable CONTEXT packets in SPE traces if the profiler runs in CPU mode. > This is no less permissive than the existing behavior for the following > reason: > > If perf_event_paranoid <= 0, then non perfmon_capable() users can open > a per-CPU event. With a per-CPU event, unpriviledged users are allowed > to profile _all_ processes, even ones owned by root. > > Without this change, users could see kernel addresses, root processes, > etc, but not gather the PIDs of those processes. The PID is probably the > least sensitive of all the information. > > It would be more idiomatic to check the perf_event_paranoid level with > perf_allow_cpu(), but this function is not exported so cannot be used > from a module. Looking for cpu != -1 is the indirect way of checking > the same thing as it could never get to arm_spe_pmu_event_init() without Reconsidering this bit (comment below) > perf_event_paranoid <= 0. > > Co-authored-by: James Clark <james.clark@arm.com> > Signed-off-by: German Gomez <german.gomez@arm.com> > --- > drivers/perf/arm_spe_pmu.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c > index 8515bf85c..7d9a7fa4f 100644 > --- a/drivers/perf/arm_spe_pmu.c > +++ b/drivers/perf/arm_spe_pmu.c > @@ -711,7 +711,7 @@ static int arm_spe_pmu_event_init(struct perf_event *event) > !(spe_pmu->features & SPE_PMU_FEAT_FILT_LAT)) > return -EOPNOTSUPP; > > - spe_pmu->pmscr_cx = perfmon_capable(); > + spe_pmu->pmscr_cx = perfmon_capable() || (event->cpu != -1); The perf_event_open(2) manpage states: pid == -1 and cpu >= 0 This measures all processes/threads on the specified CPU. This requires CAP_PERFMON (since Linux 5.8) or CAP_SYS_ADMIN capability or a /proc/sys/kernel/perf_event_paranoid value of less than 1. So perhaps it's more accurate to (still implicitly) check the paranoid level with "pid == -1 && event->cpu > 0" ? If so, I think I have to dig deeper into perf_event. I don't immediately see the pid argument. Any hints? Thanks, German > reg = arm_spe_event_to_pmscr(event); > if (!perfmon_capable() && > (reg & (BIT(SYS_PMSCR_EL1_PA_SHIFT) | ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 2/2] perf: arm_spe: Enable CONTEXT packets in SPE traces if the profiler runs in CPU mode. 2022-01-17 12:44 ` [RFC PATCH 2/2] perf: arm_spe: Enable CONTEXT packets in SPE traces if the profiler runs in CPU mode German Gomez 2022-01-17 14:04 ` German Gomez @ 2022-01-18 9:52 ` Will Deacon 2022-01-18 14:13 ` German Gomez 1 sibling, 1 reply; 18+ messages in thread From: Will Deacon @ 2022-01-18 9:52 UTC (permalink / raw) To: German Gomez Cc: linux-arm-kernel, linux-kernel, mark.rutland, james.clark, leo.yan On Mon, Jan 17, 2022 at 12:44:32PM +0000, German Gomez wrote: > Enable CONTEXT packets in SPE traces if the profiler runs in CPU mode. > This is no less permissive than the existing behavior for the following > reason: > > If perf_event_paranoid <= 0, then non perfmon_capable() users can open > a per-CPU event. With a per-CPU event, unpriviledged users are allowed > to profile _all_ processes, even ones owned by root. > > Without this change, users could see kernel addresses, root processes, > etc, but not gather the PIDs of those processes. The PID is probably the > least sensitive of all the information. > > It would be more idiomatic to check the perf_event_paranoid level with > perf_allow_cpu(), but this function is not exported so cannot be used > from a module. Looking for cpu != -1 is the indirect way of checking > the same thing as it could never get to arm_spe_pmu_event_init() without > perf_event_paranoid <= 0. perf_allow_cpu() is a static inline so there's no need to export it. What's missing? Will ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 2/2] perf: arm_spe: Enable CONTEXT packets in SPE traces if the profiler runs in CPU mode. 2022-01-18 9:52 ` Will Deacon @ 2022-01-18 14:13 ` German Gomez 0 siblings, 0 replies; 18+ messages in thread From: German Gomez @ 2022-01-18 14:13 UTC (permalink / raw) To: Will Deacon Cc: linux-arm-kernel, linux-kernel, mark.rutland, james.clark, leo.yan On 18/01/2022 09:52, Will Deacon wrote: > On Mon, Jan 17, 2022 at 12:44:32PM +0000, German Gomez wrote: >> Enable CONTEXT packets in SPE traces if the profiler runs in CPU mode. >> This is no less permissive than the existing behavior for the following >> reason: >> >> If perf_event_paranoid <= 0, then non perfmon_capable() users can open >> a per-CPU event. With a per-CPU event, unpriviledged users are allowed >> to profile _all_ processes, even ones owned by root. >> >> Without this change, users could see kernel addresses, root processes, >> etc, but not gather the PIDs of those processes. The PID is probably the >> least sensitive of all the information. >> >> It would be more idiomatic to check the perf_event_paranoid level with >> perf_allow_cpu(), but this function is not exported so cannot be used >> from a module. Looking for cpu != -1 is the indirect way of checking >> the same thing as it could never get to arm_spe_pmu_event_init() without >> perf_event_paranoid <= 0. > perf_allow_cpu() is a static inline so there's no need to export it. What's > missing? We were still running into build errors: ERROR: modpost: "security_perf_event_open" [drivers/perf/arm_spe_pmu.ko] undefined! ERROR: modpost: "sysctl_perf_event_paranoid" [drivers/perf/arm_spe_pmu.ko] undefined > > Will ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2022-02-16 15:16 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-01-17 12:44 [RFC PATCH 0/2] perf: arm_spe: Fix consistency of CONTEXT packets in SPE driver German Gomez 2022-01-17 12:44 ` [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX German Gomez 2022-01-18 10:07 ` Will Deacon 2022-01-18 14:04 ` German Gomez 2022-01-19 11:27 ` German Gomez 2022-01-18 16:28 ` James Clark 2022-02-05 15:39 ` Leo Yan 2022-02-07 12:06 ` German Gomez 2022-02-08 13:00 ` Leo Yan 2022-02-10 17:23 ` German Gomez 2022-02-11 10:45 ` Leo Yan 2022-02-15 14:29 ` German Gomez 2022-02-16 13:22 ` Leo Yan 2022-02-16 15:16 ` German Gomez 2022-01-17 12:44 ` [RFC PATCH 2/2] perf: arm_spe: Enable CONTEXT packets in SPE traces if the profiler runs in CPU mode German Gomez 2022-01-17 14:04 ` German Gomez 2022-01-18 9:52 ` Will Deacon 2022-01-18 14:13 ` German Gomez
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).