* [PATCH 1/5] sched_clock: Expose struct clock_read_data
2020-05-12 12:40 [PATCH 0/5] arm64: perf: Proper cap_user_time* support Peter Zijlstra
@ 2020-05-12 12:40 ` Peter Zijlstra
2020-05-12 12:41 ` [PATCH 2/5] arm64: perf: Implement correct cap_user_time Peter Zijlstra
` (4 subsequent siblings)
5 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2020-05-12 12:40 UTC (permalink / raw)
To: Will Deacon, Marc Zyngier, Mark Rutland, Catalin Marinas, Leo Yan
Cc: linux-kernel, linux-arm-kernel, mingo, acme, alexander.shishkin,
jolsa, daniel.lezcano, tglx, sboyd, john.stultz,
Peter Zijlstra (Intel)
In order to support perf_event_mmap_page::cap_time features, an
architecture needs, aside from a userspace readable counter register,
to expose the exact clock data so that userspace can convert the
counter register into a correct timestamp.
Provide struct clock_read_data and two (seqcount) helpers so that
architectures (arm64 in particular) can expose the numbers to userspace.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
include/linux/sched_clock.h | 28 ++++++++++++++++++++++++++++
kernel/time/sched_clock.c | 41 +++++++++++++----------------------------
2 files changed, 41 insertions(+), 28 deletions(-)
--- a/include/linux/sched_clock.h
+++ b/include/linux/sched_clock.h
@@ -6,6 +6,34 @@
#define LINUX_SCHED_CLOCK
#ifdef CONFIG_GENERIC_SCHED_CLOCK
+/**
+ * struct clock_read_data - data required to read from sched_clock()
+ *
+ * @epoch_ns: sched_clock() value at last update
+ * @epoch_cyc: Clock cycle value at last update.
+ * @sched_clock_mask: Bitmask for two's complement subtraction of non 64bit
+ * clocks.
+ * @read_sched_clock: Current clock source (or dummy source when suspended).
+ * @mult: Multipler for scaled math conversion.
+ * @shift: Shift value for scaled math conversion.
+ *
+ * Care must be taken when updating this structure; it is read by
+ * some very hot code paths. It occupies <=40 bytes and, when combined
+ * with the seqcount used to synchronize access, comfortably fits into
+ * a 64 byte cache line.
+ */
+struct clock_read_data {
+ u64 epoch_ns;
+ u64 epoch_cyc;
+ u64 sched_clock_mask;
+ u64 (*read_sched_clock)(void);
+ u32 mult;
+ u32 shift;
+};
+
+extern struct clock_read_data *sched_clock_read_begin(unsigned int *seq);
+extern int sched_clock_read_retry(unsigned int seq);
+
extern void generic_sched_clock_init(void);
extern void sched_clock_register(u64 (*read)(void), int bits,
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -20,31 +20,6 @@
#include "timekeeping.h"
/**
- * struct clock_read_data - data required to read from sched_clock()
- *
- * @epoch_ns: sched_clock() value at last update
- * @epoch_cyc: Clock cycle value at last update.
- * @sched_clock_mask: Bitmask for two's complement subtraction of non 64bit
- * clocks.
- * @read_sched_clock: Current clock source (or dummy source when suspended).
- * @mult: Multipler for scaled math conversion.
- * @shift: Shift value for scaled math conversion.
- *
- * Care must be taken when updating this structure; it is read by
- * some very hot code paths. It occupies <=40 bytes and, when combined
- * with the seqcount used to synchronize access, comfortably fits into
- * a 64 byte cache line.
- */
-struct clock_read_data {
- u64 epoch_ns;
- u64 epoch_cyc;
- u64 sched_clock_mask;
- u64 (*read_sched_clock)(void);
- u32 mult;
- u32 shift;
-};
-
-/**
* struct clock_data - all data needed for sched_clock() (including
* registration of a new clock source)
*
@@ -93,6 +68,17 @@ static inline u64 notrace cyc_to_ns(u64
return (cyc * mult) >> shift;
}
+struct clock_read_data *sched_clock_read_begin(unsigned int *seq)
+{
+ *seq = raw_read_seqcount(&cd.seq);
+ return cd.read_data + (*seq & 1);
+}
+
+int sched_clock_read_retry(unsigned int seq)
+{
+ return read_seqcount_retry(&cd.seq, seq);
+}
+
unsigned long long notrace sched_clock(void)
{
u64 cyc, res;
@@ -100,13 +86,12 @@ unsigned long long notrace sched_clock(v
struct clock_read_data *rd;
do {
- seq = raw_read_seqcount(&cd.seq);
- rd = cd.read_data + (seq & 1);
+ rd = sched_clock_read_begin(&seq);
cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
rd->sched_clock_mask;
res = rd->epoch_ns + cyc_to_ns(cyc, rd->mult, rd->shift);
- } while (read_seqcount_retry(&cd.seq, seq));
+ } while (sched_clock_read_retry(seq));
return res;
}
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 2/5] arm64: perf: Implement correct cap_user_time
2020-05-12 12:40 [PATCH 0/5] arm64: perf: Proper cap_user_time* support Peter Zijlstra
2020-05-12 12:40 ` [PATCH 1/5] sched_clock: Expose struct clock_read_data Peter Zijlstra
@ 2020-05-12 12:41 ` Peter Zijlstra
2020-05-12 14:03 ` Leo Yan
2020-05-12 16:05 ` kbuild test robot
2020-05-12 12:41 ` [PATCH 3/5] arm64: perf: Only advertise cap_user_time for arch_timer Peter Zijlstra
` (3 subsequent siblings)
5 siblings, 2 replies; 14+ messages in thread
From: Peter Zijlstra @ 2020-05-12 12:41 UTC (permalink / raw)
To: Will Deacon, Marc Zyngier, Mark Rutland, Catalin Marinas, Leo Yan
Cc: linux-kernel, linux-arm-kernel, mingo, acme, alexander.shishkin,
jolsa, daniel.lezcano, tglx, sboyd, john.stultz,
Peter Zijlstra (Intel)
As reported by Leo, the existing implementation is broken when the
clock and counter don't intersect at 0.
Use the sched_clock's struct clock_read_data information to correctly
implement cap_user_time and cap_user_time_zero.
Note that the ARM64 counter is architecturally only guaranteed to be
56bit wide (implementations are allowed to be wider) and the existing
perf ABI cannot deal with wrap-around.
This implementation should also be faster than the old one, since we
no longer need to recompute mult and shift all the time.
Reported-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/arm64/kernel/perf_event.c | 36 +++++++++++++++++++++++++++---------
1 file changed, 27 insertions(+), 9 deletions(-)
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -19,6 +19,7 @@
#include <linux/of.h>
#include <linux/perf/arm_pmu.h>
#include <linux/platform_device.h>
+#include <linux/sched_clock.h>
#include <linux/smp.h>
/* ARMv8 Cortex-A53 specific event types. */
@@ -1165,28 +1166,45 @@ device_initcall(armv8_pmu_driver_init)
void arch_perf_update_userpage(struct perf_event *event,
struct perf_event_mmap_page *userpg, u64 now)
{
- u32 freq;
- u32 shift;
+ struct clock_read_data *rd;
+ unsigned int seq;
/*
* Internal timekeeping for enabled/running/stopped times
* is always computed with the sched_clock.
*/
- freq = arch_timer_get_rate();
userpg->cap_user_time = 1;
+ userpg->cap_user_time_zero = 1;
+
+ do {
+ rd = sched_clock_read_begin(&seq);
+
+ userpg->time_mult = rd->mult;
+ userpg->time_shift = rd->shift;
+ userpg->time_zero = rd->epoch_ns;
+
+ /*
+ * This isn't strictly correct, the ARM64 counter can be
+ * 'short' and then we get funnies when it wraps. The correct
+ * thing would be to extend the perf ABI with a cycle and mask
+ * value, but because wrapping on ARM64 is very rare in
+ * practise this 'works'.
+ */
+ userpg->time_zero -= (rd->epoch_cyc * rd->mult) >> rd->shift;
+
+ } while (sched_clock_read_retry(seq));
+
+ userpg->time_offset = userpg->time_zero - now;
- clocks_calc_mult_shift(&userpg->time_mult, &shift, freq,
- NSEC_PER_SEC, 0);
/*
* time_shift is not expected to be greater than 31 due to
* the original published conversion algorithm shifting a
* 32-bit value (now specifies a 64-bit value) - refer
* perf_event_mmap_page documentation in perf_event.h.
*/
- if (shift == 32) {
- shift = 31;
+ if (userpg->shift == 32) {
+ userpg->shift = 31;
userpg->time_mult >>= 1;
}
- userpg->time_shift = (u16)shift;
- userpg->time_offset = -now;
+
}
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/5] arm64: perf: Implement correct cap_user_time
2020-05-12 12:41 ` [PATCH 2/5] arm64: perf: Implement correct cap_user_time Peter Zijlstra
@ 2020-05-12 14:03 ` Leo Yan
2020-05-12 14:08 ` Peter Zijlstra
2020-05-12 16:05 ` kbuild test robot
1 sibling, 1 reply; 14+ messages in thread
From: Leo Yan @ 2020-05-12 14:03 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Will Deacon, Marc Zyngier, Mark Rutland, Catalin Marinas,
linux-kernel, linux-arm-kernel, mingo, acme, alexander.shishkin,
jolsa, daniel.lezcano, tglx, sboyd, john.stultz
Hi Peter,
On Tue, May 12, 2020 at 02:41:00PM +0200, Peter Zijlstra wrote:
> As reported by Leo, the existing implementation is broken when the
> clock and counter don't intersect at 0.
>
> Use the sched_clock's struct clock_read_data information to correctly
> implement cap_user_time and cap_user_time_zero.
>
> Note that the ARM64 counter is architecturally only guaranteed to be
> 56bit wide (implementations are allowed to be wider) and the existing
> perf ABI cannot deal with wrap-around.
>
> This implementation should also be faster than the old one, since we
> no longer need to recompute mult and shift all the time.
>
> Reported-by: Leo Yan <leo.yan@linaro.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> arch/arm64/kernel/perf_event.c | 36 +++++++++++++++++++++++++++---------
> 1 file changed, 27 insertions(+), 9 deletions(-)
>
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -19,6 +19,7 @@
> #include <linux/of.h>
> #include <linux/perf/arm_pmu.h>
> #include <linux/platform_device.h>
> +#include <linux/sched_clock.h>
> #include <linux/smp.h>
>
> /* ARMv8 Cortex-A53 specific event types. */
> @@ -1165,28 +1166,45 @@ device_initcall(armv8_pmu_driver_init)
> void arch_perf_update_userpage(struct perf_event *event,
> struct perf_event_mmap_page *userpg, u64 now)
> {
> - u32 freq;
> - u32 shift;
> + struct clock_read_data *rd;
> + unsigned int seq;
>
> /*
> * Internal timekeeping for enabled/running/stopped times
> * is always computed with the sched_clock.
> */
> - freq = arch_timer_get_rate();
> userpg->cap_user_time = 1;
> + userpg->cap_user_time_zero = 1;
> +
> + do {
> + rd = sched_clock_read_begin(&seq);
> +
> + userpg->time_mult = rd->mult;
> + userpg->time_shift = rd->shift;
> + userpg->time_zero = rd->epoch_ns;
> +
> + /*
> + * This isn't strictly correct, the ARM64 counter can be
> + * 'short' and then we get funnies when it wraps. The correct
> + * thing would be to extend the perf ABI with a cycle and mask
> + * value, but because wrapping on ARM64 is very rare in
> + * practise this 'works'.
> + */
> + userpg->time_zero -= (rd->epoch_cyc * rd->mult) >> rd->shift;
> +
> + } while (sched_clock_read_retry(seq));
> +
> + userpg->time_offset = userpg->time_zero - now;
>
> - clocks_calc_mult_shift(&userpg->time_mult, &shift, freq,
> - NSEC_PER_SEC, 0);
> /*
> * time_shift is not expected to be greater than 31 due to
> * the original published conversion algorithm shifting a
> * 32-bit value (now specifies a 64-bit value) - refer
> * perf_event_mmap_page documentation in perf_event.h.
> */
> - if (shift == 32) {
> - shift = 31;
> + if (userpg->shift == 32) {
Thanks a lot for the patch set, some typos:
s/shift/time_shift
> + userpg->shift = 31;
s/shift/time_shift
Thanks,
Leo
> userpg->time_mult >>= 1;
> }
> - userpg->time_shift = (u16)shift;
> - userpg->time_offset = -now;
> +
> }
>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/5] arm64: perf: Implement correct cap_user_time
2020-05-12 14:03 ` Leo Yan
@ 2020-05-12 14:08 ` Peter Zijlstra
0 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2020-05-12 14:08 UTC (permalink / raw)
To: Leo Yan
Cc: Will Deacon, Marc Zyngier, Mark Rutland, Catalin Marinas,
linux-kernel, linux-arm-kernel, mingo, acme, alexander.shishkin,
jolsa, daniel.lezcano, tglx, sboyd, john.stultz
On Tue, May 12, 2020 at 10:03:01PM +0800, Leo Yan wrote:
> > + if (userpg->shift == 32) {
>
> Thanks a lot for the patch set, some typos:
>
> s/shift/time_shift
>
> > + userpg->shift = 31;
>
> s/shift/time_shift
Blergh.. so much for me not taking the time to dig out the arm64 cross
compiler :/ Sorry about that.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/5] arm64: perf: Implement correct cap_user_time
2020-05-12 12:41 ` [PATCH 2/5] arm64: perf: Implement correct cap_user_time Peter Zijlstra
2020-05-12 14:03 ` Leo Yan
@ 2020-05-12 16:05 ` kbuild test robot
1 sibling, 0 replies; 14+ messages in thread
From: kbuild test robot @ 2020-05-12 16:05 UTC (permalink / raw)
To: Peter Zijlstra, Will Deacon, Marc Zyngier, Mark Rutland,
Catalin Marinas, Leo Yan
Cc: kbuild-all, linux-kernel, linux-arm-kernel, mingo, acme,
alexander.shishkin, jolsa, daniel.lezcano, tglx, sboyd
[-- Attachment #1: Type: text/plain, Size: 3298 bytes --]
Hi Peter,
I love your patch! Yet something to improve:
[auto build test ERROR on tip/perf/core]
[also build test ERROR on arm64/for-next/core arm-perf/for-next/perf linus/master v5.7-rc5 next-20200512]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
url: https://github.com/0day-ci/linux/commits/Peter-Zijlstra/arm64-perf-Proper-cap_user_time-support/20200512-205141
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 059c6d68cfc5f85ba3ab71d71a6de380016f7936
config: arm64-allyesconfig (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day GCC_VERSION=9.3.0 make.cross ARCH=arm64
If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
arch/arm64/kernel/perf_event.c: In function 'arch_perf_update_userpage':
>> arch/arm64/kernel/perf_event.c:1205:12: error: 'struct perf_event_mmap_page' has no member named 'shift'
1205 | if (userpg->shift == 32) {
| ^~
arch/arm64/kernel/perf_event.c:1206:9: error: 'struct perf_event_mmap_page' has no member named 'shift'
1206 | userpg->shift = 31;
| ^~
vim +1205 arch/arm64/kernel/perf_event.c
1165
1166 void arch_perf_update_userpage(struct perf_event *event,
1167 struct perf_event_mmap_page *userpg, u64 now)
1168 {
1169 struct clock_read_data *rd;
1170 unsigned int seq;
1171
1172 /*
1173 * Internal timekeeping for enabled/running/stopped times
1174 * is always computed with the sched_clock.
1175 */
1176 userpg->cap_user_time = 1;
1177 userpg->cap_user_time_zero = 1;
1178
1179 do {
1180 rd = sched_clock_read_begin(&seq);
1181
1182 userpg->time_mult = rd->mult;
1183 userpg->time_shift = rd->shift;
1184 userpg->time_zero = rd->epoch_ns;
1185
1186 /*
1187 * This isn't strictly correct, the ARM64 counter can be
1188 * 'short' and then we get funnies when it wraps. The correct
1189 * thing would be to extend the perf ABI with a cycle and mask
1190 * value, but because wrapping on ARM64 is very rare in
1191 * practise this 'works'.
1192 */
1193 userpg->time_zero -= (rd->epoch_cyc * rd->mult) >> rd->shift;
1194
1195 } while (sched_clock_read_retry(seq));
1196
1197 userpg->time_offset = userpg->time_zero - now;
1198
1199 /*
1200 * time_shift is not expected to be greater than 31 due to
1201 * the original published conversion algorithm shifting a
1202 * 32-bit value (now specifies a 64-bit value) - refer
1203 * perf_event_mmap_page documentation in perf_event.h.
1204 */
> 1205 if (userpg->shift == 32) {
1206 userpg->shift = 31;
1207 userpg->time_mult >>= 1;
1208 }
1209
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 71774 bytes --]
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 3/5] arm64: perf: Only advertise cap_user_time for arch_timer
2020-05-12 12:40 [PATCH 0/5] arm64: perf: Proper cap_user_time* support Peter Zijlstra
2020-05-12 12:40 ` [PATCH 1/5] sched_clock: Expose struct clock_read_data Peter Zijlstra
2020-05-12 12:41 ` [PATCH 2/5] arm64: perf: Implement correct cap_user_time Peter Zijlstra
@ 2020-05-12 12:41 ` Peter Zijlstra
2020-05-12 12:41 ` [PATCH 4/5] perf: Add perf_event_mmap_page::cap_user_time_short ABI Peter Zijlstra
` (2 subsequent siblings)
5 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2020-05-12 12:41 UTC (permalink / raw)
To: Will Deacon, Marc Zyngier, Mark Rutland, Catalin Marinas, Leo Yan
Cc: linux-kernel, linux-arm-kernel, mingo, acme, alexander.shishkin,
jolsa, daniel.lezcano, tglx, sboyd, john.stultz,
Peter Zijlstra (Intel)
When sched_clock is running on anything other than arch_timer, don't
advertise cap_user_time*.
Requested-by: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/arm64/kernel/perf_event.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -13,6 +13,8 @@
#include <asm/sysreg.h>
#include <asm/virt.h>
+#include <clocksource/arm_arch_timer.h>
+
#include <linux/acpi.h>
#include <linux/clocksource.h>
#include <linux/kvm_host.h>
@@ -1169,16 +1171,15 @@ void arch_perf_update_userpage(struct pe
struct clock_read_data *rd;
unsigned int seq;
- /*
- * Internal timekeeping for enabled/running/stopped times
- * is always computed with the sched_clock.
- */
- userpg->cap_user_time = 1;
- userpg->cap_user_time_zero = 1;
+ userpg->cap_user_time = 0;
+ userpg->cap_user_time_zero = 0;
do {
rd = sched_clock_read_begin(&seq);
+ if (rd->read_sched_clock != arch_timer_read_counter)
+ return;
+
userpg->time_mult = rd->mult;
userpg->time_shift = rd->shift;
userpg->time_zero = rd->epoch_ns;
@@ -1207,4 +1208,10 @@ void arch_perf_update_userpage(struct pe
userpg->time_mult >>= 1;
}
+ /*
+ * Internal timekeeping for enabled/running/stopped times
+ * is always computed with the sched_clock.
+ */
+ userpg->cap_user_time = 1;
+ userpg->cap_user_time_zero = 1;
}
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 4/5] perf: Add perf_event_mmap_page::cap_user_time_short ABI
2020-05-12 12:40 [PATCH 0/5] arm64: perf: Proper cap_user_time* support Peter Zijlstra
` (2 preceding siblings ...)
2020-05-12 12:41 ` [PATCH 3/5] arm64: perf: Only advertise cap_user_time for arch_timer Peter Zijlstra
@ 2020-05-12 12:41 ` Peter Zijlstra
2020-05-12 12:41 ` [PATCH 5/5] arm64: perf: Add cap_user_time_short Peter Zijlstra
2020-07-13 6:08 ` [PATCH 0/5] arm64: perf: Proper cap_user_time* support Leo Yan
5 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2020-05-12 12:41 UTC (permalink / raw)
To: Will Deacon, Marc Zyngier, Mark Rutland, Catalin Marinas, Leo Yan
Cc: linux-kernel, linux-arm-kernel, mingo, acme, alexander.shishkin,
jolsa, daniel.lezcano, tglx, sboyd, john.stultz,
Peter Zijlstra (Intel)
In order to support short clock counters, provide an ABI extension.
As a whole:
u64 time, delta, cyc = read_cycle_counter();
+ if (cap_user_time_short)
+ cyc = time_cycle + ((cyc - time_cycle) & time_mask);
delta = mul_u64_u32_shr(cyc, time_mult, time_shift);
if (cap_user_time_zero)
time = time_zero + delta;
delta += time_offset;
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
include/uapi/linux/perf_event.h | 23 ++++++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -532,9 +532,10 @@ struct perf_event_mmap_page {
cap_bit0_is_deprecated : 1, /* Always 1, signals that bit 0 is zero */
cap_user_rdpmc : 1, /* The RDPMC instruction can be used to read counts */
- cap_user_time : 1, /* The time_* fields are used */
+ cap_user_time : 1, /* The time_{shift,mult,offset} fields are used */
cap_user_time_zero : 1, /* The time_zero field is used */
- cap_____res : 59;
+ cap_user_time_short : 1, /* the time_{cycle,mask} fields are used */
+ cap_____res : 58;
};
};
@@ -593,13 +594,29 @@ struct perf_event_mmap_page {
* ((rem * time_mult) >> time_shift);
*/
__u64 time_zero;
+
__u32 size; /* Header size up to __reserved[] fields. */
+ __u32 __reserved_1;
+
+ /*
+ * If cap_usr_time_short, the hardware clock is less than 64bit wide
+ * and we must compute the 'cyc' value, as used by cap_usr_time, as:
+ *
+ * cyc = time_cycles + ((cyc - time_cycles) & time_mask)
+ *
+ * NOTE: this form is explicitly chosen such that cap_usr_time_short
+ * is a correction on top of cap_usr_time, and code that doesn't
+ * know about cap_usr_time_short still works under the assumption
+ * the counter doesn't wrap.
+ */
+ __u64 time_cycles;
+ __u64 time_mask;
/*
* Hole for extension of the self monitor capabilities
*/
- __u8 __reserved[118*8+4]; /* align to 1k. */
+ __u8 __reserved[116*8]; /* align to 1k. */
/*
* Control data for the mmap() data buffer.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 5/5] arm64: perf: Add cap_user_time_short
2020-05-12 12:40 [PATCH 0/5] arm64: perf: Proper cap_user_time* support Peter Zijlstra
` (3 preceding siblings ...)
2020-05-12 12:41 ` [PATCH 4/5] perf: Add perf_event_mmap_page::cap_user_time_short ABI Peter Zijlstra
@ 2020-05-12 12:41 ` Peter Zijlstra
2020-05-12 14:11 ` Leo Yan
2020-05-12 16:59 ` kbuild test robot
2020-07-13 6:08 ` [PATCH 0/5] arm64: perf: Proper cap_user_time* support Leo Yan
5 siblings, 2 replies; 14+ messages in thread
From: Peter Zijlstra @ 2020-05-12 12:41 UTC (permalink / raw)
To: Will Deacon, Marc Zyngier, Mark Rutland, Catalin Marinas, Leo Yan
Cc: linux-kernel, linux-arm-kernel, mingo, acme, alexander.shishkin,
jolsa, daniel.lezcano, tglx, sboyd, john.stultz,
Peter Zijlstra (Intel)
This completes the ARM64 cap_user_time support.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/arm64/kernel/perf_event.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -1173,6 +1173,7 @@ void arch_perf_update_userpage(struct pe
userpg->cap_user_time = 0;
userpg->cap_user_time_zero = 0;
+ userpg->cap_user_time_short = 0;
do {
rd = sched_clock_read_begin(&seq);
@@ -1183,13 +1184,13 @@ void arch_perf_update_userpage(struct pe
userpg->time_mult = rd->mult;
userpg->time_shift = rd->shift;
userpg->time_zero = rd->epoch_ns;
+ userpg->time_cycle = rd->epoch_cyc;
+ userpg->time_mask = rd->sched_clock_mask;
/*
- * This isn't strictly correct, the ARM64 counter can be
- * 'short' and then we get funnies when it wraps. The correct
- * thing would be to extend the perf ABI with a cycle and mask
- * value, but because wrapping on ARM64 is very rare in
- * practise this 'works'.
+ * Subtract the cycle base, such that software that
+ * doesn't know about cap_user_time_short still 'works'
+ * assuming no wraps.
*/
userpg->time_zero -= (rd->epoch_cyc * rd->mult) >> rd->shift;
@@ -1214,4 +1215,5 @@ void arch_perf_update_userpage(struct pe
*/
userpg->cap_user_time = 1;
userpg->cap_user_time_zero = 1;
+ userpg->cap_user_time_short = 1;
}
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 5/5] arm64: perf: Add cap_user_time_short
2020-05-12 12:41 ` [PATCH 5/5] arm64: perf: Add cap_user_time_short Peter Zijlstra
@ 2020-05-12 14:11 ` Leo Yan
2020-05-12 16:59 ` kbuild test robot
1 sibling, 0 replies; 14+ messages in thread
From: Leo Yan @ 2020-05-12 14:11 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Will Deacon, Marc Zyngier, Mark Rutland, Catalin Marinas,
linux-kernel, linux-arm-kernel, mingo, acme, alexander.shishkin,
jolsa, daniel.lezcano, tglx, sboyd, john.stultz
On Tue, May 12, 2020 at 02:41:03PM +0200, Peter Zijlstra wrote:
> This completes the ARM64 cap_user_time support.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> arch/arm64/kernel/perf_event.c | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -1173,6 +1173,7 @@ void arch_perf_update_userpage(struct pe
>
> userpg->cap_user_time = 0;
> userpg->cap_user_time_zero = 0;
> + userpg->cap_user_time_short = 0;
>
> do {
> rd = sched_clock_read_begin(&seq);
> @@ -1183,13 +1184,13 @@ void arch_perf_update_userpage(struct pe
> userpg->time_mult = rd->mult;
> userpg->time_shift = rd->shift;
> userpg->time_zero = rd->epoch_ns;
> + userpg->time_cycle = rd->epoch_cyc;
s/time_cycle/time_cycles; alternatively, consider changing the field
name to 'time_cycle'.
This patch set looks good to me after I tested it on Arm64 board.
Thanks,
Leo
> + userpg->time_mask = rd->sched_clock_mask;
>
> /*
> - * This isn't strictly correct, the ARM64 counter can be
> - * 'short' and then we get funnies when it wraps. The correct
> - * thing would be to extend the perf ABI with a cycle and mask
> - * value, but because wrapping on ARM64 is very rare in
> - * practise this 'works'.
> + * Subtract the cycle base, such that software that
> + * doesn't know about cap_user_time_short still 'works'
> + * assuming no wraps.
> */
> userpg->time_zero -= (rd->epoch_cyc * rd->mult) >> rd->shift;
>
> @@ -1214,4 +1215,5 @@ void arch_perf_update_userpage(struct pe
> */
> userpg->cap_user_time = 1;
> userpg->cap_user_time_zero = 1;
> + userpg->cap_user_time_short = 1;
> }
>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 5/5] arm64: perf: Add cap_user_time_short
2020-05-12 12:41 ` [PATCH 5/5] arm64: perf: Add cap_user_time_short Peter Zijlstra
2020-05-12 14:11 ` Leo Yan
@ 2020-05-12 16:59 ` kbuild test robot
1 sibling, 0 replies; 14+ messages in thread
From: kbuild test robot @ 2020-05-12 16:59 UTC (permalink / raw)
To: Peter Zijlstra, Will Deacon, Marc Zyngier, Mark Rutland,
Catalin Marinas, Leo Yan
Cc: kbuild-all, linux-kernel, linux-arm-kernel, mingo, acme,
alexander.shishkin, jolsa, daniel.lezcano, tglx, sboyd
[-- Attachment #1: Type: text/plain, Size: 2625 bytes --]
Hi Peter,
I love your patch! Yet something to improve:
[auto build test ERROR on tip/perf/core]
[also build test ERROR on arm64/for-next/core arm-perf/for-next/perf linus/master v5.7-rc5 next-20200512]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
url: https://github.com/0day-ci/linux/commits/Peter-Zijlstra/arm64-perf-Proper-cap_user_time-support/20200512-205141
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 059c6d68cfc5f85ba3ab71d71a6de380016f7936
config: arm64-allyesconfig (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day GCC_VERSION=9.3.0 make.cross ARCH=arm64
If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
arch/arm64/kernel/perf_event.c: In function 'arch_perf_update_userpage':
>> arch/arm64/kernel/perf_event.c:1187:11: error: 'struct perf_event_mmap_page' has no member named 'time_cycle'; did you mean 'time_cycles'?
1187 | userpg->time_cycle = rd->epoch_cyc;
| ^~~~~~~~~~
| time_cycles
arch/arm64/kernel/perf_event.c:1207:12: error: 'struct perf_event_mmap_page' has no member named 'shift'
1207 | if (userpg->shift == 32) {
| ^~
arch/arm64/kernel/perf_event.c:1208:9: error: 'struct perf_event_mmap_page' has no member named 'shift'
1208 | userpg->shift = 31;
| ^~
vim +1187 arch/arm64/kernel/perf_event.c
1167
1168 void arch_perf_update_userpage(struct perf_event *event,
1169 struct perf_event_mmap_page *userpg, u64 now)
1170 {
1171 struct clock_read_data *rd;
1172 unsigned int seq;
1173
1174 userpg->cap_user_time = 0;
1175 userpg->cap_user_time_zero = 0;
1176 userpg->cap_user_time_short = 0;
1177
1178 do {
1179 rd = sched_clock_read_begin(&seq);
1180
1181 if (rd->read_sched_clock != arch_timer_read_counter)
1182 return;
1183
1184 userpg->time_mult = rd->mult;
1185 userpg->time_shift = rd->shift;
1186 userpg->time_zero = rd->epoch_ns;
> 1187 userpg->time_cycle = rd->epoch_cyc;
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 71774 bytes --]
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/5] arm64: perf: Proper cap_user_time* support
2020-05-12 12:40 [PATCH 0/5] arm64: perf: Proper cap_user_time* support Peter Zijlstra
` (4 preceding siblings ...)
2020-05-12 12:41 ` [PATCH 5/5] arm64: perf: Add cap_user_time_short Peter Zijlstra
@ 2020-07-13 6:08 ` Leo Yan
2020-07-13 10:11 ` Will Deacon
5 siblings, 1 reply; 14+ messages in thread
From: Leo Yan @ 2020-07-13 6:08 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Will Deacon, Marc Zyngier, Mark Rutland, Catalin Marinas,
linux-kernel, linux-arm-kernel, mingo, acme, alexander.shishkin,
jolsa, daniel.lezcano, tglx, sboyd, john.stultz
Hi Peter,
On Tue, May 12, 2020 at 02:40:58PM +0200, Peter Zijlstra wrote:
> Prompted by Leo's patches, here a series that corrects the arm64 perf cap_user_time situation.
I checked the latest mainline kernel code base and found this patch
set has not been merged.
Could you confirm whether it was missed, or whether there is some other
reason it is being held back?
Thanks,
Leo
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/5] arm64: perf: Proper cap_user_time* support
2020-07-13 6:08 ` [PATCH 0/5] arm64: perf: Proper cap_user_time* support Leo Yan
@ 2020-07-13 10:11 ` Will Deacon
2020-07-13 12:58 ` Leo Yan
0 siblings, 1 reply; 14+ messages in thread
From: Will Deacon @ 2020-07-13 10:11 UTC (permalink / raw)
To: Leo Yan
Cc: Peter Zijlstra, Marc Zyngier, Mark Rutland, Catalin Marinas,
linux-kernel, linux-arm-kernel, mingo, acme, alexander.shishkin,
jolsa, daniel.lezcano, tglx, sboyd, john.stultz
Hi Leo,
On Mon, Jul 13, 2020 at 02:08:00PM +0800, Leo Yan wrote:
> On Tue, May 12, 2020 at 02:40:58PM +0200, Peter Zijlstra wrote:
> > Prompted by Leo's patches, here a series that corrects the arm64 perf cap_user_time situation.
>
> I checked the latest mainline kernel code base and found this patch
> set has not been merged.
>
> Could you confirm whether it was missed, or whether there is some other
> reason it is being held back?
I was assuming you were going to pick them up, fix up the issues found
by you and kbuild robot and then post a full series after testing.
Will
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/5] arm64: perf: Proper cap_user_time* support
2020-07-13 10:11 ` Will Deacon
@ 2020-07-13 12:58 ` Leo Yan
0 siblings, 0 replies; 14+ messages in thread
From: Leo Yan @ 2020-07-13 12:58 UTC (permalink / raw)
To: Will Deacon
Cc: Peter Zijlstra, Marc Zyngier, Mark Rutland, Catalin Marinas,
linux-kernel, linux-arm-kernel, mingo, acme, alexander.shishkin,
jolsa, daniel.lezcano, tglx, sboyd, john.stultz
Hi Will,
On Mon, Jul 13, 2020 at 11:11:56AM +0100, Will Deacon wrote:
> Hi Leo,
>
> On Mon, Jul 13, 2020 at 02:08:00PM +0800, Leo Yan wrote:
> > On Tue, May 12, 2020 at 02:40:58PM +0200, Peter Zijlstra wrote:
> > > Prompted by Leo's patches, here a series that corrects the arm64 perf cap_user_time situation.
> >
> > I checked the latest mainline kernel code base and found this patch
> > set has not been merged.
> >
> > Could you confirm whether it was missed, or whether there is some other
> > reason it is being held back?
>
> I was assuming you were going to pick them up, fix up the issues found
> by you and kbuild robot and then post a full series after testing.
Understood. I will rebase the patch set and verify it together with the
SPE timestamp related changes.
Thanks,
Leo
^ permalink raw reply [flat|nested] 14+ messages in thread