* [PATCH 0/5] arm64: perf: Proper cap_user_time* support
@ 2020-05-12 12:40 Peter Zijlstra
  2020-05-12 12:40 ` [PATCH 1/5] sched_clock: Expose struct clock_read_data Peter Zijlstra
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Peter Zijlstra @ 2020-05-12 12:40 UTC (permalink / raw)
  To: Will Deacon, Marc Zyngier, Mark Rutland, Catalin Marinas, Leo Yan
  Cc: linux-kernel, linux-arm-kernel, mingo, acme, alexander.shishkin,
	jolsa, daniel.lezcano, tglx, sboyd, john.stultz,
	Peter Zijlstra (Intel)

Prompted by Leo's patches, here is a series that corrects the arm64 perf cap_user_time situation.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/5] sched_clock: Expose struct clock_read_data
  2020-05-12 12:40 [PATCH 0/5] arm64: perf: Proper cap_user_time* support Peter Zijlstra
@ 2020-05-12 12:40 ` Peter Zijlstra
  2020-05-12 12:41 ` [PATCH 2/5] arm64: perf: Implement correct cap_user_time Peter Zijlstra
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2020-05-12 12:40 UTC (permalink / raw)
  To: Will Deacon, Marc Zyngier, Mark Rutland, Catalin Marinas, Leo Yan
  Cc: linux-kernel, linux-arm-kernel, mingo, acme, alexander.shishkin,
	jolsa, daniel.lezcano, tglx, sboyd, john.stultz,
	Peter Zijlstra (Intel)

In order to support perf_event_mmap_page::cap_time features, an
architecture needs, aside from a userspace readable counter register,
to expose the exact clock data so that userspace can convert the
counter register into a correct timestamp.

Provide struct clock_read_data and two (seqcount) helpers so that
architectures (arm64 in particular) can expose these numbers to userspace.
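
For illustration, a consumer of the two new helpers would take a consistent
snapshot of the clock parameters roughly like the sketch below; the helpers
and struct come from this patch, while the wrapping function and its name are
purely hypothetical:

#include <linux/sched_clock.h>
#include <linux/types.h>

/* Hypothetical consumer: snapshot the sched_clock parameters consistently. */
static void snapshot_clock_read_data(u64 *epoch_ns, u64 *epoch_cyc,
				     u32 *mult, u32 *shift)
{
	struct clock_read_data *rd;
	unsigned int seq;

	do {
		rd = sched_clock_read_begin(&seq);

		*epoch_ns  = rd->epoch_ns;
		*epoch_cyc = rd->epoch_cyc;
		*mult      = rd->mult;
		*shift     = rd->shift;
	} while (sched_clock_read_retry(seq));
}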

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/sched_clock.h |   28 ++++++++++++++++++++++++++++
 kernel/time/sched_clock.c   |   41 +++++++++++++----------------------------
 2 files changed, 41 insertions(+), 28 deletions(-)

--- a/include/linux/sched_clock.h
+++ b/include/linux/sched_clock.h
@@ -6,6 +6,34 @@
 #define LINUX_SCHED_CLOCK
 
 #ifdef CONFIG_GENERIC_SCHED_CLOCK
+/**
+ * struct clock_read_data - data required to read from sched_clock()
+ *
+ * @epoch_ns:		sched_clock() value at last update
+ * @epoch_cyc:		Clock cycle value at last update.
+ * @sched_clock_mask:   Bitmask for two's complement subtraction of non 64bit
+ *			clocks.
+ * @read_sched_clock:	Current clock source (or dummy source when suspended).
+ * @mult:		Multipler for scaled math conversion.
+ * @shift:		Shift value for scaled math conversion.
+ *
+ * Care must be taken when updating this structure; it is read by
+ * some very hot code paths. It occupies <=40 bytes and, when combined
+ * with the seqcount used to synchronize access, comfortably fits into
+ * a 64 byte cache line.
+ */
+struct clock_read_data {
+	u64 epoch_ns;
+	u64 epoch_cyc;
+	u64 sched_clock_mask;
+	u64 (*read_sched_clock)(void);
+	u32 mult;
+	u32 shift;
+};
+
+extern struct clock_read_data *sched_clock_read_begin(unsigned int *seq);
+extern int sched_clock_read_retry(unsigned int seq);
+
 extern void generic_sched_clock_init(void);
 
 extern void sched_clock_register(u64 (*read)(void), int bits,
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -20,31 +20,6 @@
 #include "timekeeping.h"
 
 /**
- * struct clock_read_data - data required to read from sched_clock()
- *
- * @epoch_ns:		sched_clock() value at last update
- * @epoch_cyc:		Clock cycle value at last update.
- * @sched_clock_mask:   Bitmask for two's complement subtraction of non 64bit
- *			clocks.
- * @read_sched_clock:	Current clock source (or dummy source when suspended).
- * @mult:		Multipler for scaled math conversion.
- * @shift:		Shift value for scaled math conversion.
- *
- * Care must be taken when updating this structure; it is read by
- * some very hot code paths. It occupies <=40 bytes and, when combined
- * with the seqcount used to synchronize access, comfortably fits into
- * a 64 byte cache line.
- */
-struct clock_read_data {
-	u64 epoch_ns;
-	u64 epoch_cyc;
-	u64 sched_clock_mask;
-	u64 (*read_sched_clock)(void);
-	u32 mult;
-	u32 shift;
-};
-
-/**
  * struct clock_data - all data needed for sched_clock() (including
  *                     registration of a new clock source)
  *
@@ -93,6 +68,17 @@ static inline u64 notrace cyc_to_ns(u64
 	return (cyc * mult) >> shift;
 }
 
+struct clock_read_data *sched_clock_read_begin(unsigned int *seq)
+{
+	*seq = raw_read_seqcount(&cd.seq);
+	return cd.read_data + (*seq & 1);
+}
+
+int sched_clock_read_retry(unsigned int seq)
+{
+	return read_seqcount_retry(&cd.seq, seq);
+}
+
 unsigned long long notrace sched_clock(void)
 {
 	u64 cyc, res;
@@ -100,13 +86,12 @@ unsigned long long notrace sched_clock(v
 	struct clock_read_data *rd;
 
 	do {
-		seq = raw_read_seqcount(&cd.seq);
-		rd = cd.read_data + (seq & 1);
+		rd = sched_clock_read_begin(&seq);
 
 		cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
 		      rd->sched_clock_mask;
 		res = rd->epoch_ns + cyc_to_ns(cyc, rd->mult, rd->shift);
-	} while (read_seqcount_retry(&cd.seq, seq));
+	} while (sched_clock_read_retry(seq));
 
 	return res;
 }



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 2/5] arm64: perf: Implement correct cap_user_time
  2020-05-12 12:40 [PATCH 0/5] arm64: perf: Proper cap_user_time* support Peter Zijlstra
  2020-05-12 12:40 ` [PATCH 1/5] sched_clock: Expose struct clock_read_data Peter Zijlstra
@ 2020-05-12 12:41 ` Peter Zijlstra
  2020-05-12 14:03   ` Leo Yan
  2020-05-12 16:05   ` kbuild test robot
  2020-05-12 12:41 ` [PATCH 3/5] arm64: perf: Only advertise cap_user_time for arch_timer Peter Zijlstra
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 14+ messages in thread
From: Peter Zijlstra @ 2020-05-12 12:41 UTC (permalink / raw)
  To: Will Deacon, Marc Zyngier, Mark Rutland, Catalin Marinas, Leo Yan
  Cc: linux-kernel, linux-arm-kernel, mingo, acme, alexander.shishkin,
	jolsa, daniel.lezcano, tglx, sboyd, john.stultz,
	Peter Zijlstra (Intel)

As reported by Leo, the existing implementation is broken when the
clock and counter don't intersect at 0.

Use the sched_clock's struct clock_read_data information to correctly
implement cap_user_time and cap_user_time_zero.

Note that the ARM64 counter is architecturally only guaranteed to be
56bit wide (implementations are allowed to be wider) and the existing
perf ABI cannot deal with wrap-around.

This implementation should also be faster than the old one, since we
no longer need to recompute mult and shift every time.
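
For context, the consumer side of these fields follows the algorithm
documented in include/uapi/linux/perf_event.h; a minimal userspace sketch is
given below. read_cycle_counter() stands in for a CNTVCT_EL0 read on arm64,
and all function names here are illustrative, not an existing API:

#include <stdint.h>
#include <linux/perf_event.h>

extern uint64_t read_cycle_counter(void);	/* assumed: userspace counter read */

static uint64_t cyc_to_time_ns(uint64_t cyc, uint32_t mult, uint16_t shift)
{
	/* quotient/remainder split avoids overflowing the 64x32 multiply */
	uint64_t quot = cyc >> shift;
	uint64_t rem  = cyc & (((uint64_t)1 << shift) - 1);

	return quot * mult + ((rem * mult) >> shift);
}

uint64_t perf_mmap_read_time(volatile struct perf_event_mmap_page *pc)
{
	uint32_t seq;
	uint64_t time = 0;

	do {
		seq = pc->lock;
		__sync_synchronize();

		if (pc->cap_user_time && pc->cap_user_time_zero)
			time = pc->time_zero +
			       cyc_to_time_ns(read_cycle_counter(),
					      pc->time_mult, pc->time_shift);

		__sync_synchronize();
	} while (pc->lock != seq);

	return time;
}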

Reported-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/arm64/kernel/perf_event.c |   36 +++++++++++++++++++++++++++---------
 1 file changed, 27 insertions(+), 9 deletions(-)

--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -19,6 +19,7 @@
 #include <linux/of.h>
 #include <linux/perf/arm_pmu.h>
 #include <linux/platform_device.h>
+#include <linux/sched_clock.h>
 #include <linux/smp.h>
 
 /* ARMv8 Cortex-A53 specific event types. */
@@ -1165,28 +1166,45 @@ device_initcall(armv8_pmu_driver_init)
 void arch_perf_update_userpage(struct perf_event *event,
 			       struct perf_event_mmap_page *userpg, u64 now)
 {
-	u32 freq;
-	u32 shift;
+	struct clock_read_data *rd;
+	unsigned int seq;
 
 	/*
 	 * Internal timekeeping for enabled/running/stopped times
 	 * is always computed with the sched_clock.
 	 */
-	freq = arch_timer_get_rate();
 	userpg->cap_user_time = 1;
+	userpg->cap_user_time_zero = 1;
+
+	do {
+		rd = sched_clock_read_begin(&seq);
+
+		userpg->time_mult = rd->mult;
+		userpg->time_shift = rd->shift;
+		userpg->time_zero = rd->epoch_ns;
+
+		/*
+		 * This isn't strictly correct, the ARM64 counter can be
+		 * 'short' and then we get funnies when it wraps. The correct
+		 * thing would be to extend the perf ABI with a cycle and mask
+		 * value, but because wrapping on ARM64 is very rare in
+		 * practise this 'works'.
+		 */
+		userpg->time_zero -= (rd->epoch_cyc * rd->mult) >> rd->shift;
+
+	} while (sched_clock_read_retry(seq));
+
+	userpg->time_offset = userpg->time_zero - now;
 
-	clocks_calc_mult_shift(&userpg->time_mult, &shift, freq,
-			NSEC_PER_SEC, 0);
 	/*
 	 * time_shift is not expected to be greater than 31 due to
 	 * the original published conversion algorithm shifting a
 	 * 32-bit value (now specifies a 64-bit value) - refer
 	 * perf_event_mmap_page documentation in perf_event.h.
 	 */
-	if (shift == 32) {
-		shift = 31;
+	if (userpg->shift == 32) {
+		userpg->shift = 31;
 		userpg->time_mult >>= 1;
 	}
-	userpg->time_shift = (u16)shift;
-	userpg->time_offset = -now;
+
 }



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 3/5] arm64: perf: Only advertise cap_user_time for arch_timer
  2020-05-12 12:40 [PATCH 0/5] arm64: perf: Proper cap_user_time* support Peter Zijlstra
  2020-05-12 12:40 ` [PATCH 1/5] sched_clock: Expose struct clock_read_data Peter Zijlstra
  2020-05-12 12:41 ` [PATCH 2/5] arm64: perf: Implement correct cap_user_time Peter Zijlstra
@ 2020-05-12 12:41 ` Peter Zijlstra
  2020-05-12 12:41 ` [PATCH 4/5] perf: Add perf_event_mmap_page::cap_user_time_short ABI Peter Zijlstra
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2020-05-12 12:41 UTC (permalink / raw)
  To: Will Deacon, Marc Zyngier, Mark Rutland, Catalin Marinas, Leo Yan
  Cc: linux-kernel, linux-arm-kernel, mingo, acme, alexander.shishkin,
	jolsa, daniel.lezcano, tglx, sboyd, john.stultz,
	Peter Zijlstra (Intel)

When sched_clock is running on anything other than arch_timer, don't
advertise cap_user_time*.
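
The practical consequence is that userspace must gate any use of the time_*
fields on the capability bits, which this patch now clears for non-arch_timer
clocks. A minimal, hypothetical consumer-side check:

#include <stdbool.h>
#include <linux/perf_event.h>

/* Fall back to clock_gettime() or similar when this returns false. */
static bool userpage_time_valid(const volatile struct perf_event_mmap_page *pc)
{
	return pc->cap_user_time && pc->cap_user_time_zero;
}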

Requested-by: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/arm64/kernel/perf_event.c |   19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -13,6 +13,8 @@
 #include <asm/sysreg.h>
 #include <asm/virt.h>
 
+#include <clocksource/arm_arch_timer.h>
+
 #include <linux/acpi.h>
 #include <linux/clocksource.h>
 #include <linux/kvm_host.h>
@@ -1169,16 +1171,15 @@ void arch_perf_update_userpage(struct pe
 	struct clock_read_data *rd;
 	unsigned int seq;
 
-	/*
-	 * Internal timekeeping for enabled/running/stopped times
-	 * is always computed with the sched_clock.
-	 */
-	userpg->cap_user_time = 1;
-	userpg->cap_user_time_zero = 1;
+	userpg->cap_user_time = 0;
+	userpg->cap_user_time_zero = 0;
 
 	do {
 		rd = sched_clock_read_begin(&seq);
 
+		if (rd->read_sched_clock != arch_timer_read_counter)
+			return;
+
 		userpg->time_mult = rd->mult;
 		userpg->time_shift = rd->shift;
 		userpg->time_zero = rd->epoch_ns;
@@ -1207,4 +1208,10 @@ void arch_perf_update_userpage(struct pe
 		userpg->time_mult >>= 1;
 	}
 
+	/*
+	 * Internal timekeeping for enabled/running/stopped times
+	 * is always computed with the sched_clock.
+	 */
+	userpg->cap_user_time = 1;
+	userpg->cap_user_time_zero = 1;
 }



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 4/5] perf: Add perf_event_mmap_page::cap_user_time_short ABI
  2020-05-12 12:40 [PATCH 0/5] arm64: perf: Proper cap_user_time* support Peter Zijlstra
                   ` (2 preceding siblings ...)
  2020-05-12 12:41 ` [PATCH 3/5] arm64: perf: Only advertise cap_user_time for arch_timer Peter Zijlstra
@ 2020-05-12 12:41 ` Peter Zijlstra
  2020-05-12 12:41 ` [PATCH 5/5] arm64: perf: Add cap_user_time_short Peter Zijlstra
  2020-07-13  6:08 ` [PATCH 0/5] arm64: perf: Proper cap_user_time* support Leo Yan
  5 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2020-05-12 12:41 UTC (permalink / raw)
  To: Will Deacon, Marc Zyngier, Mark Rutland, Catalin Marinas, Leo Yan
  Cc: linux-kernel, linux-arm-kernel, mingo, acme, alexander.shishkin,
	jolsa, daniel.lezcano, tglx, sboyd, john.stultz,
	Peter Zijlstra (Intel)

In order to support short clock counters, provide an ABI extension.

As a whole:

    u64 time, delta, cyc = read_cycle_counter();

+   if (cap_user_time_short)
+	cyc = time_cycle + ((cyc - time_cycle) & time_mask);

    delta = mul_u64_u32_shr(cyc, time_mult, time_shift);

    if (cap_user_time_zero)
	time = time_zero + delta;

    delta += time_offset;
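
Put together, a self-contained sketch of the consumer-side conversion could
look as follows; the function and helper names are illustrative, and the
time_cycles/time_mask fields plus the cap_user_time_short bit only exist once
this patch is applied:

#include <stdint.h>
#include <linux/perf_event.h>

static uint64_t scaled_ns(uint64_t cyc, uint32_t mult, uint16_t shift)
{
	/* quotient/remainder split avoids overflowing the 64x32 multiply */
	uint64_t quot = cyc >> shift;
	uint64_t rem  = cyc & (((uint64_t)1 << shift) - 1);

	return quot * mult + ((rem * mult) >> shift);
}

uint64_t counter_to_time(const volatile struct perf_event_mmap_page *pc,
			 uint64_t cyc)
{
	uint64_t delta, time = 0;

	/* Fold a short counter back into the full-width domain first. */
	if (pc->cap_user_time_short)
		cyc = pc->time_cycles + ((cyc - pc->time_cycles) & pc->time_mask);

	delta = scaled_ns(cyc, pc->time_mult, pc->time_shift);

	if (pc->cap_user_time_zero)
		time = pc->time_zero + delta;

	return time;
}

As with the existing fields, such a conversion would normally run inside the
pc->lock sequence loop so the snapshot stays consistent.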

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/uapi/linux/perf_event.h |   23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -532,9 +532,10 @@ struct perf_event_mmap_page {
 				cap_bit0_is_deprecated	: 1, /* Always 1, signals that bit 0 is zero */
 
 				cap_user_rdpmc		: 1, /* The RDPMC instruction can be used to read counts */
-				cap_user_time		: 1, /* The time_* fields are used */
+				cap_user_time		: 1, /* The time_{shift,mult,offset} fields are used */
 				cap_user_time_zero	: 1, /* The time_zero field is used */
-				cap_____res		: 59;
+				cap_user_time_short	: 1, /* the time_{cycle,mask} fields are used */
+				cap_____res		: 58;
 		};
 	};
 
@@ -593,13 +594,29 @@ struct perf_event_mmap_page {
 	 *               ((rem * time_mult) >> time_shift);
 	 */
 	__u64	time_zero;
+
 	__u32	size;			/* Header size up to __reserved[] fields. */
+	__u32	__reserved_1;
+
+	/*
+	 * If cap_usr_time_short, the hardware clock is less than 64bit wide
+	 * and we must compute the 'cyc' value, as used by cap_usr_time, as:
+	 *
+	 *   cyc = time_cycles + ((cyc - time_cycles) & time_mask)
+	 *
+	 * NOTE: this form is explicitly chosen such that cap_usr_time_short
+	 *       is a correction on top of cap_usr_time, and code that doesn't
+	 *       know about cap_usr_time_short still works under the assumption
+	 *       the counter doesn't wrap.
+	 */
+	__u64	time_cycles;
+	__u64	time_mask;
 
 		/*
 		 * Hole for extension of the self monitor capabilities
 		 */
 
-	__u8	__reserved[118*8+4];	/* align to 1k. */
+	__u8	__reserved[116*8];	/* align to 1k. */
 
 	/*
 	 * Control data for the mmap() data buffer.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 5/5] arm64: perf: Add cap_user_time_short
  2020-05-12 12:40 [PATCH 0/5] arm64: perf: Proper cap_user_time* support Peter Zijlstra
                   ` (3 preceding siblings ...)
  2020-05-12 12:41 ` [PATCH 4/5] perf: Add perf_event_mmap_page::cap_user_time_short ABI Peter Zijlstra
@ 2020-05-12 12:41 ` Peter Zijlstra
  2020-05-12 14:11   ` Leo Yan
  2020-05-12 16:59   ` kbuild test robot
  2020-07-13  6:08 ` [PATCH 0/5] arm64: perf: Proper cap_user_time* support Leo Yan
  5 siblings, 2 replies; 14+ messages in thread
From: Peter Zijlstra @ 2020-05-12 12:41 UTC (permalink / raw)
  To: Will Deacon, Marc Zyngier, Mark Rutland, Catalin Marinas, Leo Yan
  Cc: linux-kernel, linux-arm-kernel, mingo, acme, alexander.shishkin,
	jolsa, daniel.lezcano, tglx, sboyd, john.stultz,
	Peter Zijlstra (Intel)

This completes the ARM64 cap_user_time support.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/arm64/kernel/perf_event.c |   12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -1173,6 +1173,7 @@ void arch_perf_update_userpage(struct pe
 
 	userpg->cap_user_time = 0;
 	userpg->cap_user_time_zero = 0;
+	userpg->cap_user_time_short = 0;
 
 	do {
 		rd = sched_clock_read_begin(&seq);
@@ -1183,13 +1184,13 @@ void arch_perf_update_userpage(struct pe
 		userpg->time_mult = rd->mult;
 		userpg->time_shift = rd->shift;
 		userpg->time_zero = rd->epoch_ns;
+		userpg->time_cycle = rd->epoch_cyc;
+		userpg->time_mask = rd->sched_clock_mask;
 
 		/*
-		 * This isn't strictly correct, the ARM64 counter can be
-		 * 'short' and then we get funnies when it wraps. The correct
-		 * thing would be to extend the perf ABI with a cycle and mask
-		 * value, but because wrapping on ARM64 is very rare in
-		 * practise this 'works'.
+		 * Subtract the cycle base, such that software that
+		 * doesn't know about cap_user_time_short still 'works'
+		 * assuming no wraps.
 		 */
 		userpg->time_zero -= (rd->epoch_cyc * rd->mult) >> rd->shift;
 
@@ -1214,4 +1215,5 @@ void arch_perf_update_userpage(struct pe
 	 */
 	userpg->cap_user_time = 1;
 	userpg->cap_user_time_zero = 1;
+	userpg->cap_user_time_short = 1;
 }



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/5] arm64: perf: Implement correct cap_user_time
  2020-05-12 12:41 ` [PATCH 2/5] arm64: perf: Implement correct cap_user_time Peter Zijlstra
@ 2020-05-12 14:03   ` Leo Yan
  2020-05-12 14:08     ` Peter Zijlstra
  2020-05-12 16:05   ` kbuild test robot
  1 sibling, 1 reply; 14+ messages in thread
From: Leo Yan @ 2020-05-12 14:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Marc Zyngier, Mark Rutland, Catalin Marinas,
	linux-kernel, linux-arm-kernel, mingo, acme, alexander.shishkin,
	jolsa, daniel.lezcano, tglx, sboyd, john.stultz

Hi Peter,

On Tue, May 12, 2020 at 02:41:00PM +0200, Peter Zijlstra wrote:
> As reported by Leo, the existing implementation is broken when the
> clock and counter don't intersect at 0.
> 
> Use the sched_clock's struct clock_read_data information to correctly
> implement cap_user_time and cap_user_time_zero.
> 
> Note that the ARM64 counter is architecturally only guaranteed to be
> 56bit wide (implementations are allowed to be wider) and the existing
> perf ABI cannot deal with wrap-around.
> 
> This implementation should also be faster than the old one, since we
> no longer need to recompute mult and shift every time.
> 
> Reported-by: Leo Yan <leo.yan@linaro.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/arm64/kernel/perf_event.c |   36 +++++++++++++++++++++++++++---------
>  1 file changed, 27 insertions(+), 9 deletions(-)
> 
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -19,6 +19,7 @@
>  #include <linux/of.h>
>  #include <linux/perf/arm_pmu.h>
>  #include <linux/platform_device.h>
> +#include <linux/sched_clock.h>
>  #include <linux/smp.h>
>  
>  /* ARMv8 Cortex-A53 specific event types. */
> @@ -1165,28 +1166,45 @@ device_initcall(armv8_pmu_driver_init)
>  void arch_perf_update_userpage(struct perf_event *event,
>  			       struct perf_event_mmap_page *userpg, u64 now)
>  {
> -	u32 freq;
> -	u32 shift;
> +	struct clock_read_data *rd;
> +	unsigned int seq;
>  
>  	/*
>  	 * Internal timekeeping for enabled/running/stopped times
>  	 * is always computed with the sched_clock.
>  	 */
> -	freq = arch_timer_get_rate();
>  	userpg->cap_user_time = 1;
> +	userpg->cap_user_time_zero = 1;
> +
> +	do {
> +		rd = sched_clock_read_begin(&seq);
> +
> +		userpg->time_mult = rd->mult;
> +		userpg->time_shift = rd->shift;
> +		userpg->time_zero = rd->epoch_ns;
> +
> +		/*
> +		 * This isn't strictly correct, the ARM64 counter can be
> +		 * 'short' and then we get funnies when it wraps. The correct
> +		 * thing would be to extend the perf ABI with a cycle and mask
> +		 * value, but because wrapping on ARM64 is very rare in
> +		 * practise this 'works'.
> +		 */
> +		userpg->time_zero -= (rd->epoch_cyc * rd->mult) >> rd->shift;
> +
> +	} while (sched_clock_read_retry(seq));
> +
> +	userpg->time_offset = userpg->time_zero - now;
>  
> -	clocks_calc_mult_shift(&userpg->time_mult, &shift, freq,
> -			NSEC_PER_SEC, 0);
>  	/*
>  	 * time_shift is not expected to be greater than 31 due to
>  	 * the original published conversion algorithm shifting a
>  	 * 32-bit value (now specifies a 64-bit value) - refer
>  	 * perf_event_mmap_page documentation in perf_event.h.
>  	 */
> -	if (shift == 32) {
> -		shift = 31;
> +	if (userpg->shift == 32) {

Thanks a lot for the patch set, some typos:

s/shift/time_shift

> +		userpg->shift = 31;

s/shift/time_shift

Thanks,
Leo

>  		userpg->time_mult >>= 1;
>  	}
> -	userpg->time_shift = (u16)shift;
> -	userpg->time_offset = -now;
> +
>  }
> 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/5] arm64: perf: Implement correct cap_user_time
  2020-05-12 14:03   ` Leo Yan
@ 2020-05-12 14:08     ` Peter Zijlstra
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2020-05-12 14:08 UTC (permalink / raw)
  To: Leo Yan
  Cc: Will Deacon, Marc Zyngier, Mark Rutland, Catalin Marinas,
	linux-kernel, linux-arm-kernel, mingo, acme, alexander.shishkin,
	jolsa, daniel.lezcano, tglx, sboyd, john.stultz

On Tue, May 12, 2020 at 10:03:01PM +0800, Leo Yan wrote:
> > +	if (userpg->shift == 32) {
> 
> Thanks a lot for the patch set, some typos:
> 
> s/shift/time_shift
> 
> > +		userpg->shift = 31;
> 
> s/shift/time_shift

Blergh.. so much for me not taking the time to dig out the arm64 cross
compiler :/ Sorry about that.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 5/5] arm64: perf: Add cap_user_time_short
  2020-05-12 12:41 ` [PATCH 5/5] arm64: perf: Add cap_user_time_short Peter Zijlstra
@ 2020-05-12 14:11   ` Leo Yan
  2020-05-12 16:59   ` kbuild test robot
  1 sibling, 0 replies; 14+ messages in thread
From: Leo Yan @ 2020-05-12 14:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Marc Zyngier, Mark Rutland, Catalin Marinas,
	linux-kernel, linux-arm-kernel, mingo, acme, alexander.shishkin,
	jolsa, daniel.lezcano, tglx, sboyd, john.stultz

On Tue, May 12, 2020 at 02:41:03PM +0200, Peter Zijlstra wrote:
> This completes the ARM64 cap_user_time support.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/arm64/kernel/perf_event.c |   12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -1173,6 +1173,7 @@ void arch_perf_update_userpage(struct pe
>  
>  	userpg->cap_user_time = 0;
>  	userpg->cap_user_time_zero = 0;
> +	userpg->cap_user_time_short = 0;
>  
>  	do {
>  		rd = sched_clock_read_begin(&seq);
> @@ -1183,13 +1184,13 @@ void arch_perf_update_userpage(struct pe
>  		userpg->time_mult = rd->mult;
>  		userpg->time_shift = rd->shift;
>  		userpg->time_zero = rd->epoch_ns;
> +		userpg->time_cycle = rd->epoch_cyc;

s/time_cycle/time_cycles, or alternatively consider renaming the ABI field
to 'time_cycle'.

This patch set looks good to me after I tested it on an Arm64 board.

Thanks,
Leo

> +		userpg->time_mask = rd->sched_clock_mask;
>  
>  		/*
> -		 * This isn't strictly correct, the ARM64 counter can be
> -		 * 'short' and then we get funnies when it wraps. The correct
> -		 * thing would be to extend the perf ABI with a cycle and mask
> -		 * value, but because wrapping on ARM64 is very rare in
> -		 * practise this 'works'.
> +		 * Subtract the cycle base, such that software that
> +		 * doesn't know about cap_user_time_short still 'works'
> +		 * assuming no wraps.
>  		 */
>  		userpg->time_zero -= (rd->epoch_cyc * rd->mult) >> rd->shift;
>  
> @@ -1214,4 +1215,5 @@ void arch_perf_update_userpage(struct pe
>  	 */
>  	userpg->cap_user_time = 1;
>  	userpg->cap_user_time_zero = 1;
> +	userpg->cap_user_time_short = 1;
>  }
> 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/5] arm64: perf: Implement correct cap_user_time
  2020-05-12 12:41 ` [PATCH 2/5] arm64: perf: Implement correct cap_user_time Peter Zijlstra
  2020-05-12 14:03   ` Leo Yan
@ 2020-05-12 16:05   ` kbuild test robot
  1 sibling, 0 replies; 14+ messages in thread
From: kbuild test robot @ 2020-05-12 16:05 UTC (permalink / raw)
  To: Peter Zijlstra, Will Deacon, Marc Zyngier, Mark Rutland,
	Catalin Marinas, Leo Yan
  Cc: kbuild-all, linux-kernel, linux-arm-kernel, mingo, acme,
	alexander.shishkin, jolsa, daniel.lezcano, tglx, sboyd

[-- Attachment #1: Type: text/plain, Size: 3298 bytes --]

Hi Peter,

I love your patch! Yet something to improve:

[auto build test ERROR on tip/perf/core]
[also build test ERROR on arm64/for-next/core arm-perf/for-next/perf linus/master v5.7-rc5 next-20200512]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Peter-Zijlstra/arm64-perf-Proper-cap_user_time-support/20200512-205141
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 059c6d68cfc5f85ba3ab71d71a6de380016f7936
config: arm64-allyesconfig (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day GCC_VERSION=9.3.0 make.cross ARCH=arm64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

arch/arm64/kernel/perf_event.c: In function 'arch_perf_update_userpage':
>> arch/arm64/kernel/perf_event.c:1205:12: error: 'struct perf_event_mmap_page' has no member named 'shift'
1205 |  if (userpg->shift == 32) {
|            ^~
arch/arm64/kernel/perf_event.c:1206:9: error: 'struct perf_event_mmap_page' has no member named 'shift'
1206 |   userpg->shift = 31;
|         ^~

vim +1205 arch/arm64/kernel/perf_event.c

  1165	
  1166	void arch_perf_update_userpage(struct perf_event *event,
  1167				       struct perf_event_mmap_page *userpg, u64 now)
  1168	{
  1169		struct clock_read_data *rd;
  1170		unsigned int seq;
  1171	
  1172		/*
  1173		 * Internal timekeeping for enabled/running/stopped times
  1174		 * is always computed with the sched_clock.
  1175		 */
  1176		userpg->cap_user_time = 1;
  1177		userpg->cap_user_time_zero = 1;
  1178	
  1179		do {
  1180			rd = sched_clock_read_begin(&seq);
  1181	
  1182			userpg->time_mult = rd->mult;
  1183			userpg->time_shift = rd->shift;
  1184			userpg->time_zero = rd->epoch_ns;
  1185	
  1186			/*
  1187			 * This isn't strictly correct, the ARM64 counter can be
  1188			 * 'short' and then we get funnies when it wraps. The correct
  1189			 * thing would be to extend the perf ABI with a cycle and mask
  1190			 * value, but because wrapping on ARM64 is very rare in
  1191			 * practise this 'works'.
  1192			 */
  1193			userpg->time_zero -= (rd->epoch_cyc * rd->mult) >> rd->shift;
  1194	
  1195		} while (sched_clock_read_retry(seq));
  1196	
  1197		userpg->time_offset = userpg->time_zero - now;
  1198	
  1199		/*
  1200		 * time_shift is not expected to be greater than 31 due to
  1201		 * the original published conversion algorithm shifting a
  1202		 * 32-bit value (now specifies a 64-bit value) - refer
  1203		 * perf_event_mmap_page documentation in perf_event.h.
  1204		 */
> 1205		if (userpg->shift == 32) {
  1206			userpg->shift = 31;
  1207			userpg->time_mult >>= 1;
  1208		}
  1209	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 71774 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 5/5] arm64: perf: Add cap_user_time_short
  2020-05-12 12:41 ` [PATCH 5/5] arm64: perf: Add cap_user_time_short Peter Zijlstra
  2020-05-12 14:11   ` Leo Yan
@ 2020-05-12 16:59   ` kbuild test robot
  1 sibling, 0 replies; 14+ messages in thread
From: kbuild test robot @ 2020-05-12 16:59 UTC (permalink / raw)
  To: Peter Zijlstra, Will Deacon, Marc Zyngier, Mark Rutland,
	Catalin Marinas, Leo Yan
  Cc: kbuild-all, linux-kernel, linux-arm-kernel, mingo, acme,
	alexander.shishkin, jolsa, daniel.lezcano, tglx, sboyd

[-- Attachment #1: Type: text/plain, Size: 2625 bytes --]

Hi Peter,

I love your patch! Yet something to improve:

[auto build test ERROR on tip/perf/core]
[also build test ERROR on arm64/for-next/core arm-perf/for-next/perf linus/master v5.7-rc5 next-20200512]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Peter-Zijlstra/arm64-perf-Proper-cap_user_time-support/20200512-205141
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 059c6d68cfc5f85ba3ab71d71a6de380016f7936
config: arm64-allyesconfig (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day GCC_VERSION=9.3.0 make.cross ARCH=arm64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

arch/arm64/kernel/perf_event.c: In function 'arch_perf_update_userpage':
>> arch/arm64/kernel/perf_event.c:1187:11: error: 'struct perf_event_mmap_page' has no member named 'time_cycle'; did you mean 'time_cycles'?
1187 |   userpg->time_cycle = rd->epoch_cyc;
|           ^~~~~~~~~~
|           time_cycles
arch/arm64/kernel/perf_event.c:1207:12: error: 'struct perf_event_mmap_page' has no member named 'shift'
1207 |  if (userpg->shift == 32) {
|            ^~
arch/arm64/kernel/perf_event.c:1208:9: error: 'struct perf_event_mmap_page' has no member named 'shift'
1208 |   userpg->shift = 31;
|         ^~

vim +1187 arch/arm64/kernel/perf_event.c

  1167	
  1168	void arch_perf_update_userpage(struct perf_event *event,
  1169				       struct perf_event_mmap_page *userpg, u64 now)
  1170	{
  1171		struct clock_read_data *rd;
  1172		unsigned int seq;
  1173	
  1174		userpg->cap_user_time = 0;
  1175		userpg->cap_user_time_zero = 0;
  1176		userpg->cap_user_time_short = 0;
  1177	
  1178		do {
  1179			rd = sched_clock_read_begin(&seq);
  1180	
  1181			if (rd->read_sched_clock != arch_timer_read_counter)
  1182				return;
  1183	
  1184			userpg->time_mult = rd->mult;
  1185			userpg->time_shift = rd->shift;
  1186			userpg->time_zero = rd->epoch_ns;
> 1187			userpg->time_cycle = rd->epoch_cyc;

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 71774 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/5] arm64: perf: Proper cap_user_time* support
  2020-05-12 12:40 [PATCH 0/5] arm64: perf: Proper cap_user_time* support Peter Zijlstra
                   ` (4 preceding siblings ...)
  2020-05-12 12:41 ` [PATCH 5/5] arm64: perf: Add cap_user_time_short Peter Zijlstra
@ 2020-07-13  6:08 ` Leo Yan
  2020-07-13 10:11   ` Will Deacon
  5 siblings, 1 reply; 14+ messages in thread
From: Leo Yan @ 2020-07-13  6:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Marc Zyngier, Mark Rutland, Catalin Marinas,
	linux-kernel, linux-arm-kernel, mingo, acme, alexander.shishkin,
	jolsa, daniel.lezcano, tglx, sboyd, john.stultz

Hi Peter,

On Tue, May 12, 2020 at 02:40:58PM +0200, Peter Zijlstra wrote:
> Prompted by Leo's patches, here is a series that corrects the arm64 perf cap_user_time situation.

I checked the latest mainline kernel code base and found that this patch
set has not been merged into it.

Could you confirm whether it was missed, or whether there is some other
reason it is on hold?

Thanks,
Leo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/5] arm64: perf: Proper cap_user_time* support
  2020-07-13  6:08 ` [PATCH 0/5] arm64: perf: Proper cap_user_time* support Leo Yan
@ 2020-07-13 10:11   ` Will Deacon
  2020-07-13 12:58     ` Leo Yan
  0 siblings, 1 reply; 14+ messages in thread
From: Will Deacon @ 2020-07-13 10:11 UTC (permalink / raw)
  To: Leo Yan
  Cc: Peter Zijlstra, Marc Zyngier, Mark Rutland, Catalin Marinas,
	linux-kernel, linux-arm-kernel, mingo, acme, alexander.shishkin,
	jolsa, daniel.lezcano, tglx, sboyd, john.stultz

Hi Leo,

On Mon, Jul 13, 2020 at 02:08:00PM +0800, Leo Yan wrote:
> On Tue, May 12, 2020 at 02:40:58PM +0200, Peter Zijlstra wrote:
> > Prompted by Leo's patches, here is a series that corrects the arm64 perf cap_user_time situation.
> 
> I checked the latest mainline kernel code base and found that this patch
> set has not been merged into it.
> 
> Could you confirm whether it was missed, or whether there is some other
> reason it is on hold?

I was assuming you were going to pick them up, fix the issues found by
you and the kbuild robot, and then post a full series after testing.

Will

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/5] arm64: perf: Proper cap_user_time* support
  2020-07-13 10:11   ` Will Deacon
@ 2020-07-13 12:58     ` Leo Yan
  0 siblings, 0 replies; 14+ messages in thread
From: Leo Yan @ 2020-07-13 12:58 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Marc Zyngier, Mark Rutland, Catalin Marinas,
	linux-kernel, linux-arm-kernel, mingo, acme, alexander.shishkin,
	jolsa, daniel.lezcano, tglx, sboyd, john.stultz

Hi Will,

On Mon, Jul 13, 2020 at 11:11:56AM +0100, Will Deacon wrote:
> Hi Leo,
> 
> On Mon, Jul 13, 2020 at 02:08:00PM +0800, Leo Yan wrote:
> > On Tue, May 12, 2020 at 02:40:58PM +0200, Peter Zijlstra wrote:
> > > Prompted by Leo's patches, here is a series that corrects the arm64 perf cap_user_time situation.
> > 
> > I checked the latest mainline kernel code base and found that this patch
> > set has not been merged into it.
> > 
> > Could you confirm whether it was missed, or whether there is some other
> > reason it is on hold?
> 
> I was assuming you were going to pick them up, fix the issues found by
> you and the kbuild robot, and then post a full series after testing.

Understood. I will rebase the patch set and verify it together with the
SPE timestamp-related changes.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread

Thread overview: 14+ messages
2020-05-12 12:40 [PATCH 0/5] arm64: perf: Proper cap_user_time* support Peter Zijlstra
2020-05-12 12:40 ` [PATCH 1/5] sched_clock: Expose struct clock_read_data Peter Zijlstra
2020-05-12 12:41 ` [PATCH 2/5] arm64: perf: Implement correct cap_user_time Peter Zijlstra
2020-05-12 14:03   ` Leo Yan
2020-05-12 14:08     ` Peter Zijlstra
2020-05-12 16:05   ` kbuild test robot
2020-05-12 12:41 ` [PATCH 3/5] arm64: perf: Only advertise cap_user_time for arch_timer Peter Zijlstra
2020-05-12 12:41 ` [PATCH 4/5] perf: Add perf_event_mmap_page::cap_user_time_short ABI Peter Zijlstra
2020-05-12 12:41 ` [PATCH 5/5] arm64: perf: Add cap_user_time_short Peter Zijlstra
2020-05-12 14:11   ` Leo Yan
2020-05-12 16:59   ` kbuild test robot
2020-07-13  6:08 ` [PATCH 0/5] arm64: perf: Proper cap_user_time* support Leo Yan
2020-07-13 10:11   ` Will Deacon
2020-07-13 12:58     ` Leo Yan
