* [PATCH v2 00/13] local_clock() vs noinstr
@ 2023-05-19 10:20 Peter Zijlstra
  2023-05-19 10:20 ` [PATCH v2 01/13] seqlock/latch: Provide raw_read_seqcount_latch_retry() Peter Zijlstra
                   ` (13 more replies)
  0 siblings, 14 replies; 41+ messages in thread
From: Peter Zijlstra @ 2023-05-19 10:20 UTC (permalink / raw)
  To: bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel,
	hca, gor, agordeev, borntraeger, svens, pbonzini, wanpengli,
	vkuznets, tglx, mingo, bp, dave.hansen, x86, hpa, jgross,
	boris.ostrovsky, daniel.lezcano, kys, haiyangz, wei.liu, decui,
	rafael, peterz, longman, boqun.feng, pmladek, senozhatsky,
	rostedt, john.ogness, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, bristot, vschneid, jstultz,
	sboyd, linux-kernel, loongarch, linux-s390, kvm, linux-hyperv,
	linux-pm

Hi All,

This is the latest version of the local_clock_noinstr() patches.

Most of the changes are in Hyper-V and x86/vdso/gettimeofday; Michael has been
very helpful navigating the Hyper-V spec and fixing their sched_clock
implementation.

---
 arch/arm64/include/asm/arch_timer.h      |  8 +----
 arch/arm64/include/asm/io.h              | 12 +++----
 arch/loongarch/include/asm/loongarch.h   |  2 +-
 arch/loongarch/kernel/time.c             |  6 ++--
 arch/s390/include/asm/timex.h            | 13 ++++---
 arch/s390/kernel/time.c                  |  5 +++
 arch/x86/include/asm/mshyperv.h          |  5 +++
 arch/x86/include/asm/vdso/gettimeofday.h | 41 ++++++++++++++++------
 arch/x86/kernel/kvmclock.c               |  4 +--
 arch/x86/kernel/tsc.c                    | 38 +++++++++++++++-----
 arch/x86/kvm/x86.c                       |  7 ++--
 arch/x86/xen/time.c                      |  3 +-
 drivers/clocksource/arm_arch_timer.c     | 60 ++++++++++++++++++++++++--------
 drivers/clocksource/hyperv_timer.c       | 44 ++++++++++++++---------
 drivers/cpuidle/cpuidle.c                |  8 ++---
 drivers/cpuidle/poll_state.c             |  4 +--
 include/clocksource/hyperv_timer.h       | 24 +++++--------
 include/linux/math64.h                   |  2 +-
 include/linux/rbtree_latch.h             |  2 +-
 include/linux/sched/clock.h              | 17 ++++++++-
 include/linux/seqlock.h                  | 15 ++++----
 kernel/printk/printk.c                   |  2 +-
 kernel/sched/clock.c                     | 19 ++++++----
 kernel/time/sched_clock.c                | 24 +++++++++----
 kernel/time/timekeeping.c                |  4 +--
 25 files changed, 242 insertions(+), 127 deletions(-)


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 01/13] seqlock/latch: Provide raw_read_seqcount_latch_retry()
  2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
@ 2023-05-19 10:20 ` Peter Zijlstra
  2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  2023-05-19 10:21 ` [PATCH v2 02/13] time/sched_clock: Provide sched_clock_noinstr() Peter Zijlstra
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Peter Zijlstra @ 2023-05-19 10:20 UTC (permalink / raw)
  To: bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel,
	hca, gor, agordeev, borntraeger, svens, pbonzini, wanpengli,
	vkuznets, tglx, mingo, bp, dave.hansen, x86, hpa, jgross,
	boris.ostrovsky, daniel.lezcano, kys, haiyangz, wei.liu, decui,
	rafael, peterz, longman, boqun.feng, pmladek, senozhatsky,
	rostedt, john.ogness, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, bristot, vschneid, jstultz,
	sboyd, linux-kernel, loongarch, linux-s390, kvm, linux-hyperv,
	linux-pm

The read side of seqcount_latch consists of:

  do {
    seq = raw_read_seqcount_latch(&latch->seq);
    ...
  } while (read_seqcount_latch_retry(&latch->seq, seq));

which is asymmetric in the raw_ department, and sure enough,
read_seqcount_latch_retry() includes (explicit) instrumentation where
raw_read_seqcount_latch() does not.

This inconsistency becomes a problem when trying to use it from
noinstr code. As such, fix it by renaming read_seqcount_latch_retry() to
raw_read_seqcount_latch_retry() and re-implementing it without the
instrumentation.

Specifically, the instrumentation in question is kcsan_atomic_next(0)
in do___read_seqcount_retry(). Losing this annotation is not a
problem because raw_read_seqcount_latch() does not pass through
kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX).
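
With that, the read side becomes symmetric in the raw_ department:

  do {
    seq = raw_read_seqcount_latch(&latch->seq);
    ...
  } while (raw_read_seqcount_latch_retry(&latch->seq, seq));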

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/rbtree_latch.h |    2 +-
 include/linux/seqlock.h      |   15 ++++++++-------
 kernel/printk/printk.c       |    2 +-
 kernel/time/sched_clock.c    |    2 +-
 kernel/time/timekeeping.c    |    4 ++--
 5 files changed, 13 insertions(+), 12 deletions(-)

--- a/include/linux/rbtree_latch.h
+++ b/include/linux/rbtree_latch.h
@@ -206,7 +206,7 @@ latch_tree_find(void *key, struct latch_
 	do {
 		seq = raw_read_seqcount_latch(&root->seq);
 		node = __lt_find(key, root, seq & 1, ops->comp);
-	} while (read_seqcount_latch_retry(&root->seq, seq));
+	} while (raw_read_seqcount_latch_retry(&root->seq, seq));
 
 	return node;
 }
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -671,9 +671,9 @@ typedef struct {
  *
  * Return: sequence counter raw value. Use the lowest bit as an index for
  * picking which data copy to read. The full counter must then be checked
- * with read_seqcount_latch_retry().
+ * with raw_read_seqcount_latch_retry().
  */
-static inline unsigned raw_read_seqcount_latch(const seqcount_latch_t *s)
+static __always_inline unsigned raw_read_seqcount_latch(const seqcount_latch_t *s)
 {
 	/*
 	 * Pairs with the first smp_wmb() in raw_write_seqcount_latch().
@@ -683,16 +683,17 @@ static inline unsigned raw_read_seqcount
 }
 
 /**
- * read_seqcount_latch_retry() - end a seqcount_latch_t read section
+ * raw_read_seqcount_latch_retry() - end a seqcount_latch_t read section
  * @s:		Pointer to seqcount_latch_t
  * @start:	count, from raw_read_seqcount_latch()
  *
  * Return: true if a read section retry is required, else false
  */
-static inline int
-read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start)
+static __always_inline int
+raw_read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start)
 {
-	return read_seqcount_retry(&s->seqcount, start);
+	smp_rmb();
+	return unlikely(READ_ONCE(s->seqcount.sequence) != start);
 }
 
 /**
@@ -752,7 +753,7 @@ read_seqcount_latch_retry(const seqcount
  *			entry = data_query(latch->data[idx], ...);
  *
  *		// This includes needed smp_rmb()
- *		} while (read_seqcount_latch_retry(&latch->seq, seq));
+ *		} while (raw_read_seqcount_latch_retry(&latch->seq, seq));
  *
  *		return entry;
  *	}
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -597,7 +597,7 @@ static u64 latched_seq_read_nolock(struc
 		seq = raw_read_seqcount_latch(&ls->latch);
 		idx = seq & 0x1;
 		val = ls->val[idx];
-	} while (read_seqcount_latch_retry(&ls->latch, seq));
+	} while (raw_read_seqcount_latch_retry(&ls->latch, seq));
 
 	return val;
 }
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -77,7 +77,7 @@ notrace struct clock_read_data *sched_cl
 
 notrace int sched_clock_read_retry(unsigned int seq)
 {
-	return read_seqcount_latch_retry(&cd.seq, seq);
+	return raw_read_seqcount_latch_retry(&cd.seq, seq);
 }
 
 unsigned long long notrace sched_clock(void)
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -450,7 +450,7 @@ static __always_inline u64 __ktime_get_f
 		tkr = tkf->base + (seq & 0x01);
 		now = ktime_to_ns(tkr->base);
 		now += fast_tk_get_delta_ns(tkr);
-	} while (read_seqcount_latch_retry(&tkf->seq, seq));
+	} while (raw_read_seqcount_latch_retry(&tkf->seq, seq));
 
 	return now;
 }
@@ -566,7 +566,7 @@ static __always_inline u64 __ktime_get_r
 		basem = ktime_to_ns(tkr->base);
 		baser = ktime_to_ns(tkr->base_real);
 		delta = fast_tk_get_delta_ns(tkr);
-	} while (read_seqcount_latch_retry(&tkf->seq, seq));
+	} while (raw_read_seqcount_latch_retry(&tkf->seq, seq));
 
 	if (mono)
 		*mono = basem + delta;



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 02/13] time/sched_clock: Provide sched_clock_noinstr()
  2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
  2023-05-19 10:20 ` [PATCH v2 01/13] seqlock/latch: Provide raw_read_seqcount_latch_retry() Peter Zijlstra
@ 2023-05-19 10:21 ` Peter Zijlstra
  2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  2023-05-19 10:21 ` [PATCH v2 03/13] arm64/io: Always inline all of __raw_{read,write}[bwlq]() Peter Zijlstra
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Peter Zijlstra @ 2023-05-19 10:21 UTC (permalink / raw)
  To: bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel,
	hca, gor, agordeev, borntraeger, svens, pbonzini, wanpengli,
	vkuznets, tglx, mingo, bp, dave.hansen, x86, hpa, jgross,
	boris.ostrovsky, daniel.lezcano, kys, haiyangz, wei.liu, decui,
	rafael, peterz, longman, boqun.feng, pmladek, senozhatsky,
	rostedt, john.ogness, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, bristot, vschneid, jstultz,
	sboyd, linux-kernel, loongarch, linux-s390, kvm, linux-hyperv,
	linux-pm

With the intent to provide local_clock_noinstr(), a variant of
local_clock() that's safe to be called from noinstr code (with the
assumption that any such code will already be non-preemptible),
prepare for things by providing a noinstr sched_clock_noinstr() function.

Specifically, preempt_enable_*() calls out to schedule(), which upsets
noinstr validation efforts.

As such, pull out the preempt_{dis,en}able_notrace() requirements from
the sched_clock_read() implementations by explicitly providing it in
the sched_clock() function.

This further requires said sched_clock_read() functions to be noinstr
themselves, for ARCH_WANTS_NO_INSTR users. See the next few patches.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/time/sched_clock.c |   22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -64,7 +64,7 @@ static struct clock_data cd ____cachelin
 	.actual_read_sched_clock = jiffy_sched_clock_read,
 };
 
-static inline u64 notrace cyc_to_ns(u64 cyc, u32 mult, u32 shift)
+static __always_inline u64 cyc_to_ns(u64 cyc, u32 mult, u32 shift)
 {
 	return (cyc * mult) >> shift;
 }
@@ -80,23 +80,33 @@ notrace int sched_clock_read_retry(unsig
 	return raw_read_seqcount_latch_retry(&cd.seq, seq);
 }
 
-unsigned long long notrace sched_clock(void)
+unsigned long long noinstr sched_clock_noinstr(void)
 {
-	u64 cyc, res;
-	unsigned int seq;
 	struct clock_read_data *rd;
+	unsigned int seq;
+	u64 cyc, res;
 
 	do {
-		rd = sched_clock_read_begin(&seq);
+		seq = raw_read_seqcount_latch(&cd.seq);
+		rd = cd.read_data + (seq & 1);
 
 		cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
 		      rd->sched_clock_mask;
 		res = rd->epoch_ns + cyc_to_ns(cyc, rd->mult, rd->shift);
-	} while (sched_clock_read_retry(seq));
+	} while (raw_read_seqcount_latch_retry(&cd.seq, seq));
 
 	return res;
 }
 
+unsigned long long notrace sched_clock(void)
+{
+	unsigned long long ns;
+	preempt_disable_notrace();
+	ns = sched_clock_noinstr();
+	preempt_enable_notrace();
+	return ns;
+}
+
 /*
  * Updating the data required to read the clock.
  *



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 03/13] arm64/io: Always inline all of __raw_{read,write}[bwlq]()
  2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
  2023-05-19 10:20 ` [PATCH v2 01/13] seqlock/latch: Provide raw_read_seqcount_latch_retry() Peter Zijlstra
  2023-05-19 10:21 ` [PATCH v2 02/13] time/sched_clock: Provide sched_clock_noinstr() Peter Zijlstra
@ 2023-05-19 10:21 ` Peter Zijlstra
  2023-05-24 16:40   ` Valentin Schneider
  2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  2023-05-19 10:21 ` [PATCH v2 04/13] arm64/arch_timer: Provide noinstr sched_clock_read() functions Peter Zijlstra
                   ` (10 subsequent siblings)
  13 siblings, 2 replies; 41+ messages in thread
From: Peter Zijlstra @ 2023-05-19 10:21 UTC (permalink / raw)
  To: bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel,
	hca, gor, agordeev, borntraeger, svens, pbonzini, wanpengli,
	vkuznets, tglx, mingo, bp, dave.hansen, x86, hpa, jgross,
	boris.ostrovsky, daniel.lezcano, kys, haiyangz, wei.liu, decui,
	rafael, peterz, longman, boqun.feng, pmladek, senozhatsky,
	rostedt, john.ogness, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, bristot, vschneid, jstultz,
	sboyd, linux-kernel, loongarch, linux-s390, kvm, linux-hyperv,
	linux-pm

The next patch will want to use __raw_readl() from a noinstr section
and as such that needs to be marked __always_inline to avoid the
compiler being a silly bugger.

Turns out it already is, but its siblings are not. Finish the work
started in commit e43f1331e2ef913b ("arm64: Ask the compiler to
__always_inline functions used by KVM at HYP") for consistency's sake.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/arm64/include/asm/io.h |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -22,13 +22,13 @@
  * Generic IO read/write.  These perform native-endian accesses.
  */
 #define __raw_writeb __raw_writeb
-static inline void __raw_writeb(u8 val, volatile void __iomem *addr)
+static __always_inline void __raw_writeb(u8 val, volatile void __iomem *addr)
 {
 	asm volatile("strb %w0, [%1]" : : "rZ" (val), "r" (addr));
 }
 
 #define __raw_writew __raw_writew
-static inline void __raw_writew(u16 val, volatile void __iomem *addr)
+static __always_inline void __raw_writew(u16 val, volatile void __iomem *addr)
 {
 	asm volatile("strh %w0, [%1]" : : "rZ" (val), "r" (addr));
 }
@@ -40,13 +40,13 @@ static __always_inline void __raw_writel
 }
 
 #define __raw_writeq __raw_writeq
-static inline void __raw_writeq(u64 val, volatile void __iomem *addr)
+static __always_inline void __raw_writeq(u64 val, volatile void __iomem *addr)
 {
 	asm volatile("str %x0, [%1]" : : "rZ" (val), "r" (addr));
 }
 
 #define __raw_readb __raw_readb
-static inline u8 __raw_readb(const volatile void __iomem *addr)
+static __always_inline u8 __raw_readb(const volatile void __iomem *addr)
 {
 	u8 val;
 	asm volatile(ALTERNATIVE("ldrb %w0, [%1]",
@@ -57,7 +57,7 @@ static inline u8 __raw_readb(const volat
 }
 
 #define __raw_readw __raw_readw
-static inline u16 __raw_readw(const volatile void __iomem *addr)
+static __always_inline u16 __raw_readw(const volatile void __iomem *addr)
 {
 	u16 val;
 
@@ -80,7 +80,7 @@ static __always_inline u32 __raw_readl(c
 }
 
 #define __raw_readq __raw_readq
-static inline u64 __raw_readq(const volatile void __iomem *addr)
+static __always_inline u64 __raw_readq(const volatile void __iomem *addr)
 {
 	u64 val;
 	asm volatile(ALTERNATIVE("ldr %0, [%1]",



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 04/13] arm64/arch_timer: Provide noinstr sched_clock_read() functions
  2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
                   ` (2 preceding siblings ...)
  2023-05-19 10:21 ` [PATCH v2 03/13] arm64/io: Always inline all of __raw_{read,write}[bwlq]() Peter Zijlstra
@ 2023-05-19 10:21 ` Peter Zijlstra
  2023-05-24 16:40   ` Valentin Schneider
  2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  2023-05-19 10:21 ` [PATCH v2 05/13] loongarch: Provide noinstr sched_clock_read() Peter Zijlstra
                   ` (9 subsequent siblings)
  13 siblings, 2 replies; 41+ messages in thread
From: Peter Zijlstra @ 2023-05-19 10:21 UTC (permalink / raw)
  To: bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel,
	hca, gor, agordeev, borntraeger, svens, pbonzini, wanpengli,
	vkuznets, tglx, mingo, bp, dave.hansen, x86, hpa, jgross,
	boris.ostrovsky, daniel.lezcano, kys, haiyangz, wei.liu, decui,
	rafael, peterz, longman, boqun.feng, pmladek, senozhatsky,
	rostedt, john.ogness, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, bristot, vschneid, jstultz,
	sboyd, linux-kernel, loongarch, linux-s390, kvm, linux-hyperv,
	linux-pm

With the intent to provide local_clock_noinstr(), a variant of
local_clock() that's safe to be called from noinstr code (with the
assumption that any such code will already be non-preemptible),
prepare for things by providing a noinstr sched_clock_read() function.

Specifically, preempt_enable_*() calls out to schedule(), which upsets
noinstr validation efforts.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/arm64/include/asm/arch_timer.h  |    8 ----
 drivers/clocksource/arm_arch_timer.c |   60 ++++++++++++++++++++++++++---------
 2 files changed, 47 insertions(+), 21 deletions(-)

--- a/arch/arm64/include/asm/arch_timer.h
+++ b/arch/arm64/include/asm/arch_timer.h
@@ -88,13 +88,7 @@ static inline notrace u64 arch_timer_rea
 
 #define arch_timer_reg_read_stable(reg)					\
 	({								\
-		u64 _val;						\
-									\
-		preempt_disable_notrace();				\
-		_val = erratum_handler(read_ ## reg)();			\
-		preempt_enable_notrace();				\
-									\
-		_val;							\
+		erratum_handler(read_ ## reg)();			\
 	})
 
 /*
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -191,22 +191,40 @@ u32 arch_timer_reg_read(int access, enum
 	return val;
 }
 
-static notrace u64 arch_counter_get_cntpct_stable(void)
+static noinstr u64 _arch_counter_get_cntpct_stable(void)
 {
 	return __arch_counter_get_cntpct_stable();
 }
 
-static notrace u64 arch_counter_get_cntpct(void)
+static notrace u64 arch_counter_get_cntpct_stable(void)
+{
+	u64 val;
+	preempt_disable_notrace();
+	val = __arch_counter_get_cntpct_stable();
+	preempt_enable_notrace();
+	return val;
+}
+
+static noinstr u64 arch_counter_get_cntpct(void)
 {
 	return __arch_counter_get_cntpct();
 }
 
-static notrace u64 arch_counter_get_cntvct_stable(void)
+static noinstr u64 _arch_counter_get_cntvct_stable(void)
 {
 	return __arch_counter_get_cntvct_stable();
 }
 
-static notrace u64 arch_counter_get_cntvct(void)
+static notrace u64 arch_counter_get_cntvct_stable(void)
+{
+	u64 val;
+	preempt_disable_notrace();
+	val = __arch_counter_get_cntvct_stable();
+	preempt_enable_notrace();
+	return val;
+}
+
+static noinstr u64 arch_counter_get_cntvct(void)
 {
 	return __arch_counter_get_cntvct();
 }
@@ -753,14 +771,14 @@ static int arch_timer_set_next_event_phy
 	return 0;
 }
 
-static u64 arch_counter_get_cnt_mem(struct arch_timer *t, int offset_lo)
+static noinstr u64 arch_counter_get_cnt_mem(struct arch_timer *t, int offset_lo)
 {
 	u32 cnt_lo, cnt_hi, tmp_hi;
 
 	do {
-		cnt_hi = readl_relaxed(t->base + offset_lo + 4);
-		cnt_lo = readl_relaxed(t->base + offset_lo);
-		tmp_hi = readl_relaxed(t->base + offset_lo + 4);
+		cnt_hi = __raw_readl(t->base + offset_lo + 4);
+		cnt_lo = __raw_readl(t->base + offset_lo);
+		tmp_hi = __raw_readl(t->base + offset_lo + 4);
 	} while (cnt_hi != tmp_hi);
 
 	return ((u64) cnt_hi << 32) | cnt_lo;
@@ -1060,7 +1078,7 @@ bool arch_timer_evtstrm_available(void)
 	return cpumask_test_cpu(raw_smp_processor_id(), &evtstrm_available);
 }
 
-static u64 arch_counter_get_cntvct_mem(void)
+static noinstr u64 arch_counter_get_cntvct_mem(void)
 {
 	return arch_counter_get_cnt_mem(arch_timer_mem, CNTVCT_LO);
 }
@@ -1074,6 +1092,13 @@ struct arch_timer_kvm_info *arch_timer_g
 
 static void __init arch_counter_register(unsigned type)
 {
+	/*
+	 * Default to cp15 based access because arm64 uses this function for
+	 * sched_clock() before DT is probed and the cp15 method is guaranteed
+	 * to exist on arm64. arm doesn't use this before DT is probed so even
+	 * if we don't have the cp15 accessors we won't have a problem.
+	 */
+	u64 (*scr)(void) = arch_counter_get_cntvct;
 	u64 start_count;
 	int width;
 
@@ -1083,21 +1108,28 @@ static void __init arch_counter_register
 
 		if ((IS_ENABLED(CONFIG_ARM64) && !is_hyp_mode_available()) ||
 		    arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI) {
-			if (arch_timer_counter_has_wa())
+			if (arch_timer_counter_has_wa()) {
 				rd = arch_counter_get_cntvct_stable;
-			else
+				scr = _arch_counter_get_cntvct_stable;
+			} else {
 				rd = arch_counter_get_cntvct;
+				scr = arch_counter_get_cntvct;
+			}
 		} else {
-			if (arch_timer_counter_has_wa())
+			if (arch_timer_counter_has_wa()) {
 				rd = arch_counter_get_cntpct_stable;
-			else
+				scr = _arch_counter_get_cntpct_stable;
+			} else {
 				rd = arch_counter_get_cntpct;
+				scr = arch_counter_get_cntpct;
+			}
 		}
 
 		arch_timer_read_counter = rd;
 		clocksource_counter.vdso_clock_mode = vdso_default;
 	} else {
 		arch_timer_read_counter = arch_counter_get_cntvct_mem;
+		scr = arch_counter_get_cntvct_mem;
 	}
 
 	width = arch_counter_get_width();
@@ -1113,7 +1145,7 @@ static void __init arch_counter_register
 	timecounter_init(&arch_timer_kvm_info.timecounter,
 			 &cyclecounter, start_count);
 
-	sched_clock_register(arch_timer_read_counter, width, arch_timer_rate);
+	sched_clock_register(scr, width, arch_timer_rate);
 }
 
 static void arch_timer_stop(struct clock_event_device *clk)



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 05/13] loongarch: Provide noinstr sched_clock_read()
  2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
                   ` (3 preceding siblings ...)
  2023-05-19 10:21 ` [PATCH v2 04/13] arm64/arch_timer: Provide noinstr sched_clock_read() functions Peter Zijlstra
@ 2023-05-19 10:21 ` Peter Zijlstra
  2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  2023-05-19 10:21 ` [PATCH v2 06/13] s390/time: Provide sched_clock_noinstr() Peter Zijlstra
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Peter Zijlstra @ 2023-05-19 10:21 UTC (permalink / raw)
  To: bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel,
	hca, gor, agordeev, borntraeger, svens, pbonzini, wanpengli,
	vkuznets, tglx, mingo, bp, dave.hansen, x86, hpa, jgross,
	boris.ostrovsky, daniel.lezcano, kys, haiyangz, wei.liu, decui,
	rafael, peterz, longman, boqun.feng, pmladek, senozhatsky,
	rostedt, john.ogness, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, bristot, vschneid, jstultz,
	sboyd, linux-kernel, loongarch, linux-s390, kvm, linux-hyperv,
	linux-pm

With the intent to provide local_clock_noinstr(), a variant of
local_clock() that's safe to be called from noinstr code (with the
assumption that any such code will already be non-preemptible),
prepare for things by providing a noinstr sched_clock_read() function.

Specifically, preempt_enable_*() calls out to schedule(), which upsets
noinstr validation efforts.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/loongarch/include/asm/loongarch.h |    2 +-
 arch/loongarch/kernel/time.c           |    6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

--- a/arch/loongarch/include/asm/loongarch.h
+++ b/arch/loongarch/include/asm/loongarch.h
@@ -1167,7 +1167,7 @@ static __always_inline void iocsr_write6
 
 #ifndef __ASSEMBLY__
 
-static inline u64 drdtime(void)
+static __always_inline u64 drdtime(void)
 {
 	int rID = 0;
 	u64 val = 0;
--- a/arch/loongarch/kernel/time.c
+++ b/arch/loongarch/kernel/time.c
@@ -190,9 +190,9 @@ static u64 read_const_counter(struct clo
 	return drdtime();
 }
 
-static u64 native_sched_clock(void)
+static noinstr u64 sched_clock_read(void)
 {
-	return read_const_counter(NULL);
+	return drdtime();
 }
 
 static struct clocksource clocksource_const = {
@@ -211,7 +211,7 @@ int __init constant_clocksource_init(voi
 
 	res = clocksource_register_hz(&clocksource_const, freq);
 
-	sched_clock_register(native_sched_clock, 64, freq);
+	sched_clock_register(sched_clock_read, 64, freq);
 
 	pr_info("Constant clock source device register\n");
 



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 06/13] s390/time: Provide sched_clock_noinstr()
  2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
                   ` (4 preceding siblings ...)
  2023-05-19 10:21 ` [PATCH v2 05/13] loongarch: Provide noinstr sched_clock_read() Peter Zijlstra
@ 2023-05-19 10:21 ` Peter Zijlstra
  2023-05-22 14:18   ` Heiko Carstens
  2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  2023-05-19 10:21 ` [PATCH v2 07/13] math64: Always inline u128 version of mul_u64_u64_shr() Peter Zijlstra
                   ` (7 subsequent siblings)
  13 siblings, 2 replies; 41+ messages in thread
From: Peter Zijlstra @ 2023-05-19 10:21 UTC (permalink / raw)
  To: bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel,
	hca, gor, agordeev, borntraeger, svens, pbonzini, wanpengli,
	vkuznets, tglx, mingo, bp, dave.hansen, x86, hpa, jgross,
	boris.ostrovsky, daniel.lezcano, kys, haiyangz, wei.liu, decui,
	rafael, peterz, longman, boqun.feng, pmladek, senozhatsky,
	rostedt, john.ogness, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, bristot, vschneid, jstultz,
	sboyd, linux-kernel, loongarch, linux-s390, kvm, linux-hyperv,
	linux-pm

With the intent to provide local_clock_noinstr(), a variant of
local_clock() that's safe to be called from noinstr code (with the
assumption that any such code will already be non-preemptible),
prepare for things by providing a noinstr sched_clock_noinstr()
function.

Specifically, preempt_enable_*() calls out to schedule(), which upsets
noinstr validation efforts.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/s390/include/asm/timex.h |   13 +++++++++----
 arch/s390/kernel/time.c       |   11 ++++++++++-
 2 files changed, 19 insertions(+), 5 deletions(-)

--- a/arch/s390/include/asm/timex.h
+++ b/arch/s390/include/asm/timex.h
@@ -63,7 +63,7 @@ static inline int store_tod_clock_ext_cc
 	return cc;
 }
 
-static inline void store_tod_clock_ext(union tod_clock *tod)
+static __always_inline void store_tod_clock_ext(union tod_clock *tod)
 {
 	asm volatile("stcke %0" : "=Q" (*tod) : : "cc");
 }
@@ -177,7 +177,7 @@ static inline void local_tick_enable(uns
 
 typedef unsigned long cycles_t;
 
-static inline unsigned long get_tod_clock(void)
+static __always_inline unsigned long get_tod_clock(void)
 {
 	union tod_clock clk;
 
@@ -204,6 +204,11 @@ void init_cpu_timer(void);
 
 extern union tod_clock tod_clock_base;
 
+static __always_inline unsigned long __get_tod_clock_monotonic(void)
+{
+	return get_tod_clock() - tod_clock_base.tod;
+}
+
 /**
  * get_clock_monotonic - returns current time in clock rate units
  *
@@ -216,7 +221,7 @@ static inline unsigned long get_tod_cloc
 	unsigned long tod;
 
 	preempt_disable_notrace();
-	tod = get_tod_clock() - tod_clock_base.tod;
+	tod = __get_tod_clock_monotonic();
 	preempt_enable_notrace();
 	return tod;
 }
@@ -240,7 +245,7 @@ static inline unsigned long get_tod_cloc
  * -> ns = (th * 125) + ((tl * 125) >> 9);
  *
  */
-static inline unsigned long tod_to_ns(unsigned long todval)
+static __always_inline unsigned long tod_to_ns(unsigned long todval)
 {
 	return ((todval >> 9) * 125) + (((todval & 0x1ff) * 125) >> 9);
 }
--- a/arch/s390/kernel/time.c
+++ b/arch/s390/kernel/time.c
@@ -102,6 +102,11 @@ void __init time_early_init(void)
 			((long) qui.old_leap * 4096000000L);
 }
 
+unsigned long long noinstr sched_clock_noinstr(void)
+{
+	return tod_to_ns(__get_tod_clock_monotonic());
+}
+
 /*
  * Scheduler clock - returns current time in nanosec units.
  */



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 07/13] math64: Always inline u128 version of mul_u64_u64_shr()
  2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
                   ` (5 preceding siblings ...)
  2023-05-19 10:21 ` [PATCH v2 06/13] s390/time: Provide sched_clock_noinstr() Peter Zijlstra
@ 2023-05-19 10:21 ` Peter Zijlstra
  2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  2023-05-19 10:21 ` [PATCH v2 08/13] x86/vdso: Fix gettimeofday masking Peter Zijlstra
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Peter Zijlstra @ 2023-05-19 10:21 UTC (permalink / raw)
  To: bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel,
	hca, gor, agordeev, borntraeger, svens, pbonzini, wanpengli,
	vkuznets, tglx, mingo, bp, dave.hansen, x86, hpa, jgross,
	boris.ostrovsky, daniel.lezcano, kys, haiyangz, wei.liu, decui,
	rafael, peterz, longman, boqun.feng, pmladek, senozhatsky,
	rostedt, john.ogness, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, bristot, vschneid, jstultz,
	sboyd, linux-kernel, loongarch, linux-s390, kvm, linux-hyperv,
	linux-pm

In order to prevent the following complaint from happening, always
inline the u128 variant of mul_u64_u64_shr() -- which is what x86_64
will use.

  vmlinux.o: warning: objtool: read_hv_sched_clock_tsc+0x5a: call to mul_u64_u64_shr.constprop.0() leaves .noinstr.text section

It should compile into something like:

  asm("mul	%[mul];"
      "shrd	%rdx, %rax, %cl"
      : "+&a" (a)
      : "c" shift, [mul] "r" (mul)
      : "d");

Which is silly not to inline, but it happens.
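
As a rough standalone sketch of what the u128 variant computes (plain
userspace C with made-up values, just to illustrate the arithmetic; the
Hyper-V TSC page code later in this series uses it as
mul_u64_u64_shr(tsc, scale, 64) + offset):

  #include <stdint.h>
  #include <stdio.h>

  /* (a * mul) >> shift, computed with a full 128-bit intermediate product */
  static inline uint64_t mul_u64_u64_shr(uint64_t a, uint64_t mul, unsigned int shift)
  {
          return (uint64_t)(((unsigned __int128)a * mul) >> shift);
  }

  int main(void)
  {
          uint64_t tsc   = 1000000;
          uint64_t scale = 0x8000000000000000ULL; /* fixed-point 0.5 in 0.64 format */

          /* prints 500000; the product overflows u64 but fits in the u128 */
          printf("%llu\n", (unsigned long long)mul_u64_u64_shr(tsc, scale, 64));
          return 0;
  }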

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/math64.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/linux/math64.h
+++ b/include/linux/math64.h
@@ -168,7 +168,7 @@ static __always_inline u64 mul_u64_u32_s
 #endif /* mul_u64_u32_shr */
 
 #ifndef mul_u64_u64_shr
-static inline u64 mul_u64_u64_shr(u64 a, u64 mul, unsigned int shift)
+static __always_inline u64 mul_u64_u64_shr(u64 a, u64 mul, unsigned int shift)
 {
 	return (u64)(((unsigned __int128)a * mul) >> shift);
 }



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 08/13] x86/vdso: Fix gettimeofday masking
  2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
                   ` (6 preceding siblings ...)
  2023-05-19 10:21 ` [PATCH v2 07/13] math64: Always inline u128 version of mul_u64_u64_shr() Peter Zijlstra
@ 2023-05-19 10:21 ` Peter Zijlstra
  2023-05-31 15:27   ` Thomas Gleixner
                     ` (2 more replies)
  2023-05-19 10:21 ` [PATCH v2 09/13] clocksource: hyper-v: Adjust hv_read_tsc_page_tsc() to avoid special casing U64_MAX Peter Zijlstra
                   ` (5 subsequent siblings)
  13 siblings, 3 replies; 41+ messages in thread
From: Peter Zijlstra @ 2023-05-19 10:21 UTC (permalink / raw)
  To: bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel,
	hca, gor, agordeev, borntraeger, svens, pbonzini, wanpengli,
	vkuznets, tglx, mingo, bp, dave.hansen, x86, hpa, jgross,
	boris.ostrovsky, daniel.lezcano, kys, haiyangz, wei.liu, decui,
	rafael, peterz, longman, boqun.feng, pmladek, senozhatsky,
	rostedt, john.ogness, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, bristot, vschneid, jstultz,
	sboyd, linux-kernel, loongarch, linux-s390, kvm, linux-hyperv,
	linux-pm

Because of how the virtual clocks use U64_MAX as an exception value
instead of a valid time, the clocks can no longer be assumed to wrap
cleanly. This is then compounded by arch_vdso_cycles_ok() rejecting
everything with the MSB/Sign-bit set.

Therefore, the effective mask becomes S64_MAX, and the comment in
vdso_calc_delta() stating that the mask is U64_MAX and cannot be
optimized out is just plain silly.

Now, the code has a negative filter -- to deal with TSC wobbles:

	if (cycles > last)

which is just plain wrong, because it should've been written as:

	if ((s64)(cycles - last) > 0)

to take wrapping into account, but per all the above, we don't
actually wrap on u64 anymore.
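
To make the wrap argument concrete, a tiny standalone sketch (plain
userspace C with made-up counter values, not kernel code):

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
          uint64_t last   = 0xfffffffffffffff0ULL;  /* just before a u64 wrap */
          uint64_t cycles = 0x0000000000000010ULL;  /* just after the wrap */

          /* The '>' filter treats the post-wrap value as "in the past". */
          printf("cycles > last: %d\n", cycles > last);            /* 0 */

          /* The signed-delta form sees a small forward step across the wrap. */
          printf("(s64)(cycles - last): %lld\n",
                 (long long)(int64_t)(cycles - last));             /* 32 */
          return 0;
  }

Per the above, of course, the effective S64_MAX mask means an actual u64
wrap can no longer be observed here.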

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/vdso/gettimeofday.h |   39 ++++++++++++++++++++++---------
 1 file changed, 28 insertions(+), 11 deletions(-)

--- a/arch/x86/include/asm/vdso/gettimeofday.h
+++ b/arch/x86/include/asm/vdso/gettimeofday.h
@@ -231,14 +231,17 @@ static u64 vread_pvclock(void)
 		ret = __pvclock_read_cycles(pvti, rdtsc_ordered());
 	} while (pvclock_read_retry(pvti, version));
 
-	return ret;
+	return ret & S64_MAX;
 }
 #endif
 
 #ifdef CONFIG_HYPERV_TIMER
 static u64 vread_hvclock(void)
 {
-	return hv_read_tsc_page(&hvclock_page);
+	u64 ret = hv_read_tsc_page(&hvclock_page);
+	if (likely(ret != U64_MAX))
+		ret &= S64_MAX;
+	return ret;
 }
 #endif
 
@@ -246,7 +249,7 @@ static inline u64 __arch_get_hw_counter(
 					const struct vdso_data *vd)
 {
 	if (likely(clock_mode == VDSO_CLOCKMODE_TSC))
-		return (u64)rdtsc_ordered();
+		return (u64)rdtsc_ordered() & S64_MAX;
 	/*
 	 * For any memory-mapped vclock type, we need to make sure that gcc
 	 * doesn't cleverly hoist a load before the mode check.  Otherwise we
@@ -284,6 +287,9 @@ static inline bool arch_vdso_clocksource
  * which can be invalidated asynchronously and indicate invalidation by
  * returning U64_MAX, which can be effectively tested by checking for a
  * negative value after casting it to s64.
+ *
+ * This effectively forces a S64_MAX mask on the calculations, unlike the
+ * U64_MAX mask normally used by x86 clocksources.
  */
 static inline bool arch_vdso_cycles_ok(u64 cycles)
 {
@@ -303,18 +309,29 @@ static inline bool arch_vdso_cycles_ok(u
  * @last. If not then use @last, which is the base time of the current
  * conversion period.
  *
- * This variant also removes the masking of the subtraction because the
- * clocksource mask of all VDSO capable clocksources on x86 is U64_MAX
- * which would result in a pointless operation. The compiler cannot
- * optimize it away as the mask comes from the vdso data and is not compile
- * time constant.
+ * This variant also uses a custom mask because while the clocksource mask of
+ * all the VDSO capable clocksources on x86 is U64_MAX, the above code uses
+ * U64_MAX as an exception value, additionally arch_vdso_cycles_ok() above
+ * declares everything with the MSB/Sign-bit set as invalid. Therefore the
+ * effective mask is S64_MAX.
  */
 static __always_inline
 u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
 {
-	if (cycles > last)
-		return (cycles - last) * mult;
-	return 0;
+	/*
+	 * Due to the MSB/Sign-bit being used as invalid marker (see
+	 * arch_vdso_cycles_ok() above), the effective mask is S64_MAX.
+	 */
+	u64 delta = (cycles - last) & S64_MAX;
+
+	/*
+	 * Due to the above mentioned TSC wobbles, filter out negative motion.
+	 * Per the above masking, the effective sign bit is now bit 62.
+	 */
+	if (unlikely(delta & (1ULL << 62)))
+		return 0;
+
+	return delta * mult;
 }
 #define vdso_calc_delta vdso_calc_delta
 



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 09/13] clocksource: hyper-v: Adjust hv_read_tsc_page_tsc() to avoid special casing U64_MAX
  2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
                   ` (7 preceding siblings ...)
  2023-05-19 10:21 ` [PATCH v2 08/13] x86/vdso: Fix gettimeofday masking Peter Zijlstra
@ 2023-05-19 10:21 ` Peter Zijlstra
  2023-05-19 18:38   ` Michael Kelley (LINUX)
  2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  2023-05-19 10:21 ` [PATCH v2 10/13] clocksource: hyper-v: Provide noinstr sched_clock() Peter Zijlstra
                   ` (4 subsequent siblings)
  13 siblings, 2 replies; 41+ messages in thread
From: Peter Zijlstra @ 2023-05-19 10:21 UTC (permalink / raw)
  To: bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel,
	hca, gor, agordeev, borntraeger, svens, pbonzini, wanpengli,
	vkuznets, tglx, mingo, bp, dave.hansen, x86, hpa, jgross,
	boris.ostrovsky, daniel.lezcano, kys, haiyangz, wei.liu, decui,
	rafael, peterz, longman, boqun.feng, pmladek, senozhatsky,
	rostedt, john.ogness, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, bristot, vschneid, jstultz,
	sboyd, linux-kernel, loongarch, linux-s390, kvm, linux-hyperv,
	linux-pm

Currently hv_read_tsc_page_tsc() (ab)uses the (valid) time value of
U64_MAX as an error return. This breaks the clean wrap-around of the
clock.

Modify the function signature to return a boolean state and provide
another u64 pointer to store the actual time on success. This obviates
the need to steal one time value and restores the full counter width.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/vdso/gettimeofday.h |   10 ++++++----
 arch/x86/kvm/x86.c                       |    7 +++----
 drivers/clocksource/hyperv_timer.c       |   16 +++++++++++-----
 include/clocksource/hyperv_timer.h       |   24 +++++++++---------------
 4 files changed, 29 insertions(+), 28 deletions(-)

--- a/arch/x86/include/asm/vdso/gettimeofday.h
+++ b/arch/x86/include/asm/vdso/gettimeofday.h
@@ -238,10 +238,12 @@ static u64 vread_pvclock(void)
 #ifdef CONFIG_HYPERV_TIMER
 static u64 vread_hvclock(void)
 {
-	u64 ret = hv_read_tsc_page(&hvclock_page);
-	if (likely(ret != U64_MAX))
-		ret &= S64_MAX;
-	return ret;
+	u64 tsc, time;
+
+	if (hv_read_tsc_page_tsc(&hvclock_page, &tsc, &time))
+		return time & S64_MAX;
+
+	return U64_MAX;
 }
 #endif
 
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2799,14 +2799,13 @@ static u64 read_tsc(void)
 static inline u64 vgettsc(struct pvclock_clock *clock, u64 *tsc_timestamp,
 			  int *mode)
 {
-	long v;
 	u64 tsc_pg_val;
+	long v;
 
 	switch (clock->vclock_mode) {
 	case VDSO_CLOCKMODE_HVCLOCK:
-		tsc_pg_val = hv_read_tsc_page_tsc(hv_get_tsc_page(),
-						  tsc_timestamp);
-		if (tsc_pg_val != U64_MAX) {
+		if (hv_read_tsc_page_tsc(hv_get_tsc_page(),
+					 tsc_timestamp, &tsc_pg_val)) {
 			/* TSC page valid */
 			*mode = VDSO_CLOCKMODE_HVCLOCK;
 			v = (tsc_pg_val - clock->cycle_last) &
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -393,14 +393,20 @@ struct ms_hyperv_tsc_page *hv_get_tsc_pa
 }
 EXPORT_SYMBOL_GPL(hv_get_tsc_page);
 
-static u64 notrace read_hv_clock_tsc(void)
+static notrace u64 read_hv_clock_tsc(void)
 {
-	u64 current_tick = hv_read_tsc_page(hv_get_tsc_page());
+	u64 cur_tsc, time;
 
-	if (current_tick == U64_MAX)
-		current_tick = hv_get_register(HV_REGISTER_TIME_REF_COUNT);
+	/*
+	 * The Hyper-V Top-Level Functional Spec (TLFS), section Timers,
+	 * subsection Reference Counter, guarantees that the TSC and MSR
+	 * times are in sync and monotonic. Therefore we can fall back
+	 * to the MSR in case the TSC page indicates unavailability.
+	 */
+	if (!hv_read_tsc_page_tsc(tsc_page, &cur_tsc, &time))
+		time = hv_get_register(HV_REGISTER_TIME_REF_COUNT);
 
-	return current_tick;
+	return time;
 }
 
 static u64 notrace read_hv_clock_tsc_cs(struct clocksource *arg)
--- a/include/clocksource/hyperv_timer.h
+++ b/include/clocksource/hyperv_timer.h
@@ -38,8 +38,9 @@ extern void hv_remap_tsc_clocksource(voi
 extern unsigned long hv_get_tsc_pfn(void);
 extern struct ms_hyperv_tsc_page *hv_get_tsc_page(void);
 
-static inline notrace u64
-hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg, u64 *cur_tsc)
+static __always_inline bool
+hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
+		     u64 *cur_tsc, u64 *time)
 {
 	u64 scale, offset;
 	u32 sequence;
@@ -63,7 +64,7 @@ hv_read_tsc_page_tsc(const struct ms_hyp
 	do {
 		sequence = READ_ONCE(tsc_pg->tsc_sequence);
 		if (!sequence)
-			return U64_MAX;
+			return false;
 		/*
 		 * Make sure we read sequence before we read other values from
 		 * TSC page.
@@ -82,15 +83,8 @@ hv_read_tsc_page_tsc(const struct ms_hyp
 
 	} while (READ_ONCE(tsc_pg->tsc_sequence) != sequence);
 
-	return mul_u64_u64_shr(*cur_tsc, scale, 64) + offset;
-}
-
-static inline notrace u64
-hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
-{
-	u64 cur_tsc;
-
-	return hv_read_tsc_page_tsc(tsc_pg, &cur_tsc);
+	*time = mul_u64_u64_shr(*cur_tsc, scale, 64) + offset;
+	return true;
 }
 
 #else /* CONFIG_HYPERV_TIMER */
@@ -104,10 +98,10 @@ static inline struct ms_hyperv_tsc_page
 	return NULL;
 }
 
-static inline u64 hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
-				       u64 *cur_tsc)
+static __always_inline bool
+hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg, u64 *cur_tsc, u64 *time)
 {
-	return U64_MAX;
+	return false;
 }
 
 static inline int hv_stimer_cleanup(unsigned int cpu) { return 0; }



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 10/13] clocksource: hyper-v: Provide noinstr sched_clock()
  2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
                   ` (8 preceding siblings ...)
  2023-05-19 10:21 ` [PATCH v2 09/13] clocksource: hyper-v: Adjust hv_read_tsc_page_tsc() to avoid special casing U64_MAX Peter Zijlstra
@ 2023-05-19 10:21 ` Peter Zijlstra
  2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  2023-05-19 10:21 ` [PATCH v2 11/13] x86/tsc: Provide sched_clock_noinstr() Peter Zijlstra
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Peter Zijlstra @ 2023-05-19 10:21 UTC (permalink / raw)
  To: bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel,
	hca, gor, agordeev, borntraeger, svens, pbonzini, wanpengli,
	vkuznets, tglx, mingo, bp, dave.hansen, x86, hpa, jgross,
	boris.ostrovsky, daniel.lezcano, kys, haiyangz, wei.liu, decui,
	rafael, peterz, longman, boqun.feng, pmladek, senozhatsky,
	rostedt, john.ogness, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, bristot, vschneid, jstultz,
	sboyd, linux-kernel, loongarch, linux-s390, kvm, linux-hyperv,
	linux-pm, Michael Kelley

With the intent to provide local_clock_noinstr(), a variant of
local_clock() that's safe to be called from noinstr code (with the
assumption that any such code will already be non-preemptible),
prepare for things by making the Hyper-V TSC and MSR sched_clock
implementations noinstr.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Co-developed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/mshyperv.h    |    5 +++++
 drivers/clocksource/hyperv_timer.c |   32 ++++++++++++++++++--------------
 2 files changed, 23 insertions(+), 14 deletions(-)

--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -257,6 +257,11 @@ void hv_set_register(unsigned int reg, u
 u64 hv_get_non_nested_register(unsigned int reg);
 void hv_set_non_nested_register(unsigned int reg, u64 value);
 
+static __always_inline u64 hv_raw_get_register(unsigned int reg)
+{
+	return __rdmsr(reg);
+}
+
 #else /* CONFIG_HYPERV */
 static inline void hyperv_init(void) {}
 static inline void hyperv_setup_mmu_ops(void) {}
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -365,6 +365,20 @@ void hv_stimer_global_cleanup(void)
 }
 EXPORT_SYMBOL_GPL(hv_stimer_global_cleanup);
 
+static __always_inline u64 read_hv_clock_msr(void)
+{
+	/*
+	 * Read the partition counter to get the current tick count. This count
+	 * is set to 0 when the partition is created and is incremented in 100
+	 * nanosecond units.
+	 *
+	 * Use hv_raw_get_register() because this function is used from
+	 * noinstr. Notably, while HV_REGISTER_TIME_REF_COUNT is a synthetic
+	 * register it doesn't need the GHCB path.
+	 */
+	return hv_raw_get_register(HV_REGISTER_TIME_REF_COUNT);
+}
+
 /*
  * Code and definitions for the Hyper-V clocksources.  Two
  * clocksources are defined: one that reads the Hyper-V defined MSR, and
@@ -393,7 +407,7 @@ struct ms_hyperv_tsc_page *hv_get_tsc_pa
 }
 EXPORT_SYMBOL_GPL(hv_get_tsc_page);
 
-static notrace u64 read_hv_clock_tsc(void)
+static __always_inline u64 read_hv_clock_tsc(void)
 {
 	u64 cur_tsc, time;
 
@@ -404,7 +418,7 @@ static notrace u64 read_hv_clock_tsc(voi
 	 * to the MSR in case the TSC page indicates unavailability.
 	 */
 	if (!hv_read_tsc_page_tsc(tsc_page, &cur_tsc, &time))
-		time = hv_get_register(HV_REGISTER_TIME_REF_COUNT);
+		time = read_hv_clock_msr();
 
 	return time;
 }
@@ -414,7 +428,7 @@ static u64 notrace read_hv_clock_tsc_cs(
 	return read_hv_clock_tsc();
 }
 
-static u64 notrace read_hv_sched_clock_tsc(void)
+static u64 noinstr read_hv_sched_clock_tsc(void)
 {
 	return (read_hv_clock_tsc() - hv_sched_clock_offset) *
 		(NSEC_PER_SEC / HV_CLOCK_HZ);
@@ -466,22 +480,12 @@ static struct clocksource hyperv_cs_tsc
 #endif
 };
 
-static u64 notrace read_hv_clock_msr(void)
-{
-	/*
-	 * Read the partition counter to get the current tick count. This count
-	 * is set to 0 when the partition is created and is incremented in
-	 * 100 nanosecond units.
-	 */
-	return hv_get_register(HV_REGISTER_TIME_REF_COUNT);
-}
-
 static u64 notrace read_hv_clock_msr_cs(struct clocksource *arg)
 {
 	return read_hv_clock_msr();
 }
 
-static u64 notrace read_hv_sched_clock_msr(void)
+static u64 noinstr read_hv_sched_clock_msr(void)
 {
 	return (read_hv_clock_msr() - hv_sched_clock_offset) *
 		(NSEC_PER_SEC / HV_CLOCK_HZ);



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 11/13] x86/tsc: Provide sched_clock_noinstr()
  2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
                   ` (9 preceding siblings ...)
  2023-05-19 10:21 ` [PATCH v2 10/13] clocksource: hyper-v: Provide noinstr sched_clock() Peter Zijlstra
@ 2023-05-19 10:21 ` Peter Zijlstra
  2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  2023-05-19 10:21 ` [PATCH v2 12/13] sched/clock: Provide local_clock_noinstr() Peter Zijlstra
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Peter Zijlstra @ 2023-05-19 10:21 UTC (permalink / raw)
  To: bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel,
	hca, gor, agordeev, borntraeger, svens, pbonzini, wanpengli,
	vkuznets, tglx, mingo, bp, dave.hansen, x86, hpa, jgross,
	boris.ostrovsky, daniel.lezcano, kys, haiyangz, wei.liu, decui,
	rafael, peterz, longman, boqun.feng, pmladek, senozhatsky,
	rostedt, john.ogness, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, bristot, vschneid, jstultz,
	sboyd, linux-kernel, loongarch, linux-s390, kvm, linux-hyperv,
	linux-pm

With the intent to provide local_clock_noinstr(), a variant of
local_clock() that's safe to be called from noinstr code (with the
assumption that any such code will already be non-preemptible),
prepare for things by providing a noinstr sched_clock_noinstr()
function.

Specifically, preempt_enable_*() calls out to schedule(), which upsets
noinstr validation efforts.

  vmlinux.o: warning: objtool: native_sched_clock+0x96: call to preempt_schedule_notrace_thunk() leaves .noinstr.text section
  vmlinux.o: warning: objtool: kvm_clock_read+0x22: call to preempt_schedule_notrace_thunk() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/hyperv_timer.h |    5 ++++
 arch/x86/kernel/kvmclock.c          |    4 +--
 arch/x86/kernel/tsc.c               |   38 +++++++++++++++++++++++--------
 arch/x86/kvm/x86.c                  |    7 ++---
 arch/x86/xen/time.c                 |    3 --
 drivers/clocksource/hyperv_timer.c  |   44 ++++++++++++++++++++++--------------
 include/clocksource/hyperv_timer.h  |   24 +++++++------------
 7 files changed, 76 insertions(+), 49 deletions(-)

--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -71,7 +71,7 @@ static int kvm_set_wallclock(const struc
 	return -ENODEV;
 }
 
-static noinstr u64 kvm_clock_read(void)
+static u64 kvm_clock_read(void)
 {
 	u64 ret;
 
@@ -88,7 +88,7 @@ static u64 kvm_clock_get_cycles(struct c
 
 static noinstr u64 kvm_sched_clock_read(void)
 {
-	return kvm_clock_read() - kvm_sched_clock_offset;
+	return pvclock_clocksource_read_nowd(this_cpu_pvti()) - kvm_sched_clock_offset;
 }
 
 static inline void kvm_sched_clock_init(bool stable)
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -69,12 +69,10 @@ static int __init tsc_early_khz_setup(ch
 }
 early_param("tsc_early_khz", tsc_early_khz_setup);
 
-__always_inline void cyc2ns_read_begin(struct cyc2ns_data *data)
+__always_inline void __cyc2ns_read(struct cyc2ns_data *data)
 {
 	int seq, idx;
 
-	preempt_disable_notrace();
-
 	do {
 		seq = this_cpu_read(cyc2ns.seq.seqcount.sequence);
 		idx = seq & 1;
@@ -86,6 +84,12 @@ __always_inline void cyc2ns_read_begin(s
 	} while (unlikely(seq != this_cpu_read(cyc2ns.seq.seqcount.sequence)));
 }
 
+__always_inline void cyc2ns_read_begin(struct cyc2ns_data *data)
+{
+	preempt_disable_notrace();
+	__cyc2ns_read(data);
+}
+
 __always_inline void cyc2ns_read_end(void)
 {
 	preempt_enable_notrace();
@@ -115,18 +119,25 @@ __always_inline void cyc2ns_read_end(voi
  *                      -johnstul@us.ibm.com "math is hard, lets go shopping!"
  */
 
-static __always_inline unsigned long long cycles_2_ns(unsigned long long cyc)
+static __always_inline unsigned long long __cycles_2_ns(unsigned long long cyc)
 {
 	struct cyc2ns_data data;
 	unsigned long long ns;
 
-	cyc2ns_read_begin(&data);
+	__cyc2ns_read(&data);
 
 	ns = data.cyc2ns_offset;
 	ns += mul_u64_u32_shr(cyc, data.cyc2ns_mul, data.cyc2ns_shift);
 
-	cyc2ns_read_end();
+	return ns;
+}
 
+static __always_inline unsigned long long cycles_2_ns(unsigned long long cyc)
+{
+	unsigned long long ns;
+	preempt_disable_notrace();
+	ns = __cycles_2_ns(cyc);
+	preempt_enable_notrace();
 	return ns;
 }
 
@@ -223,7 +234,7 @@ noinstr u64 native_sched_clock(void)
 		u64 tsc_now = rdtsc();
 
 		/* return the value in ns */
-		return cycles_2_ns(tsc_now);
+		return __cycles_2_ns(tsc_now);
 	}
 
 	/*
@@ -250,7 +261,7 @@ u64 native_sched_clock_from_tsc(u64 tsc)
 /* We need to define a real function for sched_clock, to override the
    weak default version */
 #ifdef CONFIG_PARAVIRT
-noinstr u64 sched_clock(void)
+noinstr u64 sched_clock_noinstr(void)
 {
 	return paravirt_sched_clock();
 }
@@ -260,11 +271,20 @@ bool using_native_sched_clock(void)
 	return static_call_query(pv_sched_clock) == native_sched_clock;
 }
 #else
-u64 sched_clock(void) __attribute__((alias("native_sched_clock")));
+u64 sched_clock_noinstr(void) __attribute__((alias("native_sched_clock")));
 
 bool using_native_sched_clock(void) { return true; }
 #endif
 
+notrace u64 sched_clock(void)
+{
+	u64 now;
+	preempt_disable_notrace();
+	now = sched_clock_noinstr();
+	preempt_enable_notrace();
+	return now;
+}
+
 int check_tsc_unstable(void)
 {
 	return tsc_unstable;
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -66,11 +66,10 @@ static noinstr u64 xen_sched_clock(void)
         struct pvclock_vcpu_time_info *src;
 	u64 ret;
 
-	preempt_disable_notrace();
 	src = &__this_cpu_read(xen_vcpu)->time;
 	ret = pvclock_clocksource_read_nowd(src);
 	ret -= xen_sched_clock_offset;
-	preempt_enable_notrace();
+
 	return ret;
 }
 



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 12/13] sched/clock: Provide local_clock_noinstr()
  2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
                   ` (10 preceding siblings ...)
  2023-05-19 10:21 ` [PATCH v2 11/13] x86/tsc: Provide sched_clock_noinstr() Peter Zijlstra
@ 2023-05-19 10:21 ` Peter Zijlstra
  2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  2023-05-19 10:21 ` [PATCH v2 13/13] cpuidle: Use local_clock_noinstr() Peter Zijlstra
  2023-05-19 18:48 ` [PATCH v2 00/13] local_clock() vs noinstr Michael Kelley (LINUX)
  13 siblings, 1 reply; 41+ messages in thread
From: Peter Zijlstra @ 2023-05-19 10:21 UTC (permalink / raw)
  To: bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel,
	hca, gor, agordeev, borntraeger, svens, pbonzini, wanpengli,
	vkuznets, tglx, mingo, bp, dave.hansen, x86, hpa, jgross,
	boris.ostrovsky, daniel.lezcano, kys, haiyangz, wei.liu, decui,
	rafael, peterz, longman, boqun.feng, pmladek, senozhatsky,
	rostedt, john.ogness, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, bristot, vschneid, jstultz,
	sboyd, linux-kernel, loongarch, linux-s390, kvm, linux-hyperv,
	linux-pm

Now that all ARCH_WANTS_NO_INSTR architectures (arm64, loongarch,
s390, x86) provide sched_clock_noinstr(), use this to provide
local_clock_noinstr().

This local_clock_noinstr() will be safe to use from noinstr code with
the assumption that any such noinstr code is non-preemptible (it had
better be: entry code will have IRQs disabled, while __cpuidle must
have preemption disabled).

Specifically, preempt_enable_notrace(), a common part of many
sched_clock() implementations, calls out to schedule() -- even though,
per the above, it will never trigger -- which frustrates noinstr
validation.

  vmlinux.o: warning: objtool: local_clock+0xb5: call to preempt_schedule_notrace_thunk() leaves .noinstr.text section
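
A minimal sketch of the intended usage from noinstr code (hypothetical
function and structure names; the cpuidle patch at the end of this series
is the real user):

  /* hypothetical caller and stats structure, for illustration only */
  struct my_idle_stats {
          u64 idle_ns;
  };

  static noinstr void my_idle_enter(struct my_idle_stats *stats)
  {
          u64 t0, t1;

          /*
           * noinstr code runs non-preemptible (IRQs off in entry code,
           * preemption off in __cpuidle), so the _noinstr variant is
           * safe to call directly.
           */
          t0 = local_clock_noinstr();

          /* ... architectural idle sequence ... */

          t1 = local_clock_noinstr();
          stats->idle_ns += t1 - t0;
  }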

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/sched/clock.h |   17 ++++++++++++++++-
 kernel/sched/clock.c        |   19 +++++++++++++------
 2 files changed, 29 insertions(+), 7 deletions(-)

--- a/include/linux/sched/clock.h
+++ b/include/linux/sched/clock.h
@@ -12,7 +12,16 @@
  *
  * Please use one of the three interfaces below.
  */
-extern unsigned long long notrace sched_clock(void);
+extern u64 sched_clock(void);
+
+#if defined(CONFIG_ARCH_WANTS_NO_INSTR) || defined(CONFIG_GENERIC_SCHED_CLOCK)
+extern u64 sched_clock_noinstr(void);
+#else
+static __always_inline u64 sched_clock_noinstr(void)
+{
+	return sched_clock();
+}
+#endif
 
 /*
  * See the comment in kernel/sched/clock.c
@@ -45,6 +54,11 @@ static inline u64 cpu_clock(int cpu)
 	return sched_clock();
 }
 
+static __always_inline u64 local_clock_noinstr(void)
+{
+	return sched_clock_noinstr();
+}
+
 static __always_inline u64 local_clock(void)
 {
 	return sched_clock();
@@ -79,6 +93,7 @@ static inline u64 cpu_clock(int cpu)
 	return sched_clock_cpu(cpu);
 }
 
+extern u64 local_clock_noinstr(void);
 extern u64 local_clock(void);
 
 #endif
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -266,7 +266,7 @@ static __always_inline u64 sched_clock_l
 	s64 delta;
 
 again:
-	now = sched_clock();
+	now = sched_clock_noinstr();
 	delta = now - scd->tick_raw;
 	if (unlikely(delta < 0))
 		delta = 0;
@@ -293,22 +293,29 @@ static __always_inline u64 sched_clock_l
 	return clock;
 }
 
-noinstr u64 local_clock(void)
+noinstr u64 local_clock_noinstr(void)
 {
 	u64 clock;
 
 	if (static_branch_likely(&__sched_clock_stable))
-		return sched_clock() + __sched_clock_offset;
+		return sched_clock_noinstr() + __sched_clock_offset;
 
 	if (!static_branch_likely(&sched_clock_running))
-		return sched_clock();
+		return sched_clock_noinstr();
 
-	preempt_disable_notrace();
 	clock = sched_clock_local(this_scd());
-	preempt_enable_notrace();
 
 	return clock;
 }
+
+u64 local_clock(void)
+{
+	u64 now;
+	preempt_disable_notrace();
+	now = local_clock_noinstr();
+	preempt_enable_notrace();
+	return now;
+}
 EXPORT_SYMBOL_GPL(local_clock);
 
 static notrace u64 sched_clock_remote(struct sched_clock_data *scd)



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 13/13] cpuidle: Use local_clock_noinstr()
  2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
                   ` (11 preceding siblings ...)
  2023-05-19 10:21 ` [PATCH v2 12/13] sched/clock: Provide local_clock_noinstr() Peter Zijlstra
@ 2023-05-19 10:21 ` Peter Zijlstra
  2023-05-19 14:13   ` Rafael J. Wysocki
  2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  2023-05-19 18:48 ` [PATCH v2 00/13] local_clock() vs noinstr Michael Kelley (LINUX)
  13 siblings, 2 replies; 41+ messages in thread
From: Peter Zijlstra @ 2023-05-19 10:21 UTC (permalink / raw)
  To: bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel,
	hca, gor, agordeev, borntraeger, svens, pbonzini, wanpengli,
	vkuznets, tglx, mingo, bp, dave.hansen, x86, hpa, jgross,
	boris.ostrovsky, daniel.lezcano, kys, haiyangz, wei.liu, decui,
	rafael, peterz, longman, boqun.feng, pmladek, senozhatsky,
	rostedt, john.ogness, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, bristot, vschneid, jstultz,
	sboyd, linux-kernel, loongarch, linux-s390, kvm, linux-hyperv,
	linux-pm

With the introduction of local_clock_noinstr(), local_clock() itself
is no longer marked noinstr; use the correct function instead.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 drivers/cpuidle/cpuidle.c    |    8 ++++----
 drivers/cpuidle/poll_state.c |    4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -145,7 +145,7 @@ static noinstr void enter_s2idle_proper(
 
 	instrumentation_begin();
 
-	time_start = ns_to_ktime(local_clock());
+	time_start = ns_to_ktime(local_clock_noinstr());
 
 	tick_freeze();
 	/*
@@ -169,7 +169,7 @@ static noinstr void enter_s2idle_proper(
 	tick_unfreeze();
 	start_critical_timings();
 
-	time_end = ns_to_ktime(local_clock());
+	time_end = ns_to_ktime(local_clock_noinstr());
 
 	dev->states_usage[index].s2idle_time += ktime_us_delta(time_end, time_start);
 	dev->states_usage[index].s2idle_usage++;
@@ -243,7 +243,7 @@ noinstr int cpuidle_enter_state(struct c
 	sched_idle_set_state(target_state);
 
 	trace_cpu_idle(index, dev->cpu);
-	time_start = ns_to_ktime(local_clock());
+	time_start = ns_to_ktime(local_clock_noinstr());
 
 	stop_critical_timings();
 	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE)) {
@@ -276,7 +276,7 @@ noinstr int cpuidle_enter_state(struct c
 	start_critical_timings();
 
 	sched_clock_idle_wakeup_event();
-	time_end = ns_to_ktime(local_clock());
+	time_end = ns_to_ktime(local_clock_noinstr());
 	trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);
 
 	/* The cpu is no longer idle or about to enter idle. */
--- a/drivers/cpuidle/poll_state.c
+++ b/drivers/cpuidle/poll_state.c
@@ -15,7 +15,7 @@ static int __cpuidle poll_idle(struct cp
 {
 	u64 time_start;
 
-	time_start = local_clock();
+	time_start = local_clock_noinstr();
 
 	dev->poll_time_limit = false;
 
@@ -32,7 +32,7 @@ static int __cpuidle poll_idle(struct cp
 				continue;
 
 			loop_count = 0;
-			if (local_clock() - time_start > limit) {
+			if (local_clock_noinstr() - time_start > limit) {
 				dev->poll_time_limit = true;
 				break;
 			}



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 13/13] cpuidle: Use local_clock_noinstr()
  2023-05-19 10:21 ` [PATCH v2 13/13] cpuidle: Use local_clock_noinstr() Peter Zijlstra
@ 2023-05-19 14:13   ` Rafael J. Wysocki
  2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 41+ messages in thread
From: Rafael J. Wysocki @ 2023-05-19 14:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: bigeasy, mark.rutland, maz, catalin.marinas, will, chenhuacai,
	kernel, hca, gor, agordeev, borntraeger, svens, pbonzini,
	wanpengli, vkuznets, tglx, mingo, bp, dave.hansen, x86, hpa,
	jgross, boris.ostrovsky, daniel.lezcano, kys, haiyangz, wei.liu,
	decui, rafael, longman, boqun.feng, pmladek, senozhatsky,
	rostedt, john.ogness, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, bristot, vschneid, jstultz,
	sboyd, linux-kernel, loongarch, linux-s390, kvm, linux-hyperv,
	linux-pm

On Fri, May 19, 2023 at 12:33 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> With the introduction of local_clock_noinstr(), local_clock() itself
> is no longer marked noinstr, use the correct function.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Acked-by: Rafael J. Wysocki <rafael@kernel.org>

> ---
>  drivers/cpuidle/cpuidle.c    |    8 ++++----
>  drivers/cpuidle/poll_state.c |    4 ++--
>  2 files changed, 6 insertions(+), 6 deletions(-)
>
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -145,7 +145,7 @@ static noinstr void enter_s2idle_proper(
>
>         instrumentation_begin();
>
> -       time_start = ns_to_ktime(local_clock());
> +       time_start = ns_to_ktime(local_clock_noinstr());
>
>         tick_freeze();
>         /*
> @@ -169,7 +169,7 @@ static noinstr void enter_s2idle_proper(
>         tick_unfreeze();
>         start_critical_timings();
>
> -       time_end = ns_to_ktime(local_clock());
> +       time_end = ns_to_ktime(local_clock_noinstr());
>
>         dev->states_usage[index].s2idle_time += ktime_us_delta(time_end, time_start);
>         dev->states_usage[index].s2idle_usage++;
> @@ -243,7 +243,7 @@ noinstr int cpuidle_enter_state(struct c
>         sched_idle_set_state(target_state);
>
>         trace_cpu_idle(index, dev->cpu);
> -       time_start = ns_to_ktime(local_clock());
> +       time_start = ns_to_ktime(local_clock_noinstr());
>
>         stop_critical_timings();
>         if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE)) {
> @@ -276,7 +276,7 @@ noinstr int cpuidle_enter_state(struct c
>         start_critical_timings();
>
>         sched_clock_idle_wakeup_event();
> -       time_end = ns_to_ktime(local_clock());
> +       time_end = ns_to_ktime(local_clock_noinstr());
>         trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);
>
>         /* The cpu is no longer idle or about to enter idle. */
> --- a/drivers/cpuidle/poll_state.c
> +++ b/drivers/cpuidle/poll_state.c
> @@ -15,7 +15,7 @@ static int __cpuidle poll_idle(struct cp
>  {
>         u64 time_start;
>
> -       time_start = local_clock();
> +       time_start = local_clock_noinstr();
>
>         dev->poll_time_limit = false;
>
> @@ -32,7 +32,7 @@ static int __cpuidle poll_idle(struct cp
>                                 continue;
>
>                         loop_count = 0;
> -                       if (local_clock() - time_start > limit) {
> +                       if (local_clock_noinstr() - time_start > limit) {
>                                 dev->poll_time_limit = true;
>                                 break;
>                         }
>
>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH v2 09/13] clocksource: hyper-v: Adjust hv_read_tsc_page_tsc() to avoid special casing U64_MAX
  2023-05-19 10:21 ` [PATCH v2 09/13] clocksource: hyper-v: Adjust hv_read_tsc_page_tsc() to avoid special casing U64_MAX Peter Zijlstra
@ 2023-05-19 18:38   ` Michael Kelley (LINUX)
  2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 41+ messages in thread
From: Michael Kelley (LINUX) @ 2023-05-19 18:38 UTC (permalink / raw)
  To: Peter Zijlstra, bigeasy
  Cc: Mark Rutland, maz, catalin.marinas, will, chenhuacai, kernel,
	hca, gor, agordeev, borntraeger, svens, pbonzini, wanpengli,
	vkuznets, tglx, mingo, bp, dave.hansen, x86, hpa, jgross,
	boris.ostrovsky, daniel.lezcano, KY Srinivasan, Haiyang Zhang,
	wei.liu, Dexuan Cui, rafael, longman, boqun.feng, pmladek,
	senozhatsky, rostedt, john.ogness, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, bristot, vschneid, jstultz,
	sboyd, linux-kernel, loongarch, linux-s390, kvm, linux-hyperv,
	linux-pm

From: Peter Zijlstra <peterz@infradead.org> Sent: Friday, May 19, 2023 3:21 AM
> 
> Currently hv_read_tsc_page_tsc() (ab)uses the (valid) time value of
> U64_MAX as an error return. This breaks the clean wrap-around of the
> clock.
> 
> Modify the function signature to return a boolean state and provide
> another u64 pointer to store the actual time on success. This obviates
> the need to steal one time value and restores the full counter width.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/include/asm/vdso/gettimeofday.h |   10 ++++++----
>  arch/x86/kvm/x86.c                       |    7 +++----
>  drivers/clocksource/hyperv_timer.c       |   16 +++++++++++-----
>  include/clocksource/hyperv_timer.h       |   24 +++++++++---------------
>  4 files changed, 29 insertions(+), 28 deletions(-)
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH v2 00/13] local_clock() vs noinstr
  2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
                   ` (12 preceding siblings ...)
  2023-05-19 10:21 ` [PATCH v2 13/13] cpuidle: Use local_clock_noinstr() Peter Zijlstra
@ 2023-05-19 18:48 ` Michael Kelley (LINUX)
  13 siblings, 0 replies; 41+ messages in thread
From: Michael Kelley (LINUX) @ 2023-05-19 18:48 UTC (permalink / raw)
  To: Peter Zijlstra, bigeasy
  Cc: Mark Rutland, maz, catalin.marinas, will, chenhuacai, kernel,
	hca, gor, agordeev, borntraeger, svens, pbonzini, wanpengli,
	vkuznets, tglx, mingo, bp, dave.hansen, x86, hpa, jgross,
	boris.ostrovsky, daniel.lezcano, KY Srinivasan, Haiyang Zhang,
	wei.liu, Dexuan Cui, rafael, longman, boqun.feng, pmladek,
	senozhatsky, rostedt, john.ogness, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, bristot, vschneid, jstultz,
	sboyd, linux-kernel, loongarch, linux-s390, kvm, linux-hyperv,
	linux-pm

From: Peter Zijlstra <peterz@infradead.org> Sent: Friday, May 19, 2023 3:21 AM
> 
> Hi All,
> 
> latest version of the local_clock_noinstr() patches.
> 
> Most of the changes are in Hyper-V and x86/vdso/gettimeofday; Michael has been
> very helpful navigating the Hyper-V spec and fixing their sched_clock
> implementation.
> 
> ---
>  arch/arm64/include/asm/arch_timer.h      |  8 +----
>  arch/arm64/include/asm/io.h              | 12 +++----
>  arch/loongarch/include/asm/loongarch.h   |  2 +-
>  arch/loongarch/kernel/time.c             |  6 ++--
>  arch/s390/include/asm/timex.h            | 13 ++++---
>  arch/s390/kernel/time.c                  |  5 +++
>  arch/x86/include/asm/mshyperv.h          |  5 +++
>  arch/x86/include/asm/vdso/gettimeofday.h | 41 ++++++++++++++++------
>  arch/x86/kernel/kvmclock.c               |  4 +--
>  arch/x86/kernel/tsc.c                    | 38 +++++++++++++++-----
>  arch/x86/kvm/x86.c                       |  7 ++--
>  arch/x86/xen/time.c                      |  3 +-
>  drivers/clocksource/arm_arch_timer.c     | 60 ++++++++++++++++++++++++--------
>  drivers/clocksource/hyperv_timer.c       | 44 ++++++++++++++---------
>  drivers/cpuidle/cpuidle.c                |  8 ++---
>  drivers/cpuidle/poll_state.c             |  4 +--
>  include/clocksource/hyperv_timer.h       | 24 +++++--------
>  include/linux/math64.h                   |  2 +-
>  include/linux/rbtree_latch.h             |  2 +-
>  include/linux/sched/clock.h              | 17 ++++++++-
>  include/linux/seqlock.h                  | 15 ++++----
>  kernel/printk/printk.c                   |  2 +-
>  kernel/sched/clock.c                     | 19 ++++++----
>  kernel/time/sched_clock.c                | 24 +++++++++----
>  kernel/time/timekeeping.c                |  4 +--
>  25 files changed, 242 insertions(+), 127 deletions(-)

Based on linux-next20230519, tested the full series in a Hyper-V
VM on x86/x64.   Did mainly a basic smoke test.  Sched clock
appears to work correctly.  Verified correct operation of the TSC page
clocksource and the MSR-based clocksource.  Verified that vDSO
gettimeofday() works with the Hyper-V TSC page clocksource and
is correctly disabled with the Hyper-V MSR-based clocksource.
Tested in a normal VM and in an "SEV-SNP with vTOM" VM.

Tested-by: Michael Kelley <mikelley@microsoft.com>  # Hyper-V

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 06/13] s390/time: Provide sched_clock_noinstr()
  2023-05-19 10:21 ` [PATCH v2 06/13] s390/time: Provide sched_clock_noinstr() Peter Zijlstra
@ 2023-05-22 14:18   ` Heiko Carstens
  2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 41+ messages in thread
From: Heiko Carstens @ 2023-05-22 14:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: bigeasy, mark.rutland, maz, catalin.marinas, will, chenhuacai,
	kernel, gor, agordeev, borntraeger, svens, pbonzini, wanpengli,
	vkuznets, tglx, mingo, bp, dave.hansen, x86, hpa, jgross,
	boris.ostrovsky, daniel.lezcano, kys, haiyangz, wei.liu, decui,
	rafael, longman, boqun.feng, pmladek, senozhatsky, rostedt,
	john.ogness, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, bristot, vschneid, jstultz, sboyd,
	linux-kernel, loongarch, linux-s390, kvm, linux-hyperv, linux-pm

On Fri, May 19, 2023 at 12:21:04PM +0200, Peter Zijlstra wrote:
> With the intent to provide local_clock_noinstr(), a variant of
> local_clock() that's safe to be called from noinstr code (with the
> assumption that any such code will already be non-preemptible),
> prepare for things by providing a noinstr sched_clock_noinstr()
> function.
> 
> Specifically, preempt_enable_*() calls out to schedule(), which upsets
> noinstr validation efforts.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/s390/include/asm/timex.h |   13 +++++++++----
>  arch/s390/kernel/time.c       |   11 ++++++++++-
>  2 files changed, 19 insertions(+), 5 deletions(-)

Acked-by: Heiko Carstens <hca@linux.ibm.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 03/13] arm64/io: Always inline all of __raw_{read,write}[bwlq]()
  2023-05-19 10:21 ` [PATCH v2 03/13] arm64/io: Always inline all of __raw_{read,write}[bwlq]() Peter Zijlstra
@ 2023-05-24 16:40   ` Valentin Schneider
  2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 41+ messages in thread
From: Valentin Schneider @ 2023-05-24 16:40 UTC (permalink / raw)
  To: Peter Zijlstra, bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel,
	hca, gor, agordeev, borntraeger, svens, pbonzini, wanpengli,
	vkuznets, tglx, mingo, bp, dave.hansen, x86, hpa, jgross,
	boris.ostrovsky, daniel.lezcano, kys, haiyangz, wei.liu, decui,
	rafael, peterz, longman, boqun.feng, pmladek, senozhatsky,
	rostedt, john.ogness, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, bristot, jstultz, sboyd,
	linux-kernel, loongarch, linux-s390, kvm, linux-hyperv, linux-pm

On 19/05/23 12:21, Peter Zijlstra wrote:
> The next patch will want to use __raw_readl() from a noinstr section
> and as such that needs to be marked __always_inline to avoid the
> compiler being a silly bugger.
>
> Turns out it already is, but its siblings are not. Finish the work
> started in commit e43f1331e2ef913b ("arm64: Ask the compiler to
> __always_inline functions used by KVM at HYP") for consistenies sake.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Valentin Schneider <vschneid@redhat.com>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 04/13] arm64/arch_timer: Provide noinstr sched_clock_read() functions
  2023-05-19 10:21 ` [PATCH v2 04/13] arm64/arch_timer: Provide noinstr sched_clock_read() functions Peter Zijlstra
@ 2023-05-24 16:40   ` Valentin Schneider
  2023-06-02 11:54     ` Peter Zijlstra
  2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  1 sibling, 1 reply; 41+ messages in thread
From: Valentin Schneider @ 2023-05-24 16:40 UTC (permalink / raw)
  To: Peter Zijlstra, bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel,
	hca, gor, agordeev, borntraeger, svens, pbonzini, wanpengli,
	vkuznets, tglx, mingo, bp, dave.hansen, x86, hpa, jgross,
	boris.ostrovsky, daniel.lezcano, kys, haiyangz, wei.liu, decui,
	rafael, peterz, longman, boqun.feng, pmladek, senozhatsky,
	rostedt, john.ogness, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, bristot, jstultz, sboyd,
	linux-kernel, loongarch, linux-s390, kvm, linux-hyperv, linux-pm

On 19/05/23 12:21, Peter Zijlstra wrote:
> With the intent to provide local_clock_noinstr(), a variant of
> local_clock() that's safe to be called from noinstr code (with the
> assumption that any such code will already be non-preemptible),
> prepare for things by providing a noinstr sched_clock_read() function.
>
> Specifically, preempt_enable_*() calls out to schedule(), which upsets
> noinstr validation efforts.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/arm64/include/asm/arch_timer.h  |    8 ----
>  drivers/clocksource/arm_arch_timer.c |   60 ++++++++++++++++++++++++++---------
>  2 files changed, 47 insertions(+), 21 deletions(-)
>
> --- a/arch/arm64/include/asm/arch_timer.h
> +++ b/arch/arm64/include/asm/arch_timer.h
> @@ -88,13 +88,7 @@ static inline notrace u64 arch_timer_rea
>
>  #define arch_timer_reg_read_stable(reg)					\
>       ({								\
> -		u64 _val;						\
> -									\
> -		preempt_disable_notrace();				\
> -		_val = erratum_handler(read_ ## reg)();			\
> -		preempt_enable_notrace();				\
> -									\
> -		_val;							\
> +		erratum_handler(read_ ## reg)();			\
>       })
>
>  /*
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -191,22 +191,40 @@ u32 arch_timer_reg_read(int access, enum
>       return val;
>  }
>
> -static notrace u64 arch_counter_get_cntpct_stable(void)
> +static noinstr u64 _arch_counter_get_cntpct_stable(void)
>  {
>       return __arch_counter_get_cntpct_stable();
>  }
>

I tripped over the underscores when reviewing this :( This must be
getting close to the symbol length limit, but could we give this a helpful
prefix or suffix? raw_* or *_noinstr?

> @@ -1074,6 +1092,13 @@ struct arch_timer_kvm_info *arch_timer_g
>
>  static void __init arch_counter_register(unsigned type)
>  {
> +	/*
> +	 * Default to cp15 based access because arm64 uses this function for
> +	 * sched_clock() before DT is probed and the cp15 method is guaranteed
> +	 * to exist on arm64. arm doesn't use this before DT is probed so even
> +	 * if we don't have the cp15 accessors we won't have a problem.
> +	 */
> +	u64 (*scr)(void) = arch_counter_get_cntvct;

So this bit sent me on a little spelunking session :-)

From a control flow perspective the initialization isn't required, but then
I looked into the comment and found it comes from the
arch_timer_read_counter() definition... Which itself doesn't get used by
sched_clock() until the sched_clock_register() below!

So AFAICT that comment was true as of

  220069945b29 ("clocksource: arch_timer: Add support for memory mapped timers")

but not after a commit that came 2 months later:

  65cd4f6c99c1 ("arch_timer: Move to generic sched_clock framework")

which IIUC made arm/arm64 follow the default approach of using the
jiffy-based sched_clock() before probing DT/ACPI and registering a "proper"
sched_clock.

All of that to say: the comment about arch_timer_read_counter() vs early
sched_clock() doesn't apply anymore, but I think we need to keep its
initialization around for stuff like get_cycles(). This initialization here
should be OK to put to the bin, though.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 08/13] x86/vdso: Fix gettimeofday masking
  2023-05-19 10:21 ` [PATCH v2 08/13] x86/vdso: Fix gettimeofday masking Peter Zijlstra
@ 2023-05-31 15:27   ` Thomas Gleixner
  2023-05-31 22:46     ` Thomas Gleixner
  2023-05-31 22:46   ` Thomas Gleixner
  2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  2 siblings, 1 reply; 41+ messages in thread
From: Thomas Gleixner @ 2023-05-31 15:27 UTC (permalink / raw)
  To: Peter Zijlstra, bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel,
	hca, gor, agordeev, borntraeger, svens, pbonzini, wanpengli,
	vkuznets, mingo, bp, dave.hansen, x86, hpa, jgross,
	boris.ostrovsky, daniel.lezcano, kys, haiyangz, wei.liu, decui,
	rafael, peterz, longman, boqun.feng, pmladek, senozhatsky,
	rostedt, john.ogness, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, bristot, vschneid, jstultz,
	sboyd, linux-kernel, loongarch, linux-s390, kvm, linux-hyperv,
	linux-pm

On Fri, May 19 2023 at 12:21, Peter Zijlstra wrote:
> Because of how the virtual clocks use U64_MAX as an exception value
> instead of a valid time, the clocks can no longer be assumed to wrap
> cleanly. This is then compounded by arch_vdso_cycles_ok() rejecting
> everything with the MSB/Sign-bit set.
>
> Therefore, the effective mask becomes S64_MAX, and the comment with
> vdso_calc_delta() that states the mask is U64_MAX and isn't optimized
> out is just plain silly.
>
> Now, the code has a negative filter -- to deal with TSC wobbles:
>
> 	if (cycles > last)
>
> which is just plain wrong, because it should've been written as:
>
> 	if ((s64)(cycles - last) > 0)
>
> to take wrapping into account, but per all the above, we don't
> actually wrap on u64 anymore.

Indeed. The rationale was that you need ~146 years uptime with a 4GHz
TSC or ~584 years with 1GHz to actually reach the wrap around point.
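(As a back-of-envelope check of those figures -- the full u64 wrap, before any
S64_MAX masking -- here is a standalone userspace sketch, nothing kernel
specific about it:

	#include <stdio.h>

	int main(void)
	{
		const double secs_per_year = 365.25 * 24 * 3600;
		const double rates[] = { 4e9, 1e9 };	/* 4 GHz and 1 GHz TSC */

		for (int i = 0; i < 2; i++)
			printf("%.0f GHz -> ~%.0f years to wrap 2^64\n",
			       rates[i] / 1e9,
			       18446744073709551616.0 / rates[i] / secs_per_year);
		return 0;
	}

which prints roughly 146 and 585 years, in line with the figures above.)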

Though I can see your point to make sure that silly BIOSes or VMMs
cannot cause havoc by accident or malice.

Did anyone ever validate that wrap around on TSC including TSC deadline
timer works correctly?

I have faint memories of TSC_ADJUST, which I prefer not to bring back to
main memory :)

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 08/13] x86/vdso: Fix gettimeofday masking
  2023-05-31 15:27   ` Thomas Gleixner
@ 2023-05-31 22:46     ` Thomas Gleixner
  0 siblings, 0 replies; 41+ messages in thread
From: Thomas Gleixner @ 2023-05-31 22:46 UTC (permalink / raw)
  To: Peter Zijlstra, bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel,
	hca, gor, agordeev, borntraeger, svens, pbonzini, wanpengli,
	vkuznets, mingo, bp, dave.hansen, x86, hpa, jgross,
	boris.ostrovsky, daniel.lezcano, kys, haiyangz, wei.liu, decui,
	rafael, peterz, longman, boqun.feng, pmladek, senozhatsky,
	rostedt, john.ogness, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, bristot, vschneid, jstultz,
	sboyd, linux-kernel, loongarch, linux-s390, kvm, linux-hyperv,
	linux-pm

On Wed, May 31 2023 at 17:27, Thomas Gleixner wrote:
> On Fri, May 19 2023 at 12:21, Peter Zijlstra wrote:
>> to take wrapping into account, but per all the above, we don't
>> actually wrap on u64 anymore.
>
> Indeed. The rationale was that you need ~146 years uptime with a 4GHz
> TSC or ~584 years with 1GHz to actually reach the wrap around point.
>
> Though I can see your point to make sure that silly BIOSes or VMMs
> cannot cause havoc by accident or malice.
>
> Did anyone ever validate that wrap around on TSC including TSC deadline
> timer works correctly?
>
> I have faint memories of TSC_ADJUST, which I prefer not to bring back to
> main memory :)

It seems my fears have been unjustified.

At least with a quick test which sets the TSC to ~ -8min @ 2.1GHz, the
machine seems to survive without the colourful explosions I expected due
to my early exposure to TSC_ADJUST and TSC_DEADLINE_TIMER :)

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 08/13] x86/vdso: Fix gettimeofday masking
  2023-05-19 10:21 ` [PATCH v2 08/13] x86/vdso: Fix gettimeofday masking Peter Zijlstra
  2023-05-31 15:27   ` Thomas Gleixner
@ 2023-05-31 22:46   ` Thomas Gleixner
  2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  2 siblings, 0 replies; 41+ messages in thread
From: Thomas Gleixner @ 2023-05-31 22:46 UTC (permalink / raw)
  To: Peter Zijlstra, bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel,
	hca, gor, agordeev, borntraeger, svens, pbonzini, wanpengli,
	vkuznets, mingo, bp, dave.hansen, x86, hpa, jgross,
	boris.ostrovsky, daniel.lezcano, kys, haiyangz, wei.liu, decui,
	rafael, peterz, longman, boqun.feng, pmladek, senozhatsky,
	rostedt, john.ogness, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, bristot, vschneid, jstultz,
	sboyd, linux-kernel, loongarch, linux-s390, kvm, linux-hyperv,
	linux-pm

On Fri, May 19 2023 at 12:21, Peter Zijlstra wrote:
> to take wrapping into account, but per all the above, we don't
> actually wrap on u64 anymore.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Tested-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 04/13] arm64/arch_timer: Provide noinstr sched_clock_read() functions
  2023-05-24 16:40   ` Valentin Schneider
@ 2023-06-02 11:54     ` Peter Zijlstra
  2023-06-07  8:58       ` Valentin Schneider
  0 siblings, 1 reply; 41+ messages in thread
From: Peter Zijlstra @ 2023-06-02 11:54 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: bigeasy, mark.rutland, maz, catalin.marinas, will, chenhuacai,
	kernel, hca, gor, agordeev, borntraeger, svens, pbonzini,
	wanpengli, vkuznets, tglx, mingo, bp, dave.hansen, x86, hpa,
	jgross, boris.ostrovsky, daniel.lezcano, kys, haiyangz, wei.liu,
	decui, rafael, longman, boqun.feng, pmladek, senozhatsky,
	rostedt, john.ogness, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, bristot, jstultz, sboyd,
	linux-kernel, loongarch, linux-s390, kvm, linux-hyperv, linux-pm

On Wed, May 24, 2023 at 05:40:47PM +0100, Valentin Schneider wrote:
> On 19/05/23 12:21, Peter Zijlstra wrote:
> > With the intent to provide local_clock_noinstr(), a variant of
> > local_clock() that's safe to be called from noinstr code (with the
> > assumption that any such code will already be non-preemptible),
> > prepare for things by providing a noinstr sched_clock_read() function.
> >
> > Specifically, preempt_enable_*() calls out to schedule(), which upsets
> > noinstr validation efforts.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  arch/arm64/include/asm/arch_timer.h  |    8 ----
> >  drivers/clocksource/arm_arch_timer.c |   60 ++++++++++++++++++++++++++---------
> >  2 files changed, 47 insertions(+), 21 deletions(-)
> >
> > --- a/arch/arm64/include/asm/arch_timer.h
> > +++ b/arch/arm64/include/asm/arch_timer.h
> > @@ -88,13 +88,7 @@ static inline notrace u64 arch_timer_rea
> >
> >  #define arch_timer_reg_read_stable(reg)					\
> >       ({								\
> > -		u64 _val;						\
> > -									\
> > -		preempt_disable_notrace();				\
> > -		_val = erratum_handler(read_ ## reg)();			\
> > -		preempt_enable_notrace();				\
> > -									\
> > -		_val;							\
> > +		erratum_handler(read_ ## reg)();			\
> >       })
> >
> >  /*
> > --- a/drivers/clocksource/arm_arch_timer.c
> > +++ b/drivers/clocksource/arm_arch_timer.c
> > @@ -191,22 +191,40 @@ u32 arch_timer_reg_read(int access, enum
> >       return val;
> >  }
> >
> > -static notrace u64 arch_counter_get_cntpct_stable(void)
> > +static noinstr u64 _arch_counter_get_cntpct_stable(void)
> >  {
> >       return __arch_counter_get_cntpct_stable();
> >  }
> >
> 
> I tripped over the underscores when reviewing this :( This must be
> getting close to the symbol length limit, but could we give this a helpful
> prefix or suffix? raw_* or *_noinstr?
> 
> > @@ -1074,6 +1092,13 @@ struct arch_timer_kvm_info *arch_timer_g
> >
> >  static void __init arch_counter_register(unsigned type)
> >  {
> > +	/*
> > +	 * Default to cp15 based access because arm64 uses this function for
> > +	 * sched_clock() before DT is probed and the cp15 method is guaranteed
> > +	 * to exist on arm64. arm doesn't use this before DT is probed so even
> > +	 * if we don't have the cp15 accessors we won't have a problem.
> > +	 */
> > +	u64 (*scr)(void) = arch_counter_get_cntvct;
> 
> So this bit sent me on a little spelunking session :-)
> 
> From a control flow perspective the initialization isn't required, but then
> I looked into the comment and found it comes from the
> arch_timer_read_counter() definition... Which itself doesn't get used by
> sched_clock() until the sched_clock_register() below!
> 
> So AFAICT that comment was true as of
> 
>   220069945b29 ("clocksource: arch_timer: Add support for memory mapped timers")
> 
> but not after a commit that came 2 months later:
> 
>   65cd4f6c99c1 ("arch_timer: Move to generic sched_clock framework")
> 
> which IIUC made arm/arm64 follow the default approach of using the
> jiffy-based sched_clock() before probing DT/ACPI and registering a "proper"
> sched_clock.
> 
> All of that to say: the comment about arch_timer_read_counter() vs early
> sched_clock() doesn't apply anymore, but I think we need to keep its
> initialization around for stuff like get_cycles(). This initialization here
> should be OK to put to the bin, though.

Something like the below folded in then?

---
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -191,7 +191,7 @@ u32 arch_timer_reg_read(int access, enum
 	return val;
 }
 
-static noinstr u64 _arch_counter_get_cntpct_stable(void)
+static noinstr u64 raw_counter_get_cntpct_stable(void)
 {
 	return __arch_counter_get_cntpct_stable();
 }
@@ -210,7 +210,7 @@ static noinstr u64 arch_counter_get_cntp
 	return __arch_counter_get_cntpct();
 }
 
-static noinstr u64 _arch_counter_get_cntvct_stable(void)
+static noinstr u64 raw_counter_get_cntvct_stable(void)
 {
 	return __arch_counter_get_cntvct_stable();
 }
@@ -1092,13 +1092,7 @@ struct arch_timer_kvm_info *arch_timer_g
 
 static void __init arch_counter_register(unsigned type)
 {
-	/*
-	 * Default to cp15 based access because arm64 uses this function for
-	 * sched_clock() before DT is probed and the cp15 method is guaranteed
-	 * to exist on arm64. arm doesn't use this before DT is probed so even
-	 * if we don't have the cp15 accessors we won't have a problem.
-	 */
-	u64 (*scr)(void) = arch_counter_get_cntvct;
+	u64 (*scr)(void);
 	u64 start_count;
 	int width;
 
@@ -1110,7 +1104,7 @@ static void __init arch_counter_register
 		    arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI) {
 			if (arch_timer_counter_has_wa()) {
 				rd = arch_counter_get_cntvct_stable;
-				scr = _arch_counter_get_cntvct_stable;
+				scr = raw_counter_get_cntvct_stable;
 			} else {
 				rd = arch_counter_get_cntvct;
 				scr = arch_counter_get_cntvct;
@@ -1118,7 +1112,7 @@ static void __init arch_counter_register
 		} else {
 			if (arch_timer_counter_has_wa()) {
 				rd = arch_counter_get_cntpct_stable;
-				scr = _arch_counter_get_cntpct_stable;
+				scr = raw_counter_get_cntpct_stable;
 			} else {
 				rd = arch_counter_get_cntpct;
 				scr = arch_counter_get_cntpct;

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [tip: sched/core] cpuidle: Use local_clock_noinstr()
  2023-05-19 10:21 ` [PATCH v2 13/13] cpuidle: Use local_clock_noinstr() Peter Zijlstra
  2023-05-19 14:13   ` Rafael J. Wysocki
@ 2023-06-05 19:16   ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 41+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2023-06-05 19:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Rafael J. Wysocki, Michael Kelley, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     e6a15fa9ea8372ad4db973191233f743ae1081d5
Gitweb:        https://git.kernel.org/tip/e6a15fa9ea8372ad4db973191233f743ae1081d5
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Fri, 19 May 2023 12:21:11 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 05 Jun 2023 21:11:09 +02:00

cpuidle: Use local_clock_noinstr()

With the introduction of local_clock_noinstr(), local_clock() itself
is no longer marked noinstr, use the correct function.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>  # Hyper-V
Link: https://lore.kernel.org/r/20230519102716.045980863@infradead.org
---
 drivers/cpuidle/cpuidle.c    | 8 ++++----
 drivers/cpuidle/poll_state.c | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 8e929f6..737a026 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -145,7 +145,7 @@ static noinstr void enter_s2idle_proper(struct cpuidle_driver *drv,
 
 	instrumentation_begin();
 
-	time_start = ns_to_ktime(local_clock());
+	time_start = ns_to_ktime(local_clock_noinstr());
 
 	tick_freeze();
 	/*
@@ -169,7 +169,7 @@ static noinstr void enter_s2idle_proper(struct cpuidle_driver *drv,
 	tick_unfreeze();
 	start_critical_timings();
 
-	time_end = ns_to_ktime(local_clock());
+	time_end = ns_to_ktime(local_clock_noinstr());
 
 	dev->states_usage[index].s2idle_time += ktime_us_delta(time_end, time_start);
 	dev->states_usage[index].s2idle_usage++;
@@ -243,7 +243,7 @@ noinstr int cpuidle_enter_state(struct cpuidle_device *dev,
 	sched_idle_set_state(target_state);
 
 	trace_cpu_idle(index, dev->cpu);
-	time_start = ns_to_ktime(local_clock());
+	time_start = ns_to_ktime(local_clock_noinstr());
 
 	stop_critical_timings();
 	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE)) {
@@ -276,7 +276,7 @@ noinstr int cpuidle_enter_state(struct cpuidle_device *dev,
 	start_critical_timings();
 
 	sched_clock_idle_wakeup_event();
-	time_end = ns_to_ktime(local_clock());
+	time_end = ns_to_ktime(local_clock_noinstr());
 	trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);
 
 	/* The cpu is no longer idle or about to enter idle. */
diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
index bdcfeae..9b6d90a 100644
--- a/drivers/cpuidle/poll_state.c
+++ b/drivers/cpuidle/poll_state.c
@@ -15,7 +15,7 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev,
 {
 	u64 time_start;
 
-	time_start = local_clock();
+	time_start = local_clock_noinstr();
 
 	dev->poll_time_limit = false;
 
@@ -32,7 +32,7 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev,
 				continue;
 
 			loop_count = 0;
-			if (local_clock() - time_start > limit) {
+			if (local_clock_noinstr() - time_start > limit) {
 				dev->poll_time_limit = true;
 				break;
 			}

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [tip: sched/core] sched/clock: Provide local_clock_noinstr()
  2023-05-19 10:21 ` [PATCH v2 12/13] sched/clock: Provide local_clock_noinstr() Peter Zijlstra
@ 2023-06-05 19:16   ` tip-bot2 for Peter Zijlstra
  0 siblings, 0 replies; 41+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2023-06-05 19:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Michael Kelley, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     fb7d4948c4da2dbd26da4b7ec76bbd2f19ff862a
Gitweb:        https://git.kernel.org/tip/fb7d4948c4da2dbd26da4b7ec76bbd2f19ff862a
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Fri, 19 May 2023 12:21:10 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 05 Jun 2023 21:11:09 +02:00

sched/clock: Provide local_clock_noinstr()

Now that all ARCH_WANTS_NO_INSTR architectures (arm64, loongarch,
s390, x86) provide sched_clock_noinstr(), use this to provide
local_clock_noinstr().

This local_clock_noinstr() will be safe to use from noinstr code with
the assumption that any such noinstr code is non-preemptible (it had
better be, entry code will have IRQs disabled while __cpuidle must
have preemption disabled).

Specifically, preempt_enable_notrace(), a common part of many a
sched_clock() implementation calls out to schedule() -- even though,
per the above, it will never trigger -- which frustrates noinstr
validation.

  vmlinux.o: warning: objtool: local_clock+0xb5: call to preempt_schedule_notrace_thunk() leaves .noinstr.text section
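
As a minimal usage sketch (hypothetical caller and message, not part of this
commit), the intended pattern for noinstr code is:

	/* runs with IRQs or preemption already disabled, which is what
	 * makes the unprotected per-CPU access inside
	 * local_clock_noinstr() safe here */
	noinstr void my_idle_helper(void)
	{
		u64 t0, t1;

		t0 = local_clock_noinstr();
		/* ... architectural idle / entry work ... */
		t1 = local_clock_noinstr();

		instrumentation_begin();
		pr_debug("residency: %llu ns\n", t1 - t0);
		instrumentation_end();
	}

Instrumented calls such as pr_debug() stay inside an instrumentation_begin()/
instrumentation_end() pair, mirroring what the cpuidle patch in this series
does.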

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>  # Hyper-V
Link: https://lore.kernel.org/r/20230519102715.978624636@infradead.org
---
 include/linux/sched/clock.h | 17 ++++++++++++++++-
 kernel/sched/clock.c        | 19 +++++++++++++------
 2 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/include/linux/sched/clock.h b/include/linux/sched/clock.h
index ca008f7..196f0ca 100644
--- a/include/linux/sched/clock.h
+++ b/include/linux/sched/clock.h
@@ -12,7 +12,16 @@
  *
  * Please use one of the three interfaces below.
  */
-extern unsigned long long notrace sched_clock(void);
+extern u64 sched_clock(void);
+
+#if defined(CONFIG_ARCH_WANTS_NO_INSTR) || defined(CONFIG_GENERIC_SCHED_CLOCK)
+extern u64 sched_clock_noinstr(void);
+#else
+static __always_inline u64 sched_clock_noinstr(void)
+{
+	return sched_clock();
+}
+#endif
 
 /*
  * See the comment in kernel/sched/clock.c
@@ -45,6 +54,11 @@ static inline u64 cpu_clock(int cpu)
 	return sched_clock();
 }
 
+static __always_inline u64 local_clock_noinstr(void)
+{
+	return sched_clock_noinstr();
+}
+
 static __always_inline u64 local_clock(void)
 {
 	return sched_clock();
@@ -79,6 +93,7 @@ static inline u64 cpu_clock(int cpu)
 	return sched_clock_cpu(cpu);
 }
 
+extern u64 local_clock_noinstr(void);
 extern u64 local_clock(void);
 
 #endif
diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index b5cc2b5..5a575a0 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -266,7 +266,7 @@ static __always_inline u64 sched_clock_local(struct sched_clock_data *scd)
 	s64 delta;
 
 again:
-	now = sched_clock();
+	now = sched_clock_noinstr();
 	delta = now - scd->tick_raw;
 	if (unlikely(delta < 0))
 		delta = 0;
@@ -293,22 +293,29 @@ again:
 	return clock;
 }
 
-noinstr u64 local_clock(void)
+noinstr u64 local_clock_noinstr(void)
 {
 	u64 clock;
 
 	if (static_branch_likely(&__sched_clock_stable))
-		return sched_clock() + __sched_clock_offset;
+		return sched_clock_noinstr() + __sched_clock_offset;
 
 	if (!static_branch_likely(&sched_clock_running))
-		return sched_clock();
+		return sched_clock_noinstr();
 
-	preempt_disable_notrace();
 	clock = sched_clock_local(this_scd());
-	preempt_enable_notrace();
 
 	return clock;
 }
+
+u64 local_clock(void)
+{
+	u64 now;
+	preempt_disable_notrace();
+	now = local_clock_noinstr();
+	preempt_enable_notrace();
+	return now;
+}
 EXPORT_SYMBOL_GPL(local_clock);
 
 static notrace u64 sched_clock_remote(struct sched_clock_data *scd)

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [tip: sched/core] x86/tsc: Provide sched_clock_noinstr()
  2023-05-19 10:21 ` [PATCH v2 11/13] x86/tsc: Provide sched_clock_noinstr() Peter Zijlstra
@ 2023-06-05 19:16   ` tip-bot2 for Peter Zijlstra
  0 siblings, 0 replies; 41+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2023-06-05 19:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Michael Kelley, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     5c5e9a2b25b6a79d4b7a5f2a54d02ef1c36dc35a
Gitweb:        https://git.kernel.org/tip/5c5e9a2b25b6a79d4b7a5f2a54d02ef1c36dc35a
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Fri, 19 May 2023 12:21:09 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 05 Jun 2023 21:11:08 +02:00

x86/tsc: Provide sched_clock_noinstr()

With the intent to provide local_clock_noinstr(), a variant of
local_clock() that's safe to be called from noinstr code (with the
assumption that any such code will already be non-preemptible),
prepare for things by providing a noinstr sched_clock_noinstr()
function.

Specifically, preempt_enable_*() calls out to schedule(), which upsets
noinstr validation efforts.

  vmlinux.o: warning: objtool: native_sched_clock+0x96: call to preempt_schedule_notrace_thunk() leaves .noinstr.text section
  vmlinux.o: warning: objtool: kvm_clock_read+0x22: call to preempt_schedule_notrace_thunk() leaves .noinstr.text section
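
Reduced to its essentials, the shape of the change (illustrative names only;
the real functions are in the diff below) is to split each clock read into an
inner helper that assumes the caller is already non-preemptible and an outer
wrapper that keeps the old preempt-safe behaviour:

	static __always_inline u64 __read_clock(void)
	{
		return rdtsc();		/* stands in for the per-CPU data read */
	}

	notrace u64 read_clock(void)
	{
		u64 now;

		preempt_disable_notrace();
		now = __read_clock();
		preempt_enable_notrace();

		return now;
	}

noinstr code calls the inner variant directly and so never reaches the
preempt_schedule_notrace() thunk that objtool is warning about.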

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>  # Hyper-V
Link: https://lore.kernel.org/r/20230519102715.910937674@infradead.org
---
 arch/x86/kernel/kvmclock.c |  4 ++--
 arch/x86/kernel/tsc.c      | 38 ++++++++++++++++++++++++++++---------
 arch/x86/xen/time.c        |  3 +--
 3 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 0f35d44..fb8f521 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -71,7 +71,7 @@ static int kvm_set_wallclock(const struct timespec64 *now)
 	return -ENODEV;
 }
 
-static noinstr u64 kvm_clock_read(void)
+static u64 kvm_clock_read(void)
 {
 	u64 ret;
 
@@ -88,7 +88,7 @@ static u64 kvm_clock_get_cycles(struct clocksource *cs)
 
 static noinstr u64 kvm_sched_clock_read(void)
 {
-	return kvm_clock_read() - kvm_sched_clock_offset;
+	return pvclock_clocksource_read_nowd(this_cpu_pvti()) - kvm_sched_clock_offset;
 }
 
 static inline void kvm_sched_clock_init(bool stable)
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 3446988..782a90e 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -69,12 +69,10 @@ static int __init tsc_early_khz_setup(char *buf)
 }
 early_param("tsc_early_khz", tsc_early_khz_setup);
 
-__always_inline void cyc2ns_read_begin(struct cyc2ns_data *data)
+__always_inline void __cyc2ns_read(struct cyc2ns_data *data)
 {
 	int seq, idx;
 
-	preempt_disable_notrace();
-
 	do {
 		seq = this_cpu_read(cyc2ns.seq.seqcount.sequence);
 		idx = seq & 1;
@@ -86,6 +84,12 @@ __always_inline void cyc2ns_read_begin(struct cyc2ns_data *data)
 	} while (unlikely(seq != this_cpu_read(cyc2ns.seq.seqcount.sequence)));
 }
 
+__always_inline void cyc2ns_read_begin(struct cyc2ns_data *data)
+{
+	preempt_disable_notrace();
+	__cyc2ns_read(data);
+}
+
 __always_inline void cyc2ns_read_end(void)
 {
 	preempt_enable_notrace();
@@ -115,18 +119,25 @@ __always_inline void cyc2ns_read_end(void)
  *                      -johnstul@us.ibm.com "math is hard, lets go shopping!"
  */
 
-static __always_inline unsigned long long cycles_2_ns(unsigned long long cyc)
+static __always_inline unsigned long long __cycles_2_ns(unsigned long long cyc)
 {
 	struct cyc2ns_data data;
 	unsigned long long ns;
 
-	cyc2ns_read_begin(&data);
+	__cyc2ns_read(&data);
 
 	ns = data.cyc2ns_offset;
 	ns += mul_u64_u32_shr(cyc, data.cyc2ns_mul, data.cyc2ns_shift);
 
-	cyc2ns_read_end();
+	return ns;
+}
 
+static __always_inline unsigned long long cycles_2_ns(unsigned long long cyc)
+{
+	unsigned long long ns;
+	preempt_disable_notrace();
+	ns = __cycles_2_ns(cyc);
+	preempt_enable_notrace();
 	return ns;
 }
 
@@ -223,7 +234,7 @@ noinstr u64 native_sched_clock(void)
 		u64 tsc_now = rdtsc();
 
 		/* return the value in ns */
-		return cycles_2_ns(tsc_now);
+		return __cycles_2_ns(tsc_now);
 	}
 
 	/*
@@ -250,7 +261,7 @@ u64 native_sched_clock_from_tsc(u64 tsc)
 /* We need to define a real function for sched_clock, to override the
    weak default version */
 #ifdef CONFIG_PARAVIRT
-noinstr u64 sched_clock(void)
+noinstr u64 sched_clock_noinstr(void)
 {
 	return paravirt_sched_clock();
 }
@@ -260,11 +271,20 @@ bool using_native_sched_clock(void)
 	return static_call_query(pv_sched_clock) == native_sched_clock;
 }
 #else
-u64 sched_clock(void) __attribute__((alias("native_sched_clock")));
+u64 sched_clock_noinstr(void) __attribute__((alias("native_sched_clock")));
 
 bool using_native_sched_clock(void) { return true; }
 #endif
 
+notrace u64 sched_clock(void)
+{
+	u64 now;
+	preempt_disable_notrace();
+	now = sched_clock_noinstr();
+	preempt_enable_notrace();
+	return now;
+}
+
 int check_tsc_unstable(void)
 {
 	return tsc_unstable;
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index b74ac25..52fa560 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -66,11 +66,10 @@ static noinstr u64 xen_sched_clock(void)
         struct pvclock_vcpu_time_info *src;
 	u64 ret;
 
-	preempt_disable_notrace();
 	src = &__this_cpu_read(xen_vcpu)->time;
 	ret = pvclock_clocksource_read_nowd(src);
 	ret -= xen_sched_clock_offset;
-	preempt_enable_notrace();
+
 	return ret;
 }
 

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [tip: sched/core] clocksource: hyper-v: Provide noinstr sched_clock()
  2023-05-19 10:21 ` [PATCH v2 10/13] clocksource: hyper-v: Provide noinstr sched_clock() Peter Zijlstra
@ 2023-06-05 19:16   ` tip-bot2 for Peter Zijlstra
  0 siblings, 0 replies; 41+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2023-06-05 19:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Michael Kelley, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     e39acc37db34f6688e2c16e958fb1d662c422c81
Gitweb:        https://git.kernel.org/tip/e39acc37db34f6688e2c16e958fb1d662c422c81
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Fri, 19 May 2023 12:21:08 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 05 Jun 2023 21:11:08 +02:00

clocksource: hyper-v: Provide noinstr sched_clock()

With the intent to provide local_clock_noinstr(), a variant of
local_clock() that's safe to be called from noinstr code (with the
assumption that any such code will already be non-preemptible),
prepare for things by making the Hyper-V TSC and MSR sched_clock
implementations noinstr.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Co-developed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>  # Hyper-V
Link: https://lore.kernel.org/r/20230519102715.843039089@infradead.org
---
 arch/x86/include/asm/mshyperv.h    |  5 +++++-
 drivers/clocksource/hyperv_timer.c | 32 ++++++++++++++++-------------
 2 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 49bb4f2..88d9ef9 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -257,6 +257,11 @@ void hv_set_register(unsigned int reg, u64 value);
 u64 hv_get_non_nested_register(unsigned int reg);
 void hv_set_non_nested_register(unsigned int reg, u64 value);
 
+static __always_inline u64 hv_raw_get_register(unsigned int reg)
+{
+	return __rdmsr(reg);
+}
+
 #else /* CONFIG_HYPERV */
 static inline void hyperv_init(void) {}
 static inline void hyperv_setup_mmu_ops(void) {}
diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index c643bfe..d851970 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -365,6 +365,20 @@ void hv_stimer_global_cleanup(void)
 }
 EXPORT_SYMBOL_GPL(hv_stimer_global_cleanup);
 
+static __always_inline u64 read_hv_clock_msr(void)
+{
+	/*
+	 * Read the partition counter to get the current tick count. This count
+	 * is set to 0 when the partition is created and is incremented in 100
+	 * nanosecond units.
+	 *
+	 * Use hv_raw_get_register() because this function is used from
+	 * noinstr. Notable; while HV_REGISTER_TIME_REF_COUNT is a synthetic
+	 * register it doesn't need the GHCB path.
+	 */
+	return hv_raw_get_register(HV_REGISTER_TIME_REF_COUNT);
+}
+
 /*
  * Code and definitions for the Hyper-V clocksources.  Two
  * clocksources are defined: one that reads the Hyper-V defined MSR, and
@@ -393,7 +407,7 @@ struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
 }
 EXPORT_SYMBOL_GPL(hv_get_tsc_page);
 
-static notrace u64 read_hv_clock_tsc(void)
+static __always_inline u64 read_hv_clock_tsc(void)
 {
 	u64 cur_tsc, time;
 
@@ -404,7 +418,7 @@ static notrace u64 read_hv_clock_tsc(void)
 	 * to the MSR in case the TSC page indicates unavailability.
 	 */
 	if (!hv_read_tsc_page_tsc(tsc_page, &cur_tsc, &time))
-		time = hv_get_register(HV_REGISTER_TIME_REF_COUNT);
+		time = read_hv_clock_msr();
 
 	return time;
 }
@@ -414,7 +428,7 @@ static u64 notrace read_hv_clock_tsc_cs(struct clocksource *arg)
 	return read_hv_clock_tsc();
 }
 
-static u64 notrace read_hv_sched_clock_tsc(void)
+static u64 noinstr read_hv_sched_clock_tsc(void)
 {
 	return (read_hv_clock_tsc() - hv_sched_clock_offset) *
 		(NSEC_PER_SEC / HV_CLOCK_HZ);
@@ -466,22 +480,12 @@ static struct clocksource hyperv_cs_tsc = {
 #endif
 };
 
-static u64 notrace read_hv_clock_msr(void)
-{
-	/*
-	 * Read the partition counter to get the current tick count. This count
-	 * is set to 0 when the partition is created and is incremented in
-	 * 100 nanosecond units.
-	 */
-	return hv_get_register(HV_REGISTER_TIME_REF_COUNT);
-}
-
 static u64 notrace read_hv_clock_msr_cs(struct clocksource *arg)
 {
 	return read_hv_clock_msr();
 }
 
-static u64 notrace read_hv_sched_clock_msr(void)
+static u64 noinstr read_hv_sched_clock_msr(void)
 {
 	return (read_hv_clock_msr() - hv_sched_clock_offset) *
 		(NSEC_PER_SEC / HV_CLOCK_HZ);

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [tip: sched/core] clocksource: hyper-v: Adjust hv_read_tsc_page_tsc() to avoid special casing U64_MAX
  2023-05-19 10:21 ` [PATCH v2 09/13] clocksource: hyper-v: Adjust hv_read_tsc_page_tsc() to avoid special casing U64_MAX Peter Zijlstra
  2023-05-19 18:38   ` Michael Kelley (LINUX)
@ 2023-06-05 19:16   ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 41+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2023-06-05 19:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Michael Kelley, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     9397fa2ea3e7634f61da1ab76b9eb88ba04dfdfc
Gitweb:        https://git.kernel.org/tip/9397fa2ea3e7634f61da1ab76b9eb88ba04dfdfc
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Fri, 19 May 2023 12:21:07 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 05 Jun 2023 21:11:07 +02:00

clocksource: hyper-v: Adjust hv_read_tsc_page_tsc() to avoid special casing U64_MAX

Currently hv_read_tsc_page_tsc() (ab)uses the (valid) time value of
U64_MAX as an error return. This breaks the clean wrap-around of the
clock.

Modify the function signature to return a boolean state and provide
another u64 pointer to store the actual time on success. This obviates
the need to steal one time value and restores the full counter width.
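
A caller of the new interface then looks like this (sketch only, mirroring
the read_hv_clock_tsc() hunk below):

	u64 cur_tsc, time;

	if (hv_read_tsc_page_tsc(tsc_page, &cur_tsc, &time))
		return time;	/* success, full 64-bit width preserved */

	/* TSC page unavailable, fall back to the reference count MSR */
	return hv_get_register(HV_REGISTER_TIME_REF_COUNT);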

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>  # Hyper-V
Link: https://lore.kernel.org/r/20230519102715.775630881@infradead.org
---
 arch/x86/include/asm/vdso/gettimeofday.h | 10 ++++++----
 arch/x86/kvm/x86.c                       |  7 +++----
 drivers/clocksource/hyperv_timer.c       | 16 ++++++++++-----
 include/clocksource/hyperv_timer.h       | 24 ++++++++---------------
 4 files changed, 29 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/vdso/gettimeofday.h b/arch/x86/include/asm/vdso/gettimeofday.h
index 0badf0a..c81858d 100644
--- a/arch/x86/include/asm/vdso/gettimeofday.h
+++ b/arch/x86/include/asm/vdso/gettimeofday.h
@@ -238,10 +238,12 @@ static u64 vread_pvclock(void)
 #ifdef CONFIG_HYPERV_TIMER
 static u64 vread_hvclock(void)
 {
-	u64 ret = hv_read_tsc_page(&hvclock_page);
-	if (likely(ret != U64_MAX))
-		ret &= S64_MAX;
-	return ret;
+	u64 tsc, time;
+
+	if (hv_read_tsc_page_tsc(&hvclock_page, &tsc, &time))
+		return time & S64_MAX;
+
+	return U64_MAX;
 }
 #endif
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ceb7c5e..99d97ba 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2799,14 +2799,13 @@ static u64 read_tsc(void)
 static inline u64 vgettsc(struct pvclock_clock *clock, u64 *tsc_timestamp,
 			  int *mode)
 {
-	long v;
 	u64 tsc_pg_val;
+	long v;
 
 	switch (clock->vclock_mode) {
 	case VDSO_CLOCKMODE_HVCLOCK:
-		tsc_pg_val = hv_read_tsc_page_tsc(hv_get_tsc_page(),
-						  tsc_timestamp);
-		if (tsc_pg_val != U64_MAX) {
+		if (hv_read_tsc_page_tsc(hv_get_tsc_page(),
+					 tsc_timestamp, &tsc_pg_val)) {
 			/* TSC page valid */
 			*mode = VDSO_CLOCKMODE_HVCLOCK;
 			v = (tsc_pg_val - clock->cycle_last) &
diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index bcd9042..c643bfe 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -393,14 +393,20 @@ struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
 }
 EXPORT_SYMBOL_GPL(hv_get_tsc_page);
 
-static u64 notrace read_hv_clock_tsc(void)
+static notrace u64 read_hv_clock_tsc(void)
 {
-	u64 current_tick = hv_read_tsc_page(hv_get_tsc_page());
+	u64 cur_tsc, time;
 
-	if (current_tick == U64_MAX)
-		current_tick = hv_get_register(HV_REGISTER_TIME_REF_COUNT);
+	/*
+	 * The Hyper-V Top-Level Function Spec (TLFS), section Timers,
+	 * subsection Refererence Counter, guarantees that the TSC and MSR
+	 * times are in sync and monotonic. Therefore we can fall back
+	 * to the MSR in case the TSC page indicates unavailability.
+	 */
+	if (!hv_read_tsc_page_tsc(tsc_page, &cur_tsc, &time))
+		time = hv_get_register(HV_REGISTER_TIME_REF_COUNT);
 
-	return current_tick;
+	return time;
 }
 
 static u64 notrace read_hv_clock_tsc_cs(struct clocksource *arg)
diff --git a/include/clocksource/hyperv_timer.h b/include/clocksource/hyperv_timer.h
index 536f897..6cdc873 100644
--- a/include/clocksource/hyperv_timer.h
+++ b/include/clocksource/hyperv_timer.h
@@ -38,8 +38,9 @@ extern void hv_remap_tsc_clocksource(void);
 extern unsigned long hv_get_tsc_pfn(void);
 extern struct ms_hyperv_tsc_page *hv_get_tsc_page(void);
 
-static inline notrace u64
-hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg, u64 *cur_tsc)
+static __always_inline bool
+hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
+		     u64 *cur_tsc, u64 *time)
 {
 	u64 scale, offset;
 	u32 sequence;
@@ -63,7 +64,7 @@ hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg, u64 *cur_tsc)
 	do {
 		sequence = READ_ONCE(tsc_pg->tsc_sequence);
 		if (!sequence)
-			return U64_MAX;
+			return false;
 		/*
 		 * Make sure we read sequence before we read other values from
 		 * TSC page.
@@ -82,15 +83,8 @@ hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg, u64 *cur_tsc)
 
 	} while (READ_ONCE(tsc_pg->tsc_sequence) != sequence);
 
-	return mul_u64_u64_shr(*cur_tsc, scale, 64) + offset;
-}
-
-static inline notrace u64
-hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
-{
-	u64 cur_tsc;
-
-	return hv_read_tsc_page_tsc(tsc_pg, &cur_tsc);
+	*time = mul_u64_u64_shr(*cur_tsc, scale, 64) + offset;
+	return true;
 }
 
 #else /* CONFIG_HYPERV_TIMER */
@@ -104,10 +98,10 @@ static inline struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
 	return NULL;
 }
 
-static inline u64 hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
-				       u64 *cur_tsc)
+static __always_inline bool
+hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg, u64 *cur_tsc, u64 *time)
 {
-	return U64_MAX;
+	return false;
 }
 
 static inline int hv_stimer_cleanup(unsigned int cpu) { return 0; }

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [tip: sched/core] math64: Always inline u128 version of mul_u64_u64_shr()
  2023-05-19 10:21 ` [PATCH v2 07/13] math64: Always inline u128 version of mul_u64_u64_shr() Peter Zijlstra
@ 2023-06-05 19:16   ` tip-bot2 for Peter Zijlstra
  0 siblings, 0 replies; 41+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2023-06-05 19:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Michael Kelley, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     fc4a0db4149afcdae2527f0d8c376accca34adc9
Gitweb:        https://git.kernel.org/tip/fc4a0db4149afcdae2527f0d8c376accca34adc9
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Fri, 19 May 2023 12:21:05 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 05 Jun 2023 21:11:06 +02:00

math64: Always inline u128 version of mul_u64_u64_shr()

In order to prevent the following complaint from happening, always
inline the u128 variant of mul_u64_u64_shr() -- which is what x86_64
will use.

  vmlinux.o: warning: objtool: read_hv_sched_clock_tsc+0x5a: call to mul_u64_u64_shr.constprop.0() leaves .noinstr.text section

It should compile into something like:

  asm("mul	%[mul];"
      "shrd	%rdx, %rax, %cl"
      : "+&a" (a)
      : "c" shift, [mul] "r" (mul)
      : "d");

Which is silly not to inline, but it happens.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>  # Hyper-V
Link: https://lore.kernel.org/r/20230519102715.637420396@infradead.org
---
 include/linux/math64.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/math64.h b/include/linux/math64.h
index 8b9191a..bf74478 100644
--- a/include/linux/math64.h
+++ b/include/linux/math64.h
@@ -168,7 +168,7 @@ static __always_inline u64 mul_u64_u32_shr(u64 a, u32 mul, unsigned int shift)
 #endif /* mul_u64_u32_shr */
 
 #ifndef mul_u64_u64_shr
-static inline u64 mul_u64_u64_shr(u64 a, u64 mul, unsigned int shift)
+static __always_inline u64 mul_u64_u64_shr(u64 a, u64 mul, unsigned int shift)
 {
 	return (u64)(((unsigned __int128)a * mul) >> shift);
 }

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [tip: sched/core] x86/vdso: Fix gettimeofday masking
  2023-05-19 10:21 ` [PATCH v2 08/13] x86/vdso: Fix gettimeofday masking Peter Zijlstra
  2023-05-31 15:27   ` Thomas Gleixner
  2023-05-31 22:46   ` Thomas Gleixner
@ 2023-06-05 19:16   ` tip-bot2 for Peter Zijlstra
  2 siblings, 0 replies; 41+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2023-06-05 19:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Thomas Gleixner, Michael Kelley, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     77750f78b0b3247c64b9821b49158cafe0506880
Gitweb:        https://git.kernel.org/tip/77750f78b0b3247c64b9821b49158cafe0506880
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Fri, 19 May 2023 12:21:06 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 05 Jun 2023 21:11:07 +02:00

x86/vdso: Fix gettimeofday masking

Because of how the virtual clocks use U64_MAX as an exception value
instead of a valid time, the clocks can no longer be assumed to wrap
cleanly. This is then compounded by arch_vdso_cycles_ok() rejecting
everything with the MSB/Sign-bit set.

Therefore, the effective mask becomes S64_MAX, and the comment in
vdso_calc_delta() that states the mask is U64_MAX and isn't optimized
out is just plain silly.

Now, the code has a negative filter -- to deal with TSC wobbles:

	if (cycles > last)

which is just plain wrong, because it should've been written as:

	if ((s64)(cycles - last) > 0)

to take wrapping into account, but per all the above, we don't
actually wrap on u64 anymore.
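
A minimal userspace sketch of the resulting delta computation (hypothetical
counter values, not part of the patch), showing why bit 62 acts as the
effective sign bit once cycles and last are both masked to S64_MAX:

  #include <stdint.h>
  #include <stdio.h>

  #define S64_MAX 0x7fffffffffffffffULL

  /* Same arithmetic as the new vdso_calc_delta() below. */
  static uint64_t calc_delta(uint64_t cycles, uint64_t last, uint32_t mult)
  {
          uint64_t delta = (cycles - last) & S64_MAX;

          if (delta & (1ULL << 62))       /* small negative motion: TSC wobble */
                  return 0;

          return delta * mult;
  }

  int main(void)
  {
          printf("%llu\n", (unsigned long long)calc_delta(1010, 1000, 3)); /* 30 */
          printf("%llu\n", (unsigned long long)calc_delta( 990, 1000, 3)); /*  0 */
          return 0;
  }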

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mikelley@microsoft.com>  # Hyper-V
Link: https://lore.kernel.org/r/20230519102715.704767397@infradead.org
---
 arch/x86/include/asm/vdso/gettimeofday.h | 39 ++++++++++++++++-------
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/vdso/gettimeofday.h b/arch/x86/include/asm/vdso/gettimeofday.h
index 4cf6794..0badf0a 100644
--- a/arch/x86/include/asm/vdso/gettimeofday.h
+++ b/arch/x86/include/asm/vdso/gettimeofday.h
@@ -231,14 +231,17 @@ static u64 vread_pvclock(void)
 		ret = __pvclock_read_cycles(pvti, rdtsc_ordered());
 	} while (pvclock_read_retry(pvti, version));
 
-	return ret;
+	return ret & S64_MAX;
 }
 #endif
 
 #ifdef CONFIG_HYPERV_TIMER
 static u64 vread_hvclock(void)
 {
-	return hv_read_tsc_page(&hvclock_page);
+	u64 ret = hv_read_tsc_page(&hvclock_page);
+	if (likely(ret != U64_MAX))
+		ret &= S64_MAX;
+	return ret;
 }
 #endif
 
@@ -246,7 +249,7 @@ static inline u64 __arch_get_hw_counter(s32 clock_mode,
 					const struct vdso_data *vd)
 {
 	if (likely(clock_mode == VDSO_CLOCKMODE_TSC))
-		return (u64)rdtsc_ordered();
+		return (u64)rdtsc_ordered() & S64_MAX;
 	/*
 	 * For any memory-mapped vclock type, we need to make sure that gcc
 	 * doesn't cleverly hoist a load before the mode check.  Otherwise we
@@ -284,6 +287,9 @@ static inline bool arch_vdso_clocksource_ok(const struct vdso_data *vd)
  * which can be invalidated asynchronously and indicate invalidation by
  * returning U64_MAX, which can be effectively tested by checking for a
  * negative value after casting it to s64.
+ *
+ * This effectively forces a S64_MAX mask on the calculations, unlike the
+ * U64_MAX mask normally used by x86 clocksources.
  */
 static inline bool arch_vdso_cycles_ok(u64 cycles)
 {
@@ -303,18 +309,29 @@ static inline bool arch_vdso_cycles_ok(u64 cycles)
  * @last. If not then use @last, which is the base time of the current
  * conversion period.
  *
- * This variant also removes the masking of the subtraction because the
- * clocksource mask of all VDSO capable clocksources on x86 is U64_MAX
- * which would result in a pointless operation. The compiler cannot
- * optimize it away as the mask comes from the vdso data and is not compile
- * time constant.
+ * This variant also uses a custom mask because while the clocksource mask of
+ * all the VDSO capable clocksources on x86 is U64_MAX, the above code uses
+ * U64_MAX as an exception value, additionally arch_vdso_cycles_ok() above
+ * declares everything with the MSB/Sign-bit set as invalid. Therefore the
+ * effective mask is S64_MAX.
  */
 static __always_inline
 u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
 {
-	if (cycles > last)
-		return (cycles - last) * mult;
-	return 0;
+	/*
+	 * Due to the MSB/Sign-bit being used as invalid marker (see
+	 * arch_vdso_cycles_ok() above), the effective mask is S64_MAX.
+	 */
+	u64 delta = (cycles - last) & S64_MAX;
+
+	/*
+	 * Due to the above mentioned TSC wobbles, filter out negative motion.
+	 * Per the above masking, the effective sign bit is now bit 62.
+	 */
+	if (unlikely(delta & (1ULL << 62)))
+		return 0;
+
+	return delta * mult;
 }
 #define vdso_calc_delta vdso_calc_delta
 

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [tip: sched/core] s390/time: Provide sched_clock_noinstr()
  2023-05-19 10:21 ` [PATCH v2 06/13] s390/time: Provide sched_clock_noinstr() Peter Zijlstra
  2023-05-22 14:18   ` Heiko Carstens
@ 2023-06-05 19:16   ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 41+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2023-06-05 19:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Heiko Carstens, Michael Kelley, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     91b41a237512b569746e1f560a42d9fba077261d
Gitweb:        https://git.kernel.org/tip/91b41a237512b569746e1f560a42d9fba077261d
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Fri, 19 May 2023 12:21:04 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 05 Jun 2023 21:11:06 +02:00

s390/time: Provide sched_clock_noinstr()

With the intent to provide local_clock_noinstr(), a variant of
local_clock() that's safe to be called from noinstr code (with the
assumption that any such code will already be non-preemptible),
prepare for things by providing a noinstr sched_clock_noinstr()
function.

Specifically, preempt_enable_*() calls out to schedule(), which upsets
noinstr validation efforts.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>  # Hyper-V
Link: https://lore.kernel.org/r/20230519102715.570170436@infradead.org
---
 arch/s390/include/asm/timex.h | 13 +++++++++----
 arch/s390/kernel/time.c       |  5 +++++
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/s390/include/asm/timex.h b/arch/s390/include/asm/timex.h
index ce878e8..4d64665 100644
--- a/arch/s390/include/asm/timex.h
+++ b/arch/s390/include/asm/timex.h
@@ -63,7 +63,7 @@ static inline int store_tod_clock_ext_cc(union tod_clock *clk)
 	return cc;
 }
 
-static inline void store_tod_clock_ext(union tod_clock *tod)
+static __always_inline void store_tod_clock_ext(union tod_clock *tod)
 {
 	asm volatile("stcke %0" : "=Q" (*tod) : : "cc");
 }
@@ -177,7 +177,7 @@ static inline void local_tick_enable(unsigned long comp)
 
 typedef unsigned long cycles_t;
 
-static inline unsigned long get_tod_clock(void)
+static __always_inline unsigned long get_tod_clock(void)
 {
 	union tod_clock clk;
 
@@ -204,6 +204,11 @@ void init_cpu_timer(void);
 
 extern union tod_clock tod_clock_base;
 
+static __always_inline unsigned long __get_tod_clock_monotonic(void)
+{
+	return get_tod_clock() - tod_clock_base.tod;
+}
+
 /**
  * get_clock_monotonic - returns current time in clock rate units
  *
@@ -216,7 +221,7 @@ static inline unsigned long get_tod_clock_monotonic(void)
 	unsigned long tod;
 
 	preempt_disable_notrace();
-	tod = get_tod_clock() - tod_clock_base.tod;
+	tod = __get_tod_clock_monotonic();
 	preempt_enable_notrace();
 	return tod;
 }
@@ -240,7 +245,7 @@ static inline unsigned long get_tod_clock_monotonic(void)
  * -> ns = (th * 125) + ((tl * 125) >> 9);
  *
  */
-static inline unsigned long tod_to_ns(unsigned long todval)
+static __always_inline unsigned long tod_to_ns(unsigned long todval)
 {
 	return ((todval >> 9) * 125) + (((todval & 0x1ff) * 125) >> 9);
 }
diff --git a/arch/s390/kernel/time.c b/arch/s390/kernel/time.c
index 6b7b6d5..2762781 100644
--- a/arch/s390/kernel/time.c
+++ b/arch/s390/kernel/time.c
@@ -102,6 +102,11 @@ void __init time_early_init(void)
 			((long) qui.old_leap * 4096000000L);
 }
 
+unsigned long long noinstr sched_clock_noinstr(void)
+{
+	return tod_to_ns(__get_tod_clock_monotonic());
+}
+
 /*
  * Scheduler clock - returns current time in nanosec units.
  */

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [tip: sched/core] loongarch: Provide noinstr sched_clock_read()
  2023-05-19 10:21 ` [PATCH v2 05/13] loongarch: Provide noinstr sched_clock_read() Peter Zijlstra
@ 2023-06-05 19:16   ` tip-bot2 for Peter Zijlstra
  0 siblings, 0 replies; 41+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2023-06-05 19:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Michael Kelley, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     6b10fef09f937433563822f4bb6a2f947176e5a0
Gitweb:        https://git.kernel.org/tip/6b10fef09f937433563822f4bb6a2f947176e5a0
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Fri, 19 May 2023 12:21:03 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 05 Jun 2023 21:11:05 +02:00

loongarch: Provide noinstr sched_clock_read()

With the intent to provide local_clock_noinstr(), a variant of
local_clock() that's safe to be called from noinstr code (with the
assumption that any such code will already be non-preemptible),
prepare for things by providing a noinstr sched_clock_read() function.

Specifically, preempt_enable_*() calls out to schedule(), which upsets
noinstr validation efforts.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>  # Hyper-V
Link: https://lore.kernel.org/r/20230519102715.502547082@infradead.org
---
 arch/loongarch/include/asm/loongarch.h | 2 +-
 arch/loongarch/kernel/time.c           | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
index b3323ab..357ef70 100644
--- a/arch/loongarch/include/asm/loongarch.h
+++ b/arch/loongarch/include/asm/loongarch.h
@@ -1167,7 +1167,7 @@ static __always_inline void iocsr_write64(u64 val, u32 reg)
 
 #ifndef __ASSEMBLY__
 
-static inline u64 drdtime(void)
+static __always_inline u64 drdtime(void)
 {
 	int rID = 0;
 	u64 val = 0;
diff --git a/arch/loongarch/kernel/time.c b/arch/loongarch/kernel/time.c
index f377e50..c189e03 100644
--- a/arch/loongarch/kernel/time.c
+++ b/arch/loongarch/kernel/time.c
@@ -190,9 +190,9 @@ static u64 read_const_counter(struct clocksource *clk)
 	return drdtime();
 }
 
-static u64 native_sched_clock(void)
+static noinstr u64 sched_clock_read(void)
 {
-	return read_const_counter(NULL);
+	return drdtime();
 }
 
 static struct clocksource clocksource_const = {
@@ -211,7 +211,7 @@ int __init constant_clocksource_init(void)
 
 	res = clocksource_register_hz(&clocksource_const, freq);
 
-	sched_clock_register(native_sched_clock, 64, freq);
+	sched_clock_register(sched_clock_read, 64, freq);
 
 	pr_info("Constant clock source device register\n");
 

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [tip: sched/core] arm64/arch_timer: Provide noinstr sched_clock_read() functions
  2023-05-19 10:21 ` [PATCH v2 04/13] arm64/arch_timer: Provide noinstr sched_clock_read() functions Peter Zijlstra
  2023-05-24 16:40   ` Valentin Schneider
@ 2023-06-05 19:16   ` tip-bot2 for Peter Zijlstra
  2023-06-06  8:06     ` Peter Zijlstra
  1 sibling, 1 reply; 41+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2023-06-05 19:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Michael Kelley, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     24ee7607b286b44a5112ced38652df14cd80d5e2
Gitweb:        https://git.kernel.org/tip/24ee7607b286b44a5112ced38652df14cd80d5e2
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Fri, 19 May 2023 12:21:02 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 05 Jun 2023 21:11:05 +02:00

arm64/arch_timer: Provide noinstr sched_clock_read() functions

With the intent to provide local_clock_noinstr(), a variant of
local_clock() that's safe to be called from noinstr code (with the
assumption that any such code will already be non-preemptible),
prepare for things by providing a noinstr sched_clock_read() function.

Specifically, preempt_enable_*() calls out to schedule(), which upsets
noinstr validation efforts.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>  # Hyper-V
Link: https://lore.kernel.org/r/20230519102715.435618812@infradead.org
---
 arch/arm64/include/asm/arch_timer.h  |  8 +----
 drivers/clocksource/arm_arch_timer.c | 54 ++++++++++++++++++++-------
 2 files changed, 41 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
index af1fafb..934c658 100644
--- a/arch/arm64/include/asm/arch_timer.h
+++ b/arch/arm64/include/asm/arch_timer.h
@@ -88,13 +88,7 @@ static inline notrace u64 arch_timer_read_cntvct_el0(void)
 
 #define arch_timer_reg_read_stable(reg)					\
 	({								\
-		u64 _val;						\
-									\
-		preempt_disable_notrace();				\
-		_val = erratum_handler(read_ ## reg)();			\
-		preempt_enable_notrace();				\
-									\
-		_val;							\
+		erratum_handler(read_ ## reg)();			\
 	})
 
 /*
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index e09d442..b23d23b 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -191,22 +191,40 @@ u32 arch_timer_reg_read(int access, enum arch_timer_reg reg,
 	return val;
 }
 
-static notrace u64 arch_counter_get_cntpct_stable(void)
+static noinstr u64 raw_counter_get_cntpct_stable(void)
 {
 	return __arch_counter_get_cntpct_stable();
 }
 
-static notrace u64 arch_counter_get_cntpct(void)
+static notrace u64 arch_counter_get_cntpct_stable(void)
+{
+	u64 val;
+	preempt_disable_notrace();
+	val = __arch_counter_get_cntpct_stable();
+	preempt_enable_notrace();
+	return val;
+}
+
+static noinstr u64 arch_counter_get_cntpct(void)
 {
 	return __arch_counter_get_cntpct();
 }
 
-static notrace u64 arch_counter_get_cntvct_stable(void)
+static noinstr u64 raw_counter_get_cntvct_stable(void)
 {
 	return __arch_counter_get_cntvct_stable();
 }
 
-static notrace u64 arch_counter_get_cntvct(void)
+static notrace u64 arch_counter_get_cntvct_stable(void)
+{
+	u64 val;
+	preempt_disable_notrace();
+	val = __arch_counter_get_cntvct_stable();
+	preempt_enable_notrace();
+	return val;
+}
+
+static noinstr u64 arch_counter_get_cntvct(void)
 {
 	return __arch_counter_get_cntvct();
 }
@@ -753,14 +771,14 @@ static int arch_timer_set_next_event_phys(unsigned long evt,
 	return 0;
 }
 
-static u64 arch_counter_get_cnt_mem(struct arch_timer *t, int offset_lo)
+static noinstr u64 arch_counter_get_cnt_mem(struct arch_timer *t, int offset_lo)
 {
 	u32 cnt_lo, cnt_hi, tmp_hi;
 
 	do {
-		cnt_hi = readl_relaxed(t->base + offset_lo + 4);
-		cnt_lo = readl_relaxed(t->base + offset_lo);
-		tmp_hi = readl_relaxed(t->base + offset_lo + 4);
+		cnt_hi = __raw_readl(t->base + offset_lo + 4);
+		cnt_lo = __raw_readl(t->base + offset_lo);
+		tmp_hi = __raw_readl(t->base + offset_lo + 4);
 	} while (cnt_hi != tmp_hi);
 
 	return ((u64) cnt_hi << 32) | cnt_lo;
@@ -1060,7 +1078,7 @@ bool arch_timer_evtstrm_available(void)
 	return cpumask_test_cpu(raw_smp_processor_id(), &evtstrm_available);
 }
 
-static u64 arch_counter_get_cntvct_mem(void)
+static noinstr u64 arch_counter_get_cntvct_mem(void)
 {
 	return arch_counter_get_cnt_mem(arch_timer_mem, CNTVCT_LO);
 }
@@ -1074,6 +1092,7 @@ struct arch_timer_kvm_info *arch_timer_get_kvm_info(void)
 
 static void __init arch_counter_register(unsigned type)
 {
+	u64 (*scr)(void);
 	u64 start_count;
 	int width;
 
@@ -1083,21 +1102,28 @@ static void __init arch_counter_register(unsigned type)
 
 		if ((IS_ENABLED(CONFIG_ARM64) && !is_hyp_mode_available()) ||
 		    arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI) {
-			if (arch_timer_counter_has_wa())
+			if (arch_timer_counter_has_wa()) {
 				rd = arch_counter_get_cntvct_stable;
-			else
+				scr = raw_counter_get_cntvct_stable;
+			} else {
 				rd = arch_counter_get_cntvct;
+				scr = arch_counter_get_cntvct;
+			}
 		} else {
-			if (arch_timer_counter_has_wa())
+			if (arch_timer_counter_has_wa()) {
 				rd = arch_counter_get_cntpct_stable;
-			else
+				scr = raw_counter_get_cntpct_stable;
+			} else {
 				rd = arch_counter_get_cntpct;
+				scr = arch_counter_get_cntpct;
+			}
 		}
 
 		arch_timer_read_counter = rd;
 		clocksource_counter.vdso_clock_mode = vdso_default;
 	} else {
 		arch_timer_read_counter = arch_counter_get_cntvct_mem;
+		scr = arch_counter_get_cntvct_mem;
 	}
 
 	width = arch_counter_get_width();
@@ -1113,7 +1139,7 @@ static void __init arch_counter_register(unsigned type)
 	timecounter_init(&arch_timer_kvm_info.timecounter,
 			 &cyclecounter, start_count);
 
-	sched_clock_register(arch_timer_read_counter, width, arch_timer_rate);
+	sched_clock_register(scr, width, arch_timer_rate);
 }
 
 static void arch_timer_stop(struct clock_event_device *clk)

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [tip: sched/core] arm64/io: Always inline all of __raw_{read,write}[bwlq]()
  2023-05-19 10:21 ` [PATCH v2 03/13] arm64/io: Always inline all of __raw_{read,write}[bwlq]() Peter Zijlstra
  2023-05-24 16:40   ` Valentin Schneider
@ 2023-06-05 19:16   ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 41+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2023-06-05 19:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Valentin Schneider, Michael Kelley, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     c1d26c0f0295953d35307f9ee07f3e5295741315
Gitweb:        https://git.kernel.org/tip/c1d26c0f0295953d35307f9ee07f3e5295741315
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Fri, 19 May 2023 12:21:01 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 05 Jun 2023 21:11:04 +02:00

arm64/io: Always inline all of __raw_{read,write}[bwlq]()

The next patch will want to use __raw_readl() from a noinstr section
and as such that needs to be marked __always_inline to avoid the
compiler being a silly bugger.

Turns out it already is, but its siblings are not. Finish the work
started in commit e43f1331e2ef913b ("arm64: Ask the compiler to
__always_inline functions used by KVM at HYP") for consistency's sake.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>  # Hyper-V
Link: https://lore.kernel.org/r/20230519102715.368919762@infradead.org
---
 arch/arm64/include/asm/io.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index 877495a..51d92ab 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -22,13 +22,13 @@
  * Generic IO read/write.  These perform native-endian accesses.
  */
 #define __raw_writeb __raw_writeb
-static inline void __raw_writeb(u8 val, volatile void __iomem *addr)
+static __always_inline void __raw_writeb(u8 val, volatile void __iomem *addr)
 {
 	asm volatile("strb %w0, [%1]" : : "rZ" (val), "r" (addr));
 }
 
 #define __raw_writew __raw_writew
-static inline void __raw_writew(u16 val, volatile void __iomem *addr)
+static __always_inline void __raw_writew(u16 val, volatile void __iomem *addr)
 {
 	asm volatile("strh %w0, [%1]" : : "rZ" (val), "r" (addr));
 }
@@ -40,13 +40,13 @@ static __always_inline void __raw_writel(u32 val, volatile void __iomem *addr)
 }
 
 #define __raw_writeq __raw_writeq
-static inline void __raw_writeq(u64 val, volatile void __iomem *addr)
+static __always_inline void __raw_writeq(u64 val, volatile void __iomem *addr)
 {
 	asm volatile("str %x0, [%1]" : : "rZ" (val), "r" (addr));
 }
 
 #define __raw_readb __raw_readb
-static inline u8 __raw_readb(const volatile void __iomem *addr)
+static __always_inline u8 __raw_readb(const volatile void __iomem *addr)
 {
 	u8 val;
 	asm volatile(ALTERNATIVE("ldrb %w0, [%1]",
@@ -57,7 +57,7 @@ static inline u8 __raw_readb(const volatile void __iomem *addr)
 }
 
 #define __raw_readw __raw_readw
-static inline u16 __raw_readw(const volatile void __iomem *addr)
+static __always_inline u16 __raw_readw(const volatile void __iomem *addr)
 {
 	u16 val;
 
@@ -80,7 +80,7 @@ static __always_inline u32 __raw_readl(const volatile void __iomem *addr)
 }
 
 #define __raw_readq __raw_readq
-static inline u64 __raw_readq(const volatile void __iomem *addr)
+static __always_inline u64 __raw_readq(const volatile void __iomem *addr)
 {
 	u64 val;
 	asm volatile(ALTERNATIVE("ldr %0, [%1]",

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [tip: sched/core] time/sched_clock: Provide sched_clock_noinstr()
  2023-05-19 10:21 ` [PATCH v2 02/13] time/sched_clock: Provide sched_clock_noinstr() Peter Zijlstra
@ 2023-06-05 19:16   ` tip-bot2 for Peter Zijlstra
  0 siblings, 0 replies; 41+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2023-06-05 19:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Michael Kelley, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     5949a68c73444d89b171703b67ff04fc4d6059c1
Gitweb:        https://git.kernel.org/tip/5949a68c73444d89b171703b67ff04fc4d6059c1
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Fri, 19 May 2023 12:21:00 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 05 Jun 2023 21:11:04 +02:00

time/sched_clock: Provide sched_clock_noinstr()

With the intent to provide local_clock_noinstr(), a variant of
local_clock() that's safe to be called from noinstr code (with the
assumption that any such code will already be non-preemptible),
prepare for things by providing a noinstr sched_clock_noinstr() function.

Specifically, preempt_enable_*() calls out to schedule(), which upsets
noinstr validation efforts.

As such, pull out the preempt_{dis,en}able_notrace() requirements from
the sched_clock_read() implementations by explicitly providing it in
the sched_clock() function.

This further requires said sched_clock_read() functions to be noinstr
themselves, for ARCH_WANTS_NO_INSTR users. See the next few patches.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>  # Hyper-V
Link: https://lore.kernel.org/r/20230519102715.302350330@infradead.org
---
 kernel/time/sched_clock.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index e8f2fb0..68d6c11 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -64,7 +64,7 @@ static struct clock_data cd ____cacheline_aligned = {
 	.actual_read_sched_clock = jiffy_sched_clock_read,
 };
 
-static inline u64 notrace cyc_to_ns(u64 cyc, u32 mult, u32 shift)
+static __always_inline u64 cyc_to_ns(u64 cyc, u32 mult, u32 shift)
 {
 	return (cyc * mult) >> shift;
 }
@@ -80,23 +80,33 @@ notrace int sched_clock_read_retry(unsigned int seq)
 	return raw_read_seqcount_latch_retry(&cd.seq, seq);
 }
 
-unsigned long long notrace sched_clock(void)
+unsigned long long noinstr sched_clock_noinstr(void)
 {
-	u64 cyc, res;
-	unsigned int seq;
 	struct clock_read_data *rd;
+	unsigned int seq;
+	u64 cyc, res;
 
 	do {
-		rd = sched_clock_read_begin(&seq);
+		seq = raw_read_seqcount_latch(&cd.seq);
+		rd = cd.read_data + (seq & 1);
 
 		cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
 		      rd->sched_clock_mask;
 		res = rd->epoch_ns + cyc_to_ns(cyc, rd->mult, rd->shift);
-	} while (sched_clock_read_retry(seq));
+	} while (raw_read_seqcount_latch_retry(&cd.seq, seq));
 
 	return res;
 }
 
+unsigned long long notrace sched_clock(void)
+{
+	unsigned long long ns;
+	preempt_disable_notrace();
+	ns = sched_clock_noinstr();
+	preempt_enable_notrace();
+	return ns;
+}
+
 /*
  * Updating the data required to read the clock.
  *

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [tip: sched/core] seqlock/latch: Provide raw_read_seqcount_latch_retry()
  2023-05-19 10:20 ` [PATCH v2 01/13] seqlock/latch: Provide raw_read_seqcount_latch_retry() Peter Zijlstra
@ 2023-06-05 19:16   ` tip-bot2 for Peter Zijlstra
  0 siblings, 0 replies; 41+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2023-06-05 19:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Thomas Gleixner, Petr Mladek, Michael Kelley, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     d16317de9b412aa7bd3598c607112298e36b4352
Gitweb:        https://git.kernel.org/tip/d16317de9b412aa7bd3598c607112298e36b4352
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Fri, 19 May 2023 12:20:59 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 05 Jun 2023 21:11:03 +02:00

seqlock/latch: Provide raw_read_seqcount_latch_retry()

The read side of seqcount_latch consists of:

  do {
    seq = raw_read_seqcount_latch(&latch->seq);
    ...
  } while (read_seqcount_latch_retry(&latch->seq, seq));

which is asymmetric in the raw_ department, and sure enough,
read_seqcount_latch_retry() includes (explicit) instrumentation where
raw_read_seqcount_latch() does not.

This inconsistency becomes a problem when trying to use it from
noinstr code. As such, fix it by renaming and re-implementing
raw_read_seqcount_latch_retry() without the instrumentation.

Specifically the instrumentation in question is kcsan_atomic_next(0)
in do___read_seqcount_retry(). Losing this annotation is not a
problem because raw_read_seqcount_latch() does not pass through
kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>  # Hyper-V
Link: https://lore.kernel.org/r/20230519102715.233598176@infradead.org
---
 include/linux/rbtree_latch.h |  2 +-
 include/linux/seqlock.h      | 15 ++++++++-------
 kernel/printk/printk.c       |  2 +-
 kernel/time/sched_clock.c    |  2 +-
 kernel/time/timekeeping.c    |  4 ++--
 5 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/linux/rbtree_latch.h b/include/linux/rbtree_latch.h
index 3d1a9e7..6a0999c 100644
--- a/include/linux/rbtree_latch.h
+++ b/include/linux/rbtree_latch.h
@@ -206,7 +206,7 @@ latch_tree_find(void *key, struct latch_tree_root *root,
 	do {
 		seq = raw_read_seqcount_latch(&root->seq);
 		node = __lt_find(key, root, seq & 1, ops->comp);
-	} while (read_seqcount_latch_retry(&root->seq, seq));
+	} while (raw_read_seqcount_latch_retry(&root->seq, seq));
 
 	return node;
 }
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 3926e90..987a59d 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -671,9 +671,9 @@ typedef struct {
  *
  * Return: sequence counter raw value. Use the lowest bit as an index for
  * picking which data copy to read. The full counter must then be checked
- * with read_seqcount_latch_retry().
+ * with raw_read_seqcount_latch_retry().
  */
-static inline unsigned raw_read_seqcount_latch(const seqcount_latch_t *s)
+static __always_inline unsigned raw_read_seqcount_latch(const seqcount_latch_t *s)
 {
 	/*
 	 * Pairs with the first smp_wmb() in raw_write_seqcount_latch().
@@ -683,16 +683,17 @@ static inline unsigned raw_read_seqcount_latch(const seqcount_latch_t *s)
 }
 
 /**
- * read_seqcount_latch_retry() - end a seqcount_latch_t read section
+ * raw_read_seqcount_latch_retry() - end a seqcount_latch_t read section
  * @s:		Pointer to seqcount_latch_t
  * @start:	count, from raw_read_seqcount_latch()
  *
  * Return: true if a read section retry is required, else false
  */
-static inline int
-read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start)
+static __always_inline int
+raw_read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start)
 {
-	return read_seqcount_retry(&s->seqcount, start);
+	smp_rmb();
+	return unlikely(READ_ONCE(s->seqcount.sequence) != start);
 }
 
 /**
@@ -752,7 +753,7 @@ read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start)
  *			entry = data_query(latch->data[idx], ...);
  *
  *		// This includes needed smp_rmb()
- *		} while (read_seqcount_latch_retry(&latch->seq, seq));
+ *		} while (raw_read_seqcount_latch_retry(&latch->seq, seq));
  *
  *		return entry;
  *	}
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 6a333ad..357a4d1 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -528,7 +528,7 @@ static u64 latched_seq_read_nolock(struct latched_seq *ls)
 		seq = raw_read_seqcount_latch(&ls->latch);
 		idx = seq & 0x1;
 		val = ls->val[idx];
-	} while (read_seqcount_latch_retry(&ls->latch, seq));
+	} while (raw_read_seqcount_latch_retry(&ls->latch, seq));
 
 	return val;
 }
diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 8464c5a..e8f2fb0 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -77,7 +77,7 @@ notrace struct clock_read_data *sched_clock_read_begin(unsigned int *seq)
 
 notrace int sched_clock_read_retry(unsigned int seq)
 {
-	return read_seqcount_latch_retry(&cd.seq, seq);
+	return raw_read_seqcount_latch_retry(&cd.seq, seq);
 }
 
 unsigned long long notrace sched_clock(void)
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 09d5949..266d028 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -450,7 +450,7 @@ static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf)
 		tkr = tkf->base + (seq & 0x01);
 		now = ktime_to_ns(tkr->base);
 		now += fast_tk_get_delta_ns(tkr);
-	} while (read_seqcount_latch_retry(&tkf->seq, seq));
+	} while (raw_read_seqcount_latch_retry(&tkf->seq, seq));
 
 	return now;
 }
@@ -566,7 +566,7 @@ static __always_inline u64 __ktime_get_real_fast(struct tk_fast *tkf, u64 *mono)
 		basem = ktime_to_ns(tkr->base);
 		baser = ktime_to_ns(tkr->base_real);
 		delta = fast_tk_get_delta_ns(tkr);
-	} while (read_seqcount_latch_retry(&tkf->seq, seq));
+	} while (raw_read_seqcount_latch_retry(&tkf->seq, seq));
 
 	if (mono)
 		*mono = basem + delta;

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [tip: sched/core] arm64/arch_timer: Provide noinstr sched_clock_read() functions
  2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
@ 2023-06-06  8:06     ` Peter Zijlstra
  2023-06-06  8:13       ` Mark Rutland
  2023-06-09  7:55       ` [tip: sched/core] arm64/arch_timer: Fix MMIO byteswap tip-bot2 for Peter Zijlstra
  0 siblings, 2 replies; 41+ messages in thread
From: Peter Zijlstra @ 2023-06-06  8:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-tip-commits, Michael Kelley, x86

On Mon, Jun 05, 2023 at 07:16:18PM -0000, tip-bot2 for Peter Zijlstra wrote:
> @@ -753,14 +771,14 @@ static int arch_timer_set_next_event_phys(unsigned long evt,
>  	return 0;
>  }
>  
> -static u64 arch_counter_get_cnt_mem(struct arch_timer *t, int offset_lo)
> +static noinstr u64 arch_counter_get_cnt_mem(struct arch_timer *t, int offset_lo)
>  {
>  	u32 cnt_lo, cnt_hi, tmp_hi;
>  
>  	do {
> -		cnt_hi = readl_relaxed(t->base + offset_lo + 4);
> -		cnt_lo = readl_relaxed(t->base + offset_lo);
> -		tmp_hi = readl_relaxed(t->base + offset_lo + 4);
> +		cnt_hi = __raw_readl(t->base + offset_lo + 4);
> +		cnt_lo = __raw_readl(t->base + offset_lo);
> +		tmp_hi = __raw_readl(t->base + offset_lo + 4);
>  	} while (cnt_hi != tmp_hi);
>  
>  	return ((u64) cnt_hi << 32) | cnt_lo;

Mark noted that this loses the byteswap :/


---
Subject: arm64/arch_timer: Fix MMIO byteswap

The readl_relaxed() to __raw_readl() change meant to lose the
instrumentation, but also (inadvertently) lost the byteswap.
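
For context, a rough userspace sketch of the swap that went missing
(illustrative only; the real readl_relaxed() additionally differs in barrier
and instruction-alternative details):

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>
  #include <endian.h>

  int main(void)
  {
          /* Bytes as a little-endian device presents them over MMIO. */
          uint8_t mmio[4] = { 0x78, 0x56, 0x34, 0x12 };
          uint32_t raw, val;

          memcpy(&raw, mmio, sizeof(raw));  /* ~ __raw_readl(): no swap */
          val = le32toh(raw);               /* ~ readl_relaxed(): swaps on BE */

          /* Identical on a little-endian host; on a big-endian kernel only
           * the le32toh() form yields the device value 0x12345678. */
          printf("raw=%#x val=%#x\n", raw, val);
          return 0;
  }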

Fixes: 24ee7607b286 ("arm64/arch_timer: Provide noinstr sched_clock_read() functions")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index b23d23b033cc..e733a2a1927a 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -776,9 +776,9 @@ static noinstr u64 arch_counter_get_cnt_mem(struct arch_timer *t, int offset_lo)
 	u32 cnt_lo, cnt_hi, tmp_hi;
 
 	do {
-		cnt_hi = __raw_readl(t->base + offset_lo + 4);
-		cnt_lo = __raw_readl(t->base + offset_lo);
-		tmp_hi = __raw_readl(t->base + offset_lo + 4);
+		cnt_hi = __le32_to_cpu((__le32 __force)__raw_readl(t->base + offset_lo + 4));
+		cnt_lo = __le32_to_cpu((__le32 __force)__raw_readl(t->base + offset_lo));
+		tmp_hi = __le32_to_cpu((__le32 __force)__raw_readl(t->base + offset_lo + 4));
 	} while (cnt_hi != tmp_hi);
 
 	return ((u64) cnt_hi << 32) | cnt_lo;

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [tip: sched/core] arm64/arch_timer: Provide noinstr sched_clock_read() functions
  2023-06-06  8:06     ` Peter Zijlstra
@ 2023-06-06  8:13       ` Mark Rutland
  2023-06-09  7:55       ` [tip: sched/core] arm64/arch_timer: Fix MMIO byteswap tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 41+ messages in thread
From: Mark Rutland @ 2023-06-06  8:13 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, linux-tip-commits, Michael Kelley, x86

On Tue, Jun 06, 2023 at 10:06:14AM +0200, Peter Zijlstra wrote:
> On Mon, Jun 05, 2023 at 07:16:18PM -0000, tip-bot2 for Peter Zijlstra wrote:
> > @@ -753,14 +771,14 @@ static int arch_timer_set_next_event_phys(unsigned long evt,
> >  	return 0;
> >  }
> >  
> > -static u64 arch_counter_get_cnt_mem(struct arch_timer *t, int offset_lo)
> > +static noinstr u64 arch_counter_get_cnt_mem(struct arch_timer *t, int offset_lo)
> >  {
> >  	u32 cnt_lo, cnt_hi, tmp_hi;
> >  
> >  	do {
> > -		cnt_hi = readl_relaxed(t->base + offset_lo + 4);
> > -		cnt_lo = readl_relaxed(t->base + offset_lo);
> > -		tmp_hi = readl_relaxed(t->base + offset_lo + 4);
> > +		cnt_hi = __raw_readl(t->base + offset_lo + 4);
> > +		cnt_lo = __raw_readl(t->base + offset_lo);
> > +		tmp_hi = __raw_readl(t->base + offset_lo + 4);
> >  	} while (cnt_hi != tmp_hi);
> >  
> >  	return ((u64) cnt_hi << 32) | cnt_lo;
> 
> Mark noted that this loses the byteswap :/
> 
> 
> ---
> Subject: arm64/arch_timer: Fix MMIO byteswap
> 
> The readl_relaxed() to __raw_readl() change meant to lose the
> instrumentation, but also (inadvertently) lost the byteswap.
> 
> Fixes: 24ee7607b286 ("arm64/arch_timer: Provide noinstr sched_clock_read() functions")
> Reported-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> index b23d23b033cc..e733a2a1927a 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -776,9 +776,9 @@ static noinstr u64 arch_counter_get_cnt_mem(struct arch_timer *t, int offset_lo)
>  	u32 cnt_lo, cnt_hi, tmp_hi;
>  
>  	do {
> -		cnt_hi = __raw_readl(t->base + offset_lo + 4);
> -		cnt_lo = __raw_readl(t->base + offset_lo);
> -		tmp_hi = __raw_readl(t->base + offset_lo + 4);
> +		cnt_hi = __le32_to_cpu((__le32 __force)__raw_readl(t->base + offset_lo + 4));
> +		cnt_lo = __le32_to_cpu((__le32 __force)__raw_readl(t->base + offset_lo));
> +		tmp_hi = __le32_to_cpu((__le32 __force)__raw_readl(t->base + offset_lo + 4));
>  	} while (cnt_hi != tmp_hi);

This LGTM, so FWIW:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Longer term it'd be good to have noinstr-safe forms of the MMIO accessors, but
from a quick play that's a much bigger piece of work, and I think this is fine
for now!

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 04/13] arm64/arch_timer: Provide noinstr sched_clock_read() functions
  2023-06-02 11:54     ` Peter Zijlstra
@ 2023-06-07  8:58       ` Valentin Schneider
  0 siblings, 0 replies; 41+ messages in thread
From: Valentin Schneider @ 2023-06-07  8:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: bigeasy, mark.rutland, maz, catalin.marinas, will, chenhuacai,
	kernel, hca, gor, agordeev, borntraeger, svens, pbonzini,
	wanpengli, vkuznets, tglx, mingo, bp, dave.hansen, x86, hpa,
	jgross, boris.ostrovsky, daniel.lezcano, kys, haiyangz, wei.liu,
	decui, rafael, longman, boqun.feng, pmladek, senozhatsky,
	rostedt, john.ogness, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, bristot, jstultz, sboyd,
	linux-kernel, loongarch, linux-s390, kvm, linux-hyperv, linux-pm

On 02/06/23 13:54, Peter Zijlstra wrote:
> On Wed, May 24, 2023 at 05:40:47PM +0100, Valentin Schneider wrote:
>>
>> So this bit sent me on a little spelunking session :-)
>>
>> From a control flow perspective the initialization isn't required, but then
>> I looked into the comment and found it comes from the
>> arch_timer_read_counter() definition... Which itself doesn't get used by
>> sched_clock() until the sched_clock_register() below!
>>
>> So AFAICT that comment was true as of
>>
>>   220069945b29 ("clocksource: arch_timer: Add support for memory mapped timers")
>>
>> but not after a commit that came 2 months later:
>>
>>   65cd4f6c99c1 ("arch_timer: Move to generic sched_clock framework")
>>
>> which IIUC made arm/arm64 follow the default approach of using the
>> jiffy-based sched_clock() before probing DT/ACPI and registering a "proper"
>> sched_clock.
>>
>> All of that to say: the comment about arch_timer_read_counter() vs early
>> sched_clock() doesn't apply anymore, but I think we need to keep its
>> initialization around for stuff like get_cycles(). This initialization here
>> should be OK to put to the bin, though.
>
> Something like the below folded in then?
>

Much better, thank you!


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [tip: sched/core] arm64/arch_timer: Fix MMIO byteswap
  2023-06-06  8:06     ` Peter Zijlstra
  2023-06-06  8:13       ` Mark Rutland
@ 2023-06-09  7:55       ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 41+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2023-06-09  7:55 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Mark Rutland, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     5416bf1cf5602ab3a38b4c0d15ccec1ca4199633
Gitweb:        https://git.kernel.org/tip/5416bf1cf5602ab3a38b4c0d15ccec1ca4199633
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 06 Jun 2023 10:06:14 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 06 Jun 2023 10:19:51 +02:00

arm64/arch_timer: Fix MMIO byteswap

The readl_relaxed() to __raw_readl() change meant to lose the
instrumentation, but also (inadvertently) lost the byteswap.

Fixes: 24ee7607b286 ("arm64/arch_timer: Provide noinstr sched_clock_read() functions")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lkml.kernel.org/r/20230606080614.GB905437@hirez.programming.kicks-ass.net
---
 drivers/clocksource/arm_arch_timer.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index b23d23b..e733a2a 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -776,9 +776,9 @@ static noinstr u64 arch_counter_get_cnt_mem(struct arch_timer *t, int offset_lo)
 	u32 cnt_lo, cnt_hi, tmp_hi;
 
 	do {
-		cnt_hi = __raw_readl(t->base + offset_lo + 4);
-		cnt_lo = __raw_readl(t->base + offset_lo);
-		tmp_hi = __raw_readl(t->base + offset_lo + 4);
+		cnt_hi = __le32_to_cpu((__le32 __force)__raw_readl(t->base + offset_lo + 4));
+		cnt_lo = __le32_to_cpu((__le32 __force)__raw_readl(t->base + offset_lo));
+		tmp_hi = __le32_to_cpu((__le32 __force)__raw_readl(t->base + offset_lo + 4));
 	} while (cnt_hi != tmp_hi);
 
 	return ((u64) cnt_hi << 32) | cnt_lo;

^ permalink raw reply related	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2023-06-09  7:56 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
2023-05-19 10:20 ` [PATCH v2 01/13] seqlock/latch: Provide raw_read_seqcount_latch_retry() Peter Zijlstra
2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2023-05-19 10:21 ` [PATCH v2 02/13] time/sched_clock: Provide sched_clock_noinstr() Peter Zijlstra
2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2023-05-19 10:21 ` [PATCH v2 03/13] arm64/io: Always inline all of __raw_{read,write}[bwlq]() Peter Zijlstra
2023-05-24 16:40   ` Valentin Schneider
2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2023-05-19 10:21 ` [PATCH v2 04/13] arm64/arch_timer: Provide noinstr sched_clock_read() functions Peter Zijlstra
2023-05-24 16:40   ` Valentin Schneider
2023-06-02 11:54     ` Peter Zijlstra
2023-06-07  8:58       ` Valentin Schneider
2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2023-06-06  8:06     ` Peter Zijlstra
2023-06-06  8:13       ` Mark Rutland
2023-06-09  7:55       ` [tip: sched/core] arm64/arch_timer: Fix MMIO byteswap tip-bot2 for Peter Zijlstra
2023-05-19 10:21 ` [PATCH v2 05/13] loongarch: Provide noinstr sched_clock_read() Peter Zijlstra
2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2023-05-19 10:21 ` [PATCH v2 06/13] s390/time: Provide sched_clock_noinstr() Peter Zijlstra
2023-05-22 14:18   ` Heiko Carstens
2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2023-05-19 10:21 ` [PATCH v2 07/13] math64: Always inline u128 version of mul_u64_u64_shr() Peter Zijlstra
2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2023-05-19 10:21 ` [PATCH v2 08/13] x86/vdso: Fix gettimeofday masking Peter Zijlstra
2023-05-31 15:27   ` Thomas Gleixner
2023-05-31 22:46     ` Thomas Gleixner
2023-05-31 22:46   ` Thomas Gleixner
2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2023-05-19 10:21 ` [PATCH v2 09/13] clocksource: hyper-v: Adjust hv_read_tsc_page_tsc() to avoid special casing U64_MAX Peter Zijlstra
2023-05-19 18:38   ` Michael Kelley (LINUX)
2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2023-05-19 10:21 ` [PATCH v2 10/13] clocksource: hyper-v: Provide noinstr sched_clock() Peter Zijlstra
2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2023-05-19 10:21 ` [PATCH v2 11/13] x86/tsc: Provide sched_clock_noinstr() Peter Zijlstra
2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2023-05-19 10:21 ` [PATCH v2 12/13] sched/clock: Provide local_clock_noinstr() Peter Zijlstra
2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2023-05-19 10:21 ` [PATCH v2 13/13] cpuidle: Use local_clock_noinstr() Peter Zijlstra
2023-05-19 14:13   ` Rafael J. Wysocki
2023-06-05 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2023-05-19 18:48 ` [PATCH v2 00/13] local_clock() vs noinstr Michael Kelley (LINUX)
