linux-kernel.vger.kernel.org archive mirror
* [patch 0/8] hrtimers: Overhaul the clock_was_set() logic
@ 2021-04-27  8:25 Thomas Gleixner
  2021-04-27  8:25 ` [patch 1/8] hrtimer: Ensure timerfd notification for HIGHRES=n Thomas Gleixner
                   ` (7 more replies)
  0 siblings, 8 replies; 21+ messages in thread
From: Thomas Gleixner @ 2021-04-27  8:25 UTC (permalink / raw)
  To: LKML
  Cc: Anna-Maria Behnsen, Peter Zijlstra, Marcelo Tosatti,
	Frederic Weisbecker, Peter Xu, Nitesh Narayan Lal, Alex Belits,
	Rafael J. Wysocki, John Stultz

A recent patch from Marcelo to avoid IPIs when the clock was set

  https://lore.kernel.org/r/20210407135301.GA16985@fuller.cnet

made me look deeper into this.

That mechanism has gathered dust and bitrot over time, and just making
clock_was_set() a little bit smarter does not make the code any better.

The following series addresses this by:

 - Fixing the cases where the clock_was_set() handling is incorrect
   depending on configuration or runtime conditions.

 - Disentangle the resume notification and the clock-was-set mechanism
   to prepare for IPI avoidance.

 - Adapt Marcelo's patch to the modified code and add some more smarts
   on top.

The series is also available from git:

  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git hrtimer

Thanks,

	tglx
---
 fs/timerfd.c                |   16 +++
 include/linux/hrtimer.h     |    8 -
 kernel/time/hrtimer.c       |  217 ++++++++++++++++++++++++++++++++------------
 kernel/time/tick-common.c   |    7 +
 kernel/time/tick-internal.h |   12 ++
 kernel/time/timekeeping.c   |   36 ++++---
 6 files changed, 219 insertions(+), 77 deletions(-)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [patch 1/8] hrtimer: Ensure timerfd notification for HIGHRES=n
  2021-04-27  8:25 [patch 0/8] hrtimers: Overhaul the clock_was_set() logic Thomas Gleixner
@ 2021-04-27  8:25 ` Thomas Gleixner
  2021-04-27  8:25 ` [patch 2/8] hrtimer: Force clock_was_set() handling for the HIGHRES=n, NOHZ=y case Thomas Gleixner
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2021-04-27  8:25 UTC (permalink / raw)
  To: LKML
  Cc: Anna-Maria Behnsen, Peter Zijlstra, Marcelo Tosatti,
	Frederic Weisbecker, Peter Xu, Nitesh Narayan Lal, Alex Belits,
	Rafael J. Wysocki, John Stultz

If high resolution timers are disabled, the timerfd notification about a
clock-was-set event does not happen for all cases which use
clock_was_set_delayed(), because that function is a NOP for HIGHRES=n,
which is wrong.

Make clock_was_set_delayed() unconditionally available to fix that.
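The failure mode can be illustrated with a small user-space model. All names below are invented stand-ins, not the kernel symbols: when the HIGHRES=n stub is an empty inline, the deferred notification, and with it the timerfd update, is silently dropped.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified user-space model, not kernel code. It shows why an empty
 * HIGHRES=n stub for clock_was_set_delayed() loses the timerfd
 * notification.
 */
static int timerfd_notifications;

/* Stand-in for timerfd_clock_was_set() */
static void timerfd_clock_was_set_model(void)
{
	timerfd_notifications++;
}

/* Stand-in for clock_was_set(): always ends in the timerfd notification */
static void clock_was_set_model(void)
{
	timerfd_clock_was_set_model();
}

/*
 * Stand-in for clock_was_set_delayed(). The 'stub_is_nop' flag models
 * the HIGHRES=n configuration before the fix; the deferred work is
 * modeled as a direct call.
 */
static void clock_was_set_delayed_model(bool stub_is_nop)
{
	if (stub_is_nop)
		return;			/* old behavior: event silently lost */
	clock_was_set_model();		/* fixed behavior: always notifies */
}
```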

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/hrtimer.h     |    5 -----
 kernel/time/hrtimer.c       |   32 ++++++++++++++++----------------
 kernel/time/tick-internal.h |    3 +++
 3 files changed, 19 insertions(+), 21 deletions(-)

--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -318,16 +318,12 @@ struct clock_event_device;
 
 extern void hrtimer_interrupt(struct clock_event_device *dev);
 
-extern void clock_was_set_delayed(void);
-
 extern unsigned int hrtimer_resolution;
 
 #else
 
 #define hrtimer_resolution	(unsigned int)LOW_RES_NSEC
 
-static inline void clock_was_set_delayed(void) { }
-
 #endif
 
 static inline ktime_t
@@ -351,7 +347,6 @@ hrtimer_expires_remaining_adjusted(const
 						    timer->base->get_time());
 }
 
-extern void clock_was_set(void);
 #ifdef CONFIG_TIMERFD
 extern void timerfd_clock_was_set(void);
 #else
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -758,22 +758,6 @@ static void hrtimer_switch_to_hres(void)
 	retrigger_next_event(NULL);
 }
 
-static void clock_was_set_work(struct work_struct *work)
-{
-	clock_was_set();
-}
-
-static DECLARE_WORK(hrtimer_work, clock_was_set_work);
-
-/*
- * Called from timekeeping and resume code to reprogram the hrtimer
- * interrupt device on all cpus.
- */
-void clock_was_set_delayed(void)
-{
-	schedule_work(&hrtimer_work);
-}
-
 #else
 
 static inline int hrtimer_is_hres_enabled(void) { return 0; }
@@ -891,6 +875,22 @@ void clock_was_set(void)
 	timerfd_clock_was_set();
 }
 
+static void clock_was_set_work(struct work_struct *work)
+{
+	clock_was_set();
+}
+
+static DECLARE_WORK(hrtimer_work, clock_was_set_work);
+
+/*
+ * Called from timekeeping and resume code to reprogram the hrtimer
+ * interrupt device on all cpus and to notify timerfd.
+ */
+void clock_was_set_delayed(void)
+{
+	schedule_work(&hrtimer_work);
+}
+
 /*
  * During resume we might have to reprogram the high resolution timer
  * interrupt on all online CPUs.  However, all other CPUs will be
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -164,3 +164,6 @@ DECLARE_PER_CPU(struct hrtimer_cpu_base,
 
 extern u64 get_next_timer_interrupt(unsigned long basej, u64 basem);
 void timer_clear_idle(void);
+
+void clock_was_set(void);
+void clock_was_set_delayed(void);



* [patch 2/8] hrtimer: Force clock_was_set() handling for the HIGHRES=n, NOHZ=y case
  2021-04-27  8:25 [patch 0/8] hrtimers: Overhaul the clock_was_set() logic Thomas Gleixner
  2021-04-27  8:25 ` [patch 1/8] hrtimer: Ensure timerfd notification for HIGHRES=n Thomas Gleixner
@ 2021-04-27  8:25 ` Thomas Gleixner
  2021-05-12 14:59   ` Peter Zijlstra
  2021-04-27  8:25 ` [patch 3/8] timerfd: Provide timerfd_resume() Thomas Gleixner
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2021-04-27  8:25 UTC (permalink / raw)
  To: LKML
  Cc: Anna-Maria Behnsen, Peter Zijlstra, Marcelo Tosatti,
	Frederic Weisbecker, Peter Xu, Nitesh Narayan Lal, Alex Belits,
	Rafael J. Wysocki, John Stultz

When CONFIG_HIGH_RES_TIMERS is disabled, but NOHZ is enabled, then
clock_was_set() does not do anything. With HIGHRES=n the kernel relies on
the periodic tick to update the clock offsets, but when NOHZ is enabled and
active, CPUs which are in a deep idle sleep do not have a periodic tick,
which means the expiry of timers affected by clock_was_set() can be
arbitrarily delayed up to the point where the CPUs are brought out of idle
again.

Make the clock_was_set() logic unconditionally available so that idle CPUs
are kicked out of idle to handle the update.
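As a rough sketch with simplified names, not the kernel symbols, the resulting decision reduces to: only when neither high resolution mode nor NOHZ is active can the offset update be left to the next periodic tick.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative sketch with invented names, not the kernel
 * implementation: decide whether clock_was_set() must kick the other
 * CPUs or can rely on the periodic tick to pick up the new offsets.
 */
static bool needs_retrigger(bool hres_active, bool nohz_active)
{
	/*
	 * With HIGHRES=n and NOHZ=n every CPU has a periodic tick which
	 * updates the offsets; in all other cases a CPU might sleep
	 * without a tick and has to be poked.
	 */
	return hres_active || nohz_active;
}
```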

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/hrtimer.c |   87 +++++++++++++++++++++++++++++++++-----------------
 1 file changed, 59 insertions(+), 28 deletions(-)

--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -720,23 +720,7 @@ static inline int hrtimer_is_hres_enable
 	return hrtimer_hres_enabled;
 }
 
-/*
- * Retrigger next event is called after clock was set
- *
- * Called with interrupts disabled via on_each_cpu()
- */
-static void retrigger_next_event(void *arg)
-{
-	struct hrtimer_cpu_base *base = this_cpu_ptr(&hrtimer_bases);
-
-	if (!__hrtimer_hres_active(base))
-		return;
-
-	raw_spin_lock(&base->lock);
-	hrtimer_update_base(base);
-	hrtimer_force_reprogram(base, 0);
-	raw_spin_unlock(&base->lock);
-}
+static void retrigger_next_event(void *arg);
 
 /*
  * Switch to high resolution mode
@@ -762,9 +746,50 @@ static void hrtimer_switch_to_hres(void)
 
 static inline int hrtimer_is_hres_enabled(void) { return 0; }
 static inline void hrtimer_switch_to_hres(void) { }
-static inline void retrigger_next_event(void *arg) { }
 
 #endif /* CONFIG_HIGH_RES_TIMERS */
+/*
+ * Retrigger next event is called after clock was set with interrupts
+ * disabled through an SMP function call or directly from low level
+ * resume code.
+ *
+ * This is only invoked when:
+ *	- CONFIG_HIGH_RES_TIMERS is enabled.
+ *	- CONFIG_NOHZ_COMMON is enabled
+ *
+ * For the other cases this function is empty and because the call sites
+ * are optimized out it vanishes as well, i.e. no need for lots of
+ * #ifdeffery.
+ */
+static void retrigger_next_event(void *arg)
+{
+	struct hrtimer_cpu_base *base = this_cpu_ptr(&hrtimer_bases);
+
+	/*
+	 * When high resolution mode or nohz is active, then the offsets of
+	 * CLOCK_REALTIME/TAI/BOOTTIME have to be updated. Otherwise the
+	 * next tick will take care of that.
+	 *
+	 * If high resolution mode is active then the next expiring timer
+	 * must be reevaluated and the clock event device reprogrammed if
+	 * necessary.
+	 *
+	 * In the NOHZ case the update of the offset and the reevaluation
+	 * of the next expiring timer is enough. The return from the SMP
+	 * function call will take care of the reprogramming in case the
+	 * CPU was in a NOHZ idle sleep.
+	 */
+	if (!__hrtimer_hres_active(base) && !tick_nohz_active)
+		return;
+
+	raw_spin_lock(&base->lock);
+	hrtimer_update_base(base);
+	if (__hrtimer_hres_active(base))
+		hrtimer_force_reprogram(base, 0);
+	else
+		hrtimer_update_next_event(base);
+	raw_spin_unlock(&base->lock);
+}
 
 /*
  * When a timer is enqueued and expires earlier than the already enqueued
@@ -856,22 +881,28 @@ static void hrtimer_reprogram(struct hrt
 }
 
 /*
- * Clock realtime was set
- *
- * Change the offset of the realtime clock vs. the monotonic
- * clock.
+ * Clock was set. This might affect CLOCK_REALTIME, CLOCK_TAI and
+ * CLOCK_BOOTTIME (for late sleep time injection).
  *
- * We might have to reprogram the high resolution timer interrupt. On
- * SMP we call the architecture specific code to retrigger _all_ high
- * resolution timer interrupts. On UP we just disable interrupts and
- * call the high resolution interrupt code.
+ * This requires to update the offsets for these clocks
+ * vs. CLOCK_MONOTONIC. When high resolution timers are enabled, then this
+ * also requires to eventually reprogram the per CPU clock event devices
+ * when the change moves an affected timer ahead of the first expiring
+ * timer on that CPU. Obviously remote per CPU clock event devices cannot
+ * be reprogrammed. The other reason why an IPI has to be sent is when the
+ * system is in !HIGH_RES and NOHZ mode. The NOHZ mode updates the offsets
+ * in the tick, which obviously might be stopped, so this has to bring out
+ * the remote CPU which might sleep in idle to get this sorted.
  */
 void clock_was_set(void)
 {
-#ifdef CONFIG_HIGH_RES_TIMERS
+	if (!hrtimer_hres_active() && !tick_nohz_active)
+		goto out_timerfd;
+
 	/* Retrigger the CPU local events everywhere */
 	on_each_cpu(retrigger_next_event, NULL, 1);
-#endif
+
+out_timerfd:
 	timerfd_clock_was_set();
 }
 



* [patch 3/8] timerfd: Provide timerfd_resume()
  2021-04-27  8:25 [patch 0/8] hrtimers: Overhaul the clock_was_set() logic Thomas Gleixner
  2021-04-27  8:25 ` [patch 1/8] hrtimer: Ensure timerfd notification for HIGHRES=n Thomas Gleixner
  2021-04-27  8:25 ` [patch 2/8] hrtimer: Force clock_was_set() handling for the HIGHRES=n, NOHZ=y case Thomas Gleixner
@ 2021-04-27  8:25 ` Thomas Gleixner
  2021-04-27  8:25 ` [patch 4/8] timekeeping: Distangle resume and clock-was-set events Thomas Gleixner
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2021-04-27  8:25 UTC (permalink / raw)
  To: LKML
  Cc: Anna-Maria Behnsen, Peter Zijlstra, Marcelo Tosatti,
	Frederic Weisbecker, Peter Xu, Nitesh Narayan Lal, Alex Belits,
	Rafael J. Wysocki, John Stultz

Resuming timekeeping is a clock-was-set event and uses the clock-was-set
notification mechanism. This is in the way of making the clock-was-set
update for hrtimers selective so unnecessary IPIs are avoided when a CPU
base does not have timers queued which are affected by the clock setting.

Provide a separate timerfd_resume() interface so the resume logic and the
clock-was-set mechanism can be disentangled in the core code.
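The deferral pattern can be sketched as a user-space model with invented helper names, not the actual workqueue API: the resume path only queues the update, and the notification runs later in task context.

```c
#include <assert.h>
#include <stddef.h>

/*
 * User-space model with invented names, not kernel code: a one-slot
 * "workqueue" illustrates deferring timerfd_clock_was_set() out of the
 * interrupts-disabled resume path into task context.
 */
typedef void (*work_fn_t)(void);

static work_fn_t pending_work;	/* models the queued work item */
static int timerfd_updates;

static void timerfd_clock_was_set_model(void)
{
	timerfd_updates++;
}

/* Stand-in for schedule_work(): just remembers the callback */
static void schedule_work_model(work_fn_t fn)
{
	pending_work = fn;
}

/* Stand-in for the worker thread running the queued item later */
static void run_pending_work(void)
{
	work_fn_t fn = pending_work;

	pending_work = NULL;
	if (fn)
		fn();
}

/* Stand-in for timerfd_resume(): defer instead of calling inline */
static void timerfd_resume_model(void)
{
	schedule_work_model(timerfd_clock_was_set_model);
}
```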

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 fs/timerfd.c            |   16 ++++++++++++++++
 include/linux/hrtimer.h |    2 ++
 2 files changed, 18 insertions(+)

--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -115,6 +115,22 @@ void timerfd_clock_was_set(void)
 	rcu_read_unlock();
 }
 
+static void timerfd_resume_work(struct work_struct *work)
+{
+	timerfd_clock_was_set();
+}
+
+static DECLARE_WORK(timerfd_work, timerfd_resume_work);
+
+/*
+ * Invoked from timekeeping_resume(). Defer the actual update to work so
+ * timerfd_clock_was_set() runs in task context.
+ */
+void timerfd_resume(void)
+{
+	schedule_work(&timerfd_work);
+}
+
 static void __timerfd_remove_cancel(struct timerfd_ctx *ctx)
 {
 	if (ctx->might_cancel) {
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -349,8 +349,10 @@ hrtimer_expires_remaining_adjusted(const
 
 #ifdef CONFIG_TIMERFD
 extern void timerfd_clock_was_set(void);
+extern void timerfd_resume(void);
 #else
 static inline void timerfd_clock_was_set(void) { }
+static inline void timerfd_resume(void) { }
 #endif
 extern void hrtimers_resume(void);
 



* [patch 4/8] timekeeping: Distangle resume and clock-was-set events
  2021-04-27  8:25 [patch 0/8] hrtimers: Overhaul the clock_was_set() logic Thomas Gleixner
                   ` (2 preceding siblings ...)
  2021-04-27  8:25 ` [patch 3/8] timerfd: Provide timerfd_resume() Thomas Gleixner
@ 2021-04-27  8:25 ` Thomas Gleixner
  2021-04-27  8:25 ` [patch 5/8] time/timekeeping: Avoid invoking clock_was_set() twice Thomas Gleixner
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2021-04-27  8:25 UTC (permalink / raw)
  To: LKML
  Cc: Anna-Maria Behnsen, Peter Zijlstra, Marcelo Tosatti,
	Frederic Weisbecker, Peter Xu, Nitesh Narayan Lal, Alex Belits,
	Rafael J. Wysocki, John Stultz

Resuming timekeeping is a clock-was-set event and uses the clock-was-set
notification mechanism. This is in the way of making the clock-was-set
update for hrtimers selective so unnecessary IPIs are avoided when a CPU
base does not have timers queued which are affected by the clock setting.

Disentangle it by invoking hrtimers_resume_local() on each unfreezing CPU
and invoking the new timerfd_resume() function from timekeeping_resume(),
which is the only place where this is needed.

Rename hrtimers_resume() to hrtimers_resume_local() to reflect the change.

With this the clock_was_set*() functions are no longer required to IPI all
CPUs unconditionally and can get some smarts to avoid them.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/hrtimer.h     |    1 -
 kernel/time/hrtimer.c       |   15 ++++++---------
 kernel/time/tick-common.c   |    7 +++++++
 kernel/time/tick-internal.h |    2 ++
 kernel/time/timekeeping.c   |    4 +++-
 5 files changed, 18 insertions(+), 11 deletions(-)

--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -354,7 +354,6 @@ extern void timerfd_resume(void);
 static inline void timerfd_clock_was_set(void) { }
 static inline void timerfd_resume(void) { }
 #endif
-extern void hrtimers_resume(void);
 
 DECLARE_PER_CPU(struct tick_device, tick_cpu_device);
 
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -914,8 +914,8 @@ static void clock_was_set_work(struct wo
 static DECLARE_WORK(hrtimer_work, clock_was_set_work);
 
 /*
- * Called from timekeeping and resume code to reprogram the hrtimer
- * interrupt device on all cpus and to notify timerfd.
+ * Called from timekeeping code to reprogram the hrtimer interrupt device
+ * on all cpus and to notify timerfd.
  */
 void clock_was_set_delayed(void)
 {
@@ -923,18 +923,15 @@ void clock_was_set_delayed(void)
 }
 
 /*
- * During resume we might have to reprogram the high resolution timer
- * interrupt on all online CPUs.  However, all other CPUs will be
- * stopped with IRQs interrupts disabled so the clock_was_set() call
- * must be deferred.
+ * Called during resume either directly via timekeeping_resume() or
+ * in the case of s2idle from tick_unfreeze() to ensure that the
+ * hrtimers are up to date.
  */
-void hrtimers_resume(void)
+void hrtimers_resume_local(void)
 {
 	lockdep_assert_irqs_disabled();
 	/* Retrigger on the local CPU */
 	retrigger_next_event(NULL);
-	/* And schedule a retrigger for all others */
-	clock_was_set_delayed();
 }
 
 /*
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -475,6 +475,13 @@ void tick_resume_local(void)
 		else
 			tick_resume_oneshot();
 	}
+
+	/*
+	 * Ensure that hrtimers are up to date and the clockevents device
+	 * is reprogrammed correctly when high resolution timers are
+	 * enabled.
+	 */
+	hrtimers_resume_local();
 }
 
 /**
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -167,3 +167,5 @@ void timer_clear_idle(void);
 
 void clock_was_set(void);
 void clock_was_set_delayed(void);
+
+void hrtimers_resume_local(void);
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1799,8 +1799,10 @@ void timekeeping_resume(void)
 
 	touch_softlockup_watchdog();
 
+	/* Resume the clockevent device(s) and hrtimers */
 	tick_resume();
-	hrtimers_resume();
+	/* Notify timerfd as resume is equivalent to clock_was_set() */
+	timerfd_resume();
 }
 
 int timekeeping_suspend(void)



* [patch 5/8] time/timekeeping: Avoid invoking clock_was_set() twice
  2021-04-27  8:25 [patch 0/8] hrtimers: Overhaul the clock_was_set() logic Thomas Gleixner
                   ` (3 preceding siblings ...)
  2021-04-27  8:25 ` [patch 4/8] timekeeping: Distangle resume and clock-was-set events Thomas Gleixner
@ 2021-04-27  8:25 ` Thomas Gleixner
  2021-04-27  8:25 ` [patch 6/8] hrtimer: Add bases argument to clock_was_set() Thomas Gleixner
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2021-04-27  8:25 UTC (permalink / raw)
  To: LKML
  Cc: Anna-Maria Behnsen, Peter Zijlstra, Marcelo Tosatti,
	Frederic Weisbecker, Peter Xu, Nitesh Narayan Lal, Alex Belits,
	Rafael J. Wysocki, John Stultz

do_adjtimex() might end up scheduling a delayed clock_was_set() via
timekeeping_advance() and then invoke clock_was_set() directly, which is
pointless.

Make timekeeping_advance() return whether an invocation of clock_was_set()
is required and handle it at the call sites, which allows do_adjtimex() to
issue a single direct call if required.
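The shape of the change can be sketched as follows. This is a condensed model with invented names, not the kernel code: the advance step reports whether a notification is needed and the caller collects the results into a single decision.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Condensed model with invented names, not the kernel code: instead of
 * firing a notification from inside every advance step, each step
 * returns whether one is needed and do_adjtimex() issues at most one.
 */
static bool timekeeping_advance_model(bool offsets_changed)
{
	return offsets_changed;	/* previously: would notify internally */
}

/* Returns the number of clock_was_set() notifications issued */
static int do_adjtimex_model(bool tai_changed, bool freq_was_set)
{
	int notifications = 0;
	bool clock_set = tai_changed;

	if (freq_was_set)
		clock_set |= timekeeping_advance_model(true);

	if (clock_set)
		notifications++;	/* single direct call */

	return notifications;
}
```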

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/timekeeping.c |   18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -2126,7 +2126,7 @@ static u64 logarithmic_accumulation(stru
  * timekeeping_advance - Updates the timekeeper to the current time and
  * current NTP tick length
  */
-static void timekeeping_advance(enum timekeeping_adv_mode mode)
+static bool timekeeping_advance(enum timekeeping_adv_mode mode)
 {
 	struct timekeeper *real_tk = &tk_core.timekeeper;
 	struct timekeeper *tk = &shadow_timekeeper;
@@ -2197,9 +2197,8 @@ static void timekeeping_advance(enum tim
 	write_seqcount_end(&tk_core.seq);
 out:
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
-	if (clock_set)
-		/* Have to call _delayed version, since in irq context*/
-		clock_was_set_delayed();
+
+	return !!clock_set;
 }
 
 /**
@@ -2208,7 +2207,8 @@ static void timekeeping_advance(enum tim
  */
 void update_wall_time(void)
 {
-	timekeeping_advance(TK_ADV_TICK);
+	if (timekeeping_advance(TK_ADV_TICK))
+		clock_was_set_delayed();
 }
 
 /**
@@ -2388,8 +2388,9 @@ int do_adjtimex(struct __kernel_timex *t
 {
 	struct timekeeper *tk = &tk_core.timekeeper;
 	struct audit_ntp_data ad;
-	unsigned long flags;
+	bool clock_set = false;
 	struct timespec64 ts;
+	unsigned long flags;
 	s32 orig_tai, tai;
 	int ret;
 
@@ -2424,6 +2425,7 @@ int do_adjtimex(struct __kernel_timex *t
 	if (tai != orig_tai) {
 		__timekeeping_set_tai_offset(tk, tai);
 		timekeeping_update(tk, TK_MIRROR | TK_CLOCK_WAS_SET);
+		clock_set = true;
 	}
 	tk_update_leap_state(tk);
 
@@ -2434,9 +2436,9 @@ int do_adjtimex(struct __kernel_timex *t
 
 	/* Update the multiplier immediately if frequency was set directly */
 	if (txc->modes & (ADJ_FREQUENCY | ADJ_TICK))
-		timekeeping_advance(TK_ADV_FREQ);
+		clock_set |= timekeeping_advance(TK_ADV_FREQ);
 
-	if (tai != orig_tai)
+	if (clock_set)
 		clock_was_set();
 
 	ntp_notify_cmos_timer();



* [patch 6/8] hrtimer: Add bases argument to clock_was_set()
  2021-04-27  8:25 [patch 0/8] hrtimers: Overhaul the clock_was_set() logic Thomas Gleixner
                   ` (4 preceding siblings ...)
  2021-04-27  8:25 ` [patch 5/8] time/timekeeping: Avoid invoking clock_was_set() twice Thomas Gleixner
@ 2021-04-27  8:25 ` Thomas Gleixner
  2021-04-27  8:25 ` [patch 7/8] hrtimer: Avoid unnecessary SMP function calls in clock_was_set() Thomas Gleixner
  2021-04-27  8:25 ` [patch 8/8] hrtimer: Avoid more " Thomas Gleixner
  7 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2021-04-27  8:25 UTC (permalink / raw)
  To: LKML
  Cc: Anna-Maria Behnsen, Peter Zijlstra, Marcelo Tosatti,
	Frederic Weisbecker, Peter Xu, Nitesh Narayan Lal, Alex Belits,
	Rafael J. Wysocki, John Stultz

clock_was_set() unconditionally invokes retrigger_next_event() on all online
CPUs. This was necessary because that mechanism was also used for resume
from suspend to idle, which is no longer the case.

The bases argument allows the callers of clock_was_set() to hand in a mask
which tells clock_was_set() which of the hrtimer clock bases are affected
by the clock setting. This mask will be used in the next step to check
whether a CPU base has timers queued on a clock base affected by the event
and avoid the SMP function call if there are none.

Add a @bases argument, provide defines for the active-bases masking and
fix up all call sites.
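The masking scheme can be sketched with made-up bit positions; the real ones come from the hrtimer base enum, so everything below is an illustrative stand-in.

```c
#include <assert.h>

/*
 * Sketch with made-up bit positions, not the kernel enum values: each
 * clock base gets a bit, callers pass a mask of affected bases, and a
 * CPU only matters if it has active timers on one of those bases.
 */
enum {
	BASE_MONOTONIC,
	BASE_REALTIME,
	BASE_BOOTTIME,
	BASE_TAI,
};

#define BASE_BIT(b)	(1U << (b))

#define SET_WALL	(BASE_BIT(BASE_REALTIME) | BASE_BIT(BASE_TAI))
#define SET_BOOT	BASE_BIT(BASE_BOOTTIME)

/* Does this CPU have timers queued on an affected clock base? */
static int cpu_base_affected(unsigned int active_bases, unsigned int bases)
{
	return (active_bases & bases) != 0;
}
```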

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/hrtimer.c       |    4 ++--
 kernel/time/tick-internal.h |    9 ++++++++-
 kernel/time/timekeeping.c   |   14 +++++++-------
 3 files changed, 17 insertions(+), 10 deletions(-)

--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -894,7 +894,7 @@ static void hrtimer_reprogram(struct hrt
  * in the tick, which obviously might be stopped, so this has to bring out
  * the remote CPU which might sleep in idle to get this sorted.
  */
-void clock_was_set(void)
+void clock_was_set(unsigned int bases)
 {
 	if (!hrtimer_hres_active() && !tick_nohz_active)
 		goto out_timerfd;
@@ -908,7 +908,7 @@ void clock_was_set(void)
 
 static void clock_was_set_work(struct work_struct *work)
 {
-	clock_was_set();
+	clock_was_set(CLOCK_SET_WALL);
 }
 
 static DECLARE_WORK(hrtimer_work, clock_was_set_work);
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -165,7 +165,14 @@ DECLARE_PER_CPU(struct hrtimer_cpu_base,
 extern u64 get_next_timer_interrupt(unsigned long basej, u64 basem);
 void timer_clear_idle(void);
 
-void clock_was_set(void);
+#define CLOCK_SET_WALL							\
+	(BIT(HRTIMER_BASE_REALTIME) | BIT(HRTIMER_BASE_REALTIME_SOFT) |	\
+	 BIT(HRTIMER_BASE_TAI) | BIT(HRTIMER_BASE_TAI_SOFT))
+
+#define CLOCK_SET_BOOT							\
+	(BIT(HRTIMER_BASE_BOOTTIME) | BIT(HRTIMER_BASE_BOOTTIME_SOFT))
+
+void clock_was_set(unsigned int bases);
 void clock_was_set_delayed(void);
 
 void hrtimers_resume_local(void);
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1322,8 +1322,8 @@ int do_settimeofday64(const struct times
 	write_seqcount_end(&tk_core.seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
-	/* signal hrtimers about time change */
-	clock_was_set();
+	/* Signal hrtimers about time change */
+	clock_was_set(CLOCK_SET_WALL);
 
 	if (!ret)
 		audit_tk_injoffset(ts_delta);
@@ -1370,8 +1370,8 @@ error: /* even if we error out, we forwa
 	write_seqcount_end(&tk_core.seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
-	/* signal hrtimers about time change */
-	clock_was_set();
+	/* Signal hrtimers about time change */
+	clock_was_set(CLOCK_SET_WALL);
 
 	return ret;
 }
@@ -1735,8 +1735,8 @@ void timekeeping_inject_sleeptime64(cons
 	write_seqcount_end(&tk_core.seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
-	/* signal hrtimers about time change */
-	clock_was_set();
+	/* Signal hrtimers about time change */
+	clock_was_set(CLOCK_SET_WALL | CLOCK_SET_BOOT);
 }
 #endif
 
@@ -2429,7 +2429,7 @@ int do_adjtimex(struct __kernel_timex *t
 		clock_set |= timekeeping_advance(TK_ADV_FREQ);
 
 	if (clock_set)
-		clock_was_set();
+		clock_was_set(CLOCK_SET_WALL);
 
 	ntp_notify_cmos_timer();
 



* [patch 7/8] hrtimer: Avoid unnecessary SMP function calls in clock_was_set()
  2021-04-27  8:25 [patch 0/8] hrtimers: Overhaul the clock_was_set() logic Thomas Gleixner
                   ` (5 preceding siblings ...)
  2021-04-27  8:25 ` [patch 6/8] hrtimer: Add bases argument to clock_was_set() Thomas Gleixner
@ 2021-04-27  8:25 ` Thomas Gleixner
  2021-05-13 14:59   ` Peter Zijlstra
  2021-04-27  8:25 ` [patch 8/8] hrtimer: Avoid more " Thomas Gleixner
  7 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2021-04-27  8:25 UTC (permalink / raw)
  To: LKML
  Cc: Anna-Maria Behnsen, Peter Zijlstra, Marcelo Tosatti,
	Frederic Weisbecker, Peter Xu, Nitesh Narayan Lal, Alex Belits,
	Rafael J. Wysocki, John Stultz

From: Marcelo Tosatti <mtosatti@redhat.com>

Setting of clocks triggers an unconditional SMP function call on all online
CPUs to reprogram the clock event device.

However, only some clocks have their offsets updated and therefore
potentially require a reprogram. That's CLOCK_REALTIME and CLOCK_TAI and in
the case of resume (delayed sleep time injection) also CLOCK_BOOTTIME.

Instead of sending an IPI unconditionally, check each per CPU hrtimer base
whether it has active timers in the affected clock bases which are
indicated by the caller in the @bases argument of clock_was_set().

If that's not the case, skip the IPI and update the offsets remotely which
ensures that any subsequently armed timers on the affected clocks are
evaluated with the correct offsets.
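The filtering loop can be modeled like this. Arrays stand in for per-CPU data and a bitmask for the cpumask; all names are invented for illustration, not the kernel API.

```c
#include <assert.h>

/*
 * Model with invented names: arrays stand in for per-CPU data and a
 * bitmask for the cpumask. CPUs with timers on an affected base are
 * collected for the IPI; all others just get their offsets refreshed.
 */
#define NR_CPUS_MODEL	4

static unsigned int cpu_active_bases[NR_CPUS_MODEL];
static int cpu_offsets_updated[NR_CPUS_MODEL];

/* Returns the mask of CPUs which need the retrigger IPI */
static unsigned int build_ipi_mask(unsigned int bases)
{
	unsigned int mask = 0;

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		if (cpu_active_bases[cpu] & bases) {
			mask |= 1U << cpu;	/* must be reprogrammed */
		} else {
			/* Remote offset update is sufficient */
			cpu_offsets_updated[cpu] = 1;
		}
	}
	return mask;
}
```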

[ tglx: Adapted to the new bases argument, removed the softirq_active
  	check, added comment, fixed up stale comment ]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
v6:
  - Remove the softirq_active check (Peter Xu)
  - Adapt to the new bases argument
  - Add comment and fixup the stale one
  - Bring back CLOCK_BOOTTIME conditionally for late sleeptime
    injection on resume (missed that in the review of V1)

v5:
  - Add missing hrtimer_update_base (Peter Xu).

v4:
   - Drop unused code (Thomas).

v3:
   - Nicer changelog  (Thomas).
   - Code style fixes (Thomas).
   - Compilation warning with CONFIG_HIGH_RES_TIMERS=n (Thomas).
   - Shrink preemption disabled section (Thomas).

v2:
   - Only REALTIME and TAI bases are affected by offset-to-monotonic changes (Thomas).
   - Don't special case nohz_full CPUs (Thomas).


---
 kernel/time/hrtimer.c |   35 +++++++++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -896,11 +896,42 @@ static void hrtimer_reprogram(struct hrt
  */
 void clock_was_set(unsigned int bases)
 {
+	cpumask_var_t mask;
+	int cpu;
+
 	if (!hrtimer_hres_active() && !tick_nohz_active)
 		goto out_timerfd;
 
-	/* Retrigger the CPU local events everywhere */
-	on_each_cpu(retrigger_next_event, NULL, 1);
+	if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
+		on_each_cpu(retrigger_next_event, NULL, 1);
+		goto out_timerfd;
+	}
+
+	/* Avoid interrupting CPUs if possible */
+	cpus_read_lock();
+	for_each_online_cpu(cpu) {
+		struct hrtimer_cpu_base *cpu_base = &per_cpu(hrtimer_bases, cpu);
+		unsigned long flags;
+
+		raw_spin_lock_irqsave(&cpu_base->lock, flags);
+		/*
+		 * Only send the IPI when there are timers queued in one of
+		 * the affected clock bases. Otherwise update the base
+		 * remote to ensure that the next enqueue of a timer on
+		 * such a clock base will see the correct offsets.
+		 */
+		if (cpu_base->active_bases & bases)
+			cpumask_set_cpu(cpu, mask);
+		else
+			hrtimer_update_base(cpu_base);
+		raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
+	}
+
+	preempt_disable();
+	smp_call_function_many(mask, retrigger_next_event, NULL, 1);
+	preempt_enable();
+	cpus_read_unlock();
+	free_cpumask_var(mask);
 
 out_timerfd:
 	timerfd_clock_was_set();



* [patch 8/8] hrtimer: Avoid more SMP function calls in clock_was_set()
  2021-04-27  8:25 [patch 0/8] hrtimers: Overhaul the clock_was_set() logic Thomas Gleixner
                   ` (6 preceding siblings ...)
  2021-04-27  8:25 ` [patch 7/8] hrtimer: Avoid unnecessary SMP function calls in clock_was_set() Thomas Gleixner
@ 2021-04-27  8:25 ` Thomas Gleixner
  2021-04-27 15:11   ` Marcelo Tosatti
  7 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2021-04-27  8:25 UTC (permalink / raw)
  To: LKML
  Cc: Anna-Maria Behnsen, Peter Zijlstra, Marcelo Tosatti,
	Frederic Weisbecker, Peter Xu, Nitesh Narayan Lal, Alex Belits,
	Rafael J. Wysocki, John Stultz

There are more indicators whether the SMP function calls on clock_was_set()
can be avoided:

    - When the remote CPU is currently handling hrtimer_interrupt(). In
      that case the remote CPU will update offsets and reevaluate the timer
      bases before reprogramming anyway, so nothing to do.

By unconditionally updating the offsets the following checks are possible:

    - When the offset update already happened on the remote CPU then the
      remote update attempt will yield the same sequence number and no
      IPI is required.

    - After updating it can be checked whether the first expiring timer in
      the affected clock bases moves before the first expiring (softirq)
      timer of the CPU. If that's not the case then sending the IPI is not
      required.
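Taken together, the checks can be condensed into a small model. The field names below are simplified, invented stand-ins, not the kernel structures: the IPI is skipped when the remote CPU is in its hrtimer interrupt, has already handled this event, or when no affected timer moves ahead of the first expiring one.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Condensed model of the checks with simplified, invented field names,
 * not the kernel structures.
 */
struct cpu_model {
	bool in_hrtirq;
	unsigned int seen_seq;		/* last clock-was-set event handled */
	long long first_expiry;		/* currently programmed expiry */
	long long affected_expiry;	/* earliest timer on affected bases */
};

static struct cpu_model cpu0 = {
	.first_expiry = 100,
	.affected_expiry = 50,
};

static bool update_needs_ipi_model(struct cpu_model *c, unsigned int seq)
{
	if (c->in_hrtirq)
		return false;	/* remote CPU will reevaluate anyway */

	if (c->seen_seq == seq)
		return false;	/* offsets already up to date */
	c->seen_seq = seq;	/* models hrtimer_update_base() */

	/*
	 * Only when an affected timer now expires before the currently
	 * programmed event must the clock event device be reprogrammed,
	 * which only the CPU itself can do.
	 */
	return c->affected_expiry < c->first_expiry;
}
```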

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/hrtimer.c |   66 +++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 57 insertions(+), 9 deletions(-)

--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -880,6 +880,60 @@ static void hrtimer_reprogram(struct hrt
 	tick_program_event(expires, 1);
 }
 
+static bool update_needs_ipi(struct hrtimer_cpu_base *cpu_base,
+			     unsigned int active)
+{
+	struct hrtimer_clock_base *base;
+	unsigned int seq;
+	ktime_t expires;
+
+	/*
+	 * If the remote CPU is currently handling an hrtimer interrupt, it
+	 * will update and reevaluate the first expiring timer of all clock
+	 * bases before reprogramming. Nothing to do here.
+	 */
+	if (cpu_base->in_hrtirq)
+		return false;
+
+	/*
+	 * Update the base offsets unconditionally so the following quick
+	 * check whether the SMP function call is required works.
+	 */
+	seq = cpu_base->clock_was_set_seq;
+	hrtimer_update_base(cpu_base);
+
+	/*
+	 * If the sequence did not change over the update then the
+	 * remote CPU already handled it.
+	 */
+	if (seq == cpu_base->clock_was_set_seq)
+		return false;
+
+	/*
+	 * Walk the affected clock bases and check whether the first expiring
+	 * timer in a clock base is moving ahead of the first expiring timer of
+	 * @cpu_base. If so, the IPI must be invoked because per CPU clock
+	 * event devices cannot be remotely reprogrammed.
+	 */
+	for_each_active_base(base, cpu_base, active) {
+		struct timerqueue_node *next;
+
+		next = timerqueue_getnext(&base->active);
+		expires = ktime_sub(next->expires, base->offset);
+		if (expires < cpu_base->expires_next)
+			return true;
+
+		/* Extra check for softirq clock bases */
+		if (base->clockid < HRTIMER_BASE_MONOTONIC_SOFT)
+			continue;
+		if (cpu_base->softirq_activated)
+			continue;
+		if (expires < cpu_base->softirq_expires_next)
+			return true;
+	}
+	return false;
+}
+
 /*
  * Clock was set. This might affect CLOCK_REALTIME, CLOCK_TAI and
  * CLOCK_BOOTTIME (for late sleep time injection).
@@ -914,16 +968,10 @@ void clock_was_set(unsigned int bases)
 		unsigned long flags;
 
 		raw_spin_lock_irqsave(&cpu_base->lock, flags);
-		/*
-		 * Only send the IPI when there are timers queued in one of
-		 * the affected clock bases. Otherwise update the base
-		 * remote to ensure that the next enqueue of a timer on
-		 * such a clock base will see the correct offsets.
-		 */
-		if (cpu_base->active_bases & bases)
+
+		if (update_needs_ipi(cpu_base, bases))
 			cpumask_set_cpu(cpu, mask);
-		else
-			hrtimer_update_base(cpu_base);
+
 		raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
 	}
 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 8/8] hrtimer: Avoid more SMP function calls in clock_was_set()
  2021-04-27  8:25 ` [patch 8/8] hrtimer: Avoid more " Thomas Gleixner
@ 2021-04-27 15:11   ` Marcelo Tosatti
  2021-04-27 19:59     ` Thomas Gleixner
  0 siblings, 1 reply; 21+ messages in thread
From: Marcelo Tosatti @ 2021-04-27 15:11 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Anna-Maria Behnsen, Peter Zijlstra, Frederic Weisbecker,
	Peter Xu, Nitesh Narayan Lal, Alex Belits, Rafael J. Wysocki,
	John Stultz

On Tue, Apr 27, 2021 at 10:25:45AM +0200, Thomas Gleixner wrote:
> There are more indicators whether the SMP function calls on clock_was_set()
> can be avoided:
> 
>     - When the remote CPU is currently handling hrtimer_interrupt(). In
>       that case the remote CPU will update offsets and reevaluate the timer
>       bases before reprogramming anyway, so nothing to do.
> 
> By unconditionally updating the offsets the following checks are possible:
> 
>     - When the offset update already happened on the remote CPU then the
>       remote update attempt will yield the same sequence number and no
>       IPI is required.
> 
>     - After updating it can be checked whether the first expiring timer in
>       the affected clock bases moves before the first expiring (softirq)
>       timer of the CPU. If that's not the case then sending the IPI is not
>       required.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  kernel/time/hrtimer.c |   66 +++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 57 insertions(+), 9 deletions(-)
> 
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -880,6 +880,60 @@ static void hrtimer_reprogram(struct hrt
>  	tick_program_event(expires, 1);
>  }
>  
> +static bool update_needs_ipi(struct hrtimer_cpu_base *cpu_base,
> +			     unsigned int active)
> +{
> +	struct hrtimer_clock_base *base;
> +	unsigned int seq;
> +	ktime_t expires;
> +
> +	/*
> +	 * If the remote CPU is currently handling an hrtimer interrupt, it
> +	 * will update and reevaluate the first expiring timer of all clock
> +	 * bases before reprogramming. Nothing to do here.
> +	 */
> +	if (cpu_base->in_hrtirq)
> +		return false;

Thomas,

The base offsets are updated at 

Cscope tag: hrtimer_update_base
   #   line  filename / context / line
   1    736  kernel/time/hrtimer.c <<retrigger_next_event>>	(IPI handler)
             hrtimer_update_base(base);
   2   1617  kernel/time/hrtimer.c <<hrtimer_run_softirq>>	(softirq handler)
             now = hrtimer_update_base(cpu_base);
   3   1645  kernel/time/hrtimer.c <<hrtimer_interrupt>>	(hrtimer_interrupt entry)
             entry_time = now = hrtimer_update_base(cpu_base);
   4   1695  kernel/time/hrtimer.c <<hrtimer_interrupt>>	(after tick_program_event failure)
             now = hrtimer_update_base(cpu_base);
   5   1768  kernel/time/hrtimer.c <<hrtimer_run_queues>>	(sched_tick)
             now = hrtimer_update_base(cpu_base);

Consider


hrtimer_interrupt
in_hrtirq = 1
__run_hrtimer
raw_spin_unlock_irqrestore(&cpu_base->lock, flags)
								settimeofday
								clock_was_set
								lock cpu_base->lock
								update_needs_ipi returns false
continue to process hrtimers with stale base->offset

No?


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 8/8] hrtimer: Avoid more SMP function calls in clock_was_set()
  2021-04-27 15:11   ` Marcelo Tosatti
@ 2021-04-27 19:59     ` Thomas Gleixner
  2021-04-30  7:12       ` [patch V2 " Thomas Gleixner
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2021-04-27 19:59 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: LKML, Anna-Maria Behnsen, Peter Zijlstra, Frederic Weisbecker,
	Peter Xu, Nitesh Narayan Lal, Alex Belits, Rafael J. Wysocki,
	John Stultz

On Tue, Apr 27 2021 at 12:11, Marcelo Tosatti wrote:
> On Tue, Apr 27, 2021 at 10:25:45AM +0200, Thomas Gleixner wrote:
> Consider
>
>
> hrtimer_interrupt
> in_hrtirq = 1
> __run_hrtimer
> raw_spin_unlock_irqrestore(&cpu_base->lock, flags)
> 								settimeofday
> 								clock_was_set
> 								lock cpu_base->lock
> 								update_needs_ipi returns false
> continue to process hrtimers with stale base->offset

Bah. I somehow convinced myself that hrtimer_interrupt() rechecks the
offset before clearing in_hrtirq, but that's not true. It does so when
reprogramming fails, but not for the regular case.

Lemme stare at it some more.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [patch V2 8/8] hrtimer: Avoid more SMP function calls in clock_was_set()
  2021-04-27 19:59     ` Thomas Gleixner
@ 2021-04-30  7:12       ` Thomas Gleixner
  2021-04-30 16:49         ` Marcelo Tosatti
  2021-05-13  7:47         ` Peter Zijlstra
  0 siblings, 2 replies; 21+ messages in thread
From: Thomas Gleixner @ 2021-04-30  7:12 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: LKML, Anna-Maria Behnsen, Peter Zijlstra, Frederic Weisbecker,
	Peter Xu, Nitesh Narayan Lal, Alex Belits, Rafael J. Wysocki,
	John Stultz

By unconditionally updating the offsets there are more indicators
whether the SMP function calls on clock_was_set() can be avoided:

  - When the offset update already happened on the remote CPU then the
    remote update attempt will yield the same sequence number and no
    IPI is required.

  - When the remote CPU is currently handling hrtimer_interrupt(). In
    that case the remote CPU will reevaluate the timer bases before
    reprogramming anyway, so nothing to do.

  - After updating it can be checked whether the first expiring timer in
    the affected clock bases moves before the first expiring (softirq)
    timer of the CPU. If that's not the case then sending the IPI is not
    required.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Fix the in_hrtirq thinko (Marcelo)
    Add the missing masking (reported by 0day)

P.S.: The git branch is updated as well

---
 kernel/time/hrtimer.c |   74 +++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 65 insertions(+), 9 deletions(-)

--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -880,6 +880,68 @@ static void hrtimer_reprogram(struct hrt
 	tick_program_event(expires, 1);
 }
 
+static bool update_needs_ipi(struct hrtimer_cpu_base *cpu_base,
+			     unsigned int active)
+{
+	struct hrtimer_clock_base *base;
+	unsigned int seq;
+	ktime_t expires;
+
+	/*
+	 * Update the base offsets unconditionally so that the following
+	 * check of whether the SMP function call is required works.
+	 *
+	 * The update is safe even when the remote CPU is in the hrtimer
+	 * interrupt or the hrtimer soft interrupt and expiring affected
+	 * bases. Either it will see the update before handling a base or
+	 * it will see it when it finishes the processing and reevaluates
+	 * the next expiring timer.
+	 */
+	seq = cpu_base->clock_was_set_seq;
+	hrtimer_update_base(cpu_base);
+
+	/*
+	 * If the sequence did not change over the update then the
+	 * remote CPU already handled it.
+	 */
+	if (seq == cpu_base->clock_was_set_seq)
+		return false;
+
+	/*
+	 * If the remote CPU is currently handling an hrtimer interrupt, it
+	 * will reevaluate the first expiring timer of all clock bases
+	 * before reprogramming. Nothing to do here.
+	 */
+	if (cpu_base->in_hrtirq)
+		return false;
+
+	/*
+	 * Walk the affected clock bases and check whether the first expiring
+	 * timer in a clock base is moving ahead of the first expiring timer of
+	 * @cpu_base. If so, the IPI must be invoked because per CPU clock
+	 * event devices cannot be remotely reprogrammed.
+	 */
+	active &= cpu_base->active_bases;
+
+	for_each_active_base(base, cpu_base, active) {
+		struct timerqueue_node *next;
+
+		next = timerqueue_getnext(&base->active);
+		expires = ktime_sub(next->expires, base->offset);
+		if (expires < cpu_base->expires_next)
+			return true;
+
+		/* Extra check for softirq clock bases */
+		if (base->clockid < HRTIMER_BASE_MONOTONIC_SOFT)
+			continue;
+		if (cpu_base->softirq_activated)
+			continue;
+		if (expires < cpu_base->softirq_expires_next)
+			return true;
+	}
+	return false;
+}
+
 /*
  * Clock was set. This might affect CLOCK_REALTIME, CLOCK_TAI and
  * CLOCK_BOOTTIME (for late sleep time injection).
@@ -914,16 +976,10 @@ void clock_was_set(unsigned int bases)
 		unsigned long flags;
 
 		raw_spin_lock_irqsave(&cpu_base->lock, flags);
-		/*
-		 * Only send the IPI when there are timers queued in one of
-		 * the affected clock bases. Otherwise update the base
-		 * remote to ensure that the next enqueue of a timer on
-		 * such a clock base will see the correct offsets.
-		 */
-		if (cpu_base->active_bases & bases)
+
+		if (update_needs_ipi(cpu_base, bases))
 			cpumask_set_cpu(cpu, mask);
-		else
-			hrtimer_update_base(cpu_base);
+
 		raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
 	}
 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch V2 8/8] hrtimer: Avoid more SMP function calls in clock_was_set()
  2021-04-30  7:12       ` [patch V2 " Thomas Gleixner
@ 2021-04-30 16:49         ` Marcelo Tosatti
  2021-05-13  7:47         ` Peter Zijlstra
  1 sibling, 0 replies; 21+ messages in thread
From: Marcelo Tosatti @ 2021-04-30 16:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Anna-Maria Behnsen, Peter Zijlstra, Frederic Weisbecker,
	Peter Xu, Nitesh Narayan Lal, Alex Belits, Rafael J. Wysocki,
	John Stultz

On Fri, Apr 30, 2021 at 09:12:15AM +0200, Thomas Gleixner wrote:
> By unconditionally updating the offsets there are more indicators
> whether the SMP function calls on clock_was_set() can be avoided:
> 
>   - When the offset update already happened on the remote CPU then the
>     remote update attempt will yield the same sequence number and no
>     IPI is required.
> 
>   - When the remote CPU is currently handling hrtimer_interrupt(). In
>     that case the remote CPU will reevaluate the timer bases before
>     reprogramming anyway, so nothing to do.
> 
>   - After updating it can be checked whether the first expiring timer in
>     the affected clock bases moves before the first expiring (softirq)
>     timer of the CPU. If that's not the case then sending the IPI is not
>     required.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> V2: Fix the in_hrtirq thinko (Marcelo)
>     Add the missing masking (reported by 0day)
> 
> P.S.: The git branch is updated as well
> 
> ---
>  kernel/time/hrtimer.c |   74 +++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 65 insertions(+), 9 deletions(-)
> 
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -880,6 +880,68 @@ static void hrtimer_reprogram(struct hrt
>  	tick_program_event(expires, 1);
>  }
>  
> +static bool update_needs_ipi(struct hrtimer_cpu_base *cpu_base,
> +			     unsigned int active)
> +{
> +	struct hrtimer_clock_base *base;
> +	unsigned int seq;
> +	ktime_t expires;
> +
> +	/*
> +	 * Update the base offsets unconditionally so that the following
> +	 * check of whether the SMP function call is required works.
> +	 *
> +	 * The update is safe even when the remote CPU is in the hrtimer
> +	 * interrupt or the hrtimer soft interrupt and expiring affected
> +	 * bases. Either it will see the update before handling a base or
> +	 * it will see it when it finishes the processing and reevaluates
> +	 * the next expiring timer.
> +	 */
> +	seq = cpu_base->clock_was_set_seq;
> +	hrtimer_update_base(cpu_base);
> +
> +	/*
> +	 * If the sequence did not change over the update then the
> +	 * remote CPU already handled it.
> +	 */
> +	if (seq == cpu_base->clock_was_set_seq)
> +		return false;
> +
> +	/*
> +	 * If the remote CPU is currently handling an hrtimer interrupt, it
> +	 * will reevaluate the first expiring timer of all clock bases
> +	 * before reprogramming. Nothing to do here.
> +	 */
> +	if (cpu_base->in_hrtirq)
> +		return false;

Looks good, thanks.

> +
> +	/*
> +	 * Walk the affected clock bases and check whether the first expiring
> +	 * timer in a clock base is moving ahead of the first expiring timer of
> +	 * @cpu_base. If so, the IPI must be invoked because per CPU clock
> +	 * event devices cannot be remotely reprogrammed.
> +	 */
> +	active &= cpu_base->active_bases;
> +
> +	for_each_active_base(base, cpu_base, active) {
> +		struct timerqueue_node *next;
> +
> +		next = timerqueue_getnext(&base->active);
> +		expires = ktime_sub(next->expires, base->offset);
> +		if (expires < cpu_base->expires_next)
> +			return true;
> +
> +		/* Extra check for softirq clock bases */
> +		if (base->clockid < HRTIMER_BASE_MONOTONIC_SOFT)
> +			continue;
> +		if (cpu_base->softirq_activated)
> +			continue;
> +		if (expires < cpu_base->softirq_expires_next)
> +			return true;
> +	}
> +	return false;
> +}
> +
>  /*
>   * Clock was set. This might affect CLOCK_REALTIME, CLOCK_TAI and
>   * CLOCK_BOOTTIME (for late sleep time injection).
> @@ -914,16 +976,10 @@ void clock_was_set(unsigned int bases)
>  		unsigned long flags;
>  
>  		raw_spin_lock_irqsave(&cpu_base->lock, flags);
> -		/*
> -		 * Only send the IPI when there are timers queued in one of
> -		 * the affected clock bases. Otherwise update the base
> -		 * remote to ensure that the next enqueue of a timer on
> -		 * such a clock base will see the correct offsets.
> -		 */
> -		if (cpu_base->active_bases & bases)
> +
> +		if (update_needs_ipi(cpu_base, bases))
>  			cpumask_set_cpu(cpu, mask);
> -		else
> -			hrtimer_update_base(cpu_base);
> +
>  		raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
>  	}
>  
> 
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 2/8] hrtimer: Force clock_was_set() handling for the HIGHRES=n, NOHZ=y case
  2021-04-27  8:25 ` [patch 2/8] hrtimer: Force clock_was_set() handling for the HIGHRES=n, NOHZ=y case Thomas Gleixner
@ 2021-05-12 14:59   ` Peter Zijlstra
  2021-05-12 16:40     ` Thomas Gleixner
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2021-05-12 14:59 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Anna-Maria Behnsen, Marcelo Tosatti, Frederic Weisbecker,
	Peter Xu, Nitesh Narayan Lal, Alex Belits, Rafael J. Wysocki,
	John Stultz

On Tue, Apr 27, 2021 at 10:25:39AM +0200, Thomas Gleixner wrote:
>  void clock_was_set(void)
>  {
> +	if (!hrtimer_hres_active() && !tick_nohz_active)
> +		goto out_timerfd;
> +
>  	/* Retrigger the CPU local events everywhere */
>  	on_each_cpu(retrigger_next_event, NULL, 1);
> +
> +out_timerfd:
>  	timerfd_clock_was_set();
>  }

Isn't that simpler when written like:

	if (hrtimer_hres_active() || tick_nohz_active())
		on_each_cpu(retrigger_next_event, NULL, 1);

	timerfd_clock_was_set();

?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 2/8] hrtimer: Force clock_was_set() handling for the HIGHRES=n, NOHZ=y case
  2021-05-12 14:59   ` Peter Zijlstra
@ 2021-05-12 16:40     ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2021-05-12 16:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Anna-Maria Behnsen, Marcelo Tosatti, Frederic Weisbecker,
	Peter Xu, Nitesh Narayan Lal, Alex Belits, Rafael J. Wysocki,
	John Stultz

On Wed, May 12 2021 at 16:59, Peter Zijlstra wrote:
> On Tue, Apr 27, 2021 at 10:25:39AM +0200, Thomas Gleixner wrote:
>>  void clock_was_set(void)
>>  {
>> +	if (!hrtimer_hres_active() && !tick_nohz_active)
>> +		goto out_timerfd;
>> +
>>  	/* Retrigger the CPU local events everywhere */
>>  	on_each_cpu(retrigger_next_event, NULL, 1);
>> +
>> +out_timerfd:
>>  	timerfd_clock_was_set();
>>  }
>
> Isn't that simpler when written like:
>
> 	if (hrtimer_hres_active() || tick_nohz_active())
> 		on_each_cpu(retrigger_next_event, NULL, 1);
>
> 	timerfd_clock_was_set();
>
> ?

Yes, but look at the later patches. Then we'll introduce it there. :)


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch V2 8/8] hrtimer: Avoid more SMP function calls in clock_was_set()
  2021-04-30  7:12       ` [patch V2 " Thomas Gleixner
  2021-04-30 16:49         ` Marcelo Tosatti
@ 2021-05-13  7:47         ` Peter Zijlstra
  2021-05-14 19:08           ` Thomas Gleixner
  1 sibling, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2021-05-13  7:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Marcelo Tosatti, LKML, Anna-Maria Behnsen, Frederic Weisbecker,
	Peter Xu, Nitesh Narayan Lal, Alex Belits, Rafael J. Wysocki,
	John Stultz

On Fri, Apr 30, 2021 at 09:12:15AM +0200, Thomas Gleixner wrote:
> +static bool update_needs_ipi(struct hrtimer_cpu_base *cpu_base,
> +			     unsigned int active)
> +{
> +	struct hrtimer_clock_base *base;
> +	unsigned int seq;
> +	ktime_t expires;
> +
> +	/*
> +	 * Update the base offsets unconditionally so that the following
> +	 * check of whether the SMP function call is required works.
> +	 *
> +	 * The update is safe even when the remote CPU is in the hrtimer
> +	 * interrupt or the hrtimer soft interrupt and expiring affected
> +	 * bases. Either it will see the update before handling a base or
> +	 * it will see it when it finishes the processing and reevaluates
> +	 * the next expiring timer.
> +	 */
> +	seq = cpu_base->clock_was_set_seq;
> +	hrtimer_update_base(cpu_base);
> +
> +	/*
> +	 * If the sequence did not change over the update then the
> +	 * remote CPU already handled it.
> +	 */
> +	if (seq == cpu_base->clock_was_set_seq)
> +		return false;
> +

So far so simple, if there's nothing to update, we done.

> +	/*
> +	 * If the remote CPU is currently handling an hrtimer interrupt, it
> +	 * will reevaluate the first expiring timer of all clock bases
> +	 * before reprogramming. Nothing to do here.
> +	 */
> +	if (cpu_base->in_hrtirq)
> +		return false;

This one gives me a head-ache though; if we get here, that means
hrtimer_interrupt()'s hrtimer_update_base() happened before the change.
It also means that CPU is in __run_hrtimer() running a fn(), since we
own cpu_base->lock.

That in turn means it is in __hrtimer_run_queues(), possibly on the last
base.

Now, if I understand it right, the thing that saves us, is that
hrtimer_update_next_event() -- right after returning from
__hrtimer_run_queues() -- will re-evaluate all bases (with the
hrtimer_update_base() we just did visible to it) and we'll eventually
goto retry if time moved such that we now have timers that should've ran
but were missed due to this concurrent shift in time.

However, since that retries thing is limited to 3; could we not trigger
that by generating a stream of these updates, causing the timer to keep
having to be reset? I suppose updating time is a root only thing, and
root can shoot its own foot off any time it damn well likes, so who
cares.

> +	/*
> +	 * Walk the affected clock bases and check whether the first expiring
> +	 * timer in a clock base is moving ahead of the first expiring timer of
> +	 * @cpu_base. If so, the IPI must be invoked because per CPU clock
> +	 * event devices cannot be remotely reprogrammed.
> +	 */
> +	active &= cpu_base->active_bases;
> +
> +	for_each_active_base(base, cpu_base, active) {
> +		struct timerqueue_node *next;
> +
> +		next = timerqueue_getnext(&base->active);
> +		expires = ktime_sub(next->expires, base->offset);
> +		if (expires < cpu_base->expires_next)
> +			return true;
> +
> +		/* Extra check for softirq clock bases */
> +		if (base->clockid < HRTIMER_BASE_MONOTONIC_SOFT)
> +			continue;
> +		if (cpu_base->softirq_activated)
> +			continue;
> +		if (expires < cpu_base->softirq_expires_next)
> +			return true;
> +	}

Fair enough..

> +	return false;
> +}
> +
>  /*
>   * Clock was set. This might affect CLOCK_REALTIME, CLOCK_TAI and
>   * CLOCK_BOOTTIME (for late sleep time injection).


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 7/8] hrtimer: Avoid unnecessary SMP function calls in clock_was_set()
  2021-04-27  8:25 ` [patch 7/8] hrtimer: Avoid unnecessary SMP function calls in clock_was_set() Thomas Gleixner
@ 2021-05-13 14:59   ` Peter Zijlstra
  2021-05-14 18:52     ` Thomas Gleixner
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2021-05-13 14:59 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Anna-Maria Behnsen, Marcelo Tosatti, Frederic Weisbecker,
	Peter Xu, Nitesh Narayan Lal, Alex Belits, Rafael J. Wysocki,
	John Stultz

On Tue, Apr 27, 2021 at 10:25:44AM +0200, Thomas Gleixner wrote:
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -896,11 +896,42 @@ static void hrtimer_reprogram(struct hrt
>   */
>  void clock_was_set(unsigned int bases)
>  {
> +	cpumask_var_t mask;
> +	int cpu;
> +
>  	if (!hrtimer_hres_active() && !tick_nohz_active)
>  		goto out_timerfd;
>  
> -	/* Retrigger the CPU local events everywhere */
> -	on_each_cpu(retrigger_next_event, NULL, 1);
> +	if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
> +		on_each_cpu(retrigger_next_event, NULL, 1);

This will violate NOHZ_FULL;

> +		goto out_timerfd;
> +	}
> +
> +	/* Avoid interrupting CPUs if possible */
> +	cpus_read_lock();
> +	for_each_online_cpu(cpu) {
> +		struct hrtimer_cpu_base *cpu_base = &per_cpu(hrtimer_bases, cpu);
> +		unsigned long flags;
> +
> +		raw_spin_lock_irqsave(&cpu_base->lock, flags);
> +		/*
> +		 * Only send the IPI when there are timers queued in one of
> +		 * the affected clock bases. Otherwise update the base
> +		 * remote to ensure that the next enqueue of a timer on
> +		 * such a clock base will see the correct offsets.
> +		 */
> +		if (cpu_base->active_bases & bases)
> +			cpumask_set_cpu(cpu, mask);
> +		else
> +			hrtimer_update_base(cpu_base);
> +		raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
> +	}
> +
> +	preempt_disable();
> +	smp_call_function_many(mask, retrigger_next_event, NULL, 1);

The sane option is:

	smp_call_function_many_cond(cpu_online_mask, retrigger_next_event,
				    NULL, SCF_WAIT, update_needs_ipi);

Which does all of the above, but better.


> +	preempt_enable();
> +	cpus_read_unlock();
> +	free_cpumask_var(mask);
>  
>  out_timerfd:
>  	timerfd_clock_was_set();
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 7/8] hrtimer: Avoid unnecessary SMP function calls in clock_was_set()
  2021-05-13 14:59   ` Peter Zijlstra
@ 2021-05-14 18:52     ` Thomas Gleixner
  2021-05-14 23:28       ` Peter Zijlstra
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2021-05-14 18:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Anna-Maria Behnsen, Marcelo Tosatti, Frederic Weisbecker,
	Peter Xu, Nitesh Narayan Lal, Alex Belits, Rafael J. Wysocki,
	John Stultz

On Thu, May 13 2021 at 16:59, Peter Zijlstra wrote:
> On Tue, Apr 27, 2021 at 10:25:44AM +0200, Thomas Gleixner wrote:
>> -	/* Retrigger the CPU local events everywhere */
>> -	on_each_cpu(retrigger_next_event, NULL, 1);
>> +	if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
>> +		on_each_cpu(retrigger_next_event, NULL, 1);
>
> This will violate NOHZ_FULL;

Only if that allocation fails.

Aside of that any CPU which has an affected timer will get notified even
on NOHZ_FULL.

>> +		goto out_timerfd;
>> +	}
>> +
>> +	/* Avoid interrupting CPUs if possible */
>> +	cpus_read_lock();
>> +	for_each_online_cpu(cpu) {
>> +		struct hrtimer_cpu_base *cpu_base = &per_cpu(hrtimer_bases, cpu);
>> +		unsigned long flags;
>> +
>> +		raw_spin_lock_irqsave(&cpu_base->lock, flags);
>> +		/*
>> +		 * Only send the IPI when there are timers queued in one of
>> +		 * the affected clock bases. Otherwise update the base
>> +		 * remote to ensure that the next enqueue of a timer on
>> +		 * such a clock base will see the correct offsets.
>> +		 */
>> +		if (cpu_base->active_bases & bases)
>> +			cpumask_set_cpu(cpu, mask);
>> +		else
>> +			hrtimer_update_base(cpu_base);
>> +		raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
>> +	}
>> +
>> +	preempt_disable();
>> +	smp_call_function_many(mask, retrigger_next_event, NULL, 1);
>
> The sane option is:
>
> 	smp_call_function_many_cond(cpu_online_mask, retrigger_next_event,
> 				    NULL, SCF_WAIT, update_needs_ipi);
>
> Which does all of the above, but better.

With the difference that the for_each_cpu() loop runs with preemption
disabled, while with this approach preemption is only disabled across
the function call.

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch V2 8/8] hrtimer: Avoid more SMP function calls in clock_was_set()
  2021-05-13  7:47         ` Peter Zijlstra
@ 2021-05-14 19:08           ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2021-05-14 19:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marcelo Tosatti, LKML, Anna-Maria Behnsen, Frederic Weisbecker,
	Peter Xu, Nitesh Narayan Lal, Alex Belits, Rafael J. Wysocki,
	John Stultz

On Thu, May 13 2021 at 09:47, Peter Zijlstra wrote:
> On Fri, Apr 30, 2021 at 09:12:15AM +0200, Thomas Gleixner wrote:
>> +	/*
>> +	 * If the remote CPU is currently handling an hrtimer interrupt, it
>> +	 * will reevaluate the first expiring timer of all clock bases
>> +	 * before reprogramming. Nothing to do here.
>> +	 */
>> +	if (cpu_base->in_hrtirq)
>> +		return false;
>
> This one gives me a head-ache though; if we get here, that means
> hrtimer_interrupt()'s hrtimer_update_base() happened before the change.
> It also means that CPU is in __run_hrtimer() running a fn(), since we
> own cpu_base->lock.
>
> That in turn means it is in __hrtimer_run_queues(), possible on the last
> base.
>
> Now, if I understand it right, the thing that saves us, is that
> hrtimer_update_next_event() -- right after returning from
> __hrtimer_run_queues() -- will re-evaluate all bases (with the
> hrtimer_update_base() we just did visible to it) and we'll eventually
> goto retry if time moved such that we now have timers that should've ran
> but were missed due to this concurrent shift in time.

Correct.

> However, since that retries thing is limited to 3; could we not trigger
> that by generating a stream of these updates, causing the timer to keep
> having to be reset? I suppose updating time is a root only thing, and
> root can shoot its own foot off any time it damn well likes, so who
> cares.

It's root only. So you could argue that a borked NTPd can cause this to
happen, but then you surely have other problems aside of hitting the
retries limit.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 7/8] hrtimer: Avoid unnecessary SMP function calls in clock_was_set()
  2021-05-14 18:52     ` Thomas Gleixner
@ 2021-05-14 23:28       ` Peter Zijlstra
  2021-05-15  0:24         ` Thomas Gleixner
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2021-05-14 23:28 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Anna-Maria Behnsen, Marcelo Tosatti, Frederic Weisbecker,
	Peter Xu, Nitesh Narayan Lal, Alex Belits, Rafael J. Wysocki,
	John Stultz

On Fri, May 14, 2021 at 08:52:33PM +0200, Thomas Gleixner wrote:
> On Thu, May 13 2021 at 16:59, Peter Zijlstra wrote:
> > On Tue, Apr 27, 2021 at 10:25:44AM +0200, Thomas Gleixner wrote:
> >> -	/* Retrigger the CPU local events everywhere */
> >> -	on_each_cpu(retrigger_next_event, NULL, 1);
> >> +	if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
> >> +		on_each_cpu(retrigger_next_event, NULL, 1);
> >
> > This will violate NOHZ_FULL;
> 
> Only if that allocation fails.

Right, which should be near to never I suppose.

> Aside of that any CPU which has an affected timer will get notified even
> on NOHZ_FULL.

Right; but if it's properly NOHZ_FULL -- the kind that wanted a signal
on any entry into the kernel -- then it won't have timers and this IPI
will trigger the signal and kill the program.

But yeah, you're right, that's not very likely.


> >> +	preempt_disable();
> >> +	smp_call_function_many(mask, retrigger_next_event, NULL, 1);
> >
> > The sane option is:
> >
> > 	smp_call_function_many_cond(cpu_online_mask, retrigger_next_event,
> > 				    NULL, SCF_WAIT, update_needs_ipi);
> >
> > Which does all of the above, but better.
> 
> With the difference that the for_each_cpu() loop runs with preemption
> disabled, while with this approach preemption is only disabled accross
> the function call.

Yeah, I'd forgotten that... I might put looking at that on the todo list
somewhere :/

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 7/8] hrtimer: Avoid unnecessary SMP function calls in clock_was_set()
  2021-05-14 23:28       ` Peter Zijlstra
@ 2021-05-15  0:24         ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2021-05-15  0:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Anna-Maria Behnsen, Marcelo Tosatti, Frederic Weisbecker,
	Peter Xu, Nitesh Narayan Lal, Alex Belits, Rafael J. Wysocki,
	John Stultz

On Sat, May 15 2021 at 01:28, Peter Zijlstra wrote:

> On Fri, May 14, 2021 at 08:52:33PM +0200, Thomas Gleixner wrote:
>> On Thu, May 13 2021 at 16:59, Peter Zijlstra wrote:
>> > On Tue, Apr 27, 2021 at 10:25:44AM +0200, Thomas Gleixner wrote:
>> >> -	/* Retrigger the CPU local events everywhere */
>> >> -	on_each_cpu(retrigger_next_event, NULL, 1);
>> >> +	if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
>> >> +		on_each_cpu(retrigger_next_event, NULL, 1);
>> >
>> > This will violate NOHZ_FULL;
>> 
>> Only if that allocation fails.
>
> Right, which should be near to never I suppose.
>
>> Aside of that any CPU which has an affected timer will get notified even
>> on NOHZ_FULL.
>
> Right; but if it's properly NOHZ_FULL -- the kind that wanted a signal
> on any entry into the kernel -- then it won't have timers and this IPI
> will trigger the signal and kill the program.

That's true today. clock_was_set() IPI's unconditionally. The whole
point of this exercise is to avoid that if there are no armed timers on
the affected clocks.

>> >> +	preempt_disable();
>> >> +	smp_call_function_many(mask, retrigger_next_event, NULL, 1);
>> >
>> > The sane option is:
>> >
>> > 	smp_call_function_many_cond(cpu_online_mask, retrigger_next_event,
>> > 				    NULL, SCF_WAIT, update_needs_ipi);
>> >
>> > Which does all of the above, but better.
>> 
>> With the difference that the for_each_cpu() loop runs with preemption
>> disabled, while with this approach preemption is only disabled across
>> the function call.
>
> Yeah, I'd forgotten that... I might put looking at that on the todo list
> somewhere :/

That's this append only thingy, right?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2021-05-15  0:24 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-27  8:25 [patch 0/8] hrtimers: Overhaul the clock_was_set() logic Thomas Gleixner
2021-04-27  8:25 ` [patch 1/8] hrtimer: Ensure timerfd notification for HIGHRES=n Thomas Gleixner
2021-04-27  8:25 ` [patch 2/8] hrtimer: Force clock_was_set() handling for the HIGHRES=n, NOHZ=y case Thomas Gleixner
2021-05-12 14:59   ` Peter Zijlstra
2021-05-12 16:40     ` Thomas Gleixner
2021-04-27  8:25 ` [patch 3/8] timerfd: Provide timerfd_resume() Thomas Gleixner
2021-04-27  8:25 ` [patch 4/8] timekeeping: Distangle resume and clock-was-set events Thomas Gleixner
2021-04-27  8:25 ` [patch 5/8] time/timekeeping: Avoid invoking clock_was_set() twice Thomas Gleixner
2021-04-27  8:25 ` [patch 6/8] hrtimer: Add bases argument to clock_was_set() Thomas Gleixner
2021-04-27  8:25 ` [patch 7/8] hrtimer: Avoid unnecessary SMP function calls in clock_was_set() Thomas Gleixner
2021-05-13 14:59   ` Peter Zijlstra
2021-05-14 18:52     ` Thomas Gleixner
2021-05-14 23:28       ` Peter Zijlstra
2021-05-15  0:24         ` Thomas Gleixner
2021-04-27  8:25 ` [patch 8/8] hrtimer: Avoid more " Thomas Gleixner
2021-04-27 15:11   ` Marcelo Tosatti
2021-04-27 19:59     ` Thomas Gleixner
2021-04-30  7:12       ` [patch V2 " Thomas Gleixner
2021-04-30 16:49         ` Marcelo Tosatti
2021-05-13  7:47         ` Peter Zijlstra
2021-05-14 19:08           ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).