linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/11] timer: Reduce timers softirq v3
@ 2020-07-17 14:05 Frederic Weisbecker
  2020-07-17 14:05 ` [PATCH 01/12] timer: Fix wheel index calculation on last level Frederic Weisbecker
                   ` (11 more replies)
  0 siblings, 12 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2020-07-17 14:05 UTC (permalink / raw)
  To: Thomas Gleixner, Anna-Maria Behnsen
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Juri Lelli

This set piggybacks the 3 patches that have been posted seperately
along the reviews.

Changes since v2:

1) Add "timer: Fix wheel index calculation on last level"
   This one targets timers/urgent

2) Add "timer: Preserve higher bits of expiration on index calculation"
       "timer: Move trigger_dyntick_cpu() to enqueue_timer()"

3) Rebase the series against the whole

4) Fix comments in "timer: Move trigger_dyntick_cpu() to enqueue_timer()"
   as per Anna Maria's suggestion

5) Fix comments in "timer: Add comments about calc_index() ceiling work"
   as per Thomas' suggestion

Please pull the timers/softirq-v3 branch that can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	timers/softirq-v3

HEAD: f176eb2a146b422f78d81ec63273487c33b3011d

Thanks,
	Frederic
---

Frederic Weisbecker (11):
      timer: Fix wheel index calculation on last level
      timer: Preserve higher bits of expiration on index calculation
      timer: Move trigger_dyntick_cpu() to enqueue_timer()
      timer: Add comments about calc_index() ceiling work
      timer: Optimize _next_timer_interrupt() level iteration
      timers: Always keep track of next expiry
      timer: Reuse next expiry cache after nohz exit
      timer: Expand clk forward logic beyond nohz
      timer: Spare timer softirq until next expiry
      timer: Remove must_forward_clk
      timer: Lower base clock forwarding threshold

Anna-Maria Behnsen (1):
      timers: Use only bucket expiry for base->next_expiry value


 kernel/time/timer.c | 236 ++++++++++++++++++++--------------------------------
 1 file changed, 92 insertions(+), 144 deletions(-)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 01/12] timer: Fix wheel index calculation on last level
  2020-07-17 14:05 [PATCH 00/11] timer: Reduce timers softirq v3 Frederic Weisbecker
@ 2020-07-17 14:05 ` Frederic Weisbecker
  2020-07-17 19:49   ` [tip: timers/urgent] " tip-bot2 for Frederic Weisbecker
  2020-07-17 14:05 ` [PATCH 02/12] timer: Preserve higher bits of expiration on index calculation Frederic Weisbecker
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2020-07-17 14:05 UTC (permalink / raw)
  To: Thomas Gleixner, Anna-Maria Behnsen
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Juri Lelli, stable

When an expiration delta falls into the last level of the wheel, we want
to compare that delta against the maximum possible delay and reduce our
delta to fit in if necessary.

However instead of comparing the delta against the maximum, we are
comparing the actual expiry against the maximum. Then instead of fixing
the delta to fit in, we set the maximum delta as the expiry value.

This can result in various undesired outcomes, the worst possible one
being a timer expiring 15 days ahead to fire immediately.

Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
---
 kernel/time/timer.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 9a838d38dbe6..df1ff803acc4 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -521,8 +521,8 @@ static int calc_wheel_index(unsigned long expires, unsigned long clk)
 		 * Force expire obscene large timeouts to expire at the
 		 * capacity limit of the wheel.
 		 */
-		if (expires >= WHEEL_TIMEOUT_CUTOFF)
-			expires = WHEEL_TIMEOUT_MAX;
+		if (delta >= WHEEL_TIMEOUT_CUTOFF)
+			expires = clk + WHEEL_TIMEOUT_MAX;
 
 		idx = calc_index(expires, LVL_DEPTH - 1);
 	}
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 02/12] timer: Preserve higher bits of expiration on index calculation
  2020-07-17 14:05 [PATCH 00/11] timer: Reduce timers softirq v3 Frederic Weisbecker
  2020-07-17 14:05 ` [PATCH 01/12] timer: Fix wheel index calculation on last level Frederic Weisbecker
@ 2020-07-17 14:05 ` Frederic Weisbecker
  2020-07-17 20:00   ` [tip: timers/core] timers: " tip-bot2 for Frederic Weisbecker
  2020-07-17 14:05 ` [PATCH 03/12] timers: Use only bucket expiry for base->next_expiry value Frederic Weisbecker
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2020-07-17 14:05 UTC (permalink / raw)
  To: Thomas Gleixner, Anna-Maria Behnsen
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Juri Lelli

The higher bits of the timer expiration are cropped while calling
calc_index() due to the implicit cast from unsigned long to unsigned int.

This loss shouldn't have consequences on the current code since all the
computation to calculate the index is done on the lower 32 bits.

However we are preparing to return the actual bucket expiration from
calc_index() in order to properly fix base->next_expiry updates.
Preserving the higher bits is a requirement to achieve that.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
---
 kernel/time/timer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index df1ff803acc4..bcdc3045138d 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -487,7 +487,7 @@ static inline void timer_set_idx(struct timer_list *timer, unsigned int idx)
  * Helper function to calculate the array index for a given expiry
  * time.
  */
-static inline unsigned calc_index(unsigned expires, unsigned lvl)
+static inline unsigned calc_index(unsigned long expires, unsigned lvl)
 {
 	expires = (expires + LVL_GRAN(lvl)) >> LVL_SHIFT(lvl);
 	return LVL_OFFS(lvl) + (expires & LVL_MASK);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 03/12] timers: Use only bucket expiry for base->next_expiry value
  2020-07-17 14:05 [PATCH 00/11] timer: Reduce timers softirq v3 Frederic Weisbecker
  2020-07-17 14:05 ` [PATCH 01/12] timer: Fix wheel index calculation on last level Frederic Weisbecker
  2020-07-17 14:05 ` [PATCH 02/12] timer: Preserve higher bits of expiration on index calculation Frederic Weisbecker
@ 2020-07-17 14:05 ` Frederic Weisbecker
  2020-07-17 20:00   ` [tip: timers/core] " tip-bot2 for Anna-Maria Behnsen
  2020-07-17 14:05 ` [PATCH 04/12] timer: Move trigger_dyntick_cpu() to enqueue_timer() Frederic Weisbecker
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2020-07-17 14:05 UTC (permalink / raw)
  To: Thomas Gleixner, Anna-Maria Behnsen
  Cc: LKML, Peter Zijlstra, Juri Lelli, Frederic Weisbecker

From: Anna-Maria Behnsen <anna-maria@linutronix.de>

The bucket expiry time is the effective expriy time of timers and is
greater than or equal to the requested timer expiry time. This is due
to the guarantee that timers never expire early and the reduced expiry
granularity in the secondary wheel levels.

When a timer is enqueued, trigger_dyntick_cpu() checks whether the
timer is the new first timer. This check compares next_expiry with
the requested timer expiry value and not with the effective expiry
value of the bucket into which the timer was queued.

Storing the requested timer expiry value in base->next_expiry can lead
to base->clk going backwards if the requested timer expiry value is
smaller than base->clk. Commit 30c66fc30ee7 ("timer: Prevent base->clk
from moving backward") worked around this by preventing the store when
timer->expiry is before base->clk, but did not fix the underlying
problem.

Use the expiry value of the bucket into which the timer is queued to
do the new first timer check. This fixes the base->clk going backward
problem.

The workaround of commit 30c66fc30ee7 ("timer: Prevent base->clk from
moving backward") in trigger_dyntick_cpu() is not longer necessary as the
timers bucket expiry is guaranteed to be greater than or equal base->clk.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/time/timer.c | 64 ++++++++++++++++++++++++---------------------
 1 file changed, 34 insertions(+), 30 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index bcdc3045138d..a7a3cf737411 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -487,35 +487,39 @@ static inline void timer_set_idx(struct timer_list *timer, unsigned int idx)
  * Helper function to calculate the array index for a given expiry
  * time.
  */
-static inline unsigned calc_index(unsigned long expires, unsigned lvl)
+static inline unsigned calc_index(unsigned long expires, unsigned lvl,
+				  unsigned long *bucket_expiry)
 {
 	expires = (expires + LVL_GRAN(lvl)) >> LVL_SHIFT(lvl);
+	*bucket_expiry = expires << LVL_SHIFT(lvl);
 	return LVL_OFFS(lvl) + (expires & LVL_MASK);
 }
 
-static int calc_wheel_index(unsigned long expires, unsigned long clk)
+static int calc_wheel_index(unsigned long expires, unsigned long clk,
+			    unsigned long *bucket_expiry)
 {
 	unsigned long delta = expires - clk;
 	unsigned int idx;
 
 	if (delta < LVL_START(1)) {
-		idx = calc_index(expires, 0);
+		idx = calc_index(expires, 0, bucket_expiry);
 	} else if (delta < LVL_START(2)) {
-		idx = calc_index(expires, 1);
+		idx = calc_index(expires, 1, bucket_expiry);
 	} else if (delta < LVL_START(3)) {
-		idx = calc_index(expires, 2);
+		idx = calc_index(expires, 2, bucket_expiry);
 	} else if (delta < LVL_START(4)) {
-		idx = calc_index(expires, 3);
+		idx = calc_index(expires, 3, bucket_expiry);
 	} else if (delta < LVL_START(5)) {
-		idx = calc_index(expires, 4);
+		idx = calc_index(expires, 4, bucket_expiry);
 	} else if (delta < LVL_START(6)) {
-		idx = calc_index(expires, 5);
+		idx = calc_index(expires, 5, bucket_expiry);
 	} else if (delta < LVL_START(7)) {
-		idx = calc_index(expires, 6);
+		idx = calc_index(expires, 6, bucket_expiry);
 	} else if (LVL_DEPTH > 8 && delta < LVL_START(8)) {
-		idx = calc_index(expires, 7);
+		idx = calc_index(expires, 7, bucket_expiry);
 	} else if ((long) delta < 0) {
 		idx = clk & LVL_MASK;
+		*bucket_expiry = clk;
 	} else {
 		/*
 		 * Force expire obscene large timeouts to expire at the
@@ -524,7 +528,7 @@ static int calc_wheel_index(unsigned long expires, unsigned long clk)
 		if (delta >= WHEEL_TIMEOUT_CUTOFF)
 			expires = clk + WHEEL_TIMEOUT_MAX;
 
-		idx = calc_index(expires, LVL_DEPTH - 1);
+		idx = calc_index(expires, LVL_DEPTH - 1, bucket_expiry);
 	}
 	return idx;
 }
@@ -544,16 +548,18 @@ static void enqueue_timer(struct timer_base *base, struct timer_list *timer,
 }
 
 static void
-__internal_add_timer(struct timer_base *base, struct timer_list *timer)
+__internal_add_timer(struct timer_base *base, struct timer_list *timer,
+		     unsigned long *bucket_expiry)
 {
 	unsigned int idx;
 
-	idx = calc_wheel_index(timer->expires, base->clk);
+	idx = calc_wheel_index(timer->expires, base->clk, bucket_expiry);
 	enqueue_timer(base, timer, idx);
 }
 
 static void
-trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
+trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer,
+		    unsigned long bucket_expiry)
 {
 	if (!is_timers_nohz_active())
 		return;
@@ -576,31 +582,29 @@ trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
 	if (!base->is_idle)
 		return;
 
-	/* Check whether this is the new first expiring timer: */
-	if (time_after_eq(timer->expires, base->next_expiry))
+	/*
+	 * Check whether this is the new first expiring timer. The
+	 * effective expiry time of the timer is required here
+	 * (bucket_expiry) instead of timer->expires.
+	 */
+	if (time_after_eq(bucket_expiry, base->next_expiry))
 		return;
 
 	/*
 	 * Set the next expiry time and kick the CPU so it can reevaluate the
 	 * wheel:
 	 */
-	if (time_before(timer->expires, base->clk)) {
-		/*
-		 * Prevent from forward_timer_base() moving the base->clk
-		 * backward
-		 */
-		base->next_expiry = base->clk;
-	} else {
-		base->next_expiry = timer->expires;
-	}
+	base->next_expiry = bucket_expiry;
 	wake_up_nohz_cpu(base->cpu);
 }
 
 static void
 internal_add_timer(struct timer_base *base, struct timer_list *timer)
 {
-	__internal_add_timer(base, timer);
-	trigger_dyntick_cpu(base, timer);
+	unsigned long bucket_expiry;
+
+	__internal_add_timer(base, timer, &bucket_expiry);
+	trigger_dyntick_cpu(base, timer, bucket_expiry);
 }
 
 #ifdef CONFIG_DEBUG_OBJECTS_TIMERS
@@ -959,9 +963,9 @@ static struct timer_base *lock_timer_base(struct timer_list *timer,
 static inline int
 __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int options)
 {
+	unsigned long clk = 0, flags, bucket_expiry;
 	struct timer_base *base, *new_base;
 	unsigned int idx = UINT_MAX;
-	unsigned long clk = 0, flags;
 	int ret = 0;
 
 	BUG_ON(!timer->function);
@@ -1000,7 +1004,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int option
 		}
 
 		clk = base->clk;
-		idx = calc_wheel_index(expires, clk);
+		idx = calc_wheel_index(expires, clk, &bucket_expiry);
 
 		/*
 		 * Retrieve and compare the array index of the pending
@@ -1059,7 +1063,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int option
 	 */
 	if (idx != UINT_MAX && clk == base->clk) {
 		enqueue_timer(base, timer, idx);
-		trigger_dyntick_cpu(base, timer);
+		trigger_dyntick_cpu(base, timer, bucket_expiry);
 	} else {
 		internal_add_timer(base, timer);
 	}
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 04/12] timer: Move trigger_dyntick_cpu() to enqueue_timer()
  2020-07-17 14:05 [PATCH 00/11] timer: Reduce timers softirq v3 Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2020-07-17 14:05 ` [PATCH 03/12] timers: Use only bucket expiry for base->next_expiry value Frederic Weisbecker
@ 2020-07-17 14:05 ` Frederic Weisbecker
  2020-07-17 20:00   ` [tip: timers/core] timers: " tip-bot2 for Frederic Weisbecker
  2020-07-17 14:05 ` [PATCH 05/12] timer: Add comments about calc_index() ceiling work Frederic Weisbecker
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2020-07-17 14:05 UTC (permalink / raw)
  To: Thomas Gleixner, Anna-Maria Behnsen
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Juri Lelli

Consolidate the code by calling trigger_dyntick_cpu() from
enqueue_timer() instead of calling it from all its callers.

Tested-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Juri Lelli <juri.lelli@redhat.com>
---
 kernel/time/timer.c | 61 +++++++++++++++++++--------------------------
 1 file changed, 25 insertions(+), 36 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index a7a3cf737411..2af08a169564 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -533,30 +533,6 @@ static int calc_wheel_index(unsigned long expires, unsigned long clk,
 	return idx;
 }
 
-/*
- * Enqueue the timer into the hash bucket, mark it pending in
- * the bitmap and store the index in the timer flags.
- */
-static void enqueue_timer(struct timer_base *base, struct timer_list *timer,
-			  unsigned int idx)
-{
-	hlist_add_head(&timer->entry, base->vectors + idx);
-	__set_bit(idx, base->pending_map);
-	timer_set_idx(timer, idx);
-
-	trace_timer_start(timer, timer->expires, timer->flags);
-}
-
-static void
-__internal_add_timer(struct timer_base *base, struct timer_list *timer,
-		     unsigned long *bucket_expiry)
-{
-	unsigned int idx;
-
-	idx = calc_wheel_index(timer->expires, base->clk, bucket_expiry);
-	enqueue_timer(base, timer, idx);
-}
-
 static void
 trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer,
 		    unsigned long bucket_expiry)
@@ -598,15 +574,31 @@ trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer,
 	wake_up_nohz_cpu(base->cpu);
 }
 
-static void
-internal_add_timer(struct timer_base *base, struct timer_list *timer)
+/*
+ * Enqueue the timer into the hash bucket, mark it pending in
+ * the bitmap, store the index in the timer flags then wake up
+ * the target CPU if needed.
+ */
+static void enqueue_timer(struct timer_base *base, struct timer_list *timer,
+			  unsigned int idx, unsigned long bucket_expiry)
 {
-	unsigned long bucket_expiry;
+	hlist_add_head(&timer->entry, base->vectors + idx);
+	__set_bit(idx, base->pending_map);
+	timer_set_idx(timer, idx);
 
-	__internal_add_timer(base, timer, &bucket_expiry);
+	trace_timer_start(timer, timer->expires, timer->flags);
 	trigger_dyntick_cpu(base, timer, bucket_expiry);
 }
 
+static void internal_add_timer(struct timer_base *base, struct timer_list *timer)
+{
+	unsigned long bucket_expiry;
+	unsigned int idx;
+
+	idx = calc_wheel_index(timer->expires, base->clk, &bucket_expiry);
+	enqueue_timer(base, timer, idx, bucket_expiry);
+}
+
 #ifdef CONFIG_DEBUG_OBJECTS_TIMERS
 
 static struct debug_obj_descr timer_debug_descr;
@@ -1057,16 +1049,13 @@ __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int option
 	/*
 	 * If 'idx' was calculated above and the base time did not advance
 	 * between calculating 'idx' and possibly switching the base, only
-	 * enqueue_timer() and trigger_dyntick_cpu() is required. Otherwise
-	 * we need to (re)calculate the wheel index via
-	 * internal_add_timer().
+	 * enqueue_timer() is required. Otherwise we need to (re)calculate
+	 * the wheel index via internal_add_timer().
 	 */
-	if (idx != UINT_MAX && clk == base->clk) {
-		enqueue_timer(base, timer, idx);
-		trigger_dyntick_cpu(base, timer, bucket_expiry);
-	} else {
+	if (idx != UINT_MAX && clk == base->clk)
+		enqueue_timer(base, timer, idx, bucket_expiry);
+	else
 		internal_add_timer(base, timer);
-	}
 
 out_unlock:
 	raw_spin_unlock_irqrestore(&base->lock, flags);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 05/12] timer: Add comments about calc_index() ceiling work
  2020-07-17 14:05 [PATCH 00/11] timer: Reduce timers softirq v3 Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2020-07-17 14:05 ` [PATCH 04/12] timer: Move trigger_dyntick_cpu() to enqueue_timer() Frederic Weisbecker
@ 2020-07-17 14:05 ` Frederic Weisbecker
  2020-07-17 20:00   ` [tip: timers/core] timers: " tip-bot2 for Frederic Weisbecker
  2020-07-17 14:05 ` [PATCH 06/12] timer: Optimize _next_timer_interrupt() level iteration Frederic Weisbecker
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2020-07-17 14:05 UTC (permalink / raw)
  To: Thomas Gleixner, Anna-Maria Behnsen
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Juri Lelli

calc_index() adds 1 unit of the level granularity to the expiry passed
in parameter to ensure that the timer doesn't expire too early. Add a
comment to explain that and the resulting layout in the wheel.

Most-of-the-patch-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Juri Lelli <juri.lelli@redhat.com>
---
 kernel/time/timer.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 2af08a169564..af1c08b0b168 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -156,7 +156,8 @@ EXPORT_SYMBOL(jiffies_64);
 
 /*
  * The time start value for each level to select the bucket at enqueue
- * time.
+ * time. We start from the last possible delta of the previous level
+ * so that we can later add an extra LVL_GRAN(n) to n (see calc_index()).
  */
 #define LVL_START(n)	((LVL_SIZE - 1) << (((n) - 1) * LVL_CLK_SHIFT))
 
@@ -490,6 +491,15 @@ static inline void timer_set_idx(struct timer_list *timer, unsigned int idx)
 static inline unsigned calc_index(unsigned long expires, unsigned lvl,
 				  unsigned long *bucket_expiry)
 {
+
+	/*
+	 * The timer wheel has to guarantee that a timer does not fire
+	 * early. Early expiry can happen due to:
+	 * - Timer is armed at the edge of a tick
+	 * - Truncation of the expiry time in the outer wheel levels
+	 *
+	 * Round up with level granularity to prevent this.
+	 */
 	expires = (expires + LVL_GRAN(lvl)) >> LVL_SHIFT(lvl);
 	*bucket_expiry = expires << LVL_SHIFT(lvl);
 	return LVL_OFFS(lvl) + (expires & LVL_MASK);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 06/12] timer: Optimize _next_timer_interrupt() level iteration
  2020-07-17 14:05 [PATCH 00/11] timer: Reduce timers softirq v3 Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2020-07-17 14:05 ` [PATCH 05/12] timer: Add comments about calc_index() ceiling work Frederic Weisbecker
@ 2020-07-17 14:05 ` Frederic Weisbecker
  2020-07-17 20:00   ` [tip: timers/core] timers: " tip-bot2 for Frederic Weisbecker
  2020-07-17 14:05 ` [PATCH 07/12] timers: Always keep track of next expiry Frederic Weisbecker
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2020-07-17 14:05 UTC (permalink / raw)
  To: Thomas Gleixner, Anna-Maria Behnsen
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Juri Lelli

If a level has a timer that expires before we reach the next level,
there is no need to iterate further.

The next level is reached when the 3 lower bits of the current level are
cleared. If the next event happens before/during that, the next levels
won't give us an earlier expiration.

Tested-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Juri Lelli <juri.lelli@redhat.com>
---
 kernel/time/timer.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index af1c08b0b168..9abc41715fd2 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1526,6 +1526,7 @@ static unsigned long __next_timer_interrupt(struct timer_base *base)
 	clk = base->clk;
 	for (lvl = 0; lvl < LVL_DEPTH; lvl++, offset += LVL_SIZE) {
 		int pos = next_pending_bucket(base, offset, clk & LVL_MASK);
+		unsigned long lvl_clk = clk & LVL_CLK_MASK;
 
 		if (pos >= 0) {
 			unsigned long tmp = clk + (unsigned long) pos;
@@ -1533,6 +1534,13 @@ static unsigned long __next_timer_interrupt(struct timer_base *base)
 			tmp <<= LVL_SHIFT(lvl);
 			if (time_before(tmp, next))
 				next = tmp;
+
+			/*
+			 * If the next expiration happens before we reach
+			 * the next level, no need to check further.
+			 */
+			if (pos <= ((LVL_CLK_DIV - lvl_clk) & LVL_CLK_MASK))
+				break;
 		}
 		/*
 		 * Clock for the next level. If the current level clock lower
@@ -1570,7 +1578,7 @@ static unsigned long __next_timer_interrupt(struct timer_base *base)
 		 * So the simple check whether the lower bits of the current
 		 * level are 0 or not is sufficient for all cases.
 		 */
-		adj = clk & LVL_CLK_MASK ? 1 : 0;
+		adj = lvl_clk ? 1 : 0;
 		clk >>= LVL_CLK_SHIFT;
 		clk += adj;
 	}
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 07/12] timers: Always keep track of next expiry
  2020-07-17 14:05 [PATCH 00/11] timer: Reduce timers softirq v3 Frederic Weisbecker
                   ` (5 preceding siblings ...)
  2020-07-17 14:05 ` [PATCH 06/12] timer: Optimize _next_timer_interrupt() level iteration Frederic Weisbecker
@ 2020-07-17 14:05 ` Frederic Weisbecker
  2020-07-17 20:00   ` [tip: timers/core] " tip-bot2 for Frederic Weisbecker
  2020-07-17 14:05 ` [PATCH 08/12] timer: Reuse next expiry cache after nohz exit Frederic Weisbecker
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2020-07-17 14:05 UTC (permalink / raw)
  To: Thomas Gleixner, Anna-Maria Behnsen
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Juri Lelli

So far next expiry was only tracked while the CPU was in nohz_idle mode
in order to cope with missing ticks that can't increment the base->clk
periodically anymore.

We are going to expand that logic beyond nohz in order to spare timers
softirqs so do it unconditionally.

Tested-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Juri Lelli <juri.lelli@redhat.com>
---
 kernel/time/timer.c | 42 +++++++++++++++++++++---------------------
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 9abc41715fd2..76fd9644638b 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -544,8 +544,7 @@ static int calc_wheel_index(unsigned long expires, unsigned long clk,
 }
 
 static void
-trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer,
-		    unsigned long bucket_expiry)
+trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
 {
 	if (!is_timers_nohz_active())
 		return;
@@ -565,23 +564,8 @@ trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer,
 	 * timer is not deferrable. If the other CPU is on the way to idle
 	 * then it can't set base->is_idle as we hold the base lock:
 	 */
-	if (!base->is_idle)
-		return;
-
-	/*
-	 * Check whether this is the new first expiring timer. The
-	 * effective expiry time of the timer is required here
-	 * (bucket_expiry) instead of timer->expires.
-	 */
-	if (time_after_eq(bucket_expiry, base->next_expiry))
-		return;
-
-	/*
-	 * Set the next expiry time and kick the CPU so it can reevaluate the
-	 * wheel:
-	 */
-	base->next_expiry = bucket_expiry;
-	wake_up_nohz_cpu(base->cpu);
+	if (base->is_idle)
+		wake_up_nohz_cpu(base->cpu);
 }
 
 /*
@@ -592,12 +576,26 @@ trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer,
 static void enqueue_timer(struct timer_base *base, struct timer_list *timer,
 			  unsigned int idx, unsigned long bucket_expiry)
 {
+
 	hlist_add_head(&timer->entry, base->vectors + idx);
 	__set_bit(idx, base->pending_map);
 	timer_set_idx(timer, idx);
 
 	trace_timer_start(timer, timer->expires, timer->flags);
-	trigger_dyntick_cpu(base, timer, bucket_expiry);
+
+	/*
+	 * Check whether this is the new first expiring timer. The
+	 * effective expiry time of the timer is required here
+	 * (bucket_expiry) instead of timer->expires.
+	 */
+	if (time_before(bucket_expiry, base->next_expiry)) {
+		/*
+		 * Set the next expiry time and kick the CPU so it
+		 * can reevaluate the wheel:
+		 */
+		base->next_expiry = bucket_expiry;
+		trigger_dyntick_cpu(base, timer);
+	}
 }
 
 static void internal_add_timer(struct timer_base *base, struct timer_list *timer)
@@ -1493,7 +1491,6 @@ static int __collect_expired_timers(struct timer_base *base,
 	return levels;
 }
 
-#ifdef CONFIG_NO_HZ_COMMON
 /*
  * Find the next pending bucket of a level. Search from level start (@offset)
  * + @clk upwards and if nothing there, search from start of the level
@@ -1585,6 +1582,7 @@ static unsigned long __next_timer_interrupt(struct timer_base *base)
 	return next;
 }
 
+#ifdef CONFIG_NO_HZ_COMMON
 /*
  * Check, if the next hrtimer event is before the next timer wheel
  * event:
@@ -1790,6 +1788,7 @@ static inline void __run_timers(struct timer_base *base)
 
 		levels = collect_expired_timers(base, heads);
 		base->clk++;
+		base->next_expiry = __next_timer_interrupt(base);
 
 		while (levels--)
 			expire_timers(base, heads + levels);
@@ -2042,6 +2041,7 @@ static void __init init_timer_cpu(int cpu)
 		base->cpu = cpu;
 		raw_spin_lock_init(&base->lock);
 		base->clk = jiffies;
+		base->next_expiry = base->clk + NEXT_TIMER_MAX_DELTA;
 		timer_base_init_expiry_lock(base);
 	}
 }
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 08/12] timer: Reuse next expiry cache after nohz exit
  2020-07-17 14:05 [PATCH 00/11] timer: Reduce timers softirq v3 Frederic Weisbecker
                   ` (6 preceding siblings ...)
  2020-07-17 14:05 ` [PATCH 07/12] timers: Always keep track of next expiry Frederic Weisbecker
@ 2020-07-17 14:05 ` Frederic Weisbecker
  2020-07-17 20:00   ` [tip: timers/core] timers: " tip-bot2 for Frederic Weisbecker
  2020-07-17 14:05 ` [PATCH 09/12] timer: Expand clk forward logic beyond nohz Frederic Weisbecker
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2020-07-17 14:05 UTC (permalink / raw)
  To: Thomas Gleixner, Anna-Maria Behnsen
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Juri Lelli

Now that we track the next expiry unconditionally when a timer is added,
we can reuse that information on a tick firing after exiting nohz.

Tested-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Juri Lelli <juri.lelli@redhat.com>
---
 kernel/time/timer.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 76fd9644638b..13f48ee708aa 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1706,13 +1706,11 @@ static int collect_expired_timers(struct timer_base *base,
 	 * the next expiring timer.
 	 */
 	if ((long)(now - base->clk) > 2) {
-		unsigned long next = __next_timer_interrupt(base);
-
 		/*
 		 * If the next timer is ahead of time forward to current
 		 * jiffies, otherwise forward to the next expiry time:
 		 */
-		if (time_after(next, now)) {
+		if (time_after(base->next_expiry, now)) {
 			/*
 			 * The call site will increment base->clk and then
 			 * terminate the expiry loop immediately.
@@ -1720,7 +1718,7 @@ static int collect_expired_timers(struct timer_base *base,
 			base->clk = now;
 			return 0;
 		}
-		base->clk = next;
+		base->clk = base->next_expiry;
 	}
 	return __collect_expired_timers(base, heads);
 }
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 09/12] timer: Expand clk forward logic beyond nohz
  2020-07-17 14:05 [PATCH 00/11] timer: Reduce timers softirq v3 Frederic Weisbecker
                   ` (7 preceding siblings ...)
  2020-07-17 14:05 ` [PATCH 08/12] timer: Reuse next expiry cache after nohz exit Frederic Weisbecker
@ 2020-07-17 14:05 ` Frederic Weisbecker
  2020-07-17 20:00   ` [tip: timers/core] timers: " tip-bot2 for Frederic Weisbecker
  2020-07-17 14:05 ` [PATCH 10/12] timer: Spare timer softirq until next expiry Frederic Weisbecker
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2020-07-17 14:05 UTC (permalink / raw)
  To: Thomas Gleixner, Anna-Maria Behnsen
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Juri Lelli

As for next_expiry, we want to expand base->clk catch up logic beyond
nohz in order to avoid triggering useless softirqs.

If we want to fire softirqs only to expire pending timers, periodic
base->clk increments must be skippable for random amounts of time.
Therefore we must prepare to catch-up with missing updates whenever we
need an up-to-date base clock.

Tested-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Juri Lelli <juri.lelli@redhat.com>
---
 kernel/time/timer.c | 26 ++++----------------------
 1 file changed, 4 insertions(+), 22 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 13f48ee708aa..1be92b53b75f 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -888,19 +888,12 @@ get_target_base(struct timer_base *base, unsigned tflags)
 
 static inline void forward_timer_base(struct timer_base *base)
 {
-#ifdef CONFIG_NO_HZ_COMMON
 	unsigned long jnow;
 
-	/*
-	 * We only forward the base when we are idle or have just come out of
-	 * idle (must_forward_clk logic), and have a delta between base clock
-	 * and jiffies. In the common case, run_timers will take care of it.
-	 */
-	if (likely(!base->must_forward_clk))
+	if (!base->must_forward_clk)
 		return;
 
 	jnow = READ_ONCE(jiffies);
-	base->must_forward_clk = base->is_idle;
 	if ((long)(jnow - base->clk) < 2)
 		return;
 
@@ -915,7 +908,6 @@ static inline void forward_timer_base(struct timer_base *base)
 			return;
 		base->clk = base->next_expiry;
 	}
-#endif
 }
 
 
@@ -1667,10 +1659,8 @@ u64 get_next_timer_interrupt(unsigned long basej, u64 basem)
 		 * logic is only maintained for the BASE_STD base, deferrable
 		 * timers may still see large granularity skew (by design).
 		 */
-		if ((expires - basem) > TICK_NSEC) {
-			base->must_forward_clk = true;
+		if ((expires - basem) > TICK_NSEC)
 			base->is_idle = true;
-		}
 	}
 	raw_spin_unlock(&base->lock);
 
@@ -1769,16 +1759,7 @@ static inline void __run_timers(struct timer_base *base)
 	/*
 	 * timer_base::must_forward_clk must be cleared before running
 	 * timers so that any timer functions that call mod_timer() will
-	 * not try to forward the base. Idle tracking / clock forwarding
-	 * logic is only used with BASE_STD timers.
-	 *
-	 * The must_forward_clk flag is cleared unconditionally also for
-	 * the deferrable base. The deferrable base is not affected by idle
-	 * tracking and never forwarded, so clearing the flag is a NOOP.
-	 *
-	 * The fact that the deferrable base is never forwarded can cause
-	 * large variations in granularity for deferrable timers, but they
-	 * can be deferred for long periods due to idle anyway.
+	 * not try to forward the base.
 	 */
 	base->must_forward_clk = false;
 
@@ -1791,6 +1772,7 @@ static inline void __run_timers(struct timer_base *base)
 		while (levels--)
 			expire_timers(base, heads + levels);
 	}
+	base->must_forward_clk = true;
 	raw_spin_unlock_irq(&base->lock);
 	timer_base_unlock_expiry(base);
 }
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 10/12] timer: Spare timer softirq until next expiry
  2020-07-17 14:05 [PATCH 00/11] timer: Reduce timers softirq v3 Frederic Weisbecker
                   ` (8 preceding siblings ...)
  2020-07-17 14:05 ` [PATCH 09/12] timer: Expand clk forward logic beyond nohz Frederic Weisbecker
@ 2020-07-17 14:05 ` Frederic Weisbecker
  2020-07-17 20:00   ` [tip: timers/core] timers: " tip-bot2 for Frederic Weisbecker
  2020-07-17 14:05 ` [PATCH 11/12] timer: Remove must_forward_clk Frederic Weisbecker
  2020-07-17 14:05 ` [PATCH 12/12] timer: Lower base clock forwarding threshold Frederic Weisbecker
  11 siblings, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2020-07-17 14:05 UTC (permalink / raw)
  To: Thomas Gleixner, Anna-Maria Behnsen
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Juri Lelli

Now that the core timer infrastructure doesn't depend anymore on
periodic base->clk increments, even when the CPU is not in NO_HZ mode,
we can delay the timer softirqs until we have actual timers to expire.

Some spurious softirqs can still remain since base->next_expiry doesn't
keep track of canceled timers but we are still way ahead of the
unconditional periodic softirqs (~15 times less of them with 1000 Hz
and ~5 times less with 100 Hz).

Tested-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Juri Lelli <juri.lelli@redhat.com>
---
 kernel/time/timer.c | 49 ++++++++-------------------------------------
 1 file changed, 8 insertions(+), 41 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 1be92b53b75f..4f78a7bff9e1 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1458,10 +1458,10 @@ static void expire_timers(struct timer_base *base, struct hlist_head *head)
 	}
 }
 
-static int __collect_expired_timers(struct timer_base *base,
-				    struct hlist_head *heads)
+static int collect_expired_timers(struct timer_base *base,
+				  struct hlist_head *heads)
 {
-	unsigned long clk = base->clk;
+	unsigned long clk = base->clk = base->next_expiry;
 	struct hlist_head *vec;
 	int i, levels = 0;
 	unsigned int idx;
@@ -1684,40 +1684,6 @@ void timer_clear_idle(void)
 	 */
 	base->is_idle = false;
 }
-
-static int collect_expired_timers(struct timer_base *base,
-				  struct hlist_head *heads)
-{
-	unsigned long now = READ_ONCE(jiffies);
-
-	/*
-	 * NOHZ optimization. After a long idle sleep we need to forward the
-	 * base to current jiffies. Avoid a loop by searching the bitfield for
-	 * the next expiring timer.
-	 */
-	if ((long)(now - base->clk) > 2) {
-		/*
-		 * If the next timer is ahead of time forward to current
-		 * jiffies, otherwise forward to the next expiry time:
-		 */
-		if (time_after(base->next_expiry, now)) {
-			/*
-			 * The call site will increment base->clk and then
-			 * terminate the expiry loop immediately.
-			 */
-			base->clk = now;
-			return 0;
-		}
-		base->clk = base->next_expiry;
-	}
-	return __collect_expired_timers(base, heads);
-}
-#else
-static inline int collect_expired_timers(struct timer_base *base,
-					 struct hlist_head *heads)
-{
-	return __collect_expired_timers(base, heads);
-}
 #endif
 
 /*
@@ -1750,7 +1716,7 @@ static inline void __run_timers(struct timer_base *base)
 	struct hlist_head heads[LVL_DEPTH];
 	int levels;
 
-	if (!time_after_eq(jiffies, base->clk))
+	if (time_before(jiffies, base->next_expiry))
 		return;
 
 	timer_base_lock_expiry(base);
@@ -1763,7 +1729,8 @@ static inline void __run_timers(struct timer_base *base)
 	 */
 	base->must_forward_clk = false;
 
-	while (time_after_eq(jiffies, base->clk)) {
+	while (time_after_eq(jiffies, base->clk) &&
+	       time_after_eq(jiffies, base->next_expiry)) {
 
 		levels = collect_expired_timers(base, heads);
 		base->clk++;
@@ -1798,12 +1765,12 @@ void run_local_timers(void)
 
 	hrtimer_run_queues();
 	/* Raise the softirq only if required. */
-	if (time_before(jiffies, base->clk)) {
+	if (time_before(jiffies, base->next_expiry)) {
 		if (!IS_ENABLED(CONFIG_NO_HZ_COMMON))
 			return;
 		/* CPU is awake, so check the deferrable base. */
 		base++;
-		if (time_before(jiffies, base->clk))
+		if (time_before(jiffies, base->next_expiry))
 			return;
 	}
 	raise_softirq(TIMER_SOFTIRQ);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 11/12] timer: Remove must_forward_clk
  2020-07-17 14:05 [PATCH 00/11] timer: Reduce timers softirq v3 Frederic Weisbecker
                   ` (9 preceding siblings ...)
  2020-07-17 14:05 ` [PATCH 10/12] timer: Spare timer softirq until next expiry Frederic Weisbecker
@ 2020-07-17 14:05 ` Frederic Weisbecker
  2020-07-17 20:00   ` [tip: timers/core] timers: " tip-bot2 for Frederic Weisbecker
  2020-07-17 14:05 ` [PATCH 12/12] timer: Lower base clock forwarding threshold Frederic Weisbecker
  11 siblings, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2020-07-17 14:05 UTC (permalink / raw)
  To: Thomas Gleixner, Anna-Maria Behnsen
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Juri Lelli

There is no reason to keep this guard around. The code makes sure that
base->clk remains sane and won't be forwarded beyond jiffies nor set
backward.

Tested-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Juri Lelli <juri.lelli@redhat.com>
---
 kernel/time/timer.c | 22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 4f78a7bff9e1..8b3fb52d8c47 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -205,7 +205,6 @@ struct timer_base {
 	unsigned long		next_expiry;
 	unsigned int		cpu;
 	bool			is_idle;
-	bool			must_forward_clk;
 	DECLARE_BITMAP(pending_map, WHEEL_SIZE);
 	struct hlist_head	vectors[WHEEL_SIZE];
 } ____cacheline_aligned;
@@ -888,12 +887,13 @@ get_target_base(struct timer_base *base, unsigned tflags)
 
 static inline void forward_timer_base(struct timer_base *base)
 {
-	unsigned long jnow;
+	unsigned long jnow = READ_ONCE(jiffies);
 
-	if (!base->must_forward_clk)
-		return;
-
-	jnow = READ_ONCE(jiffies);
+	/*
+	 * No need to forward if we are close enough below jiffies.
+	 * Also while executing timers, base->clk is 1 offset ahead
+	 * of jiffies to avoid endless requeuing to current jffies.
+	 */
 	if ((long)(jnow - base->clk) < 2)
 		return;
 
@@ -1722,16 +1722,8 @@ static inline void __run_timers(struct timer_base *base)
 	timer_base_lock_expiry(base);
 	raw_spin_lock_irq(&base->lock);
 
-	/*
-	 * timer_base::must_forward_clk must be cleared before running
-	 * timers so that any timer functions that call mod_timer() will
-	 * not try to forward the base.
-	 */
-	base->must_forward_clk = false;
-
 	while (time_after_eq(jiffies, base->clk) &&
 	       time_after_eq(jiffies, base->next_expiry)) {
-
 		levels = collect_expired_timers(base, heads);
 		base->clk++;
 		base->next_expiry = __next_timer_interrupt(base);
@@ -1739,7 +1731,6 @@ static inline void __run_timers(struct timer_base *base)
 		while (levels--)
 			expire_timers(base, heads + levels);
 	}
-	base->must_forward_clk = true;
 	raw_spin_unlock_irq(&base->lock);
 	timer_base_unlock_expiry(base);
 }
@@ -1935,7 +1926,6 @@ int timers_prepare_cpu(unsigned int cpu)
 		base->clk = jiffies;
 		base->next_expiry = base->clk + NEXT_TIMER_MAX_DELTA;
 		base->is_idle = false;
-		base->must_forward_clk = true;
 	}
 	return 0;
 }
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 12/12] timer: Lower base clock forwarding threshold
  2020-07-17 14:05 [PATCH 00/11] timer: Reduce timers softirq v3 Frederic Weisbecker
                   ` (10 preceding siblings ...)
  2020-07-17 14:05 ` [PATCH 11/12] timer: Remove must_forward_clk Frederic Weisbecker
@ 2020-07-17 14:05 ` Frederic Weisbecker
  2020-07-17 20:00   ` [tip: timers/core] timers: " tip-bot2 for Frederic Weisbecker
  11 siblings, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2020-07-17 14:05 UTC (permalink / raw)
  To: Thomas Gleixner, Anna-Maria Behnsen
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Juri Lelli

There is nothing that prevents us from forwarding the base clock if
it's one jiffy off. This reason for this arbitrary limit is historical
and doesn't seem to stand anymore.

Tested-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Juri Lelli <juri.lelli@redhat.com>
---
 kernel/time/timer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 8b3fb52d8c47..77e21e98ec32 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -894,7 +894,7 @@ static inline void forward_timer_base(struct timer_base *base)
 	 * Also while executing timers, base->clk is 1 offset ahead
 	 * of jiffies to avoid endless requeuing to current jffies.
 	 */
-	if ((long)(jnow - base->clk) < 2)
+	if ((long)(jnow - base->clk) < 1)
 		return;
 
 	/*
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip: timers/urgent] timer: Fix wheel index calculation on last level
  2020-07-17 14:05 ` [PATCH 01/12] timer: Fix wheel index calculation on last level Frederic Weisbecker
@ 2020-07-17 19:49   ` tip-bot2 for Frederic Weisbecker
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot2 for Frederic Weisbecker @ 2020-07-17 19:49 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Frederic Weisbecker, Thomas Gleixner, stable, x86, LKML

The following commit has been merged into the timers/urgent branch of tip:

Commit-ID:     e2a71bdea81690b6ef11f4368261ec6f5b6891aa
Gitweb:        https://git.kernel.org/tip/e2a71bdea81690b6ef11f4368261ec6f5b6891aa
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Fri, 17 Jul 2020 16:05:40 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Fri, 17 Jul 2020 21:44:05 +02:00

timer: Fix wheel index calculation on last level

When an expiration delta falls into the last level of the wheel, that delta
has be compared against the maximum possible delay and reduced to fit in if
necessary.

However instead of comparing the delta against the maximum, the code
compares the actual expiry against the maximum. Then instead of fixing the
delta to fit in, it sets the maximum delta as the expiry value.

This can result in various undesired outcomes, the worst possible one
being a timer expiring 15 days ahead to fire immediately.

Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200717140551.29076-2-frederic@kernel.org

---
 kernel/time/timer.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 9a838d3..df1ff80 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -521,8 +521,8 @@ static int calc_wheel_index(unsigned long expires, unsigned long clk)
 		 * Force expire obscene large timeouts to expire at the
 		 * capacity limit of the wheel.
 		 */
-		if (expires >= WHEEL_TIMEOUT_CUTOFF)
-			expires = WHEEL_TIMEOUT_MAX;
+		if (delta >= WHEEL_TIMEOUT_CUTOFF)
+			expires = clk + WHEEL_TIMEOUT_MAX;
 
 		idx = calc_index(expires, LVL_DEPTH - 1);
 	}

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip: timers/core] timers: Lower base clock forwarding threshold
  2020-07-17 14:05 ` [PATCH 12/12] timer: Lower base clock forwarding threshold Frederic Weisbecker
@ 2020-07-17 20:00   ` tip-bot2 for Frederic Weisbecker
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot2 for Frederic Weisbecker @ 2020-07-17 20:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Frederic Weisbecker, Thomas Gleixner, Juri Lelli, x86, LKML

The following commit has been merged into the timers/core branch of tip:

Commit-ID:     36cd28a4cdd05d47ccb62a2d86e8f37839cc879a
Gitweb:        https://git.kernel.org/tip/36cd28a4cdd05d47ccb62a2d86e8f37839cc879a
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Fri, 17 Jul 2020 16:05:51 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Fri, 17 Jul 2020 21:55:25 +02:00

timers: Lower base clock forwarding threshold

There is nothing that prevents from forwarding the base clock if it's one
jiffy off. The reason for this arbitrary limit of two jiffies is historical
and does not longer exist.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200717140551.29076-13-frederic@kernel.org

---
 kernel/time/timer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 8b3fb52..77e21e9 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -894,7 +894,7 @@ static inline void forward_timer_base(struct timer_base *base)
 	 * Also while executing timers, base->clk is 1 offset ahead
 	 * of jiffies to avoid endless requeuing to current jffies.
 	 */
-	if ((long)(jnow - base->clk) < 2)
+	if ((long)(jnow - base->clk) < 1)
 		return;
 
 	/*

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip: timers/core] timers: Remove must_forward_clk
  2020-07-17 14:05 ` [PATCH 11/12] timer: Remove must_forward_clk Frederic Weisbecker
@ 2020-07-17 20:00   ` tip-bot2 for Frederic Weisbecker
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot2 for Frederic Weisbecker @ 2020-07-17 20:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Frederic Weisbecker, Thomas Gleixner, Juri Lelli, x86, LKML

The following commit has been merged into the timers/core branch of tip:

Commit-ID:     0975fb565b8b8f9e0c96d0de39fcb954833ea5e0
Gitweb:        https://git.kernel.org/tip/0975fb565b8b8f9e0c96d0de39fcb954833ea5e0
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Fri, 17 Jul 2020 16:05:50 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Fri, 17 Jul 2020 21:55:25 +02:00

timers: Remove must_forward_clk

There is no reason to keep this guard around. The code makes sure that
base->clk remains sane and won't be forwarded beyond jiffies nor set
backward.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200717140551.29076-12-frederic@kernel.org

---
 kernel/time/timer.c | 22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 4f78a7b..8b3fb52 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -205,7 +205,6 @@ struct timer_base {
 	unsigned long		next_expiry;
 	unsigned int		cpu;
 	bool			is_idle;
-	bool			must_forward_clk;
 	DECLARE_BITMAP(pending_map, WHEEL_SIZE);
 	struct hlist_head	vectors[WHEEL_SIZE];
 } ____cacheline_aligned;
@@ -888,12 +887,13 @@ get_target_base(struct timer_base *base, unsigned tflags)
 
 static inline void forward_timer_base(struct timer_base *base)
 {
-	unsigned long jnow;
+	unsigned long jnow = READ_ONCE(jiffies);
 
-	if (!base->must_forward_clk)
-		return;
-
-	jnow = READ_ONCE(jiffies);
+	/*
+	 * No need to forward if we are close enough below jiffies.
+	 * Also while executing timers, base->clk is 1 offset ahead
+	 * of jiffies to avoid endless requeuing to current jffies.
+	 */
 	if ((long)(jnow - base->clk) < 2)
 		return;
 
@@ -1722,16 +1722,8 @@ static inline void __run_timers(struct timer_base *base)
 	timer_base_lock_expiry(base);
 	raw_spin_lock_irq(&base->lock);
 
-	/*
-	 * timer_base::must_forward_clk must be cleared before running
-	 * timers so that any timer functions that call mod_timer() will
-	 * not try to forward the base.
-	 */
-	base->must_forward_clk = false;
-
 	while (time_after_eq(jiffies, base->clk) &&
 	       time_after_eq(jiffies, base->next_expiry)) {
-
 		levels = collect_expired_timers(base, heads);
 		base->clk++;
 		base->next_expiry = __next_timer_interrupt(base);
@@ -1739,7 +1731,6 @@ static inline void __run_timers(struct timer_base *base)
 		while (levels--)
 			expire_timers(base, heads + levels);
 	}
-	base->must_forward_clk = true;
 	raw_spin_unlock_irq(&base->lock);
 	timer_base_unlock_expiry(base);
 }
@@ -1935,7 +1926,6 @@ int timers_prepare_cpu(unsigned int cpu)
 		base->clk = jiffies;
 		base->next_expiry = base->clk + NEXT_TIMER_MAX_DELTA;
 		base->is_idle = false;
-		base->must_forward_clk = true;
 	}
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip: timers/core] timers: Expand clk forward logic beyond nohz
  2020-07-17 14:05 ` [PATCH 09/12] timer: Expand clk forward logic beyond nohz Frederic Weisbecker
@ 2020-07-17 20:00   ` tip-bot2 for Frederic Weisbecker
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot2 for Frederic Weisbecker @ 2020-07-17 20:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Frederic Weisbecker, Thomas Gleixner, Juri Lelli, x86, LKML

The following commit has been merged into the timers/core branch of tip:

Commit-ID:     1f8a4212dc83f8353843fabf6465fd918372fbbf
Gitweb:        https://git.kernel.org/tip/1f8a4212dc83f8353843fabf6465fd918372fbbf
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Fri, 17 Jul 2020 16:05:48 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Fri, 17 Jul 2020 21:55:24 +02:00

timers: Expand clk forward logic beyond nohz

As for next_expiry, the base->clk catch up logic will be expanded beyond
NOHZ in order to avoid triggering useless softirqs.

If softirqs should only fire to expire pending timers, periodic base->clk
increments must be skippable for random amounts of time.  Therefore prepare
to catch-up with missing updates whenever an up-to-date base clock is
needed.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200717140551.29076-10-frederic@kernel.org

---
 kernel/time/timer.c | 26 ++++----------------------
 1 file changed, 4 insertions(+), 22 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 13f48ee..1be92b5 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -888,19 +888,12 @@ get_target_base(struct timer_base *base, unsigned tflags)
 
 static inline void forward_timer_base(struct timer_base *base)
 {
-#ifdef CONFIG_NO_HZ_COMMON
 	unsigned long jnow;
 
-	/*
-	 * We only forward the base when we are idle or have just come out of
-	 * idle (must_forward_clk logic), and have a delta between base clock
-	 * and jiffies. In the common case, run_timers will take care of it.
-	 */
-	if (likely(!base->must_forward_clk))
+	if (!base->must_forward_clk)
 		return;
 
 	jnow = READ_ONCE(jiffies);
-	base->must_forward_clk = base->is_idle;
 	if ((long)(jnow - base->clk) < 2)
 		return;
 
@@ -915,7 +908,6 @@ static inline void forward_timer_base(struct timer_base *base)
 			return;
 		base->clk = base->next_expiry;
 	}
-#endif
 }
 
 
@@ -1667,10 +1659,8 @@ u64 get_next_timer_interrupt(unsigned long basej, u64 basem)
 		 * logic is only maintained for the BASE_STD base, deferrable
 		 * timers may still see large granularity skew (by design).
 		 */
-		if ((expires - basem) > TICK_NSEC) {
-			base->must_forward_clk = true;
+		if ((expires - basem) > TICK_NSEC)
 			base->is_idle = true;
-		}
 	}
 	raw_spin_unlock(&base->lock);
 
@@ -1769,16 +1759,7 @@ static inline void __run_timers(struct timer_base *base)
 	/*
 	 * timer_base::must_forward_clk must be cleared before running
 	 * timers so that any timer functions that call mod_timer() will
-	 * not try to forward the base. Idle tracking / clock forwarding
-	 * logic is only used with BASE_STD timers.
-	 *
-	 * The must_forward_clk flag is cleared unconditionally also for
-	 * the deferrable base. The deferrable base is not affected by idle
-	 * tracking and never forwarded, so clearing the flag is a NOOP.
-	 *
-	 * The fact that the deferrable base is never forwarded can cause
-	 * large variations in granularity for deferrable timers, but they
-	 * can be deferred for long periods due to idle anyway.
+	 * not try to forward the base.
 	 */
 	base->must_forward_clk = false;
 
@@ -1791,6 +1772,7 @@ static inline void __run_timers(struct timer_base *base)
 		while (levels--)
 			expire_timers(base, heads + levels);
 	}
+	base->must_forward_clk = true;
 	raw_spin_unlock_irq(&base->lock);
 	timer_base_unlock_expiry(base);
 }

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip: timers/core] timers: Spare timer softirq until next expiry
  2020-07-17 14:05 ` [PATCH 10/12] timer: Spare timer softirq until next expiry Frederic Weisbecker
@ 2020-07-17 20:00   ` tip-bot2 for Frederic Weisbecker
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot2 for Frederic Weisbecker @ 2020-07-17 20:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Frederic Weisbecker, Thomas Gleixner, Juri Lelli, x86, LKML

The following commit has been merged into the timers/core branch of tip:

Commit-ID:     d4f7dae87096dfe722bf32aa82076ece1063746c
Gitweb:        https://git.kernel.org/tip/d4f7dae87096dfe722bf32aa82076ece1063746c
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Fri, 17 Jul 2020 16:05:49 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Fri, 17 Jul 2020 21:55:24 +02:00

timers: Spare timer softirq until next expiry

Now that the core timer infrastructure doesn't depend anymore on
periodic base->clk increments, even when the CPU is not in NO_HZ mode,
timer softirqs can be skipped until there are timers to expire.

Some spurious softirqs can still remain since base->next_expiry doesn't
keep track of canceled timers but this still reduces the number of softirqs
significantly: ~15 times less for HZ=1000 and ~5 times less for HZ=100.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200717140551.29076-11-frederic@kernel.org

---
 kernel/time/timer.c | 49 +++++++-------------------------------------
 1 file changed, 8 insertions(+), 41 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 1be92b5..4f78a7b 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1458,10 +1458,10 @@ static void expire_timers(struct timer_base *base, struct hlist_head *head)
 	}
 }
 
-static int __collect_expired_timers(struct timer_base *base,
-				    struct hlist_head *heads)
+static int collect_expired_timers(struct timer_base *base,
+				  struct hlist_head *heads)
 {
-	unsigned long clk = base->clk;
+	unsigned long clk = base->clk = base->next_expiry;
 	struct hlist_head *vec;
 	int i, levels = 0;
 	unsigned int idx;
@@ -1684,40 +1684,6 @@ void timer_clear_idle(void)
 	 */
 	base->is_idle = false;
 }
-
-static int collect_expired_timers(struct timer_base *base,
-				  struct hlist_head *heads)
-{
-	unsigned long now = READ_ONCE(jiffies);
-
-	/*
-	 * NOHZ optimization. After a long idle sleep we need to forward the
-	 * base to current jiffies. Avoid a loop by searching the bitfield for
-	 * the next expiring timer.
-	 */
-	if ((long)(now - base->clk) > 2) {
-		/*
-		 * If the next timer is ahead of time forward to current
-		 * jiffies, otherwise forward to the next expiry time:
-		 */
-		if (time_after(base->next_expiry, now)) {
-			/*
-			 * The call site will increment base->clk and then
-			 * terminate the expiry loop immediately.
-			 */
-			base->clk = now;
-			return 0;
-		}
-		base->clk = base->next_expiry;
-	}
-	return __collect_expired_timers(base, heads);
-}
-#else
-static inline int collect_expired_timers(struct timer_base *base,
-					 struct hlist_head *heads)
-{
-	return __collect_expired_timers(base, heads);
-}
 #endif
 
 /*
@@ -1750,7 +1716,7 @@ static inline void __run_timers(struct timer_base *base)
 	struct hlist_head heads[LVL_DEPTH];
 	int levels;
 
-	if (!time_after_eq(jiffies, base->clk))
+	if (time_before(jiffies, base->next_expiry))
 		return;
 
 	timer_base_lock_expiry(base);
@@ -1763,7 +1729,8 @@ static inline void __run_timers(struct timer_base *base)
 	 */
 	base->must_forward_clk = false;
 
-	while (time_after_eq(jiffies, base->clk)) {
+	while (time_after_eq(jiffies, base->clk) &&
+	       time_after_eq(jiffies, base->next_expiry)) {
 
 		levels = collect_expired_timers(base, heads);
 		base->clk++;
@@ -1798,12 +1765,12 @@ void run_local_timers(void)
 
 	hrtimer_run_queues();
 	/* Raise the softirq only if required. */
-	if (time_before(jiffies, base->clk)) {
+	if (time_before(jiffies, base->next_expiry)) {
 		if (!IS_ENABLED(CONFIG_NO_HZ_COMMON))
 			return;
 		/* CPU is awake, so check the deferrable base. */
 		base++;
-		if (time_before(jiffies, base->clk))
+		if (time_before(jiffies, base->next_expiry))
 			return;
 	}
 	raise_softirq(TIMER_SOFTIRQ);

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip: timers/core] timers: Reuse next expiry cache after nohz exit
  2020-07-17 14:05 ` [PATCH 08/12] timer: Reuse next expiry cache after nohz exit Frederic Weisbecker
@ 2020-07-17 20:00   ` tip-bot2 for Frederic Weisbecker
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot2 for Frederic Weisbecker @ 2020-07-17 20:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Frederic Weisbecker, Thomas Gleixner, Juri Lelli, x86, LKML

The following commit has been merged into the timers/core branch of tip:

Commit-ID:     90d52f65f303091be17b5f4ffab7090b2064b4a1
Gitweb:        https://git.kernel.org/tip/90d52f65f303091be17b5f4ffab7090b2064b4a1
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Fri, 17 Jul 2020 16:05:47 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Fri, 17 Jul 2020 21:55:23 +02:00

timers: Reuse next expiry cache after nohz exit

Now that the next expiry it tracked unconditionally when a timer is added,
this information can be reused on a tick firing after exiting nohz.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200717140551.29076-9-frederic@kernel.org

---
 kernel/time/timer.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 76fd964..13f48ee 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1706,13 +1706,11 @@ static int collect_expired_timers(struct timer_base *base,
 	 * the next expiring timer.
 	 */
 	if ((long)(now - base->clk) > 2) {
-		unsigned long next = __next_timer_interrupt(base);
-
 		/*
 		 * If the next timer is ahead of time forward to current
 		 * jiffies, otherwise forward to the next expiry time:
 		 */
-		if (time_after(next, now)) {
+		if (time_after(base->next_expiry, now)) {
 			/*
 			 * The call site will increment base->clk and then
 			 * terminate the expiry loop immediately.
@@ -1720,7 +1718,7 @@ static int collect_expired_timers(struct timer_base *base,
 			base->clk = now;
 			return 0;
 		}
-		base->clk = next;
+		base->clk = base->next_expiry;
 	}
 	return __collect_expired_timers(base, heads);
 }

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip: timers/core] timers: Optimize _next_timer_interrupt() level iteration
  2020-07-17 14:05 ` [PATCH 06/12] timer: Optimize _next_timer_interrupt() level iteration Frederic Weisbecker
@ 2020-07-17 20:00   ` tip-bot2 for Frederic Weisbecker
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot2 for Frederic Weisbecker @ 2020-07-17 20:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Frederic Weisbecker, Thomas Gleixner, Juri Lelli, x86, LKML

The following commit has been merged into the timers/core branch of tip:

Commit-ID:     001ec1b3925da0d51847c23fc0aa4129282db526
Gitweb:        https://git.kernel.org/tip/001ec1b3925da0d51847c23fc0aa4129282db526
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Fri, 17 Jul 2020 16:05:45 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Fri, 17 Jul 2020 21:55:22 +02:00

timers: Optimize _next_timer_interrupt() level iteration

If a level has a timer that expires before reaching the next level, there
is no need to iterate further.

The next level is reached when the 3 lower bits of the current level are
cleared. If the next event happens before/during that, the next levels
won't provide an earlier expiration.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200717140551.29076-7-frederic@kernel.org

---
 kernel/time/timer.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index af1c08b..9abc417 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1526,6 +1526,7 @@ static unsigned long __next_timer_interrupt(struct timer_base *base)
 	clk = base->clk;
 	for (lvl = 0; lvl < LVL_DEPTH; lvl++, offset += LVL_SIZE) {
 		int pos = next_pending_bucket(base, offset, clk & LVL_MASK);
+		unsigned long lvl_clk = clk & LVL_CLK_MASK;
 
 		if (pos >= 0) {
 			unsigned long tmp = clk + (unsigned long) pos;
@@ -1533,6 +1534,13 @@ static unsigned long __next_timer_interrupt(struct timer_base *base)
 			tmp <<= LVL_SHIFT(lvl);
 			if (time_before(tmp, next))
 				next = tmp;
+
+			/*
+			 * If the next expiration happens before we reach
+			 * the next level, no need to check further.
+			 */
+			if (pos <= ((LVL_CLK_DIV - lvl_clk) & LVL_CLK_MASK))
+				break;
 		}
 		/*
 		 * Clock for the next level. If the current level clock lower
@@ -1570,7 +1578,7 @@ static unsigned long __next_timer_interrupt(struct timer_base *base)
 		 * So the simple check whether the lower bits of the current
 		 * level are 0 or not is sufficient for all cases.
 		 */
-		adj = clk & LVL_CLK_MASK ? 1 : 0;
+		adj = lvl_clk ? 1 : 0;
 		clk >>= LVL_CLK_SHIFT;
 		clk += adj;
 	}

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip: timers/core] timers: Always keep track of next expiry
  2020-07-17 14:05 ` [PATCH 07/12] timers: Always keep track of next expiry Frederic Weisbecker
@ 2020-07-17 20:00   ` tip-bot2 for Frederic Weisbecker
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot2 for Frederic Weisbecker @ 2020-07-17 20:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Frederic Weisbecker, Thomas Gleixner, Juri Lelli, x86, LKML

The following commit has been merged into the timers/core branch of tip:

Commit-ID:     dc2a0f1fb2a06df09f5094f29aea56b763aa7cca
Gitweb:        https://git.kernel.org/tip/dc2a0f1fb2a06df09f5094f29aea56b763aa7cca
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Fri, 17 Jul 2020 16:05:46 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Fri, 17 Jul 2020 21:55:23 +02:00

timers: Always keep track of next expiry

So far next expiry was only tracked while the CPU was in nohz_idle mode
in order to cope with missing ticks that can't increment the base->clk
periodically anymore.

This logic is going to be expanded beyond nohz in order to spare timer
softirqs so do it unconditionally.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200717140551.29076-8-frederic@kernel.org

---
 kernel/time/timer.c | 42 +++++++++++++++++++++---------------------
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 9abc417..76fd964 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -544,8 +544,7 @@ static int calc_wheel_index(unsigned long expires, unsigned long clk,
 }
 
 static void
-trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer,
-		    unsigned long bucket_expiry)
+trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
 {
 	if (!is_timers_nohz_active())
 		return;
@@ -565,23 +564,8 @@ trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer,
 	 * timer is not deferrable. If the other CPU is on the way to idle
 	 * then it can't set base->is_idle as we hold the base lock:
 	 */
-	if (!base->is_idle)
-		return;
-
-	/*
-	 * Check whether this is the new first expiring timer. The
-	 * effective expiry time of the timer is required here
-	 * (bucket_expiry) instead of timer->expires.
-	 */
-	if (time_after_eq(bucket_expiry, base->next_expiry))
-		return;
-
-	/*
-	 * Set the next expiry time and kick the CPU so it can reevaluate the
-	 * wheel:
-	 */
-	base->next_expiry = bucket_expiry;
-	wake_up_nohz_cpu(base->cpu);
+	if (base->is_idle)
+		wake_up_nohz_cpu(base->cpu);
 }
 
 /*
@@ -592,12 +576,26 @@ trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer,
 static void enqueue_timer(struct timer_base *base, struct timer_list *timer,
 			  unsigned int idx, unsigned long bucket_expiry)
 {
+
 	hlist_add_head(&timer->entry, base->vectors + idx);
 	__set_bit(idx, base->pending_map);
 	timer_set_idx(timer, idx);
 
 	trace_timer_start(timer, timer->expires, timer->flags);
-	trigger_dyntick_cpu(base, timer, bucket_expiry);
+
+	/*
+	 * Check whether this is the new first expiring timer. The
+	 * effective expiry time of the timer is required here
+	 * (bucket_expiry) instead of timer->expires.
+	 */
+	if (time_before(bucket_expiry, base->next_expiry)) {
+		/*
+		 * Set the next expiry time and kick the CPU so it
+		 * can reevaluate the wheel:
+		 */
+		base->next_expiry = bucket_expiry;
+		trigger_dyntick_cpu(base, timer);
+	}
 }
 
 static void internal_add_timer(struct timer_base *base, struct timer_list *timer)
@@ -1493,7 +1491,6 @@ static int __collect_expired_timers(struct timer_base *base,
 	return levels;
 }
 
-#ifdef CONFIG_NO_HZ_COMMON
 /*
  * Find the next pending bucket of a level. Search from level start (@offset)
  * + @clk upwards and if nothing there, search from start of the level
@@ -1585,6 +1582,7 @@ static unsigned long __next_timer_interrupt(struct timer_base *base)
 	return next;
 }
 
+#ifdef CONFIG_NO_HZ_COMMON
 /*
  * Check, if the next hrtimer event is before the next timer wheel
  * event:
@@ -1790,6 +1788,7 @@ static inline void __run_timers(struct timer_base *base)
 
 		levels = collect_expired_timers(base, heads);
 		base->clk++;
+		base->next_expiry = __next_timer_interrupt(base);
 
 		while (levels--)
 			expire_timers(base, heads + levels);
@@ -2042,6 +2041,7 @@ static void __init init_timer_cpu(int cpu)
 		base->cpu = cpu;
 		raw_spin_lock_init(&base->lock);
 		base->clk = jiffies;
+		base->next_expiry = base->clk + NEXT_TIMER_MAX_DELTA;
 		timer_base_init_expiry_lock(base);
 	}
 }

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip: timers/core] timers: Add comments about calc_index() ceiling work
  2020-07-17 14:05 ` [PATCH 05/12] timer: Add comments about calc_index() ceiling work Frederic Weisbecker
@ 2020-07-17 20:00   ` tip-bot2 for Frederic Weisbecker
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot2 for Frederic Weisbecker @ 2020-07-17 20:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Frederic Weisbecker, Juri Lelli, x86, LKML

The following commit has been merged into the timers/core branch of tip:

Commit-ID:     4468897211628865ee2392acb5ad281f74176f63
Gitweb:        https://git.kernel.org/tip/4468897211628865ee2392acb5ad281f74176f63
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Fri, 17 Jul 2020 16:05:44 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Fri, 17 Jul 2020 21:55:22 +02:00

timers: Add comments about calc_index() ceiling work

calc_index() adds 1 unit of the level granularity to the expiry passed
in parameter to ensure that the timer doesn't expire too early. Add a
comment to explain that and the resulting layout in the wheel.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200717140551.29076-6-frederic@kernel.org

---
 kernel/time/timer.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 2af08a1..af1c08b 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -156,7 +156,8 @@ EXPORT_SYMBOL(jiffies_64);
 
 /*
  * The time start value for each level to select the bucket at enqueue
- * time.
+ * time. We start from the last possible delta of the previous level
+ * so that we can later add an extra LVL_GRAN(n) to n (see calc_index()).
  */
 #define LVL_START(n)	((LVL_SIZE - 1) << (((n) - 1) * LVL_CLK_SHIFT))
 
@@ -490,6 +491,15 @@ static inline void timer_set_idx(struct timer_list *timer, unsigned int idx)
 static inline unsigned calc_index(unsigned long expires, unsigned lvl,
 				  unsigned long *bucket_expiry)
 {
+
+	/*
+	 * The timer wheel has to guarantee that a timer does not fire
+	 * early. Early expiry can happen due to:
+	 * - Timer is armed at the edge of a tick
+	 * - Truncation of the expiry time in the outer wheel levels
+	 *
+	 * Round up with level granularity to prevent this.
+	 */
 	expires = (expires + LVL_GRAN(lvl)) >> LVL_SHIFT(lvl);
 	*bucket_expiry = expires << LVL_SHIFT(lvl);
 	return LVL_OFFS(lvl) + (expires & LVL_MASK);

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip: timers/core] timers: Move trigger_dyntick_cpu() to enqueue_timer()
  2020-07-17 14:05 ` [PATCH 04/12] timer: Move trigger_dyntick_cpu() to enqueue_timer() Frederic Weisbecker
@ 2020-07-17 20:00   ` tip-bot2 for Frederic Weisbecker
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot2 for Frederic Weisbecker @ 2020-07-17 20:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Frederic Weisbecker, Thomas Gleixner, Juri Lelli, x86, LKML

The following commit has been merged into the timers/core branch of tip:

Commit-ID:     9a2b764b06c880678416d803d027f575ae40ec99
Gitweb:        https://git.kernel.org/tip/9a2b764b06c880678416d803d027f575ae40ec99
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Fri, 17 Jul 2020 16:05:43 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Fri, 17 Jul 2020 21:55:22 +02:00

timers: Move trigger_dyntick_cpu() to enqueue_timer()

Consolidate the code by calling trigger_dyntick_cpu() from
enqueue_timer() instead of calling it from all its callers.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200717140551.29076-5-frederic@kernel.org

---
 kernel/time/timer.c | 61 ++++++++++++++++++--------------------------
 1 file changed, 25 insertions(+), 36 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index a7a3cf7..2af08a1 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -533,30 +533,6 @@ static int calc_wheel_index(unsigned long expires, unsigned long clk,
 	return idx;
 }
 
-/*
- * Enqueue the timer into the hash bucket, mark it pending in
- * the bitmap and store the index in the timer flags.
- */
-static void enqueue_timer(struct timer_base *base, struct timer_list *timer,
-			  unsigned int idx)
-{
-	hlist_add_head(&timer->entry, base->vectors + idx);
-	__set_bit(idx, base->pending_map);
-	timer_set_idx(timer, idx);
-
-	trace_timer_start(timer, timer->expires, timer->flags);
-}
-
-static void
-__internal_add_timer(struct timer_base *base, struct timer_list *timer,
-		     unsigned long *bucket_expiry)
-{
-	unsigned int idx;
-
-	idx = calc_wheel_index(timer->expires, base->clk, bucket_expiry);
-	enqueue_timer(base, timer, idx);
-}
-
 static void
 trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer,
 		    unsigned long bucket_expiry)
@@ -598,15 +574,31 @@ trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer,
 	wake_up_nohz_cpu(base->cpu);
 }
 
-static void
-internal_add_timer(struct timer_base *base, struct timer_list *timer)
+/*
+ * Enqueue the timer into the hash bucket, mark it pending in
+ * the bitmap, store the index in the timer flags then wake up
+ * the target CPU if needed.
+ */
+static void enqueue_timer(struct timer_base *base, struct timer_list *timer,
+			  unsigned int idx, unsigned long bucket_expiry)
 {
-	unsigned long bucket_expiry;
+	hlist_add_head(&timer->entry, base->vectors + idx);
+	__set_bit(idx, base->pending_map);
+	timer_set_idx(timer, idx);
 
-	__internal_add_timer(base, timer, &bucket_expiry);
+	trace_timer_start(timer, timer->expires, timer->flags);
 	trigger_dyntick_cpu(base, timer, bucket_expiry);
 }
 
+static void internal_add_timer(struct timer_base *base, struct timer_list *timer)
+{
+	unsigned long bucket_expiry;
+	unsigned int idx;
+
+	idx = calc_wheel_index(timer->expires, base->clk, &bucket_expiry);
+	enqueue_timer(base, timer, idx, bucket_expiry);
+}
+
 #ifdef CONFIG_DEBUG_OBJECTS_TIMERS
 
 static struct debug_obj_descr timer_debug_descr;
@@ -1057,16 +1049,13 @@ __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int option
 	/*
 	 * If 'idx' was calculated above and the base time did not advance
 	 * between calculating 'idx' and possibly switching the base, only
-	 * enqueue_timer() and trigger_dyntick_cpu() is required. Otherwise
-	 * we need to (re)calculate the wheel index via
-	 * internal_add_timer().
+	 * enqueue_timer() is required. Otherwise we need to (re)calculate
+	 * the wheel index via internal_add_timer().
 	 */
-	if (idx != UINT_MAX && clk == base->clk) {
-		enqueue_timer(base, timer, idx);
-		trigger_dyntick_cpu(base, timer, bucket_expiry);
-	} else {
+	if (idx != UINT_MAX && clk == base->clk)
+		enqueue_timer(base, timer, idx, bucket_expiry);
+	else
 		internal_add_timer(base, timer);
-	}
 
 out_unlock:
 	raw_spin_unlock_irqrestore(&base->lock, flags);

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip: timers/core] timers: Use only bucket expiry for base->next_expiry value
  2020-07-17 14:05 ` [PATCH 03/12] timers: Use only bucket expiry for base->next_expiry value Frederic Weisbecker
@ 2020-07-17 20:00   ` tip-bot2 for Anna-Maria Behnsen
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot2 for Anna-Maria Behnsen @ 2020-07-17 20:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Anna-Maria Behnsen, Frederic Weisbecker, Thomas Gleixner, x86, LKML

The following commit has been merged into the timers/core branch of tip:

Commit-ID:     1f32cab0db4bdf6491eb4a60838f278e01c31698
Gitweb:        https://git.kernel.org/tip/1f32cab0db4bdf6491eb4a60838f278e01c31698
Author:        Anna-Maria Behnsen <anna-maria@linutronix.de>
AuthorDate:    Fri, 17 Jul 2020 16:05:42 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Fri, 17 Jul 2020 21:55:21 +02:00

timers: Use only bucket expiry for base->next_expiry value

The bucket expiry time is the effective expriy time of timers and is
greater than or equal to the requested timer expiry time. This is due
to the guarantee that timers never expire early and the reduced expiry
granularity in the secondary wheel levels.

When a timer is enqueued, trigger_dyntick_cpu() checks whether the
timer is the new first timer. This check compares next_expiry with
the requested timer expiry value and not with the effective expiry
value of the bucket into which the timer was queued.

Storing the requested timer expiry value in base->next_expiry can lead
to base->clk going backwards if the requested timer expiry value is
smaller than base->clk. Commit 30c66fc30ee7 ("timer: Prevent base->clk
from moving backward") worked around this by preventing the store when
timer->expiry is before base->clk, but did not fix the underlying
problem.

Use the expiry value of the bucket into which the timer is queued to
do the new first timer check. This fixes the base->clk going backward
problem.

The workaround of commit 30c66fc30ee7 ("timer: Prevent base->clk from
moving backward") in trigger_dyntick_cpu() is not longer necessary as the
timers bucket expiry is guaranteed to be greater than or equal base->clk.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200717140551.29076-4-frederic@kernel.org

---
 kernel/time/timer.c | 64 +++++++++++++++++++++++---------------------
 1 file changed, 34 insertions(+), 30 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index bcdc304..a7a3cf7 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -487,35 +487,39 @@ static inline void timer_set_idx(struct timer_list *timer, unsigned int idx)
  * Helper function to calculate the array index for a given expiry
  * time.
  */
-static inline unsigned calc_index(unsigned long expires, unsigned lvl)
+static inline unsigned calc_index(unsigned long expires, unsigned lvl,
+				  unsigned long *bucket_expiry)
 {
 	expires = (expires + LVL_GRAN(lvl)) >> LVL_SHIFT(lvl);
+	*bucket_expiry = expires << LVL_SHIFT(lvl);
 	return LVL_OFFS(lvl) + (expires & LVL_MASK);
 }
 
-static int calc_wheel_index(unsigned long expires, unsigned long clk)
+static int calc_wheel_index(unsigned long expires, unsigned long clk,
+			    unsigned long *bucket_expiry)
 {
 	unsigned long delta = expires - clk;
 	unsigned int idx;
 
 	if (delta < LVL_START(1)) {
-		idx = calc_index(expires, 0);
+		idx = calc_index(expires, 0, bucket_expiry);
 	} else if (delta < LVL_START(2)) {
-		idx = calc_index(expires, 1);
+		idx = calc_index(expires, 1, bucket_expiry);
 	} else if (delta < LVL_START(3)) {
-		idx = calc_index(expires, 2);
+		idx = calc_index(expires, 2, bucket_expiry);
 	} else if (delta < LVL_START(4)) {
-		idx = calc_index(expires, 3);
+		idx = calc_index(expires, 3, bucket_expiry);
 	} else if (delta < LVL_START(5)) {
-		idx = calc_index(expires, 4);
+		idx = calc_index(expires, 4, bucket_expiry);
 	} else if (delta < LVL_START(6)) {
-		idx = calc_index(expires, 5);
+		idx = calc_index(expires, 5, bucket_expiry);
 	} else if (delta < LVL_START(7)) {
-		idx = calc_index(expires, 6);
+		idx = calc_index(expires, 6, bucket_expiry);
 	} else if (LVL_DEPTH > 8 && delta < LVL_START(8)) {
-		idx = calc_index(expires, 7);
+		idx = calc_index(expires, 7, bucket_expiry);
 	} else if ((long) delta < 0) {
 		idx = clk & LVL_MASK;
+		*bucket_expiry = clk;
 	} else {
 		/*
 		 * Force expire obscene large timeouts to expire at the
@@ -524,7 +528,7 @@ static int calc_wheel_index(unsigned long expires, unsigned long clk)
 		if (delta >= WHEEL_TIMEOUT_CUTOFF)
 			expires = clk + WHEEL_TIMEOUT_MAX;
 
-		idx = calc_index(expires, LVL_DEPTH - 1);
+		idx = calc_index(expires, LVL_DEPTH - 1, bucket_expiry);
 	}
 	return idx;
 }
@@ -544,16 +548,18 @@ static void enqueue_timer(struct timer_base *base, struct timer_list *timer,
 }
 
 static void
-__internal_add_timer(struct timer_base *base, struct timer_list *timer)
+__internal_add_timer(struct timer_base *base, struct timer_list *timer,
+		     unsigned long *bucket_expiry)
 {
 	unsigned int idx;
 
-	idx = calc_wheel_index(timer->expires, base->clk);
+	idx = calc_wheel_index(timer->expires, base->clk, bucket_expiry);
 	enqueue_timer(base, timer, idx);
 }
 
 static void
-trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
+trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer,
+		    unsigned long bucket_expiry)
 {
 	if (!is_timers_nohz_active())
 		return;
@@ -576,31 +582,29 @@ trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
 	if (!base->is_idle)
 		return;
 
-	/* Check whether this is the new first expiring timer: */
-	if (time_after_eq(timer->expires, base->next_expiry))
+	/*
+	 * Check whether this is the new first expiring timer. The
+	 * effective expiry time of the timer is required here
+	 * (bucket_expiry) instead of timer->expires.
+	 */
+	if (time_after_eq(bucket_expiry, base->next_expiry))
 		return;
 
 	/*
 	 * Set the next expiry time and kick the CPU so it can reevaluate the
 	 * wheel:
 	 */
-	if (time_before(timer->expires, base->clk)) {
-		/*
-		 * Prevent from forward_timer_base() moving the base->clk
-		 * backward
-		 */
-		base->next_expiry = base->clk;
-	} else {
-		base->next_expiry = timer->expires;
-	}
+	base->next_expiry = bucket_expiry;
 	wake_up_nohz_cpu(base->cpu);
 }
 
 static void
 internal_add_timer(struct timer_base *base, struct timer_list *timer)
 {
-	__internal_add_timer(base, timer);
-	trigger_dyntick_cpu(base, timer);
+	unsigned long bucket_expiry;
+
+	__internal_add_timer(base, timer, &bucket_expiry);
+	trigger_dyntick_cpu(base, timer, bucket_expiry);
 }
 
 #ifdef CONFIG_DEBUG_OBJECTS_TIMERS
@@ -959,9 +963,9 @@ static struct timer_base *lock_timer_base(struct timer_list *timer,
 static inline int
 __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int options)
 {
+	unsigned long clk = 0, flags, bucket_expiry;
 	struct timer_base *base, *new_base;
 	unsigned int idx = UINT_MAX;
-	unsigned long clk = 0, flags;
 	int ret = 0;
 
 	BUG_ON(!timer->function);
@@ -1000,7 +1004,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int option
 		}
 
 		clk = base->clk;
-		idx = calc_wheel_index(expires, clk);
+		idx = calc_wheel_index(expires, clk, &bucket_expiry);
 
 		/*
 		 * Retrieve and compare the array index of the pending
@@ -1059,7 +1063,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int option
 	 */
 	if (idx != UINT_MAX && clk == base->clk) {
 		enqueue_timer(base, timer, idx);
-		trigger_dyntick_cpu(base, timer);
+		trigger_dyntick_cpu(base, timer, bucket_expiry);
 	} else {
 		internal_add_timer(base, timer);
 	}

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip: timers/core] timers: Preserve higher bits of expiration on index calculation
  2020-07-17 14:05 ` [PATCH 02/12] timer: Preserve higher bits of expiration on index calculation Frederic Weisbecker
@ 2020-07-17 20:00   ` tip-bot2 for Frederic Weisbecker
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot2 for Frederic Weisbecker @ 2020-07-17 20:00 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Frederic Weisbecker, Thomas Gleixner, x86, LKML

The following commit has been merged into the timers/core branch of tip:

Commit-ID:     3d2e83a2a6a0657c1cf145fa6ba23620715d6c36
Gitweb:        https://git.kernel.org/tip/3d2e83a2a6a0657c1cf145fa6ba23620715d6c36
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Fri, 17 Jul 2020 16:05:41 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Fri, 17 Jul 2020 21:55:21 +02:00

timers: Preserve higher bits of expiration on index calculation

The higher bits of the timer expiration are cropped while calling
calc_index() due to the implicit cast from unsigned long to unsigned int.

This loss shouldn't have consequences on the current code since all the
computation to calculate the index is done on the lower 32 bits.

However to prepare for returning the actual bucket expiration from
calc_index() in order to properly fix base->next_expiry updates, the higher
bits need to be preserved.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200717140551.29076-3-frederic@kernel.org

---
 kernel/time/timer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index df1ff80..bcdc304 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -487,7 +487,7 @@ static inline void timer_set_idx(struct timer_list *timer, unsigned int idx)
  * Helper function to calculate the array index for a given expiry
  * time.
  */
-static inline unsigned calc_index(unsigned expires, unsigned lvl)
+static inline unsigned calc_index(unsigned long expires, unsigned lvl)
 {
 	expires = (expires + LVL_GRAN(lvl)) >> LVL_SHIFT(lvl);
 	return LVL_OFFS(lvl) + (expires & LVL_MASK);

^ permalink raw reply related	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2020-07-17 20:01 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-17 14:05 [PATCH 00/11] timer: Reduce timers softirq v3 Frederic Weisbecker
2020-07-17 14:05 ` [PATCH 01/12] timer: Fix wheel index calculation on last level Frederic Weisbecker
2020-07-17 19:49   ` [tip: timers/urgent] " tip-bot2 for Frederic Weisbecker
2020-07-17 14:05 ` [PATCH 02/12] timer: Preserve higher bits of expiration on index calculation Frederic Weisbecker
2020-07-17 20:00   ` [tip: timers/core] timers: " tip-bot2 for Frederic Weisbecker
2020-07-17 14:05 ` [PATCH 03/12] timers: Use only bucket expiry for base->next_expiry value Frederic Weisbecker
2020-07-17 20:00   ` [tip: timers/core] " tip-bot2 for Anna-Maria Behnsen
2020-07-17 14:05 ` [PATCH 04/12] timer: Move trigger_dyntick_cpu() to enqueue_timer() Frederic Weisbecker
2020-07-17 20:00   ` [tip: timers/core] timers: " tip-bot2 for Frederic Weisbecker
2020-07-17 14:05 ` [PATCH 05/12] timer: Add comments about calc_index() ceiling work Frederic Weisbecker
2020-07-17 20:00   ` [tip: timers/core] timers: " tip-bot2 for Frederic Weisbecker
2020-07-17 14:05 ` [PATCH 06/12] timer: Optimize _next_timer_interrupt() level iteration Frederic Weisbecker
2020-07-17 20:00   ` [tip: timers/core] timers: " tip-bot2 for Frederic Weisbecker
2020-07-17 14:05 ` [PATCH 07/12] timers: Always keep track of next expiry Frederic Weisbecker
2020-07-17 20:00   ` [tip: timers/core] " tip-bot2 for Frederic Weisbecker
2020-07-17 14:05 ` [PATCH 08/12] timer: Reuse next expiry cache after nohz exit Frederic Weisbecker
2020-07-17 20:00   ` [tip: timers/core] timers: " tip-bot2 for Frederic Weisbecker
2020-07-17 14:05 ` [PATCH 09/12] timer: Expand clk forward logic beyond nohz Frederic Weisbecker
2020-07-17 20:00   ` [tip: timers/core] timers: " tip-bot2 for Frederic Weisbecker
2020-07-17 14:05 ` [PATCH 10/12] timer: Spare timer softirq until next expiry Frederic Weisbecker
2020-07-17 20:00   ` [tip: timers/core] timers: " tip-bot2 for Frederic Weisbecker
2020-07-17 14:05 ` [PATCH 11/12] timer: Remove must_forward_clk Frederic Weisbecker
2020-07-17 20:00   ` [tip: timers/core] timers: " tip-bot2 for Frederic Weisbecker
2020-07-17 14:05 ` [PATCH 12/12] timer: Lower base clock forwarding threshold Frederic Weisbecker
2020-07-17 20:00   ` [tip: timers/core] timers: " tip-bot2 for Frederic Weisbecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).