linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Eric Dumazet <edumazet@google.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Chris Mason <clm@fb.com>, Arjan van de Ven <arjan@infradead.org>,
	rt@linutronix.de, Rik van Riel <riel@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	George Spelvin <linux@sciencehorizons.net>,
	Len Brown <lenb@kernel.org>,
	Anna-Maria Gleixner <anna-maria@linutronix.de>
Subject: [patch V2 15/20] timer: Optimize collect timers for NOHZ
Date: Fri, 17 Jun 2016 13:26:42 -0000	[thread overview]
Message-ID: <20160617131004.060581803@linutronix.de> (raw)
In-Reply-To: 20160617121134.417319325@linutronix.de

[-- Attachment #1: timer_Optimize_collect_timers_for_NOHZ.patch --]
[-- Type: text/plain, Size: 4255 bytes --]

From: Anna-Maria Gleixner <anna-maria@linutronix.de>

After a NOHZ idle sleep the wheel must be forwarded to current jiffies. There
might be expired timers so the current code loops and checks the epxired
buckets for timers. This can take quite some time for long NOHZ idle periods.

The pending bitmask in the timer base allows us to do a quick search for the
next expiring timer and therefor a fast forward of the base time which
prevents pointless long lasting loops.

For a 3 second idle sleep this reduces the catchup time from ~1ms to 5us.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Chris Mason <clm@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: rt@linutronix.de
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Arjan van de Ven <arjan@infradead.org>

---
 kernel/time/timer.c |   52 ++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 44 insertions(+), 8 deletions(-)

--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1246,8 +1246,8 @@ static void expire_timers(struct timer_b
 	}
 }
 
-static int collect_expired_timers(struct timer_base *base,
-				  struct hlist_head *heads)
+static int __collect_expired_timers(struct timer_base *base,
+				    struct hlist_head *heads)
 {
 	unsigned long clk = base->clk >> BASE_CLK_SHIFT;
 	struct hlist_head *vec;
@@ -1273,9 +1273,9 @@ static int collect_expired_timers(struct
 
 #ifdef CONFIG_NO_HZ_COMMON
 /*
- * Find the next pending bucket of a level. Search from @offset + @clk upwards
- * and if nothing there, search from start of the level (@offset) up to
- * @offset + clk.
+ * Find the next pending bucket of a level. Search from level start (@offset)
+ * + @clk upwards and if nothing there, search from start of the level
+ * (@offset) up to @offset + clk.
  */
 static int next_pending_bucket(struct timer_base *base, unsigned offset,
 			       unsigned clk)
@@ -1292,7 +1292,8 @@ static int next_pending_bucket(struct ti
 }
 
 /*
- * Search the first expiring timer in the various clock levels.
+ * Search the first expiring timer in the various clock levels. Caller must
+ * hold base->lock.
  *
  * Note: This implementation might be suboptimal vs. timers enqueued in the
  *	 cascade level because we do not look at the timers to figure out when
@@ -1305,7 +1306,6 @@ static unsigned long __next_timer_interr
 	unsigned long clk, next, adj;
 	unsigned lvl, offset = 0;
 
-	spin_lock(&base->lock);
 	next = BASE_RND_UP(base->clk + NEXT_TIMER_MAX_DELTA);
 	clk = base->clk >> BASE_CLK_SHIFT;
 	for (lvl = 0; lvl < LVL_DEPTH; lvl++, offset += LVL_SIZE) {
@@ -1358,7 +1358,6 @@ static unsigned long __next_timer_interr
 		clk >>= LVL_CLK_SHIFT;
 		clk += adj;
 	}
-	spin_unlock(&base->lock);
 	return next;
 }
 
@@ -1416,7 +1415,10 @@ u64 get_next_timer_interrupt(unsigned lo
 	if (cpu_is_offline(smp_processor_id()))
 		return expires;
 
+	spin_lock(&base->lock);
 	nextevt = __next_timer_interrupt(base);
+	spin_unlock(&base->lock);
+
 	if (time_before_eq(nextevt, basej))
 		expires = basem;
 	else
@@ -1424,6 +1426,40 @@ u64 get_next_timer_interrupt(unsigned lo
 
 	return cmp_next_hrtimer_event(basem, expires);
 }
+
+static int collect_expired_timers(struct timer_base *base,
+				  struct hlist_head *heads)
+{
+	/*
+	 * NOHZ optimization. After a long idle sleep we need to forward the
+	 * base to current jiffies. Avoid a loop by searching the bitfield for
+	 * the next expiring timer.
+	 */
+	if ((long)(jiffies - base->clk) > 2 * BASE_INCR) {
+		unsigned long next = __next_timer_interrupt(base);
+
+		/*
+		 * If the next timer is ahead of time forward to current
+		 * jiffies, otherwise forward to the next expiry time.
+		 */
+		if (time_after(next, jiffies)) {
+			/*
+			 * We need to round down here as the call site will
+			 * increment clock once more.
+			 */
+			base->clk = BASE_RND_DN(jiffies);
+			return 0;
+		}
+		base->clk = next;
+	}
+	return __collect_expired_timers(base, heads);
+}
+#else
+static inline int collect_expired_timers(struct timer_base *base,
+					 struct hlist_head *heads)
+{
+	return __collect_expired_timers(base, heads);
+}
 #endif
 
 /*

  parent reply	other threads:[~2016-06-17 13:28 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-17 13:26 [patch V2 00/20] timer: Refactor the timer wheel Thomas Gleixner
2016-06-17 13:26 ` [patch V2 01/20] timer: Make pinned a timer property Thomas Gleixner
2016-06-17 13:26 ` [patch V2 02/20] x86/apic/uv: Initialize timer as pinned Thomas Gleixner
2016-06-17 13:26 ` [patch V2 03/20] x86/mce: " Thomas Gleixner
2016-06-17 13:26 ` [patch V2 05/20] driver/net/ethernet/tile: " Thomas Gleixner
2016-06-21 18:14   ` Peter Zijlstra
2016-06-17 13:26 ` [patch V2 04/20] cpufreq/powernv: " Thomas Gleixner
2016-06-17 13:26 ` [patch V2 06/20] drivers/tty/metag_da: " Thomas Gleixner
2016-06-17 13:26 ` [patch V2 07/20] drivers/tty/mips_ejtag: " Thomas Gleixner
2016-06-17 13:26 ` [patch V2 08/20] net/ipv4/inet: Initialize timers " Thomas Gleixner
2016-06-17 13:26 ` [patch V2 09/20] timer: Remove mod_timer_pinned Thomas Gleixner
2016-06-17 13:26 ` [patch V2 10/20] hlist: Add hlist_is_singular_node() helper Thomas Gleixner
2016-06-17 13:26 ` [patch V2 11/20] timer: Give a few structs and members proper names Thomas Gleixner
2016-06-17 13:26 ` [patch V2 12/20] timer: Switch to a non cascading wheel Thomas Gleixner
2016-06-18  9:55   ` George Spelvin
2016-06-24 10:06     ` Thomas Gleixner
2016-06-17 13:26 ` [patch V2 13/20] timer: Remove slack leftovers Thomas Gleixner
2016-06-17 13:26 ` [patch V2 14/20] timer: Move __run_timers() function Thomas Gleixner
2016-06-17 13:26 ` Thomas Gleixner [this message]
2016-06-17 13:26 ` [patch V2 16/20] tick/sched: Remove pointless empty function Thomas Gleixner
2016-06-17 13:26 ` [patch V2 17/20] timer: Forward wheel clock whenever possible Thomas Gleixner
2016-06-17 13:26 ` [patch V2 18/20] timer: Only wake softirq if necessary Thomas Gleixner
2016-06-17 13:26 ` [patch V2 19/20] timer: Split out index calculation Thomas Gleixner
2016-06-17 13:26 ` [patch V2 20/20] timer: Optimization for same expiry time in mod_timer() Thomas Gleixner
2016-06-17 13:48 ` [patch V2 00/20] timer: Refactor the timer wheel Eric Dumazet
2016-06-17 13:57   ` Thomas Gleixner
2016-06-17 14:25     ` Eric Dumazet
2016-06-20 13:56       ` Thomas Gleixner
2016-06-20 14:46         ` Arjan van de Ven
2016-06-20 14:46           ` Thomas Gleixner
2016-06-20 14:49             ` Arjan van de Ven
2016-06-20 19:03         ` Rik van Riel
2016-06-21  2:48           ` Eric Dumazet
2016-06-17 14:26 ` Arjan van de Ven
2016-06-20 15:05 ` Paul E. McKenney
2016-06-20 15:13   ` Thomas Gleixner
2016-06-20 15:41     ` Paul E. McKenney
2016-06-22  7:37 ` Mike Galbraith
2016-06-22  8:44   ` Thomas Gleixner
2016-06-22  9:06     ` Mike Galbraith
2016-06-22 13:37       ` Mike Galbraith
2016-06-22 10:28     ` [LTP] " Cyril Hrubis
2016-06-23  8:27       ` Thomas Gleixner
2016-06-23 11:47         ` Cyril Hrubis
2016-06-23 13:58           ` George Spelvin
2016-06-23 14:10             ` Thomas Gleixner
2016-06-23 15:11             ` Cyril Hrubis
2016-06-23 15:21               ` Thomas Gleixner
2016-06-23 16:31                 ` Cyril Hrubis
2016-06-26 19:00     ` Pavel Machek
2016-06-26 19:21       ` Arjan van de Ven
2016-06-26 20:02         ` Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160617131004.060581803@linutronix.de \
    --to=tglx@linutronix.de \
    --cc=anna-maria@linutronix.de \
    --cc=arjan@infradead.org \
    --cc=clm@fb.com \
    --cc=edumazet@google.com \
    --cc=fweisbec@gmail.com \
    --cc=lenb@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@sciencehorizons.net \
    --cc=mingo@kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=rt@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).