* [patch 00/10] timer: Move from a push remote at enqueue to a pull at expiry model
@ 2017-04-17 18:32 Thomas Gleixner
  2017-04-17 18:32 ` [patch 01/10] timer: Invoke timer_start_debug() where it makes sense Thomas Gleixner
                   ` (10 more replies)
  0 siblings, 11 replies; 13+ messages in thread
From: Thomas Gleixner @ 2017-04-17 18:32 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, John Stultz, Eric Dumazet, Anna-Maria Gleixner,
	Rafael J. Wysocki, linux-pm, Arjan van de Ven, Paul E. McKenney,
	Frederic Weisbecker, Rik van Riel

Placing timers at enqueue time on a target CPU based on dubious heuristics
does not make any sense:

 1) Most timer wheel timers are canceled or rearmed before they expire.

 2) The heuristics to predict which CPU will be busy when the timer expires
    are wrong by definition.

So we waste precious cycles to place timers at enqueue time.

The proper solution to this problem is to always queue the timers on the
local CPU and allow the non-pinned timers to be pulled onto a busy CPU at
expiry time.

To achieve this the timer storage has been split into local pinned and
global timers. Local pinned timers are always expired on the CPU on which
they have been queued. Global timers can be expired on any CPU.

As long as a CPU is busy it expires both local and global timers. When a
CPU goes idle it arms its timer for the first expiring local timer. If the
first expiring pinned (local) timer is before the first expiring movable
timer, then no action is required because the CPU will wake up before the
first movable timer expires. If the first expiring movable timer is before
the first expiring pinned (local) timer, then this timer is queued into an
idle timerqueue and eventually expired by some other active CPU.

To avoid global locking the timerqueues are implemented as a hierarchy. The
lowest level of the hierarchy holds the CPUs. The CPUs are organized in
groups of 8, which are separated per node. If more than one CPU group
exists, then a second level in the hierarchy collects the groups. Depending
on the size of the system more than two levels may be required. Each group
has a "migrator" which checks the timerqueue during the tick for remotely
expirable timers.

If the last CPU in a group goes idle it reports the first expiring event of
the group up to the next group(s) in the hierarchy. If the last active CPU
in the system goes idle it arms its timer for the first system-wide expiring
timer to ensure that no timer event is missed.
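
For illustration, here is a standalone user space sketch (not kernel code,
all names are made up) of the sizing rule behind that hierarchy: with up to
8 children per group, the depth grows with log8() of the CPUs per node plus
log8() of the node count.

#include <stdio.h>

/* How many levels are needed to funnel 'nr' entities into a single group */
static unsigned int levels_for(unsigned int nr, unsigned int per_group)
{
	unsigned int lvls = 0;

	while (nr > 1) {
		nr = (nr + per_group - 1) / per_group;	/* DIV_ROUND_UP */
		lvls++;
	}
	return lvls;
}

int main(void)
{
	unsigned int ncpus = 256, nnodes = 4, per_group = 8;
	unsigned int cpus_per_node = (ncpus + nnodes - 1) / nnodes;

	/* 256 CPUs on 4 nodes: 2 levels per node plus 1 level across nodes */
	printf("hierarchy levels: %u\n",
	       levels_for(cpus_per_node, per_group) +
	       levels_for(nnodes, per_group));
	return 0;
}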

The series is also available from git:

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.timers

Thanks,

	tglx
---
 b/.../timer_migration.h         |  173 ++++++++++
 b/kernel/time/timer_migration.c |  659 ++++++++++++++++++++++++++++++++++++++++
 b/kernel/time/timer_migration.h |   89 +++++
 include/linux/cpuhotplug.h      |    1 
 kernel/time/Makefile            |    1 
 kernel/time/tick-internal.h     |    4 
 kernel/time/tick-sched.c        |  121 ++++++-
 kernel/time/tick-sched.h        |    3 
 kernel/time/timer.c             |  240 +++++++++-----
 lib/timerqueue.c                |    8 
 10 files changed, 1203 insertions(+), 96 deletions(-)


* [patch 01/10] timer: Invoke timer_start_debug() where it makes sense
  2017-04-17 18:32 [patch 00/10] timer: Move from a push remote at enqueue to a pull at expiry model Thomas Gleixner
@ 2017-04-17 18:32 ` Thomas Gleixner
  2017-04-17 18:32 ` [patch 02/10] timerqueue: Document return values of timerqueue_add/del() Thomas Gleixner
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: Thomas Gleixner @ 2017-04-17 18:32 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, John Stultz, Eric Dumazet, Anna-Maria Gleixner,
	Rafael J. Wysocki, linux-pm, Arjan van de Ven, Paul E. McKenney,
	Frederic Weisbecker, Rik van Riel

[-- Attachment #1: timer--call-debug-function-after-setting-properly-the-base-and-the-flags.patch --]
[-- Type: text/plain, Size: 834 bytes --]

The timer start debug function is called before the proper timer base
is set.

As a consequence the trace data contains the stale CPU and flags values.

Call the debug function after setting the new base and flags.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/timer.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -970,8 +970,6 @@ static inline int
 	if (!ret && pending_only)
 		goto out_unlock;
 
-	debug_activate(timer, expires);
-
 	new_base = get_target_base(base, timer->flags);
 
 	if (base != new_base) {
@@ -994,6 +992,8 @@ static inline int
 		}
 	}
 
+	debug_activate(timer, expires);
+
 	/* Try to forward a stale timer base clock */
 	forward_timer_base(base);
 


* [patch 02/10] timerqueue: Document return values of timerqueue_add/del()
  2017-04-17 18:32 [patch 00/10] timer: Move from a push remote at enqueue to a pull at expiry model Thomas Gleixner
  2017-04-17 18:32 ` [patch 01/10] timer: Invoke timer_start_debug() where it makes sense Thomas Gleixner
@ 2017-04-17 18:32 ` Thomas Gleixner
  2017-04-17 18:32 ` [patch 03/10] timers: Rework idle logic Thomas Gleixner
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: Thomas Gleixner @ 2017-04-17 18:32 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, John Stultz, Eric Dumazet, Anna-Maria Gleixner,
	Rafael J. Wysocki, linux-pm, Arjan van de Ven, Paul E. McKenney,
	Frederic Weisbecker, Rik van Riel

[-- Attachment #1: timerqueue--Document-return-values.patch --]
[-- Type: text/plain, Size: 1292 bytes --]

The return values of timerqueue_add/del() are not documented in the kernel doc
comment. Add proper documentation.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 lib/timerqueue.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Index: b/lib/timerqueue.c
===================================================================
--- a/lib/timerqueue.c
+++ b/lib/timerqueue.c
@@ -33,8 +33,9 @@
  * @head: head of timerqueue
  * @node: timer node to be added
  *
- * Adds the timer node to the timerqueue, sorted by the
- * node's expires value.
+ * Adds the timer node to the timerqueue, sorted by the node's expires
+ * value. Returns true if the newly added timer is the first expiring timer in
+ * the queue.
  */
 bool timerqueue_add(struct timerqueue_head *head, struct timerqueue_node *node)
 {
@@ -70,7 +71,8 @@ EXPORT_SYMBOL_GPL(timerqueue_add);
  * @head: head of timerqueue
  * @node: timer node to be removed
  *
- * Removes the timer node from the timerqueue.
+ * Removes the timer node from the timerqueue. Returns true if the queue is
+ * not empty after the remove.
  */
 bool timerqueue_del(struct timerqueue_head *head, struct timerqueue_node *node)
 {

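A hypothetical caller (not part of this series) shows why the return values
are worth documenting: the boolean says whether the head of the queue
changed, so the caller can skip reprogramming its hardware otherwise. The
names my_queue, reprogram_hw() and cancel_hw() are made up.

#include <linux/timerqueue.h>

extern void reprogram_hw(ktime_t expires);	/* assumed hardware hook */
extern void cancel_hw(void);			/* assumed hardware hook */

/* Assumed to be set up elsewhere with timerqueue_init_head() */
static struct timerqueue_head my_queue;

static void my_enqueue(struct timerqueue_node *node, ktime_t expires)
{
	node->expires = expires;
	/* True only if @node became the first expiring entry */
	if (timerqueue_add(&my_queue, node))
		reprogram_hw(expires);
}

static void my_dequeue(struct timerqueue_node *node)
{
	/* True if entries remain after the removal */
	if (timerqueue_del(&my_queue, node))
		reprogram_hw(timerqueue_getnext(&my_queue)->expires);
	else
		cancel_hw();
}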

* [patch 03/10] timers: Rework idle logic
  2017-04-17 18:32 [patch 00/10] timer: Move from a push remote at enqueue to a pull at expiry model Thomas Gleixner
  2017-04-17 18:32 ` [patch 01/10] timer: Invoke timer_start_debug() where it makes sense Thomas Gleixner
  2017-04-17 18:32 ` [patch 02/10] timerqueue: Document return values of timerqueue_add/del() Thomas Gleixner
@ 2017-04-17 18:32 ` Thomas Gleixner
  2017-04-17 18:32 ` [patch 04/10] timer: Keep the pinned timers separate from the others Thomas Gleixner
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: Thomas Gleixner @ 2017-04-17 18:32 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, John Stultz, Eric Dumazet, Anna-Maria Gleixner,
	Rafael J. Wysocki, linux-pm, Arjan van de Ven, Paul E. McKenney,
	Frederic Weisbecker, Rik van Riel

[-- Attachment #1: timers--Rework-idle-logic.patch --]
[-- Type: text/plain, Size: 3085 bytes --]

Storing the next event and determining whether the base is idle can be done
in __next_timer_interrupt().

Preparatory patch for new call sites which need this information as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/timer.c |   43 ++++++++++++++++++++++++-------------------
 1 file changed, 24 insertions(+), 19 deletions(-)

--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1358,8 +1358,11 @@ static int next_pending_bucket(struct ti
 /*
  * Search the first expiring timer in the various clock levels. Caller must
  * hold base->lock.
+ *
+ * Stores the next expiry time in base. The return value indicates whether
+ * the base is empty or not.
  */
-static unsigned long __next_timer_interrupt(struct timer_base *base)
+static bool __next_timer_interrupt(struct timer_base *base)
 {
 	unsigned long clk, next, adj;
 	unsigned lvl, offset = 0;
@@ -1416,7 +1419,10 @@ static unsigned long __next_timer_interr
 		clk >>= LVL_CLK_SHIFT;
 		clk += adj;
 	}
-	return next;
+	/* Store the next event in the base */
+	base->next_expiry = next;
+	/* Return whether the base is empty or not */
+	return next == base->clk + NEXT_TIMER_MAX_DELTA;
 }
 
 /*
@@ -1465,7 +1471,7 @@ u64 get_next_timer_interrupt(unsigned lo
 	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
 	u64 expires = KTIME_MAX;
 	unsigned long nextevt;
-	bool is_max_delta;
+	bool is_empty;
 
 	/*
 	 * Pretend that there is no timer pending if the cpu is offline.
@@ -1475,9 +1481,8 @@ u64 get_next_timer_interrupt(unsigned lo
 		return expires;
 
 	spin_lock(&base->lock);
-	nextevt = __next_timer_interrupt(base);
-	is_max_delta = (nextevt == base->clk + NEXT_TIMER_MAX_DELTA);
-	base->next_expiry = nextevt;
+	is_empty = __next_timer_interrupt(base);
+	nextevt = base->next_expiry;
 	/*
 	 * We have a fresh next event. Check whether we can forward the
 	 * base. We can only do that when @basej is past base->clk
@@ -1490,20 +1495,17 @@ u64 get_next_timer_interrupt(unsigned lo
 			base->clk = nextevt;
 	}
 
-	if (time_before_eq(nextevt, basej)) {
-		expires = basem;
-		base->is_idle = false;
-	} else {
-		if (!is_max_delta)
-			expires = basem + (nextevt - basej) * TICK_NSEC;
-		/*
-		 * If we expect to sleep more than a tick, mark the base idle:
-		 */
-		if ((expires - basem) > TICK_NSEC)
-			base->is_idle = true;
-	}
+	/* Base is idle if the next event is more than a tick away. */
+	base->is_idle = time_after(nextevt, basej + 1);
 	spin_unlock(&base->lock);
 
+	if (!is_empty) {
+		/* If we missed a tick already, force 0 delta */
+		if (time_before_eq(nextevt, basej))
+			nextevt = basej;
+		expires = basem + (nextevt - basej) * TICK_NSEC;
+	}
+
 	return cmp_next_hrtimer_event(basem, expires);
 }
 
@@ -1534,7 +1536,10 @@ static int collect_expired_timers(struct
 	 * the next expiring timer.
 	 */
 	if ((long)(jiffies - base->clk) > 2) {
-		unsigned long next = __next_timer_interrupt(base);
+		unsigned long next;
+
+		__next_timer_interrupt(base);
+		next = base->next_expiry;
 
 		/*
 		 * If the next timer is ahead of time forward to current

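As a plain arithmetic illustration of the conversion used above (expires =
basem + (nextevt - basej) * TICK_NSEC, with the missed-tick clamp), the
following standalone sketch assumes HZ=1000 and ignores the jiffies
wraparound that time_before_eq() handles in the real code.

#include <stdio.h>
#include <stdint.h>

#define TICK_NSEC 1000000ULL	/* one jiffy in nanoseconds at the assumed HZ=1000 */

static uint64_t next_event_ns(uint64_t basem, unsigned long basej,
			      unsigned long nextevt)
{
	/* If we missed a tick already, force 0 delta */
	if (nextevt <= basej)
		nextevt = basej;
	return basem + (uint64_t)(nextevt - basej) * TICK_NSEC;
}

int main(void)
{
	/* Next timer 3 jiffies ahead: fires 3ms after basem */
	printf("%llu\n", (unsigned long long)next_event_ns(1000000000ULL, 5000, 5003));
	/* Timer already overdue: fires right away, at basem */
	printf("%llu\n", (unsigned long long)next_event_ns(1000000000ULL, 5000, 4998));
	return 0;
}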

* [patch 04/10] timer: Keep the pinned timers separate from the others
  2017-04-17 18:32 [patch 00/10] timer: Move from a push remote at enqueue to a pull at expiry model Thomas Gleixner
                   ` (2 preceding siblings ...)
  2017-04-17 18:32 ` [patch 03/10] timers: Rework idle logic Thomas Gleixner
@ 2017-04-17 18:32 ` Thomas Gleixner
  2017-04-17 18:32 ` [patch 05/10] timer: Retrieve next expiry of pinned/non-pinned timers separately Thomas Gleixner
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: Thomas Gleixner @ 2017-04-17 18:32 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, John Stultz, Eric Dumazet, Anna-Maria Gleixner,
	Rafael J. Wysocki, linux-pm, Arjan van de Ven, Paul E. McKenney,
	Frederic Weisbecker, Rik van Riel, Richard Cochran

[-- Attachment #1: timer_Keep_the_pinned_timers_separate_from_the_others.patch --]
[-- Type: text/plain, Size: 6858 bytes --]

Separate the storage space for pinned timers.

This is preparatory work for changing the NOHZ timer placement from a push
at enqueue time to a pull at expiry time model.

No functional change.

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 kernel/time/timer.c |   98 +++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 70 insertions(+), 28 deletions(-)

--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -185,12 +185,14 @@ EXPORT_SYMBOL(jiffies_64);
 #define WHEEL_SIZE	(LVL_SIZE * LVL_DEPTH)
 
 #ifdef CONFIG_NO_HZ_COMMON
-# define NR_BASES	2
-# define BASE_STD	0
-# define BASE_DEF	1
+# define NR_BASES	3
+# define BASE_LOCAL	0
+# define BASE_GLOBAL	1
+# define BASE_DEF	2
 #else
 # define NR_BASES	1
-# define BASE_STD	0
+# define BASE_LOCAL	0
+# define BASE_GLOBAL	0
 # define BASE_DEF	0
 #endif
 
@@ -218,16 +220,18 @@ void timers_update_migration(bool update
 	unsigned int cpu;
 
 	/* Avoid the loop, if nothing to update */
-	if (this_cpu_read(timer_bases[BASE_STD].migration_enabled) == on)
+	if (this_cpu_read(timer_bases[BASE_GLOBAL].migration_enabled) == on)
 		return;
 
 	for_each_possible_cpu(cpu) {
-		per_cpu(timer_bases[BASE_STD].migration_enabled, cpu) = on;
+		per_cpu(timer_bases[BASE_LOCAL].migration_enabled, cpu) = on;
+		per_cpu(timer_bases[BASE_GLOBAL].migration_enabled, cpu) = on;
 		per_cpu(timer_bases[BASE_DEF].migration_enabled, cpu) = on;
 		per_cpu(hrtimer_bases.migration_enabled, cpu) = on;
 		if (!update_nohz)
 			continue;
-		per_cpu(timer_bases[BASE_STD].nohz_active, cpu) = true;
+		per_cpu(timer_bases[BASE_LOCAL].nohz_active, cpu) = true;
+		per_cpu(timer_bases[BASE_GLOBAL].nohz_active, cpu) = true;
 		per_cpu(timer_bases[BASE_DEF].nohz_active, cpu) = true;
 		per_cpu(hrtimer_bases.nohz_active, cpu) = true;
 	}
@@ -810,7 +814,10 @@ static int detach_if_pending(struct time
 
 static inline struct timer_base *get_timer_cpu_base(u32 tflags, u32 cpu)
 {
-	struct timer_base *base = per_cpu_ptr(&timer_bases[BASE_STD], cpu);
+	int index = tflags & TIMER_PINNED ? BASE_LOCAL : BASE_GLOBAL;
+	struct timer_base *base;
+
+	base = per_cpu_ptr(&timer_bases[index], cpu);
 
 	/*
 	 * If the timer is deferrable and nohz is active then we need to use
@@ -824,7 +831,10 @@ static inline struct timer_base *get_tim
 
 static inline struct timer_base *get_timer_this_cpu_base(u32 tflags)
 {
-	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
+	int index = tflags & TIMER_PINNED ? BASE_LOCAL : BASE_GLOBAL;
+	struct timer_base *base;
+
+	base = this_cpu_ptr(&timer_bases[index]);
 
 	/*
 	 * If the timer is deferrable and nohz is active then we need to use
@@ -1468,10 +1478,10 @@ static u64 cmp_next_hrtimer_event(u64 ba
  */
 u64 get_next_timer_interrupt(unsigned long basej, u64 basem)
 {
-	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
+	unsigned long nextevt, nextevt_local, nextevt_global;
+	bool local_empty, global_empty, local_first, is_idle;
+	struct timer_base *base_local, *base_global;
 	u64 expires = KTIME_MAX;
-	unsigned long nextevt;
-	bool is_empty;
 
 	/*
 	 * Pretend that there is no timer pending if the cpu is offline.
@@ -1480,26 +1490,49 @@ u64 get_next_timer_interrupt(unsigned lo
 	if (cpu_is_offline(smp_processor_id()))
 		return expires;
 
-	spin_lock(&base->lock);
-	is_empty = __next_timer_interrupt(base);
-	nextevt = base->next_expiry;
+	base_local = this_cpu_ptr(&timer_bases[BASE_LOCAL]);
+	base_global = this_cpu_ptr(&timer_bases[BASE_GLOBAL]);
+
+	spin_lock(&base_global->lock);
+	spin_lock_nested(&base_local->lock, SINGLE_DEPTH_NESTING);
+
+	local_empty = __next_timer_interrupt(base_local);
+	nextevt_local = base_local->next_expiry;
+
+	global_empty = __next_timer_interrupt(base_global);
+	nextevt_global = base_global->next_expiry;
+
 	/*
 	 * We have a fresh next event. Check whether we can forward the
 	 * base. We can only do that when @basej is past base->clk
 	 * otherwise we might rewind base->clk.
 	 */
-	if (time_after(basej, base->clk)) {
-		if (time_after(nextevt, basej))
-			base->clk = basej;
-		else if (time_after(nextevt, base->clk))
-			base->clk = nextevt;
+	if (time_after(basej, base_local->clk)) {
+		if (time_after(nextevt_local, basej))
+			base_local->clk = basej;
+		else if (time_after(nextevt_local, base_local->clk))
+			base_local->clk = nextevt_local;
+	}
+
+	if (time_after(basej, base_global->clk)) {
+		if (time_after(nextevt_global, basej))
+			base_global->clk = basej;
+		else if (time_after(nextevt_global, base_global->clk))
+			base_global->clk = nextevt_global;
 	}
 
 	/* Base is idle if the next event is more than a tick away. */
-	base->is_idle = time_after(nextevt, basej + 1);
-	spin_unlock(&base->lock);
+	local_first = time_before_eq(nextevt_local, nextevt_global);
+	nextevt = local_first ? nextevt_local : nextevt_global;
+	is_idle = time_after(nextevt, basej + 1);
+
+	/* We need to mark both bases in sync */
+	base_local->is_idle = base_global->is_idle = is_idle;
 
-	if (!is_empty) {
+	spin_unlock(&base_local->lock);
+	spin_unlock(&base_global->lock);
+
+	if (!local_empty || !global_empty) {
 		/* If we missed a tick already, force 0 delta */
 		if (time_before_eq(nextevt, basej))
 			nextevt = basej;
@@ -1516,7 +1549,7 @@ u64 get_next_timer_interrupt(unsigned lo
  */
 void timer_clear_idle(void)
 {
-	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
+	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_LOCAL]);
 
 	/*
 	 * We do this unlocked. The worst outcome is a remote enqueue sending
@@ -1525,6 +1558,9 @@ void timer_clear_idle(void)
 	 * the lock in the exit from idle path.
 	 */
 	base->is_idle = false;
+
+	base = this_cpu_ptr(&timer_bases[BASE_GLOBAL]);
+	base->is_idle = false;
 }
 
 static int collect_expired_timers(struct timer_base *base,
@@ -1614,11 +1650,17 @@ static inline void __run_timers(struct t
  */
 static __latent_entropy void run_timer_softirq(struct softirq_action *h)
 {
-	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
+	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_LOCAL]);
 
 	__run_timers(base);
-	if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active)
-		__run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));
+	if (IS_ENABLED(CONFIG_NO_HZ_COMMON)) {
+		base = this_cpu_ptr(&timer_bases[BASE_GLOBAL]);
+		__run_timers(base);
+
+		base = this_cpu_ptr(&timer_bases[BASE_DEF]);
+		if (base->nohz_active)
+			__run_timers(base);
+	}
 }
 
 /*
@@ -1626,7 +1668,7 @@ static __latent_entropy void run_timer_s
  */
 void run_local_timers(void)
 {
-	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
+	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_LOCAL]);
 
 	hrtimer_run_queues();
 	/* Raise the softirq only if required. */


* [patch 05/10] timer: Retrieve next expiry of pinned/non-pinned timers separately
  2017-04-17 18:32 [patch 00/10] timer: Move from a push remote at enqueue to a pull at expiry model Thomas Gleixner
                   ` (3 preceding siblings ...)
  2017-04-17 18:32 ` [patch 04/10] timer: Keep the pinned timers separate from the others Thomas Gleixner
@ 2017-04-17 18:32 ` Thomas Gleixner
  2017-04-17 18:32 ` [patch 06/10] timer: Restructure internal locking Thomas Gleixner
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: Thomas Gleixner @ 2017-04-17 18:32 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, John Stultz, Eric Dumazet, Anna-Maria Gleixner,
	Rafael J. Wysocki, linux-pm, Arjan van de Ven, Paul E. McKenney,
	Frederic Weisbecker, Rik van Riel, Richard Cochran

[-- Attachment #1: timer_Provide_both_pinned_and_movable_expiration_times.patch --]
[-- Type: text/plain, Size: 4686 bytes --]

To prepare for the conversion of the NOHZ timer placement to a pull at
expiry time model it's required to have separate expiry times for the
pinned and the non-pinned (movable) timers.

No functional change.

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 kernel/time/tick-internal.h |    3 ++-
 kernel/time/tick-sched.c    |   10 ++++++----
 kernel/time/timer.c         |   41 +++++++++++++++++++++++++++++++++++------
 3 files changed, 43 insertions(+), 11 deletions(-)

--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -163,5 +163,6 @@ static inline void timers_update_migrati
 
 DECLARE_PER_CPU(struct hrtimer_cpu_base, hrtimer_bases);
 
-extern u64 get_next_timer_interrupt(unsigned long basej, u64 basem);
+extern u64 get_next_timer_interrupt(unsigned long basej, u64 basem,
+				    u64 *global_evt);
 void timer_clear_idle(void);
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -666,7 +666,7 @@ static ktime_t tick_nohz_stop_sched_tick
 					 ktime_t now, int cpu)
 {
 	struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
-	u64 basemono, next_tick, next_tmr, next_rcu, delta, expires;
+	u64 basemono, next_tick, next_local, next_global, next_rcu, delta, expires;
 	unsigned long seq, basejiff;
 	ktime_t	tick;
 
@@ -689,10 +689,12 @@ static ktime_t tick_nohz_stop_sched_tick
 		 * disabled this also looks at the next expiring
 		 * hrtimer.
 		 */
-		next_tmr = get_next_timer_interrupt(basejiff, basemono);
-		ts->next_timer = next_tmr;
+		next_local = get_next_timer_interrupt(basejiff, basemono,
+						      &next_global);
+		next_local = min(next_local, next_global);
+		ts->next_timer = next_local;
 		/* Take the next rcu event into account */
-		next_tick = next_rcu < next_tmr ? next_rcu : next_tmr;
+		next_tick = next_rcu < next_local ? next_rcu : next_local;
 	}
 
 	/*
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1472,23 +1472,27 @@ static u64 cmp_next_hrtimer_event(u64 ba
  * get_next_timer_interrupt - return the time (clock mono) of the next timer
  * @basej:	base time jiffies
  * @basem:	base time clock monotonic
+ * @global_evt:	Pointer to store the expiry time of the next global timer
  *
  * Returns the tick aligned clock monotonic time of the next pending
  * timer or KTIME_MAX if no timer is pending.
  */
-u64 get_next_timer_interrupt(unsigned long basej, u64 basem)
+u64 get_next_timer_interrupt(unsigned long basej, u64 basem, u64 *global_evt)
 {
 	unsigned long nextevt, nextevt_local, nextevt_global;
 	bool local_empty, global_empty, local_first, is_idle;
 	struct timer_base *base_local, *base_global;
-	u64 expires = KTIME_MAX;
+	u64 local_evt = KTIME_MAX;
+
+	/* Preset global event */
+	*global_evt = KTIME_MAX;
 
 	/*
 	 * Pretend that there is no timer pending if the cpu is offline.
 	 * Possible pending timers will be migrated later to an active cpu.
 	 */
 	if (cpu_is_offline(smp_processor_id()))
-		return expires;
+		return local_evt;
 
 	base_local = this_cpu_ptr(&timer_bases[BASE_LOCAL]);
 	base_global = this_cpu_ptr(&timer_bases[BASE_GLOBAL]);
@@ -1532,14 +1536,39 @@ u64 get_next_timer_interrupt(unsigned lo
 	spin_unlock(&base_local->lock);
 	spin_unlock(&base_global->lock);
 
-	if (!local_empty || !global_empty) {
+	/*
+	 * If the bases are not marked idle, i.e one of the events is at
+	 * max. one tick away, use the next event for calculating next
+	 * local expiry value. The next global event is left as KTIME_MAX,
+	 * so this CPU will not queue itself in the global expiry
+	 * mechanism.
+	 */
+	if (!is_idle) {
 		/* If we missed a tick already, force 0 delta */
 		if (time_before_eq(nextevt, basej))
 			nextevt = basej;
-		expires = basem + (nextevt - basej) * TICK_NSEC;
+		local_evt = basem + (nextevt - basej) * TICK_NSEC;
+		return cmp_next_hrtimer_event(basem, local_evt);
 	}
 
-	return cmp_next_hrtimer_event(basem, expires);
+	/*
+	 * If the bases are marked idle, i.e. the next event on both the
+	 * local and the global queue are farther away than a tick,
+	 * evaluate both bases. No need to check whether one of the bases
+	 * has an already expired timer as this is caught by the !is_idle
+	 * condition above.
+	 */
+	if (!local_empty)
+		local_evt = basem + (nextevt_local - basej) * TICK_NSEC;
+
+	/*
+	 * If the local queue expires first, there is no requirement for
+	 * queuing the CPU in the global expiry mechanism.
+	 */
+	if (!local_first && !global_empty)
+		*global_evt = basem + (nextevt_global - basej) * TICK_NSEC;
+
+	return cmp_next_hrtimer_event(basem, local_evt);
 }
 
 /**

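The rule spelled out in the comments above boils down to a few lines; the
standalone sketch below (not kernel code, SKETCH_KTIME_MAX stands in for
KTIME_MAX) shows it: the global expiry handed back to the caller only
carries a value when the first global timer fires before the first local
one, otherwise the CPU will wake up in time anyway and has no reason to
queue itself for remote expiry.

#include <stdint.h>

#define SKETCH_KTIME_MAX UINT64_MAX

/* Returns the local expiry and stores the global expiry to hand up, if any */
static uint64_t split_expiries(uint64_t next_local, uint64_t next_global,
			       uint64_t *global_evt)
{
	/* Local timer first (or equal): no remote handling required */
	*global_evt = next_global >= next_local ? SKETCH_KTIME_MAX : next_global;
	return next_local;
}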

* [patch 06/10] timer: Restructure internal locking
  2017-04-17 18:32 [patch 00/10] timer: Move from a push remote at enqueue to a pull at expiry model Thomas Gleixner
                   ` (4 preceding siblings ...)
  2017-04-17 18:32 ` [patch 05/10] timer: Retrieve next expiry of pinned/non-pinned timers separately Thomas Gleixner
@ 2017-04-17 18:32 ` Thomas Gleixner
  2017-04-17 18:32 ` [patch 07/10] tick/sched: Split out jiffies update helper function Thomas Gleixner
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: Thomas Gleixner @ 2017-04-17 18:32 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, John Stultz, Eric Dumazet, Anna-Maria Gleixner,
	Rafael J. Wysocki, linux-pm, Arjan van de Ven, Paul E. McKenney,
	Frederic Weisbecker, Rik van Riel, Richard Cochran

[-- Attachment #1: timer_Restructure_internal_locking.patch --]
[-- Type: text/plain, Size: 2165 bytes --]

Move the locking out from __run_timers() to the call sites, so the
protected section can be extended at the call site. Preparatory patch for
changing the NOHZ timer placement to a pull at expiry time model.

No functional change.

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 kernel/time/timer.c |   33 +++++++++++++++++++--------------
 1 file changed, 19 insertions(+), 14 deletions(-)

--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1651,17 +1651,14 @@ void update_process_times(int user_tick)
 /**
  * __run_timers - run all expired timers (if any) on this CPU.
  * @base: the timer vector to be processed.
+ *
+ * Caller must hold the base lock.
  */
 static inline void __run_timers(struct timer_base *base)
 {
 	struct hlist_head heads[LVL_DEPTH];
 	int levels;
 
-	if (!time_after_eq(jiffies, base->clk))
-		return;
-
-	spin_lock_irq(&base->lock);
-
 	while (time_after_eq(jiffies, base->clk)) {
 
 		levels = collect_expired_timers(base, heads);
@@ -1671,7 +1668,20 @@ static inline void __run_timers(struct t
 			expire_timers(base, heads + levels);
 	}
 	base->running_timer = NULL;
-	spin_unlock_irq(&base->lock);
+}
+
+static void run_timer_base(int index, bool check_nohz)
+{
+	struct timer_base *base = this_cpu_ptr(&timer_bases[index]);
+
+	if (check_nohz && !base->nohz_active)
+		return;
+
+	if (time_after_eq(jiffies, base->clk)) {
+		spin_lock_irq(&base->lock);
+		__run_timers(base);
+		spin_unlock_irq(&base->lock);
+	}
 }
 
 /*
@@ -1679,16 +1689,11 @@ static inline void __run_timers(struct t
  */
 static __latent_entropy void run_timer_softirq(struct softirq_action *h)
 {
-	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_LOCAL]);
+	run_timer_base(BASE_LOCAL, false);
 
-	__run_timers(base);
 	if (IS_ENABLED(CONFIG_NO_HZ_COMMON)) {
-		base = this_cpu_ptr(&timer_bases[BASE_GLOBAL]);
-		__run_timers(base);
-
-		base = this_cpu_ptr(&timer_bases[BASE_DEF]);
-		if (base->nohz_active)
-			__run_timers(base);
+		run_timer_base(BASE_GLOBAL, false);
+		run_timer_base(BASE_DEF, true);
 	}
 }
 


* [patch 07/10] tick/sched: Split out jiffies update helper function
  2017-04-17 18:32 [patch 00/10] timer: Move from a push remote at enqueue to a pull at expiry model Thomas Gleixner
                   ` (5 preceding siblings ...)
  2017-04-17 18:32 ` [patch 06/10] timer: Restructure internal locking Thomas Gleixner
@ 2017-04-17 18:32 ` Thomas Gleixner
  2017-04-17 18:32 ` [patch 08/10] timer: Implement the hierarchical pull model Thomas Gleixner
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: Thomas Gleixner @ 2017-04-17 18:32 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, John Stultz, Eric Dumazet, Anna-Maria Gleixner,
	Rafael J. Wysocki, linux-pm, Arjan van de Ven, Paul E. McKenney,
	Frederic Weisbecker, Rik van Riel, Richard Cochran

[-- Attachment #1: tick-sched_Split-out-jiffies-update-helper.patch --]
[-- Type: text/plain, Size: 2050 bytes --]

The logic to get the time of the last jiffies update will be needed by
the timer pull model as well.

Move the code into a global function in anticipation of the new caller.

No functional change.

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 kernel/time/tick-internal.h |    1 +
 kernel/time/tick-sched.c    |   27 ++++++++++++++++++++-------
 2 files changed, 21 insertions(+), 7 deletions(-)

--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -151,6 +151,7 @@ static inline void tick_nohz_init(void)
 
 #ifdef CONFIG_NO_HZ_COMMON
 extern unsigned long tick_nohz_active;
+extern u64 get_jiffies_update(unsigned long *basej);
 #else
 #define tick_nohz_active (0)
 #endif
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -112,6 +112,24 @@ static ktime_t tick_init_jiffy_update(vo
 	return period;
 }
 
+#ifdef CONFIG_NO_HZ_COMMON
+/*
+ * Read jiffies and the time when jiffies were updated last
+ */
+u64 get_jiffies_update(unsigned long *basej)
+{
+	unsigned long seq, basejiff;
+	u64 basemono;
+
+	do {
+		seq = read_seqbegin(&jiffies_lock);
+		basemono = last_jiffies_update;
+		basejiff = jiffies;
+	} while (read_seqretry(&jiffies_lock, seq));
+	*basej = basejiff;
+	return basemono;
+}
+#endif
 
 static void tick_sched_do_timer(ktime_t now)
 {
@@ -667,15 +685,10 @@ static ktime_t tick_nohz_stop_sched_tick
 {
 	struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
 	u64 basemono, next_tick, next_local, next_global, next_rcu, delta, expires;
-	unsigned long seq, basejiff;
+	unsigned long basejiff;
 	ktime_t	tick;
 
-	/* Read jiffies and the time when jiffies were updated last */
-	do {
-		seq = read_seqbegin(&jiffies_lock);
-		basemono = last_jiffies_update;
-		basejiff = jiffies;
-	} while (read_seqretry(&jiffies_lock, seq));
+	basemono = get_jiffies_update(&basejiff);
 	ts->last_jiffies = basejiff;
 
 	if (rcu_needs_cpu(basemono, &next_rcu) ||


* [patch 08/10] timer: Implement the hierarchical pull model
  2017-04-17 18:32 [patch 00/10] timer: Move from a push remote at enqueue to a pull at expiry model Thomas Gleixner
                   ` (6 preceding siblings ...)
  2017-04-17 18:32 ` [patch 07/10] tick/sched: Split out jiffies update helper function Thomas Gleixner
@ 2017-04-17 18:32 ` Thomas Gleixner
  2017-04-17 18:32 ` [patch 09/10] timer/migration: Add tracepoints Thomas Gleixner
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: Thomas Gleixner @ 2017-04-17 18:32 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, John Stultz, Eric Dumazet, Anna-Maria Gleixner,
	Rafael J. Wysocki, linux-pm, Arjan van de Ven, Paul E. McKenney,
	Frederic Weisbecker, Rik van Riel

[-- Attachment #1: timer--Implement-the-hierarchical-pull-model.patch --]
[-- Type: text/plain, Size: 32219 bytes --]

Placing timers at enqueue time on a target CPU based on dubious heuristics
does not make any sense:

 1) Most timer wheel timers are canceled or rearmed before they expire.

 2) The heuristics to predict which CPU will be busy when the timer expires
    are wrong by definition.

So we waste precious cycles to place timers at enqueue time.

The proper solution to this problem is to always queue the timers on the
local CPU and allow the non-pinned timers to be pulled onto a busy CPU at
expiry time.

To achieve this the timer storage has been split into local pinned and
global timers. Local pinned timers are always expired on the CPU on which
they have been queued. Global timers can be expired on any CPU.

As long as a CPU is busy it expires both local and global timers. When a
CPU goes idle it arms its timer for the first expiring local timer. If the
first expiring pinned (local) timer is before the first expiring movable
timer, then no action is required because the CPU will wake up before the
first movable timer expires. If the first expiring movable timer is before
the first expiring pinned (local) timer, then this timer is queued into an
idle timerqueue and eventually expired by some other active CPU.

To avoid global locking the timerqueues are implemented as a hierarchy. The
lowest level of the hierarchy holds the CPUs. The CPUs are organized in
groups of 8, which are separated per node. If more than one CPU group
exists, then a second level in the hierarchy collects the groups. Depending
on the size of the system more than two levels may be required. Each group
has a "migrator" which checks the timerqueue during the tick for remotely
expirable timers.

If the last CPU in a group goes idle it reports the first expiring event of
the group up to the next group(s) in the hierarchy. If the last active CPU
in the system goes idle it arms its timer for the first system-wide expiring
timer to ensure that no timer event is missed.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/cpuhotplug.h    |    1 
 kernel/time/Makefile          |    1 
 kernel/time/tick-sched.c      |   92 +++++-
 kernel/time/tick-sched.h      |    3 
 kernel/time/timer.c           |   31 +-
 kernel/time/timer_migration.c |  642 ++++++++++++++++++++++++++++++++++++++++++
 kernel/time/timer_migration.h |   89 +++++
 7 files changed, 847 insertions(+), 12 deletions(-)

--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -137,6 +137,7 @@ enum cpuhp_state {
 	CPUHP_AP_PERF_ARM_CCN_ONLINE,
 	CPUHP_AP_PERF_ARM_L2X0_ONLINE,
 	CPUHP_AP_PERF_ARM_QCOM_L2_ONLINE,
+	CPUHP_AP_TMIGR_ONLINE,
 	CPUHP_AP_WORKQUEUE_ONLINE,
 	CPUHP_AP_RCUTREE_ONLINE,
 	CPUHP_AP_ONLINE_DYN,
--- a/kernel/time/Makefile
+++ b/kernel/time/Makefile
@@ -15,5 +15,6 @@ ifeq ($(CONFIG_GENERIC_CLOCKEVENTS_BROAD
 endif
 obj-$(CONFIG_GENERIC_SCHED_CLOCK)		+= sched_clock.o
 obj-$(CONFIG_TICK_ONESHOT)			+= tick-oneshot.o tick-sched.o
+obj-$(CONFIG_NO_HZ_COMMON)			+= timer_migration.o
 obj-$(CONFIG_DEBUG_FS)				+= timekeeping_debug.o
 obj-$(CONFIG_TEST_UDELAY)			+= test_udelay.o
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -30,6 +30,7 @@
 
 #include <asm/irq_regs.h>
 
+#include "timer_migration.h"
 #include "tick-internal.h"
 
 #include <trace/events/timer.h>
@@ -680,11 +681,46 @@ static void tick_nohz_restart(struct tic
 		tick_program_event(hrtimer_get_expires(&ts->sched_timer), 1);
 }
 
+#ifdef CONFIG_SMP
+static u64
+tick_tmigr_idle(struct tick_sched *ts, u64 next_global, u64 next_local)
+{
+	ts->tmigr_idle = 1;
+
+	/*
+	 * If next_global is after next_local, the event does not have to
+	 * be queued in the timer migration hierarchy, but the CPU needs
+	 * to be marked as idle.
+	 */
+	if (next_global >= next_local)
+		next_global = KTIME_MAX;
+
+	next_global = tmigr_cpu_idle(next_global);
+
+	return min_t(u64, next_local, next_global);
+}
+
+static void tick_tmigr_stop_idle(struct tick_sched *ts)
+{
+	if (ts->tmigr_idle) {
+		ts->tmigr_idle = 0;
+		tmigr_cpu_activate();
+	}
+}
+#else
+static u64
+tick_tmigr_idle(struct tick_sched *ts, u64 next_global, u64 next_local)
+{
+	return min_t(u64, next_global, next_local);
+}
+static inline void tick_tmigr_stop_idle(struct tick_sched *ts) { }
+#endif /*CONFIG_SMP*/
+
 static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 					 ktime_t now, int cpu)
 {
 	struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
-	u64 basemono, next_tick, next_local, next_global, next_rcu, delta, expires;
+	u64 basemono, next_local, next_global, next_rcu, delta, expires;
 	unsigned long basejiff;
 	ktime_t	tick;
 
@@ -693,7 +729,12 @@ static ktime_t tick_nohz_stop_sched_tick
 
 	if (rcu_needs_cpu(basemono, &next_rcu) ||
 	    arch_needs_cpu() || irq_work_needs_cpu()) {
-		next_tick = basemono + TICK_NSEC;
+		/*
+		 * If anyone needs the CPU, treat this as a local
+		 * timer expiring in a jiffy.
+		 */
+		next_global = KTIME_MAX;
+		next_local = basemono + TICK_NSEC;
 	} else {
 		/*
 		 * Get the next pending timer. If high resolution
@@ -704,21 +745,48 @@ static ktime_t tick_nohz_stop_sched_tick
 		 */
 		next_local = get_next_timer_interrupt(basejiff, basemono,
 						      &next_global);
-		next_local = min(next_local, next_global);
-		ts->next_timer = next_local;
-		/* Take the next rcu event into account */
-		next_tick = next_rcu < next_local ? next_rcu : next_local;
+		/*
+		 * Take RCU into account.  Even though rcu_needs_cpu()
+		 * returned false, RCU might need a wake up event
+		 * after all, as reflected in the value of next_rcu.
+		 */
+		next_local = min_t(u64, next_rcu, next_local);
 	}
 
 	/*
+	 * The CPU might be the last one going inactive in the timer
+	 * migration hierarchy. Mark the CPU inactive and get the next
+	 * timer event returned, if the next local event is more than a
+	 * tick away.
+	 *
+	 * If the CPU is the last one to go inactive, then the earliest
+	 * event (next local or next global event in the hierarchy) is
+	 * returned.
+	 *
+	 * Otherwise the next local event is returned and the next global
+	 * event of this CPU is queued in the hierarchy.
+	 */
+	delta = next_local - basemono;
+	if (delta > (u64)TICK_NSEC)
+		next_local = tick_tmigr_idle(ts, next_local, next_global);
+
+	ts->next_timer = next_local;
+
+	/*
 	 * If the tick is due in the next period, keep it ticking or
 	 * force prod the timer.
 	 */
-	delta = next_tick - basemono;
+	delta = next_local - basemono;
 	if (delta <= (u64)TICK_NSEC) {
 		tick = 0;
 
 		/*
+		 * If the CPU deactivated itself in the timer migration
+		 * hierarchy, activate it again.
+		 */
+		tick_tmigr_stop_idle(ts);
+
+		/*
 		 * Tell the timer code that the base is not idle, i.e. undo
 		 * the effect of get_next_timer_interrupt():
 		 */
@@ -782,7 +850,7 @@ static ktime_t tick_nohz_stop_sched_tick
 	else
 		expires = KTIME_MAX;
 
-	expires = min_t(u64, expires, next_tick);
+	expires = min_t(u64, expires, next_local);
 	tick = expires;
 
 	/* Skip reprogram of event if its not changed */
@@ -1047,6 +1115,8 @@ void tick_nohz_idle_exit(void)
 
 	ts->inidle = 0;
 
+	tick_tmigr_stop_idle(ts);
+
 	if (ts->idle_active || ts->tick_stopped)
 		now = ktime_get();
 
@@ -1126,8 +1196,12 @@ static inline void tick_nohz_irq_enter(v
 	struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
 	ktime_t now;
 
-	if (!ts->idle_active && !ts->tick_stopped)
+	if (!ts->idle_active && !ts->tick_stopped && !ts->tmigr_idle)
 		return;
+
+	/* FIXME: Is this really needed ???? */
+	tick_tmigr_stop_idle(ts);
+
 	now = ktime_get();
 	if (ts->idle_active)
 		tick_nohz_stop_idle(ts, now);
--- a/kernel/time/tick-sched.h
+++ b/kernel/time/tick-sched.h
@@ -27,7 +27,9 @@ enum tick_nohz_mode {
  *			timer is modified for nohz sleeps. This is necessary
  *			to resume the tick timer operation in the timeline
  *			when the CPU returns from nohz sleep.
+ * @inidle:		Indicator that the CPU is in the tick idle mode
  * @tick_stopped:	Indicator that the idle tick has been stopped
+ * @tmigr_idle:		Indicator that the CPU is idle vs. timer migration
  * @idle_jiffies:	jiffies at the entry to idle for idle time accounting
  * @idle_calls:		Total number of idle calls
  * @idle_sleeps:	Number of idle calls, where the sched tick was stopped
@@ -46,6 +48,7 @@ struct tick_sched {
 	ktime_t				last_tick;
 	int				inidle;
 	int				tick_stopped;
+	int				tmigr_idle;
 	unsigned long			idle_jiffies;
 	unsigned long			idle_calls;
 	unsigned long			idle_sleeps;
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -51,6 +51,7 @@
 #include <asm/timex.h>
 #include <asm/io.h>
 
+#include "timer_migration.h"
 #include "tick-internal.h"
 
 #define CREATE_TRACE_POINTS
@@ -185,6 +186,10 @@ EXPORT_SYMBOL(jiffies_64);
 #define WHEEL_SIZE	(LVL_SIZE * LVL_DEPTH)
 
 #ifdef CONFIG_NO_HZ_COMMON
+/*
+ * If multiple bases need to be locked, use the base ordering for lock
+ * nesting, i.e. lowest number first.
+ */
 # define NR_BASES	3
 # define BASE_LOCAL	0
 # define BASE_GLOBAL	1
@@ -216,7 +221,7 @@ unsigned int sysctl_timer_migration = 1;
 
 void timers_update_migration(bool update_nohz)
 {
-	bool on = sysctl_timer_migration && tick_nohz_active;
+	bool on = sysctl_timer_migration && tick_nohz_active && tmigr_enabled;
 	unsigned int cpu;
 
 	/* Avoid the loop, if nothing to update */
@@ -1619,13 +1624,27 @@ static int collect_expired_timers(struct
 	}
 	return __collect_expired_timers(base, heads);
 }
-#else
+
+static inline void __run_timers(struct timer_base *base);
+
+#ifdef CONFIG_SMP
+void timer_expire_remote(unsigned int cpu)
+{
+	struct timer_base *base = per_cpu_ptr(&timer_bases[BASE_GLOBAL], cpu);
+
+	spin_lock_irq(&base->lock);
+	__run_timers(base);
+	spin_unlock_irq(&base->lock);
+}
+#endif
+
+#else /* CONFIG_NO_HZ_COMMON */
 static inline int collect_expired_timers(struct timer_base *base,
 					 struct hlist_head *heads)
 {
 	return __collect_expired_timers(base, heads);
 }
-#endif
+#endif /* !CONFIG_NO_HZ_COMMON */
 
 /*
  * Called from the timer interrupt handler to charge one tick to the current
@@ -1689,11 +1708,16 @@ static void run_timer_base(int index, bo
  */
 static __latent_entropy void run_timer_softirq(struct softirq_action *h)
 {
+	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_LOCAL]);
+
 	run_timer_base(BASE_LOCAL, false);
 
 	if (IS_ENABLED(CONFIG_NO_HZ_COMMON)) {
 		run_timer_base(BASE_GLOBAL, false);
 		run_timer_base(BASE_DEF, true);
+
+		if (base->nohz_active)
+			tmigr_handle_remote();
 	}
 }
 
@@ -1909,6 +1933,7 @@ void __init init_timers(void)
 {
 	init_timer_cpus();
 	open_softirq(TIMER_SOFTIRQ, run_timer_softirq);
+	tmigr_init();
 }
 
 /**
--- /dev/null
+++ b/kernel/time/timer_migration.c
@@ -0,0 +1,642 @@
+/*
+ * Infrastructure for migratable timers
+ *
+ * Copyright(C) 2016 linutronix GmbH
+ *
+ * This code is licenced under the GPL version 2. For details see
+ * kernel-base/COPYING.
+ */
+#include <linux/cpuhotplug.h>
+#include <linux/slab.h>
+#include <linux/smp.h>
+#include <linux/spinlock.h>
+#include <linux/timerqueue.h>
+#include <linux/timer.h>
+
+#include "timer_migration.h"
+#include "tick-internal.h"
+
+#ifdef DEBUG
+# define DBG_BUG_ON(x)	BUG_ON(x)
+#else
+# define DBG_BUG_ON(x)
+#endif
+
+/* Per group capacity. Must be a power of 2! */
+static const unsigned int tmigr_childs_per_group = 8;
+
+bool tmigr_enabled __read_mostly;
+static unsigned int tmigr_hierarchy_levels __read_mostly;
+static unsigned int tmigr_crossnode_level __read_mostly;
+static struct list_head *tmigr_level_list __read_mostly;
+
+static DEFINE_MUTEX(tmigr_mutex);
+
+static DEFINE_PER_CPU(struct tmigr_cpu, tmigr_cpu);
+
+static void tmigr_add_evt(struct tmigr_group *group, struct tmigr_event *evt)
+{
+	/*
+	 * Can be called with @evt == NULL, an already queued @evt or
+	 * an event that does not need to be queued (expires ==
+	 * KTIME_MAX)
+	 */
+	if (!evt || !RB_EMPTY_NODE(&evt->nextevt.node) ||
+	    evt->nextevt.expires == KTIME_MAX)
+		return;
+
+	/* @group->groupevt must not be queued in the parent group */
+	DBG_BUG_ON(!RB_EMPTY_NODE(&group->groupevt.nextevt.node));
+
+	/* If this is the new first expiring event, update the group event */
+	if (timerqueue_add(&group->events, &evt->nextevt)) {
+		group->groupevt.nextevt.expires = evt->nextevt.expires;
+		group->groupevt.cpu = evt->cpu;
+	}
+}
+
+static void tmigr_remove_evt(struct tmigr_group *group, struct tmigr_event *evt)
+{
+	struct timerqueue_node *next;
+	struct tmigr_event *nextevt;
+	bool first;
+
+	/*
+	 * It's safe to modify the group event of this group, because it is
+	 * not queued in the parent group.
+	 */
+	DBG_BUG_ON(!RB_EMPTY_NODE(&group->groupevt.nextevt.node));
+
+	/* Remove the child event, if pending */
+	if (!evt || RB_EMPTY_NODE(&evt->nextevt.node))
+		return;
+	/*
+	 * If this was the last queued event in the group, clear
+	 * the group event. If this was the first event to expire,
+	 * update the group.
+	 */
+	first = (timerqueue_getnext(&group->events) == &evt->nextevt);
+
+	if (!timerqueue_del(&group->events, &evt->nextevt)) {
+		group->groupevt.nextevt.expires = KTIME_MAX;
+		group->groupevt.cpu = TMIGR_NONE;
+	} else if (first) {
+		next = timerqueue_getnext(&group->events);
+		nextevt = container_of(next, struct tmigr_event, nextevt);
+		group->groupevt.nextevt.expires = nextevt->nextevt.expires;
+		group->groupevt.cpu = nextevt->cpu;
+	}
+}
+
+static void tmigr_update_remote(unsigned int cpu, u64 now, unsigned long jif)
+{
+	struct tmigr_cpu *tmc = per_cpu_ptr(&tmigr_cpu, cpu);
+	struct tmigr_group *group = tmc->tmgroup;
+	u64 next_local, next_global;
+
+	/*
+	 * Here the migrator CPU races with the target CPU.  The migrator
+	 * removed @tmc->nextevt from the group's queue, but it then dropped
+	 * the group lock.  Concurrently the target CPU might have serviced
+	 * an interrupt and therefore have called tmigr_cpu_activate() and
+	 * possibly tmigr_cpu_idle() which requeued the CPU's @tmc into @group.
+	 *
+	 * Must hold @tmc->lock for changing @tmc->nextevt and @group->lock
+	 * to protect the timer queue of @group.
+	 */
+	raw_spin_lock_irq(&tmc->lock);
+	raw_spin_lock(&group->lock);
+
+	/*
+	 * If the cpu went offline or marked itself active again, nothing
+	 * more to do.
+	 */
+	if (!tmc->online || cpumask_test_cpu(cpu, group->cpus))
+		goto done;
+
+	/*
+	 * Although __tmigr_handle_remote() just dequeued the event, still
+	 * the target CPU might have added it again after the lock got
+	 * dropped. If it's queued the group queue is up to date.
+	 */
+	if (!RB_EMPTY_NODE(&tmc->cpuevt.nextevt.node))
+		goto done;
+
+	/*
+	 * Recalculate next event. Needs to be calculated while holding the
+	 * lock because the first expiring global timer could have been
+	 * removed since the last evaluation.
+	 */
+	next_local = get_next_timer_interrupt(jif, now, &next_global);
+
+	/*
+	 * If next_global is after next_local, event does not have to
+	 * If next_global is after next_local, the event does not have to
+	 */
+	if (next_global >= next_local)
+		next_global = KTIME_MAX;
+
+	tmc->cpuevt.nextevt.expires = next_global;
+
+	/* Queue the @cpu event (it is not queued if expires == KTIME_MAX) */
+	tmigr_add_evt(group, &tmc->cpuevt);
+
+done:
+	raw_spin_unlock(&group->lock);
+	raw_spin_unlock_irq(&tmc->lock);
+}
+
+static void __tmigr_handle_remote(struct tmigr_group *group, unsigned int cpu,
+				  u64 now, unsigned long jif, bool walkup)
+{
+	struct timerqueue_node *tmr;
+	struct tmigr_group *parent;
+	struct tmigr_event *evt;
+
+again:
+	raw_spin_lock_irq(&group->lock);
+	/*
+	 * Handle the group only if @cpu is the migrator or if the group
+	 * has no migrator. Otherwise the group is active and is handled by
+	 * its own migrator.
+	 */
+	if (group->migrator != cpu && group->migrator != TMIGR_NONE) {
+		raw_spin_unlock_irq(&group->lock);
+		return;
+	}
+
+	tmr = timerqueue_getnext(&group->events);
+	if (tmr && now >= tmr->expires) {
+		/*
+		 * Remove the expired entry from the queue and handle
+		 * it. If this is a leaf group, call the timer poll
+		 * function for the given cpu. Otherwise handle the group
+		 * itself.  Drop the group lock here in both cases to avoid
+		 * lock ordering inversions.
+		 */
+		evt = container_of(tmr, struct tmigr_event, nextevt);
+		tmigr_remove_evt(group, evt);
+
+		raw_spin_unlock_irq(&group->lock);
+
+		/*
+		 * If the event is a group event walk down the hierarchy of
+		 * that group to the CPU leaves. If not, handle the expired
+		 * timer from the remote CPU.
+		 */
+		if (evt->group) {
+			__tmigr_handle_remote(evt->group, cpu, now, jif, false);
+		} else {
+			timer_expire_remote(evt->cpu);
+			tmigr_update_remote(evt->cpu, now, jif);
+		}
+		goto again;
+	}
+
+	/*
+	 * If @group is not active, queue the next event in the parent
+	 * group. This is required, because the next event of @group
+	 * could have been changed by tmigr_update_remote() above.
+	 */
+	parent = group->parent;
+	if (parent && !group->active) {
+		raw_spin_lock_nested(&parent->lock, parent->level);
+		tmigr_add_evt(parent, &group->groupevt);
+		raw_spin_unlock(&parent->lock);
+	}
+	raw_spin_unlock_irq(&group->lock);
+
+	/* Walk the hierarchy up? */
+	if (!walkup || !parent)
+		return;
+
+	/* Racy lockless check: See comment in tmigr_handle_remote() */
+	if (parent->migrator == cpu)
+		__tmigr_handle_remote(parent, cpu, now, jif, true);
+}
+
+/**
+ * tmigr_handle_remote - Handle migratable timers on remote idle CPUs
+ *
+ * Called from the timer soft interrupt with interrupts enabled.
+ */
+void tmigr_handle_remote(void)
+{
+	struct tmigr_cpu *tmc = this_cpu_ptr(&tmigr_cpu);
+	int cpu = smp_processor_id();
+	unsigned long basej;
+	ktime_t now;
+
+	if (!tmigr_enabled)
+		return;
+
+	/*
+	 * Check whether this CPU is responsible for handling the global
+	 * timers of other CPUs. Do a racy lockless check to avoid lock
+	 * contention for the busy case where timer soft interrupts happen
+	 * in parallel. It's not an issue, if the CPU misses a concurrent
+	 * update of the migrator role for its base group. It's not more
+	 * racy than doing this check under the lock, if the update happens
+	 * right after the lock is dropped. There is no damage in such a
+	 * case other than potentially expiring a global timer one tick
+	 * late.
+	 */
+	if (tmc->tmgroup->migrator != cpu)
+		return;
+
+	now = get_jiffies_update(&basej);
+	__tmigr_handle_remote(tmc->tmgroup, cpu, now, basej, true);
+}
+
+/**
+ * tmigr_set_cpu_inactive - Set a CPU inactive in the group
+ * @group:	The group from which @cpu is removed
+ * @child:	The child group which was updated before
+ * @evt:	The event to queue in @group
+ * @cpu:	The CPU which becomes inactive
+ *
+ * Remove @cpu from @group and propagate it through the hierarchy if
+ * @cpu was the migrator of @group.
+ *
+ * Returns KTIME_MAX if @cpu is not the last outgoing CPU in the
+ * hierarchy. Otherwise it returns the first expiring global event.
+ */
+static u64 tmigr_set_cpu_inactive(struct tmigr_group *group,
+				  struct tmigr_group *child,
+				  struct tmigr_event *evt,
+				  unsigned int cpu)
+{
+	struct tmigr_group *parent;
+	u64 nextevt = KTIME_MAX;
+
+	raw_spin_lock_nested(&group->lock, group->level);
+
+	DBG_BUG_ON(!group->active);
+
+	cpumask_clear_cpu(cpu, group->cpus);
+	group->active--;
+
+	/*
+	 * If @child is not NULL, then this is a recursive invocation to
+	 * propagate the deactivation of @cpu. If @child has a new migrator
+	 * set it active in @group.
+	 */
+	if (child && child->migrator != TMIGR_NONE) {
+		cpumask_set_cpu(child->migrator, group->cpus);
+		group->active++;
+	}
+
+	/* Add @evt to @group */
+	tmigr_add_evt(group, evt);
+
+	/* If @cpu is not the active migrator, everything is up to date */
+	if (group->migrator != cpu)
+		goto done;
+
+	/* Update the migrator. */
+	if (!group->active)
+		group->migrator = TMIGR_NONE;
+	else
+		group->migrator = cpumask_first(group->cpus);
+
+	parent = group->parent;
+	if (parent) {
+		/*
+		 * @cpu was the migrator in @group, so it is marked as
+		 * active in its parent group(s) as well. Propagate the
+		 * migrator change.
+		 */
+		evt = group->active ? NULL : &group->groupevt;
+		nextevt = tmigr_set_cpu_inactive(parent, group, evt, cpu);
+	} else {
+		/*
+		 * This is the top level of the hierarchy. If @cpu is about
+		 * to go offline wake up some random other cpu so it will
+		 * take over the migrator duty and program its timer
+		 * proper. Ideally wake the cpu with the closest expiry
+		 * properly. Ideally wake the cpu with the closest expiry
+		 */
+		if (!per_cpu(tmigr_cpu, cpu).online) {
+			cpu = cpumask_any_but(cpu_online_mask, cpu);
+			smp_send_reschedule(cpu);
+		}
+		/*
+		 * Return the earliest event of the top level group to make
+		 * sure that it's handled.
+		 *
+		 * This could be optimized by keeping track of the last
+		 * global scheduled event and only arming it on @cpu if the
+		 * new event is earlier. Not sure if it's worth the
+		 * complexity.
+		 */
+		nextevt = group->groupevt.nextevt.expires;
+	}
+done:
+	raw_spin_unlock(&group->lock);
+	return nextevt;
+}
+
+/**
+ * tmigr_cpu_idle - Put current CPU into idle state
+ * @nextevt:	The next timer event set in the current CPU
+ *
+ * Returns either the next event of the current CPU or the next event from
+ * the hierarchy if this CPU is the top level migrator.
+ *
+ * Must be called with interrupts disabled.
+ */
+u64 tmigr_cpu_idle(u64 nextevt)
+{
+	struct tmigr_cpu *tmc = this_cpu_ptr(&tmigr_cpu);
+	struct tmigr_group *group = tmc->tmgroup;
+	int cpu = smp_processor_id();
+
+	if (!tmc->online)
+		return nextevt;
+
+	raw_spin_lock(&tmc->lock);
+	tmc->cpuevt.nextevt.expires = nextevt;
+	nextevt = tmigr_set_cpu_inactive(group, NULL, &tmc->cpuevt, cpu);
+	raw_spin_unlock(&tmc->lock);
+	return nextevt;
+}
+
+/*
+ * tmigr_set_cpu_active - Propagate the activation of a CPU
+ * @group:	The group in which the CPU is activated
+ * @evt:	The event which is removed from @group
+ * @cpu:	The CPU which is activated
+ */
+static void tmigr_set_cpu_active(struct tmigr_group *group,
+				 struct tmigr_event *evt,
+				 unsigned int cpu)
+{
+	raw_spin_lock_nested(&group->lock, group->level);
+
+	if (WARN_ON(group->active == group->num_childs)) {
+		raw_spin_unlock(&group->lock);
+		return;
+	}
+
+	cpumask_set_cpu(cpu, group->cpus);
+	group->active++;
+
+	/* The first active cpu in a group takes the migrator role */
+	if (group->active == 1) {
+		struct tmigr_group *parent = group->parent;
+
+		group->migrator = cpu;
+		/* Propagate through the hierarchy */
+		if (parent)
+			tmigr_set_cpu_active(parent, &group->groupevt, cpu);
+	}
+	/*
+	 * Update groupevt and dequeue @evt. Must be called after parent
+	 * groups have been updated above so @group->groupevt is inactive.
+	 */
+	tmigr_remove_evt(group, evt);
+	raw_spin_unlock(&group->lock);
+}
+
+/**
+ * tmigr_cpu_activate - Activate current CPU
+ *
+ * Called from the NOHZ and cpu online code.
+ */
+void tmigr_cpu_activate(void)
+{
+	struct tmigr_cpu *tmc = this_cpu_ptr(&tmigr_cpu);
+	struct tmigr_group *group = tmc->tmgroup;
+	int cpu = smp_processor_id();
+	unsigned long flags;
+
+	if (!tmc->online || !group)
+		return;
+
+	local_irq_save(flags);
+	tmigr_set_cpu_active(group, &tmc->cpuevt, cpu);
+	local_irq_restore(flags);
+	 * If the bases are not marked idle, i.e. one of the events is at
+
+static void tmigr_free_group(struct tmigr_group *group)
+{
+	if (group->parent) {
+		group->parent->num_childs--;
+		if (!group->parent->num_childs)
+			tmigr_free_group(group->parent);
+	}
+	list_del(&group->list);
+	free_cpumask_var(group->cpus);
+	kfree(group);
+}
+
+static void tmigr_init_group(struct tmigr_group *group, unsigned int lvl,
+			     unsigned int node)
+{
+	raw_spin_lock_init(&group->lock);
+	group->level = lvl;
+	group->numa_node = lvl < tmigr_crossnode_level ? node : NUMA_NO_NODE;
+	group->migrator = TMIGR_NONE;
+	timerqueue_init_head(&group->events);
+	timerqueue_init(&group->groupevt.nextevt);
+	group->groupevt.group = group;
+	group->groupevt.nextevt.expires = KTIME_MAX;
+	group->groupevt.cpu = TMIGR_NONE;
+	group->num_childs = 1;
+}
+
+static struct tmigr_group *tmigr_get_group(unsigned int node, unsigned int lvl)
+{
+	struct tmigr_group *group;
+
+	/* Try to attach to an existing group first */
+	list_for_each_entry(group, &tmigr_level_list[lvl], list) {
+		/*
+		 * If @lvl is below the cross numa node level, check
+		 * whether this group belongs to the same numa node.
+		 */
+		if (lvl < tmigr_crossnode_level && group->numa_node != node)
+			continue;
+		/* If the group has capacity, use it */
+		if (group->num_childs < tmigr_childs_per_group) {
+			group->num_childs++;
+			return group;
+		}
+	}
+	/* Allocate and set up a new group */
+	group = kzalloc_node(sizeof(*group), GFP_KERNEL, node);
+	if (!group)
+		return ERR_PTR(-ENOMEM);
+
+	if (!zalloc_cpumask_var_node(&group->cpus, GFP_KERNEL, node)) {
+		kfree(group);
+		return ERR_PTR(-ENOMEM);
+	}
+	tmigr_init_group(group, lvl, node);
+	/* Setup successful. Add it to the hierarchy */
+	list_add(&group->list, &tmigr_level_list[lvl]);
+	return group;
+}
+
+static int tmigr_setup_parents(unsigned int lvl)
+{
+	struct list_head *lvllist = &tmigr_level_list[lvl];
+	struct tmigr_group *group, *parent;
+	int ret = 0;
+
+	/* End of hierarchy reached? */
+	if (list_is_singular(lvllist))
+		return 0;
+
+	DBG_BUG_ON(lvl == tmigr_hierarchy_levels);
+
+	list_for_each_entry(group, lvllist, list) {
+		if (group->parent)
+			continue;
+		parent = tmigr_get_group(group->numa_node, lvl + 1);
+		if (IS_ERR(parent))
+			return PTR_ERR(parent);
+
+		raw_spin_lock_irq(&group->lock);
+		group->parent = parent;
+		if (group->active)
+			tmigr_set_cpu_active(parent, NULL, group->migrator);
+		raw_spin_unlock_irq(&group->lock);
+		ret = 1;
+	}
+	return ret;
+}
+
+static int tmigr_check_hierarchy(void)
+{
+	int lvl, ret = 0;
+
+	for (lvl = 0; lvl < tmigr_hierarchy_levels; lvl++) {
+		ret = tmigr_setup_parents(lvl);
+		if (ret != 1)
+			break;
+	}
+	return ret;
+}
+
+static struct tmigr_group *tmigr_add_cpu(unsigned int cpu)
+{
+	unsigned int node = cpu_to_node(cpu);
+	struct tmigr_group *group;
+
+	mutex_lock(&tmigr_mutex);
+	group = tmigr_get_group(node, 0);
+	if (IS_ERR(group))
+		goto out;
+	/*
+	 * If the group was newly allocated, connect it
+	 * to parent group(s) if necessary.
+	 */
+	if (group->num_childs == 1) {
+		int ret = tmigr_check_hierarchy();
+
+		if (ret < 0) {
+			tmigr_free_group(group);
+			group = ERR_PTR(ret);
+		}
+	}
+out:
+	mutex_unlock(&tmigr_mutex);
+	return group;
+}
+
+static int tmigr_cpu_online(unsigned int cpu)
+{
+	struct tmigr_cpu *tmc = this_cpu_ptr(&tmigr_cpu);
+	struct tmigr_group *group;
+
+	/* First online attempt? Initialize cpu data */
+	if (!tmc->tmgroup) {
+		raw_spin_lock_init(&tmc->lock);
+		timerqueue_init(&tmc->cpuevt.nextevt);
+		tmc->cpuevt.group = NULL;
+		tmc->cpuevt.cpu = cpu;
+		group = tmigr_add_cpu(cpu);
+		if (IS_ERR(group))
+			return PTR_ERR(group);
+		tmc->tmgroup = group;
+	}
+	tmc->online = true;
+	tmigr_cpu_activate();
+	return 0;
+}
+
+static int tmigr_cpu_offline(unsigned int cpu)
+{
+	struct tmigr_cpu *tmc = this_cpu_ptr(&tmigr_cpu);
+	struct tmigr_group *group = tmc->tmgroup;
+
+	local_irq_disable();
+	tmc->online = false;
+	tmigr_set_cpu_inactive(group, NULL, NULL, cpu);
+	local_irq_enable();
+
+	return 0;
+}
+
+void __init tmigr_init(void)
+{
+	unsigned int cpulvl, nodelvl, cpus_per_node, i;
+	unsigned int nnodes = num_possible_nodes();
+	unsigned int ncpus = num_possible_cpus();
+	struct tmigr_group *group;
+	size_t sz;
+
+	/*
+	 * Calculate the required hierarchy levels. Unfortunately there is
+	 * no reliable information available, unless all possible CPUs have
+	 * been brought up and all numa nodes are populated.
+	 *
+	 * Estimate the number of levels with the number of possible nodes and
+	 * the number of possible cpus. Assume CPUs are spread evenly across
+	 * nodes.
+	 */
+	cpus_per_node = DIV_ROUND_UP(ncpus, nnodes);
+	/* Calc the hierarchy levels required to hold the CPUs of a node */
+	cpulvl = DIV_ROUND_UP(order_base_2(cpus_per_node),
+			      ilog2(tmigr_childs_per_group));
+	/* Calculate the extra levels to connect all nodes */
+	nodelvl = DIV_ROUND_UP(order_base_2(nnodes),
+			       ilog2(tmigr_childs_per_group));
+
+	tmigr_hierarchy_levels = cpulvl + nodelvl;
+	/*
+	 * If a numa node spans more than one CPU level group then the
+	 * next level(s) of the hierarchy contain groups which handle all
+	 * CPU groups of the same numa node. The level above goes across
+	 * numa nodes. Store this information so the setup code can decide
+	 * when node matching is no longer required.
+	 */
+	tmigr_crossnode_level = cpulvl;
+
+	sz = sizeof(struct list_head) * tmigr_hierarchy_levels;
+	tmigr_level_list = kzalloc(sz, GFP_KERNEL);
+	if (!tmigr_level_list)
+		goto err;
+
+	for (i = 0; i < tmigr_hierarchy_levels; i++)
+		INIT_LIST_HEAD(&tmigr_level_list[i]);
+
+	if (cpuhp_setup_state(CPUHP_AP_TMIGR_ONLINE, "tmigr:online",
+			      tmigr_cpu_online, tmigr_cpu_offline))
+		goto hp_err;
+
+	tmigr_enabled = true;
+	pr_info("Timer migration: %d hierarchy levels\n", tmigr_hierarchy_levels);
+	return;
+
+hp_err:
+	/* Walk levels and free already allocated groups */
+	for (i = 0; i < tmigr_hierarchy_levels; i++) {
+		list_for_each_entry(group, &tmigr_level_list[i], list)
+			tmigr_free_group(group);
+	}
+	kfree(tmigr_level_list);
+err:
+	pr_err("Timer migration setup failed\n");
+}
--- /dev/null
+++ b/kernel/time/timer_migration.h
@@ -0,0 +1,89 @@
+/*
+ * Infrastructure for migratable timers
+ *
+ * Copyright(C) 2016 linutronix GmbH
+ *
+ * This code is licenced under the GPL version 2. For details see
+ * kernel-base/COPYING.
+ */
+#ifndef _KERNEL_TIME_MIGRATION_H
+#define _KERNEL_TIME_MIGRATION_H
+
+#ifdef CONFIG_NO_HZ_COMMON
+extern bool tmigr_enabled;
+void tmigr_init(void);
+#else
+static inline void tmigr_init(void) { }
+#endif
+
+#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
+extern void tmigr_handle_remote(void);
+extern u64 tmigr_cpu_idle(u64 nextevt);
+extern void tmigr_cpu_activate(void);
+extern void timer_expire_remote(unsigned int cpu);
+#else
+static inline void tmigr_handle_remote(void) { }
+static inline u64 tmigr_cpu_idle(u64 nextevt) { return nextevt; }
+static inline void tmigr_cpu_activate(void) { }
+#endif
+
+#define TMIGR_NONE		(~0U)
+
+/**
+ * struct tmigr_event - A timer event associated to a CPU or a group
+ * @nextevt:	The node to enqueue an event in the group queue
+ * @group:	The group to which this event belongs (NULL if a cpu event)
+ * @cpu:	The cpu to which this event belongs (TMIGR_NONE if a group
+ *		event)
+ */
+struct tmigr_event {
+	struct timerqueue_node	nextevt;
+	struct tmigr_group	*group;
+	unsigned int		cpu;
+};
+
+/**
+ * struct tmigr_group - Hierarchical group structure for timer migration
+ * @lock:	Group serialization. Must be taken with interrupts disabled.
+ * @active:	Specifies the number of active (not offline and not idle)
+ *		children of the group
+ * @migrator:	CPU id of the migrator for this group or TMIGR_NONE
+ * @events:	timerqueue head of all events of the group
+ * @groupevt:	Next event of the group
+ * @parent:	Pointer to the parent group. Null if top level group
+ * @cpus:	CPU mask to track the active CPUs
+ * @list:	List head to queue in the global group level lists
+ * @level:	Specifies the hierarchy level of the group
+ * @numa_node:	Specifies the numa node of the group. NUMA_NO_NODE if the
+ *		group spans multiple numa nodes
+ * @num_childs:	Number of children connected to the group
+ */
+struct tmigr_group {
+	raw_spinlock_t		lock;
+	unsigned int		active;
+	unsigned int		migrator;
+	struct timerqueue_head	events;
+	struct tmigr_event	groupevt;
+	struct tmigr_group	*parent;
+	cpumask_var_t		cpus;
+	struct list_head	list;
+	unsigned int		level;
+	unsigned int		numa_node;
+	unsigned int		num_childs;
+};
+
+/**
+ * struct tmigr_cpu - Per CPU entry connected to a leaf group in the hierarchy
+ * @lock:	Protection for the per cpu data
+ * @online:	Marks whether the CPU is online as seen by the timer migration code
+ * @cpuevt:	Specifies the next timer event of the CPU
+ * @tmgroup:	Pointer to the leaf group to which this CPU belongs
+ */
+struct tmigr_cpu {
+	raw_spinlock_t		lock;
+	bool			online;
+	struct tmigr_event	cpuevt;
+	struct tmigr_group	*tmgroup;
+};
+
+#endif

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [patch 09/10] timer/migration: Add tracepoints
  2017-04-17 18:32 [patch 00/10] timer: Move from a push remote at enqueue to a pull at expiry model Thomas Gleixner
                   ` (7 preceding siblings ...)
  2017-04-17 18:32 ` [patch 08/10] timer: Implement the hierarchical pull model Thomas Gleixner
@ 2017-04-17 18:32 ` Thomas Gleixner
  2017-04-17 19:09   ` Steven Rostedt
  2017-04-17 18:32 ` [patch 10/10] timer: Always queue timers on the local CPU Thomas Gleixner
  2017-04-18 13:57 ` [patch 00/10] timer: Move from a push remote at enqueue to a pull at expiry model Rafael J. Wysocki
  10 siblings, 1 reply; 13+ messages in thread
From: Thomas Gleixner @ 2017-04-17 18:32 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, John Stultz, Eric Dumazet, Anna-Maria Gleixner,
	Rafael J. Wysocki, linux-pm, Arjan van de Ven, Paul E. McKenney,
	Frederic Weisbecker, Rik van Riel, Steven Rostedt

[-- Attachment #1: timer-migration-Add-tracepoints.patch --]
[-- Type: text/plain, Size: 7189 bytes --]

The timer pull logic needs proper debugging aids. Add tracepoints so the
hierarchical idle machinery can be diagnosed.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/trace/events/timer_migration.h |  173 +++++++++++++++++++++++++++++++++
 kernel/time/timer_migration.c          |   17 +++
 2 files changed, 190 insertions(+)

--- /dev/null
+++ b/include/trace/events/timer_migration.h
@@ -0,0 +1,173 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM timer_migration
+
+#if !defined(_TRACE_TIMER_MIGRATION_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_TIMER_MIGRATION_H
+
+#include <linux/tracepoint.h>
+
+/* Group events */
+DECLARE_EVENT_CLASS(tmigr_group,
+
+	TP_PROTO(struct tmigr_group *group),
+
+	TP_ARGS(group),
+
+	TP_STRUCT__entry(
+		__field( void *,	group	)
+		__field( unsigned int,	lvl	)
+		__field( unsigned int,	numa_node )
+		__field( unsigned int,	active )
+		__field( unsigned int,	migrator )
+		__field( unsigned int,	num_childs )
+		__field( void *,	parent	)
+		__field( u64,		nextevt	)
+		__field( unsigned int,	evtcpu	)
+	),
+
+	TP_fast_assign(
+		__entry->group		= group;
+		__entry->lvl		= group->level;
+		__entry->numa_node	= group->numa_node;
+		__entry->active		= group->active;
+		__entry->migrator	= group->migrator;
+		__entry->num_childs	= group->num_childs;
+		__entry->parent		= group->parent;
+		__entry->nextevt	= group->groupevt.nextevt.expires;
+		__entry->evtcpu		= group->groupevt.cpu;
+	),
+
+	TP_printk("group=%p lvl=%d numa=%d active=%d migrator=%d num_childs=%d "
+		  "parent=%p nextevt=%llu evtcpu=%d",
+		  __entry->group, __entry->lvl, __entry->numa_node,
+		  __entry->active, __entry->migrator, __entry->num_childs,
+		  __entry->parent, __entry->nextevt, __entry->evtcpu)
+);
+
+DEFINE_EVENT(tmigr_group, tmigr_group_addevt,
+
+	TP_PROTO(struct tmigr_group *group),
+
+	TP_ARGS(group)
+);
+
+DEFINE_EVENT(tmigr_group, tmigr_group_removeevt,
+
+	TP_PROTO(struct tmigr_group *group),
+
+	TP_ARGS(group)
+);
+
+DEFINE_EVENT(tmigr_group, tmigr_group_set_cpu_inactive,
+
+	TP_PROTO(struct tmigr_group *group),
+
+	TP_ARGS(group)
+);
+
+DEFINE_EVENT(tmigr_group, tmigr_group_set_cpu_active,
+
+	TP_PROTO(struct tmigr_group *group),
+
+	TP_ARGS(group)
+);
+
+DEFINE_EVENT(tmigr_group, tmigr_group_free,
+
+	TP_PROTO(struct tmigr_group *group),
+
+	TP_ARGS(group)
+);
+
+DEFINE_EVENT(tmigr_group, tmigr_group_set,
+
+	TP_PROTO(struct tmigr_group *group),
+
+	TP_ARGS(group)
+);
+
+DEFINE_EVENT(tmigr_group, tmigr_group_setup_parents,
+
+	TP_PROTO(struct tmigr_group *group),
+
+	TP_ARGS(group)
+);
+
+/* CPU events */
+DECLARE_EVENT_CLASS(tmigr_cpugroup,
+
+	TP_PROTO(struct tmigr_cpu *tcpu, unsigned int cpu),
+
+	TP_ARGS(tcpu, cpu),
+
+	TP_STRUCT__entry(
+		__field( unsigned int,	cpu)
+		__field( void *,	parent)
+	),
+
+	TP_fast_assign(
+		__entry->cpu		= cpu;
+		__entry->parent		= tcpu->tmgroup;
+	),
+
+	TP_printk("cpu=%d parent=%p", __entry->cpu, __entry->parent)
+);
+
+DEFINE_EVENT(tmigr_cpugroup, tmigr_cpu_update_remote,
+
+	TP_PROTO(struct tmigr_cpu *tcpu, unsigned int cpu),
+
+	TP_ARGS(tcpu, cpu)
+);
+
+DEFINE_EVENT(tmigr_cpugroup, tmigr_cpu_add,
+
+	TP_PROTO(struct tmigr_cpu *tcpu, unsigned int cpu),
+
+	TP_ARGS(tcpu, cpu)
+);
+
+/* Other events */
+TRACE_EVENT(tmigr_handle_remote,
+
+	TP_PROTO(struct tmigr_group *group, unsigned int cpu),
+
+	TP_ARGS(group, cpu),
+
+	TP_STRUCT__entry(
+		__field( void *,	group	)
+		__field( unsigned int,	lvl	)
+		__field( unsigned int,	numa_node )
+		__field( unsigned int,	active )
+		__field( unsigned int,	migrator )
+		__field( unsigned int,	num_childs )
+		__field( void *,	parent	)
+		__field( u64,		nextevt	)
+		__field( unsigned int,	evtcpu	)
+		__field( unsigned int,	cpu	)
+	),
+
+	TP_fast_assign(
+		__entry->group		= group;
+		__entry->lvl		= group->level;
+		__entry->numa_node	= group->numa_node;
+		__entry->active		= group->active;
+		__entry->migrator	= group->migrator;
+		__entry->num_childs	= group->num_childs;
+		__entry->parent		= group->parent;
+		__entry->nextevt	= group->groupevt.nextevt.expires;
+		__entry->evtcpu		= group->groupevt.cpu;
+		__entry->cpu		= cpu;
+	),
+
+	TP_printk("group=%p lvl=%d numa=%d active=%d migrator=%d num_childs=%d "
+		  "parent=%p nextevt=%llu evtcpu=%d cpu=%d",
+		  __entry->group, __entry->lvl, __entry->numa_node,
+		  __entry->active, __entry->migrator, __entry->num_childs,
+		  __entry->parent, __entry->nextevt, __entry->evtcpu, __entry->cpu)
+);
+
+#endif /*  _TRACE_TIMER_MIGRATION_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
--- a/kernel/time/timer_migration.c
+++ b/kernel/time/timer_migration.c
@@ -16,6 +16,9 @@
 #include "timer_migration.h"
 #include "tick-internal.h"
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/timer_migration.h>
+
 #ifdef DEBUG
 # define DBG_BUG_ON(x)	BUG_ON(x)
 #else
@@ -53,6 +56,8 @@ static void tmigr_add_evt(struct tmigr_g
 		group->groupevt.nextevt.expires = evt->nextevt.expires;
 		group->groupevt.cpu = evt->cpu;
 	}
+
+	trace_tmigr_group_addevt(group);
 }
 
 static void tmigr_remove_evt(struct tmigr_group *group, struct tmigr_event *evt)
@@ -86,6 +91,8 @@ static void tmigr_remove_evt(struct tmig
 		group->groupevt.nextevt.expires = nextevt->nextevt.expires;
 		group->groupevt.cpu = nextevt->cpu;
 	}
+
+	trace_tmigr_group_removeevt(group);
 }
 
 static void tmigr_update_remote(unsigned int cpu, u64 now, unsigned long jif)
@@ -142,6 +149,7 @@ static void tmigr_update_remote(unsigned
 	tmigr_add_evt(group, &tmc->cpuevt);
 
 done:
+	trace_tmigr_cpu_update_remote(tmc, cpu);
 	raw_spin_unlock(&group->lock);
 	raw_spin_unlock_irq(&tmc->lock);
 }
@@ -153,6 +161,8 @@ static void __tmigr_handle_remote(struct
 	struct tmigr_group *parent;
 	struct tmigr_event *evt;
 
+	trace_tmigr_handle_remote(group, cpu);
+
 again:
 	raw_spin_lock_irq(&group->lock);
 	/*
@@ -332,6 +342,7 @@ static u64 tmigr_set_cpu_inactive(struct
 		nextevt = group->groupevt.nextevt.expires;
 	}
 done:
+	trace_tmigr_group_set_cpu_inactive(group);
 	raw_spin_unlock(&group->lock);
 	return nextevt;
 }
@@ -390,6 +401,9 @@ static void tmigr_set_cpu_active(struct
 		if (parent)
 			tmigr_set_cpu_active(parent, &group->groupevt, cpu);
 	}
+
+	trace_tmigr_group_set_cpu_active(group);
+
 	/*
 	 * Update groupevt and dequeue @evt. Must be called after parent
 	 * groups have been updated above so @group->groupevt is inactive.
@@ -425,6 +439,7 @@ static void tmigr_free_group(struct tmig
 		if (!group->parent->num_childs)
 			tmigr_free_group(group->parent);
 	}
+	trace_tmigr_group_free(group);
 	list_del(&group->list);
 	free_cpumask_var(group->cpus);
 	kfree(group);
@@ -475,6 +490,7 @@ static struct tmigr_group *tmigr_get_gro
 	tmigr_init_group(group, lvl, node);
 	/* Setup successful. Add it to the hierarchy */
 	list_add(&group->list, &tmigr_level_list[lvl]);
+	trace_tmigr_group_set(group);
 	return group;
 }
 
@@ -502,6 +518,7 @@ static int tmigr_setup_parents(unsigned
 		if (group->active)
 			tmigr_set_cpu_active(parent, NULL, group->migrator);
 		raw_spin_unlock_irq(&group->lock);
+		trace_tmigr_group_setup_parents(group);
 		ret = 1;
 	}
 	return ret;

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [patch 10/10] timer: Always queue timers on the local CPU
  2017-04-17 18:32 [patch 00/10] timer: Move from a push remote at enqueue to a pull at expiry model Thomas Gleixner
                   ` (8 preceding siblings ...)
  2017-04-17 18:32 ` [patch 09/10] timer/migration: Add tracepoints Thomas Gleixner
@ 2017-04-17 18:32 ` Thomas Gleixner
  2017-04-18 13:57 ` [patch 00/10] timer: Move from a push remote at enqueue to a pull at expiry model Rafael J. Wysocki
  10 siblings, 0 replies; 13+ messages in thread
From: Thomas Gleixner @ 2017-04-17 18:32 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, John Stultz, Eric Dumazet, Anna-Maria Gleixner,
	Rafael J. Wysocki, linux-pm, Arjan van de Ven, Paul E. McKenney,
	Frederic Weisbecker, Rik van Riel, Richard Cochran

[-- Attachment #1: timer_Place_timers_on_the_local_CPU.patch --]
[-- Type: text/plain, Size: 1689 bytes --]

The timer pull model is in place so we can remove the heuristics which try
to guess the best target CPU at enqueue/modification time.

All non-pinned timers are queued on the local CPU in the separate storage
and eventually pulled to a remote CPU at expiry time.

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 kernel/time/timer.c |   20 +-------------------
 1 file changed, 1 insertion(+), 19 deletions(-)

--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -857,18 +857,6 @@ static inline struct timer_base *get_tim
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
-static inline struct timer_base *
-get_target_base(struct timer_base *base, unsigned tflags)
-{
-#ifdef CONFIG_SMP
-	if ((tflags & TIMER_PINNED) || !base->migration_enabled)
-		return get_timer_this_cpu_base(tflags);
-	return get_timer_cpu_base(tflags, get_nohz_timer_target());
-#else
-	return get_timer_this_cpu_base(tflags);
-#endif
-}
-
 static inline void forward_timer_base(struct timer_base *base)
 {
 	unsigned long jnow = READ_ONCE(jiffies);
@@ -890,12 +878,6 @@ static inline void forward_timer_base(st
 		base->clk = base->next_expiry;
 }
 #else
-static inline struct timer_base *
-get_target_base(struct timer_base *base, unsigned tflags)
-{
-	return get_timer_this_cpu_base(tflags);
-}
-
 static inline void forward_timer_base(struct timer_base *base) { }
 #endif
 
@@ -985,7 +967,7 @@ static inline int
 	if (!ret && pending_only)
 		goto out_unlock;
 
-	new_base = get_target_base(base, timer->flags);
+	new_base = get_timer_this_cpu_base(timer->flags);
 
 	if (base != new_base) {
 		/*

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch 09/10] timer/migration: Add tracepoints
  2017-04-17 18:32 ` [patch 09/10] timer/migration: Add tracepoints Thomas Gleixner
@ 2017-04-17 19:09   ` Steven Rostedt
  0 siblings, 0 replies; 13+ messages in thread
From: Steven Rostedt @ 2017-04-17 19:09 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, John Stultz, Eric Dumazet,
	Anna-Maria Gleixner, Rafael J. Wysocki, linux-pm,
	Arjan van de Ven, Paul E. McKenney, Frederic Weisbecker,
	Rik van Riel

On Mon, 17 Apr 2017 20:32:50 +0200
Thomas Gleixner <tglx@linutronix.de> wrote:

> The timer pull logic needs proper debugging aids. Add tracepoints so the
> hierarchical idle machinery can be diagnosed.
> 
> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  include/trace/events/timer_migration.h |  173 +++++++++++++++++++++++++++++++++
>  kernel/time/timer_migration.c          |   17 +++
>  2 files changed, 190 insertions(+)
> 
> --- /dev/null
> +++ b/include/trace/events/timer_migration.h
> @@ -0,0 +1,173 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM timer_migration
> +
> +#if !defined(_TRACE_TIMER_MIGRATION_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_TIMER_MIGRATION_H
> +
> +#include <linux/tracepoint.h>
> +
> +/* Group events */
> +DECLARE_EVENT_CLASS(tmigr_group,
> +
> +	TP_PROTO(struct tmigr_group *group),
> +
> +	TP_ARGS(group),
> +
> +	TP_STRUCT__entry(
> +		__field( void *,	group	)
> +		__field( unsigned int,	lvl	)
> +		__field( unsigned int,	numa_node )
> +		__field( unsigned int,	active )
> +		__field( unsigned int,	migrator )
> +		__field( unsigned int,	num_childs )
> +		__field( void *,	parent	)
> +		__field( u64,		nextevt	)
> +		__field( unsigned int,	evtcpu	)

On 64 bit boxes, where longs and pointers are 8 bytes and an int is only
4 bytes, the above can be laid out better; as written, the structure will
most likely contain holes, like a 4 byte one after num_childs. Perhaps
move num_childs down to just above evtcpu. In other words, please pair
ints together where possible when they sit between pointers and longs.

The order of the struct does not need to be the same as the order of
the output.
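
For illustration only, here is a sketch of one such ordering (not taken
from the posted patch). Keeping the two pointers and the u64 first and
packing the six unsigned ints together avoids the hole:

	TP_STRUCT__entry(
		/* 8-byte members first ... */
		__field( void *,	group	)
		__field( void *,	parent	)
		__field( u64,		nextevt	)
		/* ... then the 4-byte members packed together */
		__field( unsigned int,	lvl	)
		__field( unsigned int,	numa_node )
		__field( unsigned int,	active )
		__field( unsigned int,	migrator )
		__field( unsigned int,	num_childs )
		__field( unsigned int,	evtcpu	)
	),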

Thanks,

-- Steve

> +	),
> +
> +	TP_fast_assign(
> +		__entry->group		= group;
> +		__entry->lvl		= group->level;
> +		__entry->numa_node	= group->numa_node;
> +		__entry->active		= group->active;
> +		__entry->migrator	= group->migrator;
> +		__entry->num_childs	= group->num_childs;
> +		__entry->parent		= group->parent;
> +		__entry->nextevt	= group->groupevt.nextevt.expires;
> +		__entry->evtcpu		= group->groupevt.cpu;
> +	),
> +
> +	TP_printk("group=%p lvl=%d numa=%d active=%d migrator=%d num_childs=%d "
> +		  "parent=%p nextevt=%llu evtcpu=%d",
> +		  __entry->group, __entry->lvl, __entry->numa_node,
> +		  __entry->active, __entry->migrator, __entry->num_childs,
> +		  __entry->parent, __entry->nextevt, __entry->evtcpu)
> +);
> +

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch 00/10] timer: Move from a push remote at enqueue to a pull at expiry model
  2017-04-17 18:32 [patch 00/10] timer: Move from a push remote at enqueue to a pull at expiry model Thomas Gleixner
                   ` (9 preceding siblings ...)
  2017-04-17 18:32 ` [patch 10/10] timer: Always queue timers on the local CPU Thomas Gleixner
@ 2017-04-18 13:57 ` Rafael J. Wysocki
  10 siblings, 0 replies; 13+ messages in thread
From: Rafael J. Wysocki @ 2017-04-18 13:57 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, John Stultz, Eric Dumazet,
	Anna-Maria Gleixner, Rafael J. Wysocki, linux-pm,
	Arjan van de Ven, Paul E. McKenney, Frederic Weisbecker,
	Rik van Riel

On Monday, April 17, 2017 08:32:41 PM Thomas Gleixner wrote:
> Placing timers at enqueue time on a target CPU based on dubious heuristics
> does not make any sense:
> 
>  1) Most timer wheel timers are canceled or rearmed before they expire.
> 
>  2) The heuristics to predict which CPU will be busy when the timer expires
>     are wrong by definition.
> 
> So we waste precious cycles to place timers at enqueue time.
> 
> The proper solution to this problem is to always queue the timers on the
> local CPU and allow the non pinned timers to be pulled onto a busy CPU at
> expiry time.
> 
> To achieve this the timer storage has been split into local pinned and
> global timers. Local pinned timers are always expired on the CPU on which
> they have been queued. Global timers can be expired on any CPU.
> 
> As long as a CPU is busy it expires both local and global timers. When a
> CPU goes idle it arms for the first expiring local timer. If the first
> expiring pinned (local) timer is before the first expiring movable timer,
> then no action is required because the CPU will wake up before the first
> movable timer expires. If the first expiring movable timer is before the
> first expiring pinned (local) timer, then this timer is queued into a idle
> timerqueue and eventually expired by some other active CPU.
> 
> To avoid global locking the timerqueues are implemented as a hierarchy. The
> lowest level of the hierarchy holds the CPUs. The CPUs are associated to
> groups of 8, which are seperated per node. If more than one CPU group
> exist, then a second level in the hierarchy collects the groups. Depending
> on the size of the system more than 2 levels are required. Each group has a
> "migrator" which checks the timerqueue during the tick for remote expirable
> timers.
> 
> If the last CPU in a group goes idle it reports the first expiring event in
> the group up to the next group(s) in the hierarchy. If the last CPU goes
> idle it arms its timer for the first system wide expiring timer to ensure
> that no timer event is missed.
> 
> The series is also available from git:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.timers

No concerns from me FWIW.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-04-18 14:03 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-17 18:32 [patch 00/10] timer: Move from a push remote at enqueue to a pull at expiry model Thomas Gleixner
2017-04-17 18:32 ` [patch 01/10] timer: Invoke timer_start_debug() where it makes sense Thomas Gleixner
2017-04-17 18:32 ` [patch 02/10] timerqueue: Document return values of timerqueue_add/del() Thomas Gleixner
2017-04-17 18:32 ` [patch 03/10] timers: Rework idle logic Thomas Gleixner
2017-04-17 18:32 ` [patch 04/10] timer: Keep the pinned timers separate from the others Thomas Gleixner
2017-04-17 18:32 ` [patch 05/10] timer: Retrieve next expiry of pinned/non-pinned timers seperately Thomas Gleixner
2017-04-17 18:32 ` [patch 06/10] timer: Restructure internal locking Thomas Gleixner
2017-04-17 18:32 ` [patch 07/10] tick/sched: Split out jiffies update helper function Thomas Gleixner
2017-04-17 18:32 ` [patch 08/10] timer: Implement the hierarchical pull model Thomas Gleixner
2017-04-17 18:32 ` [patch 09/10] timer/migration: Add tracepoints Thomas Gleixner
2017-04-17 19:09   ` Steven Rostedt
2017-04-17 18:32 ` [patch 10/10] timer: Always queue timers on the local CPU Thomas Gleixner
2017-04-18 13:57 ` [patch 00/10] timer: Move from a push remote at enqueue to a pull at expiry model Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).