All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep
@ 2013-12-17 22:51 Frederic Weisbecker
  2013-12-17 22:51 ` [PATCH 01/13] tick: Rename tick_check_idle() to tick_irq_enter() Frederic Weisbecker
                   ` (13 more replies)
  0 siblings, 14 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2013-12-17 22:51 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Alex Shi, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Paul E. McKenney, John Stultz,
	Kevin Hilman

Hi,

This series makes the nohz subsystem eventually use the RCU full sysidle
detection.

When we have CPUs running in full dynticks mode in the system, the
CPU 0 handles the timekeeping duty on behalf of all other CPUs. Given
that full dynticks can run anytime, CPU 0 stays periodic and never
enter into dynticks idle mode. This is of course a powersaving issue.

Now making CPU 0 to become more power-friendly sounds like an easy task.
After all we just need to allow it to enter in dynticks idle mode as
soon as all full dynticks CPUs are idle (aka sysidle state, we'll
probably need some more precise name as it only applies to full dynticks
CPUs).

But sysidle state detection is actually difficult to get right. It
must scale with growing number of CPUs, minimize IPIs and atomic
operations on fast paths. Given that this detection already existed into
a bit more generalized form through the existing RCU extended quiescent
state detection, it has been implemented by specializing this code and
adding a state machine on top of it.
(Thanks to Paul for this work, more details: https://lwn.net/Articles/558284/)

This feature which is enabled with CONFIG_NO_HZ_FULL_SYSIDLE=y, is
working but is not yet plugged into the nohz subsystem. Namely we can
detect states where all full dynticks CPUs are sleeping, but we don't
yet take benefit from it by opportunistically stopping the tick of
the timekeeper CPU 0.

So this is what this series brings, more details following:

* Some code, naming and whitespace cleanups

* Allow all CPUs outside the nohz_full range to handle the timekeeping
  duty, not just CPU 0. Balancing the timekeeping duty should improve
  powersavings.

* Let the timekeeper (including CPU 0) sleep when its duty is
  handed over to another CPU

* Allow timekeeper to sleep when all full dynticks CPUs are sleeping
  (plug nohz to RCU sysidle detection)

* Wake up timekeeper with an IPI when full dynticks CPUs exit sysidle
  state

* Wake up CPU 0 when a secondary timekeeper is offlined so that its
  duty gets migrated


For convenience, you can fetch this from:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	timers/full_sysidle-rfc

Thanks,
	Frederic
---

Frederic Weisbecker (12):
      tick: Rename tick_check_idle() to tick_irq_enter()
      time: New helper to check CPU eligibility to handle timekeeping
      rcu: Exclude all potential timekeepers from sysidle detection
      tick: Use timekeeping_cpu() to elect the CPU handling timekeeping duty
      rcu: Fix unraised IPI to timekeeping CPU
      nohz: Introduce full dynticks' default timekeeping target
      sched: Enable IPI reception on timekeeper under nohz full system
      nohz: Get timekeeping max deferment outside jiffies_lock
      nohz: Allow timekeeper's tick to stop when all full dynticks CPUs are idle
      nohz: Hand over timekeeping duty on cpu offlining
      nohz: Wake up timekeeper on exit from sysidle state
      nohz: Allow all CPUs outside nohz_full range to do timekeeping

Alex Shi (1):
      nohz_full: fix code style issue of tick_nohz_full_stop_tick


 include/linux/tick.h      |  38 ++++++++++++--
 kernel/rcu/tree_plugin.h  |  12 ++---
 kernel/sched/core.c       |   6 +--
 kernel/softirq.c          |   2 +-
 kernel/time/tick-common.c |   2 +-
 kernel/time/tick-sched.c  | 128 +++++++++++++++++++++++++++++++++++-----------
 6 files changed, 142 insertions(+), 46 deletions(-)

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 01/13] tick: Rename tick_check_idle() to tick_irq_enter()
  2013-12-17 22:51 [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep Frederic Weisbecker
@ 2013-12-17 22:51 ` Frederic Weisbecker
  2014-01-25 14:22   ` [tip:timers/urgent] " tip-bot for Frederic Weisbecker
  2013-12-17 22:51 ` [PATCH 02/13] time: New helper to check CPU eligibility to handle timekeeping Frederic Weisbecker
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2013-12-17 22:51 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Paul E. McKenney, John Stultz,
	Alex Shi, Kevin Hilman

This makes the code more symetric against the existing tick functions
called on irq exit: tick_irq_exit() and tick_nohz_irq_exit().

These function are also symetric as they mirror each other's action:
we start to account idle time on irq exit and we stop this accounting
on irq entry. Also the tick is stopped on irq exit and timekeeping
catches up with the tickless time elapsed until we reach irq entry.

This rename was suggested by Peter Zijlstra a long while ago but it
got forgotten in the mass.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
---
 include/linux/tick.h     | 6 +++---
 kernel/softirq.c         | 2 +-
 kernel/time/tick-sched.c | 8 ++++----
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 0175d86..b84773c 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -104,7 +104,7 @@ extern struct cpumask *tick_get_broadcast_oneshot_mask(void);
 extern void tick_clock_notify(void);
 extern int tick_check_oneshot_change(int allow_nohz);
 extern struct tick_sched *tick_get_tick_sched(int cpu);
-extern void tick_check_idle(void);
+extern void tick_irq_enter(void);
 extern int tick_oneshot_mode_active(void);
 #  ifndef arch_needs_cpu
 #   define arch_needs_cpu(cpu) (0)
@@ -112,7 +112,7 @@ extern int tick_oneshot_mode_active(void);
 # else
 static inline void tick_clock_notify(void) { }
 static inline int tick_check_oneshot_change(int allow_nohz) { return 0; }
-static inline void tick_check_idle(void) { }
+static inline void tick_irq_enter(void) { }
 static inline int tick_oneshot_mode_active(void) { return 0; }
 # endif
 
@@ -121,7 +121,7 @@ static inline void tick_init(void) { }
 static inline void tick_cancel_sched_timer(int cpu) { }
 static inline void tick_clock_notify(void) { }
 static inline int tick_check_oneshot_change(int allow_nohz) { return 0; }
-static inline void tick_check_idle(void) { }
+static inline void tick_irq_enter(void) { }
 static inline int tick_oneshot_mode_active(void) { return 0; }
 #endif /* !CONFIG_GENERIC_CLOCKEVENTS */
 
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 11348de..ca9cb35 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -318,7 +318,7 @@ void irq_enter(void)
 		 * here, as softirq will be serviced on return from interrupt.
 		 */
 		local_bh_disable();
-		tick_check_idle();
+		tick_irq_enter();
 		_local_bh_enable();
 	}
 
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 2afd43f..df6432d 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1025,7 +1025,7 @@ static void tick_nohz_kick_tick(struct tick_sched *ts, ktime_t now)
 #endif
 }
 
-static inline void tick_check_nohz_this_cpu(void)
+static inline void tick_nohz_irq_enter(void)
 {
 	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
 	ktime_t now;
@@ -1044,17 +1044,17 @@ static inline void tick_check_nohz_this_cpu(void)
 #else
 
 static inline void tick_nohz_switch_to_nohz(void) { }
-static inline void tick_check_nohz_this_cpu(void) { }
+static inline void tick_nohz_irq_enter(void) { }
 
 #endif /* CONFIG_NO_HZ_COMMON */
 
 /*
  * Called from irq_enter to notify about the possible interruption of idle()
  */
-void tick_check_idle(void)
+void tick_irq_enter(void)
 {
 	tick_check_oneshot_broadcast_this_cpu();
-	tick_check_nohz_this_cpu();
+	tick_nohz_irq_enter();
 }
 
 /*
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 02/13] time: New helper to check CPU eligibility to handle timekeeping
  2013-12-17 22:51 [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep Frederic Weisbecker
  2013-12-17 22:51 ` [PATCH 01/13] tick: Rename tick_check_idle() to tick_irq_enter() Frederic Weisbecker
@ 2013-12-17 22:51 ` Frederic Weisbecker
  2013-12-17 22:51 ` [PATCH 03/13] rcu: Exclude all potential timekeepers from sysidle detection Frederic Weisbecker
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2013-12-17 22:51 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Paul E. McKenney, John Stultz,
	Alex Shi, Kevin Hilman

Traditionally, all online CPUs in the system are allowed to handle
the timekeeping duty.

Now with the full dynticks subsystem, we don't want the busy CPUs
running tickless to be bothered with housekeeping duties that
require periodic interrupts because that would defeat the main purpose
of running tickless in the first place.

As such, the following rules apply to define the timekeeping duty affinity:

* If we run in full dynticks mode, CPU 0 handles the timekeeping on
  behalf of all other CPUs in the system. No other CPU is allowed to
  take that duty. This behaviour will soon be subject to change though
  as we plan to balance this duty over all the CPUs outside the full
  dynticks range with CONFIG_NO_HZ_FULL=y in order to improve power
  consumption.

* On any other environment (strict periodic or dynticks idle kernel),
  the timekeeping duty can be handled by any CPU.

In order to enforce and consolidate this behaviour, provide an API that
core subsystems can use to check if a CPU is allowed to handle the
timekeeping duty.

This is going to be used by the timer subsystem before assigning a
timekeeper and by RCU for the full sysidle detection.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
---
 include/linux/tick.h | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index b84773c..cf2fd34 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -179,15 +179,35 @@ static inline bool tick_nohz_full_cpu(int cpu)
 	return cpumask_test_cpu(cpu, tick_nohz_full_mask);
 }
 
+/**
+ * tick_timeeping_cpu - check if a CPU is elligble to handle timekeeping duty
+ * @cpu:	the cpu to check
+ *
+ * @return true if the CPU is elligible to perform timekeeping duty.
+ */
+static inline bool tick_timekeeping_cpu(int cpu)
+{
+	/*
+	 * If there are full dynticks CPUs around,
+	 * CPU 0 must stay periodic to update timekeeping.
+	 */
+	if (tick_nohz_full_enabled())
+		return cpu == 0;
+
+	/* Otherwise any CPU is elligible for timekeeping duty */
+	return true;
+}
+
 extern void tick_nohz_init(void);
 extern void __tick_nohz_full_check(void);
 extern void tick_nohz_full_kick(void);
 extern void tick_nohz_full_kick_all(void);
 extern void __tick_nohz_task_switch(struct task_struct *tsk);
-#else
+# else
 static inline void tick_nohz_init(void) { }
 static inline bool tick_nohz_full_enabled(void) { return false; }
 static inline bool tick_nohz_full_cpu(int cpu) { return false; }
+static inline bool tick_timekeeping_cpu(int cpu) { return true; }
 static inline void __tick_nohz_full_check(void) { }
 static inline void tick_nohz_full_kick(void) { }
 static inline void tick_nohz_full_kick_all(void) { }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 03/13] rcu: Exclude all potential timekeepers from sysidle detection
  2013-12-17 22:51 [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep Frederic Weisbecker
  2013-12-17 22:51 ` [PATCH 01/13] tick: Rename tick_check_idle() to tick_irq_enter() Frederic Weisbecker
  2013-12-17 22:51 ` [PATCH 02/13] time: New helper to check CPU eligibility to handle timekeeping Frederic Weisbecker
@ 2013-12-17 22:51 ` Frederic Weisbecker
  2013-12-17 23:27   ` Paul E. McKenney
  2013-12-17 22:51 ` [PATCH 04/13] tick: Use timekeeping_cpu() to elect the CPU handling timekeeping duty Frederic Weisbecker
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2013-12-17 22:51 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Paul E. McKenney, John Stultz,
	Alex Shi, Kevin Hilman

The purpose of the full system idle detection is to notify the CPU
handling the timekeeping when the rest of the system is idle so that it
can sleep when nobody needs the jiffies nor GTOD to be maintained.

Now this machinery excludes CPU 0 itself from the range of the idle
detection because if CPU 0 has any non-idle task to execute, it is going
to restart its own tick since it's guaranteed to be outside the full
dynticks range. And as it is the only eligible timekeeper when
CONFIG_NO_HZ_FULL=y anyway, it can handle the timekeeping duty for
and by itself.

Still we also plan to extend the timekeepers affinity and allow every CPU
outside the full dynticks range to handle the timekeeping duty, not just
CPU 0.

So once we reach that step, we can state that all CPUs that are not
full dynticks can be excluded from the full system idle detection,
simply because those CPUs share the same property than CPU 0 today. When
a non full dynticks CPU needs to run some busy task, it restarts its
tick and handles the timekeeping duty for its own needs as is currently
done under CONFIG_NO_HZ_IDLE=y.

To prepare for this support in the sysidle detection, we can use the
tick_timekeeping_cpu() API which checks if a CPU is allowed to handle
timekeeping duty. If so we can conclude that it's not full dynticks and
it can maintain timekeeping by itself and as such it can be excluded
from the sysidle detection.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
---
 kernel/rcu/tree_plugin.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 6abb03d..08004da 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2539,7 +2539,7 @@ static void rcu_sysidle_exit(struct rcu_dynticks *rdtp, int irq)
 	 * invoke rcu_sysidle_force_exit() directly if it does anything
 	 * more than take a scheduling-clock interrupt.
 	 */
-	if (smp_processor_id() == tick_do_timer_cpu)
+	if (tick_timekeeping_cpu(smp_processor_id()))
 		return;
 
 	/* Update system-idle state: We are clearly no longer fully idle! */
@@ -2563,10 +2563,10 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle,
 	 * is an offline or the timekeeping CPU, nothing to do.
 	 */
 	if (!*isidle || rdp->rsp != rcu_sysidle_state ||
-	    cpu_is_offline(rdp->cpu) || rdp->cpu == tick_do_timer_cpu)
+	    cpu_is_offline(rdp->cpu) || tick_timekeeping_cpu(rdp->cpu))
 		return;
 	if (rcu_gp_in_progress(rdp->rsp))
-		WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);
+		WARN_ON_ONCE(!tick_timekeeping_cpu(smp_processor_id()));
 
 	/* Pick up current idle and NMI-nesting counter and check. */
 	cur = atomic_read(&rdtp->dynticks_idle);
@@ -2729,7 +2729,7 @@ bool rcu_sys_is_idle(void)
 	static struct rcu_sysidle_head rsh;
 	int rss = ACCESS_ONCE(full_sysidle_state);
 
-	if (WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu))
+	if (WARN_ON_ONCE(!tick_timekeeping_cpu(smp_processor_id())))
 		return false;
 
 	/* Handle small-system case by doing a full scan of CPUs. */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 04/13] tick: Use timekeeping_cpu() to elect the CPU handling timekeeping duty
  2013-12-17 22:51 [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2013-12-17 22:51 ` [PATCH 03/13] rcu: Exclude all potential timekeepers from sysidle detection Frederic Weisbecker
@ 2013-12-17 22:51 ` Frederic Weisbecker
  2013-12-17 23:55   ` Paul E. McKenney
  2013-12-17 22:51 ` [PATCH 05/13] rcu: Fix unraised IPI to timekeeping CPU Frederic Weisbecker
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2013-12-17 22:51 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Paul E. McKenney, John Stultz,
	Alex Shi, Kevin Hilman

Now that we have an API to determine if a CPU is allowed to handle
timekeeping duty, use it now on timekeeper selection time for clarity.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
---
 kernel/time/tick-common.c | 2 +-
 kernel/time/tick-sched.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 64522ec..755dcd6 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -163,7 +163,7 @@ static void tick_setup_device(struct tick_device *td,
 		 * this cpu:
 		 */
 		if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
-			if (!tick_nohz_full_cpu(cpu))
+			if (tick_timekeeping_cpu(cpu))
 				tick_do_timer_cpu = cpu;
 			else
 				tick_do_timer_cpu = TICK_DO_TIMER_NONE;
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index df6432d..ea0d411 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -118,7 +118,7 @@ static void tick_sched_do_timer(ktime_t now)
 	 * jiffies_lock.
 	 */
 	if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)
-	    && !tick_nohz_full_cpu(cpu))
+	    && tick_timekeeping_cpu(cpu))
 		tick_do_timer_cpu = cpu;
 #endif
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 05/13] rcu: Fix unraised IPI to timekeeping CPU
  2013-12-17 22:51 [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2013-12-17 22:51 ` [PATCH 04/13] tick: Use timekeeping_cpu() to elect the CPU handling timekeeping duty Frederic Weisbecker
@ 2013-12-17 22:51 ` Frederic Weisbecker
  2013-12-17 23:21   ` Paul E. McKenney
  2013-12-18 12:12   ` Peter Zijlstra
  2013-12-17 22:51 ` [PATCH 06/13] nohz: Introduce full dynticks' default timekeeping target Frederic Weisbecker
                   ` (8 subsequent siblings)
  13 siblings, 2 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2013-12-17 22:51 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Paul E. McKenney, John Stultz,
	Alex Shi, Kevin Hilman

The plan with full system idle detection is to allow the timekeeper
to sleep when all full dynticks CPUs are sleeping.

Then when a full dynticks CPU wakes up while the whole system is idle,
it sends an IPI to the timekeeping CPU which then restarts its tick
and polls on its timekeeping duty on behalf of all other CPUs in the
system.

But we are using rcu_kick_nohz_cpu() to raise this IPI, which is wrong
because this function is used to kick full dynticks CPUs when they run
in the kernel for too long without reporting a quiescent state. And
this function ignores targets that are not full dynticks, like our
timekeeper.

To fix this, use the smp_send_reschedule() function directly.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
---
 kernel/rcu/tree_plugin.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 08004da..84d90c8 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2488,7 +2488,7 @@ void rcu_sysidle_force_exit(void)
 				      oldstate, RCU_SYSIDLE_NOT);
 		if (oldstate == newoldstate &&
 		    oldstate == RCU_SYSIDLE_FULL_NOTED) {
-			rcu_kick_nohz_cpu(tick_do_timer_cpu);
+			smp_send_reschedule(tick_do_timer_cpu);
 			return; /* We cleared it, done! */
 		}
 		oldstate = newoldstate;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 06/13] nohz: Introduce full dynticks' default timekeeping target
  2013-12-17 22:51 [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2013-12-17 22:51 ` [PATCH 05/13] rcu: Fix unraised IPI to timekeeping CPU Frederic Weisbecker
@ 2013-12-17 22:51 ` Frederic Weisbecker
  2013-12-17 23:54   ` Paul E. McKenney
  2013-12-17 22:51 ` [PATCH 07/13] sched: Enable IPI reception on timekeeper under nohz full system Frederic Weisbecker
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2013-12-17 22:51 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Paul E. McKenney, John Stultz,
	Alex Shi, Kevin Hilman

When a full dynticks CPU wakes up while the whole rest of the system
is idle, we need to wake up the CPU in charge of the timekeeping duty
handling.

As of today, the CPU that maintains this duty is CPU 0 when
CONFIG_NO_HZ_FULL=y. So referring to tick_do_timer_cpu like we
currently do is correct. But this behaviour is subject to change
in the future because we want to balance the timekeeping duty over all
the CPUs outside the full dynticks range.

As such we now need to define a default timekeeping CPU which receives
the timekeeping wakeup IPIs and which can't be offlined so that it's
guaranteed to always be present for full dynticks CPUs housekeeping.

So lets stick to CPU 0 for this purpose. It's convenient because
rejecting any other CPU's offlining request may result in suspend
failure.

We can optimize this solution later by adaptively sending the IPI
to a potential timekeeping CPU that is already running a non idle task.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
---
 include/linux/tick.h     | 16 ++++++++++++++++
 kernel/rcu/tree_plugin.h |  4 ++--
 kernel/time/tick-sched.c |  2 +-
 3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index cf2fd34..af98d2c 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -180,6 +180,22 @@ static inline bool tick_nohz_full_cpu(int cpu)
 }
 
 /**
+ * tick_timekeeping_default_cpu - seek timekeeping default CPU
+
+ * @return the default target which we send an IPI to
+ * when a full dynticks CPU wakes up and exits from full
+ * system idle state.
+ *
+ * This target is always CPU 0 in full dynticks environment.
+ * If we were to pick up any other CPU, that would result in suspend
+ * failures due to rejected offlining request.
+ */
+static inline int tick_timekeeping_default_cpu(void)
+{
+	return 0;
+}
+
+/**
  * tick_timeeping_cpu - check if a CPU is elligble to handle timekeeping duty
  * @cpu:	the cpu to check
  *
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 84d90c8..1795265 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2488,7 +2488,7 @@ void rcu_sysidle_force_exit(void)
 				      oldstate, RCU_SYSIDLE_NOT);
 		if (oldstate == newoldstate &&
 		    oldstate == RCU_SYSIDLE_FULL_NOTED) {
-			smp_send_reschedule(tick_do_timer_cpu);
+			smp_send_reschedule(tick_timekeeping_default_cpu());
 			return; /* We cleared it, done! */
 		}
 		oldstate = newoldstate;
@@ -2597,7 +2597,7 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp)
  */
 static void rcu_bind_gp_kthread(void)
 {
-	int cpu = ACCESS_ONCE(tick_do_timer_cpu);
+	int cpu = tick_timekeeping_default_cpu();
 
 	if (cpu < 0 || cpu >= nr_cpu_ids)
 		return;
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index ea0d411..9a91c31 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -305,7 +305,7 @@ static int tick_nohz_cpu_down_callback(struct notifier_block *nfb,
 		 * If we handle the timekeeping duty for full dynticks CPUs,
 		 * we can't safely shutdown that CPU.
 		 */
-		if (tick_nohz_full_running && tick_do_timer_cpu == cpu)
+		if (tick_nohz_full_running && tick_timekeeping_default_cpu() == cpu)
 			return NOTIFY_BAD;
 		break;
 	}
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 07/13] sched: Enable IPI reception on timekeeper under nohz full system
  2013-12-17 22:51 [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep Frederic Weisbecker
                   ` (5 preceding siblings ...)
  2013-12-17 22:51 ` [PATCH 06/13] nohz: Introduce full dynticks' default timekeeping target Frederic Weisbecker
@ 2013-12-17 22:51 ` Frederic Weisbecker
  2013-12-17 23:52   ` Paul E. McKenney
  2013-12-18 10:06   ` Peter Zijlstra
  2013-12-17 22:51 ` [PATCH 08/13] nohz: Get timekeeping max deferment outside jiffies_lock Frederic Weisbecker
                   ` (6 subsequent siblings)
  13 siblings, 2 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2013-12-17 22:51 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Paul E. McKenney, John Stultz,
	Alex Shi, Kevin Hilman

We need the default timekeeping CPU to be able to receive IPIs sent
from full dynticks CPUs when they wake up from full system idle state.

Therefore we need an entrypoint from the scheduler IPI so that the
need to poll on timekeeping duty is re-evaluated from irq_exit().

In order to achieve this, lets take the scheduler IPI everytime as long
as there is at least one full dynticks CPU around. Full dynticks CPUs
are interested too in taking scheduler IPIs to reevaluate their tick.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
---
 kernel/sched/core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e85cda2..f46a7bc 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1502,9 +1502,9 @@ void scheduler_ipi(void)
 	if (tif_need_resched())
 		set_preempt_need_resched();
 
-	if (llist_empty(&this_rq()->wake_list)
-			&& !tick_nohz_full_cpu(smp_processor_id())
-			&& !got_nohz_idle_kick())
+	if (llist_empty(&this_rq()->wake_list) &&
+	    !tick_nohz_full_enabled() &&
+	    !got_nohz_idle_kick())
 		return;
 
 	/*
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 08/13] nohz: Get timekeeping max deferment outside jiffies_lock
  2013-12-17 22:51 [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep Frederic Weisbecker
                   ` (6 preceding siblings ...)
  2013-12-17 22:51 ` [PATCH 07/13] sched: Enable IPI reception on timekeeper under nohz full system Frederic Weisbecker
@ 2013-12-17 22:51 ` Frederic Weisbecker
  2014-01-25 14:22   ` [tip:timers/urgent] " tip-bot for Frederic Weisbecker
  2013-12-17 22:51 ` [PATCH 09/13] nohz: Allow timekeeper's tick to stop when all full dynticks CPUs are idle Frederic Weisbecker
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2013-12-17 22:51 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Paul E. McKenney, John Stultz,
	Alex Shi, Kevin Hilman

We don't need to fetch the timekeeping max deferment under the
jiffies_lock seqlock.

If the clocksource is updated concurrently while we stop the tick,
stop machine is called and the tick will be reevaluated again along with
uptodate jiffies and its related values.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
---
 kernel/time/tick-sched.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 9a91c31..0d2d774 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -532,12 +532,13 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
 	u64 time_delta;
 
+	time_delta = timekeeping_max_deferment();
+
 	/* Read jiffies and the time when jiffies were updated last */
 	do {
 		seq = read_seqbegin(&jiffies_lock);
 		last_update = last_jiffies_update;
 		last_jiffies = jiffies;
-		time_delta = timekeeping_max_deferment();
 	} while (read_seqretry(&jiffies_lock, seq));
 
 	if (rcu_needs_cpu(cpu, &rcu_delta_jiffies) ||
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 09/13] nohz: Allow timekeeper's tick to stop when all full dynticks CPUs are idle
  2013-12-17 22:51 [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep Frederic Weisbecker
                   ` (7 preceding siblings ...)
  2013-12-17 22:51 ` [PATCH 08/13] nohz: Get timekeeping max deferment outside jiffies_lock Frederic Weisbecker
@ 2013-12-17 22:51 ` Frederic Weisbecker
  2013-12-17 23:51   ` Paul E. McKenney
  2013-12-17 22:51 ` [PATCH 10/13] nohz: Hand over timekeeping duty on cpu offlining Frederic Weisbecker
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2013-12-17 22:51 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Paul E. McKenney, John Stultz,
	Alex Shi, Kevin Hilman

When all full dynticks CPUs are idle, as detected by RCU's sysidle
detection, there is no need to keep the timekeeping CPU's tick alive
anymore. So lets shut it down when we meet this favourable state. The
timekeeper will be notified with an IPI if any full dynticks CPU
wakes up.

Also, since we plan to allow every CPUs outside the full dynticks range
to handle the timekeeping duty, lets also allow the timekeeping duty
to be balanced. The only requirement is that the last timekeeper can't
shut down its idle tick further than 1 jiffie until some other CPU
takes its duty or until all full dynticks CPUs go to sleep.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
---
 kernel/time/tick-sched.c | 67 ++++++++++++++++++++++++++++++++++++------------
 1 file changed, 50 insertions(+), 17 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 0d2d774..527b501 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -192,6 +192,49 @@ static bool can_stop_full_tick(void)
 	return true;
 }
 
+/*
+ * Fetch max deferment for the current clockevent source until it overflows.
+ * Also in full dynticks environment, make sure the current timekeeper
+ * stays periodic until some other CPU can take its timekeeping duty
+ * or until all full dynticks go to sleep.
+ */
+static u64 tick_timekeeping_max_deferment(struct tick_sched *ts)
+{
+	int cpu;
+	u64 ret = KTIME_MAX;
+
+	/*
+	 * Fast path for full dynticks off-case: skip to
+	 * clockevent max deferment
+	 */
+	if (!tick_nohz_full_enabled())
+		return timekeeping_max_deferment();
+
+	cpu = smp_processor_id();
+
+	/* Full dynticks CPU don't take timekeeping duty */
+	if (!tick_timekeeping_cpu(cpu))
+		return timekeeping_max_deferment();
+
+	/*
+	 * If we are the timekeeper and all full dynticks CPUs are idle,
+	 * then we can finally sleep.
+	 */
+	if (tick_do_timer_cpu == cpu ||
+	    (tick_do_timer_cpu == TICK_DO_TIMER_NONE &&	ts->do_timer_last == 1)) {
+		if (!rcu_sys_is_idle()) {
+			/*
+			 * Stop tick for 1 jiffy. In practice we stay periodic
+			 * but that let us possibly delegate our timekeeping duty
+			 * to stop the tick for real in the future.
+			 */
+			ret = tick_period.tv64;
+		}
+	}
+
+	return min_t(u64, ret, timekeeping_max_deferment());
+}
+
 static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now);
 
 /*
@@ -352,7 +395,12 @@ void __init tick_nohz_init(void)
 	cpulist_scnprintf(nohz_full_buf, sizeof(nohz_full_buf), tick_nohz_full_mask);
 	pr_info("NO_HZ: Full dynticks CPUs: %s.\n", nohz_full_buf);
 }
-#endif
+# else /* CONFIG_NO_HZ_FULL */
+static u64 tick_timekeeping_max_deferment(struct tick_sched *ts)
+{
+	return timekeeping_max_deferment();
+}
+#endif /* CONFIG_NO_HZ_FULL */
 
 /*
  * NOHZ - aka dynamic tick functionality
@@ -532,7 +580,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
 	u64 time_delta;
 
-	time_delta = timekeeping_max_deferment();
+	time_delta = tick_timekeeping_max_deferment(ts);
 
 	/* Read jiffies and the time when jiffies were updated last */
 	do {
@@ -726,21 +774,6 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 		return false;
 	}
 
-	if (tick_nohz_full_enabled()) {
-		/*
-		 * Keep the tick alive to guarantee timekeeping progression
-		 * if there are full dynticks CPUs around
-		 */
-		if (tick_do_timer_cpu == cpu)
-			return false;
-		/*
-		 * Boot safety: make sure the timekeeping duty has been
-		 * assigned before entering dyntick-idle mode,
-		 */
-		if (tick_do_timer_cpu == TICK_DO_TIMER_NONE)
-			return false;
-	}
-
 	return true;
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 10/13] nohz: Hand over timekeeping duty on cpu offlining
  2013-12-17 22:51 [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep Frederic Weisbecker
                   ` (8 preceding siblings ...)
  2013-12-17 22:51 ` [PATCH 09/13] nohz: Allow timekeeper's tick to stop when all full dynticks CPUs are idle Frederic Weisbecker
@ 2013-12-17 22:51 ` Frederic Weisbecker
  2013-12-17 23:40   ` Paul E. McKenney
  2013-12-18 12:30   ` Peter Zijlstra
  2013-12-17 22:51 ` [PATCH 11/13] nohz: Wake up timekeeper on exit from sysidle state Frederic Weisbecker
                   ` (3 subsequent siblings)
  13 siblings, 2 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2013-12-17 22:51 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Paul E. McKenney, John Stultz,
	Alex Shi, Kevin Hilman

When there are full dynticks CPUs around and the timekeeper goes
offline, we have to hand over the timekeeping duty to another potential
timekeeper.

The default timekeeper (aka CPU 0) is the perfect candidate for this
task since it can't be offlined itself.

So lets send an IPI to the default timekeeping when the current
timekeeper goes offline, so that the duty is relayed.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
---
 include/linux/tick.h     |  2 ++
 kernel/time/tick-sched.c | 31 +++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index af98d2c..bd3c32e 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -218,6 +218,7 @@ extern void tick_nohz_init(void);
 extern void __tick_nohz_full_check(void);
 extern void tick_nohz_full_kick(void);
 extern void tick_nohz_full_kick_all(void);
+extern void tick_nohz_full_kick_timekeeping(void);
 extern void __tick_nohz_task_switch(struct task_struct *tsk);
 # else
 static inline void tick_nohz_init(void) { }
@@ -227,6 +228,7 @@ static inline bool tick_timekeeping_cpu(int cpu) { return true; }
 static inline void __tick_nohz_full_check(void) { }
 static inline void tick_nohz_full_kick(void) { }
 static inline void tick_nohz_full_kick_all(void) { }
+static inline void tick_nohz_full_kick_timekeeping(void) { }
 static inline void __tick_nohz_task_switch(struct task_struct *tsk) { }
 #endif
 
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 527b501..94b6901 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -217,6 +217,12 @@ static u64 tick_timekeeping_max_deferment(struct tick_sched *ts)
 		return timekeeping_max_deferment();
 
 	/*
+	 * Order tick_do_timer_cpu read against the IPI, pairs with
+	 * tick_nohz_full_kick_timekeeping()
+	 */
+	smp_rmb();
+
+	/*
 	 * If we are the timekeeper and all full dynticks CPUs are idle,
 	 * then we can finally sleep.
 	 */
@@ -293,6 +299,22 @@ void tick_nohz_full_kick_all(void)
 	preempt_enable();
 }
 
+/**
+ * tick_nohz_full_kick_timekeeping - kick the default timekeeper
+ *
+ * kick the default timekeeper when a secondary timekeeper goes offline.
+ */
+void tick_nohz_full_kick_timekeeping(void)
+{
+	tick_do_timer_cpu = tick_timekeeping_default_cpu();
+	/*
+	 * Order tick_do_timer_cpu against the IPI, pairs with
+	 * tick_timekeeping_max_deferment on irq exit.
+	 */
+	smp_wmb();
+	smp_send_reschedule(tick_timekeeping_default_cpu());
+}
+
 /*
  * Re-evaluate the need for the tick as we switch the current task.
  * It might need the tick due to per task/process properties:
@@ -351,6 +373,15 @@ static int tick_nohz_cpu_down_callback(struct notifier_block *nfb,
 		if (tick_nohz_full_running && tick_timekeeping_default_cpu() == cpu)
 			return NOTIFY_BAD;
 		break;
+
+	case CPU_DYING:
+		/*
+		 * Notify default timekeeper if we are giving up
+		 * timekeeping duty
+		 */
+		if (tick_nohz_full_running && tick_do_timer_cpu == cpu)
+			tick_nohz_full_kick_timekeeping();
+		break;
 	}
 	return NOTIFY_OK;
 }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 11/13] nohz: Wake up timekeeper on exit from sysidle state
  2013-12-17 22:51 [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep Frederic Weisbecker
                   ` (9 preceding siblings ...)
  2013-12-17 22:51 ` [PATCH 10/13] nohz: Hand over timekeeping duty on cpu offlining Frederic Weisbecker
@ 2013-12-17 22:51 ` Frederic Weisbecker
  2013-12-17 23:34   ` Paul E. McKenney
  2013-12-17 22:51 ` [PATCH 12/13] nohz: Allow all CPUs outside nohz_full range to do timekeeping Frederic Weisbecker
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2013-12-17 22:51 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Paul E. McKenney, John Stultz,
	Alex Shi, Kevin Hilman

When a full dynticks CPU wakes up from sysidle state, which means that
all full dynticks CPUs were previously sleeping, it's possible that
all the potential timekeeping CPUs are sleeping as well and nobody
maintains the associated duty.

But full dynticks CPUs don't run the tick by definition so we need
to wake up a timekeeper such that it can handle the timekeeping
duty on behalf of the freshly awoken full dyntick CPU.

To achieve this and ensure that this CPU won't deal with stale
jiffies values, lets wake up the default timekeeper using the right
API.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
---
 kernel/rcu/tree_plugin.h | 2 +-
 kernel/time/tick-sched.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 1795265..b43e32d 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2488,7 +2488,7 @@ void rcu_sysidle_force_exit(void)
 				      oldstate, RCU_SYSIDLE_NOT);
 		if (oldstate == newoldstate &&
 		    oldstate == RCU_SYSIDLE_FULL_NOTED) {
-			smp_send_reschedule(tick_timekeeping_default_cpu());
+			tick_nohz_full_kick_timekeeping();
 			return; /* We cleared it, done! */
 		}
 		oldstate = newoldstate;
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 94b6901..f5ae69f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -302,7 +302,8 @@ void tick_nohz_full_kick_all(void)
 /**
  * tick_nohz_full_kick_timekeeping - kick the default timekeeper
  *
- * kick the default timekeeper when a secondary timekeeper goes offline.
+ * kick the default timekeeper when full dynticks CPUs exit full
+ * system idle state or when a secondary timekeeper goes offline.
  */
 void tick_nohz_full_kick_timekeeping(void)
 {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 12/13] nohz: Allow all CPUs outside nohz_full range to do timekeeping
  2013-12-17 22:51 [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep Frederic Weisbecker
                   ` (10 preceding siblings ...)
  2013-12-17 22:51 ` [PATCH 11/13] nohz: Wake up timekeeper on exit from sysidle state Frederic Weisbecker
@ 2013-12-17 22:51 ` Frederic Weisbecker
  2013-12-17 23:32   ` Paul E. McKenney
  2013-12-17 22:51 ` [PATCH 13/13] nohz_full: fix code style issue of tick_nohz_full_stop_tick Frederic Weisbecker
  2013-12-18  2:04 ` [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep Alex Shi
  13 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2013-12-17 22:51 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Paul E. McKenney, John Stultz,
	Alex Shi, Kevin Hilman

Now that we have all the infrastructure in place and ready to support
timekeeping duty balanced across every non full dynticks CPUs, we can
hereby extend the timekeeping duty affinity.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
---
 include/linux/tick.h | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index bd3c32e..07c02e8 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -203,15 +203,7 @@ static inline int tick_timekeeping_default_cpu(void)
  */
 static inline bool tick_timekeeping_cpu(int cpu)
 {
-	/*
-	 * If there are full dynticks CPUs around,
-	 * CPU 0 must stay periodic to update timekeeping.
-	 */
-	if (tick_nohz_full_enabled())
-		return cpu == 0;
-
-	/* Otherwise any CPU is elligible for timekeeping duty */
-	return true;
+	return !tick_nohz_full_cpu(cpu);
 }
 
 extern void tick_nohz_init(void);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 13/13] nohz_full: fix code style issue of tick_nohz_full_stop_tick
  2013-12-17 22:51 [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep Frederic Weisbecker
                   ` (11 preceding siblings ...)
  2013-12-17 22:51 ` [PATCH 12/13] nohz: Allow all CPUs outside nohz_full range to do timekeeping Frederic Weisbecker
@ 2013-12-17 22:51 ` Frederic Weisbecker
  2013-12-18  2:04 ` [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep Alex Shi
  13 siblings, 0 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2013-12-17 22:51 UTC (permalink / raw)
  To: LKML
  Cc: Alex Shi, Frederic Weisbecker, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Paul E. McKenney, John Stultz,
	Kevin Hilman

From: Alex Shi <alex.shi@linaro.org>

Code usually starts with 'tab' instead of 7 'space' in kernel

Signed-off-by: Alex Shi <alex.shi@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
---
 kernel/time/tick-sched.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index f5ae69f..8c47c6c 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -758,18 +758,18 @@ out:
 static void tick_nohz_full_stop_tick(struct tick_sched *ts)
 {
 #ifdef CONFIG_NO_HZ_FULL
-       int cpu = smp_processor_id();
+	int cpu = smp_processor_id();
 
-       if (!tick_nohz_full_cpu(cpu) || is_idle_task(current))
-               return;
+	if (!tick_nohz_full_cpu(cpu) || is_idle_task(current))
+		return;
 
-       if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
-	       return;
+	if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
+		return;
 
-       if (!can_stop_full_tick())
-               return;
+	if (!can_stop_full_tick())
+		return;
 
-       tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
+	tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
 #endif
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/13] rcu: Fix unraised IPI to timekeeping CPU
  2013-12-17 22:51 ` [PATCH 05/13] rcu: Fix unraised IPI to timekeeping CPU Frederic Weisbecker
@ 2013-12-17 23:21   ` Paul E. McKenney
  2013-12-18 14:13     ` Frederic Weisbecker
  2013-12-18 12:12   ` Peter Zijlstra
  1 sibling, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2013-12-17 23:21 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, John Stultz, Alex Shi, Kevin Hilman

On Tue, Dec 17, 2013 at 11:51:24PM +0100, Frederic Weisbecker wrote:
> The plan with full system idle detection is to allow the timekeeper
> to sleep when all full dynticks CPUs are sleeping.
> 
> Then when a full dynticks CPU wakes up while the whole system is idle,
> it sends an IPI to the timekeeping CPU which then restarts its tick
> and polls on its timekeeping duty on behalf of all other CPUs in the
> system.
> 
> But we are using rcu_kick_nohz_cpu() to raise this IPI, which is wrong
> because this function is used to kick full dynticks CPUs when they run
> in the kernel for too long without reporting a quiescent state. And
> this function ignores targets that are not full dynticks, like our
> timekeeper.
> 
> To fix this, use the smp_send_reschedule() function directly.

I guess the fact that you needed some change is reassuring.  You know
the old saying, "no bugs, no users".  ;-)

> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Alex Shi <alex.shi@linaro.org>
> Cc: Kevin Hilman <khilman@linaro.org>
> ---
>  kernel/rcu/tree_plugin.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 08004da..84d90c8 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2488,7 +2488,7 @@ void rcu_sysidle_force_exit(void)
>  				      oldstate, RCU_SYSIDLE_NOT);
>  		if (oldstate == newoldstate &&
>  		    oldstate == RCU_SYSIDLE_FULL_NOTED) {
> -			rcu_kick_nohz_cpu(tick_do_timer_cpu);
> +			smp_send_reschedule(tick_do_timer_cpu);

Hmmm...

We haven't done any sort of wakeup, and tick_nohz_full_cpu() should
return false for tick_do_timer_cpu, and I don't see that we have
done anything to make got_nohz_idle_kick() return true.

So the idea is that the fact of the interrupt is sufficient, and
the target CPU will figure out that it must turn its scheduling-clock
interrupt when returning from interrupt?

Or is something else going on here?

							Thanx, Paul

>  			return; /* We cleared it, done! */
>  		}
>  		oldstate = newoldstate;
> -- 
> 1.8.3.1
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/13] rcu: Exclude all potential timekeepers from sysidle detection
  2013-12-17 22:51 ` [PATCH 03/13] rcu: Exclude all potential timekeepers from sysidle detection Frederic Weisbecker
@ 2013-12-17 23:27   ` Paul E. McKenney
  2013-12-17 23:49     ` Frederic Weisbecker
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2013-12-17 23:27 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, John Stultz, Alex Shi, Kevin Hilman

On Tue, Dec 17, 2013 at 11:51:22PM +0100, Frederic Weisbecker wrote:
> The purpose of the full system idle detection is to notify the CPU
> handling the timekeeping when the rest of the system is idle so that it
> can sleep when nobody needs the jiffies nor GTOD to be maintained.
> 
> Now this machinery excludes CPU 0 itself from the range of the idle
> detection because if CPU 0 has any non-idle task to execute, it is going
> to restart its own tick since it's guaranteed to be outside the full
> dynticks range. And as it is the only eligible timekeeper when
> CONFIG_NO_HZ_FULL=y anyway, it can handle the timekeeping duty for
> and by itself.
> 
> Still we also plan to extend the timekeepers affinity and allow every CPU
> outside the full dynticks range to handle the timekeeping duty, not just
> CPU 0.
> 
> So once we reach that step, we can state that all CPUs that are not
> full dynticks can be excluded from the full system idle detection,
> simply because those CPUs share the same property than CPU 0 today. When
> a non full dynticks CPU needs to run some busy task, it restarts its
> tick and handles the timekeeping duty for its own needs as is currently
> done under CONFIG_NO_HZ_IDLE=y.
> 
> To prepare for this support in the sysidle detection, we can use the
> tick_timekeeping_cpu() API which checks if a CPU is allowed to handle
> timekeeping duty. If so we can conclude that it's not full dynticks and
> it can maintain timekeeping by itself and as such it can be excluded
> from the sysidle detection.
> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Alex Shi <alex.shi@linaro.org>
> Cc: Kevin Hilman <khilman@linaro.org>

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

A few comments below as well.

> ---
>  kernel/rcu/tree_plugin.h | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 6abb03d..08004da 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h

The rcu_sysidle_force_exit() function uses tick_do_timer_cpu, but
presumably needs to continue doing so in order to whack the right
CPU over the head.  I am happy to defer worrying about the interaction
with multiple timekeeping CPUs for the moment.  ;-)

> @@ -2539,7 +2539,7 @@ static void rcu_sysidle_exit(struct rcu_dynticks *rdtp, int irq)
>  	 * invoke rcu_sysidle_force_exit() directly if it does anything
>  	 * more than take a scheduling-clock interrupt.
>  	 */
> -	if (smp_processor_id() == tick_do_timer_cpu)
> +	if (tick_timekeeping_cpu(smp_processor_id()))
>  		return;
> 
>  	/* Update system-idle state: We are clearly no longer fully idle! */
> @@ -2563,10 +2563,10 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle,
>  	 * is an offline or the timekeeping CPU, nothing to do.
>  	 */
>  	if (!*isidle || rdp->rsp != rcu_sysidle_state ||
> -	    cpu_is_offline(rdp->cpu) || rdp->cpu == tick_do_timer_cpu)
> +	    cpu_is_offline(rdp->cpu) || tick_timekeeping_cpu(rdp->cpu))
>  		return;
>  	if (rcu_gp_in_progress(rdp->rsp))
> -		WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);
> +		WARN_ON_ONCE(!tick_timekeeping_cpu(smp_processor_id()));
> 
>  	/* Pick up current idle and NMI-nesting counter and check. */
>  	cur = atomic_read(&rdtp->dynticks_idle);

The rcu_bind_gp_kthread() uses tick_do_timer_cpu to figure out where
to run.  Is there some CPU mask that it should use instead once there
can be multiple timekeeping CPUs?

> @@ -2729,7 +2729,7 @@ bool rcu_sys_is_idle(void)
>  	static struct rcu_sysidle_head rsh;
>  	int rss = ACCESS_ONCE(full_sysidle_state);
> 
> -	if (WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu))
> +	if (WARN_ON_ONCE(!tick_timekeeping_cpu(smp_processor_id())))
>  		return false;
> 
>  	/* Handle small-system case by doing a full scan of CPUs. */
> -- 
> 1.8.3.1
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 12/13] nohz: Allow all CPUs outside nohz_full range to do timekeeping
  2013-12-17 22:51 ` [PATCH 12/13] nohz: Allow all CPUs outside nohz_full range to do timekeeping Frederic Weisbecker
@ 2013-12-17 23:32   ` Paul E. McKenney
  0 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2013-12-17 23:32 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, John Stultz, Alex Shi, Kevin Hilman

On Tue, Dec 17, 2013 at 11:51:31PM +0100, Frederic Weisbecker wrote:
> Now that we have all the infrastructure in place and ready to support
> timekeeping duty balanced across every non full dynticks CPUs, we can
> hereby extend the timekeeping duty affinity.
> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Alex Shi <alex.shi@linaro.org>
> Cc: Kevin Hilman <khilman@linaro.org>
> ---
>  include/linux/tick.h | 10 +---------
>  1 file changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/include/linux/tick.h b/include/linux/tick.h
> index bd3c32e..07c02e8 100644
> --- a/include/linux/tick.h
> +++ b/include/linux/tick.h
> @@ -203,15 +203,7 @@ static inline int tick_timekeeping_default_cpu(void)
>   */
>  static inline bool tick_timekeeping_cpu(int cpu)
>  {
> -	/*
> -	 * If there are full dynticks CPUs around,
> -	 * CPU 0 must stay periodic to update timekeeping.
> -	 */
> -	if (tick_nohz_full_enabled())
> -		return cpu == 0;
> -
> -	/* Otherwise any CPU is elligible for timekeeping duty */
> -	return true;
> +	return !tick_nohz_full_cpu(cpu);

OK, I guess the future is already here.  ;-)

Is it still OK for RCU to kick tick_do_timer_cpu?  Or are there race
conditions that could result in the wrong CPU being kicked?  Or is
there some guarantee that I missed that says that the timekeeping
CPU cannot change while in RCU_SYSIDLE_FULL_NOTED state?

							Thanx, Paul

>  }
> 
>  extern void tick_nohz_init(void);
> -- 
> 1.8.3.1
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 11/13] nohz: Wake up timekeeper on exit from sysidle state
  2013-12-17 22:51 ` [PATCH 11/13] nohz: Wake up timekeeper on exit from sysidle state Frederic Weisbecker
@ 2013-12-17 23:34   ` Paul E. McKenney
  2013-12-17 23:52     ` Frederic Weisbecker
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2013-12-17 23:34 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, John Stultz, Alex Shi, Kevin Hilman

On Tue, Dec 17, 2013 at 11:51:30PM +0100, Frederic Weisbecker wrote:
> When a full dynticks CPU wakes up from sysidle state, which means that
> all full dynticks CPUs were previously sleeping, it's possible that
> all the potential timekeeping CPUs are sleeping as well and nobody
> maintains the associated duty.
> 
> But full dynticks CPUs don't run the tick by definition so we need
> to wake up a timekeeper such that it can handle the timekeeping
> duty on behalf of the freshly awoken full dyntick CPU.
> 
> To achieve this and ensure that this CPU won't deal with stale
> jiffies values, lets wake up the default timekeeper using the right
> API.
> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Alex Shi <alex.shi@linaro.org>
> Cc: Kevin Hilman <khilman@linaro.org>
> ---
>  kernel/rcu/tree_plugin.h | 2 +-
>  kernel/time/tick-sched.c | 3 ++-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 1795265..b43e32d 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2488,7 +2488,7 @@ void rcu_sysidle_force_exit(void)
>  				      oldstate, RCU_SYSIDLE_NOT);
>  		if (oldstate == newoldstate &&
>  		    oldstate == RCU_SYSIDLE_FULL_NOTED) {
> -			smp_send_reschedule(tick_timekeeping_default_cpu());
> +			tick_nohz_full_kick_timekeeping();

OK, I guess I should look at the patches in order.  So yes, it is no
longer safe to just kick tick_do_timer_cpu.  ;-)

Never mind my question on patch 12/13 in this series.

							Thanx, Paul

>  			return; /* We cleared it, done! */
>  		}
>  		oldstate = newoldstate;
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 94b6901..f5ae69f 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -302,7 +302,8 @@ void tick_nohz_full_kick_all(void)
>  /**
>   * tick_nohz_full_kick_timekeeping - kick the default timekeeper
>   *
> - * kick the default timekeeper when a secondary timekeeper goes offline.
> + * kick the default timekeeper when full dynticks CPUs exit full
> + * system idle state or when a secondary timekeeper goes offline.
>   */
>  void tick_nohz_full_kick_timekeeping(void)
>  {
> -- 
> 1.8.3.1
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 10/13] nohz: Hand over timekeeping duty on cpu offlining
  2013-12-17 22:51 ` [PATCH 10/13] nohz: Hand over timekeeping duty on cpu offlining Frederic Weisbecker
@ 2013-12-17 23:40   ` Paul E. McKenney
  2013-12-18 14:19     ` Frederic Weisbecker
  2013-12-18 12:30   ` Peter Zijlstra
  1 sibling, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2013-12-17 23:40 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, John Stultz, Alex Shi, Kevin Hilman

On Tue, Dec 17, 2013 at 11:51:29PM +0100, Frederic Weisbecker wrote:
> When there are full dynticks CPUs around and the timekeeper goes
> offline, we have to hand over the timekeeping duty to another potential
> timekeeper.
> 
> The default timekeeper (aka CPU 0) is the perfect candidate for this
> task since it can't be offlined itself.
> 
> So lets send an IPI to the default timekeeping when the current
> timekeeper goes offline, so that the duty is relayed.

A few comments below.

							Thanx, Paul

> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Alex Shi <alex.shi@linaro.org>
> Cc: Kevin Hilman <khilman@linaro.org>
> ---
>  include/linux/tick.h     |  2 ++
>  kernel/time/tick-sched.c | 31 +++++++++++++++++++++++++++++++
>  2 files changed, 33 insertions(+)
> 
> diff --git a/include/linux/tick.h b/include/linux/tick.h
> index af98d2c..bd3c32e 100644
> --- a/include/linux/tick.h
> +++ b/include/linux/tick.h
> @@ -218,6 +218,7 @@ extern void tick_nohz_init(void);
>  extern void __tick_nohz_full_check(void);
>  extern void tick_nohz_full_kick(void);
>  extern void tick_nohz_full_kick_all(void);
> +extern void tick_nohz_full_kick_timekeeping(void);
>  extern void __tick_nohz_task_switch(struct task_struct *tsk);
>  # else
>  static inline void tick_nohz_init(void) { }
> @@ -227,6 +228,7 @@ static inline bool tick_timekeeping_cpu(int cpu) { return true; }
>  static inline void __tick_nohz_full_check(void) { }
>  static inline void tick_nohz_full_kick(void) { }
>  static inline void tick_nohz_full_kick_all(void) { }
> +static inline void tick_nohz_full_kick_timekeeping(void) { }
>  static inline void __tick_nohz_task_switch(struct task_struct *tsk) { }
>  #endif
> 
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 527b501..94b6901 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -217,6 +217,12 @@ static u64 tick_timekeeping_max_deferment(struct tick_sched *ts)
>  		return timekeeping_max_deferment();
> 
>  	/*
> +	 * Order tick_do_timer_cpu read against the IPI, pairs with
> +	 * tick_nohz_full_kick_timekeeping()
> +	 */
> +	smp_rmb();

If this is the handler for the smp_send_reschedule(), then the above
memory barrier is not needed.  (See my comment below.)

> +
> +	/*
>  	 * If we are the timekeeper and all full dynticks CPUs are idle,
>  	 * then we can finally sleep.
>  	 */
> @@ -293,6 +299,22 @@ void tick_nohz_full_kick_all(void)
>  	preempt_enable();
>  }
> 
> +/**
> + * tick_nohz_full_kick_timekeeping - kick the default timekeeper
> + *
> + * kick the default timekeeper when a secondary timekeeper goes offline.
> + */
> +void tick_nohz_full_kick_timekeeping(void)
> +{
> +	tick_do_timer_cpu = tick_timekeeping_default_cpu();
> +	/*
> +	 * Order tick_do_timer_cpu against the IPI, pairs with
> +	 * tick_timekeeping_max_deferment on irq exit.
> +	 */
> +	smp_wmb();

But the IPI is supposed to provide full ordering between the CPU invoking
the IPI and the IPI handler, right?  I do not believe that you need
the above smp_wmb() -- though keeping the comment stating that you are
relying on the implicit barrier in IPI would be good.

> +	smp_send_reschedule(tick_timekeeping_default_cpu());

Again, smp_send_reschedule()'s IPI hander does not necessarily do
anything if there is nothing for the scheduler to do, so any needed
actions are taking in the return-from-interrupt code?

> +}
> +
>  /*
>   * Re-evaluate the need for the tick as we switch the current task.
>   * It might need the tick due to per task/process properties:
> @@ -351,6 +373,15 @@ static int tick_nohz_cpu_down_callback(struct notifier_block *nfb,
>  		if (tick_nohz_full_running && tick_timekeeping_default_cpu() == cpu)
>  			return NOTIFY_BAD;
>  		break;
> +
> +	case CPU_DYING:
> +		/*
> +		 * Notify default timekeeper if we are giving up
> +		 * timekeeping duty
> +		 */
> +		if (tick_nohz_full_running && tick_do_timer_cpu == cpu)
> +			tick_nohz_full_kick_timekeeping();
> +		break;
>  	}
>  	return NOTIFY_OK;
>  }
> -- 
> 1.8.3.1
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/13] rcu: Exclude all potential timekeepers from sysidle detection
  2013-12-17 23:27   ` Paul E. McKenney
@ 2013-12-17 23:49     ` Frederic Weisbecker
  2013-12-18 11:43       ` Peter Zijlstra
  0 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2013-12-17 23:49 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, John Stultz, Alex Shi, Kevin Hilman

On Tue, Dec 17, 2013 at 03:27:14PM -0800, Paul E. McKenney wrote:
> On Tue, Dec 17, 2013 at 11:51:22PM +0100, Frederic Weisbecker wrote:
> > The purpose of the full system idle detection is to notify the CPU
> > handling the timekeeping when the rest of the system is idle so that it
> > can sleep when nobody needs the jiffies nor GTOD to be maintained.
> > 
> > Now this machinery excludes CPU 0 itself from the range of the idle
> > detection because if CPU 0 has any non-idle task to execute, it is going
> > to restart its own tick since it's guaranteed to be outside the full
> > dynticks range. And as it is the only eligible timekeeper when
> > CONFIG_NO_HZ_FULL=y anyway, it can handle the timekeeping duty for
> > and by itself.
> > 
> > Still we also plan to extend the timekeepers affinity and allow every CPU
> > outside the full dynticks range to handle the timekeeping duty, not just
> > CPU 0.
> > 
> > So once we reach that step, we can state that all CPUs that are not
> > full dynticks can be excluded from the full system idle detection,
> > simply because those CPUs share the same property than CPU 0 today. When
> > a non full dynticks CPU needs to run some busy task, it restarts its
> > tick and handles the timekeeping duty for its own needs as is currently
> > done under CONFIG_NO_HZ_IDLE=y.
> > 
> > To prepare for this support in the sysidle detection, we can use the
> > tick_timekeeping_cpu() API which checks if a CPU is allowed to handle
> > timekeeping duty. If so we can conclude that it's not full dynticks and
> > it can maintain timekeeping by itself and as such it can be excluded
> > from the sysidle detection.
> > 
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: John Stultz <john.stultz@linaro.org>
> > Cc: Alex Shi <alex.shi@linaro.org>
> > Cc: Kevin Hilman <khilman@linaro.org>
> 
> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Thanks!

> 
> A few comments below as well.
> 
> > ---
> >  kernel/rcu/tree_plugin.h | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 6abb03d..08004da 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> 
> The rcu_sysidle_force_exit() function uses tick_do_timer_cpu, but
> presumably needs to continue doing so in order to whack the right
> CPU over the head.  I am happy to defer worrying about the interaction
> with multiple timekeeping CPUs for the moment.  ;-)

So yeah, I changed that in "nohz: Wake up timekeeper on exit from sysidle state".
We always target CPU 0 for the IPI.

That's to start simple as CPU 0 can't be offlined. But we certainly want
to optimize that by kicking a potential timekeeper that is not idle.

But that require a lookup like get_timer_nohz_target() and some safety against
CPU hotplug. So that needs more thought :)

> 
> > @@ -2539,7 +2539,7 @@ static void rcu_sysidle_exit(struct rcu_dynticks *rdtp, int irq)
> >  	 * invoke rcu_sysidle_force_exit() directly if it does anything
> >  	 * more than take a scheduling-clock interrupt.
> >  	 */
> > -	if (smp_processor_id() == tick_do_timer_cpu)
> > +	if (tick_timekeeping_cpu(smp_processor_id()))
> >  		return;
> > 
> >  	/* Update system-idle state: We are clearly no longer fully idle! */
> > @@ -2563,10 +2563,10 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle,
> >  	 * is an offline or the timekeeping CPU, nothing to do.
> >  	 */
> >  	if (!*isidle || rdp->rsp != rcu_sysidle_state ||
> > -	    cpu_is_offline(rdp->cpu) || rdp->cpu == tick_do_timer_cpu)
> > +	    cpu_is_offline(rdp->cpu) || tick_timekeeping_cpu(rdp->cpu))
> >  		return;
> >  	if (rcu_gp_in_progress(rdp->rsp))
> > -		WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);
> > +		WARN_ON_ONCE(!tick_timekeeping_cpu(smp_processor_id()));
> > 
> >  	/* Pick up current idle and NMI-nesting counter and check. */
> >  	cur = atomic_read(&rdtp->dynticks_idle);
> 
> The rcu_bind_gp_kthread() uses tick_do_timer_cpu to figure out where
> to run.  Is there some CPU mask that it should use instead once there
> can be multiple timekeeping CPUs?

Good point, we'll need to build one, or use ~nohz_full_mask

Thanks.

> 
> > @@ -2729,7 +2729,7 @@ bool rcu_sys_is_idle(void)
> >  	static struct rcu_sysidle_head rsh;
> >  	int rss = ACCESS_ONCE(full_sysidle_state);
> > 
> > -	if (WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu))
> > +	if (WARN_ON_ONCE(!tick_timekeeping_cpu(smp_processor_id())))
> >  		return false;
> > 
> >  	/* Handle small-system case by doing a full scan of CPUs. */
> > -- 
> > 1.8.3.1
> > 
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 09/13] nohz: Allow timekeeper's tick to stop when all full dynticks CPUs are idle
  2013-12-17 22:51 ` [PATCH 09/13] nohz: Allow timekeeper's tick to stop when all full dynticks CPUs are idle Frederic Weisbecker
@ 2013-12-17 23:51   ` Paul E. McKenney
  2013-12-18 14:36     ` Frederic Weisbecker
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2013-12-17 23:51 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, John Stultz, Alex Shi, Kevin Hilman

On Tue, Dec 17, 2013 at 11:51:28PM +0100, Frederic Weisbecker wrote:
> When all full dynticks CPUs are idle, as detected by RCU's sysidle
> detection, there is no need to keep the timekeeping CPU's tick alive
> anymore. So lets shut it down when we meet this favourable state. The
> timekeeper will be notified with an IPI if any full dynticks CPU
> wakes up.
> 
> Also, since we plan to allow every CPUs outside the full dynticks range
> to handle the timekeeping duty, lets also allow the timekeeping duty
> to be balanced. The only requirement is that the last timekeeper can't
> shut down its idle tick further than 1 jiffie until some other CPU
> takes its duty or until all full dynticks CPUs go to sleep.

Some questions below...

							Thanx, Paul

> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Alex Shi <alex.shi@linaro.org>
> Cc: Kevin Hilman <khilman@linaro.org>
> ---
>  kernel/time/tick-sched.c | 67 ++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 50 insertions(+), 17 deletions(-)
> 
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 0d2d774..527b501 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -192,6 +192,49 @@ static bool can_stop_full_tick(void)
>  	return true;
>  }
> 
> +/*
> + * Fetch max deferment for the current clockevent source until it overflows.
> + * Also in full dynticks environment, make sure the current timekeeper
> + * stays periodic until some other CPU can take its timekeeping duty
> + * or until all full dynticks go to sleep.
> + */
> +static u64 tick_timekeeping_max_deferment(struct tick_sched *ts)
> +{
> +	int cpu;
> +	u64 ret = KTIME_MAX;
> +
> +	/*
> +	 * Fast path for full dynticks off-case: skip to
> +	 * clockevent max deferment
> +	 */
> +	if (!tick_nohz_full_enabled())
> +		return timekeeping_max_deferment();
> +
> +	cpu = smp_processor_id();
> +
> +	/* Full dynticks CPU don't take timekeeping duty */
> +	if (!tick_timekeeping_cpu(cpu))
> +		return timekeeping_max_deferment();
> +
> +	/*
> +	 * If we are the timekeeper and all full dynticks CPUs are idle,
> +	 * then we can finally sleep.
> +	 */
> +	if (tick_do_timer_cpu == cpu ||
> +	    (tick_do_timer_cpu == TICK_DO_TIMER_NONE &&	ts->do_timer_last == 1)) {
> +		if (!rcu_sys_is_idle()) {

So multiple CPUs could call rcu_sys_is_idle()?  Seems like this could
happen if tick_do_timer_cpu == TICK_DO_TIMER_NONE.  This would be OK only
if tick_timekeeping_cpu() returns true for one and only one of the CPUs
at any given range of time -- and also that no one calls rcu_sys_is_idle()
during a timekeeping CPU handoff.

If two different CPUs call rcu_sys_is_idle() anywhere nearly concurrently
on a small system (CONFIG_NO_HZ_FULL_SYSIDLE_SMALL), rcu_sys_is_idle()
will break and you will have voided your warranty.  ;-)

Also, if tick_timekeeping_cpu() doesn't think that there is a timekeeping
CPU, rcu_sys_is_idle() will always return false.  I think that this is
what you want to happen, just checking.

> +			/*
> +			 * Stop tick for 1 jiffy. In practice we stay periodic
> +			 * but that let us possibly delegate our timekeeping duty
> +			 * to stop the tick for real in the future.
> +			 */
> +			ret = tick_period.tv64;
> +		}

Do we need to set tick_do_timer_cpu to cpu?  Or is that handled elsewhere?
(If this is the boot-safety feature deleted below, could we please have
the comment back here?)

> +	}
> +
> +	return min_t(u64, ret, timekeeping_max_deferment());
> +}
> +
>  static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now);
> 
>  /*
> @@ -352,7 +395,12 @@ void __init tick_nohz_init(void)
>  	cpulist_scnprintf(nohz_full_buf, sizeof(nohz_full_buf), tick_nohz_full_mask);
>  	pr_info("NO_HZ: Full dynticks CPUs: %s.\n", nohz_full_buf);
>  }
> -#endif
> +# else /* CONFIG_NO_HZ_FULL */
> +static u64 tick_timekeeping_max_deferment(struct tick_sched *ts)
> +{
> +	return timekeeping_max_deferment();
> +}
> +#endif /* CONFIG_NO_HZ_FULL */
> 
>  /*
>   * NOHZ - aka dynamic tick functionality
> @@ -532,7 +580,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
>  	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
>  	u64 time_delta;
> 
> -	time_delta = timekeeping_max_deferment();
> +	time_delta = tick_timekeeping_max_deferment(ts);
> 
>  	/* Read jiffies and the time when jiffies were updated last */
>  	do {
> @@ -726,21 +774,6 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
>  		return false;
>  	}
> 
> -	if (tick_nohz_full_enabled()) {
> -		/*
> -		 * Keep the tick alive to guarantee timekeeping progression
> -		 * if there are full dynticks CPUs around
> -		 */
> -		if (tick_do_timer_cpu == cpu)
> -			return false;
> -		/*
> -		 * Boot safety: make sure the timekeeping duty has been
> -		 * assigned before entering dyntick-idle mode,
> -		 */
> -		if (tick_do_timer_cpu == TICK_DO_TIMER_NONE)
> -			return false;
> -	}
> -
>  	return true;
>  }
> 
> -- 
> 1.8.3.1
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 11/13] nohz: Wake up timekeeper on exit from sysidle state
  2013-12-17 23:34   ` Paul E. McKenney
@ 2013-12-17 23:52     ` Frederic Weisbecker
  0 siblings, 0 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2013-12-17 23:52 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, John Stultz, Alex Shi, Kevin Hilman

On Tue, Dec 17, 2013 at 03:34:54PM -0800, Paul E. McKenney wrote:
> On Tue, Dec 17, 2013 at 11:51:30PM +0100, Frederic Weisbecker wrote:
> > When a full dynticks CPU wakes up from sysidle state, which means that
> > all full dynticks CPUs were previously sleeping, it's possible that
> > all the potential timekeeping CPUs are sleeping as well and nobody
> > maintains the associated duty.
> > 
> > But full dynticks CPUs don't run the tick by definition so we need
> > to wake up a timekeeper such that it can handle the timekeeping
> > duty on behalf of the freshly awoken full dyntick CPU.
> > 
> > To achieve this and ensure that this CPU won't deal with stale
> > jiffies values, lets wake up the default timekeeper using the right
> > API.
> > 
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: John Stultz <john.stultz@linaro.org>
> > Cc: Alex Shi <alex.shi@linaro.org>
> > Cc: Kevin Hilman <khilman@linaro.org>
> > ---
> >  kernel/rcu/tree_plugin.h | 2 +-
> >  kernel/time/tick-sched.c | 3 ++-
> >  2 files changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 1795265..b43e32d 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -2488,7 +2488,7 @@ void rcu_sysidle_force_exit(void)
> >  				      oldstate, RCU_SYSIDLE_NOT);
> >  		if (oldstate == newoldstate &&
> >  		    oldstate == RCU_SYSIDLE_FULL_NOTED) {
> > -			smp_send_reschedule(tick_timekeeping_default_cpu());
> > +			tick_nohz_full_kick_timekeeping();
> 
> OK, I guess I should look at the patches in order.  So yes, it is no
> longer safe to just kick tick_do_timer_cpu.  ;-)

Hehe, I do linear reviews as well ;)

Indeed it's no longer safe, especially since tick_do_timer_cpu can be TICK_DO_TIMER_NONE.
So this always kick CPU 0 instead (for now).

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 07/13] sched: Enable IPI reception on timekeeper under nohz full system
  2013-12-17 22:51 ` [PATCH 07/13] sched: Enable IPI reception on timekeeper under nohz full system Frederic Weisbecker
@ 2013-12-17 23:52   ` Paul E. McKenney
  2013-12-18 14:49     ` Frederic Weisbecker
  2013-12-18 10:06   ` Peter Zijlstra
  1 sibling, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2013-12-17 23:52 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, John Stultz, Alex Shi, Kevin Hilman

On Tue, Dec 17, 2013 at 11:51:26PM +0100, Frederic Weisbecker wrote:
> We need the default timekeeping CPU to be able to receive IPIs sent
> from full dynticks CPUs when they wake up from full system idle state.
> 
> Therefore we need an entrypoint from the scheduler IPI so that the
> need to poll on timekeeping duty is re-evaluated from irq_exit().
> 
> In order to achieve this, lets take the scheduler IPI everytime as long
> as there is at least one full dynticks CPU around. Full dynticks CPUs
> are interested too in taking scheduler IPIs to reevaluate their tick.
> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Alex Shi <alex.shi@linaro.org>
> Cc: Kevin Hilman <khilman@linaro.org>
> ---
>  kernel/sched/core.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index e85cda2..f46a7bc 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1502,9 +1502,9 @@ void scheduler_ipi(void)
>  	if (tif_need_resched())
>  		set_preempt_need_resched();
> 
> -	if (llist_empty(&this_rq()->wake_list)
> -			&& !tick_nohz_full_cpu(smp_processor_id())
> -			&& !got_nohz_idle_kick())
> +	if (llist_empty(&this_rq()->wake_list) &&
> +	    !tick_nohz_full_enabled() &&
> +	    !got_nohz_idle_kick())
>  		return;

OK, this is what I was missing in my question about whether the
NO_HZ_FULL state was re-evaluated in the interrupt-return path.

						Thanx, Paul

>  	/*
> -- 
> 1.8.3.1
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 06/13] nohz: Introduce full dynticks' default timekeeping target
  2013-12-17 22:51 ` [PATCH 06/13] nohz: Introduce full dynticks' default timekeeping target Frederic Weisbecker
@ 2013-12-17 23:54   ` Paul E. McKenney
  0 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2013-12-17 23:54 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, John Stultz, Alex Shi, Kevin Hilman

On Tue, Dec 17, 2013 at 11:51:25PM +0100, Frederic Weisbecker wrote:
> When a full dynticks CPU wakes up while the whole rest of the system
> is idle, we need to wake up the CPU in charge of the timekeeping duty
> handling.
> 
> As of today, the CPU that maintains this duty is CPU 0 when
> CONFIG_NO_HZ_FULL=y. So referring to tick_do_timer_cpu like we
> currently do is correct. But this behaviour is subject to change
> in the future because we want to balance the timekeeping duty over all
> the CPUs outside the full dynticks range.
> 
> As such we now need to define a default timekeeping CPU which receives
> the timekeeping wakeup IPIs and which can't be offlined so that it's
> guaranteed to always be present for full dynticks CPUs housekeeping.
> 
> So lets stick to CPU 0 for this purpose. It's convenient because
> rejecting any other CPU's offlining request may result in suspend
> failure.

OK, so not multiple CPUs yet.  Whew!

Well, at least you know some of my concerns with multiple timekeeping
CPUs beforehand, which would not have happened had I reviewed the
patches in order.  ;-)

							Thanx, Paul

> We can optimize this solution later by adaptively sending the IPI
> to a potential timekeeping CPU that is already running a non idle task.
> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Alex Shi <alex.shi@linaro.org>
> Cc: Kevin Hilman <khilman@linaro.org>
> ---
>  include/linux/tick.h     | 16 ++++++++++++++++
>  kernel/rcu/tree_plugin.h |  4 ++--
>  kernel/time/tick-sched.c |  2 +-
>  3 files changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/tick.h b/include/linux/tick.h
> index cf2fd34..af98d2c 100644
> --- a/include/linux/tick.h
> +++ b/include/linux/tick.h
> @@ -180,6 +180,22 @@ static inline bool tick_nohz_full_cpu(int cpu)
>  }
> 
>  /**
> + * tick_timekeeping_default_cpu - seek timekeeping default CPU
> +
> + * @return the default target which we send an IPI to
> + * when a full dynticks CPU wakes up and exits from full
> + * system idle state.
> + *
> + * This target is always CPU 0 in full dynticks environment.
> + * If we were to pick up any other CPU, that would result in suspend
> + * failures due to rejected offlining request.
> + */
> +static inline int tick_timekeeping_default_cpu(void)
> +{
> +	return 0;
> +}
> +
> +/**
>   * tick_timeeping_cpu - check if a CPU is elligble to handle timekeeping duty
>   * @cpu:	the cpu to check
>   *
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 84d90c8..1795265 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2488,7 +2488,7 @@ void rcu_sysidle_force_exit(void)
>  				      oldstate, RCU_SYSIDLE_NOT);
>  		if (oldstate == newoldstate &&
>  		    oldstate == RCU_SYSIDLE_FULL_NOTED) {
> -			smp_send_reschedule(tick_do_timer_cpu);
> +			smp_send_reschedule(tick_timekeeping_default_cpu());
>  			return; /* We cleared it, done! */
>  		}
>  		oldstate = newoldstate;
> @@ -2597,7 +2597,7 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp)
>   */
>  static void rcu_bind_gp_kthread(void)
>  {
> -	int cpu = ACCESS_ONCE(tick_do_timer_cpu);
> +	int cpu = tick_timekeeping_default_cpu();
> 
>  	if (cpu < 0 || cpu >= nr_cpu_ids)
>  		return;
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index ea0d411..9a91c31 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -305,7 +305,7 @@ static int tick_nohz_cpu_down_callback(struct notifier_block *nfb,
>  		 * If we handle the timekeeping duty for full dynticks CPUs,
>  		 * we can't safely shutdown that CPU.
>  		 */
> -		if (tick_nohz_full_running && tick_do_timer_cpu == cpu)
> +		if (tick_nohz_full_running && tick_timekeeping_default_cpu() == cpu)
>  			return NOTIFY_BAD;
>  		break;
>  	}
> -- 
> 1.8.3.1
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 04/13] tick: Use timekeeping_cpu() to elect the CPU handling timekeeping duty
  2013-12-17 22:51 ` [PATCH 04/13] tick: Use timekeeping_cpu() to elect the CPU handling timekeeping duty Frederic Weisbecker
@ 2013-12-17 23:55   ` Paul E. McKenney
  0 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2013-12-17 23:55 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, John Stultz, Alex Shi, Kevin Hilman

On Tue, Dec 17, 2013 at 11:51:23PM +0100, Frederic Weisbecker wrote:
> Now that we have an API to determine if a CPU is allowed to handle
> timekeeping duty, use it now on timekeeper selection time for clarity.
> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Alex Shi <alex.shi@linaro.org>
> Cc: Kevin Hilman <khilman@linaro.org>

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> ---
>  kernel/time/tick-common.c | 2 +-
>  kernel/time/tick-sched.c  | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
> index 64522ec..755dcd6 100644
> --- a/kernel/time/tick-common.c
> +++ b/kernel/time/tick-common.c
> @@ -163,7 +163,7 @@ static void tick_setup_device(struct tick_device *td,
>  		 * this cpu:
>  		 */
>  		if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
> -			if (!tick_nohz_full_cpu(cpu))
> +			if (tick_timekeeping_cpu(cpu))
>  				tick_do_timer_cpu = cpu;
>  			else
>  				tick_do_timer_cpu = TICK_DO_TIMER_NONE;
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index df6432d..ea0d411 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -118,7 +118,7 @@ static void tick_sched_do_timer(ktime_t now)
>  	 * jiffies_lock.
>  	 */
>  	if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)
> -	    && !tick_nohz_full_cpu(cpu))
> +	    && tick_timekeeping_cpu(cpu))
>  		tick_do_timer_cpu = cpu;
>  #endif
> 
> -- 
> 1.8.3.1
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep
  2013-12-17 22:51 [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep Frederic Weisbecker
                   ` (12 preceding siblings ...)
  2013-12-17 22:51 ` [PATCH 13/13] nohz_full: fix code style issue of tick_nohz_full_stop_tick Frederic Weisbecker
@ 2013-12-18  2:04 ` Alex Shi
  2013-12-18 10:19   ` Peter Zijlstra
  2013-12-18 17:43   ` Frederic Weisbecker
  13 siblings, 2 replies; 57+ messages in thread
From: Alex Shi @ 2013-12-18  2:04 UTC (permalink / raw)
  To: Frederic Weisbecker, LKML
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
	Paul E. McKenney, John Stultz, Kevin Hilman

On 12/18/2013 06:51 AM, Frederic Weisbecker wrote:
> So this is what this series brings, more details following:
> 
> * Some code, naming and whitespace cleanups
> 
> * Allow all CPUs outside the nohz_full range to handle the timekeeping
>   duty, not just CPU 0. Balancing the timekeeping duty should improve
>   powersavings.

If the system just has one nohz_full cpu running, it will need another
cpu to do timerkeeper job. Then the system roughly needs 2 cpu living.
>From powersaving POV, that is not good compare to normal nohz idle.
> 
> * Let the timekeeper (including CPU 0) sleep when its duty is
>   handed over to another CPU
> 
> * Allow timekeeper to sleep when all full dynticks CPUs are sleeping
>   (plug nohz to RCU sysidle detection)

Thanks Fredic!
It is much better on powersaving POV compare to current nohz_full. :)
> 
> * Wake up timekeeper with an IPI when full dynticks CPUs exit sysidle
>   state
> 
> * Wake up CPU 0 when a secondary timekeeper is offlined so that its
>   duty gets migrated


-- 
Thanks
    Alex

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 07/13] sched: Enable IPI reception on timekeeper under nohz full system
  2013-12-17 22:51 ` [PATCH 07/13] sched: Enable IPI reception on timekeeper under nohz full system Frederic Weisbecker
  2013-12-17 23:52   ` Paul E. McKenney
@ 2013-12-18 10:06   ` Peter Zijlstra
  1 sibling, 0 replies; 57+ messages in thread
From: Peter Zijlstra @ 2013-12-18 10:06 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Steven Rostedt,
	Paul E. McKenney, John Stultz, Alex Shi, Kevin Hilman

On Tue, Dec 17, 2013 at 11:51:26PM +0100, Frederic Weisbecker wrote:
> We need the default timekeeping CPU to be able to receive IPIs sent
> from full dynticks CPUs when they wake up from full system idle state.
> 
> Therefore we need an entrypoint from the scheduler IPI so that the
> need to poll on timekeeping duty is re-evaluated from irq_exit().
> 
> In order to achieve this, lets take the scheduler IPI everytime as long
> as there is at least one full dynticks CPU around. Full dynticks CPUs
> are interested too in taking scheduler IPIs to reevaluate their tick.

I should probably read the other patches in this series, but the above
makes my brain go *beeeep* and throw an error.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep
  2013-12-18  2:04 ` [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep Alex Shi
@ 2013-12-18 10:19   ` Peter Zijlstra
  2013-12-18 14:18     ` Paul E. McKenney
  2013-12-18 17:43   ` Frederic Weisbecker
  1 sibling, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2013-12-18 10:19 UTC (permalink / raw)
  To: Alex Shi
  Cc: Frederic Weisbecker, LKML, Thomas Gleixner, Ingo Molnar,
	Steven Rostedt, Paul E. McKenney, John Stultz, Kevin Hilman

On Wed, Dec 18, 2013 at 10:04:43AM +0800, Alex Shi wrote:
> On 12/18/2013 06:51 AM, Frederic Weisbecker wrote:
> Thanks Fredic!
> It is much better on powersaving POV compare to current nohz_full. :)

If you're using nohz_full for powersaving, you're doing it wrong.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/13] rcu: Exclude all potential timekeepers from sysidle detection
  2013-12-17 23:49     ` Frederic Weisbecker
@ 2013-12-18 11:43       ` Peter Zijlstra
  2013-12-18 11:46         ` Peter Zijlstra
                           ` (2 more replies)
  0 siblings, 3 replies; 57+ messages in thread
From: Peter Zijlstra @ 2013-12-18 11:43 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E. McKenney, LKML, Thomas Gleixner, Ingo Molnar,
	Steven Rostedt, John Stultz, Alex Shi, Kevin Hilman

On Wed, Dec 18, 2013 at 12:49:15AM +0100, Frederic Weisbecker wrote:
> That's to start simple as CPU 0 can't be offlined.

Yes it can, see CONFIG_BOOTPARAM_HOTPLUG_CPU0

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/13] rcu: Exclude all potential timekeepers from sysidle detection
  2013-12-18 11:43       ` Peter Zijlstra
@ 2013-12-18 11:46         ` Peter Zijlstra
  2013-12-18 14:15         ` Paul E. McKenney
  2013-12-18 16:24         ` Frederic Weisbecker
  2 siblings, 0 replies; 57+ messages in thread
From: Peter Zijlstra @ 2013-12-18 11:46 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E. McKenney, LKML, Thomas Gleixner, Ingo Molnar,
	Steven Rostedt, John Stultz, Alex Shi, Kevin Hilman

On Wed, Dec 18, 2013 at 12:43:18PM +0100, Peter Zijlstra wrote:
> On Wed, Dec 18, 2013 at 12:49:15AM +0100, Frederic Weisbecker wrote:
> > That's to start simple as CPU 0 can't be offlined.
> 
> Yes it can, see CONFIG_BOOTPARAM_HOTPLUG_CPU0

Also, that's a traditional weirdness of x86. Other archs might not have
this constraint at all and can hotplug any cpu.

So relying on the fact that cpu 0 cannot be hotplugged is broken.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/13] rcu: Fix unraised IPI to timekeeping CPU
  2013-12-17 22:51 ` [PATCH 05/13] rcu: Fix unraised IPI to timekeeping CPU Frederic Weisbecker
  2013-12-17 23:21   ` Paul E. McKenney
@ 2013-12-18 12:12   ` Peter Zijlstra
  2013-12-18 15:38     ` Frederic Weisbecker
  1 sibling, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2013-12-18 12:12 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Steven Rostedt,
	Paul E. McKenney, John Stultz, Alex Shi, Kevin Hilman

On Tue, Dec 17, 2013 at 11:51:24PM +0100, Frederic Weisbecker wrote:
> -			rcu_kick_nohz_cpu(tick_do_timer_cpu);
> +			smp_send_reschedule(tick_do_timer_cpu);

Hurm.. so I don't really like this (same for the other patches doing the
similar change).

Why not have a function called: wake_time_keeper_cpu(), and have that do
the right thing?

Also, afaict, all you want is that remote cpu to run
tick_nohz_full_check(), right?

So why are you (ab)using the scheduler_ipi for this at all? Why not have
something like:

static DEFINE_PER_CPU(struct call_single_data, nohz_full_csd);

int init_nohz_full(void)
{
	int cpu;

	for_each_possible_cpu(cpu) {
		struct call_single_data *csd = &per_cpu(nohz_full_csd, cpu);

		csd->flags = 0;
		csd->func = &tick_nohz_full_check;
		csd->info = NULL;
	}

	return 0;
}

void wake_time_keeper_cpu(void)
{
	int cpu = pick_timekeeper_cpu();
	struct single_call_data *csd = &per_cpu(nohz_full_csd, cpu);

	__smp_call_function_single(cpu, csd, 0);
}

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 10/13] nohz: Hand over timekeeping duty on cpu offlining
  2013-12-17 22:51 ` [PATCH 10/13] nohz: Hand over timekeeping duty on cpu offlining Frederic Weisbecker
  2013-12-17 23:40   ` Paul E. McKenney
@ 2013-12-18 12:30   ` Peter Zijlstra
  2013-12-18 16:43     ` Frederic Weisbecker
  1 sibling, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2013-12-18 12:30 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Steven Rostedt,
	Paul E. McKenney, John Stultz, Alex Shi, Kevin Hilman

On Tue, Dec 17, 2013 at 11:51:29PM +0100, Frederic Weisbecker wrote:
> The default timekeeper (aka CPU 0) is the perfect candidate for this
> task since it can't be offlined itself.

Like said previously, yes it can!

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/13] rcu: Fix unraised IPI to timekeeping CPU
  2013-12-17 23:21   ` Paul E. McKenney
@ 2013-12-18 14:13     ` Frederic Weisbecker
  2013-12-18 14:22       ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2013-12-18 14:13 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, John Stultz, Alex Shi, Kevin Hilman

On Tue, Dec 17, 2013 at 03:21:00PM -0800, Paul E. McKenney wrote:
> On Tue, Dec 17, 2013 at 11:51:24PM +0100, Frederic Weisbecker wrote:
> > The plan with full system idle detection is to allow the timekeeper
> > to sleep when all full dynticks CPUs are sleeping.
> > 
> > Then when a full dynticks CPU wakes up while the whole system is idle,
> > it sends an IPI to the timekeeping CPU which then restarts its tick
> > and polls on its timekeeping duty on behalf of all other CPUs in the
> > system.
> > 
> > But we are using rcu_kick_nohz_cpu() to raise this IPI, which is wrong
> > because this function is used to kick full dynticks CPUs when they run
> > in the kernel for too long without reporting a quiescent state. And
> > this function ignores targets that are not full dynticks, like our
> > timekeeper.
> > 
> > To fix this, use the smp_send_reschedule() function directly.
> 
> I guess the fact that you needed some change is reassuring.  You know
> the old saying, "no bugs, no users".  ;-)

;-)

> 
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: John Stultz <john.stultz@linaro.org>
> > Cc: Alex Shi <alex.shi@linaro.org>
> > Cc: Kevin Hilman <khilman@linaro.org>
> > ---
> >  kernel/rcu/tree_plugin.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 08004da..84d90c8 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -2488,7 +2488,7 @@ void rcu_sysidle_force_exit(void)
> >  				      oldstate, RCU_SYSIDLE_NOT);
> >  		if (oldstate == newoldstate &&
> >  		    oldstate == RCU_SYSIDLE_FULL_NOTED) {
> > -			rcu_kick_nohz_cpu(tick_do_timer_cpu);
> > +			smp_send_reschedule(tick_do_timer_cpu);
> 
> Hmmm...
> 
> We haven't done any sort of wakeup, and tick_nohz_full_cpu() should
> return false for tick_do_timer_cpu, and I don't see that we have
> done anything to make got_nohz_idle_kick() return true.
> 
> So the idea is that the fact of the interrupt is sufficient, and
> the target CPU will figure out that it must turn its scheduling-clock
> interrupt when returning from interrupt?

Exactly, the interrupt alone is sufficient and the tick is reevaluated
on irq_exit().

> 
> Or is something else going on here?
> 
> 							Thanx, Paul
> 
> >  			return; /* We cleared it, done! */
> >  		}
> >  		oldstate = newoldstate;
> > -- 
> > 1.8.3.1
> > 
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/13] rcu: Exclude all potential timekeepers from sysidle detection
  2013-12-18 11:43       ` Peter Zijlstra
  2013-12-18 11:46         ` Peter Zijlstra
@ 2013-12-18 14:15         ` Paul E. McKenney
  2013-12-18 16:24         ` Frederic Weisbecker
  2 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2013-12-18 14:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, LKML, Thomas Gleixner, Ingo Molnar,
	Steven Rostedt, John Stultz, Alex Shi, Kevin Hilman

On Wed, Dec 18, 2013 at 12:43:18PM +0100, Peter Zijlstra wrote:
> On Wed, Dec 18, 2013 at 12:49:15AM +0100, Frederic Weisbecker wrote:
> > That's to start simple as CPU 0 can't be offlined.
> 
> Yes it can, see CONFIG_BOOTPARAM_HOTPLUG_CPU0

Ah, and cpu0_hotplug boot parameter even if CONFIG_BOOTPARAM_HOTPLUG_CPU0=n.

Good to know!

							Thanx, Paul


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep
  2013-12-18 10:19   ` Peter Zijlstra
@ 2013-12-18 14:18     ` Paul E. McKenney
  0 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2013-12-18 14:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alex Shi, Frederic Weisbecker, LKML, Thomas Gleixner,
	Ingo Molnar, Steven Rostedt, John Stultz, Kevin Hilman

On Wed, Dec 18, 2013 at 11:19:12AM +0100, Peter Zijlstra wrote:
> On Wed, Dec 18, 2013 at 10:04:43AM +0800, Alex Shi wrote:
> > On 12/18/2013 06:51 AM, Frederic Weisbecker wrote:
> > Thanks Fredic!
> > It is much better on powersaving POV compare to current nohz_full. :)
> 
> If you're using nohz_full for powersaving, you're doing it wrong.

They just want nohz_full not to totally break powersavings when the
entire system is idle.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 10/13] nohz: Hand over timekeeping duty on cpu offlining
  2013-12-17 23:40   ` Paul E. McKenney
@ 2013-12-18 14:19     ` Frederic Weisbecker
  0 siblings, 0 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2013-12-18 14:19 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, John Stultz, Alex Shi, Kevin Hilman

On Tue, Dec 17, 2013 at 03:40:23PM -0800, Paul E. McKenney wrote:
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index 527b501..94b6901 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -217,6 +217,12 @@ static u64 tick_timekeeping_max_deferment(struct tick_sched *ts)
> >  		return timekeeping_max_deferment();
> > 
> >  	/*
> > +	 * Order tick_do_timer_cpu read against the IPI, pairs with
> > +	 * tick_nohz_full_kick_timekeeping()
> > +	 */
> > +	smp_rmb();
> 
> If this is the handler for the smp_send_reschedule(), then the above
> memory barrier is not needed.  (See my comment below.)
> 
> > +
> > +	/*
> >  	 * If we are the timekeeper and all full dynticks CPUs are idle,
> >  	 * then we can finally sleep.
> >  	 */
> > @@ -293,6 +299,22 @@ void tick_nohz_full_kick_all(void)
> >  	preempt_enable();
> >  }
> > 
> > +/**
> > + * tick_nohz_full_kick_timekeeping - kick the default timekeeper
> > + *
> > + * kick the default timekeeper when a secondary timekeeper goes offline.
> > + */
> > +void tick_nohz_full_kick_timekeeping(void)
> > +{
> > +	tick_do_timer_cpu = tick_timekeeping_default_cpu();
> > +	/*
> > +	 * Order tick_do_timer_cpu against the IPI, pairs with
> > +	 * tick_timekeeping_max_deferment on irq exit.
> > +	 */
> > +	smp_wmb();
> 
> But the IPI is supposed to provide full ordering between the CPU invoking
> the IPI and the IPI handler, right?

Great! I remember you told me so once but I wasn't sure. Thanks for
the confirmation!

> I do not believe that you need
> the above smp_wmb() -- though keeping the comment stating that you are
> relying on the implicit barrier in IPI would be good.

Indeed, I'll keep the comment on what the code rely on concerning implicit
ordering guarantees and then remove the explicit barriers.

> 
> > +	smp_send_reschedule(tick_timekeeping_default_cpu());
> 
> Again, smp_send_reschedule()'s IPI hander does not necessarily do
> anything if there is nothing for the scheduler to do, so any needed
> actions are taking in the return-from-interrupt code?

Yep, this is all handled from irq exit:

irq_exit -> tick_nohz_irq_exit() -> tick_nohz_stop_sched_tick() -> tick_timekeeping_max_deferment()

The latter function performs the tick re-evaluation on top of the need or not for the current
CPU to handle the timekeeping on behalf of others.

Thanks.

> 
> > +}
> > +
> >  /*
> >   * Re-evaluate the need for the tick as we switch the current task.
> >   * It might need the tick due to per task/process properties:
> > @@ -351,6 +373,15 @@ static int tick_nohz_cpu_down_callback(struct notifier_block *nfb,
> >  		if (tick_nohz_full_running && tick_timekeeping_default_cpu() == cpu)
> >  			return NOTIFY_BAD;
> >  		break;
> > +
> > +	case CPU_DYING:
> > +		/*
> > +		 * Notify default timekeeper if we are giving up
> > +		 * timekeeping duty
> > +		 */
> > +		if (tick_nohz_full_running && tick_do_timer_cpu == cpu)
> > +			tick_nohz_full_kick_timekeeping();
> > +		break;
> >  	}
> >  	return NOTIFY_OK;
> >  }
> > -- 
> > 1.8.3.1
> > 
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/13] rcu: Fix unraised IPI to timekeeping CPU
  2013-12-18 14:13     ` Frederic Weisbecker
@ 2013-12-18 14:22       ` Paul E. McKenney
  2013-12-18 14:56         ` Frederic Weisbecker
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2013-12-18 14:22 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, John Stultz, Alex Shi, Kevin Hilman

On Wed, Dec 18, 2013 at 03:13:52PM +0100, Frederic Weisbecker wrote:
> On Tue, Dec 17, 2013 at 03:21:00PM -0800, Paul E. McKenney wrote:
> > On Tue, Dec 17, 2013 at 11:51:24PM +0100, Frederic Weisbecker wrote:
> > > The plan with full system idle detection is to allow the timekeeper
> > > to sleep when all full dynticks CPUs are sleeping.
> > > 
> > > Then when a full dynticks CPU wakes up while the whole system is idle,
> > > it sends an IPI to the timekeeping CPU which then restarts its tick
> > > and polls on its timekeeping duty on behalf of all other CPUs in the
> > > system.
> > > 
> > > But we are using rcu_kick_nohz_cpu() to raise this IPI, which is wrong
> > > because this function is used to kick full dynticks CPUs when they run
> > > in the kernel for too long without reporting a quiescent state. And
> > > this function ignores targets that are not full dynticks, like our
> > > timekeeper.
> > > 
> > > To fix this, use the smp_send_reschedule() function directly.
> > 
> > I guess the fact that you needed some change is reassuring.  You know
> > the old saying, "no bugs, no users".  ;-)
> 
> ;-)
> 
> > 
> > > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Ingo Molnar <mingo@kernel.org>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > Cc: John Stultz <john.stultz@linaro.org>
> > > Cc: Alex Shi <alex.shi@linaro.org>
> > > Cc: Kevin Hilman <khilman@linaro.org>
> > > ---
> > >  kernel/rcu/tree_plugin.h | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > index 08004da..84d90c8 100644
> > > --- a/kernel/rcu/tree_plugin.h
> > > +++ b/kernel/rcu/tree_plugin.h
> > > @@ -2488,7 +2488,7 @@ void rcu_sysidle_force_exit(void)
> > >  				      oldstate, RCU_SYSIDLE_NOT);
> > >  		if (oldstate == newoldstate &&
> > >  		    oldstate == RCU_SYSIDLE_FULL_NOTED) {
> > > -			rcu_kick_nohz_cpu(tick_do_timer_cpu);
> > > +			smp_send_reschedule(tick_do_timer_cpu);
> > 
> > Hmmm...
> > 
> > We haven't done any sort of wakeup, and tick_nohz_full_cpu() should
> > return false for tick_do_timer_cpu, and I don't see that we have
> > done anything to make got_nohz_idle_kick() return true.
> > 
> > So the idea is that the fact of the interrupt is sufficient, and
> > the target CPU will figure out that it must turn its scheduling-clock
> > interrupt when returning from interrupt?
> 
> Exactly, the interrupt alone is sufficient and the tick is reevaluated
> on irq_exit().

But if that is the case, why do you need the change to scheduler_ipi()
in patch 07/13?  Just having received any sort of IPI should suffice.

							Thanx, Paul

> > Or is something else going on here?
> > 
> > 							Thanx, Paul
> > 
> > >  			return; /* We cleared it, done! */
> > >  		}
> > >  		oldstate = newoldstate;
> > > -- 
> > > 1.8.3.1
> > > 
> > 
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 09/13] nohz: Allow timekeeper's tick to stop when all full dynticks CPUs are idle
  2013-12-17 23:51   ` Paul E. McKenney
@ 2013-12-18 14:36     ` Frederic Weisbecker
  2013-12-18 15:29       ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2013-12-18 14:36 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, John Stultz, Alex Shi, Kevin Hilman

On Tue, Dec 17, 2013 at 03:51:17PM -0800, Paul E. McKenney wrote:
> On Tue, Dec 17, 2013 at 11:51:28PM +0100, Frederic Weisbecker wrote:
> > +/*
> > + * Fetch max deferment for the current clockevent source until it overflows.
> > + * Also in full dynticks environment, make sure the current timekeeper
> > + * stays periodic until some other CPU can take its timekeeping duty
> > + * or until all full dynticks go to sleep.
> > + */
> > +static u64 tick_timekeeping_max_deferment(struct tick_sched *ts)
> > +{
> > +	int cpu;
> > +	u64 ret = KTIME_MAX;
> > +
> > +	/*
> > +	 * Fast path for full dynticks off-case: skip to
> > +	 * clockevent max deferment
> > +	 */
> > +	if (!tick_nohz_full_enabled())
> > +		return timekeeping_max_deferment();
> > +
> > +	cpu = smp_processor_id();
> > +
> > +	/* Full dynticks CPU don't take timekeeping duty */
> > +	if (!tick_timekeeping_cpu(cpu))
> > +		return timekeeping_max_deferment();
> > +
> > +	/*
> > +	 * If we are the timekeeper and all full dynticks CPUs are idle,
> > +	 * then we can finally sleep.
> > +	 */
> > +	if (tick_do_timer_cpu == cpu ||
> > +	    (tick_do_timer_cpu == TICK_DO_TIMER_NONE &&	ts->do_timer_last == 1)) {
> > +		if (!rcu_sys_is_idle()) {
> 
> So multiple CPUs could call rcu_sys_is_idle()?  Seems like this could
> happen if tick_do_timer_cpu == TICK_DO_TIMER_NONE.  This would be OK only
> if tick_timekeeping_cpu() returns true for one and only one of the CPUs
> at any given range of time -- and also that no one calls rcu_sys_is_idle()
> during a timekeeping CPU handoff.

Hmm yeah I fear we can have concurrent callers of this at a same time range.

> 
> If two different CPUs call rcu_sys_is_idle() anywhere nearly concurrently
> on a small system (CONFIG_NO_HZ_FULL_SYSIDLE_SMALL), rcu_sys_is_idle()
> will break and you will have voided your warranty.  ;-)

So it breaks because of concurrent state machine stepping on each other toes, right?
Like one CPU has reached RCU_SYSIDLE_SHORT and another comes and see only
RCU_SYSIDLE_NONE, so it can for example overwite to SHORT while the other CPU
can be already far further the cmpxchg() sequence?

Aye, I need to think further on how to cope with that...

> 
> Also, if tick_timekeeping_cpu() doesn't think that there is a timekeeping
> CPU, rcu_sys_is_idle() will always return false.  I think that this is
> what you want to happen, just checking.

Ah right but that should be fine. tick_timekeeping_cpu() works for all potential
timekeepers. Basically it's !tick_nohz_full_cpu(cpu).

> 
> > +			/*
> > +			 * Stop tick for 1 jiffy. In practice we stay periodic
> > +			 * but that let us possibly delegate our timekeeping duty
> > +			 * to stop the tick for real in the future.
> > +			 */
> > +			ret = tick_period.tv64;
> > +		}
> 
> Do we need to set tick_do_timer_cpu to cpu?  Or is that handled elsewhere?
> (If this is the boot-safety feature deleted below, could we please have
> the comment back here?)

This is done in the patch that calls ..kick_timekeeping()  from sysidle_exit().

Do you have another case in mind?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 07/13] sched: Enable IPI reception on timekeeper under nohz full system
  2013-12-17 23:52   ` Paul E. McKenney
@ 2013-12-18 14:49     ` Frederic Weisbecker
  2013-12-18 15:50       ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2013-12-18 14:49 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, John Stultz, Alex Shi, Kevin Hilman

On Tue, Dec 17, 2013 at 03:52:48PM -0800, Paul E. McKenney wrote:
> On Tue, Dec 17, 2013 at 11:51:26PM +0100, Frederic Weisbecker wrote:
> > We need the default timekeeping CPU to be able to receive IPIs sent
> > from full dynticks CPUs when they wake up from full system idle state.
> > 
> > Therefore we need an entrypoint from the scheduler IPI so that the
> > need to poll on timekeeping duty is re-evaluated from irq_exit().
> > 
> > In order to achieve this, lets take the scheduler IPI everytime as long
> > as there is at least one full dynticks CPU around. Full dynticks CPUs
> > are interested too in taking scheduler IPIs to reevaluate their tick.
> > 
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: John Stultz <john.stultz@linaro.org>
> > Cc: Alex Shi <alex.shi@linaro.org>
> > Cc: Kevin Hilman <khilman@linaro.org>
> > ---
> >  kernel/sched/core.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index e85cda2..f46a7bc 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1502,9 +1502,9 @@ void scheduler_ipi(void)
> >  	if (tif_need_resched())
> >  		set_preempt_need_resched();
> > 
> > -	if (llist_empty(&this_rq()->wake_list)
> > -			&& !tick_nohz_full_cpu(smp_processor_id())
> > -			&& !got_nohz_idle_kick())
> > +	if (llist_empty(&this_rq()->wake_list) &&
> > +	    !tick_nohz_full_enabled() &&
> > +	    !got_nohz_idle_kick())
> >  		return;
> 
> OK, this is what I was missing in my question about whether the
> NO_HZ_FULL state was re-evaluated in the interrupt-return path.

I tend to write my patchset by splitting every single logical bricks and then only
in the end I enable the feature.

But that makes a tradeoff between patchset granularity and global overview. And in the end,
may be it's unbalanced toward overview.

Notwithstanding bisectability.

I remember I had similar reactions when I posted the initial full nohz patchset.

OTOH it's not good to have big all-in-one patches. And granular patchsets like this
are better to focus discussions on each isolated topics.

There is a hard balance to find out here.
 
> 						Thanx, Paul
> 
> >  	/*
> > -- 
> > 1.8.3.1
> > 
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/13] rcu: Fix unraised IPI to timekeeping CPU
  2013-12-18 14:22       ` Paul E. McKenney
@ 2013-12-18 14:56         ` Frederic Weisbecker
  2013-12-18 15:11           ` Peter Zijlstra
  0 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2013-12-18 14:56 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, John Stultz, Alex Shi, Kevin Hilman

On Wed, Dec 18, 2013 at 06:22:40AM -0800, Paul E. McKenney wrote:
> > 
> > Exactly, the interrupt alone is sufficient and the tick is reevaluated
> > on irq_exit().
> 
> But if that is the case, why do you need the change to scheduler_ipi()
> in patch 07/13?  Just having received any sort of IPI should suffice.

Because scheduler_ipi() conditionally calls irq_enter() and irq_exit()
(I wonder if that's a good idea btw, is that here to deal with spurious
scheduler IPIs of some sort?)

And it's convenient because it can be called anywhere, even when irqs
are disabled, unlike kernel/smp.c IPIs.

But I agree it's kind of an abuse of scheduler_ipi() here. Well I'm going
to develop that in my answers to Peter as he has concerns precisely about that.

Thanks.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/13] rcu: Fix unraised IPI to timekeeping CPU
  2013-12-18 14:56         ` Frederic Weisbecker
@ 2013-12-18 15:11           ` Peter Zijlstra
  2013-12-18 15:58             ` Frederic Weisbecker
  0 siblings, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2013-12-18 15:11 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E. McKenney, LKML, Thomas Gleixner, Ingo Molnar,
	Steven Rostedt, John Stultz, Alex Shi, Kevin Hilman

On Wed, Dec 18, 2013 at 03:56:12PM +0100, Frederic Weisbecker wrote:
> Because scheduler_ipi() conditionally calls irq_enter() and irq_exit()
> (I wonder if that's a good idea btw, is that here to deal with spurious
> scheduler IPIs of some sort?)

No its because the traditional scheduler IPI did all the work from the
interrupt return path -- and like the comment in there says, for many of
the IPIs that's still true.

So going through the architectures and making all scheduler_ipi callers
do irq_enter/exit would actually make them slower.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 09/13] nohz: Allow timekeeper's tick to stop when all full dynticks CPUs are idle
  2013-12-18 14:36     ` Frederic Weisbecker
@ 2013-12-18 15:29       ` Paul E. McKenney
  0 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2013-12-18 15:29 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, John Stultz, Alex Shi, Kevin Hilman

On Wed, Dec 18, 2013 at 03:36:11PM +0100, Frederic Weisbecker wrote:
> On Tue, Dec 17, 2013 at 03:51:17PM -0800, Paul E. McKenney wrote:
> > On Tue, Dec 17, 2013 at 11:51:28PM +0100, Frederic Weisbecker wrote:
> > > +/*
> > > + * Fetch max deferment for the current clockevent source until it overflows.
> > > + * Also in full dynticks environment, make sure the current timekeeper
> > > + * stays periodic until some other CPU can take its timekeeping duty
> > > + * or until all full dynticks go to sleep.
> > > + */
> > > +static u64 tick_timekeeping_max_deferment(struct tick_sched *ts)
> > > +{
> > > +	int cpu;
> > > +	u64 ret = KTIME_MAX;
> > > +
> > > +	/*
> > > +	 * Fast path for full dynticks off-case: skip to
> > > +	 * clockevent max deferment
> > > +	 */
> > > +	if (!tick_nohz_full_enabled())
> > > +		return timekeeping_max_deferment();
> > > +
> > > +	cpu = smp_processor_id();
> > > +
> > > +	/* Full dynticks CPU don't take timekeeping duty */
> > > +	if (!tick_timekeeping_cpu(cpu))
> > > +		return timekeeping_max_deferment();
> > > +
> > > +	/*
> > > +	 * If we are the timekeeper and all full dynticks CPUs are idle,
> > > +	 * then we can finally sleep.
> > > +	 */
> > > +	if (tick_do_timer_cpu == cpu ||
> > > +	    (tick_do_timer_cpu == TICK_DO_TIMER_NONE &&	ts->do_timer_last == 1)) {
> > > +		if (!rcu_sys_is_idle()) {
> > 
> > So multiple CPUs could call rcu_sys_is_idle()?  Seems like this could
> > happen if tick_do_timer_cpu == TICK_DO_TIMER_NONE.  This would be OK only
> > if tick_timekeeping_cpu() returns true for one and only one of the CPUs
> > at any given range of time -- and also that no one calls rcu_sys_is_idle()
> > during a timekeeping CPU handoff.
> 
> Hmm yeah I fear we can have concurrent callers of this at a same time range.

OK, we do need to fix that.

> > If two different CPUs call rcu_sys_is_idle() anywhere nearly concurrently
> > on a small system (CONFIG_NO_HZ_FULL_SYSIDLE_SMALL), rcu_sys_is_idle()
> > will break and you will have voided your warranty.  ;-)
> 
> So it breaks because of concurrent state machine stepping on each other toes, right?
> Like one CPU has reached RCU_SYSIDLE_SHORT and another comes and see only
> RCU_SYSIDLE_NONE, so it can for example overwite to SHORT while the other CPU
> can be already far further the cmpxchg() sequence?

The ugliest possibility is in the CONFIG_NO_HZ_FULL_SYSIDLE_SMALL case.
Three CPUs are independently scanning the CPUs, and by luck, each of these
three CPUs see all the CPUs idle, even though CPUs are going idle and
non-idle continuously throughout.  After all three complete, they each
advance the state.  The state thus becomes RCU_SYSIDLE_FULL even though
there are non-idle CPUs.  The timekeeping tick therefore gets shut off
even though there are non-idle CPUs.  Very bad.

> Aye, I need to think further on how to cope with that...

One way would be to cmpxchg the ownership, and only the winner does
the call.  After that, the winner is tick_do_timer_cpu and no one else
does the call.  I guess you would want to skip the cmpxchg during early
boot, but can't claim to understand your constraints.

> > Also, if tick_timekeeping_cpu() doesn't think that there is a timekeeping
> > CPU, rcu_sys_is_idle() will always return false.  I think that this is
> > what you want to happen, just checking.
> 
> Ah right but that should be fine. tick_timekeeping_cpu() works for all potential
> timekeepers. Basically it's !tick_nohz_full_cpu(cpu).

OK, so we do have to worry about multiple callers.  Like you said above. ;-)

> > > +			/*
> > > +			 * Stop tick for 1 jiffy. In practice we stay periodic
> > > +			 * but that let us possibly delegate our timekeeping duty
> > > +			 * to stop the tick for real in the future.
> > > +			 */
> > > +			ret = tick_period.tv64;
> > > +		}
> > 
> > Do we need to set tick_do_timer_cpu to cpu?  Or is that handled elsewhere?
> > (If this is the boot-safety feature deleted below, could we please have
> > the comment back here?)
> 
> This is done in the patch that calls ..kick_timekeeping()  from sysidle_exit().
> 
> Do you have another case in mind?

Mostly just that if we did that determination here, we could avoid having
multiple callers into the state machine.

Alternatively, I might be able to use a lock to serialize callers, but
that would result in CPUs spinning needlessly in rcu_sys_is_idle().
Hence my suggestion of a cmpxchg() to elect a timekeeping CPU in this
code.

						Thanx, Paul


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/13] rcu: Fix unraised IPI to timekeeping CPU
  2013-12-18 12:12   ` Peter Zijlstra
@ 2013-12-18 15:38     ` Frederic Weisbecker
  2013-12-18 15:45       ` Christoph Hellwig
  2013-12-18 17:10       ` Peter Zijlstra
  0 siblings, 2 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2013-12-18 15:38 UTC (permalink / raw)
  To: Peter Zijlstra, Christoph Hellwig
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Steven Rostedt,
	Paul E. McKenney, John Stultz, Alex Shi, Kevin Hilman

On Wed, Dec 18, 2013 at 01:12:04PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 17, 2013 at 11:51:24PM +0100, Frederic Weisbecker wrote:
> > -			rcu_kick_nohz_cpu(tick_do_timer_cpu);
> > +			smp_send_reschedule(tick_do_timer_cpu);
> 
> Hurm.. so I don't really like this (same for the other patches doing the
> similar change).

Hehe, I knew you wouldn't like it :)
To be honest I'm not very happy with this change either.

> 
> Why not have a function called: wake_time_keeper_cpu(), and have that do
> the right thing?
> 
> Also, afaict, all you want is that remote cpu to run
> tick_nohz_full_check(), right?

Not only. There are two kind of IPIs we are interested in in full dynticks
environment:

1) Calls to tick_nohz_full_check() to make full dynticks CPUs to reevaluate
their tick. This must be called from IRQ to avoid crazy locking scenarios
from deep callers in place like scheduler internals.

So this one is currently implemented by scheduler_ipi() upstream.

2) Timekeeping kick: when a full dynticks CPU wakes up, it needs a housekeeping
CPU to run the timekeeping duty on its behalf. But it's possible that all
housekeeping CPUs are sleeping, so we need to kick one of them with an IPI.

In this case we don't need to call a specific handler, we are only interested in
calling irq_exit(). Although, thinking about it more, having a real handler would
make the code less complicated because then we can modify tick_do_timer_cpu
locally. And that should simplify a few things.


> So why are you (ab)using the scheduler_ipi for this at all?

So, I've been cowardly using it as a swiss army knife IPI ;-)
After the BKL, the BKT (Big kernel timer), the BKI (big kernel IPI) ;-)

> Why not have
> something like:
> 
> static DEFINE_PER_CPU(struct call_single_data, nohz_full_csd);
> 
> int init_nohz_full(void)
> {
> 	int cpu;
> 
> 	for_each_possible_cpu(cpu) {
> 		struct call_single_data *csd = &per_cpu(nohz_full_csd, cpu);
> 
> 		csd->flags = 0;
> 		csd->func = &tick_nohz_full_check;
> 		csd->info = NULL;
> 	}
> 
> 	return 0;
> }
> 
> void wake_time_keeper_cpu(void)
> {
> 	int cpu = pick_timekeeper_cpu();
> 	struct single_call_data *csd = &per_cpu(nohz_full_csd, cpu);
> 
> 	__smp_call_function_single(cpu, csd, 0);
> }

That's because I can't call smp_call_function_single() with interrupts disabled.

BTW this warning in __smp_call_function_single() looks wrong:

    WARN_ON_ONCE(cpu_online(smp_processor_id()) && wait && irqs_disabled()
    	      && !oops_in_progress)

I think we want to warn on irqs_disabled() even if !wait, because we wait for csd lock
anyway before raising the IPI.

Anyway so what I need is an IPI that can be raised with irqs disabled. And abusing the
scheduler IPI with ugliness after the other is obsviously not what we want.

So I wonder if we could change these smp_call_function_*() to use llist and behave closer
to what irq_work does.

Like: enqueue the llist_entry, if it's already enqueued then the remote CPU is going to
execute the callback so we are good. But may be we lose some ordering guarantees in
this case:

   store A        exec IPI
   raise IPI      load A

is perhaps not guaranteed anymore if the callback is already queued when we want to
add it and thus we don't raise the IPI. Althouth llist_add() failure perhaps brings
the atomic + full barrier that we need. So on the other side, llist_get() provides
the atomic + barrier that are needed such that the remote CPU sees our latest state.

Now may be I'm missing something obvious there, but that would allow us to drop that csd
thing.

I think that Christophe Hellwig had patches to drop csd and make smp.c lockless, I don't
recall the details though.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/13] rcu: Fix unraised IPI to timekeeping CPU
  2013-12-18 15:38     ` Frederic Weisbecker
@ 2013-12-18 15:45       ` Christoph Hellwig
  2013-12-18 17:10       ` Peter Zijlstra
  1 sibling, 0 replies; 57+ messages in thread
From: Christoph Hellwig @ 2013-12-18 15:45 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, Christoph Hellwig, LKML, Thomas Gleixner,
	Ingo Molnar, Steven Rostedt, Paul E. McKenney, John Stultz,
	Alex Shi, Kevin Hilman

On Wed, Dec 18, 2013 at 04:38:59PM +0100, Frederic Weisbecker wrote:
> I think that Christophe Hellwig had patches to drop csd and make smp.c lockless, I don't
> recall the details though.

I had a patch to use llists, but you still have the csd_lock opencoded
lock.  By coincidence Jan Kara just resent the patch a few minutes ago.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 07/13] sched: Enable IPI reception on timekeeper under nohz full system
  2013-12-18 14:49     ` Frederic Weisbecker
@ 2013-12-18 15:50       ` Paul E. McKenney
  0 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2013-12-18 15:50 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, John Stultz, Alex Shi, Kevin Hilman

On Wed, Dec 18, 2013 at 03:49:17PM +0100, Frederic Weisbecker wrote:
> On Tue, Dec 17, 2013 at 03:52:48PM -0800, Paul E. McKenney wrote:
> > On Tue, Dec 17, 2013 at 11:51:26PM +0100, Frederic Weisbecker wrote:
> > > We need the default timekeeping CPU to be able to receive IPIs sent
> > > from full dynticks CPUs when they wake up from full system idle state.
> > > 
> > > Therefore we need an entrypoint from the scheduler IPI so that the
> > > need to poll on timekeeping duty is re-evaluated from irq_exit().
> > > 
> > > In order to achieve this, lets take the scheduler IPI everytime as long
> > > as there is at least one full dynticks CPU around. Full dynticks CPUs
> > > are interested too in taking scheduler IPIs to reevaluate their tick.
> > > 
> > > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Ingo Molnar <mingo@kernel.org>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > Cc: John Stultz <john.stultz@linaro.org>
> > > Cc: Alex Shi <alex.shi@linaro.org>
> > > Cc: Kevin Hilman <khilman@linaro.org>
> > > ---
> > >  kernel/sched/core.c | 6 +++---
> > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index e85cda2..f46a7bc 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -1502,9 +1502,9 @@ void scheduler_ipi(void)
> > >  	if (tif_need_resched())
> > >  		set_preempt_need_resched();
> > > 
> > > -	if (llist_empty(&this_rq()->wake_list)
> > > -			&& !tick_nohz_full_cpu(smp_processor_id())
> > > -			&& !got_nohz_idle_kick())
> > > +	if (llist_empty(&this_rq()->wake_list) &&
> > > +	    !tick_nohz_full_enabled() &&
> > > +	    !got_nohz_idle_kick())
> > >  		return;
> > 
> > OK, this is what I was missing in my question about whether the
> > NO_HZ_FULL state was re-evaluated in the interrupt-return path.
> 
> I tend to write my patchset by splitting every single logical bricks and then only
> in the end I enable the feature.
> 
> But that makes a tradeoff between patchset granularity and global overview. And in the end,
> may be it's unbalanced toward overview.
> 
> Notwithstanding bisectability.
> 
> I remember I had similar reactions when I posted the initial full nohz patchset.
> 
> OTOH it's not good to have big all-in-one patches. And granular patchsets like this
> are better to focus discussions on each isolated topics.
> 
> There is a hard balance to find out here.

It is OK.  I have no idea what possessed me to review the patches in
reverse order.  I normally review in forward order.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/13] rcu: Fix unraised IPI to timekeeping CPU
  2013-12-18 15:11           ` Peter Zijlstra
@ 2013-12-18 15:58             ` Frederic Weisbecker
  0 siblings, 0 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2013-12-18 15:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, LKML, Thomas Gleixner, Ingo Molnar,
	Steven Rostedt, John Stultz, Alex Shi, Kevin Hilman

On Wed, Dec 18, 2013 at 04:11:58PM +0100, Peter Zijlstra wrote:
> On Wed, Dec 18, 2013 at 03:56:12PM +0100, Frederic Weisbecker wrote:
> > Because scheduler_ipi() conditionally calls irq_enter() and irq_exit()
> > (I wonder if that's a good idea btw, is that here to deal with spurious
> > scheduler IPIs of some sort?)
> 
> No its because the traditional scheduler IPI did all the work from the
> interrupt return path -- and like the comment in there says, for many of
> the IPIs that's still true.
> 
> So going through the architectures and making all scheduler_ipi callers
> do irq_enter/exit would actually make them slower.

Fair point!

Thanks.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/13] rcu: Exclude all potential timekeepers from sysidle detection
  2013-12-18 11:43       ` Peter Zijlstra
  2013-12-18 11:46         ` Peter Zijlstra
  2013-12-18 14:15         ` Paul E. McKenney
@ 2013-12-18 16:24         ` Frederic Weisbecker
  2 siblings, 0 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2013-12-18 16:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, LKML, Thomas Gleixner, Ingo Molnar,
	Steven Rostedt, John Stultz, Alex Shi, Kevin Hilman

On Wed, Dec 18, 2013 at 12:43:18PM +0100, Peter Zijlstra wrote:
> On Wed, Dec 18, 2013 at 12:49:15AM +0100, Frederic Weisbecker wrote:
> > That's to start simple as CPU 0 can't be offlined.
> 
> Yes it can, see CONFIG_BOOTPARAM_HOTPLUG_CPU0

Yeah it can but the tick hotplug callback prevent that to happen on nohz full.
You can check out tick_nohz_cpu_down_callback() for details.

I agree that rejecting CPU_DOWN_PREPARE requests is propably not the prettiest
thing ever but that's the only solution that we have now.

There must _always_ be a timekeeper to maintain timekeeping duty as long
as there are full dynticks CPUs around. So one of these potential timekeepers has
to stay online. And CPU 0 is quite a handy candidate since rejecting any
other CPU's offlining request may block suspend.

Now if it happens to be annoying for some reason, we can certainly change that
behaviour in the future. This will involve forcing some full dynticks CPU to
carry timekeeping but heh, if user does silly things :)

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 10/13] nohz: Hand over timekeeping duty on cpu offlining
  2013-12-18 12:30   ` Peter Zijlstra
@ 2013-12-18 16:43     ` Frederic Weisbecker
  0 siblings, 0 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2013-12-18 16:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Steven Rostedt,
	Paul E. McKenney, John Stultz, Alex Shi, Kevin Hilman

On Wed, Dec 18, 2013 at 01:30:01PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 17, 2013 at 11:51:29PM +0100, Frederic Weisbecker wrote:
> > The default timekeeper (aka CPU 0) is the perfect candidate for this
> > task since it can't be offlined itself.
> 
> Like said previously, yes it can!

I'll rephrase that in the changelog to precise that it's the tick hotplug handler
which prevent it to offline.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/13] rcu: Fix unraised IPI to timekeeping CPU
  2013-12-18 15:38     ` Frederic Weisbecker
  2013-12-18 15:45       ` Christoph Hellwig
@ 2013-12-18 17:10       ` Peter Zijlstra
  1 sibling, 0 replies; 57+ messages in thread
From: Peter Zijlstra @ 2013-12-18 17:10 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Christoph Hellwig, LKML, Thomas Gleixner, Ingo Molnar,
	Steven Rostedt, Paul E. McKenney, John Stultz, Alex Shi,
	Kevin Hilman

On Wed, Dec 18, 2013 at 04:38:59PM +0100, Frederic Weisbecker wrote:
> BTW this warning in __smp_call_function_single() looks wrong:
> 
>     WARN_ON_ONCE(cpu_online(smp_processor_id()) && wait && irqs_disabled()
>     	      && !oops_in_progress)
> 
> I think we want to warn on irqs_disabled() even if !wait, because we wait for csd lock
> anyway before raising the IPI.
> 
> Anyway so what I need is an IPI that can be raised with irqs disabled. And abusing the
> scheduler IPI with ugliness after the other is obsviously not what we want.

Ah indeed, so you can actually make it work but its a little more
involved.

struct foo {
	struct call_single_data csd;
	unsigned int msg;
};

static DEFINE_PER_CPU_ALIGNED(struct foo, foo_csd);

/*
 * set @bit in @msg, return true if its the first bit set
 */
static bool foo_set_msg(unsigned int *msg, int bit)
{
	unsigned int old_msg, new_msg;

	for (;;) {
		old_msg = new_msg = *msg;
		new_msg |= 1U << bit;
		if (cmpxchg(msg, old_msg, new_msg) == old_msg)
			break;
	}

	return old_msg == 0;
}

/*
 * clear @bit in @msg, return true if its the last bit cleared
 */
static bool foo_clear_msg(unsigned int *msg, int bit)
{
	unsigned int old_msg, new_msg;

	for (;;) {
		old_msg = new_msg = *msg;
		new_msg &= ~(1U << bit);
		if (cmpxchg(msg, old_msg, new_msg) == old_msg)
			break;
	}

	return new_msg == 0;
}

/*
 * handle @bit msg
 */
static void foo_handle_msg(int bit)
{
	switch (bit) {
	case 0:
		/* .... */
		break;

		/* ... */
	}
}

static void foo_call(void *info)
{
	struct foo *foo_csd = info;
	int bit;

	for (;;) {
		for (bit = 0; bit < 32; bit++) {
			if (ACCESS_ONCE(foo_csd->msg) & (1U << bit)) {
				foo_handle_msg(bit);
				if (foo_clear_msg(bit))
					return;
			}
		}
	}
}

int foo_init(void)
{
	int cpu;

	for_each_possible_cpu(cpu) {
		struct foo *foo_csd = &per_cpu(foo_csd, cpu);

		foo_csd->csd.flags = 0;
		foo_csd->csd.func = foo_call;
		foo_csd->csd.info = foo_csd;
		foo_csd->msg = 0;
	}

	return 0;
}

void foo_msg(int cpu, int bit)
{
	struct foo *foo_csd = &per_cpu(foo_csd, cpu);

	/*
	 * Only send the IPI if this is the first msg bit set;
	 * otherwise the handler is already pending/running and
	 * will pick it up.
	 *
	 * This will not deadlock on csd_lock because the atomic
	 * msg manipulation ensures there's only ever one csd per cpu
	 * active.
	 */
	if (foo_set_msg(&foo_csd->msg, bit))
		__smp_call_function_single(cpu, &foo_csd->csd, 0);
}

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep
  2013-12-18  2:04 ` [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep Alex Shi
  2013-12-18 10:19   ` Peter Zijlstra
@ 2013-12-18 17:43   ` Frederic Weisbecker
  2013-12-18 21:29     ` Andy Lutomirski
  1 sibling, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2013-12-18 17:43 UTC (permalink / raw)
  To: Alex Shi
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Paul E. McKenney, John Stultz, Kevin Hilman

On Wed, Dec 18, 2013 at 10:04:43AM +0800, Alex Shi wrote:
> On 12/18/2013 06:51 AM, Frederic Weisbecker wrote:
> > So this is what this series brings, more details following:
> > 
> > * Some code, naming and whitespace cleanups
> > 
> > * Allow all CPUs outside the nohz_full range to handle the timekeeping
> >   duty, not just CPU 0. Balancing the timekeeping duty should improve
> >   powersavings.
> 
> If the system just has one nohz_full cpu running, it will need another
> cpu to do timerkeeper job. Then the system roughly needs 2 cpu living.
> From powersaving POV, that is not good compare to normal nohz idle.

Sure, but everything has a tradeoff :)

We could theoretically run with the timekeeper purely idle if the other
CPU in full dynticks mode runs in userspace for a long while and seldom
do syscalls and faults. Timekeeping could be updated on kernel/user
boundaries in this case without much impact on performances.

But then there is one strict condition for that: it can't read the timeofday
through the vdso but only through a syscall.

Then if we can meet that condition, then CPU 0 could as well be full dynticks.

Now that's real extreme HPC ;)

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep
  2013-12-18 17:43   ` Frederic Weisbecker
@ 2013-12-18 21:29     ` Andy Lutomirski
  2013-12-18 21:49       ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Andy Lutomirski @ 2013-12-18 21:29 UTC (permalink / raw)
  To: Frederic Weisbecker, Alex Shi
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Paul E. McKenney, John Stultz, Kevin Hilman

On 12/18/2013 09:43 AM, Frederic Weisbecker wrote:
> On Wed, Dec 18, 2013 at 10:04:43AM +0800, Alex Shi wrote:
>> On 12/18/2013 06:51 AM, Frederic Weisbecker wrote:
>>> So this is what this series brings, more details following:
>>>
>>> * Some code, naming and whitespace cleanups
>>>
>>> * Allow all CPUs outside the nohz_full range to handle the timekeeping
>>>   duty, not just CPU 0. Balancing the timekeeping duty should improve
>>>   powersavings.
>>
>> If the system just has one nohz_full cpu running, it will need another
>> cpu to do timerkeeper job. Then the system roughly needs 2 cpu living.
>> From powersaving POV, that is not good compare to normal nohz idle.
> 
> Sure, but everything has a tradeoff :)
> 
> We could theoretically run with the timekeeper purely idle if the other
> CPU in full dynticks mode runs in userspace for a long while and seldom
> do syscalls and faults. Timekeeping could be updated on kernel/user
> boundaries in this case without much impact on performances.
> 
> But then there is one strict condition for that: it can't read the timeofday
> through the vdso but only through a syscall.

Where's your ambition? :)

If the vdso timing functions could see that it's been too long since a
real timekeeping update, they could fall back to a syscall.  Otherwise,
they could using rdtsc or whatever is in use.

--Andy

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep
  2013-12-18 21:29     ` Andy Lutomirski
@ 2013-12-18 21:49       ` Paul E. McKenney
  2013-12-18 21:53         ` Andy Lutomirski
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2013-12-18 21:49 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Frederic Weisbecker, Alex Shi, LKML, Thomas Gleixner,
	Ingo Molnar, Peter Zijlstra, Steven Rostedt, John Stultz,
	Kevin Hilman

On Wed, Dec 18, 2013 at 01:29:53PM -0800, Andy Lutomirski wrote:
> On 12/18/2013 09:43 AM, Frederic Weisbecker wrote:
> > On Wed, Dec 18, 2013 at 10:04:43AM +0800, Alex Shi wrote:
> >> On 12/18/2013 06:51 AM, Frederic Weisbecker wrote:
> >>> So this is what this series brings, more details following:
> >>>
> >>> * Some code, naming and whitespace cleanups
> >>>
> >>> * Allow all CPUs outside the nohz_full range to handle the timekeeping
> >>>   duty, not just CPU 0. Balancing the timekeeping duty should improve
> >>>   powersavings.
> >>
> >> If the system just has one nohz_full cpu running, it will need another
> >> cpu to do timerkeeper job. Then the system roughly needs 2 cpu living.
> >> From powersaving POV, that is not good compare to normal nohz idle.
> > 
> > Sure, but everything has a tradeoff :)
> > 
> > We could theoretically run with the timekeeper purely idle if the other
> > CPU in full dynticks mode runs in userspace for a long while and seldom
> > do syscalls and faults. Timekeeping could be updated on kernel/user
> > boundaries in this case without much impact on performances.
> > 
> > But then there is one strict condition for that: it can't read the timeofday
> > through the vdso but only through a syscall.
> 
> Where's your ambition? :)
> 
> If the vdso timing functions could see that it's been too long since a
> real timekeeping update, they could fall back to a syscall.  Otherwise,
> they could using rdtsc or whatever is in use.

One objection to that approach in the past has been that it injects
avoidable latency into the worker CPUs.  I suppose that you could argue
that the cache misses due to a timekeeping-CPU update are not free, but
then again, the syscall is likely to also incur a few cache misses as
well.

I bet that the timekeeping-CPU approach wins, but it would be cool to
see you prove me wrong.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep
  2013-12-18 21:49       ` Paul E. McKenney
@ 2013-12-18 21:53         ` Andy Lutomirski
  2013-12-18 21:57           ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Andy Lutomirski @ 2013-12-18 21:53 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Frederic Weisbecker, Alex Shi, LKML, Thomas Gleixner,
	Ingo Molnar, Peter Zijlstra, Steven Rostedt, John Stultz,
	Kevin Hilman

On Wed, Dec 18, 2013 at 1:49 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Wed, Dec 18, 2013 at 01:29:53PM -0800, Andy Lutomirski wrote:
>> On 12/18/2013 09:43 AM, Frederic Weisbecker wrote:
>> > On Wed, Dec 18, 2013 at 10:04:43AM +0800, Alex Shi wrote:
>> >> On 12/18/2013 06:51 AM, Frederic Weisbecker wrote:
>> >>> So this is what this series brings, more details following:
>> >>>
>> >>> * Some code, naming and whitespace cleanups
>> >>>
>> >>> * Allow all CPUs outside the nohz_full range to handle the timekeeping
>> >>>   duty, not just CPU 0. Balancing the timekeeping duty should improve
>> >>>   powersavings.
>> >>
>> >> If the system just has one nohz_full cpu running, it will need another
>> >> cpu to do timerkeeper job. Then the system roughly needs 2 cpu living.
>> >> From powersaving POV, that is not good compare to normal nohz idle.
>> >
>> > Sure, but everything has a tradeoff :)
>> >
>> > We could theoretically run with the timekeeper purely idle if the other
>> > CPU in full dynticks mode runs in userspace for a long while and seldom
>> > do syscalls and faults. Timekeeping could be updated on kernel/user
>> > boundaries in this case without much impact on performances.
>> >
>> > But then there is one strict condition for that: it can't read the timeofday
>> > through the vdso but only through a syscall.
>>
>> Where's your ambition? :)
>>
>> If the vdso timing functions could see that it's been too long since a
>> real timekeeping update, they could fall back to a syscall.  Otherwise,
>> they could using rdtsc or whatever is in use.
>
> One objection to that approach in the past has been that it injects
> avoidable latency into the worker CPUs.  I suppose that you could argue
> that the cache misses due to a timekeeping-CPU update are not free, but
> then again, the syscall is likely to also incur a few cache misses as
> well.
>
> I bet that the timekeeping-CPU approach wins, but it would be cool to
> see you prove me wrong.

There's already some (very vague) discussion about having a scheduled
time at which the clock frequency and/or offset will change, and this
wouldn't be a huge departure from that.  The goal there is to avoid
waiting for timekeeping if vclock_gettime runs concurrently with an
update, but the same approach could apply here (albeit with one extra
branch).

Anyway, syscalls aren't *that* expensive.

Alternatively, couldn't workloads like this just turn off NTP?

--Andy

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep
  2013-12-18 21:53         ` Andy Lutomirski
@ 2013-12-18 21:57           ` Paul E. McKenney
  2013-12-18 22:55             ` Andy Lutomirski
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2013-12-18 21:57 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Frederic Weisbecker, Alex Shi, LKML, Thomas Gleixner,
	Ingo Molnar, Peter Zijlstra, Steven Rostedt, John Stultz,
	Kevin Hilman

On Wed, Dec 18, 2013 at 01:53:18PM -0800, Andy Lutomirski wrote:
> On Wed, Dec 18, 2013 at 1:49 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Wed, Dec 18, 2013 at 01:29:53PM -0800, Andy Lutomirski wrote:
> >> On 12/18/2013 09:43 AM, Frederic Weisbecker wrote:
> >> > On Wed, Dec 18, 2013 at 10:04:43AM +0800, Alex Shi wrote:
> >> >> On 12/18/2013 06:51 AM, Frederic Weisbecker wrote:
> >> >>> So this is what this series brings, more details following:
> >> >>>
> >> >>> * Some code, naming and whitespace cleanups
> >> >>>
> >> >>> * Allow all CPUs outside the nohz_full range to handle the timekeeping
> >> >>>   duty, not just CPU 0. Balancing the timekeeping duty should improve
> >> >>>   powersavings.
> >> >>
> >> >> If the system just has one nohz_full cpu running, it will need another
> >> >> cpu to do timerkeeper job. Then the system roughly needs 2 cpu living.
> >> >> From powersaving POV, that is not good compare to normal nohz idle.
> >> >
> >> > Sure, but everything has a tradeoff :)
> >> >
> >> > We could theoretically run with the timekeeper purely idle if the other
> >> > CPU in full dynticks mode runs in userspace for a long while and seldom
> >> > do syscalls and faults. Timekeeping could be updated on kernel/user
> >> > boundaries in this case without much impact on performances.
> >> >
> >> > But then there is one strict condition for that: it can't read the timeofday
> >> > through the vdso but only through a syscall.
> >>
> >> Where's your ambition? :)
> >>
> >> If the vdso timing functions could see that it's been too long since a
> >> real timekeeping update, they could fall back to a syscall.  Otherwise,
> >> they could using rdtsc or whatever is in use.
> >
> > One objection to that approach in the past has been that it injects
> > avoidable latency into the worker CPUs.  I suppose that you could argue
> > that the cache misses due to a timekeeping-CPU update are not free, but
> > then again, the syscall is likely to also incur a few cache misses as
> > well.
> >
> > I bet that the timekeeping-CPU approach wins, but it would be cool to
> > see you prove me wrong.
> 
> There's already some (very vague) discussion about having a scheduled
> time at which the clock frequency and/or offset will change, and this
> wouldn't be a huge departure from that.  The goal there is to avoid
> waiting for timekeeping if vclock_gettime runs concurrently with an
> update, but the same approach could apply here (albeit with one extra
> branch).
> 
> Anyway, syscalls aren't *that* expensive.

Like I said, it would be cool to see you prove me wrong, but that will
need to be with patches and performance results rather than rhetoric.

> Alternatively, couldn't workloads like this just turn off NTP?

Some probably could, but others need accurate time.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep
  2013-12-18 21:57           ` Paul E. McKenney
@ 2013-12-18 22:55             ` Andy Lutomirski
  0 siblings, 0 replies; 57+ messages in thread
From: Andy Lutomirski @ 2013-12-18 22:55 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Frederic Weisbecker, Alex Shi, LKML, Thomas Gleixner,
	Ingo Molnar, Peter Zijlstra, Steven Rostedt, John Stultz,
	Kevin Hilman

On Wed, Dec 18, 2013 at 1:57 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Wed, Dec 18, 2013 at 01:53:18PM -0800, Andy Lutomirski wrote:
>> On Wed, Dec 18, 2013 at 1:49 PM, Paul E. McKenney
>> <paulmck@linux.vnet.ibm.com> wrote:
>> > On Wed, Dec 18, 2013 at 01:29:53PM -0800, Andy Lutomirski wrote:
>> >> On 12/18/2013 09:43 AM, Frederic Weisbecker wrote:
>> >> > On Wed, Dec 18, 2013 at 10:04:43AM +0800, Alex Shi wrote:
>> >> >> On 12/18/2013 06:51 AM, Frederic Weisbecker wrote:
>> >> >>> So this is what this series brings, more details following:
>> >> >>>
>> >> >>> * Some code, naming and whitespace cleanups
>> >> >>>
>> >> >>> * Allow all CPUs outside the nohz_full range to handle the timekeeping
>> >> >>>   duty, not just CPU 0. Balancing the timekeeping duty should improve
>> >> >>>   powersavings.
>> >> >>
>> >> >> If the system just has one nohz_full cpu running, it will need another
>> >> >> cpu to do timerkeeper job. Then the system roughly needs 2 cpu living.
>> >> >> From powersaving POV, that is not good compare to normal nohz idle.
>> >> >
>> >> > Sure, but everything has a tradeoff :)
>> >> >
>> >> > We could theoretically run with the timekeeper purely idle if the other
>> >> > CPU in full dynticks mode runs in userspace for a long while and seldom
>> >> > do syscalls and faults. Timekeeping could be updated on kernel/user
>> >> > boundaries in this case without much impact on performances.
>> >> >
>> >> > But then there is one strict condition for that: it can't read the timeofday
>> >> > through the vdso but only through a syscall.
>> >>
>> >> Where's your ambition? :)
>> >>
>> >> If the vdso timing functions could see that it's been too long since a
>> >> real timekeeping update, they could fall back to a syscall.  Otherwise,
>> >> they could using rdtsc or whatever is in use.
>> >
>> > One objection to that approach in the past has been that it injects
>> > avoidable latency into the worker CPUs.  I suppose that you could argue
>> > that the cache misses due to a timekeeping-CPU update are not free, but
>> > then again, the syscall is likely to also incur a few cache misses as
>> > well.
>> >
>> > I bet that the timekeeping-CPU approach wins, but it would be cool to
>> > see you prove me wrong.
>>
>> There's already some (very vague) discussion about having a scheduled
>> time at which the clock frequency and/or offset will change, and this
>> wouldn't be a huge departure from that.  The goal there is to avoid
>> waiting for timekeeping if vclock_gettime runs concurrently with an
>> update, but the same approach could apply here (albeit with one extra
>> branch).
>>
>> Anyway, syscalls aren't *that* expensive.
>
> Like I said, it would be cool to see you prove me wrong, but that will
> need to be with patches and performance results rather than rhetoric.

Fair enough.  Don't hold your breath.  (And certainly don't take this
as an objection to these patches.)

--Andy

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [tip:timers/urgent] tick: Rename tick_check_idle() to tick_irq_enter()
  2013-12-17 22:51 ` [PATCH 01/13] tick: Rename tick_check_idle() to tick_irq_enter() Frederic Weisbecker
@ 2014-01-25 14:22   ` tip-bot for Frederic Weisbecker
  0 siblings, 0 replies; 57+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2014-01-25 14:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, peterz, john.stultz, alex.shi, paulmck,
	fweisbec, rostedt, khilman, tglx

Commit-ID:  5acac1be499d979e3aa463ea73a498888faefcbe
Gitweb:     http://git.kernel.org/tip/5acac1be499d979e3aa463ea73a498888faefcbe
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Wed, 4 Dec 2013 18:28:20 +0100
Committer:  Frederic Weisbecker <fweisbec@gmail.com>
CommitDate: Wed, 15 Jan 2014 23:05:31 +0100

tick: Rename tick_check_idle() to tick_irq_enter()

This makes the code more symetric against the existing tick functions
called on irq exit: tick_irq_exit() and tick_nohz_irq_exit().

These function are also symetric as they mirror each other's action:
we start to account idle time on irq exit and we stop this accounting
on irq entry. Also the tick is stopped on irq exit and timekeeping
catches up with the tickless time elapsed until we reach irq entry.

This rename was suggested by Peter Zijlstra a long while ago but it
got forgotten in the mass.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
Link: http://lkml.kernel.org/r/1387320692-28460-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/tick.h     | 6 +++---
 kernel/softirq.c         | 2 +-
 kernel/time/tick-sched.c | 8 ++++----
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 0175d86..b84773c 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -104,7 +104,7 @@ extern struct cpumask *tick_get_broadcast_oneshot_mask(void);
 extern void tick_clock_notify(void);
 extern int tick_check_oneshot_change(int allow_nohz);
 extern struct tick_sched *tick_get_tick_sched(int cpu);
-extern void tick_check_idle(void);
+extern void tick_irq_enter(void);
 extern int tick_oneshot_mode_active(void);
 #  ifndef arch_needs_cpu
 #   define arch_needs_cpu(cpu) (0)
@@ -112,7 +112,7 @@ extern int tick_oneshot_mode_active(void);
 # else
 static inline void tick_clock_notify(void) { }
 static inline int tick_check_oneshot_change(int allow_nohz) { return 0; }
-static inline void tick_check_idle(void) { }
+static inline void tick_irq_enter(void) { }
 static inline int tick_oneshot_mode_active(void) { return 0; }
 # endif
 
@@ -121,7 +121,7 @@ static inline void tick_init(void) { }
 static inline void tick_cancel_sched_timer(int cpu) { }
 static inline void tick_clock_notify(void) { }
 static inline int tick_check_oneshot_change(int allow_nohz) { return 0; }
-static inline void tick_check_idle(void) { }
+static inline void tick_irq_enter(void) { }
 static inline int tick_oneshot_mode_active(void) { return 0; }
 #endif /* !CONFIG_GENERIC_CLOCKEVENTS */
 
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 11348de..ca9cb35 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -318,7 +318,7 @@ void irq_enter(void)
 		 * here, as softirq will be serviced on return from interrupt.
 		 */
 		local_bh_disable();
-		tick_check_idle();
+		tick_irq_enter();
 		_local_bh_enable();
 	}
 
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 0ddd020..e4d0f09 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1023,7 +1023,7 @@ static void tick_nohz_kick_tick(struct tick_sched *ts, ktime_t now)
 #endif
 }
 
-static inline void tick_check_nohz_this_cpu(void)
+static inline void tick_nohz_irq_enter(void)
 {
 	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
 	ktime_t now;
@@ -1042,17 +1042,17 @@ static inline void tick_check_nohz_this_cpu(void)
 #else
 
 static inline void tick_nohz_switch_to_nohz(void) { }
-static inline void tick_check_nohz_this_cpu(void) { }
+static inline void tick_nohz_irq_enter(void) { }
 
 #endif /* CONFIG_NO_HZ_COMMON */
 
 /*
  * Called from irq_enter to notify about the possible interruption of idle()
  */
-void tick_check_idle(void)
+void tick_irq_enter(void)
 {
 	tick_check_oneshot_broadcast_this_cpu();
-	tick_check_nohz_this_cpu();
+	tick_nohz_irq_enter();
 }
 
 /*

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [tip:timers/urgent] nohz: Get timekeeping max deferment outside jiffies_lock
  2013-12-17 22:51 ` [PATCH 08/13] nohz: Get timekeeping max deferment outside jiffies_lock Frederic Weisbecker
@ 2014-01-25 14:22   ` tip-bot for Frederic Weisbecker
  0 siblings, 0 replies; 57+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2014-01-25 14:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, peterz, john.stultz, alex.shi, paulmck,
	fweisbec, rostedt, khilman, tglx

Commit-ID:  855a0fc30b70d6ae681badd24d6625f9a9abb787
Gitweb:     http://git.kernel.org/tip/855a0fc30b70d6ae681badd24d6625f9a9abb787
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Tue, 17 Dec 2013 00:16:37 +0100
Committer:  Frederic Weisbecker <fweisbec@gmail.com>
CommitDate: Wed, 15 Jan 2014 23:07:11 +0100

nohz: Get timekeeping max deferment outside jiffies_lock

We don't need to fetch the timekeeping max deferment under the
jiffies_lock seqlock.

If the clocksource is updated concurrently while we stop the tick,
stop machine is called and the tick will be reevaluated again along with
uptodate jiffies and its related values.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
Link: http://lkml.kernel.org/r/1387320692-28460-9-git-send-email-fweisbec@gmail.com
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/time/tick-sched.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index e4d0f09..68331d1 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -533,12 +533,13 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
 	u64 time_delta;
 
+	time_delta = timekeeping_max_deferment();
+
 	/* Read jiffies and the time when jiffies were updated last */
 	do {
 		seq = read_seqbegin(&jiffies_lock);
 		last_update = last_jiffies_update;
 		last_jiffies = jiffies;
-		time_delta = timekeeping_max_deferment();
 	} while (read_seqretry(&jiffies_lock, seq));
 
 	if (rcu_needs_cpu(cpu, &rcu_delta_jiffies) ||

^ permalink raw reply related	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2014-01-25 14:23 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-17 22:51 [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep Frederic Weisbecker
2013-12-17 22:51 ` [PATCH 01/13] tick: Rename tick_check_idle() to tick_irq_enter() Frederic Weisbecker
2014-01-25 14:22   ` [tip:timers/urgent] " tip-bot for Frederic Weisbecker
2013-12-17 22:51 ` [PATCH 02/13] time: New helper to check CPU eligibility to handle timekeeping Frederic Weisbecker
2013-12-17 22:51 ` [PATCH 03/13] rcu: Exclude all potential timekeepers from sysidle detection Frederic Weisbecker
2013-12-17 23:27   ` Paul E. McKenney
2013-12-17 23:49     ` Frederic Weisbecker
2013-12-18 11:43       ` Peter Zijlstra
2013-12-18 11:46         ` Peter Zijlstra
2013-12-18 14:15         ` Paul E. McKenney
2013-12-18 16:24         ` Frederic Weisbecker
2013-12-17 22:51 ` [PATCH 04/13] tick: Use timekeeping_cpu() to elect the CPU handling timekeeping duty Frederic Weisbecker
2013-12-17 23:55   ` Paul E. McKenney
2013-12-17 22:51 ` [PATCH 05/13] rcu: Fix unraised IPI to timekeeping CPU Frederic Weisbecker
2013-12-17 23:21   ` Paul E. McKenney
2013-12-18 14:13     ` Frederic Weisbecker
2013-12-18 14:22       ` Paul E. McKenney
2013-12-18 14:56         ` Frederic Weisbecker
2013-12-18 15:11           ` Peter Zijlstra
2013-12-18 15:58             ` Frederic Weisbecker
2013-12-18 12:12   ` Peter Zijlstra
2013-12-18 15:38     ` Frederic Weisbecker
2013-12-18 15:45       ` Christoph Hellwig
2013-12-18 17:10       ` Peter Zijlstra
2013-12-17 22:51 ` [PATCH 06/13] nohz: Introduce full dynticks' default timekeeping target Frederic Weisbecker
2013-12-17 23:54   ` Paul E. McKenney
2013-12-17 22:51 ` [PATCH 07/13] sched: Enable IPI reception on timekeeper under nohz full system Frederic Weisbecker
2013-12-17 23:52   ` Paul E. McKenney
2013-12-18 14:49     ` Frederic Weisbecker
2013-12-18 15:50       ` Paul E. McKenney
2013-12-18 10:06   ` Peter Zijlstra
2013-12-17 22:51 ` [PATCH 08/13] nohz: Get timekeeping max deferment outside jiffies_lock Frederic Weisbecker
2014-01-25 14:22   ` [tip:timers/urgent] " tip-bot for Frederic Weisbecker
2013-12-17 22:51 ` [PATCH 09/13] nohz: Allow timekeeper's tick to stop when all full dynticks CPUs are idle Frederic Weisbecker
2013-12-17 23:51   ` Paul E. McKenney
2013-12-18 14:36     ` Frederic Weisbecker
2013-12-18 15:29       ` Paul E. McKenney
2013-12-17 22:51 ` [PATCH 10/13] nohz: Hand over timekeeping duty on cpu offlining Frederic Weisbecker
2013-12-17 23:40   ` Paul E. McKenney
2013-12-18 14:19     ` Frederic Weisbecker
2013-12-18 12:30   ` Peter Zijlstra
2013-12-18 16:43     ` Frederic Weisbecker
2013-12-17 22:51 ` [PATCH 11/13] nohz: Wake up timekeeper on exit from sysidle state Frederic Weisbecker
2013-12-17 23:34   ` Paul E. McKenney
2013-12-17 23:52     ` Frederic Weisbecker
2013-12-17 22:51 ` [PATCH 12/13] nohz: Allow all CPUs outside nohz_full range to do timekeeping Frederic Weisbecker
2013-12-17 23:32   ` Paul E. McKenney
2013-12-17 22:51 ` [PATCH 13/13] nohz_full: fix code style issue of tick_nohz_full_stop_tick Frederic Weisbecker
2013-12-18  2:04 ` [RFC PATCH 00/13] nohz: Use sysidle detection to let the timekeeper sleep Alex Shi
2013-12-18 10:19   ` Peter Zijlstra
2013-12-18 14:18     ` Paul E. McKenney
2013-12-18 17:43   ` Frederic Weisbecker
2013-12-18 21:29     ` Andy Lutomirski
2013-12-18 21:49       ` Paul E. McKenney
2013-12-18 21:53         ` Andy Lutomirski
2013-12-18 21:57           ` Paul E. McKenney
2013-12-18 22:55             ` Andy Lutomirski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.