* [PATCH 00/10] nohz: Support sysidle (+ some more nohz kick cleanups)
@ 2014-07-28 17:37 Frederic Weisbecker
  2014-07-28 17:37 ` [PATCH 01/10] smp: Introduce void kick_cpu_async() Frederic Weisbecker
                   ` (9 more replies)
  0 siblings, 10 replies; 20+ messages in thread
From: Frederic Weisbecker @ 2014-07-28 17:37 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Nicolas Pitre, Paul E. McKenney,
	Ingo Molnar, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
	Viresh Kumar

Hi,

In this set, I have added Paul's acks and applied Peterz's suggestion to
rename irq_work_void_on() to kick_cpu(), which I actually expanded to
kick_cpu_async() so as to differentiate its behaviour from kick_all_cpus_sync().

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	nohz/sysidle-v2

Thanks,
	Frederic
---

Frederic Weisbecker (10):
      smp: Introduce void kick_cpu_async()
      nohz: Kick full dynticks timer targets with an empty IPI
      rcu: Kick full dynticks CPU on extended grace period with kick_cpu_async()
      nohz: Appropriate timekeeper kick on sysidle break
      smp: Fast path check on IPI list
      nohz: Define meaningful symbol for nohz full timekeeper
      nohz: Enforce timekeeping on CPU 0
      nohz: Fetch timekeeping max deferment only for timekeeper
      nohz: Switch nohz full timekeeper to dynticks idle on top of sysidle detection
      nohz: Warn on illegal timekeeper switch in nohz full


 include/linux/smp.h         |  1 +
 kernel/rcu/tree_plugin.h    | 10 ++++---
 kernel/sched/core.c         |  8 ++---
 kernel/smp.c                | 32 +++++++++++++++++++-
 kernel/time/tick-common.c   | 11 ++++---
 kernel/time/tick-internal.h |  8 +++++
 kernel/time/tick-sched.c    | 72 ++++++++++++++++++++++-----------------------
 7 files changed, 90 insertions(+), 52 deletions(-)


* [PATCH 01/10] smp: Introduce void kick_cpu_async()
  2014-07-28 17:37 [PATCH 00/10] nohz: Support sysidle (+ some more nohz kick cleanups) Frederic Weisbecker
@ 2014-07-28 17:37 ` Frederic Weisbecker
  2014-07-28 17:37 ` [PATCH 02/10] nohz: Kick full dynticks timer targets with an empty IPI Frederic Weisbecker
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Frederic Weisbecker @ 2014-07-28 17:37 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Ingo Molnar, Nicolas Pitre,
	Paul E. McKenney, Peter Zijlstra, Steven Rostedt,
	Thomas Gleixner, Viresh Kumar

Being able to trigger an empty IPI may appear useless at first
glance. Yet it is expected to be very useful for callers who just need
to execute irq_enter() or irq_exit() on a remote target.

More precisely, this is going to be useful for the nohz subsystem, which
often needs a remote CPU to reschedule its tick when some state has
changed and requires periodicity for any reason. Similar cases have been
handled with random IPIs until now, but those bring unwanted overhead
since they are all dedicated to specific tasks.

Triggering an smp_call_function IPI should be enough to solve that. If
competing and spurious IPIs become a problem, we can still optimize that
later.
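
As a purely illustrative aside (not part of this patch), here is a minimal
sketch of the intended usage pattern; the per-CPU variable and the
poke_cpu() helper below are made-up names for the example:

	/* Hypothetical caller: publish a state change, then ask @cpu to
	 * notice it from irq_exit() on its next interrupt return. */
	static DEFINE_PER_CPU(int, remote_state);

	static void poke_cpu(int cpu)
	{
		per_cpu(remote_state, cpu) = 1;	/* publish the new state */
		kick_cpu_async(cpu);		/* empty IPI: just run irq_exit() there */
	}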

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/smp.h |  1 +
 kernel/smp.c        | 21 +++++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/include/linux/smp.h b/include/linux/smp.h
index 34347f2..a8d9c9b 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -100,6 +100,7 @@ int smp_call_function_any(const struct cpumask *mask,
 			  smp_call_func_t func, void *info, int wait);
 
 void kick_all_cpus_sync(void);
+void kick_cpu_async(int cpu);
 
 /*
  * Generic and arch helpers
diff --git a/kernel/smp.c b/kernel/smp.c
index a1812d1..12c5bea 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -658,3 +658,24 @@ void kick_all_cpus_sync(void)
 	smp_call_function(do_nothing, NULL, 1);
 }
 EXPORT_SYMBOL_GPL(kick_all_cpus_sync);
+
+/**
+ * kick_cpu_async(): Run a void IRQ on the target
+ * @cpu: The cpu to run the IRQ on
+ *
+ * Run a void IRQ for its own sake on the target. It's generally
+ * useful for callers which want to run irq_enter() or irq_exit()
+ * on a remote CPU.
+ */
+void kick_cpu_async(int cpu)
+{
+	/*
+	 * NOTE: we could optimize this and spare some IPIs by
+	 * checking that call_single_queue isn't empty before raising.
+	 * This can't be done properly without cmpxchg() though, so
+	 * it may make things worse after all. But let's leave that
+	 * possibility open in case people report such an issue in
+	 * the future.
+	 */
+	arch_send_call_function_single_ipi(cpu);
+}
-- 
1.8.3.1



* [PATCH 02/10] nohz: Kick full dynticks timer targets with an empty IPI
  2014-07-28 17:37 [PATCH 00/10] nohz: Support sysidle (+ some more nohz kick cleanups) Frederic Weisbecker
  2014-07-28 17:37 ` [PATCH 01/10] smp: Introduce void kick_cpu_async() Frederic Weisbecker
@ 2014-07-28 17:37 ` Frederic Weisbecker
  2014-07-29 11:33   ` Peter Zijlstra
  2014-07-28 17:37 ` [PATCH 03/10] rcu: Kick full dynticks CPU on extended grace period with kick_cpu_async() Frederic Weisbecker
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 20+ messages in thread
From: Frederic Weisbecker @ 2014-07-28 17:37 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Ingo Molnar, Nicolas Pitre,
	Paul E. McKenney, Peter Zijlstra, Steven Rostedt,
	Thomas Gleixner, Viresh Kumar

When we enqueue a new timer on a dynticks target, we must send it an
IPI so that it reschedules the next tick accordingly.

Now all we need for that is to run irq_exit() on the target. Hence an
empty IPI is more than enough. We don't need to run tick_nohz_full_check()
and all the overhead that comes with the several cmpxchg() operations
necessary for the irq work list and state progression.

So let's use an empty IPI instead.

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/sched/core.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7f3063c..171f9ce 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -685,15 +685,13 @@ static void wake_up_idle_cpu(int cpu)
 static bool wake_up_full_nohz_cpu(int cpu)
 {
 	/*
-	 * We just need the target to call irq_exit() and re-evaluate
-	 * the next tick. The nohz full kick at least implies that.
-	 * If needed we can still optimize that later with an
-	 * empty IRQ.
+	 * Kick the target to make it call irq_exit() and re-evaluate
+	 * the next tick.
 	 */
 	if (tick_nohz_full_cpu(cpu)) {
 		if (cpu != smp_processor_id() ||
 		    tick_nohz_tick_stopped())
-			tick_nohz_full_kick_cpu(cpu);
+			kick_cpu_async(cpu);
 		return true;
 	}
 
-- 
1.8.3.1



* [PATCH 03/10] rcu: Kick full dynticks CPU on extended grace period with kick_cpu_async()
  2014-07-28 17:37 [PATCH 00/10] nohz: Support sysidle (+ some more nohz kick cleanups) Frederic Weisbecker
  2014-07-28 17:37 ` [PATCH 01/10] smp: Introduce void kick_cpu_async() Frederic Weisbecker
  2014-07-28 17:37 ` [PATCH 02/10] nohz: Kick full dynticks timer targets with an empty IPI Frederic Weisbecker
@ 2014-07-28 17:37 ` Frederic Weisbecker
  2014-07-28 17:37 ` [PATCH 04/10] nohz: Appropriate timekeeper kick on sysidle break Frederic Weisbecker
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Frederic Weisbecker @ 2014-07-28 17:37 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Ingo Molnar, Nicolas Pitre,
	Paul E. McKenney, Peter Zijlstra, Steven Rostedt,
	Thomas Gleixner, Viresh Kumar

When a full dynticks CPU stays in the kernel for too long, it may fail
to report quiescent states because it misses ticks, and therefore it can
delay the completion of grace periods.

A way to solve this is to send an IPI to the offending CPU so that it
can check rcu_needs_cpu() and reschedule the tick accordingly in order
to poll again for quiescent state reports.

This is what we try to achieve by calling rcu_kick_nohz_cpu(), but it
has no effect because we trigger a scheduler IPI which is actually a
no-op when not used for scheduler-internal purposes, i.e. it doesn't
call irq_exit() when not used for remote wakeups or other specifics.

No need to tweak the scheduler IPI further though. Let's keep it clean
of external noise and use an empty IPI instead. This ensures that
irq_exit() gets called on the target without much overhead or added
noise on the scheduler fast path.
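
As a purely illustrative aside (not part of this patch), the reason the
scheduler IPI does not help here is its early-return pattern: with no
scheduler work pending it bails out before irq_enter()/irq_exit() ever
run. A simplified, hedged sketch; no_pending_scheduler_work() is a
hypothetical placeholder, not the real condition:

	void scheduler_ipi(void)
	{
		if (no_pending_scheduler_work())
			return;		/* no irq_enter()/irq_exit() at all */

		irq_enter();
		/* ... process remote wakeups, nohz idle kicks ... */
		irq_exit();
	}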

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/rcu/tree_plugin.h | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index cbc2c45..551398b 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2401,14 +2401,16 @@ static bool init_nocb_callback_list(struct rcu_data *rdp)
  * off.  RCU will be paying attention to this CPU because it is in the
  * kernel, but the CPU cannot be guaranteed to be executing the RCU state
  * machine because the scheduling-clock tick has been disabled.  Therefore,
- * if an adaptive-ticks CPU is failing to respond to the current grace
- * period and has not be idle from an RCU perspective, kick it.
+ * if a full dynticks CPU is failing to respond to the current grace
+ * period and has not been idle from an RCU perspective, kick it with a
+ * void IRQ so that it can check that RCU needs its tick from rcu_needs_cpu()
+ * on irq exit.
  */
 static void rcu_kick_nohz_cpu(int cpu)
 {
 #ifdef CONFIG_NO_HZ_FULL
 	if (tick_nohz_full_cpu(cpu))
-		smp_send_reschedule(cpu);
+		kick_cpu_async(cpu);
 #endif /* #ifdef CONFIG_NO_HZ_FULL */
 }
 
-- 
1.8.3.1



* [PATCH 04/10] nohz: Appropriate timekeeper kick on sysidle break
  2014-07-28 17:37 [PATCH 00/10] nohz: Support sysidle (+ some more nohz kick cleanups) Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2014-07-28 17:37 ` [PATCH 03/10] rcu: Kick full dynticks CPU on extended grace period with kick_cpu_async() Frederic Weisbecker
@ 2014-07-28 17:37 ` Frederic Weisbecker
  2014-07-28 17:37 ` [PATCH 05/10] smp: Fast path check on IPI list Frederic Weisbecker
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Frederic Weisbecker @ 2014-07-28 17:37 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Ingo Molnar, Nicolas Pitre,
	Paul E. McKenney, Peter Zijlstra, Steven Rostedt,
	Thomas Gleixner, Viresh Kumar

When a CPU wakes up from idle and finds out that the timekeeper is
sleeping, we need to kick it such that it switches from dynticks to
periodic mode to maintain its timekeeping duty on behalf of the newly
awoken CPU.

However we aren't using the right API for that. rcu_kick_nohz_cpu() is
aimed at waking full dynticks CPUs and not the timekeeper.

Moreover the timekeeper must perform a new dynticks cycle to check the
new sysidle state and restart the tick if necessary. A simple call
to irq_exit() isn't enough.

wake_up_nohz_cpu() is a good fit for that job because it pulls the
target out of the idle loop and restarts the tick. Then, if no other
task waits for the CPU, it re-enters the idle loop, where the new
sysidle state will be visible and properly handled.
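
For reference, a rough paraphrase of how the wake-up helper behaves on
the scheduler side (illustrative only, not taken verbatim from the tree
this applies to):

	void wake_up_nohz_cpu(int cpu)
	{
		/* Full dynticks CPUs are kicked to re-evaluate their tick... */
		if (!wake_up_full_nohz_cpu(cpu))
			/* ...otherwise wake the idle CPU the usual way */
			wake_up_idle_cpu(cpu);
	}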

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/rcu/tree_plugin.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 551398b..477d81f 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2494,7 +2494,7 @@ void rcu_sysidle_force_exit(void)
 				      oldstate, RCU_SYSIDLE_NOT);
 		if (oldstate == newoldstate &&
 		    oldstate == RCU_SYSIDLE_FULL_NOTED) {
-			rcu_kick_nohz_cpu(tick_do_timer_cpu);
+			wake_up_nohz_cpu(tick_do_timer_cpu);
 			return; /* We cleared it, done! */
 		}
 		oldstate = newoldstate;
-- 
1.8.3.1



* [PATCH 05/10] smp: Fast path check on IPI list
  2014-07-28 17:37 [PATCH 00/10] nohz: Support sysidle (+ some more nohz kick cleanups) Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2014-07-28 17:37 ` [PATCH 04/10] nohz: Appropriate timekeeper kick on sysidle break Frederic Weisbecker
@ 2014-07-28 17:37 ` Frederic Weisbecker
  2014-07-29 12:07   ` Peter Zijlstra
  2014-07-28 17:37 ` [PATCH 06/10] nohz: Define meaningful symbol for nohz full timekeeper Frederic Weisbecker
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 20+ messages in thread
From: Frederic Weisbecker @ 2014-07-28 17:37 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Ingo Molnar, Nicolas Pitre,
	Paul E. McKenney, Peter Zijlstra, Steven Rostedt,
	Thomas Gleixner, Viresh Kumar

When we enqueue a remote irq work, we trigger the same IPI as those
raised by smp_call_function_*() family.

So when we receive such IPI, we check both irq_work and smp_call_function
queues. Thus if we trigger a remote irq work, we'll likely find the
smp_call_function queue empty unless we collide with concurrent enqueuers
but the probability is low.

Meanwhile, checking the smp_call_function queue can be costly because
we use llist_del_all() which relies on cmpxchg().

We can reduce this overhead by doing a fast path check with llist_empty().
Given the implicit IPI ordering:

       Enqueuer                            Dequeuer
       --------                            --------
       llist_add(csd, queue)               get_IPI() {
       send_IPI()                              if (llist_empty(queue))
                                                    ...

when the IPI is sent, we are guaranteed that the IPI receiver will
see the new csd.

So let's do the fast path check to optimize non-smp_call_function()
related jobs.
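
For reference, the enqueue side guarantees this ordering: the csd is
added to the destination's llist before the IPI is raised. A rough
paraphrase of that path (illustrative only, not verbatim):

	/* llist_add() returns true when the list was previously empty,
	 * i.e. when nobody has raised the IPI for this target yet. */
	if (llist_add(&csd->llist, &per_cpu(call_single_queue, cpu)))
		arch_send_call_function_single_ipi(cpu);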

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/smp.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 12c5bea..09ddc92 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -184,11 +184,19 @@ static int generic_exec_single(int cpu, struct call_single_data *csd,
  */
 void generic_smp_call_function_single_interrupt(void)
 {
+	struct llist_head *head = &__get_cpu_var(call_single_queue);
 	struct llist_node *entry;
 	struct call_single_data *csd, *csd_next;
 	static bool warned;
 
-	entry = llist_del_all(&__get_cpu_var(call_single_queue));
+	/*
+	 * Fast check: if only a remote irq work was queued, the IPI list
+	 * is likely empty. We can spare the expensive llist_del_all().
+	 */
+	if (llist_empty(head))
+		goto irq_work;
+
+	entry = llist_del_all(head);
 	entry = llist_reverse_order(entry);
 
 	/*
@@ -212,6 +220,7 @@ void generic_smp_call_function_single_interrupt(void)
 		csd_unlock(csd);
 	}
 
+irq_work:
 	/*
 	 * Handle irq works queued remotely by irq_work_queue_on().
 	 * Smp functions above are typically synchronous so they
-- 
1.8.3.1



* [PATCH 06/10] nohz: Define meaningful symbol for nohz full timekeeper
  2014-07-28 17:37 [PATCH 00/10] nohz: Support sysidle (+ some more nohz kick cleanups) Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2014-07-28 17:37 ` [PATCH 05/10] smp: Fast path check on IPI list Frederic Weisbecker
@ 2014-07-28 17:37 ` Frederic Weisbecker
  2014-07-28 17:37 ` [PATCH 07/10] nohz: Enforce timekeeping on CPU 0 Frederic Weisbecker
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Frederic Weisbecker @ 2014-07-28 17:37 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Ingo Molnar, Nicolas Pitre,
	Paul E. McKenney, Peter Zijlstra, Steven Rostedt,
	Thomas Gleixner, Viresh Kumar

The nohz full timekeeper is always CPU 0. Let's add it to the list of
special tick_do_timer_cpu symbols to make the code more self-explanatory.

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/time/tick-internal.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index 7ab92b1..6b592a8 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -10,6 +10,7 @@ extern seqlock_t jiffies_lock;
 
 #ifdef CONFIG_GENERIC_CLOCKEVENTS_BUILD
 
+#define TICK_DO_TIMER_DEFAULT	0
 #define TICK_DO_TIMER_NONE	-1
 #define TICK_DO_TIMER_BOOT	-2
 
-- 
1.8.3.1



* [PATCH 07/10] nohz: Enforce timekeeping on CPU 0
  2014-07-28 17:37 [PATCH 00/10] nohz: Support sysidle (+ some more nohz kick cleanups) Frederic Weisbecker
                   ` (5 preceding siblings ...)
  2014-07-28 17:37 ` [PATCH 06/10] nohz: Define meaningful symbol for nohz full timekeeper Frederic Weisbecker
@ 2014-07-28 17:37 ` Frederic Weisbecker
  2014-07-29 12:12   ` Peter Zijlstra
  2014-07-28 17:37 ` [PATCH 08/10] nohz: Fetch timekeeping max deferment only for timekeeper Frederic Weisbecker
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 20+ messages in thread
From: Frederic Weisbecker @ 2014-07-28 17:37 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Ingo Molnar, Nicolas Pitre,
	Paul E. McKenney, Peter Zijlstra, Steven Rostedt,
	Thomas Gleixner, Viresh Kumar

The timekeeper gets initialized to the value of the CPU where the
first clockevent device is setup. This works well because the timekeeper
can be any online CPU in most configs.

Full dynticks has its own requirement though and needs the timekeeper
to always be 0. And this requirement seems to accommodate pretty well with
the above described boot timekeeper setting because the first clockevent
device happens to be initialized, most of the time, on the boot CPU
(which should be CPU 0).

However there is no mention of such a guarantee anywhere. This assumption
might well be defeated on some corner case now or in the future.

So lets wipe out the FUD and force tick_do_timer_cpu to CPU 0 on boot
when full dynticks is used.

This way we can even remove some corner case code that handled scenarios
where all clockevent devices were setup on full dynticks CPUs.

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/time/tick-common.c | 6 +++---
 kernel/time/tick-sched.c  | 6 ------
 2 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 0a0608e..cb57bce 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -179,10 +179,10 @@ static void tick_setup_device(struct tick_device *td,
 		 * this cpu:
 		 */
 		if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
-			if (!tick_nohz_full_cpu(cpu))
-				tick_do_timer_cpu = cpu;
+			if (tick_nohz_full_enabled())
+				tick_do_timer_cpu = TICK_DO_TIMER_DEFAULT;
 			else
-				tick_do_timer_cpu = TICK_DO_TIMER_NONE;
+				tick_do_timer_cpu = cpu;
 			tick_next_period = ktime_get();
 			tick_period = ktime_set(0, NSEC_PER_SEC / HZ);
 		}
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 3d63944..2ea2143 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -741,12 +741,6 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 		 */
 		if (tick_do_timer_cpu == cpu)
 			return false;
-		/*
-		 * Boot safety: make sure the timekeeping duty has been
-		 * assigned before entering dyntick-idle mode,
-		 */
-		if (tick_do_timer_cpu == TICK_DO_TIMER_NONE)
-			return false;
 	}
 
 	return true;
-- 
1.8.3.1



* [PATCH 08/10] nohz: Fetch timekeeping max deferment only for timekeeper
  2014-07-28 17:37 [PATCH 00/10] nohz: Support sysidle (+ some more nohz kick cleanups) Frederic Weisbecker
                   ` (6 preceding siblings ...)
  2014-07-28 17:37 ` [PATCH 07/10] nohz: Enforce timekeeping on CPU 0 Frederic Weisbecker
@ 2014-07-28 17:37 ` Frederic Weisbecker
  2014-07-28 17:37 ` [PATCH 09/10] nohz: Switch nohz full timekeeper to dynticks idle on top of sysidle detection Frederic Weisbecker
  2014-07-28 17:37 ` [PATCH 10/10] nohz: Warn on illegal timekeeper switch in nohz full Frederic Weisbecker
  9 siblings, 0 replies; 20+ messages in thread
From: Frederic Weisbecker @ 2014-07-28 17:37 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Ingo Molnar, Nicolas Pitre,
	Paul E. McKenney, Peter Zijlstra, Steven Rostedt,
	Thomas Gleixner, Viresh Kumar

We fetch it unconditionally from the tick stop code whereas only
the timekeeper, or the CPU that carried that duty last, needs it.

Fetching the timekeeping max deferment should be lightweight but it
still involves a few read-side barriers and a seqcount that may well
be cache-cold for non-timekeepers.

So let's spare it when possible by inverting the way we handle the
timekeeper deferment state machine.

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/time/tick-sched.c | 57 ++++++++++++++++++++++++++----------------------
 1 file changed, 31 insertions(+), 26 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 2ea2143..bcba79d 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -529,6 +529,36 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
 }
 EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us);
 
+/*
+ * If this cpu is the one which updates jiffies, then
+ * give up the assignment and let it be taken by the
+ * cpu which runs the tick timer next, which might be
+ * this cpu as well. If we don't drop this here the
+ * jiffies might be stale and do_timer() never
+ * invoked. Keep track of the fact that it was the one
+ * which had the do_timer() duty last. If this cpu is
+ * the one which had the do_timer() duty last, we
+ * limit the sleep time to the timekeeping max deferment
+ * value. Otherwise we can sleep as long as we want.
+ */
+static u64 timekeeping_deferment(struct tick_sched *ts, int cpu)
+{
+	u64 time_delta = KTIME_MAX;
+
+	if (tick_do_timer_cpu == cpu) {
+		time_delta = timekeeping_max_deferment();
+		tick_do_timer_cpu = TICK_DO_TIMER_NONE;
+		ts->do_timer_last = 1;
+	} else if (ts->do_timer_last) {
+		if (tick_do_timer_cpu == TICK_DO_TIMER_NONE)
+			time_delta = timekeeping_max_deferment();
+		else
+			ts->do_timer_last = 0;
+	}
+
+	return time_delta;
+}
+
 static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 					 ktime_t now, int cpu)
 {
@@ -536,9 +566,6 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 	ktime_t last_update, expires, ret = { .tv64 = 0 };
 	unsigned long rcu_delta_jiffies;
 	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
-	u64 time_delta;
-
-	time_delta = timekeeping_max_deferment();
 
 	/* Read jiffies and the time when jiffies were updated last */
 	do {
@@ -570,29 +597,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 
 	/* Schedule the tick, if we are at least one jiffie off */
 	if ((long)delta_jiffies >= 1) {
-
-		/*
-		 * If this cpu is the one which updates jiffies, then
-		 * give up the assignment and let it be taken by the
-		 * cpu which runs the tick timer next, which might be
-		 * this cpu as well. If we don't drop this here the
-		 * jiffies might be stale and do_timer() never
-		 * invoked. Keep track of the fact that it was the one
-		 * which had the do_timer() duty last. If this cpu is
-		 * the one which had the do_timer() duty last, we
-		 * limit the sleep time to the timekeeping
-		 * max_deferement value which we retrieved
-		 * above. Otherwise we can sleep as long as we want.
-		 */
-		if (cpu == tick_do_timer_cpu) {
-			tick_do_timer_cpu = TICK_DO_TIMER_NONE;
-			ts->do_timer_last = 1;
-		} else if (tick_do_timer_cpu != TICK_DO_TIMER_NONE) {
-			time_delta = KTIME_MAX;
-			ts->do_timer_last = 0;
-		} else if (!ts->do_timer_last) {
-			time_delta = KTIME_MAX;
-		}
+		u64 time_delta = timekeeping_deferment(ts, cpu);
 
 #ifdef CONFIG_NO_HZ_FULL
 		if (!ts->inidle) {
-- 
1.8.3.1



* [PATCH 09/10] nohz: Switch nohz full timekeeper to dynticks idle on top of sysidle detection
  2014-07-28 17:37 [PATCH 00/10] nohz: Support sysidle (+ some more nohz kick cleanups) Frederic Weisbecker
                   ` (7 preceding siblings ...)
  2014-07-28 17:37 ` [PATCH 08/10] nohz: Fetch timekeeping max deferment only for timekeeper Frederic Weisbecker
@ 2014-07-28 17:37 ` Frederic Weisbecker
  2014-07-29 12:17   ` Peter Zijlstra
  2014-07-28 17:37 ` [PATCH 10/10] nohz: Warn on illegal timekeeper switch in nohz full Frederic Weisbecker
  9 siblings, 1 reply; 20+ messages in thread
From: Frederic Weisbecker @ 2014-07-28 17:37 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Ingo Molnar, Nicolas Pitre,
	Paul E. McKenney, Peter Zijlstra, Steven Rostedt,
	Thomas Gleixner, Viresh Kumar

In full dynticks, CPU 0 carries the timekeeping duty on behalf
of all other CPUs in the system. This way full dynticks CPUs are left
undisturbed in this regard.

Of course this prevents CPU 0 from entering dynticks idle mode
because any CPU may need up-to-date timekeeping at any time.

Theoretically though, we could put CPU 0 in dynticks idle mode once we
are sure that all other CPUs are dynticks idle as well. Then when a
CPU wakes up and finds the timekeeper idle, it would send an IPI to
wake it up and resume its duty.

Such a state machine needs to take care of all the races along the way
and make sure that CPU 0 neither gets accidentally stuck asleep forever,
nor stuck in periodic mode when it could sleep. Also, given the amount
of shared data this involves and its access frequency, this must be
built on top of a lockless, low-overhead state machine.

This is what sysidle provides. The feature has been ready for a while;
we were just waiting for the nohz subsystem to support it, and we have
just reached that state.

So let's defer the final decision for CPU 0 to enter dynticks idle to
the point where we detect a full system idle state, and let's wake it
up when its duty is needed.

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/time/tick-sched.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index bcba79d..845aaff 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -547,8 +547,10 @@ static u64 timekeeping_deferment(struct tick_sched *ts, int cpu)
 
 	if (tick_do_timer_cpu == cpu) {
 		time_delta = timekeeping_max_deferment();
-		tick_do_timer_cpu = TICK_DO_TIMER_NONE;
 		ts->do_timer_last = 1;
+		/* In full dynticks mode, CPU 0 always keeps the duty */
+		if (!tick_nohz_full_enabled())
+			tick_do_timer_cpu = TICK_DO_TIMER_NONE;
 	} else if (ts->do_timer_last) {
 		if (tick_do_timer_cpu == TICK_DO_TIMER_NONE)
 			time_delta = timekeeping_max_deferment();
@@ -745,7 +747,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 		 * if there are full dynticks CPUs around
 		 */
 		if (tick_do_timer_cpu == cpu)
-			return false;
+			return rcu_sys_is_idle();
 	}
 
 	return true;
-- 
1.8.3.1



* [PATCH 10/10] nohz: Warn on illegal timekeeper switch in nohz full
  2014-07-28 17:37 [PATCH 00/10] nohz: Support sysidle (+ some more nohz kick cleanups) Frederic Weisbecker
                   ` (8 preceding siblings ...)
  2014-07-28 17:37 ` [PATCH 09/10] nohz: Switch nohz full timekeeper to dynticks idle on top of sysidle detection Frederic Weisbecker
@ 2014-07-28 17:37 ` Frederic Weisbecker
  9 siblings, 0 replies; 20+ messages in thread
From: Frederic Weisbecker @ 2014-07-28 17:37 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Ingo Molnar, Nicolas Pitre,
	Paul E. McKenney, Peter Zijlstra, Steven Rostedt,
	Thomas Gleixner, Viresh Kumar

In full dynticks mode, the timekeeper should never be set to any CPU
other than 0.

Let's enforce that through a dedicated mutator.

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/time/tick-common.c   | 9 ++++-----
 kernel/time/tick-internal.h | 7 +++++++
 kernel/time/tick-sched.c    | 7 +++----
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index cb57bce..72ccaf2 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -180,9 +180,9 @@ static void tick_setup_device(struct tick_device *td,
 		 */
 		if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
 			if (tick_nohz_full_enabled())
-				tick_do_timer_cpu = TICK_DO_TIMER_DEFAULT;
+				tick_do_timer_cpu_set(TICK_DO_TIMER_DEFAULT);
 			else
-				tick_do_timer_cpu = cpu;
+				tick_do_timer_cpu_set(cpu);
 			tick_next_period = ktime_get();
 			tick_period = ktime_set(0, NSEC_PER_SEC / HZ);
 		}
@@ -341,9 +341,8 @@ void tick_handover_do_timer(int *cpup)
 {
 	if (*cpup == tick_do_timer_cpu) {
 		int cpu = cpumask_first(cpu_online_mask);
-
-		tick_do_timer_cpu = (cpu < nr_cpu_ids) ? cpu :
-			TICK_DO_TIMER_NONE;
+		tick_do_timer_cpu_set((cpu < nr_cpu_ids) ? cpu :
+				      TICK_DO_TIMER_NONE);
 	}
 }
 
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index 6b592a8..eadfd89 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -23,6 +23,13 @@ extern void tick_setup_periodic(struct clock_event_device *dev, int broadcast);
 extern void tick_handle_periodic(struct clock_event_device *dev);
 extern void tick_check_new_device(struct clock_event_device *dev);
 extern void tick_handover_do_timer(int *cpup);
+
+static inline void tick_do_timer_cpu_set(int cpu)
+{
+	WARN_ON_ONCE(tick_nohz_full_enabled() && cpu != TICK_DO_TIMER_DEFAULT);
+	tick_do_timer_cpu = cpu;
+}
+
 extern void tick_shutdown(unsigned int *cpup);
 extern void tick_suspend(void);
 extern void tick_resume(void);
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 845aaff..3b5102a 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -123,7 +123,7 @@ static void tick_sched_do_timer(ktime_t now)
 	 */
 	if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)
 	    && !tick_nohz_full_cpu(cpu))
-		tick_do_timer_cpu = cpu;
+		tick_do_timer_cpu_set(cpu);
 #endif
 
 	/* Check, if the jiffies need an update */
@@ -550,7 +550,7 @@ static u64 timekeeping_deferment(struct tick_sched *ts, int cpu)
 		ts->do_timer_last = 1;
 		/* In full dynticks mode, CPU 0 always keeps the duty */
 		if (!tick_nohz_full_enabled())
-			tick_do_timer_cpu = TICK_DO_TIMER_NONE;
+			tick_do_timer_cpu_set(TICK_DO_TIMER_NONE);
 	} else if (ts->do_timer_last) {
 		if (tick_do_timer_cpu == TICK_DO_TIMER_NONE)
 			time_delta = timekeeping_max_deferment();
@@ -600,7 +600,6 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 	/* Schedule the tick, if we are at least one jiffie off */
 	if ((long)delta_jiffies >= 1) {
 		u64 time_delta = timekeeping_deferment(ts, cpu);
-
 #ifdef CONFIG_NO_HZ_FULL
 		if (!ts->inidle) {
 			time_delta = min(time_delta,
@@ -717,7 +716,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 	 */
 	if (unlikely(!cpu_online(cpu))) {
 		if (cpu == tick_do_timer_cpu)
-			tick_do_timer_cpu = TICK_DO_TIMER_NONE;
+			tick_do_timer_cpu_set(TICK_DO_TIMER_NONE);
 		return false;
 	}
 
-- 
1.8.3.1



* Re: [PATCH 02/10] nohz: Kick full dynticks timer targets with an empty IPI
  2014-07-28 17:37 ` [PATCH 02/10] nohz: Kick full dynticks timer targets with an empty IPI Frederic Weisbecker
@ 2014-07-29 11:33   ` Peter Zijlstra
  2014-07-29 21:53     ` Frederic Weisbecker
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2014-07-29 11:33 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Ingo Molnar, Nicolas Pitre, Paul E. McKenney,
	Steven Rostedt, Thomas Gleixner, Viresh Kumar

On Mon, Jul 28, 2014 at 07:37:28PM +0200, Frederic Weisbecker wrote:
>  	if (tick_nohz_full_cpu(cpu)) {
>  		if (cpu != smp_processor_id() ||
>  		    tick_nohz_tick_stopped())

Not really a comment on this patch per se, but why is that line broken?
If I concat them it's only 74 chars long.

> +			kick_cpu_async(cpu);
>  		return true;
>  	}


* Re: [PATCH 05/10] smp: Fast path check on IPI list
  2014-07-28 17:37 ` [PATCH 05/10] smp: Fast path check on IPI list Frederic Weisbecker
@ 2014-07-29 12:07   ` Peter Zijlstra
  2014-07-29 21:54     ` Frederic Weisbecker
  2014-07-29 21:55     ` Frederic Weisbecker
  0 siblings, 2 replies; 20+ messages in thread
From: Peter Zijlstra @ 2014-07-29 12:07 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Ingo Molnar, Nicolas Pitre, Paul E. McKenney,
	Steven Rostedt, Thomas Gleixner, Viresh Kumar

On Mon, Jul 28, 2014 at 07:37:31PM +0200, Frederic Weisbecker wrote:
> When we enqueue a remote irq work, we trigger the same IPI as those
> raised by smp_call_function_*() family.
> 
> So when we receive such IPI, we check both irq_work and smp_call_function
> queues. Thus if we trigger a remote irq work, we'll likely find the
> smp_call_function queue empty unless we collide with concurrent enqueuers
> but the probability is low.
> 
> Meanwhile, checking the smp_call_function queue can be costly because
> we use llist_del_all() which relies on cmpxchg().

xchg()


* Re: [PATCH 07/10] nohz: Enforce timekeeping on CPU 0
  2014-07-28 17:37 ` [PATCH 07/10] nohz: Enforce timekeeping on CPU 0 Frederic Weisbecker
@ 2014-07-29 12:12   ` Peter Zijlstra
  2014-07-30 13:23     ` Frederic Weisbecker
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2014-07-29 12:12 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Ingo Molnar, Nicolas Pitre, Paul E. McKenney,
	Steven Rostedt, Thomas Gleixner, Viresh Kumar

On Mon, Jul 28, 2014 at 07:37:33PM +0200, Frederic Weisbecker wrote:
> The timekeeper gets initialized to the value of the CPU where the
> first clockevent device is setup. This works well because the timekeeper
> can be any online CPU in most configs.
> 
> Full dynticks has its own requirement though and needs the timekeeper
> to always be 0. And this requirement seem to accomodate pretty well with
> the above described boot timekeeper setting because the first clockevent
> device happens to be initialized, most of the time, on the boot CPU
> (which should be CPU 0).

This isn't true in general. Voyager (which we dropped support for, IIRC)
had a boot cpu != 0, and I think there are ARM platforms where the same
can be true.

> However there is no mention of such a guarantee anywhere. This assumption
> might well be defeated on some corner case now or in the future.

Right..

> So lets wipe out the FUD and force tick_do_timer_cpu to CPU 0 on boot
> when full dynticks is used.
> 
> This way we can even remove some corner case code that handled scenarios
> where all clockevent devices were setup on full dynticks CPUs.

> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
> index 0a0608e..cb57bce 100644
> --- a/kernel/time/tick-common.c
> +++ b/kernel/time/tick-common.c
> @@ -179,10 +179,10 @@ static void tick_setup_device(struct tick_device *td,
>  		 * this cpu:
>  		 */
>  		if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
> -			if (!tick_nohz_full_cpu(cpu))
> -				tick_do_timer_cpu = cpu;
> +			if (tick_nohz_full_enabled())
> +				tick_do_timer_cpu = TICK_DO_TIMER_DEFAULT;
>  			else
> -				tick_do_timer_cpu = TICK_DO_TIMER_NONE;
> +				tick_do_timer_cpu = cpu;
>  			tick_next_period = ktime_get();
>  			tick_period = ktime_set(0, NSEC_PER_SEC / HZ);
>  		}

So from what I can tell this code can get called before SMP setup, which
would mean we could get here before CPU0 is online?


* Re: [PATCH 09/10] nohz: Switch nohz full timekeeper to dynticks idle on top of sysidle detection
  2014-07-28 17:37 ` [PATCH 09/10] nohz: Switch nohz full timekeeper to dynticks idle on top of sysidle detection Frederic Weisbecker
@ 2014-07-29 12:17   ` Peter Zijlstra
  2014-07-29 22:04     ` Frederic Weisbecker
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2014-07-29 12:17 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Ingo Molnar, Nicolas Pitre, Paul E. McKenney,
	Steven Rostedt, Thomas Gleixner, Viresh Kumar

On Mon, Jul 28, 2014 at 07:37:35PM +0200, Frederic Weisbecker wrote:

> @@ -745,7 +747,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
>  		 * if there are full dynticks CPUs around
>  		 */
>  		if (tick_do_timer_cpu == cpu)
> -			return false;
> +			return rcu_sys_is_idle();
>  	}

I still feel it's entirely upside down to rely on RCU for this...


* Re: [PATCH 02/10] nohz: Kick full dynticks timer targets with an empty IPI
  2014-07-29 11:33   ` Peter Zijlstra
@ 2014-07-29 21:53     ` Frederic Weisbecker
  0 siblings, 0 replies; 20+ messages in thread
From: Frederic Weisbecker @ 2014-07-29 21:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Ingo Molnar, Nicolas Pitre, Paul E. McKenney,
	Steven Rostedt, Thomas Gleixner, Viresh Kumar

On Tue, Jul 29, 2014 at 01:33:07PM +0200, Peter Zijlstra wrote:
> On Mon, Jul 28, 2014 at 07:37:28PM +0200, Frederic Weisbecker wrote:
> >  	if (tick_nohz_full_cpu(cpu)) {
> >  		if (cpu != smp_processor_id() ||
> >  		    tick_nohz_tick_stopped())
> 
> Not really a comment on this patch per se, but why is that line broken?
> If I concat them it's only 74 chars long.

Ah, probably bad anticipation on my part then :)

> 
> > +			kick_cpu_async(cpu);
> >  		return true;
> >  	}


* Re: [PATCH 05/10] smp: Fast path check on IPI list
  2014-07-29 12:07   ` Peter Zijlstra
@ 2014-07-29 21:54     ` Frederic Weisbecker
  2014-07-29 21:55     ` Frederic Weisbecker
  1 sibling, 0 replies; 20+ messages in thread
From: Frederic Weisbecker @ 2014-07-29 21:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Ingo Molnar, Nicolas Pitre, Paul E. McKenney,
	Steven Rostedt, Thomas Gleixner, Viresh Kumar

On Tue, Jul 29, 2014 at 02:07:39PM +0200, Peter Zijlstra wrote:
> On Mon, Jul 28, 2014 at 07:37:31PM +0200, Frederic Weisbecker wrote:
> > When we enqueue a remote irq work, we trigger the same IPI as those
> > raised by smp_call_function_*() family.
> > 
> > So when we receive such IPI, we check both irq_work and smp_call_function
> > queues. Thus if we trigger a remote irq work, we'll likely find the
> > smp_call_function queue empty unless we collide with concurrent enqueuers
> > but the probability is low.
> > 
> > Meanwhile, checking the smp_call_function queue can be costly because
> > we use llist_del_all() which relies on cmpxchg().
> 
> xchg()

Ah, right!


* Re: [PATCH 05/10] smp: Fast path check on IPI list
  2014-07-29 12:07   ` Peter Zijlstra
  2014-07-29 21:54     ` Frederic Weisbecker
@ 2014-07-29 21:55     ` Frederic Weisbecker
  1 sibling, 0 replies; 20+ messages in thread
From: Frederic Weisbecker @ 2014-07-29 21:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Ingo Molnar, Nicolas Pitre, Paul E. McKenney,
	Steven Rostedt, Thomas Gleixner, Viresh Kumar

On Tue, Jul 29, 2014 at 02:07:39PM +0200, Peter Zijlstra wrote:
> On Mon, Jul 28, 2014 at 07:37:31PM +0200, Frederic Weisbecker wrote:
> > When we enqueue a remote irq work, we trigger the same IPI as those
> > raised by smp_call_function_*() family.
> > 
> > So when we receive such IPI, we check both irq_work and smp_call_function
> > queues. Thus if we trigger a remote irq work, we'll likely find the
> > smp_call_function queue empty unless we collide with concurrent enqueuers
> > but the probability is low.
> > 
> > Meanwhile, checking the smp_call_function queue can be costly because
> > we use llist_del_all() which relies on cmpxchg().
> 
> xchg()

It's still costly though as it's an atomic read/write with a full barrier.
I'll update the comment accordingly.


* Re: [PATCH 09/10] nohz: Switch nohz full timekeeper to dynticks idle on top of sysidle detection
  2014-07-29 12:17   ` Peter Zijlstra
@ 2014-07-29 22:04     ` Frederic Weisbecker
  0 siblings, 0 replies; 20+ messages in thread
From: Frederic Weisbecker @ 2014-07-29 22:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Ingo Molnar, Nicolas Pitre, Paul E. McKenney,
	Steven Rostedt, Thomas Gleixner, Viresh Kumar

On Tue, Jul 29, 2014 at 02:17:53PM +0200, Peter Zijlstra wrote:
> On Mon, Jul 28, 2014 at 07:37:35PM +0200, Frederic Weisbecker wrote:
> 
> > @@ -745,7 +747,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
> >  		 * if there are full dynticks CPUs around
> >  		 */
> >  		if (tick_do_timer_cpu == cpu)
> > -			return false;
> > +			return rcu_sys_is_idle();
> >  	}
> 
> I still feel it's entirely upside down to rely on RCU for this...

RCU has had this feature internally for a while. Then came nohz. I guess in
a perfect world we would make it a library that both RCU and nohz could use.
It's probably possible to do, but right now it is too tied to RCU internals.


* Re: [PATCH 07/10] nohz: Enforce timekeeping on CPU 0
  2014-07-29 12:12   ` Peter Zijlstra
@ 2014-07-30 13:23     ` Frederic Weisbecker
  0 siblings, 0 replies; 20+ messages in thread
From: Frederic Weisbecker @ 2014-07-30 13:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Ingo Molnar, Nicolas Pitre, Paul E. McKenney,
	Steven Rostedt, Thomas Gleixner, Viresh Kumar

On Tue, Jul 29, 2014 at 02:12:37PM +0200, Peter Zijlstra wrote:
> On Mon, Jul 28, 2014 at 07:37:33PM +0200, Frederic Weisbecker wrote:
> > The timekeeper gets initialized to the value of the CPU where the
> > first clockevent device is setup. This works well because the timekeeper
> > can be any online CPU in most configs.
> > 
> > Full dynticks has its own requirement though and needs the timekeeper
> > to always be 0. And this requirement seems to accommodate pretty well with
> > the above described boot timekeeper setting because the first clockevent
> > device happens to be initialized, most of the time, on the boot CPU
> > (which should be CPU 0).
> 
> This isn't true in general. Voyager (which we dropped support for, IIRC)
> had a boot cpu != 0, and I think there are ARM platforms where the same
> can be true.

Ah, I didn't know that. But I have heard before that assuming CPU0 is the
boot CPU is a mistake.

> 
> > However there is no mention of such a guarantee anywhere. This assumption
> > might well be defeated on some corner case now or in the future.
> 
> Right..
> 
> > So lets wipe out the FUD and force tick_do_timer_cpu to CPU 0 on boot
> > when full dynticks is used.
> > 
> > This way we can even remove some corner case code that handled scenarios
> > where all clockevent devices were setup on full dynticks CPUs.
> 
> > diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
> > index 0a0608e..cb57bce 100644
> > --- a/kernel/time/tick-common.c
> > +++ b/kernel/time/tick-common.c
> > @@ -179,10 +179,10 @@ static void tick_setup_device(struct tick_device *td,
> >  		 * this cpu:
> >  		 */
> >  		if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
> > -			if (!tick_nohz_full_cpu(cpu))
> > -				tick_do_timer_cpu = cpu;
> > +			if (tick_nohz_full_enabled())
> > +				tick_do_timer_cpu = TICK_DO_TIMER_DEFAULT;
> >  			else
> > -				tick_do_timer_cpu = TICK_DO_TIMER_NONE;
> > +				tick_do_timer_cpu = cpu;
> >  			tick_next_period = ktime_get();
> >  			tick_period = ktime_set(0, NSEC_PER_SEC / HZ);
> >  		}
> 
> So from what I can tell this code can get called before SMP setup, which
> would mean we could get here before CPU0 is online?

If CPU0 is a secondary CPU then yeah we have a problem.

Look, the most important thing for me is to determine the boot CPU and make
sure we don't need to offline it for suspend.

Looking at disable_nonboot_cpus(), it seems that the lowest number from the online set
is considered the boot CPU. But if CPU0 boots after the boot CPU, this assumption is defeated,
right?

Probably disable_nonboot_cpus() doesn't look for the real boot CPU but
rather it just tries to keep at least one CPU alive for stopping and resuming the
suspend process.

It looks like the only sane thing I can do is:

1) Pick up the boot CPU (the one that calls tick_nohz_init() for example) as the
default timekeeper.

2) Let it be offlined but then hand over its timekeeping duty to any online CPU,
whether full dynticks or not (of course then the full dynticks timekeeper won't
ever be allowed to enter full dynticks mode); we don't have a choice anyway.

But then we need a way to reset the sysidle state when we hand over the
timekeeping duty.

Big headaches on the horizon!

Why haven't we kept CPU0 as the boot CPU for everyone? Things were just simpler that way...

