linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] nohz: Move nohz kick out of scheduler IPI, v5
@ 2014-05-13 22:25 Frederic Weisbecker
  2014-05-13 22:25 ` [PATCH 1/3] irq_work: Implement remote queueing Frederic Weisbecker
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2014-05-13 22:25 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Andrew Morton, Ingo Molnar, Kevin Hilman,
	Paul E. McKenney, Peter Zijlstra, Thomas Gleixner, Viresh Kumar

So I removed all the parts that tried to avoid the tick for the nohz
IPI, since the lockdep report I saw was actually about other issues
related to locking scenarios of my own brain.

Now it's much simplified!

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	timers/nohz-irq-work-v3

Thanks,
	Frederic
---

Frederic Weisbecker (3):
      irq_work: Implement remote queueing
      nohz: Move full nohz kick to its own IPI
      nohz: Use IPI implicit full barrier against rq->nr_running r/w


 include/linux/irq_work.h |  2 ++
 include/linux/tick.h     |  9 ++++++++-
 kernel/irq_work.c        | 19 ++++++++++++++++++-
 kernel/sched/core.c      | 14 ++++++--------
 kernel/sched/sched.h     | 12 +++++++++---
 kernel/smp.c             |  4 ++++
 kernel/time/tick-sched.c | 10 ++++++----
 7 files changed, 53 insertions(+), 17 deletions(-)

* [PATCH 1/3] irq_work: Implement remote queueing
  2014-05-13 22:25 [PATCH 0/3] nohz: Move nohz kick out of scheduler IPI, v5 Frederic Weisbecker
@ 2014-05-13 22:25 ` Frederic Weisbecker
  2014-05-14  9:06   ` Peter Zijlstra
  2014-05-13 22:25 ` [PATCH 2/3] nohz: Move full nohz kick to its own IPI Frederic Weisbecker
  2014-05-13 22:25 ` [PATCH 3/3] nohz: Use IPI implicit full barrier against rq->nr_running r/w Frederic Weisbecker
  2 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2014-05-13 22:25 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Andrew Morton, Ingo Molnar, Kevin Hilman,
	Paul E. McKenney, Peter Zijlstra, Thomas Gleixner, Viresh Kumar

irq work currently only supports local callbacks. However its code
is mostly ready to run remote callbacks and we have some potential users.

The full nohz subsystem currently open codes its own remote irq work
on top of the scheduler IPI when it wants a CPU to reevaluate its next
tick. However this ad hoc solution bloats the scheduler IPI.

Let's just extend the irq work subsystem to support remote queueing on top
of the generic SMP IPI to handle this kind of user. This shouldn't add
noticeable overhead.
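
For illustration, a (hypothetical) remote user would look roughly like
this; the callback runs from the IPI handler on the target CPU and the
names below are made up:

	static void my_remote_func(struct irq_work *work)
	{
		/* Runs in hardirq context on the CPU it was queued on */
	}

	static DEFINE_PER_CPU(struct irq_work, my_work) = {
		.func = my_remote_func,
	};

	/* Ask @cpu to run my_remote_func() from its next IPI */
	irq_work_queue_on(&per_cpu(my_work, cpu), cpu);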

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/irq_work.h |  2 ++
 kernel/irq_work.c        | 19 ++++++++++++++++++-
 kernel/smp.c             |  4 ++++
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index 19ae05d..ae44aa2 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -33,6 +33,8 @@ void init_irq_work(struct irq_work *work, void (*func)(struct irq_work *))
 #define DEFINE_IRQ_WORK(name, _f) struct irq_work name = { .func = (_f), }
 
 bool irq_work_queue(struct irq_work *work);
+bool irq_work_queue_on(struct irq_work *work, int cpu);
+
 void irq_work_run(void);
 void irq_work_sync(struct irq_work *work);
 
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index a82170e..9f9be55 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -56,11 +56,28 @@ void __weak arch_irq_work_raise(void)
 }
 
 /*
- * Enqueue the irq_work @entry unless it's already pending
+ * Enqueue the irq_work @work on @cpu unless it's already pending
  * somewhere.
  *
  * Can be re-enqueued while the callback is still in progress.
  */
+bool irq_work_queue_on(struct irq_work *work, int cpu)
+{
+	/* Only queue if not already pending */
+	if (!irq_work_claim(work))
+		return false;
+
+	/* All work should have been flushed before going offline */
+	WARN_ON_ONCE(cpu_is_offline(cpu));
+
+	llist_add(&work->llnode, &per_cpu(irq_work_list, cpu));
+	native_send_call_func_single_ipi(cpu);
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(irq_work_queue_on);
+
+/* Enqueue the irq work @work on the current CPU */
 bool irq_work_queue(struct irq_work *work)
 {
 	/* Only queue if not already pending */
diff --git a/kernel/smp.c b/kernel/smp.c
index 06d574e..ba0d8fd 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -3,6 +3,7 @@
  *
  * (C) Jens Axboe <jens.axboe@oracle.com> 2008
  */
+#include <linux/irq_work.h>
 #include <linux/rcupdate.h>
 #include <linux/rculist.h>
 #include <linux/kernel.h>
@@ -198,6 +199,9 @@ void generic_smp_call_function_single_interrupt(void)
 		csd->func(csd->info);
 		csd_unlock(csd);
 	}
+
+	/* Handle irq works queued remotely by irq_work_queue_on() */
+	irq_work_run();
 }
 
 /*
-- 
1.8.3.1


* [PATCH 2/3] nohz: Move full nohz kick to its own IPI
  2014-05-13 22:25 [PATCH 0/3] nohz: Move nohz kick out of scheduler IPI, v5 Frederic Weisbecker
  2014-05-13 22:25 ` [PATCH 1/3] irq_work: Implement remote queueing Frederic Weisbecker
@ 2014-05-13 22:25 ` Frederic Weisbecker
  2014-05-13 22:25 ` [PATCH 3/3] nohz: Use IPI implicit full barrier against rq->nr_running r/w Frederic Weisbecker
  2 siblings, 0 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2014-05-13 22:25 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Andrew Morton, Ingo Molnar, Kevin Hilman,
	Paul E. McKenney, Peter Zijlstra, Thomas Gleixner, Viresh Kumar

Now that the irq work subsystem can queue remote callbacks, it's
a perfect fit to safely queue IPIs when interrupts are disabled
without worrying about concurrent callers.

Let's use it for the full dynticks kick to notify a CPU that it's
exiting single task mode.

This unbloats the scheduler IPI a bit; the nohz code was abusing it
for its cool "callable anywhere/anytime" properties.
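
The resulting kick path, roughly (a sketch of the flow; the exact code
is in the patch below):

	/* waker side, e.g. inc_nr_running() on another CPU */
	tick_nohz_full_kick_cpu(cpu)
	  -> irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu)
	       -> llist_add() + generic SMP call function IPI

	/* target side, from the IPI handler */
	generic_smp_call_function_single_interrupt()
	  -> irq_work_run()
	       -> nohz_full_kick_work callback
	            -> __tick_nohz_full_check()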

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/tick.h     |  9 ++++++++-
 kernel/sched/core.c      |  5 +----
 kernel/sched/sched.h     |  2 +-
 kernel/time/tick-sched.c | 10 ++++++----
 4 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index b84773c..8a4987f 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -181,7 +181,13 @@ static inline bool tick_nohz_full_cpu(int cpu)
 
 extern void tick_nohz_init(void);
 extern void __tick_nohz_full_check(void);
-extern void tick_nohz_full_kick(void);
+extern void tick_nohz_full_kick_cpu(int cpu);
+
+static inline void tick_nohz_full_kick(void)
+{
+	tick_nohz_full_kick_cpu(smp_processor_id());
+}
+
 extern void tick_nohz_full_kick_all(void);
 extern void __tick_nohz_task_switch(struct task_struct *tsk);
 #else
@@ -189,6 +195,7 @@ static inline void tick_nohz_init(void) { }
 static inline bool tick_nohz_full_enabled(void) { return false; }
 static inline bool tick_nohz_full_cpu(int cpu) { return false; }
 static inline void __tick_nohz_full_check(void) { }
+static inline void tick_nohz_full_kick_cpu(int cpu) { }
 static inline void tick_nohz_full_kick(void) { }
 static inline void tick_nohz_full_kick_all(void) { }
 static inline void __tick_nohz_task_switch(struct task_struct *tsk) { }
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d9d8ece..fb6dfad 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1500,9 +1500,7 @@ void scheduler_ipi(void)
 	 */
 	preempt_fold_need_resched();
 
-	if (llist_empty(&this_rq()->wake_list)
-			&& !tick_nohz_full_cpu(smp_processor_id())
-			&& !got_nohz_idle_kick())
+	if (llist_empty(&this_rq()->wake_list) && !got_nohz_idle_kick())
 		return;
 
 	/*
@@ -1519,7 +1517,6 @@ void scheduler_ipi(void)
 	 * somewhat pessimize the simple resched case.
 	 */
 	irq_enter();
-	tick_nohz_full_check();
 	sched_ttwu_pending();
 
 	/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 456e492..6089e00 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1225,7 +1225,7 @@ static inline void inc_nr_running(struct rq *rq)
 		if (tick_nohz_full_cpu(rq->cpu)) {
 			/* Order rq->nr_running write against the IPI */
 			smp_wmb();
-			smp_send_reschedule(rq->cpu);
+			tick_nohz_full_kick_cpu(rq->cpu);
 		}
        }
 #endif
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 6558b7a..3d63944 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -224,13 +224,15 @@ static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
 };
 
 /*
- * Kick the current CPU if it's full dynticks in order to force it to
+ * Kick the CPU if it's full dynticks in order to force it to
  * re-evaluate its dependency on the tick and restart it if necessary.
  */
-void tick_nohz_full_kick(void)
+void tick_nohz_full_kick_cpu(int cpu)
 {
-	if (tick_nohz_full_cpu(smp_processor_id()))
-		irq_work_queue(&__get_cpu_var(nohz_full_kick_work));
+	if (!tick_nohz_full_cpu(cpu))
+		return;
+
+	irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu);
 }
 
 static void nohz_full_kick_ipi(void *info)
-- 
1.8.3.1


* [PATCH 3/3] nohz: Use IPI implicit full barrier against rq->nr_running r/w
  2014-05-13 22:25 [PATCH 0/3] nohz: Move nohz kick out of scheduler IPI, v5 Frederic Weisbecker
  2014-05-13 22:25 ` [PATCH 1/3] irq_work: Implement remote queueing Frederic Weisbecker
  2014-05-13 22:25 ` [PATCH 2/3] nohz: Move full nohz kick to its own IPI Frederic Weisbecker
@ 2014-05-13 22:25 ` Frederic Weisbecker
  2014-05-14  9:09   ` Peter Zijlstra
  2 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2014-05-13 22:25 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Andrew Morton, Ingo Molnar, Kevin Hilman,
	Paul E. McKenney, Peter Zijlstra, Thomas Gleixner, Viresh Kumar

A full dynticks CPU is allowed to stop its tick when a single task runs.
Meanwhile, when a new task gets enqueued, the CPU must be notified so that
it can restart its tick to maintain local fairness and other accounting
details.

This notification is performed by way of an IPI. Then when the target
receives the IPI, we expect it to see the new value of rq->nr_running.

Hence the following ordering scenario:

   CPU 0                   CPU 1

   write rq->nr_running    get IPI
   smp_wmb()               smp_rmb()
   send IPI                read rq->nr_running

But Paul McKenney says that nowadays IPIs imply a full barrier on
all architectures. So we can safely remove this barrier pair and rely
on the implicit barriers that come along with IPI send/receive. Let's
just comment on this new assumption.
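
In other words, the pairing now relies on the IPI itself. A sketch,
assuming the IPI send/receive full barrier described above:

	/* CPU 0 (waker), in inc_nr_running(): */
	rq->nr_running++;               /* A */
	tick_nohz_full_kick_cpu(cpu);   /* IPI send: full barrier after A */

	/* CPU 1 (target), in sched_can_stop_tick() from the kick: */
	/* IPI receive: full barrier before B */
	if (rq->nr_running > 1)         /* B: guaranteed to see A */
		return false;           /* keep the tick */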

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/sched/core.c  |  9 +++++----
 kernel/sched/sched.h | 10 ++++++++--
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fb6dfad..a06cac1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -670,10 +670,11 @@ bool sched_can_stop_tick(void)
 
        rq = this_rq();
 
-       /* Make sure rq->nr_running update is visible after the IPI */
-       smp_rmb();
-
-       /* More than one running task need preemption */
+       /*
+	* More than one running task need preemption.
+	* nr_running update is assumed to be visible
+	* after IPI is sent from wakers.
+	*/
        if (rq->nr_running > 1)
                return false;
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 6089e00..219bfbd 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1223,8 +1223,14 @@ static inline void inc_nr_running(struct rq *rq)
 #ifdef CONFIG_NO_HZ_FULL
 	if (rq->nr_running == 2) {
 		if (tick_nohz_full_cpu(rq->cpu)) {
-			/* Order rq->nr_running write against the IPI */
-			smp_wmb();
+			/*
+			 * Tick is needed if more than one task runs on a CPU.
+			 * Send the target an IPI to kick it out of nohz mode.
+			 *
+			 * We assume that IPI implies full memory barrier and the
+			 * new value of rq->nr_running is visible on reception
+			 * from the target.
+			 */
 			tick_nohz_full_kick_cpu(rq->cpu);
 		}
        }
-- 
1.8.3.1


* Re: [PATCH 1/3] irq_work: Implement remote queueing
  2014-05-13 22:25 ` [PATCH 1/3] irq_work: Implement remote queueing Frederic Weisbecker
@ 2014-05-14  9:06   ` Peter Zijlstra
  2014-05-14  9:10     ` Peter Zijlstra
  2014-05-14 11:38     ` Frederic Weisbecker
  0 siblings, 2 replies; 16+ messages in thread
From: Peter Zijlstra @ 2014-05-14  9:06 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Andrew Morton, Ingo Molnar, Kevin Hilman, Paul E. McKenney,
	Thomas Gleixner, Viresh Kumar

On Wed, May 14, 2014 at 12:25:54AM +0200, Frederic Weisbecker wrote:
> irq work currently only supports local callbacks. However its code
> is mostly ready to run remote callbacks and we have some potential users.
> 
> The full nohz subsystem currently open codes its own remote irq work
> on top of the scheduler IPI when it wants a CPU to reevaluate its next
> tick. However this ad hoc solution bloats the scheduler IPI.
> 
> Let's just extend the irq work subsystem to support remote queueing on top
> of the generic SMP IPI to handle this kind of user. This shouldn't add
> noticeable overhead.
> 
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Kevin Hilman <khilman@linaro.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> ---
>  include/linux/irq_work.h |  2 ++
>  kernel/irq_work.c        | 19 ++++++++++++++++++-
>  kernel/smp.c             |  4 ++++
>  3 files changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
> index 19ae05d..ae44aa2 100644
> --- a/include/linux/irq_work.h
> +++ b/include/linux/irq_work.h
> @@ -33,6 +33,8 @@ void init_irq_work(struct irq_work *work, void (*func)(struct irq_work *))
>  #define DEFINE_IRQ_WORK(name, _f) struct irq_work name = { .func = (_f), }
>  
>  bool irq_work_queue(struct irq_work *work);
> +bool irq_work_queue_on(struct irq_work *work, int cpu);
> +
>  void irq_work_run(void);
>  void irq_work_sync(struct irq_work *work);
>  
> diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> index a82170e..9f9be55 100644
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -56,11 +56,28 @@ void __weak arch_irq_work_raise(void)
>  }
>  
>  /*
> - * Enqueue the irq_work @entry unless it's already pending
> + * Enqueue the irq_work @work on @cpu unless it's already pending
>   * somewhere.
>   *
>   * Can be re-enqueued while the callback is still in progress.
>   */
> +bool irq_work_queue_on(struct irq_work *work, int cpu)
> +{
> +	/* Only queue if not already pending */
> +	if (!irq_work_claim(work))
> +		return false;
> +
> +	/* All work should have been flushed before going offline */
> +	WARN_ON_ONCE(cpu_is_offline(cpu));

	WARN_ON_ONCE(in_nmi());

> +
> +	llist_add(&work->llnode, &per_cpu(irq_work_list, cpu));
> +	native_send_call_func_single_ipi(cpu);

At the very leastestest make that:

	if (llist_add(&work->llnode, &per_cpu(irq_work_list, cpu)))
		native_send_call_func_single_ipi(cpu);

But ideally, also test the IRQ_WORK_LAZY support, its weird to have that
only be supported for the other queue.

Hmm, why do we need that LAZY crap, that completely wrecks a perfectly
simple thing.

The changelog (bc6679aef673f), not the printk() usage make much sense,
printk() can't cause an IPI storm... printk() isn't fast enough to storm
anything.

> +
> +	return true;
> +}
> +EXPORT_SYMBOL_GPL(irq_work_queue_on);


* Re: [PATCH 3/3] nohz: Use IPI implicit full barrier against rq->nr_running r/w
  2014-05-13 22:25 ` [PATCH 3/3] nohz: Use IPI implicit full barrier against rq->nr_running r/w Frederic Weisbecker
@ 2014-05-14  9:09   ` Peter Zijlstra
  2014-05-14 11:38     ` Frederic Weisbecker
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2014-05-14  9:09 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Andrew Morton, Ingo Molnar, Kevin Hilman, Paul E. McKenney,
	Thomas Gleixner, Viresh Kumar

On Wed, May 14, 2014 at 12:25:56AM +0200, Frederic Weisbecker wrote:
> @@ -670,10 +670,11 @@ bool sched_can_stop_tick(void)
>  
>         rq = this_rq();
>  
> -       /* Make sure rq->nr_running update is visible after the IPI */
> -       smp_rmb();
> -
> -       /* More than one running task need preemption */
> +       /*
> +	* More than one running task need preemption.
> +	* nr_running update is assumed to be visible
> +	* after IPI is sent from wakers.
> +	*/
>         if (rq->nr_running > 1)
>                 return false;

Looks like whitespace damage on that comment's indenting.


* Re: [PATCH 1/3] irq_work: Implement remote queueing
  2014-05-14  9:06   ` Peter Zijlstra
@ 2014-05-14  9:10     ` Peter Zijlstra
  2014-05-14 11:38     ` Frederic Weisbecker
  1 sibling, 0 replies; 16+ messages in thread
From: Peter Zijlstra @ 2014-05-14  9:10 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Andrew Morton, Ingo Molnar, Kevin Hilman, Paul E. McKenney,
	Thomas Gleixner, Viresh Kumar

On Wed, May 14, 2014 at 11:06:29AM +0200, Peter Zijlstra wrote:
> > +	llist_add(&work->llnode, &per_cpu(irq_work_list, cpu));
> > +	native_send_call_func_single_ipi(cpu);
> 
> At the very leastestest make that:
> 
> 	if (llist_add(&work->llnode, &per_cpu(irq_work_list, cpu)))
> 		native_send_call_func_single_ipi(cpu);
> 
> But ideally, also test the IRQ_WORK_LAZY support, its weird to have that
> only be supported for the other queue.
> 
> Hmm, why do we need that LAZY crap, that completely wrecks a perfectly
> simple thing.
> 
> The changelog (bc6679aef673f), not the printk() usage make much sense,

s/not/nor/

> printk() can't cause an IPI storm... printk() isn't fast enough to storm
> anything.

Except, as we all know, slow serial lines.


* Re: [PATCH 1/3] irq_work: Implement remote queueing
  2014-05-14  9:06   ` Peter Zijlstra
  2014-05-14  9:10     ` Peter Zijlstra
@ 2014-05-14 11:38     ` Frederic Weisbecker
  2014-05-14 11:54       ` Peter Zijlstra
  1 sibling, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2014-05-14 11:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Andrew Morton, Ingo Molnar, Kevin Hilman, Paul E. McKenney,
	Thomas Gleixner, Viresh Kumar

On Wed, May 14, 2014 at 11:06:29AM +0200, Peter Zijlstra wrote:
> On Wed, May 14, 2014 at 12:25:54AM +0200, Frederic Weisbecker wrote:
> > irq work currently only supports local callbacks. However its code
> > is mostly ready to run remote callbacks and we have some potential users.
> > 
> > The full nohz subsystem currently open codes its own remote irq work
> > on top of the scheduler IPI when it wants a CPU to reevaluate its next
> > tick. However this ad hoc solution bloats the scheduler IPI.
> > 
> > Let's just extend the irq work subsystem to support remote queueing on top
> > of the generic SMP IPI to handle this kind of user. This shouldn't add
> > noticeable overhead.
> > 
> > Suggested-by: Peter Zijlstra <peterz@infradead.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Kevin Hilman <khilman@linaro.org>
> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Viresh Kumar <viresh.kumar@linaro.org>
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > ---
> >  include/linux/irq_work.h |  2 ++
> >  kernel/irq_work.c        | 19 ++++++++++++++++++-
> >  kernel/smp.c             |  4 ++++
> >  3 files changed, 24 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
> > index 19ae05d..ae44aa2 100644
> > --- a/include/linux/irq_work.h
> > +++ b/include/linux/irq_work.h
> > @@ -33,6 +33,8 @@ void init_irq_work(struct irq_work *work, void (*func)(struct irq_work *))
> >  #define DEFINE_IRQ_WORK(name, _f) struct irq_work name = { .func = (_f), }
> >  
> >  bool irq_work_queue(struct irq_work *work);
> > +bool irq_work_queue_on(struct irq_work *work, int cpu);
> > +
> >  void irq_work_run(void);
> >  void irq_work_sync(struct irq_work *work);
> >  
> > diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> > index a82170e..9f9be55 100644
> > --- a/kernel/irq_work.c
> > +++ b/kernel/irq_work.c
> > @@ -56,11 +56,28 @@ void __weak arch_irq_work_raise(void)
> >  }
> >  
> >  /*
> > - * Enqueue the irq_work @entry unless it's already pending
> > + * Enqueue the irq_work @work on @cpu unless it's already pending
> >   * somewhere.
> >   *
> >   * Can be re-enqueued while the callback is still in progress.
> >   */
> > +bool irq_work_queue_on(struct irq_work *work, int cpu)
> > +{
> > +	/* Only queue if not already pending */
> > +	if (!irq_work_claim(work))
> > +		return false;
> > +
> > +	/* All work should have been flushed before going offline */
> > +	WARN_ON_ONCE(cpu_is_offline(cpu));
> 
> 	WARN_ON_ONCE(in_nmi());

Well... I think it's actually NMI-safe.

> 
> > +
> > +	llist_add(&work->llnode, &per_cpu(irq_work_list, cpu));
> > +	native_send_call_func_single_ipi(cpu);
> 
> At the very leastestest make that:
> 
> 	if (llist_add(&work->llnode, &per_cpu(irq_work_list, cpu)))
> 		native_send_call_func_single_ipi(cpu);

So yeah the issue is that we may have IRQ_WORK_LAZY in the queue. And
if we have only such work in the queue, nobody has raised before us.

So we can't just test with llist_add(). Or if we do, we must then
separate raised and lazy list.
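
Roughly, such a split could look like this (just a sketch, the list
names are made up; only the raised list would be used for remote
queueing):

	static DEFINE_PER_CPU(struct llist_head, raised_list);
	static DEFINE_PER_CPU(struct llist_head, lazy_list);

	bool irq_work_queue(struct irq_work *work)
	{
		/* Only queue if not already pending */
		if (!irq_work_claim(work))
			return false;

		preempt_disable();
		if (work->flags & IRQ_WORK_LAZY) {
			/* Lazy work only raises when the tick is stopped */
			if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
			    tick_nohz_tick_stopped())
				arch_irq_work_raise();
		} else {
			/* Here the llist_add() return value can drive the raise */
			if (llist_add(&work->llnode, this_cpu_ptr(&raised_list)))
				arch_irq_work_raise();
		}
		preempt_enable();

		return true;
	}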

Also note that nohz is the only user for now and irq_work_claim() thus
prevents from double IPI. Of course if more users come up the issue arise
again.

> 
> But ideally, also test the IRQ_WORK_LAZY support, its weird to have that
> only be supported for the other queue.

OTOH IRQ_WORK_LAZY doesn't make much sense in remote queueing. We can't safely
just *wait* for another CPU's tick. The IPI is necessary anyway.

> 
> Hmm, why do we need that LAZY crap, that completely wrecks a perfectly
> simple thing.
> 
> The changelog (bc6679aef673f), not the printk() usage make much sense,
> printk() can't cause an IPI storm... printk() isn't fast enough to storm
> anything.

Maybe I was paranoid but I was worried about the overhead of printk() wakeups
on boot if implemented with IPIs.

Of course if I can be proven that it won't bring much damage to use an IPI, I'd
be very happy to remove it.

* Re: [PATCH 3/3] nohz: Use IPI implicit full barrier against rq->nr_running r/w
  2014-05-14  9:09   ` Peter Zijlstra
@ 2014-05-14 11:38     ` Frederic Weisbecker
  0 siblings, 0 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2014-05-14 11:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Andrew Morton, Ingo Molnar, Kevin Hilman, Paul E. McKenney,
	Thomas Gleixner, Viresh Kumar

On Wed, May 14, 2014 at 11:09:03AM +0200, Peter Zijlstra wrote:
> On Wed, May 14, 2014 at 12:25:56AM +0200, Frederic Weisbecker wrote:
> > @@ -670,10 +670,11 @@ bool sched_can_stop_tick(void)
> >  
> >         rq = this_rq();
> >  
> > -       /* Make sure rq->nr_running update is visible after the IPI */
> > -       smp_rmb();
> > -
> > -       /* More than one running task need preemption */
> > +       /*
> > +	* More than one running task need preemption.
> > +	* nr_running update is assumed to be visible
> > +	* after IPI is sent from wakers.
> > +	*/
> >         if (rq->nr_running > 1)
> >                 return false;
> 
> Looks like whitespace damage on that comment's indenting.

Oops!

* Re: [PATCH 1/3] irq_work: Implement remote queueing
  2014-05-14 11:38     ` Frederic Weisbecker
@ 2014-05-14 11:54       ` Peter Zijlstra
  2014-05-14 12:11         ` Frederic Weisbecker
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2014-05-14 11:54 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Andrew Morton, Ingo Molnar, Kevin Hilman, Paul E. McKenney,
	Thomas Gleixner, Viresh Kumar

On Wed, May 14, 2014 at 01:38:14PM +0200, Frederic Weisbecker wrote:
> > > +bool irq_work_queue_on(struct irq_work *work, int cpu)
> > > +{
> > > +	/* Only queue if not already pending */
> > > +	if (!irq_work_claim(work))
> > > +		return false;
> > > +
> > > +	/* All work should have been flushed before going offline */
> > > +	WARN_ON_ONCE(cpu_is_offline(cpu));
> > 
> > 	WARN_ON_ONCE(in_nmi());
> 
> Well... I think it's actually NMI-safe.

I don't think it is, most apic calls do apic_wait_icr_idle() then the
apic op, if an NMI happens in between and writes to the APIC, the return
context will see a !idle icr and fail.

This is why arch_irq_work_raise() again idles the icr after sending the
IPI.
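
For reference, the x86 raise path looks roughly like this (a sketch
from memory, details may differ between kernel versions):

	void arch_irq_work_raise(void)
	{
	#ifdef CONFIG_X86_LOCAL_APIC
		if (!cpu_has_apic)
			return;

		apic->send_IPI_self(IRQ_WORK_VECTOR);
		/*
		 * Wait for the ICR to go idle again before returning, so
		 * that an APIC sequence interrupted by this raise (e.g.
		 * from NMI context) doesn't come back to a busy ICR.
		 */
		apic_wait_icr_idle();
	#endif
	}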

Also, I think, seeing what benh said earlier, its unsafe for other archs
too.

> > > +	llist_add(&work->llnode, &per_cpu(irq_work_list, cpu));
> > > +	native_send_call_func_single_ipi(cpu);
> > 
> > At the very leastestest make that:
> > 
> > 	if (llist_add(&work->llnode, &per_cpu(irq_work_list, cpu)))
> > 		native_send_call_func_single_ipi(cpu);
> 
> So yeah the issue is that we may have IRQ_WORK_LAZY in the queue. And
> if we have only such work in the queue, nobody has raised before us.
> 
> So we can't just test with llist_add(). Or if we do, we must then
> separate raised and lazy list.

Then do the remote irq_work_raised thing. But it really stinks you broke
this very nice and simple thing.

> Also note that nohz is the only user for now and irq_work_claim() thus
> prevents from double IPI. Of course if more users come up the issue arise
> again.

DANGER, half arsed engineering at work, seriously? Just write proper
code already.

There's no fucking way the next user will check the implementation to
make sure its 'sane'.

> Maybe I was paranoid but I was worried about the overhead of printk() wakeups
> on boot if implemented with IPIs.
> 
> Of course if I can be proven that it won't bring much damage to use an IPI, I'd
> be very happy to remove it.

That's the wrong fucking way around, first proof its needed then do
something about it. As is I think the LAZY thing is horrid.




* Re: [PATCH 1/3] irq_work: Implement remote queueing
  2014-05-14 11:54       ` Peter Zijlstra
@ 2014-05-14 12:11         ` Frederic Weisbecker
  2014-05-14 12:41           ` Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2014-05-14 12:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Andrew Morton, Ingo Molnar, Kevin Hilman, Paul E. McKenney,
	Thomas Gleixner, Viresh Kumar

On Wed, May 14, 2014 at 01:54:06PM +0200, Peter Zijlstra wrote:
> On Wed, May 14, 2014 at 01:38:14PM +0200, Frederic Weisbecker wrote:
> > > > +bool irq_work_queue_on(struct irq_work *work, int cpu)
> > > > +{
> > > > +	/* Only queue if not already pending */
> > > > +	if (!irq_work_claim(work))
> > > > +		return false;
> > > > +
> > > > +	/* All work should have been flushed before going offline */
> > > > +	WARN_ON_ONCE(cpu_is_offline(cpu));
> > > 
> > > 	WARN_ON_ONCE(in_nmi());
> > 
> > Well... I think it's actually NMI-safe.
> 
> I don't think it is, most apic calls do apic_wait_icr_idle() then the
> apic op, if an NMI happens in between and writes to the APIC, the return
> context will see a !idle icr and fail.
> 
> This is why arch_irq_work_raise() again idles the icr after sending the
> IPI.
> 
> Also, I think, seeing what benh said earlier, its unsafe for other archs
> too.

Ah I don't know much these archs details, so I concede it.

> 
> > > > +	llist_add(&work->llnode, &per_cpu(irq_work_list, cpu));
> > > > +	native_send_call_func_single_ipi(cpu);
> > > 
> > > At the very leastestest make that:
> > > 
> > > 	if (llist_add(&work->llnode, &per_cpu(irq_work_list, cpu)))
> > > 		native_send_call_func_single_ipi(cpu);
> > 
> > So yeah the issue is that we may have IRQ_WORK_LAZY in the queue. And
> > if we have only such work in the queue, nobody has raised before us.
> > 
> > So we can't just test with llist_add(). Or if we do, we must then
> > separate raised and lazy list.
> 
> Then do the remote irq_work_raised thing. But it really stinks you broke
> this very nice and simple thing.

I tried not to break boot with printk overhead. That said I've considered having
a very simple "tick work" that can rely on irq work when the tick is stopped
and use it for printk. That would restore the initial simplicity.

> 
> > Also note that nohz is the only user for now and irq_work_claim() thus
> > prevents from double IPI. Of course if more users come up the issue arise
> > again.
> 
> DANGER, half arsed engineering at work, seriously? Just write proper
> code already.
> 
> There's no fucking way the next user will check the implementation to
> make sure its 'sane'.

Are you competing with tglx on grumpiness? You guys are free to treat us
like shit but don't be surprised if one day you'll be alone in kernel/*

> 
> > Maybe I was paranoid but I was worried about the overhead of printk() wakeups
> > on boot if implemented with IPIs.
> > 
> > Of course if I can be proven that it won't bring much damage to use an IPI, I'd
> > be very happy to remove it.
> 
> That's the wrong fucking way around, first proof its needed then do
> something about it. As is I think the LAZY thing is horrid.

* Re: [PATCH 1/3] irq_work: Implement remote queueing
  2014-05-14 12:11         ` Frederic Weisbecker
@ 2014-05-14 12:41           ` Peter Zijlstra
  2014-05-14 13:51             ` Frederic Weisbecker
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2014-05-14 12:41 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Andrew Morton, Ingo Molnar, Kevin Hilman, Paul E. McKenney,
	Thomas Gleixner, Viresh Kumar

On Wed, May 14, 2014 at 02:11:25PM +0200, Frederic Weisbecker wrote:
> > I don't think it is, most apic calls do apic_wait_icr_idle() then the
> > apic op, if an NMI happens in between and writes to the APIC, the return
> > context will see a !idle icr and fail.
> > 
> > This is why arch_irq_work_raise() again idles the icr after sending the
> > IPI.
> > 
> > Also, I think, seeing what benh said earlier, its unsafe for other archs
> > too.
> 
> Ah I don't know much these archs details, so I concede it.

Yeah, I didn't either, had to figure it out when someone asked WTH there
was an wait_icr_idle call in there.

> > Then do the remote irq_work_raised thing. But it really stinks you broke
> > this very nice and simple thing.
> 
> I tried not to break boot with printk overhead. That said I've considered having
> a very simple "tick work" that can rely on irq work when the tick is stopped
> and use it for printk. That would restore the initial simplicity.

But but but.. did you even try without the lazy thing?

Don't fix what ain't broken, keep it simple, etc..

Anyway, if it turns out to really be needed, the split list doesn't
sound bad.

> > > Also note that nohz is the only user for now and irq_work_claim() thus
> > > prevents from double IPI. Of course if more users come up the issue arise
> > > again.
> > 
> > DANGER, half arsed engineering at work, seriously? Just write proper
> > code already.
> > 
> > There's no fucking way the next user will check the implementation to
> > make sure its 'sane'.
> 
> Are you competing with tglx on grumpiness? You guys are free to treat us
> like shit but don't be surprised if one day you'll be alone in kernel/*

There's really only so much nonsense one can take on any one day before
getting seriously grumpy.

And arguing that because there's only one user so we can skimp a core
function really tops the day.

So maybe I need a holiday, but shees.


* Re: [PATCH 1/3] irq_work: Implement remote queueing
  2014-05-14 12:41           ` Peter Zijlstra
@ 2014-05-14 13:51             ` Frederic Weisbecker
  2014-05-14 13:55               ` Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2014-05-14 13:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Andrew Morton, Ingo Molnar, Kevin Hilman, Paul E. McKenney,
	Thomas Gleixner, Viresh Kumar

On Wed, May 14, 2014 at 02:41:50PM +0200, Peter Zijlstra wrote:
> On Wed, May 14, 2014 at 02:11:25PM +0200, Frederic Weisbecker wrote:
> > > I don't think it is, most apic calls do apic_wait_icr_idle() then the
> > > apic op, if an NMI happens in between and writes to the APIC, the return
> > > context will see a !idle icr and fail.
> > > 
> > > This is why arch_irq_work_raise() again idles the icr after sending the
> > > IPI.
> > > 
> > > Also, I think, seeing what benh said earlier, its unsafe for other archs
> > > too.
> > 
> > Ah I don't know much these archs details, so I concede it.
> 
> Yeah, I didn't either, had to figure it out when someone asked WTH there
> was an wait_icr_idle call in there.
> 
> > > Then do the remote irq_work_raised thing. But it really stinks you broke
> > > this very nice and simple thing.
> > 
> > I tried not to break boot with printk overhead. That said I've considered having
> > a very simple "tick work" that can rely on irq work when the tick is stopped
> > and use it for printk. That would restore the initial simplicity.
> 
> But but but.. did you even try without the lazy thing?
> 
> Don't fix what ain't broken, keep it simple, etc..

The problem is that I may well see no significant issues on my small and common hardware,
but the problem may hit on boxes with specific configs or large numbers of CPUs.

"Don't fix what ain't broken" here clashes with "let's stay conservative/paranoid"
to avoid bringing new bugs. Before printk used irq_work we had printk_tick();
I simply kept the old behaviour to avoid breaking boot time on other boxes.

> 
> Anyway, if it turns out to really be needed, the split list doesn't
> sound bad.

Either that or we can remove LAZY stuff and wait to see if people complain :)

> 
> > > > Also note that nohz is the only user for now and irq_work_claim() thus
> > > > prevents from double IPI. Of course if more users come up the issue arise
> > > > again.
> > > 
> > > DANGER, half arsed engineering at work, seriously? Just write proper
> > > code already.
> > > 
> > > There's no fucking way the next user will check the implementation to
> > > make sure its 'sane'.
> > 
> > Are you competing with tglx on grumpiness? You guys are free to treat us
> > like shit but don't be surprised if one day you'll be alone in kernel/*
> 
> There's really only so much nonsense one can take on any one day before
> getting seriously grumpy.
> 
> And arguing that because there's only one user so we can skimp a core
> function really tops the day.
> 
> So maybe I need a holiday, but shees.



* Re: [PATCH 1/3] irq_work: Implement remote queueing
  2014-05-14 13:51             ` Frederic Weisbecker
@ 2014-05-14 13:55               ` Peter Zijlstra
  2014-05-14 14:28                 ` Thomas Gleixner
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2014-05-14 13:55 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Andrew Morton, Ingo Molnar, Kevin Hilman, Paul E. McKenney,
	Thomas Gleixner, Viresh Kumar

On Wed, May 14, 2014 at 03:51:19PM +0200, Frederic Weisbecker wrote:
> > Anyway, if it turns out to really be needed, the split list doesn't
> > sound bad.
> 
> Either that or we can remove LAZY stuff and wait to see if people complain :)

How big do we need tested? I'm fairly sure RHT has some medium silly
boxes to run things on.


* Re: [PATCH 1/3] irq_work: Implement remote queueing
  2014-05-14 13:55               ` Peter Zijlstra
@ 2014-05-14 14:28                 ` Thomas Gleixner
  0 siblings, 0 replies; 16+ messages in thread
From: Thomas Gleixner @ 2014-05-14 14:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar,
	Kevin Hilman, Paul E. McKenney, Viresh Kumar

On Wed, 14 May 2014, Peter Zijlstra wrote:

> On Wed, May 14, 2014 at 03:51:19PM +0200, Frederic Weisbecker wrote:
> > > Anyway, if it turns out to really be needed, the split list doesn't
> > > sound bad.
> > 
> > Either that or we can remove LAZY stuff and wait to see if people complain :)
> 
> How big do we need tested? I'm fairly sure RHT has some medium silly
> boxes to run things on.

Poke Davidlohr and pretend it's a futex optimization.

He has access to a 160 way for futex tests :)

Thanks,

	tglx

 

* [PATCH 2/3] nohz: Move full nohz kick to its own IPI
  2014-03-19 18:28 [RFC PATCH 0/3] nohz: Move nohz kick out of scheduler IPI Frederic Weisbecker
@ 2014-03-19 18:28 ` Frederic Weisbecker
  0 siblings, 0 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2014-03-19 18:28 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Andrew Morton, Ingo Molnar, Jens Axboe,
	Kevin Hilman, Paul E. McKenney, Peter Zijlstra, Thomas Gleixner

Now that we have smp_queue_function_single() which can be used to
safely queue IPIs when interrupts are disabled and without worrying
about concurrent callers, let's use it for the full dynticks kick to
notify a CPU that it's exiting single task mode.

This unbloats the scheduler IPI a bit; the nohz code was abusing it
for its cool "callable anywhere/anytime" properties.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/tick.h     |  2 ++
 kernel/sched/core.c      |  5 +----
 kernel/sched/sched.h     |  2 +-
 kernel/time/tick-sched.c | 20 ++++++++++++++++++++
 4 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index b84773c..9d3fcc2 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -182,6 +182,7 @@ static inline bool tick_nohz_full_cpu(int cpu)
 extern void tick_nohz_init(void);
 extern void __tick_nohz_full_check(void);
 extern void tick_nohz_full_kick(void);
+extern void tick_nohz_full_kick_cpu(int cpu);
 extern void tick_nohz_full_kick_all(void);
 extern void __tick_nohz_task_switch(struct task_struct *tsk);
 #else
@@ -190,6 +191,7 @@ static inline bool tick_nohz_full_enabled(void) { return false; }
 static inline bool tick_nohz_full_cpu(int cpu) { return false; }
 static inline void __tick_nohz_full_check(void) { }
 static inline void tick_nohz_full_kick(void) { }
+static inline void tick_nohz_full_kick_cpu(int cpu) { }
 static inline void tick_nohz_full_kick_all(void) { }
 static inline void __tick_nohz_task_switch(struct task_struct *tsk) { }
 #endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0cca04a..a07c3a4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1502,9 +1502,7 @@ void scheduler_ipi(void)
 	 */
 	preempt_fold_need_resched();
 
-	if (llist_empty(&this_rq()->wake_list)
-			&& !tick_nohz_full_cpu(smp_processor_id())
-			&& !got_nohz_idle_kick())
+	if (llist_empty(&this_rq()->wake_list) && !got_nohz_idle_kick())
 		return;
 
 	/*
@@ -1521,7 +1519,6 @@ void scheduler_ipi(void)
 	 * somewhat pessimize the simple resched case.
 	 */
 	irq_enter();
-	tick_nohz_full_check();
 	sched_ttwu_pending();
 
 	/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c2119fd..3b165c1 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1233,7 +1233,7 @@ static inline void inc_nr_running(struct rq *rq)
 		if (tick_nohz_full_cpu(rq->cpu)) {
 			/* Order rq->nr_running write against the IPI */
 			smp_wmb();
-			smp_send_reschedule(rq->cpu);
+			tick_nohz_full_kick_cpu(rq->cpu);
 		}
        }
 #endif
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 9f8af69..33a0043 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -230,6 +230,26 @@ void tick_nohz_full_kick(void)
 		irq_work_queue(&__get_cpu_var(nohz_full_kick_work));
 }
 
+static DEFINE_PER_CPU(struct queue_single_data, nohz_full_kick_qsd);
+
+static void nohz_full_kick_queue(struct queue_single_data *qsd)
+{
+	__tick_nohz_full_check();
+}
+
+void tick_nohz_full_kick_cpu(int cpu)
+{
+	if (!tick_nohz_full_cpu(cpu))
+		return;
+
+	if (cpu == smp_processor_id()) {
+		irq_work_queue(&__get_cpu_var(nohz_full_kick_work));
+	} else {
+		smp_queue_function_single(cpu, nohz_full_kick_queue,
+					  &per_cpu(nohz_full_kick_qsd, cpu));
+	}
+}
+
 static void nohz_full_kick_ipi(void *info)
 {
 	__tick_nohz_full_check();
-- 
1.8.3.1

