linux-kernel.vger.kernel.org archive mirror
* [GIT PULL] nohz: Move nohz kick out of scheduler IPI, v7
@ 2014-06-03 14:40 Frederic Weisbecker
  2014-06-03 14:40 ` [PATCH 1/5] irq_work: Split raised and lazy lists Frederic Weisbecker
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: Frederic Weisbecker @ 2014-06-03 14:40 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Andrew Morton, Kevin Hilman,
	Paul E. McKenney, Thomas Gleixner, Viresh Kumar

Hi,

This version fixes the following concerns from Peterz:

* Warn _before_ work claim on irq_work_queue_on()
* Warn on in_nmi() while remote queueing
* Only disable preemption (and not irqs) on local queueing

Thanks.

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	timers/nohz-irq-work-v5

Thanks,
	Frederic
---

Frederic Weisbecker (5):
      irq_work: Split raised and lazy lists
      irq_work: Shorten a bit irq_work_needs_cpu()
      irq_work: Implement remote queueing
      nohz: Move full nohz kick to its own IPI
      nohz: Use IPI implicit full barrier against rq->nr_running r/w


 include/linux/irq_work.h |  2 ++
 include/linux/tick.h     |  9 +++++-
 kernel/irq_work.c        | 72 ++++++++++++++++++++++++++++--------------------
 kernel/sched/core.c      | 14 ++++------
 kernel/sched/sched.h     | 12 ++++++--
 kernel/smp.c             |  4 +++
 kernel/time/tick-sched.c | 10 ++++---
 7 files changed, 77 insertions(+), 46 deletions(-)


* [PATCH 1/5] irq_work: Split raised and lazy lists
  2014-06-03 14:40 [GIT PULL] nohz: Move nohz kick out of scheduler IPI, v7 Frederic Weisbecker
@ 2014-06-03 14:40 ` Frederic Weisbecker
  2014-06-03 14:54   ` Peter Zijlstra
  2014-06-03 14:40 ` [PATCH 2/5] irq_work: Shorten a bit irq_work_needs_cpu() Frederic Weisbecker
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Frederic Weisbecker @ 2014-06-03 14:40 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Andrew Morton, Kevin Hilman,
	Paul E. McKenney, Thomas Gleixner, Viresh Kumar

An irq work can be handled from two places: from the tick if the work
carries the "lazy" flag and the tick is periodic, or from a self IPI.

We currently merge all these works into a single list and use a per-CPU
latch to avoid raising a self-IPI when one is already pending.

Now we could do away with this ugly latch if the list were made only of
non-lazy works: just enqueueing a work on an empty list would be enough
to know whether we need to raise an IPI.

Also we are going to implement remote irq work queueing. The per-CPU
latch would then need to become atomic in the global scope. That's too
bad because, here as well, just enqueueing a work on an empty list of
non-lazy works would be enough to know whether we need to raise an IPI.

So let's take a way out of this: split the works into two distinct lists,
one for the works that can be handled by the next tick and another one
for those handled by the IPI. Just checking whether the latter is empty
when we queue a new work is enough to know if we need to raise an IPI.
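
For illustration, the key property relied on here is that llist_add()
returns true only when the list was empty beforehand, so the first
enqueuer is the one that raises the IPI and the per-CPU "raised" latch
becomes unnecessary. A minimal sketch of that pattern (simplified,
helper name hypothetical, not part of the patch):

    static DEFINE_PER_CPU(struct llist_head, raised_list);

    static void sketch_queue_raised(struct irq_work *work)
    {
            /*
             * llist_add() returns true iff the list was empty before this
             * add: only the first pending work triggers the IPI, later
             * enqueuers piggyback on the one already in flight.
             */
            if (llist_add(&work->llnode, &__get_cpu_var(raised_list)))
                    arch_irq_work_raise();
    }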

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/irq_work.c | 53 ++++++++++++++++++++++++-----------------------------
 1 file changed, 24 insertions(+), 29 deletions(-)

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index a82170e..44ac87c 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -19,8 +19,8 @@
 #include <asm/processor.h>
 
 
-static DEFINE_PER_CPU(struct llist_head, irq_work_list);
-static DEFINE_PER_CPU(int, irq_work_raised);
+static DEFINE_PER_CPU(struct llist_head, raised_list);
+static DEFINE_PER_CPU(struct llist_head, lazy_list);
 
 /*
  * Claim the entry so that no one else will poke at it.
@@ -70,15 +70,13 @@ bool irq_work_queue(struct irq_work *work)
 	/* Queue the entry and raise the IPI if needed. */
 	preempt_disable();
 
-	llist_add(&work->llnode, &__get_cpu_var(irq_work_list));
-
-	/*
-	 * If the work is not "lazy" or the tick is stopped, raise the irq
-	 * work interrupt (if supported by the arch), otherwise, just wait
-	 * for the next tick.
-	 */
-	if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
-		if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
+	/* If the work is "lazy", handle it from next tick if any */
+	if (work->flags & IRQ_WORK_LAZY) {
+		if (llist_add(&work->llnode, &__get_cpu_var(lazy_list)) &&
+		    tick_nohz_tick_stopped())
+			arch_irq_work_raise();
+	} else {
+		if (llist_add(&work->llnode, &__get_cpu_var(raised_list)))
 			arch_irq_work_raise();
 	}
 
@@ -90,10 +89,10 @@ EXPORT_SYMBOL_GPL(irq_work_queue);
 
 bool irq_work_needs_cpu(void)
 {
-	struct llist_head *this_list;
+	struct llist_head *list;
 
-	this_list = &__get_cpu_var(irq_work_list);
-	if (llist_empty(this_list))
+	list = &__get_cpu_var(lazy_list);
+	if (llist_empty(list))
 		return false;
 
 	/* All work should have been flushed before going offline */
@@ -102,28 +101,18 @@ bool irq_work_needs_cpu(void)
 	return true;
 }
 
-static void __irq_work_run(void)
+static void irq_work_run_list(struct llist_head *list)
 {
 	unsigned long flags;
 	struct irq_work *work;
-	struct llist_head *this_list;
 	struct llist_node *llnode;
 
-
-	/*
-	 * Reset the "raised" state right before we check the list because
-	 * an NMI may enqueue after we find the list empty from the runner.
-	 */
-	__this_cpu_write(irq_work_raised, 0);
-	barrier();
-
-	this_list = &__get_cpu_var(irq_work_list);
-	if (llist_empty(this_list))
-		return;
-
 	BUG_ON(!irqs_disabled());
 
-	llnode = llist_del_all(this_list);
+	if (llist_empty(list))
+		return;
+
+	llnode = llist_del_all(list);
 	while (llnode != NULL) {
 		work = llist_entry(llnode, struct irq_work, llnode);
 
@@ -148,6 +137,12 @@ static void __irq_work_run(void)
 	}
 }
 
+static void __irq_work_run(void)
+{
+	irq_work_run_list(&__get_cpu_var(raised_list));
+	irq_work_run_list(&__get_cpu_var(lazy_list));
+}
+
 /*
  * Run the irq_work entries on this cpu. Requires to be ran from hardirq
  * context with local IRQs disabled.
-- 
1.8.3.1



* [PATCH 2/5] irq_work: Shorten a bit irq_work_needs_cpu()
  2014-06-03 14:40 [GIT PULL] nohz: Move nohz kick out of scheduler IPI, v7 Frederic Weisbecker
  2014-06-03 14:40 ` [PATCH 1/5] irq_work: Split raised and lazy lists Frederic Weisbecker
@ 2014-06-03 14:40 ` Frederic Weisbecker
  2014-06-03 14:40 ` [PATCH 3/5] irq_work: Implement remote queueing Frederic Weisbecker
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Frederic Weisbecker @ 2014-06-03 14:40 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Andrew Morton, Kevin Hilman,
	Paul E. McKenney, Thomas Gleixner, Viresh Kumar

Just a small cleanup.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/irq_work.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 44ac87c..a4350d0 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -89,10 +89,7 @@ EXPORT_SYMBOL_GPL(irq_work_queue);
 
 bool irq_work_needs_cpu(void)
 {
-	struct llist_head *list;
-
-	list = &__get_cpu_var(lazy_list);
-	if (llist_empty(list))
+	if (llist_empty(&__get_cpu_var(lazy_list)))
 		return false;
 
 	/* All work should have been flushed before going offline */
-- 
1.8.3.1



* [PATCH 3/5] irq_work: Implement remote queueing
  2014-06-03 14:40 [GIT PULL] nohz: Move nohz kick out of scheduler IPI, v7 Frederic Weisbecker
  2014-06-03 14:40 ` [PATCH 1/5] irq_work: Split raised and lazy lists Frederic Weisbecker
  2014-06-03 14:40 ` [PATCH 2/5] irq_work: Shorten a bit irq_work_needs_cpu() Frederic Weisbecker
@ 2014-06-03 14:40 ` Frederic Weisbecker
  2014-06-03 15:00   ` Peter Zijlstra
  2014-06-03 16:29   ` Eric Dumazet
  2014-06-03 14:40 ` [PATCH 4/5] nohz: Move full nohz kick to its own IPI Frederic Weisbecker
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 22+ messages in thread
From: Frederic Weisbecker @ 2014-06-03 14:40 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Andrew Morton, Kevin Hilman,
	Paul E. McKenney, Thomas Gleixner, Viresh Kumar

irq work currently only supports local callbacks. However its code
is mostly ready to run remote callbacks and we have some potential users.

The full nohz subsystem currently open codes its own remote irq work
on top of the scheduler IPI when it wants a CPU to reevaluate its next
tick. However this ad hoc solution bloats the scheduler IPI.

Let's just extend the irq work subsystem to support remote queueing on
top of the generic SMP IPI to handle this kind of user. This shouldn't
add noticeable overhead.
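
As a usage sketch (hypothetical names, not part of the series), a
subsystem that wants a remote CPU to run a callback from hardirq
context would then be able to do something like:

    /* Callback: runs in hardirq context on the target CPU */
    static void remote_poke(struct irq_work *work)
    {
            /* re-evaluate some per-CPU state here */
    }

    static DEFINE_PER_CPU(struct irq_work, poke_work) = {
            .func = remote_poke,
    };

    /* Caller side: queue the per-CPU work on @cpu, the SMP IPI runs it */
    static void poke_cpu(int cpu)
    {
            irq_work_queue_on(&per_cpu(poke_work, cpu), cpu);
    }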

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/irq_work.h |  2 ++
 kernel/irq_work.c        | 22 +++++++++++++++++++++-
 kernel/smp.c             |  4 ++++
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index 19ae05d..ae44aa2 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -33,6 +33,8 @@ void init_irq_work(struct irq_work *work, void (*func)(struct irq_work *))
 #define DEFINE_IRQ_WORK(name, _f) struct irq_work name = { .func = (_f), }
 
 bool irq_work_queue(struct irq_work *work);
+bool irq_work_queue_on(struct irq_work *work, int cpu);
+
 void irq_work_run(void);
 void irq_work_sync(struct irq_work *work);
 
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index a4350d0..42b87cb 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -56,11 +56,31 @@ void __weak arch_irq_work_raise(void)
 }
 
 /*
- * Enqueue the irq_work @entry unless it's already pending
+ * Enqueue the irq_work @work on @cpu unless it's already pending
  * somewhere.
  *
  * Can be re-enqueued while the callback is still in progress.
  */
+bool irq_work_queue_on(struct irq_work *work, int cpu)
+{
+	/* All work should have been flushed before going offline */
+	WARN_ON_ONCE(cpu_is_offline(cpu));
+
+	/* Arch remote IPI send/receive backends aren't NMI safe */
+	WARN_ON_ONCE(in_nmi());
+
+	/* Only queue if not already pending */
+	if (!irq_work_claim(work))
+		return false;
+
+	if (llist_add(&work->llnode, &per_cpu(raised_list, cpu)))
+		native_send_call_func_single_ipi(cpu);
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(irq_work_queue_on);
+
+/* Enqueue the irq work @work on the current CPU */
 bool irq_work_queue(struct irq_work *work)
 {
 	/* Only queue if not already pending */
diff --git a/kernel/smp.c b/kernel/smp.c
index 06d574e..ba0d8fd 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -3,6 +3,7 @@
  *
  * (C) Jens Axboe <jens.axboe@oracle.com> 2008
  */
+#include <linux/irq_work.h>
 #include <linux/rcupdate.h>
 #include <linux/rculist.h>
 #include <linux/kernel.h>
@@ -198,6 +199,9 @@ void generic_smp_call_function_single_interrupt(void)
 		csd->func(csd->info);
 		csd_unlock(csd);
 	}
+
+	/* Handle irq works queued remotely by irq_work_queue_on() */
+	irq_work_run();
 }
 
 /*
-- 
1.8.3.1



* [PATCH 4/5] nohz: Move full nohz kick to its own IPI
  2014-06-03 14:40 [GIT PULL] nohz: Move nohz kick out of scheduler IPI, v7 Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2014-06-03 14:40 ` [PATCH 3/5] irq_work: Implement remote queueing Frederic Weisbecker
@ 2014-06-03 14:40 ` Frederic Weisbecker
  2014-06-03 15:00   ` Peter Zijlstra
  2014-06-03 14:40 ` [PATCH 5/5] nohz: Use IPI implicit full barrier against rq->nr_running r/w Frederic Weisbecker
  2014-06-03 14:43 ` [GIT PULL] nohz: Move nohz kick out of scheduler IPI, v7 Frederic Weisbecker
  5 siblings, 1 reply; 22+ messages in thread
From: Frederic Weisbecker @ 2014-06-03 14:40 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Andrew Morton, Kevin Hilman,
	Paul E. McKenney, Thomas Gleixner, Viresh Kumar

Now that the irq work subsystem can queue remote callbacks, it's
a perfect fit to safely queue IPIs when interrupts are disabled
without worrying about concurrent callers.

Let's use it for the full dynticks kick to notify a CPU that it's
exiting single task mode.

This unbloats the scheduler IPI a bit, which the nohz code was abusing
for its cool "callable anywhere/anytime" properties.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/tick.h     |  9 ++++++++-
 kernel/sched/core.c      |  5 +----
 kernel/sched/sched.h     |  2 +-
 kernel/time/tick-sched.c | 10 ++++++----
 4 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index b84773c..8a4987f 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -181,7 +181,13 @@ static inline bool tick_nohz_full_cpu(int cpu)
 
 extern void tick_nohz_init(void);
 extern void __tick_nohz_full_check(void);
-extern void tick_nohz_full_kick(void);
+extern void tick_nohz_full_kick_cpu(int cpu);
+
+static inline void tick_nohz_full_kick(void)
+{
+	tick_nohz_full_kick_cpu(smp_processor_id());
+}
+
 extern void tick_nohz_full_kick_all(void);
 extern void __tick_nohz_task_switch(struct task_struct *tsk);
 #else
@@ -189,6 +195,7 @@ static inline void tick_nohz_init(void) { }
 static inline bool tick_nohz_full_enabled(void) { return false; }
 static inline bool tick_nohz_full_cpu(int cpu) { return false; }
 static inline void __tick_nohz_full_check(void) { }
+static inline void tick_nohz_full_kick_cpu(int cpu) { }
 static inline void tick_nohz_full_kick(void) { }
 static inline void tick_nohz_full_kick_all(void) { }
 static inline void __tick_nohz_task_switch(struct task_struct *tsk) { }
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d9d8ece..fb6dfad 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1500,9 +1500,7 @@ void scheduler_ipi(void)
 	 */
 	preempt_fold_need_resched();
 
-	if (llist_empty(&this_rq()->wake_list)
-			&& !tick_nohz_full_cpu(smp_processor_id())
-			&& !got_nohz_idle_kick())
+	if (llist_empty(&this_rq()->wake_list) && !got_nohz_idle_kick())
 		return;
 
 	/*
@@ -1519,7 +1517,6 @@ void scheduler_ipi(void)
 	 * somewhat pessimize the simple resched case.
 	 */
 	irq_enter();
-	tick_nohz_full_check();
 	sched_ttwu_pending();
 
 	/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 456e492..6089e00 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1225,7 +1225,7 @@ static inline void inc_nr_running(struct rq *rq)
 		if (tick_nohz_full_cpu(rq->cpu)) {
 			/* Order rq->nr_running write against the IPI */
 			smp_wmb();
-			smp_send_reschedule(rq->cpu);
+			tick_nohz_full_kick_cpu(rq->cpu);
 		}
        }
 #endif
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 6558b7a..3d63944 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -224,13 +224,15 @@ static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
 };
 
 /*
- * Kick the current CPU if it's full dynticks in order to force it to
+ * Kick the CPU if it's full dynticks in order to force it to
  * re-evaluate its dependency on the tick and restart it if necessary.
  */
-void tick_nohz_full_kick(void)
+void tick_nohz_full_kick_cpu(int cpu)
 {
-	if (tick_nohz_full_cpu(smp_processor_id()))
-		irq_work_queue(&__get_cpu_var(nohz_full_kick_work));
+	if (!tick_nohz_full_cpu(cpu))
+		return;
+
+	irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu);
 }
 
 static void nohz_full_kick_ipi(void *info)
-- 
1.8.3.1



* [PATCH 5/5] nohz: Use IPI implicit full barrier against rq->nr_running r/w
  2014-06-03 14:40 [GIT PULL] nohz: Move nohz kick out of scheduler IPI, v7 Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2014-06-03 14:40 ` [PATCH 4/5] nohz: Move full nohz kick to its own IPI Frederic Weisbecker
@ 2014-06-03 14:40 ` Frederic Weisbecker
  2014-06-03 15:02   ` Peter Zijlstra
  2014-06-03 14:43 ` [GIT PULL] nohz: Move nohz kick out of scheduler IPI, v7 Frederic Weisbecker
  5 siblings, 1 reply; 22+ messages in thread
From: Frederic Weisbecker @ 2014-06-03 14:40 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Andrew Morton, Kevin Hilman,
	Paul E. McKenney, Thomas Gleixner, Viresh Kumar

A full dynticks CPU is allowed to stop its tick when a single task runs.
Meanwhile when a new task gets enqueued, the CPU must be notified so that
it can restart its tick to maintain local fairness and other accounting
details.

This notification is performed by way of an IPI. Then when the target
receives the IPI, we expect it to see the new value of rq->nr_running.

Hence the following ordering scenario:

   CPU 0                   CPU 1

   write rq->nr_running    get IPI
   smp_wmb()               smp_rmb()
   send IPI                read rq->nr_running

But Paul McKenney says that nowadays IPIs imply a full barrier on
all architectures. So we can safely remove this pair and rely on the
implicit barriers that come along with IPI send/receive. Let's just
comment on this new assumption.
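
A sketch of the two sides under this new assumption (illustrative
helper names; the real code lives in inc_nr_running() and
sched_can_stop_tick()):

    /* CPU 0: waker enqueueing a second task on a full dynticks CPU */
    static void sketch_waker(struct rq *rq)
    {
            rq->nr_running++;
            /* the IPI send below implies a full barrier: no smp_wmb() needed */
            tick_nohz_full_kick_cpu(rq->cpu);
    }

    /* CPU 1: target side, runs from the irq_work once the IPI is received */
    static bool sketch_can_stop_tick(struct rq *rq)
    {
            /* the IPI reception implied a full barrier: no smp_rmb() needed */
            return rq->nr_running <= 1;
    }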

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/sched/core.c  |  9 +++++----
 kernel/sched/sched.h | 10 ++++++++--
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fb6dfad..a06cac1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -670,10 +670,11 @@ bool sched_can_stop_tick(void)
 
        rq = this_rq();
 
-       /* Make sure rq->nr_running update is visible after the IPI */
-       smp_rmb();
-
-       /* More than one running task need preemption */
+       /*
+        * More than one running task needs preemption.
+        * The rq->nr_running update is assumed to be visible
+        * here once the waker's IPI has been received.
+        */
        if (rq->nr_running > 1)
                return false;
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 6089e00..219bfbd 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1223,8 +1223,14 @@ static inline void inc_nr_running(struct rq *rq)
 #ifdef CONFIG_NO_HZ_FULL
 	if (rq->nr_running == 2) {
 		if (tick_nohz_full_cpu(rq->cpu)) {
-			/* Order rq->nr_running write against the IPI */
-			smp_wmb();
+			/*
+			 * Tick is needed if more than one task runs on a CPU.
+			 * Send the target an IPI to kick it out of nohz mode.
+			 *
+			 * We assume the IPI implies a full memory barrier, so
+			 * the new value of rq->nr_running is visible to the
+			 * target once it receives the IPI.
+			 */
 			tick_nohz_full_kick_cpu(rq->cpu);
 		}
        }
-- 
1.8.3.1



* Re: [GIT PULL] nohz: Move nohz kick out of scheduler IPI, v7
  2014-06-03 14:40 [GIT PULL] nohz: Move nohz kick out of scheduler IPI, v7 Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2014-06-03 14:40 ` [PATCH 5/5] nohz: Use IPI implicit full barrier against rq->nr_running r/w Frederic Weisbecker
@ 2014-06-03 14:43 ` Frederic Weisbecker
  5 siblings, 0 replies; 22+ messages in thread
From: Frederic Weisbecker @ 2014-06-03 14:43 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: LKML, Andrew Morton, Kevin Hilman, Paul E. McKenney,
	Thomas Gleixner, Viresh Kumar

On Tue, Jun 03, 2014 at 04:40:15PM +0200, Frederic Weisbecker wrote:
> Hi,
> 
> This version fixes the following concerns from Peterz:
> 
> * Warn _before_ work claim on irq_work_queue_on()
> * Warn on in_nmi() while remote queueing
> * Only disabled preemption (and not irqs) on local queueing
> 
> Thanks.
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> 	timers/nohz-irq-work-v5

Oops, wait a minute, the tree is broken, I'll do another pull request.

> 
> Thanks,
> 	Frederic
> ---
> 
> Frederic Weisbecker (5):
>       irq_work: Split raised and lazy lists
>       irq_work: Shorten a bit irq_work_needs_cpu()
>       irq_work: Implement remote queueing
>       nohz: Move full nohz kick to its own IPI
>       nohz: Use IPI implicit full barrier against rq->nr_running r/w
> 
> 
>  include/linux/irq_work.h |  2 ++
>  include/linux/tick.h     |  9 +++++-
>  kernel/irq_work.c        | 72 ++++++++++++++++++++++++++++--------------------
>  kernel/sched/core.c      | 14 ++++------
>  kernel/sched/sched.h     | 12 ++++++--
>  kernel/smp.c             |  4 +++
>  kernel/time/tick-sched.c | 10 ++++---
>  7 files changed, 77 insertions(+), 46 deletions(-)


* Re: [PATCH 1/5] irq_work: Split raised and lazy lists
  2014-06-03 14:40 ` [PATCH 1/5] irq_work: Split raised and lazy lists Frederic Weisbecker
@ 2014-06-03 14:54   ` Peter Zijlstra
  2014-06-03 14:56     ` Frederic Weisbecker
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2014-06-03 14:54 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Andrew Morton, Kevin Hilman, Paul E. McKenney,
	Thomas Gleixner, Viresh Kumar


On Tue, Jun 03, 2014 at 04:40:16PM +0200, Frederic Weisbecker wrote:
> @@ -90,10 +89,10 @@ EXPORT_SYMBOL_GPL(irq_work_queue);
>  
>  bool irq_work_needs_cpu(void)
>  {
> -	struct llist_head *this_list;
> +	struct llist_head *list;
>  
> -	this_list = &__get_cpu_var(irq_work_list);
> -	if (llist_empty(this_list))
> +	list = &__get_cpu_var(lazy_list);
> +	if (llist_empty(list))
>  		return false;
>  
>  	/* All work should have been flushed before going offline */

Does this mean needs_cpu() only checks the lazy list? What about archs
without the arch_irq_work_raise() function? They run the other list from
the tick too.



* Re: [PATCH 1/5] irq_work: Split raised and lazy lists
  2014-06-03 14:54   ` Peter Zijlstra
@ 2014-06-03 14:56     ` Frederic Weisbecker
  0 siblings, 0 replies; 22+ messages in thread
From: Frederic Weisbecker @ 2014-06-03 14:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, LKML, Andrew Morton, Kevin Hilman, Paul E. McKenney,
	Thomas Gleixner, Viresh Kumar

On Tue, Jun 03, 2014 at 04:54:42PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 03, 2014 at 04:40:16PM +0200, Frederic Weisbecker wrote:
> > @@ -90,10 +89,10 @@ EXPORT_SYMBOL_GPL(irq_work_queue);
> >  
> >  bool irq_work_needs_cpu(void)
> >  {
> > -	struct llist_head *this_list;
> > +	struct llist_head *list;
> >  
> > -	this_list = &__get_cpu_var(irq_work_list);
> > -	if (llist_empty(this_list))
> > +	list = &__get_cpu_var(lazy_list);
> > +	if (llist_empty(list))
> >  		return false;
> >  
> >  	/* All work should have been flushed before going offline */
> 
> Does this mean needs_cpu() only checks the lazy list? What about archs
> without the arch_irq_work_raise() function? They run the other list from
> the tick too.

Right, I'll fix that too.
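
For reference, one way to do that (an illustrative sketch, not
necessarily the exact follow-up) is to check both lists, since archs
without arch_irq_work_raise() run the raised list from the tick too:

    bool irq_work_needs_cpu(void)
    {
            struct llist_head *raised, *lazy;

            raised = &__get_cpu_var(raised_list);
            lazy = &__get_cpu_var(lazy_list);
            if (llist_empty(raised) && llist_empty(lazy))
                    return false;

            /* All work should have been flushed before going offline */
            WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));

            return true;
    }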


* Re: [PATCH 3/5] irq_work: Implement remote queueing
  2014-06-03 14:40 ` [PATCH 3/5] irq_work: Implement remote queueing Frederic Weisbecker
@ 2014-06-03 15:00   ` Peter Zijlstra
  2014-06-03 15:08     ` Frederic Weisbecker
  2014-06-03 16:29   ` Eric Dumazet
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2014-06-03 15:00 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Andrew Morton, Kevin Hilman, Paul E. McKenney,
	Thomas Gleixner, Viresh Kumar


On Tue, Jun 03, 2014 at 04:40:18PM +0200, Frederic Weisbecker wrote:
> irq work currently only supports local callbacks. However its code
> is mostly ready to run remote callbacks and we have some potential user.
> 
> The full nohz subsystem currently open codes its own remote irq work
> on top of the scheduler ipi when it wants a CPU to reevaluate its next
> tick. However this ad hoc solution bloats the scheduler IPI.
> 
> Lets just extend the irq work subsystem to support remote queuing on top
> of the generic SMP IPI to handle this kind of user. This shouldn't add
> noticeable overhead.
> 
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Kevin Hilman <khilman@linaro.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Acked-by: Peter Zijlstra <peterz@infradead.org>

> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> ---
>  include/linux/irq_work.h |  2 ++
>  kernel/irq_work.c        | 22 +++++++++++++++++++++-
>  kernel/smp.c             |  4 ++++
>  3 files changed, 27 insertions(+), 1 deletion(-)

> @@ -198,6 +199,9 @@ void generic_smp_call_function_single_interrupt(void)
>  		csd->func(csd->info);
>  		csd_unlock(csd);
>  	}
> +
> +	/* Handle irq works queued remotely by irq_work_queue_on() */
> +	irq_work_run();
>  }

One could possibly extend that comment by stating that we explicitly run
the irq_work bits after the function bits, because the function bits are
typically synchronous and have people waiting on them, while not so for
the irq_works.




* Re: [PATCH 4/5] nohz: Move full nohz kick to its own IPI
  2014-06-03 14:40 ` [PATCH 4/5] nohz: Move full nohz kick to its own IPI Frederic Weisbecker
@ 2014-06-03 15:00   ` Peter Zijlstra
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2014-06-03 15:00 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Andrew Morton, Kevin Hilman, Paul E. McKenney,
	Thomas Gleixner, Viresh Kumar


On Tue, Jun 03, 2014 at 04:40:19PM +0200, Frederic Weisbecker wrote:
> Now that the irq work subsystem can queue remote callbacks, it's
> a perfect fit to safely queue IPIs when interrupts are disabled
> without worrying about concurrent callers.
> 
> Lets use it for the full dynticks kick to notify a CPU that it's
> exiting single task mode.
> 
> This unbloats a bit the scheduler IPI that the nohz code was abusing
> for its cool "callable anywhere/anytime" properties.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Kevin Hilman <khilman@linaro.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> ---
>  include/linux/tick.h     |  9 ++++++++-
>  kernel/sched/core.c      |  5 +----
>  kernel/sched/sched.h     |  2 +-
>  kernel/time/tick-sched.c | 10 ++++++----
>  4 files changed, 16 insertions(+), 10 deletions(-)
> 



* Re: [PATCH 5/5] nohz: Use IPI implicit full barrier against rq->nr_running r/w
  2014-06-03 14:40 ` [PATCH 5/5] nohz: Use IPI implicit full barrier against rq->nr_running r/w Frederic Weisbecker
@ 2014-06-03 15:02   ` Peter Zijlstra
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2014-06-03 15:02 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Andrew Morton, Kevin Hilman, Paul E. McKenney,
	Thomas Gleixner, Viresh Kumar


On Tue, Jun 03, 2014 at 04:40:20PM +0200, Frederic Weisbecker wrote:
> A full dynticks CPU is allowed to stop its tick when a single task runs.
> Meanwhile when a new task gets enqueued, the CPU must be notified so that
> it can restart its tick to maintain local fairness and other accounting
> details.
> 
> This notification is performed by way of an IPI. Then when the target
> receives the IPI, we expect it to see the new value of rq->nr_running.
> 
> Hence the following ordering scenario:
> 
>    CPU 0                   CPU 1
> 
>    write rq->running       get IPI
>    smp_wmb()               smp_rmb()
>    send IPI                read rq->nr_running
> 
> But Paul Mckenney says that nowadays IPIs imply a full barrier on
> all architectures. So we can safely remove this pair and rely on the
> implicit barriers that come along IPI send/receive. Lets
> just comment on this new assumption.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Kevin Hilman <khilman@linaro.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>



* Re: [PATCH 3/5] irq_work: Implement remote queueing
  2014-06-03 15:00   ` Peter Zijlstra
@ 2014-06-03 15:08     ` Frederic Weisbecker
  0 siblings, 0 replies; 22+ messages in thread
From: Frederic Weisbecker @ 2014-06-03 15:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, LKML, Andrew Morton, Kevin Hilman, Paul E. McKenney,
	Thomas Gleixner, Viresh Kumar

On Tue, Jun 03, 2014 at 05:00:08PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 03, 2014 at 04:40:18PM +0200, Frederic Weisbecker wrote:
> > irq work currently only supports local callbacks. However its code
> > is mostly ready to run remote callbacks and we have some potential user.
> > 
> > The full nohz subsystem currently open codes its own remote irq work
> > on top of the scheduler ipi when it wants a CPU to reevaluate its next
> > tick. However this ad hoc solution bloats the scheduler IPI.
> > 
> > Lets just extend the irq work subsystem to support remote queuing on top
> > of the generic SMP IPI to handle this kind of user. This shouldn't add
> > noticeable overhead.
> > 
> > Suggested-by: Peter Zijlstra <peterz@infradead.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Kevin Hilman <khilman@linaro.org>
> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> Acked-by: Peter Zijlstra <peterz@infradead.org>
> 
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Viresh Kumar <viresh.kumar@linaro.org>
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > ---
> >  include/linux/irq_work.h |  2 ++
> >  kernel/irq_work.c        | 22 +++++++++++++++++++++-
> >  kernel/smp.c             |  4 ++++
> >  3 files changed, 27 insertions(+), 1 deletion(-)
> 
> > @@ -198,6 +199,9 @@ void generic_smp_call_function_single_interrupt(void)
> >  		csd->func(csd->info);
> >  		csd_unlock(csd);
> >  	}
> > +
> > +	/* Handle irq works queued remotely by irq_work_queue_on() */
> > +	irq_work_run();
> >  }
> 
> One could possibly extend that comment by stating that we explicitly run
> the irq_work bits after the function bits, because the function bits are
> typically synchronous and have people waiting on them, while not so for
> the irq_works.
> 

Good point! I'll expand the comment.
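
Presumably something along these lines (wording illustrative):

    /*
     * Handle irq works queued remotely by irq_work_queue_on().
     * Run them after the function call bits on purpose: the csd
     * callbacks are typically synchronous and have remote callers
     * waiting on them, whereas the irq works have no such waiters.
     */
    irq_work_run();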


* Re: [PATCH 3/5] irq_work: Implement remote queueing
  2014-06-03 14:40 ` [PATCH 3/5] irq_work: Implement remote queueing Frederic Weisbecker
  2014-06-03 15:00   ` Peter Zijlstra
@ 2014-06-03 16:29   ` Eric Dumazet
  2014-06-03 16:37     ` Peter Zijlstra
  1 sibling, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2014-06-03 16:29 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, Ingo Molnar, LKML, Andrew Morton, Kevin Hilman,
	Paul E. McKenney, Thomas Gleixner, Viresh Kumar

On Tue, 2014-06-03 at 16:40 +0200, Frederic Weisbecker wrote:
> irq work currently only supports local callbacks. However its code
> is mostly ready to run remote callbacks and we have some potential user.
> 
> The full nohz subsystem currently open codes its own remote irq work
> on top of the scheduler ipi when it wants a CPU to reevaluate its next
> tick. However this ad hoc solution bloats the scheduler IPI.
> 
> Lets just extend the irq work subsystem to support remote queuing on top
> of the generic SMP IPI to handle this kind of user. This shouldn't add
> noticeable overhead.



>   */
> +bool irq_work_queue_on(struct irq_work *work, int cpu)
> +{
> +	/* All work should have been flushed before going offline */
> +	WARN_ON_ONCE(cpu_is_offline(cpu));
> +
> +	/* Arch remote IPI send/receive backend aren't NMI safe */
> +	WARN_ON_ONCE(in_nmi());
> +
> +	/* Only queue if not already pending */
> +	if (!irq_work_claim(work))
> +		return false;
> +
> +	if (llist_add(&work->llnode, &per_cpu(raised_list, cpu)))
> +		native_send_call_func_single_ipi(cpu);
> +
> +	return true;
> +}
> +EXPORT_SYMBOL_GPL(irq_work_queue_on);
> +

I am curious, this should only compile on x86, right ?





* Re: [PATCH 3/5] irq_work: Implement remote queueing
  2014-06-03 16:29   ` Eric Dumazet
@ 2014-06-03 16:37     ` Peter Zijlstra
  2014-06-03 20:02       ` Eric Dumazet
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2014-06-03 16:37 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Frederic Weisbecker, Ingo Molnar, LKML, Andrew Morton,
	Kevin Hilman, Paul E. McKenney, Thomas Gleixner, Viresh Kumar


On Tue, Jun 03, 2014 at 09:29:07AM -0700, Eric Dumazet wrote:
> > +bool irq_work_queue_on(struct irq_work *work, int cpu)
> > +{
> > +	/* All work should have been flushed before going offline */
> > +	WARN_ON_ONCE(cpu_is_offline(cpu));
> > +
> > +	/* Arch remote IPI send/receive backend aren't NMI safe */
> > +	WARN_ON_ONCE(in_nmi());
> > +
> > +	/* Only queue if not already pending */
> > +	if (!irq_work_claim(work))
> > +		return false;
> > +
> > +	if (llist_add(&work->llnode, &per_cpu(raised_list, cpu)))
> > +		native_send_call_func_single_ipi(cpu);
> > +
> > +	return true;
> > +}
> > +EXPORT_SYMBOL_GPL(irq_work_queue_on);
> > +
> 
> I am curious, this should only compile on x86, right ?

Oh, you tease, you forgot to say why you think this.

Are you referring to the in_nmi() usage? that's from
include/linux/hardirq.h, hardly x86 specific.



* Re: [PATCH 3/5] irq_work: Implement remote queueing
  2014-06-03 16:37     ` Peter Zijlstra
@ 2014-06-03 20:02       ` Eric Dumazet
  2014-06-03 20:29         ` Frederic Weisbecker
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2014-06-03 20:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, Ingo Molnar, LKML, Andrew Morton,
	Kevin Hilman, Paul E. McKenney, Thomas Gleixner, Viresh Kumar

On Tue, 2014-06-03 at 18:37 +0200, Peter Zijlstra wrote:
> On Tue, Jun 03, 2014 at 09:29:07AM -0700, Eric Dumazet wrote:
> > > +bool irq_work_queue_on(struct irq_work *work, int cpu)
> > > +{
> > > +	/* All work should have been flushed before going offline */
> > > +	WARN_ON_ONCE(cpu_is_offline(cpu));
> > > +
> > > +	/* Arch remote IPI send/receive backend aren't NMI safe */
> > > +	WARN_ON_ONCE(in_nmi());
> > > +
> > > +	/* Only queue if not already pending */
> > > +	if (!irq_work_claim(work))
> > > +		return false;
> > > +
> > > +	if (llist_add(&work->llnode, &per_cpu(raised_list, cpu)))
> > > +		native_send_call_func_single_ipi(cpu);
> > > +
> > > +	return true;
> > > +}
> > > +EXPORT_SYMBOL_GPL(irq_work_queue_on);
> > > +
> > 
> > I am curious, this should only compile on x86, right ?
> 
> Oh, you tease, you forgot to say why you think this.
> 
> Are you referring to the in_nmi() usage? that's from
> include/linux/hardirq.h, hardly x86 specific.

No, my eyes were attracted by native_send_call_func_single_ipi()






* Re: [PATCH 3/5] irq_work: Implement remote queueing
  2014-06-03 20:02       ` Eric Dumazet
@ 2014-06-03 20:29         ` Frederic Weisbecker
  0 siblings, 0 replies; 22+ messages in thread
From: Frederic Weisbecker @ 2014-06-03 20:29 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Peter Zijlstra, Ingo Molnar, LKML, Andrew Morton, Kevin Hilman,
	Paul E. McKenney, Thomas Gleixner, Viresh Kumar

On Tue, Jun 03, 2014 at 01:02:39PM -0700, Eric Dumazet wrote:
> On Tue, 2014-06-03 at 18:37 +0200, Peter Zijlstra wrote:
> > On Tue, Jun 03, 2014 at 09:29:07AM -0700, Eric Dumazet wrote:
> > > > +bool irq_work_queue_on(struct irq_work *work, int cpu)
> > > > +{
> > > > +	/* All work should have been flushed before going offline */
> > > > +	WARN_ON_ONCE(cpu_is_offline(cpu));
> > > > +
> > > > +	/* Arch remote IPI send/receive backend aren't NMI safe */
> > > > +	WARN_ON_ONCE(in_nmi());
> > > > +
> > > > +	/* Only queue if not already pending */
> > > > +	if (!irq_work_claim(work))
> > > > +		return false;
> > > > +
> > > > +	if (llist_add(&work->llnode, &per_cpu(raised_list, cpu)))
> > > > +		native_send_call_func_single_ipi(cpu);
> > > > +
> > > > +	return true;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(irq_work_queue_on);
> > > > +
> > > 
> > > I am curious, this should only compile on x86, right ?
> > 
> > Oh, you tease, you forgot to say why you think this.
> > 
> > Are you referring to the in_nmi() usage? that's from
> > include/linux/hardirq.h, hardly x86 specific.
> 
> No, my eyes were attracted by native_send_call_func_single_ipi()

Right, should be arch_send_call_function_single_ipi(). I'm like unable
to get a correct patchset before at least 9 takes...
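
i.e. the remote queueing path would presumably become:

    if (llist_add(&work->llnode, &per_cpu(raised_list, cpu)))
            arch_send_call_function_single_ipi(cpu);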


* Re: [PATCH 3/5] irq_work: Implement remote queueing
  2014-05-26 19:19       ` Peter Zijlstra
@ 2014-05-26 19:26         ` Frederic Weisbecker
  0 siblings, 0 replies; 22+ messages in thread
From: Frederic Weisbecker @ 2014-05-26 19:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Andrew Morton, Ingo Molnar, Kevin Hilman, Paul E. McKenney,
	Thomas Gleixner, Viresh Kumar

On Mon, May 26, 2014 at 09:19:18PM +0200, Peter Zijlstra wrote:
> On Mon, May 26, 2014 at 06:50:41PM +0200, Frederic Weisbecker wrote:
> > On Mon, May 26, 2014 at 06:02:24PM +0200, Peter Zijlstra wrote:
> > > On Sun, May 25, 2014 at 04:29:49PM +0200, Frederic Weisbecker wrote:
> > > > +bool irq_work_queue_on(struct irq_work *work, int cpu)
> > > > +{
> > > > +	/* Only queue if not already pending */
> > > > +	if (!irq_work_claim(work))
> > > > +		return false;
> > > > +
> > > > +	/* All work should have been flushed before going offline */
> > > > +	WARN_ON_ONCE(cpu_is_offline(cpu));
> > > 	WARN_ON_ONCE(in_nmi());
> > 
> > But it's actually NMI safe, isn't it?
> 
> Deja-vu:
> 
>   lkml.kernel.org/r/20140514115406.GA11096@twins.programming.kicks-ass.net

Doh!


* Re: [PATCH 3/5] irq_work: Implement remote queueing
  2014-05-26 16:50     ` Frederic Weisbecker
@ 2014-05-26 19:19       ` Peter Zijlstra
  2014-05-26 19:26         ` Frederic Weisbecker
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2014-05-26 19:19 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Andrew Morton, Ingo Molnar, Kevin Hilman, Paul E. McKenney,
	Thomas Gleixner, Viresh Kumar

On Mon, May 26, 2014 at 06:50:41PM +0200, Frederic Weisbecker wrote:
> On Mon, May 26, 2014 at 06:02:24PM +0200, Peter Zijlstra wrote:
> > On Sun, May 25, 2014 at 04:29:49PM +0200, Frederic Weisbecker wrote:
> > > +bool irq_work_queue_on(struct irq_work *work, int cpu)
> > > +{
> > > +	/* Only queue if not already pending */
> > > +	if (!irq_work_claim(work))
> > > +		return false;
> > > +
> > > +	/* All work should have been flushed before going offline */
> > > +	WARN_ON_ONCE(cpu_is_offline(cpu));
> > 	WARN_ON_ONCE(in_nmi());
> 
> But it's actually NMI safe, isn't it?

Deja-vu:

  lkml.kernel.org/r/20140514115406.GA11096@twins.programming.kicks-ass.net


* Re: [PATCH 3/5] irq_work: Implement remote queueing
  2014-05-26 16:02   ` Peter Zijlstra
@ 2014-05-26 16:50     ` Frederic Weisbecker
  2014-05-26 19:19       ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Frederic Weisbecker @ 2014-05-26 16:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Andrew Morton, Ingo Molnar, Kevin Hilman, Paul E. McKenney,
	Thomas Gleixner, Viresh Kumar

On Mon, May 26, 2014 at 06:02:24PM +0200, Peter Zijlstra wrote:
> On Sun, May 25, 2014 at 04:29:49PM +0200, Frederic Weisbecker wrote:
> > +bool irq_work_queue_on(struct irq_work *work, int cpu)
> > +{
> > +	/* Only queue if not already pending */
> > +	if (!irq_work_claim(work))
> > +		return false;
> > +
> > +	/* All work should have been flushed before going offline */
> > +	WARN_ON_ONCE(cpu_is_offline(cpu));
> 	WARN_ON_ONCE(in_nmi());

But it's actually NMI safe, isn't it?

> 
> 
> Also, I would do both these checks before the claim thing.

Right, I'll move the check.

> 
> > +
> > +	if (llist_add(&work->llnode, &per_cpu(raised_list, cpu)))
> > +		native_send_call_func_single_ipi(cpu);
> > +
> > +	return true;
> > +}
> > +EXPORT_SYMBOL_GPL(irq_work_queue_on);
> 
> 




* Re: [PATCH 3/5] irq_work: Implement remote queueing
  2014-05-25 14:29 ` [PATCH 3/5] irq_work: Implement remote queueing Frederic Weisbecker
@ 2014-05-26 16:02   ` Peter Zijlstra
  2014-05-26 16:50     ` Frederic Weisbecker
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2014-05-26 16:02 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Andrew Morton, Ingo Molnar, Kevin Hilman, Paul E. McKenney,
	Thomas Gleixner, Viresh Kumar


On Sun, May 25, 2014 at 04:29:49PM +0200, Frederic Weisbecker wrote:
> +bool irq_work_queue_on(struct irq_work *work, int cpu)
> +{
> +	/* Only queue if not already pending */
> +	if (!irq_work_claim(work))
> +		return false;
> +
> +	/* All work should have been flushed before going offline */
> +	WARN_ON_ONCE(cpu_is_offline(cpu));
	WARN_ON_ONCE(in_nmi());


Also, I would do both these checks before the claim thing.

> +
> +	if (llist_add(&work->llnode, &per_cpu(raised_list, cpu)))
> +		native_send_call_func_single_ipi(cpu);
> +
> +	return true;
> +}
> +EXPORT_SYMBOL_GPL(irq_work_queue_on);





* [PATCH 3/5] irq_work: Implement remote queueing
  2014-05-25 14:29 [PATCH 0/5] nohz: Move nohz kick out of scheduler IPI, v6 Frederic Weisbecker
@ 2014-05-25 14:29 ` Frederic Weisbecker
  2014-05-26 16:02   ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Frederic Weisbecker @ 2014-05-25 14:29 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Andrew Morton, Ingo Molnar, Kevin Hilman,
	Paul E. McKenney, Peter Zijlstra, Thomas Gleixner, Viresh Kumar

irq work currently only supports local callbacks. However its code
is mostly ready to run remote callbacks and we have some potential users.

The full nohz subsystem currently open codes its own remote irq work
on top of the scheduler IPI when it wants a CPU to reevaluate its next
tick. However this ad hoc solution bloats the scheduler IPI.

Let's just extend the irq work subsystem to support remote queueing on
top of the generic SMP IPI to handle this kind of user. This shouldn't
add noticeable overhead.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/irq_work.h |  2 ++
 kernel/irq_work.c        | 19 ++++++++++++++++++-
 kernel/smp.c             |  4 ++++
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index 19ae05d..ae44aa2 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -33,6 +33,8 @@ void init_irq_work(struct irq_work *work, void (*func)(struct irq_work *))
 #define DEFINE_IRQ_WORK(name, _f) struct irq_work name = { .func = (_f), }
 
 bool irq_work_queue(struct irq_work *work);
+bool irq_work_queue_on(struct irq_work *work, int cpu);
+
 void irq_work_run(void);
 void irq_work_sync(struct irq_work *work);
 
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index dee5a1f..e58e2c5 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -56,11 +56,28 @@ void __weak arch_irq_work_raise(void)
 }
 
 /*
- * Enqueue the irq_work @entry unless it's already pending
+ * Enqueue the irq_work @work on @cpu unless it's already pending
  * somewhere.
  *
  * Can be re-enqueued while the callback is still in progress.
  */
+bool irq_work_queue_on(struct irq_work *work, int cpu)
+{
+	/* Only queue if not already pending */
+	if (!irq_work_claim(work))
+		return false;
+
+	/* All work should have been flushed before going offline */
+	WARN_ON_ONCE(cpu_is_offline(cpu));
+
+	if (llist_add(&work->llnode, &per_cpu(raised_list, cpu)))
+		native_send_call_func_single_ipi(cpu);
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(irq_work_queue_on);
+
+/* Enqueue the irq work @work on the current CPU */
 bool irq_work_queue(struct irq_work *work)
 {
 	unsigned long flags;
diff --git a/kernel/smp.c b/kernel/smp.c
index 06d574e..ba0d8fd 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -3,6 +3,7 @@
  *
  * (C) Jens Axboe <jens.axboe@oracle.com> 2008
  */
+#include <linux/irq_work.h>
 #include <linux/rcupdate.h>
 #include <linux/rculist.h>
 #include <linux/kernel.h>
@@ -198,6 +199,9 @@ void generic_smp_call_function_single_interrupt(void)
 		csd->func(csd->info);
 		csd_unlock(csd);
 	}
+
+	/* Handle irq works queued remotely by irq_work_queue_on() */
+	irq_work_run();
 }
 
 /*
-- 
1.8.3.1



Thread overview: 22+ messages
2014-06-03 14:40 [GIT PULL] nohz: Move nohz kick out of scheduler IPI, v7 Frederic Weisbecker
2014-06-03 14:40 ` [PATCH 1/5] irq_work: Split raised and lazy lists Frederic Weisbecker
2014-06-03 14:54   ` Peter Zijlstra
2014-06-03 14:56     ` Frederic Weisbecker
2014-06-03 14:40 ` [PATCH 2/5] irq_work: Shorten a bit irq_work_needs_cpu() Frederic Weisbecker
2014-06-03 14:40 ` [PATCH 3/5] irq_work: Implement remote queueing Frederic Weisbecker
2014-06-03 15:00   ` Peter Zijlstra
2014-06-03 15:08     ` Frederic Weisbecker
2014-06-03 16:29   ` Eric Dumazet
2014-06-03 16:37     ` Peter Zijlstra
2014-06-03 20:02       ` Eric Dumazet
2014-06-03 20:29         ` Frederic Weisbecker
2014-06-03 14:40 ` [PATCH 4/5] nohz: Move full nohz kick to its own IPI Frederic Weisbecker
2014-06-03 15:00   ` Peter Zijlstra
2014-06-03 14:40 ` [PATCH 5/5] nohz: Use IPI implicit full barrier against rq->nr_running r/w Frederic Weisbecker
2014-06-03 15:02   ` Peter Zijlstra
2014-06-03 14:43 ` [GIT PULL] nohz: Move nohz kick out of scheduler IPI, v7 Frederic Weisbecker
  -- strict thread matches above, loose matches on Subject: below --
2014-05-25 14:29 [PATCH 0/5] nohz: Move nohz kick out of scheduler IPI, v6 Frederic Weisbecker
2014-05-25 14:29 ` [PATCH 3/5] irq_work: Implement remote queueing Frederic Weisbecker
2014-05-26 16:02   ` Peter Zijlstra
2014-05-26 16:50     ` Frederic Weisbecker
2014-05-26 19:19       ` Peter Zijlstra
2014-05-26 19:26         ` Frederic Weisbecker
