* [GIT PULL v2] nohz: Move nohz kick out of scheduler IPI
@ 2014-04-03 16:17 Frederic Weisbecker
  2014-04-03 16:17 ` [PATCH 1/2] smp: Non busy-waiting IPI queue Frederic Weisbecker
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Frederic Weisbecker @ 2014-04-03 16:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Paul E. McKenney, Andrew Morton,
	Jens Axboe, Kevin Hilman, Peter Zijlstra

Ingo, Thomas,

Please pull the timers/nohz-ipi-for-tip-v3 branch that can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	timers/nohz-ipi-for-tip-v3

It is based on 7a48837732f87a574ee3e1855927dc250117f565 
("Merge branch 'for-3.15/core' of git://git.kernel.dk/linux-block")
due to dependencies on the block tree containing IPI core changes.

This v2 addresses Paul's review and adds his Reviewed-by tags.

I'm trying to get these 2 patches in before the merge window ends;
otherwise we'll have to do a tricky branch split on the next cycle.

---
When a full dynticks CPU runs in single-task mode and a new task gets
enqueued, we notify it through an IPI so that it restarts its tick.

The IPI used here is the scheduler IPI. There are a few reasons for that:
it can be called when interrupts are disabled, it can be called
concurrently... These convenient properties aren't yet offered together
by the IPI subsystem.

Meanwhile, bloating the scheduler IPI this way with scheduler-unrelated
code is an abuse of this fast path. We certainly don't want to end up
with a big kernel IPI.

So this patchset adds a small helper to the IPI subsystem that allows
queueing an IPI from interrupt-disabled code while also handling
concurrent callers. Eventually the nohz kick gets converted to this
new facility.

Partly inspired by a suggestion from Peter Zijlstra.

* Patch 1/2 brings the IPI infrastructure to support this
* Patch 2/2 does the nohz IPI conversion


Thanks,
	Frederic
---

Frederic Weisbecker (2):
      smp: Non busy-waiting IPI queue
      nohz: Move full nohz kick to its own IPI


 include/linux/smp.h      | 11 +++++++++++
 include/linux/tick.h     |  2 ++
 kernel/sched/core.c      |  5 +----
 kernel/sched/sched.h     |  2 +-
 kernel/smp.c             | 42 ++++++++++++++++++++++++++++++++++++++++++
 kernel/time/tick-sched.c | 21 +++++++++++++++++++++
 6 files changed, 78 insertions(+), 5 deletions(-)


* [PATCH 1/2] smp: Non busy-waiting IPI queue
  2014-04-03 16:17 [GIT PULL v2] nohz: Move nohz kick out of scheduler IPI Frederic Weisbecker
@ 2014-04-03 16:17 ` Frederic Weisbecker
  2014-04-03 16:17 ` [PATCH 2/2] nohz: Move full nohz kick to its own IPI Frederic Weisbecker
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Frederic Weisbecker @ 2014-04-03 16:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Andrew Morton, Jens Axboe,
	Kevin Hilman, Paul E. McKenney, Peter Zijlstra

Some IPI users, such as the nohz subsystem, need to be able to send
an async IPI (async = not waiting for any other IPI's completion) from
contexts with interrupts disabled. And we want the IPI subsystem to
handle concurrent calls by itself.

Currently the nohz machinery uses the scheduler IPI for this purpose
because it can be triggered from any context and doesn't need any
serialization from the caller. But this is an abuse of a scheduler
fast path. We are bloating it with a job that should use its own IPI.

The current set of IPI functions can't be called when interrupts are
disabled; otherwise we risk a deadlock when two CPUs wait for each
other's IPI completion.

OTOH smp_call_function_single_async() can be called when interrupts
are disabled. But then it's up to the caller to serialize the given
IPI: it can't be called concurrently without special care.

So we need a version of the async IPI that takes care of concurrent
calls.

The proposed solution is to synchronize the IPI with a specific flag
that prevents the IPI from being sent if it is already pending but not
yet executed. Ordering is maintained such that, if the IPI is not sent
because it's already pending, we guarantee it will see the new state of
the data we expect it to when it executes.

This model is close to the irq_work design. It's also partly inspired by
suggestions from Peter Zijlstra.
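For illustration, a minimal user of this facility could look like the
sketch below (hypothetical names, not part of the patch; the real
conversion is done in the nohz patch):

	static void my_kick(struct queue_single_data *qsd)
	{
		/* Runs from the IPI on the target CPU; must be fast. */
	}

	/* The function must stay constant for a given qsd */
	static struct queue_single_data my_qsd = {
		.func = my_kick,
	};

	static void kick_cpu(int cpu)
	{
		/*
		 * Safe with interrupts disabled and against concurrent
		 * callers: the cmpxchg() on qsd->pending serializes
		 * callers and orders prior stores before the handler runs.
		 */
		smp_queue_function_single(cpu, &my_qsd);
	}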

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/smp.h | 11 +++++++++++
 kernel/smp.c        | 42 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/include/linux/smp.h b/include/linux/smp.h
index 633f5ed..0de1eff 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -29,6 +29,17 @@ extern unsigned int total_cpus;
 int smp_call_function_single(int cpuid, smp_call_func_t func, void *info,
 			     int wait);
 
+struct queue_single_data;
+typedef void (*smp_queue_func_t)(struct queue_single_data *qsd);
+
+struct queue_single_data {
+	struct call_single_data data;
+	smp_queue_func_t func;
+	int pending;
+};
+
+int smp_queue_function_single(int cpu, struct queue_single_data *qsd);
+
 /*
  * Call a function on all processors
  */
diff --git a/kernel/smp.c b/kernel/smp.c
index 06d574e..7589be5 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -265,6 +265,48 @@ int smp_call_function_single_async(int cpu, struct call_single_data *csd)
 }
 EXPORT_SYMBOL_GPL(smp_call_function_single_async);
 
+void generic_smp_queue_function_single_interrupt(void *info)
+{
+	struct queue_single_data *qsd = info;
+
+	WARN_ON_ONCE(xchg(&qsd->pending, 0) != 1);
+	qsd->func(qsd);
+}
+
+/**
+ * smp_queue_function_single - Queue an asynchronous function to run on a
+ * 				specific CPU unless it's already pending.
+ * @cpu: The CPU to run on.
+ * @qsd: Pre-allocated and setup data structure
+ *
+ * Like smp_call_function_single_async() but the call to the function is
+ * serialized and won't be queued if it is already pending. In the latter case,
+ * ordering is still guaranteed such that the pending call will see the new
+ * data we expect it to.
+ *
+ * This must not be called on offline CPUs.
+ *
+ * Returns 0 when function is successfully queued or already pending, else a
+ * negative status code.
+ */
+int smp_queue_function_single(int cpu, struct queue_single_data *qsd)
+{
+	int err;
+
+	if (cmpxchg(&qsd->pending, 0, 1))
+		return 0;
+
+	preempt_disable();
+	err = generic_exec_single(cpu, &qsd->data, generic_smp_queue_function_single_interrupt, qsd, 0);
+	preempt_enable();
+
+	/* Reset in case of error. This must not be called on offline CPUs */
+	if (err)
+		qsd->pending = 0;
+
+	return err;
+}
+
 /*
  * smp_call_function_any - Run a function on any of the given cpus
  * @mask: The mask of cpus it can run on.
-- 
1.8.3.1



* [PATCH 2/2] nohz: Move full nohz kick to its own IPI
  2014-04-03 16:17 [GIT PULL v2] nohz: Move nohz kick out of scheduler IPI Frederic Weisbecker
  2014-04-03 16:17 ` [PATCH 1/2] smp: Non busy-waiting IPI queue Frederic Weisbecker
@ 2014-04-03 16:17 ` Frederic Weisbecker
  2014-04-09 15:28 ` [GIT PULL v2] nohz: Move nohz kick out of scheduler IPI Frederic Weisbecker
  2014-04-16  7:37 ` Ingo Molnar
  3 siblings, 0 replies; 13+ messages in thread
From: Frederic Weisbecker @ 2014-04-03 16:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Andrew Morton, Jens Axboe,
	Kevin Hilman, Paul E. McKenney, Peter Zijlstra

Now that we have smp_queue_function_single(), which can be used to
safely queue IPIs when interrupts are disabled and without worrying
about concurrent callers, let's use it for the full dynticks kick to
notify a CPU that it's exiting single-task mode.

This removes a bit of bloat from the scheduler IPI, which the nohz
code was abusing for its cool "callable anywhere/anytime" properties.

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/tick.h     |  2 ++
 kernel/sched/core.c      |  5 +----
 kernel/sched/sched.h     |  2 +-
 kernel/time/tick-sched.c | 21 +++++++++++++++++++++
 4 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index b84773c..9d3fcc2 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -182,6 +182,7 @@ static inline bool tick_nohz_full_cpu(int cpu)
 extern void tick_nohz_init(void);
 extern void __tick_nohz_full_check(void);
 extern void tick_nohz_full_kick(void);
+extern void tick_nohz_full_kick_cpu(int cpu);
 extern void tick_nohz_full_kick_all(void);
 extern void __tick_nohz_task_switch(struct task_struct *tsk);
 #else
@@ -190,6 +191,7 @@ static inline bool tick_nohz_full_enabled(void) { return false; }
 static inline bool tick_nohz_full_cpu(int cpu) { return false; }
 static inline void __tick_nohz_full_check(void) { }
 static inline void tick_nohz_full_kick(void) { }
+static inline void tick_nohz_full_kick_cpu(int cpu) { }
 static inline void tick_nohz_full_kick_all(void) { }
 static inline void __tick_nohz_task_switch(struct task_struct *tsk) { }
 #endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9cae286..e4b344e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1499,9 +1499,7 @@ void scheduler_ipi(void)
 	 */
 	preempt_fold_need_resched();
 
-	if (llist_empty(&this_rq()->wake_list)
-			&& !tick_nohz_full_cpu(smp_processor_id())
-			&& !got_nohz_idle_kick())
+	if (llist_empty(&this_rq()->wake_list) && !got_nohz_idle_kick())
 		return;
 
 	/*
@@ -1518,7 +1516,6 @@ void scheduler_ipi(void)
 	 * somewhat pessimize the simple resched case.
 	 */
 	irq_enter();
-	tick_nohz_full_check();
 	sched_ttwu_pending();
 
 	/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c9007f2..4771063 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1225,7 +1225,7 @@ static inline void inc_nr_running(struct rq *rq)
 		if (tick_nohz_full_cpu(rq->cpu)) {
 			/* Order rq->nr_running write against the IPI */
 			smp_wmb();
-			smp_send_reschedule(rq->cpu);
+			tick_nohz_full_kick_cpu(rq->cpu);
 		}
        }
 #endif
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 9f8af69..582d3f6 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -230,6 +230,27 @@ void tick_nohz_full_kick(void)
 		irq_work_queue(&__get_cpu_var(nohz_full_kick_work));
 }
 
+static void nohz_full_kick_queue(struct queue_single_data *qsd)
+{
+	__tick_nohz_full_check();
+}
+
+static DEFINE_PER_CPU(struct queue_single_data, nohz_full_kick_qsd) = {
+	.func = nohz_full_kick_queue,
+};
+
+void tick_nohz_full_kick_cpu(int cpu)
+{
+	if (!tick_nohz_full_cpu(cpu))
+		return;
+
+	if (cpu == smp_processor_id()) {
+		irq_work_queue(&__get_cpu_var(nohz_full_kick_work));
+	} else {
+		smp_queue_function_single(cpu, &per_cpu(nohz_full_kick_qsd, cpu));
+	}
+}
+
 static void nohz_full_kick_ipi(void *info)
 {
 	__tick_nohz_full_check();
-- 
1.8.3.1



* Re: [GIT PULL v2] nohz: Move nohz kick out of scheduler IPI
  2014-04-03 16:17 [GIT PULL v2] nohz: Move nohz kick out of scheduler IPI Frederic Weisbecker
  2014-04-03 16:17 ` [PATCH 1/2] smp: Non busy-waiting IPI queue Frederic Weisbecker
  2014-04-03 16:17 ` [PATCH 2/2] nohz: Move full nohz kick to its own IPI Frederic Weisbecker
@ 2014-04-09 15:28 ` Frederic Weisbecker
  2014-04-14 11:22   ` Ingo Molnar
  2014-04-16  7:37 ` Ingo Molnar
  3 siblings, 1 reply; 13+ messages in thread
From: Frederic Weisbecker @ 2014-04-09 15:28 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: LKML, Paul E. McKenney, Andrew Morton, Jens Axboe, Kevin Hilman,
	Peter Zijlstra

On Thu, Apr 03, 2014 at 06:17:10PM +0200, Frederic Weisbecker wrote:
> Ingo, Thomas,
> 
> Please pull the timers/nohz-ipi-for-tip-v3 branch that can be found at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> 	timers/nohz-ipi-for-tip-v3

Ping?


* Re: [GIT PULL v2] nohz: Move nohz kick out of scheduler IPI
  2014-04-09 15:28 ` [GIT PULL v2] nohz: Move nohz kick out of scheduler IPI Frederic Weisbecker
@ 2014-04-14 11:22   ` Ingo Molnar
  2014-04-14 23:04     ` Frederic Weisbecker
  0 siblings, 1 reply; 13+ messages in thread
From: Ingo Molnar @ 2014-04-14 11:22 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, LKML, Paul E. McKenney, Andrew Morton,
	Jens Axboe, Kevin Hilman, Peter Zijlstra


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> On Thu, Apr 03, 2014 at 06:17:10PM +0200, Frederic Weisbecker wrote:
> > Ingo, Thomas,
> > 
> > Please pull the timers/nohz-ipi-for-tip-v3 branch that can be found at:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> > 	timers/nohz-ipi-for-tip-v3
> 
> Ping?

So this came a bit late - will it be fine for v3.16 as-is?

Thanks,

	Ingo


* Re: [GIT PULL v2] nohz: Move nohz kick out of scheduler IPI
  2014-04-14 11:22   ` Ingo Molnar
@ 2014-04-14 23:04     ` Frederic Weisbecker
  2014-04-15  6:05       ` Ingo Molnar
  0 siblings, 1 reply; 13+ messages in thread
From: Frederic Weisbecker @ 2014-04-14 23:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, LKML, Paul E. McKenney, Andrew Morton,
	Jens Axboe, Kevin Hilman, Peter Zijlstra

On Mon, Apr 14, 2014 at 01:22:17PM +0200, Ingo Molnar wrote:
> 
> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > On Thu, Apr 03, 2014 at 06:17:10PM +0200, Frederic Weisbecker wrote:
> > > Ingo, Thomas,
> > > 
> > > Please pull the timers/nohz-ipi-for-tip-v3 branch that can be found at:
> > > 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> > > 	timers/nohz-ipi-for-tip-v3
> > 
> > Ping?
> 
> So this came a bit late - will it be fine for v3.16 as-is?

It was late on purpose due to dependencies on the block tree. But never mind.

It's still fine to be pulled as is after -rc1 for 3.16.

Thanks!
 
> Thanks,
> 
> 	Ingo


* Re: [GIT PULL v2] nohz: Move nohz kick out of scheduler IPI
  2014-04-14 23:04     ` Frederic Weisbecker
@ 2014-04-15  6:05       ` Ingo Molnar
  0 siblings, 0 replies; 13+ messages in thread
From: Ingo Molnar @ 2014-04-15  6:05 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, LKML, Paul E. McKenney, Andrew Morton,
	Jens Axboe, Kevin Hilman, Peter Zijlstra


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> On Mon, Apr 14, 2014 at 01:22:17PM +0200, Ingo Molnar wrote:
> > 
> > * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > 
> > > On Thu, Apr 03, 2014 at 06:17:10PM +0200, Frederic Weisbecker wrote:
> > > > Ingo, Thomas,
> > > > 
> > > > Please pull the timers/nohz-ipi-for-tip-v3 branch that can be found at:
> > > > 
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> > > > 	timers/nohz-ipi-for-tip-v3
> > > 
> > > Ping?
> > 
> > So this came a bit late - will it be fine for v3.16 as-is?
> 
> It was late on purpose due to dependencies on the block tree. [...]

Well, changing the generic SMP IPI facilities via the block tree was 
indeed a problem.

> But never mind.
> 
> It's still fine to be pulled as is after -rc1 for 3.16.

Ok, will do.

Thanks,

	Ingo


* Re: [GIT PULL v2] nohz: Move nohz kick out of scheduler IPI
  2014-04-03 16:17 [GIT PULL v2] nohz: Move nohz kick out of scheduler IPI Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2014-04-09 15:28 ` [GIT PULL v2] nohz: Move nohz kick out of scheduler IPI Frederic Weisbecker
@ 2014-04-16  7:37 ` Ingo Molnar
  3 siblings, 0 replies; 13+ messages in thread
From: Ingo Molnar @ 2014-04-16  7:37 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, LKML, Paul E. McKenney, Andrew Morton,
	Jens Axboe, Kevin Hilman, Peter Zijlstra


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> Ingo, Thomas,
> 
> Please pull the timers/nohz-ipi-for-tip-v3 branch that can be found at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> 	timers/nohz-ipi-for-tip-v3
> 
> It is based on 7a48837732f87a574ee3e1855927dc250117f565 
> ("Merge branch 'for-3.15/core' of git://git.kernel.dk/linux-block")
> due to dependencies on the block tree containing IPI core changes.
> 
> This v2 addresses Paul's review and adds his Reviewed-by tags.
> 
> I'm trying to get these 2 patches in before the merge window ends;
> otherwise we'll have to do a tricky branch split on the next cycle.
> 
> ---
> When a full dynticks CPU runs in single-task mode and a new task gets
> enqueued, we notify it through an IPI so that it restarts its tick.
> 
> The IPI used here is the scheduler IPI. There are a few reasons for that:
> it can be called when interrupts are disabled, it can be called
> concurrently... These convenient properties aren't yet offered together
> by the IPI subsystem.
> 
> Meanwhile, bloating the scheduler IPI this way with scheduler-unrelated
> code is an abuse of this fast path. We certainly don't want to end up
> with a big kernel IPI.
> 
> So this patchset adds a small helper to the IPI subsystem that allows
> queueing an IPI from interrupt-disabled code while also handling
> concurrent callers. Eventually the nohz kick gets converted to this
> new facility.
> 
> Partly inspired by a suggestion from Peter Zijlstra.
> 
> * Patch 1/2 brings the IPI infrastructure to support this
> * Patch 2/2 does the nohz IPI conversion
> 
> 
> Thanks,
> 	Frederic
> ---
> 
> Frederic Weisbecker (2):
>       smp: Non busy-waiting IPI queue
>       nohz: Move full nohz kick to its own IPI
> 
> 
>  include/linux/smp.h      | 11 +++++++++++
>  include/linux/tick.h     |  2 ++
>  kernel/sched/core.c      |  5 +----
>  kernel/sched/sched.h     |  2 +-
>  kernel/smp.c             | 42 ++++++++++++++++++++++++++++++++++++++++++
>  kernel/time/tick-sched.c | 21 +++++++++++++++++++++
>  6 files changed, 78 insertions(+), 5 deletions(-)

Pulled into tip:timers/nohz, thanks a lot Frederic!

	Ingo


* [PATCH 1/2] smp: Non busy-waiting IPI queue
  2014-04-03  0:09 [PATCH 0/2] nohz: Move nohz kick out of scheduler IPI v2 Frederic Weisbecker
@ 2014-04-03  0:09 ` Frederic Weisbecker
  0 siblings, 0 replies; 13+ messages in thread
From: Frederic Weisbecker @ 2014-04-03  0:09 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Frederic Weisbecker, Andrew Morton, Ingo Molnar,
	Jens Axboe, Kevin Hilman, Peter Zijlstra, Thomas Gleixner

Some IPI users, such as the nohz subsystem, need to be able to send
an async IPI (async = not waiting for any other IPI's completion) from
contexts with interrupts disabled. And we want the IPI subsystem to
handle concurrent calls by itself.

Currently the nohz machinery uses the scheduler IPI for this purpose
because it can be triggered from any context and doesn't need any
serialization from the caller. But this is an abuse of a scheduler
fast path. We are bloating it with a job that should use its own IPI.

The current set of IPI functions can't be called when interrupts are
disabled; otherwise we risk a deadlock when two CPUs wait for each
other's IPI completion.

OTOH smp_call_function_single_async() can be called when interrupts
are disabled. But then it's up to the caller to serialize the given
IPI: it can't be called concurrently without special care.

So we need a version of the async IPI that takes care of concurrent
calls.

The proposed solution is to synchronize the IPI with a specific flag
that prevents the IPI from being sent if it is already pending but not
yet executed. Ordering is maintained such that, if the IPI is not sent
because it's already pending, we guarantee it will see the new state of
the data we expect it to when it executes.

This model is close to the irq_work design. It's also partly inspired by
suggestions from Peter Zijlstra.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/smp.h | 11 +++++++++++
 kernel/smp.c        | 42 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/include/linux/smp.h b/include/linux/smp.h
index 633f5ed..0de1eff 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -29,6 +29,17 @@ extern unsigned int total_cpus;
 int smp_call_function_single(int cpuid, smp_call_func_t func, void *info,
 			     int wait);
 
+struct queue_single_data;
+typedef void (*smp_queue_func_t)(struct queue_single_data *qsd);
+
+struct queue_single_data {
+	struct call_single_data data;
+	smp_queue_func_t func;
+	int pending;
+};
+
+int smp_queue_function_single(int cpu, struct queue_single_data *qsd);
+
 /*
  * Call a function on all processors
  */
diff --git a/kernel/smp.c b/kernel/smp.c
index 06d574e..7589be5 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -265,6 +265,48 @@ int smp_call_function_single_async(int cpu, struct call_single_data *csd)
 }
 EXPORT_SYMBOL_GPL(smp_call_function_single_async);
 
+void generic_smp_queue_function_single_interrupt(void *info)
+{
+	struct queue_single_data *qsd = info;
+
+	WARN_ON_ONCE(xchg(&qsd->pending, 0) != 1);
+	qsd->func(qsd);
+}
+
+/**
+ * smp_queue_function_single - Queue an asynchronous function to run on a
+ * 				specific CPU unless it's already pending.
+ * @cpu: The CPU to run on.
+ * @qsd: Pre-allocated and setup data structure
+ *
+ * Like smp_call_function_single_async() but the call to the function is
+ * serialized and won't be queued if it is already pending. In the latter case,
+ * ordering is still guaranteed such that the pending call will see the new
+ * data we expect it to.
+ *
+ * This must not be called on offline CPUs.
+ *
+ * Returns 0 when function is successfully queued or already pending, else a
+ * negative status code.
+ */
+int smp_queue_function_single(int cpu, struct queue_single_data *qsd)
+{
+	int err;
+
+	if (cmpxchg(&qsd->pending, 0, 1))
+		return 0;
+
+	preempt_disable();
+	err = generic_exec_single(cpu, &qsd->data, generic_smp_queue_function_single_interrupt, qsd, 0);
+	preempt_enable();
+
+	/* Reset in case of error. This must not be called on offline CPUs */
+	if (err)
+		qsd->pending = 0;
+
+	return err;
+}
+
 /*
  * smp_call_function_any - Run a function on any of the given cpus
  * @mask: The mask of cpus it can run on.
-- 
1.8.3.1



* Re: [PATCH 1/2] smp: Non busy-waiting IPI queue
  2014-04-02 18:30     ` Frederic Weisbecker
@ 2014-04-02 19:01       ` Paul E. McKenney
  0 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2014-04-02 19:01 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, Thomas Gleixner, LKML, Andrew Morton, Jens Axboe,
	Kevin Hilman, Peter Zijlstra

On Wed, Apr 02, 2014 at 08:30:23PM +0200, Frederic Weisbecker wrote:
> 2014-04-02 20:05 GMT+02:00 Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> > On Wed, Apr 02, 2014 at 06:26:05PM +0200, Frederic Weisbecker wrote:
> >> diff --git a/kernel/smp.c b/kernel/smp.c
> >> index 06d574e..bfe7b36 100644
> >> --- a/kernel/smp.c
> >> +++ b/kernel/smp.c
> >> @@ -265,6 +265,50 @@ int smp_call_function_single_async(int cpu, struct call_single_data *csd)
> >>  }
> >>  EXPORT_SYMBOL_GPL(smp_call_function_single_async);
> >>
> >> +void generic_smp_queue_function_single_interrupt(void *info)
> >> +{
> >> +     struct queue_single_data *qsd = info;
> >> +
> >> +     WARN_ON_ONCE(xchg(&qsd->pending, 0) != 1);
> >
> > I am probably missing something here, but shouldn't this function copy
> > *qsd to a local on-stack variable before doing the above xchg()?  What
> > prevents the following from happening?
> >
> > o       CPU 0 does smp_queue_function_single(), which sets ->pending
> >         and fills in ->func and ->data.
> >
> > o       CPU 1 takes IPI, invoking generic_smp_queue_function_single_interrupt().
> >
> > o       CPU 1 does xchg(), so that ->pending is now zero.
> >
> > o       An attempt to reuse the queue_single_data sees ->pending equal
> >         to zero, so the ->func and ->data is overwritten.
> >
> > o       CPU 1 calls the new ->func with the new ->data (or any of the other
> >         two possible unexpected outcomes), which might not be helpful to
> >         the kernel's actuarial statistics.
> >
> > So what am I missing?
> 
> Ah, I forgot to mention that the function must remain the same for all
> calls on a single qsd. And the data is always the qsd itself, so it
> stays stable. So that shouldn't be a problem.

I did indeed miss that particular constraint.  ;-)

> But you're right. The fact that we pass the function as an argument of
> smp_queue_function_single() suggests that we can pass a different
> function across various calls on the same qsd. So that's confusing.
> Perhaps changing smp_queue_function_single() such that it only takes
> the qsd as an argument would make that clearer? Then it's up to the
> caller to initialize the qsd with the constant function. I could
> define smp_queue_function_init() for that purpose. Or
> DEFINE_QUEUE_FUNCTION_DATA() for static initializers.
> 
> How does that sound?

Sounds good!

							Thanx, Paul



* Re: [PATCH 1/2] smp: Non busy-waiting IPI queue
  2014-04-02 18:05   ` Paul E. McKenney
@ 2014-04-02 18:30     ` Frederic Weisbecker
  2014-04-02 19:01       ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Frederic Weisbecker @ 2014-04-02 18:30 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Ingo Molnar, Thomas Gleixner, LKML, Andrew Morton, Jens Axboe,
	Kevin Hilman, Peter Zijlstra

2014-04-02 20:05 GMT+02:00 Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> On Wed, Apr 02, 2014 at 06:26:05PM +0200, Frederic Weisbecker wrote:
>> diff --git a/kernel/smp.c b/kernel/smp.c
>> index 06d574e..bfe7b36 100644
>> --- a/kernel/smp.c
>> +++ b/kernel/smp.c
>> @@ -265,6 +265,50 @@ int smp_call_function_single_async(int cpu, struct call_single_data *csd)
>>  }
>>  EXPORT_SYMBOL_GPL(smp_call_function_single_async);
>>
>> +void generic_smp_queue_function_single_interrupt(void *info)
>> +{
>> +     struct queue_single_data *qsd = info;
>> +
>> +     WARN_ON_ONCE(xchg(&qsd->pending, 0) != 1);
>
> I am probably missing something here, but shouldn't this function copy
> *qsd to a local on-stack variable before doing the above xchg()?  What
> prevents the following from happening?
>
> o       CPU 0 does smp_queue_function_single(), which sets ->pending
>         and fills in ->func and ->data.
>
> o       CPU 1 takes IPI, invoking generic_smp_queue_function_single_interrupt().
>
> o       CPU 1 does xchg(), so that ->pending is now zero.
>
> o       An attempt to reuse the queue_single_data sees ->pending equal
>         to zero, so the ->func and ->data is overwritten.
>
> o       CPU 1 calls the new ->func with the new ->data (or any of the other
>         two possible unexpected outcomes), which might not be helpful to
>         the kernel's actuarial statistics.
>
> So what am I missing?

Ah, I forgot to mention that the function must remain the same for all
calls on a single qsd. And the data is always the qsd itself, so it
stays stable. So that shouldn't be a problem.

But you're right. The fact that we pass the function as an argument of
smp_queue_function_single() suggests that we can pass a different
function across various calls on the same qsd. So that's confusing.
Perhaps changing smp_queue_function_single() such that it only takes
the qsd as an argument would make that clearer? Then it's up to the
caller to initialize the qsd with the constant function. I could
define smp_queue_function_init() for that purpose. Or
DEFINE_QUEUE_FUNCTION_DATA() for static initializers.
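Roughly something like this (sketch of the proposal only, none of it
is in the current patch):

	#define DEFINE_QUEUE_FUNCTION_DATA(name, f)			\
		struct queue_single_data name = { .func = (f) }

	static inline void smp_queue_function_init(struct queue_single_data *qsd,
						   smp_queue_func_t func)
	{
		qsd->func = func;
		qsd->pending = 0;
	}

	/* smp_queue_function_single() would then only take (cpu, qsd) */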

How does that sound?


* Re: [PATCH 1/2] smp: Non busy-waiting IPI queue
  2014-04-02 16:26 ` [PATCH 1/2] smp: Non busy-waiting IPI queue Frederic Weisbecker
@ 2014-04-02 18:05   ` Paul E. McKenney
  2014-04-02 18:30     ` Frederic Weisbecker
  0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2014-04-02 18:05 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, Thomas Gleixner, LKML, Andrew Morton, Jens Axboe,
	Kevin Hilman, Peter Zijlstra

On Wed, Apr 02, 2014 at 06:26:05PM +0200, Frederic Weisbecker wrote:
> Some IPI users, such as the nohz subsystem, need to be able to send
> an async IPI (async = not waiting for any other IPI's completion) from
> contexts with interrupts disabled. And we want the IPI subsystem to
> handle concurrent calls by itself.
> 
> Currently the nohz machinery uses the scheduler IPI for this purpose
> because it can be triggered from any context and doesn't need any
> serialization from the caller. But this is an abuse of a scheduler
> fast path. We are bloating it with a job that should use its own IPI.
> 
> The current set of IPI functions can't be called when interrupts are
> disabled; otherwise we risk a deadlock when two CPUs wait for each
> other's IPI completion.
> 
> OTOH smp_call_function_single_async() can be called when interrupts
> are disabled. But then it's up to the caller to serialize the given
> IPI: it can't be called concurrently without special care.
> 
> So we need a version of the async IPI that takes care of concurrent
> calls.
> 
> The proposed solution is to synchronize the IPI with a specific flag
> that prevents the IPI from being sent if it is already pending but not
> yet executed. Ordering is maintained such that, if the IPI is not sent
> because it's already pending, we guarantee it will see the new state of
> the data we expect it to when it executes.
> 
> This model is close to the irq_work design. It's also partly inspired by
> suggestions from Peter Zijlstra.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Jens Axboe <jens.axboe@oracle.com>
> Cc: Kevin Hilman <khilman@linaro.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>

Nice, but one question below.

							Thanx, Paul

> ---
>  include/linux/smp.h | 12 ++++++++++++
>  kernel/smp.c        | 44 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 56 insertions(+)
> 
> diff --git a/include/linux/smp.h b/include/linux/smp.h
> index 633f5ed..155dc86 100644
> --- a/include/linux/smp.h
> +++ b/include/linux/smp.h
> @@ -29,6 +29,18 @@ extern unsigned int total_cpus;
>  int smp_call_function_single(int cpuid, smp_call_func_t func, void *info,
>  			     int wait);
> 
> +struct queue_single_data;
> +typedef void (*smp_queue_func_t)(struct queue_single_data *qsd);
> +
> +struct queue_single_data {
> +	struct call_single_data data;
> +	smp_queue_func_t func;
> +	int pending;
> +};
> +
> +int smp_queue_function_single(int cpuid, smp_queue_func_t func,
> +			      struct queue_single_data *qsd);
> +
>  /*
>   * Call a function on all processors
>   */
> diff --git a/kernel/smp.c b/kernel/smp.c
> index 06d574e..bfe7b36 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -265,6 +265,50 @@ int smp_call_function_single_async(int cpu, struct call_single_data *csd)
>  }
>  EXPORT_SYMBOL_GPL(smp_call_function_single_async);
> 
> +void generic_smp_queue_function_single_interrupt(void *info)
> +{
> +	struct queue_single_data *qsd = info;
> +
> +	WARN_ON_ONCE(xchg(&qsd->pending, 0) != 1);

I am probably missing something here, but shouldn't this function copy
*qsd to a local on-stack variable before doing the above xchg()?  What
prevents the following from happening?

o	CPU 0 does smp_queue_function_single(), which sets ->pending
	and fills in ->func and ->data.

o	CPU 1 takes IPI, invoking generic_smp_queue_function_single_interrupt().

o	CPU 1 does xchg(), so that ->pending is now zero.

o	An attempt to reuse the queue_single_data sees ->pending equal
	to zero, so the ->func and ->data is overwritten.

o	CPU 1 calls the new ->func with the new ->data (or any of the other
	two possible unexpected outcomes), which might not be helpful to
	the kernel's actuarial statistics.

So what am I missing?
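For the sake of illustration, a hypothetical variant with such an
on-stack copy, taken before the xchg():

	void generic_smp_queue_function_single_interrupt(void *info)
	{
		struct queue_single_data *qsd = info;
		/* Nobody may legitimately overwrite ->func until the
		 * xchg() below releases ->pending, so snapshot it first. */
		smp_queue_func_t func = qsd->func;

		WARN_ON_ONCE(xchg(&qsd->pending, 0) != 1);
		func(qsd);
	}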

> +	qsd->func(qsd);
> +}
> +
> +/**
> + * smp_queue_function_single - Queue an asynchronous function to run on a
> + * 				specific CPU unless it's already pending.
> + * @func: The function to run. This must be fast and non-blocking.
> + * @qsd: The data contained in the interested object if any
> + *
> + * Like smp_call_function_single_async() but the call to @func is serialized
> + * and won't be queued if it is already pending. In the latter case, ordering
> + * is still guaranteed such that the pending call will sees the new data we
> + * expect it to.
> + *
> + * This must not be called on offline CPUs.
> + *
> + * Returns 0 when @func is successfully queued or already pending, else a negative
> + * status code.
> + */
> +int smp_queue_function_single(int cpu, smp_queue_func_t func, struct queue_single_data *qsd)
> +{
> +	int err;
> +
> +	if (cmpxchg(&qsd->pending, 0, 1))
> +		return 0;
> +
> +	qsd->func = func;
> +
> +	preempt_disable();
> +	err = generic_exec_single(cpu, &qsd->data, generic_smp_queue_function_single_interrupt, qsd, 0);
> +	preempt_enable();
> +
> +	/* Reset in case of error. This must not be called on offline CPUs */
> +	if (err)
> +		qsd->pending = 0;
> +
> +	return err;
> +}
> +
>  /*
>   * smp_call_function_any - Run a function on any of the given cpus
>   * @mask: The mask of cpus it can run on.
> -- 
> 1.8.3.1
> 



* [PATCH 1/2] smp: Non busy-waiting IPI queue
  2014-04-02 16:26 [GIT PULL] nohz: Move nohz kick out of scheduler IPI Frederic Weisbecker
@ 2014-04-02 16:26 ` Frederic Weisbecker
  2014-04-02 18:05   ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Frederic Weisbecker @ 2014-04-02 16:26 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Andrew Morton, Jens Axboe,
	Kevin Hilman, Paul E. McKenney, Peter Zijlstra

Some IPI users, such as the nohz subsystem, need to be able to send
an async IPI (async = not waiting for any other IPI's completion) from
contexts with interrupts disabled. And we want the IPI subsystem to
handle concurrent calls by itself.

Currently the nohz machinery uses the scheduler IPI for this purpose
because it can be triggered from any context and doesn't need any
serialization from the caller. But this is an abuse of a scheduler
fast path. We are bloating it with a job that should use its own IPI.

The current set of IPI functions can't be called when interrupts are
disabled; otherwise we risk a deadlock when two CPUs wait for each
other's IPI completion.

OTOH smp_call_function_single_async() can be called when interrupts
are disabled. But then it's up to the caller to serialize the given
IPI: it can't be called concurrently without special care.

So we need a version of the async IPI that takes care of concurrent
calls.

The proposed solution is to synchronize the IPI with a specific flag
that prevents the IPI from being sent if it is already pending but not
yet executed. Ordering is maintained such that, if the IPI is not sent
because it's already pending, we guarantee it will see the new state of
the data we expect it to when it executes.

This model is close to the irq_work design. It's also partly inspired by
suggestions from Peter Zijlstra.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/smp.h | 12 ++++++++++++
 kernel/smp.c        | 44 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+)

diff --git a/include/linux/smp.h b/include/linux/smp.h
index 633f5ed..155dc86 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -29,6 +29,18 @@ extern unsigned int total_cpus;
 int smp_call_function_single(int cpuid, smp_call_func_t func, void *info,
 			     int wait);
 
+struct queue_single_data;
+typedef void (*smp_queue_func_t)(struct queue_single_data *qsd);
+
+struct queue_single_data {
+	struct call_single_data data;
+	smp_queue_func_t func;
+	int pending;
+};
+
+int smp_queue_function_single(int cpuid, smp_queue_func_t func,
+			      struct queue_single_data *qsd);
+
 /*
  * Call a function on all processors
  */
diff --git a/kernel/smp.c b/kernel/smp.c
index 06d574e..bfe7b36 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -265,6 +265,50 @@ int smp_call_function_single_async(int cpu, struct call_single_data *csd)
 }
 EXPORT_SYMBOL_GPL(smp_call_function_single_async);
 
+void generic_smp_queue_function_single_interrupt(void *info)
+{
+	struct queue_single_data *qsd = info;
+
+	WARN_ON_ONCE(xchg(&qsd->pending, 0) != 1);
+	qsd->func(qsd);
+}
+
+/**
+ * smp_queue_function_single - Queue an asynchronous function to run on a
+ * 				specific CPU unless it's already pending.
+ * @func: The function to run. This must be fast and non-blocking.
+ * @qsd: The data contained in the interested object if any
+ *
+ * Like smp_call_function_single_async() but the call to @func is serialized
+ * and won't be queued if it is already pending. In the latter case, ordering
+ * is still guaranteed such that the pending call will see the new data we
+ * expect it to.
+ *
+ * This must not be called on offline CPUs.
+ *
+ * Returns 0 when @func is successfully queued or already pending, else a negative
+ * status code.
+ */
+int smp_queue_function_single(int cpu, smp_queue_func_t func, struct queue_single_data *qsd)
+{
+	int err;
+
+	if (cmpxchg(&qsd->pending, 0, 1))
+		return 0;
+
+	qsd->func = func;
+
+	preempt_disable();
+	err = generic_exec_single(cpu, &qsd->data, generic_smp_queue_function_single_interrupt, qsd, 0);
+	preempt_enable();
+
+	/* Reset in case of error. This must not be called on offline CPUs */
+	if (err)
+		qsd->pending = 0;
+
+	return err;
+}
+
 /*
  * smp_call_function_any - Run a function on any of the given cpus
  * @mask: The mask of cpus it can run on.
-- 
1.8.3.1


