linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* smp: Start up non-boot CPUs asynchronously
@ 2012-01-31  4:54 Arjan van de Ven
  2012-01-31 12:52 ` Ingo Molnar
  0 siblings, 1 reply; 17+ messages in thread
From: Arjan van de Ven @ 2012-01-31  4:54 UTC (permalink / raw)
  To: linux-kernel; +Cc: Milton Miller, Andrew Morton, Ingo Molnar, arjanvandeven

>From 3700e391ab2841a9f9241e4e31a6281aa59be5f1 Mon Sep 17 00:00:00 2001
From: Arjan van de Ven <arjan@linux.intel.com>
Date: Mon, 30 Jan 2012 20:44:51 -0800
Subject: [PATCH] smp: Start up non-boot CPUs asynchronously

The starting of the "not first" CPUs actually takes a lot of boot time
of the kernel... upto "minutes" on some of the bigger SGI boxes.
Right now, this is a fully sequential operation with the rest of the kernel
boot.

This patch turns this bringup of the other cpus into an asynchronous operation,
saving significant kernel boot time (40% on my laptop!!). Basically
now CPUs get brought up in parallel to disk enumeration, graphic mode bringup
etc etc etc.

Note that the implementation in this patch still waits for all CPUs to
be brought up before starting userspace; I would love to remove that
restriction over time (technically that is simple), but that becomes
then a change in behavior... I'd like to see more discussion on that
being a good idea before I write that patch.

Second note: We add a small delay between the bring up of cpus, this is
needed to actually get a boot time improvement. If we bring up CPUs
straight back-to-back, we hog the cpu hotplug lock for write, and
that lock is used everywhere during initialization for read. By
adding a small delay, we allow those tasks to make progress.

CC: Milton Miller <miltonm@bga.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Ingo Molnar <mingo@elte.hu>

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 kernel/smp.c |   32 +++++++++++++++++++++++++++++++-
 1 files changed, 31 insertions(+), 1 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index db197d6..b83d82e 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -12,6 +12,8 @@
 #include <linux/gfp.h>
 #include <linux/smp.h>
 #include <linux/cpu.h>
+#include <linux/async.h>
+#include <linux/delay.h>
 
 #ifdef CONFIG_USE_GENERIC_SMP_HELPERS
 static struct {
@@ -664,17 +666,45 @@ void __init setup_nr_cpu_ids(void)
 	nr_cpu_ids = find_last_bit(cpumask_bits(cpu_possible_mask),NR_CPUS) + 1;
 }
 
+void __init async_cpu_up(void *data, async_cookie_t cookie)
+{
+	unsigned long nr = (unsigned long) data;
+	/*
+	 * we can only up one cpu at a time, due to the hotplug lock;
+	 * it's better to wait for all earlier CPUs to be done before
+	 * us so that the bring up order is predictable.
+	 */
+	async_synchronize_cookie(cookie);
+	/*
+	 * wait a little bit of time between cpus, to allow
+	 * the kernel boot to not get stuck for a long time
+	 * on the hotplug lock. We wait longer for the first
+	 * CPU since many of the early kernel init code is
+	 * of the hotplug-lock using type.
+	 */
+	if (nr < 2)
+		msleep(100);
+	else
+		msleep(5);
+	cpu_up(nr);
+}
+
 /* Called by boot processor to activate the rest. */
 void __init smp_init(void)
 {
 	unsigned int cpu;
 
 	/* FIXME: This should be done in userspace --RR */
+
+	/*
+	 * But until we do this in userspace, we're going to do this
+	 * in parallel to the rest of the kernel boot up.-- Arjan
+	 */
 	for_each_present_cpu(cpu) {
 		if (num_online_cpus() >= setup_max_cpus)
 			break;
 		if (!cpu_online(cpu))
-			cpu_up(cpu);
+			async_schedule(async_cpu_up, (void *) cpu);
 	}
 
 	/* Any cleanup work */
-- 
1.7.6.4



-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: smp: Start up non-boot CPUs asynchronously
  2012-01-31  4:54 smp: Start up non-boot CPUs asynchronously Arjan van de Ven
@ 2012-01-31 12:52 ` Ingo Molnar
  2012-01-31 13:41   ` Arjan van de Ven
  0 siblings, 1 reply; 17+ messages in thread
From: Ingo Molnar @ 2012-01-31 12:52 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: linux-kernel, Milton Miller, Andrew Morton, arjanvandeven,
	Linus Torvalds, Peter Zijlstra, Thomas Gleixner, H. Peter Anvin


* Arjan van de Ven <arjan@infradead.org> wrote:

> >From 3700e391ab2841a9f9241e4e31a6281aa59be5f1 Mon Sep 17 00:00:00 2001
> From: Arjan van de Ven <arjan@linux.intel.com>
> Date: Mon, 30 Jan 2012 20:44:51 -0800
> Subject: [PATCH] smp: Start up non-boot CPUs asynchronously
> 
> The starting of the "not first" CPUs actually takes a lot of 
> boot time of the kernel... upto "minutes" on some of the 
> bigger SGI boxes. Right now, this is a fully sequential 
> operation with the rest of the kernel boot.

Yeah.

> This patch turns this bringup of the other cpus into an 
> asynchronous operation, saving significant kernel boot time 
> (40% on my laptop!!). Basically now CPUs get brought up in 
> parallel to disk enumeration, graphic mode bringup etc etc 
> etc.

Very nice!

> Note that the implementation in this patch still waits for all 
> CPUs to be brought up before starting userspace; I would love 
> to remove that restriction over time (technically that is 
> simple), but that becomes then a change in behavior... I'd 
> like to see more discussion on that being a good idea before I 
> write that patch.

Yeah, it's a good idea to be conservative with that - most of 
the silent assumptions will be on the kernel init side anyway 
and we want to map those out first, without any userspace 
variance mixed in.

I'd expect this patch to eventually break stuff in the kernel - 
we'll fix any kernel bugs that get uncovered, and we can move on 
to make things more parallel once that process has stabilized.

> Second note: We add a small delay between the bring up of 
> cpus, this is needed to actually get a boot time improvement. 
> If we bring up CPUs straight back-to-back, we hog the cpu 
> hotplug lock for write, and that lock is used everywhere 
> during initialization for read. By adding a small delay, we 
> allow those tasks to make progress.

> +void __init async_cpu_up(void *data, async_cookie_t cookie)
> +{
> +	unsigned long nr = (unsigned long) data;
> +	/*
> +	 * we can only up one cpu at a time, due to the hotplug lock;
> +	 * it's better to wait for all earlier CPUs to be done before
> +	 * us so that the bring up order is predictable.
> +	 */
> +	async_synchronize_cookie(cookie);
> +	/*
> +	 * wait a little bit of time between cpus, to allow
> +	 * the kernel boot to not get stuck for a long time
> +	 * on the hotplug lock. We wait longer for the first
> +	 * CPU since many of the early kernel init code is
> +	 * of the hotplug-lock using type.
> +	 */
> +	if (nr < 2)
> +		msleep(100);
> +	else
> +		msleep(5);

Hm, the limits here seem way too ad-hoc and rigid to me.

The bigger worry is that it makes the asynchronity of the boot 
process very timing dependent, 'hiding' a lot of early code on 
faster boxes and only interleaving the execution on slower 
boxes. But slower boxes are harder to debug!

The real fix would be to make the init code depend less on each 
other, i.e. have less hotplug lock dependencies. Or, if it's 
such a hot lock for a good reason, why does spinning on it slow 
down the boot process? It really shouldnt.

So i think this bit is not a good idea. Lets just be fully 
parallel and profile early execution via 'perf kvm' or so and 
figure out where the hotplug lock overhead comes from?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: smp: Start up non-boot CPUs asynchronously
  2012-01-31 12:52 ` Ingo Molnar
@ 2012-01-31 13:41   ` Arjan van de Ven
  2012-01-31 14:31     ` Ingo Molnar
  0 siblings, 1 reply; 17+ messages in thread
From: Arjan van de Ven @ 2012-01-31 13:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Milton Miller, Andrew Morton, arjanvandeven,
	Linus Torvalds, Peter Zijlstra, Thomas Gleixner, H. Peter Anvin

On Tue, 31 Jan 2012 13:52:32 +0100
Ingo Molnar <mingo@elte.hu> wrote:
> 
> The real fix would be to make the init code depend less on each 
> other, i.e. have less hotplug lock dependencies. Or, if it's 
> such a hot lock for a good reason, why does spinning on it slow 
> down the boot process? It really shouldnt.

by inspection, anything that calls get_online_cpus()/put_online_cpus()
will block while a CPU is coming up. This is used in things like 
kmem_cache_create()... which is used about everywhere.
(there's various other places... more or less it's a requirement
for using the for_each_online_cpu() api correctly)



-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: smp: Start up non-boot CPUs asynchronously
  2012-01-31 13:41   ` Arjan van de Ven
@ 2012-01-31 14:31     ` Ingo Molnar
  2012-01-31 15:22       ` Arjan van de Ven
  0 siblings, 1 reply; 17+ messages in thread
From: Ingo Molnar @ 2012-01-31 14:31 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: linux-kernel, Milton Miller, Andrew Morton, arjanvandeven,
	Linus Torvalds, Peter Zijlstra, Thomas Gleixner, H. Peter Anvin


* Arjan van de Ven <arjan@infradead.org> wrote:

> On Tue, 31 Jan 2012 13:52:32 +0100
> Ingo Molnar <mingo@elte.hu> wrote:
> > 
> > The real fix would be to make the init code depend less on each 
> > other, i.e. have less hotplug lock dependencies. Or, if it's 
> > such a hot lock for a good reason, why does spinning on it slow 
> > down the boot process? It really shouldnt.
> 
> by inspection, anything that calls 
> get_online_cpus()/put_online_cpus() will block while a CPU is 
> coming up. This is used in things like kmem_cache_create()... 
> which is used about everywhere. (there's various other 
> places... more or less it's a requirement for using the 
> for_each_online_cpu() api correctly)

Still magic delays are not acceptable - we want to face any 
remaining performance problems head on, we want to understand 
and fix them correctly.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: smp: Start up non-boot CPUs asynchronously
  2012-01-31 14:31     ` Ingo Molnar
@ 2012-01-31 15:22       ` Arjan van de Ven
  2012-01-31 16:12         ` Ingo Molnar
  0 siblings, 1 reply; 17+ messages in thread
From: Arjan van de Ven @ 2012-01-31 15:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Milton Miller, Andrew Morton, arjanvandeven,
	Linus Torvalds, Peter Zijlstra, Thomas Gleixner, H. Peter Anvin

On Tue, 31 Jan 2012 15:31:31 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> > 
> > by inspection, anything that calls 
> > get_online_cpus()/put_online_cpus() will block while a CPU is 
> > coming up. This is used in things like kmem_cache_create()... 
> > which is used about everywhere. (there's various other 
> > places... more or less it's a requirement for using the 
> > for_each_online_cpu() api correctly)
> 
> Still magic delays are not acceptable - we want to face any 
> remaining performance problems head on, we want to understand 
> and fix them correctly.

it's not really a performance problem as it is an obvious "we have a
ton of back-to-back writers on a read-write lock that we have quite a
few readers for". Unless the writers back off a little, the readers are
going to get starved.


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: smp: Start up non-boot CPUs asynchronously
  2012-01-31 15:22       ` Arjan van de Ven
@ 2012-01-31 16:12         ` Ingo Molnar
  2012-01-31 16:24           ` Arjan van de Ven
  0 siblings, 1 reply; 17+ messages in thread
From: Ingo Molnar @ 2012-01-31 16:12 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: linux-kernel, Milton Miller, Andrew Morton, arjanvandeven,
	Linus Torvalds, Peter Zijlstra, Thomas Gleixner, H. Peter Anvin


* Arjan van de Ven <arjan@infradead.org> wrote:

> On Tue, 31 Jan 2012 15:31:31 +0100
> Ingo Molnar <mingo@elte.hu> wrote:
> 
> > > 
> > > by inspection, anything that calls 
> > > get_online_cpus()/put_online_cpus() will block while a CPU 
> > > is coming up. This is used in things like 
> > > kmem_cache_create()... which is used about everywhere. 
> > > (there's various other places... more or less it's a 
> > > requirement for using the for_each_online_cpu() api 
> > > correctly)
> > 
> > Still magic delays are not acceptable - we want to face any 
> > remaining performance problems head on, we want to 
> > understand and fix them correctly.
> 
> it's not really a performance problem as it is an obvious "we 
> have a ton of back-to-back writers on a read-write lock that 
> we have quite a few readers for". Unless the writers back off 
> a little, the readers are going to get starved.

I didn't think I'd ever quote Bush, but my reaction to that is: 
'Bring it on!' ;-)

Really, we don't want *more* random delays in kernel code.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: smp: Start up non-boot CPUs asynchronously
  2012-01-31 16:12         ` Ingo Molnar
@ 2012-01-31 16:24           ` Arjan van de Ven
  2012-02-01 22:55             ` Andrew Morton
  2012-02-14  8:17             ` Srivatsa S. Bhat
  0 siblings, 2 replies; 17+ messages in thread
From: Arjan van de Ven @ 2012-01-31 16:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Milton Miller, Andrew Morton, arjanvandeven,
	Linus Torvalds, Peter Zijlstra, Thomas Gleixner, H. Peter Anvin

On Tue, 31 Jan 2012 17:12:07 +0100
Ingo Molnar <mingo@elte.hu> wrote:
> > it's not really a performance problem as it is an obvious "we 
> > have a ton of back-to-back writers on a read-write lock that 
> > we have quite a few readers for". Unless the writers back off 
> > a little, the readers are going to get starved.
> 
> I didn't think I'd ever quote Bush, but my reaction to that is: 
> 'Bring it on!' ;-)

You ask and you shall receive....
In this patch are "No New Taxis"

I do have a request: do run with this patch, and make a bootchart.pl
of the result, and you'll see what I meant....


>From ee65be59057c920042747d46dc174c5a5a56c744 Mon Sep 17 00:00:00 2001
From: Arjan van de Ven <arjan@linux.intel.com>
Date: Mon, 30 Jan 2012 20:44:51 -0800
Subject: [PATCH] smp: Start up non-boot CPUs asynchronously

The starting of the "not first" CPUs actually takes a lot of boot time
of the kernel... upto "minutes" on some of the bigger SGI boxes.
Right now, this is a fully sequential operation with the rest of the kernel
boot.

This patch turns this bringup of the other cpus into an asynchronous operation.
With some other changes (not in this patch) this can save significant kernel
boot time (upto 40% on my laptop!!).
Basically now CPUs could get brought up in parallel to disk enumeration, graphic
mode bringup etc etc etc.

Note that the implementation in this patch still waits for all CPUs to
be brought up before starting userspace; I would love to remove that
restriction over time (technically that is simple), but that becomes
then a change in behavior... I'd like to see more discussion on that
being a good idea before I write that patch.

Second note on version 2 of the patch:
This patch does currently not save any boot time, due to a situation
where the cpu hotplug lock gets taken for write by the cpu bringup code,
which starves out readers of this lock throughout the kernel.
Ingo specifically requested this behavior to expose this lock problem.

CC: Milton Miller <miltonm@bga.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Ingo Molnar <mingo@elte.hu>

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 kernel/smp.c |   21 ++++++++++++++++++++-
 1 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index db197d6..ea48418 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -12,6 +12,8 @@
 #include <linux/gfp.h>
 #include <linux/smp.h>
 #include <linux/cpu.h>
+#include <linux/async.h>
+#include <linux/delay.h>
 
 #ifdef CONFIG_USE_GENERIC_SMP_HELPERS
 static struct {
@@ -664,17 +666,34 @@ void __init setup_nr_cpu_ids(void)
 	nr_cpu_ids = find_last_bit(cpumask_bits(cpu_possible_mask),NR_CPUS) + 1;
 }
 
+void __init async_cpu_up(void *data, async_cookie_t cookie)
+{
+	unsigned long nr = (unsigned long) data;
+	/*
+	 * we can only up one cpu at a time, as enforced by the hotplug
+	 * lock; it's better to wait for all earlier CPUs to be done before
+	 * we bring up ours, so that the bring up order is predictable.
+	 */
+	async_synchronize_cookie(cookie);
+	cpu_up(nr);
+}
+
 /* Called by boot processor to activate the rest. */
 void __init smp_init(void)
 {
 	unsigned int cpu;
 
 	/* FIXME: This should be done in userspace --RR */
+
+	/*
+	 * But until we do this in userspace, we're going to do this
+	 * in parallel to the rest of the kernel boot up.-- Arjan
+	 */
 	for_each_present_cpu(cpu) {
 		if (num_online_cpus() >= setup_max_cpus)
 			break;
 		if (!cpu_online(cpu))
-			cpu_up(cpu);
+			async_schedule(async_cpu_up, (void *) cpu);
 	}
 
 	/* Any cleanup work */
-- 
1.7.6.4



-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: smp: Start up non-boot CPUs asynchronously
  2012-01-31 16:24           ` Arjan van de Ven
@ 2012-02-01 22:55             ` Andrew Morton
       [not found]               ` <CADyApD0yVOePmaLznks_h6xR_BCUjzEFUB7VtsL9vvsoHwCOVw@mail.gmail.com>
  2012-02-14  8:17             ` Srivatsa S. Bhat
  1 sibling, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2012-02-01 22:55 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Ingo Molnar, linux-kernel, Milton Miller, arjanvandeven,
	Linus Torvalds, Peter Zijlstra, Thomas Gleixner, H. Peter Anvin

On Tue, 31 Jan 2012 08:24:39 -0800
Arjan van de Ven <arjan@infradead.org> wrote:

> The starting of the "not first" CPUs actually takes a lot of boot time
> of the kernel... upto "minutes" on some of the bigger SGI boxes.

Where is all this time being spent?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: smp: Start up non-boot CPUs asynchronously
       [not found]               ` <CADyApD0yVOePmaLznks_h6xR_BCUjzEFUB7VtsL9vvsoHwCOVw@mail.gmail.com>
@ 2012-02-01 23:31                 ` Linus Torvalds
  0 siblings, 0 replies; 17+ messages in thread
From: Linus Torvalds @ 2012-02-01 23:31 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andrew Morton, Arjan van de Ven, Ingo Molnar, linux-kernel,
	Milton Miller, Peter Zijlstra, Thomas Gleixner, H. Peter Anvin

On Wed, Feb 1, 2012 at 3:09 PM, Arjan van de Ven
<arjanvandeven@gmail.com> wrote:
>
>
> we spend slightly more than 10 milliseconds on doing the hardware level
> "send ipi, wait for the cpu to get power"  dance. This is mostly just
> hardware physics.
> we spend a bunch of time calibrating loops-per-jiffie/tsc (in 3.3-rc this is
> only done once per socket, but each time we do it, it's several dozen
> milliseconds)
> we spend 20 milliseconds on making sure the tsc is not out of sync with the
> rest of the system (we're looking at optimizing this right now)
>
> a 3.2 kernel spent on average 120 milliseconds per logical non-boot cpu on
> my laptop. 3.3-rc is better (the calibration is now cached for each physical
> cpu), but still dire

Could we drop the cpu hotplug lock earlier?

In particular, maybe we could split it up, and make it something like
the following:

 - keep the existing cpu_hotplug.lock with largely the same semantics

 - add a new *per-cpu* hotplug lock that gets taken fairly early when
the CPU comes up (before calibration), and then we can drop the global
lock. We just need to make sure that the CPU has been added to the
list of CPU's, we don't need for the CPU to have fully initialized
itself.

 - on cpu unplug, we first take the global lock, and then after that
we need to take the local lock of the CPU being brought down - so that
a "down" event cannot succeed before the previous "up" is complete.

Wouldn't something like that largely solve the problem? Sure, maybe
some of the current get_online_cpus() users would need to be taught to
wait for the percpu lock (or completion - maybe that would be better),
but most of them don't really care. They tend to want to just do
something fairly simple with a stable list of CPU's.

I dunno. Maybe it would be more painful than I think it would.

                  Linus

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: smp: Start up non-boot CPUs asynchronously
  2012-01-31 16:24           ` Arjan van de Ven
  2012-02-01 22:55             ` Andrew Morton
@ 2012-02-14  8:17             ` Srivatsa S. Bhat
  2012-02-14  9:48               ` Srivatsa S. Bhat
  1 sibling, 1 reply; 17+ messages in thread
From: Srivatsa S. Bhat @ 2012-02-14  8:17 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Ingo Molnar, linux-kernel, Milton Miller, Andrew Morton,
	arjanvandeven, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
	H. Peter Anvin, benh, Vaidyanathan Srinivasan, Stephen Rothwell,
	mikey, Paul E. McKenney, gregkh, Srivatsa Vaddagiri, ppc-dev

On 01/31/2012 09:54 PM, Arjan van de Ven wrote:

> From ee65be59057c920042747d46dc174c5a5a56c744 Mon Sep 17 00:00:00 2001
> From: Arjan van de Ven <arjan@linux.intel.com>
> Date: Mon, 30 Jan 2012 20:44:51 -0800
> Subject: [PATCH] smp: Start up non-boot CPUs asynchronously
> 
> The starting of the "not first" CPUs actually takes a lot of boot time
> of the kernel... upto "minutes" on some of the bigger SGI boxes.
> Right now, this is a fully sequential operation with the rest of the kernel
> boot.
> 
> This patch turns this bringup of the other cpus into an asynchronous operation.
> With some other changes (not in this patch) this can save significant kernel
> boot time (upto 40% on my laptop!!).
> Basically now CPUs could get brought up in parallel to disk enumeration, graphic
> mode bringup etc etc etc.
> 
> Note that the implementation in this patch still waits for all CPUs to
> be brought up before starting userspace; I would love to remove that
> restriction over time (technically that is simple), but that becomes
> then a change in behavior... I'd like to see more discussion on that
> being a good idea before I write that patch.
> 
> Second note on version 2 of the patch:
> This patch does currently not save any boot time, due to a situation
> where the cpu hotplug lock gets taken for write by the cpu bringup code,
> which starves out readers of this lock throughout the kernel.
> Ingo specifically requested this behavior to expose this lock problem.
> 
> CC: Milton Miller <miltonm@bga.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Ingo Molnar <mingo@elte.hu>
> 
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> ---
>  kernel/smp.c |   21 ++++++++++++++++++++-
>  1 files changed, 20 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/smp.c b/kernel/smp.c
> index db197d6..ea48418 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -12,6 +12,8 @@
>  #include <linux/gfp.h>
>  #include <linux/smp.h>
>  #include <linux/cpu.h>
> +#include <linux/async.h>
> +#include <linux/delay.h>
> 
>  #ifdef CONFIG_USE_GENERIC_SMP_HELPERS
>  static struct {
> @@ -664,17 +666,34 @@ void __init setup_nr_cpu_ids(void)
>  	nr_cpu_ids = find_last_bit(cpumask_bits(cpu_possible_mask),NR_CPUS) + 1;
>  }
> 
> +void __init async_cpu_up(void *data, async_cookie_t cookie)
> +{
> +	unsigned long nr = (unsigned long) data;
> +	/*
> +	 * we can only up one cpu at a time, as enforced by the hotplug
> +	 * lock; it's better to wait for all earlier CPUs to be done before
> +	 * we bring up ours, so that the bring up order is predictable.
> +	 */
> +	async_synchronize_cookie(cookie);
> +	cpu_up(nr);
> +}
> +
>  /* Called by boot processor to activate the rest. */
>  void __init smp_init(void)
>  {
>  	unsigned int cpu;
> 
>  	/* FIXME: This should be done in userspace --RR */
> +
> +	/*
> +	 * But until we do this in userspace, we're going to do this
> +	 * in parallel to the rest of the kernel boot up.-- Arjan
> +	 */
>  	for_each_present_cpu(cpu) {
>  		if (num_online_cpus() >= setup_max_cpus)
>  			break;
>  		if (!cpu_online(cpu))
> -			cpu_up(cpu);
> +			async_schedule(async_cpu_up, (void *) cpu);
>  	}
> 
>  	/* Any cleanup work */


If I understand correctly, with this patch, the booting of non-boot CPUs
will happen in parallel with the rest of the kernel boot, but bringing up
of individual CPU is still serialized (due to hotplug lock).

If that is correct, I see several issues with this patch:

1. In smp_init(), after the comment "Any cleanup work" (see above), we print:
printk(KERN_INFO "Brought up %ld CPUs\n", (long)num_online_cpus());
So this can potentially print less than expected number of CPUs and might
surprise users.

2. Just below that we have smp_cpus_done(setup_max_cpus); and this translates
to native_smp_cpus_done() under x86, which calls impress_friends().
And that means, the bogosum calculation and the total activated processor
count which is printed, may get messed up.

3. sched_init_smp() is called immediately after smp_init(). And that calls
init_sched_domains(cpu_active_mask). Of course, it registers a hotplug
notifier callback to handle hot-added cpus.. but with this patch, boot up can
actually become unnecessarily slow at this point - what could have been done
in one go with an appropriately filled up cpu_active_mask, needs to be done
again and again using notifier callbacks. IOW, building sched domains can
potentially become a bottleneck, especially if there are lots and lots of
cpus in the machine.

4. There is an unhandled race condition (tiny window) in sched_init_smp():

get_online_cpus();
...
init_sched_domains(cpu_active_mask);
...
put_online_cpus();
                                           <<<<<<<<<<<<<<<<<<<<<<<< There!

hotcpu_notifier(cpuset_cpu_active, CPU_PRI_CPUSET_ACTIVE);
hotcpu_notifier(cpuset_cpu_inactive, CPU_PRI_CPUSET_INACTIVE);

At the point shown above, some non-boot cpus can get booted up, without
being noticed by the scheduler.

5. And in powerpc, it creates a new race condition, as explained in
https://lkml.org/lkml/2012/2/13/383
(Of course, we can fix it trivially by using get/put_online_cpus().)

There could be many more things that this patch breaks.. I haven't checked
thoroughly.

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: smp: Start up non-boot CPUs asynchronously
  2012-02-14  8:17             ` Srivatsa S. Bhat
@ 2012-02-14  9:48               ` Srivatsa S. Bhat
       [not found]                 ` <CADyApD0o4UYsTkqf2H2yJZ-d05NAyRAEc6z+m1gJEogc=cZLqQ@mail.gmail.com>
                                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Srivatsa S. Bhat @ 2012-02-14  9:48 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Stephen Rothwell, mikey, Paul E. McKenney, Peter Zijlstra,
	gregkh, ppc-dev, linux-kernel, Milton Miller, Srivatsa Vaddagiri,
	Thomas Gleixner, H. Peter Anvin, arjanvandeven, Andrew Morton,
	Linus Torvalds, Ingo Molnar

On 02/14/2012 01:47 PM, Srivatsa S. Bhat wrote:

> On 01/31/2012 09:54 PM, Arjan van de Ven wrote:
> 
>> From ee65be59057c920042747d46dc174c5a5a56c744 Mon Sep 17 00:00:00 2001
>> From: Arjan van de Ven <arjan@linux.intel.com>
>> Date: Mon, 30 Jan 2012 20:44:51 -0800
>> Subject: [PATCH] smp: Start up non-boot CPUs asynchronously
>>
>> The starting of the "not first" CPUs actually takes a lot of boot time
>> of the kernel... upto "minutes" on some of the bigger SGI boxes.
>> Right now, this is a fully sequential operation with the rest of the kernel
>> boot.
>>
>> This patch turns this bringup of the other cpus into an asynchronous operation.
>> With some other changes (not in this patch) this can save significant kernel
>> boot time (upto 40% on my laptop!!).
>> Basically now CPUs could get brought up in parallel to disk enumeration, graphic
>> mode bringup etc etc etc.
>>
>> Note that the implementation in this patch still waits for all CPUs to
>> be brought up before starting userspace; I would love to remove that
>> restriction over time (technically that is simple), but that becomes
>> then a change in behavior... I'd like to see more discussion on that
>> being a good idea before I write that patch.
>>
>> Second note on version 2 of the patch:
>> This patch does currently not save any boot time, due to a situation
>> where the cpu hotplug lock gets taken for write by the cpu bringup code,
>> which starves out readers of this lock throughout the kernel.
>> Ingo specifically requested this behavior to expose this lock problem.
>>
>> CC: Milton Miller <miltonm@bga.com>
>> CC: Andrew Morton <akpm@linux-foundation.org>
>> CC: Ingo Molnar <mingo@elte.hu>
>>
>> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
>> ---
>>  kernel/smp.c |   21 ++++++++++++++++++++-
>>  1 files changed, 20 insertions(+), 1 deletions(-)
>>
>> diff --git a/kernel/smp.c b/kernel/smp.c
>> index db197d6..ea48418 100644
>> --- a/kernel/smp.c
>> +++ b/kernel/smp.c
>> @@ -12,6 +12,8 @@
>>  #include <linux/gfp.h>
>>  #include <linux/smp.h>
>>  #include <linux/cpu.h>
>> +#include <linux/async.h>
>> +#include <linux/delay.h>
>>
>>  #ifdef CONFIG_USE_GENERIC_SMP_HELPERS
>>  static struct {
>> @@ -664,17 +666,34 @@ void __init setup_nr_cpu_ids(void)
>>  	nr_cpu_ids = find_last_bit(cpumask_bits(cpu_possible_mask),NR_CPUS) + 1;
>>  }
>>
>> +void __init async_cpu_up(void *data, async_cookie_t cookie)
>> +{
>> +	unsigned long nr = (unsigned long) data;
>> +	/*
>> +	 * we can only up one cpu at a time, as enforced by the hotplug
>> +	 * lock; it's better to wait for all earlier CPUs to be done before
>> +	 * we bring up ours, so that the bring up order is predictable.
>> +	 */
>> +	async_synchronize_cookie(cookie);
>> +	cpu_up(nr);
>> +}
>> +
>>  /* Called by boot processor to activate the rest. */
>>  void __init smp_init(void)
>>  {
>>  	unsigned int cpu;
>>
>>  	/* FIXME: This should be done in userspace --RR */
>> +
>> +	/*
>> +	 * But until we do this in userspace, we're going to do this
>> +	 * in parallel to the rest of the kernel boot up.-- Arjan
>> +	 */
>>  	for_each_present_cpu(cpu) {
>>  		if (num_online_cpus() >= setup_max_cpus)
>>  			break;
>>  		if (!cpu_online(cpu))
>> -			cpu_up(cpu);
>> +			async_schedule(async_cpu_up, (void *) cpu);
>>  	}
>>
>>  	/* Any cleanup work */
> 
> 
> If I understand correctly, with this patch, the booting of non-boot CPUs
> will happen in parallel with the rest of the kernel boot, but bringing up
> of individual CPU is still serialized (due to hotplug lock).
> 
> If that is correct, I see several issues with this patch:
> 
> 1. In smp_init(), after the comment "Any cleanup work" (see above), we print:
> printk(KERN_INFO "Brought up %ld CPUs\n", (long)num_online_cpus());
> So this can potentially print less than expected number of CPUs and might
> surprise users.
> 
> 2. Just below that we have smp_cpus_done(setup_max_cpus); and this translates
> to native_smp_cpus_done() under x86, which calls impress_friends().
> And that means, the bogosum calculation and the total activated processor
> count which is printed, may get messed up.
> 
> 3. sched_init_smp() is called immediately after smp_init(). And that calls
> init_sched_domains(cpu_active_mask). Of course, it registers a hotplug
> notifier callback to handle hot-added cpus.. but with this patch, boot up can
> actually become unnecessarily slow at this point - what could have been done
> in one go with an appropriately filled up cpu_active_mask, needs to be done
> again and again using notifier callbacks. IOW, building sched domains can
> potentially become a bottleneck, especially if there are lots and lots of
> cpus in the machine.
> 
> 4. There is an unhandled race condition (tiny window) in sched_init_smp():
> 
> get_online_cpus();
> ...
> init_sched_domains(cpu_active_mask);
> ...
> put_online_cpus();
>                                            <<<<<<<<<<<<<<<<<<<<<<<< There!
> 
> hotcpu_notifier(cpuset_cpu_active, CPU_PRI_CPUSET_ACTIVE);
> hotcpu_notifier(cpuset_cpu_inactive, CPU_PRI_CPUSET_INACTIVE);
> 
> At the point shown above, some non-boot cpus can get booted up, without
> being noticed by the scheduler.
> 
> 5. And in powerpc, it creates a new race condition, as explained in
> https://lkml.org/lkml/2012/2/13/383
> (Of course, we can fix it trivially by using get/put_online_cpus().)
> 


Actually, this one is trickier than that, to get it perfectly right.
[see point 8 below].

6. I also observed that in powerpc, a distinction is made implicitly between
a cpu booting for the first time vs a soft CPU online event. That is, for
freshly booted cpus, the following 3 functions are called:
(Refer arch/powerpc/kernel/sysfs.c: topology_init)

	register_cpu(c, cpu);
	device_create_file(&c->dev, &dev_attr_physical_id);
	register_cpu_online(cpu);

However, for a soft CPU Online event, only the last function is called.
(And that looks correct because it matches properly with what is done
upon CPU offline - only unregister_cpu_online() is called).

IOW, with this patch it becomes necessary to carefully examine all code
with such implicit assumptions and modify them to handle the async boot up
properly.

7. And whichever code between smp_init() and async_synchronize_full() didn't
care about CPU hotplug till today but depended on all cpus being online must
suddenly start worrying about CPU Hotplug. They must register a cpu notifier
and handle callbacks etc etc.. Or if they are not worth that complexity, they
should atleast be redesigned or moved around - like the print statements that
tell how many cpus came up, for example.

8. And we should provide a way in which a piece of code can easily "catch" all
CPU_ONLINE/UP_PREPARE events without missing any of them due to race
conditions. Of course register_cpu_notifier() and friends are provided for
that purpose, but they can't be used as it is in this boot up code..
And calling register_cpu_notifier() within get/put_online_cpus() would be a
disaster since that could lead to ABBA deadlock between cpu_add_remove_lock
and cpu_hotplug.lock

> There could be many more things that this patch breaks.. I haven't checked
> thoroughly.
> 

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: smp: Start up non-boot CPUs asynchronously
       [not found]                 ` <CADyApD0o4UYsTkqf2H2yJZ-d05NAyRAEc6z+m1gJEogc=cZLqQ@mail.gmail.com>
@ 2012-02-14 15:20                   ` Peter Zijlstra
  2012-02-14 19:57                   ` Srivatsa S. Bhat
  1 sibling, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2012-02-14 15:20 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Srivatsa S. Bhat, Arjan van de Ven, Stephen Rothwell, mikey,
	Paul E. McKenney, gregkh, ppc-dev, linux-kernel, Milton Miller,
	Srivatsa Vaddagiri, Thomas Gleixner, H. Peter Anvin,
	Andrew Morton, Linus Torvalds, Ingo Molnar, Arjan van de Ven

On Tue, 2012-02-14 at 06:31 -0800, Arjan van de Ven wrote:
> 
> frankly, such code HAS to worry about cpus going online and offline
> even today; the firmware, at least on X86, can start taking cores
> offline/online once ACPI is initialized.... (as controlled by a data
> center manager from outside the box, usually done based on thermal or
> power conditions on a datacenter level). Now, no doubt that we have
> bugs in this space, since this only happened very rarely before. 

Which frankly is an utter piece of crap, that ACPI spec is total garbage
and completely useless. You might have noticed that the ACPI code
supporting that failure carries a big nacked-by from me.

That's not to say we shouldn't try to fix hotplug, but bringing that
ACPI nonsense to the table makes me care less, not more.

I mean, really, that spec is broken, the support is worse.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: smp: Start up non-boot CPUs asynchronously
  2012-02-14  9:48               ` Srivatsa S. Bhat
       [not found]                 ` <CADyApD0o4UYsTkqf2H2yJZ-d05NAyRAEc6z+m1gJEogc=cZLqQ@mail.gmail.com>
@ 2012-02-14 19:32                 ` Srivatsa S. Bhat
  2012-02-14 21:28                 ` Benjamin Herrenschmidt
  2 siblings, 0 replies; 17+ messages in thread
From: Srivatsa S. Bhat @ 2012-02-14 19:32 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Stephen Rothwell, mikey, Peter Zijlstra, gregkh, Ingo Molnar,
	linux-kernel, Milton Miller, Srivatsa Vaddagiri, Linus Torvalds,
	H. Peter Anvin, arjanvandeven, Thomas Gleixner, Paul E. McKenney,
	ppc-dev, Andrew Morton

On 02/14/2012 03:18 PM, Srivatsa S. Bhat wrote:

> On 02/14/2012 01:47 PM, Srivatsa S. Bhat wrote:
> 
>> On 01/31/2012 09:54 PM, Arjan van de Ven wrote:
>>
>>> From ee65be59057c920042747d46dc174c5a5a56c744 Mon Sep 17 00:00:00 2001
>>> From: Arjan van de Ven <arjan@linux.intel.com>
>>> Date: Mon, 30 Jan 2012 20:44:51 -0800
>>> Subject: [PATCH] smp: Start up non-boot CPUs asynchronously
>>>
>>> The starting of the "not first" CPUs actually takes a lot of boot time
>>> of the kernel... upto "minutes" on some of the bigger SGI boxes.
>>> Right now, this is a fully sequential operation with the rest of the kernel
>>> boot.
>>>
>>> This patch turns this bringup of the other cpus into an asynchronous operation.
>>> With some other changes (not in this patch) this can save significant kernel
>>> boot time (upto 40% on my laptop!!).
>>> Basically now CPUs could get brought up in parallel to disk enumeration, graphic
>>> mode bringup etc etc etc.
>>>
>>> Note that the implementation in this patch still waits for all CPUs to
>>> be brought up before starting userspace; I would love to remove that
>>> restriction over time (technically that is simple), but that becomes
>>> then a change in behavior... I'd like to see more discussion on that
>>> being a good idea before I write that patch.
>>>
>>> Second note on version 2 of the patch:
>>> This patch does currently not save any boot time, due to a situation
>>> where the cpu hotplug lock gets taken for write by the cpu bringup code,
>>> which starves out readers of this lock throughout the kernel.
>>> Ingo specifically requested this behavior to expose this lock problem.
>>>
>>> CC: Milton Miller <miltonm@bga.com>
>>> CC: Andrew Morton <akpm@linux-foundation.org>
>>> CC: Ingo Molnar <mingo@elte.hu>
>>>
>>> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
>>> ---
>>>  kernel/smp.c |   21 ++++++++++++++++++++-
>>>  1 files changed, 20 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/kernel/smp.c b/kernel/smp.c
>>> index db197d6..ea48418 100644
>>> --- a/kernel/smp.c
>>> +++ b/kernel/smp.c
>>> @@ -12,6 +12,8 @@
>>>  #include <linux/gfp.h>
>>>  #include <linux/smp.h>
>>>  #include <linux/cpu.h>
>>> +#include <linux/async.h>
>>> +#include <linux/delay.h>
>>>
>>>  #ifdef CONFIG_USE_GENERIC_SMP_HELPERS
>>>  static struct {
>>> @@ -664,17 +666,34 @@ void __init setup_nr_cpu_ids(void)
>>>  	nr_cpu_ids = find_last_bit(cpumask_bits(cpu_possible_mask),NR_CPUS) + 1;
>>>  }
>>>
>>> +void __init async_cpu_up(void *data, async_cookie_t cookie)
>>> +{
>>> +	unsigned long nr = (unsigned long) data;
>>> +	/*
>>> +	 * we can only up one cpu at a time, as enforced by the hotplug
>>> +	 * lock; it's better to wait for all earlier CPUs to be done before
>>> +	 * we bring up ours, so that the bring up order is predictable.
>>> +	 */
>>> +	async_synchronize_cookie(cookie);
>>> +	cpu_up(nr);
>>> +}
>>> +
>>>  /* Called by boot processor to activate the rest. */
>>>  void __init smp_init(void)
>>>  {
>>>  	unsigned int cpu;
>>>
>>>  	/* FIXME: This should be done in userspace --RR */
>>> +
>>> +	/*
>>> +	 * But until we do this in userspace, we're going to do this
>>> +	 * in parallel to the rest of the kernel boot up.-- Arjan
>>> +	 */
>>>  	for_each_present_cpu(cpu) {
>>>  		if (num_online_cpus() >= setup_max_cpus)
>>>  			break;
>>>  		if (!cpu_online(cpu))
>>> -			cpu_up(cpu);
>>> +			async_schedule(async_cpu_up, (void *) cpu);
>>>  	}
>>>
>>>  	/* Any cleanup work */
>>
>>
>> If I understand correctly, with this patch, the booting of non-boot CPUs
>> will happen in parallel with the rest of the kernel boot, but bringing up
>> of individual CPU is still serialized (due to hotplug lock).
>>
>> If that is correct, I see several issues with this patch:
>>
>> 1. In smp_init(), after the comment "Any cleanup work" (see above), we print:
>> printk(KERN_INFO "Brought up %ld CPUs\n", (long)num_online_cpus());
>> So this can potentially print less than expected number of CPUs and might
>> surprise users.
>>
>> 2. Just below that we have smp_cpus_done(setup_max_cpus); and this translates
>> to native_smp_cpus_done() under x86, which calls impress_friends().
>> And that means, the bogosum calculation and the total activated processor
>> count which is printed, may get messed up.
>>
>> 3. sched_init_smp() is called immediately after smp_init(). And that calls
>> init_sched_domains(cpu_active_mask). Of course, it registers a hotplug
>> notifier callback to handle hot-added cpus.. but with this patch, boot up can
>> actually become unnecessarily slow at this point - what could have been done
>> in one go with an appropriately filled up cpu_active_mask, needs to be done
>> again and again using notifier callbacks. IOW, building sched domains can
>> potentially become a bottleneck, especially if there are lots and lots of
>> cpus in the machine.
>>
>> 4. There is an unhandled race condition (tiny window) in sched_init_smp():
>>
>> get_online_cpus();
>> ...
>> init_sched_domains(cpu_active_mask);
>> ...
>> put_online_cpus();
>>                                            <<<<<<<<<<<<<<<<<<<<<<<< There!
>>
>> hotcpu_notifier(cpuset_cpu_active, CPU_PRI_CPUSET_ACTIVE);
>> hotcpu_notifier(cpuset_cpu_inactive, CPU_PRI_CPUSET_INACTIVE);
>>
>> At the point shown above, some non-boot cpus can get booted up, without
>> being noticed by the scheduler.
>>
>> 5. And in powerpc, it creates a new race condition, as explained in
>> https://lkml.org/lkml/2012/2/13/383
>> (Of course, we can fix it trivially by using get/put_online_cpus().)
>>
> 
> 
> Actually, this one is trickier than that, to get it perfectly right.
> [see point 8 below].
> 
> 6. I also observed that in powerpc, a distinction is made implicitly between
> a cpu booting for the first time vs a soft CPU online event. That is, for
> freshly booted cpus, the following 3 functions are called:
> (Refer arch/powerpc/kernel/sysfs.c: topology_init)
> 
> 	register_cpu(c, cpu);
> 	device_create_file(&c->dev, &dev_attr_physical_id);
> 	register_cpu_online(cpu);
> 
> However, for a soft CPU Online event, only the last function is called.
> (And that looks correct because it matches properly with what is done
> upon CPU offline - only unregister_cpu_online() is called).
> 
> IOW, with this patch it becomes necessary to carefully examine all code
> with such implicit assumptions and modify them to handle the async boot up
> properly.
> 
> 7. And whichever code between smp_init() and async_synchronize_full() didn't
> care about CPU hotplug till today but depended on all cpus being online must
> suddenly start worrying about CPU Hotplug. They must register a cpu notifier
> and handle callbacks etc etc.. Or if they are not worth that complexity, they
> should atleast be redesigned or moved around - like the print statements that
> tell how many cpus came up, for example.
> 
> 8. And we should provide a way in which a piece of code can easily "catch" all
> CPU_ONLINE/UP_PREPARE events without missing any of them due to race
> conditions. Of course register_cpu_notifier() and friends are provided for
> that purpose, but they can't be used as it is in this boot up code..
> And calling register_cpu_notifier() within get/put_online_cpus() would be a
> disaster since that could lead to ABBA deadlock between cpu_add_remove_lock
> and cpu_hotplug.lock
> 


9. With this patch, the second statement below in Documentation/cpu-hotplug.txt
won't be true anymore:

Init functions could be of two types:
1. early init (init function called when only the boot processor is online).
2. late init (init function called _after_ all the CPUs are online).

And hence, those parts of the code which depend on this will have to be revisited.

10. Further down, in Documentation/cpu-hotplug.txt, we see:
(referring to early init as first case and late init as second case)

"For the first case, you should add the following to your init function

        register_cpu_notifier(&foobar_cpu_notifier);

For the second case, you should add the following to your init function

        register_hotcpu_notifier(&foobar_cpu_notifier); "

And as of now, hotcpu notifiers are nothing but regular cpu notifiers.
I wonder why do we even have something called hotcpu notifiers, when they do
nothing different.. rather, the distinction between "hotcpu add" vs "just
normal booting" was implicitly handled by choosing when we register our
callbacks:
register at early init => "normal booting" + "hotcpu, including soft online"
register at late init => "hotcpu, including soft online"

So, earlier we had some control over which CPU hotplug events we wanted to be
notified of, by choosing when we register the notifiers. But with this patch,
"careful placement" of our callback registration doesn't make any difference
anymore because late initcalls could run in parallel with smp boot up...

The point I am making is, what was already bad with respect to callback
registration, is made even worse by this patch.
(Btw, this issue is in the light of point 6 above).

>> There could be many more things that this patch breaks.. I haven't checked
>> thoroughly.
>>


 
Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: smp: Start up non-boot CPUs asynchronously
       [not found]                 ` <CADyApD0o4UYsTkqf2H2yJZ-d05NAyRAEc6z+m1gJEogc=cZLqQ@mail.gmail.com>
  2012-02-14 15:20                   ` Peter Zijlstra
@ 2012-02-14 19:57                   ` Srivatsa S. Bhat
  2012-02-14 20:00                     ` Peter Zijlstra
  2012-02-14 21:02                     ` Arjan van de Ven
  1 sibling, 2 replies; 17+ messages in thread
From: Srivatsa S. Bhat @ 2012-02-14 19:57 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Arjan van de Ven, Stephen Rothwell, mikey, Paul E. McKenney,
	Peter Zijlstra, gregkh, ppc-dev, linux-kernel, Milton Miller,
	Srivatsa Vaddagiri, Thomas Gleixner, H. Peter Anvin,
	Andrew Morton, Linus Torvalds, Ingo Molnar, Arjan van de Ven

[Small note, it appears as if the last 2 of your replies to this thread
 didn't reach LKML.]

On 02/14/2012 08:01 PM, Arjan van de Ven wrote:

> one coments; will comment more when I get to work
> 
> On Tue, Feb 14, 2012 at 1:48 AM, Srivatsa S. Bhat 
> 
> 7. And whichever code between smp_init() and async_synchronize_full() didn't
> 
>     care about CPU hotplug till today but depended on all cpus being
>     online must
>     suddenly start worrying about CPU Hotplug. They must register a cpu
>     notifier
>     and handle callbacks etc etc.. Or if they are not worth that
>     complexity, they
>     should atleast be redesigned or moved around - like the print
>     statements that
>     tell how many cpus came up, for example.
> 
> 
> frankly, such code HAS to worry about cpus going online and offline even
> today; the firmware, at least on X86, can start taking cores
> offline/online once ACPI is initialized....
> (as controlled by a data center manager from outside the box, usually
> done based on thermal or power conditions on a datacenter level).
> Now, no doubt that we have bugs in this space, since this only happened
> very rarely before.
> 
> Question is what to do from a longer term strategy:
> Either we declare the number of online CPUs invariant during a certain
> phase of the boot (and make ACPI and co honor this as well somehow)
> or
> We decide to go about fixing these (maybe with the help of lockdep?)
> 
> In addition to this, the reality is that the whole "bring cpus up"
> sequence needs to be changed; the current one is very messy and requires
> the hotplug lock for the whole bring up of each individual cpu... which
> is a very unfortunate design; a much better design would be to only take
> the lock for the actual registration of the newly brought up CPU to the
> kernel, while running the physical bringup without the global lock.
> If/when that change gets made, we can do the physical bring up in
> parallel (with each other, but also with the rest of the kernel boot),
> and do the registration en-mass at some convenient time in the boot,
> potentially late.
> 


Sounds like a good idea, but how will we take care of CPU_UP_PREPARE and
CPU_STARTING callbacks then? Because, CPU_UP_PREPARE callbacks are run
before bringing up the cpu and CPU_STARTING is called from the cpu that is
coming up. Also, CPU_UP_PREPARE callbacks can be failed, which can lead
to that particular cpu boot getting aborted. With the "late commissioning
of CPUs" idea you proposed above, retaining such semantics could become
very challenging.

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: smp: Start up non-boot CPUs asynchronously
  2012-02-14 19:57                   ` Srivatsa S. Bhat
@ 2012-02-14 20:00                     ` Peter Zijlstra
  2012-02-14 21:02                     ` Arjan van de Ven
  1 sibling, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2012-02-14 20:00 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Arjan van de Ven, Arjan van de Ven, Stephen Rothwell, mikey,
	Paul E. McKenney, gregkh, ppc-dev, linux-kernel, Milton Miller,
	Srivatsa Vaddagiri, Thomas Gleixner, H. Peter Anvin,
	Andrew Morton, Linus Torvalds, Ingo Molnar, Arjan van de Ven

On Wed, 2012-02-15 at 01:27 +0530, Srivatsa S. Bhat wrote:
> [Small note, it appears as if the last 2 of your replies to this
> thread
>  didn't reach LKML.] 

because he used html mail, LKML drops those.. IIRC you can tell K-9 not
to use html cruft, but then I stopped trying to pretend you can email
using phones, its all too painful.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: smp: Start up non-boot CPUs asynchronously
  2012-02-14 19:57                   ` Srivatsa S. Bhat
  2012-02-14 20:00                     ` Peter Zijlstra
@ 2012-02-14 21:02                     ` Arjan van de Ven
  1 sibling, 0 replies; 17+ messages in thread
From: Arjan van de Ven @ 2012-02-14 21:02 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Arjan van de Ven, Arjan van de Ven, Stephen Rothwell, mikey,
	Paul E. McKenney, Peter Zijlstra, gregkh, ppc-dev, linux-kernel,
	Milton Miller, Srivatsa Vaddagiri, Thomas Gleixner,
	H. Peter Anvin, Andrew Morton, Linus Torvalds, Ingo Molnar

On 2/14/2012 11:57 AM, Srivatsa S. Bhat wrote:

>> In addition to this, the reality is that the whole "bring cpus up"
>> sequence needs to be changed; the current one is very messy and requires
>> the hotplug lock for the whole bring up of each individual cpu... which
>> is a very unfortunate design; a much better design would be to only take
>> the lock for the actual registration of the newly brought up CPU to the
>> kernel, while running the physical bringup without the global lock.
>> If/when that change gets made, we can do the physical bring up in
>> parallel (with each other, but also with the rest of the kernel boot),
>> and do the registration en-mass at some convenient time in the boot,
>> potentially late.
>>
> 
> 
> Sounds like a good idea, but how will we take care of CPU_UP_PREPARE and
> CPU_STARTING callbacks then? Because, CPU_UP_PREPARE callbacks are run
> before bringing up the cpu and CPU_STARTING is called from the cpu that is
> coming up. Also, CPU_UP_PREPARE callbacks can be failed, which can lead
> to that particular cpu boot getting aborted. With the "late commissioning
> of CPUs" idea you proposed above, retaining such semantics could become
> very challenging.

some of these callbacks may need to be redesigned as well; or at least,
we may need to decouple the "physical" state of the CPU that's getting
brought up from the "logical" OS visible one.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: smp: Start up non-boot CPUs asynchronously
  2012-02-14  9:48               ` Srivatsa S. Bhat
       [not found]                 ` <CADyApD0o4UYsTkqf2H2yJZ-d05NAyRAEc6z+m1gJEogc=cZLqQ@mail.gmail.com>
  2012-02-14 19:32                 ` Srivatsa S. Bhat
@ 2012-02-14 21:28                 ` Benjamin Herrenschmidt
  2 siblings, 0 replies; 17+ messages in thread
From: Benjamin Herrenschmidt @ 2012-02-14 21:28 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Stephen Rothwell, mikey, Peter Zijlstra, gregkh, Ingo Molnar,
	linux-kernel, Milton Miller, Srivatsa Vaddagiri, Linus Torvalds,
	H. Peter Anvin, arjanvandeven, Thomas Gleixner, Paul E. McKenney,
	ppc-dev, Andrew Morton, Srivatsa S. Bhat

On Tue, 2012-02-14 at 15:18 +0530, Srivatsa S. Bhat wrote:

> > 2. Just below that we have smp_cpus_done(setup_max_cpus); and this translates
> > to native_smp_cpus_done() under x86, which calls impress_friends().
> > And that means, the bogosum calculation and the total activated processor
> > count which is printed, may get messed up.

We also have code on powerpc that relies on the bringup having been
completed in smp_cpus_done(), especially on platforms that don't support
CPU hotplug (or fake it using sleep loops).

In some case we unmap MMIO space or close access to components (i2c for
example) that we use during the bringup for things like hard synchro of
CPU timebases, etc... on some G5s we disable the elastic interface on
the northbridge for CPUs that weren't brought up, that sort of thing...

So this patch will break a LOT of stuff for us, it must at least be a
config option for now, until we find another way to fix these things.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2012-02-14 21:29 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-31  4:54 smp: Start up non-boot CPUs asynchronously Arjan van de Ven
2012-01-31 12:52 ` Ingo Molnar
2012-01-31 13:41   ` Arjan van de Ven
2012-01-31 14:31     ` Ingo Molnar
2012-01-31 15:22       ` Arjan van de Ven
2012-01-31 16:12         ` Ingo Molnar
2012-01-31 16:24           ` Arjan van de Ven
2012-02-01 22:55             ` Andrew Morton
     [not found]               ` <CADyApD0yVOePmaLznks_h6xR_BCUjzEFUB7VtsL9vvsoHwCOVw@mail.gmail.com>
2012-02-01 23:31                 ` Linus Torvalds
2012-02-14  8:17             ` Srivatsa S. Bhat
2012-02-14  9:48               ` Srivatsa S. Bhat
     [not found]                 ` <CADyApD0o4UYsTkqf2H2yJZ-d05NAyRAEc6z+m1gJEogc=cZLqQ@mail.gmail.com>
2012-02-14 15:20                   ` Peter Zijlstra
2012-02-14 19:57                   ` Srivatsa S. Bhat
2012-02-14 20:00                     ` Peter Zijlstra
2012-02-14 21:02                     ` Arjan van de Ven
2012-02-14 19:32                 ` Srivatsa S. Bhat
2012-02-14 21:28                 ` Benjamin Herrenschmidt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).