All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 0/5] [x86]: Improve secondary CPU bring-up process robustness
  2012-05-09 10:24 [PATCH 0/5] [x86]: Improve secondary CPU bring-up process robustness Igor Mammedov
@ 2012-05-09  9:19 ` Peter Zijlstra
  2012-05-09 12:29   ` Igor Mammedov
  2012-05-09 13:12   ` Ingo Molnar
  2012-05-09 10:24 ` [PATCH 1/5] Fix soft-lookup in stop machine on secondary cpu bring up Igor Mammedov
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2012-05-09  9:19 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: linux-kernel, rob, tglx, mingo, hpa, x86, luto, suresh.b.siddha,
	avi, johnstul, arjan, linux-doc

On Wed, 2012-05-09 at 12:24 +0200, Igor Mammedov wrote:
> Target audience for this patches is mostly virt. environments, where
> physical CPUs are shared beetween many guests and on overcommited
> host it can uncover different race conditions during secondary CPU
> bring-up.

The good news is that you're working on this, the bad news is that all
this code is slated for the scrap heap :-)

Thomas is currently in the process of doing a massive overhaul of the
hotplug code, included in that would be the stuff you're touching.

Every architecture does this hand-shake differently and probably buggy,
all that needs to move into generic code. The only bits needed in the
arch code are the cpu wakeup and initial trampoline, the rest should be
generic.

I'm not quite sure how far along he is, but it would be awesome if you
could help him out somehow.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 0/5] [x86]: Improve secondary CPU bring-up process robustness
@ 2012-05-09 10:24 Igor Mammedov
  2012-05-09  9:19 ` Peter Zijlstra
                   ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Igor Mammedov @ 2012-05-09 10:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: rob, tglx, mingo, hpa, x86, luto, suresh.b.siddha, avi, imammedo,
	a.p.zijlstra, johnstul, arjan, linux-doc


Target audience for this patches is mostly virt. environments, where
physical CPUs are shared beetween many guests and on overcommited
host it can uncover different race conditions during secondary CPU
bring-up.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/5] Fix soft-lookup in stop machine on secondary cpu bring up
  2012-05-09 10:24 [PATCH 0/5] [x86]: Improve secondary CPU bring-up process robustness Igor Mammedov
  2012-05-09  9:19 ` Peter Zijlstra
@ 2012-05-09 10:24 ` Igor Mammedov
  2012-05-09 15:04   ` Shuah Khan
  2012-05-11 11:45   ` Thomas Gleixner
  2012-05-09 10:24 ` [PATCH 2/5] Take in account that several cpus might call check_tsc_sync_* at the same time Igor Mammedov
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 27+ messages in thread
From: Igor Mammedov @ 2012-05-09 10:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: rob, tglx, mingo, hpa, x86, luto, suresh.b.siddha, avi, imammedo,
	a.p.zijlstra, johnstul, arjan, linux-doc

When bringing up cpuX1, it could stall in start_secondary
before setting cpu_callin_mask for more than 5 sec. That forces
do_boot_cpu() to give up on waiting and go to error return path
printing messages:
  pr_err("CPU%d: Stuck ??\n", cpuX1);
or
  pr_err("CPU%d: Not responding.\n", cpuX1);
and native_cpu_up returns early with -EIO. However AP may continue
its boot process till it reaches check_tsc_sync_target(), where
it will wait for boot cpu to run cpu_up...=>check_tsc_sync_source.
That will never happen since cpu_up have returned with error before.

Now we need to note that cpuX1 is marked as active in smp_callin
before it stuck in check_tsc_sync_target. And when another cpuX2
is being onlined, start_secondary on it will call
  smp_callin
    -> smp_store_cpu_info
      -> identify_secondary_cpu
        -> mtrr_ap_init
          -> set_mtrr_from_inactive_cpu
            -> stop_machine_from_inactive_cpu
where it's going to schedule stop_machine work on all ACTIVE cpus
  smdata.num_threads = num_active_cpus() + 1;
and wait till they all complete it before continuing. As was noted
before cpuX1 was marked as active but can't execute any work since
it's not completed initialization and stuck in check_tsc_sync_target.
As result system soft lockups in stop_machine_cpu_stop.

backtrace from reproducer:

PID: 3324   TASK: ffff88007c00ae20  CPU: other cpus   COMMAND: "migration/1"
    [exception RIP: stop_machine_cpu_stop+131]
...
 #0 [ffff88007b4d7de8] cpu_stopper_thread at ffffffff810c66bd
 #1 [ffff88007b4d7ee8] kthread at ffffffff8107871e
 #2 [ffff88007b4d7f48] kernel_thread_helper at ffffffff8154af24

PID: 0      TASK: ffff88007c029710  CPU: 2   COMMAND: "swapper/2"
    [exception RIP: check_tsc_sync_target+33]
...
 #0 [ffff88007c025f30] start_secondary at ffffffff81539876

PID: 0      TASK: ffff88007c041710  CPU: 3   COMMAND: "swapper/3"
    [exception RIP: stop_machine_cpu_stop+131]
...
 #0 [ffff88007c04be50] stop_machine_from_inactive_cpu at ffffffff810c6b2f
 #1 [ffff88007c04bee0] mtrr_ap_init at ffffffff8102e963
 #2 [ffff88007c04bf10] identify_secondary_cpu at ffffffff81536799
 #3 [ffff88007c04bf20] smp_store_cpu_info at ffffffff815396d5
 #4 [ffff88007c04bf30] start_secondary at ffffffff81539800

Could be fixed by not marking being onlined cpu as active too early.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 arch/x86/kernel/smpboot.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 6e1e406..ae19d90 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -232,8 +232,6 @@ static void __cpuinit smp_callin(void)
 	set_cpu_sibling_map(raw_smp_processor_id());
 	wmb();
 
-	notify_cpu_starting(cpuid);
-
 	/*
 	 * Allow the master to continue.
 	 */
@@ -268,6 +266,8 @@ notrace static void __cpuinit start_secondary(void *unused)
 	 */
 	check_tsc_sync_target();
 
+	notify_cpu_starting(smp_processor_id());
+
 	/*
 	 * We need to hold call_lock, so there is no inconsistency
 	 * between the time smp_call_function() determines number of
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/5] Take in account that several cpus might call check_tsc_sync_* at the same time
  2012-05-09 10:24 [PATCH 0/5] [x86]: Improve secondary CPU bring-up process robustness Igor Mammedov
  2012-05-09  9:19 ` Peter Zijlstra
  2012-05-09 10:24 ` [PATCH 1/5] Fix soft-lookup in stop machine on secondary cpu bring up Igor Mammedov
@ 2012-05-09 10:24 ` Igor Mammedov
  2012-05-09 10:25 ` [PATCH 3/5] Do not wait till next cpu online and abort early if lead cpu do not wait for us anymore Igor Mammedov
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Igor Mammedov @ 2012-05-09 10:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: rob, tglx, mingo, hpa, x86, luto, suresh.b.siddha, avi, imammedo,
	a.p.zijlstra, johnstul, arjan, linux-doc

conditions:
  multiliple cpus guest on over-commited host running aggressive
  cpu online/offline script.

real use-case:
  cpu hotplug in virt environment

Guest may hung with following trace:
---- cut ----
[   54.234569] CPU3: Stuck ??
./offV2.sh: line 11: echo: write error: Input/output error
[   54.250408] CPU 1 is now offline
[   54.260887] CPU 2 is now offline
[   54.261350] SMP alternatives: switching to UP code
./offV2.sh: line 8: echo: write error: Device or resource busy
[   54.286857] SMP alternatives: switching to SMP code
[   54.299540] Booting Node 0 Processor 1 APIC 0x1
[   54.250401] Disabled fast string operations
[   54.591023] ------------[ cut here ]------------
[   54.591023] WARNING: at kernel/watchdog.c:241 watchdog_overflow_callback+0x98/0xc0()
[   54.591023] Hardware name: KVM
[   54.591023] Watchdog detected hard LOCKUP on cpu 0
[   54.591023] Modules linked in: sunrpc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc crc32c_intel ghash_clmulni_intel microcode serio_raw virtio_balloon e1000 i2c_piix4 i2c_core floppy [last unloaded: scsi_wait_scan]
[   54.591023] Pid: 1193, comm: offV2.sh Not tainted 3.4.0-rc4+ #203
[   54.591023] Call Trace:
[   54.591023]  <NMI>  [<ffffffff810566df>] warn_slowpath_common+0x7f/0xc0
[   54.591023]  [<ffffffff810567d6>] warn_slowpath_fmt+0x46/0x50
[   54.591023]  [<ffffffff810dddf8>] watchdog_overflow_callback+0x98/0xc0
[   54.591023]  [<ffffffff8111685c>] __perf_event_overflow+0x9c/0x220
[   54.591023]  [<ffffffff81023c18>] ? x86_perf_event_set_period+0xd8/0x160
[   54.591023]  [<ffffffff81116e14>] perf_event_overflow+0x14/0x20
[   54.591023]  [<ffffffff81029405>] intel_pmu_handle_irq+0x1a5/0x310
[   54.591023]  [<ffffffff815436e1>] perf_event_nmi_handler+0x21/0x30
[   54.591023]  [<ffffffff81542f52>] do_nmi+0x132/0x3d0
[   54.591023]  [<ffffffff815424bc>] end_repeat_nmi+0x1a/0x1e
[   54.591023]  [<ffffffff81053adf>] ? __cleanup_sighand+0x2f/0x40
[   54.591023]  [<ffffffff81539b27>] ? check_tsc_sync_source+0x5c/0x14a
[   54.591023]  [<ffffffff81539b27>] ? check_tsc_sync_source+0x5c/0x14a
[   54.591023]  [<ffffffff81539b27>] ? check_tsc_sync_source+0x5c/0x14a
[   54.591023]  <<EOE>>  [<ffffffff81539274>] native_cpu_up+0x156/0x181
[   54.591023]  [<ffffffff8153ad8c>] _cpu_up+0x9c/0x10e
[   54.591023]  [<ffffffff8153ae4a>] cpu_up+0x4c/0x5c
[   54.591023]  [<ffffffff8152cd1c>] store_online+0x9c/0xd0
[   54.591023]  [<ffffffff8136d210>] dev_attr_store+0x20/0x30
[   54.591023]  [<ffffffff811e819f>] sysfs_write_file+0xef/0x170
[   54.591023]  [<ffffffff81179618>] vfs_write+0xc8/0x190
[   54.591023]  [<ffffffff811797e1>] sys_write+0x51/0x90
[   54.591023]  [<ffffffff81549c29>] system_call_fastpath+0x16/0x1b
[   54.591023] ---[ end trace d5170b988db7c86f ]---
---- cut ----

this happens when boot cpu0 give-ups on waiting for cpu3 due to cpu3 not
setting cpu_callin_mask in specified timeout. And then cpu1 is being brought
online.

 1. In this case cpu3 already incremented start_count => 1 and waiting at:

      while (atomic_read(&start_count) != cpus) /// local const int cpus = 2

 2. then cpu1 arrives, increments start_count => 2 and cpu[1,3] continue
    to boot as if everything ok.

 3. then cpu0 arrives to check_tsc_sync_source and lock-ups on

      while (atomic_read(&start_count) != cpus-1) // start_count is 2

Fix race by making boot cpu lead rendezvous process and forcing failed
cpu to return early from check_tsc_sync_target not doing clock sync.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 arch/x86/kernel/tsc_sync.c |   38 ++++++++++++++++----------------------
 1 files changed, 16 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index fc25e60..5f06138 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -25,8 +25,8 @@
  * Entry/exit counters that make sure that both CPUs
  * run the measurement code at once:
  */
-static __cpuinitdata atomic_t start_count;
-static __cpuinitdata atomic_t stop_count;
+static __cpuinitdata atomic_t start_tsc_sync;
+static __cpuinitdata atomic_t stop_tsc_sync;
 
 /*
  * We use a raw spinlock in this exceptional case, because
@@ -123,8 +123,6 @@ static inline unsigned int loop_timeout(int cpu)
  */
 void __cpuinit check_tsc_sync_source(int cpu)
 {
-	int cpus = 2;
-
 	/*
 	 * No need to check if we already know that the TSC is not
 	 * synchronized:
@@ -140,23 +138,19 @@ void __cpuinit check_tsc_sync_source(int cpu)
 	}
 
 	/*
-	 * Reset it - in case this is a second bootup:
+	 * Sygnal AP[s] that source cpu is arrived
 	 */
-	atomic_set(&stop_count, 0);
+	atomic_set(&start_tsc_sync, 1);
 
 	/*
 	 * Wait for the target to arrive:
 	 */
-	while (atomic_read(&start_count) != cpus-1)
+	while (atomic_read(&start_tsc_sync))
 		cpu_relax();
-	/*
-	 * Trigger the target to continue into the measurement too:
-	 */
-	atomic_inc(&start_count);
 
 	check_tsc_warp(loop_timeout(cpu));
 
-	while (atomic_read(&stop_count) != cpus-1)
+	while (!atomic_read(&stop_tsc_sync))
 		cpu_relax();
 
 	if (nr_warps) {
@@ -173,7 +167,6 @@ void __cpuinit check_tsc_sync_source(int cpu)
 	/*
 	 * Reset it - just in case we boot another CPU later:
 	 */
-	atomic_set(&start_count, 0);
 	nr_warps = 0;
 	max_warp = 0;
 	last_tsc = 0;
@@ -181,7 +174,7 @@ void __cpuinit check_tsc_sync_source(int cpu)
 	/*
 	 * Let the target continue with the bootup:
 	 */
-	atomic_inc(&stop_count);
+	atomic_set(&stop_tsc_sync, 0);
 }
 
 /*
@@ -189,29 +182,30 @@ void __cpuinit check_tsc_sync_source(int cpu)
  */
 void __cpuinit check_tsc_sync_target(void)
 {
-	int cpus = 2;
-
 	if (unsynchronized_tsc() || tsc_clocksource_reliable)
 		return;
 
 	/*
-	 * Register this CPU's participation and wait for the
-	 * source CPU to start the measurement:
+	 * Wait for the source CPU to start the measurement
 	 */
-	atomic_inc(&start_count);
-	while (atomic_read(&start_count) != cpus)
+	while (!atomic_read(&start_tsc_sync))
 		cpu_relax();
 
+	if (!cpumask_test_cpu(smp_processor_id(), cpu_initialized_mask))
+		return;
+
+	atomic_set(&start_tsc_sync, 0);
+
 	check_tsc_warp(loop_timeout(smp_processor_id()));
 
 	/*
 	 * Ok, we are done:
 	 */
-	atomic_inc(&stop_count);
+	atomic_set(&stop_tsc_sync, 1);
 
 	/*
 	 * Wait for the source CPU to print stuff:
 	 */
-	while (atomic_read(&stop_count) != cpus)
+	while (atomic_read(&stop_tsc_sync))
 		cpu_relax();
 }
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 3/5] Do not wait till next cpu online and abort early if lead cpu do not wait for us anymore
  2012-05-09 10:24 [PATCH 0/5] [x86]: Improve secondary CPU bring-up process robustness Igor Mammedov
                   ` (2 preceding siblings ...)
  2012-05-09 10:24 ` [PATCH 2/5] Take in account that several cpus might call check_tsc_sync_* at the same time Igor Mammedov
@ 2012-05-09 10:25 ` Igor Mammedov
  2012-05-09 10:25 ` [PATCH 4/5] Cancel secondary CPU bringup if boot cpu abandoned this effort Igor Mammedov
  2012-05-09 10:25 ` [PATCH 5/5] Do not mark cpu as not present if we failed to boot it Igor Mammedov
  5 siblings, 0 replies; 27+ messages in thread
From: Igor Mammedov @ 2012-05-09 10:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: rob, tglx, mingo, hpa, x86, luto, suresh.b.siddha, avi, imammedo,
	a.p.zijlstra, johnstul, arjan, linux-doc

Use cpu_callout_mask for checking if boot cpu is still waiting on us

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 arch/x86/kernel/tsc_sync.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index 5f06138..1741385 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -188,10 +188,13 @@ void __cpuinit check_tsc_sync_target(void)
 	/*
 	 * Wait for the source CPU to start the measurement
 	 */
-	while (!atomic_read(&start_tsc_sync))
+	while (!atomic_read(&start_tsc_sync)) {
+		if (!cpumask_test_cpu(smp_processor_id(), cpu_callout_mask))
+			return;
 		cpu_relax();
+	}
 
-	if (!cpumask_test_cpu(smp_processor_id(), cpu_initialized_mask))
+	if (!cpumask_test_cpu(smp_processor_id(), cpu_callout_mask))
 		return;
 
 	atomic_set(&start_tsc_sync, 0);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 4/5] Cancel secondary CPU bringup if boot cpu abandoned this effort
  2012-05-09 10:24 [PATCH 0/5] [x86]: Improve secondary CPU bring-up process robustness Igor Mammedov
                   ` (3 preceding siblings ...)
  2012-05-09 10:25 ` [PATCH 3/5] Do not wait till next cpu online and abort early if lead cpu do not wait for us anymore Igor Mammedov
@ 2012-05-09 10:25 ` Igor Mammedov
  2012-05-09 10:25 ` [PATCH 5/5] Do not mark cpu as not present if we failed to boot it Igor Mammedov
  5 siblings, 0 replies; 27+ messages in thread
From: Igor Mammedov @ 2012-05-09 10:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: rob, tglx, mingo, hpa, x86, luto, suresh.b.siddha, avi, imammedo,
	a.p.zijlstra, johnstul, arjan, linux-doc

Instead go to idle without setting cpu online, which results in calling play_dead()

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 arch/x86/include/asm/tsc.h |    2 +-
 arch/x86/kernel/smpboot.c  |   12 +++++++++++-
 arch/x86/kernel/tsc_sync.c |   10 ++++++----
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index c91e8b9..b2491fc 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -58,7 +58,7 @@ extern int tsc_clocksource_reliable;
  * all CPUs/cores:
  */
 extern void check_tsc_sync_source(int cpu);
-extern void check_tsc_sync_target(void);
+extern bool check_tsc_sync_target(void);
 
 extern int notsc_setup(char *);
 extern void tsc_save_sched_clock_state(void);
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index ae19d90..af63cab 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -264,7 +264,16 @@ notrace static void __cpuinit start_secondary(void *unused)
 	/*
 	 * Check TSC synchronization with the BP:
 	 */
-	check_tsc_sync_target();
+	if (!check_tsc_sync_target()) {
+		clear_local_APIC();
+
+		/* was set by cpu_init() */
+		cpumask_clear_cpu(smp_processor_id(), cpu_initialized_mask);
+		/* was set by smp_callin() */
+		cpumask_clear_cpu(smp_processor_id(), cpu_callin_mask);
+
+		goto do_idle;
+	}
 
 	notify_cpu_starting(smp_processor_id());
 
@@ -296,6 +305,7 @@ notrace static void __cpuinit start_secondary(void *unused)
 
 	x86_cpuinit.setup_percpu_clockev();
 
+do_idle:
 	wmb();
 	cpu_idle();
 }
diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index 1741385..45a593e 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -180,22 +180,22 @@ void __cpuinit check_tsc_sync_source(int cpu)
 /*
  * Freshly booted CPUs call into this:
  */
-void __cpuinit check_tsc_sync_target(void)
+bool __cpuinit check_tsc_sync_target(void)
 {
 	if (unsynchronized_tsc() || tsc_clocksource_reliable)
-		return;
+		return true;
 
 	/*
 	 * Wait for the source CPU to start the measurement
 	 */
 	while (!atomic_read(&start_tsc_sync)) {
 		if (!cpumask_test_cpu(smp_processor_id(), cpu_callout_mask))
-			return;
+			return false;
 		cpu_relax();
 	}
 
 	if (!cpumask_test_cpu(smp_processor_id(), cpu_callout_mask))
-		return;
+		return false;
 
 	atomic_set(&start_tsc_sync, 0);
 
@@ -211,4 +211,6 @@ void __cpuinit check_tsc_sync_target(void)
 	 */
 	while (atomic_read(&stop_tsc_sync))
 		cpu_relax();
+
+	return true;
 }
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 5/5] Do not mark cpu as not present if we failed to boot it
  2012-05-09 10:24 [PATCH 0/5] [x86]: Improve secondary CPU bring-up process robustness Igor Mammedov
                   ` (4 preceding siblings ...)
  2012-05-09 10:25 ` [PATCH 4/5] Cancel secondary CPU bringup if boot cpu abandoned this effort Igor Mammedov
@ 2012-05-09 10:25 ` Igor Mammedov
  5 siblings, 0 replies; 27+ messages in thread
From: Igor Mammedov @ 2012-05-09 10:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: rob, tglx, mingo, hpa, x86, luto, suresh.b.siddha, avi, imammedo,
	a.p.zijlstra, johnstul, arjan, linux-doc

It will allow to boot cpu later if possible.

v2:
Introduce failed_cpu_boots_limit cmd-line parameter.

At startup udev might try to online cpu even if it have failed to boot
first time. And udev will loop there on cpu that refuses to boot.
So disable cpu after failed_cpu_boots_limit is reached to prevent
udev spinning on onlining persistently faulty cpu.
Guest kernel on overcomitted hosts could use this parameter to set
limit to acceptable number of cpu online failures.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 Documentation/kernel-parameters.txt |    6 +++++
 arch/x86/kernel/smpboot.c           |   36 +++++++++++++++++++++++++++++++++-
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index c1601e5..6b9bbbc 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -825,6 +825,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			Format: <interval>,<probability>,<space>,<times>
 			See also Documentation/fault-injection/.
 
+	failed_cpu_boots_limit=[SMP,X86]
+			Number of tries	kernel allowed to boot not responding /
+			stuck cpu. When fail attempts are reached, kernel will
+			disable failed cpu and mark it as not present.
+			Default: 0
+
 	floppy=		[HW]
 			See Documentation/blockdev/floppy.txt.
 
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index af63cab..2d72a8a 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -136,6 +136,28 @@ EXPORT_PER_CPU_SYMBOL(cpu_info);
 
 atomic_t init_deasserted;
 
+static int failed_cpu_boots_limit = 0;
+static int cpu_boot_error_nr[NR_CPUS];
+
+static int parse_failed_cpu_boots(char *str)
+{
+	unsigned long val;
+	int err;
+
+	if (!str)
+		return -EINVAL;
+
+	err = kstrtoul(str, 0, &failed_cpu_boots_limit);
+	if (err)
+		return -EINVAL;
+
+	printk(KERN_NOTICE "Limit CPU failed boot attempts: %d\n",
+			failed_cpu_boots_limit);
+
+	return 0;
+}
+__setup("failed_cpu_boots_limit=", parse_failed_cpu_boots);
+
 /*
  * Report back to the Boot Processor.
  * Running on AP.
@@ -810,8 +832,18 @@ do_rest:
 		/* was set by cpu_init() */
 		cpumask_clear_cpu(cpu, cpu_initialized_mask);
 
-		set_cpu_present(cpu, false);
-		per_cpu(x86_cpu_to_apicid, cpu) = BAD_APICID;
+		/* was set by smp_callin() */
+		cpumask_clear_cpu(cpu, cpu_callin_mask);
+
+		/* disable CPU if it's failed to boot N times in a row */
+		if (cpu_boot_error_nr[cpu]++ > failed_cpu_boots_limit) {
+			set_cpu_present(cpu, false);
+			per_cpu(x86_cpu_to_apicid, cpu) = BAD_APICID;
+			pr_err("CPU%d: repeatedly fails to boot, disabling.\n",
+				cpu);
+		}
+	} else {
+		cpu_boot_error_nr[cpu] = 0;
 	}
 
 	/* mark "stuck" area as not stuck */
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/5] [x86]: Improve secondary CPU bring-up process robustness
  2012-05-09  9:19 ` Peter Zijlstra
@ 2012-05-09 12:29   ` Igor Mammedov
  2012-05-09 13:12   ` Ingo Molnar
  1 sibling, 0 replies; 27+ messages in thread
From: Igor Mammedov @ 2012-05-09 12:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, rob, tglx, mingo, hpa, x86, luto, suresh.b.siddha,
	avi, johnstul, arjan, linux-doc

On 05/09/2012 11:19 AM, Peter Zijlstra wrote:
> On Wed, 2012-05-09 at 12:24 +0200, Igor Mammedov wrote:
>> Target audience for this patches is mostly virt. environments, where
>> physical CPUs are shared beetween many guests and on overcommited
>> host it can uncover different race conditions during secondary CPU
>> bring-up.
>
> The good news is that you're working on this, the bad news is that all
> this code is slated for the scrap heap :-)
>
> Thomas is currently in the process of doing a massive overhaul of the
> hotplug code, included in that would be the stuff you're touching.

If Thomas' rewrite is progressed well and could be completed for 3.5 then
there is no big harm in throwing this patches away. However if it's not,
it might have sense to apply these patches in 3.5 devel cycle.

Also massive rewrite would be unlikely backport candidate to stable 3.x trees,
and some of these patches might be considered as such ones.

> Every architecture does this hand-shake differently and probably buggy,
> all that needs to move into generic code. The only bits needed in the
> arch code are the cpu wakeup and initial trampoline, the rest should be
> generic.
>
> I'm not quite sure how far along he is, but it would be awesome if you
> could help him out somehow.

Sure, I could lend a hand as minimum in testing and maybe some rewriting
too if Thomas can give some part if so that we do not conflict on this.

PS:
There is still a couple hangs in 3.4-rc4+:

one looks like kvm host related, hangs when writing into apic register:
#0  native_apic_mem_write (reg=768, v=<value optimized out>) at /builds/imammedo/linux-2.6/arch/x86/include/asm/apic.h:107
#1  0xffffffff81034749 in apic_write (low=50432, id=<value optimized out>) at /builds/imammedo/linux-2.6/arch/x86/include/asm/apic.h:426
#2  native_apic_icr_write (low=50432, id=<value optimized out>) at arch/x86/kernel/apic/apic.c:273
#3  0xffffffff815a78fb in apic_icr_write (apicid=2, cpu=2) at /builds/imammedo/linux-2.6/arch/x86/include/asm/apic.h:436
#4  wakeup_secondary_cpu_via_init (apicid=2, cpu=2) at arch/x86/kernel/smpboot.c:563
#5  do_boot_cpu (apicid=2, cpu=2) at arch/x86/kernel/smpboot.c:782


And another one cannot be helped: RHBZ 816899 comment 7
https://bugzilla.redhat.com/show_bug.cgi?id=816899#c7

-- 
-----
  Igor

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/5] [x86]: Improve secondary CPU bring-up process robustness
  2012-05-09  9:19 ` Peter Zijlstra
  2012-05-09 12:29   ` Igor Mammedov
@ 2012-05-09 13:12   ` Ingo Molnar
  2012-05-10 17:31     ` Rob Landley
  1 sibling, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2012-05-09 13:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Igor Mammedov, linux-kernel, rob, tglx, mingo, hpa, x86, luto,
	suresh.b.siddha, avi, johnstul, arjan, linux-doc


* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> I'm not quite sure how far along he is, but it would be 
> awesome if you could help him out somehow.

bits of Thomas's rework are in latest -tip, the tip:smp/hotplug 
branch, also merged into tip:master and soon to linux-next.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/5] Fix soft-lookup in stop machine on secondary cpu bring up
  2012-05-09 10:24 ` [PATCH 1/5] Fix soft-lookup in stop machine on secondary cpu bring up Igor Mammedov
@ 2012-05-09 15:04   ` Shuah Khan
  2012-05-09 15:22     ` Igor Mammedov
  2012-05-11 11:45   ` Thomas Gleixner
  1 sibling, 1 reply; 27+ messages in thread
From: Shuah Khan @ 2012-05-09 15:04 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: shuahkhan, linux-kernel, rob, tglx, mingo, hpa, x86, luto,
	suresh.b.siddha, avi, a.p.zijlstra, johnstul, arjan, linux-doc

On Wed, 2012-05-09 at 12:24 +0200, Igor Mammedov wrote:
> When bringing up cpuX1, it could stall in start_secondary
> before setting cpu_callin_mask for more than 5 sec. That forces
> do_boot_cpu() to give up on waiting and go to error return path
> printing messages:
>   pr_err("CPU%d: Stuck ??\n", cpuX1);

I am seeing this with the linux-next May 7th build on my laptop HP
EliteBook 8440p during boot. Could this problem be not specific to virt
envs.? Anybody else seeing it?

-- Shuah



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/5] Fix soft-lookup in stop machine on secondary cpu bring up
  2012-05-09 15:04   ` Shuah Khan
@ 2012-05-09 15:22     ` Igor Mammedov
  2012-05-09 15:34       ` Shuah Khan
  0 siblings, 1 reply; 27+ messages in thread
From: Igor Mammedov @ 2012-05-09 15:22 UTC (permalink / raw)
  To: shuahkhan
  Cc: linux-kernel, rob, tglx, mingo, hpa, x86, luto, suresh.b.siddha,
	avi, a.p.zijlstra, johnstul, arjan, linux-doc

On 05/09/2012 05:04 PM, Shuah Khan wrote:
> On Wed, 2012-05-09 at 12:24 +0200, Igor Mammedov wrote:
>> When bringing up cpuX1, it could stall in start_secondary
>> before setting cpu_callin_mask for more than 5 sec. That forces
>> do_boot_cpu() to give up on waiting and go to error return path
>> printing messages:
>>    pr_err("CPU%d: Stuck ??\n", cpuX1);
>
> I am seeing this with the linux-next May 7th build on my laptop HP
> EliteBook 8440p during boot. Could this problem be not specific to virt
> envs.? Anybody else seeing it?

I could only guess that on bare metal SMI or other firmware issue may interfere with AP boot.
Could you test if this patch-set fixes issue for you?
And do you see the same problem during suspend/resume (assuming that it's booted without problem)?

-- 
-----
  Igor

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/5] Fix soft-lookup in stop machine on secondary cpu bring up
  2012-05-09 15:22     ` Igor Mammedov
@ 2012-05-09 15:34       ` Shuah Khan
  2012-05-10 15:26         ` Shuah Khan
  0 siblings, 1 reply; 27+ messages in thread
From: Shuah Khan @ 2012-05-09 15:34 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: shuahkhan, linux-kernel, rob, tglx, mingo, hpa, x86, luto,
	suresh.b.siddha, avi, a.p.zijlstra, johnstul, arjan, linux-doc

On Wed, 2012-05-09 at 17:22 +0200, Igor Mammedov wrote:
> On 05/09/2012 05:04 PM, Shuah Khan wrote:
> > On Wed, 2012-05-09 at 12:24 +0200, Igor Mammedov wrote:
> >> When bringing up cpuX1, it could stall in start_secondary
> >> before setting cpu_callin_mask for more than 5 sec. That forces
> >> do_boot_cpu() to give up on waiting and go to error return path
> >> printing messages:
> >>    pr_err("CPU%d: Stuck ??\n", cpuX1);
> >
> > I am seeing this with the linux-next May 7th build on my laptop HP
> > EliteBook 8440p during boot. Could this problem be not specific to virt
> > envs.? Anybody else seeing it?
> 
> I could only guess that on bare metal SMI or other firmware issue may interfere with AP boot.
> Could you test if this patch-set fixes issue for you?
> And do you see the same problem during suspend/resume (assuming that it's booted without problem)?
> 

I had to abandon the boot and go back to distro installed kernel - yes I
will test with your patch set and see if the problem goes away. Might
not happen until close to end of day today, but will report back.

-- Shuah



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/5] Fix soft-lookup in stop machine on secondary cpu bring up
  2012-05-09 15:34       ` Shuah Khan
@ 2012-05-10 15:26         ` Shuah Khan
  2012-05-10 16:29           ` Igor Mammedov
  0 siblings, 1 reply; 27+ messages in thread
From: Shuah Khan @ 2012-05-10 15:26 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: shuahkhan, linux-kernel, rob, tglx, mingo, hpa, x86, luto,
	suresh.b.siddha, avi, a.p.zijlstra, johnstul, arjan, linux-doc


> > I could only guess that on bare metal SMI or other firmware issue may interfere with AP boot.
> > Could you test if this patch-set fixes issue for you?
> > And do you see the same problem during suspend/resume (assuming that it's booted without problem)?
> > 
> 
> I had to abandon the boot and go back to distro installed kernel - yes I
> will test with your patch set and see if the problem goes away. Might
> not happen until close to end of day today, but will report back.

These patches failed to apply on linux-next May 9th. Couldn't get the
testing done as planned.

-- Shuah


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/5] Fix soft-lookup in stop machine on secondary cpu bring up
  2012-05-10 15:26         ` Shuah Khan
@ 2012-05-10 16:29           ` Igor Mammedov
  2012-05-10 16:38             ` Shuah Khan
  0 siblings, 1 reply; 27+ messages in thread
From: Igor Mammedov @ 2012-05-10 16:29 UTC (permalink / raw)
  To: shuahkhan
  Cc: linux-kernel, rob, tglx, mingo, hpa, x86, luto, suresh b siddha,
	avi, a p zijlstra, johnstul, arjan, linux-doc

just cloned git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
last commit 0bed306

All 5 patches applied cleanly, maybe we are talking about different linux-next trees?


----- Original Message -----
> From: "Shuah Khan" <shuahkhan@gmail.com>
> To: "Igor Mammedov" <imammedo@redhat.com>
> Cc: shuahkhan@gmail.com, linux-kernel@vger.kernel.org, rob@landley.net, tglx@linutronix.de, mingo@redhat.com,
> hpa@zytor.com, x86@kernel.org, luto@mit.edu, "suresh b siddha" <suresh.b.siddha@intel.com>, avi@redhat.com, "a p
> zijlstra" <a.p.zijlstra@chello.nl>, johnstul@us.ibm.com, arjan@linux.intel.com, linux-doc@vger.kernel.org
> Sent: Thursday, May 10, 2012 5:26:24 PM
> Subject: Re: [PATCH 1/5] Fix soft-lookup in stop machine on secondary cpu bring up
> 
> 
> > > I could only guess that on bare metal SMI or other firmware issue
> > > may interfere with AP boot.
> > > Could you test if this patch-set fixes issue for you?
> > > And do you see the same problem during suspend/resume (assuming
> > > that it's booted without problem)?
> > > 
> > 
> > I had to abandon the boot and go back to distro installed kernel -
> > yes I
> > will test with your patch set and see if the problem goes away.
> > Might
> > not happen until close to end of day today, but will report back.
> 
> These patches failed to apply on linux-next May 9th. Couldn't get the
> testing done as planned.
> 
> -- Shuah
> 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/5] Fix soft-lookup in stop machine on secondary cpu bring up
  2012-05-10 16:29           ` Igor Mammedov
@ 2012-05-10 16:38             ` Shuah Khan
  0 siblings, 0 replies; 27+ messages in thread
From: Shuah Khan @ 2012-05-10 16:38 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: shuahkhan, linux-kernel, rob, tglx, mingo, hpa, x86, luto,
	suresh b siddha, avi, a p zijlstra, johnstul, arjan, linux-doc

On Thu, 2012-05-10 at 12:29 -0400, Igor Mammedov wrote:
> just cloned git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> last commit 0bed306
> 
> All 5 patches applied cleanly, maybe we are talking about different linux-next trees?

The same git from May 8th. The commit id I have is
407655a15be465ca284a68843e66c6fe7decf4bc. Let me try again.

-- Shuah


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/5] [x86]: Improve secondary CPU bring-up process robustness
  2012-05-09 13:12   ` Ingo Molnar
@ 2012-05-10 17:31     ` Rob Landley
  2012-05-10 17:39       ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Rob Landley @ 2012-05-10 17:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Igor Mammedov, linux-kernel, tglx, mingo, hpa,
	x86, luto, suresh.b.siddha, avi, johnstul, arjan, linux-doc

On 05/09/2012 08:12 AM, Ingo Molnar wrote:
> 
> * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> 
>> I'm not quite sure how far along he is, but it would be 
>> awesome if you could help him out somehow.
> 
> bits of Thomas's rework are in latest -tip, the tip:smp/hotplug 
> branch, also merged into tip:master and soon to linux-next.

Um, I missed a curve here: I know what linux-next is, and I know what
staging is (although I'm a bit confused about how those two relate), but
what's tip?  (Do you mean linux's tree?)

I checked applying-patches.txt and HOWTO in case I missed something, and
both are are hideously stale.  (For one thing, both still say 2.6
everywhere.  I guess this is my problem now, I'll try a fixup pass this
weekend.)

In the meantime: what do you mean by tip?

> Thanks,
> 
> 	Ingo
> 

Rob
-- 
GNU/Linux isn't: Linux=GPLv2, GNU=GPLv3+, they can't share code.
Either it's "mere aggregation", or a license violation.  Pick one.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/5] [x86]: Improve secondary CPU bring-up process robustness
  2012-05-10 17:31     ` Rob Landley
@ 2012-05-10 17:39       ` Peter Zijlstra
  0 siblings, 0 replies; 27+ messages in thread
From: Peter Zijlstra @ 2012-05-10 17:39 UTC (permalink / raw)
  To: Rob Landley
  Cc: Ingo Molnar, Igor Mammedov, linux-kernel, tglx, mingo, hpa, x86,
	luto, suresh.b.siddha, avi, johnstul, arjan, linux-doc

On Thu, 2012-05-10 at 12:31 -0500, Rob Landley wrote:
> In the meantime: what do you mean by tip?

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git

This tree contains x86,sched,lockdep,tracing,irq,timers,timekeeping and
other assorted stuff.




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/5] Fix soft-lookup in stop machine on secondary cpu bring up
  2012-05-09 10:24 ` [PATCH 1/5] Fix soft-lookup in stop machine on secondary cpu bring up Igor Mammedov
  2012-05-09 15:04   ` Shuah Khan
@ 2012-05-11 11:45   ` Thomas Gleixner
  2012-05-11 15:16     ` Igor Mammedov
  1 sibling, 1 reply; 27+ messages in thread
From: Thomas Gleixner @ 2012-05-11 11:45 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: linux-kernel, rob, mingo, hpa, x86, luto, suresh.b.siddha, avi,
	a.p.zijlstra, johnstul, arjan, linux-doc

On Wed, 9 May 2012, Igor Mammedov wrote:

> When bringing up cpuX1, it could stall in start_secondary
> before setting cpu_callin_mask for more than 5 sec. That forces
> do_boot_cpu() to give up on waiting and go to error return path
> printing messages:
>   pr_err("CPU%d: Stuck ??\n", cpuX1);
> or
>   pr_err("CPU%d: Not responding.\n", cpuX1);
> and native_cpu_up returns early with -EIO. However AP may continue
> its boot process till it reaches check_tsc_sync_target(), where
> it will wait for boot cpu to run cpu_up...=>check_tsc_sync_source.
> That will never happen since cpu_up have returned with error before.
> 
> Now we need to note that cpuX1 is marked as active in smp_callin
> before it stuck in check_tsc_sync_target. And when another cpuX2
> is being onlined, start_secondary on it will call
>   smp_callin
>     -> smp_store_cpu_info
>       -> identify_secondary_cpu
>         -> mtrr_ap_init
>           -> set_mtrr_from_inactive_cpu
>             -> stop_machine_from_inactive_cpu
> where it's going to schedule stop_machine work on all ACTIVE cpus
>   smdata.num_threads = num_active_cpus() + 1;
> and wait till they all complete it before continuing. As was noted
> before cpuX1 was marked as active but can't execute any work since
> it's not completed initialization and stuck in check_tsc_sync_target.
> As result system soft lockups in stop_machine_cpu_stop.
> 
> backtrace from reproducer:
> 
> PID: 3324   TASK: ffff88007c00ae20  CPU: other cpus   COMMAND: "migration/1"
>     [exception RIP: stop_machine_cpu_stop+131]
> ...
>  #0 [ffff88007b4d7de8] cpu_stopper_thread at ffffffff810c66bd
>  #1 [ffff88007b4d7ee8] kthread at ffffffff8107871e
>  #2 [ffff88007b4d7f48] kernel_thread_helper at ffffffff8154af24
> 
> PID: 0      TASK: ffff88007c029710  CPU: 2   COMMAND: "swapper/2"
>     [exception RIP: check_tsc_sync_target+33]
> ...
>  #0 [ffff88007c025f30] start_secondary at ffffffff81539876
> 
> PID: 0      TASK: ffff88007c041710  CPU: 3   COMMAND: "swapper/3"
>     [exception RIP: stop_machine_cpu_stop+131]
> ...
>  #0 [ffff88007c04be50] stop_machine_from_inactive_cpu at ffffffff810c6b2f
>  #1 [ffff88007c04bee0] mtrr_ap_init at ffffffff8102e963
>  #2 [ffff88007c04bf10] identify_secondary_cpu at ffffffff81536799
>  #3 [ffff88007c04bf20] smp_store_cpu_info at ffffffff815396d5
>  #4 [ffff88007c04bf30] start_secondary at ffffffff81539800
> 
> Could be fixed by not marking being onlined cpu as active too early.

This explanation is completely useless. What's fixed by what. And is
it fixed or could it be fixed?

This also want's an explanation why moving the cpu_starting notifier
does not hurt any assumptions of the code which has notifiers
registered for CPU_STARTING. In fact your change can result in
CPU_ONLINE notifier being called _BEFORE_ CPU_STARTING. Do you really
think that's correct?

Aside of that your whole patch series tackles the wrong aspect.
 
Why the heck do you need extra magic in check_tsc_sync_target() ?

If the booting CPU fails to set the callin map within 5 seconds then
it should not even reach check_tsc_sync_target() at all.

And just for the record, the new CPU can run into the very same
timeout problem, when the boot CPU fails to set the callout mask.

This whole stuff is a complete trainwreck already and I don't want to
see anything like your "fixing the symptoms" hackery near it, really.

This whole stuff needs a proper rewrite and not some more braindamaged
bandaids. And if we apply bandaids for the time being, then certainly
not bandaids like the mess you created.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/5] Fix soft-lookup in stop machine on secondary cpu bring up
  2012-05-11 11:45   ` Thomas Gleixner
@ 2012-05-11 15:16     ` Igor Mammedov
  2012-05-11 21:14       ` Thomas Gleixner
  0 siblings, 1 reply; 27+ messages in thread
From: Igor Mammedov @ 2012-05-11 15:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, rob, mingo, hpa, x86, luto, suresh.b.siddha, avi,
	a.p.zijlstra, johnstul, arjan, linux-doc

On 05/11/2012 01:45 PM, Thomas Gleixner wrote:
> On Wed, 9 May 2012, Igor Mammedov wrote:
>
>> When bringing up cpuX1, it could stall in start_secondary
>> before setting cpu_callin_mask for more than 5 sec. That forces
>> do_boot_cpu() to give up on waiting and go to error return path
>> printing messages:
>>    pr_err("CPU%d: Stuck ??\n", cpuX1);
>> or
>>    pr_err("CPU%d: Not responding.\n", cpuX1);
>> and native_cpu_up returns early with -EIO. However AP may continue
>> its boot process till it reaches check_tsc_sync_target(), where
>> it will wait for boot cpu to run cpu_up...=>check_tsc_sync_source.
>> That will never happen since cpu_up have returned with error before.
>>
>> Now we need to note that cpuX1 is marked as active in smp_callin
>> before it stuck in check_tsc_sync_target. And when another cpuX2
>> is being onlined, start_secondary on it will call
>>    smp_callin
>>      ->  smp_store_cpu_info
>>        ->  identify_secondary_cpu
>>          ->  mtrr_ap_init
>>            ->  set_mtrr_from_inactive_cpu
>>              ->  stop_machine_from_inactive_cpu
>> where it's going to schedule stop_machine work on all ACTIVE cpus
>>    smdata.num_threads = num_active_cpus() + 1;
>> and wait till they all complete it before continuing. As was noted
>> before cpuX1 was marked as active but can't execute any work since
>> it's not completed initialization and stuck in check_tsc_sync_target.
>> As result system soft lockups in stop_machine_cpu_stop.
>>
>> backtrace from reproducer:
>>
>> PID: 3324   TASK: ffff88007c00ae20  CPU: other cpus   COMMAND: "migration/1"
>>      [exception RIP: stop_machine_cpu_stop+131]
>> ...
>>   #0 [ffff88007b4d7de8] cpu_stopper_thread at ffffffff810c66bd
>>   #1 [ffff88007b4d7ee8] kthread at ffffffff8107871e
>>   #2 [ffff88007b4d7f48] kernel_thread_helper at ffffffff8154af24
>>
>> PID: 0      TASK: ffff88007c029710  CPU: 2   COMMAND: "swapper/2"
>>      [exception RIP: check_tsc_sync_target+33]
>> ...
>>   #0 [ffff88007c025f30] start_secondary at ffffffff81539876
>>
>> PID: 0      TASK: ffff88007c041710  CPU: 3   COMMAND: "swapper/3"
>>      [exception RIP: stop_machine_cpu_stop+131]
>> ...
>>   #0 [ffff88007c04be50] stop_machine_from_inactive_cpu at ffffffff810c6b2f
>>   #1 [ffff88007c04bee0] mtrr_ap_init at ffffffff8102e963
>>   #2 [ffff88007c04bf10] identify_secondary_cpu at ffffffff81536799
>>   #3 [ffff88007c04bf20] smp_store_cpu_info at ffffffff815396d5
>>   #4 [ffff88007c04bf30] start_secondary at ffffffff81539800
>>
>> Could be fixed by not marking being onlined cpu as active too early.
>
> This explanation is completely useless. What's fixed by what. And is
> it fixed or could it be fixed?
What's fixed:
  above mentioned hang in stop_machine_from_inactive_cpu() because even if
  a cpu failed to set cpu_callin_mask in time and boot cpu marked it as
  not present + removed from some maps, with this move, a failed cpu won't
  set cpu_active_mask before it completes check_tsc_sync_target().
  And with patches [2,3,4]/5 it will not set cpu_active_mask at all
  so making itself unavailable to the rest of kernel.

> This also want's an explanation why moving the cpu_starting notifier
> does not hurt any assumptions of the code which has notifiers
> registered for CPU_STARTING.
I've checked in kernel users [sched, kvm, pmu] before moving it here.
It looked safe. However I might have missed something.

>In fact your change can result in
> CPU_ONLINE notifier being called _BEFORE_ CPU_STARTING. Do you really
> think that's correct?
That's certainly is not correct, it asks for a barrier after
cpu_starting and before setting cpu_online_mask.

> Aside of that your whole patch series tackles the wrong aspect.
patch series tries to prevent a failed to boot cpu wreck havoc on
running kernel. How wrong is that?
What should be fixed instead?

> Why the heck do you need extra magic in check_tsc_sync_target() ?
Because it's plainly racy. patch 2/5 describes/fixes race condition in
check_tsc_sync_target().

> If the booting CPU fails to set the callin map within 5 seconds then
> it should not even reach check_tsc_sync_target() at all.
Why it shouldn't reach check_tsc_sync_target () at all. There is nothing
that prevents it and guaranties such behavior.
For example: it happens when kernel is running inside guest on overloaded host.
And it seems on baremetal as well: https://lkml.org/lkml/2012/5/9/336

> And just for the record, the new CPU can run into the very same
> timeout problem, when the boot CPU fails to set the callout mask.
Yes, it can.
I've tried to fix only what was reproducible on my test system, so I
haven't touched this.
That might result in panic in smp_callin():
    panic("%s: CPU%d started up but did not get a callout!\n"

> This whole stuff is a complete trainwreck already and I don't want to
> see anything like your "fixing the symptoms" hackery near it, really.
Fixing slow to respond cpu might be not option, so we need to gracefully
abort failed cpu_online operation instead of hanging in stop_machine or
crashing in scheduler[https://lkml.org/lkml/2012/5/9/137].

>
> This whole stuff needs a proper rewrite and not some more braindamaged
> bandaids. And if we apply bandaids for the time being, then certainly
> not bandaids like the mess you created.
Rewrite will need to deal with failed to boot in time cpu as well.
So if rewrite is not near completion, then maybe for a time being bandaids
would be needed.
Any ideas/suggestions for "right bandaids" instead of braindamaged ones?

> Thanks,
>
> 	tglx


-- 
-----
  Thanks,
    Igor

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/5] Fix soft-lookup in stop machine on secondary cpu bring up
  2012-05-11 15:16     ` Igor Mammedov
@ 2012-05-11 21:14       ` Thomas Gleixner
  2012-05-12 19:32         ` [RFC] [x86]: abort secondary cpu bringup gracefully Igor Mammedov
  0 siblings, 1 reply; 27+ messages in thread
From: Thomas Gleixner @ 2012-05-11 21:14 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: linux-kernel, rob, mingo, hpa, x86, luto, suresh.b.siddha, avi,
	a.p.zijlstra, johnstul, arjan, linux-doc

On Fri, 11 May 2012, Igor Mammedov wrote:
> On 05/11/2012 01:45 PM, Thomas Gleixner wrote:
> > In fact your change can result in
> > CPU_ONLINE notifier being called _BEFORE_ CPU_STARTING. Do you really
> > think that's correct?
> 
> That's certainly is not correct, it asks for a barrier after
> cpu_starting and before setting cpu_online_mask.

Ah, a barrier will solve that. Interesting approach.

> > Aside of that your whole patch series tackles the wrong aspect.
> 
> patch series tries to prevent a failed to boot cpu wreck havoc on
> running kernel. How wrong is that?

Emphasis on "tries". And as I explained before you are fixing stuff at
the wrong place.

> What should be fixed instead?

If that timeout happens, then prevent that the following code is
reached, perhaps ?
 
> > Why the heck do you need extra magic in check_tsc_sync_target() ?
> Because it's plainly racy. patch 2/5 describes/fixes race condition in
> check_tsc_sync_target().

Crap. You do not understand at all. If the code which is before
check_tsc_sync_target() is failing then check_tsc_sync_target() should
not be called at all. It's that simple. Putting weird ass checks into
that code is simply the wrong solution.
 
> > If the booting CPU fails to set the callin map within 5 seconds then
> > it should not even reach check_tsc_sync_target() at all.
> 
> Why it shouldn't reach check_tsc_sync_target () at all. There is nothing
> that prevents it and guaranties such behavior.

That's the whole fcking point. The code is missing which prevents that
and instead of hacking that crap into check_tsc_sync_target() we need
to add that what's missing.

> > And just for the record, the new CPU can run into the v()ery same
> > timeout problem, when the boot CPU fails to set the callout mask.
> 
> Yes, it can.
> I've tried to fix only what was reproducible on my test system, so I
> haven't touched this.
> That might result in panic in smp_callin():
>    panic("%s: CPU%d started up but did not get a callout!\n"

I know that. And it's fucking wrong and I don't care whether you are
only fixing what's reproducible on your test system. If we touch that
code for that purpose then we better touch it so it's correct in all
aspects and not in some "this fixes my esoteric problem" approach.

> > This whole stuff is a complete trainwreck already and I don't want to
> > see anything like your "fixing the symptoms" hackery near it, really.
> 
> Fixing slow to respond cpu might be not option, so we need to gracefully
> abort failed cpu_online operation instead of hanging in stop_machine or
> crashing in scheduler[https://lkml.org/lkml/2012/5/9/137].

I'm tired of your symptom links. You are simply not understanding the
scope of the problem and you just try to fix it so your testing
failures go away.
 
> > This whole stuff needs a proper rewrite and not some more braindamaged
> > bandaids. And if we apply bandaids for the time being, then certainly
> > not bandaids like the mess you created.
>
> Rewrite will need to deal with failed to boot in time cpu as well.

Really? Thanks for the hint, didn't know that.

> So if rewrite is not near completion, then maybe for a time being bandaids
> would be needed.

As I said, I don't object against proper bandaids, but I object
against the hackery you provided.

> Any ideas/suggestions for "right bandaids" instead of braindamaged ones?

Maybe start to think about my answers instead of blindly repeating
your observations of symptoms and praising your symptom cures.

Even in bandaid mode we can fix that behaviour by putting proper
synchronization into the right points, instead of hacking weird
failure handling into code which should never be affected by that.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] [x86]: abort secondary cpu bringup gracefully
  2012-05-12 19:32         ` [RFC] [x86]: abort secondary cpu bringup gracefully Igor Mammedov
@ 2012-05-12 17:39           ` Peter Zijlstra
  2012-05-12 18:51             ` Igor Mammedov
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2012-05-12 17:39 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: linux-kernel, rob, mingo, hpa, x86, luto, suresh.b.siddha, avi,
	johnstul, arjan

On Sat, 2012-05-12 at 21:32 +0200, Igor Mammedov wrote:

> @@ -232,12 +233,36 @@ static void __cpuinit smp_callin(void)
>  	set_cpu_sibling_map(raw_smp_processor_id());
>  	wmb();
>  
> -	notify_cpu_starting(cpuid);
> -
>  	/*
>  	 * Allow the master to continue.
>  	 */
>  	cpumask_set_cpu(cpuid, cpu_callin_mask);
> +
> +	/*
> +	 * Wait for master to continue.
> +	 */
> +	for (timeout = 0; timeout < 50000; timeout++) {
> +		if (cpumask_test_cpu(cpuid, cpu_may_complete_boot_mask))
> +			break;
> +
> +		if (!cpumask_test_cpu(cpuid, cpu_callout_mask))
> +			break;
> +
> +		udelay(100);
> +	}
> +
> +	if (!cpumask_test_cpu(cpuid, cpu_may_complete_boot_mask))
> +		goto die;
> +
> +	notify_cpu_starting(cpuid);

Its absolutely broken to call CPU_STARTING after the master cpu is told
to continue. Once it returns from cpu_up() it assumes the secondary is
completely initialized and ready to run.

> +	return;
> +
> +die:

You've forgotten to clean up the bits set by set_cpu_sibling_map().

> +	/* was set by cpu_init() */
> +	cpumask_clear_cpu(smp_processor_id(), cpu_initialized_mask);
> +	cpumask_clear_cpu(smp_processor_id(), cpu_callin_mask);
> +	clear_local_APIC();
> +	play_dead();
>  }
>  
>  /*
> @@ -774,6 +799,8 @@ do_rest:
>  		}
>  
>  		if (cpumask_test_cpu(cpu, cpu_callin_mask)) {
> +			/* Signal AP that it may continue to boot */
> +			cpumask_set_cpu(cpu, cpu_may_complete_boot_mask);
>  			print_cpu_msr(&cpu_data(cpu));
>  			pr_debug("CPU%d: has booted.\n", cpu);
>  		} else {
> @@ -1250,6 +1277,7 @@ static void __ref remove_cpu_from_maps(int cpu)
>  	cpumask_clear_cpu(cpu, cpu_callin_mask);
>  	/* was set by cpu_init() */
>  	cpumask_clear_cpu(cpu, cpu_initialized_mask);
> +	cpumask_clear_cpu(cpu, cpu_may_complete_boot_mask);
>  	numa_remove_cpu(cpu);
>  }
>  


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] [x86]: abort secondary cpu bringup gracefully
  2012-05-12 17:39           ` Peter Zijlstra
@ 2012-05-12 18:51             ` Igor Mammedov
  2012-05-14 11:09               ` [RFC v2] " Igor Mammedov
  0 siblings, 1 reply; 27+ messages in thread
From: Igor Mammedov @ 2012-05-12 18:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, rob, mingo, hpa, x86, luto, suresh b siddha, avi,
	johnstul, arjan, Igor Mammedov

----- Original Message -----
> From: "Peter Zijlstra" <a.p.zijlstra@chello.nl>
> To: "Igor Mammedov" <imammedo@redhat.com>
> On Sat, 2012-05-12 at 21:32 +0200, Igor Mammedov wrote:
> 
> > @@ -232,12 +233,36 @@ static void __cpuinit smp_callin(void)
> >  	set_cpu_sibling_map(raw_smp_processor_id());
> >  	wmb();
> >  
> > -	notify_cpu_starting(cpuid);
> > -
> >  	/*
> >  	 * Allow the master to continue.
> >  	 */
> >  	cpumask_set_cpu(cpuid, cpu_callin_mask);
> > +
> > +	/*
> > +	 * Wait for master to continue.
> > +	 */
> > +	for (timeout = 0; timeout < 50000; timeout++) {
> > +		if (cpumask_test_cpu(cpuid, cpu_may_complete_boot_mask))
> > +			break;
> > +
> > +		if (!cpumask_test_cpu(cpuid, cpu_callout_mask))
> > +			break;
> > +
> > +		udelay(100);
> > +	}
> > +
> > +	if (!cpumask_test_cpu(cpuid, cpu_may_complete_boot_mask))
> > +		goto die;
> > +
> > +	notify_cpu_starting(cpuid);
> 
> Its absolutely broken to call CPU_STARTING after the master cpu is
> told
> to continue. Once it returns from cpu_up() it assumes the secondary
> is
> completely initialized and ready to run.
Wouldn't master cpu stop in  native_cpu_up() and wait till AP will set cpu_online_mask?

        local_irq_save(flags);
        check_tsc_sync_source(cpu);
        local_irq_restore(flags);

        while (!cpu_online(cpu)) {
                cpu_rela);
                touch_nmi_watchdog();
        }

So it shouldn't do anything till AP is online.

> 
> > +	return;
> > +
> > +die:
> 
> You've forgotten to clean up the bits set by set_cpu_sibling_map().
Thanks, I'll fix it.

> 
> > +	/* was set by cpu_init() */
> > +	cpumask_clear_cpu(smp_processor_id(), cpu_initialized_mask);
> > +	cpumask_clear_cpu(smp_processor_id(), cpu_callin_mask);
> > +	clear_local_APIC();
> > +	play_dead();
> >  }
> >  
> >  /*
> > @@ -774,6 +799,8 @@ do_rest:
> >  		}
> >  
> >  		if (cpumask_test_cpu(cpu, cpu_callin_mask)) {
> > +			/* Signal AP that it may continue to boot */
> > +			cpumask_set_cpu(cpu, cpu_may_complete_boot_mask);
> >  			print_cpu_msr(&cpu_data(cpu));
> >  			pr_debug("CPU%d: has booted.\n", cpu);
> >  		} else {
> > @@ -1250,6 +1277,7 @@ static void __ref remove_cpu_from_maps(int
> > cpu)
> >  	cpumask_clear_cpu(cpu, cpu_callin_mask);
> >  	/* was set by cpu_init() */
> >  	cpumask_clear_cpu(cpu, cpu_initialized_mask);
> > +	cpumask_clear_cpu(cpu, cpu_may_complete_boot_mask);
> >  	numa_remove_cpu(cpu);
> >  }
> >  
> 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC] [x86]: abort secondary cpu bringup gracefully
  2012-05-11 21:14       ` Thomas Gleixner
@ 2012-05-12 19:32         ` Igor Mammedov
  2012-05-12 17:39           ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Igor Mammedov @ 2012-05-12 19:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: rob, mingo, hpa, x86, luto, suresh.b.siddha, avi, a.p.zijlstra,
	johnstul, arjan


Thomas,
  Is this what you've meant?

Master cpu may timeout before cpu_callin_mask is set and decide to
abort cpu boot but being onlined cpu will continue to boot, set
cpu_active_mask and wait in check_tsc_sync_target() for master cpu
to arrive, that will never happen because master cpu aborted boot
proccess. Following attempt to online next cpu will hang in
stop_machine because it will wait on comletion of stop_work on all
cpus from cpu_active_mask and that will never happen because first
failed cpu spins in check_tsc_sync_target().

Introduce cpu_may_complete_boot_mask which will be set by master cpu
if it goes via normal boot path and decides to continue cpu bring up.
Being onlined cpu will continue to boot only if master cpu confirms
via cpu_may_complete_boot_mask its intention not to abort cpu bring up.
Otherwise being onlined cpu will gracefully die.

In addition if being onlined cpu timed-out waiting on cpu_callout_mask,
it should not panic but rather die.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 arch/x86/include/asm/cpumask.h |    1 +
 arch/x86/kernel/cpu/common.c   |    2 ++
 arch/x86/kernel/smpboot.c      |   34 +++++++++++++++++++++++++++++++---
 3 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/cpumask.h b/arch/x86/include/asm/cpumask.h
index 61c852f..eacd269 100644
--- a/arch/x86/include/asm/cpumask.h
+++ b/arch/x86/include/asm/cpumask.h
@@ -7,6 +7,7 @@ extern cpumask_var_t cpu_callin_mask;
 extern cpumask_var_t cpu_callout_mask;
 extern cpumask_var_t cpu_initialized_mask;
 extern cpumask_var_t cpu_sibling_setup_mask;
+extern cpumask_var_t cpu_may_complete_boot_mask;
 
 extern void setup_cpu_local_masks(void);
 
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index cf79302..50e91cb 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -48,6 +48,7 @@
 cpumask_var_t cpu_initialized_mask;
 cpumask_var_t cpu_callout_mask;
 cpumask_var_t cpu_callin_mask;
+cpumask_var_t cpu_may_complete_boot_mask;
 
 /* representing cpus for which sibling maps can be computed */
 cpumask_var_t cpu_sibling_setup_mask;
@@ -59,6 +60,7 @@ void __init setup_cpu_local_masks(void)
 	alloc_bootmem_cpumask_var(&cpu_callin_mask);
 	alloc_bootmem_cpumask_var(&cpu_callout_mask);
 	alloc_bootmem_cpumask_var(&cpu_sibling_setup_mask);
+	alloc_bootmem_cpumask_var(&cpu_may_complete_boot_mask);
 }
 
 static void __cpuinit default_init(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 6e1e406..b33149f 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -187,8 +187,9 @@ static void __cpuinit smp_callin(void)
 	}
 
 	if (!time_before(jiffies, timeout)) {
-		panic("%s: CPU%d started up but did not get a callout!\n",
+		pr_debug("%s: CPU%d started up but did not get a callout!\n",
 		      __func__, cpuid);
+		goto die;
 	}
 
 	/*
@@ -232,12 +233,36 @@ static void __cpuinit smp_callin(void)
 	set_cpu_sibling_map(raw_smp_processor_id());
 	wmb();
 
-	notify_cpu_starting(cpuid);
-
 	/*
 	 * Allow the master to continue.
 	 */
 	cpumask_set_cpu(cpuid, cpu_callin_mask);
+
+	/*
+	 * Wait for master to continue.
+	 */
+	for (timeout = 0; timeout < 50000; timeout++) {
+		if (cpumask_test_cpu(cpuid, cpu_may_complete_boot_mask))
+			break;
+
+		if (!cpumask_test_cpu(cpuid, cpu_callout_mask))
+			break;
+
+		udelay(100);
+	}
+
+	if (!cpumask_test_cpu(cpuid, cpu_may_complete_boot_mask))
+		goto die;
+
+	notify_cpu_starting(cpuid);
+	return;
+
+die:
+	/* was set by cpu_init() */
+	cpumask_clear_cpu(smp_processor_id(), cpu_initialized_mask);
+	cpumask_clear_cpu(smp_processor_id(), cpu_callin_mask);
+	clear_local_APIC();
+	play_dead();
 }
 
 /*
@@ -774,6 +799,8 @@ do_rest:
 		}
 
 		if (cpumask_test_cpu(cpu, cpu_callin_mask)) {
+			/* Signal AP that it may continue to boot */
+			cpumask_set_cpu(cpu, cpu_may_complete_boot_mask);
 			print_cpu_msr(&cpu_data(cpu));
 			pr_debug("CPU%d: has booted.\n", cpu);
 		} else {
@@ -1250,6 +1277,7 @@ static void __ref remove_cpu_from_maps(int cpu)
 	cpumask_clear_cpu(cpu, cpu_callin_mask);
 	/* was set by cpu_init() */
 	cpumask_clear_cpu(cpu, cpu_initialized_mask);
+	cpumask_clear_cpu(cpu, cpu_may_complete_boot_mask);
 	numa_remove_cpu(cpu);
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RFC v2] [x86]: abort secondary cpu bringup gracefully
  2012-05-12 18:51             ` Igor Mammedov
@ 2012-05-14 11:09               ` Igor Mammedov
  2012-05-24 15:41                 ` Igor Mammedov
  0 siblings, 1 reply; 27+ messages in thread
From: Igor Mammedov @ 2012-05-14 11:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: rob, tglx, mingo, hpa, x86, luto, suresh.b.siddha, avi, imammedo,
	a.p.zijlstra, johnstul, arjan

Forgot to cc v1 to tglx, so reposting fixed v2.

Thomas,
  is patch below what you've meant?

Master cpu may timeout before cpu_callin_mask is set and decide to
abort cpu boot but being onlined cpu will continue to boot, set
cpu_active_mask and wait in check_tsc_sync_target() for master cpu
to arrive, that will never happen because master cpu aborted boot
proccess. Following attempt to online next cpu will hang in
stop_machine because it will wait on comletion of stop_work on all
cpus from cpu_active_mask and that will never happen because first
failed cpu spins in check_tsc_sync_target().

Introduce cpu_may_complete_boot_mask which will be set by master cpu
if it goes via normal boot path and decides to continue cpu bring up.
Being onlined cpu will continue to boot only if master cpu confirms
via cpu_may_complete_boot_mask its intention not to abort cpu bring up.
Otherwise being onlined cpu will gracefully die.

Reason for moving setting cpu_callin_mask before is that one of
CPU_STARTING notfiers sets cpu_active_mask before it is known for sure
that cpu may continue to boot. And that may lead to soft lookups in
stop_machine when next cpu is booted but failed one haven't completed
cpu_*_mask cleaups yet.

It's ok for being onlined cpu to set cpu_callin_mask before calling
CPU_STARTING notifiers because master cpu may first wait on being
booted cpu in check_tsc_sync_source() and after that it waits in
native_cpu_up() till being booted cpu comes online and only when
being booted cpu sets cpu_online_mask it will exit native_cpu_up()
and then call CPU_ONLINE notifiers. So call sequence looks like this:

    CPU1                            CPU2

  wait till CPU2 sets
  cpu_callin_mask
 -------------------------------------------------
                                    set cpu_callin_mask
                                    wait till CPU1 sets
                                    cpu_may_complete_boot_mask
 -------------------------------------------------
  got ack from CPU2 via
  cpu_callin_mask

  set cpu_may_complete_boot_mask
  exit do_boot_cpu and return into
  native_cpu_up()

  in native_cpu_up() CPU1 may spin first in
  check_tsc_sync_source()
  and then wait till CPU2 will set
  cpu_online_mask:

        while (!cpu_online(cpu)) {
                cpu_relax();
                touch_nmi_watchdog();
        }
 -------------------------------------------------
                                     got ack from CPU1 via
                                     cpu_may_complete_boot_mask

                                     call CPU_STARTING notifiers
                                     ...
                                     call check_tsc_sync_target()
                                     set cpu_online_mask
 -------------------------------------------------
  got ack from CPU2 that it
  is online, return from native_cpu_up()

  call CPU_ONLINE notifiers
 ----

In addition if being onlined cpu timed-out waiting on cpu_callout_mask,
it should not panic but rather die.

v2:
  - added missing remove_siblinginfo() on 'die' error path.
  - added explanation why it's ok to set cpu_callin_mask before calling
    CPU_STARTING notifiers

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 arch/x86/include/asm/cpumask.h |    1 +
 arch/x86/kernel/cpu/common.c   |    2 ++
 arch/x86/kernel/smpboot.c      |   39 ++++++++++++++++++++++++++++++++++++---
 3 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/cpumask.h b/arch/x86/include/asm/cpumask.h
index 61c852f..eacd269 100644
--- a/arch/x86/include/asm/cpumask.h
+++ b/arch/x86/include/asm/cpumask.h
@@ -7,6 +7,7 @@ extern cpumask_var_t cpu_callin_mask;
 extern cpumask_var_t cpu_callout_mask;
 extern cpumask_var_t cpu_initialized_mask;
 extern cpumask_var_t cpu_sibling_setup_mask;
+extern cpumask_var_t cpu_may_complete_boot_mask;
 
 extern void setup_cpu_local_masks(void);
 
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index cf79302..50e91cb 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -48,6 +48,7 @@
 cpumask_var_t cpu_initialized_mask;
 cpumask_var_t cpu_callout_mask;
 cpumask_var_t cpu_callin_mask;
+cpumask_var_t cpu_may_complete_boot_mask;
 
 /* representing cpus for which sibling maps can be computed */
 cpumask_var_t cpu_sibling_setup_mask;
@@ -59,6 +60,7 @@ void __init setup_cpu_local_masks(void)
 	alloc_bootmem_cpumask_var(&cpu_callin_mask);
 	alloc_bootmem_cpumask_var(&cpu_callout_mask);
 	alloc_bootmem_cpumask_var(&cpu_sibling_setup_mask);
+	alloc_bootmem_cpumask_var(&cpu_may_complete_boot_mask);
 }
 
 static void __cpuinit default_init(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 6e1e406..b419e2e 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -136,6 +136,10 @@ EXPORT_PER_CPU_SYMBOL(cpu_info);
 
 atomic_t init_deasserted;
 
+#ifdef CONFIG_HOTPLUG_CPU
+static void remove_siblinginfo(int cpu);
+#endif
+
 /*
  * Report back to the Boot Processor.
  * Running on AP.
@@ -187,8 +191,9 @@ static void __cpuinit smp_callin(void)
 	}
 
 	if (!time_before(jiffies, timeout)) {
-		panic("%s: CPU%d started up but did not get a callout!\n",
+		pr_debug("%s: CPU%d started up but did not get a callout!\n",
 		      __func__, cpuid);
+		goto die;
 	}
 
 	/*
@@ -232,12 +237,37 @@ static void __cpuinit smp_callin(void)
 	set_cpu_sibling_map(raw_smp_processor_id());
 	wmb();
 
-	notify_cpu_starting(cpuid);
-
 	/*
 	 * Allow the master to continue.
 	 */
 	cpumask_set_cpu(cpuid, cpu_callin_mask);
+
+	/*
+	 * Wait for master to continue.
+	 */
+	for (timeout = 0; timeout < 50000; timeout++) {
+		if (cpumask_test_cpu(cpuid, cpu_may_complete_boot_mask))
+			break;
+
+		if (!cpumask_test_cpu(cpuid, cpu_callout_mask))
+			break;
+
+		udelay(100);
+	}
+
+	if (!cpumask_test_cpu(cpuid, cpu_may_complete_boot_mask))
+		goto die;
+
+	notify_cpu_starting(cpuid);
+	return;
+
+die:
+	/* was set by cpu_init() */
+	cpumask_clear_cpu(smp_processor_id(), cpu_initialized_mask);
+	remove_siblinginfo(cpuid);
+	cpumask_clear_cpu(smp_processor_id(), cpu_callin_mask);
+	clear_local_APIC();
+	play_dead();
 }
 
 /*
@@ -774,6 +804,8 @@ do_rest:
 		}
 
 		if (cpumask_test_cpu(cpu, cpu_callin_mask)) {
+			/* Signal AP that it may continue to boot */
+			cpumask_set_cpu(cpu, cpu_may_complete_boot_mask);
 			print_cpu_msr(&cpu_data(cpu));
 			pr_debug("CPU%d: has booted.\n", cpu);
 		} else {
@@ -1250,6 +1282,7 @@ static void __ref remove_cpu_from_maps(int cpu)
 	cpumask_clear_cpu(cpu, cpu_callin_mask);
 	/* was set by cpu_init() */
 	cpumask_clear_cpu(cpu, cpu_initialized_mask);
+	cpumask_clear_cpu(cpu, cpu_may_complete_boot_mask);
 	numa_remove_cpu(cpu);
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [RFC v2] [x86]: abort secondary cpu bringup gracefully
  2012-05-14 11:09               ` [RFC v2] " Igor Mammedov
@ 2012-05-24 15:41                 ` Igor Mammedov
  2012-05-25 18:11                   ` Rob Landley
  0 siblings, 1 reply; 27+ messages in thread
From: Igor Mammedov @ 2012-05-24 15:41 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: linux-kernel, rob, tglx, mingo, hpa, x86, luto, suresh.b.siddha,
	avi, a.p.zijlstra, johnstul, arjan

ping for reviewers.

Please review patch.

On 05/14/2012 01:09 PM, Igor Mammedov wrote:
> Forgot to cc v1 to tglx, so reposting fixed v2.
>
> Thomas,
>    is patch below what you've meant?
>
> Master cpu may timeout before cpu_callin_mask is set and decide to
> abort cpu boot but being onlined cpu will continue to boot, set
> cpu_active_mask and wait in check_tsc_sync_target() for master cpu
> to arrive, that will never happen because master cpu aborted boot
> proccess. Following attempt to online next cpu will hang in
> stop_machine because it will wait on comletion of stop_work on all
> cpus from cpu_active_mask and that will never happen because first
> failed cpu spins in check_tsc_sync_target().
>
> Introduce cpu_may_complete_boot_mask which will be set by master cpu
> if it goes via normal boot path and decides to continue cpu bring up.
> Being onlined cpu will continue to boot only if master cpu confirms
> via cpu_may_complete_boot_mask its intention not to abort cpu bring up.
> Otherwise being onlined cpu will gracefully die.
>
> Reason for moving setting cpu_callin_mask before is that one of
> CPU_STARTING notfiers sets cpu_active_mask before it is known for sure
> that cpu may continue to boot. And that may lead to soft lookups in
> stop_machine when next cpu is booted but failed one haven't completed
> cpu_*_mask cleaups yet.
>
> It's ok for being onlined cpu to set cpu_callin_mask before calling
> CPU_STARTING notifiers because master cpu may first wait on being
> booted cpu in check_tsc_sync_source() and after that it waits in
> native_cpu_up() till being booted cpu comes online and only when
> being booted cpu sets cpu_online_mask it will exit native_cpu_up()
> and then call CPU_ONLINE notifiers. So call sequence looks like this:
>
>      CPU1                            CPU2
>
>    wait till CPU2 sets
>    cpu_callin_mask
>   -------------------------------------------------
>                                      set cpu_callin_mask
>                                      wait till CPU1 sets
>                                      cpu_may_complete_boot_mask
>   -------------------------------------------------
>    got ack from CPU2 via
>    cpu_callin_mask
>
>    set cpu_may_complete_boot_mask
>    exit do_boot_cpu and return into
>    native_cpu_up()
>
>    in native_cpu_up() CPU1 may spin first in
>    check_tsc_sync_source()
>    and then wait till CPU2 will set
>    cpu_online_mask:
>
>          while (!cpu_online(cpu)) {
>                  cpu_relax();
>                  touch_nmi_watchdog();
>          }
>   -------------------------------------------------
>                                       got ack from CPU1 via
>                                       cpu_may_complete_boot_mask
>
>                                       call CPU_STARTING notifiers
>                                       ...
>                                       call check_tsc_sync_target()
>                                       set cpu_online_mask
>   -------------------------------------------------
>    got ack from CPU2 that it
>    is online, return from native_cpu_up()
>
>    call CPU_ONLINE notifiers
>   ----
>
> In addition if being onlined cpu timed-out waiting on cpu_callout_mask,
> it should not panic but rather die.
>
> v2:
>    - added missing remove_siblinginfo() on 'die' error path.
>    - added explanation why it's ok to set cpu_callin_mask before calling
>      CPU_STARTING notifiers
>
> Signed-off-by: Igor Mammedov<imammedo@redhat.com>
> ---
>   arch/x86/include/asm/cpumask.h |    1 +
>   arch/x86/kernel/cpu/common.c   |    2 ++
>   arch/x86/kernel/smpboot.c      |   39 ++++++++++++++++++++++++++++++++++++---
>   3 files changed, 39 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/cpumask.h b/arch/x86/include/asm/cpumask.h
> index 61c852f..eacd269 100644
> --- a/arch/x86/include/asm/cpumask.h
> +++ b/arch/x86/include/asm/cpumask.h
> @@ -7,6 +7,7 @@ extern cpumask_var_t cpu_callin_mask;
>   extern cpumask_var_t cpu_callout_mask;
>   extern cpumask_var_t cpu_initialized_mask;
>   extern cpumask_var_t cpu_sibling_setup_mask;
> +extern cpumask_var_t cpu_may_complete_boot_mask;
>
>   extern void setup_cpu_local_masks(void);
>
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index cf79302..50e91cb 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -48,6 +48,7 @@
>   cpumask_var_t cpu_initialized_mask;
>   cpumask_var_t cpu_callout_mask;
>   cpumask_var_t cpu_callin_mask;
> +cpumask_var_t cpu_may_complete_boot_mask;
>
>   /* representing cpus for which sibling maps can be computed */
>   cpumask_var_t cpu_sibling_setup_mask;
> @@ -59,6 +60,7 @@ void __init setup_cpu_local_masks(void)
>   	alloc_bootmem_cpumask_var(&cpu_callin_mask);
>   	alloc_bootmem_cpumask_var(&cpu_callout_mask);
>   	alloc_bootmem_cpumask_var(&cpu_sibling_setup_mask);
> +	alloc_bootmem_cpumask_var(&cpu_may_complete_boot_mask);
>   }
>
>   static void __cpuinit default_init(struct cpuinfo_x86 *c)
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 6e1e406..b419e2e 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -136,6 +136,10 @@ EXPORT_PER_CPU_SYMBOL(cpu_info);
>
>   atomic_t init_deasserted;
>
> +#ifdef CONFIG_HOTPLUG_CPU
> +static void remove_siblinginfo(int cpu);
> +#endif
> +
>   /*
>    * Report back to the Boot Processor.
>    * Running on AP.
> @@ -187,8 +191,9 @@ static void __cpuinit smp_callin(void)
>   	}
>
>   	if (!time_before(jiffies, timeout)) {
> -		panic("%s: CPU%d started up but did not get a callout!\n",
> +		pr_debug("%s: CPU%d started up but did not get a callout!\n",
>   		      __func__, cpuid);
> +		goto die;
>   	}
>
>   	/*
> @@ -232,12 +237,37 @@ static void __cpuinit smp_callin(void)
>   	set_cpu_sibling_map(raw_smp_processor_id());
>   	wmb();
>
> -	notify_cpu_starting(cpuid);
> -
>   	/*
>   	 * Allow the master to continue.
>   	 */
>   	cpumask_set_cpu(cpuid, cpu_callin_mask);
> +
> +	/*
> +	 * Wait for master to continue.
> +	 */
> +	for (timeout = 0; timeout<  50000; timeout++) {
> +		if (cpumask_test_cpu(cpuid, cpu_may_complete_boot_mask))
> +			break;
> +
> +		if (!cpumask_test_cpu(cpuid, cpu_callout_mask))
> +			break;
> +
> +		udelay(100);
> +	}
> +
> +	if (!cpumask_test_cpu(cpuid, cpu_may_complete_boot_mask))
> +		goto die;
> +
> +	notify_cpu_starting(cpuid);
> +	return;
> +
> +die:
> +	/* was set by cpu_init() */
> +	cpumask_clear_cpu(smp_processor_id(), cpu_initialized_mask);
> +	remove_siblinginfo(cpuid);
> +	cpumask_clear_cpu(smp_processor_id(), cpu_callin_mask);
> +	clear_local_APIC();
> +	play_dead();
>   }
>
>   /*
> @@ -774,6 +804,8 @@ do_rest:
>   		}
>
>   		if (cpumask_test_cpu(cpu, cpu_callin_mask)) {
> +			/* Signal AP that it may continue to boot */
> +			cpumask_set_cpu(cpu, cpu_may_complete_boot_mask);
>   			print_cpu_msr(&cpu_data(cpu));
>   			pr_debug("CPU%d: has booted.\n", cpu);
>   		} else {
> @@ -1250,6 +1282,7 @@ static void __ref remove_cpu_from_maps(int cpu)
>   	cpumask_clear_cpu(cpu, cpu_callin_mask);
>   	/* was set by cpu_init() */
>   	cpumask_clear_cpu(cpu, cpu_initialized_mask);
> +	cpumask_clear_cpu(cpu, cpu_may_complete_boot_mask);
>   	numa_remove_cpu(cpu);
>   }
>

-- 
-----
  Igor

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v2] [x86]: abort secondary cpu bringup gracefully
  2012-05-24 15:41                 ` Igor Mammedov
@ 2012-05-25 18:11                   ` Rob Landley
  2012-05-30 16:38                     ` Igor Mammedov
  0 siblings, 1 reply; 27+ messages in thread
From: Rob Landley @ 2012-05-25 18:11 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: linux-kernel, tglx, mingo, hpa, x86, luto, suresh.b.siddha, avi,
	a.p.zijlstra, johnstul, arjan

On 05/24/2012 10:41 AM, Igor Mammedov wrote:
> ping for reviewers.
> 
> Please review patch.

I can't hugely comment on the guts of what the patch is doing, but:

>> --- a/arch/x86/kernel/smpboot.c
>> +++ b/arch/x86/kernel/smpboot.c
>> @@ -136,6 +136,10 @@ EXPORT_PER_CPU_SYMBOL(cpu_info);
>>
>>   atomic_t init_deasserted;
>>
>> +#ifdef CONFIG_HOTPLUG_CPU
>> +static void remove_siblinginfo(int cpu);
>> +#endif
>> +

#ifdefs should almost never be in C code, they should be in header
files. You can stub out functions with empty inline versions.  For a
random example, see kernel/smpboot.h

Rob
-- 
GNU/Linux isn't: Linux=GPLv2, GNU=GPLv3+, they can't share code.
Either it's "mere aggregation", or a license violation.  Pick one.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v2] [x86]: abort secondary cpu bringup gracefully
  2012-05-25 18:11                   ` Rob Landley
@ 2012-05-30 16:38                     ` Igor Mammedov
  0 siblings, 0 replies; 27+ messages in thread
From: Igor Mammedov @ 2012-05-30 16:38 UTC (permalink / raw)
  To: Rob Landley
  Cc: linux-kernel, tglx, mingo, hpa, x86, luto, suresh.b.siddha, avi,
	a.p.zijlstra, johnstul, arjan

On 05/25/2012 08:11 PM, Rob Landley wrote:
> On 05/24/2012 10:41 AM, Igor Mammedov wrote:
>> ping for reviewers.
>>
>> Please review patch.
>
> I can't hugely comment on the guts of what the patch is doing, but:
>
>>> --- a/arch/x86/kernel/smpboot.c
>>> +++ b/arch/x86/kernel/smpboot.c
>>> @@ -136,6 +136,10 @@ EXPORT_PER_CPU_SYMBOL(cpu_info);
>>>
>>>    atomic_t init_deasserted;
>>>
>>> +#ifdef CONFIG_HOTPLUG_CPU
>>> +static void remove_siblinginfo(int cpu);
>>> +#endif
>>> +
>
> #ifdefs should almost never be in C code, they should be in header
> files. You can stub out functions with empty inline versions.  For a
> random example, see kernel/smpboot.h

If it were not just local function, then I'd do so. But since this
function is used only in this file and defined under the same #ifdefs
after smp_callin(), it could stay there and not pollute headers
with non public function declaration.

I made remove_siblinginfo() a forward declaration  before smp_callin(),
because of I didn't have much justification to move ~20 lines function
to be available a bit earlier that it is now.

However thanks for mentioning stubs. I should add stub for building
kernel without CONFIG_HOTPLUG_CPU, to prevent build breakage.

I'll fix and repost patch.

-- 
-----
  Igor

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2012-05-30 16:39 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-09 10:24 [PATCH 0/5] [x86]: Improve secondary CPU bring-up process robustness Igor Mammedov
2012-05-09  9:19 ` Peter Zijlstra
2012-05-09 12:29   ` Igor Mammedov
2012-05-09 13:12   ` Ingo Molnar
2012-05-10 17:31     ` Rob Landley
2012-05-10 17:39       ` Peter Zijlstra
2012-05-09 10:24 ` [PATCH 1/5] Fix soft-lookup in stop machine on secondary cpu bring up Igor Mammedov
2012-05-09 15:04   ` Shuah Khan
2012-05-09 15:22     ` Igor Mammedov
2012-05-09 15:34       ` Shuah Khan
2012-05-10 15:26         ` Shuah Khan
2012-05-10 16:29           ` Igor Mammedov
2012-05-10 16:38             ` Shuah Khan
2012-05-11 11:45   ` Thomas Gleixner
2012-05-11 15:16     ` Igor Mammedov
2012-05-11 21:14       ` Thomas Gleixner
2012-05-12 19:32         ` [RFC] [x86]: abort secondary cpu bringup gracefully Igor Mammedov
2012-05-12 17:39           ` Peter Zijlstra
2012-05-12 18:51             ` Igor Mammedov
2012-05-14 11:09               ` [RFC v2] " Igor Mammedov
2012-05-24 15:41                 ` Igor Mammedov
2012-05-25 18:11                   ` Rob Landley
2012-05-30 16:38                     ` Igor Mammedov
2012-05-09 10:24 ` [PATCH 2/5] Take in account that several cpus might call check_tsc_sync_* at the same time Igor Mammedov
2012-05-09 10:25 ` [PATCH 3/5] Do not wait till next cpu online and abort early if lead cpu do not wait for us anymore Igor Mammedov
2012-05-09 10:25 ` [PATCH 4/5] Cancel secondary CPU bringup if boot cpu abandoned this effort Igor Mammedov
2012-05-09 10:25 ` [PATCH 5/5] Do not mark cpu as not present if we failed to boot it Igor Mammedov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.