linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 0/5] x86: fix hang when AP bringup is too slow
@ 2014-04-14 15:11 Igor Mammedov
  2014-04-14 15:11 ` [PATCH v4 1/5] x86: fix list corruption on CPU hotplug Igor Mammedov
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Igor Mammedov @ 2014-04-14 15:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, mingo, hpa, x86, imammedo, bp, paul.gortmaker, JBeulich,
	prarit, drjones, toshi.kani, riel, gong.chen, andi, lenb, rjw,
	linux-acpi

changes since v3:
 * put simple bugfixes first
 * move common part of syncing with master CPU in cpu_init()
   for the 32-bit/64-bit variants into a helper function
 * cpu_init(): WARN_ON if cpu_initialized_mask is set
 * fix panic on CPU unplug, caused by erroneously removing
   "pr->dev = dev;" in drivers/acpi/acpi_processor.c

--
A hang is observed on virtual machines during CPU hotplug,
especially in big guests with many CPUs. (It happens more
often if the host is over-committed.)

The hang happens because the master CPU times out while waiting
for the AP to boot and 'cancels' the CPU online operation, assuming
the AP is not functional. The AP, however, may continue to run wild
later, causing various hangs or panics in a running kernel that
assumes the AP is offline.

This is an alternative approach that, instead of canceling an
in-progress AP bringup (https://lkml.org/lkml/2014/3/6/257),
removes timeouts so that AP bringup is not affected by poor
timing, and syncs the AP with the master CPU at early startup,
making sure that the AP won't run wild if the master CPU doesn't
expect it to come online.

The series also fixes 3 bugs found while testing the CPU bringup
failure case.

--
Below is a detailed description of the more frequently occurring hang:
---
The master CPU may time out before cpu_callin_mask is set and cancel
booting the CPU, but the CPU being onlined still continues to boot, sets
cpu_active_mask (CPU_STARTING notifiers) and spins in
check_tsc_sync_target() waiting for the master CPU to arrive. A following
attempt to online another CPU hangs in stop_machine, initiated from here:
smp_callin ->
  smp_store_cpu_info ->
    identify_secondary_cpu ->
      mtrr_ap_init -> set_mtrr_from_inactive_cpu

stop_machine waits for completion of stop_work on all CPUs in
cpu_active_mask, including the failed CPU that spins in
check_tsc_sync_target().


Igor Mammedov (5):
  x86: fix list corruption on CPU hotplug
  x86: fix memory corruption in acpi_unmap_lsapic()
  acpi_processor: do not mark present at boot but not onlined CPU as
    onlined
  x86: log error on secondary CPU wakeup failure at ERR level
  x86: initialize secondary CPU only if master CPU will wait for it

 arch/x86/kernel/cpu/common.c  |   27 ++++++----
 arch/x86/kernel/smpboot.c     |  103 ++++++++++++----------------------------
 drivers/acpi/acpi_processor.c |    1 -
 3 files changed, 47 insertions(+), 84 deletions(-)



* [PATCH v4 1/5] x86: fix list corruption on CPU hotplug
  2014-04-14 15:11 [PATCH v4 0/5] x86: fix hang when AP bringup is too slow Igor Mammedov
@ 2014-04-14 15:11 ` Igor Mammedov
  2014-04-30 21:18   ` Toshi Kani
  2014-04-14 15:11 ` [PATCH v4 2/5] x86: fix memory corruption in acpi_unmap_lsapic() Igor Mammedov
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Igor Mammedov @ 2014-04-14 15:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, mingo, hpa, x86, imammedo, bp, paul.gortmaker, JBeulich,
	prarit, drjones, toshi.kani, riel, gong.chen, andi, lenb, rjw,
	linux-acpi

Currently, if AP wakeup fails, the master CPU marks the AP as not present
in do_boot_cpu() by calling set_cpu_present(cpu, false).
That leads to the following list corruption on the next physical CPU
hotplug:

[  418.107336] WARNING: CPU: 1 PID: 45 at lib/list_debug.c:33 __list_add+0xbe/0xd0()
[  418.115268] list_add corruption. prev->next should be next (ffff88003dc57600), but was ffff88003e20c3a0. (prev=ffff88003e20c3a0).
[  418.123693] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT ipt_REJECT cfg80211 xt_conntrack rfkill ee
[  418.138979] CPU: 1 PID: 45 Comm: kworker/u10:1 Not tainted 3.14.0-rc6+ #387
[  418.149989] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[  418.165750] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[  418.166433]  0000000000000021 ffff880038ca7988 ffffffff8159b22d 0000000000000021
[  418.176460]  ffff880038ca79d8 ffff880038ca79c8 ffffffff8106942c ffff880038ca79e8
[  418.177453]  ffff88003e20c3a0 ffff88003dc57600 ffff88003e20c3a0 00000000ffffffea
[  418.178445] Call Trace:
[  418.185811]  [<ffffffff8159b22d>] dump_stack+0x49/0x5c
[  418.186440]  [<ffffffff8106942c>] warn_slowpath_common+0x8c/0xc0
[  418.187192]  [<ffffffff81069516>] warn_slowpath_fmt+0x46/0x50
[  418.191231]  [<ffffffff8136ef51>] ? acpi_ns_get_node+0xb7/0xc7
[  418.193889]  [<ffffffff812f796e>] __list_add+0xbe/0xd0
[  418.196649]  [<ffffffff812e2aa9>] kobject_add_internal+0x79/0x200
[  418.208610]  [<ffffffff812e2e18>] kobject_add_varg+0x38/0x60
[  418.213831]  [<ffffffff812e2ef4>] kobject_add+0x44/0x70
[  418.229961]  [<ffffffff813e2c60>] device_add+0xd0/0x550
[  418.234991]  [<ffffffff813f0e95>] ? pm_runtime_init+0xe5/0xf0
[  418.250226]  [<ffffffff813e32be>] device_register+0x1e/0x30
[  418.255296]  [<ffffffff813e82a3>] register_cpu+0xe3/0x130
[  418.266539]  [<ffffffff81592be5>] arch_register_cpu+0x65/0x150
[  418.285845]  [<ffffffff81355c0d>] acpi_processor_hotadd_init+0x5a/0x9b
...
This is caused by the fact that generic_processor_info() allocates
a logical CPU id by calling:

 cpu = cpumask_next_zero(-1, cpu_present_mask);

which returns the id of the CPU that previously failed to wake up, since
its bit was cleared by do_boot_cpu(). As a result, register_cpu() tries
to register another CPU with the same id as the already present CPU
that failed to come online.
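
For illustration only (not part of the patch): a minimal userspace
sketch of the id reuse described above, assuming a 64-bit word in place
of cpu_present_mask. The names toy_cpumask, toy_first_zero() and
toy_next_hotplug_id() are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for cpu_present_mask: one bit per logical CPU id. */
typedef uint64_t toy_cpumask;

/*
 * Toy analogue of cpumask_next_zero(-1, mask): return the lowest id whose
 * bit is clear, or -1 if all nbits ids are taken.
 */
static int toy_first_zero(toy_cpumask mask, int nbits)
{
	assert(nbits > 0 && nbits <= 64);
	for (int id = 0; id < nbits; id++)
		if (!(mask & (UINT64_C(1) << id)))
			return id;
	return -1;
}

/*
 * Pick the id for a newly plugged CPU. clear_on_failure models the old
 * do_boot_cpu() behaviour: the present bit of a CPU whose wakeup failed
 * is cleared although its device stays registered.
 */
static int toy_next_hotplug_id(toy_cpumask present, int failed_cpu,
			       int clear_on_failure, int nbits)
{
	assert(failed_cpu >= 0 && failed_cpu < nbits);
	if (clear_on_failure)
		present &= ~(UINT64_C(1) << failed_cpu);
	return toy_first_zero(present, nbits);
}
```

With CPUs 0-3 present and CPU 3 having failed to wake up, clearing the
bit hands id 3 to the next hotplugged CPU, which collides with the
device already registered under that id; leaving the bit set yields
id 4.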

Taking into account that the AP will not do anything if the master CPU
failed to wake it up, there is no reason to mark that AP as not present
and break subsequent CPU hotplug attempts. As a side effect of not
marking the AP as not present, the user can online it again later.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 arch/x86/kernel/smpboot.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 3482693..6124f15 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -860,7 +860,6 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 		/* was set by cpu_init() */
 		cpumask_clear_cpu(cpu, cpu_initialized_mask);
 
-		set_cpu_present(cpu, false);
 		per_cpu(x86_cpu_to_apicid, cpu) = BAD_APICID;
 	}
 
-- 
1.7.1



* [PATCH v4 2/5] x86: fix memory corruption in acpi_unmap_lsapic()
  2014-04-14 15:11 [PATCH v4 0/5] x86: fix hang when AP bringup is too slow Igor Mammedov
  2014-04-14 15:11 ` [PATCH v4 1/5] x86: fix list corruption on CPU hotplug Igor Mammedov
@ 2014-04-14 15:11 ` Igor Mammedov
  2014-04-14 15:11 ` [PATCH v4 3/5] acpi_processor: do not mark present at boot but not onlined CPU as onlined Igor Mammedov
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: Igor Mammedov @ 2014-04-14 15:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, mingo, hpa, x86, imammedo, bp, paul.gortmaker, JBeulich,
	prarit, drjones, toshi.kani, riel, gong.chen, andi, lenb, rjw,
	linux-acpi

If, during CPU hotplug, the master CPU fails to wake up the AP,
it sets the per-CPU x86_cpu_to_apicid to BAD_APICID=0xFFFF for the AP.

However, a following attempt to unplug that CPU leads to an
out-of-bounds write to __apicid_to_node[], which is
32768 items long on an x86_64 kernel.
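
For illustration only (not part of the patch): a small sketch of why
0xFFFF cannot index a 32768-entry table. The TOY_* constants mirror the
two numbers quoted above; the helper names are hypothetical. Note that
the patch itself does not add a guard like this, it simply stops
storing BAD_APICID.

```c
#include <stdbool.h>

/* Values taken from the commit message above. */
#define TOY_BAD_APICID	0xFFFFu	/* 65535 */
#define TOY_TABLE_SIZE	32768u	/* entries in the apicid-to-node table */

/* True if apicid can safely index a table of TOY_TABLE_SIZE entries. */
static bool toy_apicid_in_range(unsigned int apicid)
{
	return apicid < TOY_TABLE_SIZE;
}

/*
 * Guarded store: returns false and writes nothing for an out-of-range
 * id such as TOY_BAD_APICID; otherwise stores node and returns true.
 */
static bool toy_set_node(short *table, unsigned int apicid, short node)
{
	if (!toy_apicid_in_range(apicid))
		return false;
	table[apicid] = node;
	return true;
}
```

Since 65535 >= 32768, an unguarded table[TOY_BAD_APICID] = node would
write well past the end of the array, which is the corruption this
patch avoids.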

So drop setting x86_cpu_to_apicid to BAD_APICID in do_boot_cpu()
and allow acpi_processor_remove()->acpi_unmap_lsapic() to cleanly
remove the CPU.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 arch/x86/kernel/smpboot.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 6124f15..2988f69 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -859,8 +859,6 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 
 		/* was set by cpu_init() */
 		cpumask_clear_cpu(cpu, cpu_initialized_mask);
-
-		per_cpu(x86_cpu_to_apicid, cpu) = BAD_APICID;
 	}
 
 	/* mark "stuck" area as not stuck */
-- 
1.7.1



* [PATCH v4 3/5] acpi_processor: do not mark present at boot but not onlined CPU as onlined
  2014-04-14 15:11 [PATCH v4 0/5] x86: fix hang when AP bringup is too slow Igor Mammedov
  2014-04-14 15:11 ` [PATCH v4 1/5] x86: fix list corruption on CPU hotplug Igor Mammedov
  2014-04-14 15:11 ` [PATCH v4 2/5] x86: fix memory corruption in acpi_unmap_lsapic() Igor Mammedov
@ 2014-04-14 15:11 ` Igor Mammedov
  2014-04-15  5:48   ` Rafael J. Wysocki
                     ` (2 more replies)
  2014-04-14 15:11 ` [PATCH v4 4/5] x86: log error on secondary CPU wakeup failure at ERR level Igor Mammedov
  2014-04-14 15:11 ` [PATCH v4 5/5] x86: initialize secondary CPU only if master CPU will wait for it Igor Mammedov
  4 siblings, 3 replies; 20+ messages in thread
From: Igor Mammedov @ 2014-04-14 15:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, mingo, hpa, x86, imammedo, bp, paul.gortmaker, JBeulich,
	prarit, drjones, toshi.kani, riel, gong.chen, andi, lenb, rjw,
	linux-acpi

acpi_processor_add() assumes that CPUs present at boot
are always onlined, which is not so if a CPU failed to come
online. As a result, acpi_processor_add() marks such a CPU
device as onlined in sysfs, and following attempts to
online/offline it using the /sys/devices/system/cpu/cpuX/online
attribute fail.

Do not poke into device internals in acpi_processor_add()
by touching the "struct device { .offline }" field, since
for CPUs onlined at boot it is set by:
  topology_init() -> arch_register_cpu() -> register_cpu()
before the ACPI device tree is parsed, and for hotplugged
CPUs it is set when userspace onlines the CPU via sysfs.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
v2:
 - fix regression in v1 leading to NULL pointer dereference
   on CPU unplug, do not remove "pr->dev = dev;"
---
 drivers/acpi/acpi_processor.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index c29c2c3..42d66f8 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -404,7 +404,6 @@ static int acpi_processor_add(struct acpi_device *device,
 		goto err;
 
 	pr->dev = dev;
-	dev->offline = pr->flags.need_hotplug_init;
 
 	/* Trigger the processor driver's .probe() if present. */
 	if (device_attach(dev) >= 0)
-- 
1.7.1



* [PATCH v4 4/5] x86: log error on secondary CPU wakeup failure at ERR level
  2014-04-14 15:11 [PATCH v4 0/5] x86: fix hang when AP bringup is too slow Igor Mammedov
                   ` (2 preceding siblings ...)
  2014-04-14 15:11 ` [PATCH v4 3/5] acpi_processor: do not mark present at boot but not onlined CPU as onlined Igor Mammedov
@ 2014-04-14 15:11 ` Igor Mammedov
  2014-04-30 21:30   ` Toshi Kani
  2014-04-14 15:11 ` [PATCH v4 5/5] x86: initialize secondary CPU only if master CPU will wait for it Igor Mammedov
  4 siblings, 1 reply; 20+ messages in thread
From: Igor Mammedov @ 2014-04-14 15:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, mingo, hpa, x86, imammedo, bp, paul.gortmaker, JBeulich,
	prarit, drjones, toshi.kani, riel, gong.chen, andi, lenb, rjw,
	linux-acpi

If the system is running without debug-level logging,
it will not log an error if do_boot_cpu() fails to
wake up the AP. This may lead to silent AP bringup
failures at boot time.
Change the message level to KERN_ERR to make the error
visible to the user, as is done on other architectures.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 arch/x86/kernel/smpboot.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 2988f69..ae2fd97 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -918,7 +918,7 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 
 	err = do_boot_cpu(apicid, cpu, tidle);
 	if (err) {
-		pr_debug("do_boot_cpu failed %d\n", err);
+		pr_err("do_boot_cpu failed(%d) to wakeup CPU#%u\n", err, cpu);
 		return -EIO;
 	}
 
-- 
1.7.1



* [PATCH v4 5/5] x86: initialize secondary CPU only if master CPU will wait for it
  2014-04-14 15:11 [PATCH v4 0/5] x86: fix hang when AP bringup is too slow Igor Mammedov
                   ` (3 preceding siblings ...)
  2014-04-14 15:11 ` [PATCH v4 4/5] x86: log error on secondary CPU wakeup failure at ERR level Igor Mammedov
@ 2014-04-14 15:11 ` Igor Mammedov
  2014-05-01 23:11   ` Toshi Kani
  4 siblings, 1 reply; 20+ messages in thread
From: Igor Mammedov @ 2014-04-14 15:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, mingo, hpa, x86, imammedo, bp, paul.gortmaker, JBeulich,
	prarit, drjones, toshi.kani, riel, gong.chen, andi, lenb, rjw,
	linux-acpi

A hang is observed on virtual machines during CPU hotplug,
especially in big guests with many CPUs. (It is reproducible
more often if the host is over-committed.)

It happens because the master CPU gives up waiting on the
secondary CPU and allows it to run wild. As a result, the
AP locks up or crashes the system, for example
as described here: https://lkml.org/lkml/2014/3/6/257

If the master CPU has sent the STARTUP IPI successfully,
and the AP has signalled to the master CPU that it's ready
to start initialization, make the master CPU wait
indefinitely until the AP is onlined.
To ensure that the AP won't ever run wild, make it
wait at early startup until the master CPU confirms its
intention to wait for the AP.
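
For illustration only (not part of the patch): a simplified userspace
model of the two-stage handshake described above, assuming three C11
atomic flags in place of the per-CPU bits in cpu_initialized_mask,
cpu_callout_mask and cpu_callin_mask. All toy_* names are hypothetical,
and the model omits the INIT/SIPI sequence and the clearing of
cpu_initialized_mask before a retry.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-CPU flags standing in for the three cpumask bits. */
struct toy_bringup {
	atomic_bool initialized;	/* AP: "I reached cpu_init()" */
	atomic_bool callout;		/* master: "I will wait, proceed" */
	atomic_bool callin;		/* AP: "initialization finished" */
};

static void toy_bringup_init(struct toy_bringup *b)
{
	atomic_init(&b->initialized, false);
	atomic_init(&b->callout, false);
	atomic_init(&b->callin, false);
}

/* AP side: announce arrival before doing any real initialization. */
static void toy_ap_announce(struct toy_bringup *b)
{
	atomic_store(&b->initialized, true);
}

/* AP side: one poll of the wait loop; true once the master has acked. */
static bool toy_ap_may_proceed(struct toy_bringup *b)
{
	return atomic_load(&b->callout);
}

/* AP side: report that initialization is complete. */
static void toy_ap_finish(struct toy_bringup *b)
{
	atomic_store(&b->callin, true);
}

/*
 * Master side, stage 1: poll up to max_polls times for the AP's
 * announcement. On success, ack via callout and return true; the master
 * is then committed to waiting for callin without a timeout. On timeout,
 * return false without setting callout, so a late AP stays parked.
 */
static bool toy_master_stage1(struct toy_bringup *b, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (atomic_load(&b->initialized)) {
			atomic_store(&b->callout, true);
			return true;
		}
	}
	return false;
}
```

The point of the ordering is that an AP which shows up after the master
has given up never sees callout set, so it keeps spinning in its wait
loop instead of continuing into the rest of the bringup path.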

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
v2:
 - amend comment in cpu_init()
v3:
 - leave timeouts in do_boot_cpu(), so that master CPU
   won't hang if AP doesn't respond, use cpu_initialized_mask
   as a way for AP to signal to master CPU that it's ready
   to start initialization.
v4:
 - move common code in cpu_init() for 32-bit/64-bit into shared
   helper function wait_for_master_cpu()
 - add WARN_ON(cpumask_test_and_set_cpu(cpu, cpu_initialized_mask))
   to wait_for_master_cpu()
---
 arch/x86/kernel/cpu/common.c |   27 +++++++-----
 arch/x86/kernel/smpboot.c    |   98 +++++++++++++-----------------------------
 2 files changed, 46 insertions(+), 79 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index a135239..a4bcbac 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1221,6 +1221,17 @@ static void dbg_restore_debug_regs(void)
 #define dbg_restore_debug_regs()
 #endif /* ! CONFIG_KGDB */
 
+static void wait_for_master_cpu(int cpu)
+{
+	/*
+	 * wait for ACK from master CPU before continuing
+	 * with AP initialization
+	 */
+	WARN_ON(cpumask_test_and_set_cpu(cpu, cpu_initialized_mask));
+	while (!cpumask_test_cpu(cpu, cpu_callout_mask))
+		cpu_relax();
+}
+
 /*
  * cpu_init() initializes state that is per-CPU. Some data is already
  * initialized (naturally) in the bootstrap process, such as the GDT
@@ -1236,16 +1247,17 @@ void cpu_init(void)
 	struct task_struct *me;
 	struct tss_struct *t;
 	unsigned long v;
-	int cpu;
+	int cpu = stack_smp_processor_id();
 	int i;
 
+	wait_for_master_cpu(cpu);
+
 	/*
 	 * Load microcode on this cpu if a valid microcode is available.
 	 * This is early microcode loading procedure.
 	 */
 	load_ucode_ap();
 
-	cpu = stack_smp_processor_id();
 	t = &per_cpu(init_tss, cpu);
 	oist = &per_cpu(orig_ist, cpu);
 
@@ -1257,9 +1269,6 @@ void cpu_init(void)
 
 	me = current;
 
-	if (cpumask_test_and_set_cpu(cpu, cpu_initialized_mask))
-		panic("CPU#%d already initialized!\n", cpu);
-
 	pr_debug("Initializing CPU#%d\n", cpu);
 
 	clear_in_cr4(X86_CR4_VME|X86_CR4_PVI|X86_CR4_TSD|X86_CR4_DE);
@@ -1336,13 +1345,9 @@ void cpu_init(void)
 	struct tss_struct *t = &per_cpu(init_tss, cpu);
 	struct thread_struct *thread = &curr->thread;
 
-	show_ucode_info_early();
+	wait_for_master_cpu(cpu);
 
-	if (cpumask_test_and_set_cpu(cpu, cpu_initialized_mask)) {
-		printk(KERN_WARNING "CPU#%d already initialized!\n", cpu);
-		for (;;)
-			local_irq_enable();
-	}
+	show_ucode_info_early();
 
 	printk(KERN_INFO "Initializing CPU#%d\n", cpu);
 
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index ae2fd97..44903ad 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -111,7 +111,6 @@ atomic_t init_deasserted;
 static void smp_callin(void)
 {
 	int cpuid, phys_id;
-	unsigned long timeout;
 
 	/*
 	 * If waken up by an INIT in an 82489DX configuration
@@ -130,37 +129,6 @@ static void smp_callin(void)
 	 * (This works even if the APIC is not enabled.)
 	 */
 	phys_id = read_apic_id();
-	if (cpumask_test_cpu(cpuid, cpu_callin_mask)) {
-		panic("%s: phys CPU#%d, CPU#%d already present??\n", __func__,
-					phys_id, cpuid);
-	}
-	pr_debug("CPU#%d (phys ID: %d) waiting for CALLOUT\n", cpuid, phys_id);
-
-	/*
-	 * STARTUP IPIs are fragile beasts as they might sometimes
-	 * trigger some glue motherboard logic. Complete APIC bus
-	 * silence for 1 second, this overestimates the time the
-	 * boot CPU is spending to send the up to 2 STARTUP IPIs
-	 * by a factor of two. This should be enough.
-	 */
-
-	/*
-	 * Waiting 2s total for startup (udelay is not yet working)
-	 */
-	timeout = jiffies + 2*HZ;
-	while (time_before(jiffies, timeout)) {
-		/*
-		 * Has the boot CPU finished it's STARTUP sequence?
-		 */
-		if (cpumask_test_cpu(cpuid, cpu_callout_mask))
-			break;
-		cpu_relax();
-	}
-
-	if (!time_before(jiffies, timeout)) {
-		panic("%s: CPU%d started up but did not get a callout!\n",
-		      __func__, cpuid);
-	}
 
 	/*
 	 * the boot CPU has finished the init stage and is spinning
@@ -750,8 +718,8 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 	unsigned long start_ip = real_mode_header->trampoline_start;
 
 	unsigned long boot_error = 0;
-	int timeout;
 	int cpu0_nmi_registered = 0;
+	unsigned long timeout;
 
 	/* Just in case we booted with a single CPU. */
 	alternatives_enable_smp();
@@ -799,6 +767,14 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 	}
 
 	/*
+	 * AP might wait on cpu_callout_mask in cpu_init() with
+	 * cpu_initialized_mask set if previous attempt to online
+	 * it timed-out. Clear cpu_initialized_mask so that after
+	 * INIT/SIPI it could start with a clean state.
+	 */
+	cpumask_clear_cpu(cpu, cpu_initialized_mask);
+
+	/*
 	 * Wake up a CPU in difference cases:
 	 * - Use the method in the APIC driver if it's defined
 	 * Otherwise,
@@ -810,55 +786,41 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 		boot_error = wakeup_cpu_via_init_nmi(cpu, start_ip, apicid,
 						     &cpu0_nmi_registered);
 
+
 	if (!boot_error) {
 		/*
-		 * allow APs to start initializing.
+		 * Wait 10s total for a response from AP
 		 */
-		pr_debug("Before Callout %d\n", cpu);
-		cpumask_set_cpu(cpu, cpu_callout_mask);
-		pr_debug("After Callout %d\n", cpu);
+		boot_error = -1;
+		timeout = jiffies + 10*HZ;
+		while (time_before(jiffies, timeout)) {
+			if (cpumask_test_cpu(cpu, cpu_initialized_mask)) {
+				/*
+				 * Tell AP to proceed with initialization
+				 */
+				cpumask_set_cpu(cpu, cpu_callout_mask);
+				boot_error = 0;
+				break;
+			}
+			udelay(100);
+			schedule();
+		}
+	}
 
+	if (!boot_error) {
 		/*
-		 * Wait 5s total for a response
+		 * Wait till AP completes initial initialization
 		 */
-		for (timeout = 0; timeout < 50000; timeout++) {
-			if (cpumask_test_cpu(cpu, cpu_callin_mask))
-				break;	/* It has booted */
-			udelay(100);
+		while (!cpumask_test_cpu(cpu, cpu_callin_mask)) {
 			/*
 			 * Allow other tasks to run while we wait for the
 			 * AP to come online. This also gives a chance
 			 * for the MTRR work(triggered by the AP coming online)
 			 * to be completed in the stop machine context.
 			 */
+			udelay(100);
 			schedule();
 		}
-
-		if (cpumask_test_cpu(cpu, cpu_callin_mask)) {
-			print_cpu_msr(&cpu_data(cpu));
-			pr_debug("CPU%d: has booted.\n", cpu);
-		} else {
-			boot_error = 1;
-			if (*trampoline_status == 0xA5A5A5A5)
-				/* trampoline started but...? */
-				pr_err("CPU%d: Stuck ??\n", cpu);
-			else
-				/* trampoline code not run */
-				pr_err("CPU%d: Not responding\n", cpu);
-			if (apic->inquire_remote_apic)
-				apic->inquire_remote_apic(apicid);
-		}
-	}
-
-	if (boot_error) {
-		/* Try to put things back the way they were before ... */
-		numa_remove_cpu(cpu); /* was set by numa_add_cpu */
-
-		/* was set by do_boot_cpu() */
-		cpumask_clear_cpu(cpu, cpu_callout_mask);
-
-		/* was set by cpu_init() */
-		cpumask_clear_cpu(cpu, cpu_initialized_mask);
 	}
 
 	/* mark "stuck" area as not stuck */
-- 
1.7.1



* Re: [PATCH v4 3/5] acpi_processor: do not mark present at boot but not onlined CPU as onlined
  2014-04-14 15:11 ` [PATCH v4 3/5] acpi_processor: do not mark present at boot but not onlined CPU as onlined Igor Mammedov
@ 2014-04-15  5:48   ` Rafael J. Wysocki
  2014-04-15  6:00     ` Igor Mammedov
  2014-04-15  6:04     ` Ingo Molnar
  2014-04-15  5:53   ` Rafael J. Wysocki
  2014-04-30 21:25   ` Toshi Kani
  2 siblings, 2 replies; 20+ messages in thread
From: Rafael J. Wysocki @ 2014-04-15  5:48 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: linux-kernel, tglx, mingo, hpa, x86, bp, paul.gortmaker,
	JBeulich, prarit, drjones, toshi.kani, riel, gong.chen, andi,
	lenb, linux-acpi

On Monday, April 14, 2014 05:11:15 PM Igor Mammedov wrote:
> acpi_processor_add() assumes that present at boot CPUs
> are always onlined, it is not so if a CPU failed to become
> onlined. As result acpi_processor_add() will mark such CPU
> device as onlined in sysfs and following attempts to
> online/offline it using /sys/device/system/cpu/cpuX/online
> attribute will fail.
> 
> Do not poke into device internals in acpi_processor_add()
> and touch "struct device { .offline }" attribute, since
> for CPUs onlined at boot it's set by:
>   topology_init() -> arch_register_cpu() -> register_cpu()
> before ACPI device tree is parsed, and for hotplugged
> CPUs it's set when userspace onlines CPU via sysfs.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> v2:
>  - fix regression in v1 leading to NULL pointer dereference
>    on CPU unplug, do not remove "pr->dev = dev;"

Yeah.

Does this patch depend on any other patches in the series?

I don't think so, but just asking.

If it doesn't, why is it part of this series at all?

> ---
>  drivers/acpi/acpi_processor.c |    1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index c29c2c3..42d66f8 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -404,7 +404,6 @@ static int acpi_processor_add(struct acpi_device *device,
>  		goto err;
>  
>  	pr->dev = dev;
> -	dev->offline = pr->flags.need_hotplug_init;
>  
>  	/* Trigger the processor driver's .probe() if present. */
>  	if (device_attach(dev) >= 0)
> 

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [PATCH v4 3/5] acpi_processor: do not mark present at boot but not onlined CPU as onlined
  2014-04-14 15:11 ` [PATCH v4 3/5] acpi_processor: do not mark present at boot but not onlined CPU as onlined Igor Mammedov
  2014-04-15  5:48   ` Rafael J. Wysocki
@ 2014-04-15  5:53   ` Rafael J. Wysocki
  2014-04-30 21:25   ` Toshi Kani
  2 siblings, 0 replies; 20+ messages in thread
From: Rafael J. Wysocki @ 2014-04-15  5:53 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: linux-kernel, tglx, mingo, hpa, x86, bp, paul.gortmaker,
	JBeulich, prarit, drjones, toshi.kani, riel, gong.chen, andi,
	lenb, linux-acpi

On Monday, April 14, 2014 05:11:15 PM Igor Mammedov wrote:
> acpi_processor_add() assumes that present at boot CPUs
> are always onlined, it is not so if a CPU failed to become
> onlined. As result acpi_processor_add() will mark such CPU
> device as onlined in sysfs and following attempts to
> online/offline it using /sys/device/system/cpu/cpuX/online
> attribute will fail.
> 
> Do not poke into device internals in acpi_processor_add()
> and touch "struct device { .offline }" attribute, since
> for CPUs onlined at boot it's set by:
>   topology_init() -> arch_register_cpu() -> register_cpu()
> before ACPI device tree is parsed, and for hotplugged
> CPUs it's set when userspace onlines CPU via sysfs.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> v2:
>  - fix regression in v1 leading to NULL pointer dereference
>    on CPU unplug, do not remove "pr->dev = dev;"
> ---
>  drivers/acpi/acpi_processor.c |    1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index c29c2c3..42d66f8 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -404,7 +404,6 @@ static int acpi_processor_add(struct acpi_device *device,
>  		goto err;
>  
>  	pr->dev = dev;
> -	dev->offline = pr->flags.need_hotplug_init;

This line is to ensure that dev->offline and pr->flags.need_hotplug_init are
consistent with each other.  If you remove it, you need to ensure that they
will be consistent in some other way.

>  
>  	/* Trigger the processor driver's .probe() if present. */
>  	if (device_attach(dev) >= 0)
> 

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [PATCH v4 3/5] acpi_processor: do not mark present at boot but not onlined CPU as onlined
  2014-04-15  5:48   ` Rafael J. Wysocki
@ 2014-04-15  6:00     ` Igor Mammedov
  2014-04-15  6:04     ` Ingo Molnar
  1 sibling, 0 replies; 20+ messages in thread
From: Igor Mammedov @ 2014-04-15  6:00 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-kernel, tglx, mingo, hpa, x86, bp, paul.gortmaker,
	JBeulich, prarit, drjones, toshi.kani, riel, gong.chen, andi,
	lenb, linux-acpi

On Tue, 15 Apr 2014 07:48:30 +0200
"Rafael J. Wysocki" <rjw@rjwysocki.net> wrote:

> On Monday, April 14, 2014 05:11:15 PM Igor Mammedov wrote:
> > acpi_processor_add() assumes that present at boot CPUs
> > are always onlined, it is not so if a CPU failed to become
> > onlined. As result acpi_processor_add() will mark such CPU
> > device as onlined in sysfs and following attempts to
> > online/offline it using /sys/device/system/cpu/cpuX/online
> > attribute will fail.
> > 
> > Do not poke into device internals in acpi_processor_add()
> > and touch "struct device { .offline }" attribute, since
> > for CPUs onlined at boot it's set by:
> >   topology_init() -> arch_register_cpu() -> register_cpu()
> > before ACPI device tree is parsed, and for hotplugged
> > CPUs it's set when userspace onlines CPU via sysfs.
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> > v2:
> >  - fix regression in v1 leading to NULL pointer dereference
> >    on CPU unplug, do not remove "pr->dev = dev;"
> 
> Yeah.
> 
> Does this patch depend on any other patches in the series?
> 
> I don't think so, but just asking.
> 
> If it doesn't, why is it part of this series at all?
It doesn't depend on any other patches in here; it was just
convenient to post it as part of the fixes found in the CPU hotplug
code and nothing more.

> 
> > ---
> >  drivers/acpi/acpi_processor.c |    1 -
> >  1 files changed, 0 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > index c29c2c3..42d66f8 100644
> > --- a/drivers/acpi/acpi_processor.c
> > +++ b/drivers/acpi/acpi_processor.c
> > @@ -404,7 +404,6 @@ static int acpi_processor_add(struct acpi_device *device,
> >  		goto err;
> >  
> >  	pr->dev = dev;
> > -	dev->offline = pr->flags.need_hotplug_init;
> >  
> >  	/* Trigger the processor driver's .probe() if present. */
> >  	if (device_attach(dev) >= 0)
> > 
> 



* Re: [PATCH v4 3/5] acpi_processor: do not mark present at boot but not onlined CPU as onlined
  2014-04-15  5:48   ` Rafael J. Wysocki
  2014-04-15  6:00     ` Igor Mammedov
@ 2014-04-15  6:04     ` Ingo Molnar
  2014-04-15 15:48       ` Rafael J. Wysocki
  1 sibling, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2014-04-15  6:04 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Igor Mammedov, linux-kernel, tglx, mingo, hpa, x86, bp,
	paul.gortmaker, JBeulich, prarit, drjones, toshi.kani, riel,
	gong.chen, andi, lenb, linux-acpi


* Rafael J. Wysocki <rjw@rjwysocki.net> wrote:

> On Monday, April 14, 2014 05:11:15 PM Igor Mammedov wrote:
> > acpi_processor_add() assumes that present at boot CPUs
> > are always onlined, it is not so if a CPU failed to become
> > onlined. As result acpi_processor_add() will mark such CPU
> > device as onlined in sysfs and following attempts to
> > online/offline it using /sys/device/system/cpu/cpuX/online
> > attribute will fail.
> > 
> > Do not poke into device internals in acpi_processor_add()
> > and touch "struct device { .offline }" attribute, since
> > for CPUs onlined at boot it's set by:
> >   topology_init() -> arch_register_cpu() -> register_cpu()
> > before ACPI device tree is parsed, and for hotplugged
> > CPUs it's set when userspace onlines CPU via sysfs.
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> > v2:
> >  - fix regression in v1 leading to NULL pointer dereference
> >    on CPU unplug, do not remove "pr->dev = dev;"
> 
> Yeah.
> 
> Does this patch depend on any other patches in the series?
> 
> I don't think so, but just asking.
> 
> If it doesn't, why is it part of this series at all?

I suspect because Igor was rigorously stress-testing CPU hotplug, and 
was fixing all the bugs he saw, before adding the one feature he is 
interested in.

The feature cannot be guaranteed to be correct, without having a 
stable base to work on.

As such this series makes sense, as long as the fixes precede the 
feature, and as long as the fixes are correct.

Consider it work in progress, with you being one of the reviewers who 
makes sure the fixes are correct.

Thanks,

	Ingo


* Re: [PATCH v4 3/5] acpi_processor: do not mark present at boot but not onlined CPU as onlined
  2014-04-15  6:04     ` Ingo Molnar
@ 2014-04-15 15:48       ` Rafael J. Wysocki
  0 siblings, 0 replies; 20+ messages in thread
From: Rafael J. Wysocki @ 2014-04-15 15:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Igor Mammedov, linux-kernel, tglx, mingo, hpa, x86, bp,
	paul.gortmaker, JBeulich, prarit, drjones, toshi.kani, riel,
	gong.chen, andi, lenb, linux-acpi

On Tuesday, April 15, 2014 08:04:11 AM Ingo Molnar wrote:
> 
> * Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> 
> > On Monday, April 14, 2014 05:11:15 PM Igor Mammedov wrote:
> > > acpi_processor_add() assumes that present at boot CPUs
> > > are always onlined, it is not so if a CPU failed to become
> > > onlined. As result acpi_processor_add() will mark such CPU
> > > device as onlined in sysfs and following attempts to
> > > online/offline it using /sys/device/system/cpu/cpuX/online
> > > attribute will fail.
> > > 
> > > Do not poke into device internals in acpi_processor_add()
> > > and touch "struct device { .offline }" attribute, since
> > > for CPUs onlined at boot it's set by:
> > >   topology_init() -> arch_register_cpu() -> register_cpu()
> > > before ACPI device tree is parsed, and for hotplugged
> > > CPUs it's set when userspace onlines CPU via sysfs.
> > > 
> > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > ---
> > > v2:
> > >  - fix regression in v1 leading to NULL pointer dereference
> > >    on CPU unplug, do not remove "pr->dev = dev;"
> > 
> > Yeah.
> > 
> > Does this patch depend on any other patches in the series?
> > 
> > I don't think so, but just asking.
> > 
> > If it doesn't, why is it part of this series at all?
> 
> I suspect because Igor was rigorously stress-testing CPU hotplug, and 
> was fixing all the bugs he saw, before adding the one feature he is 
> interested in.
> 
> The feature cannot be guaranteed to be correct, without having a 
> stable base to work on.
> 
> As such this series makes sense, as long as the fixes precede the 
> feature, and as long as the fixes are correct.
> 
> Consider it work in progress, with you being one of the reviewers who 
> makes sure the fixes are correct.

Fair enough.

So perhaps the subject of the whole series should be changed, because x86 is
not the only architecture affected by this particular patch?

Rafael


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 1/5] x86: fix list corruption on CPU hotplug
  2014-04-14 15:11 ` [PATCH v4 1/5] x86: fix list corruption on CPU hotplug Igor Mammedov
@ 2014-04-30 21:18   ` Toshi Kani
  0 siblings, 0 replies; 20+ messages in thread
From: Toshi Kani @ 2014-04-30 21:18 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: linux-kernel, tglx, mingo, hpa, x86, bp, paul.gortmaker,
	JBeulich, prarit, drjones, riel, gong.chen, andi, lenb, rjw,
	linux-acpi

On Mon, 2014-04-14 at 17:11 +0200, Igor Mammedov wrote:
> Currently, if AP wakeup fails, the master CPU marks the AP as not present
> in do_boot_cpu() by calling set_cpu_present(cpu, false).
> That leads to the following list corruption on the next physical CPU
> hotplug:
> 
> [  418.107336] WARNING: CPU: 1 PID: 45 at lib/list_debug.c:33 __list_add+0xbe/0xd0()
> [  418.115268] list_add corruption. prev->next should be next (ffff88003dc57600), but was ffff88003e20c3a0. (prev=ffff88003e20c3a0).
> [  418.123693] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT ipt_REJECT cfg80211 xt_conntrack rfkill ee
> [  418.138979] CPU: 1 PID: 45 Comm: kworker/u10:1 Not tainted 3.14.0-rc6+ #387
> [  418.149989] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
> [  418.165750] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> [  418.166433]  0000000000000021 ffff880038ca7988 ffffffff8159b22d 0000000000000021
> [  418.176460]  ffff880038ca79d8 ffff880038ca79c8 ffffffff8106942c ffff880038ca79e8
> [  418.177453]  ffff88003e20c3a0 ffff88003dc57600 ffff88003e20c3a0 00000000ffffffea
> [  418.178445] Call Trace:
> [  418.185811]  [<ffffffff8159b22d>] dump_stack+0x49/0x5c
> [  418.186440]  [<ffffffff8106942c>] warn_slowpath_common+0x8c/0xc0
> [  418.187192]  [<ffffffff81069516>] warn_slowpath_fmt+0x46/0x50
> [  418.191231]  [<ffffffff8136ef51>] ? acpi_ns_get_node+0xb7/0xc7
> [  418.193889]  [<ffffffff812f796e>] __list_add+0xbe/0xd0
> [  418.196649]  [<ffffffff812e2aa9>] kobject_add_internal+0x79/0x200
> [  418.208610]  [<ffffffff812e2e18>] kobject_add_varg+0x38/0x60
> [  418.213831]  [<ffffffff812e2ef4>] kobject_add+0x44/0x70
> [  418.229961]  [<ffffffff813e2c60>] device_add+0xd0/0x550
> [  418.234991]  [<ffffffff813f0e95>] ? pm_runtime_init+0xe5/0xf0
> [  418.250226]  [<ffffffff813e32be>] device_register+0x1e/0x30
> [  418.255296]  [<ffffffff813e82a3>] register_cpu+0xe3/0x130
> [  418.266539]  [<ffffffff81592be5>] arch_register_cpu+0x65/0x150
> [  418.285845]  [<ffffffff81355c0d>] acpi_processor_hotadd_init+0x5a/0x9b
> ...
> Which is caused by the fact that generic_processor_info() allocates
> logical CPU id by calling:
> 
>  cpu = cpumask_next_zero(-1, cpu_present_mask);
> 
> which returns id of previously failed to wake up CPU, since its bit
> is cleared by do_boot_cpu() and as result register_cpu() tries to
> register another CPU with the same id as already present but failed
> to be onlined CPU.
> 
> Taking into account that the AP will not do anything if the master CPU
> failed to wake it up, there is no reason to mark that AP as not present
> and break subsequent CPU hotplug attempts. As a side effect of not marking
> the AP as not present, the user is allowed to online it again later.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
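
The id reuse described in the quoted changelog can be illustrated with a
small user-space sketch. It is only a model under stated assumptions: a
32-bit word stands in for cpu_present_mask, and first_zero_bit() and
alloc_cpu_id() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8 /* illustrative size, not the kernel's NR_CPUS */

/*
 * Index of the first clear bit, or NR_CPUS if every bit is set.
 * Stand-in for cpumask_next_zero(-1, cpu_present_mask).
 */
static int first_zero_bit(uint32_t mask)
{
	for (int i = 0; i < NR_CPUS; i++)
		if (!(mask & (UINT32_C(1) << i)))
			return i;
	return NR_CPUS;
}

/* Hypothetical helper: pick a logical id for a hot-added CPU and mark it present. */
static int alloc_cpu_id(uint32_t *present)
{
	int id = first_zero_bit(*present);

	assert(id < NR_CPUS);
	*present |= UINT32_C(1) << id;
	return id;
}
```

With CPUs 0-2 present and bit 2 cleared after a failed wakeup,
alloc_cpu_id() hands out id 2 again, which is the id the earlier,
still-registered CPU device already uses. If the bit is left set, the
next hot-added CPU gets id 3 instead.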

Hi Igor,

Sorry for the long delay...  Can you please combine patches 1/5 and 2/5?  When
a CPU is marked as present, its APIC ID must be valid, so it does not
make sense to keep patches 1/5 and 2/5 separate.  With that change:

Acked-by: Toshi Kani <toshi.kani@hp.com>

Thanks,
-Toshi




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 3/5] acpi_processor: do not mark present at boot but not onlined CPU as onlined
  2014-04-14 15:11 ` [PATCH v4 3/5] acpi_processor: do not mark present at boot but not onlined CPU as onlined Igor Mammedov
  2014-04-15  5:48   ` Rafael J. Wysocki
  2014-04-15  5:53   ` Rafael J. Wysocki
@ 2014-04-30 21:25   ` Toshi Kani
  2014-05-02 11:32     ` Igor Mammedov
  2 siblings, 1 reply; 20+ messages in thread
From: Toshi Kani @ 2014-04-30 21:25 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: linux-kernel, tglx, mingo, hpa, x86, bp, paul.gortmaker,
	JBeulich, prarit, drjones, riel, gong.chen, andi, lenb, rjw,
	linux-acpi

On Mon, 2014-04-14 at 17:11 +0200, Igor Mammedov wrote:
> acpi_processor_add() assumes that present at boot CPUs
> are always onlined, it is not so if a CPU failed to become
> onlined. As result acpi_processor_add() will mark such CPU
> device as onlined in sysfs and following attempts to
> online/offline it using /sys/devices/system/cpu/cpuX/online
> attribute will fail.
> 
> Do not poke into device internals in acpi_processor_add()
> and touch "struct device { .offline }" attribute, since
> for CPUs onlined at boot it's set by:
>   topology_init() -> arch_register_cpu() -> register_cpu()
> before ACPI device tree is parsed, and for hotplugged
> CPUs it's set when userspace onlines CPU via sysfs.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> v2:
>  - fix regression in v1 leading to NULL pointer dereference
>    on CPU unplug, do not remove "pr->dev = dev;"
> ---
>  drivers/acpi/acpi_processor.c |    1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index c29c2c3..42d66f8 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -404,7 +404,6 @@ static int acpi_processor_add(struct acpi_device *device,
>  		goto err;
>  
>  	pr->dev = dev;
> -	dev->offline = pr->flags.need_hotplug_init;

IIRC, this line was necessary to handle the case when maxcpus=X is
specified at boot.  In this case, dev->offline of the excess CPUs needs to
be set to 1.  Can you verify this?

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 4/5] x86: log error on secondary CPU wakeup failure at ERR level
  2014-04-14 15:11 ` [PATCH v4 4/5] x86: log error on secondary CPU wakeup failure at ERR level Igor Mammedov
@ 2014-04-30 21:30   ` Toshi Kani
  0 siblings, 0 replies; 20+ messages in thread
From: Toshi Kani @ 2014-04-30 21:30 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: linux-kernel, tglx, mingo, hpa, x86, bp, paul.gortmaker,
	JBeulich, prarit, drjones, riel, gong.chen, andi, lenb, rjw,
	linux-acpi

On Mon, 2014-04-14 at 17:11 +0200, Igor Mammedov wrote:
> If system is running without debug level logging,
> it will not log error if do_boot_cpu() failed to
> wakeup AP. It may lead to silent AP bringup
> failures at boot time.
> Change message level to KERN_ERR to make error
> visible to user as it's done on other architectures.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  arch/x86/kernel/smpboot.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 2988f69..ae2fd97 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -918,7 +918,7 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
>  
>  	err = do_boot_cpu(apicid, cpu, tidle);
>  	if (err) {
> -		pr_debug("do_boot_cpu failed %d\n", err);
> +		pr_err("do_boot_cpu failed(%d) to wakeup CPU#%u\n", err, cpu);
>  		return -EIO;
>  	}
>  

Looks good.

Acked-by: Toshi Kani <toshi.kani@hp.com>

I will review patch 5/5 later (probably tomorrow).
Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 5/5] x86: initialize secondary CPU only if master CPU will wait for it
  2014-04-14 15:11 ` [PATCH v4 5/5] x86: initialize secondary CPU only if master CPU will wait for it Igor Mammedov
@ 2014-05-01 23:11   ` Toshi Kani
  2014-05-02  8:21     ` Igor Mammedov
  0 siblings, 1 reply; 20+ messages in thread
From: Toshi Kani @ 2014-05-01 23:11 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: linux-kernel, tglx, mingo, hpa, x86, bp, paul.gortmaker,
	JBeulich, prarit, drjones, riel, gong.chen, andi, lenb, rjw,
	linux-acpi

On Mon, 2014-04-14 at 17:11 +0200, Igor Mammedov wrote:
> Hang is observed on virtual machines during CPU hotplug,
> especially in big guests with many CPUs. (It is reproducible
> more often if host is over-committed).
> 
> It happens because master CPU gives up waiting on
> secondary CPU and allows it to run wild. As result
> AP causes locking or crashing system. For example
> as described here: https://lkml.org/lkml/2014/3/6/257
> 
> If master CPU have sent STARTUP IPI successfully,
> and AP signalled to master CPU that it's ready
> to start initialization, make master CPU wait
> indefinitely till AP is onlined.
> To ensure that AP won't ever run wild, make it
> wait at early startup till master CPU confirms its
> intention to wait for AP.

Please also add a description stating that the master CPU times out when an AP
does not start initialization within 10 seconds.

> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> v2:
>  - amend comment in cpu_init()
> v3:
>  - leave timeouts in do_boot_cpu(), so that master CPU
>    won't hang if AP doesn't respond, use cpu_initialized_mask
>    as a way for AP to signal to master CPU that it's ready
>    to start initialization.
> v4:
>  - move common code in cpu_init() for x32/x64 in shared
>    helper function wait_for_master_cpu()
>  - add WARN_ON(cpumask_test_and_set_cpu(cpu, cpu_initialized_mask))
>    to wait_for_master_cpu()
> ---
>  arch/x86/kernel/cpu/common.c |   27 +++++++-----
>  arch/x86/kernel/smpboot.c    |   98 +++++++++++++-----------------------------
>  2 files changed, 46 insertions(+), 79 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index a135239..a4bcbac 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1221,6 +1221,17 @@ static void dbg_restore_debug_regs(void)
>  #define dbg_restore_debug_regs()
>  #endif /* ! CONFIG_KGDB */
>  
> +static void wait_for_master_cpu(int cpu)
> +{
> +	/*
> +	 * wait for ACK from master CPU before continuing
> +	 * with AP initialization
> +	 */
> +	WARN_ON(cpumask_test_and_set_cpu(cpu, cpu_initialized_mask));
> +	while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> +		cpu_relax();
> +}
> +
>  /*
>   * cpu_init() initializes state that is per-CPU. Some data is already
>   * initialized (naturally) in the bootstrap process, such as the GDT
> @@ -1236,16 +1247,17 @@ void cpu_init(void)
>  	struct task_struct *me;
>  	struct tss_struct *t;
>  	unsigned long v;
> -	int cpu;
> +	int cpu = stack_smp_processor_id();
>  	int i;
>  
> +	wait_for_master_cpu(cpu);
> +
>  	/*
>  	 * Load microcode on this cpu if a valid microcode is available.
>  	 * This is early microcode loading procedure.
>  	 */
>  	load_ucode_ap();
>  
> -	cpu = stack_smp_processor_id();
>  	t = &per_cpu(init_tss, cpu);
>  	oist = &per_cpu(orig_ist, cpu);
>  
> @@ -1257,9 +1269,6 @@ void cpu_init(void)
>  
>  	me = current;
>  
> -	if (cpumask_test_and_set_cpu(cpu, cpu_initialized_mask))
> -		panic("CPU#%d already initialized!\n", cpu);
> -
>  	pr_debug("Initializing CPU#%d\n", cpu);
>  
>  	clear_in_cr4(X86_CR4_VME|X86_CR4_PVI|X86_CR4_TSD|X86_CR4_DE);
> @@ -1336,13 +1345,9 @@ void cpu_init(void)
>  	struct tss_struct *t = &per_cpu(init_tss, cpu);
>  	struct thread_struct *thread = &curr->thread;
>  
> -	show_ucode_info_early();
> +	wait_for_master_cpu(cpu);
>  
> -	if (cpumask_test_and_set_cpu(cpu, cpu_initialized_mask)) {
> -		printk(KERN_WARNING "CPU#%d already initialized!\n", cpu);
> -		for (;;)
> -			local_irq_enable();
> -	}
> +	show_ucode_info_early();
>  
>  	printk(KERN_INFO "Initializing CPU#%d\n", cpu);
>  
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index ae2fd97..44903ad 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -111,7 +111,6 @@ atomic_t init_deasserted;
>  static void smp_callin(void)
>  {
>  	int cpuid, phys_id;
> -	unsigned long timeout;
>  
>  	/*
>  	 * If waken up by an INIT in an 82489DX configuration
> @@ -130,37 +129,6 @@ static void smp_callin(void)
>  	 * (This works even if the APIC is not enabled.)
>  	 */
>  	phys_id = read_apic_id();
> -	if (cpumask_test_cpu(cpuid, cpu_callin_mask)) {
> -		panic("%s: phys CPU#%d, CPU#%d already present??\n", __func__,
> -					phys_id, cpuid);
> -	}
> -	pr_debug("CPU#%d (phys ID: %d) waiting for CALLOUT\n", cpuid, phys_id);
> -
> -	/*
> -	 * STARTUP IPIs are fragile beasts as they might sometimes
> -	 * trigger some glue motherboard logic. Complete APIC bus
> -	 * silence for 1 second, this overestimates the time the
> -	 * boot CPU is spending to send the up to 2 STARTUP IPIs
> -	 * by a factor of two. This should be enough.
> -	 */
> -
> -	/*
> -	 * Waiting 2s total for startup (udelay is not yet working)
> -	 */
> -	timeout = jiffies + 2*HZ;
> -	while (time_before(jiffies, timeout)) {
> -		/*
> -		 * Has the boot CPU finished it's STARTUP sequence?
> -		 */
> -		if (cpumask_test_cpu(cpuid, cpu_callout_mask))
> -			break;
> -		cpu_relax();
> -	}
> -
> -	if (!time_before(jiffies, timeout)) {
> -		panic("%s: CPU%d started up but did not get a callout!\n",
> -		      __func__, cpuid);
> -	}
>  
>  	/*
>  	 * the boot CPU has finished the init stage and is spinning
> @@ -750,8 +718,8 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
>  	unsigned long start_ip = real_mode_header->trampoline_start;
>  
>  	unsigned long boot_error = 0;
> -	int timeout;
>  	int cpu0_nmi_registered = 0;
> +	unsigned long timeout;
>  
>  	/* Just in case we booted with a single CPU. */
>  	alternatives_enable_smp();
> @@ -799,6 +767,14 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
>  	}
>  
>  	/*
> +	 * AP might wait on cpu_callout_mask in cpu_init() with
> +	 * cpu_initialized_mask set if previous attempt to online
> +	 * it timed-out. Clear cpu_initialized_mask so that after
> +	 * INIT/SIPI it could start with a clean state.
> +	 */
> +	cpumask_clear_cpu(cpu, cpu_initialized_mask);

I think smp_mb() should be added here to ensure that the target AP sees
this change.
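
As a rough user-space analogy of the ordering being asked for, the
sketch below uses C11 atomics instead of the kernel's cpumask helpers
and smp_mb(). The struct and function names are hypothetical, and the
second store merely stands in for sending INIT/SIPI.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for one CPU's rendezvous state. */
struct bringup {
	atomic_bool ap_initialized; /* models the CPU's bit in cpu_initialized_mask */
	atomic_bool wakeup_sent;    /* models "INIT/SIPI has been sent" */
};

static void master_restart_ap(struct bringup *b)
{
	assert(b);

	/* Clear the stale flag left behind by a timed-out attempt. */
	atomic_store_explicit(&b->ap_initialized, false, memory_order_relaxed);

	/* Full fence: orders the clear before the wakeup on the master side. */
	atomic_thread_fence(memory_order_seq_cst);

	/* Stand-in for waking the AP. */
	atomic_store_explicit(&b->wakeup_sent, true, memory_order_relaxed);
}
```

Note that the fence only orders the two stores as issued by the master;
in this model the AP side would still need a matching acquire or fence
before it can rely on seeing the cleared flag once it observes the
wakeup.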

> +
> +	/*
>  	 * Wake up a CPU in difference cases:
>  	 * - Use the method in the APIC driver if it's defined
>  	 * Otherwise,
> @@ -810,55 +786,41 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
>  		boot_error = wakeup_cpu_via_init_nmi(cpu, start_ip, apicid,
>  						     &cpu0_nmi_registered);
>  
> +
>  	if (!boot_error) {
>  		/*
> -		 * allow APs to start initializing.
> +		 * Wait 10s total for a response from AP
>  		 */
> -		pr_debug("Before Callout %d\n", cpu);
> -		cpumask_set_cpu(cpu, cpu_callout_mask);
> -		pr_debug("After Callout %d\n", cpu);
> +		boot_error = -1;
> +		timeout = jiffies + 10*HZ;
> +		while (time_before(jiffies, timeout)) {
> +			if (cpumask_test_cpu(cpu, cpu_initialized_mask)) {
> +				/*
> +				 * Tell AP to proceed with initialization
> +				 */
> +				cpumask_set_cpu(cpu, cpu_callout_mask);
> +				boot_error = 0;
> +				break;
> +			}
> +			udelay(100);
> +			schedule();
> +		}
> +	}

When 10s have passed, the master could set a new flag, e.g.
cpu_callout_error_mask, which wait_for_master_cpu() checks, calling
play_dead() when it is set.  This keeps the AP from spinning forever when
10s turns out not to be long enough.  It does not have to be part of this
patchset, though.
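
The suggestion above could be sketched roughly as follows. This is a
single-threaded model of the handshake, not kernel code: per-CPU
booleans replace the cpumasks, a tick counter replaces jiffies, and
struct ap_state, ap_poll() and master_wait() are hypothetical names
(callout_error models the proposed cpu_callout_error_mask).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU rendezvous flags. */
struct ap_state {
	bool initialized;   /* set by the AP: "I reached cpu_init()" */
	bool callout;       /* set by the master: "go ahead" */
	bool callout_error; /* set by the master on timeout (proposed new flag) */
};

enum ap_action { AP_KEEP_WAITING, AP_PROCEED, AP_PLAY_DEAD };

/* One iteration of the AP's wait loop, as wait_for_master_cpu() might do it. */
static enum ap_action ap_poll(const struct ap_state *s)
{
	if (s->callout_error)
		return AP_PLAY_DEAD;   /* the master gave up on us */
	if (s->callout)
		return AP_PROCEED;     /* the master is waiting for us */
	return AP_KEEP_WAITING;
}

/*
 * Master side: poll for at most max_ticks. ap_ready_tick is the tick at
 * which the simulated AP sets "initialized" (negative means never).
 * Returns 0 on success, -1 on timeout.
 */
static int master_wait(struct ap_state *s, int max_ticks, int ap_ready_tick)
{
	assert(max_ticks > 0);

	for (int tick = 0; tick < max_ticks; tick++) {
		if (ap_ready_tick >= 0 && tick >= ap_ready_tick)
			s->initialized = true; /* simulated AP signal */
		if (s->initialized) {
			s->callout = true;     /* tell the AP to proceed */
			return 0;
		}
	}
	s->callout_error = true; /* timed out: tell a late AP to park itself */
	return -1;
}
```

In this model an AP that shows up after the timeout sees callout_error
and parks itself instead of spinning on the callout flag indefinitely.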

> +	if (!boot_error) {
>  		/*
> -		 * Wait 5s total for a response
> +		 * Wait till AP completes initial initialization

We should generally avoid such a wait without a timeout condition, but since
native_cpu_up() waits till cpu_online(cpu) anyway after this point, this
seems OK...  I wonder if we need touch_nmi_watchdog(), though.

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 5/5] x86: initialize secondary CPU only if master CPU will wait for it
  2014-05-01 23:11   ` Toshi Kani
@ 2014-05-02  8:21     ` Igor Mammedov
  2014-05-02 14:52       ` Toshi Kani
  0 siblings, 1 reply; 20+ messages in thread
From: Igor Mammedov @ 2014-05-02  8:21 UTC (permalink / raw)
  To: Toshi Kani
  Cc: linux-kernel, tglx, mingo, hpa, x86, bp, paul.gortmaker,
	JBeulich, prarit, drjones, riel, gong.chen, andi, lenb, rjw,
	linux-acpi

On Thu, 01 May 2014 17:11:56 -0600
Toshi Kani <toshi.kani@hp.com> wrote:

> On Mon, 2014-04-14 at 17:11 +0200, Igor Mammedov wrote:
> > Hang is observed on virtual machines during CPU hotplug,
> > especially in big guests with many CPUs. (It is reproducible
> > more often if host is over-committed).
> > 
> > It happens because master CPU gives up waiting on
> > secondary CPU and allows it to run wild. As result
> > AP causes locking or crashing system. For example
> > as described here: https://lkml.org/lkml/2014/3/6/257
> > 
> > If master CPU have sent STARTUP IPI successfully,
> > and AP signalled to master CPU that it's ready
> > to start initialization, make master CPU wait
> > indefinitely till AP is onlined.
> > To ensure that AP won't ever run wild, make it
> > wait at early startup till master CPU confirms its
> > intention to wait for AP.
> 
> Please also add a description stating that the master CPU times out when an AP
> does not start initialization within 10 seconds.
added

> 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> > v2:
> >  - amend comment in cpu_init()
> > v3:
> >  - leave timeouts in do_boot_cpu(), so that master CPU
> >    won't hang if AP doesn't respond, use cpu_initialized_mask
> >    as a way for AP to signal to master CPU that it's ready
> >    to start initialization.
> > v4:
> >  - move common code in cpu_init() for x32/x64 in shared
> >    helper function wait_for_master_cpu()
> >  - add WARN_ON(cpumask_test_and_set_cpu(cpu, cpu_initialized_mask))
> >    to wait_for_master_cpu()
> > ---
> >  arch/x86/kernel/cpu/common.c |   27 +++++++-----
> >  arch/x86/kernel/smpboot.c    |   98 +++++++++++++-----------------------------
> >  2 files changed, 46 insertions(+), 79 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> > index a135239..a4bcbac 100644
> > --- a/arch/x86/kernel/cpu/common.c
> > +++ b/arch/x86/kernel/cpu/common.c
> > @@ -1221,6 +1221,17 @@ static void dbg_restore_debug_regs(void)
> >  #define dbg_restore_debug_regs()
> >  #endif /* ! CONFIG_KGDB */
> >  
> > +static void wait_for_master_cpu(int cpu)
> > +{
> > +	/*
> > +	 * wait for ACK from master CPU before continuing
> > +	 * with AP initialization
> > +	 */
> > +	WARN_ON(cpumask_test_and_set_cpu(cpu, cpu_initialized_mask));
> > +	while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> > +		cpu_relax();
> > +}
> > +
> >  /*
> >   * cpu_init() initializes state that is per-CPU. Some data is already
> >   * initialized (naturally) in the bootstrap process, such as the GDT
> > @@ -1236,16 +1247,17 @@ void cpu_init(void)
> >  	struct task_struct *me;
> >  	struct tss_struct *t;
> >  	unsigned long v;
> > -	int cpu;
> > +	int cpu = stack_smp_processor_id();
> >  	int i;
> >  
> > +	wait_for_master_cpu(cpu);
> > +
> >  	/*
> >  	 * Load microcode on this cpu if a valid microcode is available.
> >  	 * This is early microcode loading procedure.
> >  	 */
> >  	load_ucode_ap();
> >  
> > -	cpu = stack_smp_processor_id();
> >  	t = &per_cpu(init_tss, cpu);
> >  	oist = &per_cpu(orig_ist, cpu);
> >  
> > @@ -1257,9 +1269,6 @@ void cpu_init(void)
> >  
> >  	me = current;
> >  
> > -	if (cpumask_test_and_set_cpu(cpu, cpu_initialized_mask))
> > -		panic("CPU#%d already initialized!\n", cpu);
> > -
> >  	pr_debug("Initializing CPU#%d\n", cpu);
> >  
> >  	clear_in_cr4(X86_CR4_VME|X86_CR4_PVI|X86_CR4_TSD|X86_CR4_DE);
> > @@ -1336,13 +1345,9 @@ void cpu_init(void)
> >  	struct tss_struct *t = &per_cpu(init_tss, cpu);
> >  	struct thread_struct *thread = &curr->thread;
> >  
> > -	show_ucode_info_early();
> > +	wait_for_master_cpu(cpu);
> >  
> > -	if (cpumask_test_and_set_cpu(cpu, cpu_initialized_mask)) {
> > -		printk(KERN_WARNING "CPU#%d already initialized!\n", cpu);
> > -		for (;;)
> > -			local_irq_enable();
> > -	}
> > +	show_ucode_info_early();
> >  
> >  	printk(KERN_INFO "Initializing CPU#%d\n", cpu);
> >  
> > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > index ae2fd97..44903ad 100644
> > --- a/arch/x86/kernel/smpboot.c
> > +++ b/arch/x86/kernel/smpboot.c
> > @@ -111,7 +111,6 @@ atomic_t init_deasserted;
> >  static void smp_callin(void)
> >  {
> >  	int cpuid, phys_id;
> > -	unsigned long timeout;
> >  
> >  	/*
> >  	 * If waken up by an INIT in an 82489DX configuration
> > @@ -130,37 +129,6 @@ static void smp_callin(void)
> >  	 * (This works even if the APIC is not enabled.)
> >  	 */
> >  	phys_id = read_apic_id();
> > -	if (cpumask_test_cpu(cpuid, cpu_callin_mask)) {
> > -		panic("%s: phys CPU#%d, CPU#%d already present??\n", __func__,
> > -					phys_id, cpuid);
> > -	}
> > -	pr_debug("CPU#%d (phys ID: %d) waiting for CALLOUT\n", cpuid, phys_id);
> > -
> > -	/*
> > -	 * STARTUP IPIs are fragile beasts as they might sometimes
> > -	 * trigger some glue motherboard logic. Complete APIC bus
> > -	 * silence for 1 second, this overestimates the time the
> > -	 * boot CPU is spending to send the up to 2 STARTUP IPIs
> > -	 * by a factor of two. This should be enough.
> > -	 */
> > -
> > -	/*
> > -	 * Waiting 2s total for startup (udelay is not yet working)
> > -	 */
> > -	timeout = jiffies + 2*HZ;
> > -	while (time_before(jiffies, timeout)) {
> > -		/*
> > -		 * Has the boot CPU finished it's STARTUP sequence?
> > -		 */
> > -		if (cpumask_test_cpu(cpuid, cpu_callout_mask))
> > -			break;
> > -		cpu_relax();
> > -	}
> > -
> > -	if (!time_before(jiffies, timeout)) {
> > -		panic("%s: CPU%d started up but did not get a callout!\n",
> > -		      __func__, cpuid);
> > -	}
> >  
> >  	/*
> >  	 * the boot CPU has finished the init stage and is spinning
> > @@ -750,8 +718,8 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
> >  	unsigned long start_ip = real_mode_header->trampoline_start;
> >  
> >  	unsigned long boot_error = 0;
> > -	int timeout;
> >  	int cpu0_nmi_registered = 0;
> > +	unsigned long timeout;
> >  
> >  	/* Just in case we booted with a single CPU. */
> >  	alternatives_enable_smp();
> > @@ -799,6 +767,14 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
> >  	}
> >  
> >  	/*
> > +	 * AP might wait on cpu_callout_mask in cpu_init() with
> > +	 * cpu_initialized_mask set if previous attempt to online
> > +	 * it timed-out. Clear cpu_initialized_mask so that after
> > +	 * INIT/SIPI it could start with a clean state.
> > +	 */
> > +	cpumask_clear_cpu(cpu, cpu_initialized_mask);
> 
> I think smp_mb() should be added here to ensure that the target AP sees
> this change.
ok

> 
> > +
> > +	/*
> >  	 * Wake up a CPU in difference cases:
> >  	 * - Use the method in the APIC driver if it's defined
> >  	 * Otherwise,
> > @@ -810,55 +786,41 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
> >  		boot_error = wakeup_cpu_via_init_nmi(cpu, start_ip, apicid,
> >  						     &cpu0_nmi_registered);
> >  
> > +
> >  	if (!boot_error) {
> >  		/*
> > -		 * allow APs to start initializing.
> > +		 * Wait 10s total for a response from AP
> >  		 */
> > -		pr_debug("Before Callout %d\n", cpu);
> > -		cpumask_set_cpu(cpu, cpu_callout_mask);
> > -		pr_debug("After Callout %d\n", cpu);
> > +		boot_error = -1;
> > +		timeout = jiffies + 10*HZ;
> > +		while (time_before(jiffies, timeout)) {
> > +			if (cpumask_test_cpu(cpu, cpu_initialized_mask)) {
> > +				/*
> > +				 * Tell AP to proceed with initialization
> > +				 */
> > +				cpumask_set_cpu(cpu, cpu_callout_mask);
> > +				boot_error = 0;
> > +				break;
> > +			}
> > +			udelay(100);
> > +			schedule();
> > +		}
> > +	}
> 
> When 10s have passed, the master could set a new flag, e.g.
> cpu_callout_error_mask, which wait_for_master_cpu() checks, calling
> play_dead() when it is set.  This keeps the AP from spinning forever when
> 10s turns out not to be long enough.  It does not have to be part of this
> patchset, though.
I'm reluctant to add another mask to the already too many cpu_*_mask ones;
maybe we could reuse cpu_initialized_mask by clearing it on timeout.
This way an AP spinning on cpu_callout_mask could notice it and halt itself.

It would be better to make it a separate patch on top of this series,
to avoid delaying the bugfixes in this series.

> 
> > +	if (!boot_error) {
> >  		/*
> > -		 * Wait 5s total for a response
> > +		 * Wait till AP completes initial initialization
> 
> We should generally avoid such wait w/o a timeout condition, but since
> native_cpu_up() waits till cpu_online(cpu) anyway after this point, this
If we don't wait here and fall through into the tight loop waiting on
cpu_online(cpu) in native_cpu_up() or check_tsc_sync_source(), then the
stop_task for syncing MTRRs initiated from the AP won't have a chance
to run on the master CPU.

> seems OK...  I wonder if we need touch_nmi_watchdog(), though.
There wasn't any touch_nmi_watchdog() in the original code and I don't
think we need it here, since we are not just spinning on the CPU but giving
control back to the kernel by calling schedule(), which would allow
watchdog_task to do its job if needed.

> Thanks,
> -Toshi
> 
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 3/5] acpi_processor: do not mark present at boot but not onlined CPU as onlined
  2014-04-30 21:25   ` Toshi Kani
@ 2014-05-02 11:32     ` Igor Mammedov
  2014-05-02 17:23       ` Toshi Kani
  0 siblings, 1 reply; 20+ messages in thread
From: Igor Mammedov @ 2014-05-02 11:32 UTC (permalink / raw)
  To: Toshi Kani
  Cc: linux-kernel, tglx, mingo, hpa, x86, bp, paul.gortmaker,
	JBeulich, prarit, drjones, riel, gong.chen, andi, lenb, rjw,
	linux-acpi

On Wed, 30 Apr 2014 15:25:51 -0600
Toshi Kani <toshi.kani@hp.com> wrote:

> On Mon, 2014-04-14 at 17:11 +0200, Igor Mammedov wrote:
> > acpi_processor_add() assumes that present at boot CPUs
> > are always onlined, it is not so if a CPU failed to become
> > onlined. As result acpi_processor_add() will mark such CPU
> > device as onlined in sysfs and following attempts to
> > online/offline it using /sys/devices/system/cpu/cpuX/online
> > attribute will fail.
> > 
> > Do not poke into device internals in acpi_processor_add()
> > and touch "struct device { .offline }" attribute, since
> > for CPUs onlined at boot it's set by:
> >   topology_init() -> arch_register_cpu() -> register_cpu()
> > before ACPI device tree is parsed, and for hotplugged
> > CPUs it's set when userspace onlines CPU via sysfs.
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> > v2:
> >  - fix regression in v1 leading to NULL pointer dereference
> >    on CPU unplug, do not remove "pr->dev = dev;"
> > ---
> >  drivers/acpi/acpi_processor.c |    1 -
> >  1 files changed, 0 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > index c29c2c3..42d66f8 100644
> > --- a/drivers/acpi/acpi_processor.c
> > +++ b/drivers/acpi/acpi_processor.c
> > @@ -404,7 +404,6 @@ static int acpi_processor_add(struct acpi_device *device,
> >  		goto err;
> >  
> >  	pr->dev = dev;
> > -	dev->offline = pr->flags.need_hotplug_init;
> 
> IIRC, this line was necessary to handle the case when maxcpus=X is
> specified at boot.  In this case, dev->offline of the excess CPUs needs to
> be set to 1.  Can you verify this?
Option 'maxcpus' works just fine both with and without this patch, since a bit
earlier acpi_processor_add() exits in the case of extra present CPUs:

#ifdef CONFIG_SMP
        if (pr->id >= setup_max_cpus && pr->id != 0)
                return 0;
#endif

and execution doesn't get to the point the patch touches.

The point is that acpi_processor_add() shouldn't touch
dev->offline at all and should let register_cpu() handle it.

> 
> Thanks,
> -Toshi
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 5/5] x86: initialize secondary CPU only if master CPU will wait for it
  2014-05-02  8:21     ` Igor Mammedov
@ 2014-05-02 14:52       ` Toshi Kani
  2014-05-05 20:26         ` Igor Mammedov
  0 siblings, 1 reply; 20+ messages in thread
From: Toshi Kani @ 2014-05-02 14:52 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: linux-kernel, tglx, mingo, hpa, x86, bp, paul.gortmaker,
	JBeulich, prarit, drjones, riel, gong.chen, andi, lenb, rjw,
	linux-acpi

On Fri, 2014-05-02 at 10:21 +0200, Igor Mammedov wrote:
> On Thu, 01 May 2014 17:11:56 -0600
> Toshi Kani <toshi.kani@hp.com> wrote:
 :
> > When 10s have passed, the master could set a new flag, e.g.
> > cpu_callout_error_mask, which wait_for_master_cpu() checks, calling
> > play_dead() when it is set.  This keeps the AP from spinning forever when
> > 10s turns out not to be long enough.  It does not have to be part of this
> > patchset, though.
> I'm reluctant to add another mask to the already too many cpu_*_mask ones;
> maybe we could reuse cpu_initialized_mask by clearing it on timeout.
> This way an AP spinning on cpu_callout_mask could notice it and halt itself.

I agree that there are too many cpu_* masks.  IMHO, these cpu rendezvous
masks, initialized/callout/callin, should be combined into a per-cpu
flag.  There is not much point in keeping them as individual masks.

Anyway, I do not think cpu_initialized_mask can be reused here.

> It would be better to make it a separate patch on top of this series,
> to avoid delaying the bugfixes in this series.

Agreed.

> > 
> > > +	if (!boot_error) {
> > >  		/*
> > > -		 * Wait 5s total for a response
> > > +		 * Wait till AP completes initial initialization
> > 
> > We should generally avoid such wait w/o a timeout condition, but since
> > native_cpu_up() waits till cpu_online(cpu) anyway after this point, this
> If we don't wait here and fall through into the tight loop waiting on
> cpu_online(cpu) in native_cpu_up() or check_tsc_sync_source(), then the
> stop_task for syncing MTRRs initiated from the AP won't have a chance
> to run on the master CPU.
> 
> > seems OK...  I wonder if we need touch_nmi_watchdog(), though.
> There wasn't any touch_nmi_watchdog() in the original code and I don't
> think we need it here, since we are not just spinning on the CPU but giving
> control back to the kernel by calling schedule(), which would allow
> watchdog_task to do its job if needed.

Agreed.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 3/5] acpi_processor: do not mark present at boot but not onlined CPU as onlined
  2014-05-02 11:32     ` Igor Mammedov
@ 2014-05-02 17:23       ` Toshi Kani
  0 siblings, 0 replies; 20+ messages in thread
From: Toshi Kani @ 2014-05-02 17:23 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: linux-kernel, tglx, mingo, hpa, x86, bp, paul.gortmaker,
	JBeulich, prarit, drjones, riel, gong.chen, andi, lenb, rjw,
	linux-acpi

On Fri, 2014-05-02 at 13:32 +0200, Igor Mammedov wrote:
> On Wed, 30 Apr 2014 15:25:51 -0600
> Toshi Kani <toshi.kani@hp.com> wrote:
> 
> > On Mon, 2014-04-14 at 17:11 +0200, Igor Mammedov wrote:
> > > acpi_processor_add() assumes that present at boot CPUs
> > > are always onlined, it is not so if a CPU failed to become
> > > onlined. As result acpi_processor_add() will mark such CPU
> > > device as onlined in sysfs and following attempts to
> > > online/offline it using /sys/devices/system/cpu/cpuX/online
> > > attribute will fail.
> > > 
> > > Do not poke into device internals in acpi_processor_add()
> > > and touch "struct device { .offline }" attribute, since
> > > for CPUs onlined at boot it's set by:
> > >   topology_init() -> arch_register_cpu() -> register_cpu()
> > > before ACPI device tree is parsed, and for hotplugged
> > > CPUs it's set when userspace onlines CPU via sysfs.
> > > 
> > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > ---
> > > v2:
> > >  - fix regression in v1 leading to NULL pointer dereference
> > >    on CPU unplug, do not remove "pr->dev = dev;"
> > > ---
> > >  drivers/acpi/acpi_processor.c |    1 -
> > >  1 files changed, 0 insertions(+), 1 deletions(-)
> > > 
> > > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > > index c29c2c3..42d66f8 100644
> > > --- a/drivers/acpi/acpi_processor.c
> > > +++ b/drivers/acpi/acpi_processor.c
> > > @@ -404,7 +404,6 @@ static int acpi_processor_add(struct acpi_device *device,
> > >  		goto err;
> > >  
> > >  	pr->dev = dev;
> > > -	dev->offline = pr->flags.need_hotplug_init;
> > 
> > IIRC, this change was necessary to handle the case when maxcpus=X is
> > specified at boot.  In this case, excessive CPU's dev->offline needs to
> > be set to 1.  Can you verify this?
> Option 'maxcpus' works just fine without and with this patch since a bit
> earlier in acpi_processor_add() it exits in case of extra present CPUs:
> 
> #ifdef CONFIG_SMP
>         if (pr->id >= setup_max_cpus && pr->id != 0)
>                 return 0;
> #endif
> 
> and execution doesn't get to the point the patch touches.

This is a separate topic, but I feel that the above code should not be
necessary...

> The point is that acpi_processor_add() shouldn't touch
> dev->offline at all and allow register_cpu() handle it.

Sorry, I had confused it with cpu->dev.offline in register_cpu() in my
recollection.  Yes, I agree that we should let register_cpu() handle
it.
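
A minimal model of that ownership rule, assuming register_cpu() derives
the flag from the CPU's actual online state. The *_model types and
functions below are simplified stand-ins for illustration, not the
kernel's definitions:

```c
#include <stdbool.h>

/* Simplified stand-ins; only the fields discussed in this thread. */
struct device_model {
	bool offline;		/* what sysfs "online" reports, inverted */
};

struct cpu_model {
	struct device_model dev;
};

struct acpi_processor_model {
	struct device_model *dev;
};

/*
 * Models register_cpu() (assumption): the initial flag mirrors whether
 * the CPU really came online.
 */
void register_cpu_model(struct cpu_model *cpu, bool cpu_is_online)
{
	cpu->dev.offline = !cpu_is_online;
}

/*
 * Models acpi_processor_add() with the patch applied: it still records
 * the device pointer ("pr->dev = dev;") but no longer writes dev->offline.
 */
void acpi_processor_add_model(struct acpi_processor_model *pr,
			      struct cpu_model *cpu)
{
	pr->dev = &cpu->dev;
}
```

With this split, a CPU that was present at boot but failed to come
online keeps offline set after the ACPI processor device is added, so a
later online attempt via sysfs is not rejected as a no-op.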

Acked-by: Toshi Kani <toshi.kani@hp.com>

Thanks,
-Toshi

* Re: [PATCH v4 5/5] x86: initialize secondary CPU only if master CPU will wait for it
  2014-05-02 14:52       ` Toshi Kani
@ 2014-05-05 20:26         ` Igor Mammedov
  0 siblings, 0 replies; 20+ messages in thread
From: Igor Mammedov @ 2014-05-05 20:26 UTC (permalink / raw)
  To: Toshi Kani
  Cc: linux-kernel, tglx, mingo, hpa, x86, bp, paul.gortmaker,
	JBeulich, prarit, drjones, riel, gong.chen, andi, lenb, rjw,
	linux-acpi

On Fri, 02 May 2014 08:52:22 -0600
Toshi Kani <toshi.kani@hp.com> wrote:

> On Fri, 2014-05-02 at 10:21 +0200, Igor Mammedov wrote:
> > On Thu, 01 May 2014 17:11:56 -0600
> > Toshi Kani <toshi.kani@hp.com> wrote:
>  :
> > > When 10s passed, the master could set a new flag, ex.
> > > cpu_callout_error_mask, which wait_for_master_cpu() checks and call
> > > play_dead() when it is set.  This avoids AP to spin forever when 10s
> > > becomes not long enough.  But it does not have to be part of this
> > > patchset, though.
> > I'm reluctant to add another to already too many cpu_*_mask,
> > maybe we could reuse cpu_initialized_mask by clearing it on timeout.
> > This way AP spinning on cpu_callout_mask could notice it and halt itself.
> 
> I agree that there are too many cpu_* masks.  IMHO, these cpu rendezvous
> masks, initialized/callout/callin, should be combined into a per-cpu
> flag.  There is not much point in keeping them as individual masks.
> 
> Anyway, I do not think cpu_initialized_mask can be reused here.
I'll look into whether we could use a per-cpu flag here when writing the
patch to halt a timed-out AP.
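
As a rough illustration of the per-cpu idea (a userspace sketch only,
not kernel code: the state names, ap_states[] and the helper functions
are all hypothetical), the three rendezvous masks plus a timeout flag
could collapse into one atomic state per CPU. The master either lets the
AP proceed or, on timeout, marks it aborted so that a spinning AP can
halt itself:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 8	/* hypothetical, for the sketch only */

/* Hypothetical per-CPU bring-up states replacing the separate masks. */
enum ap_state {
	AP_OFFLINE,		/* not being brought up */
	AP_INITIALIZED,		/* AP reached early init, waits for master */
	AP_CALLOUT,		/* master allows the AP to continue */
	AP_CALLIN,		/* AP completed initial initialization */
	AP_ABORTED,		/* master gave up, AP must halt itself */
};

enum ap_wait_result { AP_WAIT, AP_GO, AP_HALT };

static _Atomic int ap_states[NR_CPUS_SKETCH];	/* zeroed: AP_OFFLINE */

static _Atomic int *slot(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	return &ap_states[cpu];
}

/* AP side: announce that early init was reached. */
void ap_mark_initialized(int cpu)
{
	atomic_store(slot(cpu), AP_INITIALIZED);
}

/* Master side: release the AP, but only if it is actually waiting. */
bool master_callout(int cpu)
{
	int expected = AP_INITIALIZED;

	return atomic_compare_exchange_strong(slot(cpu), &expected, AP_CALLOUT);
}

/* Master side, on timeout: abort unless the AP has already called in. */
bool master_abort(int cpu)
{
	int cur = atomic_load(slot(cpu));

	while (cur != AP_CALLIN) {
		/* On failure the CAS reloads cur, so the check is redone. */
		if (atomic_compare_exchange_weak(slot(cpu), &cur, AP_ABORTED))
			return true;
	}
	return false;
}

/* AP side: one poll step of the wait loop. */
enum ap_wait_result ap_poll_master(int cpu)
{
	switch (atomic_load(slot(cpu))) {
	case AP_CALLOUT:
		return AP_GO;
	case AP_ABORTED:
		return AP_HALT;
	default:
		return AP_WAIT;
	}
}

/* AP side: report completion; fails if the master aborted meanwhile. */
bool ap_callin(int cpu)
{
	int expected = AP_CALLOUT;

	return atomic_compare_exchange_strong(slot(cpu), &expected, AP_CALLIN);
}
```

In the kernel the AP side would presumably call play_dead() when the
poll reports AP_HALT; that wiring, and the memory-ordering details a
real implementation needs, are left out here.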

> 
> > It would be better to make it separate patch on top of this series,
> > to reduce delay of bugfixes in this series.
> 
> Agreed.
> 
> > > 
> > > > +	if (!boot_error) {
> > > >  		/*
> > > > -		 * Wait 5s total for a response
> > > > +		 * Wait till AP completes initial initialization
> > > 
> > > We should generally avoid such wait w/o a timeout condition, but since
> > > native_cpu_up() waits till cpu_online(cpu) anyway after this point, this
> > If we don't wait here and fall through into tight loop waiting on
> > cpu_online(cpu) in native_cpu_up() or check_tsc_sync_source() then
> > stop_task for syncing MTRRs initiated from AP won't have a chance
> > to run on the master CPU.
> > 
> > > seems OK...  I wonder if we need touch_nmi_watchdog(), though.
> > There wasn't any touch_nmi_watchdog() in the original code and I don't
> > think we need it here since we are not just spinning on CPU but giving
> > control back to kernel calling schedule(), which would allow watchdog_task
> > to do the job if needed.
> 
> Agreed.
> 
> Thanks,
> -Toshi
> 


-- 
Regards,
  Igor


Thread overview: 20+ messages
2014-04-14 15:11 [PATCH v4 0/5] x86: fix hang when AP bringup is too slow Igor Mammedov
2014-04-14 15:11 ` [PATCH v4 1/5] x86: fix list corruption on CPU hotplug Igor Mammedov
2014-04-30 21:18   ` Toshi Kani
2014-04-14 15:11 ` [PATCH v4 2/5] x86: fix memory corruption in acpi_unmap_lsapic() Igor Mammedov
2014-04-14 15:11 ` [PATCH v4 3/5] acpi_processor: do not mark present at boot but not onlined CPU as onlined Igor Mammedov
2014-04-15  5:48   ` Rafael J. Wysocki
2014-04-15  6:00     ` Igor Mammedov
2014-04-15  6:04     ` Ingo Molnar
2014-04-15 15:48       ` Rafael J. Wysocki
2014-04-15  5:53   ` Rafael J. Wysocki
2014-04-30 21:25   ` Toshi Kani
2014-05-02 11:32     ` Igor Mammedov
2014-05-02 17:23       ` Toshi Kani
2014-04-14 15:11 ` [PATCH v4 4/5] x86: log error on secondary CPU wakeup failure at ERR level Igor Mammedov
2014-04-30 21:30   ` Toshi Kani
2014-04-14 15:11 ` [PATCH v4 5/5] x86: initialize secondary CPU only if master CPU will wait for it Igor Mammedov
2014-05-01 23:11   ` Toshi Kani
2014-05-02  8:21     ` Igor Mammedov
2014-05-02 14:52       ` Toshi Kani
2014-05-05 20:26         ` Igor Mammedov
