[PATCH 0/2] kdump: Enter 2nd kernel with BSP for enabling multiple CPUs

* [PATCH 0/2] kdump: Enter 2nd kernel with BSP for enabling multiple CPUs
@ 2012-04-16  2:21 ` HATAYAMA Daisuke
  0 siblings, 0 replies; 19+ messages in thread
From: HATAYAMA Daisuke @ 2012-04-16  2:21 UTC (permalink / raw)
  To: kexec, linux-kernel, ebiederm, vgoyal, kumagai-atsushi

Currently, booting the 2nd kernel with multiple CPUs fails in most
cases because, if the crash happens on an AP, the 2nd kernel is entered
on that AP. The problem is signalling the startup IPI from the AP to
the BSP. The typical result I saw was the machine hanging during the
2nd kernel boot.

To solve this issue, always enter the 2nd kernel on the BSP. To do
this, I modify the logic for shooting down CPUs. The mechanism uses
only simple existing logic and does not complicate the crash path to
machine_kexec().

I ran about 100 stress tests in total on the processors below:

  Intel(R) Xeon(R) CPU E7- 4820  @ 2.00GHz
  Socket x 4, Core x 8, Thread x 16 (64 LCPUS in total)

  Intel(R) Xeon(R) CPU E7- 8870  @ 2.40GHz
  Socket x 8, Core x 10, Thread x 20 (160 LCPUS in total)

* Motivation for enabling multiple CPUs on the 2nd kernel

This patch set is aimed at doing parallel compression on the 2nd
kernel. A machine with more than a terabyte of memory requires
several hours to generate a crash dump.

There are several ways to reduce crash dump generation time, but
they have different pros and cons:

  Fast I/O devices
    pros
      - Consistently high speed
    cons
      - High financial cost for high-performance I/O devices. It is
        hard to afford them as dump devices in every environment.

  Filtering
    pros
      - No financial cost.
      - Large reduction of crash dump size

    cons
      - Some data is definitely lost, so we cannot use this in some
        situations:

        1) High availability configuration where application triggers
        OS to crash and users want to debug the application later by
        retrieving the application's user process image from the
        system's crash dump.

        2) KVM virtualization configuration where KVM host machine
        contains KVM guest machine images as user processes.

        3) Page cache is needed for debugging filesystem related bugs.

  Compression
    pros
      - No financial cost.
      - No data loss.

    cons
      - Compression doesn't always reduce crash dump size.
      - Takes a lot of CPU time; slow if the CPUs are slow.

Machines with large memory tend to have a lot of CPUs, and
compression is well suited to parallel processing. My goal is to make
compression as close to free as possible.
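
The trade-off between compression and I/O can be put into a rough
back-of-the-envelope model. The sketch below is not from the patch set:
it assumes compression and writing are pipelined, so the dump takes as
long as the slower stage, and that compression throughput scales
linearly with the number of CPUs. The struct, the function name and all
throughput figures are hypothetical.

```c
#include <assert.h>

/* Hypothetical inputs for the estimate; none of these come from the thread. */
struct dump_params {
	double mem_gib;         /* memory to dump, in GiB */
	double compress_gib_s;  /* compression throughput of ONE cpu, GiB/s */
	double io_gib_s;        /* write throughput of the dump device, GiB/s */
	double ratio;           /* compressed size / original size, in (0, 1] */
};

/*
 * Estimated dump time in seconds, assuming the compression stage and the
 * I/O stage overlap, so the slower of the two dominates.
 */
double dump_seconds(const struct dump_params *p, int ncpus)
{
	assert(ncpus > 0 && p->ratio > 0.0);

	double cpu_s = p->mem_gib / (p->compress_gib_s * ncpus);
	double io_s = p->mem_gib * p->ratio / p->io_gib_s;

	return cpu_s > io_s ? cpu_s : io_s;
}
```

With these assumptions, adding CPUs helps only until the I/O stage
becomes the bottleneck; beyond that point the estimate stops improving.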

* TODO

  - Extend 512MB limit of reserved memory size for 2nd kernel for
    multiple CPUs.

  - Intel microcode patch loading on the 2nd kernel is slow for the
    2nd and later CPUs: a minute or more per CPU.

  - There are a limited number of irq vectors for TLB flush IPI on
    x86_64: 32 for recent 3.x kernels and 8 for around 2.6.x
    kernels. So compression doesn't scale if a lot of page reclaim
    happens when reading a kernel image larger than memory. Special
    handling that bypasses the page cache could be applicable to a
    parallel dump mechanism, but more investigation is needed.

---

HATAYAMA Daisuke (2):
      Enter 2nd kernel with BSP
      Introduce crash ipi helpers to wait for APs to stop


 arch/x86/include/asm/reboot.h |    4 +++
 arch/x86/kernel/crash.c       |   15 +++++++++-
 arch/x86/kernel/reboot.c      |   63 +++++++++++++++++++++++++++++------------
 3 files changed, 62 insertions(+), 20 deletions(-)

-- 
HATAYAMA Daisuke

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/2] Introduce crash ipi helpers to wait for APs to stop
  2012-04-16  2:21 ` HATAYAMA Daisuke
@ 2012-04-16  2:21   ` HATAYAMA Daisuke
  -1 siblings, 0 replies; 19+ messages in thread
From: HATAYAMA Daisuke @ 2012-04-16  2:21 UTC (permalink / raw)
  To: kexec, linux-kernel, ebiederm, vgoyal, kumagai-atsushi

Introduce crash IPI helpers so that the BSP and AP sides can share
them.

There's no logical change in this patch.

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
---

 arch/x86/include/asm/reboot.h |    4 +++
 arch/x86/kernel/reboot.c      |   53 ++++++++++++++++++++++++++++++++---------
 2 files changed, 45 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index 92f29706..2f8e9e7 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -26,4 +26,8 @@ void machine_real_restart(unsigned int type);
 typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
 void nmi_shootdown_cpus(nmi_shootdown_cb callback);
 
+void crash_ipi_init(void);
+void crash_ipi_dec_and_halt(void);
+void crash_ipi_wait_for_APs(void);
+
 #endif /* _ASM_X86_REBOOT_H */
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index d840e69..6dd77a8 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -769,6 +769,31 @@ static nmi_shootdown_cb shootdown_callback;
 
 static atomic_t waiting_for_crash_ipi;
 
+void crash_ipi_init(void)
+{
+	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+}
+
+void crash_ipi_dec_and_halt(void)
+{
+	atomic_dec(&waiting_for_crash_ipi);
+	/* Assume hlt works */
+	halt();
+	for (;;)
+		cpu_relax();
+}
+
+void crash_ipi_wait_for_APs(void)
+{
+	unsigned long msecs;
+
+	msecs = 1000; /* Wait at most a second for the other cpus to stop */
+	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
+		mdelay(1);
+		msecs--;
+	}
+}
+
 static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
 {
 	int cpu;
@@ -785,11 +810,7 @@ static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
 
 	shootdown_callback(cpu, regs);
 
-	atomic_dec(&waiting_for_crash_ipi);
-	/* Assume hlt works */
-	halt();
-	for (;;)
-		cpu_relax();
+	crash_ipi_dec_and_halt();
 
 	return NMI_HANDLED;
 }
@@ -807,7 +828,6 @@ static void smp_send_nmi_allbutself(void)
  */
 void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 {
-	unsigned long msecs;
 	local_irq_disable();
 
 	/* Make a note of crashing cpu. Will be used in NMI callback.*/
@@ -815,7 +835,8 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 
 	shootdown_callback = callback;
 
-	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+	crash_ipi_init();
+
 	/* Would it be better to replace the trap vector here? */
 	if (register_nmi_handler(NMI_LOCAL, crash_nmi_callback,
 				 NMI_FLAG_FIRST, "crash"))
@@ -827,11 +848,7 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 
 	smp_send_nmi_allbutself();
 
-	msecs = 1000; /* Wait at most a second for the other cpus to stop */
-	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
-		mdelay(1);
-		msecs--;
-	}
+	crash_ipi_wait_for_APs();
 
 	/* Leave the nmi callback set */
 }
@@ -840,4 +857,16 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 {
 	/* No other CPUs to shoot down */
 }
+
+void crash_ipi_init(void)
+{
+}
+
+void crash_ipi_dec_and_halt(void)
+{
+}
+
+void crash_ipi_wait_for_APs(void)
+{
+}
 #endif
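
The three helpers above implement a countdown with a bounded wait. A
minimal userspace sketch of the same pattern, using C11 atomics in place
of the kernel's atomic_t, might look as follows. The function names are
hypothetical, the halt is omitted, and a "poll" is a plain loop
iteration rather than mdelay(1). Unlike crash_ipi_wait_for_APs(), which
returns void, the wait here reports whether the counter drained; that
return value is an addition for illustration only.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's waiting_for_crash_ipi counter. */
static atomic_int waiting_for_stop;

/* Mirrors crash_ipi_init(): expect every CPU except the caller to check in. */
void stop_counter_init(int online_cpus)
{
	atomic_store(&waiting_for_stop, online_cpus - 1);
}

/* Mirrors the decrement in crash_ipi_dec_and_halt(); the halt is left out. */
void stop_counter_checkin(void)
{
	atomic_fetch_sub(&waiting_for_stop, 1);
}

/*
 * Mirrors crash_ipi_wait_for_APs(): poll until the counter drains or the
 * poll budget is spent. Returns true if every CPU checked in in time.
 */
bool stop_counter_wait(unsigned long max_polls)
{
	while (atomic_load(&waiting_for_stop) > 0 && max_polls)
		max_polls--;

	return atomic_load(&waiting_for_stop) <= 0;
}
```

As in the patch, the waiter gives up once its budget is exhausted
instead of blocking forever on a CPU that never responds.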


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/2] Enter 2nd kernel with BSP
  2012-04-16  2:21 ` HATAYAMA Daisuke
@ 2012-04-16  2:21   ` HATAYAMA Daisuke
  -1 siblings, 0 replies; 19+ messages in thread
From: HATAYAMA Daisuke @ 2012-04-16  2:21 UTC (permalink / raw)
  To: kexec, linux-kernel, ebiederm, vgoyal, kumagai-atsushi

Split the logic into BSP and AP paths: the BSP waits for the APs to
halt.

Keep the variable crashing_cpu for debugging; it is useful for
determining on which CPU the crash happened.

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
---

 arch/x86/kernel/crash.c  |   15 ++++++++++++++-
 arch/x86/kernel/reboot.c |   16 ++++++----------
 2 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 13ad899..c5c19fa 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -83,9 +83,14 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 	 * In practice this means shooting down the other cpus in
 	 * an SMP system.
 	 */
+
+	int cpu;
+
 	/* The kernel is broken so disable interrupts */
 	local_irq_disable();
 
+	crash_ipi_init();
+
 	kdump_nmi_shootdown_cpus();
 
 	/* Booting kdump kernel with VMX or SVM enabled won't work,
@@ -102,5 +107,13 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 #ifdef CONFIG_HPET_TIMER
 	hpet_disable();
 #endif
-	crash_save_cpu(regs, safe_smp_processor_id());
+	cpu = safe_smp_processor_id();
+	crash_save_cpu(regs, cpu);
+
+	if (cpu_physical_id(cpu) == boot_cpu_physical_apicid) {
+		crash_ipi_wait_for_APs();
+		return;
+	}
+
+	crash_ipi_dec_and_halt();
 }
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 6dd77a8..90354f9 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -7,6 +7,7 @@
 #include <linux/sched.h>
 #include <linux/tboot.h>
 #include <linux/delay.h>
+#include <linux/kexec.h>
 #include <acpi/reboot.h>
 #include <asm/io.h>
 #include <asm/apic.h>
@@ -800,16 +801,15 @@ static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
 
 	cpu = raw_smp_processor_id();
 
-	/* Don't do anything if this handler is invoked on crashing cpu.
-	 * Otherwise, system will completely hang. Crashing cpu can get
-	 * an NMI if system was initially booted with nmi_watchdog parameter.
-	 */
-	if (cpu == crashing_cpu)
-		return NMI_HANDLED;
 	local_irq_disable();
 
 	shootdown_callback(cpu, regs);
 
+	if (cpu_physical_id(cpu) == boot_cpu_physical_apicid) {
+		crash_ipi_wait_for_APs();
+		machine_kexec(kexec_crash_image);
+	}
+
 	crash_ipi_dec_and_halt();
 
 	return NMI_HANDLED;
@@ -835,8 +835,6 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 
 	shootdown_callback = callback;
 
-	crash_ipi_init();
-
 	/* Would it be better to replace the trap vector here? */
 	if (register_nmi_handler(NMI_LOCAL, crash_nmi_callback,
 				 NMI_FLAG_FIRST, "crash"))
@@ -848,8 +846,6 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 
 	smp_send_nmi_allbutself();
 
-	crash_ipi_wait_for_APs();
-
 	/* Leave the nmi callback set */
 }
 #else /* !CONFIG_SMP */
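
The control flow of this patch can be modelled in a few lines of plain
C. The sketch below is a simplified simulation, not kernel code: it
assumes APIC ids equal CPU indexes, and the type and function names are
hypothetical. It shows that, whichever CPU crashes, the CPU matching the
boot APIC id is the one that enters the 2nd kernel while all others
check in and halt.

```c
#include <stdbool.h>

enum crash_action {
	ACT_WAIT_THEN_KEXEC,   /* BSP: wait for the APs, then enter the 2nd kernel */
	ACT_CHECKIN_AND_HALT,  /* AP: decrement the counter and halt */
};

/* The crashing CPU and the NMI'd CPUs apply the same test in the patch. */
enum crash_action crash_action_for(int apicid, int boot_apicid)
{
	return apicid == boot_apicid ? ACT_WAIT_THEN_KEXEC
				     : ACT_CHECKIN_AND_HALT;
}

struct crash_outcome {
	int kexec_cpu;        /* CPU that enters the 2nd kernel, -1 if none */
	int halted;           /* number of CPUs that checked in and halted */
	bool kexec_from_nmi;  /* true if the BSP got there via the NMI callback */
};

/* Apply the decision to CPUs 0..ncpus-1 (APIC id == index in this model). */
struct crash_outcome simulate_crash(int ncpus, int crashing_cpu, int boot_apicid)
{
	struct crash_outcome out = { -1, 0, false };

	for (int cpu = 0; cpu < ncpus; cpu++) {
		if (crash_action_for(cpu, boot_apicid) == ACT_WAIT_THEN_KEXEC) {
			out.kexec_cpu = cpu;
			out.kexec_from_nmi = (cpu != crashing_cpu);
		} else {
			out.halted++;
		}
	}
	return out;
}
```

When the crash happens on an AP, the BSP reaches machine_kexec() from
the NMI callback, which is exactly the dependency on interrupt delivery
that the review discussion focuses on.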


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/2] kdump: Enter 2nd kernel with BSP for enabling multiple CPUs
       [not found] ` <beaa8ade-b7af-4e71-b4e0-a418ceb83f1e@email.android.com>
@ 2012-04-16  6:40     ` HATAYAMA Daisuke
  0 siblings, 0 replies; 19+ messages in thread
From: HATAYAMA Daisuke @ 2012-04-16  6:40 UTC (permalink / raw)
  To: ebiederm; +Cc: kexec, linux-kernel, vgoyal, kumagai-atsushi

From: "Eric W. Biederman" <ebiederm@xmission.com>
Subject: Re: [PATCH 0/2] kdump: Enter 2nd kernel with BSP for enabling multiple CPUs
Date: Sun, 15 Apr 2012 19:59:52 -0700

> HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
> 
>>Currently, booting up 2nd kernel with multiple CPUs fails in most
>>cases since it enters 2nd kernel with AP if the crash happens on the
>>AP. The problem is to signal startup IPI from AP to BSP.
> 
> If so, then we need to fix our code that sends startupIPIs.
> And perhaps the code that attempts to shutdown the other cpus.
> 

maxcpus=1 is now set by default; in that configuration the 2nd kernel
doesn't try to wake up the secondary and later CPUs. So there's no
startup IPI on the 2nd kernel now.

> It is not ok to switch cpus during kdump (reducing the reliability) just so you can write crash dumps faster.  Better would be to cope with secondary cpus not booting.

Even the current implementation uses NMI to stop other CPUs. The
reliability concern you raise here is the possibility that the
non-crashing BSP doesn't reach machine_kexec() due to some failure
of interrupt processing, right?

An alternative idea is:
  1) try to enter the 2nd kernel on the BSP,
  2) if that has not happened after some seconds, enter the 2nd kernel
  on the crashing CPU instead. In that case, treat all CPUs other than
  the crashing CPU as abnormal and use only the crashing CPU on the
  2nd kernel.

This seems as reliable as the current one.
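
The two-step fallback above could be sketched roughly as follows. This
is a hypothetical illustration, not code from the patches: bsp_ready()
stands in for "the BSP has reached the kexec path", timeout_ticks stands
in for "some seconds", and the two probe functions exist only to
exercise the logic.

```c
#include <stdbool.h>

enum kexec_entry {
	ENTER_ON_BSP,           /* BSP reported in: boot the 2nd kernel there */
	ENTER_ON_CRASHING_CPU,  /* BSP stayed silent: fall back, single CPU only */
};

/*
 * Poll the caller-supplied probe once per tick. If the BSP reports in
 * before the timeout, let it enter the 2nd kernel; otherwise fall back
 * to the crashing CPU.
 */
enum kexec_entry choose_kexec_entry(bool (*bsp_ready)(unsigned long tick),
				    unsigned long timeout_ticks)
{
	for (unsigned long tick = 0; tick < timeout_ticks; tick++) {
		if (bsp_ready(tick))
			return ENTER_ON_BSP;
	}
	return ENTER_ON_CRASHING_CPU;
}

/* Example probes: a BSP that answers on the third poll, and one that never does. */
bool bsp_answers_at_tick_2(unsigned long tick)
{
	return tick >= 2;
}

bool bsp_never_answers(unsigned long tick)
{
	(void)tick;
	return false;
}
```

In the fallback case the 2nd kernel would run on the crashing CPU alone,
which matches today's single-CPU behaviour.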

> 
> I do like the direction of pounding on things so we can get multiple cpus going in large configurations.  Although I am surprised you are cpu bound and not disk bound in the time to write your crash dumps.
> 

What do you mean by the 1st sentence? I'm sorry, I don't understand
the ``pounding on things'' part.

As for the 2nd, it depends on the data. If the data is sparse enough,
its size is significantly reduced, and so the I/O size is also reduced.
If the data is random enough, compression takes a long time and the
data stays about the same size, resulting in CPU-bound processing.
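
The dependence on the data can be illustrated with a toy compressor.
The sketch below uses a simple run-length encoder purely as a stand-in
for a real compressor (an assumption; the thread does not say which
algorithm is used), and the function names are hypothetical. A page of
zeros shrinks to a few bytes, while pseudo-random bytes do not shrink at
all.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toy run-length encoder: returns the number of bytes the encoded form
 * would take, at 2 bytes (count, value) per run of up to 255 equal bytes.
 */
size_t rle_encoded_size(const uint8_t *buf, size_t len)
{
	size_t out = 0;
	size_t i = 0;

	while (i < len) {
		size_t run = 1;

		while (i + run < len && run < 255 && buf[i + run] == buf[i])
			run++;
		out += 2;
		i += run;
	}
	return out;
}

/* Fill buf with deterministic pseudo-random bytes from a small LCG. */
void fill_pseudo_random(uint8_t *buf, size_t len, uint32_t seed)
{
	for (size_t i = 0; i < len; i++) {
		seed = seed * 1103515245u + 12345u;
		buf[i] = (uint8_t)(seed >> 16);
	}
}
```

For a 4096-byte page of zeros the encoder needs 17 runs, i.e. 34 bytes;
for the pseudo-random page nearly every byte starts a new run, so the
"compressed" form is larger than the input and the CPU time is wasted.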

Thanks.
HATAYAMA, Daisuke

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] Enter 2nd kernel with BSP
  2012-04-16  2:21   ` HATAYAMA Daisuke
  (?)
@ 2012-04-23 10:46   ` Andi Kleen
  -1 siblings, 0 replies; 19+ messages in thread
From: Andi Kleen @ 2012-04-23 10:46 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: kexec, linux-kernel, ebiederm, vgoyal, kumagai-atsushi, fenghua.yu

HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> writes:

> Split logic into BSP's and AP's: BSP waits for AP halting.
>
> Don't remove variable crashing_cpu for debugging use; useful for
> determining one what CPU crash happens.

Fenghua has this patchkit to allow offlining the BSP. What happens then?

It would be good to understand why initializing APs from other APs
should not work.  Not aware of such a limitation in x86.

AFAIK when we online a random cpu after boot there is no effort to do it
from BSP. Why does that work and not from the kexec boot?

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] Enter 2nd kernel with BSP
  2012-04-16  2:21   ` HATAYAMA Daisuke
  (?)
  (?)
@ 2012-04-23 10:49   ` Andi Kleen
  2012-04-24  2:05     ` HATAYAMA Daisuke
  -1 siblings, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2012-04-23 10:49 UTC (permalink / raw)
  To: HATAYAMA Daisuke; +Cc: linux-kernel, fenghua.yu

> Split logic into BSP's and AP's: BSP waits for AP halting.
>
> Don't remove variable crashing_cpu for debugging use; useful for
> determining one what CPU crash happens.

Fenghua has this patchkit to allow offlining the BSP. What happens then?

It would be good to understand why initializing APs from other APs
should not work.  Not aware of such a limitation in x86.

AFAIK when we online a random cpu after boot there is no effort to do it
from BSP. Why does that work and not from the kexec boot?

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] Enter 2nd kernel with BSP
  2012-04-23 10:49   ` Andi Kleen
@ 2012-04-24  2:05     ` HATAYAMA Daisuke
  2012-04-24  3:04       ` Yu, Fenghua
  2012-04-24  8:04       ` Andi Kleen
  0 siblings, 2 replies; 19+ messages in thread
From: HATAYAMA Daisuke @ 2012-04-24  2:05 UTC (permalink / raw)
  To: andi; +Cc: linux-kernel, fenghua.yu, ebiederm

From: Andi Kleen <andi@firstfloor.org>
Subject: Re: [PATCH 2/2] Enter 2nd kernel with BSP
Date: Mon, 23 Apr 2012 03:49:15 -0700

>> Split logic into BSP's and AP's: BSP waits for AP halting.
>>
>> Don't remove variable crashing_cpu for debugging use; useful for
>> determining one what CPU crash happens.
> 
> Fenghua has this patchkit to allow offlining the BSP. What happens then?
> 
> It would be good to understand why initializing APs from other APs
> should not work.  Not aware of such a limitation in x86.
> 
> AFAIK when we online a random cpu after boot there is no effort to do it
> from BSP. Why does that work and not from the kexec boot?
> 

Hello Andi,

I have not yet tested what I explain here, so I apologize for hedging
with ``would''.

The AP-to-AP case would be entirely my misunderstanding. The
problematic case would be the AP-to-BSP case only.

native_cpu_up avoids waking up the cpu with boot_cpu_physical_apicid,
but boot_cpu_physical_apicid would now be just the boot cpu, which
could be a non-BSP on the 2nd kernel, namely the cpu that crashed on
the 1st kernel. It would need to check explicitly whether a given cpu
is the BSP or not, and this would be the better fix here. This was
suggested by Eric.
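
One way to check explicitly for the BSP on x86 is the BSP flag, bit 8 of
the IA32_APIC_BASE MSR (index 0x1B), which is set on the bootstrap
processor and clear on application processors. The sketch below only
decodes a value that has already been read from that MSR; it is a
minimal illustration with hypothetical macro and function names, not
the fix that was eventually applied to the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

#define IA32_APIC_BASE_MSR    0x1Bu         /* MSR index, for reference */
#define APIC_BASE_BSP_BIT     (1ull << 8)   /* set on the bootstrap processor */
#define APIC_BASE_ENABLE_BIT  (1ull << 11)  /* APIC global enable */

/* Decode the BSP flag from a value previously read from IA32_APIC_BASE. */
bool apic_base_is_bsp(uint64_t apic_base)
{
	return (apic_base & APIC_BASE_BSP_BIT) != 0;
}
```

For example, 0xFEE00900 (default APIC base, enabled, BSP flag set)
decodes as the BSP, while 0xFEE00800 decodes as an AP. Unlike a
comparison against boot_cpu_physical_apicid, this flag does not depend
on which CPU the current kernel happened to boot on.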

Thanks.
HATAYAMA, Daisuke

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH 2/2] Enter 2nd kernel with BSP
  2012-04-24  2:05     ` HATAYAMA Daisuke
@ 2012-04-24  3:04       ` Yu, Fenghua
  2012-04-24  8:04       ` Andi Kleen
  1 sibling, 0 replies; 19+ messages in thread
From: Yu, Fenghua @ 2012-04-24  3:04 UTC (permalink / raw)
  To: HATAYAMA Daisuke, andi; +Cc: linux-kernel, ebiederm

> >> Split logic into BSP's and AP's: BSP waits for AP halting.
> >>
> >> Don't remove variable crashing_cpu for debugging use; useful for
> >> determining one what CPU crash happens.
> >
> > Fenghua has this patchkit to allow offlining the BSP. What happens
> then?

I'm fixing an issue found while stress-testing BSP online/offline. I'll send out a newer patch set pretty soon.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] Enter 2nd kernel with BSP
  2012-04-24  2:05     ` HATAYAMA Daisuke
  2012-04-24  3:04       ` Yu, Fenghua
@ 2012-04-24  8:04       ` Andi Kleen
  2012-04-24 10:46         ` Eric W. Biederman
  1 sibling, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2012-04-24  8:04 UTC (permalink / raw)
  To: HATAYAMA Daisuke; +Cc: andi, linux-kernel, fenghua.yu, ebiederm

> native_cpu_up avoids to wake up cpu with boot_cpu_physical_apicid, but

Ok so that check needs to go then. I wonder why it is there.

> the boot_cpu_physical_apicid would be just a boot cpu now, which could
> be non-BSP on the 2nd kernel; equal to crashing cpu on the 1st
> kernel. It would need to check explicitly whether a given cpu is BSP
> or not and this would be better fix here. This is suggested by Eric.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] Enter 2nd kernel with BSP
  2012-04-24  8:04       ` Andi Kleen
@ 2012-04-24 10:46         ` Eric W. Biederman
  0 siblings, 0 replies; 19+ messages in thread
From: Eric W. Biederman @ 2012-04-24 10:46 UTC (permalink / raw)
  To: Andi Kleen; +Cc: HATAYAMA Daisuke, linux-kernel, fenghua.yu

Andi Kleen <andi@firstfloor.org> writes:

>> native_cpu_up avoids to wake up cpu with boot_cpu_physical_apicid, but
>
> Ok so that check needs to go then. I wonder why it is there.

As I remember, we had a check to avoid waking up the cpu we are
currently running on, and there was something about the value for
the current cpu coming from BIOS tables etc.  Because of all of the
silliness in how we get the apicid for the current cpu, I seem to
remember that starting all of our secondary processors was a case
that periodically regressed.

>> the boot_cpu_physical_apicid would be just a boot cpu now, which could
>> be non-BSP on the 2nd kernel; equal to crashing cpu on the 1st
>> kernel. It would need to check explicitly whether a given cpu is BSP
>> or not and this would be better fix here. This is suggested by Eric.

Yes.  My two reasons for designing the crash path the way I did were
that we might have offlined/hot-unplugged the boot cpu, or the boot cpu
might have gotten itself into a terrible state.  Since we want as much
reliability as possible in kexec on panic, we don't bother trying to
restore it.

The normal reboot path and a normal kexec try to reboot on the boot
cpu primarily because there are a lot of BIOSes out there that get
confused if you talk to them on anything other than the boot cpu.

Eric
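
The difference between the two checks discussed in this message can be
sketched in a small user-space model. This is only an illustration, not
kernel code: struct cpu_desc, skip_current_only(), skip_current_or_bsp()
and count_wakeups() are hypothetical names, and the assumption that the
hardware BSP should not be targeted by the wake-up sequence in the 2nd
kernel is taken from the problem description in the cover letter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one CPU as seen by the 2nd kernel. */
struct cpu_desc {
	unsigned int apicid;	/* physical APIC ID */
	bool hw_bsp;		/* hardware BSP flag still set on this CPU */
};

/*
 * Check in the spirit of the existing one: skip only the CPU whose APIC ID
 * matches the CPU the kernel booted on (boot_apicid).
 */
static bool skip_current_only(const struct cpu_desc *cpu,
			      unsigned int boot_apicid)
{
	return cpu->apicid == boot_apicid;
}

/* Explicit variant: additionally skip whichever CPU is the hardware BSP. */
static bool skip_current_or_bsp(const struct cpu_desc *cpu,
				unsigned int boot_apicid)
{
	return cpu->apicid == boot_apicid || cpu->hw_bsp;
}

/* Count the CPUs that would be woken up under a given skip policy. */
static size_t count_wakeups(const struct cpu_desc *cpus, size_t n,
			    unsigned int boot_apicid,
			    bool (*skip)(const struct cpu_desc *, unsigned int))
{
	size_t i, woken = 0;

	for (i = 0; i < n; i++)
		if (!skip(&cpus[i], boot_apicid))
			woken++;
	return woken;
}
```

In this model, if the crash happened on the CPU with APIC ID 4 while the
hardware BSP has APIC ID 0, the first policy still selects the BSP for
wake-up, whereas the explicit variant leaves it alone. When the boot cpu
is the BSP, both policies behave the same.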

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/2] kdump: Enter 2nd kernel with BSP for enabling multiple CPUs
  2012-04-16  2:21 ` HATAYAMA Daisuke
                   ` (3 preceding siblings ...)
  (?)
@ 2012-05-14  8:29 ` Cong Wang
  -1 siblings, 0 replies; 19+ messages in thread
From: Cong Wang @ 2012-05-14  8:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: kexec

So, the reason you want multiple CPUs enabled in the 2nd kernel is to
speed up compression of the core dump?

First, why is the speed important? Given that the whole kdump process
happens automatically nowadays, there should be very few people waiting
for a kdump to complete, so the speed is not that important.

Second, we currently use nr_cpus=1 for the 2nd kernel on RHEL6 to
reduce memory usage in the 2nd kernel. You mentioned that 512M is a
limit, but we want to make it even smaller; even 512M is not a good
choice for us on x86. Bringing up more than one CPU will of course
need more memory in the 2nd kernel.

The limit is not only the size but also the maximum address at which
the initrd can be loaded, which is 896M on x86 IIRC. A contiguous
memory area larger than 512M usually sits above 896M, due to
fragmentation in lower memory, so I am afraid you need to do (much?)
more work.

So, I am afraid you are spending too much effort on fixing a
not-that-important issue, but I may be missing something here...

Thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/2] kdump: Enter 2nd kernel with BSP for enabling multiple CPUs
  2012-04-16  2:21 ` HATAYAMA Daisuke
@ 2013-04-18 11:41   ` Petr Tesarik
  -1 siblings, 0 replies; 19+ messages in thread
From: Petr Tesarik @ 2013-04-18 11:41 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: kexec, linux-kernel, ebiederm, vgoyal, kumagai-atsushi, Fenghua Yu

On Mon, 16 Apr 2012 11:21:28 +0900
HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:

> Currently, booting up 2nd kernel with multiple CPUs fails in most
> cases since it enters 2nd kernel with AP if the crash happens on the
> AP. The problem is to signal startup IPI from AP to BSP. Typical
> result of the operation I saw is the machine hanging during the 2nd
> kernel boot.
> 
> To solve this issue, always enter 2nd kernel with BSP. To do this, I
> modify logic for shooting down CPUs. I use simple existing logic only
> in this mechanism, not complicating crash path to machine_kexec().

These patches looked pretty good. I seem to recall that Fenghua (from
Intel) had an alternative solution for booting from AP. Unfortunately I
can't find his mails in my kexec mailbox...

Anyway, what's the latest upstream status?

Petr

> I did stress tests about 100 in total on the processors below:
> 
>   Intel(R) Xeon(R) CPU E7- 4820  @ 2.00GHz
>   Socket x 4, Core x 8, Thread x 16 (64 LCPUS in total)
> 
>   Intel(R) Xeon(R) CPU E7- 8870  @ 2.40GHz
>   Socket x 8, Core x 10, Thread x 20 (160 LCPUS in total)
> 
> * Motivation of enabling multiple CPUs on the 2nd kernel
> 
> This patch is aimed at doing parallel compression on the 2nd
> kernel. The machine that has more than tera bytes memory requires
> several hours to generate crash dump.
> 
> There are several ways to reduce generation time of crash time, but
> they have different pros and cons:
> 
>   Fast I/O devices
>     pros
>       - Can obtain high-speed stably
>     cons
>       - Big financial cost for good performance I/O devices. It's
>         difficult financially to prepare these for all environments as
>         dump devices.
> 
>   Filtering
>     pros
>       - No financial cost.
>       - Large reduction of crash dump size
> 
>     cons
>       - Some data is definitely lost. So, we cannot use this on some
>         situations:
> 
>         1) High availability configuration where application triggers
>         OS to crash and users want to debug the application later by
>         retrieving the application's user process image from the
>         system's crash dump.
> 
>         2) KVM virtualization configuration where KVM host machine
>         contains KVM guest machine images as user processes.
> 
>         3) Page cache is needed for debugging filesystem related bugs.
> 
>   Compression
>     pros
>       - No financial cost.
>       - No data lost.
> 
>     cons
>       - Compression doesn't always reduce crash dump size.
>       - take heavy CPU time. Slow if CPU is weak in speed.
> 
> Machines with large memory tend to have a lot of CPUs. Compression is
> well suited to parallel processing. My goal is to make compression as
> close to free as possible.
> 
> * TODO
> 
>   - Extend 512MB limit of reserved memory size for 2nd kernel for
>     multiple CPUs.
> 
>   - Intel microcode patch loading on the 2nd kernel is slow for the
>     2nd and later CPUs: about one minute or more per CPU.
> 
>   - There are a limited number of irq vectors for TLB flush IPI on
>     x86_64: 32 for recent 3.x kernels and 8 for around 2.6.x
>     kernels. So compression doesn't scale if a lot of page reclaim
>     happens when reading kernel image larger than memory. Special
>     handling without page cache could be applicable to parallel dump
>     mechanism, but more investigation is needed.
> 
> ---
> 
> HATAYAMA Daisuke (2):
>       Enter 2nd kernel with BSP
>       Introduce crash ipi helpers to wait for APs to stop
> 
> 
>  arch/x86/include/asm/reboot.h |    4 +++
>  arch/x86/kernel/crash.c       |   15 +++++++++-
>  arch/x86/kernel/reboot.c      |   63 +++++++++++++++++++++++++++++------------
>  3 files changed, 62 insertions(+), 20 deletions(-)
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/2] kdump: Enter 2nd kernel with BSP for enabling multiple CPUs
  2013-04-18 11:41   ` Petr Tesarik
@ 2013-04-19  8:45     ` HATAYAMA Daisuke
  -1 siblings, 0 replies; 19+ messages in thread
From: HATAYAMA Daisuke @ 2013-04-19  8:45 UTC (permalink / raw)
  To: Petr Tesarik
  Cc: kexec, linux-kernel, ebiederm, vgoyal, kumagai-atsushi, Fenghua Yu

(2013/04/18 20:41), Petr Tesarik wrote:
> On Mon, 16 Apr 2012 11:21:28 +0900
> HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
>
>> Currently, booting up 2nd kernel with multiple CPUs fails in most
>> cases since it enters 2nd kernel with AP if the crash happens on the
>> AP. The problem is to signal startup IPI from AP to BSP. Typical
>> result of the operation I saw is the machine hanging during the 2nd
>> kernel boot.
>>
>> To solve this issue, always enter 2nd kernel with BSP. To do this, I
>> modify logic for shooting down CPUs. I use simple existing logic only
>> in this mechanism, not complicating crash path to machine_kexec().
>
> These patches looked pretty good. I seem to recall that Fenghua (from
> Intel) had an alternative solution for booting from AP. Unfortunately I
> can't find his mails in my kexec mailbox...
>
> Anyway, what's the latest upstream status?

It's still in an experimental state.

The patch itself was nacked by Eric, since switching the CPU that
enters the 2nd kernel through NMI reduces the reliability of kdump.

In the discussion of my 2nd patch set, which tried to reset the BSP
flag at boot on the 2nd kernel, Eric pointed out that the BSP flag can
be changed at runtime, that the behaviour on receiving INIT then
varies, and that we should first discuss how unsetting the BSP flag
affects the system.

I'm now going in this direction and the patch I posted a month ago is:

[PATCH] x86, apic: Add unset_bsp parameter to unset BSP flag at boot time
https://lkml.org/lkml/2013/3/18/107

According to Fenghua, some firmware assumes that the BSP flag stays set
for as long as the system is running. I have yet to see any difference
in behaviour when unsetting the BSP flag with the patch applied on my
machine. I think this is system dependent, and it might be better to
let each user decide whether to unset the BSP flag or not.

BTW, Fenghua's work on software CPU hotplug for the BSP is orthogonal
to my case. His work is for systems, including firmware, that are
affected if the BSP flag is unset, and it assumes a healthy system
where cpu#0 is always the BSP. Our case, on the other hand, is the
crash kernel: we can no longer assume that cpu#0 is the BSP, and we can
no longer use NMI to wake up other CPUs, since we cannot use logic that
depends on the state of the CPUs sleeping in the 1st kernel.

-- 
Thanks.
HATAYAMA, Daisuke
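
For reference, the BSP flag discussed in this message is bit 8 of the
IA32_APIC_BASE MSR (address 0x1b). Below is a minimal user-space sketch
of testing and clearing that bit on an MSR value that has already been
read. The macro and helper names are made up for illustration and are
not the ones used in the unset_bsp patch; actually reading or writing
the MSR is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address of the IA32_APIC_BASE MSR (illustrative macro name). */
#define APIC_BASE_MSR_ADDR	0x1bU
/* Bit 8 of IA32_APIC_BASE: set on the bootstrap processor. */
#define APIC_BASE_BSP_FLAG	(UINT64_C(1) << 8)

/* Return true if the given IA32_APIC_BASE value has the BSP flag set. */
static bool apic_base_is_bsp(uint64_t apic_base)
{
	return (apic_base & APIC_BASE_BSP_FLAG) != 0;
}

/* Return the value with the BSP flag cleared; all other bits are kept. */
static uint64_t apic_base_clear_bsp(uint64_t apic_base)
{
	return apic_base & ~APIC_BASE_BSP_FLAG;
}
```

For example, a value of 0xfee00900 (default APIC base, APIC enabled, BSP
flag set) becomes 0xfee00800 after apic_base_clear_bsp(), which is what
an AP would typically report.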

^ permalink raw reply	[flat|nested] 19+ messages in thread

Thread overview: 19+ messages
2012-04-16  2:21 [PATCH 0/2] kdump: Enter 2nd kernel with BSP for enabling multiple CPUs HATAYAMA Daisuke
2012-04-16  2:21 ` HATAYAMA Daisuke
2012-04-16  2:21 ` [PATCH 1/2] Introduce crash ipi helpers to wait for APs to stop HATAYAMA Daisuke
2012-04-16  2:21   ` HATAYAMA Daisuke
2012-04-16  2:21 ` [PATCH 2/2] Enter 2nd kernel with BSP HATAYAMA Daisuke
2012-04-16  2:21   ` HATAYAMA Daisuke
2012-04-23 10:46   ` Andi Kleen
2012-04-23 10:49   ` Andi Kleen
2012-04-24  2:05     ` HATAYAMA Daisuke
2012-04-24  3:04       ` Yu, Fenghua
2012-04-24  8:04       ` Andi Kleen
2012-04-24 10:46         ` Eric W. Biederman
     [not found] ` <beaa8ade-b7af-4e71-b4e0-a418ceb83f1e@email.android.com>
2012-04-16  6:40   ` [PATCH 0/2] kdump: Enter 2nd kernel with BSP for enabling multiple CPUs HATAYAMA Daisuke
2012-04-16  6:40     ` HATAYAMA Daisuke
2012-05-14  8:29 ` Cong Wang
2013-04-18 11:41 ` Petr Tesarik
2013-04-18 11:41   ` Petr Tesarik
2013-04-19  8:45   ` HATAYAMA Daisuke
2013-04-19  8:45     ` HATAYAMA Daisuke