linux-kernel.vger.kernel.org archive mirror
* [PATCH] kdump: Fix for boot problems on SMP
@ 2004-11-18 14:08 Hariprasad Nellitheertha
  2004-11-18 15:34 ` Badari Pulavarty
  2004-11-19 17:56 ` Akinobu Mita
  0 siblings, 2 replies; 16+ messages in thread
From: Hariprasad Nellitheertha @ 2004-11-18 14:08 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel; +Cc: pbadari, Vara Prasad

[-- Attachment #1: Type: text/plain, Size: 254 bytes --]

Hi Andrew,

There was a buggy (and unnecessary) reserve_bootmem() call in the kdump
code which was causing hangs during early boot on some SMP machines. The
attached patch removes it.

Kindly include this patch in the -mm tree.

Thanks and Regards, Hari


[-- Attachment #2: kdump-reserve-bootmem-fix.patch --]
[-- Type: text/plain, Size: 740 bytes --]



Signed-off-by: Hariprasad Nellitheertha <hari@in.ibm.com>
---

 linux-2.6.10-rc2-hari/include/asm-i386/crash_dump.h |    1 -
 1 files changed, 1 deletion(-)

diff -puN include/asm-i386/crash_dump.h~kdump-reserve-bootmem-fix include/asm-i386/crash_dump.h
--- linux-2.6.10-rc2/include/asm-i386/crash_dump.h~kdump-reserve-bootmem-fix	2004-11-18 19:20:47.000000000 +0530
+++ linux-2.6.10-rc2-hari/include/asm-i386/crash_dump.h	2004-11-18 19:21:03.000000000 +0530
@@ -37,7 +37,6 @@ static inline void set_saved_max_pfn(voi
 static inline void crash_reserve_bootmem(void)
 {
 	if (!dump_enabled) {
-		reserve_bootmem(0, CRASH_RELOCATE_SIZE);
 		reserve_bootmem(CRASH_BACKUP_BASE,
 			CRASH_BACKUP_SIZE + CRASH_RELOCATE_SIZE + PAGE_SIZE);
 	}
_

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] kdump: Fix for boot problems on SMP
  2004-11-18 14:08 [PATCH] kdump: Fix for boot problems on SMP Hariprasad Nellitheertha
@ 2004-11-18 15:34 ` Badari Pulavarty
  2004-11-19 17:56 ` Akinobu Mita
  1 sibling, 0 replies; 16+ messages in thread
From: Badari Pulavarty @ 2004-11-18 15:34 UTC (permalink / raw)
  To: Hariprasad Nellitheertha; +Cc: Andrew Morton, linux-kernel, Vara Prasad

Hari,

Tested the patch on my 4-way P-III (where it was hanging earlier)
and it works fine for me.

Thanks,
Badari




* Re: [PATCH] kdump: Fix for boot problems on SMP
  2004-11-18 14:08 [PATCH] kdump: Fix for boot problems on SMP Hariprasad Nellitheertha
  2004-11-18 15:34 ` Badari Pulavarty
@ 2004-11-19 17:56 ` Akinobu Mita
  2004-11-19 23:30   ` Andrew Morton
  1 sibling, 1 reply; 16+ messages in thread
From: Akinobu Mita @ 2004-11-19 17:56 UTC (permalink / raw)
  To: Hariprasad Nellitheertha, Andrew Morton, linux-kernel
  Cc: pbadari, Vara Prasad

On Thursday 18 November 2004 23:08, Hariprasad Nellitheertha wrote:

> There was a buggy (and unnecessary) reserve_bootmem() call in the kdump
> code which was causing hangs during early boot on some SMP machines. The
> attached patch removes it.

Thanks! I also had the same problem.

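BTW, if the first kernel is built with CONFIG_DISCONTIGMEM, the second kernel
cannot boot, since crash_reserve_bootmem() is never called anywhere in that
configuration. The patch below adds the call to setup_memory() in
arch/i386/mm/discontig.c.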


--- 2.6-mm/arch/i386/mm/discontig.c.orig	2004-11-20 00:14:42.000000000 +0900
+++ 2.6-mm/arch/i386/mm/discontig.c	2004-11-20 00:39:38.000000000 +0900
@@ -32,6 +32,7 @@
 #include <asm/e820.h>
 #include <asm/setup.h>
 #include <asm/mmzone.h>
+#include <asm/crash_dump.h>
 #include <bios_ebda.h>
 
 struct pglist_data *node_data[MAX_NUMNODES];
@@ -363,6 +364,9 @@ unsigned long __init setup_memory(void)
 		}
 	}
 #endif
+
+	crash_reserve_bootmem();
+
 	return system_max_low_pfn;
 }
 





* Re: [PATCH] kdump: Fix for boot problems on SMP
  2004-11-19 23:30   ` Andrew Morton
@ 2004-11-19 23:29     ` Badari Pulavarty
  2004-11-20  1:05     ` Badari Pulavarty
  2004-11-20  3:46     ` Akinobu Mita
  2 siblings, 0 replies; 16+ messages in thread
From: Badari Pulavarty @ 2004-11-19 23:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Akinobu Mita, hari, Linux Kernel Mailing List, varap

Hi Andrew,

I haven't tested it yet on any of my machines (due to the hang). 
I am about to give it a try. But my understanding (please correct
me if I am wrong) is:

1) DISCONTIG_MEM support is not working yet, so I can't use any
of my NUMA boxes.

2) AMD64 is not supported, so I can't use my Opteron machine.

3) PPC is not supported, so I can't use my Power3 and Power4 machines.

So, I can only try it on non-NUMA i386 SMP boxes. I have a few of
those to try. I will give an update next week on my testing.

Thanks,
Badari


On Fri, 2004-11-19 at 15:30, Andrew Morton wrote:
> Akinobu Mita <amgta@yacht.ocn.ne.jp> wrote:
> >
> > On Thursday 18 November 2004 23:08, Hariprasad Nellitheertha wrote:
> > 
> > > There was a buggy (and unnecessary) reserve_bootmem() call in the kdump
> > > code which was causing hangs during early boot on some SMP machines. The
> > > attached patch removes it.
> > 
> > Thanks! I also had the same problem.
> 
> So..  How is the crashdump code working now?  I haven't heard from anyone
> who is using it and I haven't gotten onto testing it myself.
> 
> Do we have any feeling for its success rate on various machines, and on its
> ease of use?
> 
> 



* Re: [PATCH] kdump: Fix for boot problems on SMP
  2004-11-19 17:56 ` Akinobu Mita
@ 2004-11-19 23:30   ` Andrew Morton
  2004-11-19 23:29     ` Badari Pulavarty
                       ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Andrew Morton @ 2004-11-19 23:30 UTC (permalink / raw)
  To: Akinobu Mita; +Cc: hari, linux-kernel, pbadari, varap

Akinobu Mita <amgta@yacht.ocn.ne.jp> wrote:
>
> On Thursday 18 November 2004 23:08, Hariprasad Nellitheertha wrote:
> 
> > There was a buggy (and unnecessary) reserve_bootmem() call in the kdump
> > code which was causing hangs during early boot on some SMP machines. The
> > attached patch removes it.
> 
> Thanks! I also had the same problem.

So..  How is the crashdump code working now?  I haven't heard from anyone
who is using it and I haven't gotten onto testing it myself.

Do we have any feeling for its success rate on various machines, and on its
ease of use?



* Re: [PATCH] kdump: Fix for boot problems on SMP
  2004-11-19 23:30   ` Andrew Morton
  2004-11-19 23:29     ` Badari Pulavarty
@ 2004-11-20  1:05     ` Badari Pulavarty
  2004-11-20  3:04       ` Akinobu Mita
  2004-11-20  3:46     ` Akinobu Mita
  2 siblings, 1 reply; 16+ messages in thread
From: Badari Pulavarty @ 2004-11-20  1:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Akinobu Mita, hari, Linux Kernel Mailing List, varap

Well, I tried to use kdump.

I think the documentation needs updating. It says:

..

4) Load the second kernel to be booted using
                                                                                
   kexec -p <second-kernel> --args-linux --append="root=<root-dev> dump
   init 1 memmap=exactmap memmap=640k@0 memmap=32M@16M"

But kexec doesn't seem to like the option "-p".
Even when I removed "-p", it's complaining about "--args-linux"

# ./kexec --args-linux --append="root=/dev/sda2 dump init 1 memmap=exactmap memmap=640k@0 memmap=32M@16M" /boot/kexec2

./kexec: unrecognized option `--args-linux'
kexec 1.98 released 15 September 2004
Usage: kexec [OPTION]... [kernel]
Directly reboot into a new kernel
 
 -h, --help        Print this help.
 -v, --version     Print the version of kexec.
 -f, --force       Force an immediate kexec, don't call shutdown.
 -x, --no-ifdown   Don't bring down network interfaces.
                   (if used, must be last option specified)
 -l, --load        Load the new kernel into the current kernel.
 -u, --unload      Unload the current kexec target kernel.
 -e, --exec        Execute a currently loaded kernel.
 -t, --type=TYPE   Specify the new kernel is of this type.
 
Supported kernel file types and options:
elf32-x86
    --command-line=STRING Set the kernel command line to STRING
    --append=STRING       Set the kernel command line to STRING
    --initrd=FILE         Use FILE as the kernel's initial ramdisk.
    --ramdisk=FILE        Use FILE as the kernel's initial ramdisk.
    --args-linux          Pass linux kernel style options
    --args-elf            Pass elf boot notes
bzImage
-d, --debug               Enable debugging to help spot a failure.
    --real-mode           Use the kernels real mode entry point.
    --command-line=STRING Set the kernel command line to STRING.
    --append=STRING       Set the kernel command line to STRING.
    --initrd=FILE         Use FILE as the kernel's initial ramdisk.
    --ramdisk=FILE        Use FILE as the kernel's initial ramdisk.
 
Cannot load /boot/kexec2


Thanks,
Badari




* Re: [PATCH] kdump: Fix for boot problems on SMP
  2004-11-20  1:05     ` Badari Pulavarty
@ 2004-11-20  3:04       ` Akinobu Mita
  2004-11-22 16:03         ` Hariprasad Nellitheertha
  0 siblings, 1 reply; 16+ messages in thread
From: Akinobu Mita @ 2004-11-20  3:04 UTC (permalink / raw)
  To: Badari Pulavarty, Andrew Morton; +Cc: hari, Linux Kernel Mailing List, varap

I forgot to CC everyone.

On Saturday 20 November 2004 10:05, Badari Pulavarty wrote:

> 4) Load the second kernel to be booted using
>
>    kexec -p <second-kernel> --args-linux --append="root=<root-dev> dump
>    init 1 memmap=exactmap memmap=640k@0 memmap=32M@16M"
>
> But kexec doesn't seem to like the option "-p".
> Even when I removed "-p", it's complaining about "--args-linux"


I also have a kexec which does not have the "-p" option.
Instead of using "-p", I use the "-l" option after changing kexec
as follows (passing 1 instead of 0 as the flags argument of kexec_load()):

--- kexec-tools-1.98/kexec/kexec.c.orig	2004-10-31 19:42:34.000000000 +0900
+++ kexec-tools-1.98/kexec/kexec.c	2004-10-31 19:43:01.000000000 +0900
@@ -243,7 +243,7 @@ static int my_load(const char *type, int
 	if (sort_segments(segments, nr_segments) < 0) {
 		return -1;
 	}
-	result = kexec_load(entry, nr_segments, segments, 0);
+	result = kexec_load(entry, nr_segments, segments, 1);
 	if (result != 0) {
 		/* The load failed, print some debugging information */
 		fprintf(stderr, "kexec_load failed: %s\n",




* Re: [PATCH] kdump: Fix for boot problems on SMP
  2004-11-19 23:30   ` Andrew Morton
  2004-11-19 23:29     ` Badari Pulavarty
  2004-11-20  1:05     ` Badari Pulavarty
@ 2004-11-20  3:46     ` Akinobu Mita
  2 siblings, 0 replies; 16+ messages in thread
From: Akinobu Mita @ 2004-11-20  3:46 UTC (permalink / raw)
  To: Andrew Morton; +Cc: hari, linux-kernel, pbadari, varap

On Saturday 20 November 2004 08:30, Andrew Morton wrote:
> So..  How is the crashdump code working now?  I haven't heard from anyone
> who is using it and I haven't gotten onto testing it myself.
>
> Do we have any feeling for its success rate on various machines, and on its
> ease of use?

Although I have only generated panics intentionally on a normal UP box
(by enabling panic_on_oops and triggering a kernel NULL pointer
dereference), it always boots the second kernel successfully.

# gdb <first-kernel> -c /proc/vmcore
...

In gdb, I can move between frames with "up" or "down", and it displays
the correct local/global values with "print".







* Re: [PATCH] kdump: Fix for boot problems on SMP
  2004-11-20  3:04       ` Akinobu Mita
@ 2004-11-22 16:03         ` Hariprasad Nellitheertha
  2004-11-22 22:34           ` Badari Pulavarty
  2004-11-23  0:43           ` Badari Pulavarty
  0 siblings, 2 replies; 16+ messages in thread
From: Hariprasad Nellitheertha @ 2004-11-22 16:03 UTC (permalink / raw)
  To: Akinobu Mita
  Cc: Badari Pulavarty, Andrew Morton, Linux Kernel Mailing List, varap

[-- Attachment #1: Type: text/plain, Size: 832 bytes --]

Akinobu Mita wrote:
> I forgot to CC everyone.
> 
> On Saturday 20 November 2004 10:05, Badari Pulavarty wrote:
> 
> 
>>4) Load the second kernel to be booted using
>>
>>   kexec -p <second-kernel> --args-linux --append="root=<root-dev> dump
>>   init 1 memmap=exactmap memmap=640k@0 memmap=32M@16M"
>>
>>But kexec doesn't seem to like the option "-p".
>>Even when I removed "-p", it's complaining about "--args-linux"


There is a kexec-tools patch that is required to get the "-p" option
working. I had sent it out only to the fastboot mailing list, without
updating the kdump documentation. I will send out an updated documentation
patch indicating this requirement (I will host the patch on some site
and point to it in the document).

Meanwhile, I am attaching the patch with this note. Kindly try kdump
with this. Thanks!

Regards, Hari


[-- Attachment #2: kexec-tools-panic.patch --]
[-- Type: text/plain, Size: 2402 bytes --]


Signed-off-by: Hariprasad Nellitheertha <hari@in.ibm.com>


---

 kexec-tools-1.95-hari/kexec/kexec.c |   10 +++++++++-
 kexec-tools-1.95-hari/kexec/kexec.h |    6 ++++--
 2 files changed, 13 insertions(+), 3 deletions(-)

diff -puN kexec/kexec.c~kexec-tools-panic kexec/kexec.c
--- kexec-tools-1.95/kexec/kexec.c~kexec-tools-panic	2004-10-18 14:27:27.000000000 +0530
+++ kexec-tools-1.95-hari/kexec/kexec.c	2004-10-19 21:00:23.000000000 +0530
@@ -30,6 +30,7 @@
 /* local variables */
 static struct memory_range *memory_range;
 static int memory_ranges;
+static unsigned long load_flags;
 
 int valid_memory_range(struct kexec_segment *segment)
 {
@@ -243,7 +244,7 @@ static int my_load(const char *type, int
 	if (sort_segments(segments, nr_segments) < 0) {
 		return -1;
 	}
-	result = kexec_load(entry, nr_segments, segments, 0);
+	result = kexec_load(entry, nr_segments, segments, load_flags);
 	if (result != 0) {
 		/* The load failed, print some debugging information */
 		fprintf(stderr, "kexec_load failed: %s\n",
@@ -325,6 +326,7 @@ void usage(void)
 		" -u, --unload      Unload the current kexec target kernel.\n"
 		" -e, --exec        Execute a currently loaded kernel.\n"
 		" -t, --type=TYPE   Specify the new kernel is of this type.\n"
+		" -p, --load-panic  Load kernel for the reboot on panic case.\n"
 		"\n"
 		"Supported kernel file types and options: \n"
 		);
@@ -393,6 +395,12 @@ int main(int argc, char *argv[])
 		case OPT_TYPE:
 			type = optarg;
 			break;
+		case OPT_PANIC:
+			do_load = 1;
+			do_exec = 0;
+			do_shutdown = 0;
+			load_flags = 1;
+			break;
 		default:
 			break;
 		}
diff -puN kexec/kexec.h~kexec-tools-panic kexec/kexec.h
--- kexec-tools-1.95/kexec/kexec.h~kexec-tools-panic	2004-10-18 14:36:23.000000000 +0530
+++ kexec-tools-1.95-hari/kexec/kexec.h	2004-10-20 14:09:46.000000000 +0530
@@ -45,6 +45,7 @@ extern int file_types;
 #define OPT_LOAD		'l'
 #define OPT_UNLOAD		'u'
 #define OPT_TYPE		't'
+#define OPT_PANIC		'p'
 #define OPT_MAX			256
 #define KEXEC_OPTIONS \
 	{ "help",		0, 0, OPT_HELP }, \
@@ -54,7 +55,8 @@ extern int file_types;
 	{ "load",		0, 0, OPT_LOAD }, \
 	{ "unload",		0, 0, OPT_UNLOAD }, \
 	{ "exec",		0, 0, OPT_EXEC }, \
-	{ "type",		1, 0, OPT_TYPE }, 
-#define KEXEC_OPT_STR "hvdfxluet:"
+	{ "type",		1, 0, OPT_TYPE }, \
+	{ "load-panic",	0, 0, OPT_PANIC },
+#define KEXEC_OPT_STR "hvdfxluet:p"
 
 #endif /* KEXEC_H */

_


* Re: [PATCH] kdump: Fix for boot problems on SMP
  2004-11-22 16:03         ` Hariprasad Nellitheertha
@ 2004-11-22 22:34           ` Badari Pulavarty
  2004-11-23  0:43           ` Badari Pulavarty
  1 sibling, 0 replies; 16+ messages in thread
From: Badari Pulavarty @ 2004-11-22 22:34 UTC (permalink / raw)
  To: Hariprasad Nellitheertha
  Cc: Akinobu Mita, Andrew Morton, Linux Kernel Mailing List, varap

Hari,

Thanks for the patch; I tried it.

I hacked "sysrq-b" to call panic() to test this.
So far, my success is limited.

These may already be known and being worked on.
In the few times I tried, I ran into the following:

1) When I panic the system, I get
Badness in smp_call_function() in arch/i386/kernel/smp.c: 552
and the system hangs.

2) The machine boots to single-user mode with only 1 CPU.
I get the following messages while booting the second kernel:

..

Booting processor 1/1 eip 2000
Stuck ??
Inquiring remote APIC #1...
... APIC #1 ID: 01000000
... APIC #1 VERSION: 00040011
... APIC #1 SPIV: 000000ff
CPU #1 not responding - cannot use it.
Booting processor 1/2 eip 2000
Stuck ??
Inquiring remote APIC #2...
... APIC #2 ID: 02000000
... APIC #2 VERSION: 00040011
... APIC #2 SPIV: 000000ff
CPU #2 not responding - cannot use it.
Booting processor 1/3 eip 2000
Stuck ??
Inquiring remote APIC #3...
... APIC #3 ID: 03000000
... APIC #3 VERSION: 00040011
...

3) When I tried to run gdb on the core file, gdb got killed
because there was not enough memory. (This is on the second
kernel, so this could be okay.)

#gdb vmlinux.kexec1 ../core/vmcore.1
GNU gdb 5.2.1
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i586-suse-linux"...oom-killer: gfp_mask=0x1d2
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 4, high 12, batch 2
cpu 0 cold: low 0, high 4, batch 2
HighMem per-cpu: empty
                       
Free pages:        1116kB (0kB HighMem)
Active:2222 inactive:3280 dirty:0 writeback:0 unstable:0 free:279 slab:804 mapped:2275 pagetables:23
DMA free:292kB min:292kB low:364kB high:436kB active:108kB inactive:128kB present:16384kB pages_scanned:544 all_unreclaimable? yes
protections[]: 0 0 0
Normal free:824kB min:588kB low:732kB high:880kB active:8780kB inactive:12992kB present:32768kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 1*4kB 0*8kB 0*16kB 1*32kB 0*64kB 0*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 292kB
Normal: 44*4kB 7*8kB 1*16kB 0*32kB 3*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 824kB
HighMem: empty
Swap cache: add 23125, delete 19925, find 8355/9281, race 2+1
Out of Memory: Killed process 4290 (gdb).
Terminated

FYI.


Thanks,
Badari





* Re: [PATCH] kdump: Fix for boot problems on SMP
  2004-11-22 16:03         ` Hariprasad Nellitheertha
  2004-11-22 22:34           ` Badari Pulavarty
@ 2004-11-23  0:43           ` Badari Pulavarty
  2004-11-23 18:15             ` Hariprasad Nellitheertha
  2004-11-25 17:21             ` Akinobu Mita
  1 sibling, 2 replies; 16+ messages in thread
From: Badari Pulavarty @ 2004-11-23  0:43 UTC (permalink / raw)
  To: Hariprasad Nellitheertha
  Cc: Akinobu Mita, Andrew Morton, Linux Kernel Mailing List, varap

More testing results...

gdb is not showing the stack info properly on my saved vmcore.
I suspected vmlinux did not match the vmcore, so I verified that
vmcore and vmlinux match up. But still no luck...

# gdb  ../linux-2.6.9/vmlinux vmcore.2
GNU gdb 5.2.1
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i586-suse-linux"...
Core was generated by `root=/dev/sda2 dump init 1 memmap=exactmap memmap=640k@0 memmap=32M@16M console='.
#0  default_idle () at arch/i386/kernel/process.c:108
108     }
(gdb) bt
#0  default_idle () at arch/i386/kernel/process.c:108
#1  0xc04cdff8 in init_thread_union ()
#2  0xc0101b86 in cpu_idle () at arch/i386/kernel/process.c:196
#3  0xc04cea20 in start_kernel () at init/main.c:523
#4  0xc0100211 in L6 () at /tmp/cch2z2jk.s:2054
Cannot access memory at address 0x550007



Thanks,
Badari




* Re: [PATCH] kdump: Fix for boot problems on SMP
  2004-11-23  0:43           ` Badari Pulavarty
@ 2004-11-23 18:15             ` Hariprasad Nellitheertha
  2004-11-24 20:07               ` Badari Pulavarty
  2004-11-25 17:21             ` Akinobu Mita
  1 sibling, 1 reply; 16+ messages in thread
From: Hariprasad Nellitheertha @ 2004-11-23 18:15 UTC (permalink / raw)
  To: Badari Pulavarty
  Cc: Akinobu Mita, Andrew Morton, Linux Kernel Mailing List, varap

Hi Badari,

Badari Pulavarty wrote:
> More testing results...
> 
> gdb is not showing the stack info properly on my saved vmcore.
> I suspected vmlinux did not match the vmcore, so I verified that
> vmcore and vmlinux match up. But still no luck...

I will try to recreate this using the 'sysrq' method you described in 
the earlier mail. Will let you know my findings asap.

Thanks very much for trying kdump!

Regards, Hari


* Re: [PATCH] kdump: Fix for boot problems on SMP
  2004-11-23 18:15             ` Hariprasad Nellitheertha
@ 2004-11-24 20:07               ` Badari Pulavarty
  2004-11-25 15:13                 ` Hariprasad Nellitheertha
  0 siblings, 1 reply; 16+ messages in thread
From: Badari Pulavarty @ 2004-11-24 20:07 UTC (permalink / raw)
  To: Hariprasad Nellitheertha
  Cc: Akinobu Mita, Andrew Morton, Linux Kernel Mailing List, varap

Hari,


I have a success case and a failure case to report.

1) Success first. I was able to save /proc/vmcore when my machine
panicked (not through sysrq), and gdb showed the stack correctly :)

For some reason, gdb failed to show the stack correctly when I
ran it directly on /proc/vmcore while running the kexec kernel :(

# gdb  ../l*9/vmlinux vmcore.3
...
Core was generated by `root=/dev/sda2 dump init 1 memmap=exactmap memmap=640k@0 memmap=32M@16M console='.
#0  crash_get_current_regs (regs=0xc050b000)
    at arch/i386/kernel/crash_dump.c:98
98      }
(gdb) bt
#0  crash_get_current_regs (regs=0xc050b000)
    at arch/i386/kernel/crash_dump.c:98
#1  0xc0139986 in __crash_machine_kexec () at kernel/crash.c:83
#2  0xc011b2aa in panic (fmt=0xc050b000 "")
    at include/linux/crash_dump.h:21
#3  0xc0104ed5 in die (str=0x0, regs=0x1, err=2)
    at arch/i386/kernel/traps.c:392
#4  0xc0113ad2 in do_page_fault (regs=0xd4937edc, error_code=2)
    at arch/i386/mm/fault.c:480
#5  0xc0104707 in error_code () at /tmp/ccK5IM1b.s:2135
#6  0xc017a55e in aio_put_req (req=0x0) at fs/aio.c:529
#7  0xc017ba0d in io_submit_one (ctx=0xd46fddc0, user_iocb=0xbfffecb0,
    iocb=0xf75af124) at fs/aio.c:1551
#8  0xc017baf1 in sys_io_submit (ctx_id=3226513408, nr=32, iocbpp=0xbfffec30)
    at fs/aio.c:1609
#9  0xc0103c63 in syscall_call () at /tmp/ccK5IM1b.s:1946
#10 0xc0407220 in default_exec_domain ()
(gdb) q

2) Failure case:

When I recreated the panic, it tried to run kexec(), ran into an
exception in the kexec() code, and the machine hung.

Here is the console output:

Unable to handle kernel NULL pointer dereference at virtual address 00000020
 printing eip:
c128c044
*pde = 00000000
Oops: 0002 [#1]
SMP
Modules linked in:
CPU:    0
EIP:    0060:[<c128c044>]    Not tainted VLI
EFLAGS: 00010086   (2.6.10-rc2-mm2kexec)
EIP is at _spin_lock_irq+0x4/0x20    <<<<<<<<<**** my original panic 
eax: 00000020   ebx: c2dd77e0   ecx: c2821bb0   edx: c2821b80
esi: 00000020   edi: 00000000   ebp: c1dd9f10   esp: c1dd9f10
ds: 007b   es: 007b   ss: 0068
Process aio_tio (pid: 8084, threadinfo=c1dd8000 task=c2110570)
Stack: c1dd9f2c c107a56e c1dd9f18 c1dd9f18 c2821ba0 c2dd77e0 c1dd9f70 c1dd9f54
       c107ba1d c2821b80 00000000 00000000 bfffecb0 c2821b80 c2821b80 00000000
       bfffec30 c1dd9fbc c107bb01 c1dd9f70 bfffecb0 00000040 bfffecb0 00000000
Call Trace:
 [<c1004aaf>] show_stack+0x7f/0xa0
 [<c1004c5e>] show_registers+0x15e/0x1c0
 [<c1004e62>] die+0xf2/0x180
 [<c1013ad2>] do_page_fault+0x3b2/0x710
 [<c1004707>] error_code+0x2b/0x30
 [<c107a56e>] aio_put_req+0x1e/0x90
 [<c107ba1d>] io_submit_one+0x20d/0x250
 [<c107bb01>] sys_io_submit+0xa1/0x110
 [<c1003c63>] syscall_call+0x7/0xb
Code: fe 0a 79 12 a9 00 02 00 00 74 01 fb f3 90 80 3a 00 7e f9 fa eb e9 5d c3 90 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 55 89 e5 fa <f0> fe 08 79 09 f3 90 80 38 00 7e f9 eb f2 5d c3 8d b6 00 00 00
 <0>Fatal exception: panic in 5 seconds
Kernel panic - not syncing: Fatal exception
 <0>kexec: opening parachute	<<<<<<<<<<*** trying to kexec ?
Unable to handle kernel paging request at virtual address c30a0000
 printing eip:
c1039956
*pde = 00000000
Oops: 0002 [#2]
SMP
Modules linked in:
CPU:    0
EIP:    0060:[<c1039956>]    Not tainted VLI
EFLAGS: 00010206   (2.6.10-rc2-mm2kexec)
EIP is at __crash_machine_kexec+0x66/0x110      <<<<<<** panic in kexec 
eax: 00005400   ebx: c2003180   ecx: 000001e0   edx: 00000001
esi: c140b000   edi: c30a0000   ebp: c1dd9d98   esp: c1dd9d80
ds: 007b   es: 007b   ss: 0068
Process aio_tio (pid: 8084, threadinfo=c1dd8000 task=c2110570)
Stack: c140b000 c1dd9d94 c1dd9d98 c1dd8000 c1dd9edc c12a01d5 c1dd9db4 c101b2aa
       00000000 c140c380 c129e8dd c1dd9dc0 c1dd8000 c1dd9df8 c1004ed5 c129e8ce
       00000001 c1dd9dcc 00000001 c1dd9edc c12a01d5 00000002 000000ff 0000000b
Call Trace:
 [<c1004aaf>] show_stack+0x7f/0xa0
 [<c1004c5e>] show_registers+0x15e/0x1c0
 [<c1004e62>] die+0xf2/0x180
 [<c1013ad2>] do_page_fault+0x3b2/0x710
 [<c1004707>] error_code+0x2b/0x30
 [<c101b2aa>] panic+0x5a/0x120
 [<c1004ed5>] die+0x165/0x180
 [<c1013ad2>] do_page_fault+0x3b2/0x710
 [<c1004707>] error_code+0x2b/0x30
 [<c107a56e>] aio_put_req+0x1e/0x90
 [<c107ba1d>] io_submit_one+0x20d/0x250
 [<c107bb01>] sys_io_submit+0xa1/0x110
 [<c1003c63>] syscall_call+0x7/0xb
Code: 2a c1 be 01 00 00 00 89 35 a4 c7 40 c1 e8 03 22 fe ff 8b 0d a4 c7 40 c1 85 c9 75 6c bf 00 00 0a c3 be 00 b0 40 c1 b9 e0 01 00 00 <f3> a5 c7 04 24 80 07 0a c3 c7 44 24 04 80 b7 40 c1 c7 44 24 08
 <0>Fatal exception: panic in 5 seconds
                                        



Thanks,
Badari




* Re: [PATCH] kdump: Fix for boot problems on SMP
  2004-11-24 20:07               ` Badari Pulavarty
@ 2004-11-25 15:13                 ` Hariprasad Nellitheertha
  0 siblings, 0 replies; 16+ messages in thread
From: Hariprasad Nellitheertha @ 2004-11-25 15:13 UTC (permalink / raw)
  To: Badari Pulavarty
  Cc: Akinobu Mita, Andrew Morton, Linux Kernel Mailing List, varap

Hi Badari,

Badari Pulavarty wrote:
> Hari,
> 
> 
> I have a success case and a failure case to report.
> 
> 1) Success first. I was able to save /proc/vmcore when my machine
> panicked (not through sysrq), and gdb showed the stack correctly :)

Thanks for this news! It reassures us that we are on the right track
in making kdump useful for real-life problems.

> 
> For some reason, gdb failed to show the stack correctly when I
> ran it directly on /proc/vmcore while running the kexec kernel :(

Does it throw up wrong entries or does it completely fail?

> 
> # gdb  ../l*9/vmlinux vmcore.3
> ...
.
.
.
>  <0>kexec: opening parachute	<<<<<<<<<<*** trying to kexec ?

Yes, this is the kexec call from the crash dump code.

> Unable to handle kernel paging request at virtual address c30a0000

This is the page reserved for storing the register values. It's really
strange that it faults here. The page has already been reserved during
early boot.

>  printing eip:
> c1039956
> *pde = 00000000
> Oops: 0002 [#2]
> SMP
> Modules linked in:
> CPU:    0
> EIP:    0060:[<c1039956>]    Not tainted VLI
> EFLAGS: 00010206   (2.6.10-rc2-mm2kexec)
> EIP is at __crash_machine_kexec+0x66/0x110      <<<<<<** panic in kexec 

The panic is in crash_dump_save_registers() while doing a memcpy. As I 
mentioned above, it faults on the page reserved to save the registers.

Could I get the testcase, so I can attempt to recreate the problem
here? Please let me know.

Regards, Hari


* Re: [PATCH] kdump: Fix for boot problems on SMP
  2004-11-23  0:43           ` Badari Pulavarty
  2004-11-23 18:15             ` Hariprasad Nellitheertha
@ 2004-11-25 17:21             ` Akinobu Mita
  2004-11-26 11:57               ` Hariprasad Nellitheertha
  1 sibling, 1 reply; 16+ messages in thread
From: Akinobu Mita @ 2004-11-25 17:21 UTC (permalink / raw)
  To: Badari Pulavarty, Hariprasad Nellitheertha
  Cc: Andrew Morton, Linux Kernel Mailing List, varap

On Tuesday 23 November 2004 09:43, Badari Pulavarty wrote:
> gdb is not showing the stack info properly on my saved vmcore.
> I suspected vmlinux did not match the vmcore, so I verified that
> vmcore and vmlinux match up. But still no luck...
>
> # gdb  ../linux-2.6.9/vmlinux vmcore.2

[...]

> (gdb) bt
> #0  default_idle () at arch/i386/kernel/process.c:108
> #1  0xc04cdff8 in init_thread_union ()
> #2  0xc0101b86 in cpu_idle () at arch/i386/kernel/process.c:196
> #3  0xc04cea20 in start_kernel () at init/main.c:523
> #4  0xc0100211 in L6 () at /tmp/cch2z2jk.s:2054
> Cannot access memory at address 0x550007


I think the panic happened on a CPU other than CPU#0.

Currently vmcore contains only CPU#0's register contents.
Therefore, GDB always shows the backtrace of CPU#0.


fs/proc/vmcore.c:

static void elf_vmcore_store_hdr(char *bufp, int nphdr, int dataoff)
{
...
        /* 1 - Get the registers from the reserved memory area */
        reg_ppos = BACKUP_END + CRASH_RELOCATE_SIZE;
        read_from_oldmem(reg_buf, REG_SIZE, &reg_ppos, 0);
        elf_core_copy_regs(&prstatus.pr_reg, (struct pt_regs *)reg_buf);
        buf = storenote(&notes[0], buf);


Here, "reg_ppos" points to the relocated copy of crash_smp_regs[0],
but kdump should save "crash_smp_regs[**panic_cpu**]".

Better still, save all of crash_smp_regs[NR_CPUS], so that the notes
look like this:

# readelf --note /proc/vmcore

Notes at offset 0x00000074 with length 0x0000069c:
  Owner         Data size       Description
  CORE          0x00000090      NT_PRSTATUS (prstatus structure)
  CORE          0x0000007c      NT_PRPSINFO (prpsinfo structure)
  CORE          0x00000560      NT_TASKSTRUCT (task structure)
  :
  :
  :
  ...(repeated NR_CPUS times)
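The first suggestion (read the panicking CPU's slot instead of slot 0)
and the second (read every CPU's slot) both come down to where in old
memory the registers are read from. A minimal sketch, assuming the
relocated copy keeps the crash_smp_regs[] array layout with one
fixed-size slot per CPU; the SKETCH_* values and the regs_pos() and
all_regs_pos() helpers are hypothetical, not taken from the kdump headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values; the real BACKUP_END, CRASH_RELOCATE_SIZE and
 * REG_SIZE come from the kdump headers. */
#define SKETCH_BACKUP_END           0x100000ULL
#define SKETCH_CRASH_RELOCATE_SIZE  0x1000ULL
#define SKETCH_REG_SIZE             64ULL  /* assumed size of one CPU's slot */

/* Old-memory position of crash_smp_regs[cpu], assuming the relocated copy
 * keeps the array layout. The current elf_vmcore_store_hdr() reads only
 * the cpu == 0 position. */
static uint64_t regs_pos(int cpu)
{
	return SKETCH_BACKUP_END + SKETCH_CRASH_RELOCATE_SIZE +
	       (uint64_t)cpu * SKETCH_REG_SIZE;
}

/* Positions of every CPU's slot, i.e. what a dump saving all of
 * crash_smp_regs[NR_CPUS] would have to read. */
static void all_regs_pos(uint64_t *pos, int ncpus)
{
	for (int cpu = 0; cpu < ncpus; cpu++)
		pos[cpu] = regs_pos(cpu);
}
```

With this layout, saving crash_smp_regs[panic_cpu] would mean passing
regs_pos(panic_cpu) to read_from_oldmem() instead of the fixed slot-0
position.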






* Re: [PATCH] kdump: Fix for boot problems on SMP
  2004-11-25 17:21             ` Akinobu Mita
@ 2004-11-26 11:57               ` Hariprasad Nellitheertha
  0 siblings, 0 replies; 16+ messages in thread
From: Hariprasad Nellitheertha @ 2004-11-26 11:57 UTC (permalink / raw)
  To: Akinobu Mita
  Cc: Badari Pulavarty, Andrew Morton, Linux Kernel Mailing List, varap

Akinobu Mita wrote:
> On Tuesday 23 November 2004 09:43, Badari Pulavarty wrote:
> 
>>gdb is not showing the stack info properly, on my saved vmcore.
>>I thought vmlinux is not matching the vmcore, so I verified that
>>vmcore and vmlinux matchup. But still no luck...
>>
>># gdb  ../linux-2.6.9/vmlinux vmcore.2
> 
> 
> [...]
> 
> 
>>(gdb) bt
>>#0  default_idle () at arch/i386/kernel/process.c:108
>>#1  0xc04cdff8 in init_thread_union ()
>>#2  0xc0101b86 in cpu_idle () at arch/i386/kernel/process.c:196
>>#3  0xc04cea20 in start_kernel () at init/main.c:523
>>#4  0xc0100211 in L6 () at /tmp/cch2z2jk.s:2054
>>Cannot access memory at address 0x550007
> 
> 
> 
> I think the panic happened on a CPU other than CPU#0.
> 
> Currently vmcore contains only CPU#0's register contents.
> Therefore, GDB always shows the backtrace of CPU#0.
> 
> 
> fs/proc/vmcore.c:
> 
> static void elf_vmcore_store_hdr(char *bufp, int nphdr, int dataoff)
> {
> ...
>         /* 1 - Get the registers from the reserved memory area */
>         reg_ppos = BACKUP_END + CRASH_RELOCATE_SIZE;
>         read_from_oldmem(reg_buf, REG_SIZE, &reg_ppos, 0);
>         elf_core_copy_regs(&prstatus.pr_reg, (struct pt_regs *)reg_buf);
>         buf = storenote(&notes[0], buf);
> 
> 
> Here, "reg_ppos" points to the relocated copy of crash_smp_regs[0],
> but kdump should save "crash_smp_regs[**panic_cpu**]".
> 
> Better still, save all of crash_smp_regs[NR_CPUS], so that the notes
> look like this:

I am actually working on patches to export the registers of all
processors as ELF note sections, similar to what a multi-threaded core
dump does. This will enable gdb to analyze the stack traces of all
processors correctly.
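For reference, a note section holding one NT_PRSTATUS per processor
could be laid out as below. This is a user-space sketch, not the patch
being described: struct note_hdr mirrors Elf32_Nhdr from <elf.h>, and
append_note() and count_prstatus() are hypothetical helpers. Each note is
a 12-byte header followed by the "CORE" name and the descriptor, each
padded to 4 bytes; a reader such as gdb can then pick up one register
set per NT_PRSTATUS entry, much as it does for the threads of a
multi-threaded core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as Elf32_Nhdr in <elf.h>. */
struct note_hdr {
	uint32_t n_namesz;
	uint32_t n_descsz;
	uint32_t n_type;
};

#define SKETCH_NT_PRSTATUS 1u  /* same value as NT_PRSTATUS in <elf.h> */
#define ALIGN4(x) (((size_t)(x) + 3u) & ~(size_t)3u)

/* Append one note (header, "CORE" name, descriptor) at buf + off and
 * return the new offset. Name and descriptor are zero-padded to 4 bytes. */
static size_t append_note(unsigned char *buf, size_t off, uint32_t type,
			  const void *desc, uint32_t descsz)
{
	static const char name[] = "CORE";  /* sizeof includes the NUL */
	struct note_hdr h = { (uint32_t)sizeof(name), descsz, type };

	memcpy(buf + off, &h, sizeof(h));
	off += sizeof(h);
	memset(buf + off, 0, ALIGN4(sizeof(name)));
	memcpy(buf + off, name, sizeof(name));
	off += ALIGN4(sizeof(name));
	memset(buf + off, 0, ALIGN4(descsz));
	memcpy(buf + off, desc, descsz);
	return off + ALIGN4(descsz);
}

/* Walk a notes buffer and count the NT_PRSTATUS entries, i.e. the number
 * of register sets (CPUs) a reader would find. */
static int count_prstatus(const unsigned char *buf, size_t len)
{
	size_t off = 0;
	int n = 0;

	while (off + sizeof(struct note_hdr) <= len) {
		struct note_hdr h;

		memcpy(&h, buf + off, sizeof(h));
		if (h.n_type == SKETCH_NT_PRSTATUS)
			n++;
		off += sizeof(h) + ALIGN4(h.n_namesz) + ALIGN4(h.n_descsz);
	}
	return n;
}
```

Calling append_note() once per CPU with that CPU's saved registers, then
adding the other notes (NT_PRPSINFO and so on) once, gives the "repeated
NR_CPUS times" layout suggested earlier in the thread.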

Thanks and Regards,
Hari


Thread overview: 16+ messages
2004-11-18 14:08 [PATCH] kdump: Fix for boot problems on SMP Hariprasad Nellitheertha
2004-11-18 15:34 ` Badari Pulavarty
2004-11-19 17:56 ` Akinobu Mita
2004-11-19 23:30   ` Andrew Morton
2004-11-19 23:29     ` Badari Pulavarty
2004-11-20  1:05     ` Badari Pulavarty
2004-11-20  3:04       ` Akinobu Mita
2004-11-22 16:03         ` Hariprasad Nellitheertha
2004-11-22 22:34           ` Badari Pulavarty
2004-11-23  0:43           ` Badari Pulavarty
2004-11-23 18:15             ` Hariprasad Nellitheertha
2004-11-24 20:07               ` Badari Pulavarty
2004-11-25 15:13                 ` Hariprasad Nellitheertha
2004-11-25 17:21             ` Akinobu Mita
2004-11-26 11:57               ` Hariprasad Nellitheertha
2004-11-20  3:46     ` Akinobu Mita
