linux-kernel.vger.kernel.org archive mirror
* Re: [Fastboot] Re: Kdump Testing
@ 2005-04-23  3:30 Vivek Goyal
  2005-04-25 12:15 ` Nagesh Sharyathi
  0 siblings, 1 reply; 14+ messages in thread
From: Vivek Goyal @ 2005-04-23  3:30 UTC
  To: Eric W. Biederman; +Cc: Nagesh Sharyathi, akpm, fastboot, linux-kernel, maneesh

Quoting "Eric W. Biederman" <ebiederm@xmission.com>:

> Nagesh Sharyathi <sharyathi@in.ibm.com> writes:
> 
> > Here is the console boot log, before the machine jumps to BIOS 
> > after hang during panic kernel boot
> 
> Ok thanks.  So this is manually triggered with SysRq
> and the kexec part works but the recover kernel simply fails
> to boot.
> 
> It looks like that hunk of the ACPI code that messes up maxcpus=1
> needs to be looked at.

I faced a similar issue on one of my machines. A little debugging showed that
the boot CPU sends an INIT IPI to the application processor to wake it up, and
then the boot CPU loses its way and jumps to the BIOS. Strange....

Further, in my case this problem was noticed only if the crash happened on a
non-boot CPU.

It works well with a uniprocessor capture kernel. For the time being that is
sufficient to capture the dump, but it is always a good idea to be able to boot
an SMP kernel as well.


Thanks
Vivek


* Re: [Fastboot] Re: Kdump Testing
  2005-04-23  3:30 [Fastboot] Re: Kdump Testing Vivek Goyal
@ 2005-04-25 12:15 ` Nagesh Sharyathi
  2005-04-25 23:09   ` Randy.Dunlap
  0 siblings, 1 reply; 14+ messages in thread
From: Nagesh Sharyathi @ 2005-04-25 12:15 UTC
  To: vgoyal, akpm, Eric W. Biederman, fastboot, linux-kernel, maneesh

vgoyal@in.ltcfwd.linux.ibm.com wrote on 23/04/2005 09:00:03:

> Quoting "Eric W. Biederman" <ebiederm@xmission.com>:

> > Nagesh Sharyathi <sharyathi@in.ibm.com> writes:
> >
> > > Here is the console boot log, before the machine jumps to BIOS
> > > after hang during panic kernel boot
> >
> > Ok thanks.  So this is manually triggered with SysRq
> > and the kexec part works but the recover kernel simply fails
> > to boot.
> >
> > It looks like that hunk of the ACPI code that messes up maxcpus=1
> > needs to be looked at.

> It works well with a uniprocessor capture kernel. For the time being that is
> sufficient to capture the dump, but it is always a good idea to be able to boot
> an SMP kernel as well.
> 
> Vivek
I verified this on my machine, where kdump used to fail earlier: after
disabling CONFIG_SMP (i.e. CONFIG_SMP=n), the crash kernel boots properly and
I am able to take the memory dump.
Regards
Sharyathi 



* Re: [Fastboot] Re: Kdump Testing
  2005-04-25 12:15 ` Nagesh Sharyathi
@ 2005-04-25 23:09   ` Randy.Dunlap
  2005-04-26  8:54     ` Vivek Goyal
  0 siblings, 1 reply; 14+ messages in thread
From: Randy.Dunlap @ 2005-04-25 23:09 UTC
  To: Nagesh Sharyathi; +Cc: vgoyal, akpm, ebiederm, fastboot, linux-kernel, maneesh

On Mon, 25 Apr 2005 17:45:43 +0530
Nagesh Sharyathi <sharyathi@in.ibm.com> wrote:

> vgoyal@in.ltcfwd.linux.ibm.com wrote on 23/04/2005 09:00:03:
> 
> > Quoting "Eric W. Biederman" <ebiederm@xmission.com>:
> 
> > > Nagesh Sharyathi <sharyathi@in.ibm.com> writes:
> > >
> > > > Here is the console boot log, before the machine jumps to BIOS
> > > > after hang during panic kernel boot
> > >
> > > Ok thanks.  So this is manually triggered with SysRq
> > > and the kexec part works but the recover kernel simply fails
> > > to boot.
> > >
> > > It looks like that hunk of the ACPI code that messes up maxcpus=1
> > > needs to be looked at.
> 
> > It works well with a uniprocessor capture kernel. For the time being that is
> > sufficient to capture the dump, but it is always a good idea to be able to boot
> > an SMP kernel as well.
> > 
> > Vivek
> I verified this on my machine, where kdump used to fail earlier: after
> disabling CONFIG_SMP (i.e. CONFIG_SMP=n), the crash kernel boots properly and
> I am able to take the memory dump.


Thanks for those hints.  However, my testing didn't go quite
as well as that.


2.6.12-rc2-mm3 reboots vmlinux-recover-UP on panic.
(vmlinux-recover-SMP hangs during [early] reboot, but -UP
goes further....)

(BTW, how do I get a serial console from the second
kernel...?  It has the drivers, but not the command
line info?  TBD.)

vmlinux-recover-UP gets to this point, hand-written,
several lines missing:

kfree_debugcheck: bad ptr c3dbffb0h.  ( == %esi)
kernel BUG at <bad filename>:23128!
invalid operand: 0000 [#1]
DEBUG_PAGEALLOC
EIP is at kfree_debugcheck+0x45/0x50

Stack dump shows lots of ext3 cache and inode functions...

On a dual-proc P4 with 1 GB RAM.
-- 
~Randy


* Re: [Fastboot] Re: Kdump Testing
  2005-04-25 23:09   ` Randy.Dunlap
@ 2005-04-26  8:54     ` Vivek Goyal
  2005-04-27 16:46       ` Randy.Dunlap
  2005-04-27 19:23       ` Randy.Dunlap
  0 siblings, 2 replies; 14+ messages in thread
From: Vivek Goyal @ 2005-04-26  8:54 UTC
  To: Randy.Dunlap
  Cc: Nagesh Sharyathi, vgoyal, akpm, ebiederm, fastboot, linux-kernel,
	maneesh

> 
> 2.6.12-rc2-mm3 reboots vmlinux-recover-UP on panic.
> (vmlinux-recover-SMP hangs during [early] reboot, but -UP
> goes further....)
> 
> (BTW, how do I get a serial console from the second
> kernel...?  It has the drivers, but not the command
> line info?  TBD.)
> 


While pre-loading the capture kernel using kexec, you can specify the command
line options for the second kernel using --append="". You must already be passing
the root device. Add your serial console parameters as well, something like
--append="console=ttyS0,38400"
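One detail matters in the --append value: the kernel splits its command line into parameters at whitespace, so the console option should be written without a space after the comma, as in the console=ttyS0,115200n8 that appears in the capture kernel's command line later in this thread. A minimal sketch of that splitting rule, using a hypothetical helper split_cmdline() that deliberately omits the double-quote handling the real parser in kernel/params.c also performs:

```c
#include <ctype.h>
#include <stddef.h>

#define MAX_TOKEN_LEN 64

/*
 * Hypothetical, simplified model of kernel command-line splitting:
 * parameters are separated by whitespace (quote handling omitted).
 * Stores up to max_tokens parameters in out[], truncating each to
 * MAX_TOKEN_LEN - 1 characters, and returns how many were found.
 */
size_t split_cmdline(const char *s, char out[][MAX_TOKEN_LEN], size_t max_tokens)
{
    size_t count = 0;

    for (;;) {
        /* Skip the whitespace between parameters. */
        while (isspace((unsigned char)*s))
            s++;
        if (*s == '\0')
            return count;

        /* Copy one parameter, up to the next whitespace character. */
        size_t len = 0;
        while (*s != '\0' && !isspace((unsigned char)*s)) {
            if (count < max_tokens && len < MAX_TOKEN_LEN - 1)
                out[count][len++] = *s;
            s++;
        }
        if (count < max_tokens)
            out[count][len] = '\0';
        count++;
    }
}
```

Under this model, "console=ttyS0,38400" is one parameter, while "console=ttyS0, 38400" becomes two: "console=ttyS0," with no speed, and a stray "38400".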


> vmlinux-recover-UP gets to this point, hand-written,
> several lines missing:
> 
> kfree_debugcheck: bad ptr c3dbffb0h.  ( == %esi)
> kernel BUG at <bad filename>:23128!
> invalid operand: 0000 [#1]
> DEBUG_PAGEALLOC
> EIP is at kfree_debugcheck+0x45/0x50
> 
> Stack dump shows lots of ext3 cache and inode functions...
> 

Can you post a full serial console output of second kernel? That would help.

Thanks
Vivek



* Re: [Fastboot] Re: Kdump Testing
  2005-04-26  8:54     ` Vivek Goyal
@ 2005-04-27 16:46       ` Randy.Dunlap
  2005-04-27 19:23       ` Randy.Dunlap
  1 sibling, 0 replies; 14+ messages in thread
From: Randy.Dunlap @ 2005-04-27 16:46 UTC
  To: vgoyal; +Cc: akpm, sharyathi, fastboot, linux-kernel, ebiederm

On Tue, 26 Apr 2005 14:24:48 +0530
Vivek Goyal <vgoyal@in.ibm.com> wrote:

> > 
> > 2.6.12-rc2-mm3 reboots vmlinux-recover-UP on panic.
> > (vmlinux-recover-SMP hangs during [early] reboot, but -UP
> > goes further....)
> > 
> > (BTW, how do I get a serial console from the second
> > kernel...?  It has the drivers, but not the command
> > line info?  TBD.)
> > 
> 
> 
> While pre-loading the capture kernel using kexec, you can specify the command
> line options for the second kernel using --append="". You must already be passing
> the root device. Add your serial console parameters as well, something like
> --append="console=ttyS0,38400"

Yes, that's what I was planning to try anyway, thanks for the
confirmation.  Finally got it working.


> > vmlinux-recover-UP gets to this point, hand-written,
> > several lines missing:
> > 
> > kfree_debugcheck: bad ptr c3dbffb0h.  ( == %esi)
> > kernel BUG at <bad filename>:23128!
> > invalid operand: 0000 [#1]
> > DEBUG_PAGEALLOC
> > EIP is at kfree_debugcheck+0x45/0x50
> > 
> > Stack dump shows lots of ext3 cache and inode functions...
> > 
> 
> Can you post a full serial console output of second kernel? That would help.

Here:

 Linux version 2.6.12-rc2-mm3 (rddunlap@gargoyle) (gcc version 3.3.3 (SuSE Linux)) #25 Tue Apr 26 17:52:39 PDT 2005
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000100 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)
 BIOS-e820: 000000003fff0000 - 000000003fff3000 (ACPI NVS)
 BIOS-e820: 000000003fff3000 - 0000000040000000 (ACPI data)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
user-defined physical RAM map:
 user: 0000000000000000 - 00000000000a0000 (usable)
 user: 0000000001000000 - 000000000144d000 (usable)
 user: 00000000014ed400 - 0000000005000000 (usable)
0MB HIGHMEM available.
80MB LOWMEM available.
DMI 2.3 present.
Allocating PCI resources starting at 05000000 (gap: 05000000:fb000000)
Built 1 zonelists
Initializing CPU#0
Kernel command line: root=/dev/hda9 nosmp console=ttyS0,115200n8 console=tty0 init 1 memmap=exactmap memmap=640K@0K memmap=4404K@16384K memmap=60491K@21429K elfcorehdr=21428K
PID hash table entries: 512 (order: 9, 8192 bytes)
Detected 1685.910 MHz processor.
Using tsc for high-res timesource
Console: colour VGA+ 80x25
Unknown interrupt or fault at EIP 00000246 00000060 c13d6653   [*1]
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Memory: 59468k/81920k available (2561k kernel code, 5956k reserved, 1311k data, 220k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.

---
[1] c13d6653 is vfs_caches_init_early
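The "user-defined physical RAM map" in this log follows directly from the memmap= options on the capture kernel's command line: memmap=exactmap replaces the BIOS map, and each memmap=SIZE@START marks [START, START+SIZE) as usable. A minimal sketch that reproduces the three "user:" ranges, using a hypothetical helper parse_memmap_k() that handles only the <size>K@<start>K form seen here (the kernel's own parser also accepts M and G suffixes and other forms):

```c
#include <stdint.h>
#include <stdio.h>

/* A usable physical RAM range, [start, end) in bytes. */
struct mem_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Hypothetical helper: parse one memmap= value of the form
 * "<size>K@<start>K" into a byte range.
 * Returns 0 on success, -1 if the value does not have that form.
 */
int parse_memmap_k(const char *value, struct mem_range *r)
{
    unsigned long long size_k, start_k;

    /* The literal 'K' and '@' in the format must match the input. */
    if (sscanf(value, "%lluK@%lluK", &size_k, &start_k) != 2)
        return -1;

    r->start = (uint64_t)start_k * 1024;          /* K = 1024 bytes */
    r->end = r->start + (uint64_t)size_k * 1024;
    return 0;
}
```

Applied to 640K@0K, 4404K@16384K and 60491K@21429K, this yields 0x0-0xa0000, 0x1000000-0x144d000 and 0x14ed400-0x5000000, matching the three "user:" lines. The same arithmetic places elfcorehdr=21428K at 0x14ed000, 1K below the start of the third range.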


* Re: [Fastboot] Re: Kdump Testing
  2005-04-26  8:54     ` Vivek Goyal
  2005-04-27 16:46       ` Randy.Dunlap
@ 2005-04-27 19:23       ` Randy.Dunlap
  2005-04-28 11:44         ` Vivek Goyal
  1 sibling, 1 reply; 14+ messages in thread
From: Randy.Dunlap @ 2005-04-27 19:23 UTC
  To: vgoyal; +Cc: akpm, sharyathi, fastboot, linux-kernel, ebiederm

On Tue, 26 Apr 2005 14:24:48 +0530
Vivek Goyal <vgoyal@in.ibm.com> wrote:

> > 
> > 2.6.12-rc2-mm3 reboots vmlinux-recover-UP on panic.
> > (vmlinux-recover-SMP hangs during [early] reboot, but -UP
> > goes further....)
> > 
> > (BTW, how do I get a serial console from the second
> > kernel...?  It has the drivers, but not the command
> > line info?  TBD.)
> > 
> 
> 
> While pre-loading the capture kernel using kexec, you can specify the command
> line options for the second kernel using --append="". You must already be passing
> the root device. Add your serial console parameters as well, something like
> --append="console=ttyS0,38400"
> 
> 
> > vmlinux-recover-UP gets to this point, hand-written,
> > several lines missing:
> > 
> > kfree_debugcheck: bad ptr c3dbffb0h.  ( == %esi)
> > kernel BUG at <bad filename>:23128!
> > invalid operand: 0000 [#1]
> > DEBUG_PAGEALLOC
> > EIP is at kfree_debugcheck+0x45/0x50
> > 
> > Stack dump shows lots of ext3 cache and inode functions...
> > 
> 
> Can you post a full serial console output of second kernel? That would help.

I did another test run, same kernels (both running and recovery).
The recovery kernel got a little further this time, still had
Badness and a BUG.

---

Kernel panic - not syncing: crashtest
 Linux version 2.6.12-rc2-mm3 (rddunlap@gargoyle) (gcc version 3.3.3 (SuSE Linux)) #25 Tue Apr 26 17:52:39 PDT 2005
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000100 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)
 BIOS-e820: 000000003fff0000 - 000000003fff3000 (ACPI NVS)
 BIOS-e820: 000000003fff3000 - 0000000040000000 (ACPI data)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
user-defined physical RAM map:
 user: 0000000000000000 - 00000000000a0000 (usable)
 user: 0000000001000000 - 000000000144d000 (usable)
 user: 00000000014ed400 - 0000000005000000 (usable)
0MB HIGHMEM available.
80MB LOWMEM available.
DMI 2.3 present.
Allocating PCI resources starting at 05000000 (gap: 05000000:fb000000)
Built 1 zonelists
Initializing CPU#0
Kernel command line: root=/dev/hda9 nosmp console=ttyS0,115200n8 console=tty0 init 1 memmap=exactmap memmap=640K@0K memmap=4404K@16384K memmap=60491K@21429K elfcorehdr=21428K
PID hash table entries: 512 (order: 9, 8192 bytes)
Detected 1685.983 MHz processor.
Using tsc for high-res timesource
Console: colour VGA+ 80x25
Unknown interrupt or fault at EIP 00000246 00000060 c13d6653
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Memory: 59468k/81920k available (2561k kernel code, 5956k reserved, 1311k data, 220k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Mount-cache hash table entries: 512
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 256K
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
CPU: Intel(R) Xeon(TM) CPU 1.70GHz stepping 02
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
softlockup thread 0 started up.
NET: Registered protocol family 16
EISA bus registered
PCI: PCI BIOS revision 2.10 entry at 0xfb110, last bus=4
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
Linux Plug and Play Support v0.97 (c) Adam Belay
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
PCI: Using IRQ router PIIX/ICH [8086/2440] at 0000:00:1f.0
fscache: general fs caching registered
CacheFS: general fs caching v0.1 registered
inotify device minor=63
Initializing Cryptographic API
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
lp: driver loaded but no devices found
Real Time Clock Driver v1.12
Non-volatile memory driver v1.2
Software Watchdog Timer: 0.07 initialized. soft_noboot=0 soft_margin=60 sec (nowayout= 0)
Linux agpgart interface v0.101 (c) Dave Jones
agpgart: Detected an Intel i860 Chipset.
agpgart: AGP aperture is 64M @ 0xe8000000
Hangcheck: starting hangcheck timer 0.5.0 (tick is 180 seconds, margin is 60 seconds).
PNP: No PS/2 controller found. Probing ports directly.
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing disabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
parport0: PC-style at 0x378 (0x778) [PCSPP,TRISTATE,EPP]
parport0: irq 7 detected
lp0: using parport0 (polling).
lp0: console ready
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
loop: loaded (max 8 devices)
pktcdvd: v0.2.0a 2004-07-14 Jens Axboe (axboe@suse.de) and petero2@telia.com
Intel(R) PRO/1000 Network Driver - version 5.7.6-k2
Copyright (c) 1999-2004 Intel Corporation.
e100: Intel(R) PRO/100 Network Driver, 3.3.6-k2-NAPI
e100: Copyright(c) 1999-2004 Intel Corporation
PCI: Found IRQ 10 for device 0000:04:04.0
e100: eth0: e100_probe: addr 0xf4020000, irq 10, MAC addr 00:02:55:1A:35:D4
Linux video capture interface: v1.00
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ICH2: IDE controller at PCI slot 0000:00:1f.1
ICH2: chipset revision 4
ICH2: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:DMA, hdd:DMA
hda: ST3160023A, ATA DISK drive
hdb: ST3160023A, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hdc: LTN486S, ATAPI CD/DVD-ROM drive
hdd: SONY CD-RW CRX140E, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 1024KiB
hda: 312581808 sectors (160041 MB) w/8192KiB Cache, CHS=19457/255/63, UDMA(100)
hda: cache flushes supported
 hda: hda1 < hda5 hda6 hda7 hda8 hda9 >
hdb: max request size: 1024KiB
hdb: 312581808 sectors (160041 MB) w/8192KiB Cache, CHS=19457/255/63, UDMA(100)
hdb: cache flushes supported
 hdb: hdb1 hdb2 hdb3 hdb4
hdc: ATAPI 48X CD-ROM drive, 120kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
hdd: ATAPI 32X CD-ROM CD-R/RW drive, 4096kB Cache, UDMA(33)
PCI: Enabling device 0000:03:01.0 (0006 -> 0007)
PCI: Found IRQ 11 for device 0000:03:01.0
PCI: Sharing IRQ 11 with 0000:03:01.1
PCI: Enabling device 0000:03:01.1 (0006 -> 0007)
PCI: Found IRQ 11 for device 0000:03:01.1
PCI: Sharing IRQ 11 with 0000:03:01.0
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.36
        <Adaptec aic7899 Ultra160 SCSI adapter>
        aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs

scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.36
        <Adaptec aic7899 Ultra160 SCSI adapter>
        aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs

scsi2 : scsi_debug, version 1.75 [20050113], dev_size_mb=8, opts=0x0
  Vendor: Linux     Model: scsi_debug        Rev: 0004
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 16384 512-byte hdwr sectors (8 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 16384 512-byte hdwr sectors (8 MB)
SCSI device sda: drive cache: write back
 sda: unknown partition table
Attached scsi disk sda at scsi2, channel 0, id 0, lun 0
Attached scsi generic sg0 at scsi2, channel 0, id 0, lun 0,  type 0
SCSI Media Changer driver v0.24 
USB Universal Host Controller Interface driver v2.2
PCI: Found IRQ 11 for device 0000:00:1f.2
uhci_hcd 0000:00:1f.2: Intel Corporation 82801BA/BAM USB (Hub #1)
uhci_hcd 0000:00:1f.2: new USB bus registered, assigned bus number 1
uhci_hcd 0000:00:1f.2: irq 11, io base 0x0000b000
uhci_hcd 0000:00:1f.2: detected 2 ports
usb usb1: Product: Intel Corporation 82801BA/BAM USB (Hub #1)
usb usb1: Manufacturer: Linux 2.6.12-rc2-mm3 uhci_hcd
usb usb1: SerialNumber: 0000:00:1f.2
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 2 ports detected
usbcore: registered new driver hiddev
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.01:USB HID core driver
mice: PS/2 mouse device common for all mice
input: PC Speaker
i2c /dev entries driver
EISA: Probing bus 0 at eisa.0
Cannot allocate resource for EISA slot 4
Cannot allocate resource for EISA slot 5
EISA: Detected 0 cards.
Advanced Linux Sound Architecture Driver Version 1.0.9rc2  (Thu Mar 24 10:33:39 2005 UTC).
PCI: Found IRQ 11 for device 0000:00:1f.5
PCI: Sharing IRQ 11 with 0000:00:1f.3
input: AT Translated Set 2 keyboard on isa0060/serio0
intel8x0_measure_ac97_clock: measured 49559 usecs
intel8x0: clocking to 48000
ALSA device list:
  #0: Intel 82801BA-ICH2 with AD1885 at 0xb800, irq 11
NET: Registered protocol family 26
NET: Registered protocol family 2
IP: routing cache hash table of 128 buckets, 4Kbytes
TCP established hash table entries: 4096 (order: 3, 32768 bytes)
TCP bind hash table entries: 4096 (order: 4, 114688 bytes)
TCP: Hash tables configured (established 4096 bind 4096)
NET: Registered protocol family 1
NET: Registered protocol family 17
CacheFS: Wrong magic number on cache
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
input: ImPS/2 Generic Wheel Mouse on isa0060/serio1
kjournald starting.  Commit interval 5 seconds
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 220k freed
Adding 2104472k swap on /dev/hda7.  Priority:42 extents:1
mismatch in kmem_cache_free: expected cache c168fc80, got c4daca80
c4daca80 is ext3_inode_cache.
c168fc80 is skbuff_head_cache.
Badness in cache_free_debugcheck at mm/slab.c:1917
 [<c1003368>] dump_stack+0x16/0x18
 [<c1041a94>] cache_free_debugcheck+0x88/0x1d5
 [<c10424fd>] kmem_cache_free+0x26/0x65
 [<c10a8c01>] ext3_destroy_inode+0x17/0x19
 [<c10784c9>] destroy_inode+0x27/0x3d
 [<c1078837>] dispose_list+0x60/0x178
 [<c1078f81>] prune_icache+0x363/0x399
 [<c1078fd0>] shrink_icache_memory+0x19/0x32
 [<c1044dd7>] shrink_slab+0x104/0x172
 [<c104641e>] try_to_free_pages+0xbe/0x16f
 [<c103d9a0>] __alloc_pages+0x1d3/0x393
 [<c104037c>] kmem_getpages+0x2d/0x7f
 [<c1041869>] cache_grow+0x155/0x2a8
 [<c1041f1f>] cache_alloc_refill+0x285/0x2c2
 [<c10423c6>] kmem_cache_alloc+0x5d/0x77
 [<c1075dac>] d_alloc+0x16/0x27a
 [<c106b2b9>] real_lookup+0x40/0xc2
 [<c106b68e>] do_lookup+0x41/0x75
 [<c106c3a7>] __link_path_walk+0xce5/0x1066
 [<c106c768>] link_path_walk+0x40/0xc7
 [<c106ca87>] path_lookup+0xec/0xf7
 [<c106cbc9>] __user_walk+0x28/0x42
 [<c10667b3>] vfs_lstat+0x17/0x3f
 [<c1066d1e>] sys_lstat64+0x13/0x29
 [<c1002c5f>] sysenter_past_esp+0x54/0x75
slab error in cache_free_debugcheck(): cache `ext3_inode_cache': double free, or memory outside object was overwritten
 [<c1003368>] dump_stack+0x16/0x18
 [<c1041ad2>] cache_free_debugcheck+0xc6/0x1d5
 [<c10424fd>] kmem_cache_free+0x26/0x65
 [<c10a8c01>] ext3_destroy_inode+0x17/0x19
 [<c10784c9>] destroy_inode+0x27/0x3d
 [<c1078837>] dispose_list+0x60/0x178
 [<c1078f81>] prune_icache+0x363/0x399
 [<c1078fd0>] shrink_icache_memory+0x19/0x32
 [<c1044dd7>] shrink_slab+0x104/0x172
 [<c104641e>] try_to_free_pages+0xbe/0x16f
 [<c103d9a0>] __alloc_pages+0x1d3/0x393
 [<c104037c>] kmem_getpages+0x2d/0x7f
 [<c1041869>] cache_grow+0x155/0x2a8
 [<c1041f1f>] cache_alloc_refill+0x285/0x2c2
 [<c10423c6>] kmem_cache_alloc+0x5d/0x77
 [<c1075dac>] d_alloc+0x16/0x27a
 [<c106b2b9>] real_lookup+0x40/0xc2
 [<c106b68e>] do_lookup+0x41/0x75
 [<c106c3a7>] __link_path_walk+0xce5/0x1066
 [<c106c768>] link_path_walk+0x40/0xc7
 [<c106ca87>] path_lookup+0xec/0xf7
 [<c106cbc9>] __user_walk+0x28/0x42
 [<c10667b3>] vfs_lstat+0x17/0x3f
 [<c1066d1e>] sys_lstat64+0x13/0x29
 [<c1002c5f>] sysenter_past_esp+0x54/0x75
c3d7afb0: redzone 1: 0x0, redzone 2: 0x0.
------------[ cut here ]------------
kernel BUG at <bad filename>:18422!
invalid operand: 0000 [#1]
DEBUG_PAGEALLOC
Modules linked in:
CPU:    0
EIP:    0060:[<c1041b46>]    Not tainted VLI
EFLAGS: 00010002   (2.6.12-rc2-mm3) 
EIP is at cache_free_debugcheck+0x13a/0x1d5
eax: c3d7a000   ebx: c3d7a000   ecx: 00001000   edx: 00000fb0
esi: c3d7afb0   edi: c4daca80   ebp: c2f73bb8   esp: c2f73bac
ds: 007b   es: 007b   ss: 0068
Process showconsole (pid: 1264, threadinfo=c2f72000 task=c2f68ac0)
Stack: c4d0fec4 c4daca80 c3d7bd44 c2f73be0 c10424fd c4daca80 c3d7bd44 c10a8c01 
       00000080 00000286 c3d7bddc c2f73c2c 00000080 c2f73bf0 c10a8c01 c4daca80 
       c3d7bd44 c2f73c00 c10784c9 c3d7bddc c3d7bddc c2f73c1c c1078837 c3d7bddc 
Call Trace:
 [<c100334a>] show_stack+0x7a/0x82
 [<c1003453>] show_registers+0xe9/0x153
 [<c100369f>] die+0x15c/0x23d
 [<c1003a79>] do_invalid_op+0x90/0x97
 [<c1002ed3>] error_code+0x4f/0x54
 [<c10424fd>] kmem_cache_free+0x26/0x65
 [<c10a8c01>] ext3_destroy_inode+0x17/0x19
 [<c10784c9>] destroy_inode+0x27/0x3d
 [<c1078837>] dispose_list+0x60/0x178
 [<c1078f81>] prune_icache+0x363/0x399
 [<c1078fd0>] shrink_icache_memory+0x19/0x32
 [<c1044dd7>] shrink_slab+0x104/0x172
 [<c104641e>] try_to_free_pages+0xbe/0x16f
 [<c103d9a0>] __alloc_pages+0x1d3/0x393
 [<c104037c>] kmem_getpages+0x2d/0x7f
 [<c1041869>] cache_grow+0x155/0x2a8
 [<c1041f1f>] cache_alloc_refill+0x285/0x2c2
 [<c10423c6>] kmem_cache_alloc+0x5d/0x77
 [<c1075dac>] d_alloc+0x16/0x27a
 [<c106b2b9>] real_lookup+0x40/0xc2
 [<c106b68e>] do_lookup+0x41/0x75
 [<c106c3a7>] __link_path_walk+0xce5/0x1066
 [<c106c768>] link_path_walk+0x40/0xc7
 [<c106ca87>] path_lookup+0xec/0xf7
 [<c106cbc9>] __user_walk+0x28/0x42
 [<c10667b3>] vfs_lstat+0x17/0x3f
 [<c1066d1e>] sys_lstat64+0x13/0x29
 [<c1002c5f>] sysenter_past_esp+0x54/0x75
Code: e8 bc e4 ff ff 8b 55 10 89 10 58 5a 8b 5b 0c 89 f0 31 d2 8b 4f 34 29 d8 f7 f1 3b 47 3c 72 02 0f 0b 0f af c1 8d 04 18 39 c6 74 02 <0f> 0b f6 47 39 02 74 15 6a 05 57 57 e8 1d e4 ff ff 8d 04 30 89 
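The final BUG can be followed by hand from the Code: bytes and registers above. As far as the disassembly shows, the faulting ud2 is reached when the freed pointer (esi = c3d7afb0) is not the exact start of an object in its slab (ebx = c3d7a000, object size ecx = 0x1000): the division leaves quotient 0 and remainder edx = 0xfb0, so the pointer is 0xfb0 bytes inside the first object. A minimal C model of that arithmetic, read from the disassembly rather than taken from mm/slab.c; the struct and function names are hypothetical, and num (objects per slab) is not visible in the dump:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one slab; not the kernel's structures. */
struct slab_model {
    uintptr_t s_mem;    /* address of the first object in the slab (ebx) */
    uintptr_t objsize;  /* bytes per object (ecx) */
    uintptr_t num;      /* objects per slab (not visible in the dump) */
};

/*
 * Mirrors the two comparisons that precede the faulting ud2 in the
 * Code: bytes: the object index must be below num, and the pointer must
 * be the exact start of that object. Returns true only if both hold.
 */
bool valid_object_start(const struct slab_model *s, uintptr_t objp)
{
    /* Unsigned arithmetic: a pointer below s_mem wraps to a huge offset. */
    uintptr_t offset = objp - s->s_mem;
    uintptr_t objnr = offset / s->objsize;  /* quotient (eax after div);
                                               remainder is edx */
    if (objnr >= s->num)
        return false;                       /* first BUG check */

    /* Second check: the one whose ud2 is marked <0f> in the dump. */
    return objp == s->s_mem + objnr * s->objsize;
}
```

With s_mem = 0xc3d7a000, objsize = 0x1000 and any num of at least 1, valid_object_start() accepts 0xc3d7a000 but rejects 0xc3d7afb0, the case the kernel reported as a BUG.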
 


* Re: [Fastboot] Re: Kdump Testing
  2005-04-27 19:23       ` Randy.Dunlap
@ 2005-04-28 11:44         ` Vivek Goyal
  2005-04-28 16:11           ` Randy.Dunlap
  0 siblings, 1 reply; 14+ messages in thread
From: Vivek Goyal @ 2005-04-28 11:44 UTC
  To: Randy.Dunlap; +Cc: vgoyal, akpm, sharyathi, fastboot, linux-kernel, ebiederm

On Wed, Apr 27, 2005 at 12:23:12PM -0700, Randy.Dunlap wrote:
> On Tue, 26 Apr 2005 14:24:48 +0530
> Vivek Goyal <vgoyal@in.ibm.com> wrote:
> 
> > > 
> > > 2.6.12-rc2-mm3 reboots vmlinux-recover-UP on panic.
> > > (vmlinux-recover-SMP hangs during [early] reboot, but -UP
> > > goes further....)
> > > 
> > > (BTW, how do I get a serial console from the second
> > > kernel...?  It has the drivers, but not the command
> > > line info?  TBD.)
> > > 
> > 
> > 
> > While pre-loading the capture kernel using kexec, you can specify the command
> > line options for the second kernel using --append="". You must already be passing
> > the root device. Add your serial console parameters as well, something like
> > --append="console=ttyS0,38400"
> > 
> > 
> > > vmlinux-recover-UP gets to this point, hand-written,
> > > several lines missing:
> > > 
> > > kfree_debugcheck: bad ptr c3dbffb0h.  ( == %esi)
> > > kernel BUG at <bad filename>:23128!
> > > invalid operand: 0000 [#1]
> > > DEBUG_PAGEALLOC
> > > EIP is at kfree_debugcheck+0x45/0x50
> > > 
> > > Stack dump shows lots of ext3 cache and inode functions...
> > > 
> > 
> > Can you post a full serial console output of second kernel? That would help.
> 
> I did another test run, same kernels (both running and recovery).
> The recovery kernel got a little further this time, still had
> Badness and a BUG.
> 
> ---

Ok. I am also able to see this slab corruption occurring on my machine. I can
avoid the problem if I disable CacheFS support.

In fact, I can reproduce the problem if I boot the capture kernel normally through
the BIOS with the command line "mem=64M". It looks like a generic problem, not
associated with kexec/kdump. CacheFS might be causing some corruption.


> CacheFS: Wrong magic number on cache
> EXT3-fs: INFO: recovery required on readonly filesystem.
> EXT3-fs: write access will be enabled during recovery.
> input: ImPS/2 Generic Wheel Mouse on isa0060/serio1
> kjournald starting.  Commit interval 5 seconds
> EXT3-fs: recovery complete.
> EXT3-fs: mounted filesystem with ordered data mode.
> VFS: Mounted root (ext3 filesystem) readonly.
> Freeing unused kernel memory: 220k freed
> Adding 2104472k swap on /dev/hda7.  Priority:42 extents:1
> mismatch in kmem_cache_free: expected cache c168fc80, got c4daca80
> c4daca80 is ext3_inode_cache.
> c168fc80 is skbuff_head_cache.
> Badness in cache_free_debugcheck at mm/slab.c:1917
>  [<c1003368>] dump_stack+0x16/0x18
>  [<c1041a94>] cache_free_debugcheck+0x88/0x1d5
>  [<c10424fd>] kmem_cache_free+0x26/0x65
>  [<c10a8c01>] ext3_destroy_inode+0x17/0x19
>  [<c10784c9>] destroy_inode+0x27/0x3d
>  [<c1078837>] dispose_list+0x60/0x178
>  [<c1078f81>] prune_icache+0x363/0x399
>  [<c1078fd0>] shrink_icache_memory+0x19/0x32
>  [<c1044dd7>] shrink_slab+0x104/0x172
>  [<c104641e>] try_to_free_pages+0xbe/0x16f
>  [<c103d9a0>] __alloc_pages+0x1d3/0x393
>  [<c104037c>] kmem_getpages+0x2d/0x7f
>  [<c1041869>] cache_grow+0x155/0x2a8
>  [<c1041f1f>] cache_alloc_refill+0x285/0x2c2
>  [<c10423c6>] kmem_cache_alloc+0x5d/0x77
>  [<c1075dac>] d_alloc+0x16/0x27a
>  [<c106b2b9>] real_lookup+0x40/0xc2
>  [<c106b68e>] do_lookup+0x41/0x75
>  [<c106c3a7>] __link_path_walk+0xce5/0x1066
>  [<c106c768>] link_path_walk+0x40/0xc7
>  [<c106ca87>] path_lookup+0xec/0xf7
>  [<c106cbc9>] __user_walk+0x28/0x42
>  [<c10667b3>] vfs_lstat+0x17/0x3f
>  [<c1066d1e>] sys_lstat64+0x13/0x29
>  [<c1002c5f>] sysenter_past_esp+0x54/0x75
> slab error in cache_free_debugcheck(): cache `ext3_inode_cache': double free, or memory outside object was overwritten
>  [<c1003368>] dump_stack+0x16/0x18
>  [<c1041ad2>] cache_free_debugcheck+0xc6/0x1d5
>  [<c10424fd>] kmem_cache_free+0x26/0x65
>  [<c10a8c01>] ext3_destroy_inode+0x17/0x19
>  [<c10784c9>] destroy_inode+0x27/0x3d
>  [<c1078837>] dispose_list+0x60/0x178
>  [<c1078f81>] prune_icache+0x363/0x399
>  [<c1078fd0>] shrink_icache_memory+0x19/0x32
>  [<c1044dd7>] shrink_slab+0x104/0x172
>  [<c104641e>] try_to_free_pages+0xbe/0x16f
>  [<c103d9a0>] __alloc_pages+0x1d3/0x393
>  [<c104037c>] kmem_getpages+0x2d/0x7f
>  [<c1041869>] cache_grow+0x155/0x2a8
>  [<c1041f1f>] cache_alloc_refill+0x285/0x2c2
>  [<c10423c6>] kmem_cache_alloc+0x5d/0x77
>  [<c1075dac>] d_alloc+0x16/0x27a
>  [<c106b2b9>] real_lookup+0x40/0xc2
>  [<c106b68e>] do_lookup+0x41/0x75
>  [<c106c3a7>] __link_path_walk+0xce5/0x1066
>  [<c106c768>] link_path_walk+0x40/0xc7
>  [<c106ca87>] path_lookup+0xec/0xf7
>  [<c106cbc9>] __user_walk+0x28/0x42
>  [<c10667b3>] vfs_lstat+0x17/0x3f
>  [<c1066d1e>] sys_lstat64+0x13/0x29
>  [<c1002c5f>] sysenter_past_esp+0x54/0x75
> c3d7afb0: redzone 1: 0x0, redzone 2: 0x0.
> ------------[ cut here ]------------
> kernel BUG at <bad filename>:18422!
> invalid operand: 0000 [#1]
> DEBUG_PAGEALLOC
> Modules linked in:
> CPU:    0
> EIP:    0060:[<c1041b46>]    Not tainted VLI
> EFLAGS: 00010002   (2.6.12-rc2-mm3) 
> EIP is at cache_free_debugcheck+0x13a/0x1d5
> eax: c3d7a000   ebx: c3d7a000   ecx: 00001000   edx: 00000fb0
> esi: c3d7afb0   edi: c4daca80   ebp: c2f73bb8   esp: c2f73bac
> ds: 007b   es: 007b   ss: 0068
> Process showconsole (pid: 1264, threadinfo=c2f72000 task=c2f68ac0)
> Stack: c4d0fec4 c4daca80 c3d7bd44 c2f73be0 c10424fd c4daca80 c3d7bd44 c10a8c01 
>        00000080 00000286 c3d7bddc c2f73c2c 00000080 c2f73bf0 c10a8c01 c4daca80 
>        c3d7bd44 c2f73c00 c10784c9 c3d7bddc c3d7bddc c2f73c1c c1078837 c3d7bddc 
> Call Trace:
>  [<c100334a>] show_stack+0x7a/0x82
>  [<c1003453>] show_registers+0xe9/0x153
>  [<c100369f>] die+0x15c/0x23d
>  [<c1003a79>] do_invalid_op+0x90/0x97
>  [<c1002ed3>] error_code+0x4f/0x54
>  [<c10424fd>] kmem_cache_free+0x26/0x65
>  [<c10a8c01>] ext3_destroy_inode+0x17/0x19
>  [<c10784c9>] destroy_inode+0x27/0x3d
>  [<c1078837>] dispose_list+0x60/0x178
>  [<c1078f81>] prune_icache+0x363/0x399
>  [<c1078fd0>] shrink_icache_memory+0x19/0x32
>  [<c1044dd7>] shrink_slab+0x104/0x172
>  [<c104641e>] try_to_free_pages+0xbe/0x16f
>  [<c103d9a0>] __alloc_pages+0x1d3/0x393
>  [<c104037c>] kmem_getpages+0x2d/0x7f
>  [<c1041869>] cache_grow+0x155/0x2a8
>  [<c1041f1f>] cache_alloc_refill+0x285/0x2c2
>  [<c10423c6>] kmem_cache_alloc+0x5d/0x77
>  [<c1075dac>] d_alloc+0x16/0x27a
>  [<c106b2b9>] real_lookup+0x40/0xc2
>  [<c106b68e>] do_lookup+0x41/0x75
>  [<c106c3a7>] __link_path_walk+0xce5/0x1066
>  [<c106c768>] link_path_walk+0x40/0xc7
>  [<c106ca87>] path_lookup+0xec/0xf7
>  [<c106cbc9>] __user_walk+0x28/0x42
>  [<c10667b3>] vfs_lstat+0x17/0x3f
>  [<c1066d1e>] sys_lstat64+0x13/0x29
>  [<c1002c5f>] sysenter_past_esp+0x54/0x75
> Code: e8 bc e4 ff ff 8b 55 10 89 10 58 5a 8b 5b 0c 89 f0 31 d2 8b 4f 34 29 d8 f7 f1 3b 47 3c 72 02 0f 0b 0f af c1 8d 04 18 39 c6 74 02 <0f> 0b f6 47 39 02 74 15 6a 05 57 57 e8 1d e4 ff ff 8d 04 30 89 
>  


* Re: [Fastboot] Re: Kdump Testing
  2005-04-28 11:44         ` Vivek Goyal
@ 2005-04-28 16:11           ` Randy.Dunlap
  2005-04-28 19:08             ` Eric W. Biederman
  2005-04-29  3:08             ` [PATCH] Kdump docs Randy.Dunlap
  0 siblings, 2 replies; 14+ messages in thread
From: Randy.Dunlap @ 2005-04-28 16:11 UTC
  To: vgoyal; +Cc: akpm, ebiederm, fastboot, linux-kernel, sharyathi

On Thu, 28 Apr 2005 17:14:16 +0530
Vivek Goyal <vgoyal@in.ibm.com> wrote:

> > > Can you post a full serial console output of second kernel? That would help.
> > 
> > I did another test run, same kernels (both running and recovery).
> > The recovery kernel got a little further this time, still had
> > Badness and a BUG.
> > 
> > ---
> 
> Ok. I am also able to see this slab corruption occurring on my machine. I can
> avoid the problem if I disable CacheFS support.
> 
> In fact, I can reproduce the problem if I boot the capture kernel normally through
> the BIOS with the command line "mem=64M". It looks like a generic problem, not
> associated with kexec/kdump. CacheFS might be causing some corruption.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Wheeeeeeeeee.  Great, we (I) can do without cachefs,
and when I do that, kexec + kdump works.
First time that I've seen kdump work.  :)

-rw-r--r--  1 root root 1.0G Apr 28 08:41 oldmem.0428
-r--------  1 root root 960M Apr 28 08:36 vmcore.0428

My (crashing/panic) kernel is built without -g, but gdb
can still tell me this much:

(gdb) bt
#0  0xc010ef95 in crash_get_current_regs ()
#1  0x00000000 in ?? ()
#2  0xee821ea0 in ?? ()
#3  0xee821ea0 in ?? ()
#4  0xee821ea0 in ?? ()
#5  0x00000046 in ?? ()
#6  0x00000000 in ?? ()
#7  0x00000000 in ?? ()
#8  0x00000000 in ?? ()
#9  0xee82c000 in ?? ()
#10 0x00000000 in ?? ()
#11 0xc010ed38 in machine_kexec ()
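
Before pointing gdb at a saved dump such as vmcore.0428, it may help to confirm whether the dump carries ELF32 or ELF64 headers. The kdump notes in this thread say kexec writes ELF32 headers by default on i386 unless --elf64-core-headers is given, and that gdb cannot analyse ELF64 cores for i386. The sketch below only reads the ELF identification bytes. The function name elf_core_class is hypothetical and is not part of kexec-tools or gdb. It assumes a Linux/glibc <elf.h>.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/*
 * Report the ELF class of a dump stream (e.g. a copy of /proc/vmcore).
 * Returns 32 for ELFCLASS32, 64 for ELFCLASS64, or -1 if the stream does
 * not start with a valid ELF identification block.
 */
int elf_core_class(FILE *f)
{
    unsigned char ident[EI_NIDENT];

    assert(f != NULL);
    /* e_ident is the first EI_NIDENT bytes of both Elf32_Ehdr and Elf64_Ehdr. */
    if (fread(ident, 1, sizeof(ident), f) != sizeof(ident))
        return -1;
    if (memcmp(ident, ELFMAG, SELFMAG) != 0)
        return -1;
    switch (ident[EI_CLASS]) {
    case ELFCLASS32:
        return 32;
    case ELFCLASS64:
        return 64;
    default:
        return -1;
    }
}
```

For example, open the dump with fopen("vmcore.0428", "rb") and pass the FILE * to elf_core_class(). A result of 64 suggests the headers were written with --elf64-core-headers.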


Thanks for following up, tracking, working on this.

---
~Randy

* Re: [Fastboot] Re: Kdump Testing
  2005-04-28 16:11           ` Randy.Dunlap
@ 2005-04-28 19:08             ` Eric W. Biederman
  2005-04-29  3:08             ` [PATCH] Kdump docs Randy.Dunlap
  1 sibling, 0 replies; 14+ messages in thread
From: Eric W. Biederman @ 2005-04-28 19:08 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: vgoyal, akpm, fastboot, linux-kernel, sharyathi

"Randy.Dunlap" <rddunlap@osdl.org> writes:

> On Thu, 28 Apr 2005 17:14:16 +0530
> Vivek Goyal <vgoyal@in.ibm.com> wrote:
> 
> > > > Can you post a full serial console output of second kernel? That would help.
> > > 
> > > I did another test run, same kernels (both running and recovery).
> > > The recovery kernel got a little further this time, still had
> > > Badness and a BUG.
> > > 
> > > ---
> > 
> > Ok. I am also able to see this slab corruption occurring on my machine. I can
> > get away with the problem if I disable cachefs support. 
> > 
> > Infact, I can reproduce the problem if I boot capture kernel normally through
> > BIOS with commandline "mem=64M". Looks like it is generic problem and not
> > associated with kexec/kdump. Cachefs might be doing some corruption.
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> Wheeeeeeeeee.  Great, we (I) can do without cachefs,
> and when I do that, kexec + kdump works.
> First time that I've seen kdump work.  :)
> 
> -rw-r--r--  1 root root 1.0G Apr 28 08:41 oldmem.0428
> -r--------  1 root root 960M Apr 28 08:36 vmcore.0428
> 
> My (crashing/panic) kernel is built without -g, but gdb
> can still tell me this much:
> 
> (gdb) bt
> #0  0xc010ef95 in crash_get_current_regs ()
> #1  0x00000000 in ?? ()
> #2  0xee821ea0 in ?? ()
> #3  0xee821ea0 in ?? ()
> #4  0xee821ea0 in ?? ()
> #5  0x00000046 in ?? ()
> #6  0x00000000 in ?? ()
> #7  0x00000000 in ?? ()
> #8  0x00000000 in ?? ()
> #9  0xee82c000 in ?? ()
> #10 0x00000000 in ?? ()
> #11 0xc010ed38 in machine_kexec ()
> 
> 
> Thanks for following up, tracking, working on this.

Congratulations, everyone.  The really good news is that when the
recovery kernel failed, it failed early enough that it did not make
things worse.  It is good to see that prediction confirmed :)

Eric

* [PATCH] Kdump docs.
  2005-04-28 16:11           ` Randy.Dunlap
  2005-04-28 19:08             ` Eric W. Biederman
@ 2005-04-29  3:08             ` Randy.Dunlap
  2005-04-29  5:07               ` Vivek Goyal
  1 sibling, 1 reply; 14+ messages in thread
From: Randy.Dunlap @ 2005-04-29  3:08 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: vgoyal, akpm, sharyathi, fastboot, ebiederm, linux-kernel

On Thu, 28 Apr 2005 09:11:19 -0700 Randy.Dunlap wrote:

| Wheeeeeeeeee.  Great, we (I) can do without cachefs,
| and when I do that, kexec + kdump works.
| First time that I've seen kdump work.  :)


Vivek, Hari, Andrew-

Here's a patch to make Documentation/kdump.txt cleaner & clearer.

---

From: Randy Dunlap <rddunlap@osdl.org>

Cleanups and clear-ups for kdump doc:
  typos, punctuation, 80 columns, examples.

Signed-off-by: Randy Dunlap <rddunlap@osdl.org>
---

 Documentation/kdump.txt |   89 ++++++++++++++++++++++++++++--------------------
 1 files changed, 52 insertions(+), 37 deletions(-)

diff -Naurp ./Documentation/kdump.txt~kdump_docco ./Documentation/kdump.txt
--- ./Documentation/kdump.txt~kdump_docco	2005-04-22 10:01:39.000000000 -0700
+++ ./Documentation/kdump.txt	2005-04-28 19:55:03.000000000 -0700
@@ -1,4 +1,4 @@
-Documentation for kdump - the kexec based crash dumping solution
+Documentation for kdump - the kexec-based crash dumping solution
 ================================================================
 
 DESIGN
@@ -11,10 +11,10 @@ DMA from the first kernel does not corru
 
 All the necessary information about Core image is encoded in ELF format and
 stored in reserved area of memory before crash. Physical address of start of
-elf header is passed to new kernel through command line parameter elfcorehdr=.
+ELF header is passed to new kernel through command line parameter elfcorehdr=.
 
-On i386, first 640k of physical memory is needed to boot, irrespctive of where
-the kernel loads at. Hence, this region is backed up by kexec just before
+On i386, the first 640 KB of physical memory is needed to boot, irrespective
+of where the kernel loads. Hence, this region is backed up by kexec just before
 rebooting into the new kernel.
 
 In the second kernel, "old memory" can be accessed in two ways.
@@ -22,59 +22,72 @@ In the second kernel, "old memory" can b
 - The first one is through a /dev/oldmem device interface. A capture utility
   can read the device file and write out the memory in raw format. This is raw
   dump of memory and analysis/capture tool should be intelligent enough to
-  determine where to look for the right information. Elf headers (elfcorehdr=)
+  determine where to look for the right information. ELF headers (elfcorehdr=)
   can become handy here.
 
 - The second interface is through /proc/vmcore. This exports the dump as an ELF
   format file which can be written out using any file copy command
   (cp, scp, etc). Further, gdb can be used to perform limited debugging on
   the dump file. This method ensures methods ensure that there is correct
-  ordering of the dump pages (corresponding to the first 640k that has been
+  ordering of the dump pages (corresponding to the first 640 KB that has been
   relocated).
 
 SETUP
 =====
 
-1) Obtain the appropriate -mm tree patch and apply it on to the vanilla
-   kernel tree.
+1) Download and build the appropriate version of kexec-tools.
 
-2) Obtain appropriate version of kexec-tools.
+2) Download and build the appropriate (latest) kexec/kdump (-mm) kernel
+   patchset and apply it to the vanilla kernel tree.
 
-3) Two kernels need to be built in order to get this feature working.
+   Two kernels need to be built in order to get this feature working.
 
-   First kernel:
-   a) Enable "kexec system call" feature.
-   b) Enable "sysfs file system support" (Pseudo filesystems).
-   c) Boot into first kernel with command line "crashkernel=Y@X".  Put
-      appropriate values for X and Y. Y denotes, how much memory to reserve for
-      second kernel, and X denotes at what physical address reserved memory
-      section starts. For example, crashkernel=48M@16M.
-
-   Second kernel:
-   a) Enable "kernel crash dumps" feature.
-   b) Specifiy a suitable value for "Physical address where the kernel is
-      loaded". Typically this value should be same as X (See option c) above).
-   c) Enable "/proc/vmcore support" (Optional).
-
-      Note: Option a) and b) depend upon "Configure standard kernel feature
-            (for small systems)".
-	    Option a) also depends on CONFIG_HIGHMEM.
-	    Both option a) and b) are under "Processor Types and Features"
+  A) First kernel:
+   a) Enable "kexec system call" feature (in Processor type and features).
+	CONFIG_KEXEC=y
+   b) This kernel's physical load address should be the default value of
+      0x100000 (0x100000, 1 MB) (in Processor type and features).
+	CONFIG_PHYSICAL_START=0x100000
+   c) Enable "sysfs file system support" (in Pseudo filesystems).
+	CONFIG_SYSFS=y
+   d) Boot into first kernel with the command line parameter "crashkernel=Y@X".
+      Use appropriate values for X and Y. Y denotes how much memory to reserve
+      for the second kernel, and X denotes at what physical address the reserved
+      memory section starts. For example: "crashkernel=64M@16M".
+
+  B) Second kernel:
+   a) Enable "kernel crash dumps" feature (in Processor type and features).
+	CONFIG_CRASH_DUMP=y
+   b) Specify a suitable value for "Physical address where the kernel is
+      loaded" (in Processor type and features). Typically this value
+      should be same as X (See option b) above, e.g., 16 MB or 0x1000000.
+	CONFIG_PHYSICAL_START=0x1000000
+   c) Enable "/proc/vmcore support" (Optional, in Pseudo filesystems).
+	CONFIG_PROC_VMCORE=y
+
+  Note: Options a) and b) depend upon "Configure standard kernel features
+	(for small systems)" (under General setup).
+	Option a) also depends on CONFIG_HIGHMEM (under Processor
+		type and features).
+	Both option a) and b) are under "Processor type and features".
 
-3) Boot into the first kernel. You are now ready to try out kexec based crash
+3) Boot into the first kernel. You are now ready to try out kexec-based crash
    dumps.
 
-4) Load the second kernel to be booted using
+4) Load the second kernel to be booted using:
 
    kexec -p <second-kernel> --crash-dump --args-linux --append="root=<root-dev>
    maxcpus=1 init 1"
 
    Note: i) <second-kernel> has to be a vmlinux image. bzImage will not work,
 	    as of now.
-	ii) By default elf headers are stored in ELF32 format(for i386). This is
-	    sufficient to represent the physical memory up to 4GB. To store
-	    headers in ELF64 format, specifiy "--elf64-core-headers" on kexec
-	    command line additionally.
+	ii) By default ELF headers are stored in ELF32 format (for i386). This
+	    is sufficient to represent the physical memory up to 4GB. To store
+	    headers in ELF64 format, specifiy "--elf64-core-headers" on the
+	    kexec command line additionally.
+       iii) For now (or until it is fixed), it's best to build the
+	    second-kernel without multi-processor support, i.e., make it
+	    a uniprocessor kernel.
 
 5) System reboots into the second kernel when a panic occurs. A module can be
    written to force the panic, for testing purposes.
@@ -83,14 +96,16 @@ SETUP
 
    cp /proc/vmcore <dump-file>
 
-   Dump can also be accessed as a /dev/oldmem device for a linear/raw view.
-   To create the device, type
+   Dump memory can also be accessed as a /dev/oldmem device for a linear/raw
+   view.  To create the device, type:
 
    mknod /dev/oldmem c 1 12
 
    Use "dd" with suitable options for count, bs and skip to access specific
    portions of the dump.
 
+   Entire memory:  dd if=/dev/oldmem of=oldmem.001
+
 ANALYSIS
 ========
 
@@ -102,7 +117,7 @@ Limited analysis can be done using gdb o
 Stack trace for the task on processor 0, register display, memory display
 work fine.
 
-Note: gdb can not analyse core files generated in ELF64 format for i386.
+Note: gdb cannot analyse core files generated in ELF64 format for i386.
 
 TODO
 ====


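The dd access described in the patch maps directly onto byte offsets. With bs=B, skip=N starts reading at offset N*B, and count=M reads M*B bytes. A minimal C sketch of the same access pattern follows. It assumes, as the "linear/raw view" wording suggests, that a file offset into /dev/oldmem corresponds to a physical address in the crashed kernel's memory. read_dump_range is a hypothetical helper, not part of kexec-tools.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy len bytes starting at byte offset off of a raw dump device or file
 * (e.g. /dev/oldmem) into buf. Returns the number of bytes read, which is
 * less than len only if end of file is reached, or -1 on error.
 */
ssize_t read_dump_range(const char *path, off_t off, void *buf, size_t len)
{
    size_t done = 0;
    int fd;

    assert(buf != NULL || len == 0);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    while (done < len) {
        /* pread() leaves the file position alone, like dd's skip= seek. */
        ssize_t n = pread(fd, (char *)buf + done, len - done,
                          off + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            close(fd);
            return -1;
        }
        if (n == 0)             /* end of file */
            break;
        done += (size_t)n;
    }
    close(fd);
    return (ssize_t)done;
}
```

Under the offset-equals-physical-address assumption, "dd if=/dev/oldmem bs=4096 skip=256 count=1" and read_dump_range("/dev/oldmem", 256 * 4096, buf, 4096) would both read the page at 1 MB.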

* Re: [PATCH] Kdump docs.
  2005-04-29  3:08             ` [PATCH] Kdump docs Randy.Dunlap
@ 2005-04-29  5:07               ` Vivek Goyal
  2005-04-29 14:26                 ` [Fastboot] " Randy.Dunlap
  2005-04-30  3:04                 ` [PATCH] Kdump doc. fix option typo Randy.Dunlap
  0 siblings, 2 replies; 14+ messages in thread
From: Vivek Goyal @ 2005-04-29  5:07 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: vgoyal, akpm, sharyathi, fastboot, ebiederm, linux-kernel

Hi Randy,

> +  A) First kernel:
> +   a) Enable "kexec system call" feature (in Processor type and features).
> +	CONFIG_KEXEC=y
> +   b) This kernel's physical load address should be the default value of
> +      0x100000 (0x100000, 1 MB) (in Processor type and features).
> +	CONFIG_PHYSICAL_START=0x100000
> +   c) Enable "sysfs file system support" (in Pseudo filesystems).
> +	CONFIG_SYSFS=y
> +   d) Boot into first kernel with the command line parameter "crashkernel=Y@X".
> +      Use appropriate values for X and Y. Y denotes how much memory to reserve
> +      for the second kernel, and X denotes at what physical address the reserved
> +      memory section starts. For example: "crashkernel=64M@16M".
> +
> +  B) Second kernel:
> +   a) Enable "kernel crash dumps" feature (in Processor type and features).
> +	CONFIG_CRASH_DUMP=y
> +   b) Specify a suitable value for "Physical address where the kernel is
> +      loaded" (in Processor type and features). Typically this value
> +      should be same as X (See option b) above, e.g., 16 MB or 0x1000000.

Should above line be as follows.
"should be same as X (See option d) above."

This will make clear what is X and what should be the new value of 
CONFIG_PHYSICAL_START. 

Thanks for testing this out and providing clearer documentation.

Thanks
Vivek


* Re: [Fastboot] Re: [PATCH] Kdump docs.
  2005-04-29  5:07               ` Vivek Goyal
@ 2005-04-29 14:26                 ` Randy.Dunlap
  2005-04-30  3:04                 ` [PATCH] Kdump doc. fix option typo Randy.Dunlap
  1 sibling, 0 replies; 14+ messages in thread
From: Randy.Dunlap @ 2005-04-29 14:26 UTC (permalink / raw)
  To: vgoyal; +Cc: akpm, sharyathi, fastboot, linux-kernel, ebiederm

On Fri, 29 Apr 2005 10:37:29 +0530 Vivek Goyal wrote:

| Hi Randy,
| 
| > +  A) First kernel:
| > +   a) Enable "kexec system call" feature (in Processor type and features).
| > +	CONFIG_KEXEC=y
| > +   b) This kernel's physical load address should be the default value of
| > +      0x100000 (0x100000, 1 MB) (in Processor type and features).
| > +	CONFIG_PHYSICAL_START=0x100000
| > +   c) Enable "sysfs file system support" (in Pseudo filesystems).
| > +	CONFIG_SYSFS=y
| > +   d) Boot into first kernel with the command line parameter "crashkernel=Y@X".
| > +      Use appropriate values for X and Y. Y denotes how much memory to reserve
| > +      for the second kernel, and X denotes at what physical address the reserved
| > +      memory section starts. For example: "crashkernel=64M@16M".
| > +
| > +  B) Second kernel:
| > +   a) Enable "kernel crash dumps" feature (in Processor type and features).
| > +	CONFIG_CRASH_DUMP=y
| > +   b) Specify a suitable value for "Physical address where the kernel is
| > +      loaded" (in Processor type and features). Typically this value
| > +      should be same as X (See option b) above, e.g., 16 MB or 0x1000000.
| 
| Should above line be as follows.
| "should be same as X (See option d) above."

Yes, thanks for catching that.  Now, how should I update it?

| This will make clear what is X and what should be the new value of 
| CONFIG_PHYSICAL_START. 


---
~Randy

* [PATCH] Kdump doc. fix option typo.
  2005-04-29  5:07               ` Vivek Goyal
  2005-04-29 14:26                 ` [Fastboot] " Randy.Dunlap
@ 2005-04-30  3:04                 ` Randy.Dunlap
  1 sibling, 0 replies; 14+ messages in thread
From: Randy.Dunlap @ 2005-04-30  3:04 UTC (permalink / raw)
  To: vgoyal; +Cc: akpm, sharyathi, fastboot, linux-kernel, ebiederm

On Fri, 29 Apr 2005 10:37:29 +0530 Vivek Goyal wrote:

| Should above line be as follows.
| "should be same as X (See option d) above."
| 
| This will make clear what is X and what should be the new value of 
| CONFIG_PHYSICAL_START. 


From: Randy Dunlap <rddunlap@osdl.org>

Fix one-letter typo of option b->d.

Signed-off-by: Randy Dunlap <rddunlap@osdl.org>
---

 Documentation/kdump.txt |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff -Naurp ./Documentation/kdump.txt~kdump_doc_fix_optionb ./Documentation/kdump.txt
--- ./Documentation/kdump.txt~kdump_doc_fix_optionb	2005-04-28 19:55:03.000000000 -0700
+++ ./Documentation/kdump.txt	2005-04-29 19:59:32.000000000 -0700
@@ -60,7 +60,7 @@ SETUP
 	CONFIG_CRASH_DUMP=y
    b) Specify a suitable value for "Physical address where the kernel is
       loaded" (in Processor type and features). Typically this value
-      should be same as X (See option b) above, e.g., 16 MB or 0x1000000.
+      should be same as X (See option d) above, e.g., 16 MB or 0x1000000.
 	CONFIG_PHYSICAL_START=0x1000000
    c) Enable "/proc/vmcore support" (Optional, in Pseudo filesystems).
 	CONFIG_PROC_VMCORE=y

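Vivek's point is that the X in "crashkernel=Y@X" on the first kernel's command line and the second kernel's CONFIG_PHYSICAL_START should name the same address. The C sketch below checks that consistency. parse_crashkernel and physical_start_matches are hypothetical helpers (the kernel has its own parser for this option), and the K/M/G suffixes are assumed to be binary multiples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse a number with an optional K, M or G suffix (binary multiples). */
static bool parse_size(const char *s, char **end, unsigned long long *out)
{
    unsigned long long v = strtoull(s, end, 0);

    if (*end == s)              /* no digits at all */
        return false;
    switch (**end) {
    case 'G': case 'g':
        v <<= 10;
        /* fall through */
    case 'M': case 'm':
        v <<= 10;
        /* fall through */
    case 'K': case 'k':
        v <<= 10;
        (*end)++;               /* consume the suffix character */
        break;
    default:
        break;
    }
    *out = v;
    return true;
}

/* Parse "Y@X": Y is the reserved size, X the physical start address. */
bool parse_crashkernel(const char *spec, unsigned long long *size,
                       unsigned long long *start)
{
    char *end;

    assert(spec != NULL && size != NULL && start != NULL);
    if (!parse_size(spec, &end, size) || *end != '@')
        return false;
    if (!parse_size(end + 1, &end, start) || *end != '\0')
        return false;
    return true;
}

/* True if the second kernel's load address equals X from crashkernel=Y@X. */
bool physical_start_matches(const char *spec, unsigned long long physical_start)
{
    unsigned long long size, start;

    return parse_crashkernel(spec, &size, &start) && start == physical_start;
}
```

With the documentation's example "crashkernel=64M@16M", X is 0x1000000, which matches CONFIG_PHYSICAL_START=0x1000000 for the second kernel. The first kernel's default of 0x100000 would not match.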

---

* Re: [Fastboot] Re: Kdump Testing
  2005-04-22 10:46 Kdump Testing Nagesh Sharyathi
@ 2005-04-22 12:32 ` Eric W. Biederman
  0 siblings, 0 replies; 14+ messages in thread
From: Eric W. Biederman @ 2005-04-22 12:32 UTC (permalink / raw)
  To: Nagesh Sharyathi; +Cc: linux-kernel, fastboot, akpm

Nagesh Sharyathi <sharyathi@in.ibm.com> writes:

> Here is the console boot log, before the machine jumps to BIOS 
> after hang during panic kerenl boot

Ok, thanks.  So this is manually triggered with SysRq,
and the kexec part works, but the recovery kernel simply fails
to boot.

It looks like that hunk of the ACPI code that messes up maxcpus=1
needs to be looked at.

Eric

Thread overview: 14+ messages
2005-04-23  3:30 [Fastboot] Re: Kdump Testing Vivek Goyal
2005-04-25 12:15 ` Nagesh Sharyathi
2005-04-25 23:09   ` Randy.Dunlap
2005-04-26  8:54     ` Vivek Goyal
2005-04-27 16:46       ` Randy.Dunlap
2005-04-27 19:23       ` Randy.Dunlap
2005-04-28 11:44         ` Vivek Goyal
2005-04-28 16:11           ` Randy.Dunlap
2005-04-28 19:08             ` Eric W. Biederman
2005-04-29  3:08             ` [PATCH] Kdump docs Randy.Dunlap
2005-04-29  5:07               ` Vivek Goyal
2005-04-29 14:26                 ` [Fastboot] " Randy.Dunlap
2005-04-30  3:04                 ` [PATCH] Kdump doc. fix option typo Randy.Dunlap
  -- strict thread matches above, loose matches on Subject: below --
2005-04-22 10:46 Kdump Testing Nagesh Sharyathi
2005-04-22 12:32 ` [Fastboot] " Eric W. Biederman
