Subject: Re: [PATCH] kdump: Fix for boot problems on SMP
From: Badari Pulavarty
To: Hariprasad Nellitheertha
Cc: Akinobu Mita, Andrew Morton, Linux Kernel Mailing List, varap@us.ibm.com
Date: 24 Nov 2004 12:07:58 -0800
Message-Id: <1101326878.26063.18.camel@dyn318077bld.beaverton.ibm.com>
In-Reply-To: <41A37E5C.8050305@in.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hari,

I have a success case and a failure case to report.

1) Success first..

I was able to save /proc/vmcore when my machine panicked (not through
sysrq), and gdb showed the stack correctly :)

For some reason, gdb failed to show the stack correctly when I ran it
on /proc/vmcore directly, while I was still in the kexec kernel :(

# gdb ../l*9/vmlinux vmcore.3
...
Core was generated by `root=/dev/sda2 dump init 1 memmap=exactmap memmap=640k@0 memmap=32M@16M console='.
#0  crash_get_current_regs (regs=0xc050b000) at arch/i386/kernel/crash_dump.c:98
98      }
(gdb) bt
#0  crash_get_current_regs (regs=0xc050b000) at arch/i386/kernel/crash_dump.c:98
#1  0xc0139986 in __crash_machine_kexec () at kernel/crash.c:83
#2  0xc011b2aa in panic (fmt=0xc050b000 "") at include/linux/crash_dump.h:21
#3  0xc0104ed5 in die (str=0x0, regs=0x1, err=2) at arch/i386/kernel/traps.c:392
#4  0xc0113ad2 in do_page_fault (regs=0xd4937edc, error_code=2) at arch/i386/mm/fault.c:480
#5  0xc0104707 in error_code () at /tmp/ccK5IM1b.s:2135
#6  0xc017a55e in aio_put_req (req=0x0) at fs/aio.c:529
#7  0xc017ba0d in io_submit_one (ctx=0xd46fddc0, user_iocb=0xbfffecb0, iocb=0xf75af124) at fs/aio.c:1551
#8  0xc017baf1 in sys_io_submit (ctx_id=3226513408, nr=32, iocbpp=0xbfffec30) at fs/aio.c:1609
#9  0xc0103c63 in syscall_call () at /tmp/ccK5IM1b.s:1946
#10 0xc0407220 in default_exec_domain ()
(gdb) q

2) Failure case:

When I recreated the panic again, it tried to run kexec(), ran into an
exception in the kexec() code, and the machine hung.
Here is the console output:

Unable to handle kernel NULL pointer dereference at virtual address 00000020
 printing eip:
c128c044
*pde = 00000000
Oops: 0002 [#1]
SMP
Modules linked in:
CPU:    0
EIP:    0060:[]    Not tainted VLI
EFLAGS: 00010086   (2.6.10-rc2-mm2kexec)
EIP is at _spin_lock_irq+0x4/0x20    <<<<<<<<<**** my original panic
eax: 00000020   ebx: c2dd77e0   ecx: c2821bb0   edx: c2821b80
esi: 00000020   edi: 00000000   ebp: c1dd9f10   esp: c1dd9f10
ds: 007b   es: 007b   ss: 0068
Process aio_tio (pid: 8084, threadinfo=c1dd8000 task=c2110570)
Stack: c1dd9f2c c107a56e c1dd9f18 c1dd9f18 c2821ba0 c2dd77e0 c1dd9f70 c1dd9f54
       c107ba1d c2821b80 00000000 00000000 bfffecb0 c2821b80 c2821b80 00000000
       bfffec30 c1dd9fbc c107bb01 c1dd9f70 bfffecb0 00000040 bfffecb0 00000000
Call Trace:
 [] show_stack+0x7f/0xa0
 [] show_registers+0x15e/0x1c0
 [] die+0xf2/0x180
 [] do_page_fault+0x3b2/0x710
 [] error_code+0x2b/0x30
 [] aio_put_req+0x1e/0x90
 [] io_submit_one+0x20d/0x250
 [] sys_io_submit+0xa1/0x110
 [] syscall_call+0x7/0xb
Code: fe 0a 79 12 a9 00 02 00 00 74 01 fb f3 90 80 3a 00 7e f9 fa eb e9 5d c3 90 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 55 89 e5 fa fe 08 79 09 f3 90 80 38 00 7e f9 eb f2 5d c3 8d b6 00 00 00
 <0>Fatal exception: panic in 5 seconds
Kernel panic - not syncing: Fatal exception
 <0>kexec: opening parachute    <<<<<<<<<<*** trying to kexec ?
Unable to handle kernel paging request at virtual address c30a0000
 printing eip:
c1039956
*pde = 00000000
Oops: 0002 [#2]
SMP
Modules linked in:
CPU:    0
EIP:    0060:[]    Not tainted VLI
EFLAGS: 00010206   (2.6.10-rc2-mm2kexec)
EIP is at __crash_machine_kexec+0x66/0x110    <<<<<<** panic in kexec
eax: 00005400   ebx: c2003180   ecx: 000001e0   edx: 00000001
esi: c140b000   edi: c30a0000   ebp: c1dd9d98   esp: c1dd9d80
ds: 007b   es: 007b   ss: 0068
Process aio_tio (pid: 8084, threadinfo=c1dd8000 task=c2110570)
Stack: c140b000 c1dd9d94 c1dd9d98 c1dd8000 c1dd9edc c12a01d5 c1dd9db4 c101b2aa
       00000000 c140c380 c129e8dd c1dd9dc0 c1dd8000 c1dd9df8 c1004ed5 c129e8ce
       00000001 c1dd9dcc 00000001 c1dd9edc c12a01d5 00000002 000000ff 0000000b
Call Trace:
 [] show_stack+0x7f/0xa0
 [] show_registers+0x15e/0x1c0
 [] die+0xf2/0x180
 [] do_page_fault+0x3b2/0x710
 [] error_code+0x2b/0x30
 [] panic+0x5a/0x120
 [] die+0x165/0x180
 [] do_page_fault+0x3b2/0x710
 [] error_code+0x2b/0x30
 [] aio_put_req+0x1e/0x90
 [] io_submit_one+0x20d/0x250
 [] sys_io_submit+0xa1/0x110
 [] syscall_call+0x7/0xb
Code: 2a c1 be 01 00 00 00 89 35 a4 c7 40 c1 e8 03 22 fe ff 8b 0d a4 c7 40 c1 85 c9 75 6c bf 00 00 0a c3 be 00 b0 40 c1 b9 e0 01 00 00 a5 c7 04 24 80 07 0a c3 c7 44 24 04 80 b7 40 c1 c7 44 24 08
 <0>Fatal exception: panic in 5 seconds

Thanks,
Badari

On Tue, 2004-11-23 at 10:15, Hariprasad Nellitheertha wrote:
> Hi Badari,
>
> Badari Pulavarty wrote:
> > More info testing results...
> >
> > gdb is not showing the stack info properly, on my saved vmcore.
> > I thought vmlinux is not matching the vmcore, so I verified that
> > vmcore and vmlinux matchup. But still no luck...
>
> I will try to recreate this using the 'sysrq' method you described in
> the earlier mail. Will let you know my findings asap.
>
> Thanks very much for trying kdump!
>
> Regards, Hari
>