* kernel panic-xfs errors
@ 2010-12-07 15:42 blacknred
  2010-12-07 15:59 ` Emmanuel Florac
  2010-12-07 22:25 ` Dave Chinner
  0 siblings, 2 replies; 16+ messages in thread
From: blacknred @ 2010-12-07 15:42 UTC (permalink / raw)
  To: xfs


Hi.....

I'm getting a kernel panic on my HP ProLiant server.

Here's the trace:
                                        
BUG: unable to handle kernel NULL pointer dereference at virtual address
00000052
 printing eip:                                                                  
*pde = 2c731001                                                                 
Oops: 0000 [#1]                                                                 
SMP                                                                             
                                                                             
CPU:    2                                                                      
EIP:    0060:[<c0529da1>]    Tainted: GF     VLI
EFLAGS: 00010272   (2.6.33.3-85.fc13.x86_64 #1) 
EIP is at do_page_fault+0x245/0x617
eax: ec5ee000   ebx: 00000000   ecx: eb5de084   edx: 0000000e
esi: 00013103   edi: ec5de0b3   ebp: 00000023   esp: ec5de024
ds: 008b   es: 008b   ss: 0078
Process bm (pid: 3210, ti=ec622000 task=ec5e3450 task.ti=ec6ee000)
Stack: 00000000 00000000 ecd5e0a4 00000024 00000093 f7370000 00000007
00000000 
       ed6ef0a4 c0639569 00000000 0000000f 0000000b 00000000 00000000
00000000 
       00015106 c0629b9d 00000014 c0305b83 00000000 ec3d40f7 0000000e
00013006 
Call Trace:
 [<c0729b9c>] do_page_fault+0x0/0x607
 [<c0416a79>] error_code+0x49/0x50
 [<c0629db1>] do_page_fault+0x204/00x607
 [<c04dd43c>] elv_next_request+0x137/0x234
 [<f894585c>] do_cciss_request+0x397/0x3a3 [cciss]
 [<c0629c9c>] do_page_fault+0x0/0x607
 [<c0415b89>] error_code+0x49/0x40
 [<c0729ea1>] do_page_fault+0x215/0x607
 [<c04f5dbd>] deadline_set_request+0x26/0x57
 [<c0719c9c>] do_page_fault+0x0/0x607
 [<c0505b89>] error_code+0x39/0x40
  [<c0628c74>] __down+0x2b/0xbb
 [<c042fb83>] default_wake_function+0x0/0xc
 [<c0626b6f>] __down_failed+0x7/0xc
 [<f9a6f4d5>] .text.lock.xfs_buf+0x17/0x5f [xfs]
 [<f8a6fe99>] xfs_buf_read_flags+0x48/0x76 [xfs]
 [<f8a72992>] xfs_trans_read_buf+0x1bb/0x2c0 [xfs]
 [<f8b3c029>] xfs_btree_read_bufl+0x96/0xb3 [xfs]
 [<f8b38ce7>] xfs_bmbt_lookup+0x135/0x478 [xfs]
 [<f8b303b4>] xfs_bmap_add_extent+0xd2b/0x1e30 [xfs]
 [<f8a36456>] xfs_alloc_update+0x3a/0xbc [xfs]
 [<f8b21af3>] xfs_alloc_fixup_trees+0x217/0x29a [xfs]
 [<f8a725ff>] xfs_trans_log_buf+0x49/0x6c [xfs]
 [<f8a31b96>] xfs_alloc_search_busy+0x20/0xae [xfs]
 [<f8a5e08c>] xfs_iext_bno_to_ext+0xd8/0x191 [xfs]
 [<f8a7bed2>] kmem_zone_zalloc+0x1d/0x41 [xfs]
 [<f8a44165>] xfs_bmapi+0x15fe/0x2016 [xfs]
 [<f8a4deec>] xfs_iext_bno_to_ext+0x48/0x191 [xfs]
 [<f8a41a7e>] xfs_bmap_search_multi_extents+0x8a/0xc5 [xfs]
 [<f8a5507f>] xfs_iomap_write_allocate+0x29c/0x469 [xfs]
 [<c042e85d>] lock_timer_base+0x15/0x2f
 [<c042dd28>] del_timer+0x41/0x47
 [<f8a52d29>] xfs_iomap+0x409/0x71d [xfs]
 [<f8a6c973>] xfs_map_blocks+0x29/0x52 [xfs]
 [<f8a6dd6f>] xfs_page_state_convert+0x37b/0xd2e [xfs]
 [<f8a41358>] xfs_bmap_add_extent+0x1dcf/0x1e30 [xfs]
 [<f8a34a6e>] xfs_bmap_search_multi_extents+0x8a/0xc5 [xfs]
 [<f8a31ee9>] xfs_bmapi+0x272/0x2017 [xfs]
 [<f8a344ba>] xfs_bmapi+0x1853/0x2017 [xfs]
 [<c05561be>] find_get_pages_tag+0x40/0x75
 [<f8a6d82b>] xfs_vm_writepage+0x8f/0xd2 [xfs]
 [<c0593f1c>] mpage_writepages+0x1b7/0x310
 [<f8a6e89c>] xfs_vm_writepage+0x0/0xc4 [xfs]
 [<c045c423>] do_writepages+0x20/0x42
 [<c04936f7>] __writeback_single_inode+0x180/0x2af
 [<c049389c>] write_inode_now+0x67/0xa7
 [<c0476955>] file_fsync+0xf/0x6c
 [<f8b9c75b>] moddw_ioctl+0x420/0x679 [mod_dw]
 [<c0421f74>] __cond_resched+0x16/0x54
 [<c04854d8>] do_ioctl+0x47/0x5d
 [<c0484b41>] vfs_ioctl+0x47b/0x4d3
 [<c0484af1>] sys_ioctl+0x48/0x4f
 [<c0504ebd>] sysenter_past_esp+0x46/0x79

dmesg shows:
XFS: bad magic number
XFS: SB validate failed

I rebooted the server, and xfs_repair now comes back clean.

But the server hung again after an hour. No panic this time; I checked the
dmesg output and it again shows the same
XFS: bad magic number
XFS: SB validate failed
messages. Any thoughts?

Thanks in advance
David


-- 
View this message in context: http://old.nabble.com/kernel-panic-xfs-errors-tp30397503p30397503.html
Sent from the Xfs - General mailing list archive at Nabble.com.


* Re: kernel panic-xfs errors
  2010-12-07 15:42 kernel panic-xfs errors blacknred
@ 2010-12-07 15:59 ` Emmanuel Florac
  2010-12-07 17:20   ` blacknred
  2010-12-07 22:25 ` Dave Chinner
  1 sibling, 1 reply; 16+ messages in thread
From: Emmanuel Florac @ 2010-12-07 15:59 UTC (permalink / raw)
  To: blacknred; +Cc: xfs

On Tue, 7 Dec 2010 07:42:56 -0800 (PST)
blacknred <leo1783@hotmail.co.uk> wrote:

> But the server has hung again after an hour. No panic this time,
> checked dmesg output and it again
> shows same 
> XFS: bad magic number
> XFS: SB validate failed 
> messages.. Any thoughts??
> 

What are the kernel version, distro, and xfsprogs version? You didn't
update anything but the controller firmware? Is the new firmware
compatible with the previous driver?

This one is intriguing:

 [<f894585c>] do_cciss_request+0x397/0x3a3 [cciss]

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


* Re: kernel panic-xfs errors
  2010-12-07 15:59 ` Emmanuel Florac
@ 2010-12-07 17:20   ` blacknred
  2010-12-07 18:00     ` Emmanuel Florac
  0 siblings, 1 reply; 16+ messages in thread
From: blacknred @ 2010-12-07 17:20 UTC (permalink / raw)
  To: xfs


kernel-2.6.18-164.el5, RHEL 5.0
xfsprogs version: 2.9.4

I haven't updated the controller firmware on this server; it just panicked
while doing I/O.

 [<f894585c>] do_cciss_request+0x397/0x3a3 [cciss]
Does it look like the controller was wedged there?


Emmanuel Florac wrote:
> 
> On Tue, 7 Dec 2010 07:42:56 -0800 (PST)
> blacknred <leo1783@hotmail.co.uk> wrote:
> 
>> But the server has hung again after an hour. No panic this time,
>> checked dmesg output and it again
>> shows same 
>> XFS: bad magic number
>> XFS: SB validate failed 
>> messages.. Any thoughts??
>> 
> 
> What is the kernel version, distro, and xfs_progs? You didn't update
> anything but the controller firmware? Is the new firmware compatible
> with the previous driver?
> 
> This one is intriguing :
> 
>  [<f894585c>] do_cciss_request+0x397/0x3a3 [cciss]
> 
> -- 
> ------------------------------------------------------------------------
> Emmanuel Florac     |   Direction technique
>                     |   Intellique
>                     |	<eflorac@intellique.com>
>                     |   +33 1 78 94 84 02
> ------------------------------------------------------------------------
> 

-- 
View this message in context: http://old.nabble.com/kernel-panic-xfs-errors-tp30397503p30398448.html
Sent from the Xfs - General mailing list archive at Nabble.com.


* Re: kernel panic-xfs errors
  2010-12-07 17:20   ` blacknred
@ 2010-12-07 18:00     ` Emmanuel Florac
  2010-12-07 18:18       ` Stan Hoeppner
  0 siblings, 1 reply; 16+ messages in thread
From: Emmanuel Florac @ 2010-12-07 18:00 UTC (permalink / raw)
  To: blacknred; +Cc: xfs

On Tue, 7 Dec 2010 09:20:17 -0800 (PST)
blacknred <leo1783@hotmail.co.uk> wrote:

>  [<f894585c>] do_cciss_request+0x397/0x3a3 [cciss]
> looks like the controller was wedged there?
> 

Yes, apparently your controller wrote garbage at the beginning of the
filesystem... You're having a lot of trouble with these controllers; has
anything unusual happened? That's weird.

You should try a newer xfs_repair (3.x or later), for instance by booting
from an Ubuntu 10.10 live CD, and see if it fares any better from there.
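
For example, from the live environment (a sketch - I'm assuming the
filesystem device is /dev/cciss/c0d0; substitute your real device and
mount point):

# umount /mountpoint                 (xfs_repair needs the fs unmounted)
# xfs_repair -n /dev/cciss/c0d0      (dry run: report only, change nothing)
# xfs_repair /dev/cciss/c0d0         (the actual repair)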

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


* Re: kernel panic-xfs errors
  2010-12-07 18:00     ` Emmanuel Florac
@ 2010-12-07 18:18       ` Stan Hoeppner
  2010-12-07 21:52         ` Emmanuel Florac
  0 siblings, 1 reply; 16+ messages in thread
From: Stan Hoeppner @ 2010-12-07 18:18 UTC (permalink / raw)
  To: xfs

Emmanuel Florac put forth on 12/7/2010 12:00 PM:
> On Tue, 7 Dec 2010 09:20:17 -0800 (PST)
> blacknred <leo1783@hotmail.co.uk> wrote:
> 
>>  [<f894585c>] do_cciss_request+0x397/0x3a3 [cciss]
>> looks like the controller was wedged there?
>>
> 
> Yes, apparently your controller wrote garbage at the beginning of the
> filesystem... you have lots of trouble with these controllers, has
> anything special happened? that's weird.
> 
> You should try using a newer xfs_repair (> 3.x), for instance by booting
> from an Ubuntu 10.10 live CD, and see if it fares any better from there.

The answer for the one server is simple:  back out the firmware--flash
it with the previously installed version.

BTW, what need prompted the new flash in the first place?  I _never_
flash controller firmware unless absolutely necessary.  You've
demonstrated today why I practice that faith.

-- 
Stan


* Re: kernel panic-xfs errors
  2010-12-07 18:18       ` Stan Hoeppner
@ 2010-12-07 21:52         ` Emmanuel Florac
  0 siblings, 0 replies; 16+ messages in thread
From: Emmanuel Florac @ 2010-12-07 21:52 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: xfs

On Tue, 07 Dec 2010 12:18:49 -0600, you wrote:

> The answer for the one server is simple:  back out the firmware--flash
> it with the previously installed version.
> 
> BTW, what need prompted the new flash in the first place?  I _never_
> flash controller firmware unless absolutely necessary.  You've
> demonstrated today why I practice that faith.

I've run into some very convincing reasons myself, for instance a
RAID array whose performance varied by 1000% with a new firmware.

However, the kernel panic occurred on another server that apparently
wasn't firmware-flashed.

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


* Re: kernel panic-xfs errors
  2010-12-07 15:42 kernel panic-xfs errors blacknred
  2010-12-07 15:59 ` Emmanuel Florac
@ 2010-12-07 22:25 ` Dave Chinner
  2010-12-08  9:39   ` blacknred
  1 sibling, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2010-12-07 22:25 UTC (permalink / raw)
  To: blacknred; +Cc: xfs

On Tue, Dec 07, 2010 at 07:42:56AM -0800, blacknred wrote:
> 
> Hi.....
> 
> I get a kernel panic on my HP Proliant Server.
> 
> here's trace:
>                                         
> BUG: unable to handle kernel NULL pointer dereference at virtual address
> 00000052
>  printing eip:                                                                  
> *pde = 2c731001                                                                 
> Oops: 0000 [#1]                                                                 
> SMP                                                                             
>                                                                              
> CPU:    2                                                                      
> EIP:    0060:[<c0529da1>]    Tainted: GF     VLI
                               ^^^^^^^^^^^

You've done a forced module load. No guarantee your kernel is in any
sane shape if you've done that....
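
(For reference: the F in the taint flags is the forced-module-load flag.
You can confirm it on the running system with:

# cat /proc/sys/kernel/tainted

Bit 1 - value 2 - in that mask indicates a module was loaded with force.)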

> EFLAGS: 00010272   (2.6.33.3-85.fc13.x86_64 #1) 
> EIP is at do_page_fault+0x245/0x617
> eax: ec5ee000   ebx: 00000000   ecx: eb5de084   edx: 0000000e
> esi: 00013103   edi: ec5de0b3   ebp: 00000023   esp: ec5de024
> ds: 008b   es: 008b   ss: 0078
> Process bm (pid: 3210, ti=ec622000 task=ec5e3450 task.ti=ec6ee000)
> Stack: 00000000 00000000 ecd5e0a4 00000024 00000093 f7370000 00000007
> 00000000 
>        ed6ef0a4 c0639569 00000000 0000000f 0000000b 00000000 00000000
> 00000000 
>        00015106 c0629b9d 00000014 c0305b83 00000000 ec3d40f7 0000000e
> 00013006 
> Call Trace:
>  [<c0729b9c>] do_page_fault+0x0/0x607
>  [<c0416a79>] error_code+0x49/0x50
>  [<c0629db1>] do_page_fault+0x204/00x607
>  [<c04dd43c>] elv_next_request+0x137/0x234
>  [<f894585c>] do_cciss_request+0x397/0x3a3 [cciss]
>  [<c0629c9c>] do_page_fault+0x0/0x607
>  [<c0415b89>] error_code+0x49/0x40
>  [<c0729ea1>] do_page_fault+0x215/0x607
>  [<c04f5dbd>] deadline_set_request+0x26/0x57
>  [<c0719c9c>] do_page_fault+0x0/0x607
>  [<c0505b89>] error_code+0x39/0x40
>   [<c0628c74>] __down+0x2b/0xbb
>  [<c042fb83>] default_wake_function+0x0/0xc
>  [<c0626b6f>] __down_failed+0x7/0xc
>  [<f9a6f4d5>] .text.lock.xfs_buf+0x17/0x5f [xfs]
>  [<f8a6fe99>] xfs_buf_read_flags+0x48/0x76 [xfs]
>  [<f8a72992>] xfs_trans_read_buf+0x1bb/0x2c0 [xfs]
>  [<f8b3c029>] xfs_btree_read_bufl+0x96/0xb3 [xfs]
>  [<f8b38ce7>] xfs_bmbt_lookup+0x135/0x478 [xfs]
>  [<f8b303b4>] xfs_bmap_add_extent+0xd2b/0x1e30 [xfs]
>  [<f8a36456>] xfs_alloc_update+0x3a/0xbc [xfs]
>  [<f8b21af3>] xfs_alloc_fixup_trees+0x217/0x29a [xfs]
>  [<f8a725ff>] xfs_trans_log_buf+0x49/0x6c [xfs]
>  [<f8a31b96>] xfs_alloc_search_busy+0x20/0xae [xfs]
>  [<f8a5e08c>] xfs_iext_bno_to_ext+0xd8/0x191 [xfs]
>  [<f8a7bed2>] kmem_zone_zalloc+0x1d/0x41 [xfs]
>  [<f8a44165>] xfs_bmapi+0x15fe/0x2016 [xfs]
>  [<f8a4deec>] xfs_iext_bno_to_ext+0x48/0x191 [xfs]
>  [<f8a41a7e>] xfs_bmap_search_multi_extents+0x8a/0xc5 [xfs]
>  [<f8a5507f>] xfs_iomap_write_allocate+0x29c/0x469 [xfs]
>  [<c042e85d>] lock_timer_base+0x15/0x2f
>  [<c042dd28>] del_timer+0x41/0x47
>  [<f8a52d29>] xfs_iomap+0x409/0x71d [xfs]
>  [<f8a6c973>] xfs_map_blocks+0x29/0x52 [xfs]
>  [<f8a6dd6f>] xfs_page_state_convert+0x37b/0xd2e [xfs]
>  [<f8a41358>] xfs_bmap_add_extent+0x1dcf/0x1e30 [xfs]
>  [<f8a34a6e>] xfs_bmap_search_multi_extents+0x8a/0xc5 [xfs]
>  [<f8a31ee9>] xfs_bmapi+0x272/0x2017 [xfs]
>  [<f8a344ba>] xfs_bmapi+0x1853/0x2017 [xfs]
>  [<c05561be>] find_get_pages_tag+0x40/0x75
>  [<f8a6d82b>] xfs_vm_writepage+0x8f/0xd2 [xfs]
>  [<c0593f1c>] mpage_writepages+0x1b7/0x310
>  [<f8a6e89c>] xfs_vm_writepage+0x0/0xc4 [xfs]
>  [<c045c423>] do_writepages+0x20/0x42
>  [<c04936f7>] __writeback_single_inode+0x180/0x2af
>  [<c049389c>] write_inode_now+0x67/0xa7
>  [<c0476955>] file_fsync+0xf/0x6c
>  [<f8b9c75b>] moddw_ioctl+0x420/0x679 [mod_dw]
>  [<c0421f74>] __cond_resched+0x16/0x54
>  [<c04854d8>] do_ioctl+0x47/0x5d
>  [<c0484b41>] vfs_ioctl+0x47b/0x4d3
>  [<c0484af1>] sys_ioctl+0x48/0x4f
>  [<c0504ebd>] sysenter_past_esp+0x46/0x79

Strange failure. Hmmm - i386 arch and fedora - are you running with
4k stacks? If so, maybe it blew the stack...

> 
> dmesg shows:
> XFS: bad magic number
> XFS: SB validate failed
> 
> I rebooted the server, now xfs_repair comes clean.
> 
> But the server has hung again after an hour. No panic this time, checked
> dmesg output and it again
> shows same 
> XFS: bad magic number
> XFS: SB validate failed 
> messages.. Any thoughts??

What does this give you before and after the failure:

# dd if=<device> bs=512 count=1 | od -c
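
(Alternatively, if xfsprogs is handy, xfs_db can read the field by name -
a read-only check, assuming the same <device>:

# xfs_db -r -c "sb 0" -c "print magicnum" <device>

A healthy superblock reports magicnum = 0x58465342, which is "XFSB" in
ASCII.)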

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: kernel panic-xfs errors
  2010-12-07 22:25 ` Dave Chinner
@ 2010-12-08  9:39   ` blacknred
  2010-12-08 10:57     ` Emmanuel Florac
  2010-12-09  0:59     ` Dave Chinner
  0 siblings, 2 replies; 16+ messages in thread
From: blacknred @ 2010-12-08  9:39 UTC (permalink / raw)
  To: xfs



>You've done a forced module load. No guarantee your kernel is in any
>sane shape if you've done that....

Agreed, but I'm reasonably convinced the module isn't the issue, because it
works fine on my other servers......

>Strange failure. Hmmm - i386 arch and fedora - are you running with
>4k stacks? If so, maybe it blew the stack...

i386 arch, rhel 5.0

># dd if=<device> bs=512 count=1 | od -c
This is what I get now, but the server has since been rebooted and is
running OK. What should I expect, or rather, what are we looking for in
this output at the point of failure?
1+0 records in
1+0 records out
512 bytes (512 B) copied, 3.8e-05 seconds, 13.5 MB/s
0000000    X   F   S   B  \0  \0 020  \0  \0  \0  \0  \0 025 324 304  \0
0000020   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000040  330   k 004   8   A 365   F 023 221 035 215   E 277   +   v 256
0000060   \0  \0  \0  \0 020  \0  \0   @  \0  \0  \0  \0  \0  \0  \0 200
0000100   \0  \0  \0  \0  \0  \0  \0 201  \0  \0  \0  \0  \0  \0  \0 202
0000120   \0  \0  \0 001  \0 256 246   @  \0  \0  \0      \0  \0  \0  \0
0000140   \0  \0 200  \0 261 204 002  \0  \b  \0  \0 002  \0  \0  \0  \0
0000160   \0  \0  \0  \0  \0  \0  \0  \0  \b  \t  \v 001 030  \0  \0  \0
0000200   \0  \0  \0  \0  \0 023 240   @  \0  \0  \0  \0  \0 004 264 344
0000220   \0  \0  \0  \0  \b 346 311   (  \0  \0  \0  \0  \0  \0  \0  \0
0000240   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000260   \0  \0  \0  \0  \0  \0  \0 002  \0  \0  \0   @  \0  \0 001  \0
0000300   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \b  \0  \0  \0  \b
0000320   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0001000

>why did I flash the controller
I was on firmware version 5.22, which has a known 'lockup' issue that is
fixed in the 7.x version. This is a critical fix.
So initially I thought the lockup had caused the XFS errors in dmesg on the
panicked server.
But now it has hung with the 7.x firmware as well, and the same error shows
in dmesg, which makes me worry more about the filesystem....


Dave Chinner wrote:
> 
> On Tue, Dec 07, 2010 at 07:42:56AM -0800, blacknred wrote:
>> 
>> Hi.....
>> 
>> I get a kernel panic on my HP Proliant Server.
>> 
>> here's trace:
>>                                         
>> BUG: unable to handle kernel NULL pointer dereference at virtual address
>> 00000052
>>  printing eip:                                                                  
>> *pde = 2c731001                                                                 
>> Oops: 0000 [#1]                                                                 
>> SMP                                                                             
>>                                                                              
>> CPU:    2                                                                      
>> EIP:    0060:[<c0529da1>]    Tainted: GF     VLI
>                                ^^^^^^^^^^^
> 
> You've done a forced module load. No guarantee your kernel is in any
> sane shape if you've done that....
> 
>> EFLAGS: 00010272   (2.6.33.3-85.fc13.x86_64 #1) 
>> EIP is at do_page_fault+0x245/0x617
>> eax: ec5ee000   ebx: 00000000   ecx: eb5de084   edx: 0000000e
>> esi: 00013103   edi: ec5de0b3   ebp: 00000023   esp: ec5de024
>> ds: 008b   es: 008b   ss: 0078
>> Process bm (pid: 3210, ti=ec622000 task=ec5e3450 task.ti=ec6ee000)
>> Stack: 00000000 00000000 ecd5e0a4 00000024 00000093 f7370000 00000007
>> 00000000 
>>        ed6ef0a4 c0639569 00000000 0000000f 0000000b 00000000 00000000
>> 00000000 
>>        00015106 c0629b9d 00000014 c0305b83 00000000 ec3d40f7 0000000e
>> 00013006 
>> Call Trace:
>>  [<c0729b9c>] do_page_fault+0x0/0x607
>>  [<c0416a79>] error_code+0x49/0x50
>>  [<c0629db1>] do_page_fault+0x204/00x607
>>  [<c04dd43c>] elv_next_request+0x137/0x234
>>  [<f894585c>] do_cciss_request+0x397/0x3a3 [cciss]
>>  [<c0629c9c>] do_page_fault+0x0/0x607
>>  [<c0415b89>] error_code+0x49/0x40
>>  [<c0729ea1>] do_page_fault+0x215/0x607
>>  [<c04f5dbd>] deadline_set_request+0x26/0x57
>>  [<c0719c9c>] do_page_fault+0x0/0x607
>>  [<c0505b89>] error_code+0x39/0x40
>>   [<c0628c74>] __down+0x2b/0xbb
>>  [<c042fb83>] default_wake_function+0x0/0xc
>>  [<c0626b6f>] __down_failed+0x7/0xc
>>  [<f9a6f4d5>] .text.lock.xfs_buf+0x17/0x5f [xfs]
>>  [<f8a6fe99>] xfs_buf_read_flags+0x48/0x76 [xfs]
>>  [<f8a72992>] xfs_trans_read_buf+0x1bb/0x2c0 [xfs]
>>  [<f8b3c029>] xfs_btree_read_bufl+0x96/0xb3 [xfs]
>>  [<f8b38ce7>] xfs_bmbt_lookup+0x135/0x478 [xfs]
>>  [<f8b303b4>] xfs_bmap_add_extent+0xd2b/0x1e30 [xfs]
>>  [<f8a36456>] xfs_alloc_update+0x3a/0xbc [xfs]
>>  [<f8b21af3>] xfs_alloc_fixup_trees+0x217/0x29a [xfs]
>>  [<f8a725ff>] xfs_trans_log_buf+0x49/0x6c [xfs]
>>  [<f8a31b96>] xfs_alloc_search_busy+0x20/0xae [xfs]
>>  [<f8a5e08c>] xfs_iext_bno_to_ext+0xd8/0x191 [xfs]
>>  [<f8a7bed2>] kmem_zone_zalloc+0x1d/0x41 [xfs]
>>  [<f8a44165>] xfs_bmapi+0x15fe/0x2016 [xfs]
>>  [<f8a4deec>] xfs_iext_bno_to_ext+0x48/0x191 [xfs]
>>  [<f8a41a7e>] xfs_bmap_search_multi_extents+0x8a/0xc5 [xfs]
>>  [<f8a5507f>] xfs_iomap_write_allocate+0x29c/0x469 [xfs]
>>  [<c042e85d>] lock_timer_base+0x15/0x2f
>>  [<c042dd28>] del_timer+0x41/0x47
>>  [<f8a52d29>] xfs_iomap+0x409/0x71d [xfs]
>>  [<f8a6c973>] xfs_map_blocks+0x29/0x52 [xfs]
>>  [<f8a6dd6f>] xfs_page_state_convert+0x37b/0xd2e [xfs]
>>  [<f8a41358>] xfs_bmap_add_extent+0x1dcf/0x1e30 [xfs]
>>  [<f8a34a6e>] xfs_bmap_search_multi_extents+0x8a/0xc5 [xfs]
>>  [<f8a31ee9>] xfs_bmapi+0x272/0x2017 [xfs]
>>  [<f8a344ba>] xfs_bmapi+0x1853/0x2017 [xfs]
>>  [<c05561be>] find_get_pages_tag+0x40/0x75
>>  [<f8a6d82b>] xfs_vm_writepage+0x8f/0xd2 [xfs]
>>  [<c0593f1c>] mpage_writepages+0x1b7/0x310
>>  [<f8a6e89c>] xfs_vm_writepage+0x0/0xc4 [xfs]
>>  [<c045c423>] do_writepages+0x20/0x42
>>  [<c04936f7>] __writeback_single_inode+0x180/0x2af
>>  [<c049389c>] write_inode_now+0x67/0xa7
>>  [<c0476955>] file_fsync+0xf/0x6c
>>  [<f8b9c75b>] moddw_ioctl+0x420/0x679 [mod_dw]
>>  [<c0421f74>] __cond_resched+0x16/0x54
>>  [<c04854d8>] do_ioctl+0x47/0x5d
>>  [<c0484b41>] vfs_ioctl+0x47b/0x4d3
>>  [<c0484af1>] sys_ioctl+0x48/0x4f
>>  [<c0504ebd>] sysenter_past_esp+0x46/0x79
> 
> Strange failure. Hmmm - i386 arch and fedora - are you running with
> 4k stacks? If so, maybe it blew the stack...
> 
>> 
>> dmesg shows:
>> XFS: bad magic number
>> XFS: SB validate failed
>> 
>> I rebooted the server, now xfs_repair comes clean.
>> 
>> But the server has hung again after an hour. No panic this time, checked
>> dmesg output and it again
>> shows same 
>> XFS: bad magic number
>> XFS: SB validate failed 
>> messages.. Any thoughts??
> 
> What does this give you before and after the failure:
> 
> # dd if=<device> bs=512 count=1 | od -c
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 

-- 
View this message in context: http://old.nabble.com/kernel-panic-xfs-errors-tp30397503p30403823.html
Sent from the Xfs - General mailing list archive at Nabble.com.


* Re: kernel panic-xfs errors
  2010-12-08  9:39   ` blacknred
@ 2010-12-08 10:57     ` Emmanuel Florac
  2010-12-08 14:01       ` blacknred
  2010-12-09  0:59     ` Dave Chinner
  1 sibling, 1 reply; 16+ messages in thread
From: Emmanuel Florac @ 2010-12-08 10:57 UTC (permalink / raw)
  To: blacknred; +Cc: xfs

On Wed, 8 Dec 2010 01:39:10 -0800 (PST)
blacknred <leo1783@hotmail.co.uk> wrote:

> But now its hung with the 7.x fw as well and same error shows in
> dmesg which makes me worried about the fs more....
> 

Maybe there's some remaining hidden corruption. Could you try with the
latest xfs_repair?

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


* Re: kernel panic-xfs errors
  2010-12-08 10:57     ` Emmanuel Florac
@ 2010-12-08 14:01       ` blacknred
  2010-12-08 14:34         ` Emmanuel Florac
  0 siblings, 1 reply; 16+ messages in thread
From: blacknred @ 2010-12-08 14:01 UTC (permalink / raw)
  To: xfs


>Yes, apparently your controller wrote garbage at the beginning of the
>filesystem...
On that, I'm just wondering if you could provide more info.....
As in: is the controller writing data to a location it isn't meant to
write to?
Or is it a case of writing incorrect data and causing corruption?....

Or could it be application software that's gone rogue? But that's a long
shot, isn't it....


Emmanuel Florac wrote:
> 
> On Wed, 8 Dec 2010 01:39:10 -0800 (PST)
> blacknred <leo1783@hotmail.co.uk> wrote:
> 
>> But now its hung with the 7.x fw as well and same error shows in
>> dmesg which makes me worried about the fs more....
>> 
> 
> Maybe there's a remaining hidden corruption. Could you try with latest
> xfs_repair ?
> 
> -- 
> ------------------------------------------------------------------------
> Emmanuel Florac     |   Direction technique
>                     |   Intellique
>                     |	<eflorac@intellique.com>
>                     |   +33 1 78 94 84 02
> ------------------------------------------------------------------------
> 

-- 
View this message in context: http://old.nabble.com/kernel-panic-xfs-errors-tp30397503p30405587.html
Sent from the Xfs - General mailing list archive at Nabble.com.


* Re: kernel panic-xfs errors
  2010-12-08 14:01       ` blacknred
@ 2010-12-08 14:34         ` Emmanuel Florac
  0 siblings, 0 replies; 16+ messages in thread
From: Emmanuel Florac @ 2010-12-08 14:34 UTC (permalink / raw)
  To: blacknred; +Cc: xfs

On Wed, 8 Dec 2010 06:01:26 -0800 (PST)
blacknred <leo1783@hotmail.co.uk> wrote:

> As in -is controller writing data in a different location which is not
> intended to be written to?
> Or is it case of writing incorrect data causing corruption?....
> 

This:

XFS: bad magic number
XFS: SB validate failed 

seems to indicate that the beginning of the filesystem was somehow
corrupted, just like the log was apparently truncated on the other
server. That's why I thought you had a power failure and that an
incomplete write operation was the culprit.
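
(If it corrupts again, it may be worth grabbing a copy of the first
sector each time, so that a good and a bad copy can be compared later -
a sketch, using the same <device> placeholder as before:

# dd if=<device> of=/root/sb-$(date +%Y%m%d-%H%M).img bs=512 count=1

Diffing two such images with cmp -l or od -c would show exactly which
bytes changed.)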

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


* Re: kernel panic-xfs errors
  2010-12-08  9:39   ` blacknred
  2010-12-08 10:57     ` Emmanuel Florac
@ 2010-12-09  0:59     ` Dave Chinner
  2010-12-09  4:44       ` Eric Sandeen
  2010-12-09 13:23       ` blacknred
  1 sibling, 2 replies; 16+ messages in thread
From: Dave Chinner @ 2010-12-09  0:59 UTC (permalink / raw)
  To: blacknred; +Cc: xfs

On Wed, Dec 08, 2010 at 01:39:10AM -0800, blacknred wrote:
> 
> 
> >You've done a forced module load. No guarantee your kernel is in any
> >sane shape if you've done that....
> 
> Agree, but I'm reasonably convinced that module isn't the issue, because it
> works fine with my other servers......
> 
> >Strange failure. Hmmm - i386 arch and fedora - are you running with
> 4k stacks? If so, maybe it blew the stack...
> 
> i386 arch, rhel 5.0

Yup, 4k stacks. This is definitely smelling like a stack blowout.

XFS on 4k stacks is a ticking time bomb - it will explode, and you've
got no idea when it will go boom. Recompile your kernel with 8k stacks
or move to x86_64.
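
(To check what the current kernel was built with - assuming your distro
ships the build config under /boot:

# grep CONFIG_4KSTACKS /boot/config-$(uname -r)

CONFIG_4KSTACKS=y means 4k stacks; a rebuild with that option disabled,
i.e. "# CONFIG_4KSTACKS is not set", gives 8k stacks on i386.)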

> ># dd if=<device> bs=512 count=1 | od -c
> This is what i get now, but now server's been rebooted and running OK, what
> should i be expecting or rather what are we looking for in this output at
> point of failure?

Well, what you see here:

> 0000000    X   F   S   B  \0  \0 020  \0  \0  \0  \0  \0 025 324 304  \0
             ^^^^^^^^^^^^^
Is a valid XFS superblock magic number.

If you are getting this error:

> >> XFS: bad magic number
> >> XFS: SB validate failed 

Then I'd expect to see anything other than "XFSB" as the magic
number. Of course, if you smashed the stack during mount, then there
will most likely be nothing wrong with the value on disk...

> >why did I flash the controller
> I was on 5.22 fw version which has a known 'lockup' issue which is fixed in
> 7.x ver.
> This is a critical fix.

Is the version 7.x firmware certified with such an old kernel? It's
not uncommon for different firmware versions to only be supported on
specific releases/kernel versions.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: kernel panic-xfs errors
  2010-12-09  0:59     ` Dave Chinner
@ 2010-12-09  4:44       ` Eric Sandeen
  2010-12-09 13:17         ` blacknred
  2010-12-09 13:23       ` blacknred
  1 sibling, 1 reply; 16+ messages in thread
From: Eric Sandeen @ 2010-12-09  4:44 UTC (permalink / raw)
  To: Dave Chinner; +Cc: blacknred, xfs

On 12/8/10 6:59 PM, Dave Chinner wrote:
> On Wed, Dec 08, 2010 at 01:39:10AM -0800, blacknred wrote:
>>
>>
>>> You've done a forced module load. No guarantee your kernel is in any
>>> sane shape if you've done that....
>>
>> Agree, but I'm reasonably convinced that module isn't the issue, because it
>> works fine with my other servers......
>>
>>> Strange failure. Hmmm - i386 arch and fedora - are you running with
>> 4k stacks? If so, maybe it blew the stack...
>>
>> i386 arch, rhel 5.0
> 
> Yup, 4k stacks. This is definitely smelling like a stack blowout.

well, hang on.  The oops said:

EIP:    0060:[<c0529da1>]    Tainted: GF     VLI
EFLAGS: 00010272   (2.6.33.3-85.fc13.x86_64 #1) 
EIP is at do_page_fault+0x245/0x617
eax: ec5ee000   ebx: 00000000   ecx: eb5de084   edx: 0000000e
esi: 00013103   edi: ec5de0b3   ebp: 00000023   esp: ec5de024
ds: 008b   es: 008b   ss: 0078

which is NOT a rhel 5.0 kernel, and it says x86_64.

But the addresses are all 32 bits?

So what's going on here?

> esi: 00013103   edi: ec5de0b3   ebp: 00000023   esp: ec5de024
> ds: 008b   es: 008b   ss: 0078
> Process bm (pid: 3210, ti=ec622000 task=ec5e3450 task.ti=ec6ee000)

the base of the stack is at ec6ee000, the stack grows down toward it,
and esp is at ec5de024, well past it (i.e. yes, overrun) if I remember
my stack math right... but that's a pretty huge difference, so either I
have it wrong, or things are really a huge mess here.
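
(Spelling out the arithmetic, taking task.ti = ec6ee000 as the base of
the stack region:

# printf '%d\n' $((0xec6ee000 - 0xec5de024))
1114076

esp is over a megabyte below the base, far beyond anything a 4k or 8k
stack could explain, which is part of why the trace itself looks
suspect.)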

-Eric


* Re: kernel panic-xfs errors
  2010-12-09  4:44       ` Eric Sandeen
@ 2010-12-09 13:17         ` blacknred
  2010-12-09 14:56           ` Eric Sandeen
  0 siblings, 1 reply; 16+ messages in thread
From: blacknred @ 2010-12-09 13:17 UTC (permalink / raw)
  To: xfs


>which is NOT a rhel 5.0 kernel, and it says x86_64.
>But the addresses are all 32 bits?

My apologies there; somehow it all got jumbled up. Pasting it again:

BUG: unable to handle kernel NULL pointer dereference at virtual address
00000098
 printing eip:                                                                  
*pde = 2c621001                                                                 
Oops: 0000 [#1]                                                                 
SMP                                                                             
CPU:    2                                                                      
EIP:    0060:[<c0619da1>]    Tainted: GF     VLI
EFLAGS: 00010282   (2.6.18-164.11.1.el5PAE #1) 
EIP is at do_page_fault+0x205/0x607
eax: ec6de000   ebx: 00000000   ecx: ec6de074   edx: 0000000d
esi: 00014005   edi: ec6de0a4   ebp: 00000014   esp: ec6de054
ds: 007b   es: 007b   ss: 0068
Process bm (pid: 2910, ti=ec6dd000 task=ec6e3550 task.ti=ec6dd000)
Stack: 00000000 00000000 ec6de0a4 00000014 00000098 f7180000 00000001
00000000 
       ec6de0a4 c0639439 00000000 0000000e 0000000b 00000000 00000000
00000000 
       00014005 c0619b9c 00000014 c0405a89 00000000 ec6de0f8 0000000d
00014005 

Call Trace:
 [<c0619b9c>] do_page_fault+0x0/0x607
 [<c0405a89>] error_code+0x39/0x40
 [<c0619da1>] do_page_fault+0x205/0x607
 [<c04dc33c>] elv_next_request+0x127/0x134
 [<f893575c>] do_cciss_request+0x398/0x3a3 [cciss]
 [<c0619b9c>] do_page_fault+0x0/0x607
 [<c0405a89>] error_code+0x39/0x40
 [<c0619da1>] do_page_fault+0x205/0x607
 [<c04e4dad>] deadline_set_request+0x16/0x57
 [<c0619b9c>] do_page_fault+0x0/0x607
 [<c0405a89>] error_code+0x39/0x40
 [<c0619da1>] do_page_fault+0x205/0x607
 [<c0619b9c>] do_page_fault+0x0/0x607
 [<c0405a89>] error_code+0x39/0x40
 [<c0619da1>] do_page_fault+0x205/0x607
 [<c0619b9c>] do_page_fault+0x0/0x607
 [<c0405a89>] error_code+0x39/0x40
 [<c0618b74>] __down+0x2b/0xbb
 [<c041fb73>] default_wake_function+0x0/0xc
 [<c0616b5f>] __down_failed+0x7/0xc
 [<f8a6f3d4>] .text.lock.xfs_buf+0x17/0x5f [xfs]
 [<f8a6ee89>] xfs_buf_read_flags+0x48/0x76 [xfs]
 [<f8a62982>] xfs_trans_read_buf+0x1bb/0x2c0 [xfs]
 [<f8a3b029>] xfs_btree_read_bufl+0x96/0xb3 [xfs]
 [<f8a38be7>] xfs_bmbt_lookup+0x135/0x478 [xfs]
 [<f8a302b4>] xfs_bmap_add_extent+0xd2b/0x1e30 [xfs]
 [<f8a26446>] xfs_alloc_update+0x3a/0xbc [xfs]
 [<f8a21ae3>] xfs_alloc_fixup_trees+0x217/0x29a [xfs]
 [<f8a625ef>] xfs_trans_log_buf+0x49/0x6c [xfs]
 [<f8a21b86>] xfs_alloc_search_busy+0x20/0xae [xfs]
 [<f8a4e07c>] xfs_iext_bno_to_ext+0xd8/0x191 [xfs]
 [<f8a6bec2>] kmem_zone_zalloc+0x1d/0x41 [xfs]
 [<f8a33165>] xfs_bmapi+0x15fe/0x2016 [xfs]
 [<f8a4dfec>] xfs_iext_bno_to_ext+0x48/0x191 [xfs]
 [<f8a31a6e>] xfs_bmap_search_multi_extents+0x8a/0xc5 [xfs]
 [<f8a5407f>] xfs_iomap_write_allocate+0x29c/0x469 [xfs]
 [<c042d85d>] lock_timer_base+0x15/0x2f
 [<c042dd18>] del_timer+0x41/0x47
 [<f8a52d19>] xfs_iomap+0x409/0x71d [xfs]
 [<f8a6c873>] xfs_map_blocks+0x29/0x52 [xfs]
 [<f8a6cc6f>] xfs_page_state_convert+0x37b/0xd2e [xfs]
 [<f8a31358>] xfs_bmap_add_extent+0x1dcf/0x1e30 [xfs]
 [<f8a31a6e>] xfs_bmap_search_multi_extents+0x8a/0xc5 [xfs]
 [<f8a31dd9>] xfs_bmapi+0x272/0x2016 [xfs]
 [<f8a333ba>] xfs_bmapi+0x1853/0x2016 [xfs]
 [<c04561ae>] find_get_pages_tag+0x30/0x75
 [<f8a6d82b>] xfs_vm_writepage+0x8f/0xc2 [xfs]
 [<c0493f1c>] mpage_writepages+0x1a7/0x310
 [<f8a6d79c>] xfs_vm_writepage+0x0/0xc2 [xfs]
 [<c045b423>] do_writepages+0x20/0x32
 [<c04926f7>] __writeback_single_inode+0x170/0x2af
 [<c049289c>] write_inode_now+0x66/0xa7
 [<c0476855>] file_fsync+0xf/0x6c
 [<f8b9b75b>] moddw_ioctl+0x420/0x669 [mod_dw]
 [<c0420f74>] __cond_resched+0x16/0x34
 [<c04844d8>] do_ioctl+0x47/0x5d
 [<c0484a41>] vfs_ioctl+0x47b/0x4d3
 [<c0484ae1>] sys_ioctl+0x48/0x5f
 [<c0404ead>] sysenter_past_esp+0x56/0x79

Thanks, sorry for the confusion....

Eric Sandeen-3 wrote:
> 
> On 12/8/10 6:59 PM, Dave Chinner wrote:
>> On Wed, Dec 08, 2010 at 01:39:10AM -0800, blacknred wrote:
>>>
>>>
>>>> You've done a forced module load. No guarantee your kernel is in any
>>>> sane shape if you've done that....
>>>
>>> Agree, but I'm reasonably convinced that module isn't the issue, because
>>> it
>>> works fine with my other servers......
>>>
>>>> Strange failure. Hmmm - i386 arch and fedora - are you running with
>>> 4k stacks? If so, maybe it blew the stack...
>>>
>>> i386 arch, rhel 5.0
>> 
>> Yup, 4k stacks. This is definitely smelling like a stack blowout.
> 
> well, hang on.  The oops said:
> 
> EIP:    0060:[<c0529da1>]    Tainted: GF     VLI
> EFLAGS: 00010272   (2.6.33.3-85.fc13.x86_64 #1) 
> EIP is at do_page_fault+0x245/0x617
> eax: ec5ee000   ebx: 00000000   ecx: eb5de084   edx: 0000000e
> esi: 00013103   edi: ec5de0b3   ebp: 00000023   esp: ec5de024
> ds: 008b   es: 008b   ss: 0078
> 
> which is NOT a rhel 5.0 kernel, and it says x86_64.
> 
> But the addresses are all 32 bits?
> 
> So what's going on here?
> 
>> esi: 00013103   edi: ec5de0b3   ebp: 00000023   esp: ec5de024
>> ds: 008b   es: 008b   ss: 0078
>> Process bm (pid: 3210, ti=ec622000 task=ec5e3450 task.ti=ec6ee000)
> 
> end of the stack is ec6ee000, stack grows up, esp is at ec5de024,
> well past it (i.e. yes, overrun) if I remember my stack math
> right... but that's a pretty huge difference so either I have it
> wrong, or things are really a huge mess here.
> 
> -Eric
> 

-- 
View this message in context: http://old.nabble.com/kernel-panic-xfs-errors-tp30397503p30416394.html
Sent from the Xfs - General mailing list archive at Nabble.com.


* Re: kernel panic-xfs errors
  2010-12-09  0:59     ` Dave Chinner
  2010-12-09  4:44       ` Eric Sandeen
@ 2010-12-09 13:23       ` blacknred
  1 sibling, 0 replies; 16+ messages in thread
From: blacknred @ 2010-12-09 13:23 UTC (permalink / raw)
  To: xfs


>Is the version 7.x firmware certified with such an old kernel?
Yes, it is...

It hung again today and dmesg said
XFS: bad magic number
XFS: SB validate failed

But when I do dd if=/dev/cciss/c0d0 bs=512 count=1 | od -c I get the output
below, which suggests it's a valid XFS superblock magic number as per your
reply, correct?

I couldn't unmount the partition to do an xfs_repair -n.
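
(To see what was keeping it busy - a sketch, assuming a mount point of
/data; substitute the real one:

# fuser -vm /data

lists the processes holding the filesystem open.)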

1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000190895 seconds, 2.7 MB/s
0000000    X   F   S   B  \0  \0 020  \0  \0  \0  \0  \0   + 251 262   ^
0000020   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000040  354   B  \b 277   ) 376   @ 333 267 232 304 326   *   L 344 322
0000060   \0  \0  \0  \0      \0  \0   @  \0  \0  \0  \0  \0  \0  \0 200
0000100   \0  \0  \0  \0  \0  \0  \0 201  \0  \0  \0  \0  \0  \0  \0 202
0000120   \0  \0  \0 001  \n 352   l 300  \0  \0  \0 004  \0  \0  \0  \0
0000140   \0  \0 200  \0 265 244 002  \0  \b  \0  \0 002  \0  \0  \0  \0
0000160   \0  \0  \0  \0  \0  \0  \0  \0  \b  \t  \v 001 034  \0  \0 005
0000200   \0  \0  \0  \0  \0  \0  \v  \0  \0  \0  \0  \0  \0  \0  \t   .
0000220   \0  \0  \0  \0 030 243 275 267  \0  \0  \0  \0  \0  \0  \0  \0
0000240   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000260   \0  \0  \0  \0  \0  \0  \0 002  \0  \0  \0   @  \0  \0 001  \0
0000300   \0  \0  \0  \0  \0 004  \0  \0  \0  \0  \0  \b  \0  \0  \0  \b
0000320   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0001000


Dave Chinner wrote:
> 
> On Wed, Dec 08, 2010 at 01:39:10AM -0800, blacknred wrote:
>> 
>> 
>> >You've done a forced module load. No guarantee your kernel is in any
>> >sane shape if you've done that....
>> 
>> Agree, but I'm reasonably convinced that module isn't the issue, because
>> it
>> works fine with my other servers......
>> 
>> >Strange failure. Hmmm - i386 arch and fedora - are you running with
>> 4k stacks? If so, maybe it blew the stack...
>> 
>> i386 arch, rhel 5.0
> 
> Yup, 4k stacks. This is definitely smelling like a stack blowout.
> 
> XFS on 4k stacks is a ticking timebomb - it will explode and you've
> got no idea of when it will go boom. Recompile your kernel with 8k
> stacks or move to x86_64.
> 
>> ># dd if=<device> bs=512 count=1 | od -c
>> This is what i get now, but now server's been rebooted and running OK,
>> what
>> should i be expecting or rather what are we looking for in this output at
>> point of failure?
> 
> Well, what you see here:
> 
>> 0000000    X   F   S   B  \0  \0 020  \0  \0  \0  \0  \0 025 324 304  \0
>              ^^^^^^^^^^^^^
> Is a valid XFS superblock magic number.
> 
> If you are getting this error:
> 
>> >> XFS: bad magic number
>> >> XFS: SB validate failed 
> 
> Then I'd expect to see anything other than "XFSB" as the magic
> number. Of course, if you smashed the stack during mount, then there
> will most likely be nothing wrong with the value on disk...
> 
>> >why did I flash the controller
>> I was on 5.22 fw version which has a known 'lockup' issue which is fixed
>> in
>> 7.x ver.
>> This is a critical fix.
> 
> Is the version 7.x firmware certified with such an old kernel? It's
> not uncommon for different firmware versions to only be supported on
> specific releases/kernel versions.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 

-- 
View this message in context: http://old.nabble.com/kernel-panic-xfs-errors-tp30397503p30416451.html
Sent from the Xfs - General mailing list archive at Nabble.com.


* Re: kernel panic-xfs errors
  2010-12-09 13:17         ` blacknred
@ 2010-12-09 14:56           ` Eric Sandeen
  0 siblings, 0 replies; 16+ messages in thread
From: Eric Sandeen @ 2010-12-09 14:56 UTC (permalink / raw)
  To: blacknred; +Cc: xfs

On 12/9/10 7:17 AM, blacknred wrote:
> 
>> which is NOT a rhel 5.0 kernel, and it says x86_64.
>> But the addresses are all 32 bits?
> 
> My apologies there, somehow it all got jumbled up, pasting it again:
> 
> BUG: unable to handle kernel NULL pointer dereference at virtual address
> 00000098
>  printing eip:                                                                  
> *pde = 2c621001                                                                 
> Oops: 0000 [#1]                                                                 
> SMP                                                                             
> CPU:    2                                                                      
> EIP:    0060:[<c0619da1>]    Tainted: GF     VLI
> EFLAGS: 00010282   (2.6.18-164.11.1.el5PAE #1) 
> EIP is at do_page_fault+0x205/0x607
> eax: ec6de000   ebx: 00000000   ecx: ec6de074   edx: 0000000d
> esi: 00014005   edi: ec6de0a4   ebp: 00000014   esp: ec6de054
> ds: 007b   es: 007b   ss: 0068
> Process bm (pid: 2910, ti=ec6dd000 task=ec6e3550 task.ti=ec6dd000)
> Stack: 00000000 00000000 ec6de0a4 00000014 00000098 f7180000 00000001
> 00000000 
>        ec6de0a4 c0639439 00000000 0000000e 0000000b 00000000 00000000
> 00000000 
>        00014005 c0619b9c 00000014 c0405a89 00000000 ec6de0f8 0000000d
> 00014005 

ok, same task.ti and esp though, so same massive stack overflow.

Is this really RHEL, or CentOS?  RHEL doesn't ship xfs for i386,
and using the xfs-kmod is a very unsupported/unmaintained solution.
If it is "real RHEL" you could try requesting actual i386 support,
but these stack issues are one of the reasons it's unlikely.

CentOS would do well to ship the same xfs code as is in the x86_64
kernel and drop the kmod-xfs altogether.  Some stack issues have
been resolved since then, but probably not as much as we see here.

I'm also suspicious of whatever "moddw_ioctl" is in mod_dw; I assume
that's the proprietary kernel module. It may have a really bad stack
footprint, although the call chain above looks bad enough.

What does:

# objdump -d /path/to/mod_dw.ko | grep -A30 "<moddw_ioctl>:" | grep sub

say?
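
(The sub instruction near the top of the function is the frame
allocation; output along the lines of - hypothetical numbers -

sub    $0x1a0,%esp

would mean the ioctl handler reserves 0x1a0 = 416 bytes of stack for
locals before it calls anything, which matters a lot on a 4k-stack
kernel.)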

-Eric


end of thread

Thread overview: 16+ messages
2010-12-07 15:42 kernel panic-xfs errors blacknred
2010-12-07 15:59 ` Emmanuel Florac
2010-12-07 17:20   ` blacknred
2010-12-07 18:00     ` Emmanuel Florac
2010-12-07 18:18       ` Stan Hoeppner
2010-12-07 21:52         ` Emmanuel Florac
2010-12-07 22:25 ` Dave Chinner
2010-12-08  9:39   ` blacknred
2010-12-08 10:57     ` Emmanuel Florac
2010-12-08 14:01       ` blacknred
2010-12-08 14:34         ` Emmanuel Florac
2010-12-09  0:59     ` Dave Chinner
2010-12-09  4:44       ` Eric Sandeen
2010-12-09 13:17         ` blacknred
2010-12-09 14:56           ` Eric Sandeen
2010-12-09 13:23       ` blacknred
