linux-kernel.vger.kernel.org archive mirror
* Kernel oops / XFS filesystem corruption
@ 2008-03-01 11:21 Thomas Müller
  2008-03-01 21:02 ` Eric Sandeen
  0 siblings, 1 reply; 7+ messages in thread
From: Thomas Müller @ 2008-03-01 11:21 UTC (permalink / raw)
  To: xfs; +Cc: linux-kernel, Thomas Müller

[-- Attachment #1: Type: text/plain, Size: 643 bytes --]

Hello :)

My system just crashed because of a power fluctuation and the root
filesystem was damaged.
The system booted up just fine, but when samba tried to start up
the kernel oops'd.

xfs_repair was apparently able to repair the damage, though I seem
to have lost some files.

I do realize that a lot of awful things can happen if you just cut
the power, but the kernel shouldn't oops on a mounted file
system, right?

Please CC me, as I'm not subscribed to the lists.


Regards
Thomas


$ rpm -q xfsprogs
xfsprogs-2.9.4-4.fc8

$ uname -a
Linux linux.local.loc 2.6.23.15-137.fc8 #1 SMP Sun Feb 10 17:48:34 EST 2008 i686 i686 i386 GNU/Linux

[-- Attachment #2: xfs_check --]
[-- Type: text/plain, Size: 2730 bytes --]

block 0/19018 expected type unknown got free2
agi unlinked bucket 6 is 103430 in ag 3 (inode=12686342)
agi unlinked bucket 14 is 91278 in ag 3 (inode=12674190)
agi unlinked bucket 23 is 106135 in ag 3 (inode=12689047)
agi unlinked bucket 31 is 53279 in ag 3 (inode=12636191)
agi unlinked bucket 35 is 106147 in ag 3 (inode=12689059)
agi unlinked bucket 36 is 60836 in ag 3 (inode=12643748)
agi unlinked bucket 39 is 60839 in ag 3 (inode=12643751)
agi unlinked bucket 41 is 378537 in ag 3 (inode=12961449)
agi unlinked bucket 50 is 91250 in ag 3 (inode=12674162)
agi unlinked bucket 20 is 38996 in ag 4 (inode=16816212)
agi unlinked bucket 57 is 95353 in ag 4 (inode=16872569)
agi unlinked bucket 4 is 199940 in ag 8 (inode=33754372)
agi unlinked bucket 8 is 56392 in ag 8 (inode=33610824)
agi unlinked bucket 21 is 177621 in ag 8 (inode=33732053)
agi unlinked bucket 22 is 56406 in ag 8 (inode=33610838)
agi unlinked bucket 23 is 56407 in ag 8 (inode=33610839)
agi unlinked bucket 27 is 54747 in ag 8 (inode=33609179)
agi unlinked bucket 32 is 67232 in ag 8 (inode=33621664)
agi unlinked bucket 37 is 54757 in ag 8 (inode=33609189)
agi unlinked bucket 39 is 67239 in ag 8 (inode=33621671)
agi unlinked bucket 40 is 67240 in ag 8 (inode=33621672)
agi unlinked bucket 47 is 56367 in ag 8 (inode=33610799)
agi unlinked bucket 0 is 34944 in ag 10 (inode=41977984)
agi unlinked bucket 20 is 42516 in ag 11 (inode=46179860)
agi unlinked bucket 15 is 463 in ag 13 (inode=54526415)
agi unlinked bucket 62 is 154430 in ag 13 (inode=54680382)
block 0/21136 type unknown not expected
allocated inode 12689047 has 0 link count
allocated inode 12689059 has 0 link count
allocated inode 12674162 has 0 link count
allocated inode 12674190 has 0 link count
allocated inode 12636191 has 0 link count
allocated inode 12961449 has 0 link count
allocated inode 12643748 has 0 link count
allocated inode 12643751 has 0 link count
allocated inode 12686342 has 0 link count
allocated inode 16816212 has 0 link count
allocated inode 16872569 has 0 link count
allocated inode 33754372 has 0 link count
allocated inode 33732053 has 0 link count
allocated inode 33621664 has 0 link count
allocated inode 33621671 has 0 link count
allocated inode 33621672 has 0 link count
allocated inode 33609179 has 0 link count
allocated inode 33609189 has 0 link count
allocated inode 33610799 has 0 link count
allocated inode 33610824 has 0 link count
allocated inode 33610838 has 0 link count
allocated inode 33610839 has 0 link count
allocated inode 41977984 has 0 link count
allocated inode 46179860 has 0 link count
allocated inode 54680382 has 0 link count
allocated inode 54526415 has 0 link count
sb_ifree 3257, counted 3259
sb_fdblocks 7248513, counted 7248904
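
The inode numbers in the xfs_check output above appear to combine the AG number and the AG-relative inode number, and the bucket index looks like the AG-relative inode number modulo 64. The following is a minimal sketch of that decoding, not code from xfsprogs. The 22-bit split (`AGINO_BITS`) is an assumption inferred from the numbers in this report rather than read from the superblock, and `split_ino` and `struct ino_parts` are hypothetical names.

```c
#include <stdint.h>

/* Assumption: on this filesystem the low 22 bits of an inode number hold
 * the AG-relative inode number. Inferred from the xfs_check output above;
 * the real width depends on the filesystem geometry. */
#define AGINO_BITS 22u

/* Number of unlinked-list hash buckets in each AGI header. */
#define AGI_UNLINKED_BUCKETS 64u

/* Hypothetical helper type: the pieces of one inode number. */
struct ino_parts {
    uint32_t agno;   /* allocation group number */
    uint32_t agino;  /* inode number relative to that AG */
    uint32_t bucket; /* AGI unlinked bucket the inode hashes to */
};

/* Split an absolute inode number into AG number, AG-relative inode
 * number, and unlinked bucket index. */
struct ino_parts split_ino(uint64_t ino)
{
    struct ino_parts p;

    p.agno = (uint32_t)(ino >> AGINO_BITS);
    p.agino = (uint32_t)(ino & ((UINT64_C(1) << AGINO_BITS) - 1));
    p.bucket = p.agino % AGI_UNLINKED_BUCKETS;
    return p;
}
```

Under these assumptions, `split_ino(12686342)` yields AG 3, AG-relative inode 103430, and bucket 6, which agrees with the first "agi unlinked bucket" line of the report.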

[-- Attachment #3: xfs_oops --]
[-- Type: text/plain, Size: 3163 bytes --]

Mar  1 10:32:03 linux kernel: BUG: unable to handle kernel NULL pointer dereference at virtual address 00000002
Mar  1 10:32:03 linux kernel: printing eip: f8a96141 *pde = 38ccb067 
Mar  1 10:32:03 linux kernel: Oops: 0000 [#1] SMP 
Mar  1 10:32:03 linux kernel: Modules linked in: asb100 hwmon_vid hwmon tun sch_sfq sch_htb pppoe pppox ppp_synctty ppp_async crc_ccitt ppp_generic slhc bridge xt_NOTRACK iptable_raw ipt_MASQUERADE iptable_nat nf_nat ipt_REJECT xt_mac ipt_LOG nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink iptable_filter xt_CLASSIFY xt_length ipt_owner xt_TCPMSS xt_comment xt_tcpudp iptable_mangle ip_tables x_tables ext2 mbcache dm_mirror dm_mod 8139too r8169 mii i2c_i801 iTCO_wdt iTCO_vendor_support i2c_core sg sr_mod cdrom ata_generic ata_piix libata sd_mod scsi_mod xfs ehci_hcd
Mar  1 10:32:03 linux kernel: CPU:    0
Mar  1 10:32:03 linux kernel: EIP:    0060:[<f8a96141>]    Not tainted VLI
Mar  1 10:32:03 linux kernel: EFLAGS: 00010292   (2.6.23.15-137.fc8 #1)
Mar  1 10:32:03 linux kernel: EIP is at xfs_attr_shortform_getvalue+0x15/0xdb [xfs]
Mar  1 10:32:03 linux kernel: eax: 00000000   ebx: f268cddc   ecx: f8ae4d9d   edx: 08d26645
Mar  1 10:32:03 linux kernel: esi: f04d1600   edi: 00000004   ebp: f8ae4d91   esp: f268cdbc
Mar  1 10:32:03 linux kernel: ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Mar  1 10:32:03 linux kernel: Process smbd (pid: 2036, ti=f268c000 task=f7207840 task.ti=f268c000)
Mar  1 10:32:03 linux kernel: Stack: 00000003 f37888d4 00000003 f04d1600 f04d1600 f268ce38 f8ae4d91 f8a93a97 
Mar  1 10:32:03 linux kernel: f8ae4d91 0000000c c1ba6000 00000130 00000402 275b19c4 00000000 00000000 
Mar  1 10:32:03 linux kernel: f04d1600 00000000 00000000 00000000 00000000 00000001 00000000 00000000 
Mar  1 10:32:03 linux kernel: Call Trace:
Mar  1 10:32:03 linux kernel: [<f8a93a97>] xfs_attr_fetch+0x9e/0xee [xfs]
Mar  1 10:32:03 linux kernel: [<f8a8d843>] xfs_acl_iaccess+0x59/0xc2 [xfs]
Mar  1 10:32:03 linux kernel: [<f8abe3c2>] xfs_iaccess+0x87/0x15c [xfs]
Mar  1 10:32:03 linux kernel: [<f8ad53ec>] xfs_access+0x26/0x3a [xfs]
Mar  1 10:32:03 linux kernel: [<f8ae08ae>] xfs_vn_permission+0x0/0x13 [xfs]
Mar  1 10:32:03 linux kernel: [<f8ae08bd>] xfs_vn_permission+0xf/0x13 [xfs]
Mar  1 10:32:03 linux kernel: [<c0487419>] permission+0x9e/0xdb
Mar  1 10:32:03 linux kernel: [<c04887d0>] may_open+0x5c/0x205
Mar  1 10:32:03 linux kernel: [<c048a8b4>] open_namei+0x27d/0x576
Mar  1 10:32:03 linux kernel: [<c047fdb7>] do_filp_open+0x2a/0x3e
Mar  1 10:32:03 linux kernel: [<c047fafe>] get_unused_fd_flags+0x52/0xc5
Mar  1 10:32:03 linux kernel: [<c047fe13>] do_sys_open+0x48/0xca
Mar  1 10:32:03 linux kernel: [<c047fece>] sys_open+0x1c/0x1e
Mar  1 10:32:03 linux kernel: [<c040518a>] syscall_call+0x7/0xb
Mar  1 10:32:03 linux kernel: =======================
Mar  1 10:32:03 linux kernel: Code: 00 00 c6 40 02 00 66 c7 00 00 04 8b 47 2c 5b 5e 5f e9 08 bc 03 00 55 57 56 53 89 c3 83 ec 0c 8b 40 20 8b 40 4c 8b 40 14 8d 78 04 <0f> b6 40 02 c7 44 24 08 00 00 00 00 89 44 24 04 e9 96 00 00 00 
Mar  1 10:32:03 linux kernel: EIP: [<f8a96141>] xfs_attr_shortform_getvalue+0x15/0xdb [xfs] SS:ESP 0068:f268cdbc


* Re: Kernel oops / XFS filesystem corruption
  2008-03-01 11:21 Kernel oops / XFS filesystem corruption Thomas Müller
@ 2008-03-01 21:02 ` Eric Sandeen
  2008-03-02  0:33   ` Thomas Müller
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Sandeen @ 2008-03-01 21:02 UTC (permalink / raw)
  To: Thomas Müller; +Cc: xfs, linux-kernel

Thomas Müller wrote:
> Hello :)
> 
> My system just crashed because of a power fluctuation and the root
> filesystem was damaged.
> The system booted up just fine, but when samba tried to start up
> the kernel oops'd.
> 
> xfs_repair was apparently able to repair the damage, though I seem
> to have lost some files.
> 
> I do realize that a lot of awful things can happen if you just cut
> the power, but the kernel shouldn't oops on a mounted file
> system, right?

right.

here's the disassembly of that function in your kernel FWIW:

0001012c <xfs_attr_shortform_getvalue>:
   1012c:       55                      push   %ebp
   1012d:       57                      push   %edi
   1012e:       56                      push   %esi
   1012f:       53                      push   %ebx
   10130:       89 c3                   mov    %eax,%ebx
   10132:       83 ec 0c                sub    $0xc,%esp
   10135:       8b 40 20                mov    0x20(%eax),%eax
   10138:       8b 40 4c                mov    0x4c(%eax),%eax
   1013b:       8b 40 14                mov    0x14(%eax),%eax
   1013e:       8d 78 04                lea    0x4(%eax),%edi
   10141:       0f b6 40 02             movzbl 0x2(%eax),%eax <--- boom.
   10145:       c7 44 24 08 00 00 00    movl   $0x0,0x8(%esp)
   1014c:       00
   1014d:       89 44 24 04             mov    %eax,0x4(%esp)
   10151:       e9 96 00 00 00          jmp    101ec <xfs_attr_shortform_getvalue+0xc0>
...

at this point eax is "sf" (0x0) and edi is "sfe" (0x04)

Mar  1 10:32:03 linux kernel: eax: 00000000   ebx: f268cddc   ecx: f8ae4d9d   edx: 08d26645
Mar  1 10:32:03 linux kernel: esi: f04d1600   edi: 00000004   ebp: f8ae4d91   esp: f268cdbc

first part of the function:

int
xfs_attr_shortform_getvalue(xfs_da_args_t *args)
{
        xfs_attr_shortform_t *sf;
        xfs_attr_sf_entry_t *sfe;
        int i;

        ASSERT(args->dp->i_d.di_aformat == XFS_IFINLINE);
        sf = (xfs_attr_shortform_t *)args->dp->i_afp->if_u1.if_data;
        sfe = &sf->list[0];
        for (i = 0; i < sf->hdr.count; <--- died here, sf is 0
                                sfe = XFS_ATTR_SF_NEXTENTRY(sfe), i++) {

we blew up on sf->hdr.count because sf is NULL (hdr.count is 0x2 into sf)
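
The offsets line up with the register dump: with sf at 0, the count field lands at address 0x2 (the faulting address) and &sf->list[0] lands at 0x4 (the value in edi). A small sketch of that arithmetic follows. The structs are simplified stand-ins for the shortform attribute layout, and the struct and function names are hypothetical; the exact kernel definitions may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the shortform attribute header (assumed layout). */
struct sf_hdr {
    uint16_t totsize; /* total bytes in the shortform list */
    uint8_t  count;   /* number of active entries */
};

/* Simplified stand-in for one shortform attribute entry (assumed layout). */
struct sf_entry {
    uint8_t namelen;
    uint8_t valuelen;
    uint8_t flags;
    uint8_t nameval[1];
};

struct shortform {
    struct sf_hdr   hdr;
    struct sf_entry list[1];
};

/* Address that a read of sf->hdr.count touches for a given sf pointer value. */
uintptr_t count_field_addr(uintptr_t sf)
{
    return sf + offsetof(struct shortform, hdr.count);
}

/* Address computed for &sf->list[0]; no memory is read to form it. */
uintptr_t first_entry_addr(uintptr_t sf)
{
    return sf + offsetof(struct shortform, list);
}
```

With sf equal to 0, `count_field_addr(0)` is 2 and `first_entry_addr(0)` is 4. That would explain why the lea at 1013e completed without trouble and the movzbl at 10141 faulted: only the second instruction actually dereferences the pointer.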

maybe the sgi guys can take it from there ;)  Did you also happen to
save the xfs_repair output?

-Eric



* Re: Kernel oops / XFS filesystem corruption
  2008-03-01 21:02 ` Eric Sandeen
@ 2008-03-02  0:33   ` Thomas Müller
  2008-03-02  1:34     ` Eric Sandeen
  0 siblings, 1 reply; 7+ messages in thread
From: Thomas Müller @ 2008-03-02  0:33 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 187 bytes --]

Eric Sandeen wrote:
> Did you also happen to save the xfs_repair output?
No, but I made a complete copy of the file system before
repairing it, so I can easily recreate it... :)


Thomas

[-- Attachment #2: xfs_repair --]
[-- Type: text/plain, Size: 5175 bytes --]

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
data fork in ino 128638 claims free block 19018
        - agno = 1
        - agno = 2
b5ac7b90: Badness in key lookup (length)
bp=(bno 11701280, len 32768 bytes) key=(bno 11701280, len 8192 bytes)
b5ac7b90: Badness in key lookup (length)
bp=(bno 11708896, len 32768 bytes) key=(bno 11708896, len 8192 bytes)
b5ac7b90: Badness in key lookup (length)
bp=(bno 11739296, len 32768 bytes) key=(bno 11739296, len 8192 bytes)
b5ac7b90: Badness in key lookup (length)
bp=(bno 11751440, len 32768 bytes) key=(bno 11751440, len 8192 bytes)
b5ac7b90: Badness in key lookup (length)
bp=(bno 11754176, len 32768 bytes) key=(bno 11754176, len 8192 bytes)
b5ac7b90: Badness in key lookup (length)
bp=(bno 12026592, len 32768 bytes) key=(bno 12026592, len 8192 bytes)
        - agno = 3
b50c6b90: Badness in key lookup (length)
bp=(bno 15569728, len 32768 bytes) key=(bno 15569728, len 8192 bytes)
b50c6b90: Badness in key lookup (length)
bp=(bno 15626080, len 32768 bytes) key=(bno 15626080, len 8192 bytes)
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
b41ffb90: Badness in key lookup (length)
bp=(bno 31116224, len 32768 bytes) key=(bno 31116224, len 8192 bytes)
b41ffb90: Badness in key lookup (length)
bp=(bno 31117856, len 32768 bytes) key=(bno 31117856, len 8192 bytes)
b41ffb90: Badness in key lookup (length)
bp=(bno 31128704, len 32768 bytes) key=(bno 31128704, len 8192 bytes)
b41ffb90: Badness in key lookup (length)
bp=(bno 31239104, len 32768 bytes) key=(bno 31239104, len 8192 bytes)
b41ffb90: Badness in key lookup (length)
bp=(bno 31261408, len 32768 bytes) key=(bno 31261408, len 8192 bytes)
        - agno = 8
local inode 33609156 attr too small (size = 0, min size = 4)
bad attribute fork in inode 33609156, clearing attr fork
clearing inode 33609156 attributes
cleared inode 33609156
        - agno = 9
b50c6b90: Badness in key lookup (length)
bp=(bno 38861808, len 32768 bytes) key=(bno 38861808, len 8192 bytes)
        - agno = 10
b41ffb90: Badness in key lookup (length)
bp=(bno 42752032, len 32768 bytes) key=(bno 42752032, len 8192 bytes)
        - agno = 11
        - agno = 12
b50c6b90: Badness in key lookup (length)
bp=(bno 50475360, len 32768 bytes) key=(bno 50475360, len 8192 bytes)
b50c6b90: Badness in key lookup (length)
bp=(bno 50629312, len 32768 bytes) key=(bno 50629312, len 8192 bytes)
        - agno = 13
        - agno = 14
        - agno = 15
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
bad bmap btree ptr 0xc3a0000100000000 in ino 33609156
bad data fork in inode 33609156
cleared inode 33609156
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
entry "locking.tdb" in directory inode 33585205 points to free inode 33609156
bad hash table for directory inode 33585205 (no data entry): rebuilding
rebuilding directory inode 33585205
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 12636191, moving to lost+found
disconnected inode 12643748, moving to lost+found
disconnected inode 12643751, moving to lost+found
disconnected inode 12674162, moving to lost+found
disconnected inode 12674190, moving to lost+found
disconnected inode 12686342, moving to lost+found
disconnected inode 12689047, moving to lost+found
disconnected inode 12689059, moving to lost+found
disconnected inode 12961449, moving to lost+found
disconnected inode 16816212, moving to lost+found
disconnected inode 16872569, moving to lost+found
disconnected inode 33609179, moving to lost+found
disconnected inode 33609189, moving to lost+found
disconnected inode 33610799, moving to lost+found
disconnected inode 33610824, moving to lost+found
disconnected inode 33610838, moving to lost+found
disconnected inode 33610839, moving to lost+found
disconnected inode 33621664, moving to lost+found
disconnected inode 33621671, moving to lost+found
disconnected inode 33621672, moving to lost+found
disconnected inode 33732053, moving to lost+found
disconnected inode 33754372, moving to lost+found
disconnected inode 41977984, moving to lost+found
disconnected inode 46179860, moving to lost+found
disconnected inode 54526415, moving to lost+found
disconnected inode 54680382, moving to lost+found
Phase 7 - verify and correct link counts...
done


* Re: Kernel oops / XFS filesystem corruption
  2008-03-02  0:33   ` Thomas Müller
@ 2008-03-02  1:34     ` Eric Sandeen
  2008-03-02 19:02       ` Thomas Müller
  2008-03-03  1:02       ` Mark Goodwin
  0 siblings, 2 replies; 7+ messages in thread
From: Eric Sandeen @ 2008-03-02  1:34 UTC (permalink / raw)
  To: Thomas Müller; +Cc: xfs, linux-kernel

Thomas Müller wrote:
> Eric Sandeen wrote:
>> Did you also happen to save the xfs_repair output?
> No, but I made a complete copy of the file system before
> repairing it, so I can easily recreate it... :)

oh, like a dd image?  great.  You can use xfs_metadump to make a more
transportable image... xfs folks might even be able to use that to
recreate the oops.

-Eric


* Re: Kernel oops / XFS filesystem corruption
  2008-03-02  1:34     ` Eric Sandeen
@ 2008-03-02 19:02       ` Thomas Müller
  2008-03-03  1:02         ` Barry Naujok
  2008-03-03  1:02       ` Mark Goodwin
  1 sibling, 1 reply; 7+ messages in thread
From: Thomas Müller @ 2008-03-02 19:02 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs, linux-kernel

Eric Sandeen wrote:
> oh, like a dd image?  great.
Yup :)

 > You can use xfs_metadump to make a more transportable image...
I will, if someone needs it.

As said, I have a complete file system image, so if anyone needs
more information/data, just tell me.


Thomas


* Re: Kernel oops / XFS filesystem corruption
  2008-03-02  1:34     ` Eric Sandeen
  2008-03-02 19:02       ` Thomas Müller
@ 2008-03-03  1:02       ` Mark Goodwin
  1 sibling, 0 replies; 7+ messages in thread
From: Mark Goodwin @ 2008-03-03  1:02 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Thomas Müller, xfs, linux-kernel



Eric Sandeen wrote:
> Thomas Müller wrote:
>> Eric Sandeen wrote:
>>> Did you also happen to save the xfs_repair output?
>> No, but I made a complete copy of the file system before
>> repairing it, so I can easily recreate it... :)
> 
> oh, like a dd image?  great.  You can use xfs_metadump to make a more
> transportable image... xfs folks might even be able to use that to
> recreate the oops.

YES PLEASE. See the xfs_metadump man page for instructions. It will
obfuscate filenames by default (but please only do so if you need to).

Please make it available for Barry, thanks.

Cheers
-- Mark


* Re: Kernel oops / XFS filesystem corruption
  2008-03-02 19:02       ` Thomas Müller
@ 2008-03-03  1:02         ` Barry Naujok
  0 siblings, 0 replies; 7+ messages in thread
From: Barry Naujok @ 2008-03-03  1:02 UTC (permalink / raw)
  To: Thomas Müller, Eric Sandeen; +Cc: xfs, linux-kernel

On Mon, 03 Mar 2008 06:02:28 +1100, Thomas Müller <thomas@mathtm.de> wrote:

> Eric Sandeen wrote:
>> oh, like a dd image?  great.
> Yup :)
>
>  > You can use xfs_metadump to make a more transportable image...
> I will, if someone needs it.
>
> As said, I have a complete file system image, so if anyone needs
> more information/data, just tell me.

I could use the metadump image for the badness in key lookups that
xfs_repair was reporting.

Thanks,
Barry.


