XFS crashed twice, once in 2.6.16.20, next in 2.6.17, reproducable

* XFS crashed twice, once in 2.6.16.20, next in 2.6.17, reproducable
@ 2006-06-19  7:44 Avuton Olrich
  2006-06-19 10:35 ` daniel+devel.linux.lkml
  2006-06-20  6:10 ` Nathan Scott
  0 siblings, 2 replies; 17+ messages in thread
From: Avuton Olrich @ 2006-06-19  7:44 UTC (permalink / raw)
  To: Linux Kernel Mailing List, linux-xfs

This didn't make it to the linux-xfs list (the list doesn't seem to be
working), so I'm sending it to both LKML and linux-xfs again.

Hello, when I try to recursively delete a directory (the same directory
both times) from my 500 GB hard drive, XFS reports an internal error and
shuts the filesystem down. It crashed first in 2.6.16.20, so I upgraded
to try to get rid of the issue. This one is from 2.6.17:

xfs_da_do_buf: bno 16777216
dir: inode 1507133580
Filesystem "sda1": XFS internal error xfs_da_do_buf(1) at line 2119 of
file /usr/src/linux-stable-cold/fs/xfs/xfs_da_btree.c.  Caller
0xb01d9b63
 <b01d9720> xfs_da_do_buf+0x40e/0x7c7  <b01d9b63> xfs_da_read_buf+0x30/0x35
 <b01e43d5> xfs_dir2_leafn_lookup_int+0x2f3/0x453  <b01d9b63>
xfs_da_read_buf+0x30/0x35
 <b01e2ba5> xfs_dir2_node_removename+0x288/0x47f  <b01e2ba5>
xfs_dir2_node_removename+0x288/0x47f
 <b01ddbd3> xfs_dir2_removename+0xce/0xd5  <b020ff5d> kmem_zone_alloc+0x4d/0x98
 <b020d0ef> xfs_remove+0x2ac/0x444  <b0215e7f> xfs_vn_unlink+0x17/0x3b
 <b020a32b> xfs_lookup+0x6e/0x78  <b011e734> __capable+0xc/0x1f
 <b0155827> generic_permission+0x93/0xcc  <b01558f8> permission+0x98/0xa4
 <b0155da0> may_delete+0x32/0xe9  <b0156243> vfs_unlink+0x6d/0xa3
 <b0157c7a> do_unlinkat+0x92/0x125  <b0159a0d> sys_getdents64+0x9c/0xa6
 <b0102b67> sysenter_past_esp+0x54/0x75
Filesystem "sda1": XFS internal error xfs_trans_cancel at line 1150 of
file /usr/src/linux-stable-cold/fs/xfs/xfs_trans.c.  Caller 0xb020d25e
 <b0204b48> xfs_trans_cancel+0x59/0xe5  <b020d25e> xfs_remove+0x41b/0x444
 <b020d25e> xfs_remove+0x41b/0x444  <b0215e7f> xfs_vn_unlink+0x17/0x3b
 <b020a32b> xfs_lookup+0x6e/0x78  <b011e734> __capable+0xc/0x1f
 <b0155827> generic_permission+0x93/0xcc  <b01558f8> permission+0x98/0xa4
 <b0155da0> may_delete+0x32/0xe9  <b0156243> vfs_unlink+0x6d/0xa3
 <b0157c7a> do_unlinkat+0x92/0x125  <b0159a0d> sys_getdents64+0x9c/0xa6
 <b0102b67> sysenter_past_esp+0x54/0x75
xfs_force_shutdown(sda1,0x8) called from line 1151 of file
/usr/src/linux-stable-cold/fs/xfs/xfs_trans.c.  Return address =
0xb0218b68
Filesystem "sda1": Corruption of in-memory data detected.  Shutting
down filesystem: sda1

While trying to xfs_repair I get the following:
fatal error -- can't read block 16777216 for directory inode 1507133580

badblocks has been run on this machine and it was successful.

I did find an old thread with this, but no solution:
http://oss.sgi.com/archives/xfs/2005-02/msg00067.html

config:
http://olricha.homelinux.net:8080/config.gz

Thanks for any help. If I can help at all please let me know.
-- 
avuton
--
 Anyone who quotes me in their sig is an idiot. -- Rusty Russell.


* Re: XFS crashed twice, once in 2.6.16.20, next in 2.6.17, reproducable
  2006-06-19  7:44 XFS crashed twice, once in 2.6.16.20, next in 2.6.17, reproducable Avuton Olrich
@ 2006-06-19 10:35 ` daniel+devel.linux.lkml
  2006-06-20  6:10 ` Nathan Scott
  1 sibling, 0 replies; 17+ messages in thread
From: daniel+devel.linux.lkml @ 2006-06-19 10:35 UTC (permalink / raw)
  To: Avuton Olrich; +Cc: Linux Kernel Mailing List, linux-xfs

"Avuton Olrich" <avuton@gmail.com> writes:

Same here. It was solved after a complete mkfs.xfs under 2.6.17-rc6.

It also happens if I boot 2.6.8, run mkfs.xfs, and then boot into 2.6.16:
the XFS filesystem gets shredded. Booting directly from 2.6.8 into
2.6.17-rc6 works. So I think there was a bug in 2.6.16 in the XFS
transition which got fixed somewhere in the 2.6.17-rc timeframe.

> Filesystem "sda1": Corruption of in-memory data detected.  Shutting
> down filesystem: sda1

Daniel


* Re: XFS crashed twice, once in 2.6.16.20, next in 2.6.17, reproducable
  2006-06-19  7:44 XFS crashed twice, once in 2.6.16.20, next in 2.6.17, reproducable Avuton Olrich
  2006-06-19 10:35 ` daniel+devel.linux.lkml
@ 2006-06-20  6:10 ` Nathan Scott
  2006-06-20  6:38   ` Avuton Olrich
  2006-06-20  6:40   ` Avuton Olrich
  1 sibling, 2 replies; 17+ messages in thread
From: Nathan Scott @ 2006-06-20  6:10 UTC (permalink / raw)
  To: Avuton Olrich; +Cc: linux-kernel, xfs

On Mon, Jun 19, 2006 at 12:44:58AM -0700, Avuton Olrich wrote:
> ..
> Hello, when trying to recursively delete a directory (same directory
> twice) from my 500gb hard drive I get a problem. It crashed first in
> 2.6.16.20, then I upgraded to try to get rid of the issue. This one is
> from 2.6.17:

How reproducible is it?  Is it reproducible even after xfs_repair?

If so, can you try Mandy's patch below, to see if it is addressing
the root cause of your problem?  If problems persist, a reproducible
test case would be wonderful, if one can be found..

cheers.

-- 
Nathan

Fix nused counter.  It's currently getting set to -1 rather than getting
decremented by 1.  Since nused never reaches 0, the "if (!free->hdr.nused)"
check in xfs_dir2_leafn_remove() fails every time and xfs_dir2_shrink_inode()
doesn't get called when it should.  This causes extra blocks to be left on
an empty directory and the directory is unable to be converted back to
inline extent mode.

Signed-off-by: Mandy Kirkconnell <alkirkco@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>

--- a/fs/xfs/xfs_dir2_node.c	2006-06-20 16:00:45.000000000 +1000
+++ b/fs/xfs/xfs_dir2_node.c	2006-06-20 16:00:45.000000000 +1000
@@ -972,7 +972,7 @@ xfs_dir2_leafn_remove(
 			/*
 			 * One less used entry in the free table.
 			 */
-			free->hdr.nused = cpu_to_be32(-1);
+			be32_add(&free->hdr.nused, -1);
 			xfs_dir2_free_log_header(tp, fbp);
 			/*
 			 * If this was the last entry in the table, we can


* Re: XFS crashed twice, once in 2.6.16.20, next in 2.6.17, reproducable
  2006-06-20  6:10 ` Nathan Scott
@ 2006-06-20  6:38   ` Avuton Olrich
  2006-06-20  6:43     ` Nathan Scott
  2006-06-20  6:40   ` Avuton Olrich
  1 sibling, 1 reply; 17+ messages in thread
From: Avuton Olrich @ 2006-06-20  6:38 UTC (permalink / raw)
  To: Nathan Scott; +Cc: linux-kernel, xfs

On 6/19/06, Nathan Scott <nathans@sgi.com> wrote:
> On Mon, Jun 19, 2006 at 12:44:58AM -0700, Avuton Olrich wrote:
> > ..
> > Hello, when trying to recursively delete a directory (same directory
> > twice) from my 500gb hard drive I get a problem. It crashed first in
> > 2.6.16.20, then I upgraded to try to get rid of the issue. This one is
> > from 2.6.17:
>
> How reproducible is it?  Is it reproducible even after xfs_repair?

Happens every time I try to remove that inode (directory). xfs_repair
ends with a fatal error:

Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ...
rebuilding directory inode 128

fatal error -- can't read block 16777216 for directory inode 1507133580

> If so, can you try Mandy's patch below, to see if it is addressing
> the root cause of your problem?  If problems persist, a reproducible
> test case would be wonderful, if one can be found..

I'm sorry, the patch doesn't change anything. It never makes it through
the xfs_repair due to the above error. If there's any information I
can get for you please let me know.

I'm not sure if it changes anything, but here's the message after the patch:
xfs_da_do_buf: bno 16777216
dir: inode 1507133580
Filesystem "sda1": XFS internal error xfs_da_do_buf(1) at line 2119 of
file /usr/src/linux-stable-cold/fs/xfs/xfs_da_btree.c.  Caller
0xb01d9b63
 <b01d9720> xfs_da_do_buf+0x40e/0x7c7  <b01d9b63> xfs_da_read_buf+0x30/0x35
 <b01e43d9> xfs_dir2_leafn_lookup_int+0x2f3/0x453  <b01d9b63>
xfs_da_read_buf+0x30/0x35
 <b01e2ba5> xfs_dir2_node_removename+0x288/0x483  <b01e2ba5>
xfs_dir2_node_removename+0x288/0x483
 <b01ddbd3> xfs_dir2_removename+0xce/0xd5  <b020ff61> kmem_zone_alloc+0x4d/0x98
 <b020d0f3> xfs_remove+0x2ac/0x444  <b0215e83> xfs_vn_unlink+0x17/0x3b
 <b016190c> mntput_no_expire+0x11/0x7e  <b01575f1> link_path_walk+0xaf/0xb9
 <b011e734> __capable+0xc/0x1f  <b0155827> generic_permission+0x93/0xcc
 <b01558f8> permission+0x98/0xa4  <b0155da0> may_delete+0x32/0xe9
 <b0156243> vfs_unlink+0x6d/0xa3  <b0157c7a> do_unlinkat+0x92/0x125
 <b0159a0d> sys_getdents64+0x9c/0xa6  <b0102b67> sysenter_past_esp+0x54/0x75
Filesystem "sda1": XFS internal error xfs_trans_cancel at line 1150 of
file /usr/src/linux-stable-cold/fs/xfs/xfs_trans.c.  Caller 0xb020d262
 <b0204b4c> xfs_trans_cancel+0x59/0xe5  <b020d262> xfs_remove+0x41b/0x444
 <b020d262> xfs_remove+0x41b/0x444  <b0215e83> xfs_vn_unlink+0x17/0x3b
 <b016190c> mntput_no_expire+0x11/0x7e  <b01575f1> link_path_walk+0xaf/0xb9
 <b011e734> __capable+0xc/0x1f  <b0155827> generic_permission+0x93/0xcc
 <b01558f8> permission+0x98/0xa4  <b0155da0> may_delete+0x32/0xe9
 <b0156243> vfs_unlink+0x6d/0xa3  <b0157c7a> do_unlinkat+0x92/0x125
 <b0159a0d> sys_getdents64+0x9c/0xa6  <b0102b67> sysenter_past_esp+0x54/0x75
xfs_force_shutdown(sda1,0x8) called from line 1151 of file
/usr/src/linux-stable-cold/fs/xfs/xfs_trans.c.  Return address =
0xb0218b6c
Filesystem "sda1": Corruption of in-memory data detected.  Shutting
down filesystem: sda1
Please umount the filesystem, and rectify the problem(s)
xfs_force_shutdown(sda1,0x1) called from line 338 of file
/usr/src/linux-stable-cold/fs/xfs/xfs_rw.c.  Return address =
0xb0218b6c


-- 
avuton
--
 Anyone who quotes me in their sig is an idiot. -- Rusty Russell.


* Re: XFS crashed twice, once in 2.6.16.20, next in 2.6.17, reproducable
  2006-06-20  6:10 ` Nathan Scott
  2006-06-20  6:38   ` Avuton Olrich
@ 2006-06-20  6:40   ` Avuton Olrich
  2006-06-20  8:57     ` Justin Piszcz
  1 sibling, 1 reply; 17+ messages in thread
From: Avuton Olrich @ 2006-06-20  6:40 UTC (permalink / raw)
  To: Nathan Scott; +Cc: linux-kernel, xfs

On 6/19/06, Nathan Scott <nathans@sgi.com> wrote:
> How reproducible is it?  Is it reproducible even after xfs_repair?
It happens every time I try to delete the directory.

Also, I forgot to mention that I ran xfs_check on it, and it gave me more
information than I had before:
missing free index for data block 0 in dir ino 1507133580
missing free index for data block 2 in dir ino 1507133580
missing free index for data block 3 in dir ino 1507133580
missing free index for data block 4 in dir ino 1507133580
missing free index for data block 5 in dir ino 1507133580
missing free index for data block 6 in dir ino 1507133580
missing free index for data block 7 in dir ino 1507133580
missing free index for data block 8 in dir ino 1507133580
missing free index for data block 9 in dir ino 1507133580

-- 
avuton
--
 Anyone who quotes me in their sig is an idiot. -- Rusty Russell.


* Re: XFS crashed twice, once in 2.6.16.20, next in 2.6.17, reproducable
  2006-06-20  6:38   ` Avuton Olrich
@ 2006-06-20  6:43     ` Nathan Scott
  2006-06-20  6:50       ` Avuton Olrich
  0 siblings, 1 reply; 17+ messages in thread
From: Nathan Scott @ 2006-06-20  6:43 UTC (permalink / raw)
  To: Avuton Olrich; +Cc: linux-kernel, xfs

On Mon, Jun 19, 2006 at 11:38:58PM -0700, Avuton Olrich wrote:
> On 6/19/06, Nathan Scott <nathans@sgi.com> wrote:
> > If so, can you try Mandy's patch below, to see if it is addressing
> > the root cause of your problem?  If problems persist, a reproducible
> > test case would be wonderful, if one can be found..
> 
> I'm sorry, the patch doesn't change anything. It never makes it though
> the xfs_repair due to the above error. If there's any information I
> can get for you please let me know.

Oh - that's a kernel patch, not a repair patch; I was more interested
in whether the initial corruption could be reproduced.  Which version
of xfs_repair are you running?  (xfs_repair -V)  xfsprogs-2.7.18 will
resolve your problem, I suspect.

cheers.

-- 
Nathan


* Re: XFS crashed twice, once in 2.6.16.20, next in 2.6.17, reproducable
  2006-06-20  6:43     ` Nathan Scott
@ 2006-06-20  6:50       ` Avuton Olrich
  2006-06-20  6:52         ` Nathan Scott
  0 siblings, 1 reply; 17+ messages in thread
From: Avuton Olrich @ 2006-06-20  6:50 UTC (permalink / raw)
  To: Nathan Scott; +Cc: linux-kernel, xfs

On 6/19/06, Nathan Scott <nathans@sgi.com> wrote:
> Oh - thats a kernel patch, not a repair patch, I was more interested
> in whether the initial corruption could be reproduced.  Which version
> of xfs_repair are you running?  (xfs_repair -V)  xfsprogs-2.7.18 will
> resolve your problem, I suspect.

OK, I'm running Gentoo's latest: 2.7.11, I can't find 2.7.18
_anywhere_ although 2.7.13 is in the pre directory on the ftp, is that
the one you're referring to?
-- 
avuton
--
 Anyone who quotes me in their sig is an idiot. -- Rusty Russell.


* Re: XFS crashed twice, once in 2.6.16.20, next in 2.6.17, reproducable
  2006-06-20  6:50       ` Avuton Olrich
@ 2006-06-20  6:52         ` Nathan Scott
  2006-06-20  8:20           ` Avuton Olrich
  0 siblings, 1 reply; 17+ messages in thread
From: Nathan Scott @ 2006-06-20  6:52 UTC (permalink / raw)
  To: Avuton Olrich; +Cc: linux-kernel, xfs

On Mon, Jun 19, 2006 at 11:50:37PM -0700, Avuton Olrich wrote:
> On 6/19/06, Nathan Scott <nathans@sgi.com> wrote:
> > Oh - thats a kernel patch, not a repair patch, I was more interested
> > in whether the initial corruption could be reproduced.  Which version
> > of xfs_repair are you running?  (xfs_repair -V)  xfsprogs-2.7.18 will
> > resolve your problem, I suspect.
> 
> OK, I'm running Gentoo's latest: 2.7.11, I can't find 2.7.18
> _anywhere_ although 2.7.13 is in the pre directory on the ftp, is that
> the one you're referring to?

No - it's in CVS (and has been for a long time); I'll go get the ftp area
updated, looks like that's been forgotten about again.

cheers.

-- 
Nathan


* Re: XFS crashed twice, once in 2.6.16.20, next in 2.6.17, reproducable
  2006-06-20  6:52         ` Nathan Scott
@ 2006-06-20  8:20           ` Avuton Olrich
  2006-06-20  8:39             ` Duncan Sands
                               ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Avuton Olrich @ 2006-06-20  8:20 UTC (permalink / raw)
  To: Nathan Scott; +Cc: linux-kernel, xfs

On 6/19/06, Nathan Scott <nathans@sgi.com> wrote:
> On Mon, Jun 19, 2006 at 11:50:37PM -0700, Avuton Olrich wrote:
> > On 6/19/06, Nathan Scott <nathans@sgi.com> wrote:
> > > Oh - thats a kernel patch, not a repair patch, I was more interested
> > > in whether the initial corruption could be reproduced.  Which version
> > > of xfs_repair are you running?  (xfs_repair -V)  xfsprogs-2.7.18 will
> > > resolve your problem, I suspect.
> >
> > OK, I'm running Gentoo's latest: 2.7.11, I can't find 2.7.18
> > _anywhere_ although 2.7.13 is in the pre directory on the ftp, is that
> > the one you're referring to?
>
> No - its in CVS (for a long time); I'll go get the ftp area updated,
> looks like thats been forgotten about again.

OK, I just compiled from CVS HEAD (xfs_repair 2.8.2) and it still fails.
If this fix is not yet in 2.8.x, I will wait for 2.7.18 to get on the ftp.

Here is the output:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
entry "/ost+found" at block 0 offset 448 in directory inode 128
references invalid inode 18374686479671623679
        clearing inode number in entry at offset 448...
entry at block 0 offset 448 in directory inode 128 has illegal name
"/ost+found": imap claims a free inode 859505 is in use, correcting
imap and clearing inode
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - clear lost+found (if it exists) ...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ...
rebuilding directory inode 128

fatal error -- can't read block 16777216 for directory inode 1507133580

-- 
avuton
--
 Anyone who quotes me in their sig is an idiot. -- Rusty Russell.


* Re: XFS crashed twice, once in 2.6.16.20, next in 2.6.17, reproducable
  2006-06-20  8:20           ` Avuton Olrich
@ 2006-06-20  8:39             ` Duncan Sands
  2006-06-22  2:56             ` Nathan Scott
  2006-06-25 10:09             ` Duncan Sands
  2 siblings, 0 replies; 17+ messages in thread
From: Duncan Sands @ 2006-06-20  8:39 UTC (permalink / raw)
  To: Avuton Olrich; +Cc: Nathan Scott, linux-kernel, xfs

> fatal error -- can't read block 16777216 for directory inode 1507133580

This looks to be the same problem as http://oss.sgi.com/bugzilla/show_bug.cgi?id=631
Note that the block numbers are identical in both reports: 16777216 = 0x1000000.
A very suspicious block number, wouldn't you say?

Best wishes,

Duncan.


* Re: XFS crashed twice, once in 2.6.16.20, next in 2.6.17, reproducable
  2006-06-20  6:40   ` Avuton Olrich
@ 2006-06-20  8:57     ` Justin Piszcz
  2006-06-20 17:01       ` Avuton Olrich
  0 siblings, 1 reply; 17+ messages in thread
From: Justin Piszcz @ 2006-06-20  8:57 UTC (permalink / raw)
  To: Avuton Olrich; +Cc: Nathan Scott, linux-kernel, xfs

Have you checked to make sure you don't have a bad disk?

On Mon, 19 Jun 2006, Avuton Olrich wrote:

> On 6/19/06, Nathan Scott <nathans@sgi.com> wrote:
>> How reproducible is it?  Is it reproducible even after xfs_repair?
> It happens everytime I try to delete the directory.
>
> Also, forgot to mention I ran xfs_check on it and it gave me more
> information than I had before:
> More information, ran xfs_check and got the following:
> missing free index for data block 0 in dir ino 1507133580
> missing free index for data block 2 in dir ino 1507133580
> missing free index for data block 3 in dir ino 1507133580
> missing free index for data block 4 in dir ino 1507133580
> missing free index for data block 5 in dir ino 1507133580
> missing free index for data block 6 in dir ino 1507133580
> missing free index for data block 7 in dir ino 1507133580
> missing free index for data block 8 in dir ino 1507133580
> missing free index for data block 9 in dir ino 1507133580
>
> -- 
> avuton
> --
> Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
>
>


* Re: XFS crashed twice, once in 2.6.16.20, next in 2.6.17, reproducable
  2006-06-20  8:57     ` Justin Piszcz
@ 2006-06-20 17:01       ` Avuton Olrich
  2006-06-20 17:15         ` Justin Piszcz
  0 siblings, 1 reply; 17+ messages in thread
From: Avuton Olrich @ 2006-06-20 17:01 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: Nathan Scott, linux-kernel, xfs

On 6/20/06, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> Have you checked to make sure you don't have a bad disk?

In the initial email I do state that I have run badblocks on this disk
successfully.
-- 
avuton
--
 Anyone who quotes me in their sig is an idiot. -- Rusty Russell.


* Re: XFS crashed twice, once in 2.6.16.20, next in 2.6.17, reproducable
  2006-06-20 17:01       ` Avuton Olrich
@ 2006-06-20 17:15         ` Justin Piszcz
  2006-06-20 17:21           ` Avuton Olrich
  0 siblings, 1 reply; 17+ messages in thread
From: Justin Piszcz @ 2006-06-20 17:15 UTC (permalink / raw)
  To: Avuton Olrich; +Cc: Nathan Scott, linux-kernel, xfs

What options did you pass to badblocks?

On Tue, 20 Jun 2006, Avuton Olrich wrote:

> On 6/20/06, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
>> Have you checked to make sure you don't have a bad disk?
>
> In the initial email I do state that I have run badblocks on this disk
> sucessfully.
> -- 
> avuton
> --
> Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
>


* Re: XFS crashed twice, once in 2.6.16.20, next in 2.6.17, reproducable
  2006-06-20 17:15         ` Justin Piszcz
@ 2006-06-20 17:21           ` Avuton Olrich
  0 siblings, 0 replies; 17+ messages in thread
From: Avuton Olrich @ 2006-06-20 17:21 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: Nathan Scott, linux-kernel, xfs

On 6/20/06, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> WHat options did you pass to bad blocks?

Just the defaults, but it doesn't matter: judging from the bugzilla entry
earlier in this thread, someone else is having the exact same issue I am.
-- 
avuton
--
 Anyone who quotes me in their sig is an idiot. -- Rusty Russell.


* Re: XFS crashed twice, once in 2.6.16.20, next in 2.6.17, reproducable
  2006-06-20  8:20           ` Avuton Olrich
  2006-06-20  8:39             ` Duncan Sands
@ 2006-06-22  2:56             ` Nathan Scott
  2006-06-25 10:09             ` Duncan Sands
  2 siblings, 0 replies; 17+ messages in thread
From: Nathan Scott @ 2006-06-22  2:56 UTC (permalink / raw)
  To: Avuton Olrich; +Cc: linux-kernel, xfs

On Tue, Jun 20, 2006 at 01:20:39AM -0700, Avuton Olrich wrote:
> On 6/19/06, Nathan Scott <nathans@sgi.com> wrote:
> > No - its in CVS (for a long time); I'll go get the ftp area updated,
> > looks like thats been forgotten about again.

FWIW, I've updated the ftp area now.

> OK, just compiled from CVS HEAD (xfs_repair 2.8.2) and it still fails:

Is this a large filesystem?  Any chance we can get access to
it somehow (e.g. xfs_copy to a sparse file, then send me a
pointer to it) to reproduce the problem locally?

> fatal error -- can't read block 16777216 for directory inode
> 1507133580

Once you save a copy of it for further analysis of xfs_repair,
if you can, you can clear out this problem by directly poking at
the device using xfs_db in expert mode.  "xfs_db -x /dev/xxx";
then "inode 1507133580"; then "write core.mode 0"; and then try
another xfs_repair run.  Please try to capture the fs for us first
though (if possible) else we're going to struggle to improve on
this aspect of xfs_repair.  Send me some private mail if you do
manage to grab the fs and put it someplace for me.

thanks.

-- 
Nathan


* Re: XFS crashed twice, once in 2.6.16.20, next in 2.6.17, reproducable
  2006-06-20  8:20           ` Avuton Olrich
  2006-06-20  8:39             ` Duncan Sands
  2006-06-22  2:56             ` Nathan Scott
@ 2006-06-25 10:09             ` Duncan Sands
  2006-06-25 13:55               ` Duncan Sands
  2 siblings, 1 reply; 17+ messages in thread
From: Duncan Sands @ 2006-06-25 10:09 UTC (permalink / raw)
  To: Avuton Olrich; +Cc: Nathan Scott, linux-kernel, xfs

I just got a new XFS crash running 2.6.17, again with problems at block
16777216 - I'll try to make a copy of the corrupted filesystem available.
Interestingly enough, I'm also seeing ext3 corruption.  The usual
manifestation is that a program fails to run, with a message about it
not being in executable format (if it happens again I will take a note of
the exact message).  I've had no problems at all with 2.6.17.  It seems
to be happening randomly, which makes me suspect a race condition
(uniprocessor machine, but preemptable kernel), or memory corruption.
I will rebuild the kernel with all kernel debugging options turned
on, once I recover the filesystem.

Ciao,

D.


* Re: XFS crashed twice, once in 2.6.16.20, next in 2.6.17, reproducable
  2006-06-25 10:09             ` Duncan Sands
@ 2006-06-25 13:55               ` Duncan Sands
  0 siblings, 0 replies; 17+ messages in thread
From: Duncan Sands @ 2006-06-25 13:55 UTC (permalink / raw)
  To: Avuton Olrich; +Cc: Nathan Scott, linux-kernel, xfs

On Sunday 25 June 2006 12:09, Duncan Sands wrote:
> I just got a new XFS crash running 2.6.17, again with problems at block
> 16777216 - I'll try to make a copy of the corrupted filesystem available.
> Interestingly enough, I'm also seeing ext3 corruption.  The usual
> manifestation is that a program fails to run, with a message about it
> not being in executable format (if it happens again I will take a note of
> the exact message).  I've had no problems at all with 2.6.17.  It seems
> to be happening randomly, which makes me suspect a race condition
> (uniprocessor machine, but preemptable kernel), or memory corruption.
> I will rebuild the kernel with all kernel debugging options turned
> on, once I recover the filesystem.

Sorry, that should say: "I've had no problems at all with 2.6.15".
Also, xfs_repair successfully repaired the filesystem this time.
I've kept a copy of the filesystem in case anyone is interested.

Duncan.


Thread overview: 17+ messages
2006-06-19  7:44 XFS crashed twice, once in 2.6.16.20, next in 2.6.17, reproducable Avuton Olrich
2006-06-19 10:35 ` daniel+devel.linux.lkml
2006-06-20  6:10 ` Nathan Scott
2006-06-20  6:38   ` Avuton Olrich
2006-06-20  6:43     ` Nathan Scott
2006-06-20  6:50       ` Avuton Olrich
2006-06-20  6:52         ` Nathan Scott
2006-06-20  8:20           ` Avuton Olrich
2006-06-20  8:39             ` Duncan Sands
2006-06-22  2:56             ` Nathan Scott
2006-06-25 10:09             ` Duncan Sands
2006-06-25 13:55               ` Duncan Sands
2006-06-20  6:40   ` Avuton Olrich
2006-06-20  8:57     ` Justin Piszcz
2006-06-20 17:01       ` Avuton Olrich
2006-06-20 17:15         ` Justin Piszcz
2006-06-20 17:21           ` Avuton Olrich
