* xfs bug in 2.6.17.9?
From: Stian Jordet @ 2006-08-24  9:45 UTC
  To: xfs

I got this on my server today, while it was not doing anything in
particular...

Aug 24 09:22:09 buick kernel: xfs_da_do_buf: bno 16777216
Aug 24 09:22:09 buick kernel: dir: inode 14715927
Aug 24 09:22:09 buick kernel: Filesystem "rd/c0d0p1": XFS internal error xfs_da_do_buf(1) at line 2119 of file fs/xfs/xfs_da_btree.c.  Caller 0xc029d81c
Aug 24 09:22:09 buick kernel:  <c029d2ef> xfs_da_do_buf+0x4ff/0x980  <c029d81c> xfs_da_read_buf+0x3c/0x40
Aug 24 09:22:09 buick kernel:  <c02aacc8> xfs_dir2_leafn_lookup_int+0x2e8/0x520  <c02aacc8> xfs_dir2_leafn_lookup_int+0x2e8/0x520
Aug 24 09:22:09 buick kernel:  <c02a538d> xfs_dir2_data_log_unused+0x6d/0x90  <c029d81c> xfs_da_read_buf+0x3c/0x40
Aug 24 09:22:09 buick kernel:  <c02a8dd8> xfs_dir2_node_removename+0x368/0x5b0  <c02a8dd8> xfs_dir2_node_removename+0x368/0x5b0
Aug 24 09:22:09 buick kernel:  <c02a3389> xfs_dir2_removename+0x129/0x130  <c02cd593> xfs_icsb_modify_counters_int+0x73/0x1d0
Aug 24 09:22:09 buick kernel:  <c02d4beb> xfs_trans_ijoin+0x3b/0x90  <c02de404> xfs_remove+0x314/0x510
Aug 24 09:22:09 buick kernel:  <c016e220> vfs_permission+0x20/0x30  <c02e8e9a> xfs_vn_unlink+0x3a/0x70
Aug 24 09:22:09 buick kernel:  <c02da26f> xfs_access+0x4f/0x60  <c02e91e6> xfs_vn_permission+0x26/0x30
Aug 24 09:22:09 buick kernel:  <c016cff3> permission+0x73/0x110  <c016d7f3> may_delete+0x43/0x130
Aug 24 09:22:09 buick kernel:  <c016dea1> vfs_unlink+0xc1/0x120  <c01700c1> do_unlinkat+0xe1/0x170
Aug 24 09:22:09 buick kernel:  <c0102fcf> syscall_call+0x7/0xb
Aug 24 09:22:09 buick kernel: Filesystem "rd/c0d0p1": XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c.  Caller 0xc02de475
Aug 24 09:22:09 buick kernel:  <c02d2d2b> xfs_trans_cancel+0x10b/0x140  <c02de475> xfs_remove+0x385/0x510
Aug 24 09:22:09 buick kernel:  <c02de475> xfs_remove+0x385/0x510  <c016e220> vfs_permission+0x20/0x30
Aug 24 09:22:09 buick kernel:  <c02e8e9a> xfs_vn_unlink+0x3a/0x70  <c02da26f> xfs_access+0x4f/0x60
Aug 24 09:22:09 buick kernel:  <c02e91e6> xfs_vn_permission+0x26/0x30  <c016cff3> permission+0x73/0x110
Aug 24 09:22:09 buick kernel:  <c016d7f3> may_delete+0x43/0x130  <c016dea1> vfs_unlink+0xc1/0x120
Aug 24 09:22:09 buick kernel:  <c01700c1> do_unlinkat+0xe1/0x170  <c0102fcf> syscall_call+0x7/0xb
Aug 24 09:22:09 buick kernel: Filesystem "rd/c0d0p1": Corruption of in-memory data detected.  Shutting down filesystem: rd/c0d0p1
Aug 24 09:22:09 buick kernel: Please umount the filesystem, and rectify the problem(s)

I'll update to the latest 2.6.17.11 tonight, but I wonder if this is a known
bug? I did upgrade the memory on this server from 2GB to 4GB about a week
ago, so there is a slight chance there's faulty RAM in there, but I don't think
that's the problem. And please, if this is the wrong place for problems like
this, I'm really sorry.

After I went home during my lunch break to restart it, it came up fine, but
now it's dead again. I need to investigate more tonight...

Any thoughts?

Best regards,
Stian
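
When triaging a log like the one in this message, it can help to pull out just the "XFS internal error" headers (which function reported the error, at which source line, and on which filesystem). The following is a minimal Python sketch, not an official tool: the regular expression is inferred only from the two header lines shown above, and the names `XFS_ERROR_RE` and `parse_xfs_errors` are made up for this illustration. It flattens whitespace first so that headers wrapped by a mail client still match.

```python
import re

# Pattern inferred from the header lines in the log above; other kernel
# versions may word these messages differently (assumption).
XFS_ERROR_RE = re.compile(
    r'Filesystem "(?P<fs>[^"]+)": XFS internal error '
    r'(?P<where>\S+) at line (?P<line>\d+) of file (?P<file>\S+?)\.\s+'
    r'Caller (?P<caller>0x[0-9a-fA-F]+)'
)


def parse_xfs_errors(log_text):
    """Return one dict per XFS internal error header found in log_text."""
    # Collapse every run of whitespace (including newlines introduced by
    # mail wrapping) into a single space before matching.
    flat = " ".join(log_text.split())
    return [m.groupdict() for m in XFS_ERROR_RE.finditer(flat)]


if __name__ == "__main__":
    sample = (
        'Aug 24 09:22:09 buick kernel: Filesystem "rd/c0d0p1": XFS internal error\n'
        "xfs_da_do_buf(1) at line 2119 of file fs/xfs/xfs_da_btree.c.  Caller\n"
        "0xc029d81c\n"
    )
    for err in parse_xfs_errors(sample):
        print(err["fs"], err["where"], err["file"], err["line"], err["caller"])
```

Note that the sketch does not strip "> " quote markers, so it is meant for raw logs rather than quoted mail text.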


* Re: xfs bug in 2.6.17.9?
From: Martin Steigerwald @ 2006-08-24 12:29 UTC
  To: liste; +Cc: xfs

On Thursday, 24 August 2006 at 11:45, Stian Jordet wrote:
> I got this on my server today, while it was not doing anything in
> particular...
>
> Aug 24 09:22:09 buick kernel: xfs_da_do_buf: bno 16777216

Hello Stian,

It looks to me like the directory corruption bug in kernels 2.6.17 up to
2.6.17.6 hit you. Did you use a 2.6.17 kernel < 2.6.17.7 before?

See 

http://oss.sgi.com/projects/xfs/faq.html#dir2
http://bugzilla.kernel.org/show_bug.cgi?id=6757

Try xfs_check and, if it finds errors, xfs_repair.

If xfs_repair cannot fix it, you will have to look for a version that
contains some fixes related to handling this kind of corruption:

http://oss.sgi.com/archives/xfs/2006-07/msg00374.html

Regards,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

* Re: xfs bug in 2.6.17.9?
From: Justin Piszcz @ 2006-08-24 13:42 UTC
  To: liste; +Cc: xfs

Aug 24 09:22:09 buick kernel: xfs_da_do_buf: bno 16777216

That is the bug from 2.6.17 -> 2.6.17.6.

It was patched in 2.6.17.7, but I assume(?) that if you never fixed your FS
to begin with, you will still have the problem.  Read the XFS FAQ, and back
up your data before you do that :)

Justin.
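
The version window described in this reply (affected from 2.6.17 through 2.6.17.6, patched in 2.6.17.7) can be written down as a small check. This is only an illustrative Python sketch under the assumption that version strings are plain dotted integers such as "2.6.17.9"; the function names are invented here, and suffixes like "-rc1" or distribution tags are not handled.

```python
# Bounds taken from the message above: the bug is present from 2.6.17
# up to 2.6.17.6 and was patched in 2.6.17.7.
FIRST_AFFECTED = (2, 6, 17, 0)
FIRST_FIXED = (2, 6, 17, 7)


def parse_kernel_version(version):
    """Turn '2.6.17.9' into (2, 6, 17, 9); '2.6.17' becomes (2, 6, 17, 0)."""
    parts = [int(p) for p in version.split(".")]
    while len(parts) < 4:
        parts.append(0)  # a missing stable-release number counts as 0
    return tuple(parts)


def ran_affected_kernel(version):
    """True if this kernel version falls inside the affected window."""
    return FIRST_AFFECTED <= parse_kernel_version(version) < FIRST_FIXED


if __name__ == "__main__":
    for v in ("2.6.17", "2.6.17.6", "2.6.17.7", "2.6.17.9", "2.6.17.11"):
        print(v, ran_affected_kernel(v))
```

As the reply points out, running a fixed kernel now does not by itself repair a filesystem that was damaged while an affected kernel was in use, so the check only tells you whether a given kernel could have introduced the corruption.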

* Re: xfs bug in 2.6.17.9?
From: Stian Jordet @ 2006-08-24 23:45 UTC
  To: Martin Steigerwald; +Cc: xfs

On Thu, 24.08.2006 at 14:29 +0200, Martin Steigerwald wrote:
> It looks to me like the directory corruption bug in kernels 2.6.17 up to
> 2.6.17.6 hit you. Did you use a 2.6.17 kernel < 2.6.17.7 before?
> 
> See 
> 
> http://oss.sgi.com/projects/xfs/faq.html#dir2
> http://bugzilla.kernel.org/show_bug.cgi?id=6757
> 
> Try xfs_check and, if it finds errors, xfs_repair.
> 
> If xfs_repair cannot fix it, you will have to look for a version that
> contains some fixes related to handling this kind of corruption:
> 
> http://oss.sgi.com/archives/xfs/2006-07/msg00374.html

Martin,

Thanks for your help. I did use both 2.6.17.1 and 2.6.17.3 before
2.6.17.9... So I guess (hope) that's the problem, and not my memory...

I have run xfs_repair 2.8.11 on two filesystems with errors (luckily,
neither my /home nor my backup partition seems to be hit), and it finds
some errors, but if I run it again, it finds the same errors over and
over again... I seem to have it up and running again now, but I really
don't like that xfs_repair shows a lot of errors on each run. Don't like
that at all... It says it has fixed the errors, but I just never get rid
of them.

Is that normal? I guess not. And is it something that can happen with the
directory corruption bug? I read that I needed to have xfsprogs >2.8.10,
but that didn't help either...

Best regards,
Stian

* Re: xfs bug in 2.6.17.9?
From: Stian Jordet @ 2006-08-24 23:47 UTC
  To: Justin Piszcz; +Cc: xfs

On Thu, 24.08.2006 at 09:42 -0400, Justin Piszcz wrote:
> Aug 24 09:22:09 buick kernel: xfs_da_do_buf: bno 16777216
> 
> That is the bug from 2.6.17 -> 2.6.17.6.
> 
> It was patched in 2.6.17.7, but I assume(?) that if you never fixed your FS
> to begin with, you will still have the problem.  Read the XFS FAQ, and back
> up your data before you do that :)

As I just wrote to Martin, I did run a couple of those kernels. But even
with updated xfsprogs I can't fix the errors... Is that "normal", or am
I in deep trouble?

Best regards,
Stian

* Re: xfs bug in 2.6.17.9?
From: Nathan Scott @ 2006-08-25  1:58 UTC
  To: Stian Jordet; +Cc: Justin Piszcz, xfs

On Fri, Aug 25, 2006 at 01:47:43AM +0200, Stian Jordet wrote:
> On Thu, 24.08.2006 at 09:42 -0400, Justin Piszcz wrote:
> > Aug 24 09:22:09 buick kernel: xfs_da_do_buf: bno 16777216
> > 
> > That is the bug from 2.6.17 -> 2.6.17.6.
> > 
> > It was patched in 2.6.17.7, but I assume(?) that if you never fixed your FS
> > to begin with, you will still have the problem.  Read the XFS FAQ, and back
> > up your data before you do that :)
> 
> As I just wrote to Martin, I did run a couple of those kernels. But even
> with updated xfsprogs I can't fix the errors... Is that "normal", or am
> I in deep trouble?

This is likely to be lost+found being recreated each time. It's
normal if you don't do something about the lost+found files;
once those are renamed/removed, it should run cleanly.

cheers.

-- 
Nathan
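
The "rename/remove the lost+found files" step suggested above could be scripted along the following lines. This is a hedged Python sketch rather than a recommended procedure: the function name `stash_lost_found` and both path arguments are hypothetical, it assumes the filesystem is mounted and that the stash directory has no entries with clashing names, and it should only be tried once backups exist.

```python
import os
import shutil


def stash_lost_found(mount_point, stash_dir):
    """Move every entry out of <mount_point>/lost+found into stash_dir.

    Returns the sorted list of entry names that were moved.
    """
    lost_found = os.path.join(mount_point, "lost+found")
    if not os.path.isdir(lost_found):
        return []  # nothing was reconnected, so there is nothing to move
    os.makedirs(stash_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(lost_found)):
        # Within one filesystem this is a plain rename of the entry.
        shutil.move(os.path.join(lost_found, name),
                    os.path.join(stash_dir, name))
        moved.append(name)
    return moved
```

After the entries have been moved aside, the filesystem would be unmounted and xfs_repair run again to see whether it now completes without reporting the same errors.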

* Re: xfs bug in 2.6.17.9?
From: Chris Wedgwood @ 2006-08-25  5:08 UTC
  To: Stian Jordet; +Cc: Justin Piszcz, xfs

On Fri, Aug 25, 2006 at 01:47:43AM +0200, Stian Jordet wrote:

> As I just wrote to Martin, I did run a couple of those kernels. But
> even with updated xfsprogs I can't fix the errors... Is that
> "normal", or am I in deep trouble?

A more recent xfs_repair will deal with it better. I think it's in CVS
now, though I'm not entirely sure.  Search for the XFS FAQ; at the end
there is a section on directory corruption with details on how to fix
it by hand if need be.

* Re: xfs bug in 2.6.17.9?
From: Stian Jordet @ 2006-08-25  6:15 UTC
  To: Nathan Scott; +Cc: Justin Piszcz, xfs

On Fri, 25.08.2006 at 11:58 +1000, Nathan Scott wrote:
> On Fri, Aug 25, 2006 at 01:47:43AM +0200, Stian Jordet wrote:
> > On Thu, 24.08.2006 at 09:42 -0400, Justin Piszcz wrote:
> > > Aug 24 09:22:09 buick kernel: xfs_da_do_buf: bno 16777216
> > > 
> > > That is the bug from 2.6.17 -> 2.6.17.6.
> > > 
> > > It was patched in 2.6.17.7, but I assume(?) that if you never fixed your FS
> > > to begin with, you will still have the problem.  Read the XFS FAQ, and back
> > > up your data before you do that :)
> > 
> > As I just wrote to Martin, I did run a couple of those kernels. But even
> > with updated xfsprogs I can't fix the errors... Is that "normal", or am
> > I in deep trouble?
> 
> This is likely to be lost+found being recreated each time. It's
> normal if you don't do something about the lost+found files;
> once those are renamed/removed, it should run cleanly.

You seem to be right about that :)

But when I woke up this morning, I had my logs full of this:

0x0: 24 73 74 61 74 73 20 3d 20 7b 0a 20 20 27 73 68
Filesystem "rd/c0d1p1": XFS internal error xfs_da_do_buf(2) at line 2212 of file fs/xfs/xfs_da_btree.c.  Caller 0xc029d81c
 <c02b0b0b> xfs_corruption_error+0x10b/0x140  <c029d81c> xfs_da_read_buf+0x3c/0x40
 <c02e10a1> kmem_zone_alloc+0x61/0xe0  <c029cd9a> xfs_da_buf_make+0xfa/0x150
 <c029d719> xfs_da_do_buf+0x929/0x980  <c029d81c> xfs_da_read_buf+0x3c/0x40
 <c029d81c> xfs_da_read_buf+0x3c/0x40  <c02a05fd> xfs_da_node_lookup_int+0xcd/0x3b0
 <c02a05fd> xfs_da_node_lookup_int+0xcd/0x3b0  <c02a899f> xfs_dir2_node_lookup+0x3f/0xc0
 <c02a325a> xfs_dir2_lookup+0x12a/0x130  <c02e91e6> xfs_vn_permission+0x26/0x30
 <c016e220> vfs_permission+0x20/0x30  <c016e84a> __link_path_walk+0x8a/0xfa0
 <c02d58cc> xfs_dir_lookup_int+0x4c/0x130  <c02da1fe> xfs_lookup+0x7e/0xa0
 <c02e963e> xfs_vn_lookup+0x4e/0x90  <c016e119> __lookup_hash+0xe9/0x120
 <c0170088> do_unlinkat+0xa8/0x170  <c0168947> sys_stat64+0x27/0x30
 <c0102fcf> syscall_call+0x7/0xb

Don't know how many times, but many! Is that related to anything...?

Thanks!

Best regards,
Stian
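
The "0x0:" line in this log is a hex dump of the first 16 bytes of the buffer that failed the check, and decoding it shows what the kernel actually read. The snippet below is a small Python sketch for doing so; the constant and function names are chosen for this example only. Any conclusion drawn from the decoded text (for instance, that the block holds ordinary file content rather than directory metadata) is an interpretation and is not stated anywhere in the thread.

```python
# The 16 bytes printed after "0x0:" in the log above.
HEX_DUMP = "24 73 74 61 74 73 20 3d 20 7b 0a 20 20 27 73 68"


def decode_hex_dump(dump):
    """Decode a space-separated hex byte dump as ASCII text."""
    raw = bytes.fromhex(dump)  # fromhex skips the spaces between bytes
    return raw.decode("ascii", errors="replace")


if __name__ == "__main__":
    # repr() keeps the embedded newline visible as \n
    print(repr(decode_hex_dump(HEX_DUMP)))
```

For the dump above this yields the text `$stats = {`, a newline, two spaces, and `'sh`, which looks like the beginning of a script or data file.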

* Re: xfs bug in 2.6.17.9?
From: Stian Jordet @ 2006-08-25  8:38 UTC
  To: Nathan Scott; +Cc: Justin Piszcz, xfs

It seems I just hadn't used a recent enough xfs_repair on that
filesystem. It seems good now. Just one last question: are you 99.5% sure
that these are the symptoms of that corruption bug in 2.6.17? Can I
assume that my memory wasn't the problem? I'm now running with only
512MB (which I'm sure is good), and I don't want to use the new memory
if I get this problem again (even though I have good backups, it's a
hell of a job fixing it again...)

Thank you.

Best regards,
Stian
