* xfs bug in 2.6.17.9?
@ 2006-08-24 9:45 Stian Jordet
2006-08-24 12:29 ` Martin Steigerwald
2006-08-24 13:42 ` Justin Piszcz
0 siblings, 2 replies; 9+ messages in thread
From: Stian Jordet @ 2006-08-24 9:45 UTC (permalink / raw)
To: xfs
I got this on my server today, while it was not doing anything in
particular...
Aug 24 09:22:09 buick kernel: xfs_da_do_buf: bno 16777216
Aug 24 09:22:09 buick kernel: dir: inode 14715927
Aug 24 09:22:09 buick kernel: Filesystem "rd/c0d0p1": XFS internal error xfs_da_do_buf(1) at line 2119 of file fs/xfs/xfs_da_btree.c. Caller 0xc029d81c
Aug 24 09:22:09 buick kernel: <c029d2ef> xfs_da_do_buf+0x4ff/0x980 <c029d81c> xfs_da_read_buf+0x3c/0x40
Aug 24 09:22:09 buick kernel: <c02aacc8> xfs_dir2_leafn_lookup_int+0x2e8/0x520 <c02aacc8> xfs_dir2_leafn_lookup_int+0x2e8/0x520
Aug 24 09:22:09 buick kernel: <c02a538d> xfs_dir2_data_log_unused+0x6d/0x90 <c029d81c> xfs_da_read_buf+0x3c/0x40
Aug 24 09:22:09 buick kernel: <c02a8dd8> xfs_dir2_node_removename+0x368/0x5b0 <c02a8dd8> xfs_dir2_node_removename+0x368/0x5b0
Aug 24 09:22:09 buick kernel: <c02a3389> xfs_dir2_removename+0x129/0x130 <c02cd593> xfs_icsb_modify_counters_int+0x73/0x1d0
Aug 24 09:22:09 buick kernel: <c02d4beb> xfs_trans_ijoin+0x3b/0x90 <c02de404> xfs_remove+0x314/0x510
Aug 24 09:22:09 buick kernel: <c016e220> vfs_permission+0x20/0x30 <c02e8e9a> xfs_vn_unlink+0x3a/0x70
Aug 24 09:22:09 buick kernel: <c02da26f> xfs_access+0x4f/0x60 <c02e91e6> xfs_vn_permission+0x26/0x30
Aug 24 09:22:09 buick kernel: <c016cff3> permission+0x73/0x110 <c016d7f3> may_delete+0x43/0x130
Aug 24 09:22:09 buick kernel: <c016dea1> vfs_unlink+0xc1/0x120 <c01700c1> do_unlinkat+0xe1/0x170
Aug 24 09:22:09 buick kernel: <c0102fcf> syscall_call+0x7/0xb
Aug 24 09:22:09 buick kernel: Filesystem "rd/c0d0p1": XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c. Caller 0xc02de475
Aug 24 09:22:09 buick kernel: <c02d2d2b> xfs_trans_cancel+0x10b/0x140 <c02de475> xfs_remove+0x385/0x510
Aug 24 09:22:09 buick kernel: <c02de475> xfs_remove+0x385/0x510 <c016e220> vfs_permission+0x20/0x30
Aug 24 09:22:09 buick kernel: <c02e8e9a> xfs_vn_unlink+0x3a/0x70 <c02da26f> xfs_access+0x4f/0x60
Aug 24 09:22:09 buick kernel: <c02e91e6> xfs_vn_permission+0x26/0x30 <c016cff3> permission+0x73/0x110
Aug 24 09:22:09 buick kernel: <c016d7f3> may_delete+0x43/0x130 <c016dea1> vfs_unlink+0xc1/0x120
Aug 24 09:22:09 buick kernel: <c01700c1> do_unlinkat+0xe1/0x170 <c0102fcf> syscall_call+0x7/0xb
Aug 24 09:22:09 buick kernel: Filesystem "rd/c0d0p1": Corruption of in-memory data detected. Shutting down filesystem: rd/c0d0p1
Aug 24 09:22:09 buick kernel: Please umount the filesystem, and rectify the problem(s)
I'll update to the latest 2.6.17.11 tonight, but I wonder if this is a known
bug? I did upgrade the memory on this server from 2GB to 4GB about a week
ago, so there is a slight chance there's faulty RAM in there, but I don't
think that's the problem. And please, if this is the wrong place for problems
like this, I'm really sorry.
And after I went home during my lunch break to restart it, it came up fine,
but now it's dead again. I need to investigate more tonight...
Any thoughts?
Best regards,
Stian
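For readers triaging a similar report, the "XFS internal error" messages above can be tallied mechanically. What follows is only a minimal Python sketch and not part of the original report: it assumes the syslog lines have been rejoined where the mail client wrapped them, and the name `summarize_xfs_errors` and the regular expression are illustrative assumptions rather than any existing tool.

```python
import re
from collections import Counter

# Matches messages such as:
#   Filesystem "rd/c0d0p1": XFS internal error xfs_da_do_buf(1)
#   at line 2119 of file fs/xfs/xfs_da_btree.c. Caller 0xc029d81c
# (shown wrapped here; the input is expected on one line).
XFS_ERROR = re.compile(
    r'Filesystem "(?P<fs>[^"]+)": XFS internal error (?P<where>\S+) '
    r'at line (?P<line>\d+) of file (?P<file>\S+\.c)'
)


def summarize_xfs_errors(lines):
    """Count XFS internal errors per (filesystem, location, file, line)."""
    counts = Counter()
    for text in lines:
        match = XFS_ERROR.search(text)
        if match:
            key = (match["fs"], match["where"], match["file"], int(match["line"]))
            counts[key] += 1
    return counts


# The two error messages from the report above, rejoined onto single lines.
sample = [
    'Aug 24 09:22:09 buick kernel: Filesystem "rd/c0d0p1": XFS internal error '
    'xfs_da_do_buf(1) at line 2119 of file fs/xfs/xfs_da_btree.c. Caller 0xc029d81c',
    'Aug 24 09:22:09 buick kernel: Filesystem "rd/c0d0p1": XFS internal error '
    'xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c. Caller 0xc02de475',
]

for key, count in summarize_xfs_errors(sample).items():
    print(count, key)
```

Feeding it a whole log file (one message per line) would show how often each check fired and on which filesystem.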
* Re: xfs bug in 2.6.17.9?
2006-08-24 9:45 xfs bug in 2.6.17.9? Stian Jordet
@ 2006-08-24 12:29 ` Martin Steigerwald
2006-08-24 23:45 ` Stian Jordet
2006-08-24 13:42 ` Justin Piszcz
1 sibling, 1 reply; 9+ messages in thread
From: Martin Steigerwald @ 2006-08-24 12:29 UTC (permalink / raw)
To: liste; +Cc: xfs
On Thursday, 24 August 2006 at 11:45, Stian Jordet wrote:
> I got this on my server today, while it was not doing anything in
> particular...
>
> Aug 24 09:22:09 buick kernel: xfs_da_do_buf: bno 16777216
Hello Stian,
It looks to me as if the directory corruption bug in kernels 2.6.17 up to
2.6.17.6 hit you: Did you use a 2.6.17 kernel < 2.6.17.7 before?
See
http://oss.sgi.com/projects/xfs/faq.html#dir2
http://bugzilla.kernel.org/show_bug.cgi?id=6757
Try xfs_check and, if it finds errors, xfs_repair.
If xfs_repair cannot fix it, you will have to look for a version that
contains some fixes related to handling this kind of corruption:
http://oss.sgi.com/archives/xfs/2006-07/msg00374.html
Regards,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
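The check-then-repair order suggested in this message can be sketched as a small Python wrapper. This is a hedged illustration, not something from the thread: the function name `check_then_repair` is made up, the filesystem is assumed to be unmounted, and treating a nonzero exit status from xfs_check as "errors found" is an assumption that should be verified against the installed xfsprogs.

```python
import subprocess


def check_then_repair(device, run=subprocess.run):
    """Check an unmounted XFS device; repair it only if the check complains.

    Assumption: a nonzero exit status from xfs_check means it found problems.
    The `run` parameter exists so the sequence can be exercised without
    touching a real device.
    """
    if run(["xfs_check", device]).returncode == 0:
        return "clean"
    # xfs_repair modifies the filesystem, so back up the data first.
    if run(["xfs_repair", device]).returncode == 0:
        return "repaired"
    return "repair failed"
```

A call would look like `check_then_repair("/dev/rd/c0d0p1")`, where the device path is only a guess based on the `rd/c0d0p1` name in the log.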
* Re: xfs bug in 2.6.17.9?
2006-08-24 9:45 xfs bug in 2.6.17.9? Stian Jordet
2006-08-24 12:29 ` Martin Steigerwald
@ 2006-08-24 13:42 ` Justin Piszcz
2006-08-24 23:47 ` Stian Jordet
1 sibling, 1 reply; 9+ messages in thread
From: Justin Piszcz @ 2006-08-24 13:42 UTC (permalink / raw)
To: liste; +Cc: xfs
Aug 24 09:22:09 buick kernel: xfs_da_do_buf: bno 16777216
That is the bug from 2.6.17 -> 2.6.17.6.
It was patched in 2.6.17.7, but I assume(?) that if you never fixed your FS
to begin with, you will still have the problem. Read the XFS FAQ, and back up
your data before you do that :)
Justin.
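The affected range stated here (2.6.17 through 2.6.17.6, fixed in 2.6.17.7) can be written down as a version comparison. The sketch below is illustrative only: the names `parse_version` and `is_affected` are assumptions, and it handles plain dotted versions, not suffixes such as `-rc1`.

```python
def parse_version(text):
    """Turn a dotted kernel version such as '2.6.17.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


FIRST_AFFECTED = (2, 6, 17)    # first kernel with the bug, per this thread
FIRST_FIXED = (2, 6, 17, 7)    # first kernel with the fix, per this thread


def is_affected(kernel_version):
    """True for kernels from 2.6.17 up to and including 2.6.17.6."""
    return FIRST_AFFECTED <= parse_version(kernel_version) < FIRST_FIXED


for version in ["2.6.17", "2.6.17.6", "2.6.17.7", "2.6.17.9"]:
    print(version, is_affected(version))
```

Note that a fixed kernel only stops new corruption; as the message says, damage written by an affected kernel stays on disk until it is repaired.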
* Re: xfs bug in 2.6.17.9?
2006-08-24 12:29 ` Martin Steigerwald
@ 2006-08-24 23:45 ` Stian Jordet
0 siblings, 0 replies; 9+ messages in thread
From: Stian Jordet @ 2006-08-24 23:45 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: xfs
On Thu, 24 Aug 2006 at 14:29 +0200, Martin Steigerwald wrote:
> It looks to me as if the directory corruption bug in kernels 2.6.17 up to
> 2.6.17.6 hit you: Did you use a 2.6.17 kernel < 2.6.17.7 before?
>
> See
>
> http://oss.sgi.com/projects/xfs/faq.html#dir2
> http://bugzilla.kernel.org/show_bug.cgi?id=6757
>
> Try xfs_check and, if it finds errors, xfs_repair.
>
> If xfs_repair cannot fix it, you will have to look for a version that
> contains some fixes related to handling this kind of corruption:
>
> http://oss.sgi.com/archives/xfs/2006-07/msg00374.html
Martin,
thanks for your help. I did use both 2.6.17.1 and 2.6.17.3 before
2.6.17.9... So I guess (hope) that's the problem, and not my memory...
I have run xfs_repair 2.8.11 on two filesystems with errors (luckily,
neither my /home nor my backup partition seems to be hit), and it finds
some errors, but if I run it again, it finds the same errors over and
over again... I seem to have it up and running again now, but I really
don't like that xfs_repair shows a lot of errors on each run. I don't
like that at all... It says it has fixed the errors, but I just never
get rid of them.
Is that normal? I guess not. And is it something that can happen with the
directory corruption bug? I read that I needed to have xfsprogs > 2.8.10,
but that didn't help either...
Best regards,
Stian
* Re: xfs bug in 2.6.17.9?
2006-08-24 13:42 ` Justin Piszcz
@ 2006-08-24 23:47 ` Stian Jordet
2006-08-25 1:58 ` Nathan Scott
2006-08-25 5:08 ` Chris Wedgwood
0 siblings, 2 replies; 9+ messages in thread
From: Stian Jordet @ 2006-08-24 23:47 UTC (permalink / raw)
To: Justin Piszcz; +Cc: xfs
On Thu, 24 Aug 2006 at 09:42 -0400, Justin Piszcz wrote:
> Aug 24 09:22:09 buick kernel: xfs_da_do_buf: bno 16777216
>
> That is the bug from 2.6.17 -> 2.6.17.6.
>
> It was patched in 2.6.17.7, but I assume(?) if you never fixed your FS to
> begin with, you will still have the problem. Read the XFS FAQ, and backup
> your data before you do that :)
As I just wrote to Martin, I did run a couple of those kernels. But even
with updated xfsprogs I can't fix the errors... Is that "normal", or am
I in deep trouble?
Best regards,
Stian
* Re: xfs bug in 2.6.17.9?
2006-08-24 23:47 ` Stian Jordet
@ 2006-08-25 1:58 ` Nathan Scott
2006-08-25 6:15 ` Stian Jordet
2006-08-25 5:08 ` Chris Wedgwood
1 sibling, 1 reply; 9+ messages in thread
From: Nathan Scott @ 2006-08-25 1:58 UTC (permalink / raw)
To: Stian Jordet; +Cc: Justin Piszcz, xfs
On Fri, Aug 25, 2006 at 01:47:43AM +0200, Stian Jordet wrote:
> On Thu, 24 Aug 2006 at 09:42 -0400, Justin Piszcz wrote:
> > Aug 24 09:22:09 buick kernel: xfs_da_do_buf: bno 16777216
> >
> > That is the bug from 2.6.17 -> 2.6.17.6.
> >
> > It was patched in 2.6.17.7, but I assume(?) if you never fixed your FS to
> > begin with, you will still have the problem. Read the XFS FAQ, and backup
> > your data before you do that :)
>
> As I just wrote to Martin, I did run a couple of those kernels. But even
> with updated xfsprogs I can't fix the errors... Is that "normal", or am
> I in deep trouble?
This is likely to be lost+found being recreated each time; it's
normal if you don't do something about the lost+found files.
Once those are renamed/removed, it should run cleanly.
cheers.
--
Nathan
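One way to act on this advice is to move the existing lost+found directory aside before the next xfs_repair run. The following Python sketch is an assumption-laden illustration, not a procedure from the thread: the helper name `set_aside_lost_found` and the `lost+found.old` target name are hypothetical, and the filesystem is assumed to be mounted while renaming and unmounted again before xfs_repair is rerun.

```python
from pathlib import Path


def set_aside_lost_found(mount_point, new_name="lost+found.old"):
    """Rename <mount_point>/lost+found so the next repair run starts without it.

    Returns the new path, or None if there was no lost+found directory.
    Refuses to overwrite an existing target.
    """
    source = Path(mount_point) / "lost+found"
    if not source.is_dir():
        return None
    target = source.with_name(new_name)
    if target.exists():
        raise FileExistsError(target)
    source.rename(target)
    return target
```

Renaming rather than deleting keeps the recovered files around until someone has checked whether anything in them is still needed.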
* Re: xfs bug in 2.6.17.9?
2006-08-24 23:47 ` Stian Jordet
2006-08-25 1:58 ` Nathan Scott
@ 2006-08-25 5:08 ` Chris Wedgwood
1 sibling, 0 replies; 9+ messages in thread
From: Chris Wedgwood @ 2006-08-25 5:08 UTC (permalink / raw)
To: Stian Jordet; +Cc: Justin Piszcz, xfs
On Fri, Aug 25, 2006 at 01:47:43AM +0200, Stian Jordet wrote:
> As I just wrote to Martin, I did run a couple of those kernels. But
> even with updated xfsprogs I can't fix the errors... Is that
> "normal", or am I in deep trouble?
A more recent xfs_repair will deal with it better. I think it's in CVS
now, though I'm not entirely sure. Search for the XFS FAQ; at the end
there is a section on directory corruption and details on how to fix
it by hand if need be.
* Re: xfs bug in 2.6.17.9?
2006-08-25 1:58 ` Nathan Scott
@ 2006-08-25 6:15 ` Stian Jordet
2006-08-25 8:38 ` Stian Jordet
0 siblings, 1 reply; 9+ messages in thread
From: Stian Jordet @ 2006-08-25 6:15 UTC (permalink / raw)
To: Nathan Scott; +Cc: Justin Piszcz, xfs
On Fri, 25 Aug 2006 at 11:58 +1000, Nathan Scott wrote:
> On Fri, Aug 25, 2006 at 01:47:43AM +0200, Stian Jordet wrote:
> > On Thu, 24 Aug 2006 at 09:42 -0400, Justin Piszcz wrote:
> > > Aug 24 09:22:09 buick kernel: xfs_da_do_buf: bno 16777216
> > >
> > > That is the bug from 2.6.17 -> 2.6.17.6.
> > >
> > > It was patched in 2.6.17.7, but I assume(?) if you never fixed your FS to
> > > begin with, you will still have the problem. Read the XFS FAQ, and backup
> > > your data before you do that :)
> >
> > As I just wrote to Martin, I did run a couple of those kernels. But even
> > with updated xfsprogs I can't fix the errors... Is that "normal", or am
> > I in deep trouble?
>
> This is likely to be lost+found being recreated each time; it's
> normal if you don't do something about the lost+found files.
> Once those are renamed/removed, it should run cleanly.
You seem to be right about that :)
But when I woke up this morning, I had my logs full of this:
0x0: 24 73 74 61 74 73 20 3d 20 7b 0a 20 20 27 73 68
Filesystem "rd/c0d1p1": XFS internal error xfs_da_do_buf(2) at line 2212 of file fs/xfs/xfs_da_btree.c. Caller 0xc029d81c
<c02b0b0b> xfs_corruption_error+0x10b/0x140 <c029d81c> xfs_da_read_buf+0x3c/0x40
<c02e10a1> kmem_zone_alloc+0x61/0xe0 <c029cd9a> xfs_da_buf_make+0xfa/0x150
<c029d719> xfs_da_do_buf+0x929/0x980 <c029d81c> xfs_da_read_buf+0x3c/0x40
<c029d81c> xfs_da_read_buf+0x3c/0x40 <c02a05fd> xfs_da_node_lookup_int+0xcd/0x3b0
<c02a05fd> xfs_da_node_lookup_int+0xcd/0x3b0 <c02a899f> xfs_dir2_node_lookup+0x3f/0xc0
<c02a325a> xfs_dir2_lookup+0x12a/0x130 <c02e91e6> xfs_vn_permission+0x26/0x30
<c016e220> vfs_permission+0x20/0x30 <c016e84a> __link_path_walk+0x8a/0xfa0
<c02d58cc> xfs_dir_lookup_int+0x4c/0x130 <c02da1fe> xfs_lookup+0x7e/0xa0
<c02e963e> xfs_vn_lookup+0x4e/0x90 <c016e119> __lookup_hash+0xe9/0x120
<c0170088> do_unlinkat+0xa8/0x170 <c0168947> sys_stat64+0x27/0x30
<c0102fcf> syscall_call+0x7/0xb
Don't know how many times, but many! Is that related to anything...?
Thanks!
Best regards,
Stian
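The `0x0:` line at the top of this log is a dump of the first 16 bytes of the buffer that failed the check, and it can be decoded to see what the block actually held. A minimal Python sketch follows; the function name `decode_dump_line` is an assumption, and the interpretation afterwards is a reading of the bytes, not a diagnosis confirmed in the thread.

```python
def decode_dump_line(line):
    """Turn a dump line such as '0x0: 24 73 ...' into the text its bytes spell."""
    _offset, _sep, hex_bytes = line.partition(":")
    raw = bytes.fromhex(hex_bytes.strip())
    # Replace anything outside ASCII so unexpected bytes remain visible.
    return raw.decode("ascii", errors="replace")


dump = "0x0: 24 73 74 61 74 73 20 3d 20 7b 0a 20 20 27 73 68"
print(repr(decode_dump_line(dump)))  # prints "$stats = {\n  'sh"
```

The bytes read as the start of an ordinary text file rather than directory metadata, which would be consistent with a directory pointing at a block that holds file data.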
* Re: xfs bug in 2.6.17.9?
2006-08-25 6:15 ` Stian Jordet
@ 2006-08-25 8:38 ` Stian Jordet
0 siblings, 0 replies; 9+ messages in thread
From: Stian Jordet @ 2006-08-25 8:38 UTC (permalink / raw)
To: Nathan Scott; +Cc: Justin Piszcz, xfs
Stian Jordet wrote:
> On Fri, 25 Aug 2006 at 11:58 +1000, Nathan Scott wrote:
>
>> This is likely to be lost+found being recreated each time; it's
>> normal if you don't do something about the lost+found files.
>> Once those are renamed/removed, it should run cleanly.
>>
>
> You seem to be right about that :)
>
> But when I woke up this morning, I had my logs full of this:
>
> 0x0: 24 73 74 61 74 73 20 3d 20 7b 0a 20 20 27 73 68
> Filesystem "rd/c0d1p1": XFS internal error xfs_da_do_buf(2) at line 2212 of file fs/xfs/xfs_da_btree.c. Caller 0xc029d81c
> <c02b0b0b> xfs_corruption_error+0x10b/0x140 <c029d81c> xfs_da_read_buf+0x3c/0x40
> <c02e10a1> kmem_zone_alloc+0x61/0xe0 <c029cd9a> xfs_da_buf_make+0xfa/0x150
> <c029d719> xfs_da_do_buf+0x929/0x980 <c029d81c> xfs_da_read_buf+0x3c/0x40
> <c029d81c> xfs_da_read_buf+0x3c/0x40 <c02a05fd> xfs_da_node_lookup_int+0xcd/0x3b0
> <c02a05fd> xfs_da_node_lookup_int+0xcd/0x3b0 <c02a899f> xfs_dir2_node_lookup+0x3f/0xc0
> <c02a325a> xfs_dir2_lookup+0x12a/0x130 <c02e91e6> xfs_vn_permission+0x26/0x30
> <c016e220> vfs_permission+0x20/0x30 <c016e84a> __link_path_walk+0x8a/0xfa0
> <c02d58cc> xfs_dir_lookup_int+0x4c/0x130 <c02da1fe> xfs_lookup+0x7e/0xa0
> <c02e963e> xfs_vn_lookup+0x4e/0x90 <c016e119> __lookup_hash+0xe9/0x120
> <c0170088> do_unlinkat+0xa8/0x170 <c0168947> sys_stat64+0x27/0x30
> <c0102fcf> syscall_call+0x7/0xb
>
> Don't know how many times, but many! Is that related to anything...?
>
It seems I just hadn't used a recent enough xfs_repair on that
filesystem. It seems good now. Just one last question: are you 99.5%
sure that these are the symptoms of that corruption bug in 2.6.17, so
that I can assume my memory wasn't the problem? I'm now running with
only 512MB (which I'm sure is good), and I don't want to use the new
memory if I might get this problem again (even though I have good
backups, it's a hell of a job fixing it again...)
Thank you.
Best regards,
Stian