* xfs_repair stops on "traversing filesystem..."
From: Tomek Kruszona @ 2009-07-09 14:13 UTC
To: xfs
Hello!
I have a small problem with an XFS filesystem on one of my
machines. I am trying to run xfs_repair, which never caused any problems
before, but now it stops at:
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
CPU usage climbs to 100%. I left it running overnight hoping it would
finish by morning, but nothing had changed...
The system is Debian Lenny with current updates, a custom 2.6.30.1 kernel
and xfsprogs-2.9.8. The filesystem sits on an LVM2 logical volume.
I upgraded xfsprogs to version 3.0.2 and the problem still persisted.
Then I reverted to the 2.9.8 package from Debian Lenny.
Switching back to the default Debian 2.6.26 kernel doesn't help either.
I can mount this filesystem and operate on it.
The data on this system is not critical, because it's a backup/testing
machine, but it would be great to keep it, because resynchronizing
14TB of data will take some time.
Output from xfs_info:
# xfs_info /mnt/storage/
meta-data=/dev/mapper/p02bvg-p02blv isize=256    agcount=32, agsize=268435455 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=8410889216, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0
Any ideas how to get xfs_repair working again?
Best regards,
Tomasz Kruszona
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: xfs_repair stops on "traversing filesystem..."
From: Eric Sandeen @ 2009-07-09 14:54 UTC
To: Tomek Kruszona; +Cc: xfs
Tomek Kruszona wrote:
> Hello!
>
> I have a little problem with XFS filesystem that I have on one of my
> machines. I try to make xfs_repair that was not making any problems
> before, but xfs_repair stops on:
>
> Phase 6 - check inode connectivity...
> - resetting contents of realtime bitmap and summary inodes
> - traversing filesystem ...
>
> CPU usage grows up to 100%. I left it in the night hoping it will finish
> job till morning, but the situation hasn't changed...
...
> Any ideas how to make xfs_repair working again?
You might try running with -P, though I doubt that's the issue.
If that doesn't help, you could provide an xfs_metadump image (in public
or to me in private) and I'll take a look to see what's going on.
-Eric
* Re: xfs_repair stops on "traversing filesystem..."
From: Tomek Kruszona @ 2009-07-09 15:03 UTC
To: xfs
Eric Sandeen wrote:
> You might try running with -P, though I doubt that's the issue.
>
> If that doesn't help, you could provide an xfs_metadump image (in public
> or to me in private) and I'll take a look to see what's going on.
Thank you! I will leave xfs_repair -P to do its job and let you know
if it helps. Otherwise, I'll send you a metadump image in private.
Best regards,
Tomasz Kruszona
* Re: xfs_repair stops on "traversing filesystem..."
From: Eric Sandeen @ 2009-07-10 5:28 UTC
To: Tomek Kruszona; +Cc: xfs
Tomek Kruszona wrote:
> Hello!
>
> I have a little problem with XFS filesystem that I have on one of my
> machines. I try to make xfs_repair that was not making any problems
> before, but xfs_repair stops on:
>
> Phase 6 - check inode connectivity...
> - resetting contents of realtime bitmap and summary inodes
> - traversing filesystem ...
>
> CPU usage grows up to 100%. I left it in the night hoping it will finish
> job till morning, but the situation hasn't changed...
>
> System is Debian Lenny with current updates and custom 2.6.30.1 kernel
> xfsprogs-2.9.8. Filesysystem is placed on LVM2 Logical Volume.
>
> I upgraded xfsprogs to 3.0.2 version and the problem still persists.
> Then I reverted to 2.9.8 package from Debian Lenny.
> Switching back to debian default 2.6.26 kernel doesn't help too.
>
> I can mount this filesystem and operate on it.
>
> Data on this system is not so crucial, because it's backup/testing
> machine, but it would be great to keep this data, because synchronizing
> 14TB of data will take some time.
>
> Output from xfs_info:
> # xfs_info /mnt/storage/
> meta-data=/dev/mapper/p02bvg-p02blv isize=256    agcount=32, agsize=268435455 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=8410889216, imaxpct=5
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096
> log      =internal               bsize=4096   blocks=32768, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=0
> realtime =none                   extsz=4096   blocks=0, rtextents=0
>
> Any ideas how to make xfs_repair working again?
No fix for you yet, but it's in cache_node_get(), in the for(;;) loop,
and it looks like cache_node_allocate() fails to get a new node and we
keep spinning. I need to look some more at what's going on....
-Eric
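
To illustrate the failure mode described above, here is a minimal Python
sketch of a fixed-size cache in which every node is still referenced. It is
a toy model, not the libxfs code: only the names cache_node_get() and
cache_node_allocate() come from this thread, and the class, methods and
retry limit below are hypothetical. The idea is that a retry loop with no
exit condition never ends when nothing can be evicted, while a bounded one
can give up and say why.

```python
class CacheFull(Exception):
    """Raised when no cached node can be evicted to make room."""


class BoundedCache:
    """Toy fixed-capacity cache (hypothetical; not the libxfs implementation)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.nodes = {}  # key -> reference count

    def _allocate(self, key):
        """Try to create a node, evicting an unreferenced one if the cache is full."""
        if len(self.nodes) >= self.max_entries:
            victim = next((k for k, refs in self.nodes.items() if refs == 0), None)
            if victim is None:
                return False  # every node is still referenced, nothing to evict
            del self.nodes[victim]
        self.nodes[key] = 0
        return True

    def get(self, key, max_attempts=3):
        """Return key with one reference held, or raise CacheFull."""
        # Bounded retries stand in for the bare for(;;) loop. In this
        # single-threaded toy a retry cannot succeed where the first try
        # failed; the limit only shows where an exit condition would go.
        for _ in range(max_attempts):
            if key in self.nodes or self._allocate(key):
                self.nodes[key] += 1
                return key
        raise CacheFull(
            f"unable to free any of {self.max_entries} entries for {key!r}"
        )

    def put(self, key):
        """Drop one reference so the node can be evicted again."""
        self.nodes[key] -= 1


cache = BoundedCache(max_entries=4)
for block in range(4):
    cache.get(block)  # four blocks held, none released

try:
    cache.get(99)  # no free slot and nothing evictable
except CacheFull as err:
    print("an unbounded loop would spin here:", err)

cache.put(0)   # release one reference
cache.get(99)  # block 0 is now evictable, so this succeeds
```

With a larger max_entries the same access pattern never reaches the full
state, which is the effect a bigger hash size would be expected to have if
the cache capacity scales with it.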
* Re: xfs_repair stops on "traversing filesystem..."
From: Tomek Kruszona @ 2009-07-10 7:27 UTC
To: xfs
Eric Sandeen wrote:
> No fix for you yet, but it's in cache_node_get(), in the for(;;) loop,
> and it looks like cache_node_allocate() fails to get a new node and we
> keep spinning. I need to look some more at what's going on....
Hello!
Is this behavior specific to this particular broken filesystem, or is it
a bug in the functions you mentioned? I'm just curious :)
Best regards,
Tomasz Kruszona
* Re: xfs_repair stops on "traversing filesystem..."
From: Eric Sandeen @ 2009-07-10 14:35 UTC
To: Tomek Kruszona; +Cc: xfs
Tomek Kruszona wrote:
> Eric Sandeen wrote:
>> No fix for you yet, but it's in cache_node_get(), in the for(;;) loop,
>> and it looks like cache_node_allocate() fails to get a new node and we
>> keep spinning. I need to look some more at what's going on....
>
> Hello!
>
> Is this specific behavior for this particular broken filesystem or is it
> a bug in functions you mentioned? I'm just curious :)
I don't know yet :)
-Eric
* Re: xfs_repair stops on "traversing filesystem..."
From: Eric Sandeen @ 2009-07-10 20:17 UTC
To: Tomek Kruszona; +Cc: xfs
Tomek Kruszona wrote:
> Eric Sandeen wrote:
>> No fix for you yet, but it's in cache_node_get(), in the for(;;) loop,
>> and it looks like cache_node_allocate() fails to get a new node and we
>> keep spinning. I need to look some more at what's going on....
>
> Hello!
>
> Is this specific behavior for this particular broken filesystem or is it
> a bug in functions you mentioned? I'm just curious :)
This looks like some of the caching that xfs_repair does is mis-sized,
and it gets stuck when it's unable to find a slot for a new node to
cache. IMHO that's still a bug that I'd like to work out. If it gets
stuck this way, it'd probably be better to exit, and suggest a larger
hash size.
But anyway, I forced a bigger hash size:
xfs_repair -P -o bhash=1024 <blah>
and it did complete. 1024 is probably over the top, but it worked for
me on a 4G machine w/ some swap.
I'd strongly suggest doing a non-obfuscated xfs_metadump, do
xfs_mdrestore of that to some temp.img, run xfs_repair <blah> on that
temp.img, mount it, and see what you're left with; that way you'll know
what you're getting into w/ repair.
I ended up w/ about 5000 files in lost+found just FWIW...
Out of curiosity, do you know how the fs was damaged?
-Eric
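
The rehearsal suggested above (metadump, restore to an image, repair the
image, mount it) could be scripted roughly as follows. This is only a
sketch that assembles and prints the command lines; it runs nothing. The
dump, image and mount point names are placeholders, and the -f flag
(telling xfs_repair that the target is a regular file) is an addition that
is not spelled out in the message above.

```python
import shlex


def dry_run_commands(device, dump="fs.metadump", image="temp.img",
                     mountpoint="/mnt/test", bhash=1024):
    """Return the command lines for rehearsing a repair on a metadata image."""
    return [
        # -o disables obfuscation, so real file names survive in the dump
        ["xfs_metadump", "-o", device, dump],
        # turn the dump back into a sparse filesystem image
        ["xfs_mdrestore", dump, image],
        # -f: image is a regular file; -P and -o bhash as used in this thread
        ["xfs_repair", "-f", "-P", "-o", f"bhash={bhash}", image],
        # loop-mount the repaired image to inspect the result
        ["mount", "-o", "loop", image, mountpoint],
    ]


# Device path taken from the xfs_info output earlier in the thread.
for cmd in dry_run_commands("/dev/mapper/p02bvg-p02blv"):
    print(shlex.join(cmd))
```

Keep in mind that a metadump carries metadata only, so the restored image
shows the directory tree and what ends up in lost+found, but not the real
file contents. As far as I know xfs_metadump is also meant to be run
against an unmounted or read-only mounted filesystem.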
* Re: xfs_repair stops on "traversing filesystem..."
From: Tomek Kruszona @ 2009-07-10 21:02 UTC
To: xfs
Eric Sandeen wrote:
> This looks like some of the caching that xfs_repair does is mis-sized,
> and it gets stuck when it's unable to find a slot for a new node to
> cache. IMHO that's still a bug that I'd like to work out. If it gets
> stuck this way, it'd probably be better to exit, and suggest a larger
> hash size.
>
> But anyway, I forced a bigger hash size:
>
> xfs_repair -P -o bhash=1024 <blah>
>
> and it did complete. 1024 is probably over the top, but it worked for
> me on a 4G machine w/ some swap.
:D
Is it safe to use xfs_repair without these options after the FS has been
repaired? Or should I use them every time I have a similar problem?
> I'd strongly suggest doing a non-obfuscated xfs_metadump, do
> xfs_mdrestore of that to some temp.img, run xfs_repair <blah> on that
> temp.img, mount it, and see what you're left with; that way you'll know
> what you're getting into w/ repair.
> I ended up w/ about 5000 files in lost+found just FWIW...
It doesn't matter. There are a lot of small files on this filesystem.
They are image sequences used for video composition. It's a backup
machine, so if they're gone from the filesystem they will be copied back
from the original machine. No stress :)
I'm running xfs_repair on the image now - it's in Phase 4, and so far the
list of files looks very similar to the list I saw during xfs_repair
without the options you suggested.
> Out of curiosity, do you know how the fs was damaged?
I'm not sure. I see a few possibilities. I played with the write cache
options on the RAID controller while the FS was mounted and running.
Maybe something went wrong then... The second possible reason is that we
recently had a power loss and this machine went down :/
The last one is that I have had some problems with XFS filesystems on
LVM2. In kernels <2.6.30, barriers are automatically disabled when the
underlying device is a dm device. As I'm using RAID controllers, I should
have the write cache disabled there. After the upgrade to 2.6.30 the
message about disabled barriers disappeared and it was safe to enable the
write cache again.
Somewhere in the meantime I wanted to check that everything was OK with
the filesystem, and that is when the problem started - I couldn't finish
xfs_repair. The power loss was, IIRC, after my troubles with xfs_repair,
so the filesystem wasn't totally clean when the power failed. Maybe this
is the reason for this mess ;)
Best regards
Tomasz Kruszona
* Re: xfs_repair stops on "traversing filesystem..."
From: Eric Sandeen @ 2009-07-10 21:15 UTC
To: Tomek Kruszona; +Cc: xfs
Tomek Kruszona wrote:
> Eric Sandeen wrote:
>> This looks like some of the caching that xfs_repair does is mis-sized,
>> and it gets stuck when it's unable to find a slot for a new node to
>> cache. IMHO that's still a bug that I'd like to work out. If it gets
>> stuck this way, it'd probably be better to exit, and suggest a larger
>> hash size.
>>
>> But anyway, I forced a bigger hash size:
>>
>> xfs_repair -P -o bhash=1024 <blah>
>>
>> and it did complete. 1024 is probably over the top, but it worked for
>> me on a 4G machine w/ some swap.
> :D
>
> Is it safe to use xfs_repair without this options after the FS was
> repaired? Or maybe I should use them every time I have similar problem?
These are all good questions ;) TBH I'm kind of digging through repair
in earnest for the first time. I'm not certain why it got into this
state, whether there is some underlying bug, perhaps leaving things
wrongly referenced, or just a plain ol' mis-sizing of the caches.
I have a patch now that ends like this; if all else fails at least it'd
not spin forever, and give a hint of what to try.
-Eric
...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
unknown magic number 0 for block 8388608 in directory inode 40541
rebuilding directory inode 40541
unknown magic number 0 for block 8388608 in directory inode 48934
rebuilding directory inode 48934
unknown magic number 0 for block 8388608 in directory inode 56139
rebuilding directory inode 56139
unknown magic number 0 for block 8388608 in directory inode 63785
rebuilding directory inode 63785
Unable to free any items in cache for new node; exiting.
Try increasing the bhash and/or ihash size beyond 64
cache: 0x190ed4d0
Max supported entries = 512
Max utilized entries = 512
Active entries = 512
Hash table size = 64
Hits = 130779
Misses = 271155
Hit ratio = 32.54
MRU 0 entries = 0 ( 0%)
MRU 1 entries = 0 ( 0%)
MRU 2 entries = 0 ( 0%)
MRU 3 entries = 0 ( 0%)
MRU 4 entries = 0 ( 0%)
MRU 5 entries = 0 ( 0%)
MRU 6 entries = 0 ( 0%)
MRU 7 entries = 0 ( 0%)
MRU 8 entries = 0 ( 0%)
MRU 9 entries = 0 ( 0%)
MRU 10 entries = 0 ( 0%)
MRU 11 entries = 0 ( 0%)
MRU 12 entries = 0 ( 0%)
MRU 13 entries = 0 ( 0%)
MRU 14 entries = 0 ( 0%)
MRU 15 entries = 0 ( 0%)
Hash buckets with 2 entries 2 ( 0%)
Hash buckets with 3 entries 2 ( 1%)
Hash buckets with 4 entries 4 ( 3%)
Hash buckets with 5 entries 1 ( 0%)
Hash buckets with 6 entries 8 ( 9%)
Hash buckets with 7 entries 9 ( 12%)
Hash buckets with 8 entries 9 ( 14%)
Hash buckets with 9 entries 10 ( 17%)
Hash buckets with 10 entries 9 ( 17%)
Hash buckets with 11 entries 6 ( 12%)
Hash buckets with 12 entries 1 ( 2%)
Hash buckets with 13 entries 2 ( 5%)
Hash buckets with 14 entries 1 ( 2%)
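
The numbers in the dump above can be cross-checked with a few lines of
Python. This is just arithmetic on the values as printed; the
interpretation in the note after the code is an assumption about how the
cache accounts for nodes, not something confirmed in this thread.

```python
hits, misses = 130779, 271155
hash_size, max_entries = 64, 512

# chain length -> number of hash buckets with that many entries (from the dump)
buckets = {2: 2, 3: 2, 4: 4, 5: 1, 6: 8, 7: 9, 8: 9,
           9: 10, 10: 9, 11: 6, 12: 1, 13: 2, 14: 1}
mru_entries = [0] * 16  # MRU 0..15 each report 0 entries

hit_ratio = 100 * hits / (hits + misses)
cached = sum(length * count for length, count in buckets.items())
used_buckets = sum(buckets.values())

print(f"hit ratio      : {hit_ratio:.2f}%")
print(f"entries hashed : {cached} of {max_entries}")
print(f"buckets in use : {used_buckets} of {hash_size}")
print(f"on MRU lists   : {sum(mru_entries)}")
```

The bucket histogram accounts for all 512 entries across all 64 buckets,
and the hit ratio matches the printed 32.54. If the MRU lists hold only
nodes that are free to be reclaimed, then 512 active entries with nothing
on any MRU list would mean every slot is still in use, which fits the
"Unable to free any items" message.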
* Re: xfs_repair stops on "traversing filesystem..."
From: Tomek Kruszona @ 2009-07-10 23:44 UTC
To: xfs
Eric Sandeen wrote:
> These are all good questions ;) TBH I'm kind of digging through repair
> in earnest for the first time. I'm not certain why it got into this
> state, whether there is some underlying bug, perhaps leaving things
> wrongly referenced, or just a plain ol' mis-sizing of the caches.
>
> I have a patch now that ends like this; if all else fails at least it'd
> not spin forever, and give a hint of what to try.
>
> -Eric
>
> ...
>
> Phase 6 - check inode connectivity...
> - resetting contents of realtime bitmap and summary inodes
> - traversing filesystem ...
> unknown magic number 0 for block 8388608 in directory inode 40541
> rebuilding directory inode 40541
> unknown magic number 0 for block 8388608 in directory inode 48934
> rebuilding directory inode 48934
> unknown magic number 0 for block 8388608 in directory inode 56139
> rebuilding directory inode 56139
> unknown magic number 0 for block 8388608 in directory inode 63785
> rebuilding directory inode 63785
> Unable to free any items in cache for new node; exiting.
> Try increasing the bhash and/or ihash size beyond 64
> cache: 0x190ed4d0
> Max supported entries = 512
> Max utilized entries = 512
> Active entries = 512
> Hash table size = 64
> Hits = 130779
> Misses = 271155
> Hit ratio = 32.54
[snip]
I ran some tests, and it seems that xfs_repair only finishes on this
filesystem when it is run with bhash=1024... With default options it
still hangs at "traversing filesystem...". Is it possible to get back to
normal behavior some other way than reformatting? I also spotted
something strange. 16GB of data has been moved to lost+found. I tried to
clean out lost+found with
# rm -rf lost+found
but suddenly I got this:
# ls -l /mnt/storage/
ls: cannot access /mnt/storage/lost+found: No such file or directory
total 0
drwxr-xr-x 4 root root 33 Mar 4 17:21 l_mirror
?????????? ? ? ? ? ? lost+found
I had to run xfs_repair (bhash=1024) once again and then l+f disappeared...
So I started to wonder: does this affect the data stored on this
filesystem? I'm afraid that files on this FS may have become
inconsistent :/
Best regards,
Tomasz Kruszona
* Re: xfs_repair stops on "traversing filesystem..."
From: Eric Sandeen @ 2009-07-11 0:36 UTC
To: Tomek Kruszona; +Cc: xfs
Tomek Kruszona wrote:
...
> I made some tests and it seems, that filesystem to finish xfs_repair
> needs to be repaired with bhash=1024... With default options it still
> hangs on "traversing filesystem..." Is it possible to change this
> behavior to normal in other way than reformat?
It has nothing to do w/ the format, it's just internal to xfs_repair
while it's running, the way it caches blocks that it has recently used.
> Moreover I spotted some
> strange thing. 16GB of data has been moved to lost+found. I tried to
Yeah ...
> clean L+F by
> # rm -rf lost+found
>
> but suddenly I got this:
>
> # ls -l /mnt/storage/
> ls: cannot access /mnt/storage/lost+found: No such file or directory
> total 0
> drwxr-xr-x 4 root root 33 Mar 4 17:21 l_mirror
> ?????????? ? ? ? ? ? lost+found
huh. let me try on the repaired metadata image...
[root@bear-05 bad-repair]# mount -o loop badfs.img mnt/
[root@bear-05 bad-repair]# du -hc mnt/lost+found/
26M mnt/lost+found/1602304
5.0M mnt/lost+found/2558868
18G mnt/lost+found/
18G total
[root@bear-05 bad-repair]# rm -rf mnt/lost+found/*
[root@bear-05 bad-repair]# ls -l mnt/
total 0
drwxr-xr-x 4 root root 33 Mar 4 10:21 ??5?t??4
drwxr-xr-x 0 root root 6 Jul 10 19:48 lost+found
drwxr-xr-x 2 1000 root 6 Jun 30 08:45 tests
Did you do something else to the fs in between?
> I had to run xfs_repair (bhash=1024) once again and then l+f disappeared...
>
> So I started to think: does it have some influence on data that are
> stored on this filesystem? I'm afraid that files on this FS may become
> inconsistent :/
Well, I don't know what has gone wrong with your filesystem (or with the
storage beneath it, or whatever) - but it is certainly possible that,
with corrupted metadata, there is corrupted data as well - it all
depends on the root cause ...
-Eric
> Best regards,
> Tomasz Kruszona