* Problem recovering XFS filesystem
@ 2012-04-26 20:00 Aaron Williams
  2012-04-27 21:31 ` Michael Monnerie
  0 siblings, 1 reply; 5+ messages in thread
From: Aaron Williams @ 2012-04-26 20:00 UTC (permalink / raw)
  To: xfs



Hi,

I had an issue with my RAID system and am having problems trying to recover
my XFS filesystem.

First, I made a copy of the filesystem to another device (using dd), and I
was able to recover that image, with some data loss, by blowing away the
log. However, I would like to try to recover it properly.
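For reference, the copy was a plain dd invocation along these lines (the
image path and block size here are illustrative rather than the exact ones
used):

    dd if=/dev/sdd1 of=/backup/sdd1.img bs=1M conv=noerror,sync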

I have now extracted all of the files from the recovered image and am
trying to recover the original filesystem again without blowing away the log.

When I attempt to mount the filesystem I get the error: mount: Structure
needs cleaning

The kernel reports:
Apr 26 12:53:41 flash kernel: [388563.491665] XFS (sdd1): Mounting
Filesystem
Apr 26 12:53:41 flash kernel: [388563.503667] XFS (sdd1): Starting recovery
(logdev: internal)
Apr 26 12:53:41 flash kernel: [388563.509539] XFS: Internal error
XFS_WANT_CORRUPTED_GOTO at line 1530 of file
/home/abuild/rpmbuild/BUILD/kernel-default-3.1.10/linux-3.1/fs/xfs/xfs_alloc.c.
Caller 0xffffffffa005da7c
Apr 26 12:53:41 flash kernel: [388563.509540]
Apr 26 12:53:41 flash kernel: [388563.509542] Pid: 29146, comm: mount
Tainted: P            3.1.10-22-default #1
Apr 26 12:53:41 flash kernel: [388563.509544] Call Trace:
Apr 26 12:53:41 flash kernel: [388563.509554]  [<ffffffff810042fa>]
dump_trace+0x9a/0x270
Apr 26 12:53:41 flash kernel: [388563.509558]  [<ffffffff815266c3>]
dump_stack+0x69/0x6f
Apr 26 12:53:41 flash kernel: [388563.509589]  [<ffffffffa005b304>]
xfs_free_ag_extent+0x564/0x7c0 [xfs]
Apr 26 12:53:41 flash kernel: [388563.509629]  [<ffffffffa005da7c>]
xfs_free_extent+0xec/0x130 [xfs]
Apr 26 12:53:41 flash kernel: [388563.509670]  [<ffffffffa008b900>]
xlog_recover_process_efi+0x160/0x1b0 [xfs]
Apr 26 12:53:41 flash kernel: [388563.509733]  [<ffffffffa008cbf1>]
xlog_recover_process_efis.isra.8+0x61/0xb0 [xfs]
Apr 26 12:53:41 flash kernel: [388563.509795]  [<ffffffffa00907f0>]
xlog_recover_finish+0x20/0xb0 [xfs]
Apr 26 12:53:41 flash kernel: [388563.509859]  [<ffffffffa009337e>]
xfs_mountfs+0x43e/0x6b0 [xfs]
Apr 26 12:53:41 flash kernel: [388563.509923]  [<ffffffffa00536cd>]
xfs_fs_fill_super+0x1bd/0x270 [xfs]
Apr 26 12:53:41 flash kernel: [388563.509948]  [<ffffffff8114e6a4>]
mount_bdev+0x1b4/0x1f0
Apr 26 12:53:41 flash kernel: [388563.509951]  [<ffffffff8114ef55>]
mount_fs+0x45/0x1d0
Apr 26 12:53:41 flash kernel: [388563.509955]  [<ffffffff81167656>]
vfs_kern_mount+0x66/0xd0
Apr 26 12:53:41 flash kernel: [388563.509958]  [<ffffffff81168a33>]
do_kern_mount+0x53/0x120
Apr 26 12:53:41 flash kernel: [388563.509961]  [<ffffffff8116a4e5>]
do_mount+0x1a5/0x260
Apr 26 12:53:41 flash kernel: [388563.509964]  [<ffffffff8116a98a>]
sys_mount+0x9a/0xf0
Apr 26 12:53:41 flash kernel: [388563.509968]  [<ffffffff81546712>]
system_call_fastpath+0x16/0x1b
Apr 26 12:53:41 flash kernel: [388563.509972]  [<00007f8e22dd397a>]
0x7f8e22dd3979
Apr 26 12:53:41 flash kernel: [388563.509977] XFS (sdd1): Failed to recover
EFIs
Apr 26 12:53:41 flash kernel: [388563.509979] XFS (sdd1): log mount finish
failed

If I run xfs_repair I get the following:

./xfs_repair -v /dev/sdd1
Phase 1 - find and verify superblock...
        - block cache size set to 2282936 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 6784 tail block 6528
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
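In other words, the sequence xfs_repair is asking for would be roughly the
following (with /mnt as an example mount point):

    mount /dev/sdd1 /mnt        # mounting replays the log
    umount /mnt
    xfs_repair /dev/sdd1

and only if the mount itself fails:

    xfs_repair -L /dev/sdd1     # destroys the log; recent metadata changes may be lost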

I am running the Linux kernel 3.1.10-22 (openSUSE) and xfsprogs 3.1.8.

When I did the repair I had to blow away the log, and I had to use xfs_db to
fix some cases where blocks were claimed by multiple files. The corruption
occurred during a brief window, and the affected files were generally
unimportant. I used xfs_db to identify those files and then deleted them.
After several passes of xfs_repair, xfs_db and deleting files, I was able
to recover the filesystem.
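As a rough sketch of the kind of xfs_db inspection this involved (the inode
number here is just a placeholder; -r opens the device read-only):

    xfs_db -r -c 'inode 12345' -c 'print' /dev/sdd1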

-Aaron


* Re: Problem recovering XFS filesystem
  2012-04-26 20:00 Problem recovering XFS filesystem Aaron Williams
@ 2012-04-27 21:31 ` Michael Monnerie
  2012-04-28  2:04   ` Aaron Williams
  0 siblings, 1 reply; 5+ messages in thread
From: Michael Monnerie @ 2012-04-27 21:31 UTC (permalink / raw)
  To: xfs; +Cc: Aaron Williams



On Thursday 26 April 2012 13:00:06 Aaron Williams wrote:
> I was able to recover the filesystem.

So your RAID busted the filesystem. The devs might want an
xfs_metadump of the FS from before your repair, so they can inspect it and
improve xfs_repair.

-- 
with kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [pronounced: Prot-e-schee]
Tel: +43 660 / 415 6531


* Re: Problem recovering XFS filesystem
  2012-04-27 21:31 ` Michael Monnerie
@ 2012-04-28  2:04   ` Aaron Williams
  2012-04-29  0:35     ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: Aaron Williams @ 2012-04-28  2:04 UTC (permalink / raw)
  To: Michael Monnerie; +Cc: xfs



On Fri, Apr 27, 2012 at 2:31 PM, Michael Monnerie <
michael.monnerie@is.it-management.at> wrote:

> On Thursday 26 April 2012 13:00:06 Aaron Williams wrote:
> > I was able to recover the filesystem.
>
> So your RAID busted the filesystem. Maybe the devs could want an
> xfs_metadump of the FS before your repair, so they can inspect it and
> improve xfs_repair.
>

Hi Michael,

It appears that way, or it may be that I mounted with nobarrier and, in the
process of recovering the RAID, the information in the battery-backed RAID
cache got blown away. I have an Areca ARC-1210 controller that was in the
middle of rebuilding when I attempted to shut down and reboot my Linux
system after I mistakenly unplugged the wrong drive from my RAID array.
Another drive had failed earlier, and the array had finished rebuilding
itself onto a hot spare. I intended to remove the bad drive to replace it
but disconnected the wrong one. After I reconnected the good drive, the
array started rebuilding itself again. At that point I decided it might be
safer to shut down Linux to replace the drive, thinking the RAID controller
would pick up where it left off in the rebuild.

Linux did not shut down all the way, however. I don't know if it was waiting
for the array to finish rebuilding or if something else happened. Eventually
I hit the reset button. The RAID BIOS then reported that it could not find
the array, and I had to rebuild the array configuration. I also ran a volume
check, which found and repaired about 70,000 blocks. Needless to say, I was
quite nervous.

Once that was done Linux refused to mount the XFS partition, I think due to
corruption in the log.

I have an image of my pre-repair filesystem, made with dd, and can try to
do a metadump. The filesystem is 1.9TB in size with about 1.2TB of data in
use.
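A metadump along these lines should be doable (the output paths are
illustrative; -o disables obfuscation of file names so the dump stays
readable, and xfs_mdrestore can rebuild a sparse image from it for
inspection):

    xfs_metadump -o /dev/sdd1 /backup/sdd1.metadump
    xfs_mdrestore /backup/sdd1.metadump /backup/sdd1-meta.img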

It looks like I was able to recover everything fine after blowing away the
log. I see a bunch of files recovered in lost+found but those all appear to
be files like cached web pages, etc.

I also dumped the log to a file (128M).
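For reference, that can be done with something along these lines (assuming
the -C copy and -t transaction-view options in this version of
xfs_logprint):

    xfs_logprint -C /backup/sdd1.log /dev/sdd1   # copy the raw log to a file
    xfs_logprint -t /dev/sdd1                    # print the log as transactions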

So far it looks like any actual data loss is minimal (thankfully), and this
was a good wake-up call to start doing more frequent backups.

I also upgraded xfsprogs from 3.1.6-2.1.2 to 3.1.8 which did a much better
job at recovery than my previous attempt.

It would be nice if xfs_db would allow me to continue when the log is dirty
instead of requiring me to mount the filesystem first. It would also be
nice if xfs_logprint could try to identify the filenames of the inodes
involved.

I understand that there are plans to update XFS to include the UID in all
of the on-disk structures. Any idea on when this will happen?

-Aaron


* Re: Problem recovering XFS filesystem
  2012-04-28  2:04   ` Aaron Williams
@ 2012-04-29  0:35     ` Dave Chinner
  2012-04-29 21:55       ` Aaron Williams
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2012-04-29  0:35 UTC (permalink / raw)
  To: Aaron Williams; +Cc: Michael Monnerie, xfs

On Fri, Apr 27, 2012 at 07:04:48PM -0700, Aaron Williams wrote:
> On Fri, Apr 27, 2012 at 2:31 PM, Michael Monnerie <
> michael.monnerie@is.it-management.at> wrote:
> 
> > On Thursday 26 April 2012 13:00:06 Aaron Williams wrote:
> > > I was able to recover the filesystem.
> >
> > So your RAID busted the filesystem. Maybe the devs could want an
> > xfs_metadump of the FS before your repair, so they can inspect it and
> > improve xfs_repair.
> >
> > Hi Michael,

<snip story of woe>

> Once that was done Linux refused to mount the XFS partition, I think due to
> corruption in the log.

The reason will be in the kernel log; e.g. dmesg | tail -100 usually tells
you why it failed to mount.

> I have an image of my pre-repaired filesystem by using dd and can try and
> do a meta dump. The filesystem is 1.9TB in size with about 1.2TB of data in
> use.

ISTR that metadump needs the log to be clean first, too.

> It looks like I was able to recover everything fine after blowing away the
> log. I see a bunch of files recovered in lost+found but those all appear to
> be files like cached web pages, etc.
> 
> I also dumped the log to a file (128M).
> 
> So far it looks like any actual data loss is minimal (thankfully) and was a
> good wakeup call to start doing more frequent backups.
> 
> I also upgraded xfsprogs from 3.1.6-2.1.2 to 3.1.8 which did a much better
> job at recovery than my previous attempt.

That's good to know ;)

> It would be nice if xfs_db would allow me to continue when the log is dirty
> instead of requiring me to mount the filesystem first.

Log recovery is done by the kernel code, not userspace, which is why
there is this requirement. If the kernel can't replay it, then you
have to use xfs_repair to zero it. Unfortunately, you can't just
zero the log with xfs_repair - you could do it hackily by terminating
xfs_repair just after it has zeroed the log....

> It also would be
> nice if xfs_logprint could try and identify the filenames of the inodes
> involved.

xfs_logprint just analyses the log transactions - it knows nothing
about the structure of the filesystem and doesn't even mount it. If
you want to know the names of the inodes, then use xfs_db once you
have the inode numbers in question. That requires a full filesystem
traversal to find the name for the inode number in question, so it can
be *very* slow. Given that there can be hundreds of thousands of
unique inodes in the log, that sort of translation would be
*extremely* expensive.
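For example, once you have an inode number, a walk of the mounted
filesystem along the lines of

    find /mnt/point -inum 12345 -print

(mount point and inode number are placeholders here; xfs_ncheck on the
unmounted device is another option) turns it into a pathname - and that is
a full traversal per lookup, which is why doing it for every inode in the
log doesn't scale.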

> I understand that there are plans to update XFS to include the UID

UUID, not UID.

> in all of the on-disk structures. Any idea on when this will
> happen?

When it is ready. And then you'll have to mkfs a new filesystem to
use it, because it can't be retrofitted to existing filesystems....

I'm already pushing the infrastructure changes needed to support all the
new on-disk functionality into the kernel, so the timeframe is months for
experimental support of the new on-disk format....

Cheers,

Dave.
> 
> -Aaron



-- 
Dave Chinner
david@fromorbit.com


* Re: Problem recovering XFS filesystem
  2012-04-29  0:35     ` Dave Chinner
@ 2012-04-29 21:55       ` Aaron Williams
  0 siblings, 0 replies; 5+ messages in thread
From: Aaron Williams @ 2012-04-29 21:55 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Michael Monnerie, xfs

Hi Dave,

On 04/28/2012 05:35 PM, Dave Chinner wrote:
> On Fri, Apr 27, 2012 at 07:04:48PM -0700, Aaron Williams wrote:
>> On Fri, Apr 27, 2012 at 2:31 PM, Michael Monnerie <
>> michael.monnerie@is.it-management.at> wrote:
>>
>>> On Thursday 26 April 2012 13:00:06 Aaron Williams wrote:
>>>> I was able to recover the filesystem.
>>> So your RAID busted the filesystem. Maybe the devs could want an
>>> xfs_metadump of the FS before your repair, so they can inspect it and
>>> improve xfs_repair.
>>>
>>> Hi Michael,
> <snip story of woe>
>
>> Once that was done Linux refused to mount the XFS partition, I think due to
>> corruption in the log.
> The reason will be in the log. e.g dmesg |tail -100 usually tells
> you why it failed to mount.
I should have included the dmesg output earlier.  Here it is:

Apr 26 12:41:00 flash kernel: [387803.170457] XFS (sdd1): Mounting
Filesystem
Apr 26 12:41:00 flash kernel: [387803.181638] XFS (sdd1): Starting
recovery (logdev: internal)
Apr 26 12:41:00 flash kernel: [387803.453411] XFS: Internal error
XFS_WANT_CORRUPTED_GOTO at line 1530 of file
/home/abuild/rpmbuild/BUILD/kernel-default-3.1.10/linux-3.1/fs/xfs/xfs_alloc.c. 
Caller 0xffffffffa005da7c
Apr 26 12:41:00 flash kernel: [387803.453414]
Apr 26 12:41:00 flash kernel: [387803.453418] Pid: 28185, comm: mount
Tainted: P            3.1.10-22-default #1
Apr 26 12:41:00 flash kernel: [387803.453421] Call Trace:
Apr 26 12:41:00 flash kernel: [387803.453436]  [<ffffffff810042fa>]
dump_trace+0x9a/0x270
Apr 26 12:41:00 flash kernel: [387803.453443]  [<ffffffff815266c3>]
dump_stack+0x69/0x6f
Apr 26 12:41:00 flash kernel: [387803.453486]  [<ffffffffa005b304>]
xfs_free_ag_extent+0x564/0x7c0 [xfs]
Apr 26 12:41:00 flash kernel: [387803.453562]  [<ffffffffa005da7c>]
xfs_free_extent+0xec/0x130 [xfs]
Apr 26 12:41:00 flash kernel: [387803.453641]  [<ffffffffa008b900>]
xlog_recover_process_efi+0x160/0x1b0 [xfs]
Apr 26 12:41:00 flash kernel: [387803.453763]  [<ffffffffa008cbf1>]
xlog_recover_process_efis.isra.8+0x61/0xb0 [xfs]
Apr 26 12:41:00 flash kernel: [387803.453884]  [<ffffffffa00907f0>]
xlog_recover_finish+0x20/0xb0 [xfs]
Apr 26 12:41:00 flash kernel: [387803.454009]  [<ffffffffa009337e>]
xfs_mountfs+0x43e/0x6b0 [xfs]
Apr 26 12:41:00 flash kernel: [387803.454132]  [<ffffffffa00536cd>]
xfs_fs_fill_super+0x1bd/0x270 [xfs]
Apr 26 12:41:00 flash kernel: [387803.454180]  [<ffffffff8114e6a4>]
mount_bdev+0x1b4/0x1f0
Apr 26 12:41:00 flash kernel: [387803.454186]  [<ffffffff8114ef55>]
mount_fs+0x45/0x1d0
Apr 26 12:41:00 flash kernel: [387803.454192]  [<ffffffff81167656>]
vfs_kern_mount+0x66/0xd0
Apr 26 12:41:00 flash kernel: [387803.454197]  [<ffffffff81168a33>]
do_kern_mount+0x53/0x120
Apr 26 12:41:00 flash kernel: [387803.454202]  [<ffffffff8116a4e5>]
do_mount+0x1a5/0x260
Apr 26 12:41:00 flash kernel: [387803.454208]  [<ffffffff8116a98a>]
sys_mount+0x9a/0xf0
Apr 26 12:41:00 flash kernel: [387803.454214]  [<ffffffff81546712>]
system_call_fastpath+0x16/0x1b
Apr 26 12:41:00 flash kernel: [387803.454222]  [<00007f171d3bb97a>]
0x7f171d3bb979
Apr 26 12:41:00 flash kernel: [387803.454230] XFS (sdd1): Failed to
recover EFIs
Apr 26 12:41:00 flash kernel: [387803.454232] XFS (sdd1): log mount
finish failed
>
>> I have an image of my pre-repaired filesystem by using dd and can try and
>> do a meta dump. The filesystem is 1.9TB in size with about 1.2TB of data in
>> use.
> ISTR that metadump needs the log to be clean first, too.
What is ISTR?

> Cheers,
>
> Dave.
>> -Aaron
>

