Error (failing assert) in xfs_repair


* Error (failing assert) in xfs_repair
@ 2014-07-08 14:48 Hans Kraus
  2014-07-08 21:03 ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: Hans Kraus @ 2014-07-08 14:48 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: Type: text/html, Size: 1822 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: Error (failing assert) in xfs_repair
  2014-07-08 14:48 Error (failing assert) in xfs_repair Hans Kraus
@ 2014-07-08 21:03 ` Dave Chinner
  0 siblings, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2014-07-08 21:03 UTC (permalink / raw)
  To: Hans Kraus; +Cc: xfs

On Tue, Jul 08, 2014 at 04:48:25PM +0200, Hans Kraus wrote:
> Hi,

Hi, can you please use text-only email, not html email?

> I installed xfstools from the git repository on my debian 7 amd64
> backup server. I got the following assertion error:
> -------------------------------------------------------------------
> root@elefant:/home/kraush# xfs_repair -L /dev/vg_stor1/lv_stor1

What commit did you pull from? Also, a bit more information about
your filesystem and storage will help us:

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

> Phase 1 - find and verify superblock...
>          - reporting progress in intervals of 15 minutes
> Phase 2 - using internal log
>          - zero log...
>          - scan filesystem freespace and inode maps...
> zeroing unused portion of secondary superblock (AG #20)
> Metadata corruption detected at block 0x2b9e8a001/0x200
> zeroing unused portion of secondary superblock (AG #16)
> bad agbno 4274958142 in agfl, agno 16
> freeblk count 1 != flcount 1806214135 in ag 16
> bad agbno 131784061 for btbno root, agno 16
> bad agbno 1628187110 for btbcnt root, agno 16
> agf_freeblks 2414458438, counted 0 in ag 16
> agf_longest 1482451932, counted 0 in ag 16
> agf_btreeblks 4031360978, counted 0 in ag 16
> bad agbno 1207336865 for inobt root, agno 16
> agi_count 1835626108, counted 0 in ag 16
> agi_freecount 952362526, counted 0 in ag 16
> xfs_repair: scan.c:1579: scan_ag: Assertion `agf_dirty ||
> agfbuf->b_error != 117' failed.

So AG 16 has garbage in the superblock, the AGF and the AGI. How did
that happen? What went wrong with the storage that led you to run
xfs_repair?

I think I see the problem, but I first need to confirm what assert
is firing by matching commits.

> root@elefant:/home/kraush#
> -------------------------------------------------------------------
> What shall I do to repair the file system (about 16 TB, 88% full,
> backuppc storage medium).

You'll probably have to wait for a patch to fix the problem. The
sooner I can confirm the assert you hit, the sooner I'll be able to
get you that patch.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Error (failing assert) in xfs_repair
@ 2014-07-13 14:54 Hans Kraus
  0 siblings, 0 replies; 5+ messages in thread
From: Hans Kraus @ 2014-07-13 14:54 UTC (permalink / raw)
  To: xfs

Hi Dave,

thanks for the patch. It worked and I did not lose very many files. The good thing about backuppc is
that only deleted or corrupted files are copied anew, thanks to its file-level deduplication.

Thanks, Hans


* Re: Error (failing assert) in xfs_repair
  2014-07-08 21:50 Hans Kraus
@ 2014-07-08 22:39 ` Dave Chinner
  0 siblings, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2014-07-08 22:39 UTC (permalink / raw)
  To: Hans Kraus; +Cc: xfs

On Tue, Jul 08, 2014 at 11:50:31PM +0200, Hans Kraus wrote:
> Hi Dave,
> 
> sorry for the HTML mail. My normal mail account isn't allowed to
> send mail to the list, my provider got somehow onto a spam list. I
> had to revert to GMX and only recently learnt how to send plain
> text mails via GMX. I hope that that works now.

Looks good ;)

> The system is now down, therefore some of the details you
> requested are only from my memory.  The version of xfstools is the
> one I got with the command "git clone
> git://oss.sgi.com/xfs/cmds/xfsprogs", Sunday or Monday this week.

Ok.

> The story is as follows: my backup file system is on a Raid6
> (mdadm), on top of that lvm2 and xfs.  One of the HDs reported
> smart errors, I replaced it with a new one. During the Raid6
> resynchronisation I got a kernel panic. After reset the raid didn't
> come up. I finally started it with the '--force' parameter. After
> that the filesystem didn't mount. I issued the 'xfs_repair -L ...'
> from the current distribution, Debian 7 amd64. During the repair
> process the command stalled for more than 24 hours. After that I
> got nervous and restarted the machine.

Yup, that happens occasionally with older versions of repair - the "-P"
option probably helps there. That also helps explain why repair only
started complaining at AG 16.

> Then I downloaded the git xfstools version and compiled it. Now I
> get repeatedly the error from my first mail.

Yes, the IO verifier is reporting bp->b_error == EFSCORRUPTED on the
AGF buffer, and this assert is firing:

		ASSERT(agf_dirty || agfbuf->b_error != EFSCORRUPTED);

The issue is that the initial checks on the AGF are not resulting in
a dirty AGF because the fields that are corrupted can't be repaired
in phase 2, and hence the agf is not being dirtied despite being
corrupted.

The assert needs fixing.
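
To illustrate the above, here is a minimal, self-contained sketch of the
condition the assert tests and of the CRC accounting that sits next to it
in scan_ag(). It is not the xfs_repair code: struct sketch_buf and the
function names are hypothetical, and the numeric values are assumptions
(117 is taken from the assertion message in the report; 74 is EBADMSG on
Linux and is only assumed here to stand for EFSBADCRC).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed error values, for illustration only (see the note above). */
#define SKETCH_EFSCORRUPTED 117
#define SKETCH_EFSBADCRC     74

/* Hypothetical stand-in for an AG header buffer; only b_error matters. */
struct sketch_buf {
	int b_error;
};

/* The condition the failing ASSERT tested: a corrupt buffer must be dirty. */
bool old_assert_holds(int dirty, const struct sketch_buf *bp)
{
	return dirty || bp->b_error != SKETCH_EFSCORRUPTED;
}

/* The accounting that stays: only a bad CRC marks the header dirty. */
int dirty_after_crc_check(int dirty, const struct sketch_buf *bp)
{
	return dirty + (bp->b_error == SKETCH_EFSBADCRC);
}

/* The case from the report: the verifier flagged the AGF as corrupt,
 * but phase 2 could not repair it, so it was never dirtied. */
void sketch_demo(void)
{
	struct sketch_buf agf = { .b_error = SKETCH_EFSCORRUPTED };

	assert(!old_assert_holds(0, &agf));          /* old ASSERT would fire */
	assert(dirty_after_crc_check(0, &agf) == 0); /* header stays clean */
}
```

Calling sketch_demo() walks through the reported case: the header is
corrupt but clean, so the old condition is false, while the CRC
accounting leaves it untouched until a later phase can rebuild it.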

> Two more drives failed, I'm now back to a system without any
> redundancy. I will only power it up again when I have two new
> replacement drives (already ordered).

Taking the failed raid resync, the --force to reconstruct it and
the style of corruption being reported in the XFS metadata, I'd
say you've probably corrupted all the data on your filesystem beyond
the point you can recover any of it.

The patch below will avoid the assert failure issue so repair runs
further, but if I were you I'd be considering the filesystem and the
data it contains a complete loss and restoring from backups....

.... which you probably don't have because this is a backup server.
Who has backups of their backups?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

repair: handle uncorrected corruptions in phase 2

From: Dave Chinner <dchinner@redhat.com>

Some of the AG header corruptions detected by the IO verifiers
cannot be corrected in phase 2 when we do the initial scan of the
AGs. Correcting some errors cannot be done until a full rebuild of
the trees is done in phase 5.

Hence we can end up with a "clean" AGF/AGI buffer but have an
EFSCORRUPTED error on the buffer. This results in an assert failing:

	ASSERT(agf_dirty || agfbuf->b_error != EFSCORRUPTED);

and repair not being able to fix the problems it has tripped over.
Hence the assert that we corrected all corruptions in the buffers
is not valid and should be removed.

Reported-by: Hans Kraus <hans.w.kraus@gmx.at>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 repair/scan.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/repair/scan.c b/repair/scan.c
index f29ff8d..142d8d7 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -1572,14 +1572,13 @@ scan_ag(

 	/*
 	 * Only pay attention to CRC/verifier errors if we can correct them.
-	 * While there, ensure that we corrected a corruption error if the
-	 * verifier detected one.
+	 * Note that we can get uncorrected EFSCORRUPTED errors here because
+	 * the verifier will flag on out of range values that we can't correct
+	 * until phase 5 when we have all the information necessary to rebuild
+	 * the freespace/inode btrees. We can correct bad CRC errors
+	 * immediately, though.
 	 */
 	if (!no_modify) {
-		ASSERT(agi_dirty || agibuf->b_error != EFSCORRUPTED);
-		ASSERT(agf_dirty || agfbuf->b_error != EFSCORRUPTED);
-		ASSERT(sb_dirty || sbbuf->b_error != EFSCORRUPTED);
-
 		agi_dirty += (agibuf->b_error == EFSBADCRC);
 		agf_dirty += (agfbuf->b_error == EFSBADCRC);
 		sb_dirty += (sbbuf->b_error == EFSBADCRC);


* Re: Error (failing assert) in xfs_repair
@ 2014-07-08 21:50 Hans Kraus
  2014-07-08 22:39 ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: Hans Kraus @ 2014-07-08 21:50 UTC (permalink / raw)
  To: xfs

Hi Dave,

sorry for the HTML mail. My normal mail account isn't allowed to send mail to the list,
my provider got somehow onto a spam list. I had to revert to GMX and only recently learnt
how to send plain text mails via GMX. I hope that that works now.

The system is now down, therefore some of the details you requested are only from my memory.
The version of xfstools is the one I got with the command "git clone git://oss.sgi.com/xfs/cmds/xfsprogs",
Sunday or Monday this week.

The story is as follows: my backup file system is on a Raid6 (mdadm), on top of that lvm2 and xfs.
One of the HDs reported SMART errors, I replaced it with a new one. During the Raid6 resynchronisation
I got a kernel panic. After reset the raid didn't come up. I finally started it with the '--force'
parameter. After that the filesystem didn't mount. I issued the 'xfs_repair -L ...' from the
current distribution, Debian 7 amd64. During the repair process the command stalled for more than
24 hours. After that I got nervous and restarted the machine.

Then I downloaded the git xfstools version and compiled it. Now I get repeatedly the error from
my first mail.

Two more drives failed, I'm now back to a system without any redundancy. I will only power it up
again when I have two new replacement drives (already ordered).

Kind regards, Hans
