* "This is a bug."
@ 2015-09-10  9:18 Tapani Tarvainen
  2015-09-10 10:31 ` Tapani Tarvainen
  2015-09-10 11:48 ` Emmanuel Florac
  0 siblings, 2 replies; 21+ messages in thread
From: Tapani Tarvainen @ 2015-09-10  9:18 UTC (permalink / raw)
  To: xfs

Hi,

After a rather spectacular crash we've got an xfs filesystem
in unrepairable state: mount fails with "Structure needs cleaning",
xfs_repair without options refuses to work (asks to mount first),
and xfs_repair -L stopped with

corrupt dinode 2151195170, extent total = 1, nblocks = 0.  This is a bug.
Please capture the filesystem metadata with xfs_metadump and
report it to xfs@oss.sgi.com.

I tried to dump the metadata and it failed, too:

# xfs_metadump /dev/sdata1/data1 /data2/tmp/data1_metadump
xfs_metadump: cannot init perag data (117)
*** glibc detected *** xfs_db: double free or corruption (!prev): 0x0000000003361000 ***
======= Backtrace: ========= 
[...]
Aborted


At this point I'm going to give up trying to recover the data
(recreate the filesystem and restore from backup), but if you want to
analyze it to find the bug, I have enough spare disk space to keep a
copy for a while (I took an LVM snapshot of it before the recovery
attempt and could dd it to another location).
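
(If it turns out to be useful, copying the snapshot off would be
something like the following - the snapshot name here is a placeholder
for whatever lvcreate -s was actually given:)

# dd if=/dev/sdata1/data1_snap of=/data2/tmp/data1.img bs=4M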

The machine is running Debian Wheezy (7.8),
kernel 3.2.0-4-amd64 #1 SMP Debian 3.2.68-1+deb7u1 x86_64 GNU/Linux,
xfsprogs version 3.1.7+b1.
And the filesystem is 6TB in size.

If you want to take a look, please let me know what I can do to help.

Thank you,

-- 
Tapani Tarvainen


* Re: "This is a bug."
  2015-09-10  9:18 "This is a bug." Tapani Tarvainen
@ 2015-09-10 10:31 ` Tapani Tarvainen
  2015-09-10 11:53   ` Emmanuel Florac
  2015-09-10 11:48 ` Emmanuel Florac
  1 sibling, 1 reply; 21+ messages in thread
From: Tapani Tarvainen @ 2015-09-10 10:31 UTC (permalink / raw)
  To: xfs

A perhaps interesting addition: xfs_metadump succeeds with
the -o option (without it, it fails consistently).
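
(For the record, the working invocation is along these lines; -o turns
off obfuscation of file names and attributes:)

# xfs_metadump -o /dev/sdata1/data1 /data2/tmp/data1_metadump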

If you think the dump would be useful to you I can probably
send it unobfuscated (would have to check but I don't think
there's anything sensitive in the filenames or attributes).

-- 
Tapani Tarvainen

On 10 Sep 12:18, Tapani Tarvainen (tapani.j.tarvainen@jyu.fi) wrote:
> 
> Hi,
> 
> After a rather spectacular crash we've got an xfs filesystem
> in unrepairable state: mount fails with "Structure needs cleaning",
> xfs_repair without options refuses to work (asks to mount first),
> and xfs_repair -L stopped with
> 
> corrupt dinode 2151195170, extent total = 1, nblocks = 0.  This is a bug.
> Please capture the filesystem metadata with xfs_metadump and
> report it to xfs@oss.sgi.com.
> 
> I tried to dump the metadata and it failed, too:
> 
> # xfs_metadump /dev/sdata1/data1 /data2/tmp/data1_metadump
> xfs_metadump: cannot init perag data (117)
> *** glibc detected *** xfs_db: double free or corruption (!prev): 0x0000000003361000 ***
> ======= Backtrace: ========= 
> [...]
> Aborted
> 
> 
> At this point I'm going to give up trying to recover the data
> (recreate filesystem and restore from backup), but if you want to
> analyze it to find the bug, I have enough spare disk space to keep a
> copy for a while (took lvm snapshot of it before recovery attempt,
> could dd it to another location).
> 
> The machine is running Debian Wheezy (7.8),
> kernel 3.2.0-4-amd64 #1 SMP Debian 3.2.68-1+deb7u1 x86_64 GNU/Linux,
> xfsprogs version 3.1.7+b1.
> And the filesystem is 6TB in size.
> 
> If you want to take a look, please let me know what I can do to help.
> 
> Thank you,
> 
> -- 
> Tapani Tarvainen


* Re: "This is a bug."
  2015-09-10  9:18 "This is a bug." Tapani Tarvainen
  2015-09-10 10:31 ` Tapani Tarvainen
@ 2015-09-10 11:48 ` Emmanuel Florac
  2015-09-10 11:55   ` Tapani Tarvainen
  1 sibling, 1 reply; 21+ messages in thread
From: Emmanuel Florac @ 2015-09-10 11:48 UTC (permalink / raw)
  To: Tapani Tarvainen; +Cc: xfs

On Thu, 10 Sep 2015 12:18:34 +0300,
Tapani Tarvainen <tapani.j.tarvainen@jyu.fi> wrote:

> xfsprogs version 3.1.7+b1.

Don't use that, it's much too old. Use at least a 3.2.x version, or if
possible the very latest version of xfs_repair. You can copy over
xfs_repair from another machine as it's a static binary.
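
(Quick way to check whether the copy really is static - a static
binary reports "not a dynamic executable" from ldd:)

# file /sbin/xfs_repair
# ldd /sbin/xfs_repair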

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


* Re: "This is a bug."
  2015-09-10 10:31 ` Tapani Tarvainen
@ 2015-09-10 11:53   ` Emmanuel Florac
  2015-09-10 12:05     ` Tapani Tarvainen
  0 siblings, 1 reply; 21+ messages in thread
From: Emmanuel Florac @ 2015-09-10 11:53 UTC (permalink / raw)
  To: Tapani Tarvainen; +Cc: xfs

On Thu, 10 Sep 2015 13:31:42 +0300,
Tapani Tarvainen <tapani.j.tarvainen@jyu.fi> wrote:

> > Hi,
> > 
> > After a rather spectacular crash we've got an xfs filesystem
> > in unrepairable state: mount fails with "Structure needs cleaning",
> > xfs_repair without options refuses to work (asks to mount first),
> > and xfs_repair -L stopped with
> > 
> > corrupt dinode 2151195170, extent total = 1, nblocks = 0.  This is
> > a bug. Please capture the filesystem metadata with xfs_metadump and
> > report it to xfs@oss.sgi.com.
> > 

If you're not afraid of binaries from the web, I've just compiled
xfs_repair 4.2.0 on a Wheezy machine:

http://update.intellique.com/pub/xfs_repair-4.2.0.gz
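
(Roughly: fetch it, unpack, mark it executable, and sanity-check the
version before pointing it at the device - -V just prints the version:)

# wget http://update.intellique.com/pub/xfs_repair-4.2.0.gz
# gunzip xfs_repair-4.2.0.gz
# chmod +x xfs_repair-4.2.0
# ./xfs_repair-4.2.0 -V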

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


* Re: "This is a bug."
  2015-09-10 11:48 ` Emmanuel Florac
@ 2015-09-10 11:55   ` Tapani Tarvainen
  2015-09-10 12:30     ` Tapani Tarvainen
  0 siblings, 1 reply; 21+ messages in thread
From: Tapani Tarvainen @ 2015-09-10 11:55 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: xfs

On 10 Sep 13:48, Emmanuel Florac (eflorac@intellique.com) wrote:

> > xfsprogs version 3.1.7+b1.
> 
> Don't use that, it's much too old. Use at least a 3.2.x version, or if
> possible the very latest version of xfs_repair. You can copy over
> xfs_repair from another machine as it's a static binary.

Seems it isn't static in Debian (copied over from a Jessie box):

# /home/tt/xfs_repair -v /dev/sdata1/data1
/home/tt/xfs_repair: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.14' not found (required by /home/tt/xfs_repair)

But if a new version is really likely to help I can build
it from source. Thank you for the suggestion.
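
(If it comes to that, building on Wheezy should go roughly like this -
the package list and tarball location are from memory, so treat them
as approximate:)

# apt-get install build-essential uuid-dev libtool gettext libblkid-dev
# wget https://www.kernel.org/pub/linux/utils/fs/xfs/xfsprogs/xfsprogs-4.2.0.tar.gz
# tar xzf xfsprogs-4.2.0.tar.gz && cd xfsprogs-4.2.0
# ./configure && make
# ./repair/xfs_repair -V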

-- 
Tapani Tarvainen


* Re: "This is a bug."
  2015-09-10 11:53   ` Emmanuel Florac
@ 2015-09-10 12:05     ` Tapani Tarvainen
  0 siblings, 0 replies; 21+ messages in thread
From: Tapani Tarvainen @ 2015-09-10 12:05 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: xfs

On 10 Sep 13:53, Emmanuel Florac (eflorac@intellique.com) wrote:

> If you're not afraid of binaries from the web, I've just compiled
> xfs-repair 4.2.0 on a Wheezy machine:

Thanks, but I'd just compiled it myself... waiting to see
what it does.

-- 
Tapani Tarvainen


* Re: "This is a bug."
  2015-09-10 11:55   ` Tapani Tarvainen
@ 2015-09-10 12:30     ` Tapani Tarvainen
  2015-09-10 12:36       ` Brian Foster
  0 siblings, 1 reply; 21+ messages in thread
From: Tapani Tarvainen @ 2015-09-10 12:30 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: xfs

On 10 Sep 14:55, Tapani Tarvainen (tapani.j.tarvainen@jyu.fi) wrote:
> On 10 Sep 13:48, Emmanuel Florac (eflorac@intellique.com) wrote:
> 
> > > xfsprogs version 3.1.7+b1.
> > 
> > Don't use that, it's much too old. Use at least a 3.2.x version, or if
> > possible the very latest version of xfs_repair.

With the self-compiled 4.2.0 version, xfs_repair without -L no longer
refuses to run, but after a while it failed with

[...]
correcting nextents for inode 2152363147
xfs_repair: dinode.c:1961: process_inode_data_fork: Assertion `err == 0' failed.
Aborted

With -L ... same result.

-- 
Tapani Tarvainen


* Re: "This is a bug."
  2015-09-10 12:30     ` Tapani Tarvainen
@ 2015-09-10 12:36       ` Brian Foster
  2015-09-10 12:54         ` Tapani Tarvainen
  0 siblings, 1 reply; 21+ messages in thread
From: Brian Foster @ 2015-09-10 12:36 UTC (permalink / raw)
  To: Tapani Tarvainen; +Cc: xfs

On Thu, Sep 10, 2015 at 03:30:30PM +0300, Tapani Tarvainen wrote:
> On 10 Sep 14:55, Tapani Tarvainen (tapani.j.tarvainen@jyu.fi) wrote:
> > On 10 Sep 13:48, Emmanuel Florac (eflorac@intellique.com) wrote:
> > 
> > > > xfsprogs version 3.1.7+b1.
> > > 
> > > Don't use that, it's much too old. Use at least a 3.2.x version, or if
> > > possible the very latest version of xfs_repair.
> 
> With (self-compiled) 4.2.0 version xfs_repair without -L no longer
> refuses to run but after a while failed with
> 
> [...]
> correcting nextents for inode 2152363147
> xfs_repair: dinode.c:1961: process_inode_data_fork: Assertion `err == 0' failed.
> Aborted
> 
> With -L ... same result.
> 

Care to post the metadump?

Brian

> -- 
> Tapani Tarvainen

* Re: "This is a bug."
  2015-09-10 12:36       ` Brian Foster
@ 2015-09-10 12:54         ` Tapani Tarvainen
  2015-09-10 13:01           ` Brian Foster
  0 siblings, 1 reply; 21+ messages in thread
From: Tapani Tarvainen @ 2015-09-10 12:54 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On 10 Sep 08:36, Brian Foster (bfoster@redhat.com) wrote:

> Care to post the metadump?

It is 2.5GB so not really nice to mail... but if you want
to take a look, here it is:

https://huom.it.jyu.fi/tmp/data1.metadump

-- 
Tapani Tarvainen


* Re: "This is a bug."
  2015-09-10 12:54         ` Tapani Tarvainen
@ 2015-09-10 13:01           ` Brian Foster
  2015-09-10 13:05             ` Tapani Tarvainen
  0 siblings, 1 reply; 21+ messages in thread
From: Brian Foster @ 2015-09-10 13:01 UTC (permalink / raw)
  To: Tapani Tarvainen; +Cc: xfs

On Thu, Sep 10, 2015 at 03:54:41PM +0300, Tapani Tarvainen wrote:
> On 10 Sep 08:36, Brian Foster (bfoster@redhat.com) wrote:
> 
> > Care to post the metadump?
> 
> It is 2.5GB so not really nice to mail... but if you want
> to take a look, here it is:
> 
> https://huom.it.jyu.fi/tmp/data1.metadump
> 

Can you compress it?

Brian

> -- 
> Tapani Tarvainen


* Re: "This is a bug."
  2015-09-10 13:01           ` Brian Foster
@ 2015-09-10 13:05             ` Tapani Tarvainen
  2015-09-10 14:51               ` Brian Foster
  0 siblings, 1 reply; 21+ messages in thread
From: Tapani Tarvainen @ 2015-09-10 13:05 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On 10 Sep 09:01, Brian Foster (bfoster@redhat.com) wrote:

> > It is 2.5GB so not really nice to mail...

> Can you compress it?

Ah. Of course, should've done it in the first place.
Still 250MB though:

https://huom.it.jyu.fi/tmp/data1.metadump.gz

-- 
Tapani Tarvainen


* Re: "This is a bug."
  2015-09-10 13:05             ` Tapani Tarvainen
@ 2015-09-10 14:51               ` Brian Foster
  2015-09-10 15:05                 ` Brian Foster
  2015-09-10 17:31                 ` Tapani Tarvainen
  0 siblings, 2 replies; 21+ messages in thread
From: Brian Foster @ 2015-09-10 14:51 UTC (permalink / raw)
  To: Tapani Tarvainen; +Cc: xfs

On Thu, Sep 10, 2015 at 04:05:30PM +0300, Tapani Tarvainen wrote:
> On 10 Sep 09:01, Brian Foster (bfoster@redhat.com) wrote:
> 
> > > It is 2.5GB so not really nice to mail...
> 
> > Can you compress it?
> 
> Ah. Of course, should've done it in the first place.
> Still 250MB though:
> 
> https://huom.it.jyu.fi/tmp/data1.metadump.gz
> 

First off, I see ~60MB of corruption output before I even get to the
reported repair failure, so this appears to be an extremely severe
corruption and I wouldn't be surprised if ultimately beyond repair (not
that it matters for you, since you are restoring from backups).

The failure itself is an assert failure against an error return value
that appears to have a fallback path, so I'm not really sure why it's
there. I tried just removing it to see what happens. It ran to
completion, but there was a ton of output, write verifier errors, etc.,
so I'm not totally sure how coherent the result is yet. I'll run another
repair pass and do some directory traversals and whatnot and see if it
explodes...
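
(For anyone following along, the test loop is roughly: restore the
metadump to a sparse image, repair that, then poke at the result - note
that a metadump carries only metadata, so file contents read back as
zeros:)

# xfs_mdrestore data1.metadump data1.img
# xfs_repair -f data1.img
# mount -o loop data1.img /mnt/test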

I suspect what's more interesting at this point is what happened to
cause this level of corruption. What kind of event led to this? Was it
a pure filesystem crash or some kind of hardware/raid failure?

Also, do you happen to know the geometry (xfs_info) of the original fs?
Repair was showing agno's up in the 20k's and now that I've mounted the
repaired image, xfs_info shows the following:

meta-data=/dev/loop0             isize=256    agcount=24576, agsize=65536 blks
         =                       sectsz=4096  attr=2, projid32bit=0
         =                       crc=0        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=1610612736, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

So that's a 6TB fs with over 24000 allocation groups of size 256MB, as
opposed to the mkfs default of 6 allocation groups of 1TB each. Is that
intentional?
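
(Sanity check on the numbers: 65536 blocks/AG * 4096 bytes/block =
256MB per AG, and 24576 AGs * 256MB = 6TB, which also matches
blocks=1610612736 * 4096 bytes = 6TB - so the geometry is internally
consistent, just unusually fine-grained.)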

Brian

> -- 
> Tapani Tarvainen

* Re: "This is a bug."
  2015-09-10 14:51               ` Brian Foster
@ 2015-09-10 15:05                 ` Brian Foster
  2015-09-10 17:52                   ` Tapani Tarvainen
  2015-09-10 17:31                 ` Tapani Tarvainen
  1 sibling, 1 reply; 21+ messages in thread
From: Brian Foster @ 2015-09-10 15:05 UTC (permalink / raw)
  To: Tapani Tarvainen; +Cc: xfs

On Thu, Sep 10, 2015 at 10:51:54AM -0400, Brian Foster wrote:
> On Thu, Sep 10, 2015 at 04:05:30PM +0300, Tapani Tarvainen wrote:
> > On 10 Sep 09:01, Brian Foster (bfoster@redhat.com) wrote:
> > 
> > > > It is 2.5GB so not really nice to mail...
> > 
> > > Can you compress it?
> > 
> > Ah. Of course, should've done it in the first place.
> > Still 250MB though:
> > 
> > https://huom.it.jyu.fi/tmp/data1.metadump.gz
> > 
> 
> First off, I see ~60MB of corruption output before I even get to the
> reported repair failure, so this appears to be an extremely severe
> corruption and I wouldn't be surprised if ultimately beyond repair (not
> that it matters for you, since you are restoring from backups).
> 
> The failure itself is an assert failure against an error return value
> that appears to have a fallback path, so I'm not really sure why it's
> there. I tried just removing it to see what happens. It ran to
> completion, but there was a ton of output, write verifier errors, etc.,
> so I'm not totally sure how coherent the result is yet. I'll run another
> repair pass and do some directory traversals and whatnot and see if it
> explodes...
> 

FWIW, the follow-up repair did come up clean, so it appears (so far) to
have put the fs back together from a metadata standpoint. That said,
more than 570k files ended up in lost+found, and who knows whether the
files themselves would have contained the expected data once all of the
bmaps are fixed up and whatnot.

Brian

> I suspect what's more interesting at this point is what happened to
> cause this level of corruption? What kind of event lead to this? Was it
> a pure filesystem crash or some kind of hardware/raid failure?
> 
> Also, do you happen to know the geometry (xfs_info) of the original fs?
> Repair was showing agno's up in the 20k's and now that I've mounted the
> repaired image, xfs_info shows the following:
> 
> meta-data=/dev/loop0             isize=256    agcount=24576, agsize=65536 blks
>          =                       sectsz=4096  attr=2, projid32bit=0
>          =                       crc=0        finobt=0 spinodes=0
> data     =                       bsize=4096   blocks=1610612736, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
> log      =internal               bsize=4096   blocks=2560, version=2
>          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> So that's a 6TB fs with over 24000 allocation groups of size 256MB, as
> opposed to the mkfs default of 6 allocation groups of 1TB each. Is that
> intentional?
> 
> Brian
> 
> > -- 
> > Tapani Tarvainen

* Re: "This is a bug."
  2015-09-10 14:51               ` Brian Foster
  2015-09-10 15:05                 ` Brian Foster
@ 2015-09-10 17:31                 ` Tapani Tarvainen
  2015-09-10 17:55                   ` Brian Foster
  1 sibling, 1 reply; 21+ messages in thread
From: Tapani Tarvainen @ 2015-09-10 17:31 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Thu, Sep 10, 2015 at 10:51:54AM -0400, Brian Foster (bfoster@redhat.com) wrote:

> First off, I see ~60MB of corruption output before I even get to the
> reported repair failure, so this appears to be an extremely severe
> corruption and I wouldn't be surprised if ultimately beyond repair

I assumed as much already.

> I suspect what's more interesting at this point is what happened to
> cause this level of corruption? What kind of event lead to this? Was it
> a pure filesystem crash or some kind of hardware/raid failure?

Hardware failure. Details are still a bit unclear, but apparently the
RAID controller went haywire, offlining the array in the middle of
heavy filesystem use.

> Also, do you happen to know the geometry (xfs_info) of the original fs?

No (and xfs_info doesn't work on the copy made after the crash, as it
can't be mounted).
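
(Untested on this image, but xfs_db can usually read the superblock of
an unmounted device read-only, so something like this might still show
the original geometry - field names as printed by its "sb 0" view:)

# xfs_db -r -c "sb 0" -c "print agcount agblocks blocksize dblocks" /dev/sdata1/data1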

> Repair was showing agno's up in the 20k's and now that I've mounted the
> repaired image, xfs_info shows the following:
[...]
> So that's a 6TB fs with over 24000 allocation groups of size 256MB, as
> opposed to the mkfs default of 6 allocation groups of 1TB each. Is that
> intentional?

Not to my knowledge. Unless I'm mistaken, the filesystem was created
while the machine was running Debian Squeeze, using whatever defaults
were back then.

-- 
Tapani Tarvainen


* Re: "This is a bug."
  2015-09-10 15:05                 ` Brian Foster
@ 2015-09-10 17:52                   ` Tapani Tarvainen
  2015-09-10 18:01                     ` Tapani Tarvainen
  0 siblings, 1 reply; 21+ messages in thread
From: Tapani Tarvainen @ 2015-09-10 17:52 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Thu, Sep 10, 2015 at 11:05:25AM -0400, Brian Foster (bfoster@redhat.com) wrote:

> FWIW, the follow up repair did come up clean so it appears (so far) to
> have put the fs back together from a metadata standpoint.

Indeed, now I can mount it. Not that I expect to find much
useful there.

Also, it turns out I have another similar case: another filesystem
(in the same RAID set) also failed, even though it had initially
mounted without problems. Now it shows the same symptoms:
neither mount nor repair works (haven't tried repair -L yet).

-- 
Tapani Tarvainen


* Re: "This is a bug."
  2015-09-10 17:31                 ` Tapani Tarvainen
@ 2015-09-10 17:55                   ` Brian Foster
  2015-09-10 18:03                     ` Tapani Tarvainen
  0 siblings, 1 reply; 21+ messages in thread
From: Brian Foster @ 2015-09-10 17:55 UTC (permalink / raw)
  To: Tapani Tarvainen; +Cc: xfs

On Thu, Sep 10, 2015 at 08:31:38PM +0300, Tapani Tarvainen wrote:
> On Thu, Sep 10, 2015 at 10:51:54AM -0400, Brian Foster (bfoster@redhat.com) wrote:
> 
> > First off, I see ~60MB of corruption output before I even get to the
> > reported repair failure, so this appears to be an extremely severe
> > corruption and I wouldn't be surprised if ultimately beyond repair
> 
> I assumed as much already.
> 
> > I suspect what's more interesting at this point is what happened to
> > cause this level of corruption? What kind of event lead to this? Was it
> > a pure filesystem crash or some kind of hardware/raid failure?
> 
> Hardware failure. Details are still a bit unclear but apparently raid
> controller went haywire, offlining the array in the middle of
> heavy filesystem use.
> 
> > Also, do you happen to know the geometry (xfs_info) of the original fs?
> 
> No (and xfs_info doesn't work on the copy made after crash as it
> can't be mounted).
> 
> > Repair was showing agno's up in the 20k's and now that I've mounted the
> > repaired image, xfs_info shows the following:
> [...]
> > So that's a 6TB fs with over 24000 allocation groups of size 256MB, as
> > opposed to the mkfs default of 6 allocation groups of 1TB each. Is that
> > intentional?
> 
> Not to my knowledge. Unless I'm mistaken, the filesystem was created
> while the machine was running Debian Squeeze, using whatever defaults
> were back then.
> 

Strange... was the filesystem created small and then grown to a much
larger size via xfs_growfs? I just formatted a 1GB fs that started with
4 allocation groups and ended up with 24576 (same as above) AGs when
grown to 6TB.
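
(Something along these lines, if anyone wants to reproduce it - paths
are throwaway and the host filesystem needs to allow a 6T sparse file:)

# truncate -s 1G /var/tmp/grow.img
# mkfs.xfs /var/tmp/grow.img
# truncate -s 6T /var/tmp/grow.img
# mkdir -p /mnt/grow && mount -o loop /var/tmp/grow.img /mnt/grow
# xfs_growfs /mnt/grow
# xfs_info /mnt/grow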

Brian

> -- 
> Tapani Tarvainen


* Re: "This is a bug."
  2015-09-10 17:52                   ` Tapani Tarvainen
@ 2015-09-10 18:01                     ` Tapani Tarvainen
  0 siblings, 0 replies; 21+ messages in thread
From: Tapani Tarvainen @ 2015-09-10 18:01 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Thu, Sep 10, 2015 at 08:52:20PM +0300, Tapani Tarvainen (tapani@tapanitarvainen.fi) wrote:

> Also, it turns out I have another similar case: another filesystem
> (in the same RAID set) failed also, even though it'd initially
> mounted without problems. Now it shows the same symptoms,
> neither mount nor repair works (have't tried repair -L yet).

Apparently it only failed after someone tried to write to it.

Potentially interesting stuff from dmesg:

[31604.130052] XFS (dm-4): xfs_da_do_buf: bno 8388608 dir: inode 964
[31604.130080] XFS (dm-4): [00] br_startoff 8388608 br_startblock -2 br_blockcount 1 br_state 0
[31604.130126] XFS (dm-4): Internal error xfs_da_do_buf(1) at line 2011 of file /build/linux-4wkEzn/linux-3.2.68/fs/xfs/xfs_da_btree.c.  Caller 0xffffffffa0371b67
[31604.130127] 
[31604.130213] Pid: 18082, comm: du Not tainted 3.2.0-4-amd64 #1 Debian 3.2.68-1+deb7u1
[...]

-- 
Tapani Tarvainen


* Re: "This is a bug."
  2015-09-10 17:55                   ` Brian Foster
@ 2015-09-10 18:03                     ` Tapani Tarvainen
  2015-09-10 18:33                       ` Brian Foster
  2015-09-11  0:12                       ` Eric Sandeen
  0 siblings, 2 replies; 21+ messages in thread
From: Tapani Tarvainen @ 2015-09-10 18:03 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Thu, Sep 10, 2015 at 01:55:58PM -0400, Brian Foster (bfoster@redhat.com) wrote:

> > > So that's a 6TB fs with over 24000 allocation groups of size 256MB, as
> > > opposed to the mkfs default of 6 allocation groups of 1TB each. Is that
> > > intentional?
> > 
> > Not to my knowledge. Unless I'm mistaken, the filesystem was created
> > while the machine was running Debian Squeeze, using whatever defaults
> > were back then.

> Strange... was the filesystem created small and then grown to a much
> larger size via xfs_growfs?

Almost certainly yes, although how small it initially was I'm not
sure.

-- 
Tapani Tarvainen


* Re: "This is a bug."
  2015-09-10 18:03                     ` Tapani Tarvainen
@ 2015-09-10 18:33                       ` Brian Foster
  2015-09-11  6:19                         ` Tapani Tarvainen
  2015-09-11  0:12                       ` Eric Sandeen
  1 sibling, 1 reply; 21+ messages in thread
From: Brian Foster @ 2015-09-10 18:33 UTC (permalink / raw)
  To: Tapani Tarvainen; +Cc: xfs

On Thu, Sep 10, 2015 at 09:03:39PM +0300, Tapani Tarvainen wrote:
> On Thu, Sep 10, 2015 at 01:55:58PM -0400, Brian Foster (bfoster@redhat.com) wrote:
> 
> > > > So that's a 6TB fs with over 24000 allocation groups of size 256MB, as
> > > > opposed to the mkfs default of 6 allocation groups of 1TB each. Is that
> > > > intentional?
> > > 
> > > Not to my knowledge. Unless I'm mistaken, the filesystem was created
> > > while the machine was running Debian Squeeze, using whatever defaults
> > > were back then.
> 
> > Strange... was the filesystem created small and then grown to a much
> > larger size via xfs_growfs?
> 
> Almost certainly yes, although how small it initially was I'm not
> sure.
> 

That probably explains that then. While growfs is obviously supported,
it's not usually a great idea to grow from something really small to
really large like this precisely because you end up with this kind of
weird geometry. mkfs tries to format the fs to an ideal default geometry
based on the current size of the device, but the allocation group size
cannot be modified once the filesystem is created. Therefore, growfs can
only add more AGs of the original size.

As a result, you end up with a 6TB filesystem with >24k allocation
groups, whereas mkfs will format a 6TB device with 6 allocation groups
by default (though I think specifying a stripe unit can tweak this). My
understanding is that this could be increased sanely on large cpu count
systems and such, but we're probably talking about going to something on
the order of 32 or 64 allocation groups as opposed to thousands.

I'd expect such a large filesystem with such small allocation groups to
probably introduce overhead in terms of metadata usage (24k agi's,
agf's, 2x free space btrees and 1x inode btree per AG), spending more
time in AG selection algorithms for allocations and whatnot, increased
fragmentation due to capping the maximum contiguous extent size,
creating more work for userspace tools such as repair, etc., and
probably to have other weird or non-obvious side effects that I'm not
familiar with.

Brian

> -- 
> Tapani Tarvainen


* Re: "This is a bug."
  2015-09-10 18:03                     ` Tapani Tarvainen
  2015-09-10 18:33                       ` Brian Foster
@ 2015-09-11  0:12                       ` Eric Sandeen
  1 sibling, 0 replies; 21+ messages in thread
From: Eric Sandeen @ 2015-09-11  0:12 UTC (permalink / raw)
  To: xfs

On 9/10/15 1:03 PM, Tapani Tarvainen wrote:
> On Thu, Sep 10, 2015 at 01:55:58PM -0400, Brian Foster (bfoster@redhat.com) wrote:
> 
>>>> So that's a 6TB fs with over 24000 allocation groups of size 256MB, as
>>>> opposed to the mkfs default of 6 allocation groups of 1TB each. Is that
>>>> intentional?
>>>
>>> Not to my knowledge. Unless I'm mistaken, the filesystem was created
>>> while the machine was running Debian Squeeze, using whatever defaults
>>> were back then.
> 
>> Strange... was the filesystem created small and then grown to a much
>> larger size via xfs_growfs?
> 
> Almost certainly yes, although how small it initially was I'm not
> sure.

Oof; with a default of 4 AGs, that means this filesystem was likely
grown from 1G to 6T.

Like Brian says, that is definitely not recommended.  ;)

-Eric


* Re: "This is a bug."
  2015-09-10 18:33                       ` Brian Foster
@ 2015-09-11  6:19                         ` Tapani Tarvainen
  0 siblings, 0 replies; 21+ messages in thread
From: Tapani Tarvainen @ 2015-09-11  6:19 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On 10 Sep 14:33, Brian Foster (bfoster@redhat.com) wrote:

> > >... was the filesystem created small and then grown to a much
> > > larger size via xfs_growfs?
> > 
> > Almost certainly yes, although how small it initially was I'm not
> > sure.

It has actually been grown several times over the years - the system is
rather old. Indeed, all its disks have been replaced with bigger ones
without reinstallation, so the filesystem could not have been
initially created as big as it is now.

> That probably explains that then. While growfs is obviously supported,
> it's not usually a great idea to grow from something really small to
> really large like this

That's good to know - but sometimes you just can't plan
ahead far enough.

> I'd expect such a large filesystem with such small allocation groups to
> probably introduce overhead in terms of metadata usage (24k agi's,
> agf's, 2x free space btrees and 1x inode btree per AG), spending more
> time in AG selection algorithms for allocations and whatnot, increased
> fragmentation due to capping the maximum contiguous extent size,
> creating more work for userspace tools such as repair, etc., and
> probably to have other weird or non-obvious side effects that I'm not
> familiar with.

So it's likely to also make it more fragile and harder to repair in
case of a disaster like this.

So, my take from this is that

(1) The bug was real, but it was just in the old version of xfs_repair
in Debian Wheezy, and even when the machine is updated to Jessie
(due soon) it's better to install the latest (4.2.0) xfsprogs from
source rather than Jessie's packaged 3.2.x; and

(2) When a filesystem grows a lot it is better to recreate it
(at least every now and then if the growth is incremental)
rather than keep growing it forever.

If there's anything you'd like to add, and especially if there
is something you'd still like to debug where I could help,
please let me know.

Thank you for your help,

-- 
Tapani Tarvainen

