* xfs_repair of critical volume
@ 2010-10-31  7:54 Eli Morris
  2010-10-31  9:54 ` Stan Hoeppner
                   ` (3 more replies)
  0 siblings, 4 replies; 35+ messages in thread
From: Eli Morris @ 2010-10-31  7:54 UTC (permalink / raw)
  To: xfs

Hi,

I have a large XFS filesystem (60 TB) that is composed of 5 hardware RAID 6 volumes. One of those volumes had several drives fail in a very short time, and we lost that volume. The other four volumes seem OK. We are in an even worse state because our backup unit failed a week later, when four of its drives simultaneously went offline. So we are in a very bad state.

I am able to mount the filesystem that consists of the four remaining volumes. I was thinking about running xfs_repair on the filesystem in the hope that it would recover all the files that were not on the bad volume (the files on that volume are obviously gone). Since our backup is gone, I'm very concerned about doing anything that could lose the data we still have. I ran xfs_repair with the -n flag, and I have a lengthy file of things the program would do to our filesystem. I don't have the expertise to decipher the output and figure out whether xfs_repair would fix the filesystem in a way that retains our remaining data, or whether it would, say, truncate the filesystem at the data-loss boundary (our lost volume was the middle one of the five), returning 2/5 of the filesystem, or produce some other undesirable result.

I would post the xfs_repair -n output here, but it is more than a megabyte. I'm hoping one of you XFS gurus will take pity on me and either let me send you the output to look at, give me an idea of what you think xfs_repair is likely to do if I run it, or suggest how to get back as much data as possible in this recovery.

thanks very much,

Eli


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: xfs_repair of critical volume
  2010-10-31  7:54 xfs_repair of critical volume Eli Morris
@ 2010-10-31  9:54 ` Stan Hoeppner
  2010-11-12  8:48   ` Eli Morris
  2010-10-31 14:10 ` Emmanuel Florac
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 35+ messages in thread
From: Stan Hoeppner @ 2010-10-31  9:54 UTC (permalink / raw)
  To: xfs; +Cc: ermorris

Eli Morris put forth on 10/31/2010 2:54 AM:
> Hi,
> 
> I have a large XFS filesystem (60 TB) that is composed of 5 hardware RAID 6 volumes. [...]

This isn't the storage that houses the genome data, is it?

Unfortunately I don't have an answer for you Eli, or, at least, not one
you would like to hear.  One of the devs will be able to tell you if you
need to start typing the letter of resignation or loading the suicide
pistol.  (Apologies if the attempt at humor during this difficult time
is inappropriate.  Sometimes a grin, giggle, or laugh can help with the
stress, even if for only a moment or two. :)

One thing I recommend is simply posting the xfs_repair output to a web
page so you don't have to email it to multiple people.  If you don't
have an easily accessible resource for this at the university I'll
gladly post it on my webserver and post the URL here to the XFS
list--takes me about 2 minutes.

-- 
Stan


* Re: xfs_repair of critical volume
  2010-10-31  7:54 xfs_repair of critical volume Eli Morris
  2010-10-31  9:54 ` Stan Hoeppner
@ 2010-10-31 14:10 ` Emmanuel Florac
  2010-10-31 14:41   ` Steve Costaras
  2010-10-31 16:52 ` Roger Willcocks
  2010-11-01 22:21 ` Eric Sandeen
  3 siblings, 1 reply; 35+ messages in thread
From: Emmanuel Florac @ 2010-10-31 14:10 UTC (permalink / raw)
  To: Eli Morris; +Cc: xfs

On Sun, 31 Oct 2010 00:54:13 -0700, you wrote:

> I have a large XFS filesystem (60 TB) that is composed of 5 hardware
> RAID 6 volumes. One of those volumes had several drives fail in a
> very short time and we lost that volume.

You may still have a slight chance to repair the broken RAID volume.
What is the type and model of RAID controller? What is the model of the
drives? Did you aggregate the RAID arrays with LVM?


Most drive failures (particularly on late-2009 Seagate SATA drives) are
both relatively frequent and transitory, i.e., the drives may randomly
recover after a while.

What have you tried? Can you power down the faulty RAID array
entirely and power it up again after a while? Did you try actually
freezing the failed drives (it may revive them for a while)? Did you try
running SpinRite or another utility on the broken drives individually?

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


* Re: xfs_repair of critical volume
  2010-10-31 14:10 ` Emmanuel Florac
@ 2010-10-31 14:41   ` Steve Costaras
  0 siblings, 0 replies; 35+ messages in thread
From: Steve Costaras @ 2010-10-31 14:41 UTC (permalink / raw)
  To: xfs



On 2010-10-31 09:10, Emmanuel Florac wrote:
> Did you try to actually freeze the failed drives (it may revive them 
> for a while)?

Do NOT try this. It's only good for some /very/ specific types of
issues with older drives. With an array of your size you are probably
running relatively current drives (i.e., from the past 5-7 years), and
freezing has a very large probability of causing more damage.

The other questions are to the point: they establish the circumstances
around the failure and what the state of the array was at the time.
Take your time and do not rush anything; you are already hanging over a
cliff.

The first thing, if you are able, is to make a bit copy of the physical
drives to spares, so that you can always get back to the same point
where you are now. This may not be practical with such a large array,
but if you have the means, it's worth it.
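A sketch of such a bit copy with dd. The device names below are hypothetical placeholders, and reversing if= and of= destroys the source, so treat this as an illustration rather than a recipe; the runnable part operates on an ordinary temp file so it is safe to try anywhere.

```shell
# Real-drive form (hypothetical devices: sdX is the failing source,
# sdY the spare target; conv=noerror keeps reading past bad sectors,
# conv=sync zero-pads failed blocks so the copy stays aligned):
#
#   dd if=/dev/sdX of=/dev/sdY bs=1M conv=noerror,sync
#
# Safe demonstration of the same invocation on a regular file:
printf 'critical data\n' > /tmp/src.bin
dd if=/tmp/src.bin of=/tmp/dst.img bs=1M conv=noerror,sync 2>/dev/null
# conv=sync pads the output to whole blocks, so compare only the
# original length when verifying the copy:
cmp -n "$(stat -c %s /tmp/src.bin)" /tmp/src.bin /tmp/dst.img && echo copy-verified
```

Afterwards, work only on the copies; the untouched originals remain your fallback.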

You want to start from the lowest component and work your way up. So
make sure the RAID array itself is sane before trying to fix any
volume-management layer, and fix that before looking at your
filesystems. When dealing with degraded or failed arrays, be careful
what you do if you have write cache enabled on your controllers.
Talk to the vendor! Operations you perform on the card could cause that
cached data to be lost, and it can be substantial with some controllers
(MiB to GiB ranges). Normally we run with write cache disabled (both on
the drives and on the RAID controllers) for critical data, to avoid
having too much data in flight if a problem ever did occur.

The points that Emmanuel mentioned are valid, though I would hold off
on powering down until you are able to get all the geometry information
from your RAIDs (unless you already have it). I would also hold off
until you determine whether you have any dirty caches on the RAID
controllers. Most controllers keep a rotating buffer of events,
including failure pointers; if you reboot, the re-scanning of drives on
startup may push that pointer further down the stack until it gets
lost, and then you won't be able to recover the outstanding data. I've
seen this set at 128-256 entries on various systems; another reason to
keep per-controller drive counts down.

Steve



* Re: xfs_repair of critical volume
  2010-10-31  7:54 xfs_repair of critical volume Eli Morris
  2010-10-31  9:54 ` Stan Hoeppner
  2010-10-31 14:10 ` Emmanuel Florac
@ 2010-10-31 16:52 ` Roger Willcocks
  2010-11-01 22:21 ` Eric Sandeen
  3 siblings, 0 replies; 35+ messages in thread
From: Roger Willcocks @ 2010-10-31 16:52 UTC (permalink / raw)
  To: xfs

Don't do anything which has the potential to write to your drives until you have a full bit-for-bit copy of the existing volumes.

In particular, don't run xfs_repair. This is a hardware issue. It can't be fixed with software.

Now stop and think. There's a good chance a professional data repair outfit can get stuff off your failed drives.

So before you go any further:

* carefully label all the drives, note down their serial numbers, and their positions in the array. You need to do this for the 'failed' drives too.

* speak to your raid vendor. They will have seen this before. 

* try and find out why multiple drives failed on both your main and your backup systems. Was it power related? Temperature? Vibration? Or a bad batch of disks?

* speak to the drive manufacturer. They will have seen this before.

Come back to this list and give us an update. This isn't an xfs problem per se, but there are several people here who work regularly with multi-terabyte arrays.


--
Roger



On 31 Oct 2010, at 07:54, Eli Morris wrote:

> Hi,
> 
> I have a large XFS filesystem (60 TB) that is composed of 5 hardware RAID 6 volumes. [...]


* Re: xfs_repair of critical volume
  2010-10-31  7:54 xfs_repair of critical volume Eli Morris
                   ` (2 preceding siblings ...)
  2010-10-31 16:52 ` Roger Willcocks
@ 2010-11-01 22:21 ` Eric Sandeen
  2010-11-01 23:32   ` Eli Morris
  3 siblings, 1 reply; 35+ messages in thread
From: Eric Sandeen @ 2010-11-01 22:21 UTC (permalink / raw)
  To: Eli Morris; +Cc: xfs

On 10/31/10 2:54 AM, Eli Morris wrote:
> I have a large XFS filesystem (60 TB) that is composed of 5 hardware
> RAID 6 volumes. [...]


One thing you could do is make an xfs_metadump image, xfs_mdrestore
it to a sparse file, and then do a real xfs_repair run on that.
You can then mount the repaired image and see what's there.
So from a metadata perspective, you can do a real-live repair
run on an image, and see what happens.
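Sketched as a shell plan, that workflow might look like the following. The device and paths (/dev/vg_data/vol5, /xfs_scratch, /mnt/test) are hypothetical placeholders, not anything from this thread, and the script only prints the commands, so nothing is touched until you run them by hand:

```shell
#!/bin/sh
# DEV/IMG are placeholder names; substitute your real LVM volume and a
# scratch path on a filesystem that allows a sparse multi-terabyte file.
DEV=/dev/vg_data/vol5
IMG=/xfs_scratch/vol5.img

cat <<EOF
xfs_metadump -g $DEV vol5.metadump    # metadata-only dump (no file data)
xfs_mdrestore -g vol5.metadump $IMG   # restore into a sparse image file
xfs_repair $IMG                       # real repair run, but only on the image
mount -o loop,ro $IMG /mnt/test       # inspect what repair would leave behind
EOF
```

Because the metadump carries no file contents, the image only tells you what the namespace and metadata would look like after repair, not whether the surviving data blocks are intact.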

-Eric


* Re: xfs_repair of critical volume
  2010-11-01 22:21 ` Eric Sandeen
@ 2010-11-01 23:32   ` Eli Morris
  2010-11-02  0:14     ` Eric Sandeen
  0 siblings, 1 reply; 35+ messages in thread
From: Eli Morris @ 2010-11-01 23:32 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs


On Nov 1, 2010, at 3:21 PM, Eric Sandeen wrote:

> On 10/31/10 2:54 AM, Eli Morris wrote:
>> I have a large XFS filesystem (60 TB) that is composed of 5 hardware
>> RAID 6 volumes. One of those volumes had several drives fail in a
>> very short time and we lost that volume. However, four of the volumes
>> seem OK. We are in a worse state because our backup unit failed a
>> week later when four drives simultaneously went offline. So we are in
>> a bad very state. I am able to mount the filesystem that consists of
>> the four remaining volumes. I was thinking about running xfs_repair
>> on the filesystem in hopes it would recover all the files that were
>> not on the bad volume, which are obviously gone. Since our backup is
>> gone, I'm very concerned about doing anything to lose the data that
>> will still have. I ran xfs_repair with the -n flag and I have a
>> lengthly file of things that program would do to our filesystem. I
>> don't have the expertise to decipher the output and figure out if
>> xfs_repair would fix the filesystem in a way that would retain our
>> remaining data or if it would, let's say t!
> 
> 
> One thing you could do is make an xfs_metadump image, xfs_mdrestore
> it to a sparse file, and then do a real xfs_repair run on that.
> You can then mount the repaired image and see what's there.
> So from a metadata perspective, you can do a real-live repair
> run on an image, and see what happens.
> 
> -Eric

Hi Eric,

Thanks for the suggestion. I tried it out, and this is what happened when I ran xfs_mdrestore:

# xfs_mdrestore -g xfs_dump_image vol5_dump
xfs_mdrestore: cannot set filesystem image size: File too large
# 

Any ideas? Is the file as large as the volume or something? I think you had a really good suggestion. If you know how to make this work, I think that would be great.

thanks,

Eli



* Re: xfs_repair of critical volume
  2010-11-01 23:32   ` Eli Morris
@ 2010-11-02  0:14     ` Eric Sandeen
  0 siblings, 0 replies; 35+ messages in thread
From: Eric Sandeen @ 2010-11-02  0:14 UTC (permalink / raw)
  To: Eli Morris; +Cc: xfs

On 11/1/10 6:32 PM, Eli Morris wrote:
> 
> On Nov 1, 2010, at 3:21 PM, Eric Sandeen wrote:
> 
>> On 10/31/10 2:54 AM, Eli Morris wrote:
>>> I have a large XFS filesystem (60 TB) that is composed of 5
>>> hardware RAID 6 volumes. [...]
>> 
>> 
>> One thing you could do is make an xfs_metadump image,
>> xfs_mdrestore it to a sparse file, and then do a real xfs_repair
>> run on that. You can then mount the repaired image and see what's
>> there. So from a metadata perspective, you can do a real-live
>> repair run on an image, and see what happens.
>> 
>> -Eric
> 
> Hi Eric,
> 
> Thanks for the suggestion. I tried is out and this is what happened
> when I ran xfs_mdrestore:
> 
> # xfs_mdrestore -g xfs_dump_image vol5_dump
> xfs_mdrestore: cannot set filesystem image size: File too large
> #
> 
> Any ideas? Is the file as large as the volume or something? I think
> you had a really good suggestion. If you know how to make this work,
> I think that would be great.

I'm guessing you tried to create it on an ext3 filesystem?

The file has a maximum offset == the size of the filesystem, but it is
sparse so does not take up that much disk space.

ext3 can't go beyond a 2T file offset.

Making the file "dump_image" on an xfs filesystem should do the trick.
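The sparse-file behavior is easy to demonstrate with ordinary tools (the 100 GiB size and /tmp path below are arbitrary; the same mechanics are what let a 60 TB image fit on a small scratch disk, provided the host filesystem supports large sparse files):

```shell
# Create a file whose apparent size is 100 GiB without writing any data:
truncate -s 100G /tmp/sparse.img
# Apparent size -- the offset xfs_mdrestore needs to be able to set:
stat -c 'apparent size: %s bytes' /tmp/sparse.img
# Actual allocation -- what the disk really pays for (tiny, since the
# whole file is a hole):
du -k /tmp/sparse.img
rm /tmp/sparse.img
```

On ext3 the `truncate` call itself would fail past the 2T offset limit, which is the same error xfs_mdrestore reported.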

-Eric

> thanks,
> 
> Eli
> 
> 


* Re: xfs_repair of critical volume
  2010-10-31  9:54 ` Stan Hoeppner
@ 2010-11-12  8:48   ` Eli Morris
  2010-11-12 13:22     ` Michael Monnerie
  0 siblings, 1 reply; 35+ messages in thread
From: Eli Morris @ 2010-11-12  8:48 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: xfs


On Oct 31, 2010, at 2:54 AM, Stan Hoeppner wrote:

> Eli Morris put forth on 10/31/2010 2:54 AM:
>> Hi,
>> 
>> I have a large XFS filesystem (60 TB) that is composed of 5 hardware RAID 6 volumes. [...]
> 
> This isn't the storage that houses the genome data, is it?
> [...]
> -- 
> Stan


Hi guys,

For reference: vol5 is the 62 TB XFS filesystem on CentOS 5.2 that was composed of 5 RAID units. One went bye-bye and was re-initialized. I was able to get it back into the LVM volume with the other units, and I could mount the whole thing again as vol5, just with a huge chunk missing. I want to try to repair what I have left, so I have something workable, while retaining as much of the remaining data as I can.

After thinking about a lot of options for both my failed RAIDs (including moving to another country), I converted one of our old legacy RAID units to XFS so I could do an xfs_metadump of vol5, then an xfs_mdrestore of the dump file, and then an xfs_repair on that as a test. That seemed to go OK, so I tried it on the real volume. I don't really understand what happened. Everything looks the same as before we lost 1/5 of the disk volume: du and df report the same numbers as they always have, and nothing looks missing. It must be, of course. The filesystem must be pointing to files that don't exist, or something like that. Is there a way to fix that: some sort of command to remove, say, files that no longer exist? I thought that xfs_repair would do that, but apparently not in this case.

thanks as always,

Eli





* Re: xfs_repair of critical volume
  2010-11-12  8:48   ` Eli Morris
@ 2010-11-12 13:22     ` Michael Monnerie
  2010-11-12 22:14       ` Stan Hoeppner
  2010-11-12 23:01       ` Eli Morris
  0 siblings, 2 replies; 35+ messages in thread
From: Michael Monnerie @ 2010-11-12 13:22 UTC (permalink / raw)
  To: xfs; +Cc: Eli Morris



On Friday, 12 November 2010, Eli Morris wrote:
> The filesystem must be pointing to files that don't exist, or
> something like that. Is there a way to fix that: some sort of command
> to remove, say, files that no longer exist? I thought that xfs_repair
> would do that, but apparently not in this case.

The filesystem is not optimized for "someone replaced part of the disk
contents with zeroes"; it won't detect those errors on its own. You
will have to check each file to see whether its contents are still
valid or bogus.

I find the robustness of XFS amazing: You overwrote 1/5th of the disk 
with zeroes, and it still works :-)

Now that you are in this state, I'd recommend you:
a) Make a *real* *tape* *backup*.
You learned it the hard way: a disk copy is no backup. At least I hope
you learned that lesson.
b) Maybe also copy all your files to another system, unless you trust
your backup from a) very much.
c) Reinitialize the full array. Really recreate every array, to be sure
all your RAIDs work this time.
d) Copy your data back, either from the other copy in b) or from the
tape backup in a).

Then you will see a correct view of the disk space used and which files
are still there. After that you must check every file's content; some
will be bogus.

-- 
With kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [pronounced: Prot-e-schee]
Tel: +43 660 / 415 6531

// ****** Radio interview on the topic of spam ******
// http://www.it-podcast.at/archiv.html#podcast-100716
// 
// House for sale: http://zmi.at/langegg/



* Re: xfs_repair of critical volume
  2010-11-12 13:22     ` Michael Monnerie
@ 2010-11-12 22:14       ` Stan Hoeppner
  2010-11-13  8:19         ` Emmanuel Florac
  2010-12-04 10:30         ` Martin Steigerwald
  2010-11-12 23:01       ` Eli Morris
  1 sibling, 2 replies; 35+ messages in thread
From: Stan Hoeppner @ 2010-11-12 22:14 UTC (permalink / raw)
  To: xfs

Michael Monnerie put forth on 11/12/2010 7:22 AM:

> I find the robustness of XFS amazing: You overwrote 1/5th of the disk 
> with zeroes, and it still works :-)

This isn't "robustness" Michael.  If anything it's a serious problem.
XFS is reporting that hundreds or thousands of files that have been
physically removed still exist.  Regardless of how he arrived at this
position, how is this "robust"?  Most people would consider this
inconsistency of state a "corruption" situation, not "robustness".

-- 
Stan


* Re: xfs_repair of critical volume
  2010-11-12 13:22     ` Michael Monnerie
  2010-11-12 22:14       ` Stan Hoeppner
@ 2010-11-12 23:01       ` Eli Morris
  2010-11-13 15:25         ` Michael Monnerie
  2010-11-14 11:05         ` Dave Chinner
  1 sibling, 2 replies; 35+ messages in thread
From: Eli Morris @ 2010-11-12 23:01 UTC (permalink / raw)
  To: Michael Monnerie; +Cc: xfs


On Nov 12, 2010, at 5:22 AM, Michael Monnerie wrote:

> On Friday, 12 November 2010, Eli Morris wrote:
>> The filesystem must be pointing to files that don't exist, or
>> something like that. Is there a way to fix that: some sort of command
>> to remove, say, files that no longer exist? I thought that xfs_repair
>> would do that, but apparently not in this case.
> 
> The filesystem is not optimized for "someone replaced part of the disk
> contents with zeroes"; it won't detect those errors on its own. You
> will have to check each file to see whether its contents are still
> valid or bogus.
> 
> I find the robustness of XFS amazing: You overwrote 1/5th of the disk 
> with zeroes, and it still works :-)
> 
> Now that you are in this state, I'd recommend you:
> a) Make a *real* *tape* *backup*.
> You learned it the hard way: a disk copy is no backup. At least I hope
> you learned that lesson.
> b) Maybe also copy all your files to another system, unless you trust
> your backup from a) very much.
> c) Reinitialize the full array. Really recreate every array, to be sure
> all your RAIDs work this time.
> d) Copy your data back, either from the other copy in b) or from the
> tape backup in a).
> 
> Then you will see a correct view of the disk space used and which files
> are still there. After that you must check every file's content; some
> will be bogus.


Hi Michael,

thanks for the advice. 

Let me see if I can give you and everyone else a little more information and clarify this problem somewhat. And if there is nothing practical that can be done, then OK. What I am looking for is the best PRACTICAL outcome here given our resources and if anyone has an idea that might be helpful, that would be awesome. I put practical in caps, because that is the rub in all this. We could send X to a data recovery service, but there is no money for that. We could do Y, but if it takes a couple of months to accomplish, it might be better to do Z, even though Z is riskier or deletes some amount of data, because it is cheap and only takes one day to do.


This is a small University lab setup. We do not have access to a lot of resources. We do have a partial tape backup of this data, but...

a) Backing up the full 62 TB to tape takes long enough that it is not really much help. Most days we have hundreds of GBs generated and removed. We back up about 12 TB of the most important files, and ones that don't change rapidly, but our tape backup system just cannot keep up with everything. Yes, it would be *fantastic* to have a full tape backup system that is practical and has the capacity to deal with everything. Because we have had so many problems with our storage lately, the backup is somewhat stale, partial, and a little suspect. Still, it is there, and I will investigate what can be recovered from it.

b) I don't have another system to copy the files to. Our disk backup is screwed up, and that is all of our storage. We do have a tape backup, as I mentioned, and while it is theoretically possible to dump to tape, rebuild the RAID arrays, and then dump back, the practical aspects make this a so-so option. Realistically, it would take more than a month to accomplish. It is a possibility, but not a really great one.

c) We are working on making sure everything is working OK. I think the power output from our UPS might be problematic. We are definitely investigating that, because it could be behind all these crazy problems.

d) Checking every file's content manually is not going to work. It would, literally, take years.

Again, thanks for any advice. I'm not trying to be negative, just realistic about what I have to work with in terms of resources and time. 

Would defragmenting the filesystem remove those zeroed files? Does anyone make an XFS utility program that might help? Maybe an XFS utility that can remove zeroed files from the filesystem, or remove files that were stored on the one bad LVM volume?
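One hedged sketch of such a sweep, on the assumption (not confirmed anywhere in this thread) that blocks on the re-initialized volume now read back as zeroes: flag files that contain nothing but NUL bytes as removal candidates. Files only partially on the lost volume would not be caught, and every hit should be reviewed before deletion.

```shell
# is_all_zero: true when the file is non-empty and stripping NUL bytes
# leaves nothing, i.e. the file is entirely zeroes.
is_all_zero() {
    [ -s "$1" ] && [ -z "$(tr -d '\0' < "$1" | head -c 1)" ]
}

# Demonstration on two throwaway files (paths arbitrary):
head -c 4096 /dev/zero > /tmp/zeroed.dat
printf 'real data\n' > /tmp/intact.dat
is_all_zero /tmp/zeroed.dat && echo "/tmp/zeroed.dat: candidate"
is_all_zero /tmp/intact.dat || echo "/tmp/intact.dat: keep"
```

A full sweep would be something like `find /vol5 -type f` piped through this check, writing candidates to a list for manual review rather than deleting anything on the spot.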

thanks very much,

Eli



* Re: xfs_repair of critical volume
  2010-11-12 22:14       ` Stan Hoeppner
@ 2010-11-13  8:19         ` Emmanuel Florac
  2010-11-13  9:28           ` Stan Hoeppner
  2010-12-04 10:30         ` Martin Steigerwald
  1 sibling, 1 reply; 35+ messages in thread
From: Emmanuel Florac @ 2010-11-13  8:19 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: xfs

On Fri, 12 Nov 2010 16:14:52 -0600, you wrote:

> This isn't "robustness" Michael.  If anything it's a serious problem.

I beg to disagree. Would it be better if, instead of still having some
of the data, everything was lost? At what level of accidental
destruction do you think the whole data set should be made
unavailable? 10%? 5%? 1%?

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


* Re: xfs_repair of critical volume
  2010-11-13  8:19         ` Emmanuel Florac
@ 2010-11-13  9:28           ` Stan Hoeppner
  2010-11-13 15:35             ` Michael Monnerie
  0 siblings, 1 reply; 35+ messages in thread
From: Stan Hoeppner @ 2010-11-13  9:28 UTC (permalink / raw)
  To: xfs

Emmanuel Florac put forth on 11/13/2010 2:19 AM:
> On Fri, 12 Nov 2010 16:14:52 -0600, you wrote:
> 
>> This isn't "robustness" Michael.  If anything it's a serious problem.
> 
> I beg to disagree. Would it be better if instead of still having some
> of the data, everything was lost? At what level of accidental
> destruction do you think that the whole data set should be made
> unavailable? 10%? 5? 1? 

You've missed the point of this sub-thread discussion, or I did.  He
stated that having the metadata show the files still exist is a positive
thing.  The files are gone.  I stated that this discrepancy is not a
good thing.

I believe you are confused, thinking this micro discussion is dealing
with the OP's overall situation.  It is not.  It is dealing strictly
with the issue of the lost set of disks, the files that were on them,
and the fact the metadata says they still exist.  I believe this is due
to the fact that he hasn't run a destructive xfs_repair yet, which I'm
guessing will remove those orphaned metadata entries.

Again, I'm pretty sure you misunderstood exactly what we were talking
about, or I misunderstood what he was talking about, heck, maybe both.
I absolutely was not stating anything akin to throwing the baby out with
the bath water.

-- 
Stan


* Re: xfs_repair of critical volume
  2010-11-12 23:01       ` Eli Morris
@ 2010-11-13 15:25         ` Michael Monnerie
  2010-11-14 11:05         ` Dave Chinner
  1 sibling, 0 replies; 35+ messages in thread
From: Michael Monnerie @ 2010-11-13 15:25 UTC (permalink / raw)
  To: xfs; +Cc: Eli Morris



On Saturday, 13 November 2010, Eli Morris wrote:
> This is a small University lab setup. We do not have access to a lot
> of resources. We do have a partial tape backup of this data, but...

Yes, Eli, I understand you. We also have universities as customers, and 
I know there's no money. But you're definitely deep in shit now. Isn't 
there another department with tape backup that you could "borrow" in 
this state of crisis?
 
> a) tape backup

So, if you can't do that, we forget it.
 
> b) I don't have another system to copy the files to. (disk backup)

So, you can't even copy the rest of the still-existing data away.

The way you describe it, you will have to mess around with the existing 
data. So first, did you run xfs_repair without "-n", so that it actually 
repairs whatever it can? Maybe run it several times, until no more errors 
show up. You need to ensure you are in a clean state.

Then, try to access the files that are still there. A simple script like
find /mydestroyedfs -type f -exec dd if={} of=/dev/null bs=1024k \;
would read each file once. If this causes errors, either remove the 
problematic files, or maybe xfs_repair will clean those out.
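A slightly more defensive sketch of that read test, as a shell function (the mount point and log path in the usage example are placeholders; GNU `find` and `dd` behavior assumed):

```shell
#!/bin/sh
# Sketch: read every regular file under $1 once with dd, appending any
# file that cannot be read in full to the log file $2.
scan_tree() {
    dir=$1; log=$2
    : > "$log"                       # start with an empty log
    find "$dir" -type f -print | while IFS= read -r f; do
        dd if="$f" of=/dev/null bs=1024k 2>/dev/null \
            || printf '%s\n' "$f" >> "$log"
    done
}
```

Something like `scan_tree /mydestroyedfs /tmp/unreadable.txt` would then leave a list of candidate files to delete or inspect further (filenames containing newlines would confuse the loop, which is acceptable for a one-off sweep).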

Now try to access the data with your application, and see which contents 
are still valid. I guess there will be files that are truncated, partly 
overwritten, or otherwise badly messed up. Delete all those files.

Maybe, if you're lucky, you can still use some of that data. I once had 
a filesystem where the first third of the disks had been zeroed, and 
still most files could be recovered. But then again, another customer had 
only about 5-10% overwritten, and had to drop all the data because an 
index was destroyed, making the data worthless.
It definitely depends on your app. Hopefully that app uses checksums; 
that would make your life easier now.

> c) We are working on making sure everything is working OK. I think
> the power output from our UPS might be problematic. We are
> definitely investigating that, because it could be behind all these
> crazy problems.

I generally do the following if only one UPS is available: put one 
power supply on the UPS, and the other on the normal line. I hope you 
have redundant power supplies, do you? That helps whenever the UPS goes 
crazy: at least the normal power is still available. Two different 
UPSes would be better, but budget is very often scarce.

> d) Checking every files' content manually is not something that is
> going to work. It would, literally, take years.

OK, so what do you want to do? Just use it and hope the data is valid? 
If you don't check the files, every calculation you do with that broken 
data is *bogus*, so better to delete it than to keep wrong data, no?
 
> Would de-fraging the filesystem remove those zeroed files from the
> filesystem? Does anyone make a XFS utility program that might help?
> Maybe an XFS utility that can be used to remove zeroed files from
> the filesystem? Or remove files that are stored in that one bad LVM
> volume?

Maybe xfs_db can help you find and identify files that had parts or all 
of their data in that area, and remove them.

-- 
mit freundlichen Grüssen,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [gesprochen: Prot-e-schee]
Tel: +43 660 / 415 6531

// ****** Radiointerview zum Thema Spam ******
// http://www.it-podcast.at/archiv.html#podcast-100716
// 
// Haus zu verkaufen: http://zmi.at/langegg/


* Re: xfs_repair of critical volume
  2010-11-13  9:28           ` Stan Hoeppner
@ 2010-11-13 15:35             ` Michael Monnerie
  2010-11-14  3:31               ` Stan Hoeppner
  0 siblings, 1 reply; 35+ messages in thread
From: Michael Monnerie @ 2010-11-13 15:35 UTC (permalink / raw)
  To: xfs



On Saturday, 13 November 2010, Stan Hoeppner wrote:
> You've missed the point of this sub thread discussion, or I did.  He
> stated that having the metadata show the files still exist is a
> positive thing.  The files are gone.  I stated that this discrepancy
> is not good thing.

And it's *not* a problem of the filesystem that that data is gone. The 
OP took 1/5 of the disk area and basically overwrote it with zeroes. He 
can count himself very lucky if there's even a single file still 
readable. That's definitely *robustness* of XFS. If you take 1/5th of 
any filesystem and replace it with zeroes, how many filesystems would 
still work after that, or be in a workable state?
 
> I believe you are confused, thinking this micro discussion is dealing
> with the OP's overall situation.  It is not.  It is dealing strictly
> with the issue of the lost set of disks, the files that were on them,
> and the fact the metadata says they still exist.  I believe this is
> due to the fact that he hasn't run a destructive xfs_repair yet,
> which I'm guessing will remove those orphaned metadata entries.

Maybe, but you took my message, which solely described how incredible 
it is that XFS still works, and mixed it with the wish to still have 
that data.

Yes, the OP is in the shit, but it's more or less his own fault. Having 
no full backup and destroying 1/5th of the disk is very crazy. If he 
can still recover the rest of the contents, he can count himself very 
lucky and be proud to have used XFS. Other filesystems might not have 
been so forgiving.

And I believe no "chkdsk"-type program like xfs_repair is designed to 
recover from partly overwritten disks anyway. Its sole purpose is to 
bring the filesystem back to a working state.

-- 
mit freundlichen Grüssen,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [gesprochen: Prot-e-schee]
Tel: +43 660 / 415 6531

// ****** Radiointerview zum Thema Spam ******
// http://www.it-podcast.at/archiv.html#podcast-100716
// 
// Haus zu verkaufen: http://zmi.at/langegg/


* Re: xfs_repair of critical volume
  2010-11-13 15:35             ` Michael Monnerie
@ 2010-11-14  3:31               ` Stan Hoeppner
  0 siblings, 0 replies; 35+ messages in thread
From: Stan Hoeppner @ 2010-11-14  3:31 UTC (permalink / raw)
  To: xfs

Michael Monnerie put forth on 11/13/2010 9:35 AM:

> Maybe, but you took my message, which solely described that XFS is 
> incredible to still work, and mix it with the wish to still have that 
> data.

That is not what I said at all.  I said that the metadata showing that
the files still exist, when in fact they do not, isn't a state of
affairs I'd describe as "robust".  I've stated this at least twice now,
very clearly.  You are ascribing thoughts, wishes, etc, to me, that I
never enunciated.  I made a simple remark about a very specific aspect
of the OP's situation, relating to your remark, period.  I did not
suggest an alternate behavior would be better.

-- 
Stan


* Re: xfs_repair of critical volume
  2010-11-12 23:01       ` Eli Morris
  2010-11-13 15:25         ` Michael Monnerie
@ 2010-11-14 11:05         ` Dave Chinner
  2010-11-15  4:09           ` Eli Morris
  1 sibling, 1 reply; 35+ messages in thread
From: Dave Chinner @ 2010-11-14 11:05 UTC (permalink / raw)
  To: Eli Morris; +Cc: Michael Monnerie, xfs

On Fri, Nov 12, 2010 at 03:01:47PM -0800, Eli Morris wrote:
> 
> On Nov 12, 2010, at 5:22 AM, Michael Monnerie wrote:
> 
> > On Freitag, 12. November 2010 Eli Morris wrote:
> >> The filesystem must be pointing to files that don't exist, or
> >> something like that. Is there a way to fix that, to say, remove
> >> files that don't exist anymore, sort of command? I thought that
> >> xfs_repair would do that, but apparently not in this case.
> > 
> > The filesystem is not optimized for "I replace part of the disk contents 
> > with zeroes" and find that errors. You will have to look in each file if 
> > it's contents are still valid, or maybe bogus.
....
> Let me see if I can give you and everyone else a little more
> information and clarify this problem somewhat. And if there is
> nothing practical that can be done, then OK. What I am looking for
> is the best PRACTICAL outcome here given our resources and if
> anyone has an idea that might be helpful, that would be awesome. I
> put practical in caps, because that is the rub in all this. We
> could send X to a data recovery service, but there is no money for
> that. We could do Y, but if it takes a couple of months to
> accomplish, it might be better to do Z, even though Z is riskier
> or deletes some amount of data, because it is cheap and only takes
> one day to do..

Well, the best thing you can do is work out where in the block
device the zeroed range was, and then walk the entire filesystem
running xfs_bmap on every file to work out where their physical
extents are. i.e. build a physical block map of the good and bad
regions, then find what files have bits in the bad regions.
I've seen this done before with a perl script, and it shouldn't take
more than a few hours to write and run....
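The overlap test at the heart of such a script can be sketched in shell (assuming plain `xfs_bmap` output of the form `extent: [startoffset..endoffset]: startblock..endblock`, all in 512-byte units; the numbers below are illustrative only):

```shell
#!/bin/sh
# Sketch: decide whether one xfs_bmap extent line, e.g.
#   0: [0..1053271]: 5200578944..5201632215
# intersects a bad physical range [bad_start, bad_end] (512-byte blocks).
extent_overlaps() {
    line=$1; bad_start=$2; bad_end=$3
    phys=${line##*: }                # physical part: "start..end"
    case $phys in *..*) ;; *) return 1 ;; esac   # skip "hole" entries
    p_start=${phys%%..*}
    p_end=${phys##*..}
    # Two ranges overlap iff neither lies entirely before the other.
    [ "$p_end" -ge "$bad_start" ] && [ "$p_start" -le "$bad_end" ]
}
```

A wrapper would run `xfs_bmap` on every file from `find` and flag any file for which `extent_overlaps` succeeds on at least one extent line.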

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: xfs_repair of critical volume
  2010-11-14 11:05         ` Dave Chinner
@ 2010-11-15  4:09           ` Eli Morris
  2010-11-16  0:04             ` Dave Chinner
  0 siblings, 1 reply; 35+ messages in thread
From: Eli Morris @ 2010-11-15  4:09 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs


On Nov 14, 2010, at 3:05 AM, Dave Chinner wrote:

> On Fri, Nov 12, 2010 at 03:01:47PM -0800, Eli Morris wrote:
>> 
>> On Nov 12, 2010, at 5:22 AM, Michael Monnerie wrote:
>> 
>>> On Freitag, 12. November 2010 Eli Morris wrote:
>>>> The filesystem must be pointing to files that don't exist, or
>>>> something like that. Is there a way to fix that, to say, remove
>>>> files that don't exist anymore, sort of command? I thought that
>>>> xfs_repair would do that, but apparently not in this case.
>>> 
>>> The filesystem is not optimized for "I replace part of the disk contents 
>>> with zeroes" and find that errors. You will have to look in each file if 
>>> it's contents are still valid, or maybe bogus.
> ....
>> Let me see if I can give you and everyone else a little more
>> information and clarify this problem somewhat. And if there is
>> nothing practical that can be done, then OK. What I am looking for
>> is the best PRACTICAL outcome here given our resources and if
>> anyone has an idea that might be helpful, that would be awesome. I
>> put practical in caps, because that is the rub in all this. We
>> could send X to a data recovery service, but there is no money for
>> that. We could do Y, but if it takes a couple of months to
>> accomplish, it might be better to do Z, even though Z is riskier
>> or deletes some amount of data, because it is cheap and only takes
>> one day to do..
> 
> Well, the best thing you can do is work out where in the block
> device the zeroed range was, and then walk the entire filesystem
> running xfs_bmap on every file to work out where their physical
> extents are. i.e. build a physical block map of the good and bad
> regions, then find what files have bits in the bad regions.
> I've seen this done before with a perl script, and shouldn't take
> more than a few hours to write and run....
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


Thanks a lot Dave,

I think that's a really good suggestion. I was thinking along those same lines myself. I understand how I would find where the files are located using xfs_bmap. Do you know which command I would use to find where the 'bad region' is located, so I can compare them to the file locations?

 

thanks again,

Eli



* Re: xfs_repair of critical volume
  2010-11-15  4:09           ` Eli Morris
@ 2010-11-16  0:04             ` Dave Chinner
  2010-11-17  7:29               ` Eli Morris
  0 siblings, 1 reply; 35+ messages in thread
From: Dave Chinner @ 2010-11-16  0:04 UTC (permalink / raw)
  To: Eli Morris; +Cc: xfs

On Sun, Nov 14, 2010 at 08:09:35PM -0800, Eli Morris wrote:
> On Nov 14, 2010, at 3:05 AM, Dave Chinner wrote:
> > On Fri, Nov 12, 2010 at 03:01:47PM -0800, Eli Morris wrote:
> >> On Nov 12, 2010, at 5:22 AM, Michael Monnerie wrote:
> >>> On Freitag, 12. November 2010 Eli Morris wrote:
> >>>> The filesystem must be pointing to files that don't exist, or
> >>>> something like that. Is there a way to fix that, to say, remove
> >>>> files that don't exist anymore, sort of command? I thought that
> >>>> xfs_repair would do that, but apparently not in this case.
> >>> 
> >>> The filesystem is not optimized for "I replace part of the disk contents 
> >>> with zeroes" and find that errors. You will have to look in each file if 
> >>> it's contents are still valid, or maybe bogus.
> > ....
> >> Let me see if I can give you and everyone else a little more
> >> information and clarify this problem somewhat. And if there is
> >> nothing practical that can be done, then OK. What I am looking for
> >> is the best PRACTICAL outcome here given our resources and if
> >> anyone has an idea that might be helpful, that would be awesome. I
> >> put practical in caps, because that is the rub in all this. We
> >> could send X to a data recovery service, but there is no money for
> >> that. We could do Y, but if it takes a couple of months to
> >> accomplish, it might be better to do Z, even though Z is riskier
> >> or deletes some amount of data, because it is cheap and only takes
> >> one day to do..
> > 
> > Well, the best thing you can do is work out where in the block
> > device the zeroed range was, and then walk the entire filesystem
> > running xfs_bmap on every file to work out where their physical
> > extents are. i.e. build a physical block map of the good and bad
> > regions, then find what files have bits in the bad regions.
> > I've seen this done before with a perl script, and shouldn't take
> > more than a few hours to write and run....
> 
> I think that's a really good suggestion. I was thinking along
> those same lines myself. I understand how I would find where the
> files are located using xfs_bmap. Do you know which command I
> would use to find where the 'bad region' is located, so I can
> compare them to the file locations?

There isn't a command to find the 'bad region'. The bad region(s)
need to be worked out based on the storage geometry. e.g. if you had
a linear concat of 3 luns like so:

	lun		logical offset		length
	 0		     0GB		500GB
	 1		   500GB		500GB
	 2		  1000GB		500GB

And you lost lun 1, then your bad region is from 500GB-1000GB, and
it's easy to map that. However, if you have a RAID5/6 of those luns,
it gets a whole lot more complex because you need to know how the
RAID layout works (e.g. left-asymmetric) to work out where all the
parity is stored for each stripe and hence which disk contains data.

I'm not sure what your layout is, but you should be able to
calculate the bad regions specifically from the geometry of the
storage and your knowledge of which lun got zeroed....
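For the linear-concat example, the bad range falls out of simple arithmetic; a minimal sketch (equal-size luns assumed, sizes in GB as in the table above):

```shell
#!/bin/sh
# Sketch: in a linear concat of equal-size luns, lun N covers logical
# offsets [N*size, (N+1)*size); losing lun N zeroes exactly that range.
bad_region() {
    lun=$1; lun_size_gb=$2
    start=$((lun * lun_size_gb))
    end=$(((lun + 1) * lun_size_gb))
    echo "${start}GB..${end}GB"
}
```

`bad_region 1 500` prints `500GB..1000GB`, matching the table; the RAID5/6 case needs the per-stripe parity rotation instead and cannot be reduced to one contiguous range like this.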

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: xfs_repair of critical volume
  2010-11-16  0:04             ` Dave Chinner
@ 2010-11-17  7:29               ` Eli Morris
  2010-11-17  7:47                 ` Dave Chinner
  0 siblings, 1 reply; 35+ messages in thread
From: Eli Morris @ 2010-11-17  7:29 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs


On Nov 15, 2010, at 4:04 PM, Dave Chinner wrote:

> On Sun, Nov 14, 2010 at 08:09:35PM -0800, Eli Morris wrote:
>> On Nov 14, 2010, at 3:05 AM, Dave Chinner wrote:
>>> On Fri, Nov 12, 2010 at 03:01:47PM -0800, Eli Morris wrote:
>>>> On Nov 12, 2010, at 5:22 AM, Michael Monnerie wrote:
>>>>> On Freitag, 12. November 2010 Eli Morris wrote:
>>>>>> The filesystem must be pointing to files that don't exist, or
>>>>>> something like that. Is there a way to fix that, to say, remove
>>>>>> files that don't exist anymore, sort of command? I thought that
>>>>>> xfs_repair would do that, but apparently not in this case.
>>>>> 
>>>>> The filesystem is not optimized for "I replace part of the disk contents 
>>>>> with zeroes" and find that errors. You will have to look in each file if 
>>>>> it's contents are still valid, or maybe bogus.
>>> ....
>>>> Let me see if I can give you and everyone else a little more
>>>> information and clarify this problem somewhat. And if there is
>>>> nothing practical that can be done, then OK. What I am looking for
>>>> is the best PRACTICAL outcome here given our resources and if
>>>> anyone has an idea that might be helpful, that would be awesome. I
>>>> put practical in caps, because that is the rub in all this. We
>>>> could send X to a data recovery service, but there is no money for
>>>> that. We could do Y, but if it takes a couple of months to
>>>> accomplish, it might be better to do Z, even though Z is riskier
>>>> or deletes some amount of data, because it is cheap and only takes
>>>> one day to do..
>>> 
>>> Well, the best thing you can do is work out where in the block
>>> device the zeroed range was, and then walk the entire filesystem
>>> running xfs_bmap on every file to work out where their physical
>>> extents are. i.e. build a physical block map of the good and bad
>>> regions, then find what files have bits in the bad regions.
>>> I've seen this done before with a perl script, and shouldn't take
>>> more than a few hours to write and run....
>> 
>> I think that's a really good suggestion. I was thinking along
>> those same lines myself. I understand how I would find where the
>> files are located using xfs_bmap. Do you know which command I
>> would use to find where the 'bad region' is located, so I can
>> compare them to the file locations?
> 
> There isn't a command to find the 'bad region'. The bad region(s)
> need to be worked out based on the storage geometry. e.g. if you had
> a linear concat of 3 luns like so:
> 
> 	lun		logical offset		length
> 	 0		     0GB		500GB
> 	 1		   500GB		500GB
> 	 2		  1000GB		500GB
> 
> And you lost lun 1, then your bad region is from 500GB-1000GB, and
> it's easy to map that. However, if you have a RAID5/6 of those luns,
> it gets a whole lot more complex because you need to know how the
> RAID layout works (e.g. left-asymmetric) to work out where all the
> parity is stored for each stripe and hence which disk contains data.
> 
> I'm not sure what your layout is, but you should be able to
> calculate the bad regions specifically from the geometry of the
> storage and your knowledge of which lun got zeroed....
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com



Hi Dave,

Thanks a lot for your help. I looked at the man page and elsewhere for this info and can't find what this means:


extent: [startoffset..endoffset]: startblock..endblock


I understand what an offset would be, but what the heck is a startoffset and an endoffset? 

Is the formula for the location of the file:

 startoffset + startblock through endoffset + endblock, where the blocks and the offsets are in 512 bytes?


So this file:

0: [0..1053271]: 5200578944..5201632215

would be contained from:

beginning: 	(0 + 5200578944) * 512 bytes
ending:		(1053271 + 5201632215) * 512 bytes

Is that correct?

thanks again,

Eli





* Re: xfs_repair of critical volume
  2010-11-17  7:29               ` Eli Morris
@ 2010-11-17  7:47                 ` Dave Chinner
  2010-11-30  7:22                   ` Eli Morris
  0 siblings, 1 reply; 35+ messages in thread
From: Dave Chinner @ 2010-11-17  7:47 UTC (permalink / raw)
  To: Eli Morris; +Cc: xfs

On Tue, Nov 16, 2010 at 11:29:41PM -0800, Eli Morris wrote:
> Hi Dave,
> 
> Thanks a lot for your help. I looked at the man page and elsewhere for this info and can't find what this means:
> 
> 
> extent: [startoffset..endoffset]: startblock..endblock
> 
> 
> I understand what an offset would be, but what the heck is a startoffset and an endoffset? 

startoffset: file offset of the start of the extent
endoffset: file offset of the end of the extent

> Is the formula for the location of the file:
> 
>  startoffset + startblock through endoffset + endblock, where the blocks and the offsets are in 512 bytes?

no.

> So this file:
> 
> 0: [0..1053271]: 5200578944..5201632215
> 
> would be contained from:
> 
> beginning: 	(0 + 5200578944) * 512 bytes
> ending:		(1053271 + 5201632215) * 512 bytes

No, it translates like this:

    Logical		  Physical
File Offset (bytes)	block on disk
-------------------     -------------
   0 (0..511)		5200578944
   1 (512..1023)	5200578945
   2 (1024..1535)	5200578946
 .....			.....
1053270			5201632214
1053271			5201632215
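Put differently, within one extent the mapping is a constant shift; a minimal sketch of that arithmetic (all block numbers in 512-byte units, taken from the example extent above):

```shell
#!/bin/sh
# Sketch: map a file-relative 512-byte block number inside an extent to
# its physical 512-byte block on disk. An extent is a linear run, so the
# physical block is just the extent's start block plus the offset into
# the extent.
phys_block() {
    file_blk=$1; ext_start_off=$2; ext_start_blk=$3
    echo $((ext_start_blk + (file_blk - ext_start_off)))
}
```

For the example extent `0: [0..1053271]: 5200578944..5201632215`, file block 0 maps to disk block 5200578944 and file block 1053271 to 5201632215, consistent with the table.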

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: xfs_repair of critical volume
  2010-11-17  7:47                 ` Dave Chinner
@ 2010-11-30  7:22                   ` Eli Morris
  2010-12-02 11:33                     ` Michael Monnerie
  0 siblings, 1 reply; 35+ messages in thread
From: Eli Morris @ 2010-11-30  7:22 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs


On Nov 16, 2010, at 11:47 PM, Dave Chinner wrote:

> On Tue, Nov 16, 2010 at 11:29:41PM -0800, Eli Morris wrote:
>> Hi Dave,
>> 
>> Thanks a lot for your help. I looked at the man page and elsewhere for this info and can't find what this means:
>> 
>> 
>> extent: [startoffset..endoffset]: startblock..endblock
>> 
>> 
>> I understand what an offset would be, but what the heck is a startoffset and an endoffset? 
> 
> startoffset: file offset of the start of the extent
> endoffset: file offset of the end of the extent
> 
>> Is the formula for the location of the file:
>> 
>> startoffset + startblock through endoffset + endblock, where the blocks and the offsets are in 512 bytes?
> 
> no.
> 
>> So this file:
>> 
>> 0: [0..1053271]: 5200578944..5201632215
>> 
>> would be contained from:
>> 
>> beginning: 	(0 + 5200578944) * 512 bytes
>> ending:		(1053271 + 5201632215) * 512 bytes
> 
> No, it translates like this:
> 
>    Logical		  Physical
> File Offset (bytes)	block on disk
> -------------------     -------------
>   0 (0..511)		5200578944
>   1 (512..1023)	5200578945
>   2 (1024..1535)	5200578946
> .....			.....
> 1053270			5201632214
> 1053271			5201632215
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

Hi Dave,

Thanks for your help with this. I wrote the program and ran it through, and it looks like we were able to preserve 44 TB of valid data while removing the corrupted files, which is a great result, considering the circumstances.

Thanks again,

Eli







* Re: xfs_repair of critical volume
  2010-11-30  7:22                   ` Eli Morris
@ 2010-12-02 11:33                     ` Michael Monnerie
  2010-12-03  0:58                       ` Stan Hoeppner
  2010-12-04  0:43                       ` Eli Morris
  0 siblings, 2 replies; 35+ messages in thread
From: Michael Monnerie @ 2010-12-02 11:33 UTC (permalink / raw)
  To: xfs; +Cc: Eli Morris



On Tuesday, 30 November 2010, Eli Morris wrote:
> Thanks for your help with this. I wrote the program and ran it
> through, and it looks like we were able to preserve 44 TB of valid
> data, while removing the corrupted files, which is a great result,
> considering the circumstances. 

Eli, could you post the relevant program here so others can use it if 
needed? There are requests from time to time, and it would be good if 
such a program were available (I'm sure you'd have been happy if it had 
already existed when you needed it).

Thanks, and wow: what an amazing filesystem, that it can recover from such an event!

-- 
mit freundlichen Grüssen,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [gesprochen: Prot-e-schee]
Tel: +43 660 / 415 6531

// ****** Radiointerview zum Thema Spam ******
// http://www.it-podcast.at/archiv.html#podcast-100716
// 
// Haus zu verkaufen: http://zmi.at/langegg/


* Re: xfs_repair of critical volume
  2010-12-02 11:33                     ` Michael Monnerie
@ 2010-12-03  0:58                       ` Stan Hoeppner
  2010-12-04  0:43                       ` Eli Morris
  1 sibling, 0 replies; 35+ messages in thread
From: Stan Hoeppner @ 2010-12-03  0:58 UTC (permalink / raw)
  To: xfs

Michael Monnerie put forth on 12/2/2010 5:33 AM:

> Thanks, and wow: what an amazing filesystem can recover such an event!

FSVO "recover".  It's definitely amazing that he didn't lose all of his
data as a result of his hardware failures and storage configuration.

-- 
Stan


* Re: xfs_repair of critical volume
  2010-12-02 11:33                     ` Michael Monnerie
  2010-12-03  0:58                       ` Stan Hoeppner
@ 2010-12-04  0:43                       ` Eli Morris
  1 sibling, 0 replies; 35+ messages in thread
From: Eli Morris @ 2010-12-04  0:43 UTC (permalink / raw)
  To: Michael Monnerie; +Cc: xfs




On Dec 2, 2010, at 3:33 AM, Michael Monnerie wrote:

> On Dienstag, 30. November 2010 Eli Morris wrote:
>> Thanks for your help with this. I wrote the program and ran it
>> through, and it looks like we were able to preserve 44 TB of valid
>> data, while removing the corrupted files, which is a great result,
>> considering the circumstances. 
> 
> Eli, could you post the relevant program here so others can use it if 
> needed? There are requests from time to time, and it would be good if 
> such a program were available (like I'm sure you'd been happy if it 
> already existed the time you needed it).
> 
> Thanks, and wow: what an amazing filesystem can recover such an event!
> 
> -- 
> mit freundlichen Grüssen,
> Michael Monnerie, Ing. BSc
> 
> it-management Internet Services: Protéger
> http://proteger.at [gesprochen: Prot-e-schee]
> Tel: +43 660 / 415 6531
> 
> // ****** Radiointerview zum Thema Spam ******
> // http://www.it-podcast.at/archiv.html#podcast-100716
> // 
> // Haus zu verkaufen: http://zmi.at/langegg/


Good idea, here is the program:

Eli

#!/bin/bash
# 
#    Copyright 2010 Eli Morris, Travis O'Brien, University of California 
# 
#    remove_bad.sh is free software: you can redistribute it under the  terms
#    of the GNU General Public License as published by the Free Software
#    Foundation, either version 3 of the License, or (at your option) any later
#    version. 
# 
#    This program is distributed in the hope that it will be useful, but
#    WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
#    or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
#    for more details. 
# 
#    You should have received a copy of the GNU General Public License along
#    with this program.  If not, see <http://www.gnu.org/licenses/>. 
#
#
#remove_bad.sh: A script to determine whether any part of a file falls within a
#set of blocks (indicated by arguments 1 and 2).  This script was
#originally written with the intent of finding files on a file system that
#exist(ed) on a corrupt section of the file system.  It generates a list of files
#that are potentially bad, so that they can be removed by another script.
#

#Check command line arguments; grab arguments 1 and 2
if [ $# -eq 2 ]; then
	BAD_BLOCK_BEGINNING=$1
	BAD_BLOCK_END=$2
	echo "bad block beginning $BAD_BLOCK_BEGINNING"
	echo "bad block ending $BAD_BLOCK_END"
#if there aren't exactly 2 arguments then print the usage to the user
else
	echo "usage: remove_bad.sh beginning_block ending_block"
	exit
fi

# remove file from last run
if ( test -e "./naughty_list.txt") 
then
	echo "removing the previous naughty list"
	rm "./naughty_list.txt"
fi

IFS=$'\n' #set the field separator to the newline character
ALL_FILES=(`find /export/vol5 -type f`) #A list of all files on the volume, SUBSTITUTE NAME OF YOUR VOLUME
NUM_FILES=${#ALL_FILES[@]} #The number of files on the volume
echo "number of files is $NUM_FILES" #Report the number of files to the user

# for each of the files in vol5
for (( COUNT=0; COUNT<$NUM_FILES; COUNT++))
do
    	#Report which file is being worked on
	echo "file number: $COUNT is ${ALL_FILES[$COUNT]}"

	# report number of files to go
	FILES_TO_GO=$((NUM_FILES-COUNT))
	echo "files left: $FILES_TO_GO" 

    	#Run xfs_bmap to get the blocks that the file lives within
	OUTPUT=(`xfs_bmap ${ALL_FILES[$COUNT]}`)
	# output looks like this
	# vol5dump:
	# 0: [0..1053271]: 5200578944..5201632215

	BAD_FILE=0 #Initialize the bad file flag
	NUM_LINES=${#OUTPUT[@]} #The number of lines from xfs_bmap

	# echo "number of lines for file: $NUM_LINES" #Report the number of lines to the user
    	#Loop through each line
	for (( LINE=1; LINE < $NUM_LINES; LINE++))
	do
		# echo "line number $LINE: output: ${OUTPUT[$LINE]}" #Report the current working line

		# get the block range from the line
		BLOCKS=`echo ${OUTPUT[$LINE]} | cut -d':' -f3`

       	 	#Report the number of blocks occupied
		# echo "blocks after cut: '$BLOCKS'" 
        	#Use cut to get the first and last block for the file
		FIRST_BLOCK=`echo $BLOCKS | cut -d'.' -f1` 
		LAST_BLOCK=`echo $BLOCKS | cut -d'.' -f3`
		
        	#Report these to the user
		# echo "beginning block: $FIRST_BLOCK"
		# echo "ending block: $LAST_BLOCK"

		#A 'hole' is a region of a sparse file with no blocks allocated on
		#disk, so there is nothing to check against the bad range for that line.
		if [ "$BLOCKS" != " hole" ]; then  #Don't deal with lines that report 'hole'
			# compare to bad block region
			#For now, check whether the blocks for the file fall within the user-given block range
			#if any of the blocks do, then mark this file as bad.

		  	if ( (( "$BAD_BLOCK_BEGINNING" <= "$FIRST_BLOCK")) && (( "$FIRST_BLOCK" <= "$BAD_BLOCK_END")) ); then
				  # extent starts inside the bad region
				  BAD_FILE=1
				  break
		  	elif ( (( "$BAD_BLOCK_BEGINNING" <= "$LAST_BLOCK")) && (( "$LAST_BLOCK" <= "$BAD_BLOCK_END")) ); then
				  # extent ends inside the bad region
				  BAD_FILE=1
				  break
		  	elif ( (( "$FIRST_BLOCK" < "$BAD_BLOCK_BEGINNING")) && (( "$BAD_BLOCK_END" < "$LAST_BLOCK")) ); then
				  # extent completely spans the bad region
				  BAD_FILE=1
				  break
		  	fi
		fi
	done
	# add the file to the list of bad files
	if (($BAD_FILE == 1)); then
                #Report to the user that the current file is bad
		echo "putting file: ${ALL_FILES[$COUNT]} on the naughty list"
                #Write the file's name to the list
		echo "${ALL_FILES[$COUNT]}" >> naughty_list.txt
	fi
done
echo "program_ended_succesfully" >> naughty_list.txt
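The list produced above ends with a marker line and may name files that have since vanished, so a companion cleanup pass should be defensive. The following is a minimal sketch (not part of the original post; the function name and quarantine layout are invented) that moves listed files into a quarantine directory, preserving their paths, rather than deleting them outright:

```shell
#!/bin/bash
# quarantine_bad: hypothetical companion to remove_bad.sh. Reads a
# naughty_list.txt-style file and moves each listed file into a quarantine
# directory, keeping its original path underneath it, so a false positive
# can be recovered by hand later.
quarantine_bad() {
	LIST=$1
	QUARANTINE=$2
	mkdir -p "$QUARANTINE"
	while IFS= read -r FILE; do
		# skip the end-of-run marker line and anything not a regular file
		[ -f "$FILE" ] || continue
		DEST="$QUARANTINE/$(dirname "$FILE")"
		mkdir -p "$DEST"
		mv -- "$FILE" "$DEST/"
	done < "$LIST"
}
```

For example, `quarantine_bad ./naughty_list.txt /export/vol5_quarantine`; files later judged good can simply be moved back.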



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: xfs_repair of critical volume
  2010-11-12 22:14       ` Stan Hoeppner
  2010-11-13  8:19         ` Emmanuel Florac
@ 2010-12-04 10:30         ` Martin Steigerwald
  2010-12-05  4:49           ` Stan Hoeppner
  1 sibling, 1 reply; 35+ messages in thread
From: Martin Steigerwald @ 2010-12-04 10:30 UTC (permalink / raw)
  To: xfs

On Friday, 12 November 2010, Stan Hoeppner wrote:
> Michael Monnerie put forth on 11/12/2010 7:22 AM:
> > I find the robustness of XFS amazing: You overwrote 1/5th of the disk
> > with zeroes, and it still works :-)
> 
> This isn't "robustness" Michael.  If anything it's a serious problem.
> XFS is reporting that hundreds or thousands of files that have been
> physically removed still exist.  Regardless of how he arrived at this
> position, how is this "robust"?  Most people would consider this
> inconsistency of state a "corruption" situation, not "robustness".

I think it's necessary to differentiate here:

1) It appears to be robustness - or pure luck - regarding metadata 
consistency of the filesystem. I tend to believe it's pure luck and that XFS 
just stored the metadata on the other RAID arrays.

2) XFS does not seem to have a way to detect whether file contents are 
still valid and consistent. It shares that with, I think, every other Linux 
filesystem except BTRFS, which uses checksumming for files. (Maybe NILFS as 
well, I don't know, and the FUSE or the other ZFS port.)
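Absent filesystem-level checksumming, that kind of content verification is easy to retrofit in userspace. A minimal sketch (an editorial addition, not from the thread; the tool choice and manifest layout are arbitrary): record SHA-256 checksums while the data is known good, then re-check after an incident to see exactly which files were silently corrupted:

```shell
# make_manifest: record a checksum for every file under a directory.
# verify_manifest: re-check the manifest; prints one "path: FAILED" line
# per file whose contents no longer match, and exits non-zero.
make_manifest() {
	find "$1" -type f -print0 | xargs -0 sha256sum > "$2"
}
verify_manifest() {
	sha256sum -c --quiet "$1"
}
```

E.g. run `make_manifest /export/vol5 vol5.sha256` right after a verified backup; after a failure, `verify_manifest vol5.sha256` names the damaged files instead of leaving you to guess.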

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: xfs_repair of critical volume
  2010-12-04 10:30         ` Martin Steigerwald
@ 2010-12-05  4:49           ` Stan Hoeppner
  2010-12-05  9:44             ` Roger Willcocks
  0 siblings, 1 reply; 35+ messages in thread
From: Stan Hoeppner @ 2010-12-05  4:49 UTC (permalink / raw)
  To: xfs

Martin Steigerwald put forth on 12/4/2010 4:30 AM:
> On Friday, 12 November 2010, Stan Hoeppner wrote:
>> Michael Monnerie put forth on 11/12/2010 7:22 AM:
>>> I find the robustness of XFS amazing: You overwrote 1/5th of the disk
>>> with zeroes, and it still works :-)
>>
>> This isn't "robustness" Michael.  If anything it's a serious problem.
>> XFS is reporting that hundreds or thousands of files that have been
>> physically removed still exist.  Regardless of how he arrived at this
>> position, how is this "robust"?  Most people would consider this
>> inconsistency of state a "corruption" situation, not "robustness".
> 
> I think it's necessary to differentiate here:
> 
> 1) It appears to be robustness - or pure luck - regarding metadata 
> consistency of the filesystem. I tend to believe it's pure luck and that XFS 
> just stored the metadata on the other RAID arrays.
> 
> 2) XFS does not seem to have a way to detect whether file contents are 
> still valid and consistent. It shares that with, I think, every other Linux 
> filesystem except BTRFS, which uses checksumming for files. (Maybe NILFS as 
> well, I don't know, and the FUSE or the other ZFS port.)

After re-reading my own words above, I feel a need to clarify
something:  I took exception merely to the description of "robustness"
being used in this situation.  I was not and am not being derogatory of
XFS in any way.  I love XFS.  Of all available filesystems (on any OS) I
feel it is the best.  That's why I use it. :)

In this scenario, other filesystems may have left the OP empty handed.
So, I guess XFS deserves a positive attribution for this.  But,
again, I don't think "robustness" is the correct attribution here.

-- 
Stan



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: xfs_repair of critical volume
  2010-12-05  4:49           ` Stan Hoeppner
@ 2010-12-05  9:44             ` Roger Willcocks
  0 siblings, 0 replies; 35+ messages in thread
From: Roger Willcocks @ 2010-12-05  9:44 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: xfs


On 5 Dec 2010, at 04:49, Stan Hoeppner wrote:

> Martin Steigerwald put forth on 12/4/2010 4:30 AM:
>> On Friday, 12 November 2010, Stan Hoeppner wrote:
>>> Michael Monnerie put forth on 11/12/2010 7:22 AM:
>>>> I find the robustness of XFS amazing: You overwrote 1/5th of the disk
>>>> with zeroes, and it still works :-)
>>> 
>>> This isn't "robustness" Michael.  If anything it's a serious problem.
>>> XFS is reporting that hundreds or thousands of files that have been
>>> physically removed still exist.  Regardless of how he arrived at this
>>> position, how is this "robust"?  Most people would consider this
>>> inconsistency of state a "corruption" situation, not "robustness".
>> 
>> I think it's necessary to differentiate here:
>> 
>> 1) It appears to be robustness - or pure luck - regarding metadata 
>> consistency of the filesystem. I tend to believe it's pure luck and that XFS 
>> just stored the metadata on the other RAID arrays.
>> 
>> 2) XFS does not seem to have a way to detect whether file contents are 
>> still valid and consistent. It shares that with, I think, every other Linux 
>> filesystem except BTRFS, which uses checksumming for files. (Maybe NILFS as 
>> well, I don't know, and the FUSE or the other ZFS port.)
> 
> After re-reading my own words above, I feel a need to clarify
> something:  I took exception merely to the description of "robustness"
> being used in this situation.  I was not and am not being derogatory of
> XFS in any way.  I love XFS.  Of all available filesystems (on any OS) I
> feel it is the best.  That's why I use it. :)
> 
> In this scenario, other filesystems may have left the OP empty handed.
> So, I guess XFS deserves a positive attribution for this.  But,
> again, I don't think "robustness" is the correct attribution here.

I think 'lucky' is probably a more appropriate term. The chances are that, due to the size of the array, all the inodes and the inline extent lists were on the first volume. If he'd lost that instead, everything would be gone.

--
Roger


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: xfs_repair of critical volume
  2010-10-31 19:56 Eli Morris
  2010-10-31 20:40 ` Emmanuel Florac
  2010-10-31 21:10 ` Steve Costaras
@ 2010-11-01 15:03 ` Stan Hoeppner
  2 siblings, 0 replies; 35+ messages in thread
From: Stan Hoeppner @ 2010-11-01 15:03 UTC (permalink / raw)
  To: xfs

Eli Morris put forth on 10/31/2010 2:56 PM:

> OK, that's a long tale of woe. Thanks for any advice. 

In addition to the suggestions you've already received, I'd suggest
you reach out to your colleagues at SDSC.  They'd most certainly have
quite a bit of storage experience on staff, and they are part of the
University of California system, and thus "family" of sorts.

The Janus 6640 has 4 rows of 4 hot swap drives connected to a backplane.
Of the 4 drives that were marked offline, are they all in the same
horizontal row or vertical column?  If so, I'd say you most certainly
have a defective SATA backplane.  Even if the offline drives are not in
a physical row, the problem could still likely be the backplane.  This
is _very_ common with "low end" or low cost SATA arrays.  Backplane
issues are the most common cause of drives being kicked offline
unexpectedly.

The very first thing I would do, given the _value_ of the data itself,
is get an emergency onsite qualified service tech from your vendor or
the manufacturer and have the backplane or the entire unit itself
replaced.  If replacing the entire unit, swap all of the 16 drives into
the new unit, _inserting each drive in the same slot number as the old
unit_.

Have the firmware/nvram configuration dumped from the old unit to the
new one so the RAID configuration is carried over as well as the same
firmware rev you were using.

After this is complete, power up the array and manually put all of the
drives online and get a healthy status in the LCD display.  Mount the
filesystem read only and do some serious read stress tests to make sure
drives aren't kicked offline again.  If they are kicked offline, note
the drive slot numbers to see if the same set of 4 drives are kicked
offline.  At this point, either the backplane design is faulty, or the 4
drives being kicked offline have a firmware rev different enough from
the other drives, or are simply unsuitable for your RAID application, such
that the RAID controller doesn't like them.  If this is the case, you need
to take an inventory of the firmware revision on each and every one of
the 2TB drives.  Of those not being kicked offline, note the highest
quantity of identical firmware.

Contact Western Digital support via phone.  Briefly but thoroughly
explain who you are, what your situation is, and the gravity of the
situation.  Ask them what their opinion is on the firmware issue, and
what rev you should download for use in flashing the entire set of drives.

Mismatched drive firmware across a set of drives assigned to a RAID
array, especially a hardware RAID array, is the second most common cause
of drives being kicked offline unexpectedly.  Linux mdraid is slightly
more tolerant of mismatched firmware, but it's always best practice to
use only drives of matched firmware rev within a given RAID group.  This
has been true for a couple of decades now (or more).
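For taking that firmware inventory without pulling drives, smartctl can read the revision over the bus. A sketch (an editorial addition, not from Stan's message; it assumes smartmontools is installed and relies on the 'Firmware Version:' label it prints for ATA disks):

```shell
# firmware_rev: extract the firmware revision from `smartctl -i` output
# supplied on stdin, e.g.:  smartctl -i /dev/sda | firmware_rev
firmware_rev() {
	grep -i '^Firmware Version:' | awk '{print $3}'
}
# Inventory every drive and count identical revisions, e.g.:
# for d in /dev/sd?; do smartctl -i "$d" | firmware_rev; done | sort | uniq -c
```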

Hope this helps.  Good luck.  We're pulling for you.

-- 
Stan


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: xfs_repair of critical volume
  2010-11-01  3:40   ` Eli Morris
@ 2010-11-01 10:07     ` Emmanuel Florac
  0 siblings, 0 replies; 35+ messages in thread
From: Emmanuel Florac @ 2010-11-01 10:07 UTC (permalink / raw)
  To: Eli Morris; +Cc: xfs

On Sun, 31 Oct 2010 20:40:20 -0700, you wrote:

> Thanks for your help. The RAID is a SCSI connected direct attached
> storage 16 bay unit made by Maxtronic. It is a Janus 6640, in case
> that helps anything.

Alas, never heard of it... Looks like quite low end hardware, without
redundant controllers. Probably similar to Infortrend. The possibility
of unreliable firmware cannot be excluded with these :)

> At the time of its problem, it was mounted
> read-only as I was trying to be careful of the data, since the main
> volume failed and this was our only copy. So maybe the cache isn't a
> big deal.

Indeed, nothing to lose in RO mode.

> I'll try the "dead" drives tomorrow with a WD utility. I'll
> give Spinrite a try, if the WD utility doesn't revive them.

Good luck.

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: xfs_repair of critical volume
  2010-10-31 20:40 ` Emmanuel Florac
@ 2010-11-01  3:40   ` Eli Morris
  2010-11-01 10:07     ` Emmanuel Florac
  0 siblings, 1 reply; 35+ messages in thread
From: Eli Morris @ 2010-11-01  3:40 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: xfs


On Oct 31, 2010, at 1:40 PM, Emmanuel Florac wrote:

> On Sun, 31 Oct 2010 12:56:33 -0700, you wrote:
> 
>> OK, that's a long tale of woe. Thanks for any advice. 
> 
> OK, so what we'd like to do is get the backup RAID volume back in
> working order. You said it's made of 2TB Caviar green drives, but
> didn't mention the RAID controller type... As I understand it, you
> power-cycled the RAID array, so the cache is gone, whatever had been
> in it... 
> 
> All arrays I know will happily reassemble a working RAID if you
> successfully revive the failed drives.
> 
> Logically the failed drives are almost certainly not really dead, but
> in a temporary failure state. First, you must check WD support and
> utilities to see if something may apply to your configuration. Anyway,
> checking the failed drives' health with the Western Digital disk
> utility should allow you to determine if they're toast or not.
> 
> In case they're not actually dead, you could try to revive the
> bad blocks with Spinrite (www.grc.com); it has saved my life a couple of
> times, though it's quite risky when used on SMART-tripped drives.
> 
> -- 
> ------------------------------------------------------------------------
> Emmanuel Florac     |   Direction technique
>                    |   Intellique
>                    |	<eflorac@intellique.com>
>                    |   +33 1 78 94 84 02
> ------------------------------------------------------------------------

Hi,

Thanks for your help. The RAID is a SCSI connected direct attached storage 16 bay unit made by Maxtronic. It is a Janus 6640, in case that helps anything. At the time of its problem, it was mounted read-only as I was trying to be careful of the data, since the main volume failed and this was our only copy. So maybe the cache isn't a big deal. I'll try the "dead" drives tomorrow with a WD utility. I'll give Spinrite a try, if the WD utility doesn't revive them.

thanks a lot for the help. I'll let you all know how things go.

Eli 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: xfs_repair of critical volume
  2010-10-31 19:56 Eli Morris
  2010-10-31 20:40 ` Emmanuel Florac
@ 2010-10-31 21:10 ` Steve Costaras
  2010-11-01 15:03 ` Stan Hoeppner
  2 siblings, 0 replies; 35+ messages in thread
From: Steve Costaras @ 2010-10-31 21:10 UTC (permalink / raw)
  To: Eli Morris; +Cc: xfs



On 2010-10-31 14:56, Eli Morris wrote:
>
> Hi guys,
>
> Thanks for all the responses. On the XFS volume that I'm trying to recover here, I've already re-initialized the RAID, so I've kissed that data goodbye. I am using LVM2. Each of the 5 RAID volumes is a physical volume. Then a logical volume is created out of those, and then the filesystem lies on top of that. So now we have, in order, 2 intact PVs, 1 OK, but blank PV, 2 intact PVs. On the RAID where we lost the drives, replacements are in place and I created a now healthy volume. Through LVM, I was then able to create a new PV from the re-constituted RAID volume and put that into our logical volume in place of the destroyed PV. So now, I have a logical volume that I can activate and I can see the filesystem. It still reports as having all the old files as before, although it doesn't. So the hardware is now OK. It's just what to do with our damaged filesystem that has a huge chunk missing out of it. I put the xfs_repair trial output on an http server, as suggested (good suggestion) and it is here:

What was your raid stripe size (hardware)?  Did you have any 
partitioning scheme on the hdw raid volumes or did you just use the 
native device?    When you created the volume group & lv did you do any 
striping or just concatenation of the luns?  If striping, what were your 
lvcreate parameters (stripe size et al.)?

You mentioned that you lost only 1 of the 5 arrays.    Assuming the 
others did not have any failures?    You wiped the array that failed, so 
you have 4/5 of the data and 1/5 is zeroed, which removes the 
possibility of vendor recovery/assistance.

Assuming that everything is equal there should be an equal distribution 
of files across the AG's and the AG's should have been distributed 
across the 5 volumes.    Do you have the xfs_info data?     I think you 
may be a bit out of luck here with xfs_repair.     I am not sure how XFS 
handles files/fragmentation between AGs, or an AG's relation to the 
underlying 'physical volume'.    I.e. the problem would be if a particular 
AG was on a different volume than the blocks of the actual file; 
likewise, another complexity would be fragmented files where data was not 
contiguous.    What is the average size of the files that you had on the 
volume?

In similar circumstances, if files were small enough to be on the 
remaining disks and contiguous/non-fragmented, I've had some luck with 
the forensic tools Foremost & Scalpel.

Steve




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: xfs_repair of critical volume
  2010-10-31 19:56 Eli Morris
@ 2010-10-31 20:40 ` Emmanuel Florac
  2010-11-01  3:40   ` Eli Morris
  2010-10-31 21:10 ` Steve Costaras
  2010-11-01 15:03 ` Stan Hoeppner
  2 siblings, 1 reply; 35+ messages in thread
From: Emmanuel Florac @ 2010-10-31 20:40 UTC (permalink / raw)
  To: Eli Morris; +Cc: xfs

On Sun, 31 Oct 2010 12:56:33 -0700, you wrote:

> OK, that's a long tale of woe. Thanks for any advice. 

OK, so what we'd like to do is get the backup RAID volume back in
working order. You said it's made of 2TB Caviar green drives, but
didn't mention the RAID controller type... As I understand it, you
power-cycled the RAID array, so the cache is gone, whatever had been
in it... 

All arrays I know will happily reassemble a working RAID if you
successfully revive the failed drives.

Logically the failed drives are almost certainly not really dead, but
in a temporary failure state. First, you must check WD support and
utilities to see if something may apply to your configuration. Anyway,
checking the failed drives' health with the Western Digital disk
utility should allow you to determine if they're toast or not.

In case they're not actually dead, you could try to revive the
bad blocks with Spinrite (www.grc.com); it has saved my life a couple of
times, though it's quite risky when used on SMART-tripped drives.

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: xfs_repair of critical volume
@ 2010-10-31 19:56 Eli Morris
  2010-10-31 20:40 ` Emmanuel Florac
                   ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Eli Morris @ 2010-10-31 19:56 UTC (permalink / raw)
  To: xfs

> Hi,
> 
> I have a large XFS filesystem (60 TB) that is composed of 5 hardware RAID 6 volumes. One of those volumes had several drives fail in a very short time and we lost that volume. However, four of the volumes seem OK. We are in a worse state because our backup unit failed a week later when four drives simultaneously went offline. So we are in a very bad state. I am able to mount the filesystem that consists of the four remaining volumes. I was thinking about running xfs_repair on the filesystem in hopes it would recover all the files that were not on the bad volume (those that were are obviously gone). Since our backup is gone, I'm very concerned about doing anything to lose the data that we still have. I ran xfs_repair with the -n flag and I have a lengthy file of things that program would do to our filesystem. I don't have the expertise to decipher the output and figure out if xfs_repair would fix the filesystem in a way that would retain our remaining data, or if it would, let's say, truncate the filesystem at the data loss boundary (our lost volume was the middle one of the five volumes), returning 2/5 of the filesystem or some other undesirable result. I would post the xfs_repair -n output here, but it is more than a megabyte. I'm hoping one of you xfs gurus will take pity on me and let me send you the output to look at, or give me an idea as to what xfs_repair is likely to do if I should run it, or any suggestions as to how to get back as much data as possible in this recovery.
> 
> thanks very much,
> 
> Eli

Hi guys,

Thanks for all the responses. On the XFS volume that I'm trying to recover here, I've already re-initialized the RAID, so I've kissed that data goodbye. I am using LVM2. Each of the 5 RAID volumes is a physical volume. Then a logical volume is created out of those, and then the filesystem lies on top of that. So now we have, in order, 2 intact PVs, 1 OK, but blank PV, 2 intact PVs. On the RAID where we lost the drives, replacements are in place and I created a now healthy volume. Through LVM, I was then able to create a new PV from the re-constituted RAID volume and put that into our logical volume in place of the destroyed PV. So now, I have a logical volume that I can activate and I can see the filesystem. It still reports as having all the old files as before, although it doesn't. So the hardware is now OK. It's just what to do with our damaged filesystem that has a huge chunk missing out of it. I put the xfs_repair trial output on an http server, as suggested (good suggestion) and it is here:

http://sczdisplay.ucsc.edu/vol_repair_test.txt

Now I also have the problem of our backup RAID unit that failed. That one failed after I re-initialized the primary RAID, but before I could restore the backups to the primary. I'm having some good luck, huh? On that RAID unit, everything was fine until the next time I looked at it, which was a couple of hours later; 4 drives went offline and it reported the volume as lost. On that unit, the only thing I have done so far is to power cycle it a couple of times. Other than that, it is untouched. In it we are using the Caviar Green 2 TB drives, which our vendor told us were fine to use. However, I have read in the last couple of days that they have an issue with timing out as they remap sectors, as noted here:

http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery#Western_Digital_Time_Limit_Error_Recovery_Utility_-_WDTLER.EXE

Thus, I've learned that they are not recommended for use in RAID volumes. So I am looking hard into ways of trying to recover that data as well, although it is only a partial backup of our main volume. It contains about 10 TB of the most critical files from the main volume. Fortunately, this isn't the human genome, but it is climate modeling data that graduate students have been generating for years. So losing all this could set them back years on their PhDs. So I take the situation pretty seriously. In this case, we are thinking about going with a data recovery company, but this isn't industry. Our lab doesn't have very deep pockets. $10K would be a huge chunk of money to spend. So, I would welcome suggestions for this unit as well. I believe the drives themselves in this unit are OK, as four going out within one minute, as the log shows, is not something that makes a lot of sense. My guess is that they were under heavy load for the first time in a few months and four of the drives started remapping sectors at pretty much the same time. The RAID controller in this DAS 16 drive box tried to contact the drives and reached a timeout and marked them all as dead. We are also considering that we are having some sort of power problem, as we seem to be unusually unlucky in the last couple of weeks, although we do have everything behind a pretty nice $7K UPS that isn't reporting any problems. 
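On the timeout point, smartmontools can query and set the drives' SCT Error Recovery Control (the runtime successor to the WDTLER utility) with `smartctl -l scterc /dev/sdX`, or `smartctl -l scterc,70,70 /dev/sdX` to cap recovery at 7 seconds. A small sketch (an editorial addition, not from the thread; it assumes smartctl prints the word "Disabled" when ERC is off) that classifies a drive's scterc output:

```shell
# erc_status: read `smartctl -l scterc` output on stdin and print "safe"
# if error recovery is time-limited (RAID-friendly), "unsafe" if disabled.
erc_status() {
	if grep -qi 'Disabled'; then
		echo unsafe
	else
		echo safe
	fi
}
# usage: smartctl -l scterc /dev/sda | erc_status
```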

OK, that's a long tale of woe. Thanks for any advice. 

Eli

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2010-12-05  9:43 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-31  7:54 xfs_repair of critical volume Eli Morris
2010-10-31  9:54 ` Stan Hoeppner
2010-11-12  8:48   ` Eli Morris
2010-11-12 13:22     ` Michael Monnerie
2010-11-12 22:14       ` Stan Hoeppner
2010-11-13  8:19         ` Emmanuel Florac
2010-11-13  9:28           ` Stan Hoeppner
2010-11-13 15:35             ` Michael Monnerie
2010-11-14  3:31               ` Stan Hoeppner
2010-12-04 10:30         ` Martin Steigerwald
2010-12-05  4:49           ` Stan Hoeppner
2010-12-05  9:44             ` Roger Willcocks
2010-11-12 23:01       ` Eli Morris
2010-11-13 15:25         ` Michael Monnerie
2010-11-14 11:05         ` Dave Chinner
2010-11-15  4:09           ` Eli Morris
2010-11-16  0:04             ` Dave Chinner
2010-11-17  7:29               ` Eli Morris
2010-11-17  7:47                 ` Dave Chinner
2010-11-30  7:22                   ` Eli Morris
2010-12-02 11:33                     ` Michael Monnerie
2010-12-03  0:58                       ` Stan Hoeppner
2010-12-04  0:43                       ` Eli Morris
2010-10-31 14:10 ` Emmanuel Florac
2010-10-31 14:41   ` Steve Costaras
2010-10-31 16:52 ` Roger Willcocks
2010-11-01 22:21 ` Eric Sandeen
2010-11-01 23:32   ` Eli Morris
2010-11-02  0:14     ` Eric Sandeen
2010-10-31 19:56 Eli Morris
2010-10-31 20:40 ` Emmanuel Florac
2010-11-01  3:40   ` Eli Morris
2010-11-01 10:07     ` Emmanuel Florac
2010-10-31 21:10 ` Steve Costaras
2010-11-01 15:03 ` Stan Hoeppner
