From: Steve Costaras <stevecs@chaven.com>
To: xfs@oss.sgi.com
Subject: Re: xfs_repair of critical volume
Date: Sun, 31 Oct 2010 09:41:37 -0500	[thread overview]
Message-ID: <4CCD8021.2070403@chaven.com> (raw)
In-Reply-To: <20101031151000.70dcd6b9@galadriel.home>



On 2010-10-31 09:10, Emmanuel Florac wrote:
> Did you try to actually freeze the failed drives (it may revive them 
> for a while)?

Do NOT try this.  It's only good for some /very/ specific types of
failures on older drives.  With an array of your size you are probably
running relatively current drives (i.e. made within the past 5-7 years),
and freezing them has a very large probability of causing more damage.

The other questions are to the point: they establish the circumstances
around the failure and what state the array was in at the time.  Take
your time and do not rush anything; you are already hanging over a cliff.

First thing, if you are able, is to do a bit copy of the physical drives
to spares; that way you can always get back to the same point you are at
now.  This may not be practical with such a large array, but if you have
the means it's worth it.
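
As a rough sketch (sdX/sdY here are just placeholders for a failing
source drive and a same-size or larger spare, adjust for your setup),
GNU ddrescue or plain dd per drive would look something like:

    # ddrescue logs progress so an interrupted copy can be resumed
    ddrescue -f /dev/sdX /dev/sdY /root/sdX.ddrescue.log

    # plain dd alternative; noerror,sync skips unreadable sectors
    # (zero-filling them) instead of aborting the copy
    dd if=/dev/sdX of=/dev/sdY bs=1M conv=noerror,sync

ddrescue is usually the better choice on drives that are already
throwing read errors, since it retries and keeps a map of the bad areas.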

You want to start from the lowest component and work your way up.  So
make sure that the raid array itself is sane before looking to fix any
volume management layer, and that before looking at your file systems.
When dealing with degraded or failed arrays, be careful about what you
do if you have write cache enabled on your controllers.  Talk to the
vendor!  Whatever operations you do on the card could cause that cached
data to be lost, and it can be substantial with some controllers
(MiB->GiB ranges).  Normally we run with write cache disabled (both on
the drives and on the raid controllers) for critical data, to avoid
having too much data in flight if a problem ever does occur.
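
For the on-drive cache, assuming SATA/SAS drives the OS can see directly
(the controller's own cache has to be managed through the vendor's CLI,
and the device paths below are placeholders), something like this shows
and disables it; and once the array and volume layers check out, the
no-modify mode of xfs_repair is a safe first look at the filesystem:

    # show whether the drive's volatile write cache is on
    hdparm -W /dev/sdX

    # turn it off (usually only sticks until the next power cycle)
    hdparm -W 0 /dev/sdX

    # read-only filesystem check, makes no changes to the volume
    xfs_repair -n /dev/your_vg/your_lv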

The points that Emmanuel mentioned are valid, though I would hold off
on powering down until you are able to get all the geometry information
from your raids (unless you already have it).  I would also hold off
until you determine whether there are any dirty caches on the raid
controllers.  Most controllers keep a rotating buffer of events,
including failure pointers; if you reboot, the re-scanning of drives at
start-up may push that pointer further down the stack until it is lost,
and then you won't be able to recover the outstanding data.  I've seen
this set at 128-256 entries on various systems, which is another reason
to keep the drive count per controller down.
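
Whatever is visible to Linux (partition tables, md, LVM) can be captured
now at no cost; the hardware raid geometry itself has to come out of the
vendor's tool.  A rough sketch, with device and volume group names as
placeholders:

    # partition table, one file per disk
    sfdisk -d /dev/sdX > /root/sdX.sfdisk

    # software raid metadata, if md is in the stack
    mdadm --examine /dev/sdX1 > /root/sdX1.examine
    mdadm --detail /dev/md0 > /root/md0.detail

    # LVM layout, if LVM is in the stack
    pvdisplay -v > /root/pvdisplay.out
    vgcfgbackup -f /root/your_vg.lvm your_vg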

Steve


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
