From: NeilBrown <neilb@suse.de>
To: rob pfile <rpfile@gmail.com>
Cc: Mikael Abrahamsson <swmike@swm.pp.se>, linux-raid@vger.kernel.org
Subject: Re: 4-disk raid5 with 2 disks going bad: best way to proceed?
Date: Fri, 8 Apr 2011 22:10:46 +1000
Message-ID: <20110408221046.2aa5e685@notabene.brown>
In-Reply-To: <242C1984-F4B5-4C34-BF4C-619875BD9CAF@gmail.com>

On Thu, 7 Apr 2011 15:15:09 -0700 rob pfile <rpfile@gmail.com> wrote:

> 
> On Apr 7, 2011, at 12:21 AM, Mikael Abrahamsson wrote:
> 
> > On Wed, 6 Apr 2011, rob pfile wrote:
> > 
> >> Hi all,
> >> 
> >> any collective wisdom on what to do here? i've got a 4-disk raid5, and the most recent checkarray showed several bad blocks caused by uncorrectable read errors on two of the disks in the array. both disks in question show 0 reallocated sectors, but one looks like this:
> > 
> > Generally I'd recommend a "repair", as it would try to read everything; if it can't read a block properly, it'd recalculate it from parity, and as long as that write succeeded, you'd be golden.
> > 
> > To be safe, stop the array, dd_rescue the two bad drives, start the array again with the originals in the array, don't mount the filesystem, issue repair and see what happens.
> > 
> > This is one reason why I nowadays always run RAID6; then you can fail a drive and still have parity for read errors...
> > 
> > 
> 
> thanks for your reply.
> 
> from reading a similar thread (http://www.spinics.net/lists/raid/msg31779.html), it's stated that the "repair" command will rebuild the parity if it is thought to be wrong.  i don't think i want to risk writing parity blocks that are probably now correct and could become corrupted because of a bad data read messing up the parity... it's almost like i want something in between "check" and "repair", where when the disk gives a hard error on a sector, the data block containing that sector is reconstructed from the parity and then immediately written back to the disk. or does "check" already do that? i was guessing that it did not, or else the drives would probably show a few reallocated sectors.

When a device gives a hard read error, md/raid always calculates the correct
data from the other devices (assuming that parity is correct) and writes it
out.  It does this for check, for repair and for normal IO.
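To make the reconstruction step concrete, here is a minimal Python sketch of RAID5 recovery for one stripe of a 4-disk array (three data blocks plus one parity block). It assumes plain XOR parity and ignores md's real chunk size and parity rotation (layout); the function names are hypothetical. It also shows why the "assuming that parity is correct" caveat matters: XOR against stale parity silently yields the wrong data.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    out = bytearray(length)
    for block in blocks:
        if len(block) != length:
            raise ValueError("all blocks in a stripe must be the same length")
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def compute_parity(data_blocks):
    """RAID5 parity (in this simplified model) is the XOR of all data blocks."""
    return xor_blocks(data_blocks)


def reconstruct_block(stripe, parity, missing):
    """Rebuild stripe[missing] (unreadable) from surviving data blocks and parity.

    Only correct if `parity` still matches the data; with stale parity the
    result is silently wrong.
    """
    survivors = [block for i, block in enumerate(stripe) if i != missing]
    return xor_blocks(survivors + [parity])


# One stripe of a hypothetical 4-disk RAID5: three data blocks, one parity block.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = compute_parity(data)

# Simulate a hard read error on the second data block, then rebuild it.
rebuilt = reconstruct_block([data[0], None, data[2]], parity, missing=1)
```

In md the rebuilt data is then written back to the device that returned the read error, which is the "write it out" step described above.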

I am no expert on SMART; however, if there are no reallocated sectors, then
maybe what happened is that whenever md wrote to a bad sector, the drive
determined that the media there was still usable and wrote the data there.
But that is just a guess.

> 
> by "as long as that write succeeded you'd be golden", do you mean that re-writing the block would either reallocate the bad sector, or that perhaps the write would just succeed on the same physical sector, thus cleaning it up somehow? i think reallocating the sector would be preferable but i guess we don't have too much control over what the disk does.
> 
> i guess i should clone these disks as you suggest. if dd_rescue runs without error then i suppose i can just put in the replacement disks and forget about it. if not, i could try the repair. i assume that if the repair goes horribly wrong, i could just put the clones into the array. but as above i'd worry that if corrupt parity got rewritten during the repair of the original disks that perhaps the clones would no longer match the corrupt parity. in that case i'd have to run "repair" again after putting the clones in, i assume.

I would probably be using 'check' rather than 'repair'.  The key is to read
all of every drive so as to find any bad blocks and correct them.  Check does
this.  If check reports a non-zero mismatch_cnt, then it might be appropriate
to run 'repair'.  If any data has been corrupted, it is already too late to do
anything about it.
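The check-then-maybe-repair workflow is driven through md's sysfs files. A minimal Python sketch, assuming an array named md0 (hypothetical; substitute your own), root privileges, and the /sys/block/<array>/md/sync_action and mismatch_cnt files described in the kernel's md documentation. The helper names are made up for illustration.

```python
import time
from pathlib import Path


def md_dir(array="md0", sysfs_root="/sys/block"):
    """Path to the md sysfs directory for an array, e.g. /sys/block/md0/md."""
    return Path(sysfs_root) / array / "md"


def start_sync_action(md_path, action="check"):
    """Ask md to start a 'check' (read and verify) or a 'repair' pass."""
    if action not in ("check", "repair"):
        raise ValueError("action must be 'check' or 'repair'")
    (Path(md_path) / "sync_action").write_text(action + "\n")


def current_sync_action(md_path):
    """Return what md is doing right now, e.g. 'idle', 'check' or 'repair'."""
    return (Path(md_path) / "sync_action").read_text().strip()


def wait_until_idle(md_path, poll_seconds=60.0):
    """Poll sync_action until the current pass has finished."""
    while current_sync_action(md_path) != "idle":
        time.sleep(poll_seconds)


def read_mismatch_cnt(md_path):
    """Sectors the last check/repair pass found inconsistent."""
    return int((Path(md_path) / "mismatch_cnt").read_text().strip())
```

Following the advice above, one would call start_sync_action(md_dir(), "check"), then wait_until_idle(md_dir()), and only consider start_sync_action(md_dir(), "repair") if read_mismatch_cnt(md_dir()) is non-zero.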

NeilBrown


> 
> 
> yeah, i should probably look into raid6.
> 
> thanks,
> 
> rob
> 



Thread overview: 9+ messages
2011-04-07  1:45 4-disk raid5 with 2 disks going bad: best way to proceed? rob pfile
2011-04-07  3:35 ` Roberto Spadim
2011-04-07  7:21 ` Mikael Abrahamsson
2011-04-07 22:15   ` rob pfile
2011-04-08 12:10     ` NeilBrown [this message]
2011-04-09 14:39       ` rob pfile
2011-04-07 20:13 ` Nagilum
2011-04-08 12:05   ` NeilBrown
2011-04-08 15:47     ` Nagilum
