* Removing a failing drive from multiple arrays
@ 2012-04-19 18:54 Bill Davidsen
  2012-04-19 21:52 ` NeilBrown
  2012-04-20 14:35 ` John Stoffel
  0 siblings, 2 replies; 15+ messages in thread
From: Bill Davidsen @ 2012-04-19 18:54 UTC (permalink / raw)
  To: Linux RAID

[-- Attachment #1: Type: text/plain, Size: 1361 bytes --]

I have a failing drive, and partitions are in multiple arrays. I'm 
looking for the least painful and most reliable way to replace it. It's 
internal, I have a twin in an external box, and can create all the parts 
now and then swap the drive physically. The layout is complex; here's 
what blkdevtrc tells me about this device (the full trace is attached).

Block device sdd, logical device 8:48
Model Family:     Seagate Barracuda 7200.10
Device Model:     ST3750640AS
Serial Number:    5QD330ZW
     Device size   732.575 GB
            sdd1     0.201 GB
            sdd2     3.912 GB
            sdd3    24.419 GB
            sdd4     0.000 GB
            sdd5    48.838 GB [md123] /mnt/workspace
            sdd6     0.498 GB
            sdd7    19.543 GB [md125]
            sdd8    29.303 GB [md126]
            sdd9   605.859 GB [md127] /exports/common
   Unpartitioned     0.003 GB

I think what I want to do is to partition the new drive, then, one array 
at a time, fail and remove the partition on the bad drive and add the 
matching partition on the new good drive. Repeat for each array until all 
are complete and on the new drive. Then I should be able to power off, 
remove the failed drive, put the good drive in the case, and the arrays 
should reassemble by UUID.

Does that sound right? Is there an easier way?
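
Concretely, per array I'm thinking of something like this (device names 
are illustrative only; assuming the replacement shows up as /dev/sde 
with the same partition layout):

    # copy the partition table from the failing drive to the new one
    sfdisk -d /dev/sdd | sfdisk /dev/sde
    # then, for each array (md123 / sdd5 shown as the example):
    mdadm /dev/md123 --fail /dev/sdd5      # fail the partition on the bad drive
    mdadm /dev/md123 --remove /dev/sdd5    # and take it out of the array
    mdadm /dev/md123 --add /dev/sde5       # add the matching new partition
    cat /proc/mdstat                       # wait for the resync before the next array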

-- 
Bill Davidsen <davidsen@tmr.com>


[-- Attachment #2: blkdevtrc.out --]
[-- Type: text/plain, Size: 2858 bytes --]

block device trace for host pixels.tmr.com 2012-04-19 14:26

Block device sda, logical device 8:0
Device Model:     Patriot Pyro
Serial Number:    PT1210A00076662
    Device size    58.616 GB
           sda1     0.001 GB 
           sda2     0.512 GB /boot
           sda3    58.101 GB 
  Unpartitioned     0.002 GB
======
Block device sdb, logical device 8:16
Model Family:     Seagate Barracuda 7200.10
Device Model:     ST3750640AS
Serial Number:    5QD58GM2
    Device size   732.575 GB
           sdb1     0.201 GB [md121] 
           sdb2     3.912 GB [md120] SWAP
           sdb3    24.419 GB [md122] /mnt/root-fc9
           sdb4     0.000 GB 
           sdb5    48.838 GB [md123] /mnt/workspace
           sdb6     0.498 GB [md124] 
           sdb7    19.543 GB [md125] 
           sdb8    29.303 GB [md126] 
           sdb9   605.859 GB [md127] /exports/common
  Unpartitioned     0.003 GB
======
Block device sdc, logical device 8:32
Model Family:     Western Digital Caviar Green
Device Model:     WDC WD10EACS-00D6B1
Serial Number:    WD-WCAU44201083
    Device size   976.763 GB
           sdc1   976.760 GB /mnt/local
  Unpartitioned     0.003 GB
======
Block device sdd, logical device 8:48
Model Family:     Seagate Barracuda 7200.10
Device Model:     ST3750640AS
Serial Number:    5QD330ZW
    Device size   732.575 GB
           sdd1     0.201 GB 
           sdd2     3.912 GB 
           sdd3    24.419 GB 
           sdd4     0.000 GB 
           sdd5    48.838 GB [md123] /mnt/workspace
           sdd6     0.498 GB 
           sdd7    19.543 GB [md125] 
           sdd8    29.303 GB [md126] 
           sdd9   605.859 GB [md127] /exports/common
  Unpartitioned     0.003 GB
======
Block device sde, logical device 8:64
Model Family:     Seagate Barracuda 7200.10
Device Model:     ST3750640AS
Serial Number:    5QD5ABN1
    Device size   732.575 GB
           sde1     0.201 GB [md121] 
           sde2     3.912 GB [md120] SWAP
           sde3    24.419 GB [md122] /mnt/root-fc9
           sde4     0.000 GB 
           sde5    48.838 GB [md123] /mnt/workspace
           sde6     0.498 GB [md124] 
           sde7    19.543 GB [md125] 
           sde8    29.303 GB [md126] 
           sde9   605.859 GB [md127] /exports/common
  Unpartitioned     0.003 GB
======
Block device sdf, logical device 8:80
       No media (USB SD Reader)
======
Block device sdg, logical device 8:96
       No media (USB CF Reader)
======
Block device sdh, logical device 8:112
       No media (USB SM Reader)
======
Block device sdi, logical device 8:128
       No media (USB MS Reader)
======
Block device sdj, logical device 8:144
Model Family:     Western Digital Caviar Blue Serial ATA
    Device size   976.763 GB
           sdj1   976.762 GB /mnt/backup
  Unpartitioned     0.001 GB

blkdevtrc 1.16 (Stable) 2010-02-26 13:24:07-05 davidsen@tmr.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Removing a failing drive from multiple arrays
  2012-04-19 18:54 Removing a failing drive from multiple arrays Bill Davidsen
@ 2012-04-19 21:52 ` NeilBrown
  2012-04-20 14:30   ` Bill Davidsen
                     ` (2 more replies)
  2012-04-20 14:35 ` John Stoffel
  1 sibling, 3 replies; 15+ messages in thread
From: NeilBrown @ 2012-04-19 21:52 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Linux RAID

[-- Attachment #1: Type: text/plain, Size: 1984 bytes --]

On Thu, 19 Apr 2012 14:54:30 -0400 Bill Davidsen <davidsen@tmr.com> wrote:

> I have a failing drive, and partitions are in multiple arrays. I'm 
> looking for the least painful and most reliable way to replace it. It's 
> internal, I have a twin in an external box, and can create all the parts 
> now and then swap the drive physically. The layout is complex, here's 
> what blkdevtra tells me about this device, the full trace is attached.
> 
> Block device sdd, logical device 8:48
> Model Family:     Seagate Barracuda 7200.10
> Device Model:     ST3750640AS
> Serial Number:    5QD330ZW
>      Device size   732.575 GB
>             sdd1     0.201 GB
>             sdd2     3.912 GB
>             sdd3    24.419 GB
>             sdd4     0.000 GB
>             sdd5    48.838 GB [md123] /mnt/workspace
>             sdd6     0.498 GB
>             sdd7    19.543 GB [md125]
>             sdd8    29.303 GB [md126]
>             sdd9   605.859 GB [md127] /exports/common
>    Unpartitioned     0.003 GB
> 
> I think what I want to do is to partition the new drive, then one array 
> at a time fail and remove the partition on the bad drive, and add a 
> partition on the new good drive. Then repeat for each array until all 
> are complete and on a new drive. Then I should be able to power off, 
> remove the failed drive, put the good drive in the case, and the arrays 
> should reassemble by UUID.
> 
> Does that sound right? Is there an easier way?
> 

I would add the new partition before failing the old, but that isn't a big
issue.

If you were running a really new kernel, used 1.x metadata, and were happy to
try out code that hasn't had a lot of real-life testing, you could (after
adding the new partition) do
   echo want_replacement > /sys/block/md123/md/dev-sdd5/state
(for example).

Then it would build the spare before failing the original.
You need linux 3.3 for this to have any chance of working.
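
The whole sequence would be roughly (untested sketch; /dev/sde5 standing
in for the new partition):

   mdadm /dev/md123 --add /dev/sde5
   echo want_replacement > /sys/block/md123/md/dev-sdd5/state
   cat /proc/mdstat                     # rebuild runs while sdd5 is still active
   mdadm /dev/md123 --remove /dev/sdd5  # once it completes and sdd5 drops out as faulty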

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Removing a failing drive from multiple arrays
  2012-04-19 21:52 ` NeilBrown
@ 2012-04-20 14:30   ` Bill Davidsen
  2012-04-22 22:33   ` Bill Davidsen
  2012-04-25  0:07   ` Bill Davidsen
  2 siblings, 0 replies; 15+ messages in thread
From: Bill Davidsen @ 2012-04-20 14:30 UTC (permalink / raw)
  To: Linux RAID; +Cc: NeilBrown

NeilBrown wrote:
> On Thu, 19 Apr 2012 14:54:30 -0400 Bill Davidsen<davidsen@tmr.com>  wrote:
>
>> I have a failing drive, and partitions are in multiple arrays. I'm
>> looking for the least painful and most reliable way to replace it. It's
>> internal, I have a twin in an external box, and can create all the parts
>> now and then swap the drive physically. The layout is complex, here's
>> what blkdevtra tells me about this device, the full trace is attached.
>>
>> Block device sdd, logical device 8:48
>> Model Family:     Seagate Barracuda 7200.10
>> Device Model:     ST3750640AS
>> Serial Number:    5QD330ZW
>>       Device size   732.575 GB
>>              sdd1     0.201 GB
>>              sdd2     3.912 GB
>>              sdd3    24.419 GB
>>              sdd4     0.000 GB
>>              sdd5    48.838 GB [md123] /mnt/workspace
>>              sdd6     0.498 GB
>>              sdd7    19.543 GB [md125]
>>              sdd8    29.303 GB [md126]
>>              sdd9   605.859 GB [md127] /exports/common
>>     Unpartitioned     0.003 GB
>>
>> I think what I want to do is to partition the new drive, then one array
>> at a time fail and remove the partition on the bad drive, and add a
>> partition on the new good drive. Then repeat for each array until all
>> are complete and on a new drive. Then I should be able to power off,
>> remove the failed drive, put the good drive in the case, and the arrays
>> should reassemble by UUID.
>>
>> Does that sound right? Is there an easier way?
>>
>
> I would add the new partition before failing the old but that isn't a big
> issues.
>
> If you were running a really new kernel, used 1.x metadata, and were happy to
> try out code that that hasn't had a lot of real-life testing you could (after
> adding the new partition) do
>     echo want_replacement>  /sys/block/md123/md/dev-sdd5/state
> (for example).
>
> Then it would build the spare before failing the original.
> You need linux 3.3 for this to have any chance of working.
>
Seems I got this a day late, but I will happily do some testing under 
real-world conditions when I get another replacement drive, since I have 
noticed some issues with another drive as well.

Kernel is 3.3.1-5 in Fedora 16, should have mentioned that, I guess.

Thanks for the input. I wonder why multiple drives are dying; coincidence 
or some other problem? I did check the power supply: voltages are all good, 
minimal ripple, no spikes, surge protection and UPS on the power, etc.


-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Removing a failing drive from multiple arrays
  2012-04-19 18:54 Removing a failing drive from multiple arrays Bill Davidsen
  2012-04-19 21:52 ` NeilBrown
@ 2012-04-20 14:35 ` John Stoffel
  2012-04-20 16:31   ` John Robinson
  1 sibling, 1 reply; 15+ messages in thread
From: John Stoffel @ 2012-04-20 14:35 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Linux RAID


Bill> I have a failing drive, and partitions are in multiple
Bill> arrays. 

Ugh!  Why?  This is why I love LVM on top of MD.  I just mirror
drives, then carve them up as needed.  Yes, you need to have two (or
more) drives of the same approximate size, but that's easy.  

Mirroring partitions just seems to be asking for trouble to me.  
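
For example, something along these lines (names purely illustrative):

    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb
    pvcreate /dev/md0
    vgcreate vg0 /dev/md0
    lvcreate -L 50G -n workspace vg0     # carve out space as needed
    mkfs.ext4 /dev/vg0/workspace

Replacing a failed disk is then a single --fail/--remove/--add on the one 
mirror, no matter how many logical volumes sit on top of it.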

Bill> I'm looking for the least painful and most reliable way
Bill> to replace it. It's internal, I have a twin in an external box,
Bill> and can create all the parts now and then swap the drive
Bill> physically. The layout is complex, here's what blkdevtra tells
Bill> me about this device, the full trace is attached.

Bill> Block device sdd, logical device 8:48
Bill> Model Family:     Seagate Barracuda 7200.10
Bill> Device Model:     ST3750640AS
Bill> Serial Number:    5QD330ZW
Bill>      Device size   732.575 GB
Bill>             sdd1     0.201 GB
Bill>             sdd2     3.912 GB
Bill>             sdd3    24.419 GB
Bill>             sdd4     0.000 GB
Bill>             sdd5    48.838 GB [md123] /mnt/workspace
Bill>             sdd6     0.498 GB
Bill>             sdd7    19.543 GB [md125]
Bill>             sdd8    29.303 GB [md126]
Bill>             sdd9   605.859 GB [md127] /exports/common
Bill>    Unpartitioned     0.003 GB

Bill> I think what I want to do is to partition the new drive, then one array 
Bill> at a time fail and remove the partition on the bad drive, and add a 
Bill> partition on the new good drive. Then repeat for each array until all 
Bill> are complete and on a new drive. Then I should be able to power off, 
Bill> remove the failed drive, put the good drive in the case, and the arrays 
Bill> should reassemble by UUID.

Sounds like a plan to me, esp if you script it and let it do all the
work over night while you're asleep.  

Bill> Does that sound right? Is there an easier way?

Neil has the better way if you're running a new kernel, but since that
implies downtime anyway... I doubt you'll do it until you've got the
data moved.

Personally, I'd move to LVM on top of MD to make life simpler...

John

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Removing a failing drive from multiple arrays
  2012-04-20 14:35 ` John Stoffel
@ 2012-04-20 16:31   ` John Robinson
       [not found]     ` <CAK2H+efwgznsS4==Rrtm6UE=uOb25-Q0Qm84i8yAJEJJ2JLdgg@mail.gmail.com>
  0 siblings, 1 reply; 15+ messages in thread
From: John Robinson @ 2012-04-20 16:31 UTC (permalink / raw)
  To: John Stoffel; +Cc: Bill Davidsen, Linux RAID

On 20/04/2012 15:35, John Stoffel wrote:
>
> Bill>  I have a failing drive, and partitions are in multiple
> Bill>  arrays.
>
> Ugh!  Why?  This is why I love LVM on top of MD.  I just mirror
> drives, then carve them up as needed.  Yes, you need to have two (or
> more) drives of the same approximate size, but that's easy.
>
> Mirroring partitions just seems to be asking for trouble to me.

On small machines (3-6 drives) I will regularly have a RAID-1 /boot, 
RAID-10 swap and RAID-5 or 6 everything else, done with partitions. I do 
use LVM too though - that "everything else" will be LVM over the big RAID.
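
A rough sketch of that layout on a four-drive box (device names and 
sizes illustrative only):

    # /boot on RAID-1 (1.0 metadata so it stays bootable), swap on RAID-10,
    # everything else RAID-6 with LVM on top
    mdadm --create /dev/md0 --level=1 --metadata=1.0 --raid-devices=4 /dev/sd[abcd]1
    mdadm --create /dev/md1 --level=10 --raid-devices=4 /dev/sd[abcd]2
    mdadm --create /dev/md2 --level=6 --raid-devices=4 /dev/sd[abcd]3
    mkswap /dev/md1
    pvcreate /dev/md2 && vgcreate vg0 /dev/md2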

Cheers,

John.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Removing a failing drive from multiple arrays
       [not found]     ` <CAK2H+efwgznsS4==Rrtm6UE=uOb25-Q0Qm84i8yAJEJJ2JLdgg@mail.gmail.com>
@ 2012-04-22 18:41       ` John Robinson
  2012-04-26  2:37         ` Bill Davidsen
  0 siblings, 1 reply; 15+ messages in thread
From: John Robinson @ 2012-04-22 18:41 UTC (permalink / raw)
  To: Mark Knecht; +Cc: Linux RAID

On 20/04/2012 17:50, Mark Knecht wrote:
> On Fri, Apr 20, 2012 at 9:31 AM, John Robinson
> <john.robinson@anonymous.org.uk>  wrote:
[...]
>> On small machines (3-6 drives) I will regularly have a RAID-1 /boot, RAID-10
>> swap and RAID-5 or 6 everything else, done with partitions. I do use LVM too
>> though - that "everything else" will be LVM over the big RAID.
>
> Thanks for the info John. Can you tell me what are the requirements
> for the RAID-1 /boot? grub2? initrd? BIOS-based RAID? Something else?
> I'm still just mirroring my boot drives - booting from sda1 but
> copying everything to /sdb1, sdc1, etc. I think I'd like to go full
> RAID on /boot if the requirements are too high.

Pretty much any modern distro's installer will do the right thing with 
whatever boot loader it uses. Also, there are so many boot methods - 
BIOS/MBR, GPT, UEFI - that it's tricky to give advice in one short 
email. It's all out there via Google.

But no, you can do a RAID-1 /boot with LILO or grub, and without BIOS 
RAID. The BIOS will boot off the first available hard drive and doesn't 
understand md RAID. grub doesn't understand md RAID either. You have to 
make your md RAID-1 /boot with metadata 1.0 (or 0.90) because they have 
the data at the beginning so when you create a filesystem on the array, 
each individual component (partition) looks like it has a filesystem on 
it. You install grub (or LILO) onto the MBR (or GPT boot partition, 
which isn't the same as your /boot partition), pointing to the partition 
(not the md array).
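
As a rough sketch (grub legacy, two drives; details vary by distro and 
grub version):

    mdadm --create /dev/md0 --level=1 --metadata=1.0 --raid-devices=2 \
          /dev/sda1 /dev/sdb1
    mkfs.ext3 /dev/md0        # each member partition now looks like it holds the fs
    mount /dev/md0 /boot
    grub-install /dev/sda     # put the boot loader in each disk's MBR
    grub-install /dev/sdb     # so either drive can boot on its own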

grub2 does understand md RAID, but has to be loaded by the BIOS, so 
there are still restrictions.

Without either hardware or BIOS RAID, you can still end up being unable 
to boot, e.g. the BIOS will try to boot from the first hard drive 
present, but if it has bad sectors in the MBR or /boot partition, 
booting may fail even though there's a perfectly good mirror on the 
second drive, because the BIOS doesn't understand RAID. This has 
happened to me :-(

Cheers,

John.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Removing a failing drive from multiple arrays
  2012-04-19 21:52 ` NeilBrown
  2012-04-20 14:30   ` Bill Davidsen
@ 2012-04-22 22:33   ` Bill Davidsen
  2012-04-22 22:55     ` NeilBrown
  2012-04-25  0:07   ` Bill Davidsen
  2 siblings, 1 reply; 15+ messages in thread
From: Bill Davidsen @ 2012-04-22 22:33 UTC (permalink / raw)
  To: NeilBrown; +Cc: Linux RAID

NeilBrown wrote:
> On Thu, 19 Apr 2012 14:54:30 -0400 Bill Davidsen<davidsen@tmr.com>  wrote:
>
>> I have a failing drive, and partitions are in multiple arrays. I'm
>> looking for the least painful and most reliable way to replace it. It's
>> internal, I have a twin in an external box, and can create all the parts
>> now and then swap the drive physically. The layout is complex, here's
>> what blkdevtra tells me about this device, the full trace is attached.
>>
>> Block device sdd, logical device 8:48
>> Model Family:     Seagate Barracuda 7200.10
>> Device Model:     ST3750640AS
>> Serial Number:    5QD330ZW
>>       Device size   732.575 GB
>>              sdd1     0.201 GB
>>              sdd2     3.912 GB
>>              sdd3    24.419 GB
>>              sdd4     0.000 GB
>>              sdd5    48.838 GB [md123] /mnt/workspace
>>              sdd6     0.498 GB
>>              sdd7    19.543 GB [md125]
>>              sdd8    29.303 GB [md126]
>>              sdd9   605.859 GB [md127] /exports/common
>>     Unpartitioned     0.003 GB
>>
>> I think what I want to do is to partition the new drive, then one array
>> at a time fail and remove the partition on the bad drive, and add a
>> partition on the new good drive. Then repeat for each array until all
>> are complete and on a new drive. Then I should be able to power off,
>> remove the failed drive, put the good drive in the case, and the arrays
>> should reassemble by UUID.
>>
>> Does that sound right? Is there an easier way?
>>
> I would add the new partition before failing the old but that isn't a big
> issues.
>
> If you were running a really new kernel, used 1.x metadata, and were happy to
> try out code that that hasn't had a lot of real-life testing you could (after
> adding the new partition) do
>     echo want_replacement>  /sys/block/md123/md/dev-sdd5/state
> (for example).
>
> Then it would build the spare before failing the original.
> You need linux 3.3 for this to have any chance of working.
>
> NeilBrown

I expect to try this in a real-world case tomorrow. Am I so lucky that, when 
rebuilding, the failing drive will be copied in a way which uses a recovered 
value for the chunk if there's a bad block? And only if there's a bad block, 
so that possible evil on the other drives would not be a problem unless they 
were at the same chunk?

As soon as the pack of replacements arrives I'll let you know how well this 
worked, if at all.


-- 
Bill Davidsen<davidsen@tmr.com>
   We are not out of the woods yet, but we know the direction and have
taken the first step. The steps are many, but finite in number, and if
we persevere we will reach our destination.  -me, 2010




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Removing a failing drive from multiple arrays
  2012-04-22 22:33   ` Bill Davidsen
@ 2012-04-22 22:55     ` NeilBrown
  0 siblings, 0 replies; 15+ messages in thread
From: NeilBrown @ 2012-04-22 22:55 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Linux RAID

[-- Attachment #1: Type: text/plain, Size: 2750 bytes --]

On Sun, 22 Apr 2012 18:33:36 -0400 Bill Davidsen <davidsen@tmr.com> wrote:

> NeilBrown wrote:
> > On Thu, 19 Apr 2012 14:54:30 -0400 Bill Davidsen<davidsen@tmr.com>  wrote:
> >
> >> I have a failing drive, and partitions are in multiple arrays. I'm
> >> looking for the least painful and most reliable way to replace it. It's
> >> internal, I have a twin in an external box, and can create all the parts
> >> now and then swap the drive physically. The layout is complex, here's
> >> what blkdevtra tells me about this device, the full trace is attached.
> >>
> >> Block device sdd, logical device 8:48
> >> Model Family:     Seagate Barracuda 7200.10
> >> Device Model:     ST3750640AS
> >> Serial Number:    5QD330ZW
> >>       Device size   732.575 GB
> >>              sdd1     0.201 GB
> >>              sdd2     3.912 GB
> >>              sdd3    24.419 GB
> >>              sdd4     0.000 GB
> >>              sdd5    48.838 GB [md123] /mnt/workspace
> >>              sdd6     0.498 GB
> >>              sdd7    19.543 GB [md125]
> >>              sdd8    29.303 GB [md126]
> >>              sdd9   605.859 GB [md127] /exports/common
> >>     Unpartitioned     0.003 GB
> >>
> >> I think what I want to do is to partition the new drive, then one array
> >> at a time fail and remove the partition on the bad drive, and add a
> >> partition on the new good drive. Then repeat for each array until all
> >> are complete and on a new drive. Then I should be able to power off,
> >> remove the failed drive, put the good drive in the case, and the arrays
> >> should reassemble by UUID.
> >>
> >> Does that sound right? Is there an easier way?
> >>
> > I would add the new partition before failing the old but that isn't a big
> > issues.
> >
> > If you were running a really new kernel, used 1.x metadata, and were happy to
> > try out code that that hasn't had a lot of real-life testing you could (after
> > adding the new partition) do
> >     echo want_replacement>  /sys/block/md123/md/dev-sdd5/state
> > (for example).
> >
> > Then it would build the spare before failing the original.
> > You need linux 3.3 for this to have any chance of working.
> >
> > NeilBrown
> 
> I expect to try this in a real world case tomorrow. Am I so lucky that when 
> rebuilding the failing drive will be copies in a way which uses a recovered 
> value for the chunk if there's a bad block? And only if there's a bad block, so 
> that possible evil on the other drives would not be a problem unless they were 
> at the same chunk?

That's the theory, yes.

> 
> As soon as the pack of replacements arrives I'll let you know how well this 
> worked, if at all.
> 
> 

Thanks.

NeilBrown


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Removing a failing drive from multiple arrays
  2012-04-19 21:52 ` NeilBrown
  2012-04-20 14:30   ` Bill Davidsen
  2012-04-22 22:33   ` Bill Davidsen
@ 2012-04-25  0:07   ` Bill Davidsen
  2 siblings, 0 replies; 15+ messages in thread
From: Bill Davidsen @ 2012-04-25  0:07 UTC (permalink / raw)
  To: NeilBrown; +Cc: Linux RAID

NeilBrown wrote:
> On Thu, 19 Apr 2012 14:54:30 -0400 Bill Davidsen<davidsen@tmr.com>  wrote:
>
>> I have a failing drive, and partitions are in multiple arrays. I'm
>> looking for the least painful and most reliable way to replace it. It's
>> internal, I have a twin in an external box, and can create all the parts
>> now and then swap the drive physically. The layout is complex, here's
>> what blkdevtra tells me about this device, the full trace is attached.
>>
>> Block device sdd, logical device 8:48
>> Model Family:     Seagate Barracuda 7200.10
>> Device Model:     ST3750640AS
>> Serial Number:    5QD330ZW
>>       Device size   732.575 GB
>>              sdd1     0.201 GB
>>              sdd2     3.912 GB
>>              sdd3    24.419 GB
>>              sdd4     0.000 GB
>>              sdd5    48.838 GB [md123] /mnt/workspace
>>              sdd6     0.498 GB
>>              sdd7    19.543 GB [md125]
>>              sdd8    29.303 GB [md126]
>>              sdd9   605.859 GB [md127] /exports/common
>>     Unpartitioned     0.003 GB
>>
>> I think what I want to do is to partition the new drive, then one array
>> at a time fail and remove the partition on the bad drive, and add a
>> partition on the new good drive. Then repeat for each array until all
>> are complete and on a new drive. Then I should be able to power off,
>> remove the failed drive, put the good drive in the case, and the arrays
>> should reassemble by UUID.
>>
>> Does that sound right? Is there an easier way?
>>
>
> I would add the new partition before failing the old but that isn't a big
> issues.
>
> If you were running a really new kernel, used 1.x metadata, and were happy to
> try out code that that hasn't had a lot of real-life testing you could (after
> adding the new partition) do
>     echo want_replacement>  /sys/block/md123/md/dev-sdd5/state
> (for example).
>
> Then it would build the spare before failing the original.
> You need linux 3.3 for this to have any chance of working.
>
Well, it does work; it has on the first bunch of partitions and is now 
doing the big ~TB one. And because I'm nervous about power-cycling sick 
disks (been there, done that) I am doing the whole rebuild onto drives 
attached by USB and eSATA connections. On the last one now.

I did them all live and running, although I did "swapoff" the one used 
for swap; it isn't really needed, and it just seems like a bad thing to 
be diddling with while the system is using it.

Good news: it has worked perfectly. Bad news: it doesn't do what I 
thought it did. For RAID-[56] it does what I expected and pulls data off 
the partition marked for replacement, but with the RAID-10 f2 layout the 
"take the best copy" logic seems to take over and data comes from all 
active drives. I would have expected it to come from the failing drive 
first and be taken from elsewhere only if the failing drive didn't 
provide the data. I have seen cases where migration failed due to a bad 
sector on another drive, so that's unexpected. I don't claim it's wrong, 
just "not what I expected."

I think in a perfect world (where you have infinite time to diddle 
stuff), it would be useful to have three options:
  - favor the failing drive, recover what you must
  - reconstruct all data possible, don't use the failing drive
  - build the new copy the fastest way possible, taking data from wherever
    it's available.

In any case this feature worked just fine, and I put my thoughts on the 
method out for comment. By morning the last rebuild will be done, and I 
can actually pull the bad drives (identified by serial number), hope the 
UUIDs mean the new drives can go in any slot, add another eSATA card and 
a Blu-ray burner, and be up solid.


-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Removing a failing drive from multiple arrays
  2012-04-22 18:41       ` John Robinson
@ 2012-04-26  2:37         ` Bill Davidsen
  2012-04-26  6:19           ` John Robinson
  2012-04-26  7:36           ` Brian Candler
  0 siblings, 2 replies; 15+ messages in thread
From: Bill Davidsen @ 2012-04-26  2:37 UTC (permalink / raw)
  To: Linux RAID

John Robinson wrote:
> On 20/04/2012 17:50, Mark Knecht wrote:
>> On Fri, Apr 20, 2012 at 9:31 AM, John Robinson
>> <john.robinson@anonymous.org.uk> wrote:
> [...]
>>> On small machines (3-6 drives) I will regularly have a RAID-1 /boot, RAID-10
>>> swap and RAID-5 or 6 everything else, done with partitions. I do use LVM too
>>> though - that "everything else" will be LVM over the big RAID.
>>
>> Thanks for the info John. Can you tell me what are the requirements
>> for the RAID-1 /boot? grub2? initrd? BIOS-based RAID? Something else?
>> I'm still just mirroring my boot drives - booting from sda1 but
>> copying everything to /sdb1, sdc1, etc. I think I'd like to go full
>> RAID on /boot if the requirements are too high.
>
> Pretty much any modern distro's installer will do the right thing with whatever
> boot loader it uses. Also, there are so many boot methods - BIOS/MBR, GPT, UEFI
> - that in one short email it's tricky to give advice in one short email. It's
> all out there via Google.
>
> But no, you can do a RAID-1 /boot with LILO or grub, and without BIOS RAID. The
> BIOS will boot off the first available hard drive and doesn't understand md
> RAID. grub doesn't understand md RAID either. You have to make your md RAID-1
> /boot with metadata 1.0 (or 0.90) because they have the data at the beginning so
> when you create a filesystem on the array, each individual component (partition)
> looks like it has a filesystem on it. You install grub (or LILO) onto the MBR
> (or GPT boot partition, which isn't the same as your /boot partition), pointing
> to the partition (not the md array).
>
> grub2 does understand md RAID, but has to be loaded by the BIOS, so there are
> still restrictions.
>
> Without either hardware or BIOS RAID, you can still end up being unable to boot,
> e.g. the BIOS will try to boot from the first hard drive present, but if it has
> bad sectors in the MBR or /boot partition, booting may fail even though there's
> a perfectly good mirror on the second drive, because the BIOS doesn't understand
> RAID. This has happened to me :-(
>
It doesn't need to understand RAID, just be willing to try the next item in 
the boot list on failure. My experience has been that almost every BIOS will 
try the 2nd item if the 1st fails totally (i.e. the drive isn't there). A 
_good_ BIOS will try the next item on a sector error in the MBR. Beyond that, 
the BIOS needs to understand a lot more to do anything smart after the MBR 
runs.

I have put the MBR and boot partition on a USB thumb drive because the 
failure rate of read-only flash is lower than that of rotating devices (in 
my experience). Use ext2 for /boot, with no journal, so the drive really 
does work read-only. Hopefully grub2 mounts /boot noatime.
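
Something like this is what I mean - an fstab line that keeps /boot 
read-only and noatime (the UUID is a placeholder), remounted read-write 
only for a kernel update:

    # /etc/fstab
    UUID=xxxx-xxxx  /boot  ext2  ro,noatime  1 2

    # flip it writable just long enough to install a kernel, then back
    mount -o remount,rw /boot
    mount -o remount,ro /boot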



-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Removing a failing drive from multiple arrays
  2012-04-26  2:37         ` Bill Davidsen
@ 2012-04-26  6:19           ` John Robinson
  2012-04-26  7:36           ` Brian Candler
  1 sibling, 0 replies; 15+ messages in thread
From: John Robinson @ 2012-04-26  6:19 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Linux RAID

On 26/04/2012 03:37, Bill Davidsen wrote:
> John Robinson wrote:
[...]
>> Without either hardware or BIOS RAID, you can still end up being
>> unable to boot,
>> e.g. the BIOS will try to boot from the first hard drive present, but
>> if it has
>> bad sectors in the MBR or /boot partition, booting may fail even
>> though there's
>> a perfectly good mirror on the second drive, because the BIOS doesn't
>> understand
>> RAID. This has happened to me :-(
>>
> Doesn't need to understand RAID, just be willing to try the next item in
> the boot list on failure. My experience has been that almost every BIOS
> will try the 2nd item if the 1st fails totally (ie. drive isn't there).
> A _good_ BIOS will try the next item on sector error in the MBR.

Absolutely - but in the case I had, grub couldn't load its next stage 
because of a sector error.

It probably didn't help that the weekly array scrubs don't touch the 
space between the MBR and the first partition, where that code lives.

> After that the BIOS needs to understand a lot more to do anything
> smart after the MBR runs.

Agreed; only a RAID BIOS (software e.g. IMSM or hardware RAID card) 
could have saved me from the above failure.

Cheers,

John.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Removing a failing drive from multiple arrays
  2012-04-26  2:37         ` Bill Davidsen
  2012-04-26  6:19           ` John Robinson
@ 2012-04-26  7:36           ` Brian Candler
  2012-04-26 12:59             ` Bill Davidsen
  1 sibling, 1 reply; 15+ messages in thread
From: Brian Candler @ 2012-04-26  7:36 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Linux RAID

On Wed, Apr 25, 2012 at 10:37:50PM -0400, Bill Davidsen wrote:
> I have put MBR and boot partition on a USB thumb drive because the
> failure rate of a R/O flash is lower than rotating devices (in my
> experience). Use ext2 for boot, no journal so the drive works really
> read-only. Hopefully grub2 mounts the boot noatime.

Another option, although I've not done this for a long time, is PXE boot. 
You need a DHCP server giving out the correct parameters and a TFTP server
for the kernel (and ramdisk?)
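
A minimal ISC dhcpd fragment for the PXE side might look like this (all 
addresses and names illustrative):

    subnet 192.168.1.0 netmask 255.255.255.0 {
        range 192.168.1.100 192.168.1.199;
        next-server 192.168.1.10;    # TFTP server holding the boot files
        filename "pxelinux.0";       # PXE loader, which then fetches kernel + initrd
    }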

Regards,

Brian.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Removing a failing drive from multiple arrays
  2012-04-26  7:36           ` Brian Candler
@ 2012-04-26 12:59             ` Bill Davidsen
  2012-04-26 13:23               ` Brian Candler
  0 siblings, 1 reply; 15+ messages in thread
From: Bill Davidsen @ 2012-04-26 12:59 UTC (permalink / raw)
  To: Brian Candler; +Cc: Linux RAID

Brian Candler wrote:
> On Wed, Apr 25, 2012 at 10:37:50PM -0400, Bill Davidsen wrote:
>> I have put MBR and boot partition on a USB thumb drive because the
>> failure rate of a R/O flash is lower than rotating devices (in my
>> experience). Use ext2 for boot, no journal so the drive works really
>> read-only. Hopefully grub2 mounts the boot noatime.
> Another option, although I've not done this for a long time, is PXE boot.
> You need a DHCP server giving out the correct parameters and a TFTP server
> for the kernel (and ramdisk?)
>

I think that's addressing one point of failure while adding more. The network 
(as a collection of single points of failure) and the server(s) all go in the 
"must work" category. Adding stuff in parallel is good (if one works, the 
process works); adding it in series is bad (if one fails, the process fails).

Like many people, I'm trying for five-star reliability on a three-star budget. 
I do have redundant servers and storage, but I lack up-to-the-last-transaction 
file duplication. And as I found last year, my firewall and DNS have a backup, 
but it's on a shelf, not running in parallel. That's on my list of things to 
do this summer.

Thanks for the thought, that would be a great install option, wouldn't it?

-- 
Bill Davidsen<davidsen@tmr.com>
   We are not out of the woods yet, but we know the direction and have
taken the first step. The steps are many, but finite in number, and if
we persevere we will reach our destination.  -me, 2010




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Removing a failing drive from multiple arrays
  2012-04-26 12:59             ` Bill Davidsen
@ 2012-04-26 13:23               ` Brian Candler
  2012-04-26 21:17                 ` Bill Davidsen
  0 siblings, 1 reply; 15+ messages in thread
From: Brian Candler @ 2012-04-26 13:23 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Linux RAID

On Thu, Apr 26, 2012 at 08:59:01AM -0400, Bill Davidsen wrote:
> >Another option, although I've not done this for a long time, is PXE boot.
> >You need a DHCP server giving out the correct parameters and a TFTP server
> >for the kernel (and ramdisk?)
> >
> 
> I think that's addressing one point of failure while adding more.

Alternate point of view: you need reliable DNS anyway, and it's no harder to
make reliable DHCP than reliable DNS (you just have two of them).
Furthermore, this only needs to be available at the time a server boots up.

I worked in one place where they made all the servers (which were VMs) pick
up their IPs via DHCP.  This allowed them to dump the images across to the
disaster recovery site, boot them up there, and bring them all up on new IPs
without touching any configs.  It worked well.  They ran DHCP service on the
Cisco L3 switches that the VM hosts were plugged into, on the basis that
this was the most reliable kit that they had.

Like you say, it's a balancing act of cost, complexity, and reliability, and
is also coloured by experience.  I've had not-so-good experiences with USB
thumb drives.

Regards,

Brian.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Removing a failing drive from multiple arrays
  2012-04-26 13:23               ` Brian Candler
@ 2012-04-26 21:17                 ` Bill Davidsen
  0 siblings, 0 replies; 15+ messages in thread
From: Bill Davidsen @ 2012-04-26 21:17 UTC (permalink / raw)
  To: Brian Candler; +Cc: Linux RAID

Brian Candler wrote:
> On Thu, Apr 26, 2012 at 08:59:01AM -0400, Bill Davidsen wrote:
>>> Another option, although I've not done this for a long time, is PXE boot.
>>> You need a DHCP server giving out the correct parameters and a TFTP server
>>> for the kernel (and ramdisk?)
>>>
>> I think that's addressing one point of failure while adding more.
> Alternate point of view: you need reliable DNS anyway, and it's no harder to
> make reliable DHCP than reliable DNS (you just have two of them).
> Furthermore, this only needs to be available at the time a server boots up.
>
> I worked in one place where they made all the servers (which were VMs) pick
> up their IPs via DHCP.  This allowed them to dump the images across to the
> disaster recovery site, boot them up there, and bring them all up on new IPs
> without touching any configs.  It worked well.  They ran DHCP service on the
> Cisco L3 switches that the VM hosts were plugged into, on the basis that
> this was the most reliable kit that they had.
That is why my DNS/DHCP server is so vital :-(
I have been doing my IP assignment that way for some years, using the ability 
of KVM to set the MAC of the VM so I can take a fairly standard image and 
change the machine's name and IP at boot time. I have had less luck getting 
IPv6 working "right"; at the moment the IPv6-enabled servers get their IP from 
a DNS look-up on their name, and their name from DHCP over IPv4. It's ugly but 
solid, and I'm in no rush to find out why my IPv6 DHCP is unreliable.
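
The dhcpd side is just a host entry keyed on the MAC I hand the guest 
(illustrative values; 52:54:00 is the QEMU/KVM prefix):

    host webvm1 {
        hardware ethernet 52:54:00:12:34:56;
        fixed-address 192.168.10.21;
        option host-name "webvm1";
    }
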
> Like you say, it's a balancing act of cost, complexity, and reliability, and
> is also coloured by experience.  I've had not-so-good experiences with USB
> thumb drives.
>
I've had failures on the old ones due to write-cycle fatigue. Once written, a 
drive should be stable as a read-only device. That's my thinking: if SSD is 
reliable, then thumb drives should be slow but reliable, too. That's just my 
view; feel free to provide your opinion or contradictory facts. I take little 
on faith myself.

-- 
Bill Davidsen<davidsen@tmr.com>
   We are not out of the woods yet, but we know the direction and have
taken the first step. The steps are many, but finite in number, and if
we persevere we will reach our destination.  -me, 2010




^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread

Thread overview: 15+ messages
2012-04-19 18:54 Removing a failing drive from multiple arrays Bill Davidsen
2012-04-19 21:52 ` NeilBrown
2012-04-20 14:30   ` Bill Davidsen
2012-04-22 22:33   ` Bill Davidsen
2012-04-22 22:55     ` NeilBrown
2012-04-25  0:07   ` Bill Davidsen
2012-04-20 14:35 ` John Stoffel
2012-04-20 16:31   ` John Robinson
     [not found]     ` <CAK2H+efwgznsS4==Rrtm6UE=uOb25-Q0Qm84i8yAJEJJ2JLdgg@mail.gmail.com>
2012-04-22 18:41       ` John Robinson
2012-04-26  2:37         ` Bill Davidsen
2012-04-26  6:19           ` John Robinson
2012-04-26  7:36           ` Brian Candler
2012-04-26 12:59             ` Bill Davidsen
2012-04-26 13:23               ` Brian Candler
2012-04-26 21:17                 ` Bill Davidsen
