* RAID 10 array won't assemble, all devices marked spare, confusing mdadm metadata
@ 2009-04-17 14:14 Dave Fisher
  2009-04-17 22:38 ` Neil Brown
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Fisher @ 2009-04-17 14:14 UTC (permalink / raw)
  To: linux-raid

Hi,

Please forgive me if I'm posting to the wrong list because I've
misunderstood the list's parameters.  The osdl wiki, and the list
archives it points to, suggest that this is *not* a developer-only
list. If I've got that wrong, please redirect me to somewhere more
appropriate.

I need help to diagnose a RAID 10 failure, but I'm struggling to find
anyone with genuinely authoritative knowledge of Linux software RAID who
might be willing to spare a little time to help me out.

The affected system is an Ubuntu 8.10 amd64 server, running a 2.6.27-11
kernel.

I have two RAID arrays:

[CODE]
  $ sudo mdadm --examine --scan -v
  ARRAY /dev/md0 level=raid1 num-devices=2 UUID=e1023500:94537d05:cb667a5a:bd8e784b
     spares=1   devices=/dev/sde2,/dev/sdd2,/dev/sdc2,/dev/sdb2,/dev/sda2
  ARRAY /dev/md1 level=raid10 num-devices=4 UUID=f4ddbd55:206c7f81:b855f41b:37d33d37
     spares=1   devices=/dev/sde4,/dev/sdd4,/dev/sdc4,/dev/sdb4,/dev/sda4
[/CODE]

/dev/md1 doesn't assemble on boot, and I can't assemble it manually (although
that might be because I don't know how to):

[CODE]
  $ sudo mdadm --assemble /dev/md1 /dev/sda4 /dev/sdb4 /dev/sdc4 /dev/sdd4 /dev/sde4
  mdadm: /dev/md1 assembled from 1 drive and 1 spare - not enough to start the array.
[/CODE]

As you can see from mdstat, the kernel appears to have marked all the
partitions in /dev/md1 as spare (S):

[CODE]
  $ cat /proc/mdstat
  Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
  md1 : inactive sda4[0](S) sde4[4](S) sdd4[3](S) sdc4[2](S) sdb4[1](S)
        4829419520 blocks
         
  md0 : active raid1 sda2[0] sde2[2](S) sdb2[1]
        9767424 blocks [2/2] [UU]
        
  unused devices: <none>
[/CODE]

This is clearly wrong.  The /dev/md1 array should have 4 members plus one spare. 

When I examine the partitions in /dev/md1, the messages are confusing, and seem contradictory: 

[CODE]
$ sudo mdadm -E /dev/sd{a,b,c,d,e}4
/dev/sda4:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
  Creation Time : Tue May  6 02:06:45 2008
     Raid Level : raid10
  Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
     Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
   Raid Devices : 4
  Total Devices : 5
Preferred Minor : 1

    Update Time : Tue Apr 14 00:45:27 2009
          State : active
 Active Devices : 3
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 1
       Checksum : 7a3576c1 - correct
         Events : 221

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8       20        0      active sync   /dev/sdb4

   0     0       8       20        0      active sync   /dev/sdb4
   1     1       8       36        1      active sync   /dev/sdc4
   2     2       0        0        2      faulty removed
   3     3       8       68        3      active sync   /dev/sde4
   4     4       8       84        4      spare
/dev/sdb4:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
  Creation Time : Tue May  6 02:06:45 2008
     Raid Level : raid10
  Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
     Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
   Raid Devices : 4
  Total Devices : 5
Preferred Minor : 1

    Update Time : Tue Apr 14 00:44:13 2009
          State : active
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1
       Checksum : 7a35767a - correct
         Events : 219

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8       36        1      active sync   /dev/sdc4

   0     0       8       20        0      active sync   /dev/sdb4
   1     1       8       36        1      active sync   /dev/sdc4
   2     2       8       52        2      active sync   /dev/sdd4
   3     3       8       68        3      active sync   /dev/sde4
   4     4       8       84        4      spare
/dev/sdc4:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
  Creation Time : Tue May  6 02:06:45 2008
     Raid Level : raid10
  Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
     Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
   Raid Devices : 4
  Total Devices : 5
Preferred Minor : 1

    Update Time : Tue Apr 14 00:44:13 2009
          State : active
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1
       Checksum : 7a35768c - correct
         Events : 219

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       52        2      active sync   /dev/sdd4

   0     0       8       20        0      active sync   /dev/sdb4
   1     1       8       36        1      active sync   /dev/sdc4
   2     2       8       52        2      active sync   /dev/sdd4
   3     3       8       68        3      active sync   /dev/sde4
   4     4       8       84        4      spare
/dev/sdd4:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
  Creation Time : Tue May  6 02:06:45 2008
     Raid Level : raid10
  Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
     Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
   Raid Devices : 4
  Total Devices : 5
Preferred Minor : 1

    Update Time : Tue Apr 14 00:44:13 2009
          State : active
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1
       Checksum : 7a35769e - correct
         Events : 219

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8       68        3      active sync   /dev/sde4

   0     0       8       20        0      active sync   /dev/sdb4
   1     1       8       36        1      active sync   /dev/sdc4
   2     2       8       52        2      active sync   /dev/sdd4
   3     3       8       68        3      active sync   /dev/sde4
   4     4       8       84        4      spare
/dev/sde4:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
  Creation Time : Tue May  6 02:06:45 2008
     Raid Level : raid10
  Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
     Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
   Raid Devices : 4
  Total Devices : 5
Preferred Minor : 1

    Update Time : Fri Apr 10 16:43:47 2009
          State : clean
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1
       Checksum : 7a31126a - correct
         Events : 218

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     4       8       84        4      spare

   0     0       8       20        0      active sync   /dev/sdb4
   1     1       8       36        1      active sync   /dev/sdc4
   2     2       8       52        2      active sync   /dev/sdd4
   3     3       8       68        3      active sync   /dev/sde4
   4     4       8       84        4      spare
[/CODE]

As you can see, sda4 is not explicitly listed in any of the tables above.

I am guessing that this is because mdadm thinks that sda4 is the actual spare.  

I'm not sure that mdadm is correct.  Based on the fdisk readouts shown below,
and my (admittedly imperfect human) memory, I think that sde4 should be the
spare.

Another confusing thing is that mdadm --examine for sda4 produces results that
appear to contradict the examinations of sdb4, sdc4, sdd4, and sde4.

The results for sda4 show one partition (apparently sdd4) to be "faulty
removed", but the other four examinations show sdd4 as "active sync".
Examining sda4 also shows 1 "Failed" device, whereas the remaining 4
examinations show no failures.

On the other hand, sda4, sdb4, sdc4, and sdd4 are shown as "State: active"
whereas sde4 is shown as "State: clean".  
 
The checksums look OK.

All the devices have the same UUID, which I presume is the UUID for /dev/md1.

I haven't seen any obvious signs of hardware failure, although the following do
appear in my syslog:

[CODE]
  Apr 14 23:16:43 lonsdale mdadm[22524]: DeviceDisappeared event detected on md device /dev/md1
  Apr 14 23:23:21 lonsdale mdadm[10050]: DeviceDisappeared event detected on md device /dev/md1
  Apr 15 13:45:52 lonsdale mdadm[6780]: DeviceDisappeared event detected on md device /dev/md1
[/CODE]

I cannot afford to mess around with /dev/md1 because it contains a small amount
of business-critical data for which no totally clean back-ups exist, i.e. I
don't want to do anything that could conceivably overwrite this data until I'm
certain that I've correctly diagnosed the problem.

So please suggest non-destructive diagnostic procedures that I can follow to
narrow down the problem.
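
So far I've only dared run commands that, as far as I understand, read
the metadata without writing anything, along the lines of:

[CODE]
  $ cat /proc/mdstat
  $ sudo mdadm --examine /dev/sd[abcde]4
  $ sudo mdadm --detail /dev/md1
[/CODE]

If any of those are less harmless than I assume, please say so.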

A further complication is that /dev/md1 contains my /home, /var, and /tmp
filesystems, so I can't update my kernel, initrd or mdadm via a normal apt
install.


For what it's worth, here's my /etc/mdadm/mdadm.conf, /etc/fstab, and fdisk output:

mdadm.conf
[CODE]
  DEVICE partitions
  CREATE owner=root group=disk mode=0660 auto=yes
  HOMEHOST <system>
  MAILADDR root
  # definitions of existing MD arrays
  ARRAY /dev/md0 level=raid1 num-devices=2 UUID=e1023500:94537d05:cb667a5a:bd8e784b
  ARRAY /dev/md0 level=raid1 num-devices=2 UUID=8643e320:2d6a0e4d:49d52491:42504c27
  ARRAY /dev/md1 level=raid10 num-devices=4 UUID=f4ddbd55:206c7f81:b855f41b:37d33d37
     spares=1
[/CODE]

fstab:
[CODE]
cat /etc/fstab 
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
proc            /proc           proc    defaults        0       0
# /dev/md0
UUID=5d494f0c-d723-4f15-90d6-b4d08e5fd059 /               ext3    relatime,errors=remount-ro 0       1
# /dev/sda1
UUID=2968bbbe-223f-490f-869e-1312dabdaf18 /boot           ext2    relatime        0       2
# /dev/mapper/vg--data1-lv--home on /dev/md1
UUID=8b824f93-e686-4f08-9ec2-76e754d8f06f /home           ext3    relatime        0       2
# /dev/mapper/vg--data1-lv--tmp on /dev/md1
UUID=dee6072f-ca1c-462f-9730-c277e3f8b8d9 /tmp            ext3    relatime        0       2
# /dev/mapper/vg--data1-lv--var on /dev/md1
UUID=03600db0-f72f-4021-9bb2-b8cb19f3a2a0 /var            ext3    relatime        0       2
[/CODE]

fdisk
[CODE]
Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0004b119

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          16      128488+  83  Linux
/dev/sda2              17        1232     9767520   fd  Linux RAID autodetect
/dev/sda3            1233        1354      979965   82  Linux swap / Solaris
/dev/sda4            1355      121601   965884027+  fd  Linux RAID autodetect


Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x000aca3f

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1          16      128488+  83  Linux
/dev/sdb2              17        1232     9767520   fd  Linux RAID autodetect
/dev/sdb3            1233        1354      979965   82  Linux swap / Solaris
/dev/sdb4            1355      121601   965884027+  fd  Linux RAID autodetect

Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00052d44


   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1   *           1          16      128488+  83  Linux
/dev/sdc2              17        1232     9767520   83  Linux
/dev/sdc3            1233        1354      979965   82  Linux swap / Solaris
/dev/sdc4            1355      121601   965884027+  fd  Linux RAID autodetect


Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00055ffb

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1   *           1          16      128488+  83  Linux
/dev/sdd2              17        1232     9767520   83  Linux
/dev/sdd3            1233        1354      979965   82  Linux swap / Solaris
/dev/sdd4            1355      121601   965884027+  fd  Linux RAID autodetect


Disk /dev/sde: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x000249a1

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1          16      128488+  83  Linux
/dev/sde2              17        1232     9767520   83  Linux
/dev/sde3            1233        1354      979965   82  Linux swap / Solaris
/dev/sde4            1355      121601   965884027+  fd  Linux RAID autodetect
[/CODE]

Finally, I should point out that some of the data on the dodgy /dev/md1 is
absolutely vital and needs to be accessed urgently.

If the worst comes to the worst and I can't assemble, re-sync or re-build the
array non-destructively, I'd appreciate any advice you could give on any
possibilities that might exist for retrieving the data by reading the raw disks. 
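
One idea I've had, though I have no idea whether it is sound, is to dd
each member partition to an image file elsewhere and then experiment on
loop devices built from the copies rather than on the real disks,
something like (destination paths are hypothetical):

[CODE]
  $ sudo dd if=/dev/sda4 of=/backup/sda4.img bs=1M conv=noerror,sync
  $ sudo losetup /dev/loop0 /backup/sda4.img
[/CODE]

Is that a sensible way to experiment without risking the originals?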

UPDATES:

Since I started writing this message I've rebooted the machine from an
Ubuntu Jaunty alternate install disc and got radically different results
from "cat /proc/mdstat".

Now, mdadm thinks that /dev/md1 only consists of one partition
(/dev/sda4), marking the other 3 unidentified members as removed.

Moreover, if I try to assemble with:

[CODE]
  $ mdadm --assemble /dev/md1 /dev/sda4 /dev/sdb4 /dev/sdc4 /dev/sdd4 /dev/sde4
[/CODE]

It says that it can't assemble because md1 is currently active,
whereas /proc/mdstat says that it is inactive.
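
I'm guessing that I need to stop the half-assembled (inactive) array
before I can re-assemble it, i.e. something like:

[CODE]
  $ mdadm --stop /dev/md1
[/CODE]

but I haven't dared try that without advice.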

Any ideas?

N.B. I am having difficulty getting remote access to/from the machine
concerned, so copy-and-pasted output is currently unavailable.

I've also seen this bug report, which appears to describe similar
symptoms:

   http://bugzilla.kernel.org/show_bug.cgi?id=11967

Dave


* Re: RAID 10 array won't assemble, all devices marked spare, confusing mdadm metadata
  2009-04-17 14:14 RAID 10 array won't assemble, all devices marked spare, confusing mdadm metadata Dave Fisher
@ 2009-04-17 22:38 ` Neil Brown
  2009-04-18 14:43   ` Dave Fisher
  2009-04-18 20:13   ` RAID 10 array won't assemble ... spare ... metadata - Disappointing Report Back Dave Fisher
  0 siblings, 2 replies; 5+ messages in thread
From: Neil Brown @ 2009-04-17 22:38 UTC (permalink / raw)
  To: Dave Fisher; +Cc: linux-raid

On Friday April 17, davef@davefisher.co.uk wrote:
> Hi,
> 
> Please forgive me if I'm posting to the wrong list because I've
> misunderstood the list's parameters.  The osdl wiki, and the list
> archives it points to, suggest that this is *not* a developer-only
> list. If I've got that wrong, please redirect me to somewhere more
> appropriate.

No, this is not developer only.  Not at all.

> 
> I need help to diagnose a RAID 10 failure, but I'm struggling to find
> anyone with genuinely authoritative knowledge of Linux software RAID who
> might be willing to spare a little time to help me out.

You've come to the right place.


> 
> The affected system is an Ubuntu 8.10 amd64 server, running a 2.6.27-11
> kernel.
> 
> I have two RAID arrays:
> 
> [CODE]
>   $ sudo mdadm --examine --scan -v
>   ARRAY /dev/md0 level=raid1 num-devices=2 UUID=e1023500:94537d05:cb667a5a:bd8e784b
>      spares=1   devices=/dev/sde2,/dev/sdd2,/dev/sdc2,/dev/sdb2,/dev/sda2
>   ARRAY /dev/md1 level=raid10 num-devices=4 UUID=f4ddbd55:206c7f81:b855f41b:37d33d37
>      spares=1   devices=/dev/sde4,/dev/sdd4,/dev/sdc4,/dev/sdb4,/dev/sda4
> [/CODE]
> 
> /dev/md1 doesn't assemble on boot, and I can't assemble it manually (although
> that might be because I don't know how to):
> 
> [CODE]
>   $ sudo mdadm --assemble /dev/md1 /dev/sda4 /dev/sdb4 /dev/sdc4 /dev/sdd4 /dev/sde4
>   mdadm: /dev/md1 assembled from 1 drive and 1 spare - not enough to start the array.
> [/CODE]

Extracting a summary from the
> $ sudo mdadm -E /dev/sd{a,b,c,d,e}4
information you provided (thanks for being thorough),

> [CODE]
> $ sudo mdadm -E /dev/sd{a,b,c,d,e}4
> /dev/sda4:
>     Update Time : Tue Apr 14 00:45:27 2009
>           State : active
>          Events : 221
> /dev/sdb4:
>     Update Time : Tue Apr 14 00:44:13 2009
>           State : active
>          Events : 219
> /dev/sdc4:
>     Update Time : Tue Apr 14 00:44:13 2009
>           State : active
>          Events : 219
> /dev/sdd4:
>     Update Time : Tue Apr 14 00:44:13 2009
>           State : active
>          Events : 219
> /dev/sde4:
>     Update Time : Fri Apr 10 16:43:47 2009
>           State : clean
>          Events : 218
> [/CODE]

So sda4 is the most up-to-date.  sd[bcd]4 were updated 74 seconds
earlier, and sde4 has not been updated for 4 days.
So it looks like sde4 failed first, and then when it
tried to update the metadata on the array, sda4 worked but all the
rest failed to get updated.

Given that sda4 thinks the array is still intact, as we can see from

>       Number   Major   Minor   RaidDevice State
> this     0       8       20        0      active sync   /dev/sdb4
> 
>    0     0       8       20        0      active sync   /dev/sdb4
>    1     1       8       36        1      active sync   /dev/sdc4
>    2     2       0        0        2      faulty removed
>    3     3       8       68        3      active sync   /dev/sde4
>    4     4       8       84        4      spare

Your data is safe and we can get it back.
Note that the device currently known as /dev/sda4 thought, last time
the metadata was updated, that its name was /dev/sdb4.  This suggests
some rearrangement of devices has happened.  This can be confusing,
but mdadm copes without any problem.
To restart your array, simply use the "--force" flag.
It might be valuable to also add "--verbose" so you can see what is
happening.
So:

  mdadm -S /dev/md1
  mdadm -A /dev/md1 -fv /dev/sd[abcde]4

and report the result.
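
Once it is running you can keep an eye on the resync with

  cat /proc/mdstat
  mdadm --detail /dev/md1

both of which only report status and change nothing.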


> 
> As you can see from mdstat, the kernel appears to have marked all the
> partitions in /dev/md1 as spare (S):
> 
> [CODE]
>   $ cat /proc/mdstat
>   Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
>   md1 : inactive sda4[0](S) sde4[4](S) sdd4[3](S) sdc4[2](S) sdb4[1](S)
>         4829419520 blocks

When an array is 'inactive', everything is 'spare'.  It's an internal
implementation detail.  Probably confusing, yes.

> 
> As you can see, sda4 is not explicitly listed in any of the tables above.

This is presumably because the device names changed between the times the tables were written.

> 
> I am guessing that this is because mdadm thinks that sda4 is the actual spare.  
> 
> I'm not sure that mdadm is correct.  Based on the fdisk readouts shown below,
> and my (admittedly imperfect human) memory, I think that sde4 should be the
> spare
> 
> Another confusing thing is that mdadm --examine for sda4 produces results that
> appear to contradict the examinations of sdb4, sdc4, sdd4, and sde4.
> 
> The results for sda4 show one partition (apparently sdd4) to be "faulty
> removed", but the other four examinations show sdd4 as "active sync".
> Examining sda4 also shows 1 "Failed" device, whereas the remaining 4
> examinations show no failures.
> 
> On the other hand, sda4, sdb4, sdc4, and sdd4 are shown as "State: active"
> whereas sde4 is shown as "State: clean".  

So this is what probably happened:
 The array has been idle, or read-only, since April 10, so no metadata
 updates happened and all devices were at "Events 218" and were 'clean'.
 At Apr 14 00:44:13, something tried to write to the array so it had
 to be marked 'active'.  This involves writing the metadata to every
 device and updating the Event count to 219.
 This worked for 3 of the 4 devices in the array.  For the 4th
 (currently called /dev/sde4, but at the time called /dev/sdd4) the
 metadata update failed.
 So md/raid10 decided that device was faulty and so tried to update
 the metadata to record this fact.
 That metadata update succeeded for one device, currently called sda4
 but at the time called sdb4.  For the rest of the devices the
 update failed.  At this point the array would (understandably) no
 longer work.

 mdadm can fix this up for you with the "-f" flag.  It will probably
 trigger a resync, just to be on the safe side.  But all your data
 will be there.

NeilBrown


* Re: RAID 10 array won't assemble, all devices marked spare, confusing mdadm metadata
  2009-04-17 22:38 ` Neil Brown
@ 2009-04-18 14:43   ` Dave Fisher
  2009-04-18 20:13   ` RAID 10 array won't assemble ... spare ... metadata - Disappointing Report Back Dave Fisher
  1 sibling, 0 replies; 5+ messages in thread
From: Dave Fisher @ 2009-04-18 14:43 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

On Sat, Apr 18, 2009 at 08:38:59AM +1000, Neil Brown wrote:
> To restart your array, simple use the "--force" flag.
> It might be valuable to also add "--verbose" so you can see what is
> happening.
> So:
> 
>   mdadm -S /dev/md1
>   mdadm -A /dev/md1 -fv /dev/sd[abcde]4
> 
> and report the result.

Hi Neil,

Thanks for the kind and exceptionally helpful response. It was much
appreciated.  It greatly improved my understanding of both the problem
and md generally.

As it happens, I'd issued the following command some time before your
message arrived:

  $ mdadm --assemble --run /dev/md1 /dev/sd[bcde]4

Fortunately, my impatience (after nearly a week of grief) was not
rewarded by the punishment it probably deserved ;-) [Note 1]

After many more hours /dev/md1 had resynched and recovered, using sde4
as the spare.

The system then rebooted perfectly.

Unfortunately, I then stupidly installed a new kernel, forgetting that
Debian/Ubuntu would then update grub.  

So now I can boot kernels which reside on /dev/md0 from grub, but they
don't find the root file system, which is also on /dev/md0.  

Needless to say, I'd find this a little easier to understand if the
booting kernel didn't actually sit on the root filesystem itself.

Obviously I need to fix grub, but I'm not sure how.  I'm guessing that I
have to do one or both of the following:

  1. Tweak the root parameters for grub or the kernel in menu.lst

  2. Update/reinstall parts of the bootloader on my MBRs or partitions.  

Sadly, my understanding of grub is a bit flaky, and my understanding of
how it handles raid arrays is even flakier.
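
For what it's worth, my guess at option 1 is a menu.lst stanza roughly
like the following, where the kernel flavour name and the (hd0,0)
mapping are guesses on my part:

  title  Ubuntu 8.10, 2.6.27-11
  root   (hd0,0)
  kernel /vmlinuz-2.6.27-11-server root=/dev/md0 ro
  initrd /initrd.img-2.6.27-11-server

and my guess at option 2 is re-running grub-install against each MBR
(e.g. grub-install /dev/sda), but I may be completely wrong about both.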

I'd appreciate some advice, but fully understand that the new problem
is slightly off-topic for this list, so I'm not expecting anything.

By the way, once I've sorted grub (and backed up my business-critical
data), I'll run your suggested command on the original disks and report
back.

In the real world I have decades of experience as a technical writer and
teacher.  So, once I've educated myself a bit more, I'd like to
contribute by helping to improve the fragmented and outdated
documentation that I've found over the course of the last week. 

Best wishes,

Dave

[Note 1] I should add that my impatient act was not 100% reckless,
since I had previously dd'd all 5 1TB disks and was operating only on
the copies.





* Re: RAID 10 array won't assemble ... spare ... metadata - Disappointing Report Back
  2009-04-17 22:38 ` Neil Brown
  2009-04-18 14:43   ` Dave Fisher
@ 2009-04-18 20:13   ` Dave Fisher
  2009-04-18 20:20     ` RAID 10 array won't assemble ... spare ... metadata - Correction " Dave Fisher
  1 sibling, 1 reply; 5+ messages in thread
From: Dave Fisher @ 2009-04-18 20:13 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

On Sat, Apr 18, 2009 at 08:38:59AM +1000, Neil Brown wrote:
> So:
> 
>   mdadm -S /dev/md1
>   mdadm -A /dev/md1 -fv /dev/sd[abcde]4
> 
> and report the result.

Output transcribed by hand:

    mdadm: looking for devices for /dev/md1
    mdadm: /dev/sda4 is identified as a member of /dev/md1 slot 0
    mdadm: /dev/sdb4 is identified as a member of /dev/md1 slot 1
    mdadm: /dev/sdc4 is identified as a member of /dev/md1 slot 2
    mdadm: /dev/sdd4 is identified as a member of /dev/md1 slot 3
    mdadm: /dev/sde4 is identified as a member of /dev/md1 slot 4
    mdadm: forcing event count in /dev/sdb4(1) from 219 upto 221

But when I ran mdadm -E on each of the partitions the counts were
completely unchanged, i.e. sd[bcde]4 were still all at 219.

sda4 was still at 221

Everything else looked exactly the same, including the summary tables at
the end.
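
In case my method of checking is at fault, what I ran on each member
was roughly the following (transcribed from memory):

  mdadm -E /dev/sd[abcde]4 | grep -E 'Update Time|Events'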

Any thoughts?

Do I need to start/run the array to effect the changes?

Dave


* Re: RAID 10 array won't assemble ... spare ... metadata - Correction Disappointing Report Back
  2009-04-18 20:13   ` RAID 10 array won't assemble ... spare ... metadata - Disappointing Report Back Dave Fisher
@ 2009-04-18 20:20     ` Dave Fisher
  0 siblings, 0 replies; 5+ messages in thread
From: Dave Fisher @ 2009-04-18 20:20 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

On Sat, Apr 18, 2009 at 09:13:28PM +0100, Dave Fisher wrote:
> But when I ran mdadm -E on each of the partitions the counts were
> completely unchanged, i.e. sd[bcde]4 were still all at 219.
> 
> sda4 was still at 221

The above should have read like so:

But when I ran mdadm -E on each of the partitions the counts were
completely unchanged, i.e. 
  
  sd[bcd]4 were still all at 219
  
  sde4 was still at 218 
  
  sda4 was still at 221

Dave

