* mdadm ignoring X as it reports Y as failed
@ 2013-07-06 19:06 Marek Jaros
  2013-07-08  1:28 ` NeilBrown
  2013-07-08  3:58 ` NeilBrown
  0 siblings, 2 replies; 3+ messages in thread
From: Marek Jaros @ 2013-07-06 19:06 UTC (permalink / raw)
  To: linux-raid

Hey everybody.

To keep it short: I have a RAID-5 md array, and today 2 of its 5 drives
dropped out. It was a cable issue and has since been fixed. The array was
not being written to or otherwise utilized, so no data has been lost.

However, when I attempted to reassemble the array with

mdadm --assemble --force --verbose /dev/md0 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg

I got the following errors:

mdadm: looking for devices for /dev/md0
mdadm: /dev/sdc is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdd is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sde is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdf is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdg is identified as a member of /dev/md0, slot 4.
mdadm: ignoring /dev/sde as it reports /dev/sdc as failed
mdadm: ignoring /dev/sdf as it reports /dev/sdc as failed
mdadm: ignoring /dev/sdg as it reports /dev/sdc as failed
mdadm: added /dev/sdd to /dev/md0 as 1
mdadm: no uptodate device for slot 2 of /dev/md0
mdadm: no uptodate device for slot 3 of /dev/md0
mdadm: no uptodate device for slot 4 of /dev/md0
mdadm: added /dev/sdc to /dev/md0 as 0
mdadm: /dev/md0 assembled from 2 drives - not enough to start the array.


After running --examine* I indeed found that the Array State info inside
the superblocks has the first two drives marked as missing. That is no
longer true, but I can't force mdadm to assemble the array or update the
superblock info.

So is there any way to force mdadm to assemble the array? Or perhaps
edit the superblock info manually? I'd rather avoid having to recreate
the array from scratch.

Any help or pointers to more info are highly appreciated. Thank you.


Regards,

Marek Jaros


*--examine output here:

/dev/sdc:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : 43222604:a15a957d:313d7b6c:d087ba6a
            Name : YSGARD:0
   Creation Time : Thu Apr 25 12:38:17 2013
      Raid Level : raid5
    Raid Devices : 5

  Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
      Array Size : 3906525184 (3725.55 GiB 4000.28 GB)
   Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
           State : clean
     Device UUID : d2a4d8ed:b2c2bf12:6d553d16:7aabe804

     Update Time : Sat Jul  6 14:43:14 2013
        Checksum : 1b569235 - correct
          Events : 2742

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 0
    Array State : AAAAA ('A' == active, '.' == missing)
/dev/sdd:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : 43222604:a15a957d:313d7b6c:d087ba6a
            Name : YSGARD:0
   Creation Time : Thu Apr 25 12:38:17 2013
      Raid Level : raid5
    Raid Devices : 5

  Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
      Array Size : 3906525184 (3725.55 GiB 4000.28 GB)
   Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
           State : clean
     Device UUID : f729ddca:adb2eaec:748ad0b6:0e321903

     Update Time : Sat Jul  6 14:29:42 2013
        Checksum : 7149bfc6 - correct
          Events : 2742

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 1
    Array State : AAAAA ('A' == active, '.' == missing)
/dev/sde:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : 43222604:a15a957d:313d7b6c:d087ba6a
            Name : YSGARD:0
   Creation Time : Thu Apr 25 12:38:17 2013
      Raid Level : raid5
    Raid Devices : 5

  Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
      Array Size : 3906525184 (3725.55 GiB 4000.28 GB)
   Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
           State : clean
     Device UUID : 83aea2f7:dc6a7d68:0d875248:c54d2bc9

     Update Time : Sat Jul  6 14:46:15 2013
        Checksum : 713418b2 - correct
          Events : 2742

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 2
    Array State : ..AAA ('A' == active, '.' == missing)
/dev/sdg:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : 43222604:a15a957d:313d7b6c:d087ba6a
            Name : YSGARD:0
   Creation Time : Thu Apr 25 12:38:17 2013
      Raid Level : raid5
    Raid Devices : 5

  Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
      Array Size : 3906525184 (3725.55 GiB 4000.28 GB)
   Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
           State : clean
     Device UUID : b8295773:a4a73e18:6a097100:15ecc829

     Update Time : Sat Jul  6 14:46:15 2013
        Checksum : b565f15d - correct
          Events : 2742

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 4
    Array State : ..AAA ('A' == active, '.' == missing)

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.


* Re: mdadm ignoring X as it reports Y as failed
  2013-07-06 19:06 mdadm ignoring X as it reports Y as failed Marek Jaros
@ 2013-07-08  1:28 ` NeilBrown
  2013-07-08  3:58 ` NeilBrown
  1 sibling, 0 replies; 3+ messages in thread
From: NeilBrown @ 2013-07-08  1:28 UTC (permalink / raw)
  To: Marek Jaros; +Cc: linux-raid


On Sat, 06 Jul 2013 21:06:14 +0200 "Marek Jaros" <mjaros1@nbox.cz> wrote:

> Hey everybody.
> 
> To keep it short: I have a RAID-5 md array, and today 2 of its 5 drives
> dropped out. It was a cable issue and has since been fixed. The array was
> not being written to or otherwise utilized, so no data has been lost.
> 
> However, when I attempted to reassemble the array with
> 
> mdadm --assemble --force --verbose /dev/md0 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
> 
> I got the following errors:
> 
> mdadm: looking for devices for /dev/md0
> mdadm: /dev/sdc is identified as a member of /dev/md0, slot 0.
> mdadm: /dev/sdd is identified as a member of /dev/md0, slot 1.
> mdadm: /dev/sde is identified as a member of /dev/md0, slot 2.
> mdadm: /dev/sdf is identified as a member of /dev/md0, slot 3.
> mdadm: /dev/sdg is identified as a member of /dev/md0, slot 4.
> mdadm: ignoring /dev/sde as it reports /dev/sdc as failed
> mdadm: ignoring /dev/sdf as it reports /dev/sdc as failed
> mdadm: ignoring /dev/sdg as it reports /dev/sdc as failed
> mdadm: added /dev/sdd to /dev/md0 as 1
> mdadm: no uptodate device for slot 2 of /dev/md0
> mdadm: no uptodate device for slot 3 of /dev/md0
> mdadm: no uptodate device for slot 4 of /dev/md0
> mdadm: added /dev/sdc to /dev/md0 as 0
> mdadm: /dev/md0 assembled from 2 drives - not enough to start the array.
> 

Try the same assemble command, but rearrange the devices so that one of sde,
sdf, or sdg is first.  That might work.

The superblocks all have the same event count but report different things
about which devices have failed.  That shouldn't really happen, but it
obviously did.  I'll see if I can figure out how to make mdadm cope
correctly with this situation.
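That order-dependence can be sketched with a small toy model (an editorial illustration in Python, not mdadm's actual code; the slot numbers and Array State strings are taken from the --examine output in the original mail): with every superblock tied at the same event count, the first device listed becomes the reference, and any device whose Array State marks the reference's slot as failed is ignored.

```python
# Toy model (NOT mdadm's actual code) of the ignore rule seen in the log.
# All superblocks share Events=2742, so the first device given becomes
# the reference; devices whose Array State marks the reference's slot as
# '.' (missing) are then ignored.

def assemble(devs):
    """devs: (name, slot, array_state) tuples in command-line order.
    Returns the device names that survive the ignore rule."""
    ref_slot = devs[0][1]  # first listed device wins the event-count tie
    return [name for name, _slot, state in devs if state[ref_slot] != '.']

devs = [('sdc', 0, 'AAAAA'), ('sdd', 1, 'AAAAA'),
        ('sde', 2, '..AAA'), ('sdf', 3, '..AAA'), ('sdg', 4, '..AAA')]

# sdc first: sde/sdf/sdg mark slot 0 as missing and are ignored, leaving
# only 2 of 5 members -- matching the failed assembly in the log.
print(assemble(devs))              # ['sdc', 'sdd']

# sde first: no superblock marks slot 2 as missing, so nothing is ignored.
print(assemble(devs[2:] + devs[:2]))  # all five devices
```

The second ordering is why leading with sde/sdf/sdg might let the assembly through.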

Thanks for the report,

NeilBrown


> 
> After running --examine* I indeed found that the Array State info inside
> the superblocks has the first two drives marked as missing. That is no
> longer true, but I can't force mdadm to assemble the array or update the
> superblock info.
> 
> So is there any way to force mdadm to assemble the array? Or perhaps
> edit the superblock info manually? I'd rather avoid having to recreate
> the array from scratch.
> 
> Any help or pointers to more info are highly appreciated. Thank you.
> 
> 
> Regards,
> 
> Marek Jaros
> 
> 
> [--examine output and list footer snipped]




* Re: mdadm ignoring X as it reports Y as failed
  2013-07-06 19:06 mdadm ignoring X as it reports Y as failed Marek Jaros
  2013-07-08  1:28 ` NeilBrown
@ 2013-07-08  3:58 ` NeilBrown
  1 sibling, 0 replies; 3+ messages in thread
From: NeilBrown @ 2013-07-08  3:58 UTC (permalink / raw)
  To: Marek Jaros; +Cc: linux-raid


On Sat, 06 Jul 2013 21:06:14 +0200 "Marek Jaros" <mjaros1@nbox.cz> wrote:

> Hey everybody.
> 
> To keep it short: I have a RAID-5 md array, and today 2 of its 5 drives
> dropped out. It was a cable issue and has since been fixed. The array was
> not being written to or otherwise utilized, so no data has been lost.
> 
> However, when I attempted to reassemble the array with
> 
> mdadm --assemble --force --verbose /dev/md0 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
> 
> I got the following errors:
> 
> mdadm: looking for devices for /dev/md0
> mdadm: /dev/sdc is identified as a member of /dev/md0, slot 0.
> mdadm: /dev/sdd is identified as a member of /dev/md0, slot 1.
> mdadm: /dev/sde is identified as a member of /dev/md0, slot 2.
> mdadm: /dev/sdf is identified as a member of /dev/md0, slot 3.
> mdadm: /dev/sdg is identified as a member of /dev/md0, slot 4.
> mdadm: ignoring /dev/sde as it reports /dev/sdc as failed
> mdadm: ignoring /dev/sdf as it reports /dev/sdc as failed
> mdadm: ignoring /dev/sdg as it reports /dev/sdc as failed
> mdadm: added /dev/sdd to /dev/md0 as 1
> mdadm: no uptodate device for slot 2 of /dev/md0
> mdadm: no uptodate device for slot 3 of /dev/md0
> mdadm: no uptodate device for slot 4 of /dev/md0
> mdadm: added /dev/sdc to /dev/md0 as 0
> mdadm: /dev/md0 assembled from 2 drives - not enough to start the array.
> 
> 
> After running --examine* I indeed found that the Array State info inside
> the superblocks has the first two drives marked as missing. That is no
> longer true, but I can't force mdadm to assemble the array or update the
> superblock info.
> 
> So is there any way to force mdadm to assemble the array? Or perhaps
> edit the superblock info manually? I'd rather avoid having to recreate
> the array from scratch.
> 
> Any help or pointers to more info are highly appreciated. Thank you.
> 

Hi again,
 could you tell me what kernel you are running?  Because as far as I can tell
 the state of the devices that you reported is impossible!

The interesting bit of the --examine output is:

/dev/sdc:
     Update Time : Sat Jul  6 14:43:14 2013
          Events : 2742
    Array State : AAAAA ('A' == active, '.' == missing)
/dev/sdd:
     Update Time : Sat Jul  6 14:29:42 2013
          Events : 2742
    Array State : AAAAA ('A' == active, '.' == missing)
/dev/sde:
     Update Time : Sat Jul  6 14:46:15 2013
          Events : 2742
    Array State : ..AAA ('A' == active, '.' == missing)
/dev/sdg:
     Update Time : Sat Jul  6 14:46:15 2013
          Events : 2742
    Array State : ..AAA ('A' == active, '.' == missing)

 From this I can see that:
   at 14:29:42 everything was fine and all the superblocks were updated.
   at 14:43:14 everything still seemed to be fine and md tried to update the
               superblocks again (it does that from time to time) but failed
               to write to /dev/sdd.  That would have triggered an error, so
               md would have marked sdd as faulty and updated the superblocks
               again.  Probably when it tried, it found that the write to sdc
               failed too, so it marked that as faulty and tried again.
   at 14:46:15 it wrote out metadata to sde and sdg reporting that sdc and sdd
               were faulty.

Every time md updates the superblock while the array is degraded, it must
increment the 'Events' count.  However, the Events count at 14:46:15 (after
two devices had failed) is the same as it was at 14:43:14, before anything
had failed.  That is really wrong.
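Stated as an invariant (an editorial sketch in Python, using the Events and Array State values quoted above, not anything from md itself): superblocks that share an Events count must agree on Array State, because marking a device faulty is itself a superblock update that increments Events.

```python
# Sanity check on the --examine data: superblocks sharing an Events count
# should agree on Array State, since a faulty-marking update bumps Events.
examine = {
    'sdc': (2742, 'AAAAA'),
    'sdd': (2742, 'AAAAA'),
    'sde': (2742, '..AAA'),
    'sdg': (2742, '..AAA'),
}

def consistent(superblocks):
    """True if every group of superblocks with the same Events value
    records the same Array State."""
    states_by_events = {}
    for events, state in superblocks.values():
        states_by_events.setdefault(events, set()).add(state)
    return all(len(states) == 1 for states in states_by_events.values())

print(consistent(examine))  # False: 2742 maps to both 'AAAAA' and '..AAA'
```

The check fails on the reported data, which is exactly the contradiction described here.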

Hence the question.  I need to know if this is a bug that has already been
fixed (I cannot find a fix, but you never know), or if the bug is still
present and I need to hunt some more.

Thanks,
NeilBrown


