* md's fail to assemble correctly consistently at system startup - mdadm 3.1.2 and Ubuntu 10.04
@ 2010-08-08  1:27 fibreraid
  2010-08-08  8:58 ` Neil Brown
  0 siblings, 1 reply; 11+ messages in thread
From: fibreraid @ 2010-08-08  1:27 UTC (permalink / raw)
  To: linux-raid

Hi all,

I am facing a serious issue with md's on my Ubuntu 10.04 64-bit
server. I am using mdadm 3.1.2. The system has 40 drives in it, and
there are 10 md devices, which are a combination of RAID 0, 1, 5, 6,
and 10 levels. The drives are connected via LSI SAS adapters in
external SAS JBODs.

When I boot the system, about 50% of the time, the md's will not come
up correctly. Instead of md0-md9 being active, some or all will be
inactive and there will be new md's like md127, md126, md125, etc.

Here is the output of /proc/mdstat when all md's come up correctly:


Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md0 : active raid6 sdj1[6] sdk1[7] sdf1[2] sdb1[10] sdg1[3] sdl1[8](S)
sdh1[4] sdm1[9] sde1[1] sdi1[12](S) sdc1[11] sdd1[0]
      1146967040 blocks super 1.2 level 6, 128k chunk, algorithm 2
[10/10] [UUUUUUUUUU]

md9 : active raid0 sdao1[1] sdan1[0]
      976765440 blocks super 1.2 256k chunks

md8 : active raid0 sdam1[1] sdal1[0]
      976765440 blocks super 1.2 256k chunks

md7 : active raid0 sdak1[1] sdaj1[0]
      976765888 blocks super 1.2 4k chunks

md6 : active raid0 sdai1[1] sdah1[0]
      976765696 blocks super 1.2 128k chunks

md5 : active raid0 sdag1[1] sdaf1[0]
      976765440 blocks super 1.2 256k chunks

md4 : active raid0 sdae1[1] sdad1[0]
      976765888 blocks super 1.2 32k chunks

md3 : active raid1 sdac1[1] sdab1[0]
      195357272 blocks super 1.2 [2/2] [UU]

md2 : active raid0 sdaa1[0] sdz1[1]
      62490672 blocks super 1.2 4k chunks

md1 : active raid5 sdy1[10] sdx1[9] sdw1[8] sdv1[7] sdu1[6] sdt1[5]
sds1[4] sdr1[3] sdq1[2] sdp1[11](S) sdo1[1] sdn1[0]
      2929601120 blocks super 1.2 level 5, 16k chunk, algorithm 2
[11/11] [UUUUUUUUUUU]

unused devices: <none>


--------------------------------------------------------------------------------------------------------------------------


Here are several examples of when they do not come up correctly.
Again, I am not making any configuration changes; I just reboot the
system and check /proc/mdstat several minutes after it is fully
booted.


Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md124 : inactive sdam1[1](S)
      488382944 blocks super 1.2

md125 : inactive sdag1[1](S)
      488382944 blocks super 1.2

md7 : active raid0 sdaj1[0] sdak1[1]
      976765888 blocks super 1.2 4k chunks

md126 : inactive sdw1[8](S) sdn1[0](S) sdo1[1](S) sdu1[6](S)
sdq1[2](S) sdx1[9](S)
      1757761512 blocks super 1.2

md9 : active raid0 sdan1[0] sdao1[1]
      976765440 blocks super 1.2 256k chunks

md6 : inactive sdah1[0](S)
      488382944 blocks super 1.2

md4 : inactive sdae1[1](S)
      488382944 blocks super 1.2

md8 : inactive sdal1[0](S)
      488382944 blocks super 1.2

md127 : inactive sdg1[3](S) sdl1[8](S) sdc1[11](S) sdi1[12](S)
sdf1[2](S) sdb1[10](S)
      860226027 blocks super 1.2

md5 : inactive sdaf1[0](S)
      488382944 blocks super 1.2

md1 : inactive sdr1[3](S) sdp1[11](S) sdt1[5](S) sds1[4](S)
sdy1[10](S) sdv1[7](S)
      1757761512 blocks super 1.2

md0 : inactive sde1[1](S) sdh1[4](S) sdm1[9](S) sdj1[6](S) sdd1[0](S) sdk1[7](S)
      860226027 blocks super 1.2

md3 : inactive sdab1[0](S)
      195357344 blocks super 1.2

md2 : active raid0 sdaa1[0] sdz1[1]
      62490672 blocks super 1.2 4k chunks

unused devices: <none>


---------------------------------------------------------------------------------------------------------------------------


Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md126 : inactive sdaf1[0](S)
      488382944 blocks super 1.2

md127 : inactive sdae1[1](S)
      488382944 blocks super 1.2

md9 : active raid0 sdan1[0] sdao1[1]
      976765440 blocks super 1.2 256k chunks

md7 : active raid0 sdaj1[0] sdak1[1]
      976765888 blocks super 1.2 4k chunks

md4 : inactive sdad1[0](S)
      488382944 blocks super 1.2

md6 : active raid0 sdah1[0] sdai1[1]
      976765696 blocks super 1.2 128k chunks

md8 : active raid0 sdam1[1] sdal1[0]
      976765440 blocks super 1.2 256k chunks

md5 : inactive sdag1[1](S)
      488382944 blocks super 1.2

md0 : active raid6 sdc1[11] sdd1[0] sdh1[4] sdf1[2] sdm1[9] sde1[1]
sdb1[10] sdg1[3] sdl1[8](S) sdj1[6] sdk1[7] sdi1[12](S)
      1146967040 blocks super 1.2 level 6, 128k chunk, algorithm 2
[10/10] [UUUUUUUUUU]

md1 : active raid5 sdq1[2] sdy1[10] sdv1[7] sdn1[0] sdt1[5] sdw1[8]
sdp1[11](S) sdr1[3] sdu1[6] sdx1[9] sdo1[1] sds1[4]
      2929601120 blocks super 1.2 level 5, 16k chunk, algorithm 2
[11/11] [UUUUUUUUUUU]

md3 : active raid1 sdac1[1] sdab1[0]
      195357272 blocks super 1.2 [2/2] [UU]

md2 : active raid0 sdz1[1] sdaa1[0]
      62490672 blocks super 1.2 4k chunks

unused devices: <none>


--------------------------------------------------------------------------------------------------------------------------


Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md127 : inactive sdab1[0](S)
      195357344 blocks super 1.2

md4 : active raid0 sdad1[0] sdae1[1]
      976765888 blocks super 1.2 32k chunks

md7 : active raid0 sdak1[1] sdaj1[0]
      976765888 blocks super 1.2 4k chunks

md8 : active raid0 sdam1[1] sdal1[0]
      976765440 blocks super 1.2 256k chunks

md6 : active raid0 sdah1[0] sdai1[1]
      976765696 blocks super 1.2 128k chunks

md9 : active raid0 sdao1[1] sdan1[0]
      976765440 blocks super 1.2 256k chunks

md5 : active raid0 sdaf1[0] sdag1[1]
      976765440 blocks super 1.2 256k chunks

md1 : active raid5 sdy1[10] sdv1[7] sdu1[6] sds1[4] sdq1[2]
sdp1[11](S) sdt1[5] sdo1[1] sdx1[9] sdr1[3] sdw1[8] sdn1[0]
      2929601120 blocks super 1.2 level 5, 16k chunk, algorithm 2
[11/11] [UUUUUUUUUUU]

md0 : active raid6 sdl1[8](S) sdd1[0] sdc1[11] sdg1[3] sdk1[7] sde1[1]
sdm1[9] sdb1[10] sdi1[12](S) sdh1[4] sdf1[2] sdj1[6]
      1146967040 blocks super 1.2 level 6, 128k chunk, algorithm 2
[10/10] [UUUUUUUUUU]

md3 : inactive sdac1[1](S)
      195357344 blocks super 1.2

md2 : active raid0 sdz1[1] sdaa1[0]
      62490672 blocks super 1.2 4k chunks

unused devices: <none>



My mdadm.conf file is as follows:


# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
DEVICE partitions

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays

# This file was auto-generated on Sun, 13 Jul 2008 20:42:57 -0500
# by mkconf $Id$
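
Note that the "definitions of existing MD arrays" section above contains
no ARRAY lines. If it matters, I assume I could pin the array names by
appending the scanned definitions -- an untested sketch of what I have
in mind:

   # append ARRAY lines for the arrays as they are currently assembled
   mdadm --detail --scan >> /etc/mdadm/mdadm.conf
   # each emitted line looks roughly like:
   #   ARRAY /dev/md0 metadata=1.2 name=<hostname>:0 UUID=<array uuid>

   # then rebuild the initramfs so its copy of mdadm.conf is refreshed
   update-initramfs -u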




Any insight would be greatly appreciated; this is a significant problem
for us right now. Thank you very much in advance!

Best,
-Tommy


* Re: md's fail to assemble correctly consistently at system startup - mdadm 3.1.2 and Ubuntu 10.04
  2010-08-08  1:27 md's fail to assemble correctly consistently at system startup - mdadm 3.1.2 and Ubuntu 10.04 fibreraid
@ 2010-08-08  8:58 ` Neil Brown
  2010-08-08 14:26   ` fibreraid
  0 siblings, 1 reply; 11+ messages in thread
From: Neil Brown @ 2010-08-08  8:58 UTC (permalink / raw)
  To: fibreraid; +Cc: linux-raid

On Sat, 7 Aug 2010 18:27:58 -0700
"fibreraid@gmail.com" <fibreraid@gmail.com> wrote:

> Hi all,
> 
> I am facing a serious issue with md's on my Ubuntu 10.04 64-bit
> server. I am using mdadm 3.1.2. The system has 40 drives in it, and
> there are 10 md devices, which are a combination of RAID 0, 1, 5, 6,
> and 10 levels. The drives are connected via LSI SAS adapters in
> external SAS JBODs.
> 
> When I boot the system, about 50% of the time, the md's will not come
> up correctly. Instead of md0-md9 being active, some or all will be
> inactive and there will be new md's like md127, md126, md125, etc.

Sounds like a locking problem - udev is calling "mdadm -I" on each device and
might call some in parallel.  mdadm needs to serialise things to ensure this
sort of confusion doesn't happen.
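
To illustrate the shape of it (this is not necessarily the exact rule
Ubuntu ships, just the general pattern), udev is essentially running

   SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="linux_raid*", \
      RUN+="/sbin/mdadm --incremental $env{DEVNAME}"

for each partition as it appears, so with 40 disks showing up at much the
same time, several of those mdadm invocations can easily run concurrently.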

It is possible that this is fixed in the just-released mdadm-3.1.3.  If you
could test it and see if it makes a difference, that would help a lot.

Thanks,
NeilBrown

> [... rest of quoted original message (/proc/mdstat output and mdadm.conf) trimmed ...]



* Re: md's fail to assemble correctly consistently at system startup - mdadm 3.1.2 and Ubuntu 10.04
  2010-08-08  8:58 ` Neil Brown
@ 2010-08-08 14:26   ` fibreraid
  2010-08-09  9:00     ` fibreraid
  2010-08-09 11:00     ` Neil Brown
  0 siblings, 2 replies; 11+ messages in thread
From: fibreraid @ 2010-08-08 14:26 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Thank you, Neil, for the reply and the heads-up on 3.1.3. I will test it
immediately and report back my findings.

One potential issue I noticed is that Ubuntu Lucid's default kernel
configuration has CONFIG_MD_AUTODETECT enabled. I thought this feature
might conflict with udev, so I have attempted to disable it by adding the
raid=noautodetect parameter to my grub2 boot line. But I am not sure
whether this is actually taking effect. Do you think this kernel setting
could also be a source of the problem?
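
For reference, what I changed is roughly the following (Ubuntu 10.04 with
grub2; the exact default line is from memory):

   # /etc/default/grub
   GRUB_CMDLINE_LINUX_DEFAULT="quiet splash raid=noautodetect"

   # then regenerate grub.cfg
   update-grub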

Another method I was contemplating to avoid a potential locking issue
is to have udev run its mdadm -I command through watershed, which should
in theory serialize it. What do you think?

SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="linux_raid*", \
   RUN+="watershed -i mdadm /sbin/mdadm --incremental $env{DEVNAME}"

Finally, in your view, is it essential that the underlying partitions
used in the md's be the "Linux raid autodetect" type? My partitions at
the moment are just plain "Linux".

Anyway, I will test mdadm 3.1.3 right now but I wanted to ask for your
insight/comments on the above. Thanks!

Best,
Tommy



On Sun, Aug 8, 2010 at 1:58 AM, Neil Brown <neilb@suse.de> wrote:
> On Sat, 7 Aug 2010 18:27:58 -0700
> "fibreraid@gmail.com" <fibreraid@gmail.com> wrote:
>
>> Hi all,
>>
>> I am facing a serious issue with md's on my Ubuntu 10.04 64-bit
>> server. I am using mdadm 3.1.2. The system has 40 drives in it, and
>> there are 10 md devices, which are a combination of RAID 0, 1, 5, 6,
>> and 10 levels. The drives are connected via LSI SAS adapters in
>> external SAS JBODs.
>>
>> When I boot the system, about 50% of the time, the md's will not come
>> up correctly. Instead of md0-md9 being active, some or all will be
>> inactive and there will be new md's like md127, md126, md125, etc.
>
> Sounds like a locking problem - udev is calling "mdadm -I" on each device and
> might call some in parallel.  mdadm needs to serialise things to ensure this
> sort of confusion doesn't happen.
>
> It is possible that this is fixed in the just-released mdadm-3.1.3.  If you
> could test and and see if it makes a difference that would help a lot.
>
> Thanks,
> NeilBrown
>
>> [... rest of quoted original message trimmed ...]


* Re: md's fail to assemble correctly consistently at system startup - mdadm 3.1.2 and Ubuntu 10.04
  2010-08-08 14:26   ` fibreraid
@ 2010-08-09  9:00     ` fibreraid
  2010-08-09 10:51       ` Neil Brown
  2010-08-09 11:00     ` Neil Brown
  1 sibling, 1 reply; 11+ messages in thread
From: fibreraid @ 2010-08-09  9:00 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Hi Neil,

I tested mdadm 3.1.3 on my configuration, and great news: problem
solved. After 30 reboots, all md's have come up correctly every time. I
did not need watershed for the mdadm -I command either. Thanks for your
recommendation!

Sincerely,
Tommy

On Sun, Aug 8, 2010 at 7:26 AM, fibreraid@gmail.com <fibreraid@gmail.com> wrote:
> Thank you Neil for the reply and heads-up on 3.1.3. I will test that
> immediately and report back my findings.
>
> One potential issue I noticed is that Ubuntu Lucid's default kernel
> configuration has CONFIG_MD_AUTODETECT enabled. I thought this feature
> might conflict with udev, so I've attempted to disable this by adding
> a parameter to my grub2 bootup: raid=noautodetect. But I am not sure
> if this is effective. Do you think this kernel setting could also be a
> problem source?
>
> Another method I was contemplating to avoid a potential locking issue
> is to have udev's mdadm -i command run with watershed, which should in
> theory serialize it. What do you think?
>
> SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="linux_raid*", \
>   RUN+="watershed -i mdadm /sbin/mdadm --incremental $env{DEVNAME}"
>
> Finally, in your view, is it essential that the underlying partitions
> used in the md's be the "Linux raid autodetect" type? My partitions at
> the moment are just plain "Linux".
>
> Anyway, I will test mdadm 3.1.3 right now but I wanted to ask for your
> insight/comments on the above. Thanks!
>
> Best,
> Tommy
>
> [... quoted earlier messages in the thread trimmed ...]


* Re: md's fail to assemble correctly consistently at system startup -  mdadm 3.1.2 and Ubuntu 10.04
  2010-08-09  9:00     ` fibreraid
@ 2010-08-09 10:51       ` Neil Brown
  0 siblings, 0 replies; 11+ messages in thread
From: Neil Brown @ 2010-08-09 10:51 UTC (permalink / raw)
  To: fibreraid; +Cc: linux-raid

On Mon, 9 Aug 2010 02:00:05 -0700
"fibreraid@gmail.com" <fibreraid@gmail.com> wrote:

> Hi Neil,
> 
> I tested out mdadm 3.1.3 on my configuration and great news! Problem
> solved. After 30 reboots, all md's have come up correctly each and
> every time. I did not have to use watershed either for the mdadm -i
> command. Thanks for your recommendation!
> 

Thanks for the confirmation - I hoped that 3.1.3 would fix it but wasn't
completely confident.

Thanks,
NeilBrown



* Re: md's fail to assemble correctly consistently at system startup - mdadm 3.1.2 and Ubuntu 10.04
  2010-08-08 14:26   ` fibreraid
  2010-08-09  9:00     ` fibreraid
@ 2010-08-09 11:00     ` Neil Brown
  2010-08-09 11:58       ` fibreraid
  1 sibling, 1 reply; 11+ messages in thread
From: Neil Brown @ 2010-08-09 11:00 UTC (permalink / raw)
  To: fibreraid; +Cc: linux-raid

On Sun, 8 Aug 2010 07:26:59 -0700
"fibreraid@gmail.com" <fibreraid@gmail.com> wrote:

> Thank you Neil for the reply and heads-up on 3.1.3. I will test that
> immediately and report back my findings.
> 
> One potential issue I noticed is that Ubuntu Lucid's default kernel
> configuration has CONFIG_MD_AUTODETECT enabled. I thought this feature
> might conflict with udev, so I've attempted to disable this by adding
> a parameter to my grub2 bootup: raid=noautodetect. But I am not sure
> if this is effective. Do you think this kernel setting could also be a
> problem source?

If you don't use "Linux raid autodetect" partition types (which you say below
that you don't), this CONFIG setting will have no effect at all.

> 
> Another method I was contemplating to avoid a potential locking issue
> is to have udev's mdadm -i command run with watershed, which should in
> theory serialize it. What do you think?
> 
> SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="linux_raid*", \
>    RUN+="watershed -i mdadm /sbin/mdadm --incremental $env{DEVNAME}"

I haven't come across watershed before.  I couldn't easily find out much
about it on the web, so I cannot say what effect it would have.  My guess
from what little I have read is 'none'.

> 
> Finally, in your view, is it essential that the underlying partitions
> used in the md's be the "Linux raid autodetect" type? My partitions at
> the moment are just plain "Linux".

I actually recommend "Non-FS data" (0xDA), as 'Linux' might make some tools
think there is a filesystem there even though there isn't.  But 'Linux' is
mostly fine.  I avoid "Linux raid autodetect" as it enables the MD_AUTODETECT
functionality, which I don't like.
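
If you ever did want to change the type in place, fdisk's 't' command does
it (type 'da' for Non-FS data, 'fd' for raid autodetect), or
non-interactively something like the following -- from memory, so check
the sfdisk on your system first:

   sfdisk --id /dev/sdb 1 da    # set partition 1 of /dev/sdb to 0xDA

Changing the type byte doesn't touch the partition contents, but as
always, double-check the device and partition number before running it.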

NeilBrown


> 
> Anyway, I will test mdadm 3.1.3 right now but I wanted to ask for your
> insight/comments on the above. Thanks!
> 
> Best,
> Tommy
> 
> 
> [... quoted earlier messages in the thread trimmed ...]



* Re: md's fail to assemble correctly consistently at system startup - mdadm 3.1.2 and Ubuntu 10.04
  2010-08-09 11:00     ` Neil Brown
@ 2010-08-09 11:58       ` fibreraid
  2010-08-11  5:17         ` Dan Williams
  0 siblings, 1 reply; 11+ messages in thread
From: fibreraid @ 2010-08-09 11:58 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Hi Neil,

I may have spoken a bit too soon. While the md's themselves are now coming
up correctly, on occasion a hot spare does not come up associated with its
proper md. As a result, what should be a RAID 5 md with one hot spare will
occasionally come up as a RAID 5 md with no hot spare.
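
If it happens again, I assume I can simply re-add the missing spare by
hand with something like this (using md1's spare from the mdstat output
earlier in the thread):

   mdadm --detail /dev/md1          # confirm the spare is missing
   mdadm /dev/md1 --add /dev/sdp1   # re-attach sdp1 as a spare

but obviously that should not be necessary after every reboot.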

Any ideas on this one?

-Tommy

On Mon, Aug 9, 2010 at 4:00 AM, Neil Brown <neilb@suse.de> wrote:
> On Sun, 8 Aug 2010 07:26:59 -0700
> "fibreraid@gmail.com" <fibreraid@gmail.com> wrote:
>
>> Thank you Neil for the reply and heads-up on 3.1.3. I will test that
>> immediately and report back my findings.
>>
>> One potential issue I noticed is that Ubuntu Lucid's default kernel
>> configuration has CONFIG_MD_AUTODETECT enabled. I thought this feature
>> might conflict with udev, so I've attempted to disable this by adding
>> a parameter to my grub2 bootup: raid=noautodetect. But I am not sure
>> if this is effective. Do you think this kernel setting could also be a
>> problem source?
>
> If you don't use "Linux raid autodetect" partition types (which you say below
> that you don't) this CONFIG setting will have no effect at all.
>
>>
>> Another method I was contemplating to avoid a potential locking issue
>> is to have udev's mdadm -i command run with watershed, which should in
>> theory serialize it. What do you think?
>>
>> SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="linux_raid*", \
>>    RUN+="watershed -i mdadm /sbin/mdadm --incremental $env{DEVNAME}"
>
> I haven't come across watershed before.  I couldn't easily find out much
> about it on the web, so I cannot say what effect it would have.  My guess
> from what little I have read is 'none'.
>
>>
>> Finally, in your view, is it essential that the underlying partitions
>> used in the md's be the "Linux raid autodetect" type? My partitions at
>> the moment are just plain "Linux".
>
> I actually recommend "Non-FS data" (0xDA) as 'Linux' might make some tools
> think there is a filesystem there even though there isn't.  But 'Linux' is
> mostly fine.  I avoid "Linux raid autodetect" as it enable the MD_AUTODETECT
> functionality which I don't like.
>
> NeilBrown
>
>
>>
>> Anyway, I will test mdadm 3.1.3 right now but I wanted to ask for your
>> insight/comments on the above. Thanks!
>>
>> Best,
>> Tommy
>>
>>
>>
>> On Sun, Aug 8, 2010 at 1:58 AM, Neil Brown <neilb@suse.de> wrote:
>> > On Sat, 7 Aug 2010 18:27:58 -0700
>> > "fibreraid@gmail.com" <fibreraid@gmail.com> wrote:
>> >
>> >> Hi all,
>> >>
>> >> I am facing a serious issue with md's on my Ubuntu 10.04 64-bit
>> >> server. I am using mdadm 3.1.2. The system has 40 drives in it, and
>> >> there are 10 md devices, which are a combination of RAID 0, 1, 5, 6,
>> >> and 10 levels. The drives are connected via LSI SAS adapters in
>> >> external SAS JBODs.
>> >>
>> >> When I boot the system, about 50% of the time, the md's will not come
>> >> up correctly. Instead of md0-md9 being active, some or all will be
>> >> inactive and there will be new md's like md127, md126, md125, etc.
>> >
>> > Sounds like a locking problem - udev is calling "mdadm -I" on each device and
>> > might call some in parallel.  mdadm needs to serialise things to ensure this
>> > sort of confusion doesn't happen.
>> >
>> > It is possible that this is fixed in the just-released mdadm-3.1.3.  If you
>> > could test and and see if it makes a difference that would help a lot.
>> >
>> > Thanks,
>> > NeilBrown
>> >
>> >>
>> >> Here is the output of /proc/mdstat when all md's come up correctly:
>> >>
>> >>
>> >> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
>> >> [raid4] [raid10]
>> >> md0 : active raid6 sdj1[6] sdk1[7] sdf1[2] sdb1[10] sdg1[3] sdl1[8](S)
>> >> sdh1[4] sdm1[9] sde1[1] sdi1[12](S) sdc1[11] sdd1[0]
>> >>       1146967040 blocks super 1.2 level 6, 128k chunk, algorithm 2
>> >> [10/10] [UUUUUUUUUU]
>> >>
>> >> md9 : active raid0 sdao1[1] sdan1[0]
>> >>       976765440 blocks super 1.2 256k chunks
>> >>
>> >> md8 : active raid0 sdam1[1] sdal1[0]
>> >>       976765440 blocks super 1.2 256k chunks
>> >>
>> >> md7 : active raid0 sdak1[1] sdaj1[0]
>> >>       976765888 blocks super 1.2 4k chunks
>> >>
>> >> md6 : active raid0 sdai1[1] sdah1[0]
>> >>       976765696 blocks super 1.2 128k chunks
>> >>
>> >> md5 : active raid0 sdag1[1] sdaf1[0]
>> >>       976765440 blocks super 1.2 256k chunks
>> >>
>> >> md4 : active raid0 sdae1[1] sdad1[0]
>> >>       976765888 blocks super 1.2 32k chunks
>> >>
>> >> md3 : active raid1 sdac1[1] sdab1[0]
>> >>       195357272 blocks super 1.2 [2/2] [UU]
>> >>
>> >> md2 : active raid0 sdaa1[0] sdz1[1]
>> >>       62490672 blocks super 1.2 4k chunks
>> >>
>> >> md1 : active raid5 sdy1[10] sdx1[9] sdw1[8] sdv1[7] sdu1[6] sdt1[5]
>> >> sds1[4] sdr1[3] sdq1[2] sdp1[11](S) sdo1[1] sdn1[0]
>> >>       2929601120 blocks super 1.2 level 5, 16k chunk, algorithm 2
>> >> [11/11] [UUUUUUUUUUU]
>> >>
>> >> unused devices: <none>
>> >>
>> >>
>> >> --------------------------------------------------------------------------------------------------------------------------
>> >>
>> >>
>> >> Here are several examples of when they do not come up correctly.
>> >> Again, I am not making any configuration changes; I just reboot the
>> >> system and check /proc/mdstat several minutes after it is fully
>> >> booted.
>> >>
>> >>
>> >> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
>> >> [raid4] [raid10]
>> >> md124 : inactive sdam1[1](S)
>> >>       488382944 blocks super 1.2
>> >>
>> >> md125 : inactive sdag1[1](S)
>> >>       488382944 blocks super 1.2
>> >>
>> >> md7 : active raid0 sdaj1[0] sdak1[1]
>> >>       976765888 blocks super 1.2 4k chunks
>> >>
>> >> md126 : inactive sdw1[8](S) sdn1[0](S) sdo1[1](S) sdu1[6](S)
>> >> sdq1[2](S) sdx1[9](S)
>> >>       1757761512 blocks super 1.2
>> >>
>> >> md9 : active raid0 sdan1[0] sdao1[1]
>> >>       976765440 blocks super 1.2 256k chunks
>> >>
>> >> md6 : inactive sdah1[0](S)
>> >>       488382944 blocks super 1.2
>> >>
>> >> md4 : inactive sdae1[1](S)
>> >>       488382944 blocks super 1.2
>> >>
>> >> md8 : inactive sdal1[0](S)
>> >>       488382944 blocks super 1.2
>> >>
>> >> md127 : inactive sdg1[3](S) sdl1[8](S) sdc1[11](S) sdi1[12](S)
>> >> sdf1[2](S) sdb1[10](S)
>> >>       860226027 blocks super 1.2
>> >>
>> >> md5 : inactive sdaf1[0](S)
>> >>       488382944 blocks super 1.2
>> >>
>> >> md1 : inactive sdr1[3](S) sdp1[11](S) sdt1[5](S) sds1[4](S)
>> >> sdy1[10](S) sdv1[7](S)
>> >>       1757761512 blocks super 1.2
>> >>
>> >> md0 : inactive sde1[1](S) sdh1[4](S) sdm1[9](S) sdj1[6](S) sdd1[0](S) sdk1[7](S)
>> >>       860226027 blocks super 1.2
>> >>
>> >> md3 : inactive sdab1[0](S)
>> >>       195357344 blocks super 1.2
>> >>
>> >> md2 : active raid0 sdaa1[0] sdz1[1]
>> >>       62490672 blocks super 1.2 4k chunks
>> >>
>> >> unused devices: <none>
>> >>
>> >>
>> >> ---------------------------------------------------------------------------------------------------------------------------
>> >>
>> >>
>> >> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
>> >> [raid4] [raid10]
>> >> md126 : inactive sdaf1[0](S)
>> >>       488382944 blocks super 1.2
>> >>
>> >> md127 : inactive sdae1[1](S)
>> >>       488382944 blocks super 1.2
>> >>
>> >> md9 : active raid0 sdan1[0] sdao1[1]
>> >>       976765440 blocks super 1.2 256k chunks
>> >>
>> >> md7 : active raid0 sdaj1[0] sdak1[1]
>> >>       976765888 blocks super 1.2 4k chunks
>> >>
>> >> md4 : inactive sdad1[0](S)
>> >>       488382944 blocks super 1.2
>> >>
>> >> md6 : active raid0 sdah1[0] sdai1[1]
>> >>       976765696 blocks super 1.2 128k chunks
>> >>
>> >> md8 : active raid0 sdam1[1] sdal1[0]
>> >>       976765440 blocks super 1.2 256k chunks
>> >>
>> >> md5 : inactive sdag1[1](S)
>> >>       488382944 blocks super 1.2
>> >>
>> >> md0 : active raid6 sdc1[11] sdd1[0] sdh1[4] sdf1[2] sdm1[9] sde1[1]
>> >> sdb1[10] sdg1[3] sdl1[8](S) sdj1[6] sdk1[7] sdi1[12](S)
>> >>       1146967040 blocks super 1.2 level 6, 128k chunk, algorithm 2
>> >> [10/10] [UUUUUUUUUU]
>> >>
>> >> md1 : active raid5 sdq1[2] sdy1[10] sdv1[7] sdn1[0] sdt1[5] sdw1[8]
>> >> sdp1[11](S) sdr1[3] sdu1[6] sdx1[9] sdo1[1] sds1[4]
>> >>       2929601120 blocks super 1.2 level 5, 16k chunk, algorithm 2
>> >> [11/11] [UUUUUUUUUUU]
>> >>
>> >> md3 : active raid1 sdac1[1] sdab1[0]
>> >>       195357272 blocks super 1.2 [2/2] [UU]
>> >>
>> >> md2 : active raid0 sdz1[1] sdaa1[0]
>> >>       62490672 blocks super 1.2 4k chunks
>> >>
>> >> unused devices: <none>
>> >>
>> >>
>> >> --------------------------------------------------------------------------------------------------------------------------
>> >>
>> >>
>> >> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
>> >> [raid4] [raid10]
>> >> md127 : inactive sdab1[0](S)
>> >>       195357344 blocks super 1.2
>> >>
>> >> md4 : active raid0 sdad1[0] sdae1[1]
>> >>       976765888 blocks super 1.2 32k chunks
>> >>
>> >> md7 : active raid0 sdak1[1] sdaj1[0]
>> >>       976765888 blocks super 1.2 4k chunks
>> >>
>> >> md8 : active raid0 sdam1[1] sdal1[0]
>> >>       976765440 blocks super 1.2 256k chunks
>> >>
>> >> md6 : active raid0 sdah1[0] sdai1[1]
>> >>       976765696 blocks super 1.2 128k chunks
>> >>
>> >> md9 : active raid0 sdao1[1] sdan1[0]
>> >>       976765440 blocks super 1.2 256k chunks
>> >>
>> >> md5 : active raid0 sdaf1[0] sdag1[1]
>> >>       976765440 blocks super 1.2 256k chunks
>> >>
>> >> md1 : active raid5 sdy1[10] sdv1[7] sdu1[6] sds1[4] sdq1[2]
>> >> sdp1[11](S) sdt1[5] sdo1[1] sdx1[9] sdr1[3] sdw1[8] sdn1[0]
>> >>       2929601120 blocks super 1.2 level 5, 16k chunk, algorithm 2
>> >> [11/11] [UUUUUUUUUUU]
>> >>
>> >> md0 : active raid6 sdl1[8](S) sdd1[0] sdc1[11] sdg1[3] sdk1[7] sde1[1]
>> >> sdm1[9] sdb1[10] sdi1[12](S) sdh1[4] sdf1[2] sdj1[6]
>> >>       1146967040 blocks super 1.2 level 6, 128k chunk, algorithm 2
>> >> [10/10] [UUUUUUUUUU]
>> >>
>> >> md3 : inactive sdac1[1](S)
>> >>       195357344 blocks super 1.2
>> >>
>> >> md2 : active raid0 sdz1[1] sdaa1[0]
>> >>       62490672 blocks super 1.2 4k chunks
>> >>
>> >> unused devices: <none>
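
A rough manual recovery for this state (a sketch only, not part of the original report): stop the stray and inactive md devices, then let mdadm re-assemble everything from the superblocks. The md numbers below are the md126/md127-style leftovers from the examples above; substitute whatever /proc/mdstat actually lists, including any inactive md's.

# Stop each half-assembled leftover, then re-run a scan-based assembly so
# every member lands in its real array.
mdadm --stop /dev/md126
mdadm --stop /dev/md127
mdadm --assemble --scan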
>> >>
>> >>
>> >>
>> >> My mdadm.conf file is as follows:
>> >>
>> >>
>> >> # mdadm.conf
>> >> #
>> >> # Please refer to mdadm.conf(5) for information about this file.
>> >> #
>> >>
>> >> # by default, scan all partitions (/proc/partitions) for MD superblocks.
>> >> # alternatively, specify devices to scan, using wildcards if desired.
>> >> DEVICE partitions
>> >>
>> >> # auto-create devices with Debian standard permissions
>> >> CREATE owner=root group=disk mode=0660 auto=yes
>> >>
>> >> # automatically tag new arrays as belonging to the local system
>> >> HOMEHOST <system>
>> >>
>> >> # instruct the monitoring daemon where to send mail alerts
>> >> MAILADDR root
>> >>
>> >> # definitions of existing MD arrays
>> >>
>> >> # This file was auto-generated on Sun, 13 Jul 2008 20:42:57 -0500
>> >> # by mkconf $Id$
>> >>
>> >>
>> >>
>> >>
>> >> Any insight would be greatly appreciated. This is a big problem for us
>> >> as it stands. Thank you very much in advance!
>> >>
>> >> Best,
>> >> -Tommy
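
One thing that stands out in the report above is that the mdadm.conf contains no ARRAY lines at all, so every boot depends entirely on incremental auto-assembly. A minimal sketch of pinning the arrays explicitly, assuming they are currently assembled the way you want and that Ubuntu's default paths are in use:

# Append explicit ARRAY definitions (device name, metadata version, UUID)
# for the arrays as they stand right now, then rebuild the initramfs so the
# copy of mdadm.conf used at early boot matches the one on the root fs.
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u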

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: md's fail to assemble correctly consistently at system startup - mdadm 3.1.2 and Ubuntu 10.04
  2010-08-09 11:58       ` fibreraid
@ 2010-08-11  5:17         ` Dan Williams
  2010-08-12  1:43           ` Neil Brown
  0 siblings, 1 reply; 11+ messages in thread
From: Dan Williams @ 2010-08-11  5:17 UTC (permalink / raw)
  To: fibreraid; +Cc: Neil Brown, linux-raid

On Mon, Aug 9, 2010 at 4:58 AM, fibreraid@gmail.com <fibreraid@gmail.com> wrote:
> Hi Neil,
>
> I may have spoken a bit too soon. It seems that while the md's themselves
> now come up successfully, the hot-spares are occasionally not associated
> with their proper md's. As a result, what was a RAID 5 md with one
> hot-spare will sometimes come up as a RAID 5 md with no hot-spare.
>
> Any ideas on this one?
>

Is this new behavior only seen with 3.1.3, i.e. when it worked with
3.1.2 did the hot spares always arrive correctly?  I suspect this is a
result of the new -I behavior of not adding devices to a running array
without the -R parameter, but you don't want to make -R the default for
udev, otherwise your arrays will always come up degraded.

We could allow disks to be added to active non-degraded arrays, but
that still leaves the possibility of a stale device taking the place
of a fresh hot spare (preventing exactly that was the whole point of
changing the behavior in the first place).  So as far as I can see we
need to query the other disks in the active array and only permit the
disk to be re-added to an active array when it is demonstrably a hot
spare (or -R is specified).

--
Dan
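
To make the -I / -R distinction concrete, here is a sketch of the two invocations; /dev/sdX1 is a placeholder rather than a device from this report:

# Roughly what the distro's udev hook runs for each raid member it sees.
# With the 3.1.3 behavior this refuses to add the device if its array is
# already running, to avoid re-adding a stale or failed member.
mdadm --incremental /dev/sdX1

# Adding --run lifts that refusal (and also lets arrays start as soon as a
# minimal set of members is present), which is why it is a poor default for
# udev: arrays could routinely come up degraded.
mdadm --incremental --run /dev/sdX1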

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: md's fail to assemble correctly consistently at system startup -  mdadm 3.1.2 and Ubuntu 10.04
  2010-08-11  5:17         ` Dan Williams
@ 2010-08-12  1:43           ` Neil Brown
  2010-08-14 16:57             ` fibreraid
  0 siblings, 1 reply; 11+ messages in thread
From: Neil Brown @ 2010-08-12  1:43 UTC (permalink / raw)
  To: Dan Williams; +Cc: fibreraid, linux-raid

On Tue, 10 Aug 2010 22:17:19 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> [...]


Arg... another regression.

Thanks for the report and the analysis.

Here is the fix.

NeilBrown

From ef83fe7cba7355d3da330325e416747b0696baef Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Thu, 12 Aug 2010 11:41:41 +1000
Subject: [PATCH] Allow --incremental to add spares to an array.

Commit 3a6ec29ad56 stopped us from adding apparently-working devices
to an active array with --incremental as there is a good chance that they
are actually old/failed devices.

Unfortunately it also stopped spares from being added to an active
array, which is wrong.  This patch refines the test to be more
careful.

Reported-by: <fibreraid@gmail.com>
Analysed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/Incremental.c b/Incremental.c
index e4b6196..4d3d181 100644
--- a/Incremental.c
+++ b/Incremental.c
@@ -370,14 +370,15 @@ int Incremental(char *devname, int verbose, int runstop,
 		else
 			strcpy(chosen_name, devnum2devname(mp->devnum));
 
-		/* It is generally not OK to add drives to a running array
-		 * as they are probably missing because they failed.
-		 * However if runstop is 1, then the array was possibly
-		 * started early and our best be is to add this anyway.
-		 * It would probably be good to allow explicit policy
-		 * statement about this.
+		/* It is generally not OK to add non-spare drives to a
+		 * running array as they are probably missing because
+		 * they failed.  However if runstop is 1, then the
+		 * array was possibly started early and our best be is
+		 * to add this anyway.  It would probably be good to
+		 * allow explicit policy statement about this.
 		 */
-		if (runstop < 1) {
+		if ((info.disk.state & (1<<MD_DISK_SYNC)) != 0
+		    && runstop < 1) {
 			int active = 0;
 			
 			if (st->ss->external) {
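
To spell out what the refined test does: members whose superblock marks them in-sync (MD_DISK_SYNC set) are still refused while the array is running, whereas spares, which do not carry that flag, are now allowed through. A quick way to check which category a member falls into, using the spare from the reporter's md1 as the example device:

# An in-sync member reports "Device Role : Active device N" in its v1.2
# superblock; a hot spare reports "Device Role : spare".
mdadm --examine /dev/sdp1 | grep 'Device Role'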

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: md's fail to assemble correctly consistently at system startup - mdadm 3.1.2 and Ubuntu 10.04
  2010-08-12  1:43           ` Neil Brown
@ 2010-08-14 16:57             ` fibreraid
  2010-08-16  4:45               ` Neil Brown
  0 siblings, 1 reply; 11+ messages in thread
From: fibreraid @ 2010-08-14 16:57 UTC (permalink / raw)
  To: Neil Brown; +Cc: Dan Williams, linux-raid

Hi Neil and Dan,

This patch does seem to have fixed the issue for me.

Thanks!
-Tommy

On Wed, Aug 11, 2010 at 6:43 PM, Neil Brown <neilb@suse.de> wrote:
> [...]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: md's fail to assemble correctly consistently at system startup - mdadm 3.1.2 and Ubuntu 10.04
  2010-08-14 16:57             ` fibreraid
@ 2010-08-16  4:45               ` Neil Brown
  0 siblings, 0 replies; 11+ messages in thread
From: Neil Brown @ 2010-08-16  4:45 UTC (permalink / raw)
  To: fibreraid; +Cc: Dan Williams, linux-raid

On Sat, 14 Aug 2010 09:57:01 -0700
"fibreraid@gmail.com" <fibreraid@gmail.com> wrote:

> Hi Neil and Dan,
> 
> This patch does seem to have fixed the issue for me.
> 

Thanks for the confirmation.

NeilBrown

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2010-08-16  4:45 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-08  1:27 md's fail to assemble correctly consistently at system startup - mdadm 3.1.2 and Ubuntu 10.04 fibreraid
2010-08-08  8:58 ` Neil Brown
2010-08-08 14:26   ` fibreraid
2010-08-09  9:00     ` fibreraid
2010-08-09 10:51       ` Neil Brown
2010-08-09 11:00     ` Neil Brown
2010-08-09 11:58       ` fibreraid
2010-08-11  5:17         ` Dan Williams
2010-08-12  1:43           ` Neil Brown
2010-08-14 16:57             ` fibreraid
2010-08-16  4:45               ` Neil Brown
