* Assembly failure
From: Brian Candler @ 2012-07-10 16:33 UTC
  To: linux-raid

An odd one here.

Ubuntu 12.04 system, updated to 3.4.0 kernel from the mainline-ppa. Machine
has a boot disk plus 12 other disks in a RAID10 far2 array.

System was working fine, but after the most recent reboot mdraid failed to
assemble.

root@dev-storage1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md127 : inactive sdk[9](S) sdl[10](S) sdj[8](S) sdg[5](S) sdc[1](S) sdm[11](S) sdf[4](S) sde[3](S) sdi[7](S) sdd[2](S) sdb[0](S) sdh[6](S)
      35163186720 blocks super 1.2
       
unused devices: <none>

dmesg shows periodic "export_rdev" messages:

root@dev-storage1:~# dmesg | grep md:
[  953.986401] md: export_rdev(sdo)
[  953.988515] md: export_rdev(sdo)
[  960.237392] md: export_rdev(sdp)
[  960.241928] md: export_rdev(sdp)
[  960.965132] md: export_rdev(sdr)
[  960.967265] md: export_rdev(sdr)
[ 1012.573415] md: export_rdev(sdo)
[ 1012.575650] md: export_rdev(sdo)
[ 1012.829690] md: export_rdev(sdp)
[ 1012.831493] md: export_rdev(sdp)
...
[19378.332473] md: export_rdev(sds)
[19378.333764] md: export_rdev(sds)
[19417.220171] md: export_rdev(sdr)
[19417.221748] md: export_rdev(sdr)
[23739.824227] md: export_rdev(sdr)
[23739.825554] md: export_rdev(sdr)
[23740.568940] md: export_rdev(sds)
[23740.570079] md: export_rdev(sds)

Metadata (see below) suggests that some drives think members 1/3/4 are
missing, but those drives think the array is fine.  The "Events" counts are
different on some members though.
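
For instance, the event counts can be pulled out of the dumps below with
something like:

  mdadm --examine /dev/sd[j-u] | grep -E '^/dev/|Events'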

What's the best thing to do here - attempt to force assembly? Any ideas how
it got into this state?
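
If force assembly is the way, I assume it would be something along these
lines (a sketch only, device names from the mdstat above):

  mdadm --stop /dev/md127
  mdadm --assemble --force /dev/md127 /dev/sd[b-m]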

The machine was rebooted a couple of times but in what should have been a
clean way, i.e.  sudo reboot or sudo halt -p.

Many thanks,

Brian.


root@dev-storage1:~# for i in /dev/sd{j..u}; do echo "=== $i ==="; mdadm --examine $i; done
=== /dev/sdj ===
/dev/sdj:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
           Name : storage1:storage1
  Creation Time : Thu Jun  7 13:51:21 2012
     Raid Level : raid10
   Raid Devices : 12

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
  Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : e96965d5:bf8986b7:fa83b813:e27aa17f

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Jul 10 10:46:19 2012
       Checksum : 5673ac95 - correct
         Events : 29374

         Layout : far=2
     Chunk Size : 1024K

   Device Role : Active device 8
   Array State : A.A..AAAAAAA ('A' == active, '.' == missing)
=== /dev/sdk ===
/dev/sdk:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
           Name : storage1:storage1
  Creation Time : Thu Jun  7 13:51:21 2012
     Raid Level : raid10
   Raid Devices : 12

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
  Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : f15bb47f:85ca59b1:cad42dec:f8b1b63c

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Jul 10 10:46:19 2012
       Checksum : 9020674a - correct
         Events : 29374

         Layout : far=2
     Chunk Size : 1024K

   Device Role : Active device 9
   Array State : A.A..AAAAAAA ('A' == active, '.' == missing)
=== /dev/sdl ===
/dev/sdl:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
           Name : storage1:storage1
  Creation Time : Thu Jun  7 13:51:21 2012
     Raid Level : raid10
   Raid Devices : 12

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
  Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : e2d82a4c:8409d883:cf2d9b7c:83829aad

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Jul 10 10:46:19 2012
       Checksum : 30664ccb - correct
         Events : 29374

         Layout : far=2
     Chunk Size : 1024K

   Device Role : Active device 10
   Array State : A.A..AAAAAAA ('A' == active, '.' == missing)
=== /dev/sdm ===
/dev/sdm:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
           Name : storage1:storage1
  Creation Time : Thu Jun  7 13:51:21 2012
     Raid Level : raid10
   Raid Devices : 12

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
  Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : e3603570:8e767487:63f3131b:afe358ea

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Jul 10 10:46:19 2012
       Checksum : 33446897 - correct
         Events : 29374

         Layout : far=2
     Chunk Size : 1024K

   Device Role : Active device 11
   Array State : A.A..AAAAAAA ('A' == active, '.' == missing)
=== /dev/sdn ===
/dev/sdn:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
           Name : storage1:storage1
  Creation Time : Thu Jun  7 13:51:21 2012
     Raid Level : raid10
   Raid Devices : 12

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
  Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 549a5230:005cedf8:b37a0d7e:36648ff0

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Jul 10 10:46:19 2012
       Checksum : ce0a8f46 - correct
         Events : 29374

         Layout : far=2
     Chunk Size : 1024K

   Device Role : Active device 0
   Array State : A.A..AAAAAAA ('A' == active, '.' == missing)
=== /dev/sdo ===
/dev/sdo:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
           Name : storage1:storage1
  Creation Time : Thu Jun  7 13:51:21 2012
     Raid Level : raid10
   Raid Devices : 12

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
  Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : b09f3869:09ce8a89:31ed7097:d3621064

Internal Bitmap : 8 sectors from superblock
    Update Time : Sat Jul  7 14:02:07 2012
       Checksum : 246eb119 - correct
         Events : 29355

         Layout : far=2
     Chunk Size : 1024K

   Device Role : Active device 2
   Array State : A.A..AAAAAAA ('A' == active, '.' == missing)
=== /dev/sdp ===
/dev/sdp:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
           Name : storage1:storage1
  Creation Time : Thu Jun  7 13:51:21 2012
     Raid Level : raid10
   Raid Devices : 12

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
  Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 71494ee2:1504b35a:00a1d927:543db7c6

Internal Bitmap : 8 sectors from superblock
    Update Time : Sat Jul  7 14:00:55 2012
       Checksum : 61c11eeb - correct
         Events : 29352

         Layout : far=2
     Chunk Size : 1024K

   Device Role : Active device 3
   Array State : A.AA.AAAAAAA ('A' == active, '.' == missing)
=== /dev/sdq ===
/dev/sdq:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
           Name : storage1:storage1
  Creation Time : Thu Jun  7 13:51:21 2012
     Raid Level : raid10
   Raid Devices : 12

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
  Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 17fcd5b8:97fc715f:0877d022:3770d08b

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Jul 10 10:46:19 2012
       Checksum : fd1699fc - correct
         Events : 29374

         Layout : far=2
     Chunk Size : 1024K

   Device Role : Active device 7
   Array State : A.A..AAAAAAA ('A' == active, '.' == missing)
=== /dev/sdr ===
/dev/sdr:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
           Name : storage1:storage1
  Creation Time : Thu Jun  7 13:51:21 2012
     Raid Level : raid10
   Raid Devices : 12

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
  Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : e0565412:a68cf236:9da9a141:e89f935a

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Jul  3 04:21:50 2012
       Checksum : 1ba72ec1 - correct
         Events : 20228

         Layout : far=2
     Chunk Size : 1024K

   Device Role : Active device 1
   Array State : AAAAAAAAAAAA ('A' == active, '.' == missing)
=== /dev/sds ===
/dev/sds:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
           Name : storage1:storage1
  Creation Time : Thu Jun  7 13:51:21 2012
     Raid Level : raid10
   Raid Devices : 12

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
  Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : c0f3bbab:70f3e69e:1314cdca:072d5ad9

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Jul  3 06:37:33 2012
       Checksum : 24f36d4c - correct
         Events : 29312

         Layout : far=2
     Chunk Size : 1024K

   Device Role : Active device 5
   Array State : A.AA.AAAAAAA ('A' == active, '.' == missing)
=== /dev/sdt ===
/dev/sdt:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
           Name : storage1:storage1
  Creation Time : Thu Jun  7 13:51:21 2012
     Raid Level : raid10
   Raid Devices : 12

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
  Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 6b4f9595:4de46aa8:fa695fe8:59b797e7

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Jul 10 10:46:19 2012
       Checksum : 44250f1b - correct
         Events : 29374

         Layout : far=2
     Chunk Size : 1024K

   Device Role : Active device 6
   Array State : A.A..AAAAAAA ('A' == active, '.' == missing)
=== /dev/sdu ===
/dev/sdu:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
           Name : storage1:storage1
  Creation Time : Thu Jun  7 13:51:21 2012
     Raid Level : raid10
   Raid Devices : 12

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
  Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 46f03854:de80bec8:b44b062c:dd265ba3

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Jul  3 06:36:18 2012
       Checksum : 22812848 - correct
         Events : 29272

         Layout : far=2
     Chunk Size : 1024K

   Device Role : Active device 4
   Array State : A.AAAAAAAAAA ('A' == active, '.' == missing)



* Re: Assembly failure
From: Sebastian Riemer @ 2012-07-10 16:48 UTC
  To: Brian Candler; +Cc: linux-raid

1. Are you crazy to do so? Kernel 3.2 is the stable kernel for Ubuntu 12.04.

2. Please provide the complete Ubuntu version number of your kernel so
that we can look for the commits in the Ubuntu Git. Should be
git://kernel.ubuntu.com/ppisati/ubuntu-quantal.git.
There were some nasty bugs in 3.4.0 mainline - I don't know if the fixes
for them are in your kernel.
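
For the version number, e.g. the output of:

  uname -a
  dpkg -l 'linux-image*'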

Cheers,
Sebastian


On 10.07.2012 18:33, Brian Candler wrote:
> An odd one here.
>
> Ubuntu 12.04 system, updated to 3.4.0 kernel from the mainline-ppa. Machine
> has a boot disk plus 12 other disks in a RAID10 far2 array.
>
> System was working fine, but after the most recent reboot mdraid failed to
> assemble.
> [...]


* Re: Assembly failure
From: pants @ 2012-07-10 17:05 UTC
  To: linux-raid

On Tue, Jul 10, 2012 at 05:33:45PM +0100, Brian Candler wrote:
> metadata (see below) suggests that some drives think members 1/3/4 are
> missing, but those drives think the array is fine.  The "Events" counts are
> different on some members though.

I have had this problem before; in fact, it is the usual behavior when a
drive begins to fail.  If the three drives in question fail to assemble,
it is usually because they aren't readable/writable by your system, and
therefore can't have their metadata changed to reflect the degraded
state of the array.  I would check the SMART status of the drives and
look through your logs to see if any ATA errors exist, but my suspicion is
that, at assembly, none of those drives was talking to your system.
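
For example (assuming smartmontools is installed):

  for i in /dev/sd[j-u]; do smartctl -H $i; done
  dmesg | grep -i ata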

If you feel that the drives are fine and that this is some random fluke,
you can simply add the drives back to the array (you may have to wipe
their metadata blocks) while using --assume-clean to ensure that the
data on the newly added drives is kept.
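
Something along these lines, for example (device name hypothetical; note
that --zero-superblock destroys the md metadata on that disk, and a plain
--add will trigger a resync of the re-added member):

  mdadm --zero-superblock /dev/sdr
  mdadm /dev/md127 --add /dev/sdr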

Good luck!

pants.


* Re: Assembly failure
From: Brian Candler @ 2012-07-10 17:06 UTC
  To: Sebastian Riemer; +Cc: linux-raid

On Tue, Jul 10, 2012 at 06:48:00PM +0200, Sebastian Riemer wrote:
> 1. Are you crazy to do so? Kernel 3.2 is the stable kernel for Ubuntu 12.04.

Possibly crazy :-) Specifically I had been testing out whether the
direct-io-enable option for glusterfs would be helpful (it wasn't) - this
required FUSE support for O_DIRECT which is not in 3.2.0.
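
For reference, a quick way to check for FUSE O_DIRECT support is a dd with
oflag=direct on the mount (mount point hypothetical); it fails with
"Invalid argument" on kernels that lack it:

  dd if=/dev/zero of=/mnt/gluster/testfile bs=1M count=1 oflag=direct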

> 2. Please provide the complete Ubuntu version number of your kernel so
> that we can look for the commits in the Ubuntu Git.

brian@dev-storage1:~$ uname -a
Linux dev-storage1.example.com 3.4.0-030400-generic #201205210521 SMP Mon May 21 09:22:02 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Packages:

ii  linux-headers-3.4.0-030400         3.4.0-030400.201205210521                Header files related to Linux kernel version 3.4.0
ii  linux-headers-3.4.0-030400-generic 3.4.0-030400.201205210521                Linux kernel headers for version 3.4.0 on 32 bit x86 SMP
ii  linux-image-3.4.0-030400-generic   3.4.0-030400.201205210521                Linux kernel image for version 3.4.0 on 32 bit x86 SMP

It was downloaded from here:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.4-precise/

There doesn't seem to be any newer 3.4.x for precise.

> Should be
> git://kernel.ubuntu.com/ppisati/ubuntu-quantal.git.
> There were some nasty bugs in 3.4.0 mainline - I don't know if the fixes
> for them are in your kernel.

I have no problem rolling back to 3.2.0, but I'm also very happy to do any
diagnostics which may be helpful before I do so.

Regards,

Brian.


* Re: Assembly failure
From: Sebastian Riemer @ 2012-07-10 17:38 UTC
  To: Brian Candler; +Cc: linux-raid

Your kernel is essentially v3.4 mainline: it was compiled one day after
Linus tagged v3.4. That kernel has major issues. Please reboot into the
old 3.2 kernel.
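
E.g. (removing the mainline package so that GRUB falls back to the stock
3.2 kernel):

  sudo apt-get remove linux-image-3.4.0-030400-generic
  sudo update-grub
  sudo reboot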

Your kernel has no tag in the Ubuntu Git repos!

http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-precise.git;a=tags
http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-quantal.git;a=tags
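
This can be checked with something like the following (the git:// path is
assumed from the gitweb links above):

  git ls-remote --tags git://kernel.ubuntu.com/ubuntu/ubuntu-precise.git | grep 3.4.0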

Your kernel is absolutely unstable. Who built this kernel? It can't be an
official release!

Cheers,
Sebastian


On 10.07.2012 19:06, Brian Candler wrote:
> On Tue, Jul 10, 2012 at 06:48:00PM +0200, Sebastian Riemer wrote:
>> 1. Are you crazy to do so? Kernel 3.2 is the stable kernel for Ubuntu 12.04.
> Possibly crazy :-) Specifically I had been testing out whether the
> direct-io-enable option for glusterfs would be helpful (it wasn't) - this
> required FUSE support for O_DIRECT which is not in 3.2.0.
>
>> 2. Please provide the complete Ubuntu version number of your kernel so
>> that we can look for the commits in the Ubuntu Git.
> brian@dev-storage1:~$ uname -a
> Linux dev-storage1.example.com 3.4.0-030400-generic #201205210521 SMP Mon May 21 09:22:02 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> Packages:
>
> ii  linux-headers-3.4.0-030400         3.4.0-030400.201205210521                Header files related to Linux kernel version 3.4.0
> ii  linux-headers-3.4.0-030400-generic 3.4.0-030400.201205210521                Linux kernel headers for version 3.4.0 on 32 bit x86 SMP
> ii  linux-image-3.4.0-030400-generic   3.4.0-030400.201205210521                Linux kernel image for version 3.4.0 on 32 bit x86 SMP
>
> It was downloaded from here:
> http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.4-precise/
>
> There doesn't seem to be any newer 3.4.x for precise.
>
>> Should be
>> git://kernel.ubuntu.com/ppisati/ubuntu-quantal.git.
>> There were some nasty bugs in 3.4.0 mainline - I don't know if the fixes
>> for them are in your kernel.
> I have no problem rolling back to 3.2.0, but I'm also very happy to do any
> diagnostics which may be helpful before I do so.
>
> Regards,
>
> Brian.
>
>> Cheers,
>> Sebastian
>>
>>
>> On 10.07.2012 18:33, Brian Candler wrote:
>>> An odd one here.
>>>
>>> Ubuntu 12.04 system, updated to 3.4.0 kernel from the mainline-ppa. Machine
>>> has a boot disk plus 12 other disks in a RAID10 far2 array.
>>>
>>> System was working fine, but after most recent reboot mdraid failed to
>>> assemble.
>>>
>>> root@dev-storage1:~# cat /proc/mdstat
>>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
>>> md127 : inactive sdk[9](S) sdl[10](S) sdj[8](S) sdg[5](S) sdc[1](S) sdm[11](S) sdf[4](S) sde[3](S) sdi[7](S) sdd[2](S) sdb[0](S) sdh[6](S)
>>>       35163186720 blocks super 1.2
>>>        
>>> unused devices: <none>
>>>
>>> dmesg shows periodic "export_rdev" messages:
>>>
>>> root@dev-storage1:~# dmesg | grep md:
>>> [  953.986401] md: export_rdev(sdo)
>>> [  953.988515] md: export_rdev(sdo)
>>> [  960.237392] md: export_rdev(sdp)
>>> [  960.241928] md: export_rdev(sdp)
>>> [  960.965132] md: export_rdev(sdr)
>>> [  960.967265] md: export_rdev(sdr)
>>> [ 1012.573415] md: export_rdev(sdo)
>>> [ 1012.575650] md: export_rdev(sdo)
>>> [ 1012.829690] md: export_rdev(sdp)
>>> [ 1012.831493] md: export_rdev(sdp)
>>> ...
>>> [19378.332473] md: export_rdev(sds)
>>> [19378.333764] md: export_rdev(sds)
>>> [19417.220171] md: export_rdev(sdr)
>>> [19417.221748] md: export_rdev(sdr)
>>> [23739.824227] md: export_rdev(sdr)
>>> [23739.825554] md: export_rdev(sdr)
>>> [23740.568940] md: export_rdev(sds)
>>> [23740.570079] md: export_rdev(sds)
>>>
>>> metadata (see below) suggests that some drives think members 1/3/4 are
>>> missing, but those drives think the array is fine.  The "Events" counts are
>>> different on some members though.
>>>
>>> What's the best thing to do here - attempt to force assembly? Any ideas how
>>> it got into this state?
>>>
>>> The machine was rebooted a couple of times but in what should have been a
>>> clean way, i.e.  sudo reboot or sudo halt -p.
>>>
>>> Many thanks,
>>>
>>> Brian.
>>>
>>>
>>> root@dev-storage1:~# for i in /dev/sd{j..u}; do echo "=== $i ==="; mdadm --examine $i; done
>>> === /dev/sdj ===
>>> /dev/sdj:
>>>           Magic : a92b4efc
>>>         Version : 1.2
>>>     Feature Map : 0x1
>>>      Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
>>>            Name : storage1:storage1
>>>   Creation Time : Thu Jun  7 13:51:21 2012
>>>      Raid Level : raid10
>>>    Raid Devices : 12
>>>
>>>  Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
>>>      Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
>>>   Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
>>>     Data Offset : 2048 sectors
>>>    Super Offset : 8 sectors
>>>           State : clean
>>>     Device UUID : e96965d5:bf8986b7:fa83b813:e27aa17f
>>>
>>> Internal Bitmap : 8 sectors from superblock
>>>     Update Time : Tue Jul 10 10:46:19 2012
>>>        Checksum : 5673ac95 - correct
>>>          Events : 29374
>>>
>>>          Layout : far=2
>>>      Chunk Size : 1024K
>>>
>>>    Device Role : Active device 8
>>>    Array State : A.A..AAAAAAA ('A' == active, '.' == missing)
>>> === /dev/sdk ===
>>> /dev/sdk:
>>>           Magic : a92b4efc
>>>         Version : 1.2
>>>     Feature Map : 0x1
>>>      Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
>>>            Name : storage1:storage1
>>>   Creation Time : Thu Jun  7 13:51:21 2012
>>>      Raid Level : raid10
>>>    Raid Devices : 12
>>>
>>>  Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
>>>      Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
>>>   Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
>>>     Data Offset : 2048 sectors
>>>    Super Offset : 8 sectors
>>>           State : clean
>>>     Device UUID : f15bb47f:85ca59b1:cad42dec:f8b1b63c
>>>
>>> Internal Bitmap : 8 sectors from superblock
>>>     Update Time : Tue Jul 10 10:46:19 2012
>>>        Checksum : 9020674a - correct
>>>          Events : 29374
>>>
>>>          Layout : far=2
>>>      Chunk Size : 1024K
>>>
>>>    Device Role : Active device 9
>>>    Array State : A.A..AAAAAAA ('A' == active, '.' == missing)
>>> === /dev/sdl ===
>>> /dev/sdl:
>>>           Magic : a92b4efc
>>>         Version : 1.2
>>>     Feature Map : 0x1
>>>      Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
>>>            Name : storage1:storage1
>>>   Creation Time : Thu Jun  7 13:51:21 2012
>>>      Raid Level : raid10
>>>    Raid Devices : 12
>>>
>>>  Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
>>>      Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
>>>   Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
>>>     Data Offset : 2048 sectors
>>>    Super Offset : 8 sectors
>>>           State : clean
>>>     Device UUID : e2d82a4c:8409d883:cf2d9b7c:83829aad
>>>
>>> Internal Bitmap : 8 sectors from superblock
>>>     Update Time : Tue Jul 10 10:46:19 2012
>>>        Checksum : 30664ccb - correct
>>>          Events : 29374
>>>
>>>          Layout : far=2
>>>      Chunk Size : 1024K
>>>
>>>    Device Role : Active device 10
>>>    Array State : A.A..AAAAAAA ('A' == active, '.' == missing)
>>> === /dev/sdm ===
>>> /dev/sdm:
>>>           Magic : a92b4efc
>>>         Version : 1.2
>>>     Feature Map : 0x1
>>>      Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
>>>            Name : storage1:storage1
>>>   Creation Time : Thu Jun  7 13:51:21 2012
>>>      Raid Level : raid10
>>>    Raid Devices : 12
>>>
>>>  Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
>>>      Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
>>>   Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
>>>     Data Offset : 2048 sectors
>>>    Super Offset : 8 sectors
>>>           State : clean
>>>     Device UUID : e3603570:8e767487:63f3131b:afe358ea
>>>
>>> Internal Bitmap : 8 sectors from superblock
>>>     Update Time : Tue Jul 10 10:46:19 2012
>>>        Checksum : 33446897 - correct
>>>          Events : 29374
>>>
>>>          Layout : far=2
>>>      Chunk Size : 1024K
>>>
>>>    Device Role : Active device 11
>>>    Array State : A.A..AAAAAAA ('A' == active, '.' == missing)
>>> === /dev/sdn ===
>>> /dev/sdn:
>>>           Magic : a92b4efc
>>>         Version : 1.2
>>>     Feature Map : 0x1
>>>      Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
>>>            Name : storage1:storage1
>>>   Creation Time : Thu Jun  7 13:51:21 2012
>>>      Raid Level : raid10
>>>    Raid Devices : 12
>>>
>>>  Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
>>>      Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
>>>   Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
>>>     Data Offset : 2048 sectors
>>>    Super Offset : 8 sectors
>>>           State : clean
>>>     Device UUID : 549a5230:005cedf8:b37a0d7e:36648ff0
>>>
>>> Internal Bitmap : 8 sectors from superblock
>>>     Update Time : Tue Jul 10 10:46:19 2012
>>>        Checksum : ce0a8f46 - correct
>>>          Events : 29374
>>>
>>>          Layout : far=2
>>>      Chunk Size : 1024K
>>>
>>>    Device Role : Active device 0
>>>    Array State : A.A..AAAAAAA ('A' == active, '.' == missing)
>>> === /dev/sdo ===
>>> /dev/sdo:
>>>           Magic : a92b4efc
>>>         Version : 1.2
>>>     Feature Map : 0x1
>>>      Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
>>>            Name : storage1:storage1
>>>   Creation Time : Thu Jun  7 13:51:21 2012
>>>      Raid Level : raid10
>>>    Raid Devices : 12
>>>
>>>  Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
>>>      Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
>>>   Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
>>>     Data Offset : 2048 sectors
>>>    Super Offset : 8 sectors
>>>           State : active
>>>     Device UUID : b09f3869:09ce8a89:31ed7097:d3621064
>>>
>>> Internal Bitmap : 8 sectors from superblock
>>>     Update Time : Sat Jul  7 14:02:07 2012
>>>        Checksum : 246eb119 - correct
>>>          Events : 29355
>>>
>>>          Layout : far=2
>>>      Chunk Size : 1024K
>>>
>>>    Device Role : Active device 2
>>>    Array State : A.A..AAAAAAA ('A' == active, '.' == missing)
>>> === /dev/sdp ===
>>> /dev/sdp:
>>>           Magic : a92b4efc
>>>         Version : 1.2
>>>     Feature Map : 0x1
>>>      Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
>>>            Name : storage1:storage1
>>>   Creation Time : Thu Jun  7 13:51:21 2012
>>>      Raid Level : raid10
>>>    Raid Devices : 12
>>>
>>>  Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
>>>      Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
>>>   Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
>>>     Data Offset : 2048 sectors
>>>    Super Offset : 8 sectors
>>>           State : active
>>>     Device UUID : 71494ee2:1504b35a:00a1d927:543db7c6
>>>
>>> Internal Bitmap : 8 sectors from superblock
>>>     Update Time : Sat Jul  7 14:00:55 2012
>>>        Checksum : 61c11eeb - correct
>>>          Events : 29352
>>>
>>>          Layout : far=2
>>>      Chunk Size : 1024K
>>>
>>>    Device Role : Active device 3
>>>    Array State : A.AA.AAAAAAA ('A' == active, '.' == missing)
>>> === /dev/sdq ===
>>> /dev/sdq:
>>>           Magic : a92b4efc
>>>         Version : 1.2
>>>     Feature Map : 0x1
>>>      Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
>>>            Name : storage1:storage1
>>>   Creation Time : Thu Jun  7 13:51:21 2012
>>>      Raid Level : raid10
>>>    Raid Devices : 12
>>>
>>>  Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
>>>      Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
>>>   Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
>>>     Data Offset : 2048 sectors
>>>    Super Offset : 8 sectors
>>>           State : clean
>>>     Device UUID : 17fcd5b8:97fc715f:0877d022:3770d08b
>>>
>>> Internal Bitmap : 8 sectors from superblock
>>>     Update Time : Tue Jul 10 10:46:19 2012
>>>        Checksum : fd1699fc - correct
>>>          Events : 29374
>>>
>>>          Layout : far=2
>>>      Chunk Size : 1024K
>>>
>>>    Device Role : Active device 7
>>>    Array State : A.A..AAAAAAA ('A' == active, '.' == missing)
>>> === /dev/sdr ===
>>> /dev/sdr:
>>>           Magic : a92b4efc
>>>         Version : 1.2
>>>     Feature Map : 0x1
>>>      Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
>>>            Name : storage1:storage1
>>>   Creation Time : Thu Jun  7 13:51:21 2012
>>>      Raid Level : raid10
>>>    Raid Devices : 12
>>>
>>>  Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
>>>      Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
>>>   Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
>>>     Data Offset : 2048 sectors
>>>    Super Offset : 8 sectors
>>>           State : clean
>>>     Device UUID : e0565412:a68cf236:9da9a141:e89f935a
>>>
>>> Internal Bitmap : 8 sectors from superblock
>>>     Update Time : Tue Jul  3 04:21:50 2012
>>>        Checksum : 1ba72ec1 - correct
>>>          Events : 20228
>>>
>>>          Layout : far=2
>>>      Chunk Size : 1024K
>>>
>>>    Device Role : Active device 1
>>>    Array State : AAAAAAAAAAAA ('A' == active, '.' == missing)
>>> === /dev/sds ===
>>> /dev/sds:
>>>           Magic : a92b4efc
>>>         Version : 1.2
>>>     Feature Map : 0x1
>>>      Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
>>>            Name : storage1:storage1
>>>   Creation Time : Thu Jun  7 13:51:21 2012
>>>      Raid Level : raid10
>>>    Raid Devices : 12
>>>
>>>  Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
>>>      Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
>>>   Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
>>>     Data Offset : 2048 sectors
>>>    Super Offset : 8 sectors
>>>           State : active
>>>     Device UUID : c0f3bbab:70f3e69e:1314cdca:072d5ad9
>>>
>>> Internal Bitmap : 8 sectors from superblock
>>>     Update Time : Tue Jul  3 06:37:33 2012
>>>        Checksum : 24f36d4c - correct
>>>          Events : 29312
>>>
>>>          Layout : far=2
>>>      Chunk Size : 1024K
>>>
>>>    Device Role : Active device 5
>>>    Array State : A.AA.AAAAAAA ('A' == active, '.' == missing)
>>> === /dev/sdt ===
>>> /dev/sdt:
>>>           Magic : a92b4efc
>>>         Version : 1.2
>>>     Feature Map : 0x1
>>>      Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
>>>            Name : storage1:storage1
>>>   Creation Time : Thu Jun  7 13:51:21 2012
>>>      Raid Level : raid10
>>>    Raid Devices : 12
>>>
>>>  Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
>>>      Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
>>>   Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
>>>     Data Offset : 2048 sectors
>>>    Super Offset : 8 sectors
>>>           State : clean
>>>     Device UUID : 6b4f9595:4de46aa8:fa695fe8:59b797e7
>>>
>>> Internal Bitmap : 8 sectors from superblock
>>>     Update Time : Tue Jul 10 10:46:19 2012
>>>        Checksum : 44250f1b - correct
>>>          Events : 29374
>>>
>>>          Layout : far=2
>>>      Chunk Size : 1024K
>>>
>>>    Device Role : Active device 6
>>>    Array State : A.A..AAAAAAA ('A' == active, '.' == missing)
>>> === /dev/sdu ===
>>> /dev/sdu:
>>>           Magic : a92b4efc
>>>         Version : 1.2
>>>     Feature Map : 0x1
>>>      Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
>>>            Name : storage1:storage1
>>>   Creation Time : Thu Jun  7 13:51:21 2012
>>>      Raid Level : raid10
>>>    Raid Devices : 12
>>>
>>>  Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
>>>      Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
>>>   Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
>>>     Data Offset : 2048 sectors
>>>    Super Offset : 8 sectors
>>>           State : active
>>>     Device UUID : 46f03854:de80bec8:b44b062c:dd265ba3
>>>
>>> Internal Bitmap : 8 sectors from superblock
>>>     Update Time : Tue Jul  3 06:36:18 2012
>>>        Checksum : 22812848 - correct
>>>          Events : 29272
>>>
>>>          Layout : far=2
>>>      Chunk Size : 1024K
>>>
>>>    Device Role : Active device 4
>>>    Array State : A.AAAAAAAAAA ('A' == active, '.' == missing)
>>>


-- 
Sebastian Riemer
Linux Kernel Developer

ProfitBricks GmbH
Greifswalder Str. 207
10405 Berlin, Germany

Tel.:  +49 - 30 - 60 98 56 991 - 303
Fax:   +49 - 30 - 51 64 09 22
Email: sebastian.riemer@profitbricks.com
Web:   http://www.profitbricks.com/

Sitz der Gesellschaft: Berlin
Registergericht: Amtsgericht Charlottenburg, HRB 125506 B
Geschäftsführer: Andreas Gauger, Achim Weiss


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Assembly failure
  2012-07-10 17:38     ` Sebastian Riemer
@ 2012-07-10 18:59       ` Brian Candler
  2012-07-11  2:43         ` NeilBrown
  0 siblings, 1 reply; 16+ messages in thread
From: Brian Candler @ 2012-07-10 18:59 UTC (permalink / raw)
  To: Sebastian Riemer; +Cc: linux-raid

On Tue, Jul 10, 2012 at 07:38:51PM +0200, Sebastian Riemer wrote:
> Your kernel is similar to v3.4 mainline. Your kernel has been compiled
> one day after Linus tagged v3.4. This kernel has major issues. Please
> reboot into the old 3.2 kernel.
> 
> Your kernel has no tag in the Ubuntu Git repos!
> 
> http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-precise.git;a=tags
> http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-quantal.git;a=tags
> 
> Your kernel is absolutely unstable. Who built this kernel? It can't be
> an official release!

I don't know who makes ~kernel-ppa packages.

Anyway, box is now on linux-image-3.2.0-24-generic. Same problem:

brian@dev-storage1:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md127 : inactive sdm[1](S) sdg[5](S) sdh[4](S) sdd[3](S) sdj[9](S) sdl[11](S) sdi[8](S) sdk[10](S) sdb[0](S) sde[7](S) sdf[6](S) sdc[2](S)
      35163186720 blocks super 1.2
       
unused devices: <none>

What's my best next step? There's nothing critical on here, but I would like
to use this as practice at recovering a broken md raid volume.

Regards,

Brian.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Assembly failure
  2012-07-10 18:59       ` Brian Candler
@ 2012-07-11  2:43         ` NeilBrown
  2012-07-11  7:58           ` Brian Candler
  0 siblings, 1 reply; 16+ messages in thread
From: NeilBrown @ 2012-07-11  2:43 UTC (permalink / raw)
  To: Brian Candler; +Cc: Sebastian Riemer, linux-raid


On Tue, 10 Jul 2012 19:59:27 +0100 Brian Candler <B.Candler@pobox.com> wrote:

> On Tue, Jul 10, 2012 at 07:38:51PM +0200, Sebastian Riemer wrote:
> > Your kernel is similar to v3.4 mainline. Your kernel has been compiled
> > one day after Linus tagged v3.4. This kernel has major issues. Please
> > reboot into the old 3.2 kernel.
> > 
> > Your kernel has no tag in the Ubuntu Git repos!
> > 
> > http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-precise.git;a=tags
> > http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-quantal.git;a=tags
> > 
> > Your kernel is absolutely unstable. Who built this kernel? It can't be
> > an official release!
> 
> I don't know who makes ~kernel-ppa packages.
> 
> Anyway, box is now on linux-image-3.2.0-24-generic. Same problem:
> 
> brian@dev-storage1:~$ cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
> md127 : inactive sdm[1](S) sdg[5](S) sdh[4](S) sdd[3](S) sdj[9](S) sdl[11](S) sdi[8](S) sdk[10](S) sdb[0](S) sde[7](S) sdf[6](S) sdc[2](S)
>       35163186720 blocks super 1.2
>        
> unused devices: <none>
> 
> What's my best next step? There's nothing critical on here, but I would like
> to use this as practice at recovering a broken md raid volume.
> 

mdadm -S /dev/md127

Then assemble again with "--force" as you expected.

Don't try to --create --assume-clean, it isn't needed.
And don't worry too much about the kernel - though keep away from any Ubuntu
3.2 kernel before the one you have - there is a nasty bug (unrelated to your
current experience) that you don't want to go near.

When you re-assemble it won't include all the devices in the array - just
enough to make the array functional.  You would then need to add the others
back in if you trust them.
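
Concretely, something like this (the member list here is illustrative - use
whatever names your twelve drives have after the reboot, and sdX is a
placeholder for any drive that gets left out):

  # stop the stuck, inactive array
  mdadm -S /dev/md127
  # force-assemble; mdadm will bump the stale event counts
  mdadm --assemble --force /dev/md127 /dev/sd[b-m]
  # once it is running, re-add any member that was left out, if you trust it
  mdadm /dev/md127 --add /dev/sdX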


As others have suggested, there is probably some hardware problem somewhere.
It looks like sdr failed first, around "Jul  3 04:21:50 2012".
The array continued working until about 06:36 when sdu then sds failed.
Since then it doesn't look like much if anything has been written to the
array - but I cannot be completely certain.

Do you have kernel logs from the morning of 3rd July?

NeilBrown


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Assembly failure
  2012-07-11  2:43         ` NeilBrown
@ 2012-07-11  7:58           ` Brian Candler
  2012-07-11  8:27             ` Christian Balzer
  0 siblings, 1 reply; 16+ messages in thread
From: Brian Candler @ 2012-07-11  7:58 UTC (permalink / raw)
  To: NeilBrown; +Cc: Sebastian Riemer, linux-raid

On Wed, Jul 11, 2012 at 12:43:16PM +1000, NeilBrown wrote:
> As others have suggested, there is probably some hardware problem somewhere.
> It looks like sdr failed first, around "Jul  3 04:21:50 2012".
> The array continued working until about 06:36 when sdu then sds failed.
> Since then it doesn't look like much if anything has been written to the
> array - but I cannot be completely certain.

Good spotting - how did you work that out from the `mdadm --examine`
output?  I see eight drives in state "clean" and four in state "active".
Three have update times on Jul 3, two on Jul 7, and the rest on Jul 10.  I
couldn't see anything that obviously jumps out as "FAULTY".

You're right that the system hasn't been doing any writes recently.

> Do you have kernel logs from the morning of 3rd July?

Logs are at the end of this mail.  It looks like sde failed, then a few
seconds later reattached itself as sdn - and a couple of hours later more
things started to fail (it looks like sdh failed and reattached as sdo, and
a minute later sdf failed and reattached as sdp).  Very odd that these
devices should fail so close together - maybe a power glitch?  After that,
even more drives apparently started to fail.

SMART shows 7 drives have reported at least one uncorrectable error:

root@dev-storage1:~# for i in /dev/sd?; do smartctl -A $i | grep -i correct; done
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
187 Reported_Uncorrect      0x0032   092   092   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
187 Reported_Uncorrect      0x0032   098   098   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
187 Reported_Uncorrect      0x0032   099   099   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
187 Reported_Uncorrect      0x0032   095   095   000    Old_age   Always       -       5
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
187 Reported_Uncorrect      0x0032   092   092   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
187 Reported_Uncorrect      0x0032   098   098   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
187 Reported_Uncorrect      0x0032   093   093   000    Old_age   Always       -       7
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0

And three have very high seek error rates:

root@dev-storage1:~# for i in /dev/sd?; do smartctl -A $i | grep -i seek_error; done
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  7 Seek_Error_Rate         0x000f   063   060   030    Pre-fail  Always       -       2233869
  7 Seek_Error_Rate         0x000f   063   060   030    Pre-fail  Always       -       2231308
  7 Seek_Error_Rate         0x000f   063   060   030    Pre-fail  Always       -       2199098
  7 Seek_Error_Rate         0x000f   063   060   030    Pre-fail  Always       -       2250443
  7 Seek_Error_Rate         0x000f   063   060   030    Pre-fail  Always       -       2220215
  7 Seek_Error_Rate         0x000f   060   059   030    Pre-fail  Always       -       8592064174
  7 Seek_Error_Rate         0x000f   063   060   030    Pre-fail  Always       -       2238212
  7 Seek_Error_Rate         0x000f   063   060   030    Pre-fail  Always       -       4297171738
  7 Seek_Error_Rate         0x000f   063   060   030    Pre-fail  Always       -       2170378
  7 Seek_Error_Rate         0x000f   063   060   030    Pre-fail  Always       -       2253934
  7 Seek_Error_Rate         0x000f   063   060   030    Pre-fail  Always       -       2192100
  7 Seek_Error_Rate         0x000f   063   060   030    Pre-fail  Always       -       4297224551

BTW these are all Seagate ST3000DM001. Yes, I know :-(

So this doesn't look good. Having a go at reassembly anyway:

root@dev-storage1:~# mdadm -S /dev/md127
mdadm: stopped /dev/md127
root@dev-storage1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
unused devices: <none>
root@dev-storage1:~# ls /dev/sd*
/dev/sda   /dev/sda5  /dev/sde  /dev/sdj  /dev/sdn  /dev/sdq
/dev/sda1  /dev/sdb   /dev/sdf  /dev/sdk  /dev/sdo
/dev/sda2  /dev/sdc   /dev/sdi  /dev/sdl  /dev/sdp
root@dev-storage1:~# mdadm --assemble --force /dev/md/storage1 /dev/sd{b,c,e,f,i,j,k,l,n,o,p,q}
mdadm: forcing event count in /dev/sdc(2) from 29355 upto 29374
mdadm: forcing event count in /dev/sdn(3) from 29352 upto 29374
mdadm: forcing event count in /dev/sdq(5) from 29312 upto 29374
mdadm: forcing event count in /dev/sdo(4) from 29272 upto 29374
mdadm: forcing event count in /dev/sdp(1) from 20228 upto 29374
mdadm: clearing FAULTY flag for device 10 in /dev/md/storage1 for /dev/sdp
mdadm: clearing FAULTY flag for device 8 in /dev/md/storage1 for /dev/sdn
mdadm: clearing FAULTY flag for device 9 in /dev/md/storage1 for /dev/sdo
mdadm: Marking array /dev/md/storage1 as 'clean'
mdadm: /dev/md/storage1 assembled from 12 drives - not enough to start the array.
root@dev-storage1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md127 : inactive sdb[0](S) sdl[11](S) sdk[10](S) sdj[9](S) sdi[8](S) sde[7](S) sdf[6](S) sdq[5](S) sdo[4](S) sdn[3](S) sdc[2](S) sdp[1](S)
      35163186720 blocks super 1.2

unused devices: <none>

> Don't try to --create --assume-clean, it isn't needed.
...
> When you re-assemble it won't include all the devices in the array - just
> enough to make the array functional.  You would then need to add the others
> back in if you trust them.

I am guessing that it's not starting the array because devices 8 and 9 -
two halves of the same pair - were both marked as failed.

This is a test system but I will exchange at least the three drives with the
high seek error rates.

One final point. I would like to be able to monitor for suspect or failed
drives.  Is my best bet to look at /proc/mdstat output and identify drives
that have been kicked out of the array?  Or to monitor SMART variables
(though in that case I need to decide which ones are the most important to
monitor, and what thresholds to set)?
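
(mdadm itself seems to have a monitor mode which can mail alerts, e.g.

  mdadm --monitor --scan --mail=root --daemonise

but I don't know whether that, smartd, or both is considered best practice.)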

It would be really useful if the kernel itself kept some per-drive counters
for I/O failures, but if it does, I can't find them.
http://www.kernel.org/doc/Documentation/block/stat.txt

Regards,

Brian.
       

Jul  3 04:22:33 dev-storage1 kernel: [50147.362942] sd 4:0:3:0: [sde] Synchronizing SCSI cache
Jul  3 04:22:33 dev-storage1 kernel: [50147.364646] end_request: I/O error, dev sde, sector 8
Jul  3 04:22:33 dev-storage1 kernel: [50147.364656] md: super_written gets error=-5, uptodate=0
Jul  3 04:22:33 dev-storage1 kernel: [50147.364663] md/raid10:md127: Disk failure on sde, disabling device.
Jul  3 04:22:33 dev-storage1 kernel: [50147.364665] md/raid10:md127: Operation continuing on 11 devices.
Jul  3 04:22:33 dev-storage1 kernel: [50147.364710] sd 4:0:3:0: [sde]  Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul  3 04:22:33 dev-storage1 kernel: [50147.364881] mpt2sas0: removing handle(0x000c), sas_addr(0x4433221100000000)
Jul  3 04:22:34 dev-storage1 kernel: [50147.437339] RAID10 conf printout:
Jul  3 04:22:34 dev-storage1 kernel: [50147.437344]  --- wd:11 rd:12
Jul  3 04:22:34 dev-storage1 kernel: [50147.437347]  disk 0, wo:0, o:1, dev:sdb
Jul  3 04:22:34 dev-storage1 kernel: [50147.437350]  disk 1, wo:1, o:0, dev:sde
Jul  3 04:22:34 dev-storage1 kernel: [50147.437352]  disk 2, wo:0, o:1, dev:sdc
Jul  3 04:22:34 dev-storage1 kernel: [50147.437354]  disk 3, wo:0, o:1, dev:sdd
Jul  3 04:22:34 dev-storage1 kernel: [50147.437357]  disk 4, wo:0, o:1, dev:sdh
Jul  3 04:22:34 dev-storage1 kernel: [50147.437359]  disk 5, wo:0, o:1, dev:sdf
Jul  3 04:22:34 dev-storage1 kernel: [50147.437361]  disk 6, wo:0, o:1, dev:sdi
Jul  3 04:22:34 dev-storage1 kernel: [50147.437363]  disk 7, wo:0, o:1, dev:sdg
Jul  3 04:22:34 dev-storage1 kernel: [50147.437366]  disk 8, wo:0, o:1, dev:sdj
Jul  3 04:22:34 dev-storage1 kernel: [50147.437368]  disk 9, wo:0, o:1, dev:sdk
Jul  3 04:22:34 dev-storage1 kernel: [50147.437370]  disk 10, wo:0, o:1, dev:sdl
Jul  3 04:22:34 dev-storage1 kernel: [50147.437372]  disk 11, wo:0, o:1, dev:sdm
Jul  3 04:22:34 dev-storage1 kernel: [50147.437429] RAID10 conf printout:
Jul  3 04:22:34 dev-storage1 kernel: [50147.437434]  --- wd:11 rd:12
Jul  3 04:22:34 dev-storage1 kernel: [50147.437437]  disk 0, wo:0, o:1, dev:sdb
Jul  3 04:22:34 dev-storage1 kernel: [50147.437439]  disk 2, wo:0, o:1, dev:sdc
Jul  3 04:22:34 dev-storage1 kernel: [50147.437441]  disk 3, wo:0, o:1, dev:sdd
Jul  3 04:22:34 dev-storage1 kernel: [50147.437444]  disk 4, wo:0, o:1, dev:sdh
Jul  3 04:22:34 dev-storage1 kernel: [50147.437446]  disk 5, wo:0, o:1, dev:sdf
Jul  3 04:22:34 dev-storage1 kernel: [50147.437448]  disk 6, wo:0, o:1, dev:sdi
Jul  3 04:22:34 dev-storage1 kernel: [50147.437450]  disk 7, wo:0, o:1, dev:sdg
Jul  3 04:22:34 dev-storage1 kernel: [50147.437452]  disk 8, wo:0, o:1, dev:sdj
Jul  3 04:22:34 dev-storage1 kernel: [50147.437454]  disk 9, wo:0, o:1, dev:sdk
Jul  3 04:22:34 dev-storage1 kernel: [50147.437457]  disk 10, wo:0, o:1, dev:sdl
Jul  3 04:22:34 dev-storage1 kernel: [50147.437459]  disk 11, wo:0, o:1, dev:sdm
Jul  3 04:22:47 dev-storage1 kernel: [50161.168292] scsi 4:0:8:0: Direct-Access     ATA      ST3000DM001-9YN1 CC4C PQ: 0 ANSI: 6
Jul  3 04:22:47 dev-storage1 kernel: [50161.168302] scsi 4:0:8:0: SATA: handle(0x000c), sas_addr(0x4433221100000000), phy(0), device_name(0x5000c5004a37a3ae)
Jul  3 04:22:47 dev-storage1 kernel: [50161.168307] scsi 4:0:8:0: SATA: enclosure_logical_id(0x500605b00448c4f0), slot(3)
Jul  3 04:22:47 dev-storage1 kernel: [50161.168457] scsi 4:0:8:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Jul  3 04:22:47 dev-storage1 kernel: [50161.168465] scsi 4:0:8:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
Jul  3 04:22:47 dev-storage1 kernel: [50161.168686] sd 4:0:8:0: Attached scsi generic sg4 type 0
Jul  3 04:22:47 dev-storage1 kernel: [50161.170776] sd 4:0:8:0: [sdn] physical block alignment offset: 4096
Jul  3 04:22:47 dev-storage1 kernel: [50161.170783] sd 4:0:8:0: [sdn] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
Jul  3 04:22:47 dev-storage1 kernel: [50161.170786] sd 4:0:8:0: [sdn] 4096-byte physical blocks
Jul  3 04:22:47 dev-storage1 kernel: [50161.233278] sd 4:0:8:0: [sdn] Write Protect is off
Jul  3 04:22:47 dev-storage1 kernel: [50161.233283] sd 4:0:8:0: [sdn] Mode Sense: 7f 00 00 08
Jul  3 04:22:47 dev-storage1 kernel: [50161.233981] sd 4:0:8:0: [sdn] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul  3 04:22:47 dev-storage1 kernel: [50161.325272]  sdn: unknown partition table
Jul  3 04:22:47 dev-storage1 kernel: [50161.392086] sd 4:0:8:0: [sdn] Attached SCSI disk


Jul  3 06:37:00 dev-storage1 kernel: [58191.291518] sd 4:0:6:0: [sdh] Synchronizing SCSI cache
Jul  3 06:37:00 dev-storage1 kernel: [58191.291586] sd 4:0:6:0: [sdh]  Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul  3 06:37:00 dev-storage1 kernel: [58191.291975] md: super_written gets error=-19, uptodate=0
Jul  3 06:37:00 dev-storage1 kernel: [58191.291982] md/raid10:md127: Disk failure on sdh, disabling device.
Jul  3 06:37:00 dev-storage1 kernel: [58191.291985] md/raid10:md127: Operation continuing on 10 devices.
Jul  3 06:37:00 dev-storage1 kernel: [58191.292675] mpt2sas0: removing handle(0x000f), sas_addr(0x4433221105000000)
Jul  3 06:37:00 dev-storage1 kernel: [58191.363950] RAID10 conf printout:
Jul  3 06:37:00 dev-storage1 kernel: [58191.363955]  --- wd:10 rd:12
Jul  3 06:37:00 dev-storage1 kernel: [58191.363958]  disk 0, wo:0, o:1, dev:sdb
Jul  3 06:37:00 dev-storage1 kernel: [58191.363961]  disk 2, wo:0, o:1, dev:sdc
Jul  3 06:37:00 dev-storage1 kernel: [58191.363963]  disk 3, wo:0, o:1, dev:sdd
Jul  3 06:37:00 dev-storage1 kernel: [58191.363965]  disk 4, wo:1, o:0, dev:sdh
Jul  3 06:37:00 dev-storage1 kernel: [58191.363967]  disk 5, wo:0, o:1, dev:sdf
Jul  3 06:37:00 dev-storage1 kernel: [58191.363970]  disk 6, wo:0, o:1, dev:sdi
Jul  3 06:37:00 dev-storage1 kernel: [58191.363972]  disk 7, wo:0, o:1, dev:sdg
Jul  3 06:37:00 dev-storage1 kernel: [58191.363974]  disk 8, wo:0, o:1, dev:sdj
Jul  3 06:37:00 dev-storage1 kernel: [58191.363976]  disk 9, wo:0, o:1, dev:sdk
Jul  3 06:37:00 dev-storage1 kernel: [58191.363979]  disk 10, wo:0, o:1, dev:sdl
Jul  3 06:37:00 dev-storage1 kernel: [58191.363981]  disk 11, wo:0, o:1, dev:sdm
Jul  3 06:37:00 dev-storage1 kernel: [58191.364014] RAID10 conf printout:
Jul  3 06:37:00 dev-storage1 kernel: [58191.364018]  --- wd:10 rd:12
Jul  3 06:37:00 dev-storage1 kernel: [58191.364021]  disk 0, wo:0, o:1, dev:sdb
Jul  3 06:37:00 dev-storage1 kernel: [58191.364024]  disk 2, wo:0, o:1, dev:sdc
Jul  3 06:37:00 dev-storage1 kernel: [58191.364026]  disk 3, wo:0, o:1, dev:sdd
Jul  3 06:37:00 dev-storage1 kernel: [58191.364028]  disk 5, wo:0, o:1, dev:sdf
Jul  3 06:37:00 dev-storage1 kernel: [58191.364030]  disk 6, wo:0, o:1, dev:sdi
Jul  3 06:37:00 dev-storage1 kernel: [58191.364033]  disk 7, wo:0, o:1, dev:sdg
Jul  3 06:37:00 dev-storage1 kernel: [58191.364035]  disk 8, wo:0, o:1, dev:sdj
Jul  3 06:37:00 dev-storage1 kernel: [58191.364037]  disk 9, wo:0, o:1, dev:sdk
Jul  3 06:37:00 dev-storage1 kernel: [58191.364039]  disk 10, wo:0, o:1, dev:sdl
Jul  3 06:37:00 dev-storage1 kernel: [58191.364041]  disk 11, wo:0, o:1, dev:sdm
Jul  3 06:37:14 dev-storage1 kernel: [58204.853102] scsi 4:0:9:0: Direct-Access     ATA      ST3000DM001-9YN1 CC4C PQ: 0 ANSI: 6
Jul  3 06:37:14 dev-storage1 kernel: [58204.853112] scsi 4:0:9:0: SATA: handle(0x000f), sas_addr(0x4433221105000000), phy(5), device_name(0x5000c5004a44edbe)
Jul  3 06:37:14 dev-storage1 kernel: [58204.853116] scsi 4:0:9:0: SATA: enclosure_logical_id(0x500605b00448c4f0), slot(6)
Jul  3 06:37:14 dev-storage1 kernel: [58204.853292] scsi 4:0:9:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Jul  3 06:37:14 dev-storage1 kernel: [58204.853299] scsi 4:0:9:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
Jul  3 06:37:14 dev-storage1 kernel: [58204.853491] sd 4:0:9:0: Attached scsi generic sg7 type 0
Jul  3 06:37:14 dev-storage1 kernel: [58204.853882] sd 4:0:9:0: [sdo] physical block alignment offset: 4096
Jul  3 06:37:14 dev-storage1 kernel: [58204.853892] sd 4:0:9:0: [sdo] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
Jul  3 06:37:14 dev-storage1 kernel: [58204.853897] sd 4:0:9:0: [sdo] 4096-byte physical blocks
Jul  3 06:37:14 dev-storage1 kernel: [58204.920533] sd 4:0:9:0: [sdo] Write Protect is off
Jul  3 06:37:14 dev-storage1 kernel: [58204.920539] sd 4:0:9:0: [sdo] Mode Sense: 7f 00 00 08
Jul  3 06:37:14 dev-storage1 kernel: [58204.949593] sd 4:0:9:0: [sdo] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul  3 06:37:14 dev-storage1 kernel: [58205.077376]  sdo: unknown partition table
Jul  3 06:37:14 dev-storage1 kernel: [58205.145744] sd 4:0:9:0: [sdo] Attached SCSI disk
Jul  3 06:38:15 dev-storage1 kernel: [58266.076919] sd 4:0:4:0: [sdf] Synchronizing SCSI cache
Jul  3 06:38:15 dev-storage1 kernel: [58266.077767] sd 4:0:4:0: [sdf]  Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul  3 06:38:15 dev-storage1 kernel: [58266.078218] mpt2sas0: removing handle(0x000d), sas_addr(0x4433221104000000)
Jul  3 06:38:15 dev-storage1 kernel: [58266.078681] md: super_written gets error=-19, uptodate=0
Jul  3 06:38:16 dev-storage1 kernel: [58266.356847] md: super_written gets error=-19, uptodate=0
Jul  3 06:38:16 dev-storage1 kernel: [58266.381809] md: super_written gets error=-19, uptodate=0
Jul  3 06:38:18 dev-storage1 kernel: [58268.351226] md: super_written gets error=-19, uptodate=0
Jul  3 06:38:18 dev-storage1 kernel: [58268.381174] md: super_written gets error=-19, uptodate=0
Jul  3 06:38:18 dev-storage1 kernel: [58268.606432] md: super_written gets error=-19, uptodate=0
Jul  3 06:38:18 dev-storage1 kernel: [58268.643731] md: super_written gets error=-19, uptodate=0
Jul  3 06:38:20 dev-storage1 kernel: [58270.676566] md: super_written gets error=-19, uptodate=0
Jul  3 06:38:20 dev-storage1 kernel: [58270.727079] md: super_written gets error=-19, uptodate=0
Jul  3 06:38:20 dev-storage1 kernel: [58270.951730] md: super_written gets error=-19, uptodate=0
Jul  3 06:38:20 dev-storage1 kernel: [58270.986091] md: super_written gets error=-19, uptodate=0
Jul  3 06:38:22 dev-storage1 kernel: [58273.021826] md: super_written gets error=-19, uptodate=0
Jul  3 06:38:22 dev-storage1 kernel: [58273.100664] md: super_written gets error=-19, uptodate=0
Jul  3 06:38:22 dev-storage1 kernel: [58273.122092] md: super_written gets error=-19, uptodate=0
Jul  3 06:38:23 dev-storage1 kernel: [58273.324949] md: super_written gets error=-19, uptodate=0
Jul  3 06:38:23 dev-storage1 kernel: [58273.351418] md: super_written gets error=-19, uptodate=0
Jul  3 06:38:24 dev-storage1 kernel: [58274.318169] md: super_written gets error=-19, uptodate=0
Jul  3 06:38:24 dev-storage1 kernel: [58274.351982] md: super_written gets error=-19, uptodate=0
Jul  3 06:38:24 dev-storage1 kernel: [58274.609269] md: super_written gets error=-19, uptodate=0
Jul  3 06:38:24 dev-storage1 kernel: [58274.636725] md: super_written gets error=-19, uptodate=0
Jul  3 06:38:29 dev-storage1 kernel: [58279.573103] scsi 4:0:10:0: Direct-Access     ATA      ST3000DM001-9YN1 CC4C PQ: 0 ANSI: 6
Jul  3 06:38:29 dev-storage1 kernel: [58279.573113] scsi 4:0:10:0: SATA: handle(0x000d), sas_addr(0x4433221104000000), phy(4), device_name(0x5000c5004a123b93)
Jul  3 06:38:29 dev-storage1 kernel: [58279.573117] scsi 4:0:10:0: SATA: enclosure_logical_id(0x500605b00448c4f0), slot(7)
Jul  3 06:38:29 dev-storage1 kernel: [58279.573252] scsi 4:0:10:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Jul  3 06:38:29 dev-storage1 kernel: [58279.573257] scsi 4:0:10:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
Jul  3 06:38:29 dev-storage1 kernel: [58279.573450] sd 4:0:10:0: Attached scsi generic sg5 type 0
Jul  3 06:38:29 dev-storage1 kernel: [58279.573754] sd 4:0:10:0: [sdp] physical block alignment offset: 4096
Jul  3 06:38:29 dev-storage1 kernel: [58279.573764] sd 4:0:10:0: [sdp] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
Jul  3 06:38:29 dev-storage1 kernel: [58279.573768] sd 4:0:10:0: [sdp] 4096-byte physical blocks
Jul  3 06:38:29 dev-storage1 kernel: [58279.626371] sd 4:0:10:0: [sdp] Write Protect is off
Jul  3 06:38:29 dev-storage1 kernel: [58279.626376] sd 4:0:10:0: [sdp] Mode Sense: 7f 00 00 08
Jul  3 06:38:29 dev-storage1 kernel: [58279.627052] sd 4:0:10:0: [sdp] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul  3 06:38:29 dev-storage1 kernel: [58279.722780]  sdp: unknown partition table
Jul  3 06:38:29 dev-storage1 kernel: [58279.785171] sd 4:0:10:0: [sdp] Attached SCSI disk
Jul  3 06:38:45 dev-storage1 kernel: [58296.071974] md: super_written gets error=-19, uptodate=0
Jul  3 06:38:45 dev-storage1 kernel: [58296.104706] md: super_written gets error=-19, uptodate=0
Jul  3 06:38:45 dev-storage1 kernel: [58296.125410] md: super_written gets error=-19, uptodate=0
Jul  3 06:38:46 dev-storage1 kernel: [58296.347134] md: super_written gets error=-19, uptodate=0
Jul  3 06:38:46 dev-storage1 kernel: [58296.373160] md: super_written gets error=-19, uptodate=0
Jul  3 14:00:27 dev-storage1 kernel: [84721.683301] md: super_written gets error=-19, uptodate=0
Jul  3 14:00:27 dev-storage1 kernel: [84721.717956] md: super_written gets error=-19, uptodate=0
Jul  3 14:00:27 dev-storage1 kernel: [84721.974395] md: super_written gets error=-19, uptodate=0
Jul  3 14:00:27 dev-storage1 kernel: [84721.997145] md: super_written gets error=-19, uptodate=0
Jul  3 14:01:01 dev-storage1 kernel: [84755.610249] md: super_written gets error=-19, uptodate=0
Jul  3 14:01:01 dev-storage1 kernel: [84755.652918] md: super_written gets error=-19, uptodate=0
Jul  3 14:01:01 dev-storage1 kernel: [84755.673944] md: super_written gets error=-19, uptodate=0
Jul  3 14:01:01 dev-storage1 kernel: [84755.694312] quiet_error: 24 callbacks suppressed
Jul  3 14:01:01 dev-storage1 kernel: [84755.694318] Buffer I/O error on device md127, logical block 2067791905
Jul  3 14:01:01 dev-storage1 kernel: [84755.694326] lost page write due to I/O error on md127
Jul  3 14:01:01 dev-storage1 kernel: [84755.897413] md: super_written gets error=-19, uptodate=0
Jul  3 14:01:01 dev-storage1 kernel: [84755.918495] md: super_written gets error=-19, uptodate=0
Jul  4 14:00:34 dev-storage1 kernel: [170882.380831] md: super_written gets error=-19, uptodate=0
Jul  4 14:00:34 dev-storage1 kernel: [170882.415605] md: super_written gets error=-19, uptodate=0
Jul  4 14:00:34 dev-storage1 kernel: [170882.671923] md: super_written gets error=-19, uptodate=0
Jul  4 14:00:34 dev-storage1 kernel: [170882.694429] md: super_written gets error=-19, uptodate=0
Jul  4 14:01:08 dev-storage1 kernel: [170916.387597] md: super_written gets error=-19, uptodate=0
Jul  4 14:01:08 dev-storage1 kernel: [170916.423033] md: super_written gets error=-19, uptodate=0
Jul  4 14:01:08 dev-storage1 kernel: [170916.444517] Buffer I/O error on device md127, logical block 2067791905
Jul  4 14:01:08 dev-storage1 kernel: [170916.444524] lost page write due to I/O error on md127
Jul  4 14:01:08 dev-storage1 kernel: [170916.646795] md: super_written gets error=-19, uptodate=0
Jul  4 14:01:08 dev-storage1 kernel: [170916.667910] md: super_written gets error=-19, uptodate=0
Jul  5 06:25:07 dev-storage1 kernel: [229786.969375] md: super_written gets error=-19, uptodate=0
Jul  5 06:25:07 dev-storage1 kernel: [229787.001133] md: super_written gets error=-19, uptodate=0
Jul  5 06:25:07 dev-storage1 kernel: [229787.264457] md: super_written gets error=-19, uptodate=0
Jul  5 06:25:07 dev-storage1 kernel: [229787.284359] md: super_written gets error=-19, uptodate=0
Jul  5 06:25:41 dev-storage1 kernel: [229821.534543] md: super_written gets error=-19, uptodate=0
Jul  5 06:25:41 dev-storage1 kernel: [229821.568091] md: super_written gets error=-19, uptodate=0
Jul  5 06:25:42 dev-storage1 kernel: [229821.793719] md: super_written gets error=-19, uptodate=0
Jul  5 06:25:42 dev-storage1 kernel: [229821.821779] md: super_written gets error=-19, uptodate=0
Jul  5 14:00:41 dev-storage1 kernel: [257043.062362] md: super_written gets error=-19, uptodate=0
Jul  5 14:00:41 dev-storage1 kernel: [257043.096332] md: super_written gets error=-19, uptodate=0
Jul  5 14:00:41 dev-storage1 kernel: [257043.353502] md: super_written gets error=-19, uptodate=0
Jul  5 14:00:41 dev-storage1 kernel: [257043.372982] md: super_written gets error=-19, uptodate=0
Jul  5 14:01:15 dev-storage1 kernel: [257077.148922] md: super_written gets error=-19, uptodate=0
Jul  5 14:01:15 dev-storage1 kernel: [257077.183874] md: super_written gets error=-19, uptodate=0
Jul  5 14:01:15 dev-storage1 kernel: [257077.205482] Buffer I/O error on device md127, logical block 2067791905
Jul  5 14:01:15 dev-storage1 kernel: [257077.205491] lost page write due to I/O error on md127
Jul  5 14:01:15 dev-storage1 kernel: [257077.408139] md: super_written gets error=-19, uptodate=0
Jul  5 14:01:15 dev-storage1 kernel: [257077.430487] md: super_written gets error=-19, uptodate=0
Jul  6 06:25:08 dev-storage1 kernel: [315941.667994] md: super_written gets error=-19, uptodate=0
Jul  6 06:25:08 dev-storage1 kernel: [315941.701458] md: super_written gets error=-19, uptodate=0
Jul  6 06:25:08 dev-storage1 kernel: [315941.959147] md: super_written gets error=-19, uptodate=0
Jul  6 06:25:08 dev-storage1 kernel: [315941.980677] md: super_written gets error=-19, uptodate=0
Jul  6 06:25:42 dev-storage1 kernel: [315975.882227] md: super_written gets error=-19, uptodate=0
Jul  6 06:25:42 dev-storage1 kernel: [315975.911833] md: super_written gets error=-19, uptodate=0
Jul  6 06:25:42 dev-storage1 kernel: [315976.141403] md: super_written gets error=-19, uptodate=0
Jul  6 06:25:42 dev-storage1 kernel: [315976.169041] md: super_written gets error=-19, uptodate=0
Jul  6 14:00:48 dev-storage1 kernel: [343203.728029] md: super_written gets error=-19, uptodate=0
Jul  6 14:00:48 dev-storage1 kernel: [343203.761823] md: super_written gets error=-19, uptodate=0
Jul  6 14:00:48 dev-storage1 kernel: [343204.019107] md: super_written gets error=-19, uptodate=0
Jul  6 14:00:48 dev-storage1 kernel: [343204.041347] md: super_written gets error=-19, uptodate=0
Jul  6 14:01:22 dev-storage1 kernel: [343238.117700] md: super_written gets error=-19, uptodate=0
Jul  6 14:01:22 dev-storage1 kernel: [343238.153053] md: super_written gets error=-19, uptodate=0
Jul  6 14:01:22 dev-storage1 kernel: [343238.173113] Buffer I/O error on device md127, logical block 2067791905
Jul  6 14:01:22 dev-storage1 kernel: [343238.173123] lost page write due to I/O error on md127
Jul  6 14:01:22 dev-storage1 kernel: [343238.372889] md: super_written gets error=-19, uptodate=0
Jul  6 14:01:22 dev-storage1 kernel: [343238.393954] md: super_written gets error=-19, uptodate=0
Jul  7 06:25:07 dev-storage1 kernel: [402094.372432] md: super_written gets error=-19, uptodate=0
Jul  7 06:25:07 dev-storage1 kernel: [402094.405765] md: super_written gets error=-19, uptodate=0
Jul  7 06:25:07 dev-storage1 kernel: [402094.655548] md: super_written gets error=-19, uptodate=0
Jul  7 06:25:07 dev-storage1 kernel: [402094.676942] md: super_written gets error=-19, uptodate=0
Jul  7 06:25:42 dev-storage1 kernel: [402129.208784] md: super_written gets error=-19, uptodate=0
Jul  7 06:25:42 dev-storage1 kernel: [402129.244327] md: super_written gets error=-19, uptodate=0
Jul  7 06:25:42 dev-storage1 kernel: [402129.467991] md: super_written gets error=-19, uptodate=0
Jul  7 06:25:42 dev-storage1 kernel: [402129.491687] md: super_written gets error=-19, uptodate=0
Jul  7 14:00:55 dev-storage1 kernel: [429364.393602] md: super_written gets error=-19, uptodate=0
Jul  7 14:00:55 dev-storage1 kernel: [429364.443065] md: super_written gets error=-19, uptodate=0
Jul  7 14:01:37 dev-storage1 kernel: [429406.619062] sd 4:0:2:0: [sdd] Synchronizing SCSI cache
Jul  7 14:01:37 dev-storage1 kernel: [429406.619127] sd 4:0:2:0: [sdd]  Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul  7 14:01:37 dev-storage1 kernel: [429406.619415] mpt2sas0: removing handle(0x000a), sas_addr(0x4433221102000000)
Jul  7 14:01:37 dev-storage1 kernel: [429406.860149] md: super_written gets error=-19, uptodate=0
Jul  7 14:01:37 dev-storage1 kernel: [429406.860155] md/raid10:md127: Disk failure on sdd, disabling device.
Jul  7 14:01:37 dev-storage1 kernel: [429406.860157] md/raid10:md127: Operation continuing on 9 devices.
Jul  7 14:01:37 dev-storage1 kernel: [429406.860177] md: super_written gets error=-19, uptodate=0
Jul  7 14:01:37 dev-storage1 kernel: [429406.888238] md: super_written gets error=-19, uptodate=0
Jul  7 14:01:37 dev-storage1 kernel: [429406.909143] md: super_written gets error=-19, uptodate=0
Jul  7 14:01:37 dev-storage1 kernel: [429406.930558] md: super_written gets error=-19, uptodate=0
Jul  7 14:01:37 dev-storage1 kernel: [429406.951439] RAID10 conf printout:
Jul  7 14:01:37 dev-storage1 kernel: [429406.951443]  --- wd:9 rd:12
Jul  7 14:01:37 dev-storage1 kernel: [429406.951446]  disk 0, wo:0, o:1, dev:sdb
Jul  7 14:01:37 dev-storage1 kernel: [429406.951449]  disk 2, wo:0, o:1, dev:sdc
Jul  7 14:01:37 dev-storage1 kernel: [429406.951451]  disk 3, wo:1, o:0, dev:sdd
Jul  7 14:01:37 dev-storage1 kernel: [429406.951453]  disk 5, wo:0, o:1, dev:sdf
Jul  7 14:01:37 dev-storage1 kernel: [429406.951456]  disk 6, wo:0, o:1, dev:sdi
Jul  7 14:01:37 dev-storage1 kernel: [429406.951458]  disk 7, wo:0, o:1, dev:sdg
Jul  7 14:01:37 dev-storage1 kernel: [429406.951460]  disk 8, wo:0, o:1, dev:sdj
Jul  7 14:01:37 dev-storage1 kernel: [429406.951462]  disk 9, wo:0, o:1, dev:sdk
Jul  7 14:01:37 dev-storage1 kernel: [429406.951465]  disk 10, wo:0, o:1, dev:sdl
Jul  7 14:01:37 dev-storage1 kernel: [429406.951467]  disk 11, wo:0, o:1, dev:sdm
Jul  7 14:01:37 dev-storage1 kernel: [429406.951527] RAID10 conf printout:
Jul  7 14:01:37 dev-storage1 kernel: [429406.951532]  --- wd:9 rd:12
Jul  7 14:01:37 dev-storage1 kernel: [429406.951535]  disk 0, wo:0, o:1, dev:sdb
Jul  7 14:01:37 dev-storage1 kernel: [429406.951537]  disk 2, wo:0, o:1, dev:sdc
Jul  7 14:01:37 dev-storage1 kernel: [429406.951540]  disk 5, wo:0, o:1, dev:sdf
Jul  7 14:01:37 dev-storage1 kernel: [429406.951542]  disk 6, wo:0, o:1, dev:sdi
Jul  7 14:01:37 dev-storage1 kernel: [429406.951544]  disk 7, wo:0, o:1, dev:sdg
Jul  7 14:01:37 dev-storage1 kernel: [429406.951546]  disk 8, wo:0, o:1, dev:sdj
Jul  7 14:01:37 dev-storage1 kernel: [429406.951549]  disk 9, wo:0, o:1, dev:sdk
Jul  7 14:01:37 dev-storage1 kernel: [429406.951551]  disk 10, wo:0, o:1, dev:sdl
Jul  7 14:01:37 dev-storage1 kernel: [429406.951553]  disk 11, wo:0, o:1, dev:sdm
Jul  7 14:01:51 dev-storage1 kernel: [429420.898133] scsi 4:0:11:0: Direct-Access     ATA      ST3000DM001-9YN1 CC4C PQ: 0 ANSI: 6
Jul  7 14:01:51 dev-storage1 kernel: [429420.898143] scsi 4:0:11:0: SATA: handle(0x000a), sas_addr(0x4433221102000000), phy(2), device_name(0x5000c5004a44f42a)
Jul  7 14:01:51 dev-storage1 kernel: [429420.898147] scsi 4:0:11:0: SATA: enclosure_logical_id(0x500605b00448c4f0), slot(1)
Jul  7 14:01:51 dev-storage1 kernel: [429420.898290] scsi 4:0:11:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Jul  7 14:01:51 dev-storage1 kernel: [429420.898297] scsi 4:0:11:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
Jul  7 14:01:51 dev-storage1 kernel: [429420.898514] sd 4:0:11:0: Attached scsi generic sg3 type 0
Jul  7 14:01:51 dev-storage1 kernel: [429420.898858] sd 4:0:11:0: [sdq] physical block alignment offset: 4096
Jul  7 14:01:51 dev-storage1 kernel: [429420.898867] sd 4:0:11:0: [sdq] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
Jul  7 14:01:51 dev-storage1 kernel: [429420.898872] sd 4:0:11:0: [sdq] 4096-byte physical blocks
Jul  7 14:01:51 dev-storage1 kernel: [429420.957177] sd 4:0:11:0: [sdq] Write Protect is off
Jul  7 14:01:51 dev-storage1 kernel: [429420.957183] sd 4:0:11:0: [sdq] Mode Sense: 7f 00 00 08
Jul  7 14:01:51 dev-storage1 kernel: [429420.957882] sd 4:0:11:0: [sdq] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul  7 14:01:51 dev-storage1 kernel: [429421.030777]  sdq: unknown partition table
Jul  7 14:01:51 dev-storage1 kernel: [429421.099310] sd 4:0:11:0: [sdq] Attached SCSI disk
Jul  7 14:02:07 dev-storage1 kernel: [429436.579265] md: super_written gets error=-19, uptodate=0
Jul  7 14:02:07 dev-storage1 kernel: [429436.612518] md: super_written gets error=-19, uptodate=0
Jul  7 14:02:50 dev-storage1 kernel: [429479.409214] sd 4:0:1:0: [sdc] Synchronizing SCSI cache
Jul  7 14:02:50 dev-storage1 kernel: [429479.409290] sd 4:0:1:0: [sdc]  Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul  7 14:02:50 dev-storage1 kernel: [429479.409297] Buffer I/O error on device md127, logical block 2067791905
Jul  7 14:02:50 dev-storage1 kernel: [429479.409300] lost page write due to I/O error on md127
Jul  7 14:02:50 dev-storage1 kernel: [429479.409502] mpt2sas0: removing handle(0x0009), sas_addr(0x4433221101000000)
Jul  7 14:02:50 dev-storage1 kernel: [429479.612173] md: super_written gets error=-19, uptodate=0
Jul  7 14:02:50 dev-storage1 kernel: [429479.612192] md: super_written gets error=-19, uptodate=0
Jul  7 14:02:50 dev-storage1 kernel: [429479.639212] md: super_written gets error=-19, uptodate=0
Jul  7 14:02:50 dev-storage1 kernel: [429479.639222] md: super_written gets error=-19, uptodate=0
Jul  7 14:03:04 dev-storage1 kernel: [429493.162396] scsi 4:0:12:0: Direct-Access     ATA      ST3000DM001-9YN1 CC4C PQ: 0 ANSI: 6
Jul  7 14:03:04 dev-storage1 kernel: [429493.162407] scsi 4:0:12:0: SATA: handle(0x0009), sas_addr(0x4433221101000000), phy(1), device_name(0x5000c5004a46ceca)
Jul  7 14:03:04 dev-storage1 kernel: [429493.162411] scsi 4:0:12:0: SATA: enclosure_logical_id(0x500605b00448c4f0), slot(2)
Jul  7 14:03:04 dev-storage1 kernel: [429493.162518] scsi 4:0:12:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Jul  7 14:03:04 dev-storage1 kernel: [429493.162526] scsi 4:0:12:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
Jul  7 14:03:04 dev-storage1 kernel: [429493.162782] sd 4:0:12:0: Attached scsi generic sg2 type 0
Jul  7 14:03:04 dev-storage1 kernel: [429493.163136] sd 4:0:12:0: [sdr] physical block alignment offset: 4096
Jul  7 14:03:04 dev-storage1 kernel: [429493.163143] sd 4:0:12:0: [sdr] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
Jul  7 14:03:04 dev-storage1 kernel: [429493.163146] sd 4:0:12:0: [sdr] 4096-byte physical blocks
Jul  7 14:03:04 dev-storage1 kernel: [429493.217763] sd 4:0:12:0: [sdr] Write Protect is off
Jul  7 14:03:04 dev-storage1 kernel: [429493.217768] sd 4:0:12:0: [sdr] Mode Sense: 7f 00 00 08
Jul  7 14:03:04 dev-storage1 kernel: [429493.218501] sd 4:0:12:0: [sdr] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul  7 14:03:04 dev-storage1 kernel: [429493.289491]  sdr: unknown partition table
Jul  7 14:03:04 dev-storage1 kernel: [429493.351632] sd 4:0:12:0: [sdr] Attached SCSI disk
Jul  7 15:38:20 dev-storage1 kernel: [435193.023742] sd 4:0:10:0: [sdp] Synchronizing SCSI cache
Jul  7 15:38:20 dev-storage1 kernel: [435193.023794] sd 4:0:10:0: [sdp]  Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul  7 15:38:20 dev-storage1 kernel: [435193.024209] mpt2sas0: removing handle(0x000d), sas_addr(0x4433221104000000)
Jul  7 15:38:29 dev-storage1 kernel: [435202.541557] scsi 4:0:13:0: Direct-Access     ATA      ST3000DM001-9YN1 CC4C PQ: 0 ANSI: 6
Jul  7 15:38:29 dev-storage1 kernel: [435202.541567] scsi 4:0:13:0: SATA: handle(0x000d), sas_addr(0x4433221104000000), phy(4), device_name(0x5000c5004a123b93)
Jul  7 15:38:29 dev-storage1 kernel: [435202.541571] scsi 4:0:13:0: SATA: enclosure_logical_id(0x500605b00448c4f0), slot(7)
Jul  7 15:38:29 dev-storage1 kernel: [435202.541693] scsi 4:0:13:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Jul  7 15:38:29 dev-storage1 kernel: [435202.541699] scsi 4:0:13:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
Jul  7 15:38:29 dev-storage1 kernel: [435202.541978] sd 4:0:13:0: Attached scsi generic sg5 type 0
Jul  7 15:38:29 dev-storage1 kernel: [435202.542230] sd 4:0:13:0: [sdp] physical block alignment offset: 4096
Jul  7 15:38:29 dev-storage1 kernel: [435202.542240] sd 4:0:13:0: [sdp] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
Jul  7 15:38:29 dev-storage1 kernel: [435202.542245] sd 4:0:13:0: [sdp] 4096-byte physical blocks
Jul  7 15:38:30 dev-storage1 kernel: [435202.594827] sd 4:0:13:0: [sdp] Write Protect is off
Jul  7 15:38:30 dev-storage1 kernel: [435202.594832] sd 4:0:13:0: [sdp] Mode Sense: 7f 00 00 08
Jul  7 15:38:30 dev-storage1 kernel: [435202.595551] sd 4:0:13:0: [sdp] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul  7 15:38:30 dev-storage1 kernel: [435202.666733]  sdp: unknown partition table
Jul  7 15:38:30 dev-storage1 kernel: [435202.728676] sd 4:0:13:0: [sdp] Attached SCSI disk
Jul  7 15:48:21 dev-storage1 kernel: [435792.305634] sd 4:0:13:0: [sdp] Synchronizing SCSI cache
Jul  7 15:48:21 dev-storage1 kernel: [435792.305685] sd 4:0:13:0: [sdp]  Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul  7 15:48:21 dev-storage1 kernel: [435792.305961] mpt2sas0: removing handle(0x000d), sas_addr(0x4433221104000000)
Jul  7 15:48:21 dev-storage1 kernel: [435792.307158] sd 4:0:9:0: [sdo] Synchronizing SCSI cache
Jul  7 15:48:21 dev-storage1 kernel: [435792.307201] sd 4:0:9:0: [sdo]  Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul  7 15:48:21 dev-storage1 kernel: [435792.307477] mpt2sas0: removing handle(0x000f), sas_addr(0x4433221105000000)
Jul  7 15:48:30 dev-storage1 kernel: [435801.077761] scsi 4:0:14:0: Direct-Access     ATA      ST3000DM001-9YN1 CC4C PQ: 0 ANSI: 6
Jul  7 15:48:30 dev-storage1 kernel: [435801.077773] scsi 4:0:14:0: SATA: handle(0x000f), sas_addr(0x4433221105000000), phy(5), device_name(0x5000c5004a44edbe)
Jul  7 15:48:30 dev-storage1 kernel: [435801.077780] scsi 4:0:14:0: SATA: enclosure_logical_id(0x500605b00448c4f0), slot(6)
Jul  7 15:48:30 dev-storage1 kernel: [435801.077876] scsi 4:0:14:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Jul  7 15:48:30 dev-storage1 kernel: [435801.077884] scsi 4:0:14:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
Jul  7 15:48:30 dev-storage1 kernel: [435801.078164] sd 4:0:14:0: Attached scsi generic sg5 type 0
Jul  7 15:48:30 dev-storage1 kernel: [435801.078467] sd 4:0:14:0: [sdo] physical block alignment offset: 4096
Jul  7 15:48:30 dev-storage1 kernel: [435801.078476] sd 4:0:14:0: [sdo] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
Jul  7 15:48:30 dev-storage1 kernel: [435801.078481] sd 4:0:14:0: [sdo] 4096-byte physical blocks
Jul  7 15:48:30 dev-storage1 kernel: [435801.134074] sd 4:0:14:0: [sdo] Write Protect is off
Jul  7 15:48:30 dev-storage1 kernel: [435801.134079] sd 4:0:14:0: [sdo] Mode Sense: 7f 00 00 08
Jul  7 15:48:30 dev-storage1 kernel: [435801.134786] sd 4:0:14:0: [sdo] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul  7 15:48:30 dev-storage1 kernel: [435801.208179]  sdo: unknown partition table
Jul  7 15:48:30 dev-storage1 kernel: [435801.276258] sd 4:0:14:0: [sdo] Attached SCSI disk
Jul  7 15:48:30 dev-storage1 kernel: [435801.822171] scsi 4:0:15:0: Direct-Access     ATA      ST3000DM001-9YN1 CC4C PQ: 0 ANSI: 6
Jul  7 15:48:30 dev-storage1 kernel: [435801.822180] scsi 4:0:15:0: SATA: handle(0x000d), sas_addr(0x4433221104000000), phy(4), device_name(0x5000c5004a123b93)
Jul  7 15:48:30 dev-storage1 kernel: [435801.822185] scsi 4:0:15:0: SATA: enclosure_logical_id(0x500605b00448c4f0), slot(7)
Jul  7 15:48:30 dev-storage1 kernel: [435801.822272] scsi 4:0:15:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Jul  7 15:48:30 dev-storage1 kernel: [435801.822279] scsi 4:0:15:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
Jul  7 15:48:30 dev-storage1 kernel: [435801.822589] sd 4:0:15:0: Attached scsi generic sg7 type 0
Jul  7 15:48:30 dev-storage1 kernel: [435801.822886] sd 4:0:15:0: [sdp] physical block alignment offset: 4096
Jul  7 15:48:30 dev-storage1 kernel: [435801.822896] sd 4:0:15:0: [sdp] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
Jul  7 15:48:30 dev-storage1 kernel: [435801.822900] sd 4:0:15:0: [sdp] 4096-byte physical blocks
Jul  7 15:48:30 dev-storage1 kernel: [435801.875888] sd 4:0:15:0: [sdp] Write Protect is off
Jul  7 15:48:30 dev-storage1 kernel: [435801.875893] sd 4:0:15:0: [sdp] Mode Sense: 7f 00 00 08
Jul  7 15:48:30 dev-storage1 kernel: [435801.876484] sd 4:0:15:0: [sdp] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul  7 15:48:31 dev-storage1 kernel: [435801.947660]  sdp: unknown partition table
Jul  7 15:48:31 dev-storage1 kernel: [435802.009745] sd 4:0:15:0: [sdp] Attached SCSI disk
Jul  7 15:49:14 dev-storage1 kernel: [435845.153305] sd 4:0:14:0: [sdo] Synchronizing SCSI cache
Jul  7 15:49:14 dev-storage1 kernel: [435845.153356] sd 4:0:14:0: [sdo]  Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
... snip rest ...

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Assembly failure
  2012-07-11  7:58           ` Brian Candler
@ 2012-07-11  8:27             ` Christian Balzer
  2012-07-11  9:09               ` Brian Candler
                                 ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Christian Balzer @ 2012-07-11  8:27 UTC (permalink / raw)
  To: linux-raid; +Cc: Brian Candler

On Wed, 11 Jul 2012 08:58:18 +0100 Brian Candler wrote:

[snip]
> 
> BTW these are all Seagate ST3000DM001. Yes, I know :-(
> 
Indeed, there is your problem. And on an LSI controller (which one?) to
boot. ^o^ 
Though the latter part should be fine with a kernel as new as yours.

The new STxxxxM drives from Seagate are <expletive deleted>. 
They're wonderfully fast, but you absolutely can NOT use them in any HW
RAID until they get a non-braindead firmware that won't park the heads
every 30 seconds, come rain or shine (look at the Load_Cycle_Count in
SMART).
Not only will this wear out the drives in any remotely busy scenario, but
it will also cause them to be considered off-line by the SATA controller in
the right (wrong) circumstances, leading to exactly what you're seeing
here.
I experienced the same thing and have switched to Hitachi drives for the
foreseeable future; after a year of experience they seem to be of far
higher quality/reliability anyway.

These Seagates also suffer from quality-control issues, with high DOA and
early-death rates.

With direct-attached drives that you can issue hdparm commands to, you can
"fix" this deadly behavior by setting "apm = 255" on them (via
hdparm.conf, since it needs to be done on each boot...).
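
For example, a per-device stanza in /etc/hdparm.conf (Debian/Ubuntu style
config; the device name is illustrative):

  /dev/sdb {
          apm = 255
  }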

[snip]

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi@gol.com   	Global OnLine Japan/Fusion Communications
http://www.gol.com/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Assembly failure
  2012-07-11  8:27             ` Christian Balzer
@ 2012-07-11  9:09               ` Brian Candler
  2012-07-11 10:32                 ` Mikael Abrahamsson
  2012-07-11 10:44               ` Roman Mamedov
  2012-07-13 18:52               ` Brian Candler
  2 siblings, 1 reply; 16+ messages in thread
From: Brian Candler @ 2012-07-11  9:09 UTC (permalink / raw)
  To: Christian Balzer; +Cc: linux-raid

On Wed, Jul 11, 2012 at 05:27:42PM +0900, Christian Balzer wrote:
> > BTW these are all Seagate ST3000DM001. Yes, I know :-(
> > 
> Indeed, there is your problem. And on an LSI controller (which one?) to
> boot. ^o^ 
> Though the latter part should be fine with a kernel as new as yours.

The drives were only bought because the supplier was out of Hitachis, and we
didn't realise the Seagates don't have ERC.  This is why the Hitachis have
moved to production, and I'm stuck with the Seagates on the dev systems :-(

> (look at the Load_Cycle_Count in SMART)

193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       490
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       549
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       516
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       505
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       76
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       77
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       502
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       495
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       550
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       562
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       532
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       556

Ugh. (Two have a lower count, but maybe those are bad...)

> With direct-attached drives that you can issue hdparm commands to, you can
> "fix" this deadly behavior by setting "apm = 255" on them (via
> hdparm.conf, since it needs to be done on each boot...).

Thanks, rc.local now has:

# Set Error Recovery Control if drive supports it
for i in /dev/sd*; do /usr/sbin/smartctl -l scterc,70,70 $i >/dev/null; done
# Stop the drives from aggressively parking their heads (disable APM)
for i in /dev/sd*; do hdparm -q -B255 $i; done

Cheers,

Brian.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Assembly failure
  2012-07-11  9:09               ` Brian Candler
@ 2012-07-11 10:32                 ` Mikael Abrahamsson
  2012-07-11 10:47                   ` Brian Candler
  0 siblings, 1 reply; 16+ messages in thread
From: Mikael Abrahamsson @ 2012-07-11 10:32 UTC (permalink / raw)
  To: Brian Candler; +Cc: Christian Balzer, linux-raid

On Wed, 11 Jul 2012, Brian Candler wrote:

>> (look at the Load_Cycle_Count in SMART)
>
> 193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       490

That is really low:

$ sudo smartctl -a /dev/sdd | grep -i load
193 Load_Cycle_Count        0x0032   052   052   000    Old_age   Always       -       444425

Most drives are rated for a load cycle count of 200-600k. All of mine with
a high load cycle count are WD20EARS; the WD20EADS doesn't do this.
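
(For scale: at one park every 30 seconds, even a 600k rating works out to
600,000 x 30 s, i.e. only about 208 days of continuously busy operation.)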

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Assembly failure
  2012-07-11  8:27             ` Christian Balzer
  2012-07-11  9:09               ` Brian Candler
@ 2012-07-11 10:44               ` Roman Mamedov
  2012-07-11 17:21                 ` Christian Balzer
  2012-07-13 18:52               ` Brian Candler
  2 siblings, 1 reply; 16+ messages in thread
From: Roman Mamedov @ 2012-07-11 10:44 UTC (permalink / raw)
  To: Christian Balzer; +Cc: linux-raid, Brian Candler


On Wed, 11 Jul 2012 17:27:42 +0900
Christian Balzer <chibi@gol.com> wrote:

> RAID until they get a non-braindead firmware that won't park the heads
> every 30 seconds, come rain or shine (look at the Load_Cycle_Count in
> SMART).

So I seem to have such firmware - what do I win?

Device Model:     ST1000DM003-9YN162

  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       2796
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       326

or maybe....

# hdparm -B /dev/sda

/dev/sda:
 APM_level	= 128

See "man hdparm" about the "-B" switch and check what is the value on your
drives. Mine was at 128 by default btw, I did not have to change this value
manually.
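
(Per hdparm(8): values 1-127 permit spin-down, values 128-254 do not, and
255 disables APM on the drive entirely.)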

-- 
With respect,
Roman

~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Assembly failure
  2012-07-11 10:32                 ` Mikael Abrahamsson
@ 2012-07-11 10:47                   ` Brian Candler
  0 siblings, 0 replies; 16+ messages in thread
From: Brian Candler @ 2012-07-11 10:47 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: Christian Balzer, linux-raid

On Wed, Jul 11, 2012 at 12:32:28PM +0200, Mikael Abrahamsson wrote:
> That is really low:
> 
> $ sudo smartctl -a /dev/sdd | grep -i load
> 193 Load_Cycle_Count        0x0032   052   052   000    Old_age   Always       -       444425

The drives had -B 254 before I changed them to -B 255.  According to
hdparm(8), 254 should already be the least aggressive power-management
setting short of 255, which disables APM entirely.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Assembly failure
  2012-07-11 10:44               ` Roman Mamedov
@ 2012-07-11 17:21                 ` Christian Balzer
  0 siblings, 0 replies; 16+ messages in thread
From: Christian Balzer @ 2012-07-11 17:21 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: linux-raid, Brian Candler

On Wed, 11 Jul 2012 16:44:41 +0600 Roman Mamedov wrote:

> On Wed, 11 Jul 2012 17:27:42 +0900
> Christian Balzer <chibi@gol.com> wrote:
> 
> > RAID until they get a non-braindead firmware that won't park (look at
> > the Load_Cycle_Count in SMART) the heads every 30 seconds, come rain
> > or shine.
> 
> So I seem to have such firmware, what do I win?
> 
An inflatable washing machine...

> Device Model:     ST1000DM003-9YN162
> 
>   9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       2796
> 193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       326
>
Read the respective threads on the Seagate forums.

If your drive is idle most of the time, then once the heads get parked
(unloaded), they will stay that way until the drive is accessed again.
However, if the drive is busy every second or so, the park-every-30-seconds
behaviour described above applies.

Device Model:     ST2000DM001-9YN164
Firmware Version: CC4C
193 Load_Cycle_Count        0x0032   099   099   000    Old_age   Always       -       2419

Those were accumulated in the first 2 days or so of that drive, before I
set the APM level to 255.
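
To make that survive a reboot on Debian/Ubuntu, an /etc/hdparm.conf stanza
along these lines should do it (a sketch; the device path is just an
example):

/dev/sda {
        apm = 255
}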

-- 
Christian Balzer        Network/Systems Engineer                
chibi@gol.com   	Global OnLine Japan/Fusion Communications
http://www.gol.com/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Assembly failure
  2012-07-11  8:27             ` Christian Balzer
  2012-07-11  9:09               ` Brian Candler
  2012-07-11 10:44               ` Roman Mamedov
@ 2012-07-13 18:52               ` Brian Candler
  2 siblings, 0 replies; 16+ messages in thread
From: Brian Candler @ 2012-07-13 18:52 UTC (permalink / raw)
  To: Christian Balzer; +Cc: linux-raid

OK, after reseating drives and removing the three definitely bad ones, I
think the hardware is stable again now.

So now I have a problem with the five-drive array I had set up in the
meantime.  All five drives are there, but one is a bit behind the others in
its event count and last update time.

Here's the mdadm --examine output:

/dev/sdb:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 149c0025:e7c5da3a:62b7a318:4ca57af7
           Name : storage1.2
  Creation Time : Wed Jul 11 14:50:06 2012
     Raid Level : raid6
   Raid Devices : 5

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 17581590528 (8383.56 GiB 9001.77 GB)
  Used Dev Size : 5860530176 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 56e9ce91:c5df8850:2105c86d:c9c710a1

Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Jul 11 15:19:31 2012
       Checksum : 80c0762 - correct
         Events : 276

         Layout : left-symmetric
     Chunk Size : 1024K

   Device Role : Active device 0
   Array State : AAAAA ('A' == active, '.' == missing)
/dev/sdj:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 149c0025:e7c5da3a:62b7a318:4ca57af7
           Name : storage1.2
  Creation Time : Wed Jul 11 14:50:06 2012
     Raid Level : raid6
   Raid Devices : 5

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 17581590528 (8383.56 GiB 9001.77 GB)
  Used Dev Size : 5860530176 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : db72c8d7:672760b4:572dc944:fc7c151b

Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Jul 11 15:29:52 2012
       Checksum : 11ec5fef - correct
         Events : 357

         Layout : left-symmetric
     Chunk Size : 1024K

   Device Role : Active device 1
   Array State : .AAAA ('A' == active, '.' == missing)
/dev/sdk:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 149c0025:e7c5da3a:62b7a318:4ca57af7
           Name : storage1.2
  Creation Time : Wed Jul 11 14:50:06 2012
     Raid Level : raid6
   Raid Devices : 5

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 17581590528 (8383.56 GiB 9001.77 GB)
  Used Dev Size : 5860530176 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : b12fefdd:74914e6e:9f3ca2bd:8b433e34

Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Jul 11 15:29:52 2012
       Checksum : 64035caa - correct
         Events : 357

         Layout : left-symmetric
     Chunk Size : 1024K

   Device Role : Active device 2
   Array State : .AAAA ('A' == active, '.' == missing)
/dev/sdl:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 149c0025:e7c5da3a:62b7a318:4ca57af7
           Name : storage1.2
  Creation Time : Wed Jul 11 14:50:06 2012
     Raid Level : raid6
   Raid Devices : 5

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 17581590528 (8383.56 GiB 9001.77 GB)
  Used Dev Size : 5860530176 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : db387f8a:383c26f4:4012a3ec:12c7679e

Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Jul 11 15:29:52 2012
       Checksum : 2f9569c2 - correct
         Events : 357

         Layout : left-symmetric
     Chunk Size : 1024K

   Device Role : Active device 3
   Array State : .AAAA ('A' == active, '.' == missing)
/dev/sdm:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 149c0025:e7c5da3a:62b7a318:4ca57af7
           Name : storage1.2
  Creation Time : Wed Jul 11 14:50:06 2012
     Raid Level : raid6
   Raid Devices : 5

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 17581590528 (8383.56 GiB 9001.77 GB)
  Used Dev Size : 5860530176 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : ac50fe77:91ce387a:e819a38d:4d56a734

Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Jul 11 15:29:52 2012
       Checksum : da66aace - correct
         Events : 357

         Layout : left-symmetric
     Chunk Size : 1024K

   Device Role : Active device 4
   Array State : .AAAA ('A' == active, '.' == missing)

Now, a simple assemble fails:

    root@dev-storage1:~# mdadm --assemble /dev/md/storage1.2 /dev/sd{b,j,k,l,m}
    mdadm: /dev/md/storage1.2 assembled from 4 drives - not enough to start the array while not clean - consider --force.
    root@dev-storage1:~# cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
    md127 : inactive sdj[1](S) sdm[4](S) sdl[3](S) sdk[2](S) sdb[0](S)
          14651327800 blocks super 1.2
           
    unused devices: <none>

(Well, md127 exists, but I don't know how to "start" it).
So let's try using --force as it suggests:

    root@dev-storage1:~# mdadm -S /dev/md127
    mdadm: stopped /dev/md127
    root@dev-storage1:~# mdadm --assemble --force /dev/md/storage1.2 /dev/sd{b,j,k,l,m}
    mdadm: /dev/md/storage1.2 has been started with 4 drives (out of 5).
    root@dev-storage1:~# cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
    md127 : active raid6 sdj[1] sdm[4] sdl[3] sdk[2]
          8790795264 blocks super 1.2 level 6, 1024k chunk, algorithm 2 [5/4] [_UUUU]
          bitmap: 22/22 pages [88KB], 65536KB chunk

    unused devices: <none>
    root@dev-storage1:~# 

Now I have a 4-drive degraded RAID6; /dev/sdb isn't even listed (even though
I gave it on the command line).  Is this correct?  Is the next step to add
the 5th drive back in manually?

    root@dev-storage1:~# mdadm --manage --re-add /dev/md127 /dev/sdb
    mdadm: re-added /dev/sdb
    root@dev-storage1:~# cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
    md127 : active raid6 sdb[0] sdj[1] sdm[4] sdl[3] sdk[2]
          8790795264 blocks super 1.2 level 6, 1024k chunk, algorithm 2 [5/4] [_UUUU]
          [>....................]  recovery =  1.1% (32854540/2930265088) finish=952.5min speed=50692K/sec
          bitmap: 22/22 pages [88KB], 65536KB chunk

    unused devices: <none>

That seems to have worked; can someone just confirm that's the right
sequence of things to do, though?  This is a test system, but the next time
I do this it might be for real :-)
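
For reference, one way to keep an eye on the rebuild and confirm when it
completes (a sketch, assuming md127 as above):

    # show detailed array status, including rebuild progress
    mdadm --detail /dev/md127
    # block until the recovery finishes, then report the final state
    mdadm --wait /dev/md127 && mdadm --detail /dev/md127 | grep -i state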

Cheers,

Brian.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Assembly failure
@ 2012-07-13 20:34 Richard Scobie
  0 siblings, 0 replies; 16+ messages in thread
From: Richard Scobie @ 2012-07-13 20:34 UTC (permalink / raw)
  To: Linux RAID Mailing List

Brian Candler wrote:

------------------------------------

One final point. I would like to be able to monitor for suspect or failed
drives.  Is my best bet to look at /proc/mdstat output and identify drives
which have been kicked out of the array?  Or to monitor SMART variables (in
that case though I need to decide which ones are the most important to
monitor, and what thresholds to set)?

-------------------------------------------

For years I have used smartd without issues; it will log and email 
anomalies as they occur.
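
A minimal smartd.conf directive along these lines covers the common cases
(a sketch; the email address is a placeholder):

# scan all devices, monitor all attributes, mail on trouble, and schedule a
# short self-test daily at 02:00 plus a long one on Saturdays at 03:00
DEVICESCAN -a -m admin@example.com -s (S/../.././02|L/../../6/03)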

It is also advisable to regularly "scrub" all md devices, to flush out 
faulty sectors:

echo check > /sys/block/mdX/md/sync_action

See Documentation/md.txt for details.
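
To automate that, a minimal sketch run (say) monthly from cron:

#!/bin/sh
# kick off a "check" scrub on every md array present on the system
for f in /sys/block/md*/md/sync_action; do
    [ -e "$f" ] && echo check > "$f"
done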

Regards,

Richard

^ permalink raw reply	[flat|nested] 16+ messages in thread

Thread overview: 16+ messages
2012-07-10 16:33 Assembly failure Brian Candler
2012-07-10 16:48 ` Sebastian Riemer
2012-07-10 17:06   ` Brian Candler
2012-07-10 17:38     ` Sebastian Riemer
2012-07-10 18:59       ` Brian Candler
2012-07-11  2:43         ` NeilBrown
2012-07-11  7:58           ` Brian Candler
2012-07-11  8:27             ` Christian Balzer
2012-07-11  9:09               ` Brian Candler
2012-07-11 10:32                 ` Mikael Abrahamsson
2012-07-11 10:47                   ` Brian Candler
2012-07-11 10:44               ` Roman Mamedov
2012-07-11 17:21                 ` Christian Balzer
2012-07-13 18:52               ` Brian Candler
2012-07-10 17:05 ` pants
2012-07-13 20:34 Richard Scobie
