* RAID 6 Failure follow up
@ 2009-11-08 14:07 Andrew Dunn
  2009-11-08 14:23 ` Roger Heflin
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Dunn @ 2009-11-08 14:07 UTC (permalink / raw)
  To: linux-raid list

This is kind of interesting:

storrgie@ALEXANDRIA:~$ sudo mdadm --assemble --force /dev/md0
mdadm: no devices found for /dev/md0
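
[Aside: "no devices found" here usually just means mdadm had nothing to
work from - no members were given on the command line and no matching
ARRAY entry was found in /etc/mdadm/mdadm.conf. A minimal sketch of a
forced assembly, assuming the member names shown below are still
current:

  # name the member partitions explicitly instead of relying on mdadm.conf
  sudo mdadm --assemble --force /dev/md0 /dev/sd[e-m]1
]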

All of the devices are there in /dev, so I wanted to examine them:

storrgie@ALEXANDRIA:~$ sudo mdadm --examine /dev/sde1
/dev/sde1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 397e0b3f:34cbe4cc:613e2239:070da8c8 (local to host
ALEXANDRIA)
  Creation Time : Fri Nov  6 07:06:34 2009
     Raid Level : raid6
  Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
     Array Size : 6837318656 (6520.58 GiB 7001.41 GB)
   Raid Devices : 9
  Total Devices : 9
Preferred Minor : 0

    Update Time : Sun Nov  8 08:57:04 2009
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 4
  Spare Devices : 0
       Checksum : 4ff41c5f - correct
         Events : 43

     Chunk Size : 1024K

      Number   Major   Minor   RaidDevice State
this     0       8       65        0      active sync   /dev/sde1

   0     0       8       65        0      active sync   /dev/sde1
   1     1       8       81        1      active sync   /dev/sdf1
   2     2       8       97        2      active sync   /dev/sdg1
   3     3       8      113        3      active sync   /dev/sdh1
   4     4       0        0        4      faulty removed
   5     5       0        0        5      faulty removed
   6     6       0        0        6      faulty removed
   7     7       0        0        7      faulty removed
   8     8       8      193        8      active sync   /dev/sdm1

First raid device shows the failures....

One of the 'removed' devices:

storrgie@ALEXANDRIA:~$ sudo mdadm --examine /dev/sdi1
/dev/sdi1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 397e0b3f:34cbe4cc:613e2239:070da8c8 (local to host
ALEXANDRIA)
  Creation Time : Fri Nov  6 07:06:34 2009
     Raid Level : raid6
  Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
     Array Size : 6837318656 (6520.58 GiB 7001.41 GB)
   Raid Devices : 9
  Total Devices : 9
Preferred Minor : 0

    Update Time : Sun Nov  8 08:53:30 2009
          State : active
 Active Devices : 9
Working Devices : 9
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 4ff41b2f - correct
         Events : 21

     Chunk Size : 1024K

      Number   Major   Minor   RaidDevice State
this     4       8      129        4      active sync   /dev/sdi1

   0     0       8       65        0      active sync   /dev/sde1
   1     1       8       81        1      active sync   /dev/sdf1
   2     2       8       97        2      active sync   /dev/sdg1
   3     3       8      113        3      active sync   /dev/sdh1
   4     4       8      129        4      active sync   /dev/sdi1
   5     5       8      145        5      active sync   /dev/sdj1
   6     6       8      161        6      active sync   /dev/sdk1
   7     7       8      177        7      active sync   /dev/sdl1
   8     8       8      193        8      active sync   /dev/sdm1
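
[Aside: the two superblocks disagree - Events 43 vs 21, four of nine
members marked failed in one view, all nine active in the other - and
the event count is what md uses to decide which members are stale. A
quick way to compare every member at once, assuming the same device
names:

  sudo mdadm --examine /dev/sd[e-m]1 | egrep '^/dev|Update Time|Events'
]
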

-- 
Andrew Dunn
http://agdunn.net



* Re: RAID 6 Failure follow up
  2009-11-08 14:07 RAID 6 Failure follow up Andrew Dunn
@ 2009-11-08 14:23 ` Roger Heflin
  2009-11-08 14:30   ` Andrew Dunn
  2009-11-08 14:36   ` Andrew Dunn
  0 siblings, 2 replies; 23+ messages in thread
From: Roger Heflin @ 2009-11-08 14:23 UTC (permalink / raw)
  To: Andrew Dunn; +Cc: linux-raid list

Andrew Dunn wrote:
> This is kind of interesting:
> 
> storrgie@ALEXANDRIA:~$ sudo mdadm --assemble --force /dev/md0
> mdadm: no devices found for /dev/md0
> 
> All of the devices are there in /dev, so I wanted to examine them:
> 
> storrgie@ALEXANDRIA:~$ sudo mdadm --examine /dev/sde1
> /dev/sde1:
>           Magic : a92b4efc
>         Version : 00.90.00
>            UUID : 397e0b3f:34cbe4cc:613e2239:070da8c8 (local to host
> ALEXANDRIA)
>   Creation Time : Fri Nov  6 07:06:34 2009
>      Raid Level : raid6
>   Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
>      Array Size : 6837318656 (6520.58 GiB 7001.41 GB)
>    Raid Devices : 9
>   Total Devices : 9
> Preferred Minor : 0
> 
>     Update Time : Sun Nov  8 08:57:04 2009
>           State : clean
>  Active Devices : 5
> Working Devices : 5
>  Failed Devices : 4
>   Spare Devices : 0
>        Checksum : 4ff41c5f - correct
>          Events : 43
> 
>      Chunk Size : 1024K
> 
>       Number   Major   Minor   RaidDevice State
> this     0       8       65        0      active sync   /dev/sde1
> 
>    0     0       8       65        0      active sync   /dev/sde1
>    1     1       8       81        1      active sync   /dev/sdf1
>    2     2       8       97        2      active sync   /dev/sdg1
>    3     3       8      113        3      active sync   /dev/sdh1
>    4     4       0        0        4      faulty removed
>    5     5       0        0        5      faulty removed
>    6     6       0        0        6      faulty removed
>    7     7       0        0        7      faulty removed
>    8     8       8      193        8      active sync   /dev/sdm1
> 
> First raid device shows the failures....
> 
> One of the 'removed' devices:
> 
> storrgie@ALEXANDRIA:~$ sudo mdadm --examine /dev/sdi1
> /dev/sdi1:
>           Magic : a92b4efc
>         Version : 00.90.00
>            UUID : 397e0b3f:34cbe4cc:613e2239:070da8c8 (local to host
> ALEXANDRIA)
>   Creation Time : Fri Nov  6 07:06:34 2009
>      Raid Level : raid6
>   Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
>      Array Size : 6837318656 (6520.58 GiB 7001.41 GB)
>    Raid Devices : 9
>   Total Devices : 9
> Preferred Minor : 0
> 
>     Update Time : Sun Nov  8 08:53:30 2009
>           State : active
>  Active Devices : 9
> Working Devices : 9
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : 4ff41b2f - correct
>          Events : 21
> 
>      Chunk Size : 1024K
> 
>       Number   Major   Minor   RaidDevice State
> this     4       8      129        4      active sync   /dev/sdi1
> 
>    0     0       8       65        0      active sync   /dev/sde1
>    1     1       8       81        1      active sync   /dev/sdf1
>    2     2       8       97        2      active sync   /dev/sdg1
>    3     3       8      113        3      active sync   /dev/sdh1
>    4     4       8      129        4      active sync   /dev/sdi1
>    5     5       8      145        5      active sync   /dev/sdj1
>    6     6       8      161        6      active sync   /dev/sdk1
>    7     7       8      177        7      active sync   /dev/sdl1
>    8     8       8      193        8      active sync   /dev/sdm1
> 


Did you check dmesg and see if there were errors on those disks?




* Re: RAID 6 Failure follow up
  2009-11-08 14:23 ` Roger Heflin
@ 2009-11-08 14:30   ` Andrew Dunn
  2009-11-08 18:01     ` Richard Scobie
  2009-11-08 14:36   ` Andrew Dunn
  1 sibling, 1 reply; 23+ messages in thread
From: Andrew Dunn @ 2009-11-08 14:30 UTC (permalink / raw)
  To: Roger Heflin, robin; +Cc: linux-raid list

storrgie@ALEXANDRIA:~$ dmesg | grep sdi
[   31.019358] sd 11:0:0:0: [sdi] 1953525168 512-byte logical blocks:
(1.00 TB/931 GiB)
[   31.032233] sd 11:0:0:0: [sdi] Write Protect is off
[   31.032235] sd 11:0:0:0: [sdi] Mode Sense: 73 00 00 08
[   31.037483] sd 11:0:0:0: [sdi] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[   31.066991]  sdi:
[   31.075719]  sdi1
[   31.124713] sd 11:0:0:0: [sdi] Attached SCSI disk
[   31.147407] md: bind<sdi1>
[   31.712366] raid5: device sdi1 operational as raid disk 4
[   31.713153]  disk 4, o:1, dev:sdi1
[   33.112975]  disk 4, o:1, dev:sdi1
[  297.528544] sd 11:0:0:0: [sdi] Sense Key : Recovered Error [current]
[descriptor]
[  297.528573] sd 11:0:0:0: [sdi] Add. Sense: ATA pass through
information available
[  297.591382] sd 11:0:0:0: [sdi] Sense Key : Recovered Error [current]
[descriptor]
[  297.591407] sd 11:0:0:0: [sdi] Add. Sense: ATA pass through
information available

I don't see anything glaring.

You should be able to force an assembly anyway (using the --force flag),
but I'd make sure you know exactly what the issue is first; otherwise
this is likely to happen again.
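
[Aside: a minimal sketch of that force-assembly step, assuming the
member names from the --examine output above; check /proc/mdstat
afterwards to see which members actually came back:

  sudo mdadm --stop /dev/md0                            # if a partial md0 is still around
  sudo mdadm --assemble --force /dev/md0 /dev/sd[e-m]1
  cat /proc/mdstat
]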

Do you think that the controller is dropping out? I know that I have 4
drives on one controller (AOC-USAS-L8i) and 5 drives on the other
controller (SAME make/model), but I think they are sequentially
connected... as in sd[efghi] should be on one device and sd[jklm] should
be on the other... Is there any easy way to verify?
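
[Aside: one easy way to check the mapping, assuming lsscsi is
installed - the first field is host:channel:target:lun, and the host
number identifies the controller; /dev/disk/by-path/ shows the same
thing via the PCI path of each disk:

  lsscsi
  ls -l /dev/disk/by-path/
]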

Roger Heflin wrote:
> Andrew Dunn wrote:
>> This is kind of interesting:
>>
>> storrgie@ALEXANDRIA:~$ sudo mdadm --assemble --force /dev/md0
>> mdadm: no devices found for /dev/md0
>>
>> All of the devices are there in /dev, so I wanted to examine them:
>>
>> storrgie@ALEXANDRIA:~$ sudo mdadm --examine /dev/sde1
>> /dev/sde1:
>>           Magic : a92b4efc
>>         Version : 00.90.00
>>            UUID : 397e0b3f:34cbe4cc:613e2239:070da8c8 (local to host
>> ALEXANDRIA)
>>   Creation Time : Fri Nov  6 07:06:34 2009
>>      Raid Level : raid6
>>   Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
>>      Array Size : 6837318656 (6520.58 GiB 7001.41 GB)
>>    Raid Devices : 9
>>   Total Devices : 9
>> Preferred Minor : 0
>>
>>     Update Time : Sun Nov  8 08:57:04 2009
>>           State : clean
>>  Active Devices : 5
>> Working Devices : 5
>>  Failed Devices : 4
>>   Spare Devices : 0
>>        Checksum : 4ff41c5f - correct
>>          Events : 43
>>
>>      Chunk Size : 1024K
>>
>>       Number   Major   Minor   RaidDevice State
>> this     0       8       65        0      active sync   /dev/sde1
>>
>>    0     0       8       65        0      active sync   /dev/sde1
>>    1     1       8       81        1      active sync   /dev/sdf1
>>    2     2       8       97        2      active sync   /dev/sdg1
>>    3     3       8      113        3      active sync   /dev/sdh1
>>    4     4       0        0        4      faulty removed
>>    5     5       0        0        5      faulty removed
>>    6     6       0        0        6      faulty removed
>>    7     7       0        0        7      faulty removed
>>    8     8       8      193        8      active sync   /dev/sdm1
>>
>> First raid device shows the failures....
>>
>> One of the 'removed' devices:
>>
>> storrgie@ALEXANDRIA:~$ sudo mdadm --examine /dev/sdi1
>> /dev/sdi1:
>>           Magic : a92b4efc
>>         Version : 00.90.00
>>            UUID : 397e0b3f:34cbe4cc:613e2239:070da8c8 (local to host
>> ALEXANDRIA)
>>   Creation Time : Fri Nov  6 07:06:34 2009
>>      Raid Level : raid6
>>   Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
>>      Array Size : 6837318656 (6520.58 GiB 7001.41 GB)
>>    Raid Devices : 9
>>   Total Devices : 9
>> Preferred Minor : 0
>>
>>     Update Time : Sun Nov  8 08:53:30 2009
>>           State : active
>>  Active Devices : 9
>> Working Devices : 9
>>  Failed Devices : 0
>>   Spare Devices : 0
>>        Checksum : 4ff41b2f - correct
>>          Events : 21
>>
>>      Chunk Size : 1024K
>>
>>       Number   Major   Minor   RaidDevice State
>> this     4       8      129        4      active sync   /dev/sdi1
>>
>>    0     0       8       65        0      active sync   /dev/sde1
>>    1     1       8       81        1      active sync   /dev/sdf1
>>    2     2       8       97        2      active sync   /dev/sdg1
>>    3     3       8      113        3      active sync   /dev/sdh1
>>    4     4       8      129        4      active sync   /dev/sdi1
>>    5     5       8      145        5      active sync   /dev/sdj1
>>    6     6       8      161        6      active sync   /dev/sdk1
>>    7     7       8      177        7      active sync   /dev/sdl1
>>    8     8       8      193        8      active sync   /dev/sdm1
>>
>
>
> Did you check dmesg and see if there were errors on those disks?
>
>

-- 
Andrew Dunn
http://agdunn.net



* Re: RAID 6 Failure follow up
  2009-11-08 14:23 ` Roger Heflin
  2009-11-08 14:30   ` Andrew Dunn
@ 2009-11-08 14:36   ` Andrew Dunn
  2009-11-08 14:56     ` Roger Heflin
  1 sibling, 1 reply; 23+ messages in thread
From: Andrew Dunn @ 2009-11-08 14:36 UTC (permalink / raw)
  To: Roger Heflin; +Cc: linux-raid list

[10:0:0:0]   disk    ATA      WDC WD1001FALS-0 0K05  /dev/sde
[10:0:1:0]   disk    ATA      WDC WD1001FALS-0 0K05  /dev/sdf
[10:0:2:0]   disk    ATA      WDC WD1001FALS-0 0K05  /dev/sdg
[10:0:3:0]   disk    ATA      WDC WD1001FALS-0 0K05  /dev/sdh
[11:0:0:0]   disk    ATA      WDC WD1001FALS-0 0K05  /dev/sdi
[11:0:1:0]   disk    ATA      WDC WD1001FALS-0 0K05  /dev/sdj
[11:0:2:0]   disk    ATA      WDC WD1001FALS-0 0K05  /dev/sdk
[11:0:3:0]   disk    ATA      WDC WD1001FALS-0 0K05  /dev/sdl
[11:0:4:0]   disk    ATA      WDC WD1001FALS-0 0K05  /dev/sdm

So 4 drives dropped out on the second controller. But why didn't sdm go
with them?

Roger Heflin wrote:
> Andrew Dunn wrote:
>> This is kind of interesting:
>>
>> storrgie@ALEXANDRIA:~$ sudo mdadm --assemble --force /dev/md0
>> mdadm: no devices found for /dev/md0
>>
>> All of the devices are there in /dev, so I wanted to examine them:
>>
>> storrgie@ALEXANDRIA:~$ sudo mdadm --examine /dev/sde1
>> /dev/sde1:
>>           Magic : a92b4efc
>>         Version : 00.90.00
>>            UUID : 397e0b3f:34cbe4cc:613e2239:070da8c8 (local to host
>> ALEXANDRIA)
>>   Creation Time : Fri Nov  6 07:06:34 2009
>>      Raid Level : raid6
>>   Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
>>      Array Size : 6837318656 (6520.58 GiB 7001.41 GB)
>>    Raid Devices : 9
>>   Total Devices : 9
>> Preferred Minor : 0
>>
>>     Update Time : Sun Nov  8 08:57:04 2009
>>           State : clean
>>  Active Devices : 5
>> Working Devices : 5
>>  Failed Devices : 4
>>   Spare Devices : 0
>>        Checksum : 4ff41c5f - correct
>>          Events : 43
>>
>>      Chunk Size : 1024K
>>
>>       Number   Major   Minor   RaidDevice State
>> this     0       8       65        0      active sync   /dev/sde1
>>
>>    0     0       8       65        0      active sync   /dev/sde1
>>    1     1       8       81        1      active sync   /dev/sdf1
>>    2     2       8       97        2      active sync   /dev/sdg1
>>    3     3       8      113        3      active sync   /dev/sdh1
>>    4     4       0        0        4      faulty removed
>>    5     5       0        0        5      faulty removed
>>    6     6       0        0        6      faulty removed
>>    7     7       0        0        7      faulty removed
>>    8     8       8      193        8      active sync   /dev/sdm1
>>
>> First raid device shows the failures....
>>
>> One of the 'removed' devices:
>>
>> storrgie@ALEXANDRIA:~$ sudo mdadm --examine /dev/sdi1
>> /dev/sdi1:
>>           Magic : a92b4efc
>>         Version : 00.90.00
>>            UUID : 397e0b3f:34cbe4cc:613e2239:070da8c8 (local to host
>> ALEXANDRIA)
>>   Creation Time : Fri Nov  6 07:06:34 2009
>>      Raid Level : raid6
>>   Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
>>      Array Size : 6837318656 (6520.58 GiB 7001.41 GB)
>>    Raid Devices : 9
>>   Total Devices : 9
>> Preferred Minor : 0
>>
>>     Update Time : Sun Nov  8 08:53:30 2009
>>           State : active
>>  Active Devices : 9
>> Working Devices : 9
>>  Failed Devices : 0
>>   Spare Devices : 0
>>        Checksum : 4ff41b2f - correct
>>          Events : 21
>>
>>      Chunk Size : 1024K
>>
>>       Number   Major   Minor   RaidDevice State
>> this     4       8      129        4      active sync   /dev/sdi1
>>
>>    0     0       8       65        0      active sync   /dev/sde1
>>    1     1       8       81        1      active sync   /dev/sdf1
>>    2     2       8       97        2      active sync   /dev/sdg1
>>    3     3       8      113        3      active sync   /dev/sdh1
>>    4     4       8      129        4      active sync   /dev/sdi1
>>    5     5       8      145        5      active sync   /dev/sdj1
>>    6     6       8      161        6      active sync   /dev/sdk1
>>    7     7       8      177        7      active sync   /dev/sdl1
>>    8     8       8      193        8      active sync   /dev/sdm1
>>
>
>
> Did you check dmesg and see if there were errors on those disks?
>
>

-- 
Andrew Dunn
http://agdunn.net



* Re: RAID 6 Failure follow up
  2009-11-08 14:36   ` Andrew Dunn
@ 2009-11-08 14:56     ` Roger Heflin
  2009-11-08 17:08       ` Andrew Dunn
  0 siblings, 1 reply; 23+ messages in thread
From: Roger Heflin @ 2009-11-08 14:56 UTC (permalink / raw)
  To: Andrew Dunn; +Cc: linux-raid list

Andrew Dunn wrote:
> [10:0:0:0]   disk    ATA      WDC WD1001FALS-0 0K05  /dev/sde
> [10:0:1:0]   disk    ATA      WDC WD1001FALS-0 0K05  /dev/sdf
> [10:0:2:0]   disk    ATA      WDC WD1001FALS-0 0K05  /dev/sdg
> [10:0:3:0]   disk    ATA      WDC WD1001FALS-0 0K05  /dev/sdh
> [11:0:0:0]   disk    ATA      WDC WD1001FALS-0 0K05  /dev/sdi
> [11:0:1:0]   disk    ATA      WDC WD1001FALS-0 0K05  /dev/sdj
> [11:0:2:0]   disk    ATA      WDC WD1001FALS-0 0K05  /dev/sdk
> [11:0:3:0]   disk    ATA      WDC WD1001FALS-0 0K05  /dev/sdl
> [11:0:4:0]   disk    ATA      WDC WD1001FALS-0 0K05  /dev/sdm
> 
> So 4 drives dropped out on the second controller. But why didn't sdm go
> with them?
> 
>

It is possible that by the time it got to checking the last drive, the
errors had cleared up, so sdm was OK when it was checked.


Is this on a port multiplier?




* Re: RAID 6 Failure follow up
  2009-11-08 14:56     ` Roger Heflin
@ 2009-11-08 17:08       ` Andrew Dunn
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Dunn @ 2009-11-08 17:08 UTC (permalink / raw)
  To: Roger Heflin; +Cc: linux-raid list

No multiplier, they are on a backplane though. 2 on one backplane, 3 on
another... but only 2 of the 3 dropped off that one.

I looked through dmesg some more; maybe you all will see something of
significance. I don't think this is from when it happened, but it
might shed light on the issue. I will continue to sift through the log.
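
[Aside: a quick way to sift the log for just the interesting pieces -
controller events, md state changes and SCSI errors - might be
something like:

  dmesg | egrep -i 'mptsas|mptbase|md:|raid5|offline|error'
]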

[   19.021969] scsi10 : ioc0: LSISAS1068E B3, FwRev=011a0000h, Ports=1,
MaxQ=478, IRQ=16
[   19.061176] mptsas: ioc0: attaching sata device: fw_channel 0, fw_id
0, phy 0, sas_addr 0x1221000000000000
[   19.063708] scsi 10:0:0:0: Direct-Access     ATA      WDC
WD1001FALS-0 0K05 PQ: 0 ANSI: 5
[   19.065473] sd 10:0:0:0: Attached scsi generic sg4 type 0
[   19.067322] mptsas: ioc0: attaching sata device: fw_channel 0, fw_id
1, phy 1, sas_addr 0x1221000001000000
[   19.068074] sd 10:0:0:0: [sde] 1953523055 512-byte logical blocks:
(1.00 TB/931 GiB)
[   19.070474] scsi 10:0:1:0: Direct-Access     ATA      WDC
WD1001FALS-0 0K05 PQ: 0 ANSI: 5
[   19.072797] sd 10:0:1:0: Attached scsi generic sg5 type 0
[   19.074994] mptsas: ioc0: attaching sata device: fw_channel 0, fw_id
4, phy 4, sas_addr 0x1221000004000000
[   19.076025] sd 10:0:1:0: [sdf] 1953525168 512-byte logical blocks:
(1.00 TB/931 GiB)
[   19.078091] scsi 10:0:2:0: Direct-Access     ATA      WDC
WD1001FALS-0 0K05 PQ: 0 ANSI: 5
[   19.080417] sd 10:0:2:0: Attached scsi generic sg6 type 0
[   19.082589] mptsas: ioc0: attaching sata device: fw_channel 0, fw_id
5, phy 5, sas_addr 0x1221000005000000
[   19.082966] sd 10:0:0:0: [sde] Write Protect is off
[   19.082970] sd 10:0:0:0: [sde] Mode Sense: 73 00 00 08
[   19.084186] sd 10:0:2:0: [sdg] 1953525168 512-byte logical blocks:
(1.00 TB/931 GiB)
[   19.086521] sd 10:0:0:0: [sde] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[   19.087036] scsi 10:0:3:0: Direct-Access     ATA      WDC
WD1001FALS-0 0K05 PQ: 0 ANSI: 5
[   19.088389] sd 10:0:1:0: [sdf] Write Protect is off
[   19.088393] sd 10:0:1:0: [sdf] Mode Sense: 73 00 00 08
[   19.089642] sd 10:0:3:0: Attached scsi generic sg7 type 0
[   19.092400] mptsas 0000:02:00.0: PCI INT A -> GSI 16 (level, low) ->
IRQ 16
[   19.092525] mptbase: ioc1: Initiating bringup
[   19.093974] sd 10:0:3:0: [sdh] 1953525168 512-byte logical blocks:
(1.00 TB/931 GiB)
[   19.095129] sd 10:0:1:0: [sdf] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[   19.101887] sd 10:0:2:0: [sdg] Write Protect is off
[   19.101891] sd 10:0:2:0: [sdg] Mode Sense: 73 00 00 08
[   19.104250] sd 10:0:2:0: [sdg] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[   19.107231] sd 10:0:3:0: [sdh] Write Protect is off
[   19.107236] sd 10:0:3:0: [sdh] Mode Sense: 73 00 00 08
[   19.109398]  sde:
[   19.111301] sd 10:0:3:0: [sdh] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[   19.111659]  sdf:
[   19.118664]  sdg: sdf1
[   19.122127]  sde1
[   19.126192]  sdh: sdg1
[   19.137786] sd 10:0:1:0: [sdf] Attached SCSI disk
[   19.143743]  sdh1
[   19.146360] sd 10:0:0:0: [sde] Attached SCSI disk
[   19.148589] sd 10:0:2:0: [sdg] Attached SCSI disk
[   19.158613] sd 10:0:3:0: [sdh] Attached SCSI disk
[   20.780022] ioc1: LSISAS1068E B3: Capabilities={Initiator}
[   20.780035] mptsas 0000:02:00.0: setting latency timer to 64
[   30.971934] scsi11 : ioc1: LSISAS1068E B3, FwRev=011a0000h, Ports=1,
MaxQ=478, IRQ=16
[   31.012437] mptsas: ioc1: attaching sata device: fw_channel 0, fw_id
0, phy 0, sas_addr 0x1221000000000000
[   31.015009] scsi 11:0:0:0: Direct-Access     ATA      WDC
WD1001FALS-0 0K05 PQ: 0 ANSI: 5
[   31.016755] sd 11:0:0:0: Attached scsi generic sg8 type 0
[   31.018603] mptsas: ioc1: attaching sata device: fw_channel 0, fw_id
1, phy 1, sas_addr 0x1221000001000000
[   31.019358] sd 11:0:0:0: [sdi] 1953525168 512-byte logical blocks:
(1.00 TB/931 GiB)
[   31.021753] scsi 11:0:1:0: Direct-Access     ATA      WDC
WD1001FALS-0 0K05 PQ: 0 ANSI: 5
[   31.024075] sd 11:0:1:0: Attached scsi generic sg9 type 0
[   31.026273] mptsas: ioc1: attaching sata device: fw_channel 0, fw_id
4, phy 4, sas_addr 0x1221000004000000
[   31.027302] sd 11:0:1:0: [sdj] 1953525168 512-byte logical blocks:
(1.00 TB/931 GiB)
[   31.029693] scsi 11:0:2:0: Direct-Access     ATA      WDC
WD1001FALS-0 0K05 PQ: 0 ANSI: 5
[   31.032004] sd 11:0:2:0: Attached scsi generic sg10 type 0
[   31.032233] sd 11:0:0:0: [sdi] Write Protect is off
[   31.032235] sd 11:0:0:0: [sdi] Mode Sense: 73 00 00 08
[   31.034133] mptsas: ioc1: attaching sata device: fw_channel 0, fw_id
5, phy 5, sas_addr 0x1221000005000000
[   31.035571] sd 11:0:2:0: [sdk] 1953525168 512-byte logical blocks:
(1.00 TB/931 GiB)
[   31.037483] sd 11:0:0:0: [sdi] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[   31.038793] scsi 11:0:3:0: Direct-Access     ATA      WDC
WD1001FALS-0 0K05 PQ: 0 ANSI: 5
[   31.041160] sd 11:0:3:0: Attached scsi generic sg11 type 0
[   31.043506] mptsas: ioc1: attaching sata device: fw_channel 0, fw_id
6, phy 6, sas_addr 0x1221000006000000
[   31.043884] sd 11:0:1:0: [sdj] Write Protect is off
[   31.043887] sd 11:0:1:0: [sdj] Mode Sense: 73 00 00 08
[   31.046683] sd 11:0:3:0: [sdl] 1953525168 512-byte logical blocks:
(1.00 TB/931 GiB)
[   31.047038] sd 11:0:1:0: [sdj] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[   31.050845] scsi 11:0:4:0: Direct-Access     ATA      WDC
WD1001FALS-0 0K05 PQ: 0 ANSI: 5
[   31.054206] sd 11:0:4:0: Attached scsi generic sg12 type 0
[   31.056125] sd 11:0:2:0: [sdk] Write Protect is off
[   31.056129] sd 11:0:2:0: [sdk] Mode Sense: 73 00 00 08
[   31.059805] sd 11:0:4:0: [sdm] 1953525168 512-byte logical blocks:
(1.00 TB/931 GiB)
[   31.061019] sd 11:0:2:0: [sdk] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[   31.065705] sd 11:0:3:0: [sdl] Write Protect is off
[   31.065710] sd 11:0:3:0: [sdl] Mode Sense: 73 00 00 08
[   31.066991]  sdi:
[   31.069131]  sdj:
[   31.070087] sd 11:0:3:0: [sdl] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[   31.073259] sd 11:0:4:0: [sdm] Write Protect is off
[   31.073262] sd 11:0:4:0: [sdm] Mode Sense: 73 00 00 08
[   31.074045]  sdj1
[   31.075719]  sdi1
[   31.077424] sd 11:0:4:0: [sdm] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[   31.083141]  sdk:
[   31.090760]  sdl: sdk1
[   31.099798]  sdm: sdl1
[   31.115614]  sdm1
[   31.122247] sd 11:0:1:0: [sdj] Attached SCSI disk
[   31.124713] sd 11:0:0:0: [sdi] Attached SCSI disk
[   31.131908] sd 11:0:2:0: [sdk] Attached SCSI disk
[   31.141444] md: bind<sdj1>
[   31.143383] sd 11:0:3:0: [sdl] Attached SCSI disk
[   31.147407] md: bind<sdi1>
[   31.153910] sd 11:0:4:0: [sdm] Attached SCSI disk
[   31.159932] md: bind<sdl1>
[   31.176695] md: bind<sdm1>
[   31.265544] md: bind<sde1>
[   31.354001] md: bind<sdk1>
[   31.467249] md: bind<sdh1>
[   31.476153] md: bind<sdg1>
[   31.670444] md: bind<sdf1>
[   31.672643] md: kicking non-fresh sdk1 from array!
[   31.672652] md: unbind<sdk1>
[   31.711286] md: export_rdev(sdk1)
[   31.712356] raid5: device sdf1 operational as raid disk 1
[   31.712358] raid5: device sdg1 operational as raid disk 2
[   31.712360] raid5: device sdh1 operational as raid disk 3
[   31.712362] raid5: device sde1 operational as raid disk 0
[   31.712363] raid5: device sdm1 operational as raid disk 8
[   31.712365] raid5: device sdl1 operational as raid disk 7
[   31.712366] raid5: device sdi1 operational as raid disk 4
[   31.712368] raid5: device sdj1 operational as raid disk 5
[   31.712962] raid5: allocated 9540kB for md0
[   31.713094] raid5: raid level 6 set md0 active with 8 out of 9
devices, algorithm 2
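
[Aside: the boot log above shows sdk1 being kicked as "non-fresh" (its
event count lagged the others) and md0 coming up with 8 of 9 members.
If that drive turns out to be healthy, re-adding it so it resyncs would
be something like:

  sudo mdadm /dev/md0 --add /dev/sdk1
  cat /proc/mdstat    # watch the rebuild
]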


Roger Heflin wrote:
> Andrew Dunn wrote:
>> [10:0:0:0]   disk    ATA      WDC WD1001FALS-0 0K05  /dev/sde
>> [10:0:1:0]   disk    ATA      WDC WD1001FALS-0 0K05  /dev/sdf
>> [10:0:2:0]   disk    ATA      WDC WD1001FALS-0 0K05  /dev/sdg
>> [10:0:3:0]   disk    ATA      WDC WD1001FALS-0 0K05  /dev/sdh
>> [11:0:0:0]   disk    ATA      WDC WD1001FALS-0 0K05  /dev/sdi
>> [11:0:1:0]   disk    ATA      WDC WD1001FALS-0 0K05  /dev/sdj
>> [11:0:2:0]   disk    ATA      WDC WD1001FALS-0 0K05  /dev/sdk
>> [11:0:3:0]   disk    ATA      WDC WD1001FALS-0 0K05  /dev/sdl
>> [11:0:4:0]   disk    ATA      WDC WD1001FALS-0 0K05  /dev/sdm
>>
>> So 4 drives dropped out on the second controller. But why didn't sdm go
>> with them?
>>
>>
>
> It is possible that by the time it got to checking the last drive, the
> errors had cleared up, so sdm was OK when it was checked.
>
>
> Is this on a port multiplier?
>
>

-- 
Andrew Dunn
http://agdunn.net



* Re: RAID 6 Failure follow up
  2009-11-08 14:30   ` Andrew Dunn
@ 2009-11-08 18:01     ` Richard Scobie
  2009-11-08 18:22       ` Andrew Dunn
  2009-11-08 22:09       ` Andrew Dunn
  0 siblings, 2 replies; 23+ messages in thread
From: Richard Scobie @ 2009-11-08 18:01 UTC (permalink / raw)
  To: Andrew Dunn; +Cc: Roger Heflin, robin, linux-raid list

Andrew Dunn wrote:

> Do you think that the controller is dropping out? I know that I have 4
> drives on one controller (AOC-USAS-L8i) and 5 drives on the other
> controller (SAME make/model), but I think they are sequentially
> connected... as in sd[efghi] should be on one device and sd[jklm] should
> be on the other... Is there any easy way to verify?

If you are running smartd, cease doing so and do not use the smartctl 
command on drives attached to these controllers - its use causes drives 
to be offlined.

It appears that smartctl is broken with LSISAS1068E-based controllers.

See:

https://bugzilla.redhat.com/show_bug.cgi?id=452389

and

http://marc.info/?l=linux-scsi&m=125673590221135&w=2
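
[Aside: a minimal sketch of checking for and stopping smartd until the
underlying issue is sorted out - the init script name varies by distro
(smartd on Red Hat-style systems, smartmontools on Debian/Ubuntu):

  pgrep -l smartd                        # is it running?
  sudo /etc/init.d/smartmontools stop    # Debian/Ubuntu
  sudo /etc/init.d/smartd stop           # Red Hat/CentOS
]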

Regards,

Richard



* Re: RAID 6 Failure follow up
  2009-11-08 18:01     ` Richard Scobie
@ 2009-11-08 18:22       ` Andrew Dunn
  2009-11-08 18:34         ` Joe Landman
  2009-11-08 22:09       ` Andrew Dunn
  1 sibling, 1 reply; 23+ messages in thread
From: Andrew Dunn @ 2009-11-08 18:22 UTC (permalink / raw)
  To: Richard Scobie; +Cc: Roger Heflin, robin, linux-raid list, nfbrown

New data now: I got this from dmesg when it went down again. Hopefully
it is of some significance to you guys.

> [14269.650381] sd 10:0:3:0: rejecting I/O to offline device
> [14269.650453] sd 10:0:3:0: rejecting I/O to offline device
> [14269.650524] sd 10:0:3:0: rejecting I/O to offline device
> [14269.650595] sd 10:0:3:0: rejecting I/O to offline device
> [14269.650672] sd 10:0:3:0: [sdh] Unhandled error code
> [14269.650675] sd 10:0:3:0: [sdh] Result: hostbyte=DID_NO_CONNECT
driverbyte=DRIVER_OK
> [14269.650680] end_request: I/O error, dev sdh, sector 1435085631
> [14269.650749] raid5:md0: read error not correctable (sector
1435085568 on sdh1).
> [14269.650753] raid5: Disk failure on sdh1, disabling device.
> [14269.650754] raid5: Operation continuing on 7 devices.
> [14269.650886] raid5:md0: read error not correctable (sector
1435085576 on sdh1).
> [14269.650890] raid5:md0: read error not correctable (sector
1435085584 on sdh1).
> [14269.650894] raid5:md0: read error not correctable (sector
1435085592 on sdh1).
> [14269.650898] raid5:md0: read error not correctable (sector
1435085600 on sdh1).
> [14269.650902] raid5:md0: read error not correctable (sector
1435085608 on sdh1).
> [14269.650905] raid5:md0: read error not correctable (sector
1435085616 on sdh1).
> [14269.650909] raid5:md0: read error not correctable (sector
1435085624 on sdh1).
> [14269.650913] raid5:md0: read error not correctable (sector
1435085632 on sdh1).
> [14269.650917] raid5:md0: read error not correctable (sector
1435085640 on sdh1).
> [14269.650943] sd 10:0:3:0: [sdh] Unhandled error code
> [14269.650946] sd 10:0:3:0: [sdh] Result: hostbyte=DID_NO_CONNECT
driverbyte=DRIVER_OK
> [14269.650950] end_request: I/O error, dev sdh, sector 1435085887
> [14269.651049] sd 10:0:3:0: [sdh] Unhandled error code
> [14269.651051] sd 10:0:3:0: [sdh] Result: hostbyte=DID_NO_CONNECT
driverbyte=DRIVER_OK
> [14269.651055] end_request: I/O error, dev sdh, sector 1435086143
> [14269.651151] sd 10:0:3:0: [sdh] Unhandled error code
> [14269.651153] sd 10:0:3:0: [sdh] Result: hostbyte=DID_NO_CONNECT
driverbyte=DRIVER_OK
> [14269.651157] end_request: I/O error, dev sdh, sector 1435086399
> [14269.651253] sd 10:0:3:0: [sdh] Unhandled error code
> [14269.651255] sd 10:0:3:0: [sdh] Result: hostbyte=DID_NO_CONNECT
driverbyte=DRIVER_OK
> [14269.651259] end_request: I/O error, dev sdh, sector 1435086655
> [14269.651358] sd 10:0:3:0: [sdh] Unhandled error code
> [14269.651361] sd 10:0:3:0: [sdh] Result: hostbyte=DID_NO_CONNECT
driverbyte=DRIVER_OK
> [14269.651364] end_request: I/O error, dev sdh, sector 1435086911
> [14269.651461] sd 10:0:3:0: [sdh] Unhandled error code
> [14269.651463] sd 10:0:3:0: [sdh] Result: hostbyte=DID_NO_CONNECT
driverbyte=DRIVER_OK
> [14269.651467] end_request: I/O error, dev sdh, sector 1435087167
> [14269.651565] sd 10:0:3:0: [sdh] Unhandled error code
> [14269.651568] sd 10:0:3:0: [sdh] Result: hostbyte=DID_NO_CONNECT
driverbyte=DRIVER_OK
> [14269.651571] end_request: I/O error, dev sdh, sector 1435087423
> [14269.670675] end_request: I/O error, dev sdf, sector 1953519935
> [14269.670739] md: super_written gets error=-5, uptodate=0
> [14269.670743] raid5: Disk failure on sdf1, disabling device.
> [14269.670745] raid5: Operation continuing on 6 devices.
> [14269.672525] end_request: I/O error, dev sdg, sector 1953519935
> [14269.672598] md: super_written gets error=-5, uptodate=0
> [14269.672603] raid5: Disk failure on sdg1, disabling device.
> [14269.672605] raid5: Operation continuing on 5 devices.
> [14269.674402] end_request: I/O error, dev sde, sector 1953519935
> [14269.674474] md: super_written gets error=-5, uptodate=0
> [14269.674478] raid5: Disk failure on sde1, disabling device.
> [14269.674480] raid5: Operation continuing on 4 devices.
> [14269.769991] sd 11:0:0:0: [sdi] Sense Key : Recovered Error
[current] [descriptor]
> [14269.769997] Descriptor sense data with sense descriptors (in hex):
> [14269.770000]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14269.770012]         00 4f 00 c2 00 50
> [14269.770018] sd 11:0:0:0: [sdi] Add. Sense: ATA pass through
information available
> [14269.800245] md: md0: recovery done.
> [14269.869990] sd 11:0:0:0: [sdi] Sense Key : Recovered Error
[current] [descriptor]
> [14269.869997] Descriptor sense data with sense descriptors (in hex):
> [14269.870008]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14269.870019]         00 4f 00 c2 00 50
> [14269.870025] sd 11:0:0:0: [sdi] Add. Sense: ATA pass through
information available
> [14269.905144] RAID5 conf printout:
> [14269.905148]  --- rd:9 wd:4
> [14269.905152]  disk 0, o:0, dev:sde1
> [14269.905155]  disk 1, o:0, dev:sdf1
> [14269.905157]  disk 2, o:0, dev:sdg1
> [14269.905160]  disk 3, o:0, dev:sdh1
> [14269.905162]  disk 4, o:1, dev:sdi1
> [14269.905165]  disk 5, o:1, dev:sdj1
> [14269.905167]  disk 6, o:1, dev:sdk1
> [14269.905169]  disk 7, o:1, dev:sdl1
> [14269.905172]  disk 8, o:1, dev:sdm1
> [14269.941265] RAID5 conf printout:
> [14269.941269]  --- rd:9 wd:4
> [14269.941273]  disk 0, o:0, dev:sde1
> [14269.941276]  disk 1, o:0, dev:sdf1
> [14269.941278]  disk 2, o:0, dev:sdg1
> [14269.941281]  disk 3, o:0, dev:sdh1
> [14269.941283]  disk 4, o:1, dev:sdi1
> [14269.941286]  disk 5, o:1, dev:sdj1
> [14269.941289]  disk 7, o:1, dev:sdl1
> [14269.941291]  disk 8, o:1, dev:sdm1
> [14269.941300] RAID5 conf printout:
> [14269.941302]  --- rd:9 wd:4
> [14269.941304]  disk 0, o:0, dev:sde1
> [14269.941307]  disk 1, o:0, dev:sdf1
> [14269.941309]  disk 2, o:0, dev:sdg1
> [14269.941311]  disk 3, o:0, dev:sdh1
> [14269.941314]  disk 4, o:1, dev:sdi1
> [14269.941316]  disk 5, o:1, dev:sdj1
> [14269.941318]  disk 7, o:1, dev:sdl1
> [14269.941321]  disk 8, o:1, dev:sdm1
> [14269.981260] RAID5 conf printout:
> [14269.981263]  --- rd:9 wd:4
> [14269.981265]  disk 0, o:0, dev:sde1
> [14269.981268]  disk 2, o:0, dev:sdg1
> [14269.981270]  disk 3, o:0, dev:sdh1
> [14269.981273]  disk 4, o:1, dev:sdi1
> [14269.981275]  disk 5, o:1, dev:sdj1
> [14269.981277]  disk 7, o:1, dev:sdl1
> [14269.981280]  disk 8, o:1, dev:sdm1
> [14269.981284] RAID5 conf printout:
> [14269.981286]  --- rd:9 wd:4
> [14269.981289]  disk 0, o:0, dev:sde1
> [14269.981291]  disk 2, o:0, dev:sdg1
> [14269.981293]  disk 3, o:0, dev:sdh1
> [14269.981296]  disk 4, o:1, dev:sdi1
> [14269.981298]  disk 5, o:1, dev:sdj1
> [14269.981300]  disk 7, o:1, dev:sdl1
> [14269.981302]  disk 8, o:1, dev:sdm1
> [14270.003316] sd 11:0:0:0: [sdi] Sense Key : Recovered Error
[current] [descriptor]
> [14270.003324] Descriptor sense data with sense descriptors (in hex):
> [14270.003327]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14270.003338]         00 4f 00 c2 00 50
> [14270.003344] sd 11:0:0:0: [sdi] Add. Sense: ATA pass through
information available
> [14270.021260] RAID5 conf printout:
> [14270.021263]  --- rd:9 wd:4
> [14270.021266]  disk 0, o:0, dev:sde1
> [14270.021269]  disk 3, o:0, dev:sdh1
> [14270.021271]  disk 4, o:1, dev:sdi1
> [14270.021274]  disk 5, o:1, dev:sdj1
> [14270.021276]  disk 7, o:1, dev:sdl1
> [14270.021278]  disk 8, o:1, dev:sdm1
> [14270.021283] RAID5 conf printout:
> [14270.021285]  --- rd:9 wd:4
> [14270.021287]  disk 0, o:0, dev:sde1
> [14270.021289]  disk 3, o:0, dev:sdh1
> [14270.021292]  disk 4, o:1, dev:sdi1
> [14270.021294]  disk 5, o:1, dev:sdj1
> [14270.021296]  disk 7, o:1, dev:sdl1
> [14270.021298]  disk 8, o:1, dev:sdm1
> [14270.061261] RAID5 conf printout:
> [14270.061264]  --- rd:9 wd:4
> [14270.061266]  disk 0, o:0, dev:sde1
> [14270.061269]  disk 4, o:1, dev:sdi1
> [14270.061272]  disk 5, o:1, dev:sdj1
> [14270.061274]  disk 7, o:1, dev:sdl1
> [14270.061276]  disk 8, o:1, dev:sdm1
> [14270.061281] RAID5 conf printout:
> [14270.061283]  --- rd:9 wd:4
> [14270.061285]  disk 0, o:0, dev:sde1
> [14270.061287]  disk 4, o:1, dev:sdi1
> [14270.061289]  disk 5, o:1, dev:sdj1
> [14270.061292]  disk 7, o:1, dev:sdl1
> [14270.061294]  disk 8, o:1, dev:sdm1
> [14270.061647] sd 11:0:0:0: [sdi] Sense Key : Recovered Error
[current] [descriptor]
> [14270.061653] Descriptor sense data with sense descriptors (in hex):
> [14270.061656]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14270.061667]         00 4f 00 c2 00 50
> [14270.061672] sd 11:0:0:0: [sdi] Add. Sense: ATA pass through
information available
> [14270.091263] RAID5 conf printout:
> [14270.091267]  --- rd:9 wd:4
> [14270.091271]  disk 4, o:1, dev:sdi1
> [14270.091274]  disk 5, o:1, dev:sdj1
> [14270.091276]  disk 7, o:1, dev:sdl1
> [14270.091279]  disk 8, o:1, dev:sdm1
> [14270.153319] sd 11:0:0:0: [sdi] Sense Key : Recovered Error
[current] [descriptor]
> [14270.153325] Descriptor sense data with sense descriptors (in hex):
> [14270.153328]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14270.153340]         00 4f 00 c2 00 50
> [14270.153346] sd 11:0:0:0: [sdi] Add. Sense: ATA pass through
information available
> [14270.211651] sd 11:0:0:0: [sdi] Sense Key : Recovered Error
[current] [descriptor]
> [14270.211657] Descriptor sense data with sense descriptors (in hex):
> [14270.211660]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14270.211671]         00 4f 00 c2 00 50
> [14270.211677] sd 11:0:0:0: [sdi] Add. Sense: ATA pass through
information available
> [14270.324057] sd 11:0:1:0: [sdj] Sense Key : Recovered Error
[current] [descriptor]
> [14270.324065] Descriptor sense data with sense descriptors (in hex):
> [14270.324067]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14270.324079]         00 4f 00 c2 00 50
> [14270.324085] sd 11:0:1:0: [sdj] Add. Sense: ATA pass through
information available
> [14270.382390] sd 11:0:1:0: [sdj] Sense Key : Recovered Error
[current] [descriptor]
> [14270.382396] Descriptor sense data with sense descriptors (in hex):
> [14270.382399]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14270.382410]         00 4f 00 c2 00 50
> [14270.382416] sd 11:0:1:0: [sdj] Add. Sense: ATA pass through
information available
> [14270.474060] sd 11:0:1:0: [sdj] Sense Key : Recovered Error
[current] [descriptor]
> [14270.474068] Descriptor sense data with sense descriptors (in hex):
> [14270.474071]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14270.474083]         00 4f 00 c2 00 50
> [14270.474089] sd 11:0:1:0: [sdj] Add. Sense: ATA pass through
information available
> [14270.532394] sd 11:0:1:0: [sdj] Sense Key : Recovered Error
[current] [descriptor]
> [14270.532401] Descriptor sense data with sense descriptors (in hex):
> [14270.532404]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14270.532415]         00 4f 00 c2 00 50
> [14270.532421] sd 11:0:1:0: [sdj] Add. Sense: ATA pass through
information available
> [14270.632394] sd 11:0:1:0: [sdj] Sense Key : Recovered Error
[current] [descriptor]
> [14270.632402] Descriptor sense data with sense descriptors (in hex):
> [14270.632405]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14270.632417]         00 4f 00 c2 00 50
> [14270.632423] sd 11:0:1:0: [sdj] Add. Sense: ATA pass through
information available
> [14270.690729] sd 11:0:1:0: [sdj] Sense Key : Recovered Error
[current] [descriptor]
> [14270.690736] Descriptor sense data with sense descriptors (in hex):
> [14270.690739]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14270.690751]         00 4f 00 c2 00 50
> [14270.690757] sd 11:0:1:0: [sdj] Add. Sense: ATA pass through
information available
> [14270.804065] sd 11:0:2:0: [sdk] Sense Key : Recovered Error
[current] [descriptor]
> [14270.804073] Descriptor sense data with sense descriptors (in hex):
> [14270.804076]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14270.804088]         00 4f 00 c2 00 50
> [14270.804094] sd 11:0:2:0: [sdk] Add. Sense: ATA pass through
information available
> [14270.862400] sd 11:0:2:0: [sdk] Sense Key : Recovered Error
[current] [descriptor]
> [14270.862406] Descriptor sense data with sense descriptors (in hex):
> [14270.862409]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14270.862420]         00 4f 00 c2 00 50
> [14270.862426] sd 11:0:2:0: [sdk] Add. Sense: ATA pass through
information available
> [14270.954070] sd 11:0:2:0: [sdk] Sense Key : Recovered Error
[current] [descriptor]
> [14270.954079] Descriptor sense data with sense descriptors (in hex):
> [14270.954081]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14270.954093]         00 4f 00 c2 00 50
> [14270.954099] sd 11:0:2:0: [sdk] Add. Sense: ATA pass through
information available
> [14271.012399] sd 11:0:2:0: [sdk] Sense Key : Recovered Error
[current] [descriptor]
> [14271.012406] Descriptor sense data with sense descriptors (in hex):
> [14271.012408]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14271.012420]         00 4f 00 c2 00 50
> [14271.012426] sd 11:0:2:0: [sdk] Add. Sense: ATA pass through
information available
> [14271.104072] sd 11:0:2:0: [sdk] Sense Key : Recovered Error
[current] [descriptor]
> [14271.104080] Descriptor sense data with sense descriptors (in hex):
> [14271.104083]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14271.104094]         00 4f 00 c2 00 50
> [14271.104100] sd 11:0:2:0: [sdk] Add. Sense: ATA pass through
information available
> [14271.162400] sd 11:0:2:0: [sdk] Sense Key : Recovered Error
[current] [descriptor]
> [14271.162407] Descriptor sense data with sense descriptors (in hex):
> [14271.162410]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14271.162422]         00 4f 00 c2 00 50
> [14271.162428] sd 11:0:2:0: [sdk] Add. Sense: ATA pass through
information available
> [14271.278147] sd 11:0:3:0: [sdl] Sense Key : Recovered Error
[current] [descriptor]
> [14271.278155] Descriptor sense data with sense descriptors (in hex):
> [14271.278157]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14271.278169]         00 4f 00 c2 00 50
> [14271.278175] sd 11:0:3:0: [sdl] Add. Sense: ATA pass through
information available
> [14271.336487] sd 11:0:3:0: [sdl] Sense Key : Recovered Error
[current] [descriptor]
> [14271.336495] Descriptor sense data with sense descriptors (in hex):
> [14271.336498]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14271.336509]         00 4f 00 c2 00 50
> [14271.336515] sd 11:0:3:0: [sdl] Add. Sense: ATA pass through
information available
> [14271.428148] sd 11:0:3:0: [sdl] Sense Key : Recovered Error
[current] [descriptor]
> [14271.428156] Descriptor sense data with sense descriptors (in hex):
> [14271.428158]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14271.428170]         00 4f 00 c2 00 50
> [14271.428176] sd 11:0:3:0: [sdl] Add. Sense: ATA pass through
information available
> [14271.486485] sd 11:0:3:0: [sdl] Sense Key : Recovered Error
[current] [descriptor]
> [14271.486493] Descriptor sense data with sense descriptors (in hex):
> [14271.486496]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14271.486508]         00 4f 00 c2 00 50
> [14271.486514] sd 11:0:3:0: [sdl] Add. Sense: ATA pass through
information available
> [14271.586482] sd 11:0:3:0: [sdl] Sense Key : Recovered Error
[current] [descriptor]
> [14271.586489] Descriptor sense data with sense descriptors (in hex):
> [14271.586492]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14271.586503]         00 4f 00 c2 00 50
> [14271.586509] sd 11:0:3:0: [sdl] Add. Sense: ATA pass through
information available
> [14271.644813] sd 11:0:3:0: [sdl] Sense Key : Recovered Error
[current] [descriptor]
> [14271.644819] Descriptor sense data with sense descriptors (in hex):
> [14271.644822]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14271.644833]         00 4f 00 c2 00 50
> [14271.644839] sd 11:0:3:0: [sdl] Add. Sense: ATA pass through
information available
> [14271.762812] sd 11:0:4:0: [sdm] Sense Key : Recovered Error
[current] [descriptor]
> [14271.762820] Descriptor sense data with sense descriptors (in hex):
> [14271.762823]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14271.762834]         00 4f 00 c2 00 50
> [14271.762841] sd 11:0:4:0: [sdm] Add. Sense: ATA pass through
information available
> [14271.821145] sd 11:0:4:0: [sdm] Sense Key : Recovered Error
[current] [descriptor]
> [14271.821152] Descriptor sense data with sense descriptors (in hex):
> [14271.821154]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14271.821166]         00 4f 00 c2 00 50
> [14271.821172] sd 11:0:4:0: [sdm] Add. Sense: ATA pass through
information available
> [14271.912816] sd 11:0:4:0: [sdm] Sense Key : Recovered Error
[current] [descriptor]
> [14271.912824] Descriptor sense data with sense descriptors (in hex):
> [14271.912827]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14271.912838]         00 4f 00 c2 00 50
> [14271.912844] sd 11:0:4:0: [sdm] Add. Sense: ATA pass through
information available
> [14271.971152] sd 11:0:4:0: [sdm] Sense Key : Recovered Error
[current] [descriptor]
> [14271.971161] Descriptor sense data with sense descriptors (in hex):
> [14271.971163]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14271.971175]         00 4f 00 c2 00 50
> [14271.971181] sd 11:0:4:0: [sdm] Add. Sense: ATA pass through
information available
> [14272.071150] sd 11:0:4:0: [sdm] Sense Key : Recovered Error
[current] [descriptor]
> [14272.071157] Descriptor sense data with sense descriptors (in hex):
> [14272.071160]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14272.071172]         00 4f 00 c2 00 50
> [14272.071178] sd 11:0:4:0: [sdm] Add. Sense: ATA pass through
information available
> [14272.129485] sd 11:0:4:0: [sdm] Sense Key : Recovered Error
[current] [descriptor]
> [14272.129494] Descriptor sense data with sense descriptors (in hex):
> [14272.129497]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> [14272.129508]         00 4f 00 c2 00 50
> [14272.129514] sd 11:0:4:0: [sdm] Add. Sense: ATA pass through
information available
> [14365.066847] Aborting journal on device md0:8.
> [14365.066946] __ratelimit: 246 callbacks suppressed
> [14365.066949] Buffer I/O error on device md0, logical block 854622208
> [14365.067018] lost page write due to I/O error on md0
> [14365.067023] JBD2: I/O error detected when updating journal
superblock for md0:8.
> [14382.768622] EXT4-fs error (device md0): ext4_find_entry: reading
directory #6879966 offset 0
> [14382.820264] Buffer I/O error on device md0, logical block 0
> [14382.820332] lost page write due to I/O error on md0
> [14401.997859] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6891861, block=27267765
> [14401.998043] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.041639] Buffer I/O error on device md0, logical block 0
> [14402.041708] lost page write due to I/O error on md0
> [14402.042025] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6892055, block=27267777
> [14402.042189] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.042337] Buffer I/O error on device md0, logical block 0
> [14402.042404] lost page write due to I/O error on md0
> [14402.042615] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6891691, block=27267754
> [14402.042780] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.042927] Buffer I/O error on device md0, logical block 0
> [14402.042994] lost page write due to I/O error on md0
> [14402.043204] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6891589, block=27267748
> [14402.043369] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.043514] Buffer I/O error on device md0, logical block 0
> [14402.043581] lost page write due to I/O error on md0
> [14402.045186] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6892719, block=27267818
> [14402.045351] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.045500] Buffer I/O error on device md0, logical block 0
> [14402.045569] lost page write due to I/O error on md0
> [14402.061829] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6891914, block=27267768
> [14402.061983] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.062117] Buffer I/O error on device md0, logical block 0
> [14402.062175] lost page write due to I/O error on md0
> [14402.062495] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6891136, block=27267719
> [14402.062651] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.062793] Buffer I/O error on device md0, logical block 0
> [14402.062859] lost page write due to I/O error on md0
> [14402.063053] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6893036, block=27267838
> [14402.063217] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.063357] Buffer I/O error on device md0, logical block 0
> [14402.063423] lost page write due to I/O error on md0
> [14402.063624] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6892357, block=27267796
> [14402.063793] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.063935] Buffer I/O error on device md0, logical block 0
> [14402.064001] lost page write due to I/O error on md0
> [14402.064193] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6891032, block=27267713
> [14402.064355] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.064496] Buffer I/O error on device md0, logical block 0
> [14402.064561] lost page write due to I/O error on md0
> [14402.064741] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6891782, block=27267760
> [14402.064906] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.065232] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6892332, block=27267794
> [14402.065395] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.065714] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6892755, block=27267821
> [14402.065878] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.066197] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6892235, block=27267788
> [14402.066362] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.066675] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6892552, block=27267808
> [14402.066840] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.067156] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6892123, block=27267781
> [14402.067321] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.067635] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6892256, block=27267789
> [14402.067800] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.068114] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6892532, block=27267807
> [14402.068278] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.068594] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6892318, block=27267793
> [14402.068758] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.069069] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6892845, block=27267826
> [14402.069233] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.069543] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6892980, block=27267835
> [14402.069707] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.074971] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6890993, block=27267711
> [14402.075140] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.075540] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6891750, block=27267758
> [14402.075686] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.076028] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6891642, block=27267751
> [14402.076174] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.076543] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6892605, block=27267811
> [14402.076689] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.077059] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6892136, block=27267782
> [14402.077223] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.077567] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6892717, block=27267818
> [14402.077732] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.078080] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6890999, block=27267711
> [14402.078243] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.078593] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6892362, block=27267796
> [14402.080842] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.083259] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6892867, block=27267828
> [14402.083423] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.083798] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6891361, block=27267734
> [14402.083963] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.084315] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6892012, block=27267774
> [14402.084480] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.084852] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6891626, block=27267750
> [14402.085014] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.085365] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6891320, block=27267731
> [14402.085530] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.085880] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6891588, block=27267748
> [14402.086044] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.086390] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6892584, block=27267810
> [14402.086556] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.086901] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6892891, block=27267829
> [14402.087066] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.087416] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6891118, block=27267718
> [14402.087579] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.087930] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6891559, block=27267746
> [14402.088094] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.088445] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6891212, block=27267724
> [14402.088609] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.091550] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6890993, block=27267711
> [14402.091718] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.106045] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6890999, block=27267711
> [14402.106212] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.141662] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6889579, block=27267622
> [14402.141829] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.142185] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6889980, block=27267647
> [14402.142350] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.142703] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6889704, block=27267630
> [14402.142868] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.143318] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6889340, block=27267607
> [14402.143483] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.143826] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6889563, block=27267621
> [14402.143990] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.144341] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6889220, block=27267600
> [14402.144506] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.144869] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6889358, block=27267608
> [14402.145034] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.145379] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6889530, block=27267619
> [14402.145542] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.145890] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6889721, block=27267631
> [14402.146054] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.146398] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6889621, block=27267625
> [14402.146562] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.146900] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6889920, block=27267643
> [14402.147047] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.147390] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6889508, block=27267618
> [14402.147536] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.147869] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6889673, block=27267628
> [14402.148015] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.153911] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6889220, block=27267600
> [14402.154075] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.155819] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6889340, block=27267607
> [14402.155987] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.261374] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6876431, block=27266800
> [14402.261522] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.261981] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6875468, block=27266740
> [14402.262128] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.262587] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6878644, block=27266939
> [14402.262753] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.263223] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6875990, block=27266773
> [14402.263388] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.263741] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6875325, block=27266731
> [14402.263908] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.264259] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6875779, block=27266760
> [14402.264424] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.264808] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6879881, block=27267016
> [14402.264972] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.265325] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6878489, block=27266929
> [14402.265491] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.265842] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6878591, block=27266935
> [14402.266005] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.266357] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6876138, block=27266782
> [14402.266520] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.266876] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6875310, block=27266730
> [14402.267042] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.267396] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6876274, block=27266791
> [14402.267560] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.267907] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6875648, block=27266751
> [14402.268071] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.268422] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6876119, block=27266781
> [14402.268586] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.269056] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6877808, block=27266886
> [14402.269219] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.269573] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6878101, block=27266905
> [14402.269738] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.270088] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6877967, block=27266896
> [14402.270264] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.270614] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6877835, block=27266888
> [14402.270793] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.271146] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6875370, block=27266734
> [14402.271323] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.271679] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6877955, block=27266896
> [14402.271854] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.272214] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6876218, block=27266787
> [14402.272391] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.272745] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6875340, block=27266732
> [14402.272922] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.273281] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6878031, block=27266900
> [14402.273452] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.273919] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6874892, block=27266704
> [14402.274097] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.274454] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6878014, block=27266899
> [14402.274628] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.274987] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6876066, block=27266778
> [14402.275146] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.275488] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6879778, block=27267010
> [14402.275646] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.275996] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6878310, block=27266918
> [14402.276151] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.276624] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6877450, block=27266864
> [14402.276793] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.277148] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6875397, block=27266736
> [14402.277315] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.277778] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6877004, block=27266836
> [14402.277943] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.278295] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6875606, block=27266749
> [14402.280543] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.283306] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6874892, block=27266704
> [14402.283472] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.285354] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6875310, block=27266730
> [14402.285519] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.302533] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6874640, block=27266688
> [14402.302698] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.304480] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6874640, block=27266688
> [14402.304629] EXT4-fs (md0): previous I/O error to superblock detected
> [14402.752437] EXT4-fs error (device md0): ext4_journal_start_sb:
Detected aborted journal
> [14402.752606] EXT4-fs (md0): Remounting filesystem read-only
> [14419.267133] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6874640, block=27266688
> [14419.297937] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6874892, block=27266704
> [14419.301517] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6875310, block=27266730
> [14419.332861] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6889220, block=27267600
> [14419.335590] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6889340, block=27267607
> [14419.341744] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6890993, block=27267711
> [14419.343458] EXT4-fs error (device md0): __ext4_get_inode_loc:
unable to read inode block - inode=6890999, block=27267711


Richard Scobie wrote:
> Andrew Dunn wrote:
>
>> Do you think that the controller is dropping out? I know that I have 4
>> drives on one controller (AOC-USAS-L8i) and 5 drives on the other
>> controller (SAME make/model). but I think they are sequentially
>> connected... as in sd[efghi] should be on one device and sd[jklm] should
>> be on the other... any easy way to verify?
>
> If you are running smartd, cease doing so and do not use the smartctl
> command on drives attached to these controllers - use causes drives to
> be offlined.
>
> It appears the smartctl is broken with LSISAS 1068E based controllers.
>
> See:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=452389
>
> and
>
> http://marc.info/?l=linux-scsi&m=125673590221135&w=2
>
> Regards,
>
> Richard
>

-- 
Andrew Dunn
http://agdunn.net


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RAID 6 Failure follow up
  2009-11-08 18:22       ` Andrew Dunn
@ 2009-11-08 18:34         ` Joe Landman
  0 siblings, 0 replies; 23+ messages in thread
From: Joe Landman @ 2009-11-08 18:34 UTC (permalink / raw)
  To: Andrew Dunn; +Cc: Richard Scobie, Roger Heflin, robin, linux-raid list, nfbrown

Andrew Dunn wrote:
> New data now, I got this from dmesg when it went down again. Hopefully
> there is some significance to you guys.
> 
>> [14269.650381] sd 10:0:3:0: rejecting I/O to offline device
>> [14269.650453] sd 10:0:3:0: rejecting I/O to offline device
>> [14269.650524] sd 10:0:3:0: rejecting I/O to offline device
>> [14269.650595] sd 10:0:3:0: rejecting I/O to offline device
>> [14269.650672] sd 10:0:3:0: [sdh] Unhandled error code
>> [14269.650675] sd 10:0:3:0: [sdh] Result: hostbyte=DID_NO_CONNECT
> driverbyte=DRIVER_OK
>> [14269.650680] end_request: I/O error, dev sdh, sector 1435085631
>> [14269.650749] raid5:md0: read error not correctable (sector
> 1435085568 on sdh1).
>> [14269.650753] raid5: Disk failure on sdh1, disabling device.
>> [14269.650754] raid5: Operation continuing on 7 devices.
>> [14269.650886] raid5:md0: read error not correctable (sector
> 1435085576 on sdh1).
>> [14269.650890] raid5:md0: read error not correctable (sector
> 1435085584 on sdh1).
>> [14269.650894] raid5:md0: read error not correctable (sector
> 1435085592 on sdh1).

[...]

I am not convinced this is a drive failure (yet).  You have sdh, sdi,
sdj, sdk, sdl and sdm all reporting errors or error recovery.

This sounds like a physical backplane failure (is this on an expander
system? we have seen this happen before), a failing cable to the SATA
card (we have seen this happen before too), or a power supply issue
(one that cannot handle all the drives in constant operation, which
we have also seen before).
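
As a quick sanity check (a sketch - sysfs paths assumed, and lsscsi
only if you have it installed), you can see which SCSI host each disk
hangs off, and therefore whether the dropped drives share one
controller/cable (the "sd 10:0:3:0" in your dmesg already names host 10):

	ls -l /sys/block/sd*/device   # symlink target contains hostN for each disk
	lsscsi -H                     # list the SCSI hosts (controllers)
	lsscsi                        # map each sdX to host:channel:target:lun

If all of the failing disks land on the same host, that points at the
controller/cable/backplane rather than at the individual drives.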

Driver issues are possible, but the kernel is following its normal
failure code paths, so unless the driver is triggering the device
removal code on its own ...

SMART could be offlining the drive and leaving it non-responsive.
Something else could be doing that as well (vibration, power quality, ...).

What does

	hdparm -I /dev/sdh

tell us?

If nothing, we need to use sdparm to get some information.
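
For example (sdh assumed to still be present in /dev; substitute
whichever drive dropped):

	hdparm -I /dev/sdh          # full ATA identify data, if the drive still answers
	sdparm --all /dev/sdh       # mode pages, read through the SCSI/SAS layer
	sdparm --inquiry /dev/sdh   # INQUIRY/VPD identifiers

If even the INQUIRY fails, the problem is more likely between the HBA
and the drive than inside the drive itself.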

Joe

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RAID 6 Failure follow up
  2009-11-08 18:01     ` Richard Scobie
  2009-11-08 18:22       ` Andrew Dunn
@ 2009-11-08 22:09       ` Andrew Dunn
  2009-11-08 22:59         ` Richard Scobie
  1 sibling, 1 reply; 23+ messages in thread
From: Andrew Dunn @ 2009-11-08 22:09 UTC (permalink / raw)
  To: Richard Scobie; +Cc: Roger Heflin, robin, linux-raid list

I am not, but this is quite interesting. What versions are affected? I
am in ubuntu 9.10.

Richard Scobie wrote:
> Andrew Dunn wrote:
>
>> Do you think that the controller is dropping out? I know that I have 4
>> drives on one controller (AOC-USAS-L8i) and 5 drives on the other
>> controller (SAME make/model). but I think they are sequentially
>> connected... as in sd[efghi] should be on one device and sd[jklm] should
>> be on the other... any easy way to verify?
>
> If you are running smartd, cease doing so and do not use the smartctl
> command on drives attached to these controllers - use causes drives to
> be offlined.
>
> It appears the smartctl is broken with LSISAS 1068E based controllers.
>
> See:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=452389
>
> and
>
> http://marc.info/?l=linux-scsi&m=125673590221135&w=2
>
> Regards,
>
> Richard
>

-- 
Andrew Dunn
http://agdunn.net


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RAID 6 Failure follow up
  2009-11-08 22:09       ` Andrew Dunn
@ 2009-11-08 22:59         ` Richard Scobie
  2009-11-09  2:45           ` Ryan Wagoner
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Scobie @ 2009-11-08 22:59 UTC (permalink / raw)
  To: Andrew Dunn, Linux RAID Mailing List

Andrew Dunn wrote:
> I am not, but this is quite interesting. What versions are affected? I
> am in ubuntu 9.10.

To my knowledge, there is no smartmontools version safe for use on these 
LSI based controllers.

You may run smartctl commands a few times and get away with it, but 
eventually it will bite you.

Losing 14 drives on a 16-drive array as a result is no fun...

Hopefully it will be fixed one day.

Regards,

Richard

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RAID 6 Failure follow up
  2009-11-08 22:59         ` Richard Scobie
@ 2009-11-09  2:45           ` Ryan Wagoner
  2009-11-09  2:57             ` Richard Scobie
  2009-11-09  8:09             ` Gabor Gombas
  0 siblings, 2 replies; 23+ messages in thread
From: Ryan Wagoner @ 2009-11-09  2:45 UTC (permalink / raw)
  To: Richard Scobie; +Cc: Andrew Dunn, Linux RAID Mailing List

This is interesting to hear, as I have been using smartmontools on my
Supermicro LSI 1068E controller with the target firmware for 2 years
now on CentOS 5. I have 3 RAID 1 arrays across 2 drives, a RAID 5
array across 3 drives, and a RAID 0 across 2 drives.

I routinely query smartctl with something like

for i in a b c d e f; do smartctl -a /dev/sd$i | grep Reallocated; done

or

for i in a b c d e f; do smartctl -a /dev/sd$i | grep Temperature; done

Here are the system details

 cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [raid0]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md2 : active raid1 sdb3[1] sda3[0]
      2032128 blocks [2/2] [UU]

md3 : active raid0 sdd1[1] sdc1[0]
      625137152 blocks 64k chunks

md4 : active raid5 sdg1[2] sdf1[1] sde1[0]
      1953519872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md1 : active raid1 sdb2[1] sda2[0]
      154151616 blocks [2/2] [UU]

lspci
02:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E
PCI-Express Fusion-MPT SAS (rev 08)

modprobe.conf
alias scsi_hostadapter mptbase
alias scsi_hostadapter1 mptsas

rpm -qa | grep smartmontools
smartmontools-5.38-2.el5

uname -r
2.6.18-128.4.1.el5

On Sun, Nov 8, 2009 at 5:59 PM, Richard Scobie <richard@sauce.co.nz> wrote:
> Andrew Dunn wrote:
>>
>> I am not, but this is quite interesting. What versions are affected? I
>> am in ubuntu 9.10.
>
> To my knowledge, there is no smartmontools version safe for use on these LSI
> based controllers.
>
> You may run smartctl commands a few times and get away with it, but
> eventually it will bite you.
>
> Losing 14 drives on a 16 drive array as a result, is no fun...
>
> Hopefully it will be fixed one day.
>
> Regards,
>
> Richard
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RAID 6 Failure follow up
  2009-11-09  2:45           ` Ryan Wagoner
@ 2009-11-09  2:57             ` Richard Scobie
  2009-11-09  8:09             ` Gabor Gombas
  1 sibling, 0 replies; 23+ messages in thread
From: Richard Scobie @ 2009-11-09  2:57 UTC (permalink / raw)
  To: Ryan Wagoner; +Cc: Andrew Dunn, Linux RAID Mailing List

Ryan Wagoner wrote:
> This is interesting to hear as I have been using smartmontools on my
> Supermicro LSI 1068E controller with the target firmware for 2 years
> now on CentOS 5. I have 3 RAID 1 arrays across 2 drives, a RAID 5
> drive across 3 drives, and a RAID 0 across 2 drives.

I have 3 boxes using 1068E controllers attached to 16-drive, port
expander based chassis, built over the last 2.5 years, and they all
react badly.

In fact the latest one, put together a month ago (which is using more
recent controller IT firmware and a newer kernel than the other two),
will not tolerate a single smartctl command, whereas the other two
will tolerate one maybe 50% of the time.

Something is not right here, and others running different drive setups
- direct attached and port multiplier based - are seeing the same
thing.

Suffice it to say, I would recommend heavy testing before putting this
into production, and I personally have no confidence in it currently.

Regards,

Richard

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RAID 6 Failure follow up
  2009-11-09  2:45           ` Ryan Wagoner
  2009-11-09  2:57             ` Richard Scobie
@ 2009-11-09  8:09             ` Gabor Gombas
  2009-11-09 10:08               ` Andrew Dunn
  1 sibling, 1 reply; 23+ messages in thread
From: Gabor Gombas @ 2009-11-09  8:09 UTC (permalink / raw)
  To: Ryan Wagoner; +Cc: Richard Scobie, Andrew Dunn, Linux RAID Mailing List

On Sun, Nov 08, 2009 at 09:45:40PM -0500, Ryan Wagoner wrote:

> This is interesting to hear as I have been using smartmontools on my
> Supermicro LSI 1068E controller with the target firmware for 2 years
> now on CentOS 5. I have 3 RAID 1 arrays across 2 drives, a RAID 5
> drive across 3 drives, and a RAID 0 across 2 drives.

[...]

> uname -r
> 2.6.18-128.4.1.el5

Kernel version matters. With 2.6.22 we only got occasional complaints
that the drives were not capable of SMART checks; these complaints
were untrue but otherwise harmless. With 2.6.26 and 2.6.30, the
controller offlines the disks.

Gabor

-- 
     ---------------------------------------------------------
     MTA SZTAKI Computer and Automation Research Institute
                Hungarian Academy of Sciences
     ---------------------------------------------------------

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RAID 6 Failure follow up
  2009-11-09  8:09             ` Gabor Gombas
@ 2009-11-09 10:08               ` Andrew Dunn
  2009-11-09 11:34                 ` Gabor Gombas
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Dunn @ 2009-11-09 10:08 UTC (permalink / raw)
  To: Gabor Gombas; +Cc: Ryan Wagoner, Richard Scobie, Linux RAID Mailing List

Does it momentarily offline the disks - like they re-appear in /dev
within moments? That would be similar behavior to what I am
experiencing: the disks drop from the array, but they are back in /dev
by the time I get a chance to see them.

I am, however, not running smartd as far as I know; smartmontools is
installed and I access it through the Webmin module, but checking the
drives with that and the array failures have not happened at the same
time.

Gabor Gombas wrote:
> On Sun, Nov 08, 2009 at 09:45:40PM -0500, Ryan Wagoner wrote:
>
>   
>> This is interesting to hear as I have been using smartmontools on my
>> Supermicro LSI 1068E controller with the target firmware for 2 years
>> now on CentOS 5. I have 3 RAID 1 arrays across 2 drives, a RAID 5
>> drive across 3 drives, and a RAID 0 across 2 drives.
>>     
>
> [...]
>
>   
>> uname -r
>> 2.6.18-128.4.1.el5
>>     
>
> Kernel version matters. With 2.6.22 we only got occassional complaints
> that the drives are not capable of SMART checks that were not true but
> were otherwise harmless. With 2.6.26 and 2.6.30, the controller offlines
> the disks.
>
> Gabor
>
>   

-- 
Andrew Dunn
http://agdunn.net


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RAID 6 Failure follow up
  2009-11-09 10:08               ` Andrew Dunn
@ 2009-11-09 11:34                 ` Gabor Gombas
  2009-11-09 22:04                   ` Andrew Dunn
  2009-11-10 10:55                   ` Andrew Dunn
  0 siblings, 2 replies; 23+ messages in thread
From: Gabor Gombas @ 2009-11-09 11:34 UTC (permalink / raw)
  To: Andrew Dunn; +Cc: Ryan Wagoner, Richard Scobie, Linux RAID Mailing List

On Mon, Nov 09, 2009 at 05:08:23AM -0500, Andrew Dunn wrote:

> does it momentarily offline the disks? like they re-appear in /dev
> within moments? That would be similar behavior to what I am
> experiencing, the disks drop from the array, but they are in /dev by the
> time I get a chance to see them.

No, either the disks need to be physically removed and re-inserted, or
the machine needs to be rebooted.

Gabor

-- 
     ---------------------------------------------------------
     MTA SZTAKI Computer and Automation Research Institute
                Hungarian Academy of Sciences
     ---------------------------------------------------------

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RAID 6 Failure follow up
  2009-11-09 11:34                 ` Gabor Gombas
@ 2009-11-09 22:04                   ` Andrew Dunn
  2009-11-10 10:55                   ` Andrew Dunn
  1 sibling, 0 replies; 23+ messages in thread
From: Andrew Dunn @ 2009-11-09 22:04 UTC (permalink / raw)
  To: Gabor Gombas; +Cc: Ryan Wagoner, Richard Scobie, Linux RAID Mailing List

I am not experiencing that issue, then. My devices are still in /dev
after the RAID drop-out, and I can run SMART scans on them without
issue as well.

Gabor Gombas wrote:
> On Mon, Nov 09, 2009 at 05:08:23AM -0500, Andrew Dunn wrote:
>
>   
>> does it momentarily offline the disks? like they re-appear in /dev
>> within moments? That would be similar behavior to what I am
>> experiencing, the disks drop from the array, but they are in /dev by the
>> time I get a chance to see them.
>>     
>
> No, either the disks need to be physically removed and re-inserted, or
> the machine needs to be rebooted.
>
> Gabor
>
>   

-- 
Andrew Dunn
http://agdunn.net


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RAID 6 Failure follow up
  2009-11-09 11:34                 ` Gabor Gombas
  2009-11-09 22:04                   ` Andrew Dunn
@ 2009-11-10 10:55                   ` Andrew Dunn
  2009-11-10 11:34                     ` Vincent Schut
  2009-11-10 12:45                     ` Ryan Wagoner
  1 sibling, 2 replies; 23+ messages in thread
From: Andrew Dunn @ 2009-11-10 10:55 UTC (permalink / raw)
  To: Gabor Gombas; +Cc: Ryan Wagoner, Richard Scobie, Linux RAID Mailing List

I am able to reproduce this SMART error now. I have done it twice, so
maybe other things are causing this as well.

When I scanned the devices this morning with smartctl via Webmin I
lost 8 of the 9 drives. They are, however, still present in /dev.

I sent out my logs from the first failure last night; smartctl was on
the system at the time. I don't know whether Ubuntu Server's default
smartd configuration makes it do periodic scans, because I didn't
change anything.
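
I suppose I could check that with something like the following (paths
assumed for Ubuntu's smartmontools package - I have not verified them):

	pgrep -lf smartd                  # is the daemon running at all?
	cat /etc/default/smartmontools    # start_smartd / enable_smart settings
	grep -v '^#' /etc/smartd.conf     # any active DEVICESCAN or -s schedule?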

I would hate to move back to 9.10 and see this problem again.

Should I just not install smartmontools? This seems like a bad
solution, because then I won't be able to check the drives for
impending failures.

Have you installed LSI's Linux drivers? Some people say this solves
their issue.

From the logs I sent out last night, do you think it could be
something else?

Thanks a ton,

Gabor Gombas wrote:
> On Mon, Nov 09, 2009 at 05:08:23AM -0500, Andrew Dunn wrote:
>
>   
>> does it momentarily offline the disks? like they re-appear in /dev
>> within moments? That would be similar behavior to what I am
>> experiencing, the disks drop from the array, but they are in /dev by the
>> time I get a chance to see them.
>>     
>
> No, either the disks need to be physically removed and re-inserted, or
> the machine needs to be rebooted.
>
> Gabor
>
>   

-- 
Andrew Dunn
http://agdunn.net


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RAID 6 Failure follow up
  2009-11-10 10:55                   ` Andrew Dunn
@ 2009-11-10 11:34                     ` Vincent Schut
  2009-11-11 12:34                       ` Andrew Dunn
  2009-11-17  8:40                       ` Vincent Schut
  2009-11-10 12:45                     ` Ryan Wagoner
  1 sibling, 2 replies; 23+ messages in thread
From: Vincent Schut @ 2009-11-10 11:34 UTC (permalink / raw)
  To: Andrew Dunn
  Cc: Gabor Gombas, Ryan Wagoner, Richard Scobie, Linux RAID Mailing List

Andrew Dunn wrote:
> I am able to reproduce this smart error now. I have done it twice, so
> maybe other things are causing this also.
> 
> When I scanned the devices this morning with smartctl via webmin I lost
> 8 of the 9 drives. They are howerver still in my /dev folder.
> 
> Now I sent out my logs from the first failure last night, smartctl was
> on the system... I dont know if ubuntu server's default smartd
> configuration makes it do periodic scans because I didnt change anything.
> 
> I would hate to move back to 9.10 and see this problem again.
> 
> Should I just not install smartmontools? This seems like a bad solution
> because now I wont be able to check the drives in advance for failures.
> 
> Have you installed LSI's linux drivers? Some people say this solves
> their issue.
> 
> From the logs sent out last night do you think it could be something else?
> 
> Thanks a ton,

FWIW, I encountered the same issue, and seem to have found a viable
workaround by accessing the SATA disks on that LSI backplane as SCSI
devices, i.e. by adding '-d scsi' to my smartctl/smartd.conf lines. No
more errors in the logs, no more drives being kicked out.
Though not as much info is available that way as with the SAT mode
('-d sat', or the automatic default) - temperature, for example, is
unavailable - it does allow me to initiate the self-tests and get
their results, and to monitor the generic SMART status of the drives.
Quite enough for me.
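
Concretely, that means something like this (sdh is just an example
device name):

	smartctl -d scsi -a /dev/sdh

instead of the default (or '-d sat') mode, and '-d scsi' added to the
matching lines in smartd.conf.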

YMMV, though.

Vincent.
> 
> Gabor Gombas wrote:
>> On Mon, Nov 09, 2009 at 05:08:23AM -0500, Andrew Dunn wrote:
>>
>>   
>>> does it momentarily offline the disks? like they re-appear in /dev
>>> within moments? That would be similar behavior to what I am
>>> experiencing, the disks drop from the array, but they are in /dev by the
>>> time I get a chance to see them.
>>>     
>> No, either the disks need to be physically removed and re-inserted, or
>> the machine needs to be rebooted.
>>
>> Gabor
>>
>>   
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RAID 6 Failure follow up
  2009-11-10 10:55                   ` Andrew Dunn
  2009-11-10 11:34                     ` Vincent Schut
@ 2009-11-10 12:45                     ` Ryan Wagoner
  1 sibling, 0 replies; 23+ messages in thread
From: Ryan Wagoner @ 2009-11-10 12:45 UTC (permalink / raw)
  To: Andrew Dunn; +Cc: Linux RAID Mailing List

Boot up a CentOS 5 LiveCD; it should detect your arrays, and then try
running smartctl. From my experience with different distros, I have
found that Red Hat spends a good amount of time making sure enterprise
hardware is stable on their system. Ubuntu seems to focus more on
desktops.
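
Something along these lines from the live environment should be enough
to tell whether the behavior differs (device names are just examples):

	mdadm --assemble --scan     # or assemble /dev/md0 from its member disks
	cat /proc/mdstat            # confirm the arrays came up
	smartctl -a /dev/sde        # then watch dmesg for disks being offlined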

Ryan

On Tue, Nov 10, 2009 at 5:55 AM, Andrew Dunn <andrew.g.dunn@gmail.com> wrote:
> I am able to reproduce this smart error now. I have done it twice, so
> maybe other things are causing this also.
>
> When I scanned the devices this morning with smartctl via webmin I lost
> 8 of the 9 drives. They are howerver still in my /dev folder.
>
> Now I sent out my logs from the first failure last night, smartctl was
> on the system... I dont know if ubuntu server's default smartd
> configuration makes it do periodic scans because I didnt change anything.
>
> I would hate to move back to 9.10 and see this problem again.
>
> Should I just not install smartmontools? This seems like a bad solution
> because now I wont be able to check the drives in advance for failures.
>
> Have you installed LSI's linux drivers? Some people say this solves
> their issue.
>
> From the logs sent out last night do you think it could be something else?
>
> Thanks a ton,
>
> Gabor Gombas wrote:
>> On Mon, Nov 09, 2009 at 05:08:23AM -0500, Andrew Dunn wrote:
>>
>>
>>> does it momentarily offline the disks? like they re-appear in /dev
>>> within moments? That would be similar behavior to what I am
>>> experiencing, the disks drop from the array, but they are in /dev by the
>>> time I get a chance to see them.
>>>
>>
>> No, either the disks need to be physically removed and re-inserted, or
>> the machine needs to be rebooted.
>>
>> Gabor
>>
>>
>
> --
> Andrew Dunn
> http://agdunn.net
>
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RAID 6 Failure follow up
  2009-11-10 11:34                     ` Vincent Schut
@ 2009-11-11 12:34                       ` Andrew Dunn
  2009-11-11 12:46                         ` Vincent Schut
  2009-11-17  8:40                       ` Vincent Schut
  1 sibling, 1 reply; 23+ messages in thread
From: Andrew Dunn @ 2009-11-11 12:34 UTC (permalink / raw)
  To: Vincent Schut
  Cc: Gabor Gombas, Ryan Wagoner, Richard Scobie, Linux RAID Mailing List

Thanks for your help. So far, without smartctl installed, I have had
no issues... but it has only been about 12 hours.

Could you send me your smartd.conf?

Vincent Schut wrote:
> Andrew Dunn wrote:
>> I am able to reproduce this smart error now. I have done it twice, so
>> maybe other things are causing this also.
>>
>> When I scanned the devices this morning with smartctl via webmin I lost
>> 8 of the 9 drives. They are howerver still in my /dev folder.
>>
>> Now I sent out my logs from the first failure last night, smartctl was
>> on the system... I dont know if ubuntu server's default smartd
>> configuration makes it do periodic scans because I didnt change
>> anything.
>>
>> I would hate to move back to 9.10 and see this problem again.
>>
>> Should I just not install smartmontools? This seems like a bad solution
>> because now I wont be able to check the drives in advance for failures.
>>
>> Have you installed LSI's linux drivers? Some people say this solves
>> their issue.
>>
>> From the logs sent out last night do you think it could be something
>> else?
>>
>> Thanks a ton,
>
> FWIW, I encountered the same issue, and seem to have found a viable
> workaround by accessing the SATA disks on that LSI backplane as scsi
> devices, e.g. by adding '-d scsi' to my smartctl/smartd.conf lines. No
> more errors in the logs, no more drives being kicked out.
> Though not as much info is available that way as when using de sata
> driver ('-d sat', or automatically), like temperature is unavailable,
> it does allow me to initiate the selftests and get their result, and
> to monitor generic smart status of the drives. Quite enough for me.
>
> YMMV, though.
>
> Vincent.
>>
>> Gabor Gombas wrote:
>>> On Mon, Nov 09, 2009 at 05:08:23AM -0500, Andrew Dunn wrote:
>>>
>>>  
>>>> does it momentarily offline the disks? like they re-appear in /dev
>>>> within moments? That would be similar behavior to what I am
>>>> experiencing, the disks drop from the array, but they are in /dev
>>>> by the
>>>> time I get a chance to see them.
>>>>     
>>> No, either the disks need to be physically removed and re-inserted, or
>>> the machine needs to be rebooted.
>>>
>>> Gabor
>>>
>>>   
>>
>
>

-- 
Andrew Dunn
http://agdunn.net


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RAID 6 Failure follow up
  2009-11-11 12:34                       ` Andrew Dunn
@ 2009-11-11 12:46                         ` Vincent Schut
  0 siblings, 0 replies; 23+ messages in thread
From: Vincent Schut @ 2009-11-11 12:46 UTC (permalink / raw)
  To: linux-raid

Andrew Dunn wrote:
> Thanks for your help, so far without smartctl installed I have had no
> issues... but it has only been about 12 hours.
I also had no issues when not running smartd/smartctl. It seems to be
the combination of kernel, backplane SAS driver, and SMART that
triggers the trouble...
> 
> Could you send me your smatd.conf?

It's pretty much the default; there's just one uncommented line in it:

DEVICESCAN -d scsi -a -o on -S on -s (S/../.././02|L/../../6/03) -W 
4,45,55 -R 5 -m my@mail.address -M exec 
/usr/share/smartmontools/smartd-runner

(the above 3 lines should all be on one line).
I plan to replace the DEVICESCAN with explicit /dev/sd.. entries, but
as I'm currently adding and removing (USB) drives regularly, I have
kept the auto DEVICESCAN statement for now.
The rest means: enable SMART on all drives, schedule daily short and
weekly long self-tests, warn on temperature too high or a temperature
change of more than 5 degrees, and mail warnings/errors to me.
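
When I do switch to explicit entries, each one would look something
like this (device names are just examples; one line per drive, plus
the same -M exec handler as above):

	/dev/sda -d scsi -a -o on -S on -s (S/../.././02|L/../../6/03) -W 4,45,55 -R 5 -m my@mail.address
	/dev/sdb -d scsi -a -o on -S on -s (S/../.././02|L/../../6/03) -W 4,45,55 -R 5 -m my@mail.address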

VS.
> 
> Vincent Schut wrote:
>> Andrew Dunn wrote:
>>> I am able to reproduce this smart error now. I have done it twice, so
>>> maybe other things are causing this also.
>>>
>>> When I scanned the devices this morning with smartctl via webmin I lost
>>> 8 of the 9 drives. They are howerver still in my /dev folder.
>>>
>>> Now I sent out my logs from the first failure last night, smartctl was
>>> on the system... I dont know if ubuntu server's default smartd
>>> configuration makes it do periodic scans because I didnt change
>>> anything.
>>>
>>> I would hate to move back to 9.10 and see this problem again.
>>>
>>> Should I just not install smartmontools? This seems like a bad solution
>>> because now I wont be able to check the drives in advance for failures.
>>>
>>> Have you installed LSI's linux drivers? Some people say this solves
>>> their issue.
>>>
>>> From the logs sent out last night do you think it could be something
>>> else?
>>>
>>> Thanks a ton,
>> FWIW, I encountered the same issue, and seem to have found a viable
>> workaround by accessing the SATA disks on that LSI backplane as scsi
>> devices, e.g. by adding '-d scsi' to my smartctl/smartd.conf lines. No
>> more errors in the logs, no more drives being kicked out.
>> Though not as much info is available that way as when using de sata
>> driver ('-d sat', or automatically), like temperature is unavailable,
>> it does allow me to initiate the selftests and get their result, and
>> to monitor generic smart status of the drives. Quite enough for me.
>>
>> YMMV, though.
>>
>> Vincent.
>>> Gabor Gombas wrote:
>>>> On Mon, Nov 09, 2009 at 05:08:23AM -0500, Andrew Dunn wrote:
>>>>
>>>>  
>>>>> does it momentarily offline the disks? like they re-appear in /dev
>>>>> within moments? That would be similar behavior to what I am
>>>>> experiencing, the disks drop from the array, but they are in /dev
>>>>> by the
>>>>> time I get a chance to see them.
>>>>>     
>>>> No, either the disks need to be physically removed and re-inserted, or
>>>> the machine needs to be rebooted.
>>>>
>>>> Gabor
>>>>
>>>>   
>>
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RAID 6 Failure follow up
  2009-11-10 11:34                     ` Vincent Schut
  2009-11-11 12:34                       ` Andrew Dunn
@ 2009-11-17  8:40                       ` Vincent Schut
  1 sibling, 0 replies; 23+ messages in thread
From: Vincent Schut @ 2009-11-17  8:40 UTC (permalink / raw)
  Cc: Andrew Dunn, Gabor Gombas, Ryan Wagoner, Richard Scobie,
	Linux RAID Mailing List

Vincent Schut wrote:
> Andrew Dunn wrote:
>> I am able to reproduce this smart error now. I have done it twice, so
>> maybe other things are causing this also.
>>
>> When I scanned the devices this morning with smartctl via webmin I lost
>> 8 of the 9 drives. They are howerver still in my /dev folder.
>>
>> Now I sent out my logs from the first failure last night, smartctl was
>> on the system... I dont know if ubuntu server's default smartd
>> configuration makes it do periodic scans because I didnt change anything.
>>
>> I would hate to move back to 9.10 and see this problem again.
>>
>> Should I just not install smartmontools? This seems like a bad solution
>> because now I wont be able to check the drives in advance for failures.
>>
>> Have you installed LSI's linux drivers? Some people say this solves
>> their issue.
>>
>> From the logs sent out last night do you think it could be something 
>> else?
>>
>> Thanks a ton,
> 
> FWIW, I encountered the same issue, and seem to have found a viable 
> workaround by accessing the SATA disks on that LSI backplane as scsi 
> devices, e.g. by adding '-d scsi' to my smartctl/smartd.conf lines. No 
> more errors in the logs, no more drives being kicked out.
> Though not as much info is available that way as when using de sata 
> driver ('-d sat', or automatically), like temperature is unavailable, it 
> does allow me to initiate the selftests and get their result, and to 
> monitor generic smart status of the drives. Quite enough for me.
> 
> YMMV, though.

Folks, I need to retract this. Though I have had far fewer problems
with '-d scsi' instead of '-d sat' when running the LSI SAS /
smartmontools / mdadm combo, I got bitten again last night by a drive
being kicked out for no apparent reason. For now my only possible
advice is: don't use smartmontools on drives that sit on this LSI SAS
backplane.
I dearly hope this will improve soon; I hate having my drives go
unmonitored for too long...

Vincent.

> 
> Vincent.
>>
>> Gabor Gombas wrote:
>>> On Mon, Nov 09, 2009 at 05:08:23AM -0500, Andrew Dunn wrote:
>>>
>>>  
>>>> does it momentarily offline the disks? like they re-appear in /dev
>>>> within moments? That would be similar behavior to what I am
>>>> experiencing, the disks drop from the array, but they are in /dev by 
>>>> the
>>>> time I get a chance to see them.
>>>>     
>>> No, either the disks need to be physically removed and re-inserted, or
>>> the machine needs to be rebooted.
>>>
>>> Gabor
>>>
>>>   
>>
> 
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2009-11-17  8:40 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-08 14:07 RAID 6 Failure follow up Andrew Dunn
2009-11-08 14:23 ` Roger Heflin
2009-11-08 14:30   ` Andrew Dunn
2009-11-08 18:01     ` Richard Scobie
2009-11-08 18:22       ` Andrew Dunn
2009-11-08 18:34         ` Joe Landman
2009-11-08 22:09       ` Andrew Dunn
2009-11-08 22:59         ` Richard Scobie
2009-11-09  2:45           ` Ryan Wagoner
2009-11-09  2:57             ` Richard Scobie
2009-11-09  8:09             ` Gabor Gombas
2009-11-09 10:08               ` Andrew Dunn
2009-11-09 11:34                 ` Gabor Gombas
2009-11-09 22:04                   ` Andrew Dunn
2009-11-10 10:55                   ` Andrew Dunn
2009-11-10 11:34                     ` Vincent Schut
2009-11-11 12:34                       ` Andrew Dunn
2009-11-11 12:46                         ` Vincent Schut
2009-11-17  8:40                       ` Vincent Schut
2009-11-10 12:45                     ` Ryan Wagoner
2009-11-08 14:36   ` Andrew Dunn
2009-11-08 14:56     ` Roger Heflin
2009-11-08 17:08       ` Andrew Dunn
