* RAID 6 Failure follow up @ 2009-11-08 14:07 Andrew Dunn 2009-11-08 14:23 ` Roger Heflin 0 siblings, 1 reply; 23+ messages in thread From: Andrew Dunn @ 2009-11-08 14:07 UTC (permalink / raw) To: linux-raid list This is kind of interesting: storrgie@ALEXANDRIA:~$ sudo mdadm --assemble --force /dev/md0 mdadm: no devices found for /dev/md0 All of the devices are there in /dev, so I wanted to examine them: storrgie@ALEXANDRIA:~$ sudo mdadm --examine /dev/sde1 /dev/sde1: Magic : a92b4efc Version : 00.90.00 UUID : 397e0b3f:34cbe4cc:613e2239:070da8c8 (local to host ALEXANDRIA) Creation Time : Fri Nov 6 07:06:34 2009 Raid Level : raid6 Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) Array Size : 6837318656 (6520.58 GiB 7001.41 GB) Raid Devices : 9 Total Devices : 9 Preferred Minor : 0 Update Time : Sun Nov 8 08:57:04 2009 State : clean Active Devices : 5 Working Devices : 5 Failed Devices : 4 Spare Devices : 0 Checksum : 4ff41c5f - correct Events : 43 Chunk Size : 1024K Number Major Minor RaidDevice State this 0 8 65 0 active sync /dev/sde1 0 0 8 65 0 active sync /dev/sde1 1 1 8 81 1 active sync /dev/sdf1 2 2 8 97 2 active sync /dev/sdg1 3 3 8 113 3 active sync /dev/sdh1 4 4 0 0 4 faulty removed 5 5 0 0 5 faulty removed 6 6 0 0 6 faulty removed 7 7 0 0 7 faulty removed 8 8 8 193 8 active sync /dev/sdm1 First raid device shows the failures.... One of the 'removed' devices: storrgie@ALEXANDRIA:~$ sudo mdadm --examine /dev/sdi1 /dev/sdi1: Magic : a92b4efc Version : 00.90.00 UUID : 397e0b3f:34cbe4cc:613e2239:070da8c8 (local to host ALEXANDRIA) Creation Time : Fri Nov 6 07:06:34 2009 Raid Level : raid6 Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) Array Size : 6837318656 (6520.58 GiB 7001.41 GB) Raid Devices : 9 Total Devices : 9 Preferred Minor : 0 Update Time : Sun Nov 8 08:53:30 2009 State : active Active Devices : 9 Working Devices : 9 Failed Devices : 0 Spare Devices : 0 Checksum : 4ff41b2f - correct Events : 21 Chunk Size : 1024K Number Major Minor RaidDevice State this 4 8 129 4 active sync /dev/sdi1 0 0 8 65 0 active sync /dev/sde1 1 1 8 81 1 active sync /dev/sdf1 2 2 8 97 2 active sync /dev/sdg1 3 3 8 113 3 active sync /dev/sdh1 4 4 8 129 4 active sync /dev/sdi1 5 5 8 145 5 active sync /dev/sdj1 6 6 8 161 6 active sync /dev/sdk1 7 7 8 177 7 active sync /dev/sdl1 8 8 8 193 8 active sync /dev/sdm1 -- Andrew Dunn http://agdunn.net ^ permalink raw reply [flat|nested] 23+ messages in thread
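When mdadm prints "no devices found", it usually means it has no list of devices to scan for that array (e.g. no matching DEVICE/ARRAY lines in mdadm.conf), so naming the member partitions explicitly is worth a try. A minimal sketch, with device names taken from the --examine output above:

sudo mdadm --assemble --force /dev/md0 /dev/sd[efghijklm]1
# then check what actually came up:
cat /proc/mdstat
sudo mdadm --detail /dev/md0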
* Re: RAID 6 Failure follow up 2009-11-08 14:07 RAID 6 Failure follow up Andrew Dunn @ 2009-11-08 14:23 ` Roger Heflin 2009-11-08 14:30 ` Andrew Dunn 2009-11-08 14:36 ` Andrew Dunn 0 siblings, 2 replies; 23+ messages in thread From: Roger Heflin @ 2009-11-08 14:23 UTC (permalink / raw) To: Andrew Dunn; +Cc: linux-raid list Andrew Dunn wrote: > This is kind of interesting: > > storrgie@ALEXANDRIA:~$ sudo mdadm --assemble --force /dev/md0 > mdadm: no devices found for /dev/md0 > > All of the devices are there in /dev, so I wanted to examine them: > > storrgie@ALEXANDRIA:~$ sudo mdadm --examine /dev/sde1 > /dev/sde1: > Magic : a92b4efc > Version : 00.90.00 > UUID : 397e0b3f:34cbe4cc:613e2239:070da8c8 (local to host > ALEXANDRIA) > Creation Time : Fri Nov 6 07:06:34 2009 > Raid Level : raid6 > Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) > Array Size : 6837318656 (6520.58 GiB 7001.41 GB) > Raid Devices : 9 > Total Devices : 9 > Preferred Minor : 0 > > Update Time : Sun Nov 8 08:57:04 2009 > State : clean > Active Devices : 5 > Working Devices : 5 > Failed Devices : 4 > Spare Devices : 0 > Checksum : 4ff41c5f - correct > Events : 43 > > Chunk Size : 1024K > > Number Major Minor RaidDevice State > this 0 8 65 0 active sync /dev/sde1 > > 0 0 8 65 0 active sync /dev/sde1 > 1 1 8 81 1 active sync /dev/sdf1 > 2 2 8 97 2 active sync /dev/sdg1 > 3 3 8 113 3 active sync /dev/sdh1 > 4 4 0 0 4 faulty removed > 5 5 0 0 5 faulty removed > 6 6 0 0 6 faulty removed > 7 7 0 0 7 faulty removed > 8 8 8 193 8 active sync /dev/sdm1 > > First raid device shows the failures.... > > One of the 'removed' devices: > > storrgie@ALEXANDRIA:~$ sudo mdadm --examine /dev/sdi1 > /dev/sdi1: > Magic : a92b4efc > Version : 00.90.00 > UUID : 397e0b3f:34cbe4cc:613e2239:070da8c8 (local to host > ALEXANDRIA) > Creation Time : Fri Nov 6 07:06:34 2009 > Raid Level : raid6 > Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) > Array Size : 6837318656 (6520.58 GiB 7001.41 GB) > Raid Devices : 9 > Total Devices : 9 > Preferred Minor : 0 > > Update Time : Sun Nov 8 08:53:30 2009 > State : active > Active Devices : 9 > Working Devices : 9 > Failed Devices : 0 > Spare Devices : 0 > Checksum : 4ff41b2f - correct > Events : 21 > > Chunk Size : 1024K > > Number Major Minor RaidDevice State > this 4 8 129 4 active sync /dev/sdi1 > > 0 0 8 65 0 active sync /dev/sde1 > 1 1 8 81 1 active sync /dev/sdf1 > 2 2 8 97 2 active sync /dev/sdg1 > 3 3 8 113 3 active sync /dev/sdh1 > 4 4 8 129 4 active sync /dev/sdi1 > 5 5 8 145 5 active sync /dev/sdj1 > 6 6 8 161 6 active sync /dev/sdk1 > 7 7 8 177 7 active sync /dev/sdl1 > 8 8 8 193 8 active sync /dev/sdm1 > Did you check dmesg and see if there were errors on those disks? ^ permalink raw reply [flat|nested] 23+ messages in thread
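A quick way to pull the relevant kernel messages for all nine members at once (a sketch; the device range matches the array above, and the log path assumes an Ubuntu/Debian layout):

dmesg | grep -iE 'sd[e-m]|md0|raid'
# or, if the machine has been rebooted since the failure:
sudo grep -iE 'sd[e-m]|md0|raid' /var/log/kern.log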
* Re: RAID 6 Failure follow up 2009-11-08 14:23 ` Roger Heflin @ 2009-11-08 14:30 ` Andrew Dunn 2009-11-08 18:01 ` Richard Scobie 2009-11-08 14:36 ` Andrew Dunn 1 sibling, 1 reply; 23+ messages in thread From: Andrew Dunn @ 2009-11-08 14:30 UTC (permalink / raw) To: Roger Heflin, robin; +Cc: linux-raid list storrgie@ALEXANDRIA:~$ dmesg | grep sdi [ 31.019358] sd 11:0:0:0: [sdi] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB) [ 31.032233] sd 11:0:0:0: [sdi] Write Protect is off [ 31.032235] sd 11:0:0:0: [sdi] Mode Sense: 73 00 00 08 [ 31.037483] sd 11:0:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 31.066991] sdi: [ 31.075719] sdi1 [ 31.124713] sd 11:0:0:0: [sdi] Attached SCSI disk [ 31.147407] md: bind<sdi1> [ 31.712366] raid5: device sdi1 operational as raid disk 4 [ 31.713153] disk 4, o:1, dev:sdi1 [ 33.112975] disk 4, o:1, dev:sdi1 [ 297.528544] sd 11:0:0:0: [sdi] Sense Key : Recovered Error [current] [descriptor] [ 297.528573] sd 11:0:0:0: [sdi] Add. Sense: ATA pass through information available [ 297.591382] sd 11:0:0:0: [sdi] Sense Key : Recovered Error [current] [descriptor] [ 297.591407] sd 11:0:0:0: [sdi] Add. Sense: ATA pass through information available I don't see anything glaring. You should be able to force an assembly anyway (using the --force flag) but I'd make sure you know exactly what the issue is first, otherwise this is likely to happen again. Do you think that the controller is dropping out? I know that I have 4 drives on one controller (AOC-USAS-L8i) and 5 drives on the other controller (SAME make/model). but I think they are sequentially connected... as in sd[efghi] should be on one device and sd[jklm] should be on the other... any easy way to verify? Roger Heflin wrote: > Andrew Dunn wrote: >> This is kind of interesting: >> >> storrgie@ALEXANDRIA:~$ sudo mdadm --assemble --force /dev/md0 >> mdadm: no devices found for /dev/md0 >> >> All of the devices are there in /dev, so I wanted to examine them: >> >> storrgie@ALEXANDRIA:~$ sudo mdadm --examine /dev/sde1 >> /dev/sde1: >> Magic : a92b4efc >> Version : 00.90.00 >> UUID : 397e0b3f:34cbe4cc:613e2239:070da8c8 (local to host >> ALEXANDRIA) >> Creation Time : Fri Nov 6 07:06:34 2009 >> Raid Level : raid6 >> Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) >> Array Size : 6837318656 (6520.58 GiB 7001.41 GB) >> Raid Devices : 9 >> Total Devices : 9 >> Preferred Minor : 0 >> >> Update Time : Sun Nov 8 08:57:04 2009 >> State : clean >> Active Devices : 5 >> Working Devices : 5 >> Failed Devices : 4 >> Spare Devices : 0 >> Checksum : 4ff41c5f - correct >> Events : 43 >> >> Chunk Size : 1024K >> >> Number Major Minor RaidDevice State >> this 0 8 65 0 active sync /dev/sde1 >> >> 0 0 8 65 0 active sync /dev/sde1 >> 1 1 8 81 1 active sync /dev/sdf1 >> 2 2 8 97 2 active sync /dev/sdg1 >> 3 3 8 113 3 active sync /dev/sdh1 >> 4 4 0 0 4 faulty removed >> 5 5 0 0 5 faulty removed >> 6 6 0 0 6 faulty removed >> 7 7 0 0 7 faulty removed >> 8 8 8 193 8 active sync /dev/sdm1 >> >> First raid device shows the failures.... 
>> >> One of the 'removed' devices: >> >> storrgie@ALEXANDRIA:~$ sudo mdadm --examine /dev/sdi1 >> /dev/sdi1: >> Magic : a92b4efc >> Version : 00.90.00 >> UUID : 397e0b3f:34cbe4cc:613e2239:070da8c8 (local to host >> ALEXANDRIA) >> Creation Time : Fri Nov 6 07:06:34 2009 >> Raid Level : raid6 >> Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) >> Array Size : 6837318656 (6520.58 GiB 7001.41 GB) >> Raid Devices : 9 >> Total Devices : 9 >> Preferred Minor : 0 >> >> Update Time : Sun Nov 8 08:53:30 2009 >> State : active >> Active Devices : 9 >> Working Devices : 9 >> Failed Devices : 0 >> Spare Devices : 0 >> Checksum : 4ff41b2f - correct >> Events : 21 >> >> Chunk Size : 1024K >> >> Number Major Minor RaidDevice State >> this 4 8 129 4 active sync /dev/sdi1 >> >> 0 0 8 65 0 active sync /dev/sde1 >> 1 1 8 81 1 active sync /dev/sdf1 >> 2 2 8 97 2 active sync /dev/sdg1 >> 3 3 8 113 3 active sync /dev/sdh1 >> 4 4 8 129 4 active sync /dev/sdi1 >> 5 5 8 145 5 active sync /dev/sdj1 >> 6 6 8 161 6 active sync /dev/sdk1 >> 7 7 8 177 7 active sync /dev/sdl1 >> 8 8 8 193 8 active sync /dev/sdm1 >> > > > Did you check dmesg and see if there were errors on those disks? > > -- Andrew Dunn http://agdunn.net ^ permalink raw reply [flat|nested] 23+ messages in thread
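One way to verify which controller each disk hangs off (a sketch; host numbers such as the 11 in "sd 11:0:0:0" come from the dmesg output above, the rest is illustrative):

ls -l /dev/disk/by-path/ | grep -E 'sd[e-m]'    # symlink names encode the PCI address of the HBA each disk sits behind
lsscsi                                          # first column is host:channel:target:lun; the host number identifies the adapter
readlink -f /sys/class/scsi_host/host10 /sys/class/scsi_host/host11    # map host numbers back to PCI devices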
* Re: RAID 6 Failure follow up 2009-11-08 14:30 ` Andrew Dunn @ 2009-11-08 18:01 ` Richard Scobie 2009-11-08 18:22 ` Andrew Dunn 2009-11-08 22:09 ` Andrew Dunn 0 siblings, 2 replies; 23+ messages in thread From: Richard Scobie @ 2009-11-08 18:01 UTC (permalink / raw) To: Andrew Dunn; +Cc: Roger Heflin, robin, linux-raid list Andrew Dunn wrote: > Do you think that the controller is dropping out? I know that I have 4 > drives on one controller (AOC-USAS-L8i) and 5 drives on the other > controller (SAME make/model). but I think they are sequentially > connected... as in sd[efghi] should be on one device and sd[jklm] should > be on the other... any easy way to verify? If you are running smartd, cease doing so and do not use the smartctl command on drives attached to these controllers - its use causes drives to be offlined. It appears that smartctl is broken with LSISAS 1068E based controllers. See: https://bugzilla.redhat.com/show_bug.cgi?id=452389 and http://marc.info/?l=linux-scsi&m=125673590221135&w=2 Regards, Richard ^ permalink raw reply [flat|nested] 23+ messages in thread
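If smartd does turn out to be running, a sketch for stopping it on an Ubuntu/Debian box (paths assume the stock smartmontools package):

ps -ef | grep '[s]martd'                  # is it running at all?
sudo /etc/init.d/smartmontools stop
# and keep it from starting at boot, e.g. set start_smartd=no in /etc/default/smartmontools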
* Re: RAID 6 Failure follow up 2009-11-08 18:01 ` Richard Scobie @ 2009-11-08 18:22 ` Andrew Dunn 2009-11-08 18:34 ` Joe Landman 2009-11-08 22:09 ` Andrew Dunn 1 sibling, 1 reply; 23+ messages in thread From: Andrew Dunn @ 2009-11-08 18:22 UTC (permalink / raw) To: Richard Scobie; +Cc: Roger Heflin, robin, linux-raid list, nfbrown New data now, I got this from dmesg when it went down again. Hopefully there is some significance to you guys. > [14269.650381] sd 10:0:3:0: rejecting I/O to offline device > [14269.650453] sd 10:0:3:0: rejecting I/O to offline device > [14269.650524] sd 10:0:3:0: rejecting I/O to offline device > [14269.650595] sd 10:0:3:0: rejecting I/O to offline device > [14269.650672] sd 10:0:3:0: [sdh] Unhandled error code > [14269.650675] sd 10:0:3:0: [sdh] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK > [14269.650680] end_request: I/O error, dev sdh, sector 1435085631 > [14269.650749] raid5:md0: read error not correctable (sector 1435085568 on sdh1). > [14269.650753] raid5: Disk failure on sdh1, disabling device. > [14269.650754] raid5: Operation continuing on 7 devices. > [14269.650886] raid5:md0: read error not correctable (sector 1435085576 on sdh1). > [14269.650890] raid5:md0: read error not correctable (sector 1435085584 on sdh1). > [14269.650894] raid5:md0: read error not correctable (sector 1435085592 on sdh1). > [14269.650898] raid5:md0: read error not correctable (sector 1435085600 on sdh1). > [14269.650902] raid5:md0: read error not correctable (sector 1435085608 on sdh1). > [14269.650905] raid5:md0: read error not correctable (sector 1435085616 on sdh1). > [14269.650909] raid5:md0: read error not correctable (sector 1435085624 on sdh1). > [14269.650913] raid5:md0: read error not correctable (sector 1435085632 on sdh1). > [14269.650917] raid5:md0: read error not correctable (sector 1435085640 on sdh1). 
> [14269.650943] sd 10:0:3:0: [sdh] Unhandled error code > [14269.650946] sd 10:0:3:0: [sdh] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK > [14269.650950] end_request: I/O error, dev sdh, sector 1435085887 > [14269.651049] sd 10:0:3:0: [sdh] Unhandled error code > [14269.651051] sd 10:0:3:0: [sdh] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK > [14269.651055] end_request: I/O error, dev sdh, sector 1435086143 > [14269.651151] sd 10:0:3:0: [sdh] Unhandled error code > [14269.651153] sd 10:0:3:0: [sdh] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK > [14269.651157] end_request: I/O error, dev sdh, sector 1435086399 > [14269.651253] sd 10:0:3:0: [sdh] Unhandled error code > [14269.651255] sd 10:0:3:0: [sdh] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK > [14269.651259] end_request: I/O error, dev sdh, sector 1435086655 > [14269.651358] sd 10:0:3:0: [sdh] Unhandled error code > [14269.651361] sd 10:0:3:0: [sdh] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK > [14269.651364] end_request: I/O error, dev sdh, sector 1435086911 > [14269.651461] sd 10:0:3:0: [sdh] Unhandled error code > [14269.651463] sd 10:0:3:0: [sdh] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK > [14269.651467] end_request: I/O error, dev sdh, sector 1435087167 > [14269.651565] sd 10:0:3:0: [sdh] Unhandled error code > [14269.651568] sd 10:0:3:0: [sdh] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK > [14269.651571] end_request: I/O error, dev sdh, sector 1435087423 > [14269.670675] end_request: I/O error, dev sdf, sector 1953519935 > [14269.670739] md: super_written gets error=-5, uptodate=0 > [14269.670743] raid5: Disk failure on sdf1, disabling device. > [14269.670745] raid5: Operation continuing on 6 devices. > [14269.672525] end_request: I/O error, dev sdg, sector 1953519935 > [14269.672598] md: super_written gets error=-5, uptodate=0 > [14269.672603] raid5: Disk failure on sdg1, disabling device. > [14269.672605] raid5: Operation continuing on 5 devices. > [14269.674402] end_request: I/O error, dev sde, sector 1953519935 > [14269.674474] md: super_written gets error=-5, uptodate=0 > [14269.674478] raid5: Disk failure on sde1, disabling device. > [14269.674480] raid5: Operation continuing on 4 devices. > [14269.769991] sd 11:0:0:0: [sdi] Sense Key : Recovered Error [current] [descriptor] > [14269.769997] Descriptor sense data with sense descriptors (in hex): > [14269.770000] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14269.770012] 00 4f 00 c2 00 50 > [14269.770018] sd 11:0:0:0: [sdi] Add. Sense: ATA pass through information available > [14269.800245] md: md0: recovery done. > [14269.869990] sd 11:0:0:0: [sdi] Sense Key : Recovered Error [current] [descriptor] > [14269.869997] Descriptor sense data with sense descriptors (in hex): > [14269.870008] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14269.870019] 00 4f 00 c2 00 50 > [14269.870025] sd 11:0:0:0: [sdi] Add. 
Sense: ATA pass through information available > [14269.905144] RAID5 conf printout: > [14269.905148] --- rd:9 wd:4 > [14269.905152] disk 0, o:0, dev:sde1 > [14269.905155] disk 1, o:0, dev:sdf1 > [14269.905157] disk 2, o:0, dev:sdg1 > [14269.905160] disk 3, o:0, dev:sdh1 > [14269.905162] disk 4, o:1, dev:sdi1 > [14269.905165] disk 5, o:1, dev:sdj1 > [14269.905167] disk 6, o:1, dev:sdk1 > [14269.905169] disk 7, o:1, dev:sdl1 > [14269.905172] disk 8, o:1, dev:sdm1 > [14269.941265] RAID5 conf printout: > [14269.941269] --- rd:9 wd:4 > [14269.941273] disk 0, o:0, dev:sde1 > [14269.941276] disk 1, o:0, dev:sdf1 > [14269.941278] disk 2, o:0, dev:sdg1 > [14269.941281] disk 3, o:0, dev:sdh1 > [14269.941283] disk 4, o:1, dev:sdi1 > [14269.941286] disk 5, o:1, dev:sdj1 > [14269.941289] disk 7, o:1, dev:sdl1 > [14269.941291] disk 8, o:1, dev:sdm1 > [14269.941300] RAID5 conf printout: > [14269.941302] --- rd:9 wd:4 > [14269.941304] disk 0, o:0, dev:sde1 > [14269.941307] disk 1, o:0, dev:sdf1 > [14269.941309] disk 2, o:0, dev:sdg1 > [14269.941311] disk 3, o:0, dev:sdh1 > [14269.941314] disk 4, o:1, dev:sdi1 > [14269.941316] disk 5, o:1, dev:sdj1 > [14269.941318] disk 7, o:1, dev:sdl1 > [14269.941321] disk 8, o:1, dev:sdm1 > [14269.981260] RAID5 conf printout: > [14269.981263] --- rd:9 wd:4 > [14269.981265] disk 0, o:0, dev:sde1 > [14269.981268] disk 2, o:0, dev:sdg1 > [14269.981270] disk 3, o:0, dev:sdh1 > [14269.981273] disk 4, o:1, dev:sdi1 > [14269.981275] disk 5, o:1, dev:sdj1 > [14269.981277] disk 7, o:1, dev:sdl1 > [14269.981280] disk 8, o:1, dev:sdm1 > [14269.981284] RAID5 conf printout: > [14269.981286] --- rd:9 wd:4 > [14269.981289] disk 0, o:0, dev:sde1 > [14269.981291] disk 2, o:0, dev:sdg1 > [14269.981293] disk 3, o:0, dev:sdh1 > [14269.981296] disk 4, o:1, dev:sdi1 > [14269.981298] disk 5, o:1, dev:sdj1 > [14269.981300] disk 7, o:1, dev:sdl1 > [14269.981302] disk 8, o:1, dev:sdm1 > [14270.003316] sd 11:0:0:0: [sdi] Sense Key : Recovered Error [current] [descriptor] > [14270.003324] Descriptor sense data with sense descriptors (in hex): > [14270.003327] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14270.003338] 00 4f 00 c2 00 50 > [14270.003344] sd 11:0:0:0: [sdi] Add. 
Sense: ATA pass through information available > [14270.021260] RAID5 conf printout: > [14270.021263] --- rd:9 wd:4 > [14270.021266] disk 0, o:0, dev:sde1 > [14270.021269] disk 3, o:0, dev:sdh1 > [14270.021271] disk 4, o:1, dev:sdi1 > [14270.021274] disk 5, o:1, dev:sdj1 > [14270.021276] disk 7, o:1, dev:sdl1 > [14270.021278] disk 8, o:1, dev:sdm1 > [14270.021283] RAID5 conf printout: > [14270.021285] --- rd:9 wd:4 > [14270.021287] disk 0, o:0, dev:sde1 > [14270.021289] disk 3, o:0, dev:sdh1 > [14270.021292] disk 4, o:1, dev:sdi1 > [14270.021294] disk 5, o:1, dev:sdj1 > [14270.021296] disk 7, o:1, dev:sdl1 > [14270.021298] disk 8, o:1, dev:sdm1 > [14270.061261] RAID5 conf printout: > [14270.061264] --- rd:9 wd:4 > [14270.061266] disk 0, o:0, dev:sde1 > [14270.061269] disk 4, o:1, dev:sdi1 > [14270.061272] disk 5, o:1, dev:sdj1 > [14270.061274] disk 7, o:1, dev:sdl1 > [14270.061276] disk 8, o:1, dev:sdm1 > [14270.061281] RAID5 conf printout: > [14270.061283] --- rd:9 wd:4 > [14270.061285] disk 0, o:0, dev:sde1 > [14270.061287] disk 4, o:1, dev:sdi1 > [14270.061289] disk 5, o:1, dev:sdj1 > [14270.061292] disk 7, o:1, dev:sdl1 > [14270.061294] disk 8, o:1, dev:sdm1 > [14270.061647] sd 11:0:0:0: [sdi] Sense Key : Recovered Error [current] [descriptor] > [14270.061653] Descriptor sense data with sense descriptors (in hex): > [14270.061656] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14270.061667] 00 4f 00 c2 00 50 > [14270.061672] sd 11:0:0:0: [sdi] Add. Sense: ATA pass through information available > [14270.091263] RAID5 conf printout: > [14270.091267] --- rd:9 wd:4 > [14270.091271] disk 4, o:1, dev:sdi1 > [14270.091274] disk 5, o:1, dev:sdj1 > [14270.091276] disk 7, o:1, dev:sdl1 > [14270.091279] disk 8, o:1, dev:sdm1 > [14270.153319] sd 11:0:0:0: [sdi] Sense Key : Recovered Error [current] [descriptor] > [14270.153325] Descriptor sense data with sense descriptors (in hex): > [14270.153328] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14270.153340] 00 4f 00 c2 00 50 > [14270.153346] sd 11:0:0:0: [sdi] Add. Sense: ATA pass through information available > [14270.211651] sd 11:0:0:0: [sdi] Sense Key : Recovered Error [current] [descriptor] > [14270.211657] Descriptor sense data with sense descriptors (in hex): > [14270.211660] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14270.211671] 00 4f 00 c2 00 50 > [14270.211677] sd 11:0:0:0: [sdi] Add. Sense: ATA pass through information available > [14270.324057] sd 11:0:1:0: [sdj] Sense Key : Recovered Error [current] [descriptor] > [14270.324065] Descriptor sense data with sense descriptors (in hex): > [14270.324067] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14270.324079] 00 4f 00 c2 00 50 > [14270.324085] sd 11:0:1:0: [sdj] Add. Sense: ATA pass through information available > [14270.382390] sd 11:0:1:0: [sdj] Sense Key : Recovered Error [current] [descriptor] > [14270.382396] Descriptor sense data with sense descriptors (in hex): > [14270.382399] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14270.382410] 00 4f 00 c2 00 50 > [14270.382416] sd 11:0:1:0: [sdj] Add. Sense: ATA pass through information available > [14270.474060] sd 11:0:1:0: [sdj] Sense Key : Recovered Error [current] [descriptor] > [14270.474068] Descriptor sense data with sense descriptors (in hex): > [14270.474071] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14270.474083] 00 4f 00 c2 00 50 > [14270.474089] sd 11:0:1:0: [sdj] Add. 
Sense: ATA pass through information available > [14270.532394] sd 11:0:1:0: [sdj] Sense Key : Recovered Error [current] [descriptor] > [14270.532401] Descriptor sense data with sense descriptors (in hex): > [14270.532404] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14270.532415] 00 4f 00 c2 00 50 > [14270.532421] sd 11:0:1:0: [sdj] Add. Sense: ATA pass through information available > [14270.632394] sd 11:0:1:0: [sdj] Sense Key : Recovered Error [current] [descriptor] > [14270.632402] Descriptor sense data with sense descriptors (in hex): > [14270.632405] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14270.632417] 00 4f 00 c2 00 50 > [14270.632423] sd 11:0:1:0: [sdj] Add. Sense: ATA pass through information available > [14270.690729] sd 11:0:1:0: [sdj] Sense Key : Recovered Error [current] [descriptor] > [14270.690736] Descriptor sense data with sense descriptors (in hex): > [14270.690739] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14270.690751] 00 4f 00 c2 00 50 > [14270.690757] sd 11:0:1:0: [sdj] Add. Sense: ATA pass through information available > [14270.804065] sd 11:0:2:0: [sdk] Sense Key : Recovered Error [current] [descriptor] > [14270.804073] Descriptor sense data with sense descriptors (in hex): > [14270.804076] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14270.804088] 00 4f 00 c2 00 50 > [14270.804094] sd 11:0:2:0: [sdk] Add. Sense: ATA pass through information available > [14270.862400] sd 11:0:2:0: [sdk] Sense Key : Recovered Error [current] [descriptor] > [14270.862406] Descriptor sense data with sense descriptors (in hex): > [14270.862409] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14270.862420] 00 4f 00 c2 00 50 > [14270.862426] sd 11:0:2:0: [sdk] Add. Sense: ATA pass through information available > [14270.954070] sd 11:0:2:0: [sdk] Sense Key : Recovered Error [current] [descriptor] > [14270.954079] Descriptor sense data with sense descriptors (in hex): > [14270.954081] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14270.954093] 00 4f 00 c2 00 50 > [14270.954099] sd 11:0:2:0: [sdk] Add. Sense: ATA pass through information available > [14271.012399] sd 11:0:2:0: [sdk] Sense Key : Recovered Error [current] [descriptor] > [14271.012406] Descriptor sense data with sense descriptors (in hex): > [14271.012408] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14271.012420] 00 4f 00 c2 00 50 > [14271.012426] sd 11:0:2:0: [sdk] Add. Sense: ATA pass through information available > [14271.104072] sd 11:0:2:0: [sdk] Sense Key : Recovered Error [current] [descriptor] > [14271.104080] Descriptor sense data with sense descriptors (in hex): > [14271.104083] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14271.104094] 00 4f 00 c2 00 50 > [14271.104100] sd 11:0:2:0: [sdk] Add. Sense: ATA pass through information available > [14271.162400] sd 11:0:2:0: [sdk] Sense Key : Recovered Error [current] [descriptor] > [14271.162407] Descriptor sense data with sense descriptors (in hex): > [14271.162410] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14271.162422] 00 4f 00 c2 00 50 > [14271.162428] sd 11:0:2:0: [sdk] Add. Sense: ATA pass through information available > [14271.278147] sd 11:0:3:0: [sdl] Sense Key : Recovered Error [current] [descriptor] > [14271.278155] Descriptor sense data with sense descriptors (in hex): > [14271.278157] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14271.278169] 00 4f 00 c2 00 50 > [14271.278175] sd 11:0:3:0: [sdl] Add. 
Sense: ATA pass through information available > [14271.336487] sd 11:0:3:0: [sdl] Sense Key : Recovered Error [current] [descriptor] > [14271.336495] Descriptor sense data with sense descriptors (in hex): > [14271.336498] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14271.336509] 00 4f 00 c2 00 50 > [14271.336515] sd 11:0:3:0: [sdl] Add. Sense: ATA pass through information available > [14271.428148] sd 11:0:3:0: [sdl] Sense Key : Recovered Error [current] [descriptor] > [14271.428156] Descriptor sense data with sense descriptors (in hex): > [14271.428158] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14271.428170] 00 4f 00 c2 00 50 > [14271.428176] sd 11:0:3:0: [sdl] Add. Sense: ATA pass through information available > [14271.486485] sd 11:0:3:0: [sdl] Sense Key : Recovered Error [current] [descriptor] > [14271.486493] Descriptor sense data with sense descriptors (in hex): > [14271.486496] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14271.486508] 00 4f 00 c2 00 50 > [14271.486514] sd 11:0:3:0: [sdl] Add. Sense: ATA pass through information available > [14271.586482] sd 11:0:3:0: [sdl] Sense Key : Recovered Error [current] [descriptor] > [14271.586489] Descriptor sense data with sense descriptors (in hex): > [14271.586492] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14271.586503] 00 4f 00 c2 00 50 > [14271.586509] sd 11:0:3:0: [sdl] Add. Sense: ATA pass through information available > [14271.644813] sd 11:0:3:0: [sdl] Sense Key : Recovered Error [current] [descriptor] > [14271.644819] Descriptor sense data with sense descriptors (in hex): > [14271.644822] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14271.644833] 00 4f 00 c2 00 50 > [14271.644839] sd 11:0:3:0: [sdl] Add. Sense: ATA pass through information available > [14271.762812] sd 11:0:4:0: [sdm] Sense Key : Recovered Error [current] [descriptor] > [14271.762820] Descriptor sense data with sense descriptors (in hex): > [14271.762823] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14271.762834] 00 4f 00 c2 00 50 > [14271.762841] sd 11:0:4:0: [sdm] Add. Sense: ATA pass through information available > [14271.821145] sd 11:0:4:0: [sdm] Sense Key : Recovered Error [current] [descriptor] > [14271.821152] Descriptor sense data with sense descriptors (in hex): > [14271.821154] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14271.821166] 00 4f 00 c2 00 50 > [14271.821172] sd 11:0:4:0: [sdm] Add. Sense: ATA pass through information available > [14271.912816] sd 11:0:4:0: [sdm] Sense Key : Recovered Error [current] [descriptor] > [14271.912824] Descriptor sense data with sense descriptors (in hex): > [14271.912827] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14271.912838] 00 4f 00 c2 00 50 > [14271.912844] sd 11:0:4:0: [sdm] Add. Sense: ATA pass through information available > [14271.971152] sd 11:0:4:0: [sdm] Sense Key : Recovered Error [current] [descriptor] > [14271.971161] Descriptor sense data with sense descriptors (in hex): > [14271.971163] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14271.971175] 00 4f 00 c2 00 50 > [14271.971181] sd 11:0:4:0: [sdm] Add. Sense: ATA pass through information available > [14272.071150] sd 11:0:4:0: [sdm] Sense Key : Recovered Error [current] [descriptor] > [14272.071157] Descriptor sense data with sense descriptors (in hex): > [14272.071160] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14272.071172] 00 4f 00 c2 00 50 > [14272.071178] sd 11:0:4:0: [sdm] Add. 
Sense: ATA pass through information available > [14272.129485] sd 11:0:4:0: [sdm] Sense Key : Recovered Error [current] [descriptor] > [14272.129494] Descriptor sense data with sense descriptors (in hex): > [14272.129497] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 > [14272.129508] 00 4f 00 c2 00 50 > [14272.129514] sd 11:0:4:0: [sdm] Add. Sense: ATA pass through information available > [14365.066847] Aborting journal on device md0:8. > [14365.066946] __ratelimit: 246 callbacks suppressed > [14365.066949] Buffer I/O error on device md0, logical block 854622208 > [14365.067018] lost page write due to I/O error on md0 > [14365.067023] JBD2: I/O error detected when updating journal superblock for md0:8. > [14382.768622] EXT4-fs error (device md0): ext4_find_entry: reading directory #6879966 offset 0 > [14382.820264] Buffer I/O error on device md0, logical block 0 > [14382.820332] lost page write due to I/O error on md0 > [14401.997859] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6891861, block=27267765 > [14401.998043] EXT4-fs (md0): previous I/O error to superblock detected > [14402.041639] Buffer I/O error on device md0, logical block 0 > [14402.041708] lost page write due to I/O error on md0 > [14402.042025] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6892055, block=27267777 > [14402.042189] EXT4-fs (md0): previous I/O error to superblock detected > [14402.042337] Buffer I/O error on device md0, logical block 0 > [14402.042404] lost page write due to I/O error on md0 > [14402.042615] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6891691, block=27267754 > [14402.042780] EXT4-fs (md0): previous I/O error to superblock detected > [14402.042927] Buffer I/O error on device md0, logical block 0 > [14402.042994] lost page write due to I/O error on md0 > [14402.043204] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6891589, block=27267748 > [14402.043369] EXT4-fs (md0): previous I/O error to superblock detected > [14402.043514] Buffer I/O error on device md0, logical block 0 > [14402.043581] lost page write due to I/O error on md0 > [14402.045186] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6892719, block=27267818 > [14402.045351] EXT4-fs (md0): previous I/O error to superblock detected > [14402.045500] Buffer I/O error on device md0, logical block 0 > [14402.045569] lost page write due to I/O error on md0 > [14402.061829] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6891914, block=27267768 > [14402.061983] EXT4-fs (md0): previous I/O error to superblock detected > [14402.062117] Buffer I/O error on device md0, logical block 0 > [14402.062175] lost page write due to I/O error on md0 > [14402.062495] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6891136, block=27267719 > [14402.062651] EXT4-fs (md0): previous I/O error to superblock detected > [14402.062793] Buffer I/O error on device md0, logical block 0 > [14402.062859] lost page write due to I/O error on md0 > [14402.063053] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6893036, block=27267838 > [14402.063217] EXT4-fs (md0): previous I/O error to superblock detected > [14402.063357] Buffer I/O error on device md0, logical block 0 > [14402.063423] lost page write due to I/O error on md0 > [14402.063624] EXT4-fs error (device md0): 
__ext4_get_inode_loc: unable to read inode block - inode=6892357, block=27267796 > [14402.063793] EXT4-fs (md0): previous I/O error to superblock detected > [14402.063935] Buffer I/O error on device md0, logical block 0 > [14402.064001] lost page write due to I/O error on md0 > [14402.064193] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6891032, block=27267713 > [14402.064355] EXT4-fs (md0): previous I/O error to superblock detected > [14402.064496] Buffer I/O error on device md0, logical block 0 > [14402.064561] lost page write due to I/O error on md0 > [14402.064741] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6891782, block=27267760 > [14402.064906] EXT4-fs (md0): previous I/O error to superblock detected > [14402.065232] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6892332, block=27267794 > [14402.065395] EXT4-fs (md0): previous I/O error to superblock detected > [14402.065714] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6892755, block=27267821 > [14402.065878] EXT4-fs (md0): previous I/O error to superblock detected > [14402.066197] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6892235, block=27267788 > [14402.066362] EXT4-fs (md0): previous I/O error to superblock detected > [14402.066675] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6892552, block=27267808 > [14402.066840] EXT4-fs (md0): previous I/O error to superblock detected > [14402.067156] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6892123, block=27267781 > [14402.067321] EXT4-fs (md0): previous I/O error to superblock detected > [14402.067635] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6892256, block=27267789 > [14402.067800] EXT4-fs (md0): previous I/O error to superblock detected > [14402.068114] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6892532, block=27267807 > [14402.068278] EXT4-fs (md0): previous I/O error to superblock detected > [14402.068594] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6892318, block=27267793 > [14402.068758] EXT4-fs (md0): previous I/O error to superblock detected > [14402.069069] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6892845, block=27267826 > [14402.069233] EXT4-fs (md0): previous I/O error to superblock detected > [14402.069543] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6892980, block=27267835 > [14402.069707] EXT4-fs (md0): previous I/O error to superblock detected > [14402.074971] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6890993, block=27267711 > [14402.075140] EXT4-fs (md0): previous I/O error to superblock detected > [14402.075540] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6891750, block=27267758 > [14402.075686] EXT4-fs (md0): previous I/O error to superblock detected > [14402.076028] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6891642, block=27267751 > [14402.076174] EXT4-fs (md0): previous I/O error to superblock detected > [14402.076543] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6892605, block=27267811 > [14402.076689] EXT4-fs 
(md0): previous I/O error to superblock detected > [14402.077059] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6892136, block=27267782 > [14402.077223] EXT4-fs (md0): previous I/O error to superblock detected > [14402.077567] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6892717, block=27267818 > [14402.077732] EXT4-fs (md0): previous I/O error to superblock detected > [14402.078080] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6890999, block=27267711 > [14402.078243] EXT4-fs (md0): previous I/O error to superblock detected > [14402.078593] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6892362, block=27267796 > [14402.080842] EXT4-fs (md0): previous I/O error to superblock detected > [14402.083259] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6892867, block=27267828 > [14402.083423] EXT4-fs (md0): previous I/O error to superblock detected > [14402.083798] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6891361, block=27267734 > [14402.083963] EXT4-fs (md0): previous I/O error to superblock detected > [14402.084315] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6892012, block=27267774 > [14402.084480] EXT4-fs (md0): previous I/O error to superblock detected > [14402.084852] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6891626, block=27267750 > [14402.085014] EXT4-fs (md0): previous I/O error to superblock detected > [14402.085365] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6891320, block=27267731 > [14402.085530] EXT4-fs (md0): previous I/O error to superblock detected > [14402.085880] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6891588, block=27267748 > [14402.086044] EXT4-fs (md0): previous I/O error to superblock detected > [14402.086390] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6892584, block=27267810 > [14402.086556] EXT4-fs (md0): previous I/O error to superblock detected > [14402.086901] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6892891, block=27267829 > [14402.087066] EXT4-fs (md0): previous I/O error to superblock detected > [14402.087416] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6891118, block=27267718 > [14402.087579] EXT4-fs (md0): previous I/O error to superblock detected > [14402.087930] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6891559, block=27267746 > [14402.088094] EXT4-fs (md0): previous I/O error to superblock detected > [14402.088445] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6891212, block=27267724 > [14402.088609] EXT4-fs (md0): previous I/O error to superblock detected > [14402.091550] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6890993, block=27267711 > [14402.091718] EXT4-fs (md0): previous I/O error to superblock detected > [14402.106045] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6890999, block=27267711 > [14402.106212] EXT4-fs (md0): previous I/O error to superblock detected > [14402.141662] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - 
inode=6889579, block=27267622 > [14402.141829] EXT4-fs (md0): previous I/O error to superblock detected > [14402.142185] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6889980, block=27267647 > [14402.142350] EXT4-fs (md0): previous I/O error to superblock detected > [14402.142703] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6889704, block=27267630 > [14402.142868] EXT4-fs (md0): previous I/O error to superblock detected > [14402.143318] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6889340, block=27267607 > [14402.143483] EXT4-fs (md0): previous I/O error to superblock detected > [14402.143826] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6889563, block=27267621 > [14402.143990] EXT4-fs (md0): previous I/O error to superblock detected > [14402.144341] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6889220, block=27267600 > [14402.144506] EXT4-fs (md0): previous I/O error to superblock detected > [14402.144869] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6889358, block=27267608 > [14402.145034] EXT4-fs (md0): previous I/O error to superblock detected > [14402.145379] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6889530, block=27267619 > [14402.145542] EXT4-fs (md0): previous I/O error to superblock detected > [14402.145890] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6889721, block=27267631 > [14402.146054] EXT4-fs (md0): previous I/O error to superblock detected > [14402.146398] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6889621, block=27267625 > [14402.146562] EXT4-fs (md0): previous I/O error to superblock detected > [14402.146900] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6889920, block=27267643 > [14402.147047] EXT4-fs (md0): previous I/O error to superblock detected > [14402.147390] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6889508, block=27267618 > [14402.147536] EXT4-fs (md0): previous I/O error to superblock detected > [14402.147869] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6889673, block=27267628 > [14402.148015] EXT4-fs (md0): previous I/O error to superblock detected > [14402.153911] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6889220, block=27267600 > [14402.154075] EXT4-fs (md0): previous I/O error to superblock detected > [14402.155819] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6889340, block=27267607 > [14402.155987] EXT4-fs (md0): previous I/O error to superblock detected > [14402.261374] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6876431, block=27266800 > [14402.261522] EXT4-fs (md0): previous I/O error to superblock detected > [14402.261981] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6875468, block=27266740 > [14402.262128] EXT4-fs (md0): previous I/O error to superblock detected > [14402.262587] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6878644, block=27266939 > [14402.262753] EXT4-fs (md0): previous I/O error to superblock detected > [14402.263223] EXT4-fs error (device md0): 
__ext4_get_inode_loc: unable to read inode block - inode=6875990, block=27266773 > [14402.263388] EXT4-fs (md0): previous I/O error to superblock detected > [14402.263741] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6875325, block=27266731 > [14402.263908] EXT4-fs (md0): previous I/O error to superblock detected > [14402.264259] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6875779, block=27266760 > [14402.264424] EXT4-fs (md0): previous I/O error to superblock detected > [14402.264808] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6879881, block=27267016 > [14402.264972] EXT4-fs (md0): previous I/O error to superblock detected > [14402.265325] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6878489, block=27266929 > [14402.265491] EXT4-fs (md0): previous I/O error to superblock detected > [14402.265842] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6878591, block=27266935 > [14402.266005] EXT4-fs (md0): previous I/O error to superblock detected > [14402.266357] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6876138, block=27266782 > [14402.266520] EXT4-fs (md0): previous I/O error to superblock detected > [14402.266876] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6875310, block=27266730 > [14402.267042] EXT4-fs (md0): previous I/O error to superblock detected > [14402.267396] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6876274, block=27266791 > [14402.267560] EXT4-fs (md0): previous I/O error to superblock detected > [14402.267907] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6875648, block=27266751 > [14402.268071] EXT4-fs (md0): previous I/O error to superblock detected > [14402.268422] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6876119, block=27266781 > [14402.268586] EXT4-fs (md0): previous I/O error to superblock detected > [14402.269056] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6877808, block=27266886 > [14402.269219] EXT4-fs (md0): previous I/O error to superblock detected > [14402.269573] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6878101, block=27266905 > [14402.269738] EXT4-fs (md0): previous I/O error to superblock detected > [14402.270088] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6877967, block=27266896 > [14402.270264] EXT4-fs (md0): previous I/O error to superblock detected > [14402.270614] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6877835, block=27266888 > [14402.270793] EXT4-fs (md0): previous I/O error to superblock detected > [14402.271146] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6875370, block=27266734 > [14402.271323] EXT4-fs (md0): previous I/O error to superblock detected > [14402.271679] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6877955, block=27266896 > [14402.271854] EXT4-fs (md0): previous I/O error to superblock detected > [14402.272214] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6876218, block=27266787 > [14402.272391] EXT4-fs (md0): previous I/O error to superblock detected 
> [14402.272745] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6875340, block=27266732 > [14402.272922] EXT4-fs (md0): previous I/O error to superblock detected > [14402.273281] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6878031, block=27266900 > [14402.273452] EXT4-fs (md0): previous I/O error to superblock detected > [14402.273919] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6874892, block=27266704 > [14402.274097] EXT4-fs (md0): previous I/O error to superblock detected > [14402.274454] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6878014, block=27266899 > [14402.274628] EXT4-fs (md0): previous I/O error to superblock detected > [14402.274987] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6876066, block=27266778 > [14402.275146] EXT4-fs (md0): previous I/O error to superblock detected > [14402.275488] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6879778, block=27267010 > [14402.275646] EXT4-fs (md0): previous I/O error to superblock detected > [14402.275996] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6878310, block=27266918 > [14402.276151] EXT4-fs (md0): previous I/O error to superblock detected > [14402.276624] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6877450, block=27266864 > [14402.276793] EXT4-fs (md0): previous I/O error to superblock detected > [14402.277148] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6875397, block=27266736 > [14402.277315] EXT4-fs (md0): previous I/O error to superblock detected > [14402.277778] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6877004, block=27266836 > [14402.277943] EXT4-fs (md0): previous I/O error to superblock detected > [14402.278295] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6875606, block=27266749 > [14402.280543] EXT4-fs (md0): previous I/O error to superblock detected > [14402.283306] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6874892, block=27266704 > [14402.283472] EXT4-fs (md0): previous I/O error to superblock detected > [14402.285354] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6875310, block=27266730 > [14402.285519] EXT4-fs (md0): previous I/O error to superblock detected > [14402.302533] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6874640, block=27266688 > [14402.302698] EXT4-fs (md0): previous I/O error to superblock detected > [14402.304480] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6874640, block=27266688 > [14402.304629] EXT4-fs (md0): previous I/O error to superblock detected > [14402.752437] EXT4-fs error (device md0): ext4_journal_start_sb: Detected aborted journal > [14402.752606] EXT4-fs (md0): Remounting filesystem read-only > [14419.267133] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6874640, block=27266688 > [14419.297937] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6874892, block=27266704 > [14419.301517] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6875310, block=27266730 > [14419.332861] 
EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6889220, block=27267600 > [14419.335590] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6889340, block=27267607 > [14419.341744] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6890993, block=27267711 > [14419.343458] EXT4-fs error (device md0): __ext4_get_inode_loc: unable to read inode block - inode=6890999, block=27267711 Richard Scobie wrote: > Andrew Dunn wrote: > >> Do you think that the controller is dropping out? I know that I have 4 >> drives on one controller (AOC-USAS-L8i) and 5 drives on the other >> controller (SAME make/model). but I think they are sequentially >> connected... as in sd[efghi] should be on one device and sd[jklm] should >> be on the other... any easy way to verify? > > If you are running smartd, cease doing so and do not use the smartctl > command on drives attached to these controllers - use causes drives to > be offlined. > > It appears the smartctl is broken with LSISAS 1068E based controllers. > > See: > > https://bugzilla.redhat.com/show_bug.cgi?id=452389 > > and > > http://marc.info/?l=linux-scsi&m=125673590221135&w=2 > > Regards, > > Richard > -- Andrew Dunn http://agdunn.net ^ permalink raw reply [flat|nested] 23+ messages in thread
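Before forcing an assembly after a multi-drive drop like this, it is worth comparing what each member's superblock recorded, since the kicked drives will be the ones with stale event counts and older update times (a sketch, using the same member list as above):

for d in /dev/sd[e-m]1; do
    echo "== $d"
    sudo mdadm --examine "$d" | grep -E 'Update Time|State :|Events'
done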
* Re: RAID 6 Failure follow up 2009-11-08 18:22 ` Andrew Dunn @ 2009-11-08 18:34 ` Joe Landman 0 siblings, 0 replies; 23+ messages in thread From: Joe Landman @ 2009-11-08 18:34 UTC (permalink / raw) To: Andrew Dunn; +Cc: Richard Scobie, Roger Heflin, robin, linux-raid list, nfbrown Andrew Dunn wrote: > New data now, I got this from dmesg when it went down again. Hopefully > there is some significance to you guys. > >> [14269.650381] sd 10:0:3:0: rejecting I/O to offline device >> [14269.650453] sd 10:0:3:0: rejecting I/O to offline device >> [14269.650524] sd 10:0:3:0: rejecting I/O to offline device >> [14269.650595] sd 10:0:3:0: rejecting I/O to offline device >> [14269.650672] sd 10:0:3:0: [sdh] Unhandled error code >> [14269.650675] sd 10:0:3:0: [sdh] Result: hostbyte=DID_NO_CONNECT > driverbyte=DRIVER_OK >> [14269.650680] end_request: I/O error, dev sdh, sector 1435085631 >> [14269.650749] raid5:md0: read error not correctable (sector > 1435085568 on sdh1). >> [14269.650753] raid5: Disk failure on sdh1, disabling device. >> [14269.650754] raid5: Operation continuing on 7 devices. >> [14269.650886] raid5:md0: read error not correctable (sector > 1435085576 on sdh1). >> [14269.650890] raid5:md0: read error not correctable (sector > 1435085584 on sdh1). >> [14269.650894] raid5:md0: read error not correctable (sector > 1435085592 on sdh1). [...] I am not convinced this is a drive failure (yet). You have sdh,sdi,sdj,sdk,sdl,sdm all reporting errors or error recovery. This sounds like a physical backplane failure (is this on an expander system? we have seen this/had this happen before), a cable to the SATA card failing (we have seen this/had this happen before), or a power supply issue (can't handle all the drives in constant operation, which we have seen before as well). Driver issues are possible, but it is pursuing normal failure code paths, so unless the driver is tickling the remove code on its own ... Smart could be offlining the drive, and having it non-responsive. Something else could be doing that as well (vibration, power quality, ...) What does hdparm -I /dev/sdh tell us? If nothing, we need to use sdparm to get some information. Joe -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics, Inc. email: landman@scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/jackrabbit phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 ^ permalink raw reply [flat|nested] 23+ messages in thread
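For reference, the checks Joe suggests might look like this (a sketch; sdh is the drive from the log above, and sdparm is assumed to be installed):

sudo hdparm -I /dev/sdh
# if hdparm gets nothing useful back through the SAS HBA, fall back to sdparm:
sudo sdparm --inquiry /dev/sdh
sudo sdparm --all /dev/sdh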
* Re: RAID 6 Failure follow up 2009-11-08 18:01 ` Richard Scobie 2009-11-08 18:22 ` Andrew Dunn @ 2009-11-08 22:09 ` Andrew Dunn 2009-11-08 22:59 ` Richard Scobie 1 sibling, 1 reply; 23+ messages in thread From: Andrew Dunn @ 2009-11-08 22:09 UTC (permalink / raw) To: Richard Scobie; +Cc: Roger Heflin, robin, linux-raid list I am not, but this is quite interesting. What versions are affected? I am in ubuntu 9.10. Richard Scobie wrote: > Andrew Dunn wrote: > >> Do you think that the controller is dropping out? I know that I have 4 >> drives on one controller (AOC-USAS-L8i) and 5 drives on the other >> controller (SAME make/model). but I think they are sequentially >> connected... as in sd[efghi] should be on one device and sd[jklm] should >> be on the other... any easy way to verify? > > If you are running smartd, cease doing so and do not use the smartctl > command on drives attached to these controllers - use causes drives to > be offlined. > > It appears the smartctl is broken with LSISAS 1068E based controllers. > > See: > > https://bugzilla.redhat.com/show_bug.cgi?id=452389 > > and > > http://marc.info/?l=linux-scsi&m=125673590221135&w=2 > > Regards, > > Richard > -- Andrew Dunn http://agdunn.net ^ permalink raw reply [flat|nested] 23+ messages in thread
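A quick way to report the exact versions in play when comparing notes (a sketch for Ubuntu 9.10; package and driver names assumed):

dpkg -l smartmontools | grep '^ii'
uname -r
lspci | grep -i 'SAS1068'
dmesg | grep -i mptsas | head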
* Re: RAID 6 Failure follow up 2009-11-08 22:09 ` Andrew Dunn @ 2009-11-08 22:59 ` Richard Scobie 2009-11-09 2:45 ` Ryan Wagoner 0 siblings, 1 reply; 23+ messages in thread From: Richard Scobie @ 2009-11-08 22:59 UTC (permalink / raw) To: Andrew Dunn, Linux RAID Mailing List Andrew Dunn wrote: > I am not, but this is quite interesting. What versions are affected? I > am in ubuntu 9.10. To my knowledge, there is no smartmontools version safe for use on these LSI based controllers. You may run smartctl commands a few times and get away with it, but eventually it will bite you. Losing 14 drives in a 16 drive array as a result is no fun... Hopefully it will be fixed one day. Regards, Richard ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: RAID 6 Failure follow up 2009-11-08 22:59 ` Richard Scobie @ 2009-11-09 2:45 ` Ryan Wagoner 2009-11-09 2:57 ` Richard Scobie 2009-11-09 8:09 ` Gabor Gombas 0 siblings, 2 replies; 23+ messages in thread From: Ryan Wagoner @ 2009-11-09 2:45 UTC (permalink / raw) To: Richard Scobie; +Cc: Andrew Dunn, Linux RAID Mailing List This is interesting to hear as I have been using smartmontools on my Supermicro LSI 1068E controller with the target firmware for 2 years now on CentOS 5. I have 3 RAID 1 arrays across 2 drives, a RAID 5 array across 3 drives, and a RAID 0 across 2 drives. I routinely query smartctl with something like for i in a b c d e f; do smartctl -a /dev/sd$i | grep Reallocated; done or for i in a b c d e f; do smartctl -a /dev/sd$i | grep Temperature; done Here are the system details: cat /proc/mdstat Personalities : [raid1] [raid6] [raid5] [raid4] [raid0] md0 : active raid1 sdb1[1] sda1[0] 104320 blocks [2/2] [UU] md2 : active raid1 sdb3[1] sda3[0] 2032128 blocks [2/2] [UU] md3 : active raid0 sdd1[1] sdc1[0] 625137152 blocks 64k chunks md4 : active raid5 sdg1[2] sdf1[1] sde1[0] 1953519872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU] md1 : active raid1 sdb2[1] sda2[0] 154151616 blocks [2/2] [UU] lspci 02:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08) modprobe.conf alias scsi_hostadapter mptbase alias scsi_hostadapter1 mptsas rpm -qa | grep smartmontools smartmontools-5.38-2.el5 uname -r 2.6.18-128.4.1.el5 On Sun, Nov 8, 2009 at 5:59 PM, Richard Scobie <richard@sauce.co.nz> wrote: > Andrew Dunn wrote: >> >> I am not, but this is quite interesting. What versions are affected? I >> am in ubuntu 9.10. > > To my knowledge, there is no smartmontools version safe for use on these LSI > based controllers. > > You may run smartctl commands a few times and get away with it, but > eventually it will bite you. > > Losing 14 drives in a 16 drive array as a result is no fun... > > Hopefully it will be fixed one day. > > Regards, > > Richard > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: RAID 6 Failure follow up 2009-11-09 2:45 ` Ryan Wagoner @ 2009-11-09 2:57 ` Richard Scobie 2009-11-09 8:09 ` Gabor Gombas 1 sibling, 0 replies; 23+ messages in thread From: Richard Scobie @ 2009-11-09 2:57 UTC (permalink / raw) To: Ryan Wagoner; +Cc: Andrew Dunn, Linux RAID Mailing List Ryan Wagoner wrote: > This is interesting to hear as I have been using smartmontools on my > Supermicro LSI 1068E controller with the target firmware for 2 years > now on CentOS 5. I have 3 RAID 1 arrays across 2 drives, a RAID 5 > drive across 3 drives, and a RAID 0 across 2 drives. I have 3 boxes using 1068E controllers attached to 16-drive, port-expander-based chassis that have been built over the last 2.5 years, and they all react badly. In fact, the latest one, put together a month ago (which is using more recent controller IT firmware and kernel than the other two), will not tolerate a single smartctl command, where the other two will tolerate it maybe 50% of the time. Something is not right here, and others running different drive setups - direct-attached and port-multiplier-based - are seeing the same thing. Suffice it to say, I would recommend heavy testing before putting into production, and I personally have no confidence currently. Regards, Richard ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: RAID 6 Failure follow up 2009-11-09 2:45 ` Ryan Wagoner 2009-11-09 2:57 ` Richard Scobie @ 2009-11-09 8:09 ` Gabor Gombas 2009-11-09 10:08 ` Andrew Dunn 1 sibling, 1 reply; 23+ messages in thread From: Gabor Gombas @ 2009-11-09 8:09 UTC (permalink / raw) To: Ryan Wagoner; +Cc: Richard Scobie, Andrew Dunn, Linux RAID Mailing List On Sun, Nov 08, 2009 at 09:45:40PM -0500, Ryan Wagoner wrote: > This is interesting to hear as I have been using smartmontools on my > Supermicro LSI 1068E controller with the target firmware for 2 years > now on CentOS 5. I have 3 RAID 1 arrays across 2 drives, a RAID 5 > drive across 3 drives, and a RAID 0 across 2 drives. [...] > uname -r > 2.6.18-128.4.1.el5 Kernel version matters. With 2.6.22 we only got occasional complaints that the drives were not capable of SMART checks; those complaints were untrue but otherwise harmless. With 2.6.26 and 2.6.30, the controller offlines the disks. Gabor -- --------------------------------------------------------- MTA SZTAKI Computer and Automation Research Institute Hungarian Academy of Sciences --------------------------------------------------------- ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: RAID 6 Failure follow up 2009-11-09 8:09 ` Gabor Gombas @ 2009-11-09 10:08 ` Andrew Dunn 2009-11-09 11:34 ` Gabor Gombas 0 siblings, 1 reply; 23+ messages in thread From: Andrew Dunn @ 2009-11-09 10:08 UTC (permalink / raw) To: Gabor Gombas; +Cc: Ryan Wagoner, Richard Scobie, Linux RAID Mailing List Does it momentarily offline the disks, i.e. do they re-appear in /dev within moments? That would be similar behavior to what I am experiencing: the disks drop from the array, but they are in /dev by the time I get a chance to see them. I am, however, not running smartd to my knowledge; smartmontools is installed and I access it through the webmin module, but checking the drives with that and the array failures have not happened at the same time. Gabor Gombas wrote: > On Sun, Nov 08, 2009 at 09:45:40PM -0500, Ryan Wagoner wrote: > > >> This is interesting to hear as I have been using smartmontools on my >> Supermicro LSI 1068E controller with the target firmware for 2 years >> now on CentOS 5. I have 3 RAID 1 arrays across 2 drives, a RAID 5 >> drive across 3 drives, and a RAID 0 across 2 drives. >> > > [...] > > >> uname -r >> 2.6.18-128.4.1.el5 >> > > Kernel version matters. With 2.6.22 we only got occassional complaints > that the drives are not capable of SMART checks that were not true but > were otherwise harmless. With 2.6.26 and 2.6.30, the controller offlines > the disks. > > Gabor > > -- Andrew Dunn http://agdunn.net ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: RAID 6 Failure follow up 2009-11-09 10:08 ` Andrew Dunn @ 2009-11-09 11:34 ` Gabor Gombas 2009-11-09 22:04 ` Andrew Dunn 2009-11-10 10:55 ` Andrew Dunn 0 siblings, 2 replies; 23+ messages in thread From: Gabor Gombas @ 2009-11-09 11:34 UTC (permalink / raw) To: Andrew Dunn; +Cc: Ryan Wagoner, Richard Scobie, Linux RAID Mailing List On Mon, Nov 09, 2009 at 05:08:23AM -0500, Andrew Dunn wrote: > does it momentarily offline the disks? like they re-appear in /dev > within moments? That would be similar behavior to what I am > experiencing, the disks drop from the array, but they are in /dev by the > time I get a chance to see them. No, either the disks need to be physically removed and re-inserted, or the machine needs to be rebooted. Gabor -- --------------------------------------------------------- MTA SZTAKI Computer and Automation Research Institute Hungarian Academy of Sciences --------------------------------------------------------- ^ permalink raw reply [flat|nested] 23+ messages in thread
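A possible software stand-in for the physical pull/re-insert Gabor describes is to delete the offlined SCSI device node and rescan the HBA through sysfs; whether the mptsas firmware actually brings the disk back this way is not guaranteed, and the device and host numbers below are illustrative only:

    # drop the stuck disk's SCSI device node (example: sdi behind SCSI host 11)
    echo 1 > /sys/block/sdi/device/delete
    # ask the controller to rescan all channels, targets and LUNs
    echo "- - -" > /sys/class/scsi_host/host11/scan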
* Re: RAID 6 Failure follow up 2009-11-09 11:34 ` Gabor Gombas @ 2009-11-09 22:04 ` Andrew Dunn 2009-11-10 10:55 ` Andrew Dunn 1 sibling, 0 replies; 23+ messages in thread From: Andrew Dunn @ 2009-11-09 22:04 UTC (permalink / raw) To: Gabor Gombas; +Cc: Ryan Wagoner, Richard Scobie, Linux RAID Mailing List I am not experiencing this issue then. My devices are in /dev after the raid drop out. I can use smart scanning on them without issue also. Gabor Gombas wrote: > On Mon, Nov 09, 2009 at 05:08:23AM -0500, Andrew Dunn wrote: > > >> does it momentarily offline the disks? like they re-appear in /dev >> within moments? That would be similar behavior to what I am >> experiencing, the disks drop from the array, but they are in /dev by the >> time I get a chance to see them. >> > > No, either the disks need to be physically removed and re-inserted, or > the machine needs to be rebooted. > > Gabor > > -- Andrew Dunn http://agdunn.net ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: RAID 6 Failure follow up 2009-11-09 11:34 ` Gabor Gombas 2009-11-09 22:04 ` Andrew Dunn @ 2009-11-10 10:55 ` Andrew Dunn 2009-11-10 11:34 ` Vincent Schut 2009-11-10 12:45 ` Ryan Wagoner 1 sibling, 2 replies; 23+ messages in thread From: Andrew Dunn @ 2009-11-10 10:55 UTC (permalink / raw) To: Gabor Gombas; +Cc: Ryan Wagoner, Richard Scobie, Linux RAID Mailing List I am able to reproduce this SMART error now. I have done it twice, so maybe other things are causing this also. When I scanned the devices this morning with smartctl via webmin I lost 8 of the 9 drives. They are, however, still in my /dev folder. Now, I sent out my logs from the first failure last night; smartctl was on the system... I don't know if Ubuntu Server's default smartd configuration makes it do periodic scans, because I didn't change anything. I would hate to move back to 9.10 and see this problem again. Should I just not install smartmontools? This seems like a bad solution because now I won't be able to check the drives in advance for failures. Have you installed LSI's Linux drivers? Some people say this solves their issue. From the logs sent out last night, do you think it could be something else? Thanks a ton, Gabor Gombas wrote: > On Mon, Nov 09, 2009 at 05:08:23AM -0500, Andrew Dunn wrote: > > >> does it momentarily offline the disks? like they re-appear in /dev >> within moments? That would be similar behavior to what I am >> experiencing, the disks drop from the array, but they are in /dev by the >> time I get a chance to see them. >> > > No, either the disks need to be physically removed and re-inserted, or > the machine needs to be rebooted. > > Gabor > > -- Andrew Dunn http://agdunn.net ^ permalink raw reply [flat|nested] 23+ messages in thread
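A rough sketch of how to answer the "is smartd doing periodic scans?" question without uninstalling the package, assuming a Debian/Ubuntu layout (the exact variable in /etc/default/smartmontools differs between releases):

    # is the daemon actually running?
    ps aux | grep '[s]martd'
    # Debian/Ubuntu only start smartd at boot if it is enabled here
    cat /etc/default/smartmontools
    # stop the daemon while keeping smartctl available for manual use
    sudo /etc/init.d/smartmontools stop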
* Re: RAID 6 Failure follow up 2009-11-10 10:55 ` Andrew Dunn @ 2009-11-10 11:34 ` Vincent Schut 2009-11-11 12:34 ` Andrew Dunn 2009-11-17 8:40 ` Vincent Schut 2009-11-10 12:45 ` Ryan Wagoner 1 sibling, 2 replies; 23+ messages in thread From: Vincent Schut @ 2009-11-10 11:34 UTC (permalink / raw) To: Andrew Dunn Cc: Gabor Gombas, Ryan Wagoner, Richard Scobie, Linux RAID Mailing List Andrew Dunn wrote: > I am able to reproduce this smart error now. I have done it twice, so > maybe other things are causing this also. > > When I scanned the devices this morning with smartctl via webmin I lost > 8 of the 9 drives. They are howerver still in my /dev folder. > > Now I sent out my logs from the first failure last night, smartctl was > on the system... I dont know if ubuntu server's default smartd > configuration makes it do periodic scans because I didnt change anything. > > I would hate to move back to 9.10 and see this problem again. > > Should I just not install smartmontools? This seems like a bad solution > because now I wont be able to check the drives in advance for failures. > > Have you installed LSI's linux drivers? Some people say this solves > their issue. > > From the logs sent out last night do you think it could be something else? > > Thanks a ton, FWIW, I encountered the same issue, and seem to have found a viable workaround by accessing the SATA disks on that LSI backplane as scsi devices, e.g. by adding '-d scsi' to my smartctl/smartd.conf lines. No more errors in the logs, no more drives being kicked out. Though not as much info is available that way as when using de sata driver ('-d sat', or automatically), like temperature is unavailable, it does allow me to initiate the selftests and get their result, and to monitor generic smart status of the drives. Quite enough for me. YMMV, though. Vincent. > > Gabor Gombas wrote: >> On Mon, Nov 09, 2009 at 05:08:23AM -0500, Andrew Dunn wrote: >> >> >>> does it momentarily offline the disks? like they re-appear in /dev >>> within moments? That would be similar behavior to what I am >>> experiencing, the disks drop from the array, but they are in /dev by the >>> time I get a chance to see them. >>> >> No, either the disks need to be physically removed and re-inserted, or >> the machine needs to be rebooted. >> >> Gabor >> >> > ^ permalink raw reply [flat|nested] 23+ messages in thread
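A minimal sketch of the workaround Vincent describes, querying the drives through the plain SCSI personality instead of the SAT translation layer (the device name is illustrative):

    smartctl -d scsi -H /dev/sde           # overall health, but no ATA attribute table
    smartctl -d scsi -t long /dev/sde      # start a long self-test
    smartctl -d scsi -l selftest /dev/sde  # read back the self-test results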
* Re: RAID 6 Failure follow up 2009-11-10 11:34 ` Vincent Schut @ 2009-11-11 12:34 ` Andrew Dunn 2009-11-11 12:46 ` Vincent Schut 2009-11-17 8:40 ` Vincent Schut 1 sibling, 1 reply; 23+ messages in thread From: Andrew Dunn @ 2009-11-11 12:34 UTC (permalink / raw) To: Vincent Schut Cc: Gabor Gombas, Ryan Wagoner, Richard Scobie, Linux RAID Mailing List Thanks for your help, so far without smartctl installed I have had no issues... but it has only been about 12 hours. Could you send me your smatd.conf? Vincent Schut wrote: > Andrew Dunn wrote: >> I am able to reproduce this smart error now. I have done it twice, so >> maybe other things are causing this also. >> >> When I scanned the devices this morning with smartctl via webmin I lost >> 8 of the 9 drives. They are howerver still in my /dev folder. >> >> Now I sent out my logs from the first failure last night, smartctl was >> on the system... I dont know if ubuntu server's default smartd >> configuration makes it do periodic scans because I didnt change >> anything. >> >> I would hate to move back to 9.10 and see this problem again. >> >> Should I just not install smartmontools? This seems like a bad solution >> because now I wont be able to check the drives in advance for failures. >> >> Have you installed LSI's linux drivers? Some people say this solves >> their issue. >> >> From the logs sent out last night do you think it could be something >> else? >> >> Thanks a ton, > > FWIW, I encountered the same issue, and seem to have found a viable > workaround by accessing the SATA disks on that LSI backplane as scsi > devices, e.g. by adding '-d scsi' to my smartctl/smartd.conf lines. No > more errors in the logs, no more drives being kicked out. > Though not as much info is available that way as when using de sata > driver ('-d sat', or automatically), like temperature is unavailable, > it does allow me to initiate the selftests and get their result, and > to monitor generic smart status of the drives. Quite enough for me. > > YMMV, though. > > Vincent. >> >> Gabor Gombas wrote: >>> On Mon, Nov 09, 2009 at 05:08:23AM -0500, Andrew Dunn wrote: >>> >>> >>>> does it momentarily offline the disks? like they re-appear in /dev >>>> within moments? That would be similar behavior to what I am >>>> experiencing, the disks drop from the array, but they are in /dev >>>> by the >>>> time I get a chance to see them. >>>> >>> No, either the disks need to be physically removed and re-inserted, or >>> the machine needs to be rebooted. >>> >>> Gabor >>> >>> >> > > -- Andrew Dunn http://agdunn.net ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: RAID 6 Failure follow up 2009-11-11 12:34 ` Andrew Dunn @ 2009-11-11 12:46 ` Vincent Schut 0 siblings, 0 replies; 23+ messages in thread From: Vincent Schut @ 2009-11-11 12:46 UTC (permalink / raw) To: linux-raid Andrew Dunn wrote: > Thanks for your help, so far without smartctl installed I have had no > issues... but it has only been about 12 hours. I also had no issues when not running smartd/smartctl. It seems the combination of kernel, backplane SAS driver, and smart which triggers the trouble... > > Could you send me your smatd.conf? It's pretty much default, there's just one uncommented line in it: DEVICESCAN -d scsi -a -o on -S on -s (S/../.././02|L/../../6/03) -W 4,45,55 -R 5 -m my@mail.address -M exec /usr/share/smartmontools/smartd-runner (the above 3 lines should be all on one line). I plan to replace the devicescan with explicit /dev/sd.. items, but as I'm currently regularly adding and removing (usb) drives, I kept the auto devicescan statement. The rest means: enable smart on all drives, plan daily short and weekly long selftests, and warn on temperature too high or temp change of more than 5 deg., and mail warnings/errors to me. VS. > > Vincent Schut wrote: >> Andrew Dunn wrote: >>> I am able to reproduce this smart error now. I have done it twice, so >>> maybe other things are causing this also. >>> >>> When I scanned the devices this morning with smartctl via webmin I lost >>> 8 of the 9 drives. They are howerver still in my /dev folder. >>> >>> Now I sent out my logs from the first failure last night, smartctl was >>> on the system... I dont know if ubuntu server's default smartd >>> configuration makes it do periodic scans because I didnt change >>> anything. >>> >>> I would hate to move back to 9.10 and see this problem again. >>> >>> Should I just not install smartmontools? This seems like a bad solution >>> because now I wont be able to check the drives in advance for failures. >>> >>> Have you installed LSI's linux drivers? Some people say this solves >>> their issue. >>> >>> From the logs sent out last night do you think it could be something >>> else? >>> >>> Thanks a ton, >> FWIW, I encountered the same issue, and seem to have found a viable >> workaround by accessing the SATA disks on that LSI backplane as scsi >> devices, e.g. by adding '-d scsi' to my smartctl/smartd.conf lines. No >> more errors in the logs, no more drives being kicked out. >> Though not as much info is available that way as when using de sata >> driver ('-d sat', or automatically), like temperature is unavailable, >> it does allow me to initiate the selftests and get their result, and >> to monitor generic smart status of the drives. Quite enough for me. >> >> YMMV, though. >> >> Vincent. >>> Gabor Gombas wrote: >>>> On Mon, Nov 09, 2009 at 05:08:23AM -0500, Andrew Dunn wrote: >>>> >>>> >>>>> does it momentarily offline the disks? like they re-appear in /dev >>>>> within moments? That would be similar behavior to what I am >>>>> experiencing, the disks drop from the array, but they are in /dev >>>>> by the >>>>> time I get a chance to see them. >>>>> >>>> No, either the disks need to be physically removed and re-inserted, or >>>> the machine needs to be rebooted. >>>> >>>> Gabor >>>> >>>> >> > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: RAID 6 Failure follow up 2009-11-10 11:34 ` Vincent Schut 2009-11-11 12:34 ` Andrew Dunn @ 2009-11-17 8:40 ` Vincent Schut 1 sibling, 0 replies; 23+ messages in thread From: Vincent Schut @ 2009-11-17 8:40 UTC (permalink / raw) Cc: Andrew Dunn, Gabor Gombas, Ryan Wagoner, Richard Scobie, Linux RAID Mailing List Vincent Schut wrote: > Andrew Dunn wrote: >> I am able to reproduce this smart error now. I have done it twice, so >> maybe other things are causing this also. >> >> When I scanned the devices this morning with smartctl via webmin I lost >> 8 of the 9 drives. They are howerver still in my /dev folder. >> >> Now I sent out my logs from the first failure last night, smartctl was >> on the system... I dont know if ubuntu server's default smartd >> configuration makes it do periodic scans because I didnt change anything. >> >> I would hate to move back to 9.10 and see this problem again. >> >> Should I just not install smartmontools? This seems like a bad solution >> because now I wont be able to check the drives in advance for failures. >> >> Have you installed LSI's linux drivers? Some people say this solves >> their issue. >> >> From the logs sent out last night do you think it could be something >> else? >> >> Thanks a ton, > > FWIW, I encountered the same issue, and seem to have found a viable > workaround by accessing the SATA disks on that LSI backplane as scsi > devices, e.g. by adding '-d scsi' to my smartctl/smartd.conf lines. No > more errors in the logs, no more drives being kicked out. > Though not as much info is available that way as when using de sata > driver ('-d sat', or automatically), like temperature is unavailable, it > does allow me to initiate the selftests and get their result, and to > monitor generic smart status of the drives. Quite enough for me. > > YMMV, though. Folks, I need to retract this. Though I've had far fewer problems with '-d scsi' instead of '-d sat' when running the LSI SAS / smartmontools / mdadm combo, I got bitten again last night by a drive being kicked out for no apparent reason. For now my only possible advice is: don't use smartmontools on drives that are on this LSI SAS backplane. I dearly hope this will improve soon; I hate to have my drives go unmonitored for too long... Vincent. > > Vincent. >> >> Gabor Gombas wrote: >>> On Mon, Nov 09, 2009 at 05:08:23AM -0500, Andrew Dunn wrote: >>> >>> >>>> does it momentarily offline the disks? like they re-appear in /dev >>>> within moments? That would be similar behavior to what I am >>>> experiencing, the disks drop from the array, but they are in /dev by >>>> the >>>> time I get a chance to see them. >>>> >>> No, either the disks need to be physically removed and re-inserted, or >>> the machine needs to be rebooted. >>> >>> Gabor >>> >>> >> > > ^ permalink raw reply [flat|nested] 23+ messages in thread
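If smartmontools has to stay off drives behind these controllers, mdadm's own monitor mode at least reports members being failed or kicked promptly; a sketch, with the mail address as a placeholder:

    # run mdadm's monitor as a daemon and mail on Fail/DegradedArray events
    mdadm --monitor --scan --daemonise --delay=300 --mail=admin@example.com
    # or a crude cron check for any array not showing all members up
    grep -q '_' /proc/mdstat && echo "md array degraded on $(hostname)"

This is not a substitute for SMART; it only reports after a member has dropped, not that a drive is about to fail.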
* Re: RAID 6 Failure follow up 2009-11-10 10:55 ` Andrew Dunn 2009-11-10 11:34 ` Vincent Schut @ 2009-11-10 12:45 ` Ryan Wagoner 1 sibling, 0 replies; 23+ messages in thread From: Ryan Wagoner @ 2009-11-10 12:45 UTC (permalink / raw) To: Andrew Dunn; +Cc: Linux RAID Mailing List Boot up a CentOS 5 LiveCD. It should detect your arrays and try running smartctl. From my experience with different distros I have found that Red Hat spends a good amount of time making sure enterprise hardware is stable on their system. Ubuntu seems to focus more on desktops. Ryan On Tue, Nov 10, 2009 at 5:55 AM, Andrew Dunn <andrew.g.dunn@gmail.com> wrote: > I am able to reproduce this smart error now. I have done it twice, so > maybe other things are causing this also. > > When I scanned the devices this morning with smartctl via webmin I lost > 8 of the 9 drives. They are howerver still in my /dev folder. > > Now I sent out my logs from the first failure last night, smartctl was > on the system... I dont know if ubuntu server's default smartd > configuration makes it do periodic scans because I didnt change anything. > > I would hate to move back to 9.10 and see this problem again. > > Should I just not install smartmontools? This seems like a bad solution > because now I wont be able to check the drives in advance for failures. > > Have you installed LSI's linux drivers? Some people say this solves > their issue. > > From the logs sent out last night do you think it could be something else? > > Thanks a ton, > > Gabor Gombas wrote: >> On Mon, Nov 09, 2009 at 05:08:23AM -0500, Andrew Dunn wrote: >> >> >>> does it momentarily offline the disks? like they re-appear in /dev >>> within moments? That would be similar behavior to what I am >>> experiencing, the disks drop from the array, but they are in /dev by the >>> time I get a chance to see them. >>> >> >> No, either the disks need to be physically removed and re-inserted, or >> the machine needs to be rebooted. >> >> Gabor >> >> > > -- > Andrew Dunn > http://agdunn.net > > ^ permalink raw reply [flat|nested] 23+ messages in thread
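A rough outline of the test Ryan suggests, run from the live environment (device letters are only examples and may be assigned differently after boot):

    cat /proc/scsi/scsi                                  # confirm which sdX is which
    for i in e f g h i j k l m; do smartctl -H /dev/sd$i; done
    dmesg | tail -n 50                                   # look for mptsas resets or offlined disks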
* Re: RAID 6 Failure follow up 2009-11-08 14:23 ` Roger Heflin 2009-11-08 14:30 ` Andrew Dunn @ 2009-11-08 14:36 ` Andrew Dunn 2009-11-08 14:56 ` Roger Heflin 1 sibling, 1 reply; 23+ messages in thread From: Andrew Dunn @ 2009-11-08 14:36 UTC (permalink / raw) To: Roger Heflin; +Cc: linux-raid list [10:0:0:0] disk ATA WDC WD1001FALS-0 0K05 /dev/sde [10:0:1:0] disk ATA WDC WD1001FALS-0 0K05 /dev/sdf [10:0:2:0] disk ATA WDC WD1001FALS-0 0K05 /dev/sdg [10:0:3:0] disk ATA WDC WD1001FALS-0 0K05 /dev/sdh [11:0:0:0] disk ATA WDC WD1001FALS-0 0K05 /dev/sdi [11:0:1:0] disk ATA WDC WD1001FALS-0 0K05 /dev/sdj [11:0:2:0] disk ATA WDC WD1001FALS-0 0K05 /dev/sdk [11:0:3:0] disk ATA WDC WD1001FALS-0 0K05 /dev/sdl [11:0:4:0] disk ATA WDC WD1001FALS-0 0K05 /dev/sdm So 4 drives dropped out on the second controller. But why didnt sdm go with them? Roger Heflin wrote: > Andrew Dunn wrote: >> This is kind of interesting: >> >> storrgie@ALEXANDRIA:~$ sudo mdadm --assemble --force /dev/md0 >> mdadm: no devices found for /dev/md0 >> >> All of the devices are there in /dev, so I wanted to examine them: >> >> storrgie@ALEXANDRIA:~$ sudo mdadm --examine /dev/sde1 >> /dev/sde1: >> Magic : a92b4efc >> Version : 00.90.00 >> UUID : 397e0b3f:34cbe4cc:613e2239:070da8c8 (local to host >> ALEXANDRIA) >> Creation Time : Fri Nov 6 07:06:34 2009 >> Raid Level : raid6 >> Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) >> Array Size : 6837318656 (6520.58 GiB 7001.41 GB) >> Raid Devices : 9 >> Total Devices : 9 >> Preferred Minor : 0 >> >> Update Time : Sun Nov 8 08:57:04 2009 >> State : clean >> Active Devices : 5 >> Working Devices : 5 >> Failed Devices : 4 >> Spare Devices : 0 >> Checksum : 4ff41c5f - correct >> Events : 43 >> >> Chunk Size : 1024K >> >> Number Major Minor RaidDevice State >> this 0 8 65 0 active sync /dev/sde1 >> >> 0 0 8 65 0 active sync /dev/sde1 >> 1 1 8 81 1 active sync /dev/sdf1 >> 2 2 8 97 2 active sync /dev/sdg1 >> 3 3 8 113 3 active sync /dev/sdh1 >> 4 4 0 0 4 faulty removed >> 5 5 0 0 5 faulty removed >> 6 6 0 0 6 faulty removed >> 7 7 0 0 7 faulty removed >> 8 8 8 193 8 active sync /dev/sdm1 >> >> First raid device shows the failures.... >> >> One of the 'removed' devices: >> >> storrgie@ALEXANDRIA:~$ sudo mdadm --examine /dev/sdi1 >> /dev/sdi1: >> Magic : a92b4efc >> Version : 00.90.00 >> UUID : 397e0b3f:34cbe4cc:613e2239:070da8c8 (local to host >> ALEXANDRIA) >> Creation Time : Fri Nov 6 07:06:34 2009 >> Raid Level : raid6 >> Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) >> Array Size : 6837318656 (6520.58 GiB 7001.41 GB) >> Raid Devices : 9 >> Total Devices : 9 >> Preferred Minor : 0 >> >> Update Time : Sun Nov 8 08:53:30 2009 >> State : active >> Active Devices : 9 >> Working Devices : 9 >> Failed Devices : 0 >> Spare Devices : 0 >> Checksum : 4ff41b2f - correct >> Events : 21 >> >> Chunk Size : 1024K >> >> Number Major Minor RaidDevice State >> this 4 8 129 4 active sync /dev/sdi1 >> >> 0 0 8 65 0 active sync /dev/sde1 >> 1 1 8 81 1 active sync /dev/sdf1 >> 2 2 8 97 2 active sync /dev/sdg1 >> 3 3 8 113 3 active sync /dev/sdh1 >> 4 4 8 129 4 active sync /dev/sdi1 >> 5 5 8 145 5 active sync /dev/sdj1 >> 6 6 8 161 6 active sync /dev/sdk1 >> 7 7 8 177 7 active sync /dev/sdl1 >> 8 8 8 193 8 active sync /dev/sdm1 >> > > > Did you check dmesg and see if there were errors on those disks? > > -- Andrew Dunn http://agdunn.net ^ permalink raw reply [flat|nested] 23+ messages in thread
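One way to confirm which HBA each disk sits behind is to follow its sysfs path back to the PCI address; a sketch (the PCI addresses will differ per system):

    for d in sde sdf sdg sdh sdi sdj sdk sdl sdm; do
        echo "$d -> $(readlink -f /sys/block/$d/device)"
    done
    # the 0000:xx:00.0 segment in each path names the controller; compare with:
    lspci | grep -i sas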
* Re: RAID 6 Failure follow up 2009-11-08 14:36 ` Andrew Dunn @ 2009-11-08 14:56 ` Roger Heflin 2009-11-08 17:08 ` Andrew Dunn 0 siblings, 1 reply; 23+ messages in thread From: Roger Heflin @ 2009-11-08 14:56 UTC (permalink / raw) To: Andrew Dunn; +Cc: linux-raid list Andrew Dunn wrote: > [10:0:0:0] disk ATA WDC WD1001FALS-0 0K05 /dev/sde > [10:0:1:0] disk ATA WDC WD1001FALS-0 0K05 /dev/sdf > [10:0:2:0] disk ATA WDC WD1001FALS-0 0K05 /dev/sdg > [10:0:3:0] disk ATA WDC WD1001FALS-0 0K05 /dev/sdh > [11:0:0:0] disk ATA WDC WD1001FALS-0 0K05 /dev/sdi > [11:0:1:0] disk ATA WDC WD1001FALS-0 0K05 /dev/sdj > [11:0:2:0] disk ATA WDC WD1001FALS-0 0K05 /dev/sdk > [11:0:3:0] disk ATA WDC WD1001FALS-0 0K05 /dev/sdl > [11:0:4:0] disk ATA WDC WD1001FALS-0 0K05 /dev/sdm > > So 4 drives dropped out on the second controller. But why didnt sdm go > with them? > > It is possible that by the time it got to checking the last drive, the errors had cleared up, so sdm was OK when it was checked. Is this on a port multiplier? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: RAID 6 Failure follow up 2009-11-08 14:56 ` Roger Heflin @ 2009-11-08 17:08 ` Andrew Dunn 0 siblings, 0 replies; 23+ messages in thread From: Andrew Dunn @ 2009-11-08 17:08 UTC (permalink / raw) To: Roger Heflin; +Cc: linux-raid list No multiplier, they are on a backpane though. 2 on one backpane, 3 on another... but only 2 of the 3 dropped off that one. I looked through dmesg some more, maybe you all might see something of significance. I don't think this was around when it happened, but it might shed light onto the issue. I will continue to sift through the log. [ 19.021969] scsi10 : ioc0: LSISAS1068E B3, FwRev=011a0000h, Ports=1, MaxQ=478, IRQ=16 [ 19.061176] mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 0, phy 0, sas_addr 0x1221000000000000 [ 19.063708] scsi 10:0:0:0: Direct-Access ATA WDC WD1001FALS-0 0K05 PQ: 0 ANSI: 5 [ 19.065473] sd 10:0:0:0: Attached scsi generic sg4 type 0 [ 19.067322] mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 1, phy 1, sas_addr 0x1221000001000000 [ 19.068074] sd 10:0:0:0: [sde] 1953523055 512-byte logical blocks: (1.00 TB/931 GiB) [ 19.070474] scsi 10:0:1:0: Direct-Access ATA WDC WD1001FALS-0 0K05 PQ: 0 ANSI: 5 [ 19.072797] sd 10:0:1:0: Attached scsi generic sg5 type 0 [ 19.074994] mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 4, phy 4, sas_addr 0x1221000004000000 [ 19.076025] sd 10:0:1:0: [sdf] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB) [ 19.078091] scsi 10:0:2:0: Direct-Access ATA WDC WD1001FALS-0 0K05 PQ: 0 ANSI: 5 [ 19.080417] sd 10:0:2:0: Attached scsi generic sg6 type 0 [ 19.082589] mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 5, phy 5, sas_addr 0x1221000005000000 [ 19.082966] sd 10:0:0:0: [sde] Write Protect is off [ 19.082970] sd 10:0:0:0: [sde] Mode Sense: 73 00 00 08 [ 19.084186] sd 10:0:2:0: [sdg] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB) [ 19.086521] sd 10:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 19.087036] scsi 10:0:3:0: Direct-Access ATA WDC WD1001FALS-0 0K05 PQ: 0 ANSI: 5 [ 19.088389] sd 10:0:1:0: [sdf] Write Protect is off [ 19.088393] sd 10:0:1:0: [sdf] Mode Sense: 73 00 00 08 [ 19.089642] sd 10:0:3:0: Attached scsi generic sg7 type 0 [ 19.092400] mptsas 0000:02:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16 [ 19.092525] mptbase: ioc1: Initiating bringup [ 19.093974] sd 10:0:3:0: [sdh] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB) [ 19.095129] sd 10:0:1:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 19.101887] sd 10:0:2:0: [sdg] Write Protect is off [ 19.101891] sd 10:0:2:0: [sdg] Mode Sense: 73 00 00 08 [ 19.104250] sd 10:0:2:0: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 19.107231] sd 10:0:3:0: [sdh] Write Protect is off [ 19.107236] sd 10:0:3:0: [sdh] Mode Sense: 73 00 00 08 [ 19.109398] sde: [ 19.111301] sd 10:0:3:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 19.111659] sdf: [ 19.118664] sdg: sdf1 [ 19.122127] sde1 [ 19.126192] sdh: sdg1 [ 19.137786] sd 10:0:1:0: [sdf] Attached SCSI disk [ 19.143743] sdh1 [ 19.146360] sd 10:0:0:0: [sde] Attached SCSI disk [ 19.148589] sd 10:0:2:0: [sdg] Attached SCSI disk [ 19.158613] sd 10:0:3:0: [sdh] Attached SCSI disk [ 20.780022] ioc1: LSISAS1068E B3: Capabilities={Initiator} [ 20.780035] mptsas 0000:02:00.0: setting latency timer to 64 [ 30.971934] scsi11 : ioc1: LSISAS1068E B3, FwRev=011a0000h, Ports=1, MaxQ=478, IRQ=16 [ 31.012437] mptsas: ioc1: attaching sata device: 
fw_channel 0, fw_id 0, phy 0, sas_addr 0x1221000000000000 [ 31.015009] scsi 11:0:0:0: Direct-Access ATA WDC WD1001FALS-0 0K05 PQ: 0 ANSI: 5 [ 31.016755] sd 11:0:0:0: Attached scsi generic sg8 type 0 [ 31.018603] mptsas: ioc1: attaching sata device: fw_channel 0, fw_id 1, phy 1, sas_addr 0x1221000001000000 [ 31.019358] sd 11:0:0:0: [sdi] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB) [ 31.021753] scsi 11:0:1:0: Direct-Access ATA WDC WD1001FALS-0 0K05 PQ: 0 ANSI: 5 [ 31.024075] sd 11:0:1:0: Attached scsi generic sg9 type 0 [ 31.026273] mptsas: ioc1: attaching sata device: fw_channel 0, fw_id 4, phy 4, sas_addr 0x1221000004000000 [ 31.027302] sd 11:0:1:0: [sdj] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB) [ 31.029693] scsi 11:0:2:0: Direct-Access ATA WDC WD1001FALS-0 0K05 PQ: 0 ANSI: 5 [ 31.032004] sd 11:0:2:0: Attached scsi generic sg10 type 0 [ 31.032233] sd 11:0:0:0: [sdi] Write Protect is off [ 31.032235] sd 11:0:0:0: [sdi] Mode Sense: 73 00 00 08 [ 31.034133] mptsas: ioc1: attaching sata device: fw_channel 0, fw_id 5, phy 5, sas_addr 0x1221000005000000 [ 31.035571] sd 11:0:2:0: [sdk] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB) [ 31.037483] sd 11:0:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 31.038793] scsi 11:0:3:0: Direct-Access ATA WDC WD1001FALS-0 0K05 PQ: 0 ANSI: 5 [ 31.041160] sd 11:0:3:0: Attached scsi generic sg11 type 0 [ 31.043506] mptsas: ioc1: attaching sata device: fw_channel 0, fw_id 6, phy 6, sas_addr 0x1221000006000000 [ 31.043884] sd 11:0:1:0: [sdj] Write Protect is off [ 31.043887] sd 11:0:1:0: [sdj] Mode Sense: 73 00 00 08 [ 31.046683] sd 11:0:3:0: [sdl] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB) [ 31.047038] sd 11:0:1:0: [sdj] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 31.050845] scsi 11:0:4:0: Direct-Access ATA WDC WD1001FALS-0 0K05 PQ: 0 ANSI: 5 [ 31.054206] sd 11:0:4:0: Attached scsi generic sg12 type 0 [ 31.056125] sd 11:0:2:0: [sdk] Write Protect is off [ 31.056129] sd 11:0:2:0: [sdk] Mode Sense: 73 00 00 08 [ 31.059805] sd 11:0:4:0: [sdm] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB) [ 31.061019] sd 11:0:2:0: [sdk] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 31.065705] sd 11:0:3:0: [sdl] Write Protect is off [ 31.065710] sd 11:0:3:0: [sdl] Mode Sense: 73 00 00 08 [ 31.066991] sdi: [ 31.069131] sdj: [ 31.070087] sd 11:0:3:0: [sdl] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 31.073259] sd 11:0:4:0: [sdm] Write Protect is off [ 31.073262] sd 11:0:4:0: [sdm] Mode Sense: 73 00 00 08 [ 31.074045] sdj1 [ 31.075719] sdi1 [ 31.077424] sd 11:0:4:0: [sdm] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 31.083141] sdk: [ 31.090760] sdl: sdk1 [ 31.099798] sdm: sdl1 [ 31.115614] sdm1 [ 31.122247] sd 11:0:1:0: [sdj] Attached SCSI disk [ 31.124713] sd 11:0:0:0: [sdi] Attached SCSI disk [ 31.131908] sd 11:0:2:0: [sdk] Attached SCSI disk [ 31.141444] md: bind<sdj1> [ 31.143383] sd 11:0:3:0: [sdl] Attached SCSI disk [ 31.147407] md: bind<sdi1> [ 31.153910] sd 11:0:4:0: [sdm] Attached SCSI disk [ 31.159932] md: bind<sdl1> [ 31.176695] md: bind<sdm1> [ 31.265544] md: bind<sde1> [ 31.354001] md: bind<sdk1> [ 31.467249] md: bind<sdh1> [ 31.476153] md: bind<sdg1> [ 31.670444] md: bind<sdf1> [ 31.672643] md: kicking non-fresh sdk1 from array! 
[ 31.672652] md: unbind<sdk1> [ 31.711286] md: export_rdev(sdk1) [ 31.712356] raid5: device sdf1 operational as raid disk 1 [ 31.712358] raid5: device sdg1 operational as raid disk 2 [ 31.712360] raid5: device sdh1 operational as raid disk 3 [ 31.712362] raid5: device sde1 operational as raid disk 0 [ 31.712363] raid5: device sdm1 operational as raid disk 8 [ 31.712365] raid5: device sdl1 operational as raid disk 7 [ 31.712366] raid5: device sdi1 operational as raid disk 4 [ 31.712368] raid5: device sdj1 operational as raid disk 5 [ 31.712962] raid5: allocated 9540kB for md0 [ 31.713094] raid5: raid level 6 set md0 active with 8 out of 9 devices, algorithm 2 Roger Heflin wrote: > Andrew Dunn wrote: >> [10:0:0:0] disk ATA WDC WD1001FALS-0 0K05 /dev/sde >> [10:0:1:0] disk ATA WDC WD1001FALS-0 0K05 /dev/sdf >> [10:0:2:0] disk ATA WDC WD1001FALS-0 0K05 /dev/sdg >> [10:0:3:0] disk ATA WDC WD1001FALS-0 0K05 /dev/sdh >> [11:0:0:0] disk ATA WDC WD1001FALS-0 0K05 /dev/sdi >> [11:0:1:0] disk ATA WDC WD1001FALS-0 0K05 /dev/sdj >> [11:0:2:0] disk ATA WDC WD1001FALS-0 0K05 /dev/sdk >> [11:0:3:0] disk ATA WDC WD1001FALS-0 0K05 /dev/sdl >> [11:0:4:0] disk ATA WDC WD1001FALS-0 0K05 /dev/sdm >> >> So 4 drives dropped out on the second controller. But why didnt sdm go >> with them? >> >> > > It is possible that by the time it got to checking the last drive that > the errors had cleared up, so sdm was ok with it checked. > > > Is this on a port multiplier? > > -- Andrew Dunn http://agdunn.net ^ permalink raw reply [flat|nested] 23+ messages in thread
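For what it's worth, once the underlying cause is understood, a member that dmesg shows being kicked as non-fresh can usually be returned to the array; with 0.90 metadata and no write-intent bitmap this generally means a full resync. A hedged sketch:

    mdadm /dev/md0 --re-add /dev/sdk1   # may be refused if the event counts have diverged
    mdadm /dev/md0 --add /dev/sdk1      # otherwise add it back as a fresh member (full rebuild)
    cat /proc/mdstat                    # watch recovery progress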
end of thread, other threads:[~2009-11-17 8:40 UTC | newest] Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2009-11-08 14:07 RAID 6 Failure follow up Andrew Dunn 2009-11-08 14:23 ` Roger Heflin 2009-11-08 14:30 ` Andrew Dunn 2009-11-08 18:01 ` Richard Scobie 2009-11-08 18:22 ` Andrew Dunn 2009-11-08 18:34 ` Joe Landman 2009-11-08 22:09 ` Andrew Dunn 2009-11-08 22:59 ` Richard Scobie 2009-11-09 2:45 ` Ryan Wagoner 2009-11-09 2:57 ` Richard Scobie 2009-11-09 8:09 ` Gabor Gombas 2009-11-09 10:08 ` Andrew Dunn 2009-11-09 11:34 ` Gabor Gombas 2009-11-09 22:04 ` Andrew Dunn 2009-11-10 10:55 ` Andrew Dunn 2009-11-10 11:34 ` Vincent Schut 2009-11-11 12:34 ` Andrew Dunn 2009-11-11 12:46 ` Vincent Schut 2009-11-17 8:40 ` Vincent Schut 2009-11-10 12:45 ` Ryan Wagoner 2009-11-08 14:36 ` Andrew Dunn 2009-11-08 14:56 ` Roger Heflin 2009-11-08 17:08 ` Andrew Dunn