* strange problem with raid6 read errors on active non-degraded array
@ 2014-07-02 9:32 Pedro Teixeira
2014-07-02 9:52 ` Roman Mamedov
2014-07-02 10:45 ` NeilBrown
0 siblings, 2 replies; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-02 9:32 UTC (permalink / raw)
To: linux-raid
- I'm having the following problem on a raid6 md volume consisting of
16 1TB Seagate SSHDs ( using kernel 3.15.3 or 3.14.0 ) mdadm is 3.3.
- Every time I run fsck.ext4 I get exactly the same errors (
...short read ). Forcing a repair on the md0 volume shows no errors
and completes without problems. All disks are active and the volume is
not degraded, yet I can't get rid of the short-read errors on those 16
blocks, and when the filesystem is mounted the read errors come up
from time to time, as those blocks are probably in use.
- If I try to read those blocks with dd ( dd if=/dev/md0 of=test.txt
seek=458227712 count=6 bs=4096 ) it instantly creates a 1.8T file,
but the file doesn't appear to have anything in it ( and the file
doesn't take up 1.8T on disk, as the disk is much smaller )
- This started happening after a three-disk failure. I recovered
from that failure by recreating the array with the 13 non-failed
disks plus the last disk to fail ( their event counts didn't differ
much ). I then re-added the other disks. The failed disks are all
physically good; I tested them with HDAT2 and they have no read/write
errors, so I reused them. I don't know why they failed, maybe some
incompatibility between the SSHDs and the LSI HBA controller.
root@nas3:/# dd if=/dev/md0 of=teste.txt seek=458227712 count=6 bs=4096
6+0 records in
6+0 records out
24576 bytes (25 kB) copied, 0.0019239 s, 12.8 MB/s
root@nas3:/# ls -lah teste.txt
-rw-r--r-- 1 root root 1.8T Jul 2 10:22 teste.txt
root@nas3:/#
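[Side note: the odd 1.8T size is consistent with seek= offsetting the output file rather than the input. dd writes its 6 blocks 458227712 blocks into teste.txt, so the file's length is (seek + count) * bs bytes, almost all of it a sparse hole. A quick sanity check of that arithmetic (the "1.8T" display assumes GNU ls -h, which rounds up):

```python
# Size of the file dd creates with of=teste.txt seek=458227712 count=6 bs=4096:
# everything before the seek point is a sparse hole, so it uses almost no disk.
seek, count, bs = 458227712, 6, 4096
size = (seek + count) * bs
print(size)          # length in bytes of the (mostly sparse) output file
print(size / 2**40)  # ~1.7 TiB; ls -h rounds up, displaying "1.8T"
```
]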
root@nas3:/# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sde[0] sdq[15] sdp[14] sdo[17] sdn[19] sdm[16]
sdl[18] sdk[9] sdj[8] sdi[7] sdh[6] sdg[5] sdf[4] sdb[3] sdd[2] sdc[1]
13672838144 blocks super 1.2 level 6, 512k chunk, algorithm 2
[16/16] [UUUUUUUUUUUUUUUU]
- When doing a fsck.ext4 of /dev/md0 it returns the following ( and I
can run it over and over again with exactly the same errors ):
root@nas3:/# fsck.ext4 -f /dev/md0
e2fsck 1.42.10 (18-May-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Error reading block 458227712 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227713 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227714 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227715 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227716 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227717 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227718 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227719 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227720 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227721 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227722 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227723 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227724 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227725 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227726 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227727 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Block bitmap differences: +(458227712--458231839)
+(458234642--458235681) +(458244447--458245519)
+(458246454--458247229) +(458248461--458248750) +458250468 +458251108
+(458261280--458261284) +(458263296--458263297) +458263312 +458263328
+(458265376--458265379) +(458267392--458267394)
+(458269440--458269441) +458269456 +(458269472--458269474)
+(458271520--458271543) +(458273536--458273547)
+(458275584--458275585) +458275600 +458275616 +(458277664--458277669)
+(458279680--458279682) +(458281728--458281729)
+(458283776--458284059) +458285824 +458285837 +458285840
+(458285856--458285857) +(458287904--458287907)
+(458289920--458289922) +458291968 +458291984 +458292000
+(458294048--458294054) +(458296064--458296065) +458296080
+(458296096--458296116) +(458298144--458298169)
+(458300160--458300504) +(458302208--458302209) +458302224 +458302240
+(458304288--458304298) +(458306304--458307950)
+(458310400--458310401) +458310416 +458310432 +(458312480--458312483)
+458314496 +458316544 +458316550 +458317824 +(458321152--458321950)
+458321952 +458321954 +458321956 +458321958 +458321965 +458321981
+(458323981--458323986) +(458327296--458327297)
+(458328094--458328097) +(458331392--458331393)
+(458333440--458333441) +(458335488--458335489)
+(458337536--458337537) +458339584 +458339593 +458339595 +458339600
+458339616 +458341616 +(458343680--458343681) +(458345728--458345729)
+458347776 +458347792 +458347808 +458349808 +(458351872--458351874)
+458351888 +458351904 +458353904 +(458355968--458355969)
+(458356765--458356815) +(458359809--458360062) +458360064 +458360080
+458360096 +(458360113--458360120) +458362096 +(458364160--458364161)
+458364176 +458364192 +458366192 +(458368256--458368257) +458370304
+458370307 +(458373115--458373116) +458373119 +(458373127--458373160)
+(458375168--458379263) +(458379271--458379304)
+(458381319--458381352) +(458383360--458432511)
+(458433367--458433686) +(458434560--458514535)
+(458516480--458516488) +(458516496--458561535)
+(458561680--458565631) +(458565648--458574328)
+(458574416--458575982) +(458576912--458577167)
+(458577680--458579535) +(458579968--458582015)
+(458594304--458594585) +(458594632--458595592)
+(458595627--458595725) +(458595728--458596527)
+(458596545--458596687) +(458597423--458598607)
+(458598990--458602495) +(458602922--458603023)
+(458604256--458604623) +(458605072--458605135)
+(458605520--458605717) +(458605908--458608536)
+(458608642--458609662) +(458609680--458610704)
+(458610776--458613449) +(458613519--458615179)
+(458616265--458616831) +(458617702--458618383)
+(458618512--458619007) +(458619088--458619151)
+(458619896--458621625) +(458621648--458622175)
+(458622224--458622489) +(458622508--458622830)
+(458622848--458623129) +(458623162--458623345)
+(458623394--458623953) +(458623962--458624460)
+(458624896--458624975) +(458624986--458626127)
+(458626282--458627727) +(458627920--458629119)
+(458629195--458632207) +(458632695--458632841)
+(458633168--458633231) +(458633668--458633923)
+(458634370--458634621) +(458634646--458634660)
+(458634704--458635306) +(458635344--458636303)
+(458636734--458637311) +(458638356--458639359)
+(458639440--458640109) +(458640195--458645071)
+(458645178--458645503) +(458645776--458645922)
+(458646009--458646479) +(458646546--458647589)
+(458647696--458648655) +(458649040--458649807)
+(458650640--458651663) +(458652432--458653695)
+(458657064--458657199) +(458657792--458658625)
+(458658628--458658631) +(458658640--458659231)
+(458659513--458659748) +(458659792--458659882)
+(458660432--458661337) +(458661899--458663417)
+(458663760--458664083) +(458665232--458665295)
+(458665552--458665706) +(458665808--458668031)
+(458668240--458668855) +(458669126--458669127)
+(458669419--458670079) +(458674183--458674216) +458675464
+(458676231--458676267) +(458676360--458676370)
+(458676488--458676498) +458676616 +(458676744--458676754)
+(458676872--458676873) +458677000 +458677128 +(458677256--458677257)
+458677384 +458677512 +(458677640--458678410) +458678536 +458678664
+458678666 +(458678792--458678794) +458678920 +(458679048--458679049)
+458679306 +(458679688--458679770) +(458680327--458680360)
+(458681736--458681781) +(458682375--458682408)
+(458683784--458685154) +(458685192--458685193)
+(458685832--458685882) +(458686471--458686507)
+(458686600--458686604) +(458687112--458687115) +458687240 +458687368
+(458687880--458688062) +(458688264--458688265)
+(458688519--458688552) +(458689928--458690083)
+(458690567--458690602) +458690978 +(458691976--458693464)
+(458693510--458693514) +458693638 +(458693766--458693769) +458693894
+(458694024--458694652) +(458694663--458694696)
+(458696072--458705014) +458705160 +458705288 +(458705416--458705473)
+(458706312--458706320) +(458706951--458706984)
+(458708999--458709032) +(458711047--458711080)
+(458713095--458713128) +(458715143--458715176)
+(458717191--458717224) +(458719239--458719272) +458720616
+(458721287--458721320) +(458721416--458721421) +458721544 +458722056
+(458722184--458722187) +(458722696--458723254)
+(458723335--458723368) +458723976 +(458724360--458724361)
+(458725383--458725416) +(458725896--458725965)
+(458727431--458727464) +(458727942--458728837)
+(458729479--458729512) +(458731527--458731560)
+(458733575--458733703) +(458734984--458739136)
+(458739719--458739752) +(458741767--458741800)
+(458743815--458743848) +(458745863--458745896)
+(458747911--458747944) +(458749959--458749992) +(458751368--458751999)
Fix<y>? yes
/dev/md0: ***** FILE SYSTEM WAS MODIFIED *****
/dev/md0: 9057/427278336 files (7.2% non-contiguous),
1157126209/3418209536 blocks
dmesg ( while doing the fsck.ext4 ) shows:
[84019.232630] Buffer I/O error on device md0, logical block 458227712
[84019.232715] Buffer I/O error on device md0, logical block 458227712
[84024.149583] Buffer I/O error on device md0, logical block 458227713
[84024.149679] Buffer I/O error on device md0, logical block 458227713
[84025.073526] Buffer I/O error on device md0, logical block 458227714
[84025.073617] Buffer I/O error on device md0, logical block 458227715
[84025.073688] Buffer I/O error on device md0, logical block 458227716
[84025.073765] Buffer I/O error on device md0, logical block 458227714
[84026.571139] Buffer I/O error on device md0, logical block 458227715
[84027.654387] Buffer I/O error on device md0, logical block 458227717
[84027.654474] Buffer I/O error on device md0, logical block 458227718
[84027.654549] Buffer I/O error on device md0, logical block 458227719
[84027.654617] Buffer I/O error on device md0, logical block 458227720
[84027.654684] Buffer I/O error on device md0, logical block 458227721
[84030.577188] quiet_error: 8 callbacks suppressed
[84030.577190] Buffer I/O error on device md0, logical block 458227720
[84031.233856] Buffer I/O error on device md0, logical block 458227721
[84031.907058] Buffer I/O error on device md0, logical block 458227722
[84032.534278] Buffer I/O error on device md0, logical block 458227723
[84033.186672] Buffer I/O error on device md0, logical block 458227724
[84033.847581] Buffer I/O error on device md0, logical block 458227725
[84034.453947] Buffer I/O error on device md0, logical block 458227726
[84035.073116] Buffer I/O error on device md0, logical block 458227727
[84068.605347] Buffer I/O error on device md0, logical block 458227712
[84068.605427] lost page write due to I/O error on md0
[84068.605439] Buffer I/O error on device md0, logical block 458227713
[84068.605519] lost page write due to I/O error on md0
[84068.605528] Buffer I/O error on device md0, logical block 458227714
[84068.605747] lost page write due to I/O error on md0
[84068.605757] Buffer I/O error on device md0, logical block 458227715
[84068.605828] lost page write due to I/O error on md0
[84068.605837] Buffer I/O error on device md0, logical block 458227716
[84068.605910] lost page write due to I/O error on md0
[84068.605919] Buffer I/O error on device md0, logical block 458227717
[84068.605995] lost page write due to I/O error on md0
[84068.606048] Buffer I/O error on device md0, logical block 458227718
[84068.606217] lost page write due to I/O error on md0
[84068.606227] Buffer I/O error on device md0, logical block 458227719
[84068.606295] lost page write due to I/O error on md0
[84068.606327] Buffer I/O error on device md0, logical block 458227720
[84068.606398] lost page write due to I/O error on md0
[84068.606407] Buffer I/O error on device md0, logical block 458227721
[84068.606471] lost page write due to I/O error on md0
Doing a resync brings no errors and finishes without problems:
[24406.670968] md: requested-resync of RAID array md0
[24406.670971] md: minimum _guaranteed_ speed: 1410065407 KB/sec/disk.
[24406.670973] md: using maximum available idle IO bandwidth (but not
more than 1410065407 KB/sec) for requested-resync.
[24406.670981] md: using 128k window, over a total of 976631296k.
[33488.135225] md: md0: requested-resync done.
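[Side note: the failing range can be mapped back to member disks by hand. The sketch below is illustrative only and rests on assumptions: md's default raid6 left-symmetric layout (P rotates backwards, Q follows P, data chunks start after Q, as in the kernel's raid5.c ALGORITHM_LEFT_SYMMETRIC), the array's 512 KiB chunk and 16 devices from the mdstat above, and ext4 block 0 sitting at array block 0.

```python
# Map an ext4 4 KiB block on /dev/md0 to (stripe, device role) for this
# array: 16-device RAID6, 512 KiB chunks, left-symmetric layout (assumed).
RAID_DISKS = 16
DATA_DISKS = RAID_DISKS - 2              # 14 data chunks per stripe
BLOCKS_PER_CHUNK = (512 * 1024) // 4096  # 128 fs blocks per 512 KiB chunk

def map_block(fs_block: int) -> tuple[int, int]:
    chunk = fs_block // BLOCKS_PER_CHUNK       # logical data chunk number
    stripe, i = divmod(chunk, DATA_DISKS)      # stripe, and data index in it
    pd = RAID_DISKS - 1 - (stripe % RAID_DISKS)  # P rotates backwards
    # Q sits right after P; data chunk i lands two slots past P (assumed).
    dev_role = (pd + 2 + i) % RAID_DISKS
    return stripe, dev_role

print(map_block(458227712))  # first failing block from the fsck output
```

If the layout assumption holds, the device roles this yields for the failing blocks can be cross-checked against the --examine output below to see which members' bad-block logs could explain the errors.]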
- doing:
root@nas3:/# debugfs /dev/md0
debugfs 1.42.10 (18-May-2014)
/dev/md0: Can't read a block bitmap while reading block bitmap
debugfs:
- This brings the same kind of errors in dmesg.
- The filesystem mounts and unmounts fine:
root@nas3:/# mount /dev/md0 /mnt
root@nas3:/# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 106G 5.3G 95G 6% /
tmpfs 3.9G 0 3.9G 0% /lib/init/rw
udev 3.9G 196K 3.9G 1% /dev
tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs 3.9G 0 3.9G 0% /tmp
/dev/md0 13T 4.3T 8.5T 34% /mnt
[84215.958792] EXT4-fs (md0): mounted filesystem with ordered data
mode. Opts: (null)
root@nas3:/# umount /mnt
mdadm --examine /dev/sd[bcdefghijklmnopqr] >> raid.status
/dev/sdb:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : b56fa722:c5be1eda:5b3e89cc:7199d266
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors - bad
blocks present.
Checksum : e8a1ec1f - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sdc:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : e72b076e:42886d45:8978e63b:b70c3c1b
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : c3171f37 - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sdd:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : a195ff09:a794b5fc:7c830670:bcf450f1
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 208c8851 - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sde:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : 6ab2bcfc:872649a6:a053e0fe:94fe1fc3
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors - bad
blocks present.
Checksum : 1d8610fd - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sdf:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : f4612be4:5e8b4db0:4e23f28d:e37d27b6
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors - bad
blocks present.
Checksum : 9112745e - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 4
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sdg:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : e595d71c:c45d6fda:24a49338:2615328b
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 738c92c6 - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 5
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sdh:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : 347fa638:4193adb2:4b8616d4:058fff18
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors - bad
blocks present.
Checksum : 90ea0da1 - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 6
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sdi:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : 2f6ab7cb:3957ffa0:8b2decd2:b133cb5a
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 52ee087a - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 7
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sdj:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : cd1cbc05:552bedbd:bf8f7be8:960afcd1
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 36a0c84e - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 8
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sdk:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : 4e352f48:398c4529:b39cd8c8:d5a14e7e
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors - bad
blocks present.
Checksum : 711be5ee - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 9
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sdl:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : 01e6c661:a4d8c466:84fd830c:dc3ec346
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors - bad
blocks present.
Checksum : d452e0ec - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 10
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sdm:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : aa22b86a:fb4effe6:8028a5ae:df01a2c2
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors - bad
blocks present.
Checksum : 7b7e81eb - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 11
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sdn:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : 8e0f1a50:50538cf7:c7553f75:22af1e8a
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors - bad
blocks present.
Checksum : ff844db0 - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 12
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sdo:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : ea496b92:ac96fabc:23b5026a:30b0b80f
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors - bad
blocks present.
Checksum : 81a12bd0 - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 13
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sdp:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : 01173faa:f45adebc:9a1dc160:306641a2
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 229fdb9c - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 14
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sdq:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : a7a6c77f:88c5d5d7:c330ab03:6cf98a83
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors - bad
blocks present.
Checksum : 97537c43 - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 15
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
________________________________________________________________________________
Message sent via the free AEIOU email service
http://www.aeiou.pt
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: strange problem with raid6 read errors on active non-degraded array
2014-07-02 9:32 strange problem with raid6 read errors on active non-degraded array Pedro Teixeira
@ 2014-07-02 9:52 ` Roman Mamedov
2014-07-02 10:07 ` Pedro Teixeira
2014-07-02 10:45 ` NeilBrown
1 sibling, 1 reply; 19+ messages in thread
From: Roman Mamedov @ 2014-07-02 9:52 UTC (permalink / raw)
To: Pedro Teixeira; +Cc: linux-raid
On Wed, 02 Jul 2014 10:32:41 +0100
Pedro Teixeira <finas@aeiou.pt> wrote:
> - I'm having the following problem on a raid6 md volume consisting of
> 16 1TB Seagate SSHDs ( using kernel 3.15.3 or 3.14.0 ) mdadm is 3.3.
>
> - Every time I run fsck.ext4 I get exactly the same errors (
> ...short read ). Forcing a repair on the md0 volume shows no errors
> and completes without problems. All disks are active and the volume is
> not degraded, yet I can't get rid of the short-read errors on those 16
> blocks, and when the filesystem is mounted the read errors come up
> from time to time, as those blocks are probably in use.
Are you sure that Ext4 in your kernel, and all tools that you use with it (such
as the fsck) really support 16 TB filesystems? I recall there have been some
semi-obvious problems with that. Try a different FS, e.g. XFS or Btrfs instead
of Ext4.
> - If I try to read those blocks with dd ( dd if=/dev/md0 of=test.txt
> seek=458227712 count=6 bs=4096 ) it instantly creates a 1.8T file,
> but the file doesn't appear to have anything in it ( and the file
> doesn't take up 1.8T on disk, as the disk is much smaller )
> root@nas3:/# dd if=/dev/md0 of=teste.txt seek=458227712 count=6 bs=4096
> 6+0 records in
> 6+0 records out
> 24576 bytes (25 kB) copied, 0.0019239 s, 12.8 MB/s
> root@nas3:/# ls -lah teste.txt
> -rw-r--r-- 1 root root 1.8T Jul 2 10:22 teste.txt
Here you need to use skip=, not seek=. See "man dd".
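[Editor's illustration of the skip=/seek= distinction on a scratch file, with made-up filenames; these are not commands for the array itself:

```shell
# seek= offsets the OUTPUT file: dd writes past a hole, so the result is a
# large, mostly-empty sparse file no matter what was read from the input.
# skip= offsets the INPUT: dd starts reading at the block you care about.
printf 'AAAABBBBCCCC' > src.bin
dd if=src.bin of=seeked.bin bs=4 count=1 seek=100 status=none   # 404-byte file: 400-byte hole + 1 block
dd if=src.bin of=skipped.bin bs=4 count=1 skip=1 status=none    # contains "BBBB"
wc -c < seeked.bin
cat skipped.bin
```
]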
--
With respect,
Roman
* Re: strange problem with raid6 read errors on active non-degraded array
2014-07-02 9:52 ` Roman Mamedov
@ 2014-07-02 10:07 ` Pedro Teixeira
2014-07-02 10:11 ` Roman Mamedov
0 siblings, 1 reply; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-02 10:07 UTC (permalink / raw)
To: Roman Mamedov; +Cc: linux-raid
Hi Roman,
Thanks for the reply and the correction on the "dd" command.
- ext4 is built into the kernel ( the fs wouldn't mount otherwise ) and
the tools are the latest ones ( e2fsprogs 1.42.10 )
root@nas3:/# fsck.ext4 -V
e2fsck 1.42.10 (18-May-2014)
Using EXT2FS Library version 1.42.10, 18-May-2014
- Doing the corrected "dd" command ( dd if=/dev/md0 of=teste.txt
skip=458227712 count=16 bs=4096 ) produces the same dmesg errors and a
0-byte file.
dd: reading `/dev/md0': Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000268007 s, 0.0 kB/s
[88623.524481] Buffer I/O error on device md0, logical block 458227712
- I'm sure this is not a filesystem problem, but something fishy
with md. As all disks are active and in sync, if one had bad
sectors md should reconstruct the data from the other disks, but
apparently it is not doing that.
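For what it's worth, a failing md0 logical block can be mapped, very roughly, to a stripe and data-chunk slot. This is a back-of-the-envelope sketch only: it assumes the 512k chunk and 16-disk RAID6 from the mdstat above (so 14 data chunks per stripe) and ignores the parity rotation of the layout, so the slot does not directly identify a physical disk.

```shell
# Rough mapping of an md0 4 KiB logical block to a RAID6 stripe/chunk.
# Assumes: 512 KiB chunk, 16 disks, 2 parity => 14 data chunks per stripe.
# Ignores parity rotation, so the slot is not a physical disk index.
BLOCK=458227712                    # failing 4 KiB block from dmesg
CHUNK_BLOCKS=$((512 / 4))          # 4 KiB blocks per 512 KiB chunk = 128
DATA_DISKS=14

CHUNK=$((BLOCK / CHUNK_BLOCKS))    # absolute data-chunk index
STRIPE=$((CHUNK / DATA_DISKS))     # stripe number
SLOT=$((CHUNK % DATA_DISKS))       # data slot within the stripe
echo "block $BLOCK -> stripe $STRIPE, data slot $SLOT"
# block 458227712 -> stripe 255707, data slot 6
```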
Cheers
Pedro
________________________________________________________________________________
Message sent via the free AEIOU email service
http://www.aeiou.pt
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: strange problem with raid6 read errors on active non-degraded array
2014-07-02 10:07 ` Pedro Teixeira
@ 2014-07-02 10:11 ` Roman Mamedov
2014-07-02 10:37 ` Pedro Teixeira
2014-07-02 11:03 ` Pedro Teixeira
0 siblings, 2 replies; 19+ messages in thread
From: Roman Mamedov @ 2014-07-02 10:11 UTC (permalink / raw)
To: Pedro Teixeira; +Cc: linux-raid
On Wed, 02 Jul 2014 11:07:13 +0100
Pedro Teixeira <finas@aeiou.pt> wrote:
> [88623.524481] Buffer I/O error on device md0, logical block 458227712
Ah, sorry, I had missed those messages quoted in the original mail. Then
of course it is not an FS issue.
--
With respect,
Roman
* Re: strange problem with raid6 read errors on active non-degraded array
2014-07-02 10:11 ` Roman Mamedov
@ 2014-07-02 10:37 ` Pedro Teixeira
2014-07-02 11:03 ` Pedro Teixeira
1 sibling, 0 replies; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-02 10:37 UTC (permalink / raw)
To: linux-raid
I also ran "mdadm --examine-badblocks /dev/sd[bcdefghijklmnopqr] >>
raid.b", and none of the bad blocks present on the disks are in the
range of the ones giving the read errors.
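One way to cross-check a failing sector against that output is a small awk filter over the "START for N sectors" lines. A sketch with hypothetical sample values; note that the bad-block log records offsets in each component device's own data space, not array sectors, so this is only a coarse comparison:

```shell
# Sample entries in the format of `mdadm --examine-badblocks` output
# (hypothetical values for illustration only).
cat > raid.b <<'EOF'
112269328 for 512 sectors
458227600 for 512 sectors
EOF

# Print any listed range that contains the given sector.
SECTOR=458227712
awk -v s="$SECTOR" '
    /for [0-9]+ sectors/ {
        start = $1 + 0; len = $3 + 0
        if (s >= start && s < start + len)
            print "sector " s " inside range starting at " start
    }' raid.b
```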
Also, is there a way to clear the badblocks list without destroying
the filesystem?
* Re: strange problem with raid6 read errors on active non-degraded array
2014-07-02 9:32 strange problem with raid6 read errors on active non-degraded array Pedro Teixeira
2014-07-02 9:52 ` Roman Mamedov
@ 2014-07-02 10:45 ` NeilBrown
2014-07-02 11:54 ` Pedro Teixeira
1 sibling, 1 reply; 19+ messages in thread
From: NeilBrown @ 2014-07-02 10:45 UTC (permalink / raw)
To: Pedro Teixeira; +Cc: linux-raid
On Wed, 02 Jul 2014 10:32:41 +0100 Pedro Teixeira <finas@aeiou.pt> wrote:
> - I'm having the following problem on a raid6 md volume consisting of
> 16 1TB Seagate SSHDs. ( using kernel 3.15.3 or 3.14.0 ) mdadm is 3.3.
>
> - every time I run fsck.ext4 I get the exact same errors (
> ...short read ). Forcing a repair on the md0 volume shows no errors
> and completes without problems. All disks are active and the volume is
> not degraded, yet I can't get rid of the short-read errors on those 16
> blocks, and when the filesystem is mounted the read errors come up
> from time to time, as those blocks are probably in use.
>
> - If I try to read those blocks with dd ( dd if=/dev/md0 of=test.txt
> seek=458227712 count=6 bs=4096 ) it will instantly create a 1.8T file
> but the file doesn't appear to have anything in it ( and the file
> doesn't take up 1.8T on disk, as the disk is much smaller )
>
> - this started happening after a three-disk failure. I
> recovered from that failure by recreating the array with the 13
> non-failed disks plus the last disk to fail ( the event counts
> didn't differ much ). I then re-added the other disks. The failed
> disks are all physically good; I tested them with HDAT2 and they
> have no read/write errors, so I reused them. I don't know why they
> failed, maybe some incompatibility between the SSHDs and the LSI
> HBA controller..
>
> root@nas3:/# dd if=/dev/md0 of=teste.txt seek=458227712 count=6 bs=4096
> 6+0 records in
> 6+0 records out
> 24576 bytes (25 kB) copied, 0.0019239 s, 12.8 MB/s
> root@nas3:/# ls -lah teste.txt
> -rw-r--r-- 1 root root 1.8T Jul 2 10:22 teste.txt
> root@nas3:/#
>
>
>
> root@nas3:/# cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid6 sde[0] sdq[15] sdp[14] sdo[17] sdn[19] sdm[16]
> sdl[18] sdk[9] sdj[8] sdi[7] sdh[6] sdg[5] sdf[4] sdb[3] sdd[2] sdc[1]
> 13672838144 blocks super 1.2 level 6, 512k chunk, algorithm 2
> [16/16] [UUUUUUUUUUUUUUUU]
>
> - When doing a fsck.ext4 of /dev/md0 it returns the following ( and I
> can do it over and over again with the exact same errors) :
>
> root@nas3:/# fsck.ext4 -f /dev/md0
> e2fsck 1.42.10 (18-May-2014)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Error reading block 458227712 (Attempt to read block from filesystem
> resulted in short read) while reading inode and block bitmaps. Ignore
> error<y>? yes
Can't possibly happen!
(Do worry, I say that a lot - I'm usually wrong).
What sort of computer? Particularly is it 32bit or 64bit?
Try using 'dd' to read a few meg at various offsets (1G, 2G, 4G, 6G, 8G, ....)
and find out if there is a pattern, where it can read and where it cannot.
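That probing can be scripted roughly as follows. A sketch only: on the real system one would set DEV=/dev/md0; here a sparse scratch file stands in so the commands are runnable anywhere, and the offset list is arbitrary.

```shell
# Probe a device at increasing offsets, reading 4 MiB at each, to map
# where reads succeed and where they fail with I/O errors.
# On the real system: DEV=/dev/md0 sh probe.sh
truncate -s 16G scratch.img        # sparse stand-in for the md device
DEV="${DEV:-scratch.img}"
for gib in 1 2 4 8; do
    # bs=1M with skip in MiB units, so skip=$((gib * 1024)) is gib GiB in.
    if dd if="$DEV" of=/dev/null bs=1M count=4 skip=$((gib * 1024)) 2>/dev/null
    then
        echo "offset ${gib}G: ok"
    else
        echo "offset ${gib}G: READ ERROR"
    fi
done
# On the sparse scratch file every offset prints "ok".
```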
NeilBrown
* Re: strange problem with raid6 read errors on active non-degraded array
2014-07-02 10:11 ` Roman Mamedov
2014-07-02 10:37 ` Pedro Teixeira
@ 2014-07-02 11:03 ` Pedro Teixeira
1 sibling, 0 replies; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-02 11:03 UTC (permalink / raw)
To: linux-raid
Hi Neil,
"
Can't possible happen!
(Do worry, I say that a lot - I'm usually wrong).
"
:)
- I'll simply do a "dd if=/dev/md0 of=/dev/null" and see what errors
show up. I will report back when it finishes.
- Debian squeeze x64, with a custom 3.15.3 kernel and mdadm 3.3. The md
volume was created with mdadm 3.3 and kernel 3.13 or 3.14, I think.
Cheers
Pedro
* Re: strange problem with raid6 read errors on active non-degraded array
2014-07-02 10:45 ` NeilBrown
@ 2014-07-02 11:54 ` Pedro Teixeira
[not found] ` <20140702152429.742a3e8ea8bd100f5b3bae1f@bbaw.de>
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-02 11:54 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
The CPU is a Phenom X6 with 8GB RAM. The controller is an LSI
9201-16i. The disks are Seagate SSHD ST1000DX001.
So I ran "dd if=/dev/md0 of=/dev/null bs=4096" and it failed in a
lot of places. I had to restart the command several times with the
skip parameter set to a couple of blocks past the last failing block.
It covered about 1.5TB of the volume's total 13TB.
The md volume didn't drop any drive while this was running.
dmesg showed:
[ 1678.478156] Buffer I/O error on device md0, logical block 196012546
[ 1678.478314] Buffer I/O error on device md0, logical block 196012547
[ 1678.478462] Buffer I/O error on device md0, logical block 196012548
[ 1678.478737] Buffer I/O error on device md0, logical block 196012549
[ 1678.479077] Buffer I/O error on device md0, logical block 196012550
[ 1678.479415] Buffer I/O error on device md0, logical block 196012551
[ 1678.479754] Buffer I/O error on device md0, logical block 196012552
[ 1678.480082] Buffer I/O error on device md0, logical block 196012553
[ 1678.480679] Buffer I/O error on device md0, logical block 196012630
[ 1678.480811] Buffer I/O error on device md0, logical block 196012758
[ 2305.139382] quiet_error: 369 callbacks suppressed
[ 2305.139385] Buffer I/O error on device md0, logical block 196012759
[ 2310.592687] Buffer I/O error on device md0, logical block 196012760
[ 2313.135470] Buffer I/O error on device md0, logical block 196012761
[ 2315.971196] Buffer I/O error on device md0, logical block 196012762
[ 2319.013647] Buffer I/O error on device md0, logical block 196012763
[ 2321.125008] Buffer I/O error on device md0, logical block 196012764
[ 2323.774654] Buffer I/O error on device md0, logical block 196012765
[ 2327.439527] Buffer I/O error on device md0, logical block 196012766
[ 2329.399068] Buffer I/O error on device md0, logical block 196012767
[ 2331.389823] Buffer I/O error on device md0, logical block 196012768
[ 2334.166786] Buffer I/O error on device md0, logical block 196012769
[ 2337.817145] Buffer I/O error on device md0, logical block 196012770
[ 2340.713005] Buffer I/O error on device md0, logical block 196012771
[ 2342.594948] Buffer I/O error on device md0, logical block 196012772
[ 2344.678599] Buffer I/O error on device md0, logical block 196012773
[ 2347.150423] Buffer I/O error on device md0, logical block 196012774
[ 2349.433777] Buffer I/O error on device md0, logical block 196012775
[ 2351.559728] Buffer I/O error on device md0, logical block 196012776
[ 2353.650886] Buffer I/O error on device md0, logical block 196012777
[ 2385.719365] Buffer I/O error on device md0, logical block 196012778
[ 2388.937566] Buffer I/O error on device md0, logical block 196012779
[ 2391.831046] Buffer I/O error on device md0, logical block 196012780
[ 2393.971170] Buffer I/O error on device md0, logical block 196012781
[ 2396.118172] Buffer I/O error on device md0, logical block 196012782
[ 2399.717491] Buffer I/O error on device md0, logical block 196012783
[ 2401.913373] Buffer I/O error on device md0, logical block 196012784
[ 2403.892253] Buffer I/O error on device md0, logical block 196012785
[ 2405.796383] Buffer I/O error on device md0, logical block 196012786
[ 2408.171017] Buffer I/O error on device md0, logical block 196012787
[ 2410.233107] Buffer I/O error on device md0, logical block 196012788
[ 2413.184341] Buffer I/O error on device md0, logical block 196012789
[ 2416.396825] Buffer I/O error on device md0, logical block 196012790
[ 2420.734772] Buffer I/O error on device md0, logical block 196012890
[ 2426.320297] Buffer I/O error on device md0, logical block 196013570
[ 2426.320397] Buffer I/O error on device md0, logical block 196013571
[ 2426.320504] Buffer I/O error on device md0, logical block 196013572
[ 2426.320595] Buffer I/O error on device md0, logical block 196013573
[ 2426.320686] Buffer I/O error on device md0, logical block 196013574
[ 2426.320778] Buffer I/O error on device md0, logical block 196013575
[ 2426.320877] Buffer I/O error on device md0, logical block 196013576
[ 2426.321024] Buffer I/O error on device md0, logical block 196013577
[ 2426.321193] Buffer I/O error on device md0, logical block 196013578
[ 2436.240507] quiet_error: 119 callbacks suppressed
[ 2436.240509] Buffer I/O error on device md0, logical block 196012900
[ 2440.078873] Buffer I/O error on device md0, logical block 196012910
[ 2442.323624] Buffer I/O error on device md0, logical block 196012920
[ 2445.852897] Buffer I/O error on device md0, logical block 196013570
[ 2454.009848] Buffer I/O error on device md0, logical block 196013570
[ 2456.810436] Buffer I/O error on device md0, logical block 196013570
[ 2461.672818] Buffer I/O error on device md0, logical block 196014336
[ 2461.672901] Buffer I/O error on device md0, logical block 196014464
[ 2461.672985] Buffer I/O error on device md0, logical block 196014337
[ 2461.673109] Buffer I/O error on device md0, logical block 196014465
[ 2461.695280] Buffer I/O error on device md0, logical block 196014592
[ 2461.695371] Buffer I/O error on device md0, logical block 196014720
[ 2461.695458] Buffer I/O error on device md0, logical block 196014593
[ 2461.695548] Buffer I/O error on device md0, logical block 196014721
[ 2461.695633] Buffer I/O error on device md0, logical block 196014336
[ 2465.937036] Buffer I/O error on device md0, logical block 196125442
[ 2538.797979] quiet_error: 252 callbacks suppressed
[ 2538.797982] Buffer I/O error on device md0, logical block 217780096
[ 2538.798084] Buffer I/O error on device md0, logical block 217780097
[ 2538.798163] Buffer I/O error on device md0, logical block 217780098
[ 2538.798240] Buffer I/O error on device md0, logical block 217780099
[ 2538.798321] Buffer I/O error on device md0, logical block 217780100
[ 2538.798404] Buffer I/O error on device md0, logical block 217780101
[ 2538.798486] Buffer I/O error on device md0, logical block 217780102
[ 2538.798569] Buffer I/O error on device md0, logical block 217780103
[ 2538.798681] Buffer I/O error on device md0, logical block 217780104
[ 2538.798812] Buffer I/O error on device md0, logical block 217780105
[ 2582.229715] quiet_error: 607 callbacks suppressed
[ 2582.229717] Buffer I/O error on device md0, logical block 217780106
[ 2584.667289] Buffer I/O error on device md0, logical block 217780107
[ 2590.211304] Buffer I/O error on device md0, logical block 228358304
[ 2590.211388] Buffer I/O error on device md0, logical block 228358432
[ 2590.211467] Buffer I/O error on device md0, logical block 228358560
[ 2590.211555] Buffer I/O error on device md0, logical block 228358305
[ 2590.211628] Buffer I/O error on device md0, logical block 228358433
[ 2590.211712] Buffer I/O error on device md0, logical block 228358561
[ 2590.211792] Buffer I/O error on device md0, logical block 228358306
[ 2590.211871] Buffer I/O error on device md0, logical block 228358434
[ 2590.211945] Buffer I/O error on device md0, logical block 228358562
[ 2590.212025] Buffer I/O error on device md0, logical block 228358307
[ 2652.455446] quiet_error: 375 callbacks suppressed
[ 2652.455449] Buffer I/O error on device md0, logical block 260370751
[ 2652.455541] Buffer I/O error on device md0, logical block 260370752
[ 2652.455618] Buffer I/O error on device md0, logical block 260370753
[ 2652.455694] Buffer I/O error on device md0, logical block 260370754
[ 2652.455779] Buffer I/O error on device md0, logical block 260370755
[ 2652.455853] Buffer I/O error on device md0, logical block 260370756
[ 2652.455930] Buffer I/O error on device md0, logical block 260370757
[ 2652.456003] Buffer I/O error on device md0, logical block 260370758
[ 2652.456090] Buffer I/O error on device md0, logical block 260370759
[ 2652.456166] Buffer I/O error on device md0, logical block 260370760
[ 2695.663954] quiet_error: 56 callbacks suppressed
[ 2695.663957] Buffer I/O error on device md0, logical block 262508480
[ 2695.664039] Buffer I/O error on device md0, logical block 262508608
[ 2695.664113] Buffer I/O error on device md0, logical block 262508736
[ 2695.664197] Buffer I/O error on device md0, logical block 262508481
[ 2695.664264] Buffer I/O error on device md0, logical block 262508609
[ 2695.664344] Buffer I/O error on device md0, logical block 262508737
[ 2695.664417] Buffer I/O error on device md0, logical block 262508482
[ 2695.664489] Buffer I/O error on device md0, logical block 262508610
[ 2695.664557] Buffer I/O error on device md0, logical block 262508738
[ 2695.664632] Buffer I/O error on device md0, logical block 262508483
[ 2980.623591] quiet_error: 312 callbacks suppressed
[ 2980.623595] Buffer I/O error on device md0, logical block 370515910
[ 2980.623676] Buffer I/O error on device md0, logical block 370516038
[ 2980.623761] Buffer I/O error on device md0, logical block 370515911
[ 2980.623828] Buffer I/O error on device md0, logical block 370516039
[ 2980.623903] Buffer I/O error on device md0, logical block 370515912
[ 2980.623970] Buffer I/O error on device md0, logical block 370516040
[ 2980.624046] Buffer I/O error on device md0, logical block 370515913
[ 2980.624119] Buffer I/O error on device md0, logical block 370516041
[ 2980.624191] Buffer I/O error on device md0, logical block 370515914
[ 2980.624262] Buffer I/O error on device md0, logical block 370516042
[ 3005.209442] quiet_error: 281 callbacks suppressed
[ 3005.209444] Buffer I/O error on device md0, logical block 370516043
[ 3010.575774] Buffer I/O error on device md0, logical block 372582176
[ 3010.575854] Buffer I/O error on device md0, logical block 372582304
[ 3010.575927] Buffer I/O error on device md0, logical block 372582432
[ 3010.576004] Buffer I/O error on device md0, logical block 372582177
[ 3010.576082] Buffer I/O error on device md0, logical block 372582305
[ 3010.576147] Buffer I/O error on device md0, logical block 372582433
[ 3010.576232] Buffer I/O error on device md0, logical block 372582178
[ 3010.576298] Buffer I/O error on device md0, logical block 372582306
[ 3010.576361] Buffer I/O error on device md0, logical block 372582434
[ 3024.205000] quiet_error: 472 callbacks suppressed
[ 3024.205003] Buffer I/O error on device md0, logical block 375180000
[ 3024.205082] Buffer I/O error on device md0, logical block 375180128
[ 3024.205154] Buffer I/O error on device md0, logical block 375180256
[ 3024.205229] Buffer I/O error on device md0, logical block 375180001
[ 3024.205308] Buffer I/O error on device md0, logical block 375180129
[ 3024.205374] Buffer I/O error on device md0, logical block 375180257
[ 3024.205441] Buffer I/O error on device md0, logical block 375180002
[ 3024.205509] Buffer I/O error on device md0, logical block 375180130
[ 3024.205581] Buffer I/O error on device md0, logical block 375180258
[ 3024.205655] Buffer I/O error on device md0, logical block 375180003
[ 3182.726623] quiet_error: 183 callbacks suppressed
[ 3182.726626] Buffer I/O error on device md0, logical block 434495873
[ 3182.726708] Buffer I/O error on device md0, logical block 434495874
[ 3182.726787] Buffer I/O error on device md0, logical block 434495875
[ 3182.726857] Buffer I/O error on device md0, logical block 434495876
[ 3182.726927] Buffer I/O error on device md0, logical block 434495877
[ 3182.727036] Buffer I/O error on device md0, logical block 434495878
[ 3182.727129] Buffer I/O error on device md0, logical block 434495879
[ 3182.727210] Buffer I/O error on device md0, logical block 434495880
[ 3182.727292] Buffer I/O error on device md0, logical block 434495881
[ 3182.727374] Buffer I/O error on device md0, logical block 434495882
[ 3201.149784] quiet_error: 118 callbacks suppressed
[ 3201.149786] Buffer I/O error on device md0, logical block 434495883
[ 3243.707353] Buffer I/O error on device md0, logical block 458225568
[ 3243.707439] Buffer I/O error on device md0, logical block 458225569
[ 3243.707526] Buffer I/O error on device md0, logical block 458225570
[ 3243.707600] Buffer I/O error on device md0, logical block 458225571
[ 3243.707675] Buffer I/O error on device md0, logical block 458225572
[ 3243.707748] Buffer I/O error on device md0, logical block 458225573
[ 3243.707825] Buffer I/O error on device md0, logical block 458225574
[ 3243.707903] Buffer I/O error on device md0, logical block 458225575
[ 3243.707975] Buffer I/O error on device md0, logical block 458225576
[ 3410.602968] quiet_error: 139 callbacks suppressed
[ 3410.602971] Buffer I/O error on device md0, logical block 490875483
[ 3410.603049] Buffer I/O error on device md0, logical block 490875611
[ 3410.603126] Buffer I/O error on device md0, logical block 490875484
[ 3410.603204] Buffer I/O error on device md0, logical block 490875612
[ 3410.603279] Buffer I/O error on device md0, logical block 490875485
[ 3410.603349] Buffer I/O error on device md0, logical block 490875613
[ 3410.603424] Buffer I/O error on device md0, logical block 490875486
[ 3410.603509] Buffer I/O error on device md0, logical block 490875614
[ 3410.603592] Buffer I/O error on device md0, logical block 490875487
[ 3410.603663] Buffer I/O error on device md0, logical block 490875615
Running "mdadm --examine-badblocks /dev/sd[bcdefghijklmnopqr] >>
raid.b" before and after the "dd" run showed no changes:
Bad-blocks on /dev/sdb:
112269328 for 512 sectors
112269840 for 512 sectors
112271376 for 512 sectors
112271888 for 512 sectors
112272400 for 512 sectors
112272912 for 512 sectors
112273424 for 512 sectors
112273936 for 512 sectors
112333840 for 512 sectors
112334352 for 512 sectors
112337680 for 128 sectors
130752768 for 512 sectors
130753280 for 512 sectors
130755840 for 512 sectors
130756352 for 512 sectors
130757120 for 384 sectors
149045752 for 512 sectors
149046264 for 512 sectors
212193536 for 512 sectors
212194048 for 512 sectors
248914952 for 512 sectors
248915464 for 512 sectors
262105344 for 512 sectors
262105856 for 512 sectors
273867480 for 512 sectors
273867992 for 512 sectors
Bad-blocks list is empty in /dev/sdc
Bad-blocks list is empty in /dev/sdd
Bad-blocks on /dev/sde:
114228480 for 512 sectors
114228992 for 512 sectors
Bad-blocks on /dev/sdf:
248545288 for 512 sectors
248545800 for 512 sectors
487421952 for 512 sectors
487422464 for 512 sectors
487422976 for 128 sectors
Bad-blocks list is empty in /dev/sdg
Bad-blocks on /dev/sdh:
280763096 for 512 sectors
280763608 for 512 sectors
Bad-blocks list is empty in /dev/sdi
Bad-blocks list is empty in /dev/sdj
Bad-blocks on /dev/sdk:
124707840 for 512 sectors
124708352 for 512 sectors
124708864 for 512 sectors
124709376 for 512 sectors
124712192 for 384 sectors
130771840 for 256 sectors
130803968 for 512 sectors
130804480 for 512 sectors
130808960 for 256 sectors
130852224 for 256 sectors
130852608 for 256 sectors
130853120 for 256 sectors
130859520 for 256 sectors
150267392 for 512 sectors
150267904 for 512 sectors
211985968 for 512 sectors
211986480 for 512 sectors
212037552 for 256 sectors
212051504 for 512 sectors
212052016 for 512 sectors
213166336 for 512 sectors
213166848 for 512 sectors
213167360 for 512 sectors
213167872 for 512 sectors
213177600 for 512 sectors
213178112 for 512 sectors
214650624 for 512 sectors
214651136 for 512 sectors
249476104 for 512 sectors
249476616 for 512 sectors
262317312 for 512 sectors
262317824 for 512 sectors
262318464 for 512 sectors
262318976 for 256 sectors
262321408 for 512 sectors
262321920 for 512 sectors
714478672 for 512 sectors
714479184 for 512 sectors
714754128 for 512 sectors
714754640 for 512 sectors
714755152 for 512 sectors
714755664 for 512 sectors
935584432 for 512 sectors
935584944 for 512 sectors
940173568 for 512 sectors
940174080 for 512 sectors
976792224 for 512 sectors
976792736 for 512 sectors
976793248 for 512 sectors
976793760 for 512 sectors
980668064 for 512 sectors
980668576 for 512 sectors
980669088 for 512 sectors
980669600 for 512 sectors
Bad-blocks on /dev/sdl:
112269328 for 512 sectors
112269840 for 512 sectors
112271376 for 512 sectors
112271376 for 512 sectors
112271888 for 512 sectors
112272400 for 512 sectors
112272912 for 512 sectors
112273424 for 512 sectors
112273936 for 512 sectors
112333840 for 512 sectors
112334352 for 512 sectors
112337680 for 128 sectors
114228480 for 512 sectors
114228992 for 512 sectors
124707840 for 512 sectors
124708352 for 512 sectors
124708864 for 512 sectors
124709376 for 512 sectors
124712192 for 384 sectors
130752768 for 512 sectors
130753280 for 512 sectors
130755840 for 512 sectors
130756352 for 512 sectors
130757120 for 384 sectors
130771840 for 256 sectors
130803968 for 512 sectors
130804480 for 512 sectors
130808960 for 256 sectors
130852224 for 256 sectors
130852608 for 256 sectors
130853120 for 256 sectors
130859520 for 256 sectors
149045752 for 512 sectors
149046264 for 512 sectors
150267392 for 512 sectors
150267904 for 512 sectors
211985968 for 512 sectors
211986480 for 512 sectors
211996592 for 128 sectors
212037552 for 256 sectors
212051504 for 512 sectors
212052016 for 512 sectors
212193536 for 512 sectors
212194048 for 512 sectors
213166336 for 512 sectors
213166848 for 512 sectors
213167360 for 512 sectors
213167872 for 512 sectors
213177600 for 512 sectors
213178112 for 512 sectors
214650624 for 512 sectors
214651136 for 512 sectors
248545288 for 512 sectors
248545800 for 512 sectors
248914952 for 512 sectors
248915464 for 512 sectors
249476104 for 512 sectors
249476616 for 512 sectors
262105344 for 512 sectors
262105856 for 512 sectors
262317312 for 512 sectors
262317824 for 512 sectors
262318464 for 512 sectors
262318976 for 256 sectors
262321408 for 512 sectors
262321920 for 512 sectors
273867480 for 512 sectors
273867992 for 512 sectors
280763096 for 512 sectors
280763608 for 512 sectors
487421952 for 512 sectors
487422464 for 512 sectors
487422976 for 128 sectors
714478672 for 512 sectors
714479184 for 512 sectors
714754128 for 512 sectors
714754640 for 512 sectors
714755152 for 512 sectors
714755664 for 512 sectors
935584432 for 512 sectors
935584944 for 512 sectors
940173568 for 512 sectors
940174080 for 512 sectors
976792224 for 512 sectors
976792736 for 512 sectors
976793248 for 512 sectors
976793760 for 512 sectors
980668064 for 512 sectors
980668576 for 512 sectors
980669088 for 512 sectors
980669600 for 512 sectors
Bad-blocks on /dev/sdm:
112269328 for 512 sectors
112269840 for 512 sectors
112271376 for 512 sectors
112271888 for 512 sectors
112272400 for 512 sectors
112272912 for 512 sectors
112273424 for 512 sectors
112273936 for 512 sectors
112333840 for 512 sectors
112334352 for 512 sectors
112337680 for 128 sectors
114228480 for 512 sectors
114228992 for 512 sectors
124707840 for 512 sectors
124708352 for 512 sectors
124708864 for 512 sectors
124709376 for 512 sectors
124712192 for 384 sectors
130752768 for 512 sectors
130753280 for 512 sectors
130755840 for 512 sectors
130756352 for 512 sectors
130757120 for 384 sectors
130771840 for 256 sectors
130803968 for 512 sectors
130804480 for 512 sectors
130808960 for 256 sectors
130852224 for 256 sectors
130852608 for 256 sectors
130853120 for 256 sectors
130859520 for 256 sectors
149045752 for 512 sectors
149046264 for 512 sectors
150267392 for 512 sectors
150267904 for 512 sectors
211985968 for 512 sectors
211986480 for 512 sectors
211996592 for 128 sectors
212037552 for 256 sectors
212051504 for 512 sectors
212052016 for 512 sectors
212193536 for 512 sectors
212194048 for 512 sectors
213166336 for 512 sectors
213166848 for 512 sectors
213167360 for 512 sectors
213167872 for 512 sectors
213177600 for 512 sectors
213178112 for 512 sectors
214650624 for 512 sectors
214651136 for 512 sectors
248545288 for 512 sectors
248545800 for 512 sectors
248914952 for 512 sectors
248915464 for 512 sectors
249476104 for 512 sectors
249476616 for 512 sectors
262105344 for 512 sectors
262105856 for 512 sectors
262317312 for 512 sectors
262317824 for 512 sectors
262318464 for 512 sectors
262318976 for 256 sectors
262321408 for 512 sectors
262321920 for 512 sectors
273867480 for 512 sectors
273867992 for 512 sectors
280763096 for 512 sectors
280763608 for 512 sectors
487421952 for 512 sectors
487422464 for 512 sectors
487422976 for 128 sectors
714478672 for 512 sectors
714479184 for 512 sectors
714754128 for 512 sectors
714754640 for 512 sectors
714755152 for 512 sectors
714755664 for 512 sectors
935584432 for 512 sectors
935584944 for 512 sectors
940173568 for 512 sectors
940174080 for 512 sectors
976792224 for 512 sectors
976792736 for 512 sectors
976793248 for 512 sectors
976793760 for 512 sectors
980668064 for 512 sectors
980668576 for 512 sectors
980669088 for 512 sectors
980669600 for 512 sectors
Bad-blocks on /dev/sdn:
112269328 for 512 sectors
112269840 for 512 sectors
112271376 for 512 sectors
112271888 for 512 sectors
112272400 for 512 sectors
112272912 for 512 sectors
112273424 for 512 sectors
112273936 for 512 sectors
112333840 for 512 sectors
112334352 for 512 sectors
112337680 for 128 sectors
114228480 for 512 sectors
114228992 for 512 sectors
124707840 for 512 sectors
124708352 for 512 sectors
124708864 for 512 sectors
124709376 for 512 sectors
124712192 for 384 sectors
130752768 for 512 sectors
130753280 for 512 sectors
130755840 for 512 sectors
130756352 for 512 sectors
130757120 for 384 sectors
130771840 for 256 sectors
130803968 for 512 sectors
130804480 for 512 sectors
130808960 for 256 sectors
130852224 for 256 sectors
130852608 for 256 sectors
130853120 for 256 sectors
130859520 for 256 sectors
149045752 for 512 sectors
149046264 for 512 sectors
150267392 for 512 sectors
150267904 for 512 sectors
211985968 for 512 sectors
211986480 for 512 sectors
211996592 for 128 sectors
212037552 for 256 sectors
212051504 for 512 sectors
212052016 for 512 sectors
212193536 for 512 sectors
212194048 for 512 sectors
213166336 for 512 sectors
213166848 for 512 sectors
213167360 for 512 sectors
213167872 for 512 sectors
213177600 for 512 sectors
213178112 for 512 sectors
214650624 for 512 sectors
214651136 for 512 sectors
248545288 for 512 sectors
248545800 for 512 sectors
248914952 for 512 sectors
248915464 for 512 sectors
249476104 for 512 sectors
249476616 for 512 sectors
262105344 for 512 sectors
262105856 for 512 sectors
262317312 for 512 sectors
262317824 for 512 sectors
262318464 for 512 sectors
262318976 for 256 sectors
262321408 for 512 sectors
262321920 for 512 sectors
273867480 for 512 sectors
273867992 for 512 sectors
280763096 for 512 sectors
280763608 for 512 sectors
487421952 for 512 sectors
487422464 for 512 sectors
487422976 for 128 sectors
714478672 for 512 sectors
714479184 for 512 sectors
714754128 for 512 sectors
714754640 for 512 sectors
714755152 for 512 sectors
714755664 for 512 sectors
935584432 for 512 sectors
935584944 for 512 sectors
940173568 for 512 sectors
940174080 for 512 sectors
976792224 for 512 sectors
976792736 for 512 sectors
976793248 for 512 sectors
976793760 for 512 sectors
980668064 for 512 sectors
980668576 for 512 sectors
980669088 for 512 sectors
980669600 for 512 sectors
Bad-blocks on /dev/sdo:
112269328 for 512 sectors
112269840 for 512 sectors
112271376 for 512 sectors
112271888 for 512 sectors
112272400 for 512 sectors
112272912 for 512 sectors
112273424 for 512 sectors
112273936 for 512 sectors
112333840 for 512 sectors
112334352 for 512 sectors
112337680 for 128 sectors
114228480 for 512 sectors
114228992 for 512 sectors
124707840 for 512 sectors
124708352 for 512 sectors
124708864 for 512 sectors
124709376 for 512 sectors
124712192 for 384 sectors
130752768 for 512 sectors
130753280 for 512 sectors
130755840 for 512 sectors
130756352 for 512 sectors
130757120 for 384 sectors
130771840 for 256 sectors
130803968 for 512 sectors
130804480 for 512 sectors
130808960 for 256 sectors
130852224 for 256 sectors
130852608 for 256 sectors
130853120 for 256 sectors
130859520 for 256 sectors
149045752 for 512 sectors
149046264 for 512 sectors
150267392 for 512 sectors
150267904 for 512 sectors
211985968 for 512 sectors
211986480 for 512 sectors
211996592 for 128 sectors
212037552 for 256 sectors
212051504 for 512 sectors
212052016 for 512 sectors
212193536 for 512 sectors
212194048 for 512 sectors
213166336 for 512 sectors
213166848 for 512 sectors
213167360 for 512 sectors
213167872 for 512 sectors
213177600 for 512 sectors
213178112 for 512 sectors
214650624 for 512 sectors
214651136 for 512 sectors
248545288 for 512 sectors
248545800 for 512 sectors
248914952 for 512 sectors
248915464 for 512 sectors
249476104 for 512 sectors
249476616 for 512 sectors
262105344 for 512 sectors
262105856 for 512 sectors
262317312 for 512 sectors
262317824 for 512 sectors
262318464 for 512 sectors
262318976 for 256 sectors
262321408 for 512 sectors
262321920 for 512 sectors
273867480 for 512 sectors
273867992 for 512 sectors
280763096 for 512 sectors
280763608 for 512 sectors
487421952 for 512 sectors
487422464 for 512 sectors
487422976 for 128 sectors
714478672 for 512 sectors
714479184 for 512 sectors
714754128 for 512 sectors
714754640 for 512 sectors
714755152 for 512 sectors
714755664 for 512 sectors
935584432 for 512 sectors
935584944 for 512 sectors
940173568 for 512 sectors
940174080 for 512 sectors
976792224 for 512 sectors
976792736 for 512 sectors
976793248 for 512 sectors
976793760 for 512 sectors
980668064 for 512 sectors
980668576 for 512 sectors
980669088 for 512 sectors
980669600 for 512 sectors
Bad-blocks list is empty in /dev/sdp
Bad-blocks on /dev/sdq:
211996592 for 128 sectors
________________________________________________________________________________
Message sent via the free AEIOU email service
http://www.aeiou.pt
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: strange problem with raid6 read errors on active non-degraded array
[not found] ` <20140702152429.742a3e8ea8bd100f5b3bae1f@bbaw.de>
@ 2014-07-02 14:14 ` Pedro Teixeira
2014-07-02 14:55 ` Lars Täuber
2014-07-02 16:35 ` Ethan Wilson
0 siblings, 2 replies; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-02 14:14 UTC (permalink / raw)
To: Lars Täuber; +Cc: linux-raid
Hi Lars,
the output of those commands:
root@nas3:/# cat /sys/block/sdb/queue/physical_block_size
4096
root@nas3:/# cat /sys/block/md0/queue/physical_block_size
4096
root@nas3:/#
The strange thing here is that dmesg is not polluted with SATA errors,
as is usual when a hard disk has bad sectors or some other
hardware problem. The only thing in dmesg that hints at why reading
the md volume fails comes from md itself.
Cheers
Pedro
Citando Lars Täuber
> Hi Pedro,
>
> maybe an issue with the logical/physical block size?
> What do these commands report:
>
> cat /sys/block/sdb/queue/physical_block_size
> cat /sys/block/md0/queue/physical_block_size
>
> Seagate says there are 4096 bytes/sector on these devices.
>
> Lars
* Re: strange problem with raid6 read errors on active non-degraded array
2014-07-02 14:14 ` Pedro Teixeira
@ 2014-07-02 14:55 ` Lars Täuber
2014-07-02 16:35 ` Ethan Wilson
1 sibling, 0 replies; 19+ messages in thread
From: Lars Täuber @ 2014-07-02 14:55 UTC (permalink / raw)
To: linux-raid
Hi Pedro,
Wed, 02 Jul 2014 15:14:06 +0100
Pedro Teixeira <finas@aeiou.pt> ==> Lars Täuber <taeuber@bbaw.de> :
> Hi Lars,
>
> the output of those commands:
>
> root@nas3:/# cat /sys/block/sdb/queue/physical_block_size
> 4096
> root@nas3:/# cat /sys/block/md0/queue/physical_block_size
> 4096
> root@nas3:/#
>
> The strange thing here is that dmesg is not polluted with SATA errors,
> as is usual when a hard disk has bad sectors or some other
> hardware problem. The only thing in dmesg that hints at why reading
> the md volume fails comes from md itself.
Maybe that's because the controller-drive combination doesn't fit.
Does the controller report any errors?
The LSI 9201-i16 compatibility list doesn't mention any 4k SATA drives.
Only three 4k SAS drives (Seagate, though) are listed as compatible.
Maybe that's the cause?
Good luck
Lars
> Cheers
> Pedro
>
>
> Citando Lars Täuber
> > Hi Pedro,
> >
> > maybe an issue with the logical/physical block size?
> > What do these commands report:
> >
> > cat /sys/block/sdb/queue/physical_block_size
> > cat /sys/block/md0/queue/physical_block_size
> >
> > Seagate says there are 4096 bytes/sector on these devices.
> >
> > Lars
>
>
>
--
Informationstechnologie
Berlin-Brandenburgische Akademie der Wissenschaften
Jägerstraße 22-23 10117 Berlin
Tel.: +49 30 20370-352 http://www.bbaw.de
* Re: strange problem with raid6 read errors on active non-degraded array
2014-07-02 14:14 ` Pedro Teixeira
2014-07-02 14:55 ` Lars Täuber
@ 2014-07-02 16:35 ` Ethan Wilson
[not found] ` <20140702192825.Horde.18y4TPYRo99TtE9JC9kSzUA@webmail.aeiou.pt>
1 sibling, 1 reply; 19+ messages in thread
From: Ethan Wilson @ 2014-07-02 16:35 UTC (permalink / raw)
To: Pedro Teixeira, Lars Täuber; +Cc: linux-raid
You have multiple bad-blocks lists (an MD feature) which are already full
of sectors. Those record earlier disk errors, stored in the MD
metadata (one list per drive).
MD will no longer try to read from such sectors, and during reads MD
will return an error to the upper layers immediately. This happens when
the stripe does not have enough good components to read from after
excluding the bad blocks: e.g. raid5 can tolerate at most one disk with
bad blocks in a stripe, so with bad blocks on two different disks in the
same stripe MD will return a read error immediately, without trying.
That's why in dmesg you are seeing read errors from MD but not from the
component devices.
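That gating rule can be written down as a toy model (an assumed simplification for illustration, not md's actual code; the function name is mine):

```python
# Toy model (assumption, not md's implementation): a stripe read can
# only be served if the number of members whose bad-block list covers
# that stripe does not exceed the array's redundancy.
def stripe_readable(bad_members: int, parity_disks: int) -> bool:
    """parity_disks is 1 for raid5, 2 for raid6."""
    return bad_members <= parity_disks

# raid5 tolerates one bad member per stripe, raid6 two:
print(stripe_readable(1, 1))  # True
print(stripe_readable(2, 1))  # False: raid5, two bad members
print(stripe_readable(2, 2))  # True:  raid6 can still reconstruct
print(stripe_readable(3, 2))  # False: the read fails immediately
```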
Now the question is how so many bad blocks could have been recorded on
your array. It seems very unlikely that so many of your disks are in
such bad shape; this might indicate an MD bug in the bad-blocks code.
I am thinking of some form of erroneous propagation of bad blocks:
e.g. writing to an area where an MD bad block exists could, instead of
clearing the bad block, have propagated it to the other disks in the
same stripe. Something like that.
See if you can check that writing to a bad block clears it. It will be
difficult to compute the correct offset to write to, though. You might
want to do some trial and error with dd together with blktrace. If you
can do that, you might also want to check that it behaves correctly when
writing something that does not align to 512b or 4k. Obviously this
test is destructive wrt your data in that location.
Another, easier test is to try to read with dd from a component device
itself. If MD has recorded a bad block there (even if it happened a long
time ago), the direct read with dd should also hit it, return an error
and stop, because bad blocks on the surface of disks do not heal by
themselves with time.
Another test is to read from md0 with dd in an area where you see that
only one disk has bad blocks (this probably requires some trial and
error with blktrace, because the offsets of md0 are not equal to the
offsets of the component devices). If MD works correctly, such a read
should "heal" the bad block: reconstruct the data from parity on the
other disks, then write over the bad block. The MD bad-block entry
should then disappear.
The last two tests I described should not be destructive, except in
case of MD bugs.
EW
On 02/07/2014 16:14, Pedro Teixeira wrote:
> Hi Lars,
>
> the output of those commands:
>
> root@nas3:/# cat /sys/block/sdb/queue/physical_block_size
> 4096
> root@nas3:/# cat /sys/block/md0/queue/physical_block_size
> 4096
> root@nas3:/#
>
> The strange thing here is that dmesg is not polluted with SATA errors,
> as is usual when a hard disk has bad sectors or some other
> hardware problem. The only thing in dmesg that hints at why reading
> the md volume fails comes from md itself.
>
> Cheers
> Pedro
>
>
> Citando Lars Täuber
>> Hi Pedro,
>>
>> maybe an issue with the logical/physical block size?
>> What do these commands report:
>>
>> cat /sys/block/sdb/queue/physical_block_size
>> cat /sys/block/md0/queue/physical_block_size
>>
>> Seagate says there are 4096 bytes/sector on these devices.
>>
>> Lars
>
>
>
* Re: strange problem with raid6 read errors on active non-degraded array
2014-07-02 11:54 ` Pedro Teixeira
[not found] ` <20140702152429.742a3e8ea8bd100f5b3bae1f@bbaw.de>
@ 2014-07-02 16:43 ` John Stoffel
[not found] ` <20140702193706.Horde.Q4yuGvYRo99TtFFSw8qw6-A@webmail.aeiou.pt>
2014-07-03 2:40 ` NeilBrown
2 siblings, 1 reply; 19+ messages in thread
From: John Stoffel @ 2014-07-02 16:43 UTC (permalink / raw)
To: Pedro Teixeira; +Cc: NeilBrown, linux-raid
>>>>> "Pedro" == Pedro Teixeira <finas@aeiou.pt> writes:
Pedro> cpu is a phenom x6, 8gb ram. controller is LSI 9201-i16. hdd's are
Pedro> seagate sshd ST1000DX001.
Pedro> So I ran "dd if=/dev/md0 of=/dev/null bs=4096" and it failed in
Pedro> a lot of places. I had to restart the command several times with the
Pedro> skip parameter set to a couple of blocks after the last block error.
Pedro> It ran for about 1.5TB of the total 13TB of the volume.
Pedro> The md volume didn't drop any drive when running this.
Can you destroy the filesystem and re-create the RAID6 from scratch by
any chance? Or can you maybe create a smaller array with only 6
devices to run some tests?
Can you provide more details on your ext4 filesystem using tune2fs?
Have you tried using XFS instead? Does the filesystem have a journal
or not? And does a full fsck run to completion?
Have you checked all the cables? Do you have RAID firmware on the LSI
card by any chance, or are they setup as JBOD? Could you have a too
small a power supply so you're seeing corruption on the system due to
low voltage on one of the 5V or 12V rails? Can you try powering half
the disks from another power supply as a test?
Do you have a graphics card in the system? If so, can you pull it and
run it headless, or maybe put in a less power hungry card?
John
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: strange problem with raid6 read errors on active non-degraded array
[not found] ` <20140702193706.Horde.Q4yuGvYRo99TtFFSw8qw6-A@webmail.aeiou.pt>
@ 2014-07-02 18:41 ` Pedro Teixeira
2014-07-02 19:01 ` John Stoffel
1 sibling, 0 replies; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-02 18:41 UTC (permalink / raw)
To: John Stoffel, NeilBrown, linux-raid
Hi John,
I can't destroy the fs at the moment.
The problem is not filesystem-related, as md throws an error when
reading with dd even when the filesystem is not mounted.
The controller is flashed with the latest P19 firmware in IT mode,
meaning that disks are passed through. No RAID or JBOD. The power
supply has a single 12V rail and a total output of 800W. The graphics
card is a PCIe 1x nvidia card.
I have a very similar machine that has the same case, the same power
supply, the same LSI controller in the same mode with the same
firmware, same OS, same kernel. The differences are the motherboard
(Z87 chipset), an i7 CPU, and the hard disks: 16x 4TB Seagate HDDs in
raid6, created exactly the same way as this one with mdadm 3.3. I have
no problems with it.
Cheers
Pedro
* Re: strange problem with raid6 read errors on active non-degraded array
[not found] ` <20140702193706.Horde.Q4yuGvYRo99TtFFSw8qw6-A@webmail.aeiou.pt>
2014-07-02 18:41 ` Pedro Teixeira
@ 2014-07-02 19:01 ` John Stoffel
1 sibling, 0 replies; 19+ messages in thread
From: John Stoffel @ 2014-07-02 19:01 UTC (permalink / raw)
To: Pedro Teixeira; +Cc: John Stoffel, NeilBrown, linux-raid
Pedro> I can't destroy the fs at the moment. The problem is not
Pedro> filesystem related as md throws an error when reading with dd
Pedro> when the filesystem is not mounted.
I hope you have backups of all this data, because I strongly suspect
you've run into either an MD coding problem, or you have the data
structures so confused that MD really needs to be re-built from
scratch.
Pedro> The controler is flashed with the latest P19 firmware IT mode,
Pedro> meaning that disks are "passed-though". No raid or jbod.
JBOD means Just a Bunch Of Disks, which is what you have, so good.
Pedro> Power supply has a singe 12v rail and total output of 800w.
Should be ok then.
Pedro> Graphics card is a pcie 1x nvidia card. I have a very similar
Pedro> machine, that has the same case, the same power supply, the
Pedro> same LSI controller in the same mode with the same firmware,
Pedro> same OS, same kernel. Differences are the motherboard Z87
Pedro> chipset and i7 cpu, and the hard disks are 16x 4TB seagate
Pedro> HDD's in raid6 created the exact same way as this one. I have
Pedro> no problems with it.
Hmm... so how did the system crash and lose the disk(s) in the first
place? Did the cables get knocked? Are they in a disk cage or hot-swap
bays? What kind of physical cabling are you using here?
The suggestion to use blktrace to examine how IO flows into the MD
device and then down into the various devices is a good one, but I
don't have any good suggestions on what to do here.
But in any case, I'll repeat this now. Backup your data, and
basically assume some of it is toast and needs to be restored or
re-created if at all possible. With all the errors you're showing,
there's bound to be major filesystem corruption and even undetected
corruption in some files on there. Not a good place to be.
Too bad you can't just copy the data off to the other machine with the
16 x 4Tb disks. That would give you a good chance to save your data.
John
* Re: strange problem with raid6 read errors on active non-degraded array
[not found] ` <20140702192825.Horde.18y4TPYRo99TtE9JC9kSzUA@webmail.aeiou.pt>
@ 2014-07-02 21:34 ` Ethan Wilson
0 siblings, 0 replies; 19+ messages in thread
From: Ethan Wilson @ 2014-07-02 21:34 UTC (permalink / raw)
To: Pedro Teixeira; +Cc: Lars Täuber, linux-raid
On 02/07/2014 20:28, Pedro Teixeira wrote:
>
> Hi Ethan,
>
> The thing here is that some of the bad blocks ( if not all ) that are
> giving read errors are not on the bad blocks list.
>
Are you sure? Please note that the offset is a complex topic: an
offset given by fsck is a sector offset in the md0 sense, while the
device bad-block list contains offsets in the device sense. To convert
one into the other you have to divide, or multiply, by the number of
data disks, approximately, handle the remainder manually, and also
consider the problem of the rotating parity. Not simple. Is this the
computation that you did?
> Specifically, the ones that show up when doing a fsck are not on any
> drive. For these sectors fsck tries to re-write them and md still
> throws an error but they are not added to the list.
>
Not "added" but "removed". Writing to a bad block should create valid
content so they should be removed from the list. If they don't then
indeed there is probably a bug in the MD code, see my previous post.
> I replaced sdm with a new disk. This was one that had a bunch of bad
> blocks reported by md, and after finishing the rebuild (with no
> errors at all) --examine-badblocks still gives me the exact same
> list of errors. I would expect that replacing the disk with a new
> one would clear the errors.
>
This is the correct behaviour by design.
The source disks did not have valid content at those positions, and
good data cannot be created from nothing, so the bad blocks are
replicated onto the new disk.
"Bad" here is more a synonym of "containing invalid data" than of
"unreadable surface".
> As I know the disks are good, is there any way of resetting the bad
> blocks list without destroying the filesystem?
>
That one I don't know, but doing so would probably not help to find
the bug.
Regards
EW
* Re: strange problem with raid6 read errors on active non-degraded array
2014-07-02 11:54 ` Pedro Teixeira
[not found] ` <20140702152429.742a3e8ea8bd100f5b3bae1f@bbaw.de>
2014-07-02 16:43 ` John Stoffel
@ 2014-07-03 2:40 ` NeilBrown
2014-07-03 8:29 ` Pedro Teixeira
` (2 more replies)
2 siblings, 3 replies; 19+ messages in thread
From: NeilBrown @ 2014-07-03 2:40 UTC (permalink / raw)
To: Pedro Teixeira; +Cc: linux-raid
On Wed, 02 Jul 2014 12:54:34 +0100 Pedro Teixeira <finas@aeiou.pt> wrote:
> cpu is a phenom x6, 8gb ram. controller is LSI 9201-i16. hdd's are
> seagate sshd ST1000DX001.
>
> So I ran "dd if=/dev/md0 of=/dev/null bs=4096" and it failed in
> a lot of places. I had to restart the command several times with the
> skip parameter set to a couple of blocks after the last block error.
> It ran for about 1.5TB of the total 13TB of the volume.
> The md volume didn't drop any drive when running this.
>
> dmesg showed:
>
> [ 1678.478156] Buffer I/O error on device md0, logical block 196012546
I love numbers, thanks.
The logical block size is 4096, or 8 sectors (1 sector is defined as 512
bytes), so this is at
196012546*8 == 1568100368 sectors into the array.
The array has a chunksize of 512K, or 1024 sectors so
196012546*8/1024 = 1531348.015625
gives us the chunk number, and the remaining fraction of a chunk.
The RAID6 has 16 devices, so there are 14 data chunks in each stripe, so to
find where the above chunk is stored we divide by 14
1531348/14 = 109382.0000
So that is chunk 109382 on the first device (though with rotating data,
it might not be the very first).
Add back in the fractional part, multiply by 1024 sectors per chunk, and add
the Data Offset,
109382.01562500*1024+262144 = 112269328
So it seems that sector 112269328 on some device is bad.
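Neil's arithmetic can be reproduced in a few lines (a sketch of the same calculation, using the values from this thread; the 262144-sector Data Offset and 14 data disks are taken from his figures, and as he notes, the rotating layout means the disk index is only approximate):

```python
# Map an md0 4K logical block to a component-device sector, following
# the steps above: blocks -> sectors -> chunk number -> per-disk chunk,
# then add back the within-chunk offset and the Data Offset.
def md_block_to_device_sector(logical_block, chunk_sectors=1024,
                              data_disks=14, data_offset=262144):
    sector = logical_block * 8  # 4096-byte blocks -> 512-byte sectors
    chunk, within = divmod(sector, chunk_sectors)
    disk_chunk, disk_index = divmod(chunk, data_disks)  # rotation ignored
    return disk_index, disk_chunk * chunk_sectors + within + data_offset

print(md_block_to_device_sector(196012546))  # -> (0, 112269328)
```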
> The command "mdadm --examine-badblocks /dev/sd[bcdefghijklmnopqr] >>
> raid.b" before and after running the "dd" command returned no changes:
>
I didn't notice the fact that the bad block logs were not empty before, sorry.
Anyway:...
>
> Bad-blocks on /dev/sdb:
> 112269328 for 512 sectors
Look at that - exactly the number I calculated. I love it when that works
out.
So the problem is exactly that some blocks are thought by md to be bad.
Blocks get recorded as bad (for raid6) when:
- a 'read' reported an error which could not be fixed, either
because the array was degraded so the data could not be recovered,
or because the attempt to write restored data failed
- when recovering a spare, if the data to be written cannot be found (due to
errors on other devices)
- when a 'write' request to a device fails
When your array had three failed devices, some reads and writes would have
failed. Maybe that caused the bad blocks to be recorded.
What sort of device failures were they? If the device became completely
inaccessible, then it would not have been possible to record the bad block
information.
Can you describe the sequence of events that lead to the three failures?
When you put the array back together, did you --create it, or --assemble
--force?
There isn't an easy way to remove the bad block list, as doing so is normally
asking for data corruption.
However it is probably justified in your case.
As it happens I included code in the kernel to make it possible to remove bad
blocks from the list - it was intended for testing only but I never removed
it.
If you run
sed 's/^/-/' /sys/block/md0/md/dev-sdq/bad_blocks |
while read; do
echo $a > /sys/block/md0/md/dev-sdq/bad_blocks
done
then it should clear all of the bad blocks recorded on sdq.
You should probably fail/remove the last two devices that you added to the
array before you do this, as they probably don't have properly up-to-date
information and doing this will cause corruption.
I probably need to think about better ways to handle the bad block lists.
NeilBrown
* Re: strange problem with raid6 read errors on active non-degraded array
2014-07-03 2:40 ` NeilBrown
@ 2014-07-03 8:29 ` Pedro Teixeira
2014-07-03 10:39 ` Pedro Teixeira
2014-07-03 21:06 ` Pedro Teixeira
2 siblings, 0 replies; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-03 8:29 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
Hi Neil,
Thanks for the very informative answer, that nailed it, and Ethan
was obviously onto it too!
I tried running the commands you posted and it gives me an error:
bb.sh
"
sed 's/^/-/' /sys/block/md0/md/dev-sdq/bad_blocks
while read; do
echo $a > /sys/block/md0/md/dev-sdq/bad_blocks
done
"
root@nas3:~# ./bb.sh
-211996592 128
./bb.sh: line 3: echo: write error: Invalid argument
"
Can you help me with this?
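[A likely cause of that "Invalid argument" error: the posted loop reads each line into the shell's default REPLY variable but echoes `$a`, which is unset, so an empty/malformed string is written to sysfs. A hedged sketch of a corrected loop, wrapped in a function of my own naming so the target path is explicit:]

```shell
# clear_badblocks: prefix each "start count" line with '-' and write it
# back to the same file; for the real sysfs bad_blocks entry this asks
# md to drop that range. Note `read -r line` / `echo "$line"`: the
# posted version read into REPLY but echoed an unset $a.
clear_badblocks() {
    bb="$1"
    sed 's/^/-/' "$bb" | while read -r line; do
        echo "$line" > "$bb"
    done
}

# usage against the real sysfs entry:
# clear_badblocks /sys/block/md0/md/dev-sdq/bad_blocks
```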
I will clear all the bad blocks on all the drives and force a
repair and see if some error shows up. If not, I will then fsck the
filesystem.
I'm not sure how the volume failed. One Friday morning (last
month) I checked the system and everything was OK (no dmesg errors
and mdstat reported all disks up). The next Monday I got a call telling
me that the volume was inaccessible. When I got back the next Thursday,
the machine had already been rebooted and the md0 volume had three
failed disks. I did a --examine and two of them were completely off in
terms of events relative to the non-failed disks; the other one was
much closer, but still a bit off. Not close enough to do an --assemble
--force, so I recreated the array with something like this:
"mdadm --create --assume-clean --level=6 --raid-devices=16
--name=nas3:Datastore --uuid=9e97c588:59135324:c7d3fdf6:e543bdc3
/dev/md0 /dev/sde /dev/sdc /dev/sdd /dev/sdb /dev/sdf /dev/sdg
/dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl missing /dev/sdn missing
/dev/sdp /dev/sdq". I think the last failed drive was sdl or sdn,
I can't remember.
Then I cleared the superblocks on the missing disks and re-added them.
Then I fsck'd the filesystem and started getting those errors. I have
since replaced them with new disks and tested the old ones, only
to find that they have no SMART errors reported (SMART is enabled in
the BIOS), and I also ran a read-write test on them and found them to
be OK.
I will rebuild this machine next weekend, or the one after that, to
try to rule out a hardware problem or issues with the cabling, but
I am inclined to say that maybe it's related to the SSHDs.
Cheers
Pedro
Citando NeilBrown <neilb@suse.de>:
> On Wed, 02 Jul 2014 12:54:34 +0100 Pedro Teixeira <finas@aeiou.pt> wrote:
>> cpu is a phenom x6, 8gb ram. controller is LSI 9201-i16. hdd's are
>> seagate sshd ST1000DX001.
>>
>> So I ran "dd if=/dev/md0 of=/dev/null bs=4096" and it failed in
>> a lot of places. I had to restart the command several times with the
>> skip parameter set to a couple of blocks after the last block error.
>> It ran for about 1.5TB of the total 13TB of the volume.
>>
>> dmesg showed:
>>
>> [ 1678.478156] Buffer I/O error on device md0, logical block 196012546
> I love numbers, thanks.
> The logical block size is 4096, or 8 sectors (1 sector is defined as 512
> bytes), so this is at
> 196012546*8 == 1568100368 sectors into the array.
>
> The array has a chunksize of 512K, or 1024 sectors so
> 196012546*8/1024 = 1531348.015625
>
> gives us the chunk number, and the remaining fraction of a chunk.
>
> The RAID6 has 16 devices, so there are 14 data chunks in each stripe, so to
> find where the above chunk is stored we divide by 14
>
> 1531348/14 = 109382.0000
>
> So that is chunk 109382 on the first device (though with rotating data,
> it might not be the very first).
>
> Add back in the fractional part, multiply by 1024 sectors per chunk, and add
> the Data Offset,
>
> 109382.01562500*1024+262144 = 112269328
>
> So it seems that sector 112269328 on some device is bad.
>> The command "mdadm --examine-badblocks /dev/sd[bcdefghijklmnopqr] >>
>> raid.b" before and after running the "dd" command returned no changes:
> I didn't notice the fact that the bad block logs were not empty
> before, sorry.
> Anyway:...
>> Bad-blocks on /dev/sdb:
>> 112269328 for 512 sectors
> Look at that - exactly the number I calculated. I love it when that works
> out.
>
> So the problem is exactly that some blocks are thought by md to be bad.
>
>
> Blocks get recorded as bad (for raid6) when:
>
> - a 'read' reported an error which could not be fixed, either
> because the array was degraded so the data could not be recovered,
> or because the attempt to write restored data failed
> - when recovering a spare, if the data to be written cannot be
> found (due to
> errors on other devices)
> - when a 'write' request to a device fails
>
> When your array had three failed devices, some reads and writes would have
> failed. Maybe that caused the bad blocks to be recorded.
> What sort of device failures were they? If the device became completely
> inaccessible, then it would not have been possible to record the bad block
> information.
>
> Can you describe the sequence of events that lead to the three failures?
> When you put the array back together, did you --create it, or --assemble
> --force?
>
> There isn't an easy way to remove the bad block list, as doing so
> is normally
> asking for data corruption.
> However it is probably justified in your case.
> As it happens I included code in the kernel to make it possible to
> remove bad
> blocks from the list - it was intended for testing only but I never removed
> it.
> If you run
> sed 's/^/-/' /sys/block/md0/md/dev-sdq/bad_blocks |
> while read; do
> echo $a > /sys/block/md0/md/dev-sdq/bad_blocks
> done
>
> then it should clear all of the bad blocks recorded on sdq.
> You should probably fail/remove the last two devices that you added to the
> array before you do this, as they probably don't have properly up-to-date
> information and doing this will cause corruption.
>
> I probably need to think about better ways to handle the bad block lists.
> NeilBrown
* Re: strange problem with raid6 read errors on active non-degraded array
2014-07-03 2:40 ` NeilBrown
2014-07-03 8:29 ` Pedro Teixeira
@ 2014-07-03 10:39 ` Pedro Teixeira
2014-07-03 21:06 ` Pedro Teixeira
2 siblings, 0 replies; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-03 10:39 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
I ended up understanding the command, but if I run it manually it
doesn't work: bad_blocks is cleared, but --examine-badblocks still
shows the entry, and after stopping/assembling the md volume the bad
block shows up again.
root@nas3:~# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
root@nas3:~# mdadm --assemble /dev/md0
mdadm: failed to get exclusive lock on mapfile - continue anyway...
mdadm: /dev/md0 has been started with 16 drives.
root@nas3:~# cat /sys/block/md0/md/dev-sdq/bad_blocks
211996592 128
root@nas3:~# echo "-211996592 128" > /sys/block/md0/md/dev-sdq/bad_blocks
root@nas3:~# cat /sys/block/md0/md/dev-sdq/bad_blocks
root@nas3:~# mdadm --examine-badblocks /dev/sdq
Bad-blocks on /dev/sdq:
211996592 for 128 sectors
root@nas3:~#
so "cat /sys/block/md0/md/dev-sdq/bad_blocks" shows now bad blocks,
but the --examine-badblocks still lists it.
* Re: strange problem with raid6 read errors on active non-degraded array
2014-07-03 2:40 ` NeilBrown
2014-07-03 8:29 ` Pedro Teixeira
2014-07-03 10:39 ` Pedro Teixeira
@ 2014-07-03 21:06 ` Pedro Teixeira
2 siblings, 0 replies; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-03 21:06 UTC (permalink / raw)
To: linux-raid
I was able to fix the volume and the filesystem!
- The command Neil posted didn't work, but I got the idea and made a
script that cleared the list for all disks. --examine-badblocks
still lists the bad blocks, and stopping and assembling the volume
again will populate the bad-block list again. Still, I cleared them
all again and issued a "repair" on the volume. I got a bunch of errors
from a couple of disks, mostly sdk and sdb, but the volume synced to
the end, and after stopping it and assembling it again there were no
bad blocks on any disk; --examine-badblocks also showed no bad blocks.
I have since replaced sdk and sdb, with no errors when syncing and no
errors in dmesg. After that I fsck'd the filesystem, and it's up and
running again. I will now replace the other two disks that exhibited
read errors when repairing the volume as soon as I get some
replacements.
Thanks all for the help!!!
As a suggestion, I would make md distinguish a read error that is
caused by no good stripe being available (due to the bad-block list)
from other read errors, to ease troubleshooting, and maybe implement a
way to clear the bad-block list from disks with mdadm (and maybe force
a resync of that stripe after the list is cleared).
Cheers
Pedro
Thread overview: 19+ messages
2014-07-02 9:32 strange problem with raid6 read errors on active non-degraded array Pedro Teixeira
2014-07-02 9:52 ` Roman Mamedov
2014-07-02 10:07 ` Pedro Teixeira
2014-07-02 10:11 ` Roman Mamedov
2014-07-02 10:37 ` Pedro Teixeira
2014-07-02 11:03 ` Pedro Teixeira
2014-07-02 10:45 ` NeilBrown
2014-07-02 11:54 ` Pedro Teixeira
[not found] ` <20140702152429.742a3e8ea8bd100f5b3bae1f@bbaw.de>
2014-07-02 14:14 ` Pedro Teixeira
2014-07-02 14:55 ` Lars Täuber
2014-07-02 16:35 ` Ethan Wilson
[not found] ` <20140702192825.Horde.18y4TPYRo99TtE9JC9kSzUA@webmail.aeiou.pt>
2014-07-02 21:34 ` Ethan Wilson
2014-07-02 16:43 ` John Stoffel
[not found] ` <20140702193706.Horde.Q4yuGvYRo99TtFFSw8qw6-A@webmail.aeiou.pt>
2014-07-02 18:41 ` Pedro Teixeira
2014-07-02 19:01 ` John Stoffel
2014-07-03 2:40 ` NeilBrown
2014-07-03 8:29 ` Pedro Teixeira
2014-07-03 10:39 ` Pedro Teixeira
2014-07-03 21:06 ` Pedro Teixeira