From: Piergiorgio Sartor
Subject: Re: Does a "check" of a RAID6 actually read all disks in a stripe?
Date: Tue, 28 Apr 2020 17:40:13 +0200
Message-ID: <20200428154013.GA4633@lazy.lzy>
References: <18271293-9866-1381-d73e-e351bf9278fd@fnarfbargle.com>
In-Reply-To: <18271293-9866-1381-d73e-e351bf9278fd@fnarfbargle.com>
To: Brad Campbell
Cc: linux-raid@vger.kernel.org
List-Id: linux-raid.ids

On Tue, Apr 28, 2020 at 02:47:04PM +0800, Brad Campbell wrote:
> G'day all,
>
> I have a test server with some old disks I use for beating up on. Bear in
> mind the disks are old and dicey, which is *why* they live in a test
> server. I'm not after reliability; I'm more interested in finding corner
> cases.
>
> One disk has a persistent read error (pending sector). This can be
> identified easily with dd, on a specific-sector or whole-disk basis.
>
> The array has 9 2TB drives in a RAID6:
>
> md3 : active raid6 sdh[12] sdm[8] sdc[10] sde[6] sdj[9] sdk[4] sdl[11] sdg[13]
>       13673684416 blocks super 1.2 level 6, 64k chunk, algorithm 2 [9/8] [UU_UUUUUU]
>       bitmap: 0/15 pages [0KB], 65536KB chunk
>
> Ignore the missing disk; it's out right now being secure erased, but it
> was in for the tests.
>
> The read error is on sdj, about 23G into the disk:
>
> [Sun Apr 26 15:05:30 2020] sd 4:0:4:0: [sdj] tag#229 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
> [Sun Apr 26 15:05:30 2020] sd 4:0:4:0: [sdj] tag#229 Sense Key : 0x3 [current]
> [Sun Apr 26 15:05:30 2020] sd 4:0:4:0: [sdj] tag#229 ASC=0x11 ASCQ=0x0
> [Sun Apr 26 15:05:30 2020] sd 4:0:4:0: [sdj] tag#229 CDB: opcode=0x28 28 00 03 39 d8 08 00 20 00 00
> [Sun Apr 26 15:05:30 2020] blk_update_request: critical medium error, dev sdj, sector 54126096 op 0x0:(READ) flags 0x80700 phys_seg 37 prio class 0
>
> Trigger a "check":
>
> [Mon Apr 27 18:51:15 2020] md: data-check of RAID array md3
> [Tue Apr 28 03:42:21 2020] md: md3: data-check done.
>
> Just to be sure it's still there:
>
> [Tue Apr 28 14:13:33 2020] sd 4:0:4:0: [sdj] tag#100 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
> [Tue Apr 28 14:13:33 2020] sd 4:0:4:0: [sdj] tag#100 Sense Key : 0x3 [current]
> [Tue Apr 28 14:13:33 2020] sd 4:0:4:0: [sdj] tag#100 ASC=0x11 ASCQ=0x0
> [Tue Apr 28 14:13:33 2020] sd 4:0:4:0: [sdj] tag#100 CDB: opcode=0x28 28 00 03 39 e6 10 00 00 08 00
> [Tue Apr 28 14:13:33 2020] blk_update_request: critical medium error, dev sdj, sector 54126096 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>
> So I can read from the disk with dd and trigger a read error each and
> every time, but a RAID6 "check" appears to skip over it without
> triggering the read error.
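[For anyone wanting to reproduce this kind of probe, a direct read of the
logged sector could look like the sketch below. The device name and sector
number are assumptions taken from the kernel log above; iflag=direct
bypasses the page cache so the read actually hits the media. The command is
printed rather than executed here — drop the leading 'echo' to run it on a
real disk.]

```shell
# Re-read the sector reported by blk_update_request (assumed values:
# /dev/sdj and sector 54126096, both from the kernel log above).
DEV=/dev/sdj
SECTOR=54126096
# Printed, not executed; on a disk with a pending sector the real run
# should fail with an I/O error.
echo dd if="$DEV" of=/dev/null bs=512 skip="$SECTOR" count=8 iflag=direct
```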
>
> For completeness, the complete log:
>
> [Sun Apr 26 15:05:30 2020] sd 4:0:4:0: [sdj] tag#229 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
> [Sun Apr 26 15:05:30 2020] sd 4:0:4:0: [sdj] tag#229 Sense Key : 0x3 [current]
> [Sun Apr 26 15:05:30 2020] sd 4:0:4:0: [sdj] tag#229 ASC=0x11 ASCQ=0x0
> [Sun Apr 26 15:05:30 2020] sd 4:0:4:0: [sdj] tag#229 CDB: opcode=0x28 28 00 03 39 d8 08 00 20 00 00
> [Sun Apr 26 15:05:30 2020] blk_update_request: critical medium error, dev sdj, sector 54126096 op 0x0:(READ) flags 0x80700 phys_seg 37 prio class 0
> [Sun Apr 26 21:15:47 2020] sdd: sdd1 sdd2
> [Mon Apr 27 18:51:15 2020] md: data-check of RAID array md3
> [Tue Apr 28 03:42:21 2020] md: md3: data-check done.
> [Tue Apr 28 09:39:18 2020] md/raid:md3: Disk failure on sdi, disabling device.
> md/raid:md3: Operation continuing on 8 devices.
> [Tue Apr 28 14:13:33 2020] sd 4:0:4:0: [sdj] tag#100 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
> [Tue Apr 28 14:13:33 2020] sd 4:0:4:0: [sdj] tag#100 Sense Key : 0x3 [current]
> [Tue Apr 28 14:13:33 2020] sd 4:0:4:0: [sdj] tag#100 ASC=0x11 ASCQ=0x0
> [Tue Apr 28 14:13:33 2020] sd 4:0:4:0: [sdj] tag#100 CDB: opcode=0x28 28 00 03 39 e6 10 00 00 08 00
> [Tue Apr 28 14:13:33 2020] blk_update_request: critical medium error, dev sdj, sector 54126096 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> [Tue Apr 28 14:13:35 2020] sd 4:0:4:0: [sdj] tag#112 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
> [Tue Apr 28 14:13:35 2020] sd 4:0:4:0: [sdj] tag#112 Sense Key : 0x3 [current]
> [Tue Apr 28 14:13:35 2020] sd 4:0:4:0: [sdj] tag#112 ASC=0x11 ASCQ=0x0
> [Tue Apr 28 14:13:35 2020] sd 4:0:4:0: [sdj] tag#112 CDB: opcode=0x28 28 00 03 39 e6 10 00 00 08 00
> [Tue Apr 28 14:13:35 2020] blk_update_request: critical medium error, dev sdj, sector 54126096 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [Tue Apr 28 14:13:35 2020] Buffer I/O error on dev sdj, logical block 6765762, async page read
>
> Examine on the suspect disk:
>
> test:/home/brad# mdadm --examine /dev/sdj
> /dev/sdj:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x1
>      Array UUID : dbbca7b5:327751b1:895f8f11:443f6ecb
>            Name : test:3  (local to host test)
>   Creation Time : Wed Nov 29 10:46:21 2017
>      Raid Level : raid6
>    Raid Devices : 9
>
>  Avail Dev Size : 3906767024 (1862.89 GiB 2000.26 GB)
>      Array Size : 13673684416 (13040.24 GiB 14001.85 GB)
>   Used Dev Size : 3906766976 (1862.89 GiB 2000.26 GB)
>     Data Offset : 262144 sectors
>    Super Offset : 8 sectors
>    Unused Space : before=262056 sectors, after=48 sectors
>           State : clean
>     Device UUID : f1a39d9b:fe217c62:26b065e3:0f859afd
>
> Internal Bitmap : 8 sectors from superblock
>     Update Time : Tue Apr 28 09:39:23 2020
>   Bad Block Log : 512 entries available at offset 72 sectors
>        Checksum : cb44256b - correct
>          Events : 177156
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>     Device Role : Active device 5
>     Array State : AA.AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
>
> test:/home/brad# mdadm --detail /dev/md3
> /dev/md3:
>            Version : 1.2
>      Creation Time : Wed Nov 29 10:46:21 2017
>         Raid Level : raid6
>         Array Size : 13673684416 (13040.24 GiB 14001.85 GB)
>      Used Dev Size : 1953383488 (1862.89 GiB 2000.26 GB)
>       Raid Devices : 9
>      Total Devices : 8
>        Persistence : Superblock is persistent
>
>      Intent Bitmap : Internal
>
>        Update Time : Tue Apr 28 09:39:23 2020
>              State : clean, degraded
>     Active Devices : 8
>    Working Devices : 8
>     Failed Devices : 0
>      Spare Devices : 0
>
>             Layout : left-symmetric
>         Chunk Size : 64K
>
>               Name : test:3  (local to host test)
>               UUID : dbbca7b5:327751b1:895f8f11:443f6ecb
>             Events : 177156
>
>     Number   Major   Minor   RaidDevice State
>       12       8      112        0      active sync   /dev/sdh
>       13       8       96        1      active sync   /dev/sdg
>        4       0        0        4      removed
>       11       8      176        3      active sync   /dev/sdl
>        4       8      160        4      active sync   /dev/sdk
>        9       8      144        5      active sync   /dev/sdj
>        6       8       64        6      active sync   /dev/sde
>       10       8       32        7      active sync   /dev/sdc
>        8       8      192        8      active sync   /dev/sdm
>
> test:/home/brad#
> uname -a
> Linux test 5.4.11 #49 SMP Wed Jan 15 11:23:38 AWST 2020 x86_64 GNU/Linux
>
> So the read error is well into the array member, yet a "check" doesn't
> hit it. Does that sound right?
> These disks grow bad sectors not infrequently, and so a check quite
> often forces a repair on a block of 8 sectors, but it has persistently
> missed this one.

I suspect, but Neil or some expert should confirm or deny, that a
"check" on a RAID-6 uses only the P parity to verify stripe
consistency. If there are errors in the Q parity chunk, these will not
be found.

On the other hand, "raid6check" :-) uses both parities and, maybe, can
trigger the rebuild.

Hope this helps,

bye,

pg

>
> Regards,
> Brad

-- 

piergiorgio
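[To aim raid6check at the affected area, the failing sector can be mapped
to a per-member stripe index from the geometry in the --examine output
above. A minimal sketch: the bad sector, the 262144-sector data offset and
the 64K chunk size are all taken from the message; the raid6check argument
order (device, starting stripe, number of stripes) is my reading of its
man page and should be verified locally, so the command is printed rather
than executed.]

```shell
# Map the failing sector to the chunk/stripe index on this member.
# Assumed values, all from the output quoted above:
#   bad sector  : 54126096  (kernel log)
#   data offset : 262144 sectors  (mdadm --examine)
#   chunk size  : 64K = 128 sectors
BAD_SECTOR=54126096
DATA_OFFSET=262144
CHUNK_SECTORS=$((64 * 1024 / 512))

STRIPE=$(( (BAD_SECTOR - DATA_OFFSET) / CHUNK_SECTORS ))
echo "stripe $STRIPE"   # prints: stripe 420812

# Hypothetical raid6check invocation over that single stripe; printed
# here, not run. Check raid6check(8) for the exact argument order.
echo raid6check /dev/md3 "$STRIPE" 1
```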