Buffer I/O error... async page read

* Buffer I/O error... async page read
@ 2018-02-05 19:10 Liwei
  2018-02-06 13:55 ` Liwei
  0 siblings, 1 reply; 5+ messages in thread
From: Liwei @ 2018-02-05 19:10 UTC (permalink / raw)
  To: linux-raid

Hi list,

tl;dr: Array seems to be remembering bad blocks from recovered drive,
even though drive the image is on is fine. Is there a way to make
array forget the blocks? Is it safe?

    We had a raid6 array that went down because 2 drives went down and
1 drive encountered bad sectors.
    We managed to recover the 1 drive with bad sectors (we engaged a
recovery lab), and the remaining drives in the array report neither
pending nor re-allocated sectors (from smartctl).

    After re-integrating the (image of the) recovered drive with bad
sectors and starting the array in degraded mode, we realised we are
still unable to read from some sectors in the md device. I believe
they correspond to where the bad sectors were previously.

    When trying to read from said sectors, this comes up in dmesg:

[Feb 6 02:05] Buffer I/O error on dev dm-26, logical block 5166101891,
async page read
[  +0.000458] Buffer I/O error on dev dm-26, logical block 5166101891,
async page read
[ +13.297834] Buffer I/O error on dev dm-26, logical block 5166101891,
async page read
[  +0.000438] Buffer I/O error on dev dm-26, logical block 5166101891,
async page read
[Feb 6 02:06] Buffer I/O error on dev dm-26, logical block 5166101891,
async page read
[  +0.000390] Buffer I/O error on dev dm-26, logical block 5166101891,
async page read
[ +13.284550] Buffer I/O error on dev dm-26, logical block 5166102915,
async page read
[  +0.000448] Buffer I/O error on dev dm-26, logical block 5166102915,
async page read
[Feb 6 02:17] Buffer I/O error on dev dm-26, logical block 5166101891,
async page read
[  +0.000341] Buffer I/O error on dev dm-26, logical block 5166101891,
async page read
[Feb 6 02:24] Buffer I/O error on dev dm-26, logical block 5166118804,
async page read
[  +0.002417] Buffer I/O error on dev dm-26, logical block 5166118804,
async page read
[  +2.972446] Buffer I/O error on dev dm-26, logical block 5166118804,
async page read
[  +0.002172] Buffer I/O error on dev dm-26, logical block 5166118804,
async page read
[Feb 6 02:25] Buffer I/O error on dev dm-26, logical block 5166118804,
async page read
[  +0.002130] Buffer I/O error on dev dm-26, logical block 5166118804,
async page read

    However, I've checked smartctl and ran a pass of (read-only)
badblocks over the drives, all sectors are readable, there are no
pending sectors, and no reallocated sectors.

    So what is generating these buffer I/O errors?

    Also, upon investigating, I'm astonished to find a non-empty list when I do:
        /sys/block/md126/md/dev-*/bad_blocks

    Almost every drive in the array has a few entries. That's not
normal isn't it? My theory is that since these are consumer-grade SATA
drives, some odd read/write timeout must have occurred at some point,
causing md to think that the sectors are bad? Is there a way to make
md forget about these blocks? Is it safe to do so?

Warm regards,
Liwei

^ permalink raw reply	[flat|nested] 5+ messages in thread