* Slow performance with Btrfs RAID 10 with a failed disk
@ 2019-11-27 8:36 Christopher Baines
2019-11-27 14:19 ` Austin S. Hemmelgarn
From: Christopher Baines @ 2019-11-27 8:36 UTC (permalink / raw)
To: linux-btrfs
Hey,
I'm using RAID 10, and one of the disks has recently failed [1]; I'm
seeing plenty of warnings and errors in the dmesg output [2].
What kind of performance should be expected from Btrfs when a disk has
failed? [3] At the moment, the system seems very slow. One contributing
factor may be that all the logging that Btrfs is generating is being
written to the btrfs filesystem that's degraded, probably causing more
log messages to be produced.
I guess that replacing the failed disk is the long-term solution to get
the filesystem back into proper operation, but is there anything else
that can be done to keep it operating until then?
Also, is there anything that can stop Btrfs from logging so much about
the failures, now that I know that a disk has failed?
Thanks,
Chris
1:
Nov 26 19:20:56 localhost vmunix: [5117520.484302] sd 0:1:0:5: [sdf] Unaligned partial completion (resid=52, sector_sz=512)
Nov 26 19:20:56 localhost vmunix: [5117520.525506] sd 0:1:0:5: [sdf] tag#360 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 26 19:20:56 localhost vmunix: [5117520.525525] sd 0:1:0:5: [sdf] Unaligned partial completion (resid=24384, sector_sz=512)
Nov 26 19:20:56 localhost vmunix: [5117520.566649] sd 0:1:0:5: [sdf] tag#360 Sense Key : Hardware Error [current]
Nov 26 19:20:57 localhost vmunix: [5117520.597829] sd 0:1:0:5: [sdf] tag#363 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 26 19:20:57 localhost vmunix: [5117520.637610] sd 0:1:0:5: [sdf] tag#360 Add. Sense: Logical unit failure
Nov 26 19:20:57 localhost vmunix: [5117520.668134] sd 0:1:0:5: [sdf] tag#363 Sense Key : Hardware Error [current]
Nov 26 19:20:57 localhost vmunix: [5117520.668136] sd 0:1:0:5: [sdf] tag#363 Add. Sense: Logical unit failure
Nov 26 19:20:58 localhost vmunix: [5117520.707347] sd 0:1:0:5: [sdf] tag#360 CDB: Write(10) 2a 00 46 86 12 00 00 00 80 00
Nov 26 19:20:58 localhost vmunix: [5117520.736962] sd 0:1:0:5: [sdf] tag#363 CDB: Write(10) 2a 00 47 1e 0e 00 00 02 00 00
Nov 26 19:20:58 localhost vmunix: [5117520.774569] print_req_error: critical target error, dev sdf, sector 1183191552 flags 100001
Nov 26 19:20:59 localhost vmunix: [5117520.774573] BTRFS error (device sda3): bdev /dev/sdf errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
Nov 26 19:20:59 localhost vmunix: [5117520.803740] print_req_error: critical target error, dev sdf, sector 1193152000 flags 4001
Nov 26 19:20:59 localhost vmunix: [5117520.803746] BTRFS error (device sda3): bdev /dev/sdf errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
Nov 26 19:20:59 localhost vmunix: [5117520.840559] sd 0:1:0:5: [sdf] Unaligned partial completion (resid=52, sector_sz=512)
Nov 26 19:20:59 localhost vmunix: [5117520.868966] BTRFS error (device sda3): bdev /dev/sdf errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
Nov 26 19:21:00 localhost vmunix: [5117520.869037] sd 0:1:0:5: [sdf] Unaligned partial completion (resid=52, sector_sz=512)
Nov 26 19:21:00 localhost vmunix: [5117520.869042] sd 0:1:0:5: [sdf] tag#385 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
2:
[5168107.359619] BTRFS error (device sda3): error writing primary super block to device 6
[5168107.932712] BTRFS warning (device sda3): lost page write due to IO error on /dev/sdf
[5168108.091827] BTRFS error (device sda3): error writing primary super block to device 6
[5168108.155217] BTRFS warning (device sda3): lost page write due to IO error on /dev/sdf
[5168108.288296] BTRFS error (device sda3): error writing primary super block to device 6
[5168108.972431] BTRFS warning (device sda3): lost page write due to IO error on /dev/sdf
[5168109.204083] BTRFS error (device sda3): error writing primary super block to device 6
[5168109.595413] btrfs_dev_stat_print_on_error: 296 callbacks suppressed
[5168109.595422] BTRFS error (device sda3): bdev /dev/sdf errs: wr 5071725, rd 408586, flush 0, corrupt 0, gen 0
[5168109.639670] BTRFS error (device sda3): bdev /dev/sdf errs: wr 5071726, rd 408586, flush 0, corrupt 0, gen 0
[5168109.664981] BTRFS error (device sda3): bdev /dev/sdf errs: wr 5071727, rd 408586, flush 0, corrupt 0, gen 0
[5168109.689197] BTRFS error (device sda3): bdev /dev/sdf errs: wr 5071728, rd 408586, flush 0, corrupt 0, gen 0
[5168109.728189] BTRFS error (device sda3): bdev /dev/sdf errs: wr 5071729, rd 408586, flush 0, corrupt 0, gen 0
[5168109.744894] BTRFS error (device sda3): bdev /dev/sdf errs: wr 5071730, rd 408586, flush 0, corrupt 0, gen 0
[5168109.755457] BTRFS error (device sda3): bdev /dev/sdf errs: wr 5071731, rd 408586, flush 0, corrupt 0, gen 0
[5168109.831763] BTRFS warning (device sda3): lost page write due to IO error on /dev/sdf
[5168109.848128] BTRFS error (device sda3): bdev /dev/sdf errs: wr 5071732, rd 408586, flush 0, corrupt 0, gen 0
[5168109.849445] BTRFS error (device sda3): bdev /dev/sdf errs: wr 5071733, rd 408586, flush 0, corrupt 0, gen 0
[5168109.917277] BTRFS error (device sda3): error writing primary super block to device 6
[5168109.941132] BTRFS error (device sda3): bdev /dev/sdf errs: wr 5071734, rd 408586, flush 0, corrupt 0, gen 0
[5168110.009785] BTRFS warning (device sda3): lost page write due to IO error on /dev/sdf
3:
Label: none  uuid: 620115c7-89c7-4d79-a0bb-4957057d9991
	Total devices 6 FS bytes used 1.08TiB
	devid    1 size 72.70GiB used 72.70GiB path /dev/sda3
	devid    2 size 72.70GiB used 72.70GiB path /dev/sdb3
	devid    3 size 931.48GiB used 555.73GiB path /dev/sdc
	devid    4 size 931.48GiB used 555.73GiB path /dev/sdd
	devid    5 size 931.48GiB used 555.73GiB path /dev/sde
	*** Some devices missing
* Re: Slow performance with Btrfs RAID 10 with a failed disk
2019-11-27 8:36 Slow performance with Btrfs RAID 10 with a failed disk Christopher Baines
@ 2019-11-27 14:19 ` Austin S. Hemmelgarn
2019-12-02 15:08 ` Christopher Baines
From: Austin S. Hemmelgarn @ 2019-11-27 14:19 UTC (permalink / raw)
To: Christopher Baines, linux-btrfs
On 2019-11-27 03:36, Christopher Baines wrote:
> Hey,
>
> I'm using RAID 10, and one of the disks has recently failed [1]; I'm
> seeing plenty of warnings and errors in the dmesg output [2].
>
> What kind of performance should be expected from Btrfs when a disk has
> failed? [3] At the moment, the system seems very slow. One contributing
> factor may be that all the logging that Btrfs is generating is being
> written to the btrfs filesystem that's degraded, probably causing more
> log messages to be produced.
>
> I guess that replacing the failed disk is the long-term solution to get
> the filesystem back into proper operation, but is there anything else
> that can be done to keep it operating until then?
>
> Also, is there anything that can stop Btrfs from logging so much about
> the failures, now that I know that a disk has failed?
You can solve both problems by replacing the disc, or if possible, just
removing it from the array. You should, in theory, be able to convert to
regular raid1 and then remove the failed disc, though it will likely
take a while. Given your output below, I'd actually drop /dev/sdb as
well, and look at replacing both with a single 1TB disc like your other
three.
The issue here is that BTRFS doesn't see the disc as failed, so it keeps
trying to access it. That's what's slowing things down (each access
attempt has to time out before Btrfs moves on) and why it's logging so
much (BTRFS logs every IO error it encounters, as it should).
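The steps I have in mind look roughly like this (a rough, untested
sketch; I'm assuming the filesystem is mounted at / and the failed disc
is devid 6, as your 'filesystem show' output suggests):

```shell
# Remount degraded so the filesystem stays writable with a device
# effectively missing.
mount -o remount,degraded /

# Convert both data and metadata from raid10 to raid1; raid1 only needs
# two copies per chunk, so the array can then run on fewer devices.
btrfs balance start -dconvert=raid1 -mconvert=raid1 /

# Once the balance finishes, drop the failed device from the array.
# Newer btrfs-progs accept a devid here; otherwise use the device path
# or the 'missing' keyword.
btrfs device remove 6 /
```

The balance is the part that will take a while, since it rewrites every
chunk on the remaining discs.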
>
> Thanks,
>
> Chris
>
>
> 1: [dmesg excerpt snipped]
>
> 2: [dmesg excerpt snipped]
>
> 3:
> Label: none  uuid: 620115c7-89c7-4d79-a0bb-4957057d9991
> 	Total devices 6 FS bytes used 1.08TiB
> 	devid    1 size 72.70GiB used 72.70GiB path /dev/sda3
> 	devid    2 size 72.70GiB used 72.70GiB path /dev/sdb3
> 	devid    3 size 931.48GiB used 555.73GiB path /dev/sdc
> 	devid    4 size 931.48GiB used 555.73GiB path /dev/sdd
> 	devid    5 size 931.48GiB used 555.73GiB path /dev/sde
> 	*** Some devices missing
>
* Re: Slow performance with Btrfs RAID 10 with a failed disk
2019-11-27 14:19 ` Austin S. Hemmelgarn
@ 2019-12-02 15:08 ` Christopher Baines
From: Christopher Baines @ 2019-12-02 15:08 UTC (permalink / raw)
To: Austin S. Hemmelgarn; +Cc: linux-btrfs
Austin S. Hemmelgarn <ahferroin7@gmail.com> writes:
> On 2019-11-27 03:36, Christopher Baines wrote:
>> [original message snipped]
>
> You can solve both problems by replacing the disc, or if possible,
> just removing it from the array. You should, in theory, be able to
> convert to regular raid1 and then remove the failed disc, though it
> will likely take a while. Given your output below, I'd actually drop
> /dev/sdb as well, and look at replacing both with a single 1TB disc
> like your other three.
>
> The issue here is that BTRFS doesn't see the disc as failed, so it
> keeps trying to access it. That's what's slowing things down (each
> access attempt has to time out before Btrfs moves on) and why it's
> logging so much (BTRFS logs every IO error it encounters, as it
> should).
Thanks for the tips :)
I've now remounted the filesystem with the degraded flag.
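For reference, the remount was along these lines (assuming / is the
Btrfs mount point):

```shell
# Remount with the degraded option so Btrfs tolerates the missing device.
sudo mount -o remount,degraded /
```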
However, I haven't managed to remove the disk from the array yet.
$ sudo btrfs filesystem show /
Label: none  uuid: 620115c7-89c7-4d79-a0bb-4957057d9991
	Total devices 6 FS bytes used 1.08TiB
	devid    1 size 72.70GiB used 72.70GiB path /dev/sda3
	devid    2 size 72.70GiB used 72.70GiB path /dev/sdb3
	devid    3 size 931.48GiB used 530.73GiB path /dev/sdc
	devid    4 size 931.48GiB used 530.73GiB path /dev/sdd
	devid    5 size 931.48GiB used 530.73GiB path /dev/sde
	*** Some devices missing
$ sudo btrfs device delete missing /
ERROR: error removing device 'missing': no missing devices found to remove
So, going by the output of the first command, Btrfs knows at some level
that a device is missing, but it won't delete the missing device.
Am I missing something?
Thanks,
Chris