* bug: btrfs device stats not showing raid1 errors
From: Chris Murphy @ 2021-09-20 16:37 UTC (permalink / raw)
  To: Btrfs BTRFS

https://bugzilla.redhat.com/show_bug.cgi?id=2005987

Various kernel messages like this:

[2634355.709564] BTRFS info (device sda3): read error corrected: ino
27902168 off 8773632 (dev /dev/sda3 sector 52960104)
[2634355.733898] BTRFS info (device sda3): read error corrected: ino
27902168 off 8749056 (dev /dev/sda3 sector 52960056)

And yet 'btrfs dev stats' does not show an increment in the tracked
statistics, in particular read_io_errs.
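
To compare the kernel log against the counters, one option is to count the
"read error corrected" messages per device. The following is only a minimal
sketch: the regular expression assumes the exact message format quoted above,
and the function name count_corrected_reads is made up for illustration.

```python
import re
from collections import Counter

# Assumed message format, taken from the two log lines quoted above.
CORRECTED = re.compile(
    r"BTRFS info \(device (?P<fs>\S+)\): read error corrected: "
    r"ino (?P<ino>\d+) off (?P<off>\d+) "
    r"\(dev (?P<dev>\S+) sector (?P<sector>\d+)\)"
)


def count_corrected_reads(log_text: str) -> Counter:
    """Count 'read error corrected' messages per device path."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return Counter(m.group("dev") for m in CORRECTED.finditer(flat))


SAMPLE = """\
[2634355.709564] BTRFS info (device sda3): read error corrected: ino
27902168 off 8773632 (dev /dev/sda3 sector 52960104)
[2634355.733898] BTRFS info (device sda3): read error corrected: ino
27902168 off 8749056 (dev /dev/sda3 sector 52960056)
"""

if __name__ == "__main__":
    for dev, n in count_corrected_reads(SAMPLE).items():
        print(f"{dev}: {n} corrected read error(s)")
```

A non-zero count here next to an unchanged read_io_errs value is the
mismatch this report is about.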

This does seem like suboptimal behavior. We discussed it a bit on IRC
today, and Zygo found that the behavior was introduced by commit
0cc068e6ee59 ("btrfs: don't report readahead errors and don't update
statistics").

Zygo on IRC writes:
readahead errors are things like "out of memory" or device-mapper nonsense
so the best is 'don't correct and don't count'
since there's probably nothing wrong with the underlying media
but if there is something wrong with the underlying media, we want a
proper read, correct, and count to happen
which means we can safely do nothing during readahead
so the right answer is don't correct and don't count
---

I'm not sure how noisy it would be to always report read errors
discovered during readahead, but my gut instinct is that any time
there's a read error, whether physical or virtual, we probably want to
know about it. If these are bogus errors, then that suggests (a) do
not increment the dev stats counter, and also (b) do not fix up.
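
The options discussed above can be written down as a toy model. This is a
sketch for comparing the policies only, not kernel code: the names Policy,
Outcome and handle_read_error are hypothetical, and it assumes that a failed
regular (non-readahead) read on raid1 is both repaired from the good copy
and counted.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    """How a failed readahead read could be handled (illustrative only)."""
    REPAIR_ONLY = "repair, do not count"       # what the log and stats above suggest
    IGNORE = "do not repair, do not count"     # Zygo's suggestion
    REPAIR_AND_COUNT = "repair and count"      # treat it like any other read error


@dataclass
class Outcome:
    repaired: bool  # a "read error corrected" style fix-up happens
    counted: bool   # read_io_errs is incremented


def handle_read_error(is_readahead: bool, readahead_policy: Policy) -> Outcome:
    """Return what happens to one failed read under the given policy."""
    if not is_readahead:
        # Assumption: regular reads are always repaired and counted.
        return Outcome(repaired=True, counted=True)
    if readahead_policy is Policy.IGNORE:
        return Outcome(repaired=False, counted=False)
    if readahead_policy is Policy.REPAIR_ONLY:
        return Outcome(repaired=True, counted=False)
    return Outcome(repaired=True, counted=True)


if __name__ == "__main__":
    for policy in Policy:
        print(policy.name, handle_read_error(True, policy))
```

Under IGNORE, a real media problem is expected to surface again on the later
regular read, which then takes the repair-and-count path. REPAIR_ONLY is the
only combination where a fix-up happens without leaving a trace in the
counters.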




-- 
Chris Murphy


* Re: bug: btrfs device stats not showing raid1 errors
From: waxhead @ 2021-09-20 19:42 UTC (permalink / raw)
  To: Chris Murphy, Btrfs BTRFS

Chris Murphy wrote:
> https://bugzilla.redhat.com/show_bug.cgi?id=2005987
> 
> Various kernel messages like this:
> 
> [2634355.709564] BTRFS info (device sda3): read error corrected: ino
> 27902168 off 8773632 (dev /dev/sda3 sector 52960104)
> [2634355.733898] BTRFS info (device sda3): read error corrected: ino
> 27902168 off 8749056 (dev /dev/sda3 sector 52960056)
> 
> And yet 'btrfs dev stats' does not show an increment in tracked
> statistics, in particular read_io_errs
> 
This is extremely confusing for me as well, and I am just a BTRFS user...
I am a BTRFS "enthusiast", if there is such a thing, and if this seems
wrong to me (regardless of whether it actually is wrong), imagine the
frustration and confusion for those who are not that into filesystems.

> This does seem like suboptimal behavior.  Discussed a bit on IRC today
> and Zygo found the behavior is introduced with commit 0cc068e6ee59
> btrfs: don't report readahead errors and don't update statistics
> 
> Zygo on IRC writes:
> readahead errors are things like "out of memory" or device-mapper nonsense
> so the best is 'don't correct and don't count'
> since there's probably nothing wrong with the underlying media
> but if there is something wrong with the underlying media, we want a
> proper read, correct, and count to happen
> which means we can safely do nothing during readahead
> so the right answer is don't correct and don't count
> ---
> 
> I'm not sure how noisy it could be to always report such read errors
> discovered during read ahead, but my gut instinct is that anytime
> there's a read error whether physical or virtual, we probably want to
> know about this? If these are bogus errors then that suggests (a) do
> not increment the dev stats counter, and also (b) do not fix up.
> 
...And in case someone clears this up: please consider a table output
option like the one in btrfs fi us -T /mnt, e.g. btrfs de st -T /mnt,
that outputs something like

Device stat ErrWrite ErrRead ErrFlush ErrCorrupt ErrGen
----------- -------- ------- -------- ---------- ------
/dev/sdb1          0       1        2          0      3
/dev/sdt1          0       2        3          0      4
/dev/sdr1          0       3        4          0      6
/dev/sdf1          0       4        5          0      7
/dev/sds1          0       5        6          0      8

instead of, or in addition to...

[/dev/sdb1].write_io_errs    0
[/dev/sdb1].read_io_errs     0
[/dev/sdb1].flush_io_errs    0
[/dev/sdb1].corruption_errs  0
[/dev/sdb1].generation_errs  0
[/dev/sdt1].write_io_errs    0
[/dev/sdt1].read_io_errs     0
[/dev/sdt1].flush_io_errs    0
[/dev/sdt1].corruption_errs  0
[/dev/sdt1].generation_errs  0
...etc...

The current list repeats the device name on every line and takes up an
awful lot of space if you have plenty of storage devices. I have 18 hard
drives in a BTRFS pool and the btrfs de st list is annoyingly long...

A table would be nice, or simply SKIP printing the lines where the stat
counter == 0, as these are simply not needed.
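
Until something like that exists in the tool, the existing output can be
reshaped with a small script. This is a rough sketch that only assumes the
"[device].counter  value" line format shown above; the function names and
the sample values are made up for illustration.

```python
import re

# One line of the current output, e.g. "[/dev/sdb1].read_io_errs     0".
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)$")


def parse_dev_stats(text):
    """Parse the per-line output into {device: {counter_name: value}}."""
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line.strip())
        if m:
            stats.setdefault(m.group("dev"), {})[m.group("name")] = int(m.group("value"))
    return stats


def format_table(stats, skip_clean=False):
    """Render one row per device; optionally drop devices with all-zero counters."""
    columns = []
    for counters in stats.values():
        for name in counters:
            if name not in columns:
                columns.append(name)
    rows = [(dev, c) for dev, c in stats.items()
            if not (skip_clean and not any(c.values()))]
    dev_width = max([len("Device")] + [len(dev) for dev, _ in rows])
    lines = ["Device".ljust(dev_width) + "".join("  " + name for name in columns)]
    for dev, c in rows:
        # Right-align each value under its column name.
        cells = "".join("  " + str(c.get(name, 0)).rjust(len(name)) for name in columns)
        lines.append(dev.ljust(dev_width) + cells)
    return "\n".join(lines)


# Hypothetical sample input, not real measurements.
SAMPLE = """\
[/dev/sdb1].write_io_errs    0
[/dev/sdb1].read_io_errs     1
[/dev/sdb1].flush_io_errs    0
[/dev/sdb1].corruption_errs  0
[/dev/sdb1].generation_errs  0
[/dev/sdt1].write_io_errs    0
[/dev/sdt1].read_io_errs     0
[/dev/sdt1].flush_io_errs    0
[/dev/sdt1].corruption_errs  0
[/dev/sdt1].generation_errs  0
"""

if __name__ == "__main__":
    print(format_table(parse_dev_stats(SAMPLE), skip_clean=True))
```

With skip_clean=True only devices that have at least one non-zero counter
are printed, which covers the "skip the zeros" variant as well.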



* Re: bug: btrfs device stats not showing raid1 errors
From: Chris Murphy @ 2021-09-21 19:57 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

On Mon, Sep 20, 2021 at 10:37 AM Chris Murphy <lists@colorremedies.com> wrote:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=2005987

The downstream bug has been updated. The problem does seem to be with
the drive itself failing reads.

[2634355.708201] sd 2:0:0:0: [sda] tag#0 FAILED Result:
hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=30s
[2634355.708209] sd 2:0:0:0: [sda] tag#0 Sense Key : Illegal Request [current]
[2634355.708214] sd 2:0:0:0: [sda] tag#0 Add. Sense: Unaligned write command

I think it's fair to say that, readahead or not, it's a read I/O error
that Btrfs probably ought to track, even though there are also some
pretty obvious hardware issues occurring with the drive.


-- 
Chris Murphy
