linux-btrfs.vger.kernel.org archive mirror
From: Jonas Aaberg <cja@lithops.se>
To: Martin Raiber <martin@urbackup.org>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: On the issue of direct I/O and csum warnings
Date: Sat, 24 Jul 2021 08:30:37 +0200
Message-ID: <20210724083037.463fb0d5@poirot.localdomain>
In-Reply-To: <0102017ad4affb63-e12c8463-8971-4b1c-b271-ee880969fa5b-000000@eu-west-1.amazonses.com>

On Fri, 23 Jul 2021 18:45:40 +0000
Martin Raiber <martin@urbackup.org> wrote:

> On 23.07.2021 16:55 Jonas Aaberg wrote:
> > Hi,
> >
> > I use btrfs on dm-crypt. About two months ago, I started to get:
> >
> > --
> > BTRFS warning (device dm-0): csum failed root 257 ino 1068852 off 25690112 csum 0xa27faf9a expected csum 0x4c266278 mirror 1
> > BTRFS error (device dm-0): bdev /dev/mapper/disk0 errs: wr 0, rd 0, flush 0, corrupt 349, gen 0
> > --
> >
> > kind of warning/errors on my laptop. I went and bought a new NVMe
> > disk because I'm rather fond of my data, even though most of it is
> > backed up.
> >
> > A week later, I started to get the same kind of warning/error
> > message on my new NVMe disk. After half a day of memtest86 found no
> > memory errors, I gave up on my otherwise stable laptop and started
> > to use an old laptop that I've been too lazy to sell, while looking
> > out for a decent pre-owned newer laptop.
> >
> > Now I'm just about to install and move over to a newly bought
> > laptop, when today my old laptop started to show the same
> > warning/errors. My old laptop does not share a single part with the
> > laptop on which I previously got the "checksum failure" warnings.
> > Therefore I have a hard time believing that I've gotten the same
> > hardware failure twice.
> >
> > Then I found:
> > <https://btrfs.wiki.kernel.org/index.php/Gotchas> and "Direct I/O
> > and CRCs".
> >
> > Which I believe is what I've run into. One of the affected files is
> > a log file from syncthing on both computers.
> 
> I wouldn't be certain about the conclusion that it is the direct I/O
> csum issue. Are you sure syncthing is writing to logs via direct I/O?
> That would be bad e.g. because it disables btrfs compression and log
> files compress really well. So I'd say report additional information
> like kernel version (and if it is a vanilla kernel), how your btrfs
> is set up (metadata RAID1), etc.

No, I've not checked syncthing and its dependencies. But I'll do that.
Just to be sure we're talking about the same thing, "direct" means
O_DIRECT on syscall open()?
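
One way to check this on a running process, as a rough sketch only: on Linux, each /proc/<pid>/fdinfo/<fd> file has a "flags:" line with the descriptor's open flags in octal, so the O_DIRECT bit can be tested there. The function names below are made up for illustration, and the fallback value 0o40000 is an assumption that only holds on x86; other architectures define O_DIRECT differently.

```python
import os
from pathlib import Path

# os.O_DIRECT is only defined on Linux; the 0o40000 fallback is the
# x86 value and is an assumption, not valid on every architecture.
O_DIRECT = getattr(os, "O_DIRECT", 0o40000)


def parse_fdinfo_flags(text):
    """Return the open flags from the text of a /proc/<pid>/fdinfo/<fd> file."""
    for line in text.splitlines():
        if line.startswith("flags:"):
            # The kernel prints the flags as an octal number.
            return int(line.split()[1], 8)
    raise ValueError("no 'flags:' line found")


def direct_io_fds(pid):
    """Yield (fd, target) for each descriptor of pid that has O_DIRECT set."""
    proc = Path("/proc") / str(pid)
    for info in (proc / "fdinfo").iterdir():
        try:
            flags = parse_fdinfo_flags(info.read_text())
            target = os.readlink(proc / "fd" / info.name)
        except (OSError, ValueError):
            continue  # descriptor went away or is not readable
        if flags & O_DIRECT:
            yield int(info.name), target
```

Calling `list(direct_io_fds(pid))` with the syncthing PID (as the same user or root) would list any files it currently holds open with O_DIRECT. It only shows a snapshot, so a file that is opened and closed quickly could be missed.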

I use Arch Linux with its stock "linux-lts" kernel, which has been
on 5.10 since winter/spring. I'm sure that the last two checksum errors
occurred on 5.10.x, but I'm unsure about exactly which version.
Currently the computer runs 5.10.52, but it was after a system update
and a restart that I noticed the checksum error. So the checksum error
probably occurred on a previous kernel version in the 5.10 range.

Regarding mount options:

/dev/mapper/disk0 on / type btrfs
(rw,relatime,compress=zstd:3,ssd,space_cache,autodefrag,subvolid=256,subvol=/__current/ROOT)
/dev/mapper/disk0 on /home type btrfs
(rw,relatime,compress=zstd:3,ssd,space_cache,autodefrag,subvolid=257,subvol=/__current/home)
/dev/mapper/disk0 on /var/log type btrfs
(rw,relatime,compress=zstd:3,ssd,space_cache,autodefrag,subvolid=258,subvol=/__current/log)

No RAID. Just btrfs on dm-crypt.

The file with the faulty checksum is in the subvolume /__current/home.
(/home//jonas/.config/syncthing/index-v0.14.0.db/007197.log)

If I recall correctly, I corrected the checksum errors on the first
NVMe disk where they occurred. The second NVMe disk has been left as it
was when the error occurred, and the error is still present on my SSD.
So I can maybe get some history if needed.

Any more information that you would like to have?

> 
> > I have just one humble request, please do something about this
> > checksum error message. Just add printk with a link to:
> > <https://btrfs.wiki.kernel.org/index.php/Gotchas> and the issue of
> > "Direct I/O and CRCs".  
> The problem is nothing can be done without impacting performance and
> direct I/O is used for performance.
Understood. I was talking about making the message less alarming.

> IMO it should be disabled by
> default (i.e. it just pretends to do direct I/O like ZFSOnLinux) and
> be able to be enabled via mount option.
Sounds like a good idea.

> >
> > Maybe updating the wiki with:
> > `find <mountpoint> -inum <ino-number-from-warning-message>`
> > would be helpful as well.
> 
> btrfs inspect-internal inode-resolve
> <ino-number-from-warning-message> <fs>
> 
> is faster.
Thanks!
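
For reference, a small sketch of going from the warning line to that lookup command. The helper names are made up for illustration; the command layout is the one given above. The mount point /home is taken from my mount output, where subvolid=257 matches "root 257" in the warning. btrfs inode numbers are per subvolume, so the path has to be inside the right one.

```python
import re

# Matches the "csum failed" part of the kernel warning quoted earlier.
CSUM_RE = re.compile(
    r"csum failed root (?P<root>\d+) ino (?P<ino>\d+) off (?P<off>\d+)"
)


def parse_csum_warning(line):
    """Return root, ino and off as ints, or None if the line does not match."""
    m = CSUM_RE.search(line)
    if m is None:
        return None
    return {key: int(value) for key, value in m.groupdict().items()}


def inode_resolve_cmd(ino, path):
    """Build the argument list for 'btrfs inspect-internal inode-resolve'."""
    return ["btrfs", "inspect-internal", "inode-resolve", str(ino), path]


line = (
    "BTRFS warning (device dm-0): csum failed root 257 ino 1068852 off "
    "25690112 csum 0xa27faf9a expected csum 0x4c266278 mirror 1"
)
info = parse_csum_warning(line)
print(info)
print(" ".join(inode_resolve_cmd(info["ino"], "/home")))
```

The printed command can then be run by hand as root, or the list passed to subprocess.run().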

BR,
 Jonas Aaberg



Thread overview: 4+ messages
2021-07-23 14:55 On the issue of direct I/O and csum warnings Jonas Aaberg
2021-07-23 18:45 ` Martin Raiber
2021-07-24  6:30   ` Jonas Aaberg [this message]
2021-07-24  9:44     ` Martin Raiber
