From: Martin Raiber <martin@urbackup.org>
To: Jonas Aaberg <cja@lithops.se>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: On the issue of direct I/O and csum warnings
Date: Sat, 24 Jul 2021 09:44:33 +0000
Message-ID: <0102017ad7e6ee78-b824b182-71e2-4f18-ba52-931711706630-000000@eu-west-1.amazonses.com>
In-Reply-To: <20210724083037.463fb0d5@poirot.localdomain>

On 24.07.2021 08:30 Jonas Aaberg wrote:
> On Fri, 23 Jul 2021 18:45:40 +0000
> Martin Raiber <martin@urbackup.org> wrote:
>
>> On 23.07.2021 16:55 Jonas Aaberg wrote:
>>> Hi,
>>>
>>> I use btrfs on dm-crypt. About two months ago, I started to get:
>>>
>>> --
>>> BTRFS warning (device dm-0): csum failed root 257 ino 1068852 off 25690112 csum 0xa27faf9a expected csum 0x4c266278 mirror 1
>>> BTRFS error (device dm-0): bdev /dev/mapper/disk0 errs: wr 0, rd 0, flush 0, corrupt 349, gen 0
>>> --
>>>
>>> kind of warning/errors on my laptop. I went and bought a new NVMe disk
>>> because I'm rather fond of my data, even though most of it is backed
>>> up.
>>>
>>> A week later, I started to get the same kind of warning/error
>>> message on my new NVMe. After half a day of memtest86 found no
>>> memory errors, I gave up on my otherwise stable laptop and
>>> started to use an old laptop that I've been too lazy to sell instead
>>> while looking out for a decent pre-owned newer laptop.
>>>
>>> Now I'm just about to install and move over to a newly bought
>>> laptop, when today my old laptop started to show the same
>>> warning/errors. My old laptop does not share a single part with the
>>> laptop on which I previously got the "checksum failure" warnings.
>>> Therefore I have a hard time believing that I've gotten the same
>>> hardware failure twice.
>>>
>>> Then I found:
>>> <https://btrfs.wiki.kernel.org/index.php/Gotchas> and "Direct I/O
>>> and CRCs".
>>>
>>> Which I believe is what I've run into. One of the affected files is
>>> a log file from syncthing on both computers.
>> I wouldn't be certain about the conclusion that it is the direct I/O
>> csum issue. Are you sure syncthing is writing to logs via direct I/O?
>> That would be bad e.g. because it disables btrfs compression and log
>> files compress really well. So I'd say report additional information
>> like kernel version (and if it is a vanilla kernel), how your btrfs
>> is set up (e.g. metadata RAID1), etc.
> No, I've not checked syncthing and its dependencies. But I'll do that.
> Just to be sure we're talking about the same thing, "direct" means
> O_DIRECT on syscall open()?
Yes.
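If you want to check a running process rather than read its source, the open flags of each descriptor are visible in /proc/<pid>/fdinfo/<fd>. A minimal sketch, assuming Linux with procfs mounted; the function names are made up for illustration, and the fallback constant is the x86 value of O_DIRECT:

```python
import os

# O_DIRECT is Linux-specific and its value differs between architectures.
# The fallback (x86 value) is an assumption for platforms where Python
# does not expose the constant.
O_DIRECT = getattr(os, "O_DIRECT", 0o40000)


def fdinfo_uses_o_direct(fdinfo_text):
    """Return True if the 'flags:' line of an fdinfo dump has O_DIRECT set."""
    for line in fdinfo_text.splitlines():
        if line.startswith("flags:"):
            flags = int(line.split()[1], 8)  # the kernel prints the flags in octal
            return bool(flags & O_DIRECT)
    raise ValueError("no 'flags:' line found")


def fds_with_o_direct(pid):
    """Yield (fd, target) for descriptors of `pid` that were opened with O_DIRECT."""
    base = f"/proc/{pid}/fdinfo"
    for fd in os.listdir(base):
        try:
            with open(os.path.join(base, fd)) as f:
                if fdinfo_uses_o_direct(f.read()):
                    yield fd, os.readlink(f"/proc/{pid}/fd/{fd}")
        except (OSError, ValueError):
            continue  # descriptor went away while we were looking
```

Running list(fds_with_o_direct(<pid of syncthing>)) as the same user (or root) while it is writing would show whether any of its open files use O_DIRECT. It only sees descriptors that are open at that moment.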
>
> I use archlinux, with their stock "linux-lts" kernel which has been
> on 5.10 since winter/spring. I'm sure that the two last checksum errors
> have occurred on 5.10.x - unsure about exactly which version. Currently
> the computer runs 5.10.52, but it was after a system update and a
> restart that I noticed the checksum error. So the checksum error
> probably occurred on a previous kernel version in the 5.10 range.
>
> regarding mount options:
>
> /dev/mapper/disk0 on / type btrfs
> (rw,relatime,compress=zstd:3,ssd,space_cache,autodefrag,subvolid=256,subvol=/__current/ROOT)
> /dev/mapper/disk0 on /home type btrfs
> (rw,relatime,compress=zstd:3,ssd,space_cache,autodefrag,subvolid=257,subvol=/__current/home)
> /dev/mapper/disk0 on /var/log type btrfs
> (rw,relatime,compress=zstd:3,ssd,space_cache,autodefrag,subvolid=258,subvol=/__current/log)
>
> No raid. Just btrfs upon dmcrypt.
>
> The file with faulty checksum is in the subvolume=/__current/home.
> (/home//jonas/.config/syncthing/index-v0.14.0.db/007197.log)

That looks like a leveldb log file. I looked at rocksdb, which has options to use O_DIRECT, but syncthing uses https://github.com/syndtr/goleveldb and I can see no hint of it using O_DIRECT there...

>
> If I recall right, I did correct the checksum errors on the first nvme
> disk where it occurred. The second NVME is left as it is when it
> occurred, and the error is still present on my SSD. So I can maybe get
> some history if needed.
>
> Any more information that you would like to have?
>
>>> I have just one humble request, please do something about this
>>> checksum error message. Just add printk with a link to:
>>> <https://btrfs.wiki.kernel.org/index.php/Gotchas> and the issue of
>>> "Direct I/O and CRCs".  
>> The problem is nothing can be done without impacting performance and
>> direct I/O is used for performance.
> Understood. I was talking about making the print less alarming.
Btrfs can't really distinguish the case where the buffer changed between write-out and checksumming from the case where the data changed on disk (not without impacting performance). Both show up as the same csum mismatch on read.
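To illustrate why the two cases look identical at read time, here is a toy model. It is only a sketch: zlib.crc32 stands in for the real btrfs checksum, the function names are hypothetical, and the ordering (checksum first, then the device reads the buffer) is a simplification of what the kernel does.

```python
import zlib


def submit_direct_write(buf, modify_in_flight=None):
    """Checksum the caller's buffer, then let the 'device' read the same buffer.

    Returns (stored_csum, data_on_disk). With direct I/O there is no private
    copy of the data, so the application can still change `buf` in between.
    """
    stored_csum = zlib.crc32(bytes(buf))
    if modify_in_flight is not None:
        modify_in_flight(buf)      # application scribbles on its own buffer
    data_on_disk = bytes(buf)      # what actually reached the disk
    return stored_csum, data_on_disk


def read_and_verify(stored_csum, data_on_disk):
    """What a later read does: recompute the checksum and compare."""
    return zlib.crc32(data_on_disk) == stored_csum


def flip_first_byte(buf):
    buf[0] ^= 0xFF


# Case 1: the buffer is modified while the write is in flight.
csum_a, disk_a = submit_direct_write(bytearray(b"log entry"), flip_first_byte)

# Case 2: the write is clean, but the data is damaged on disk afterwards.
csum_b, disk_b = submit_direct_write(bytearray(b"log entry"))
damaged = bytearray(disk_b)
flip_first_byte(damaged)
disk_b = bytes(damaged)

# Both reads fail verification, and nothing in (csum, data) says which case it was.
results = (read_and_verify(csum_a, disk_a), read_and_verify(csum_b, disk_b))
```

Telling the cases apart would need a stable copy of the data at checksum time, which is exactly the cost direct I/O is meant to avoid.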
>
>> IMO it should be disabled by
>> default (i.e. it just pretends to do direct I/O like ZFSOnLinux) and
>> be able to be enabled via mount option.
> Sounds like a good idea.
>
>>> Maybe update the wiki with:
>>> `find <mountpoint> -inum <ino-number-from-warning-message>`
>>> would be helpful as well.
>> btrfs inspect-internal inode-resolve
>> <ino-number-from-warning-message> <fs>
>>
>> is faster.
> Thanks!
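If you end up doing this more than once, the inode number can be pulled out of the warning line automatically. A rough sketch: the regex only covers the message format quoted earlier in this thread, and the helper name is made up. The path argument must be inside the subvolume named by "root" (257 is your /home here).

```python
import re

# Matches the csum warning as quoted in this thread; other kernel versions
# may word the message differently (assumption).
CSUM_WARNING = re.compile(
    r"BTRFS warning \(device (?P<dev>[^)]+)\): csum failed "
    r"root (?P<root>\d+) ino (?P<ino>\d+) off (?P<off>\d+)"
)


def inode_resolve_command(log_line, subvol_path):
    """Build the `btrfs inspect-internal inode-resolve` argv for a csum warning."""
    m = CSUM_WARNING.search(log_line)
    if m is None:
        raise ValueError("not a btrfs csum warning")
    return ["btrfs", "inspect-internal", "inode-resolve", m["ino"], subvol_path]
```

The returned list can be passed to subprocess.run() as root; it only builds the command and does not run anything itself.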
>
> BR,
>  Jonas Aaberg
>
>


Thread overview: 4+ messages
2021-07-23 14:55 On the issue of direct I/O and csum warnings Jonas Aaberg
2021-07-23 18:45 ` Martin Raiber
2021-07-24  6:30   ` Jonas Aaberg
2021-07-24  9:44     ` Martin Raiber [this message]