From: James Harvey <jmsharvey771@gmail.com>
To: dsterba@suse.cz, Qu Wenruo <wqu@suse.com>,
	James Harvey <jmsharvey771@gmail.com>,
	Qu Wenruo <quwenruo.btrfs@gmx.com>,
	linux-btrfs@vger.kernel.org
Subject: Re: csum failed, bad tree, block, IO failures. Is my drive dead or has my BTRFS broke itself?
Date: Mon, 18 Oct 2021 11:37:10 +0100
Message-ID: <CAHB2pq-7ADos4BVbATboA3CAM0DK2Gm9_26qpAAF+pdCFYWaJg@mail.gmail.com>
In-Reply-To: <20211018100846.GF30611@twin.jikos.cz>

(Resending because I accidentally sent this as HTML and forgot to
mention something.)

The checksum errors also show up at runtime during reads: I was
trying to get a directory from the server and the transfer failed
because of checksum errors (luckily I have that directory backed up).
They may only affect one top-level directory, since the remaining
backups I'm running haven't stopped on any other directory. I also saw
a USB reset in my logs, which may indicate a bad cable, connection, or
drive. Here's the relevant part of my logs:

(loads of SFTP IO errors and btrfs errors above, same errors that I've
sent before)
Oct 18 00:22:06 James-Server kernel: sd 2:0:0:0: [sdb] tag#26
uas_eh_abort_handler 0 uas-tag 1 inflight: IN
Oct 18 00:22:06 James-Server kernel: sd 2:0:0:0: [sdb] tag#26 CDB:
Read(16) 88 00 00 00 00 04 7e c9 a9 80 00 00 00 20 00 00
Oct 18 00:22:06 James-Server kernel: scsi host2:
uas_eh_device_reset_handler start
Oct 18 00:22:06 James-Server kernel: usb 2-6: reset high-speed USB
device number 3 using xhci_hcd
Oct 18 00:22:07 James-Server kernel: scsi host2:
uas_eh_device_reset_handler success
Oct 18 00:22:07 James-Server kernel: btrfs_print_data_csum_error: 1586
callbacks suppressed
Oct 18 00:22:07 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 103336 off 937984 csum 0x40832952 expected csum
0x00000000 mirror 1
Oct 18 00:22:07 James-Server kernel: btrfs_dev_stat_print_on_error:
1631 callbacks suppressed
Oct 18 00:22:07 James-Server kernel: BTRFS error (device sdb1): bdev
/dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 13530, gen 0
Oct 18 00:22:07 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 103336 off 942080 csum 0x901404f8 expected csum
0x00000000 mirror 1
Oct 18 00:22:07 James-Server kernel: BTRFS error (device sdb1): bdev
/dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 13531, gen 0
Oct 18 00:22:07 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 103336 off 946176 csum 0x0286f198 expected csum
0x00000000 mirror 1
Oct 18 00:22:07 James-Server kernel: BTRFS error (device sdb1): bdev
/dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 13532, gen 0
Oct 18 00:22:07 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 103336 off 950272 csum 0x1ef344b7 expected csum
0x00000000 mirror 1
Oct 18 00:22:07 James-Server kernel: BTRFS error (device sdb1): bdev
/dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 13533, gen 0
Oct 18 00:22:07 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 103336 off 954368 csum 0x8cfb460b expected csum
0x00000000 mirror 1
Oct 18 00:22:07 James-Server kernel: BTRFS error (device sdb1): bdev
/dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 13534, gen 0
Oct 18 00:22:07 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 103336 off 958464 csum 0xb18f6951 expected csum
0x00000000 mirror 1
Oct 18 00:22:07 James-Server kernel: BTRFS error (device sdb1): bdev
/dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 13535, gen 0
Oct 18 00:22:07 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 103336 off 962560 csum 0x99cdfa88 expected csum
0x00000000 mirror 1
Oct 18 00:22:07 James-Server kernel: BTRFS error (device sdb1): bdev
/dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 13536, gen 0
Oct 18 00:22:07 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 103336 off 966656 csum 0x906c81a9 expected csum
0x00000000 mirror 1
Oct 18 00:22:07 James-Server kernel: BTRFS error (device sdb1): bdev
/dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 13537, gen 0
Oct 18 00:22:07 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 103336 off 970752 csum 0xe6d4dd60 expected csum
0x00000000 mirror 1
Oct 18 00:22:07 James-Server kernel: BTRFS error (device sdb1): bdev
/dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 13538, gen 0
Oct 18 00:22:07 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 103336 off 974848 csum 0x1395994a expected csum
0x00000000 mirror 1
Oct 18 00:22:07 James-Server kernel: BTRFS error (device sdb1): bdev
/dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 13539, gen 0
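
To make an excerpt like the one above easier to read, here is a rough
Python sketch that groups the "csum failed" warnings by root and inode
and reports the affected offset range. The function name and the
returned dictionary layout are my own invention for illustration; the
regular expression only assumes the message wording shown in this log,
where the offsets step by 4096 bytes.

```python
import re
from collections import defaultdict

# Matches the "csum failed" warning as it appears in the excerpt above.
# \s+ is used between fields so that mail line-wrapping does not matter.
CSUM_RE = re.compile(
    r"csum\s+failed\s+root\s+(?P<root>\d+)\s+ino\s+(?P<ino>\d+)"
    r"\s+off\s+(?P<off>\d+)\s+csum\s+(?P<csum>0x[0-9a-fA-F]+)"
    r"\s+expected\s+csum\s+(?P<expected>0x[0-9a-fA-F]+)"
    r"\s+mirror\s+(?P<mirror>\d+)"
)


def summarize_csum_failures(log_text):
    """Group csum failures by (root, inode); hypothetical helper."""
    offsets = defaultdict(list)
    for match in CSUM_RE.finditer(log_text):
        key = (int(match["root"]), int(match["ino"]))
        offsets[key].append(int(match["off"]))

    summary = {}
    for key, offs in offsets.items():
        offs.sort()
        summary[key] = {
            "count": len(offs),
            "first_off": offs[0],
            "last_off": offs[-1],
        }
    return summary
```

Feeding it the saved journal text returns one entry per (root, inode)
pair. The inode number could then be mapped to a path with
`btrfs inspect-internal inode-resolve`, to confirm whether the failures
really sit under a single directory.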

On Mon, 18 Oct 2021 at 11:09, David Sterba <dsterba@suse.cz> wrote:
>
> On Sun, Oct 17, 2021 at 08:00:59AM +0800, Qu Wenruo wrote:
> > On 2021/10/17 04:45, James Harvey wrote:
> > > Check hasn't finished yet, but it's spit out about 1700 messages (tmux
> > > won't let me scroll up further) that all look like this:
> >
> > Yeah, this means quite a lot of metadata are filled with garbage.
> >
> > I'm not sure why, but it doesn't look like it was caused by btrfs itself.
>
> Agreed, this amount of garbage would be detected by other means
> (mismatching csums while the system is still in use or by
> pre-write/post-read tree checker). It's not bitflips, there are too many
> changes eg. in the bogus block offsets.
>
> Analyzing the actual data left on disk for some known pattern could at
> least give some hint what it was, eg. strings, file headers or raw
> pointers. Besides that a manual system check could prevent that in the
> future, so check cables, possible overheating, up to date
> kernel/firmware (in case it would be caused by other subsystems).
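
A rough sketch of what that kind of pattern analysis could look like,
assuming one of the bad blocks has already been dumped to a file and
read into memory as bytes. The signature table is a small illustrative
sample of well-known file headers, and all function names here are
hypothetical, not part of any btrfs tool.

```python
import re

# A few well-known file signatures; illustrative only, far from complete.
MAGIC = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
    b"PK\x03\x04": "ZIP archive",
    b"\x7fELF": "ELF binary",
    b"%PDF-": "PDF document",
}


def printable_strings(block, min_len=6):
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(block)]


def find_magic(block):
    """Return sorted (offset, description) pairs for known signatures."""
    hits = []
    for signature, name in MAGIC.items():
        pos = block.find(signature)
        while pos != -1:
            hits.append((pos, name))
            pos = block.find(signature, pos + 1)
    return sorted(hits)


def describe_block(block):
    """Summarize what a raw block looks like (hypothetical helper)."""
    zero_fraction = block.count(0) / len(block) if block else 0.0
    return {
        "zero_fraction": zero_fraction,
        "strings": printable_strings(block),
        "magic": find_magic(block),
    }
```

Recognizable strings or file headers in a block that should hold
metadata would suggest it was overwritten with file data, while a
mostly zero block would point in a different direction. Either result
is only a hint, not a diagnosis.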
