linux-btrfs.vger.kernel.org archive mirror
From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: "Sébastien Luttringer" <seblu@seblu.net>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Corrupted filesystem, looking for guidance
Date: Tue, 12 Feb 2019 07:05:50 -0500
Message-ID: <09f37190-bc2a-c0fa-a467-18a30d360d6f@gmail.com>
In-Reply-To: <7ef0e91501a04cd4c5e0d942db638a0b50ef3ec3.camel@seblu.net>

On 2019-02-11 22:16, Sébastien Luttringer wrote:
> Hello,
> 
> The context is a BTRFS filesystem on top of an md device (raid5 on 6 disks).
> The system is Arch Linux and the kernel was a vanilla 4.20.2.
> 
> # btrfs fi us /home
> Overall:
>      Device size:                  27.29TiB
>      Device allocated:              5.01TiB
>      Device unallocated:           22.28TiB
>      Device missing:                  0.00B
>      Used:                          5.00TiB
>      Free (estimated):             22.28TiB      (min: 22.28TiB)
>      Data ratio:                       1.00
>      Metadata ratio:                   1.00
>      Global reserve:              512.00MiB      (used: 0.00B)
> 
> Data,single: Size:4.95TiB, Used:4.95TiB
>     /dev/md127      4.95TiB
> 
> Metadata,single: Size:61.01GiB, Used:57.72GiB
>     /dev/md127     61.01GiB
> 
> System,single: Size:36.00MiB, Used:560.00KiB
>     /dev/md127     36.00MiB
> 
> Unallocated:
>     /dev/md127     22.28TiB
> 
> I'm not able to find the root cause of the btrfs corruption. All disks look
> healthy (self-tests OK, no errors logged), and there is no kernel trace of a
> link failure or anything similar.
> I ran a check on the md layer, and 2 mismatches were discovered:
> Feb 11 04:02:35 kernel: md127: mismatch sector in range 490387096-490387104
> Feb 11 04:31:14 kernel: md127: mismatch sector in range 1024770720-1024770728
> I ran a repair (resync), but the mismatches are still there afterwards. 😱
> 
> The first BTRFS warning was:
> Feb 07 11:27:57 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> 
> 
> After that, the userland process crashed. A few days ago, I ran it again. It
> crashed again, but this time the filesystem became read-only:
> 
> Feb 10 01:07:02 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 01:07:03 kernel: BTRFS error (device md127): error loading props for ino
> 9930722 (root 5): -5
> Feb 10 01:07:03 kernel: BTRFS error (device md127): error loading props for ino
> 9930722 (root 5): -5
> Feb 10 01:07:03 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 01:07:03 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 01:07:03 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 01:07:03 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 01:07:03 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 01:07:03 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 01:07:03 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 01:07:03 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 03:16:24 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 03:16:28 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 03:27:34 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 03:27:40 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 05:59:34 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 05:59:34 kernel: BTRFS error (device md127): error loading props for ino
> 9930722 (root 5): -5
> Feb 10 05:59:34 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 05:59:34 kernel: BTRFS info (device md127): failed to delete reference
> to fImage%252057(1).jpg, inode 9930722 parent 58718826
> Feb 10 05:59:34 kernel: BTRFS: error (device md127) in
> __btrfs_unlink_inode:3971: errno=-5 IO failure
> Feb 10 05:59:34 kernel: BTRFS info (device md127): forced readonly
> 
> The btrfs check report:
> 
> # btrfs check -p /dev/md127
> Opening filesystem to check...
> Checking filesystem on /dev/md127
> UUID: 64403592-5a24-4851-bda2-ce4b3844c168
> [1/7] checking root items                      (0:10:21 elapsed, 10056723 items
> checked)
> [2/7] checking extents                         (0:04:59 elapsed, 155136 items
> checked)
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> Csum didn't match
> ref mismatch on [2622304964608 28672] extent item 1, found 0
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> Csum didn't match
> incorrect local backref count on 2622304964608 root 5 owner 9930722 offset 0
> found 0 wanted 1 back 0x55d61387cd40
> backref disk bytenr does not match extent record, bytenr=2622304964608, ref
> bytenr=0
> backpointer mismatch on [2622304964608 28672]
> owner ref check failed [2622304964608 28672]
> ref mismatch on [2622304993280 262144] extent item 1, found 0
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> Csum didn't match
> incorrect local backref count on 2622304993280 root 5 owner 9930724 offset 0
> found 0 wanted 1 back 0x55d61387ce70
> backref disk bytenr does not match extent record, bytenr=2622304993280, ref
> bytenr=0
> backpointer mismatch on [2622304993280 262144]
> owner ref check failed [2622304993280 262144]
> ref mismatch on [2622305255424 4096] extent item 1, found 0
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> Csum didn't match
> incorrect local backref count on 2622305255424 root 5 owner 9930727 offset 0
> found 0 wanted 1 back 0x55d61387cfa0
> backref disk bytenr does not match extent record, bytenr=2622305255424, ref
> bytenr=0
> backpointer mismatch on [2622305255424 4096]
> owner ref check failed [2622305255424 4096]
> ref mismatch on [2622305259520 8192] extent item 1, found 0
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> Csum didn't match
> incorrect local backref count on 2622305259520 root 5 owner 9930731 offset 0
> found 0 wanted 1 back 0x55d61387d0d0
> backref disk bytenr does not match extent record, bytenr=2622305259520, ref
> bytenr=0
> backpointer mismatch on [2622305259520 8192]
> owner ref check failed [2622305259520 8192]
> ref mismatch on [2622305267712 188416] extent item 1, found 0
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> Csum didn't match
> incorrect local backref count on 2622305267712 root 5 owner 9930733 offset 0
> found 0 wanted 1 back 0x55d61387d200
> backref disk bytenr does not match extent record, bytenr=2622305267712, ref
> bytenr=0
> backpointer mismatch on [2622305267712 188416]
> owner ref check failed [2622305267712 188416]
> ref mismatch on [2622305456128 4096] extent item 1, found 0
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> Csum didn't match
> incorrect local backref count on 2622305456128 root 5 owner 9930734 offset 0
> found 0 wanted 1 back 0x55d61387d330
> backref disk bytenr does not match extent record, bytenr=2622305456128, ref
> bytenr=0
> backpointer mismatch on [2622305456128 4096]
> owner ref check failed [2622305456128 4096]
> owner ref check failed [4140883394560 16384]
> [2/7] checking extents                         (0:31:38 elapsed, 3783074 items
> checked)
> ERROR: errors found in extent allocation tree or chunk allocation
> [3/7] checking free space cache                (0:03:58 elapsed, 5135 items
> checked)
> [4/7] checking fs roots                        (1:02:53 elapsed, 139654 items
> checked)
> 
> I tried to mount the filesystem with nodatasum, but I was not able to delete the
> directory I suspect is corrupted. The FS was remounted RO.
> btrfs inspect-internal logical-resolve and btrfs inspect-internal inode-resolve
> were not able to resolve the logical addresses and inodes from the above errors
> to paths.
> 
> How could I save my filesystem? Should I try --repair or --init-csum-tree?
Have you checked your RAM yet?  This looks to me like cumulative damage 
from bad hardware, and if you've ruled the disks out, RAM is the next 
most likely culprit.

Until you figure out what is causing the problem in the first place 
though, there's not much point in trying to fix it (do make sure you 
have current backups however).
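
As a side note on the quoted kernel log: every checksum warning in it names the same logical address (4140883394560) with the same wanted/found pair, so the log points at a single damaged metadata block rather than many. The following is a minimal sketch, not an official tool, of how such a log could be tallied to confirm that. The function name `summarize_csum_failures` and the `SAMPLE` text are hypothetical and only for illustration; the regular expression assumes the exact message wording shown in the quoted log.

```python
import re
from collections import Counter

# Assumption: the message wording matches the kernel log quoted above.
CSUM_RE = re.compile(
    r"checksum verify failed on (\d+) wanted ([0-9A-Fa-f]+) found ([0-9A-Fa-f]+)"
)


def summarize_csum_failures(log_text):
    """Count checksum failures per (logical address, wanted, found) triple."""
    # Drop email quote markers and re-join wrapped lines so that a message
    # split across two lines is seen as one string by the regex.
    lines = [ln.lstrip("> ").strip() for ln in log_text.splitlines()]
    flat = " ".join(lines)
    return Counter(
        (int(addr), wanted.upper(), found.upper())
        for addr, wanted, found in CSUM_RE.findall(flat)
    )


# Two entries copied from the quoted log, wrapped the same way.
SAMPLE = """\
Feb 07 11:27:57 kernel: BTRFS warning (device md127): md127 checksum verify
failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
Feb 10 01:07:02 kernel: BTRFS warning (device md127): md127 checksum verify
failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
"""

if __name__ == "__main__":
    for (addr, wanted, found), n in summarize_csum_failures(SAMPLE).items():
        print(f"{n} failure(s) at logical {addr}: wanted {wanted}, found {found}")
```

If the full log yields exactly one key, the damage is confined to one block; several distinct addresses would suggest wider or ongoing corruption.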

