All of lore.kernel.org
 help / color / mirror / Atom feed
From: b11g <b11g@protonmail.com>
To: Chris Murphy <lists@colorremedies.com>
Cc: Qu Wenruo <quwenruo.btrfs@gmx.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: BTRFS corruption: open_ctree failed
Date: Thu, 03 Jan 2019 13:55:54 +0000	[thread overview]
Message-ID: <OAUyvYQ65eFoxQY0GmEgh96YvtyWfbnvB6ebVJkADANt8NfUcRVct_IdyQduy58Dnuv2ExOhmP4MpOX1Q7Ln-GK7kUQDz6Gt2T7jYSMv3yM=@protonmail.com> (raw)
In-Reply-To: <CAJCQCtR8qEyw51Ox2jU1E4nWBVfV4j8_nEcMgz2gTJa=yhv=Tw@mail.gmail.com>

Responded in-line.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, 3 January 2019 05:52, Chris Murphy <lists@colorremedies.com> wrote:

> On Wed, Jan 2, 2019 at 5:26 PM b11g b11g@protonmail.com wrote:
>
> > Hi all,
> > I have several BTRFS success-stories, and I've been an happy user for quite a long time now. I was therefore surprised to face a BTRFS corruption on a system I'd just installed.
> > I use NixOS, unstable branch (linux kernel 4.19.12). The system runs on a SSD with an ext4 boot partition, a simple btrfs root with some subvolumes, and some swap space only used for hibernation. I was working on my server as normal when I noticed all of my BTRFS subvolumes had been remounted ro. After a short time, I started getting various IO errors ("bus error" by journalctl, "I/O error" by ls etc.). I halted the system (hard reboot), at the reboot the BTRFS partition would not mount. I suspected the corruption to be disk-related, but smartctl does not show any warning for the disk, and the ext4 partition seems healthy.
> > Those are the kernel messages logged when I attempt to mount the partition:
> > Jan 02 23:39:38 nixos kernel: BTRFS warning (device sdd2): sdd2 checksum verify failed on <L> wanted <A> found <B> level 0
> > Jan 02 23:39:38 nixos kernel: BTRFS error (device sdd2): failed to read block groups: -5
> > Jan 02 23:39:38 nixos systemd[1]: Started Cleanup of Temporary Directories.
> > Jan 02 23:39:38 nixos kernel: BTRFS error (device sdd2): open_ctree failed
>
> Do you have the entire kernel message from the previous boot when the
> problem started, including I/O errors? We kinda need to see what was
> going on leading up to the read only mount, and the bus and I/O
> errors. journalctl -b-1 -k should do it, or using journalctl
> --list-boots to find it. You can redirect to a file with > and then
> attach to the reply if it's small enough, or put it up somewhere like
> Dropbox or Google Drive if it's too big.

Sadly I cannot find the journal file relevant to the boot in which the system failed in /var/log - only older entries, with no I/O errors. If you have any idea on where to look for logs I can check.


>
> btrfs rescue super -v /dev/sdd2
All Devices:
        Device: id = 1, name = /dev/sdd2

Before Recovering:
        [All good supers]:
                device name = /dev/sdd2
                superblock bytenr = 65536

                device name = /dev/sdd2
                superblock bytenr = <big N>

        [All bad supers]:

All supers are valid, no need to recover


> btrfs insp dump-s -f /dev/sdd2
superblock: bytenr=65536, device=/dev/sdd2
---------------------------------------------------------
csum_type               0 (crc32c)
csum_size               4
csum                    0x<C> [match]
bytenr                  65536
flags                   0x1
                        ( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    <ID>
label                   main
generation              6337
root                    <~10^10>
sys_array_size          97
chunk_root_generation   5976
root_level              1
chunk_root              <~10^7>
chunk_root_level        0
log_root                <~10^9>
log_root_transid        0
log_root_level          0
total_bytes             <X:~10^12>
bytes_used              <~10^12>
sectorsize              4096
nodesize                16384
leafsize (deprecated)           16384
stripesize              4096
root_dir                6
num_devices             1
compat_flags            0x0
compat_ro_flags         0x0
incompat_flags          0x169
                        ( MIXED_BACKREF |
                          COMPRESS_LZO |
                          BIG_METADATA |
                          EXTENDED_IREF |
                          SKINNY_METADATA )
cache_generation        6337
uuid_tree_generation    6337
dev_item.uuid           <ID2>
dev_item.fsid           <ID> [match]
dev_item.type           0
dev_item.total_bytes    <X:~10^12>
dev_item.bytes_used     <~10^12>
dev_item.io_align       4096
dev_item.io_width       4096
dev_item.sector_size    4096
dev_item.devid          1
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0
sys_chunk_array[2048]:
        item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM <Y>)
                length <L> owner 2 stripe_len 65536 type SYSTEM
                io_align 4096 io_width 4096 sector_size 4096
                num_stripes 1 sub_stripes 0
                        stripe 0 devid 1 offset <Y>
                        dev_uuid <ID2>
backup_roots[4]:
        backup 0:
<...>

>
> Those are reader only. And also try to mount with -o usebackuproot and
> if that fails -o ro,usebackuproot is often more tolerant. But that's
> for getting data off the volume, it's more useful to know why the file
> system broke. And also why btrfs check is failing, given that it's a
> current version.

I got the data back using btrfs restore, mount -o ro,usebackuproot fails with the same errors (open_ctree failed).


>
> If you get a chance you can take an image, maybe a Btrfs developer
> will find it useful to understand why the Btrfs check is failing.
>
>  <dev> /path/to/fileoutput.image
>
> That is usually around 1/2 the size of file system metadata. It
> contains no data and filenames will be hashed.
>
>
> ------------------------------------------------------------------------------------------------------------------
>
> Chris Murphy

I tried to take an image but even that fails:
"btrfs-image -c9 -t4 -ss /dev/sdd2 /mnt/metadata.image"
checksum verify failed on <N> found <A> wanted <B>
checksum verify failed on <N> found <A> wanted <B>
Csum didn't match
ERROR: open ctree failed
ERROR: create failed: Success


-b11g

  reply	other threads:[~2019-01-03 13:56 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-03  0:26 BTRFS corruption: open_ctree failed b11g
2019-01-03  4:52 ` Chris Murphy
2019-01-03 13:55   ` b11g [this message]
2019-01-11 12:29 ` b11g
2019-01-03  2:52 Tomasz Chmielewski
2019-01-03  7:27 ` Andrea Gelmini
2019-01-03  7:43   ` Tomasz Chmielewski
2019-01-03  8:22     ` Andrea Gelmini
2019-01-03  8:29       ` Tomasz Chmielewski
2019-01-03  9:46         ` Andrea Gelmini
2019-01-03 14:32   ` b11g

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='OAUyvYQ65eFoxQY0GmEgh96YvtyWfbnvB6ebVJkADANt8NfUcRVct_IdyQduy58Dnuv2ExOhmP4MpOX1Q7Ln-GK7kUQDz6Gt2T7jYSMv3yM=@protonmail.com' \
    --to=b11g@protonmail.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=lists@colorremedies.com \
    --cc=quwenruo.btrfs@gmx.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.