linux-btrfs.vger.kernel.org archive mirror
* Re: BTRFS corruption: open_ctree failed
@ 2019-01-03  2:52 Tomasz Chmielewski
  2019-01-03  7:27 ` Andrea Gelmini
  0 siblings, 1 reply; 11+ messages in thread
From: Tomasz Chmielewski @ 2019-01-03  2:52 UTC (permalink / raw)
  To: Btrfs BTRFS

> I have several BTRFS success-stories, and I've been a happy user for
> quite a long time now. I was therefore surprised to face a BTRFS
> corruption on a system I'd just installed.
> I use NixOS, unstable branch (linux kernel 4.19.12). The system runs on
> an SSD with an ext4 boot partition, a simple btrfs root with some
> subvolumes, [...]

Did you use 4.19.x kernels earlier than 4.19.8?

They had a bug which would corrupt filesystems (it was mostly reported by
ext4 users, but I saw it with other filesystems, like xfs and btrfs,
too):

https://www.phoronix.com/scan.php?page=news_item&px=Linux-4.19-4.20-BLK-MQ-Fix

Interestingly, btrfs in RAID mode would often detect and correct these 
corruptions.
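
The range named above (4.19.x older than 4.19.8) can be tested against a release string such as the one printed by "uname -r". What follows is only a minimal sketch in POSIX shell: the function name is_affected_4_19 is hypothetical, and the range is taken from this message alone, not from an authoritative list of fixed releases.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when the release string is a
# 4.19.x kernel older than 4.19.8, the range named in this thread.
is_affected_4_19() {
    case $1 in
        4.19.*) ;;            # only 4.19.<something> can be in the range
        *) return 1 ;;
    esac
    _patch=${1#4.19.}             # "6-041906-generic" from "4.19.6-041906-generic"
    _patch=${_patch%%[!0-9]*}     # keep only the leading digits: "6"
    [ -n "$_patch" ] && [ "$_patch" -lt 8 ]
}

# Usage:
#   is_affected_4_19 "$(uname -r)" && echo "kernel is in the affected range"
```

Note that the running kernel may differ from the most recently installed one if a reboot is still pending, so the check is only meaningful for the kernel that was actually booted.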


* Re: BTRFS corruption: open_ctree failed
  2019-01-03  2:52 BTRFS corruption: open_ctree failed Tomasz Chmielewski
@ 2019-01-03  7:27 ` Andrea Gelmini
  2019-01-03  7:43   ` Tomasz Chmielewski
  2019-01-03 14:32   ` b11g
  0 siblings, 2 replies; 11+ messages in thread
From: Andrea Gelmini @ 2019-01-03  7:27 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: Btrfs BTRFS, b11g

On Thu, Jan 03, 2019 at 11:52:05AM +0900, Tomasz Chmielewski wrote:
> Did you use 4.19.x kernels earlier than 4.19.8?
> 
> They had a bug which would corrupt filesystems (mostly ext4 users would be
> reporting it, but I saw it with other filesystems, like xfs and btrfs, too):

Well, just for the record, it triggers when you have 
scsi devices using elevator=none over blkmq.

And it's not a default/usual configuration.

So, b11g, can you please check whether the NixOS kernel is compiled with these flags?
And/or whether they have something like:
scsi_mod.use_blk_mq=1
in the boot command line?
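
A minimal sketch of that check in POSIX shell, assuming a Linux system that exposes the boot command line at /proc/cmdline. The function name blk_mq_forced is hypothetical. It only looks at the command line; a kernel built with blk-mq as its default would not show up here, and on kernels of that era the effective value is, as an assumption to verify, usually also readable at /sys/module/scsi_mod/parameters/use_blk_mq.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given kernel command line
# (default: the running kernel's /proc/cmdline) forces blk-mq for SCSI.
blk_mq_forced() {
    if [ "$#" -gt 0 ]; then
        _cmdline=$1
    else
        _cmdline=$(cat /proc/cmdline)
    fi
    # Pad with spaces so the parameter is matched as a whole word.
    case " $_cmdline " in
        *" scsi_mod.use_blk_mq=1 "*|*" scsi_mod.use_blk_mq=y "*|*" scsi_mod.use_blk_mq=Y "*)
            return 0 ;;
    esac
    return 1
}

# Usage:
#   blk_mq_forced && echo "blk-mq is forced on the boot command line"
```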

Ciao,
Gelma


* Re: BTRFS corruption: open_ctree failed
  2019-01-03  7:27 ` Andrea Gelmini
@ 2019-01-03  7:43   ` Tomasz Chmielewski
  2019-01-03  8:22     ` Andrea Gelmini
  2019-01-03 14:32   ` b11g
  1 sibling, 1 reply; 11+ messages in thread
From: Tomasz Chmielewski @ 2019-01-03  7:43 UTC (permalink / raw)
  To: Andrea Gelmini; +Cc: Btrfs BTRFS, b11g, Andrea Gelmini

On 2019-01-03 16:27, Andrea Gelmini wrote:
> On Thu, Jan 03, 2019 at 11:52:05AM +0900, Tomasz Chmielewski wrote:
>> Did you use 4.19.x kernels earlier than 4.19.8?
>> 
>> They had a bug which would corrupt filesystems (mostly ext4 users 
>> would be
>> reporting it, but I saw it with other filesystems, like xfs and btrfs, 
>> too):
> 
> Well, just for the record, it triggers when you have
> scsi devices using elevator=none over blkmq.
> 
> And it's not a default/usual configuration.

Still - it is the default configuration for some distributions. E.g. 
Ubuntu "ppa" kernels[1] have this enabled by default (at least 4.19.x 
and 4.20.x).


[1] https://kernel.ubuntu.com/~kernel-ppa/mainline/


Tomasz Chmielewski
https://lxadm.com


* Re: BTRFS corruption: open_ctree failed
  2019-01-03  7:43   ` Tomasz Chmielewski
@ 2019-01-03  8:22     ` Andrea Gelmini
  2019-01-03  8:29       ` Tomasz Chmielewski
  0 siblings, 1 reply; 11+ messages in thread
From: Andrea Gelmini @ 2019-01-03  8:22 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: Btrfs BTRFS, b11g

On Thu, Jan 03, 2019 at 04:43:20PM +0900, Tomasz Chmielewski wrote:
> > And it's not a default/usual configuration.
> 
> Still - it is a default configuration for some distributions. I.e. Ubuntu
> "ppa" kernels[1] have this enabled by default (at least 4.19.x and 4.20.x).

a) he is not using Ubuntu;
b) I use this PPA;
c) if you look at the config, you will see that the default scheduler is not
   "none"; also, mq is compiled as a module and - anyway - on releases with the
   bug you have to force the kernel to use it with a boot parameter.

Ciao,
Gelma


* Re: BTRFS corruption: open_ctree failed
  2019-01-03  8:22     ` Andrea Gelmini
@ 2019-01-03  8:29       ` Tomasz Chmielewski
  2019-01-03  9:46         ` Andrea Gelmini
  0 siblings, 1 reply; 11+ messages in thread
From: Tomasz Chmielewski @ 2019-01-03  8:29 UTC (permalink / raw)
  To: Andrea Gelmini; +Cc: Btrfs BTRFS, b11g

On 2019-01-03 17:22, Andrea Gelmini wrote:
> On Thu, Jan 03, 2019 at 04:43:20PM +0900, Tomasz Chmielewski wrote:
>> > And it's not a default/usual configuration.
>> 
>> Still - it is a default configuration for some distributions. I.e. 
>> Ubuntu
>> "ppa" kernels[1] have this enabled by default (at least 4.19.x and 
>> 4.20.x).
> 
> a) he is not using Ubuntu;
> b) I use this PPA;
> c) if you look at the config, you see the default scheduler is not 
> "none", also
>    the mq is compiled as module and - anyway - on release with the bug
> you have to
>    force the kernel to use it with boot parameter.

Hmmm, is that the case?

# uname -r
4.20.0-042000-generic

# cat /sys/block/sda/queue/scheduler
[none]

# cat /sys/block/sdb/queue/scheduler
[none]


I saw filesystem corruption on every system running a PPA 4.19 kernel 
older than 4.19.8. I didn't make any kernel boot parameter changes when 
upgrading from 4.18.
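
The per-device check above can be repeated for every block device. In the sysfs scheduler file, the active scheduler is the name in square brackets. A minimal sketch in POSIX shell follows; the function names active_scheduler and list_active_schedulers are hypothetical, and the optional directory argument exists only so the logic can be tried without real devices.

```shell
#!/bin/sh
# Hypothetical helper: print the bracketed (active) scheduler from a line
# such as "noop deadline [cfq]"; fail if no bracketed name is present.
active_scheduler() {
    case $1 in
        *\[*\]*) ;;
        *) return 1 ;;
    esac
    _s=${1#*\[}      # drop everything up to and including "["
    _s=${_s%%\]*}    # drop "]" and everything after it
    printf '%s\n' "$_s"
}

# Hypothetical helper: print "<device> <active scheduler>" per device.
# The sysfs root defaults to /sys/block.
list_active_schedulers() {
    _root=${1:-/sys/block}
    for _f in "$_root"/*/queue/scheduler; do
        [ -r "$_f" ] || continue
        _dev=${_f#"$_root"/}
        _dev=${_dev%%/*}
        printf '%s %s\n' "$_dev" "$(active_scheduler "$(cat "$_f")" || echo unknown)"
    done
}

# Usage:
#   list_active_schedulers
```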


* Re: BTRFS corruption: open_ctree failed
  2019-01-03  8:29       ` Tomasz Chmielewski
@ 2019-01-03  9:46         ` Andrea Gelmini
  0 siblings, 0 replies; 11+ messages in thread
From: Andrea Gelmini @ 2019-01-03  9:46 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: Btrfs BTRFS, b11g

My last email on this thread: I swear.

I don't want to bother the mailing list about this more.

On Thu, Jan 03, 2019 at 05:29:51PM +0900, Tomasz Chmielewski wrote:

> I could see filesystem corruption on every system using with PPA 4.19 lower
> than .8. Didn't do any kernel boot parameter changes when upgrading from
> 4.18.

gelma@check:~$ uname -a # to use a broken kernel
Linux check 4.19.6-041906-generic #201812030857 SMP Mon Dec 3 13:59:30 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
gelma@check:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04 LTS
Release:        18.04
Codename:       bionic
gelma@check:~$ dmesg|grep -i scheduler
[    1.062769] io scheduler noop registered
[    1.062993] io scheduler deadline registered
[    1.063257] io scheduler cfq registered (default)

But just having "none" set is not enough either.
To trigger it you must force multiqueue.

I usually compile the git kernel, but I destroyed the
fs only when I added:

scsi_mod.use_blk_mq=1

to the kernel boot command line.

Details here:
https://bugzilla.kernel.org/show_bug.cgi?id=201685

Ciao,
Gelma


* Re: BTRFS corruption: open_ctree failed
  2019-01-03  7:27 ` Andrea Gelmini
  2019-01-03  7:43   ` Tomasz Chmielewski
@ 2019-01-03 14:32   ` b11g
  1 sibling, 0 replies; 11+ messages in thread
From: b11g @ 2019-01-03 14:32 UTC (permalink / raw)
  To: Andrea Gelmini; +Cc: Tomasz Chmielewski, Btrfs BTRFS

I heard of the ext4 bug, but I didn't check my kernel carefully enough - contrary to what I reported previously, my kernel version was 4.14.79, NOT 4.19.12 (I think I installed the latter, but the system was still pending a reboot). I am sorry if this caused confusion.

AFAIK, these are the flags used in the NixOS kernel:
https://github.com/NixOS/nixpkgs/blob/0396345b79436f54920f7eb651ab42acf2eb7973/pkgs/os-specific/linux/kernel/common-config.nix
I did not find references to "elevator" or "scsi_mod.use_blk_mq" there or in the boot command line.

-b11g

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, 3 January 2019 08:27, Andrea Gelmini <andrea.gelmini@gelma.net> wrote:

> On Thu, Jan 03, 2019 at 11:52:05AM +0900, Tomasz Chmielewski wrote:
>
> > Did you use 4.19.x kernels earlier than 4.19.8?
> > They had a bug which would corrupt filesystems (mostly ext4 users would be
> > reporting it, but I saw it with other filesystems, like xfs and btrfs, too):
>
> Well, just for the record, it triggers when you have
> scsi devices using elevator=none over blkmq.
>
> And it's not a default/usual configuration.
>
> So, b11g, can you check please if the NixOS kernel is compiled with these flags?
> And/or if they have something like:
> scsi_mod.use_blk_mq=1
> in the boot command line?
>
> Ciao,
> Gelma


* Re: BTRFS corruption: open_ctree failed
  2019-01-03  0:26 b11g
  2019-01-03  4:52 ` Chris Murphy
@ 2019-01-11 12:29 ` b11g
  1 sibling, 0 replies; 11+ messages in thread
From: b11g @ 2019-01-11 12:29 UTC (permalink / raw)
  To: linux-btrfs

Follow-up: the issue was a faulty DIMM module. By some strange coincidence, only the space allocated to disk caches appeared to be corrupted - with the rest of the system working flawlessly most of the time.

I would guess that BTRFS tried to self-heal based on the cached data, ultimately corrupting the file system beyond salvation?

If anyone gets here with similar problems - memtest your RAM before doing anything!

-b11g


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, 3 January 2019 01:26, b11g <b11g@protonmail.com> wrote:

> Hi all,
>
> I have several BTRFS success-stories, and I've been an happy user for quite a long time now. I was therefore surprised to face a BTRFS corruption on a system I'd just installed.
>
> I use NixOS, unstable branch (linux kernel 4.19.12). The system runs on a SSD with an ext4 boot partition, a simple btrfs root with some subvolumes, and some swap space only used for hibernation. I was working on my server as normal when I noticed all of my BTRFS subvolumes had been remounted ro. After a short time, I started getting various IO errors ("bus error" by journalctl, "I/O error" by ls etc.). I halted the system (hard reboot), at the reboot the BTRFS partition would not mount. I suspected the corruption to be disk-related, but smartctl does not show any warning for the disk, and the ext4 partition seems healthy.
>
> Those are the kernel messages logged when I attempt to mount the partition:
> Jan 02 23:39:38 nixos kernel: BTRFS warning (device sdd2): sdd2 checksum verify failed on <L> wanted <A> found <B> level 0
> Jan 02 23:39:38 nixos kernel: BTRFS error (device sdd2): failed to read block groups: -5
> Jan 02 23:39:38 nixos systemd[1]: Started Cleanup of Temporary Directories.
> Jan 02 23:39:38 nixos kernel: BTRFS error (device sdd2): open_ctree failed
>
> Some queries for the error code I got lead me to those two recent threads:
> https://www.spinics.net/lists/linux-btrfs/msg84973.html
> https://www.spinics.net/lists/linux-btrfs/msg83833.html
>
> Using btrfs-progs-4.15.1, "btrfs restore /dev/sdd2 /tmp/" fails with:
> checksum verify failed on <N> found <A> wanted <B>
> checksum verify failed on <N> found <A> wanted <B>
> Csum didn't match
> Could not open root, trying backup super
> checksum verify failed on <N> found <A> wanted <B>
> checksum verify failed on <N> found <A> wanted <B>
> Csum didn't match
> Could not open root, trying backup super
> ERROR: superblock bytenr <X> is larger than device size <Y>
> Could not open root, trying backup super
>
> Using btrfs-progs-4.19.1, "btrfs restore /dev/sdd2 /tmp/" succeeds with some exceptions:
> We have looped trying to restore files in /@/nix/store too many times to be making progress, stopping
>
> I do not have much time for debugging the issue and I did not lose important data, so I tried a couple of commands suggested on the threads and in the docs (without fully understanding them):
>
> "btrfs rescue zero-log /dev/sdd2":
> checksum verify failed on <N> found <A> wanted <B>
> checksum verify failed on <N> found <A> wanted <B>
> Csum didn't match
> ERROR: could not open ctree
>
> "btrfs check --repair /dev/sdd2" (I know, I was not supposed to run this one):
> Opening filesystem to check...
> checksum verify failed on <N> found <A> wanted <B>
> checksum verify failed on <N> found <A> wanted <B>
> Csum didn't match
> ERROR: could not open ctree
>
> Same for "btrfs check --init-csum-tree /dev/sdd2".
>
> I expect to wipe the disk and do a clean start in the following days, I just wanted to report this in the hope it helps in the development (sorry for the redaction). If you need more information, I'll be glad to help as I can!
>
> Thank you for your work,
> Cheers,
>
> -   b11g




* Re: BTRFS corruption: open_ctree failed
  2019-01-03  4:52 ` Chris Murphy
@ 2019-01-03 13:55   ` b11g
  0 siblings, 0 replies; 11+ messages in thread
From: b11g @ 2019-01-03 13:55 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Qu Wenruo, linux-btrfs

Responded in-line.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, 3 January 2019 05:52, Chris Murphy <lists@colorremedies.com> wrote:

> On Wed, Jan 2, 2019 at 5:26 PM b11g b11g@protonmail.com wrote:
>
> > Hi all,
> > I have several BTRFS success-stories, and I've been an happy user for quite a long time now. I was therefore surprised to face a BTRFS corruption on a system I'd just installed.
> > I use NixOS, unstable branch (linux kernel 4.19.12). The system runs on a SSD with an ext4 boot partition, a simple btrfs root with some subvolumes, and some swap space only used for hibernation. I was working on my server as normal when I noticed all of my BTRFS subvolumes had been remounted ro. After a short time, I started getting various IO errors ("bus error" by journalctl, "I/O error" by ls etc.). I halted the system (hard reboot), at the reboot the BTRFS partition would not mount. I suspected the corruption to be disk-related, but smartctl does not show any warning for the disk, and the ext4 partition seems healthy.
> > Those are the kernel messages logged when I attempt to mount the partition:
> > Jan 02 23:39:38 nixos kernel: BTRFS warning (device sdd2): sdd2 checksum verify failed on <L> wanted <A> found <B> level 0
> > Jan 02 23:39:38 nixos kernel: BTRFS error (device sdd2): failed to read block groups: -5
> > Jan 02 23:39:38 nixos systemd[1]: Started Cleanup of Temporary Directories.
> > Jan 02 23:39:38 nixos kernel: BTRFS error (device sdd2): open_ctree failed
>
> Do you have the entire kernel message from the previous boot when the
> problem started, including I/O errors? We kinda need to see what was
> going on leading up to the read only mount, and the bus and I/O
> errors. journalctl -b-1 -k should do it, or using journalctl
> --list-boots to find it. You can redirect to a file with > and then
> attach to the reply if it's small enough, or put it up somewhere like
> Dropbox or Google Drive if it's too big.

Sadly, I cannot find the journal file for the boot in which the system failed in /var/log - only older entries, with no I/O errors. If you have any idea where else to look for logs, I can check.


>
> btrfs rescue super -v /dev/sdd2
All Devices:
        Device: id = 1, name = /dev/sdd2

Before Recovering:
        [All good supers]:
                device name = /dev/sdd2
                superblock bytenr = 65536

                device name = /dev/sdd2
                superblock bytenr = <big N>

        [All bad supers]:

All supers are valid, no need to recover


> btrfs insp dump-s -f /dev/sdd2
superblock: bytenr=65536, device=/dev/sdd2
---------------------------------------------------------
csum_type               0 (crc32c)
csum_size               4
csum                    0x<C> [match]
bytenr                  65536
flags                   0x1
                        ( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    <ID>
label                   main
generation              6337
root                    <~10^10>
sys_array_size          97
chunk_root_generation   5976
root_level              1
chunk_root              <~10^7>
chunk_root_level        0
log_root                <~10^9>
log_root_transid        0
log_root_level          0
total_bytes             <X:~10^12>
bytes_used              <~10^12>
sectorsize              4096
nodesize                16384
leafsize (deprecated)           16384
stripesize              4096
root_dir                6
num_devices             1
compat_flags            0x0
compat_ro_flags         0x0
incompat_flags          0x169
                        ( MIXED_BACKREF |
                          COMPRESS_LZO |
                          BIG_METADATA |
                          EXTENDED_IREF |
                          SKINNY_METADATA )
cache_generation        6337
uuid_tree_generation    6337
dev_item.uuid           <ID2>
dev_item.fsid           <ID> [match]
dev_item.type           0
dev_item.total_bytes    <X:~10^12>
dev_item.bytes_used     <~10^12>
dev_item.io_align       4096
dev_item.io_width       4096
dev_item.sector_size    4096
dev_item.devid          1
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0
sys_chunk_array[2048]:
        item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM <Y>)
                length <L> owner 2 stripe_len 65536 type SYSTEM
                io_align 4096 io_width 4096 sector_size 4096
                num_stripes 1 sub_stripes 0
                        stripe 0 devid 1 offset <Y>
                        dev_uuid <ID2>
backup_roots[4]:
        backup 0:
<...>

>
> Those are read-only. Also try to mount with -o usebackuproot, and
> if that fails, -o ro,usebackuproot is often more tolerant. But that's
> for getting data off the volume; it's more useful to know why the file
> system broke, and also why btrfs check is failing, given that it's a
> current version.

I got the data back using btrfs restore; mount -o ro,usebackuproot fails with the same error (open_ctree failed).


>
> If you get a chance you can take an image, maybe a Btrfs developer
> will find it useful to understand why the Btrfs check is failing.
>
> btrfs-image -c9 -t4 -ss <dev> /path/to/fileoutput.image
>
> That is usually around 1/2 the size of file system metadata. It
> contains no data and filenames will be hashed.
>
>
> --
>
> Chris Murphy

I tried to take an image but even that fails:
"btrfs-image -c9 -t4 -ss /dev/sdd2 /mnt/metadata.image"
checksum verify failed on <N> found <A> wanted <B>
checksum verify failed on <N> found <A> wanted <B>
Csum didn't match
ERROR: open ctree failed
ERROR: create failed: Success


-b11g


* Re: BTRFS corruption: open_ctree failed
  2019-01-03  0:26 b11g
@ 2019-01-03  4:52 ` Chris Murphy
  2019-01-03 13:55   ` b11g
  2019-01-11 12:29 ` b11g
  1 sibling, 1 reply; 11+ messages in thread
From: Chris Murphy @ 2019-01-03  4:52 UTC (permalink / raw)
  To: b11g, Qu Wenruo; +Cc: linux-btrfs

On Wed, Jan 2, 2019 at 5:26 PM b11g <b11g@protonmail.com> wrote:
>
> Hi all,
>
> I have several BTRFS success-stories, and I've been an happy user for quite a long time now. I was therefore surprised to face a BTRFS corruption on a system I'd just installed.
>
> I use NixOS, unstable branch (linux kernel 4.19.12). The system runs on a SSD with an ext4 boot partition, a simple btrfs root with some subvolumes, and some swap space only used for hibernation. I was working on my server as normal when I noticed all of my BTRFS subvolumes had been remounted ro. After a short time, I started getting various IO errors ("bus error" by journalctl, "I/O error" by ls etc.). I halted the system (hard reboot), at the reboot the BTRFS partition would not mount. I suspected the corruption to be disk-related, but smartctl does not show any warning for the disk, and the ext4 partition seems healthy.
>
> Those are the kernel messages logged when I attempt to mount the partition:
> Jan 02 23:39:38 nixos kernel: BTRFS warning (device sdd2): sdd2 checksum verify failed on <L> wanted <A> found <B> level 0
> Jan 02 23:39:38 nixos kernel: BTRFS error (device sdd2): failed to read block groups: -5
> Jan 02 23:39:38 nixos systemd[1]: Started Cleanup of Temporary Directories.
> Jan 02 23:39:38 nixos kernel: BTRFS error (device sdd2): open_ctree failed

Do you have the entire kernel message from the previous boot when the
problem started, including I/O errors? We kinda need to see what was
going on leading up to the read only mount, and the bus and I/O
errors. journalctl -b-1 -k should do it, or using journalctl
--list-boots to find it. You can redirect to a file with > and then
attach to the reply if it's small enough, or put it up somewhere like
Dropbox or Google Drive if it's too big.
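
The journalctl steps above can be sketched as a small POSIX shell function. The function name save_previous_boot_kernel_log and the default output file name are hypothetical. Logs from earlier boots are only available when the journal is stored persistently (for example under /var/log/journal); with a volatile journal there is nothing from the previous boot to retrieve.

```shell
#!/bin/sh
# Hypothetical helper: list known boots, then save the kernel messages of
# the previous boot to a file (default name is an arbitrary choice).
save_previous_boot_kernel_log() {
    _out=${1:-previous-boot-kernel.log}
    # Show the boots the journal knows about (0 is the current boot,
    # -1 the one before it).
    journalctl --list-boots || return 1
    # Kernel messages only (-k) from the previous boot (-b -1).
    journalctl -k -b -1 > "$_out"
}

# Usage:
#   save_previous_boot_kernel_log /tmp/prev-boot.log
```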

btrfs rescue super -v /dev/sdd2
btrfs insp dump-s -f /dev/sdd2

Those are read-only. Also try to mount with -o usebackuproot, and
if that fails, -o ro,usebackuproot is often more tolerant. But that's
for getting data off the volume; it's more useful to know why the file
system broke, and also why btrfs check is failing, given that it's a
current version.
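
The two mount attempts can be sketched as follows in POSIX shell. The function name mount_with_backup_root is hypothetical, the device and mount point are placeholders, and the commands need root privileges. This is only an illustration of the order of attempts, not a recovery procedure.

```shell
#!/bin/sh
# Hypothetical helper: try a normal mount using a backup tree root, then
# fall back to the same option combined with a read-only mount.
mount_with_backup_root() {
    _dev=$1
    _mnt=$2
    # First attempt: allow falling back to an older tree root.
    mount -t btrfs -o usebackuproot "$_dev" "$_mnt" && return 0
    # Second attempt: read-only, which is often more tolerant.
    mount -t btrfs -o ro,usebackuproot "$_dev" "$_mnt"
}

# Usage:
#   mount_with_backup_root /dev/sdd2 /mnt || echo "both attempts failed"
```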

If you get a chance you can take an image, maybe a Btrfs developer
will find it useful to understand why the Btrfs check is failing.

btrfs-image -c9 -t4 -ss <dev> /path/to/fileoutput.image

That is usually around 1/2 the size of file system metadata. It
contains no data and filenames will be hashed.



-- 
Chris Murphy


* BTRFS corruption: open_ctree failed
@ 2019-01-03  0:26 b11g
  2019-01-03  4:52 ` Chris Murphy
  2019-01-11 12:29 ` b11g
  0 siblings, 2 replies; 11+ messages in thread
From: b11g @ 2019-01-03  0:26 UTC (permalink / raw)
  To: linux-btrfs

Hi all,

I have several BTRFS success stories, and I've been a happy user for quite a long time now. I was therefore surprised to face a BTRFS corruption on a system I'd just installed.

I use NixOS, unstable branch (linux kernel 4.19.12). The system runs on an SSD with an ext4 boot partition, a simple btrfs root with some subvolumes, and some swap space only used for hibernation. I was working on my server as normal when I noticed all of my BTRFS subvolumes had been remounted ro. After a short time, I started getting various IO errors ("bus error" from journalctl, "I/O error" from ls etc.). I halted the system (hard reboot); after the reboot the BTRFS partition would not mount. I suspected the corruption to be disk-related, but smartctl does not show any warning for the disk, and the ext4 partition seems healthy.

These are the kernel messages logged when I attempt to mount the partition:
Jan 02 23:39:38 nixos kernel: BTRFS warning (device sdd2): sdd2 checksum verify failed on <L> wanted <A> found <B> level 0
Jan 02 23:39:38 nixos kernel: BTRFS error (device sdd2): failed to read block groups: -5
Jan 02 23:39:38 nixos systemd[1]: Started Cleanup of Temporary Directories.
Jan 02 23:39:38 nixos kernel: BTRFS error (device sdd2): open_ctree failed


Searching for the error code led me to these two recent threads:
https://www.spinics.net/lists/linux-btrfs/msg84973.html
https://www.spinics.net/lists/linux-btrfs/msg83833.html


Using btrfs-progs-4.15.1, "btrfs restore /dev/sdd2 /tmp/" fails with:
checksum verify failed on <N> found <A> wanted <B>
checksum verify failed on <N> found <A> wanted <B>
Csum didn't match
Could not open root, trying backup super
checksum verify failed on <N> found <A> wanted <B>
checksum verify failed on <N> found <A> wanted <B>
Csum didn't match
Could not open root, trying backup super
ERROR: superblock bytenr <X> is larger than device size <Y>
Could not open root, trying backup super

Using btrfs-progs-4.19.1, "btrfs restore /dev/sdd2 /tmp/" succeeds with some exceptions:
We have looped trying to restore files in /@/nix/store too many times to be making progress, stopping

I do not have much time for debugging the issue and I did not lose important data, so I tried a couple of commands suggested on the threads and in the docs (without fully understanding them):

"btrfs rescue zero-log /dev/sdd2":
checksum verify failed on <N> found <A> wanted <B>
checksum verify failed on <N> found <A> wanted <B>
Csum didn't match
ERROR: could not open ctree

"btrfs check --repair /dev/sdd2" (I know, I was not supposed to run this one):
Opening filesystem to check...
checksum verify failed on <N> found <A> wanted <B>
checksum verify failed on <N> found <A> wanted <B>
Csum didn't match
ERROR: could not open ctree

Same for "btrfs check --init-csum-tree /dev/sdd2".


I expect to wipe the disk and do a clean start in the coming days; I just wanted to report this in the hope that it helps development (sorry for the redaction). If you need more information, I'll be glad to help as I can!

Thank you for your work,
Cheers,
- b11g


