On Mon, Dec 02, 2019 at 10:27:49PM +0100, Gard Vaaler wrote:
> > On 1 Dec 2019, at 19:51, Nikolay Borisov <nborisov@suse.com> wrote:
> > On 1.12.19 at 19:27, Gard Vaaler wrote:
> >> Trying to recover a filesystem that was corrupted by losing writes due to a failing caching device, I get the following error:
> >>> ERROR: child eb corrupted: parent bytenr=2529690976256 item=0 parent level=2 child level=0
> >> 
> >> Trying to zero the journal or reinitialising the extent tree yields the same error. Is there any way to recover the filesystem? Relevant logs attached.
> > 
> > Provide more information about your storage stack.
> 
> 
> Nothing special: SATA disks with (now-detached) SATA SSDs.

Is it two bcache devices (each a cache in front of a disk) in btrfs
raid1?  Did both cache devices fail?  Were they configured as writeback
caches?  Does the drive firmware have bugs that affect either btrfs or
bcache?
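
To answer the writeback question, something like the following might
help.  It is only a sketch: it assumes the backing devices are still
registered as /sys/block/bcacheN and that the cache_mode attribute lists
all modes with the active one in square brackets (e.g.
"writethrough [writeback] writearound none").  The function names are
mine, not part of any tool.

```python
import glob
import re


def active_cache_mode(text):
    """Return the bracketed (active) mode from a cache_mode line, or None."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def report(pattern="/sys/block/bcache*/bcache/cache_mode"):
    """Print the active cache mode of every bcache device matching pattern."""
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            print(path, "->", active_cache_mode(f.read()))


if __name__ == "__main__":
    report()
```

Note that this shows the mode configured now, which is not necessarily
the mode that was in effect when the cache failed.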

If the caches are independent (no shared caches or disks), and you
had only one cache device failure, and the filesystem is btrfs raid1,
then the non-failing cache should be OK, and can be used to recover the
contents of the failed device.  You'll need at least one pair of cache
and disk to be up and running.

If any of those conditions is false then the filesystem is probably
toast.  btrfs will reject a filesystem missing just one write, so a
filesystem missing thousands or millions of writes due to a writeback
cache failure is going to be data soup.
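
Spelled out as code, the conditions above look roughly like this.  It is
just an illustration of the reasoning, not a diagnostic tool: the Leg
type and its fields are made up for the example, and the answers have to
come from you.

```python
from dataclasses import dataclass


@dataclass
class Leg:
    """One raid1 member: a cache device in front of a disk (hypothetical)."""
    cache_ok: bool          # cache device survived
    disk_ok: bool           # backing disk is healthy
    shares_hardware: bool   # cache or disk is shared with another leg


def surviving_copy_usable(profile, legs):
    """True if an intact cache+disk pair can be used to rebuild the other."""
    if profile != "raid1":
        return False        # no second copy to recover from
    if any(leg.shares_hardware for leg in legs):
        return False        # one failure can damage both copies
    failed_caches = sum(not leg.cache_ok for leg in legs)
    if failed_caches > 1:
        return False        # every copy is missing writes
    # At least one complete pair of cache and disk must be up and running.
    return any(leg.cache_ok and leg.disk_ok for leg in legs)


if __name__ == "__main__":
    legs = [Leg(True, True, False), Leg(False, True, False)]
    print(surviving_copy_usable("raid1", legs))
```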


