Subject: Re: multi-device btrfs with single data mode and disk failure
To: Chris Murphy
Cc: Btrfs BTRFS
From: Alexandre Poux
Message-ID: <0b29471c-363a-1e2f-d352-1d422c07df64@gmail.com>
Date: Tue, 20 Sep 2016 22:18:40 +0200

On 20/09/2016 at 21:46, Chris Murphy wrote:
> On Tue, Sep 20, 2016 at 1:31 PM, Alexandre Poux wrote:
>>
>> On 20/09/2016 at 21:11, Chris Murphy wrote:
>>> And no backup? Umm, I'd resolve that sooner than anything else.
>> Yeah you are absolutely right, this was a temporary solution which came
>> to be not that temporary.
>> And I regret it already...
> Well on the bright side, if this were LVM or mdadm linear/concat
> array, the whole thing would be toast because any other file system
> would have lost too much fs metadata on the missing device.
>
>>> It
>>> should be true that it'll tolerate a read only mount indefinitely, but
>>> read write? Not sure. This sort of edge case isn't well tested at all
>>> seeing as it required changing the kernel to reduce safe guards. So
>>> all bets are off the whole thing could become unmountable, not even
>>> read only, and then it's a scraping job.
>> I'm not that crazy, I tried the patch inside a virtual machine on
>> virtual drives...
>> And since it's only virtual, it may not work on the real partition...
> Are you sure the virtual setup lacked a CHUNK_ITEM on the missing
> device? That might be what pinned it in that case.

In fact, in my virtual setup more chunks were missing (1 metadata, 1 system
and 1 data).
I will try to build a setup closer to my real one.

> You could try some sort of overlay for your remaining drives.
> Something like this:
> https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file
>
> Make sure you understand the gotcha about cloning which applies here:
> https://btrfs.wiki.kernel.org/index.php/Gotchas
>
> I think it's safe to use blockdev --setro on every real device you're
> trying to protect from changes. And when mounting you'll at least need
> to use device= mount option to explicitly mount each of the overlay
> devices. Based on the wiki, I'm wincing, I don't really know for sure
> if device mount option is enough to compel Btrfs to only use those
> devices and not go off the rails and still use one of the real
> devices, but at least if they're setro it won't matter (the mount will
> just fail somehow due to write failures).
>
> So now you can try removing the missing device... and see what
> happens. You could inspect the overlay files and see what changes were
> made.

Wow, that looks nice.
So, if it works, and if we find a way to fix the filesystem inside the VM,
I can use this over the real partition to check that it works before trying
the fix for real.
Nice idea.

>>> What do you get for btrfs-debug-tree -t 3
>>>
>>> That should show the chunk tree, and what I'm wondering is if the
>>> chunk tree has any references to chunks on the missing device. Even if
>>> there are no extents on that device, if there are chunks, that might
>>> be one of the safeguards.
>>>
>> You'll find it attached.
>> The missing device is devid 8 (since it's the only one missing in
>> btrfs fi show)
>> I found it only once, at line 63
> Yeah bummer. Not used for system, data, or metadata chunks at all.
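
For reference, the overlay idea quoted above (blockdev --setro on the real
devices, a copy-on-write overlay per device, then device= at mount time) could
be sketched roughly as below. This is only a dry run that prints the commands
for review; nothing is executed. The device names, the /tmp overlay path, the
4G overlay size, the "-ovl" suffix and the mount point are all hypothetical
placeholders, and the dmsetup snapshot line follows the pattern on the RAID
wiki page linked above rather than anything tested on this filesystem. Other
mount options the situation may need are left out, and the cloning gotcha
still applies.

```shell
#!/bin/sh
# Dry-run sketch: prints commands instead of running them.
# All names, paths and sizes below are hypothetical placeholders.

# Print the commands that would put a copy-on-write overlay on one device.
overlay_cmds() {
    dev=$1                        # real device to protect, e.g. /dev/sdb1
    name=$(basename "$dev")
    ovl="/tmp/overlay-$name"      # sparse file that would absorb all writes
    printf '%s\n' "blockdev --setro $dev"
    printf '%s\n' "truncate -s 4G $ovl"
    printf '%s\n' "loop=\$(losetup -f --show $ovl)"
    printf '%s\n' "size=\$(blockdev --getsz $dev)"
    printf '%s\n' "dmsetup create $name-ovl --table \"0 \$size snapshot $dev \$loop P 8\""
}

# Print a mount command that names every overlay device with device=.
mount_cmd() {
    mnt=$1; shift
    first="/dev/mapper/$(basename "$1")-ovl"
    opts=""
    for d in "$@"; do
        opts="${opts:+$opts,}device=/dev/mapper/$(basename "$d")-ovl"
    done
    printf '%s\n' "mount -o $opts $first $mnt"
}

for d in /dev/sdb1 /dev/sdc1; do
    overlay_cmds "$d"
done
mount_cmd /mnt/test /dev/sdb1 /dev/sdc1
```

Printing first makes it possible to check every line against the wiki before
anything touches the real partitions; the overlay files can then be inspected
afterwards to see what was changed.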