From: Josef Bacik <josef@toxicpanda.com>
To: Neal Gompa <ngompa13@gmail.com>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Recovering Btrfs from a freak failure of the disk controller
Date: Fri, 5 Mar 2021 09:12:13 -0500
Message-ID: <e5b5409b-0e34-abd5-81c9-48ef59c3fa03@toxicpanda.com>
In-Reply-To: <CAEg-Je_TN04fnE2Bg46Nysm2_fG7dcni-7c6wbfZQZqXhDhbnA@mail.gmail.com>

On 3/4/21 6:54 PM, Neal Gompa wrote:
> On Thu, Mar 4, 2021 at 3:25 PM Josef Bacik <josef@toxicpanda.com> wrote:
>>
>> On 3/3/21 2:38 PM, Neal Gompa wrote:
>>> On Wed, Mar 3, 2021 at 1:42 PM Josef Bacik <josef@toxicpanda.com> wrote:
>>>>
>>>> On 2/24/21 10:47 PM, Neal Gompa wrote:
>>>>> On Wed, Feb 24, 2021 at 10:44 AM Josef Bacik <josef@toxicpanda.com> wrote:
>>>>>>
>>>>>> On 2/24/21 9:23 AM, Neal Gompa wrote:
>>>>>>> On Tue, Feb 23, 2021 at 10:05 AM Josef Bacik <josef@toxicpanda.com> wrote:
>>>>>>>>
>>>>>>>> On 2/22/21 11:03 PM, Neal Gompa wrote:
>>>>>>>>> On Mon, Feb 22, 2021 at 2:34 PM Josef Bacik <josef@toxicpanda.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On 2/21/21 1:27 PM, Neal Gompa wrote:
>>>>>>>>>>> On Wed, Feb 17, 2021 at 11:44 AM Josef Bacik <josef@toxicpanda.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On 2/17/21 11:29 AM, Neal Gompa wrote:
>>>>>>>>>>>>> On Wed, Feb 17, 2021 at 9:59 AM Josef Bacik <josef@toxicpanda.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2/17/21 9:50 AM, Neal Gompa wrote:
>>>>>>>>>>>>>>> On Wed, Feb 17, 2021 at 9:36 AM Josef Bacik <josef@toxicpanda.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 2/16/21 9:05 PM, Neal Gompa wrote:
>>>>>>>>>>>>>>>>> On Tue, Feb 16, 2021 at 4:24 PM Josef Bacik <josef@toxicpanda.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 2/16/21 3:29 PM, Neal Gompa wrote:
>>>>>>>>>>>>>>>>>>> On Tue, Feb 16, 2021 at 1:11 PM Josef Bacik <josef@toxicpanda.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 2/16/21 11:27 AM, Neal Gompa wrote:
>>>>>>>>>>>>>>>>>>>>> On Tue, Feb 16, 2021 at 10:19 AM Josef Bacik <josef@toxicpanda.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On 2/14/21 3:25 PM, Neal Gompa wrote:
>>>>>>>>>>>>>>>>>>>>>>> Hey all,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> So one of my main computers recently had a disk controller failure
>>>>>>>>>>>>>>>>>>>>>>> that caused my machine to freeze. After rebooting, Btrfs refuses to
>>>>>>>>>>>>>>>>>>>>>>> mount. I tried to do a mount and the following errors show up in the
>>>>>>>>>>>>>>>>>>>>>>> journal:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS info (device sda3): disk space caching is enabled
>>>>>>>>>>>>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS info (device sda3): has skinny extents
>>>>>>>>>>>>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS critical (device sda3): corrupt leaf: root=401 block=796082176 slot=15 ino=203657, invalid inode transid: has 888896 expect [0, 888895]
>>>>>>>>>>>>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS error (device sda3): block=796082176 read time tree block corruption detected
>>>>>>>>>>>>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS critical (device sda3): corrupt leaf: root=401 block=796082176 slot=15 ino=203657, invalid inode transid: has 888896 expect [0, 888895]
>>>>>>>>>>>>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS error (device sda3): block=796082176 read time tree block corruption detected
>>>>>>>>>>>>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS warning (device sda3): couldn't read tree root
>>>>>>>>>>>>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS error (device sda3): open_ctree failed
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I've tried to do -o recovery,ro mount and get the same issue. I can't
>>>>>>>>>>>>>>>>>>>>>>> seem to find any reasonably good information on how to do recovery in
>>>>>>>>>>>>>>>>>>>>>>> this scenario, even to just recover enough to copy data off.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I'm on Fedora 33, the system was on Linux kernel version 5.9.16 and
>>>>>>>>>>>>>>>>>>>>>>> the Fedora 33 live ISO I'm using has Linux kernel version 5.10.14. I'm
>>>>>>>>>>>>>>>>>>>>>>> using btrfs-progs v5.10.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Can anyone help?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Can you try
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> btrfs check --clear-space-cache v1 /dev/whatever
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> That should fix the inode generation thing so it's sane, and then the tree
>>>>>>>>>>>>>>>>>>>>>> checker will allow the fs to be read, hopefully.  If not we can work out some
>>>>>>>>>>>>>>>>>>>>>> other magic.  Thanks,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Josef
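The tree checker rejects a leaf when an inode's transid is newer than the last committed transaction can explain. The sketch below models that bound in Python, assuming (as in the kernel's tree-checker) that the upper limit is the superblock generation plus one. `check_inode_transid` is a hypothetical name, not a btrfs API. Read this way, "has 888896 expect [0, 888895]" implies a superblock generation of 888894.

```python
from typing import Optional


def check_inode_transid(inode_transid: int, super_generation: int) -> Optional[str]:
    """Return a tree-checker-style error if the inode transid is out of range.

    Assumption: the allowed range is [0, super_generation + 1]; the extra 1
    leaves room for the log tree, as the kernel's tree-checker does.
    """
    upper = super_generation + 1
    if inode_transid > upper:
        return f"invalid inode transid: has {inode_transid} expect [0, {upper}]"
    return None


# Inode 203657 from the log carries transid 888896; the error implies the
# superblock generation is 888894.
print(check_inode_transid(888896, 888894))
# Once the transid is brought back into range, the check passes (prints None).
print(check_inode_transid(888895, 888894))
```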
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I got the same error as I did with btrfs check --readonly...
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Oh lovely, what does btrfs check --readonly --backup do?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> No dice...
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> # btrfs check --readonly --backup /dev/sda3
>>>>>>>>>>>>>>>>>>>> Opening filesystem to check...
>>>>>>>>>>>>>>>>>>>> parent transid verify failed on 791281664 wanted 888893 found 888895
>>>>>>>>>>>>>>>>>>>> parent transid verify failed on 791281664 wanted 888893 found 888895
>>>>>>>>>>>>>>>>>>>> parent transid verify failed on 791281664 wanted 888893 found 888895
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hey look the block we're looking for, I wrote you some magic, just pull
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> https://github.com/josefbacik/btrfs-progs/tree/for-neal
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> build, and then run
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> btrfs-neal-magic /dev/sda3 791281664 888895
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This will force us to point at the old root with (hopefully) the right bytenr
>>>>>>>>>>>>>>>>>> and gen, and then hopefully you'll be able to recover from there.  This is kind
>>>>>>>>>>>>>>>>>> of saucy, so yolo, but I can undo it if it makes things worse.  Thanks,
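"parent transid verify failed on 791281664 wanted 888893 found 888895" compares two numbers: the generation recorded in the pointer that references block 791281664, and the generation stored in that block's own header. A minimal Python sketch of the comparison, using simplified stand-in types (`BlockPtr`, `BlockHeader` and `verify_parent_transid` are hypothetical, not btrfs-progs code), shows why pointing at bytenr 791281664 with gen 888895, the arguments given to btrfs-neal-magic, lets this particular check pass. It says nothing about the health of the rest of the tree.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BlockPtr:
    """Hypothetical: what the referencing pointer records about a tree block."""
    bytenr: int
    generation: int


@dataclass(frozen=True)
class BlockHeader:
    """Hypothetical: the generation written into the tree block's own header."""
    generation: int


def verify_parent_transid(ptr: BlockPtr, hdr: BlockHeader) -> Optional[str]:
    # The block is accepted only if its header generation matches the
    # generation the pointer expects; otherwise they come from different
    # transactions.
    if hdr.generation != ptr.generation:
        return (f"parent transid verify failed on {ptr.bytenr} "
                f"wanted {ptr.generation} found {hdr.generation}")
    return None


on_disk = BlockHeader(generation=888895)
stale = BlockPtr(bytenr=791281664, generation=888893)      # what the fs pointed at
repointed = BlockPtr(bytenr=791281664, generation=888895)  # bytenr/gen passed to the tool

print(verify_parent_transid(stale, on_disk))
print(verify_parent_transid(repointed, on_disk))  # None
```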
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> # btrfs check --readonly /dev/sda3
>>>>>>>>>>>>>>>>>> Opening filesystem to check...
>>>>>>>>>>>>>>>>>> ERROR: could not setup extent tree
>>>>>>>>>>>>>>>>>> ERROR: cannot open file system
>>>>>>>>>>>>>>>>> # btrfs check --clear-space-cache v1 /dev/sda3
>>>>>>>>>>>>>>>>>> Opening filesystem to check...
>>>>>>>>>>>>>>>>>> ERROR: could not setup extent tree
>>>>>>>>>>>>>>>>>> ERROR: cannot open file system
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It's better, but still no dice... :(
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hmm it's not telling us what's wrong with the extent tree, which is annoying.
>>>>>>>>>>>>>>>> Does mount -o rescue=all,ro work now that the root tree is normal?  Thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Nope, I see this in the journal:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): enabling all of the rescue options
>>>>>>>>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): ignoring data csums
>>>>>>>>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): ignoring bad roots
>>>>>>>>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): disabling log replay at mount time
>>>>>>>>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): disk space caching is enabled
>>>>>>>>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): has skinny extents
>>>>>>>>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS error (device sda3): tree level mismatch detected, bytenr=791281664 level expected=1 has=2
>>>>>>>>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS error (device sda3): tree level mismatch detected, bytenr=791281664 level expected=1 has=2
>>>>>>>>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS warning (device sda3): couldn't read tree root
>>>>>>>>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS error (device sda3): open_ctree failed
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ok git pull for-neal, rebuild, then run
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> btrfs-neal-magic /dev/sda3 791281664 888895 2
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I thought of this yesterday but in my head was like "naaahhhh, whats the chances
>>>>>>>>>>>>>> that the level doesn't match??".  Thanks,
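The extra argument matters because a pointer to a tree block records the block's level as well as its bytenr and generation, and the read path compares that against the level in the block's header. After the first run the pointer still expected level 1 while block 791281664 is a level 2 node, hence "level expected=1 has=2". A minimal sketch of that comparison; `verify_level` is a hypothetical helper whose message format is copied from the kernel log in this thread, not the kernel's own code.

```python
from typing import Optional


def verify_level(bytenr: int, expected_level: int, header_level: int) -> Optional[str]:
    """Hypothetical check: the referencing pointer's level must match the header.

    Level 0 is a leaf and higher levels are internal nodes, so a mismatch means
    the pointer and the block disagree about the shape of the tree.
    """
    if expected_level != header_level:
        return (f"tree level mismatch detected, bytenr={bytenr} "
                f"level expected={expected_level} has={header_level}")
    return None


# First btrfs-neal-magic run: bytenr and gen fixed, level still expected to be 1.
print(verify_level(791281664, 1, 2))
# Second run with the extra "2" argument: the levels agree (prints None).
print(verify_level(791281664, 2, 2))
```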
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Tried rescue mount again after running that and got a stack trace in
>>>>>>>>>>>>> the kernel, detailed in the following attached log.
>>>>>>>>>>>>
>>>>>>>>>>>> Huh I wonder how I didn't hit this when testing, I must have only tested with
>>>>>>>>>>>> zeroing the extent root and the csum root.  You're going to have to build a
>>>>>>>>>>>> kernel with a fix for this
>>>>>>>>>>>>
>>>>>>>>>>>> https://paste.centos.org/view/7b48aaea
>>>>>>>>>>>>
>>>>>>>>>>>> and see if that gets you further.  Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I built a kernel build as an RPM with your patch[1] and tried it.
>>>>>>>>>>>
>>>>>>>>>>> [root@fedora ~]# mount -t btrfs -o rescue=all,ro /dev/sdb3 /mnt
>>>>>>>>>>> Killed
>>>>>>>>>>>
>>>>>>>>>>> The log from the journal is attached.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Ahh crud my bad, this should do it
>>>>>>>>>>
>>>>>>>>>> https://paste.centos.org/view/ac2e61ef
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Patch doesn't apply (note it is patch 667 below):
>>>>>>>>
>>>>>>>> Ah sorry, should have just sent you an iterative patch.  You can take the above
>>>>>>>> patch and just delete the hunk from volumes.c as you already have that applied
>>>>>>>> and then it'll work.  Thanks,
>>>>>>>>
>>>>>>>
>>>>>>> Failed with a weird error...?
>>>>>>>
>>>>>>> [root@fedora ~]# mount -t btrfs -o rescue=all,ro /dev/sda3 /mnt
>>>>>>> mount: /mnt: mount(2) system call failed: No such file or directory.
>>>>>>>
>>>>>>> Journal log with traceback attached.
>>>>>>
>>>>>> Last one maybe?
>>>>>>
>>>>>> https://paste.centos.org/view/80edd6fd
>>>>>>
>>>>>
>>>>> Similar weird failure:
>>>>>
>>>>> [root@fedora ~]# mount -t btrfs -o rescue=all,ro /dev/sdb3 /mnt
>>>>> mount: /mnt: mount(2) system call failed: No such file or directory.
>>>>>
>>>>> No crash in the journal this time, though:
>>>>>
>>>>>> Feb 24 22:43:19 fedora kernel: BTRFS info (device sdb3): enabling all of the rescue options
>>>>>> Feb 24 22:43:19 fedora kernel: BTRFS info (device sdb3): ignoring data csums
>>>>>> Feb 24 22:43:19 fedora kernel: BTRFS info (device sdb3): ignoring bad roots
>>>>>> Feb 24 22:43:19 fedora kernel: BTRFS info (device sdb3): disabling log replay at mount time
>>>>>> Feb 24 22:43:19 fedora kernel: BTRFS info (device sdb3): disk space caching is enabled
>>>>>> Feb 24 22:43:19 fedora kernel: BTRFS info (device sdb3): has skinny extents
>>>>>> Feb 24 22:43:19 fedora kernel: BTRFS warning (device sdb3): failed to read fs tree: -2
>>>>>> Feb 24 22:43:19 fedora kernel: BTRFS error (device sdb3): open_ctree failed
>>>>>
>>>>>
>>>>
>>>> Sorry Neal, you replied when I was in the middle of something and promptly
>>>> forgot about it.  I figured the fs root was fine, can you do the following so I
>>>> can figure out from the error messages what might be wrong
>>>>
>>>> btrfs check --readonly
>>>> btrfs restore -D
>>>> btrfs restore -l
>>>>
>>>
>>> It didn't work. Here's the output:
>>>
>>> [root@fedora ~]# btrfs check --readonly /dev/sdb3
>>> Opening filesystem to check...
>>> ERROR: could not setup extent tree
>>> ERROR: cannot open file system
>>> [root@fedora ~]# btrfs restore -D /dev/sdb3 /mnt
>>> WARNING: could not setup extent tree, skipping it
>>> Couldn't setup device tree
>>> Could not open root, trying backup super
>>> parent transid verify failed on 796082176 wanted 888894 found 888896
>>> parent transid verify failed on 796082176 wanted 888894 found 888896
>>> parent transid verify failed on 796082176 wanted 888894 found 888896
>>> Ignoring transid failure
>>> WARNING: could not setup extent tree, skipping it
>>> Couldn't setup device tree
>>> Could not open root, trying backup super
>>> ERROR: superblock bytenr 274877906944 is larger than device size 263132807168
>>> Could not open root, trying backup super
>>> [root@fedora ~]# btrfs restore -l /dev/sdb3 /mnt
>>> WARNING: could not setup extent tree, skipping it
>>> Couldn't setup device tree
>>> Could not open root, trying backup super
>>> parent transid verify failed on 796082176 wanted 888894 found 888896
>>> parent transid verify failed on 796082176 wanted 888894 found 888896
>>> parent transid verify failed on 796082176 wanted 888894 found 888896
>>> Ignoring transid failure
>>> WARNING: could not setup extent tree, skipping it
>>> Couldn't setup device tree
>>> Could not open root, trying backup super
>>> ERROR: superblock bytenr 274877906944 is larger than device size 263132807168
>>> Could not open root, trying backup super
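The last error in both restore runs is plain arithmetic. btrfs keeps up to three superblock copies at fixed offsets: 64 KiB, 64 MiB and 256 GiB. 274877906944 bytes is exactly 256 GiB (2^38), the third copy, and it lies past the end of this 263132807168-byte (about 245 GiB) partition, so only the first two copies exist and that backup attempt can only fail. A minimal Python sketch of the offset calculation, modeled on the kernel's btrfs_sb_offset(); the constant names and the simplified fits-on-device rule are assumptions.

```python
from typing import List

# Constants modeled on the kernel's btrfs_sb_offset(); names here are assumptions.
SUPER_INFO_SIZE = 4 * 1024      # one superblock copy is 4 KiB
SUPER_INFO_OFFSET = 64 * 1024   # primary copy lives at 64 KiB
SUPER_MIRROR_SHIFT = 12
SUPER_MIRROR_MAX = 3


def sb_offset(mirror: int) -> int:
    """Byte offset of superblock copy `mirror` (0 = primary, 1 and 2 = backups)."""
    if mirror > 0:
        return (16 * 1024) << (SUPER_MIRROR_SHIFT * mirror)
    return SUPER_INFO_OFFSET


def usable_mirrors(device_size: int) -> List[int]:
    # Simplification: a copy exists only if all 4 KiB of it fit on the device.
    return [m for m in range(SUPER_MIRROR_MAX)
            if sb_offset(m) + SUPER_INFO_SIZE <= device_size]


print([sb_offset(m) for m in range(SUPER_MIRROR_MAX)])
# [65536, 67108864, 274877906944]
print(usable_mirrors(263132807168))
# [0, 1]
```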
>>>
>>>
>>
>> Hmm OK I think we want the neal magic for this one too, but before we go doing
>> that can I get a
>>
>> btrfs inspect-internal -f /dev/whatever
>>
>> so I can make sure I'm not just blindly clobbering something.  Thanks,
>>
> 
> Doesn't work, did you mean some other command?
> 
> [root@fedora ~]#  btrfs inspect-internal -f /dev/sdb3
> btrfs inspect-internal: unknown token '-f'

Sigh, sorry, I meant:

btrfs inspect-internal dump-super -f /dev/sdb3

Josef


Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e5b5409b-0e34-abd5-81c9-48ef59c3fa03@toxicpanda.com \
    --to=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=ngompa13@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.