From: Michael Wade <spikewade@gmail.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: BTRFS RAID filesystem unmountable
Date: Sun, 29 Apr 2018 09:59:13 +0100
Message-ID: <CAB+znrHPF7963-MAXtgu8s3+vLEA7ENE0=uCCJ6sWCF6KUWToQ@mail.gmail.com>
In-Reply-To: <bff1c7b3-5bac-9806-a376-3612057865cf@gmx.com>

Ok, will it be possible for me to install the new version of the tools
on my current kernel without overwriting the existing install? I am
hesitant to update the kernel/btrfs as it might break the ReadyNAS
interface / future firmware upgrades.

Perhaps I could grab this:
https://github.com/kdave/btrfs-progs/releases/tag/v4.16.1 and
hopefully build from source and then run the binaries directly?
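
A minimal sketch of that approach, not a tested recipe for the ReadyNAS:
it builds btrfs-progs in a private directory and runs the binary straight
from the build tree, with no "make install". The tarball URL pattern, the
work directory, and the two configure flags (chosen to skip the
documentation and btrfs-convert dependencies) are assumptions, as is the
availability of a compiler and the required development headers on the
box. By default the script only prints the steps it would run.

```shell
#!/bin/sh
# Sketch only: build btrfs-progs privately and run it from the build tree.
# VERSION, WORKDIR, URL and the configure flags are illustrative assumptions.
VERSION="${VERSION:-4.16.1}"
WORKDIR="${WORKDIR:-${HOME:-/tmp}/btrfs-progs-build}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = execute them
URL="https://github.com/kdave/btrfs-progs/archive/v${VERSION}.tar.gz"

# Print the command in dry-run mode, otherwise execute it in this shell.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_plan() {
    run mkdir -p "$WORKDIR" &&
    run cd "$WORKDIR" &&
    run curl -L -o "btrfs-progs-${VERSION}.tar.gz" "$URL" &&
    run tar xzf "btrfs-progs-${VERSION}.tar.gz" &&
    run cd "btrfs-progs-${VERSION}" &&
    run ./autogen.sh &&
    run ./configure --disable-documentation --disable-convert &&
    run make &&
    # Nothing is installed; the new binary is called by its path.
    run ./btrfs --version
}

build_plan
```

With DRY_RUN=0 the steps are executed; the existing v4.12 install is left
untouched because the new tools are only ever invoked by path from the
build directory.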

Kind regards

On 29 April 2018 at 09:33, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
> On 2018-04-29 16:11, Michael Wade wrote:
>> Thanks Qu,
>>
>> Please find attached the log file for the chunk recover command.
>
> Strangely, btrfs chunk recovery found no extra chunks beyond the current
> system chunk range.
>
> That means the chunk tree itself is corrupted.
>
> Please dump the chunk tree with the latest btrfs-progs (which provides the
> new --follow option):
>
> # btrfs inspect dump-tree -b 20800943685632 --follow <device>
>
> If it doesn't work, please provide the following binary dump:
>
> # dd if=<dev> of=/tmp/chunk_root.copy1 bs=1 count=32K skip=266325721088
> # dd if=<dev> of=/tmp/chunk_root.copy2 bs=1 count=32K skip=266359275520
> (We will need to repeat similar dumps several times, depending on what
> the above dump shows.)
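
Since the dump has to be repeated at several offsets, the two dd
invocations above can be wrapped in a small helper. This is a sketch that
assumes GNU coreutils dd (for iflag=skip_bytes,count_bytes and
status=none); the function name dump_range is hypothetical. It copies the
same bytes as the bs=1 form but reads in larger blocks.

```shell
# dump_range SRC OUT OFFSET LENGTH
# Copy LENGTH bytes starting at byte OFFSET of SRC into OUT.
# Assumes GNU dd: skip_bytes/count_bytes make skip= and count= byte values.
dump_range() {
    dd if="$1" of="$2" bs=64K \
       iflag=skip_bytes,count_bytes skip="$3" count="$4" status=none
}
```

For the first copy requested above this would be called as
dump_range <dev> /tmp/chunk_root.copy1 266325721088 32768.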
>
> Thanks,
> Qu
>
>
>>
>> Kind regards
>> Michael
>>
>> On 28 April 2018 at 12:38, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>
>>>
>>> On 2018-04-28 17:37, Michael Wade wrote:
>>>> Hi Qu,
>>>>
>>>> Thanks for your reply. I will investigate upgrading the kernel,
>>>> however I worry that future ReadyNAS firmware upgrades would fail on a
>>>> newer kernel version (I don't have much linux experience so maybe my
>>>> concerns are unfounded!?).
>>>>
>>>> I have attached the output of the dump super command.
>>>>
>>>> I did actually run chunk recover before, without the verbose option,
>>>> it took around 24 hours to finish but did not resolve my issue. Happy
>>>> to start that again if you need its output.
>>>
>>> The system chunk only contains the following chunks:
>>> [0, 4194304]:           Initial temporary chunk, not used at all
>>> [20971520, 29360128]:   System chunk created by mkfs, should be fully
>>>                         used up
>>> [20800943685632, 20800977240064]:
>>>                         The newly created large system chunk.
>>>
>>> The chunk root is still in the 2nd chunk and thus valid, but some of its
>>> leaves are out of the range.
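
To make "out of the range" concrete, the check can be sketched as a lookup
of a logical address against the three ranges listed above. This is only
an illustration: it assumes each pair is a half-open [start, end) byte
range, and the name find_sys_chunk is hypothetical. The address from the
dmesg errors (3208757641216) matches none of the ranges, while
20800943685632 is the start of the third.

```shell
# System chunk ranges from the list above, one "start end" pair per line.
SYS_CHUNKS="0 4194304
20971520 29360128
20800943685632 20800977240064"

# find_sys_chunk ADDR
# Print the start of the range containing ADDR, or nothing if no range
# covers it. Ranges are treated as [start, end), which is an assumption.
find_sys_chunk() {
    addr=$1
    echo "$SYS_CHUNKS" | while read -r start end; do
        if [ "$addr" -ge "$start" ] && [ "$addr" -lt "$end" ]; then
            echo "$start"
        fi
    done
}
```

An empty result for an address means that the listed system chunks alone
cannot map it.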
>>>
>>> If you can't wait 24h for chunk recovery to run, my advice would be to
>>> move the disk to some other computer and use the latest btrfs-progs to
>>> execute the following command:
>>>
>>> # btrfs inspect dump-tree -b 20800943685632 --follow <device>
>>>
>>> If we're lucky enough, we may read out the tree leaf containing the new
>>> system chunk and save a day.
>>>
>>> Thanks,
>>> Qu
>>>
>>>>
>>>> Thanks so much for your help.
>>>>
>>>> Kind regards
>>>> Michael
>>>>
>>>> On 28 April 2018 at 09:45, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>>
>>>>>
>>>>> On 2018-04-28 16:30, Michael Wade wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I was hoping that someone would be able to help me resolve the issues
>>>>>> I am having with my ReadyNAS BTRFS volume. Basically my trouble
>>>>>> started after a power cut, subsequently the volume would not mount.
>>>>>> Here are the details of my setup as it is at the moment:
>>>>>>
>>>>>> uname -a
>>>>>> Linux QAI 4.4.116.alpine.1 #1 SMP Mon Feb 19 21:58:38 PST 2018 armv7l GNU/Linux
>>>>>
>>>>> The kernel is pretty old for btrfs.
>>>>> Upgrading is strongly recommended.
>>>>>
>>>>>>
>>>>>> btrfs --version
>>>>>> btrfs-progs v4.12
>>>>>
>>>>> So are the user-space tools.
>>>>>
>>>>> Although I think it won't be a big problem, as the needed tools should
>>>>> be there.
>>>>>
>>>>>>
>>>>>> btrfs fi show
>>>>>> Label: '11baed92:data'  uuid: 20628cda-d98f-4f85-955c-932a367f8821
>>>>>> Total devices 1 FS bytes used 5.12TiB
>>>>>> devid    1 size 7.27TiB used 6.24TiB path /dev/md127
>>>>>
>>>>> So, it's btrfs on mdraid.
>>>>> That normally makes things harder to debug, so I can only provide
>>>>> advice from the btrfs perspective.
>>>>> For the mdraid part, I can't guarantee anything.
>>>>>
>>>>>>
>>>>>> Here are the relevant dmesg logs for the current state of the device:
>>>>>>
>>>>>> [   19.119391] md: md127 stopped.
>>>>>> [   19.120841] md: bind<sdb3>
>>>>>> [   19.121120] md: bind<sdc3>
>>>>>> [   19.121380] md: bind<sda3>
>>>>>> [   19.125535] md/raid:md127: device sda3 operational as raid disk 0
>>>>>> [   19.125547] md/raid:md127: device sdc3 operational as raid disk 2
>>>>>> [   19.125554] md/raid:md127: device sdb3 operational as raid disk 1
>>>>>> [   19.126712] md/raid:md127: allocated 3240kB
>>>>>> [   19.126778] md/raid:md127: raid level 5 active with 3 out of 3
>>>>>> devices, algorithm 2
>>>>>> [   19.126784] RAID conf printout:
>>>>>> [   19.126789]  --- level:5 rd:3 wd:3
>>>>>> [   19.126794]  disk 0, o:1, dev:sda3
>>>>>> [   19.126799]  disk 1, o:1, dev:sdb3
>>>>>> [   19.126804]  disk 2, o:1, dev:sdc3
>>>>>> [   19.128118] md127: detected capacity change from 0 to 7991637573632
>>>>>> [   19.395112] Adding 523708k swap on /dev/md1.  Priority:-1 extents:1
>>>>>> across:523708k
>>>>>> [   19.434956] BTRFS: device label 11baed92:data devid 1 transid
>>>>>> 151800 /dev/md127
>>>>>> [   19.739276] BTRFS info (device md127): setting nodatasum
>>>>>> [   19.740440] BTRFS critical (device md127): unable to find logical
>>>>>> 3208757641216 len 4096
>>>>>> [   19.740450] BTRFS critical (device md127): unable to find logical
>>>>>> 3208757641216 len 4096
>>>>>> [   19.740498] BTRFS critical (device md127): unable to find logical
>>>>>> 3208757641216 len 4096
>>>>>> [   19.740512] BTRFS critical (device md127): unable to find logical
>>>>>> 3208757641216 len 4096
>>>>>> [   19.740552] BTRFS critical (device md127): unable to find logical
>>>>>> 3208757641216 len 4096
>>>>>> [   19.740560] BTRFS critical (device md127): unable to find logical
>>>>>> 3208757641216 len 4096
>>>>>> [   19.740576] BTRFS error (device md127): failed to read chunk root
>>>>>
>>>>> This shows it pretty clearly: btrfs fails to read the chunk root.
>>>>> And according to the "len 4096" above, it's a pretty old fs, as it's
>>>>> still using 4K nodesize rather than 16K nodesize.
>>>>>
>>>>> According to the above output, your superblock somehow lacks the
>>>>> needed system chunk mapping, which is used to initialize chunk mapping.
>>>>>
>>>>> Please provide the following command output:
>>>>>
>>>>> # btrfs inspect dump-super -fFa /dev/md127
>>>>>
>>>>> Also, please consider running the following command and capturing all
>>>>> its output:
>>>>>
>>>>> # btrfs rescue chunk-recover -v /dev/md127
>>>>>
>>>>> Please note that the above command can take a long time to finish, and
>>>>> if it works without problems, it may solve your issue.
>>>>> But if it doesn't work, the output could help me to manually craft a fix
>>>>> for your super block.
>>>>>
>>>>> Thanks,
>>>>> Qu
>>>>>
>>>>>
>>>>>> [   19.783975] BTRFS error (device md127): open_ctree failed
>>>>>>
>>>>>> In an attempt to recover the volume myself I ran a few BTRFS commands,
>>>>>> mostly using advice from here:
>>>>>> https://lists.opensuse.org/opensuse/2017-02/msg00930.html. However,
>>>>>> that actually seems to have made things worse as I can no longer mount
>>>>>> the file system, not even in readonly mode.
>>>>>>
>>>>>> So starting from the beginning here is a list of things I have done so
>>>>>> far (hopefully I remembered the order in which I ran them!)
>>>>>>
>>>>>> 1. Noticed that my backups to the NAS were not running (didn't get
>>>>>> notified that the volume had basically "died")
>>>>>> 2. ReadyNAS UI indicated that the volume was inactive.
>>>>>> 3. SSHed onto the box and found that the first drive was not marked as
>>>>>> operational (log showed I/O errors / UNKOWN (0x2003))  so I replaced
>>>>>> the disk and let the array resync.
>>>>>> 4. After the resync the volume was still inaccessible, so I looked at the
>>>>>> logs once more and saw something like the following which seemed to
>>>>>> indicate that the replay log had been corrupted when the power went
>>>>>> out:
>>>>>>
>>>>>> BTRFS critical (device md127): corrupt leaf, non-root leaf's nritems
>>>>>> is 0: block=232292352, root=7, slot=0
>>>>>> BTRFS critical (device md127): corrupt leaf, non-root leaf's nritems
>>>>>> is 0: block=232292352, root=7, slot=0
>>>>>> BTRFS: error (device md127) in btrfs_replay_log:2524: errno=-5 IO
>>>>>> failure (Failed to recover log tree)
>>>>>> BTRFS error (device md127): pending csums is 155648
>>>>>> BTRFS error (device md127): cleaner transaction attach returned -30
>>>>>> BTRFS critical (device md127): corrupt leaf, non-root leaf's nritems
>>>>>> is 0: block=232292352, root=7, slot=0
>>>>>>
>>>>>> 5. Then:
>>>>>>
>>>>>> btrfs rescue zero-log
>>>>>>
>>>>>> 6. Was then able to mount the volume in readonly mode.
>>>>>>
>>>>>> btrfs scrub start
>>>>>>
>>>>>> Which fixed some errors but not all:
>>>>>>
>>>>>> scrub status for 20628cda-d98f-4f85-955c-932a367f8821
>>>>>>
>>>>>> scrub started at Tue Apr 24 17:27:44 2018, running for 04:00:34
>>>>>> total bytes scrubbed: 224.26GiB with 6 errors
>>>>>> error details: csum=6
>>>>>> corrected errors: 0, uncorrectable errors: 6, unverified errors: 0
>>>>>>
>>>>>> scrub status for 20628cda-d98f-4f85-955c-932a367f8821
>>>>>> scrub started at Tue Apr 24 17:27:44 2018, running for 04:34:43
>>>>>> total bytes scrubbed: 224.26GiB with 6 errors
>>>>>> error details: csum=6
>>>>>> corrected errors: 0, uncorrectable errors: 6, unverified errors: 0
>>>>>>
>>>>>> 7. Seeing this hanging I rebooted the NAS
>>>>>> 8. Think this is when the volume would not mount at all.
>>>>>> 9. Seeing log entries like these:
>>>>>>
>>>>>> BTRFS warning (device md127): checksum error at logical 20800943685632
>>>>>> on dev /dev/md127, sector 520167424: metadata node (level 1) in tree 3
>>>>>>
>>>>>> I ran
>>>>>>
>>>>>> btrfs check --fix-crc
>>>>>>
>>>>>> And that brings us to where I am now: some seemingly corrupted BTRFS
>>>>>> metadata and unable to mount the drive even with the recovery option.
>>>>>>
>>>>>> Any help you can give is much appreciated!
>>>>>>
>>>>>> Kind regards
>>>>>> Michael
>>>>>>
>>>>>
>>>
>
