From: Michael Wade
Date: Sun, 29 Apr 2018 09:59:13 +0100
Subject: Re: BTRFS RAID filesystem unmountable
To: Qu Wenruo
Cc: linux-btrfs@vger.kernel.org

OK, would it be possible for me to install the new version of the tools
on my current kernel without overwriting the existing install? I'm
hesitant to update the kernel/btrfs stack, as it might break the
ReadyNAS interface or future firmware upgrades.

Perhaps I could grab this:
https://github.com/kdave/btrfs-progs/releases/tag/v4.16.1
build it from source, and then run the binaries directly?
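Something like the following is what I have in mind (untested on the
NAS so far; I'm assuming the usual autotools build dependencies are
available there, and the configure flag is from my reading of the
project's INSTALL notes, so this may need adjusting):

# git clone https://github.com/kdave/btrfs-progs.git
# cd btrfs-progs
# git checkout v4.16.1
# ./autogen.sh
# ./configure --disable-documentation
# make

I would then run the freshly built binary in place, e.g.

# ./btrfs inspect dump-tree -b 20800943685632 --follow /dev/md127

without ever running "make install", so the stock ReadyNAS binaries
stay untouched.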
Kind regards

On 29 April 2018 at 09:33, Qu Wenruo wrote:
>
>
> On 2018-04-29 16:11, Michael Wade wrote:
>> Thanks Qu,
>>
>> Please find attached the log file for the chunk recover command.
>
> Strangely, btrfs chunk recover found no extra chunks beyond the
> current system chunk range.
>
> Which means the chunk tree itself is corrupted.
>
> Please dump the chunk tree with the latest btrfs-progs (which
> provides the new --follow option):
>
> # btrfs inspect dump-tree -b 20800943685632
>
> If that doesn't work, please provide the following binary dumps:
>
> # dd if=<device> of=/tmp/chunk_root.copy1 bs=1 count=32K skip=266325721088
> # dd if=<device> of=/tmp/chunk_root.copy2 bs=1 count=32K skip=266359275520
> (And we will need to repeat similar dumps several times, according to
> the dump above.)
>
> Thanks,
> Qu
>
>
>>
>> Kind regards
>> Michael
>>
>> On 28 April 2018 at 12:38, Qu Wenruo wrote:
>>>
>>>
>>> On 2018-04-28 17:37, Michael Wade wrote:
>>>> Hi Qu,
>>>>
>>>> Thanks for your reply. I will investigate upgrading the kernel;
>>>> however, I worry that future ReadyNAS firmware upgrades would fail
>>>> on a newer kernel version (I don't have much Linux experience, so
>>>> maybe my concerns are unfounded!?).
>>>>
>>>> I have attached the output of the dump super command.
>>>>
>>>> I did actually run chunk recover before, without the verbose
>>>> option; it took around 24 hours to finish but did not resolve my
>>>> issue. Happy to start that again if you need its output.
>>>
>>> The system chunk only contains the following chunks:
>>> [0, 4194304]:         initial temporary chunk, not used at all
>>> [20971520, 29360128]: system chunk created by mkfs, should be fully
>>>                       used up
>>> [20800943685632, 20800977240064]:
>>>                       the newly created large system chunk
>>>
>>> The chunk root is still inside the 2nd chunk and thus valid, but
>>> some of its leaves are out of that range.
>>>
>>> If you can't wait 24h for chunk recovery to run, my advice would be
>>> to move the disks to some other computer and use the latest
>>> btrfs-progs to execute the following command:
>>>
>>> # btrfs inspect dump-tree -b 20800943685632 --follow
>>>
>>> If we're lucky enough, we may read out the tree leaf containing the
>>> new system chunk and save the day.
>>>
>>> Thanks,
>>> Qu
>>>
>>>>
>>>> Thanks so much for your help.
>>>>
>>>> Kind regards
>>>> Michael
>>>>
>>>> On 28 April 2018 at 09:45, Qu Wenruo wrote:
>>>>>
>>>>>
>>>>> On 2018-04-28 16:30, Michael Wade wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I was hoping that someone would be able to help me resolve the
>>>>>> issues I am having with my ReadyNAS BTRFS volume. Basically my
>>>>>> trouble started after a power cut; subsequently the volume would
>>>>>> not mount. Here are the details of my setup as it stands at the
>>>>>> moment:
>>>>>>
>>>>>> uname -a
>>>>>> Linux QAI 4.4.116.alpine.1 #1 SMP Mon Feb 19 21:58:38 PST 2018 armv7l GNU/Linux
>>>>>
>>>>> The kernel is pretty old for btrfs.
>>>>> Upgrading is strongly recommended.
>>>>>
>>>>>>
>>>>>> btrfs --version
>>>>>> btrfs-progs v4.12
>>>>>
>>>>> So are the user-space tools.
>>>>>
>>>>> Although I think that won't be a big problem, as the needed tools
>>>>> should be there.
>>>>>
>>>>>>
>>>>>> btrfs fi show
>>>>>> Label: '11baed92:data'  uuid: 20628cda-d98f-4f85-955c-932a367f8821
>>>>>>         Total devices 1 FS bytes used 5.12TiB
>>>>>>         devid 1 size 7.27TiB used 6.24TiB path /dev/md127
>>>>>
>>>>> So it's btrfs on mdraid.
>>>>> That normally makes things harder to debug, so I can only give
>>>>> advice from the btrfs side.
>>>>> For the mdraid part, I can't guarantee anything.
>>>>>
>>>>>>
>>>>>> Here are the relevant dmesg logs for the current state of the
>>>>>> device:
>>>>>>
>>>>>> [ 19.119391] md: md127 stopped.
>>>>>> [ 19.120841] md: bind
>>>>>> [ 19.121120] md: bind
>>>>>> [ 19.121380] md: bind
>>>>>> [ 19.125535] md/raid:md127: device sda3 operational as raid disk 0
>>>>>> [ 19.125547] md/raid:md127: device sdc3 operational as raid disk 2
>>>>>> [ 19.125554] md/raid:md127: device sdb3 operational as raid disk 1
>>>>>> [ 19.126712] md/raid:md127: allocated 3240kB
>>>>>> [ 19.126778] md/raid:md127: raid level 5 active with 3 out of 3
>>>>>> devices, algorithm 2
>>>>>> [ 19.126784] RAID conf printout:
>>>>>> [ 19.126789]  --- level:5 rd:3 wd:3
>>>>>> [ 19.126794]  disk 0, o:1, dev:sda3
>>>>>> [ 19.126799]  disk 1, o:1, dev:sdb3
>>>>>> [ 19.126804]  disk 2, o:1, dev:sdc3
>>>>>> [ 19.128118] md127: detected capacity change from 0 to 7991637573632
>>>>>> [ 19.395112] Adding 523708k swap on /dev/md1. Priority:-1
>>>>>> extents:1 across:523708k
>>>>>> [ 19.434956] BTRFS: device label 11baed92:data devid 1 transid
>>>>>> 151800 /dev/md127
>>>>>> [ 19.739276] BTRFS info (device md127): setting nodatasum
>>>>>> [ 19.740440] BTRFS critical (device md127): unable to find logical
>>>>>> 3208757641216 len 4096
>>>>>> [ 19.740450] BTRFS critical (device md127): unable to find logical
>>>>>> 3208757641216 len 4096
>>>>>> [ 19.740498] BTRFS critical (device md127): unable to find logical
>>>>>> 3208757641216 len 4096
>>>>>> [ 19.740512] BTRFS critical (device md127): unable to find logical
>>>>>> 3208757641216 len 4096
>>>>>> [ 19.740552] BTRFS critical (device md127): unable to find logical
>>>>>> 3208757641216 len 4096
>>>>>> [ 19.740560] BTRFS critical (device md127): unable to find logical
>>>>>> 3208757641216 len 4096
>>>>>> [ 19.740576] BTRFS error (device md127): failed to read chunk root
>>>>>
>>>>> This shows it pretty clearly: btrfs fails to read the chunk root.
>>>>> And judging from the "len 4096" above, it's a pretty old fs, as
>>>>> it's still using a 4K nodesize rather than the 16K one.
>>>>>
>>>>> According to the above output, your superblock somehow lacks the
>>>>> needed system chunk mapping, which is used to bootstrap the chunk
>>>>> mapping.
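(Side note, mostly to check my own understanding: is that mapping the
sys_chunk_array section that shows up in the super dump, i.e. something
like the following?

# btrfs inspect dump-super -f /dev/md127 | grep -A 8 sys_chunk_array

If so, I can paste that section on its own as well.)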
>>>>>
>>>>> Please provide the following command output:
>>>>>
>>>>> # btrfs inspect dump-super -fFa /dev/md127
>>>>>
>>>>> Also, please consider running the following command and dumping
>>>>> all of its output:
>>>>>
>>>>> # btrfs rescue chunk-recover -v /dev/md127
>>>>>
>>>>> Please note that the above command can take a long time to finish;
>>>>> if it works without problems, it may solve your issue outright.
>>>>> But if it doesn't, its output could help me manually craft a fix
>>>>> for your superblock.
>>>>>
>>>>> Thanks,
>>>>> Qu
>>>>>
>>>>>
>>>>>> [ 19.783975] BTRFS error (device md127): open_ctree failed
>>>>>>
>>>>>> In an attempt to recover the volume myself I ran a few btrfs
>>>>>> commands, mostly using advice from here:
>>>>>> https://lists.opensuse.org/opensuse/2017-02/msg00930.html. However
>>>>>> that actually seems to have made things worse, as I can no longer
>>>>>> mount the file system, not even in read-only mode.
>>>>>>
>>>>>> So starting from the beginning, here is a list of the things I
>>>>>> have done so far (hopefully I remembered the order in which I ran
>>>>>> them!):
>>>>>>
>>>>>> 1. Noticed that my backups to the NAS were not running (I didn't
>>>>>> get notified that the volume had basically "died").
>>>>>> 2. The ReadyNAS UI indicated that the volume was inactive.
>>>>>> 3. SSHed onto the box and found that the first drive was not
>>>>>> marked as operational (the log showed I/O errors / UNKNOWN
>>>>>> (0x2003)), so I replaced the disk and let the array resync.
>>>>>> 4. After the resync the volume was still inaccessible, so I looked
>>>>>> at the logs once more and saw something like the following, which
>>>>>> seemed to indicate that the replay log had been corrupted when the
>>>>>> power went out:
>>>>>>
>>>>>> BTRFS critical (device md127): corrupt leaf, non-root leaf's
>>>>>> nritems is 0: block=232292352, root=7, slot=0
>>>>>> BTRFS critical (device md127): corrupt leaf, non-root leaf's
>>>>>> nritems is 0: block=232292352, root=7, slot=0
>>>>>> BTRFS: error (device md127) in btrfs_replay_log:2524: errno=-5 IO
>>>>>> failure (Failed to recover log tree)
>>>>>> BTRFS error (device md127): pending csums is 155648
>>>>>> BTRFS error (device md127): cleaner transaction attach returned -30
>>>>>> BTRFS critical (device md127): corrupt leaf, non-root leaf's
>>>>>> nritems is 0: block=232292352, root=7, slot=0
>>>>>>
>>>>>> 5. Then ran:
>>>>>>
>>>>>> btrfs rescue zero-log
>>>>>>
>>>>>> 6. Was then able to mount the volume in read-only mode, and ran:
>>>>>>
>>>>>> btrfs scrub start
>>>>>>
>>>>>> which fixed some errors but not all:
>>>>>>
>>>>>> scrub status for 20628cda-d98f-4f85-955c-932a367f8821
>>>>>>         scrub started at Tue Apr 24 17:27:44 2018, running for 04:00:34
>>>>>>         total bytes scrubbed: 224.26GiB with 6 errors
>>>>>>         error details: csum=6
>>>>>>         corrected errors: 0, uncorrectable errors: 6, unverified errors: 0
>>>>>>
>>>>>> scrub status for 20628cda-d98f-4f85-955c-932a367f8821
>>>>>>         scrub started at Tue Apr 24 17:27:44 2018, running for 04:34:43
>>>>>>         total bytes scrubbed: 224.26GiB with 6 errors
>>>>>>         error details: csum=6
>>>>>>         corrected errors: 0, uncorrectable errors: 6, unverified errors: 0
>>>>>>
>>>>>> 7. Seeing this hanging, I rebooted the NAS.
>>>>>> 8. I think this is when the volume stopped mounting at all.
>>>>>> 9. Seeing log entries like these:
>>>>>>
>>>>>> BTRFS warning (device md127): checksum error at logical
>>>>>> 20800943685632 on dev /dev/md127, sector 520167424: metadata node
>>>>>> (level 1) in tree 3
>>>>>>
>>>>>> I ran:
>>>>>>
>>>>>> btrfs check --fix-crc
>>>>>>
>>>>>> And that brings us to where I am now: some seemingly corrupted
>>>>>> btrfs metadata, and a drive that won't mount even with the
>>>>>> recovery option.
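(For completeness, the mount attempts I'm referring to above were along
these lines, with /data being the usual ReadyNAS mount point, so adjust
as needed:

# mount -o ro /dev/md127 /data
# mount -o ro,recovery /dev/md127 /data

"recovery" being the old spelling of what newer kernels call
"usebackuproot"; both fail with the same "failed to read chunk root" /
open_ctree errors shown in dmesg above.)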
>>>>>>
>>>>>> Any help you can give is much appreciated!
>>>>>>
>>>>>> Kind regards
>>>>>> Michael
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>
>>>
>