From: Su Yue <l@damenly.su>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Robert Wyrick <rob@wyrick.org>,
	Anand Jain <anand.jain@oracle.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: Next steps in recovery?
Date: Wed, 08 Sep 2021 09:59:22 +0800
Message-ID: <lf47byws.fsf@damenly.su>
In-Reply-To: <ddb57c10-3581-0089-d84a-4935644bef5a@gmx.com>


On Wed 08 Sep 2021 at 07:15, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:

> On 2021/9/8 1:02 AM, Robert Wyrick wrote:
>> Ran a repair:
>>
>> $ sudo ./btrfs check --repair -p /dev/sda  # I did NOT run make install, just ran it from the compiled directory
>> enabling repair mode
>> WARNING:
>>
>> Do not use --repair unless you are advised to do so by a developer
>> or an experienced user, and then only after having accepted that no
>> fsck can successfully repair all types of filesystem corruption. Eg.
>> some software or hardware bugs can fatally damage a volume.
>> The operation will start in 10 seconds.
>> Use Ctrl-C to stop it.
>> 10 9 8 7 6 5 4 3 2 1
>> Starting repair.
>> Opening filesystem to check...
>> Checking filesystem on /dev/sda
>> UUID: 75f1f45c-552e-4ae2-a56f-46e44b6647cf
>> [1/7] checking root items                      (0:00:59 elapsed, 2649102 items checked)
>> Fixed 0 roots.
>> Reset extent item (38179182174208) generation to 4057084elapsed, 1116143 items checked)
>> No device size related problem found           (0:02:22 elapsed, 1116143 items checked)
>> [2/7] checking extents                         (0:02:23 elapsed, 1116143 items checked)
>> cache and super generation don't match, space cache will be invalidated
>> [3/7] checking free space cache                (0:00:00 elapsed)
>> Deleting bad dir index [8348950,96,3] root 259 (0:00:25 elapsed, 106695 items checked)
>> repairing missing dir index item for inode 834922400:26 elapsed, 108893 items checked)
>> [4/7] checking fs roots                        (0:01:04 elapsed, 217787 items checked)
>> [5/7] checking csums (without verifying data)  (0:00:04 elapsed, 12350321 items checked)
>> [6/7] checking root refs                       (0:00:00 elapsed, 4 items checked)
>> [7/7] checking quota groups skipped (not enabled on this FS)
>> found 15729059057664 bytes used, no error found
>> total csum bytes: 15313288548
>> total tree bytes: 18286739456
>> total fs tree bytes: 1791819776
>> total extent tree bytes: 229130240
>> btree space waste bytes: 1018844959
>> file data blocks allocated: 51587230502912
>>   referenced 15627926712320
>>
>> I can now mount the filesystem successfully!  Thank you for your help.
>>
>> I do have some additional questions if you don't mind...
>> I am already using RAID 1 to handle single disk outages.
>
> One thing to note is that RAID is not perfect, and not even close to a
> proper backup.
>
> RAID is really only suitable for handling disk failures, nothing more
> than that.
>
> On the spectrum of backups, RAID is really just better than nothing.
>
> In this particular case, all the corruption comes from bitflips, so all
> copies are corrupted, and no profile can save the day.
>
>>  I assume
>> things could have gone much worse and I could have lost the whole
>> filesystem.
>
> If you're always using newer kernels, the kernel can detect the extent
> generation problem before writing the corrupted data back to disk, and
> thus save the day.
>
>>  Aside from backups (I know, I know), is there anything
>> else I can do to prevent such issues or make them easier to recover
>> from?
>
> Newer kernels (v5.11 and newer) can prevent it.
> Although when such a rejection happens, it will not feel that
> comfortable, as it will mostly result in the fs going RO.
> But that is still way better than writing bad data onto disks.
>
>>  Could this problem have been avoided/detected earlier?
>
> Yes, newer kernel.
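One quick way to see which side of that v5.11 line a machine is on is to compare `uname -r` against it. Below is a minimal POSIX shell sketch. `version_at_least` is a hypothetical helper, not part of btrfs-progs. It compares only the numeric major.minor, so distro backports are not taken into account.

```shell
#!/bin/sh
# Hypothetical helper: succeed if kernel release string $1 is at least
# version $2.$3. Only the numeric major.minor part is compared, so distro
# backports of the relevant checks are not taken into account.
version_at_least() {
    cur_major=$(echo "$1" | cut -d. -f1)
    # Strip anything after the minor number, e.g. "15-rc1" -> "15"
    cur_minor=$(echo "$1" | cut -d. -f2 | sed 's/[^0-9].*//')
    if [ "$cur_major" -gt "$2" ]; then
        return 0
    fi
    [ "$cur_major" -eq "$2" ] && [ "$cur_minor" -ge "$3" ]
}

release=$(uname -r)
if version_at_least "$release" 5 11; then
    echo "$release: v5.11 or newer"
else
    echo "$release: older than v5.11"
fi
```

On the reporter's machine (`5.11.0-27-generic`, per the `uname -a` output later in the thread) this would report v5.11 or newer.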
>
>>  This
>> wasn't a disk failure and according to memtest86+, it wasn't due to
>> bad memory either...
>
> I still don't believe that. Maybe you can try running memtester (which
> runs in user space, and since the kernel does the page mapping, it may
> put a different workload on the memory controller than memtest86+).
>
And testmem5 with the anta777 config may help. It's widely used to
test memory stability after overclocking, though it runs on evil
Windows. It won't take too much time: a two-cycle test of 64G of
memory takes about 4~6 hours.
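For the user-space memtester route suggested above, you first have to pick how much memory to test. Here is a minimal sketch. It assumes a Linux `/proc/meminfo` with a `MemAvailable:` line (in kB) and takes about 90% of it. The helper name `memtester_size_mb`, the 90% headroom and the two iterations are illustrative choices, not recommendations from this thread.

```shell
#!/bin/sh
# Hypothetical helper: print roughly 90% of MemAvailable (reported in kB)
# from a meminfo-style file, converted to MiB, as a memtester size.
# Integer arithmetic (x * 9 / 10 / 1024) avoids float rounding surprises.
memtester_size_mb() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 * 9 / 10 / 1024 }' "$1"
}

size=$(memtester_size_mb /proc/meminfo)
# memtester takes <size>[B|K|M|G] and an optional iteration count; it is
# usually run as root so it can lock (mlock) the memory it tests.
echo "sudo memtester ${size}M 2"
```

Leaving some headroom keeps the test from pushing the system into swap, which would test the disk rather than the RAM.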

--
Su

> Since the extent generation corruption is a super obvious bitflip.
>
>>  I don't run scrubs very often.  Should I?
>
> For newer kernels, the corruption can be rejected in the first place,
> so scrub is only going to detect problems already in the fs.
>
> For older kernels, scrub won't detect the problem anyway.
>
> So I guess you don't need very frequent scrubs, but they are still
> recommended. Maybe monthly?
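A monthly scrub could be scripted roughly as follows, for example from cron or a systemd timer. This is only a sketch. The mount point `/mnt/data` and the `DRY_RUN` switch are hypothetical, and the script only prints the commands unless `DRY_RUN=0` is set. `btrfs scrub start -B` keeps the scrub in the foreground and prints its statistics when it finishes. `btrfs device stats` shows the per-device error counters.

```shell
#!/bin/sh
# Sketch of a monthly scrub job. MNT is a hypothetical mount point.
# DRY_RUN defaults to 1 (only print the commands); set DRY_RUN=0 to run them.
MNT=${MNT:-/mnt/data}

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

monthly_scrub() {
    # -B: stay in the foreground and print statistics when the scrub ends
    run btrfs scrub start -B "$1"
    # Per-device error counters, to spot devices that keep producing errors
    run btrfs device stats "$1"
}

monthly_scrub "$MNT"
```

A crontab entry such as `0 3 1 * * DRY_RUN=0 MNT=/mnt/data /usr/local/sbin/btrfs-monthly-scrub` would run it at 03:00 on the first of each month. The script path there is also hypothetical.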
>
>>  I
>> guess the more general question is:  What are the best practices for
>> maintaining a healthy btrfs file system?
>
> Well, healthy hardware, and a kernel version balanced between cutting
> edge and stable.
> Personally I lean more towards cutting edge, though.
>
> Thanks,
> Qu
>
>>
>> Thanks again!
>>
>> On Mon, Sep 6, 2021 at 10:53 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>
>>>
>>>
>>> On 2021/9/7 12:36 PM, Robert Wyrick wrote:
>>>> What exactly would I be disabling?  I don't know what zoned does.
>>>
>>> The zoned device support.
>>>
>>> If you don't have any host-managed zoned device, there is no reason
>>> to enable it.
>>>
>>> https://zonedstorage.io/introduction/
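To check whether a machine has any zoned devices at all, look at the zoned model the kernel reports for each block device in `/sys/block/<dev>/queue/zoned` (`none`, `host-aware` or `host-managed`). Below is a small sketch that assumes the standard sysfs layout. `list_zoned_devices` and its optional sysfs-root argument are illustrative, not part of any existing tool. It prints every device whose model is not `none`, so no output means no zoned devices.

```shell
#!/bin/sh
# Hypothetical helper: print "<dev>: <model>" for every block device whose
# queue/zoned attribute is not "none". $1 overrides the sysfs root (default /sys).
list_zoned_devices() {
    root=${1:-/sys}
    for f in "$root"/block/*/queue/zoned; do
        [ -r "$f" ] || continue          # glob did not match, or unreadable
        model=$(cat "$f")
        if [ "$model" != none ]; then
            dev=${f#"$root"/block/}      # e.g. "sdb/queue/zoned"
            echo "${dev%%/*}: $model"
        fi
    done
}

list_zoned_devices
```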
>>>
>>> Thanks,
>>> Qu
>>>
>>>>
>>>> On Mon, Sep 6, 2021, 9:07 PM Anand Jain <anand.jain@oracle.com> wrote:
>>>>>
>>>>> On 07/09/2021 10:36, Robert Wyrick wrote:
>>>>>> Trying to build the latest btrfs-progs.  I'm seeing errors in the configure script.
>>>>>>
>>>>>> $ cat /etc/os-release
>>>>>> NAME="Linux Mint"
>>>>>> VERSION="20.2 (Uma)"
>>>>>> ID=linuxmint
>>>>>> ID_LIKE=ubuntu
>>>>>> PRETTY_NAME="Linux Mint 20.2"
>>>>>> VERSION_ID="20.2"
>>>>>> HOME_URL="https://www.linuxmint.com/"
>>>>>> SUPPORT_URL="https://forums.linuxmint.com/"
>>>>>> BUG_REPORT_URL="http://linuxmint-troubleshooting-guide.readthedocs.io/en/latest/"
>>>>>> PRIVACY_POLICY_URL="https://www.linuxmint.com/"
>>>>>> VERSION_CODENAME=uma
>>>>>> UBUNTU_CODENAME=focal
>>>>>>
>>>>>> $ uname -a
>>>>>> Linux bigbox 5.11.0-27-generic #29~20.04.1-Ubuntu SMP Wed Aug 11 15:58:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
>>>>>>
>>>>>> $ ./configure
>>>>>> checking for gcc... gcc
>>>>>> checking whether the C compiler works... yes
>>>>>> checking for C compiler default output file name... a.out
>>>>>> checking for suffix of executables...
>>>>>> checking whether we are cross compiling... no
>>>>>> checking for suffix of object files... o
>>>>>> checking whether we are using the GNU C compiler... yes
>>>>>> checking whether gcc accepts -g... yes
>>>>>> checking for gcc option to accept ISO C89... none needed
>>>>>> checking how to run the C preprocessor... gcc -E
>>>>>> checking for grep that handles long lines and -e... /bin/grep
>>>>>> checking for egrep... /bin/grep -E
>>>>>> checking for ANSI C header files... yes
>>>>>> checking for sys/types.h... yes
>>>>>> checking for sys/stat.h... yes
>>>>>> checking for stdlib.h... yes
>>>>>> checking for string.h... yes
>>>>>> checking for memory.h... yes
>>>>>> checking for strings.h... yes
>>>>>> checking for inttypes.h... yes
>>>>>> checking for stdint.h... yes
>>>>>> checking for unistd.h... yes
>>>>>> checking minix/config.h usability... no
>>>>>> checking minix/config.h presence... no
>>>>>> checking for minix/config.h... no
>>>>>> checking whether it is safe to define __EXTENSIONS__... yes
>>>>>> checking for gcc... (cached) gcc
>>>>>> checking whether we are using the GNU C compiler... (cached) yes
>>>>>> checking whether gcc accepts -g... (cached) yes
>>>>>> checking for gcc option to accept ISO C89... (cached) none needed
>>>>>> checking whether C compiler accepts -std=gnu90... yes
>>>>>> checking build system type... x86_64-pc-linux-gnu
>>>>>> checking host system type... x86_64-pc-linux-gnu
>>>>>> checking for an ANSI C-conforming const... yes
>>>>>> checking for working volatile... yes
>>>>>> checking whether byte ordering is bigendian... no
>>>>>> checking for special C compiler options needed for large files... no
>>>>>> checking for _FILE_OFFSET_BITS value needed for large files... no
>>>>>> checking for a BSD-compatible install... /usr/bin/install -c
>>>>>> checking whether ln -s works... yes
>>>>>> checking for ar... ar
>>>>>> checking for rm... /bin/rm
>>>>>> checking for rmdir... /bin/rmdir
>>>>>> checking for openat... yes
>>>>>> checking for reallocarray... yes
>>>>>> checking for clock_gettime... yes
>>>>>> checking linux/perf_event.h usability... yes
>>>>>> checking linux/perf_event.h presence... yes
>>>>>> checking for linux/perf_event.h... yes
>>>>>> checking linux/hw_breakpoint.h usability... yes
>>>>>> checking linux/hw_breakpoint.h presence... yes
>>>>>> checking for linux/hw_breakpoint.h... yes
>>>>>> checking for pkg-config... /usr/bin/pkg-config
>>>>>> checking pkg-config is at least version 0.9.0... yes
>>>>>> checking execinfo.h usability... yes
>>>>>> checking execinfo.h presence... yes
>>>>>> checking for execinfo.h... yes
>>>>>> checking for backtrace... yes
>>>>>> checking for backtrace_symbols_fd... yes
>>>>>> checking for xmlto... /usr/bin/xmlto
>>>>>> checking for mv... /bin/mv
>>>>>> checking for a sed that does not truncate output... /bin/sed
>>>>>> checking for asciidoc... /usr/bin/asciidoc
>>>>>> checking for asciidoctor... no
>>>>>> checking for EXT2FS... yes
>>>>>> checking for COM_ERR... yes
>>>>>> checking for REISERFS... yes
>>>>>> checking for FIEMAP_EXTENT_SHARED defined in linux/fiemap.h... yes
>>>>>> checking for EXT4_EPOCH_MASK defined in ext2fs/ext2_fs.h... yes
>>>>>> checking linux/blkzoned.h usability... yes
>>>>>> checking linux/blkzoned.h presence... yes
>>>>>> checking for linux/blkzoned.h... yes
>>>>>> checking for struct blk_zone.capacity... no
>>>>>> checking for BLKGETZONESZ defined in linux/blkzoned.h... yes
>>>>>
>>>>>> configure: error: linux/blkzoned.h does not provide blk_zone.capacity
>>>>>
>>>>>
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> Info on the file in question (linux/blkzoned.h):
>>>>>>
>>>>>> $ dpkg -S /usr/include/linux/blkzoned.h
>>>>>> linux-libc-dev:amd64: /usr/include/linux/blkzoned.h
>>>>>>
>>>>>> $ dpkg -l linux-libc-dev
>>>>>> Desired=Unknown/Install/Remove/Purge/Hold
>>>>>> | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
>>>>>> |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
>>>>>> ||/ Name                 Version      Architecture Description
>>>>>> +++-====================-============-============-====================================
>>>>>> ii  linux-libc-dev:amd64 5.4.0-81.91  amd64        Linux Kernel Headers for development
>>>>>>
>>>>>>
>>>>>> So it appears that linux-libc-dev is way outdated compared to my
>>>>>> kernel.  I don't know how to update it, though... there doesn't
>>>>>> appear to be a newer version available.
>>>>>
>>>>> You could disable zoned support:
>>>>>
>>>>>      ./configure --disable-zoned
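The configure failure above comes from its `struct blk_zone.capacity` check. The linux-libc-dev 5.4 headers ship `linux/blkzoned.h` without that member, which appeared in later kernel headers (around 5.9). The sketch below roughly mimics that probe by compiling a tiny C file, then prints the `./configure` invocation to use. It is an illustration, not the btrfs-progs configure logic itself. `zoned_configure_flag` is a hypothetical name, and `$CC` defaults to `cc`.

```shell
#!/bin/sh
# Hypothetical helper: print "--disable-zoned" if the installed
# linux/blkzoned.h lacks struct blk_zone's capacity member (or nothing
# compiles at all), otherwise print nothing.
zoned_configure_flag() {
    dir=$(mktemp -d) || return 1
    cat > "$dir/probe.c" <<'EOF'
#include <linux/blkzoned.h>
int main(void)
{
    struct blk_zone z;
    z.capacity = 0;              /* fails to compile on 5.4-era headers */
    return (int)z.capacity;
}
EOF
    if ${CC:-cc} -c -o "$dir/probe.o" "$dir/probe.c" 2>/dev/null; then
        flag=""
    else
        flag="--disable-zoned"
    fi
    rm -rf "$dir"
    echo "$flag"
}

# Left unquoted on purpose so an empty result adds no argument.
echo "./configure $(zoned_configure_flag)"
```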
