linux-btrfs.vger.kernel.org archive mirror
* Problems with BTRFS formatted disk
@ 2022-06-18 18:55 David C. Partridge
  2022-06-18 23:00 ` Qu Wenruo
  0 siblings, 1 reply; 24+ messages in thread
From: David C. Partridge @ 2022-06-18 18:55 UTC (permalink / raw)
  To: linux-btrfs

It all started with a power outage.

When I brought the system back up I got:

Jun 18 15:40:27 charon kernel: BTRFS error (device sdb1): parent transid
verify failed on 12554992156672 wanted 130582 found 127355
Jun 18 15:40:27 charon kernel: BTRFS error (device sdb1): parent transid
verify failed on 12554992156672 wanted 130582 found 127355
Jun 18 15:40:27 charon kernel: BTRFS error (device sdb1): failed to read
block groups: -5
Jun 18 15:40:27 charon mount[629]: mount: /shared: wrong fs type, bad
option, bad superblock on /dev/sdb1, missing codepage or helper program, or
othe>
Jun 18 15:40:27 charon systemd[1]: shared.mount: Mount process exited,
code=exited, status=32/n/a
Jun 18 15:40:27 charon systemd[1]: shared.mount: Failed with result
'exit-code'.
Jun 18 15:40:27 charon systemd[1]: Failed to mount /shared.
Jun 18 15:40:27 charon kernel: BTRFS error (device sdb1): open_ctree failed

I tried:
root@charon:/home/amonra# btrfs check /dev/sdb1
Opening filesystem to check...
parent transid verify failed on 12554992156672 wanted 130582 found 127355
parent transid verify failed on 12554992156672 wanted 130582 found 127355
parent transid verify failed on 12554992156672 wanted 130582 found 127355
Ignoring transid failure
leaf parent key incorrect 12554992156672
ERROR: failed to read block groups: Operation not permitted
ERROR: cannot open file system
root@charon:/home/amonra# btrfs check -s 1 /dev/sdb1
using SB copy 1, bytenr 67108864
Opening filesystem to check...
parent transid verify failed on 12554992156672 wanted 130582 found 127355
parent transid verify failed on 12554992156672 wanted 130582 found 127355
parent transid verify failed on 12554992156672 wanted 130582 found 127355
Ignoring transid failure
leaf parent key incorrect 12554992156672
ERROR: failed to read block groups: Operation not permitted
ERROR: cannot open file system
root@charon:/home/amonra# btrfs check -s 2 /dev/sdb1
using SB copy 2, bytenr 274877906944
Opening filesystem to check...
parent transid verify failed on 12554992156672 wanted 130582 found 127355
parent transid verify failed on 12554992156672 wanted 130582 found 127355
parent transid verify failed on 12554992156672 wanted 130582 found 127355
Ignoring transid failure
leaf parent key incorrect 12554992156672
ERROR: failed to read block groups: Operation not permitted
ERROR: cannot open file system
root@charon:/home/amonra#

but that didn't achieve much.
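
For readers following along, the size of the mismatch in these messages can be pulled out mechanically. The following is a minimal shell sketch, not part of the original report: the function name transid_gap is made up for illustration, and it assumes only the "verify failed on <bytenr> wanted <N> found <M>" wording shown above. It matches from "verify failed on" so that it also copes with the line-wrapped kernel messages.

```shell
# Hypothetical helper: summarise "parent transid verify failed" messages.
# Reads log text on stdin and prints one line per distinct failure:
#   <bytenr> <wanted> <found> <wanted - found>
transid_gap() {
    sed -n 's/.*verify failed on \([0-9][0-9]*\) wanted \([0-9][0-9]*\) found \([0-9][0-9]*\).*/\1 \2 \3/p' |
        sort -u |
        while read -r bytenr wanted found; do
            echo "$bytenr $wanted $found $((wanted - found))"
        done
}

# Example with the message reported above
printf '%s\n' \
    'parent transid verify failed on 12554992156672 wanted 130582 found 127355' \
    'Ignoring transid failure' |
    transid_gap
# prints: 12554992156672 130582 127355 3227
```

In other words, the copy of that block on disk carries a generation 3227 transactions older than the one its parent expects.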

Following advice I tried: btrfs rescue zero-log, which appeared to work, but an
attempt to mount afterwards gave me:

Jun 18 18:58:38 charon kernel: BTRFS info (device sdb1): flagging fs with
big metadata feature
Jun 18 18:58:38 charon kernel: BTRFS info (device sdb1): disk space caching
is enabled
Jun 18 18:58:38 charon kernel: BTRFS info (device sdb1): has skinny extents
Jun 18 18:58:39 charon kernel: BTRFS error (device sdb1): parent transid
verify failed on 12554992156672 wanted 130582 found 127355
Jun 18 18:58:39 charon kernel: BTRFS error (device sdb1): parent transid
verify failed on 12554992156672 wanted 130582 found 127355
Jun 18 18:58:39 charon kernel: BTRFS error (device sdb1): failed to read
block groups: -5
Jun 18 18:58:39 charon kernel: BTRFS error (device sdb1): open_ctree failed

In desperation I tried: btrfs check --repair which gave me:

Opening filesystem to check...
parent transid verify failed on 12554992156672 wanted 130582 found 127355
parent transid verify failed on 12554992156672 wanted 130582 found 127355
parent transid verify failed on 12554992156672 wanted 130582 found 127355
Ignoring transid failure
leaf parent key incorrect 12554992156672
ERROR: failed to read block groups: Operation not permitted
ERROR: cannot open file system

So what do I do now?  I don't have a disk large enough to attempt btrfs
restore (if that would even work).  I don't have a backup of this volume as
this is my backup disk.

Thanks 
David


* Re: Problems with BTRFS formatted disk
  2022-06-18 18:55 Problems with BTRFS formatted disk David C. Partridge
@ 2022-06-18 23:00 ` Qu Wenruo
  2022-06-19  1:33   ` David C. Partridge
  2022-06-19  1:37   ` David C. Partridge
  0 siblings, 2 replies; 24+ messages in thread
From: Qu Wenruo @ 2022-06-18 23:00 UTC (permalink / raw)
  To: David C. Partridge, linux-btrfs



On 2022/6/19 02:55, David C. Partridge wrote:
> It all started with a power outage.
>
> When I brought the system back up I got:
>
> Jun 18 15:40:27 charon kernel: BTRFS error (device sdb1): parent transid
> verify failed on 12554992156672 wanted 130582 found 127355
> Jun 18 15:40:27 charon kernel: BTRFS error (device sdb1): parent transid
> verify failed on 12554992156672 wanted 130582 found 127355

Some data writes didn't reach the disk, even though btrfs does the proper FLUSH call.

Would you mind providing the disk model?

> Jun 18 15:40:27 charon kernel: BTRFS error (device sdb1): failed to read
> block groups: -5
> Jun 18 15:40:27 charon mount[629]: mount: /shared: wrong fs type, bad
> option, bad superblock on /dev/sdb1, missing codepage or helper program, or
> othe>
> Jun 18 15:40:27 charon systemd[1]: shared.mount: Mount process exited,
> code=exited, status=32/n/a
> Jun 18 15:40:27 charon systemd[1]: shared.mount: Failed with result
> 'exit-code'.
> Jun 18 15:40:27 charon systemd[1]: Failed to mount /shared.
> Jun 18 15:40:27 charon kernel: BTRFS error (device sdb1): open_ctree failed
>
> I tried:
> root@charon:/home/amonra# btrfs check /dev/sdb1
> Opening filesystem to check...
> parent transid verify failed on 12554992156672 wanted 130582 found 127355
> parent transid verify failed on 12554992156672 wanted 130582 found 127355
> parent transid verify failed on 12554992156672 wanted 130582 found 127355
> Ignoring transid failure
> leaf parent key incorrect 12554992156672
> ERROR: failed to read block groups: Operation not permitted
> ERROR: cannot open file system
> root@charon:/home/amonra# btrfs check -s 1 /dev/sdb1
> using SB copy 1, bytenr 67108864
> Opening filesystem to check...
> parent transid verify failed on 12554992156672 wanted 130582 found 127355
> parent transid verify failed on 12554992156672 wanted 130582 found 127355
> parent transid verify failed on 12554992156672 wanted 130582 found 127355
> Ignoring transid failure
> leaf parent key incorrect 12554992156672
> ERROR: failed to read block groups: Operation not permitted
> ERROR: cannot open file system
> root@charon:/home/amonra# btrfs check -s 2 /dev/sdb1
> using SB copy 2, bytenr 274877906944
> Opening filesystem to check...
> parent transid verify failed on 12554992156672 wanted 130582 found 127355
> parent transid verify failed on 12554992156672 wanted 130582 found 127355
> parent transid verify failed on 12554992156672 wanted 130582 found 127355
> Ignoring transid failure
> leaf parent key incorrect 12554992156672
> ERROR: failed to read block groups: Operation not permitted
> ERROR: cannot open file system
> root@charon:/home/amonra#
>
> but that didn't achieve much.
>
> Following advice I tried: btrfs rescue zero-log which appeared to work, but
> attempt to mount afterwards gave me:
>
> Jun 18 18:58:38 charon kernel: BTRFS info (device sdb1): flagging fs with
> big metadata feature
> Jun 18 18:58:38 charon kernel: BTRFS info (device sdb1): disk space caching
> is enabled
> Jun 18 18:58:38 charon kernel: BTRFS info (device sdb1): has skinny extents
> Jun 18 18:58:39 charon kernel: BTRFS error (device sdb1): parent transid
> verify failed on 12554992156672 wanted 130582 found 127355
> Jun 18 18:58:39 charon kernel: BTRFS error (device sdb1): parent transid
> verify failed on 12554992156672 wanted 130582 found 127355
> Jun 18 18:58:39 charon kernel: BTRFS error (device sdb1): failed to read
> block groups: -5
> Jun 18 18:58:39 charon kernel: BTRFS error (device sdb1): open_ctree failed
>
> In desperation I tried: btrfs check --repair which gave me:
>
> Opening filesystem to check...
> parent transid verify failed on 12554992156672 wanted 130582 found 127355
> parent transid verify failed on 12554992156672 wanted 130582 found 127355
> parent transid verify failed on 12554992156672 wanted 130582 found 127355
> Ignoring transid failure
> leaf parent key incorrect 12554992156672
> ERROR: failed to read block groups: Operation not permitted
> ERROR: cannot open file system
>
> So what do I do now?  I don't have a disk large enough to attempt btrfs
> restore (if that would even work).  I don't have a backup of this volume as
> this is my backup disk.

You can try the rescue=all mount option, which has extra handling for a
corrupted extent tree.

Note that you need kernel v5.15 or newer to benefit from the change.
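
As an illustration of what trying that option could look like on the command line (a sketch, not something posted in the thread): the helper below only prints the command rather than running it. The device and mount point are the ones from the report, the helper name rescue_mount_cmd is made up, and adding ro is an assumption here, based on the rescue= options being intended for read-only mounts.

```shell
# Hypothetical helper: print (do not run) a read-only rescue mount command.
#   $1 = block device, $2 = mount point
rescue_mount_cmd() {
    printf 'mount -t btrfs -o ro,rescue=all %s %s\n' "$1" "$2"
}

rescue_mount_cmd /dev/sdb1 /shared
# prints: mount -t btrfs -o ro,rescue=all /dev/sdb1 /shared
```

Printing the command first leaves the decision to run it, as root and on a new enough kernel, with the person at the keyboard.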

Thanks,
Qu


* RE: Problems with BTRFS formatted disk
  2022-06-18 23:00 ` Qu Wenruo
@ 2022-06-19  1:33   ` David C. Partridge
  2022-06-19  2:01     ` Qu Wenruo
  2022-06-19  1:37   ` David C. Partridge
  1 sibling, 1 reply; 24+ messages in thread
From: David C. Partridge @ 2022-06-19  1:33 UTC (permalink / raw)
  To: 'Qu Wenruo', linux-btrfs

I at least know when the problem happened - the power failure happened at about 13:20:01 on May 26th:

May 26 12:00:01 charon CRON[1959806]: (root) CMD (mount -t btrfs -U c63bcf2b-e4e5-431f-b03d-36f822c68b53 /mnt/root && cd /mnt/root && btrfs-snaps hourly 3 | grep -Ev "$GREPOUT" ; cd / && umount /mnt/root)
May 26 12:00:01 charon CRON[1959808]: (smmsp) CMD (test -x /etc/init.d/sendmail && test -x /usr/share/sendmail/sendmail && test -x /usr/lib/sm.bin/sendmail && /usr/share/sendmail/sendmail cron-msp)
May 26 12:00:01 charon CRON[1959804]: pam_unix(cron:session): session closed for user smmsp
May 26 12:00:01 charon CRON[1959802]: pam_unix(cron:session): session closed for user root
May 26 12:00:01 charon systemd[1372]: mnt-shared.mount: Succeeded.
May 26 12:00:01 charon systemd[1074]: mnt-shared.mount: Succeeded.
May 26 12:00:01 charon systemd[1]: mnt-shared.mount: Succeeded.
May 26 12:00:03 charon CRON[1959803]: pam_unix(cron:session): session closed for user root
May 26 12:00:03 charon systemd[1]: mnt-root.mount: Succeeded.
May 26 12:00:03 charon systemd[1074]: mnt-root.mount: Succeeded.
May 26 12:00:03 charon systemd[1372]: mnt-root.mount: Succeeded.
May 26 12:05:01 charon CRON[1960480]: pam_unix(cron:session): session opened for user root by (uid=0)
May 26 12:05:01 charon CRON[1960481]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
May 26 12:05:01 charon CRON[1960480]: pam_unix(cron:session): session closed for user root
May 26 12:15:01 charon CRON[1961677]: pam_unix(cron:session): session opened for user root by (uid=0)
May 26 12:15:01 charon CRON[1961678]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
May 26 12:15:01 charon CRON[1961677]: pam_unix(cron:session): session closed for user root
May 26 12:17:01 charon CRON[1961923]: pam_unix(cron:session): session opened for user root by (uid=0)
May 26 12:17:01 charon CRON[1961924]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
May 26 12:17:01 charon CRON[1961923]: pam_unix(cron:session): session closed for user root
May 26 12:20:01 charon CRON[1962284]: pam_unix(cron:session): session opened for user smmsp by (uid=0)
May 26 12:20:01 charon CRON[1962286]: (smmsp) CMD (test -x /etc/init.d/sendmail && test -x /usr/share/sendmail/sendmail && test -x /usr/lib/sm.bin/sendmail && /usr/share/sendmail/sendmail cron-msp)
May 26 12:20:01 charon CRON[1962284]: pam_unix(cron:session): session closed for user smmsp
May 26 12:25:01 charon CRON[1962904]: pam_unix(cron:session): session opened for user root by (uid=0)
May 26 12:25:01 charon CRON[1962906]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
May 26 12:25:01 charon CRON[1962904]: pam_unix(cron:session): session closed for user root
May 26 12:30:01 charon CRON[1963512]: pam_unix(cron:session): session opened for user root by (uid=0)
May 26 12:30:01 charon CRON[1963514]: (root) CMD ([ -x /etc/init.d/anacron ] && if [ ! -d /run/systemd/system ]; then /usr/sbin/invoke-rc.d anacron start >/dev/null; fi)
May 26 12:30:01 charon CRON[1963512]: pam_unix(cron:session): session closed for user root
May 26 12:33:01 charon systemd[1]: Started Run anacron jobs.
May 26 12:33:01 charon anacron[1963871]: Anacron 2.3 started on 2022-05-26
May 26 12:33:01 charon anacron[1963871]: Normal exit (0 jobs run)
May 26 12:33:01 charon systemd[1]: anacron.service: Succeeded.
May 26 12:35:01 charon CRON[1964126]: pam_unix(cron:session): session opened for user root by (uid=0)
May 26 12:35:01 charon CRON[1964128]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
May 26 12:35:01 charon CRON[1964126]: pam_unix(cron:session): session closed for user root
May 26 12:40:01 charon CRON[1964736]: pam_unix(cron:session): session opened for user smmsp by (uid=0)
May 26 12:40:01 charon CRON[1964738]: (smmsp) CMD (test -x /etc/init.d/sendmail && test -x /usr/share/sendmail/sendmail && test -x /usr/lib/sm.bin/sendmail && /usr/share/sendmail/sendmail cron-msp)
May 26 12:40:01 charon CRON[1964736]: pam_unix(cron:session): session closed for user smmsp
May 26 12:45:01 charon CRON[1965359]: pam_unix(cron:session): session opened for user root by (uid=0)
May 26 12:45:01 charon CRON[1965361]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
May 26 12:45:01 charon CRON[1965359]: pam_unix(cron:session): session closed for user root
May 26 12:51:14 charon Radarr[866]: [Info] RssSyncService: Starting RSS Sync
May 26 12:51:16 charon Radarr[866]: [Info] DownloadDecisionMaker: Processing 100 releases
May 26 12:51:16 charon Radarr[866]: [Info] RssSyncService: RSS Sync Completed. Reports found: 100, Reports grabbed: 0
May 26 12:55:01 charon CRON[1966565]: pam_unix(cron:session): session opened for user root by (uid=0)
May 26 12:55:01 charon CRON[1966566]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
May 26 12:55:01 charon CRON[1966565]: pam_unix(cron:session): session closed for user root
May 26 13:00:01 charon CRON[1967160]: pam_unix(cron:session): session opened for user smmsp by (uid=0)
May 26 13:00:01 charon CRON[1967161]: (smmsp) CMD (test -x /etc/init.d/sendmail && test -x /usr/share/sendmail/sendmail && test -x /usr/lib/sm.bin/sendmail && /usr/share/sendmail/sendmail cron-msp)
May 26 13:00:01 charon CRON[1967160]: pam_unix(cron:session): session closed for user smmsp
May 26 13:05:01 charon CRON[1967787]: pam_unix(cron:session): session opened for user root by (uid=0)
May 26 13:05:01 charon CRON[1967789]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
May 26 13:05:01 charon CRON[1967787]: pam_unix(cron:session): session closed for user root
May 26 13:15:01 charon CRON[1968995]: pam_unix(cron:session): session opened for user root by (uid=0)
May 26 13:15:01 charon CRON[1968996]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
May 26 13:15:01 charon CRON[1968995]: pam_unix(cron:session): session closed for user root
May 26 13:17:01 charon CRON[1969237]: pam_unix(cron:session): session opened for user root by (uid=0)
May 26 13:17:01 charon CRON[1969238]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
May 26 13:17:01 charon CRON[1969237]: pam_unix(cron:session): session closed for user root
May 26 13:20:01 charon CRON[1969603]: pam_unix(cron:session): session opened for user smmsp by (uid=0)
May 26 13:20:01 charon CRON[1969605]: (smmsp) CMD (test -x /etc/init.d/sendmail && test -x /usr/share/sendmail/sendmail && test -x /usr/lib/sm.bin/sendmail && /usr/share/sendmail/sendmail cron-msp)
May 26 13:20:01 charon CRON[1969603]: pam_unix(cron:session): session closed for user smmsp
The next btrfs log data follows immediately (well actually a week or so later as I hadn't rebooted since) ☹
-- Reboot --
    Messages deleted
Jun 18 15:20:12 charon kernel: Btrfs loaded, crc32c=crc32c-intel
Jun 18 15:20:12 charon kernel: BTRFS: device fsid 4fc521d7-c18f-4cb3-9eac-d9d367e2b0eb devid 1 transid 130613 /dev/sdb1
Jun 18 15:20:12 charon kernel: BTRFS: device fsid c63bcf2b-e4e5-431f-b03d-36f822c68b53 devid 1 transid 5607929 /dev/sda2
Jun 18 15:20:12 charon kernel: BTRFS info (device sda2): flagging fs with big metadata feature
Jun 18 15:20:12 charon kernel: BTRFS info (device sda2): disk space caching is enabled
Jun 18 15:20:12 charon kernel: BTRFS info (device sda2): has skinny extents
Jun 18 15:20:12 charon kernel: BTRFS info (device sda2): enabling ssd optimizations
Jun 18 15:20:12 charon kernel: BTRFS info (device sda2): enabling auto defrag
Jun 18 15:20:12 charon kernel: BTRFS info (device sda2): turning on discard
Jun 18 15:20:12 charon kernel: BTRFS info (device sda2): use lzo compression, level 0
Jun 18 15:20:12 charon kernel: BTRFS info (device sda2): disk space caching is enabled
Jun 18 15:20:18 charon kernel: BTRFS info (device sdb1): flagging fs with big metadata feature
Jun 18 15:20:18 charon kernel: BTRFS info (device sdb1): disk space caching is enabled
Jun 18 15:20:18 charon kernel: BTRFS info (device sdb1): has skinny extents
Jun 18 15:20:18 charon kernel: BTRFS error (device sdb1): parent transid verify failed on 12554306306048 wanted 130605 found 127414
Jun 18 15:20:18 charon kernel: BTRFS info (device sdb1): read error corrected: ino 0 off 12554306306048 (dev /dev/sdb1 sector 1007336416)
Jun 18 15:20:18 charon kernel: BTRFS info (device sdb1): read error corrected: ino 0 off 12554306310144 (dev /dev/sdb1 sector 1007336424)
Jun 18 15:20:18 charon kernel: BTRFS info (device sdb1): read error corrected: ino 0 off 12554306314240 (dev /dev/sdb1 sector 1007336432)
Jun 18 15:20:18 charon kernel: BTRFS info (device sdb1): read error corrected: ino 0 off 12554306318336 (dev /dev/sdb1 sector 1007336440)
Jun 18 15:20:18 charon kernel: BTRFS error (device sdb1): parent transid verify failed on 12554682138624 wanted 129690 found 127567
Jun 18 15:20:18 charon kernel: BTRFS info (device sdb1): read error corrected: ino 0 off 12554682138624 (dev /dev/sdb1 sector 1008070464)
Jun 18 15:20:18 charon kernel: BTRFS info (device sdb1): read error corrected: ino 0 off 12554682142720 (dev /dev/sdb1 sector 1008070472)
Jun 18 15:20:18 charon kernel: BTRFS info (device sdb1): read error corrected: ino 0 off 12554682146816 (dev /dev/sdb1 sector 1008070480)
Jun 18 15:20:18 charon kernel: BTRFS info (device sdb1): read error corrected: ino 0 off 12554682150912 (dev /dev/sdb1 sector 1008070488)
Jun 18 15:20:18 charon kernel: BTRFS error (device sdb1): parent transid verify failed on 12554682155008 wanted 129690 found 127567
Jun 18 15:20:18 charon kernel: BTRFS info (device sdb1): read error corrected: ino 0 off 12554682155008 (dev /dev/sdb1 sector 1008070496)
Jun 18 15:20:18 charon kernel: BTRFS info (device sdb1): read error corrected: ino 0 off 12554682159104 (dev /dev/sdb1 sector 1008070504)
Jun 18 15:20:18 charon kernel: BTRFS error (device sdb1): parent transid verify failed on 12554992156672 wanted 130582 found 127355
Jun 18 15:20:18 charon kernel: BTRFS error (device sdb1): parent transid verify failed on 12554992156672 wanted 130582 found 127355
Jun 18 15:20:18 charon kernel: BTRFS error (device sdb1): failed to read block groups: -5
Jun 18 15:20:18 charon kernel: BTRFS error (device sdb1): open_ctree failed 
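
The "read error corrected" lines above report both a logical offset and a device sector. A small arithmetic sketch relating the two follows; it assumes the sector numbers are in 512-byte units (the usual kernel convention, but an assumption here) and uses a made-up helper name.

```shell
# Hypothetical helper: convert a 512-byte sector number to a byte offset.
sector_to_bytes() {
    echo $(( $1 * 512 ))
}

sector_to_bytes 1007336416   # prints: 515756244992
# Consecutive corrected reads are 8 sectors apart, i.e. one 4096-byte page each
echo $(( (1007336424 - 1007336416) * 512 ))   # prints: 4096
```

That agrees with the logical offsets in the same lines, which also advance by 4096 per message (12554306310144 - 12554306306048 = 4096).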

>You can try rescue=all mount option, which has the extra handling on
>corrupted extent tree.

>Although you have to use kernels newer than v5.15 (including v5.15) to
>benefit from the change.

Unfortunately: 
amonra@charon:~$ uname -a
Linux charon 5.4.0-113-generic #127-Ubuntu SMP Wed May 18 14:30:56 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
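
The comparison being made here (a 5.4 kernel against the v5.15 minimum mentioned in the quoted advice) can be written as a small POSIX shell check. This is only a sketch: kernel_at_least is a made-up helper name, and it looks at the major and minor numbers only.

```shell
# Hypothetical helper: succeed if a kernel release string is at least MAJOR.MINOR.
#   $1 = release as printed by `uname -r`, e.g. "5.4.0-113-generic"
#   $2 = required major, $3 = required minor
kernel_at_least() {
    release=${1%%-*}       # strip the "-113-generic" style suffix
    major=${release%%.*}   # text before the first dot
    rest=${release#*.}     # text after the first dot
    minor=${rest%%.*}      # text before the next dot
    [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

if kernel_at_least "5.4.0-113-generic" 5 15; then
    echo "new enough for the suggested mount option"
else
    echo "older than v5.15"
fi
# prints: older than v5.15
```

On a live system the first argument would be "$(uname -r)" instead of the literal string.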




* RE: Problems with BTRFS formatted disk
  2022-06-18 23:00 ` Qu Wenruo
  2022-06-19  1:33   ` David C. Partridge
@ 2022-06-19  1:37   ` David C. Partridge
  1 sibling, 0 replies; 24+ messages in thread
From: David C. Partridge @ 2022-06-19  1:37 UTC (permalink / raw)
  To: 'Qu Wenruo', linux-btrfs

It's a RAID 5 array hosted by an Adaptec ASR8885 (which thinks the disk is "Optimal").

>Mind to provide the disk model?


* Re: Problems with BTRFS formatted disk
  2022-06-19  1:33   ` David C. Partridge
@ 2022-06-19  2:01     ` Qu Wenruo
  2022-06-19 10:29       ` David C. Partridge
  0 siblings, 1 reply; 24+ messages in thread
From: Qu Wenruo @ 2022-06-19  2:01 UTC (permalink / raw)
  To: David C. Partridge, linux-btrfs



On 2022/6/19 09:33, David C. Partridge wrote:
> I at least know when the problem happened - the power fail happened about 13:20:01 on May 26th:
>
> May 26 12:00:01 charon CRON[1959806]: (root) CMD (mount -t btrfs -U c63bcf2b-e4e5-431f-b03d-36f822c68b53 /mnt/root && cd /mnt/root && btrfs-snaps hourly 3 | grep -Ev "$GREPOUT" ; cd / && umount /mnt/root)
> May 26 12:00:01 charon CRON[1959808]: (smmsp) CMD (test -x /etc/init.d/sendmail && test -x /usr/share/sendmail/sendmail && test -x /usr/lib/sm.bin/sendmail && /usr/share/sendmail/sendmail cron-msp)
> May 26 12:00:01 charon CRON[1959804]: pam_unix(cron:session): session closed for user smmsp
> May 26 12:00:01 charon CRON[1959802]: pam_unix(cron:session): session closed for user root
> May 26 12:00:01 charon systemd[1372]: mnt-shared.mount: Succeeded.
> May 26 12:00:01 charon systemd[1074]: mnt-shared.mount: Succeeded.
> May 26 12:00:01 charon systemd[1]: mnt-shared.mount: Succeeded.
> May 26 12:00:03 charon CRON[1959803]: pam_unix(cron:session): session closed for user root
> May 26 12:00:03 charon systemd[1]: mnt-root.mount: Succeeded.
> May 26 12:00:03 charon systemd[1074]: mnt-root.mount: Succeeded.
> May 26 12:00:03 charon systemd[1372]: mnt-root.mount: Succeeded.
> May 26 12:05:01 charon CRON[1960480]: pam_unix(cron:session): session opened for user root by (uid=0)
> May 26 12:05:01 charon CRON[1960481]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
> May 26 12:05:01 charon CRON[1960480]: pam_unix(cron:session): session closed for user root
> May 26 12:15:01 charon CRON[1961677]: pam_unix(cron:session): session opened for user root by (uid=0)
> May 26 12:15:01 charon CRON[1961678]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
> May 26 12:15:01 charon CRON[1961677]: pam_unix(cron:session): session closed for user root
> May 26 12:17:01 charon CRON[1961923]: pam_unix(cron:session): session opened for user root by (uid=0)
> May 26 12:17:01 charon CRON[1961924]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
> May 26 12:17:01 charon CRON[1961923]: pam_unix(cron:session): session closed for user root
> May 26 12:20:01 charon CRON[1962284]: pam_unix(cron:session): session opened for user smmsp by (uid=0)
> May 26 12:20:01 charon CRON[1962286]: (smmsp) CMD (test -x /etc/init.d/sendmail && test -x /usr/share/sendmail/sendmail && test -x /usr/lib/sm.bin/sendmail && /usr/share/sendmail/sendmail cron-msp)
> May 26 12:20:01 charon CRON[1962284]: pam_unix(cron:session): session closed for user smmsp
> May 26 12:25:01 charon CRON[1962904]: pam_unix(cron:session): session opened for user root by (uid=0)
> May 26 12:25:01 charon CRON[1962906]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
> May 26 12:25:01 charon CRON[1962904]: pam_unix(cron:session): session closed for user root
> May 26 12:30:01 charon CRON[1963512]: pam_unix(cron:session): session opened for user root by (uid=0)
> May 26 12:30:01 charon CRON[1963514]: (root) CMD ([ -x /etc/init.d/anacron ] && if [ ! -d /run/systemd/system ]; then /usr/sbin/invoke-rc.d anacron start >/dev/null; fi)
> May 26 12:30:01 charon CRON[1963512]: pam_unix(cron:session): session closed for user root
> May 26 12:33:01 charon systemd[1]: Started Run anacron jobs.
> May 26 12:33:01 charon anacron[1963871]: Anacron 2.3 started on 2022-05-26
> May 26 12:33:01 charon anacron[1963871]: Normal exit (0 jobs run)
> May 26 12:33:01 charon systemd[1]: anacron.service: Succeeded.
> May 26 12:35:01 charon CRON[1964126]: pam_unix(cron:session): session opened for user root by (uid=0)
> May 26 12:35:01 charon CRON[1964128]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
> May 26 12:35:01 charon CRON[1964126]: pam_unix(cron:session): session closed for user root
> May 26 12:40:01 charon CRON[1964736]: pam_unix(cron:session): session opened for user smmsp by (uid=0)
> May 26 12:40:01 charon CRON[1964738]: (smmsp) CMD (test -x /etc/init.d/sendmail && test -x /usr/share/sendmail/sendmail && test -x /usr/lib/sm.bin/sendmail && /usr/share/sendmail/sendmail cron-msp)
> May 26 12:40:01 charon CRON[1964736]: pam_unix(cron:session): session closed for user smmsp
> May 26 12:45:01 charon CRON[1965359]: pam_unix(cron:session): session opened for user root by (uid=0)
> May 26 12:45:01 charon CRON[1965361]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
> May 26 12:45:01 charon CRON[1965359]: pam_unix(cron:session): session closed for user root
> May 26 12:51:14 charon Radarr[866]: [Info] RssSyncService: Starting RSS Sync
> May 26 12:51:16 charon Radarr[866]: [Info] DownloadDecisionMaker: Processing 100 releases
> May 26 12:51:16 charon Radarr[866]: [Info] RssSyncService: RSS Sync Completed. Reports found: 100, Reports grabbed: 0
> May 26 12:55:01 charon CRON[1966565]: pam_unix(cron:session): session opened for user root by (uid=0)
> May 26 12:55:01 charon CRON[1966566]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
> May 26 12:55:01 charon CRON[1966565]: pam_unix(cron:session): session closed for user root
> May 26 13:00:01 charon CRON[1967160]: pam_unix(cron:session): session opened for user smmsp by (uid=0)
> May 26 13:00:01 charon CRON[1967161]: (smmsp) CMD (test -x /etc/init.d/sendmail && test -x /usr/share/sendmail/sendmail && test -x /usr/lib/sm.bin/sendmail && /usr/share/sendmail/sendmail cron-msp)
> May 26 13:00:01 charon CRON[1967160]: pam_unix(cron:session): session closed for user smmsp
> May 26 13:05:01 charon CRON[1967787]: pam_unix(cron:session): session opened for user root by (uid=0)
> May 26 13:05:01 charon CRON[1967789]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
> May 26 13:05:01 charon CRON[1967787]: pam_unix(cron:session): session closed for user root
> May 26 13:15:01 charon CRON[1968995]: pam_unix(cron:session): session opened for user root by (uid=0)
> May 26 13:15:01 charon CRON[1968996]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
> May 26 13:15:01 charon CRON[1968995]: pam_unix(cron:session): session closed for user root
> May 26 13:17:01 charon CRON[1969237]: pam_unix(cron:session): session opened for user root by (uid=0)
> May 26 13:17:01 charon CRON[1969238]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
> May 26 13:17:01 charon CRON[1969237]: pam_unix(cron:session): session closed for user root
> May 26 13:20:01 charon CRON[1969603]: pam_unix(cron:session): session opened for user smmsp by (uid=0)
> May 26 13:20:01 charon CRON[1969605]: (smmsp) CMD (test -x /etc/init.d/sendmail && test -x /usr/share/sendmail/sendmail && test -x /usr/lib/sm.bin/sendmail && /usr/share/sendmail/sendmail cron-msp)
> May 26 13:20:01 charon CRON[1969603]: pam_unix(cron:session): session closed for user smmsp
> The next btrfs log data follows immediately (well actually a week or so later as I hadn't rebooted since) ☹
> -- Reboot --
>      Messages deleted
> Jun 18 15:20:12 charon kernel: Btrfs loaded, crc32c=crc32c-intel
> Jun 18 15:20:12 charon kernel: BTRFS: device fsid 4fc521d7-c18f-4cb3-9eac-d9d367e2b0eb devid 1 transid 130613 /dev/sdb1
> Jun 18 15:20:12 charon kernel: BTRFS: device fsid c63bcf2b-e4e5-431f-b03d-36f822c68b53 devid 1 transid 5607929 /dev/sda2
> Jun 18 15:20:12 charon kernel: BTRFS info (device sda2): flagging fs with big metadata feature
> Jun 18 15:20:12 charon kernel: BTRFS info (device sda2): disk space caching is enabled
> Jun 18 15:20:12 charon kernel: BTRFS info (device sda2): has skinny extents
> Jun 18 15:20:12 charon kernel: BTRFS info (device sda2): enabling ssd optimizations
> Jun 18 15:20:12 charon kernel: BTRFS info (device sda2): enabling auto defrag
> Jun 18 15:20:12 charon kernel: BTRFS info (device sda2): turning on discard
> Jun 18 15:20:12 charon kernel: BTRFS info (device sda2): use lzo compression, level 0
> Jun 18 15:20:12 charon kernel: BTRFS info (device sda2): disk space caching is enabled
> Jun 18 15:20:18 charon kernel: BTRFS info (device sdb1): flagging fs with big metadata feature
> Jun 18 15:20:18 charon kernel: BTRFS info (device sdb1): disk space caching is enabled
> Jun 18 15:20:18 charon kernel: BTRFS info (device sdb1): has skinny extents
> Jun 18 15:20:18 charon kernel: BTRFS error (device sdb1): parent transid verify failed on 12554306306048 wanted 130605 found 127414
> Jun 18 15:20:18 charon kernel: BTRFS info (device sdb1): read error corrected: ino 0 off 12554306306048 (dev /dev/sdb1 sector 1007336416)
> Jun 18 15:20:18 charon kernel: BTRFS info (device sdb1): read error corrected: ino 0 off 12554306310144 (dev /dev/sdb1 sector 1007336424)
> Jun 18 15:20:18 charon kernel: BTRFS info (device sdb1): read error corrected: ino 0 off 12554306314240 (dev /dev/sdb1 sector 1007336432)
> Jun 18 15:20:18 charon kernel: BTRFS info (device sdb1): read error corrected: ino 0 off 12554306318336 (dev /dev/sdb1 sector 1007336440)
> Jun 18 15:20:18 charon kernel: BTRFS error (device sdb1): parent transid verify failed on 12554682138624 wanted 129690 found 127567
> Jun 18 15:20:18 charon kernel: BTRFS info (device sdb1): read error corrected: ino 0 off 12554682138624 (dev /dev/sdb1 sector 1008070464)
> Jun 18 15:20:18 charon kernel: BTRFS info (device sdb1): read error corrected: ino 0 off 12554682142720 (dev /dev/sdb1 sector 1008070472)
> Jun 18 15:20:18 charon kernel: BTRFS info (device sdb1): read error corrected: ino 0 off 12554682146816 (dev /dev/sdb1 sector 1008070480)
> Jun 18 15:20:18 charon kernel: BTRFS info (device sdb1): read error corrected: ino 0 off 12554682150912 (dev /dev/sdb1 sector 1008070488)
> Jun 18 15:20:18 charon kernel: BTRFS error (device sdb1): parent transid verify failed on 12554682155008 wanted 129690 found 127567
> Jun 18 15:20:18 charon kernel: BTRFS info (device sdb1): read error corrected: ino 0 off 12554682155008 (dev /dev/sdb1 sector 1008070496)
> Jun 18 15:20:18 charon kernel: BTRFS info (device sdb1): read error corrected: ino 0 off 12554682159104 (dev /dev/sdb1 sector 1008070504)
> Jun 18 15:20:18 charon kernel: BTRFS error (device sdb1): parent transid verify failed on 12554992156672 wanted 130582 found 127355
> Jun 18 15:20:18 charon kernel: BTRFS error (device sdb1): parent transid verify failed on 12554992156672 wanted 130582 found 127355

So this means some copies are correct and some are not; the repaired
ones were rewritten from the good copy.
(Btrfs uses DUP by default for its metadata, so the metadata is
written twice on top of the virtual disk provided by the RAID card.)

I'd say this already shows the RAID card is doing something unexpected.
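
For reference, the gap between the "wanted" and "found" generations in
those messages can be tallied from the journal. This is only a minimal
sketch, assuming the kernel message format shown above; the function name
transid_gaps is made up for illustration and is not part of btrfs-progs.

```shell
# Summarise "parent transid verify failed" lines read from stdin.
# Prints "<bytenr> <wanted> <found> <wanted - found>", once per distinct
# combination. transid_gaps is a hypothetical helper name.
transid_gaps() {
    awk '/parent transid verify failed on/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "on")     bytenr = $(i + 1)
            if ($i == "wanted") wanted = $(i + 1)
            if ($i == "found")  found  = $(i + 1)
        }
        key = bytenr " " wanted " " found
        if (!(key in seen)) {
            seen[key] = 1
            print bytenr, wanted, found, wanted - found
        }
    }'
}
```

Something like `journalctl -k | transid_gaps` would feed it the kernel log.
A large gap means the block on disk is from a much older transaction than
its parent expects.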

> Jun 18 15:20:18 charon kernel: BTRFS error (device sdb1): failed to read block groups: -5
> Jun 18 15:20:18 charon kernel: BTRFS error (device sdb1): open_ctree failed
>
>> You can try rescue=all mount option, which has the extra handling on
>> corrupted extent tree.
>
>> Although you have to use kernels newer than v5.15 (including v5.15) to
>> benefit from the change.
>
> Unfortunately:
> amonra@charon:~$ uname -a
> Linux charon 5.4.0-113-generic #127-Ubuntu SMP Wed May 18 14:30:56 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Is there any special reason you cannot boot a newer kernel from a
live USB to do the salvage?

> It's a RAID 5 array hosted by an Adaptec ASR8885 (which thinks the
> disk is "Optimal").

So I suspect the card is doing something tricky with its cache or recovery.

Anyway, for now all I can recommend is finding a way to run a newer
kernel so that you can use the rescue=all mount option.
Otherwise btrfs-restore is the only solution.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: Problems with BTRFS formatted disk
  2022-06-19  2:01     ` Qu Wenruo
@ 2022-06-19 10:29       ` David C. Partridge
  2022-06-19 10:40         ` Qu Wenruo
  0 siblings, 1 reply; 24+ messages in thread
From: David C. Partridge @ 2022-06-19 10:29 UTC (permalink / raw)
  To: 'Qu Wenruo', linux-btrfs

Booted from live USB 22.04 LUbuntu.

root@lubuntu:/home/lubuntu# mount -t btrfs -o rescue=all /dev/sdc1 /mnt
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/sdc1, missing codepage or helper program, or other error.
root@lubuntu:/home/lubuntu# 

Content of system journal

Jun 19 10:08:03 lubuntu kernel: BTRFS info (device sdc1): flagging fs with big metadata feature
Jun 19 10:08:03 lubuntu kernel: BTRFS info (device sdc1): disk space caching is enabled
Jun 19 10:08:03 lubuntu kernel: BTRFS info (device sdc1): has skinny extents
Jun 19 10:08:03 lubuntu kernel: BTRFS error (device sdc1): parent transid verify failed on 12554992156672 wanted 130582 found 127355
Jun 19 10:08:03 lubuntu kernel: BTRFS error (device sdc1): parent transid verify failed on 12554992156672 wanted 130582 found 127355
Jun 19 10:08:03 lubuntu kernel: BTRFS error (device sdc1): failed to read block groups: -5
Jun 19 10:08:03 lubuntu kernel: BTRFS error (device sdc1): open_ctree failed

David


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Problems with BTRFS formatted disk
  2022-06-19 10:29       ` David C. Partridge
@ 2022-06-19 10:40         ` Qu Wenruo
  2022-06-19 11:14           ` David C. Partridge
  0 siblings, 1 reply; 24+ messages in thread
From: Qu Wenruo @ 2022-06-19 10:40 UTC (permalink / raw)
  To: David C. Partridge, linux-btrfs



On 2022/6/19 18:29, David C. Partridge wrote:
> Booted from live USB 22.04 LUbuntu.

Ubuntu's kernel version doesn't seem to be that consistent, even for its
LTS releases:

https://ubuntu.com/about/release-cycle#ubuntu-kernel-release-cycle

Please use a rolling-release distro/branch instead.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: Problems with BTRFS formatted disk
  2022-06-19 10:40         ` Qu Wenruo
@ 2022-06-19 11:14           ` David C. Partridge
  2022-06-19 11:51             ` Qu Wenruo
  0 siblings, 1 reply; 24+ messages in thread
From: David C. Partridge @ 2022-06-19 11:14 UTC (permalink / raw)
  To: 'Qu Wenruo', linux-btrfs

LUbuntu 22.04 was definitely running a 5.15 kernel. What alternative distro do you propose I use?


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Problems with BTRFS formatted disk
  2022-06-19 11:14           ` David C. Partridge
@ 2022-06-19 11:51             ` Qu Wenruo
  2022-06-19 12:53               ` David C. Partridge
  0 siblings, 1 reply; 24+ messages in thread
From: Qu Wenruo @ 2022-06-19 11:51 UTC (permalink / raw)
  To: David C. Partridge, linux-btrfs



On 2022/6/19 19:14, David C. Partridge wrote:
> LUbuntu 22.04 was definitely 5.15 kernel, what alternative distro do you propose I use?

I have no idea why 22.04 doesn't work here.

The upstream commit is 2b29726c473b ("btrfs: rescue: allow ibadroots to
skip bad extent tree when reading block group items"), which is already
in v5.15 kernels.

I double-checked the current code base: as long as the error comes from
reading the block group items and rescue=all (which implies ibadroots) is
set, it should go to fill_dummy_bgs().

For alternative distros: OpenSUSE tumbleweed, Archlinux, etc., as
they definitely track upstream and are v5.15+.

For example, Archlinux 2022.06.01 ships with a 5.18 kernel:

$ file arch/boot/x86_64/vmlinuz-linux
arch/boot/x86_64/vmlinuz-linux: Linux kernel x86 boot executable
bzImage, version 5.18.1-arch1-1 (linux@archlinux) #1 SMP PREEMPT_DYNAMIC
Mon, 30 May 2022 17:53:11 +0000, RO-rootFS, swap_dev 0XA, Normal VGA

If that still doesn't work, let me create a similar fs with some block
group items corrupted to see why it doesn't work.
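
To check a live image against the v5.15 minimum before trying the mount,
something along these lines may help. It is a rough sketch: the function
name kernel_at_least is invented here, it relies on `sort -V` from GNU
coreutils, and it only compares version numbers, so it cannot tell whether
a distro kernel actually carries the commit.

```shell
# Succeeds if kernel release $1 is at least version $2 (e.g. 5.15).
# kernel_at_least is a hypothetical helper name.
kernel_at_least() {
    have=${1%%-*}   # drop the distro suffix: 5.4.0-113-generic -> 5.4.0
    want=$2
    # With version sort, the lower of the two versions comes out first.
    lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}
```

Usage would be e.g. `kernel_at_least "$(uname -r)" 5.15 && echo "new enough"`.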

Thanks,
Qu

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: Problems with BTRFS formatted disk
  2022-06-19 11:51             ` Qu Wenruo
@ 2022-06-19 12:53               ` David C. Partridge
  2022-06-19 13:21                 ` Qu Wenruo
  2022-06-19 13:26                 ` David C. Partridge
  0 siblings, 2 replies; 24+ messages in thread
From: David C. Partridge @ 2022-06-19 12:53 UTC (permalink / raw)
  To: 'Qu Wenruo', linux-btrfs

Here's what the 2022.06.01 version of Archlinux had to say in the log when I issued:

mount -t btrfs -o rescue=all /dev/sdc1 /mnt

Jun 19 12:43:01 archiso kernel: BTRFS info (device sdc1): flagging fs with big metadata feature
Jun 19 12:43:01 archiso kernel: BTRFS info (device sdc1): enabling all of the rescue options
Jun 19 12:43:01 archiso kernel: BTRFS info (device sdc1): ignoring data csums
Jun 19 12:43:01 archiso kernel: BTRFS info (device sdc1): ignoring bad roots
Jun 19 12:43:01 archiso kernel: BTRFS info (device sdc1): disabling log replay at mount time
Jun 19 12:43:01 archiso kernel: BTRFS error (device sdc1): nologreplay must be used with ro mount option
Jun 19 12:43:01 archiso kernel: BTRFS error (device sdc1): open_ctree failed

Did I need to say:

mount -t btrfs -o ro,rescue=all /dev/sdc1 /mnt

D.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Problems with BTRFS formatted disk
  2022-06-19 12:53               ` David C. Partridge
@ 2022-06-19 13:21                 ` Qu Wenruo
  2022-06-19 13:26                 ` David C. Partridge
  1 sibling, 0 replies; 24+ messages in thread
From: Qu Wenruo @ 2022-06-19 13:21 UTC (permalink / raw)
  To: David C. Partridge, linux-btrfs



On 2022/6/19 20:53, David C. Partridge wrote:
> Here's what the 2022.06.01 version of Archlinux had to say in the log when I issued:
>
> mount -t btrfs -o rescue=all /dev/sdc1 /mnt
>
> Jun 19 12:43:01 archiso kernel: BTRFS info (device sdc1): flagging fs with big metadata feature
> Jun 19 12:43:01 archiso kernel: BTRFS info (device sdc1): enabling all of the rescue options
> Jun 19 12:43:01 archiso kernel: BTRFS info (device sdc1): ignoring data csums
> Jun 19 12:43:01 archiso kernel: BTRFS info (device sdc1): ignoring bad roots
> Jun 19 12:43:01 archiso kernel: BTRFS info (device sdc1): disabling log replay at mount time
> Jun 19 12:43:01 archiso kernel: BTRFS error (device sdc1): nologreplay must be used with ro mount option
> Jun 19 12:43:01 archiso kernel: BTRFS error (device sdc1): open_ctree failed
>
> Did I need to say:
>
> mount -t btrfs -o ro,rescue=all /dev/sdc1 /mnt

Yep.
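
As the log above shows, rescue=all turns on nologreplay, which the kernel
rejects without ro. A small sketch of a guard for that follows. The helper
name with_ro_for_rescue is made up, and adding ro for every rescue= option
is a conservative assumption rather than something the kernel requires for
each of them.

```shell
# Print the mount option string $1, with "ro" prepended when a rescue=
# option is present and "ro" is not already in the list.
# with_ro_for_rescue is a hypothetical helper name.
with_ro_for_rescue() {
    opts=$1
    case ",$opts," in
        *,ro,*)     printf '%s\n' "$opts" ;;      # already read-only
        *,rescue=*) printf '%s\n' "ro,$opts" ;;   # rescue without ro
        *)          printf '%s\n' "$opts" ;;      # no rescue option at all
    esac
}
```

It could be used as
`mount -t btrfs -o "$(with_ro_for_rescue rescue=all)" /dev/sdc1 /mnt`.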

Thanks,
Qu

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: Problems with BTRFS formatted disk
  2022-06-19 12:53               ` David C. Partridge
  2022-06-19 13:21                 ` Qu Wenruo
@ 2022-06-19 13:26                 ` David C. Partridge
  2022-06-19 13:30                   ` Qu Wenruo
  1 sibling, 1 reply; 24+ messages in thread
From: David C. Partridge @ 2022-06-19 13:26 UTC (permalink / raw)
  To: 'Qu Wenruo', linux-btrfs

Aha, this is much more interesting:

I issued: mount -t btrfs -o ro,rescue=all /dev/sdc1 /mnt

And got this in the system log:

Jun 19 13:04:32 archiso kernel: BTRFS info (device sdc1): flagging fs with big metadata feature
Jun 19 13:04:32 archiso kernel: BTRFS info (device sdc1): enabling all of the rescue options
Jun 19 13:04:32 archiso kernel: BTRFS info (device sdc1): ignoring data csums
Jun 19 13:04:32 archiso kernel: BTRFS info (device sdc1): ignoring bad roots
Jun 19 13:04:32 archiso kernel: BTRFS info (device sdc1): disabling log replay at mount time
Jun 19 13:04:32 archiso kernel: BTRFS info (device sdc1): disk space caching is enabled
Jun 19 13:04:32 archiso kernel: BTRFS info (device sdc1): has skinny extents
Jun 19 13:04:32 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 12554992156672 wanted 130582 found 127355
Jun 19 13:04:32 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 12554992156672 wanted 130582 found 127355
Jun 19 13:05:12 archiso systemd[1]: dev-virtio\x2dports-org.qemu.guest_agent.0.device: Job dev-virtio\x2dports-org.qemu.guest_agent.0.device/start timed out.
Jun 19 13:05:12 archiso systemd[1]: Timed out waiting for device /dev/virtio-ports/org.qemu.guest_agent.0.
Jun 19 13:05:12 archiso systemd[1]: Dependency failed for QEMU Guest Agent.
Jun 19 13:05:12 archiso systemd[1]: qemu-guest-agent.service: Job qemu-guest-agent.service/start failed with result 'dependency'.
Jun 19 13:05:12 archiso systemd[1]: dev-virtio\x2dports-org.qemu.guest_agent.0.device: Job dev-virtio\x2dports-org.qemu.guest_agent.0.device/start failed with result 'timeout'.
Jun 19 13:05:12 archiso systemd[1]: Reached target Multi-User System.
Jun 19 13:05:12 archiso systemd[1]: Reached target Graphical Interface.
Jun 19 13:05:12 archiso systemd[1]: Startup finished in 1min 4.847s (firmware) + 4.837s (loader) + 9.433s (kernel) + 1min 31.546s (userspace) = 2min 50.664s.
Jun 19 13:05:38 archiso kernel: BTRFS info (device sda2): flagging fs with big metadata feature
Jun 19 13:05:38 archiso kernel: BTRFS info (device sda2): disk space caching is enabled
Jun 19 13:05:38 archiso kernel: BTRFS info (device sda2): has skinny extents
Jun 19 13:05:38 archiso kernel: BTRFS info (device sda2): enabling ssd optimizations

ll /mnt got me this:

Jun 19 13:08:13 archiso kernel: verify_parent_transid: 4 callbacks suppressed
Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929

ls: cannot access '/mnt/@': Input/output error
ls: cannot access '/mnt/@_daily.20220525_00:11:01': Input/output error
ls: cannot access '/mnt/@_daily.20220526_00:11:01': Input/output error
ls: cannot access '/mnt/@_hourly.20220526_06:00:01': Input/output error
ls: cannot access '/mnt/@_hourly.20220526_09:00:01': Input/output error
ls: cannot access '/mnt/@_hourly.20220526_12:00:01': Input/output error
total 0
d????????? ? ?    ?      ?            ? @
drwxrwxr-x 1 root 1000 204 May 15 16:27 @_daily.20220523_00:11:01
drwxrwxr-x 1 root 1000 204 May 15 16:27 @_daily.20220524_00:11:01
d????????? ? ?    ?      ?            ? @_daily.20220525_00:11:01
d????????? ? ?    ?      ?            ? @_daily.20220526_00:11:01
d????????? ? ?    ?      ?            ? @_hourly.20220526_06:00:01
d????????? ? ?    ?      ?            ? @_hourly.20220526_09:00:01
d????????? ? ?    ?      ?            ? @_hourly.20220526_12:00:01
drwxrwxr-x 1 root 1000 184 Dec 16  2021 @_weekly.20220424_00:12:01
drwxrwxr-x 1 root 1000 184 Dec 16  2021 @_weekly.20220508_00:12:01
drwxrwxr-x 1 root 1000 184 Dec 16  2021 @_weekly.20220515_00:12:01
drwxrwxr-x 1 root 1000 204 May 15 16:27 @_weekly.20220522_00:12:01

So it appears that there may be recoverable sub-volumes there ...
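
A rough way to list which top-level entries still stat cleanly, rather
than reading them off the ls output by eye. This is a sketch only: the
function name classify_entries is made up, and an entry marked OK has
merely had its own metadata read, which says nothing about whether the
files inside it are intact.

```shell
# Print "OK <path>" or "BAD <path>" for each non-hidden entry directly
# under directory $1, depending on whether ls can stat it.
# classify_entries is a hypothetical helper name.
classify_entries() {
    for entry in "$1"/*; do
        # An unmatched glob is left as the literal pattern: nothing to list.
        if [ "$entry" = "$1/*" ]; then
            continue
        fi
        if ls -d -- "$entry" >/dev/null 2>&1; then
            printf 'OK %s\n' "$entry"
        else
            printf 'BAD %s\n' "$entry"
        fi
    done
}
```

Running `classify_entries /mnt` on the mount above would be expected to
mark the entries that gave "Input/output error" as BAD.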

So if I can remount it rw after having mounted it ro,rescue=all I should be able to delete the broken subvolumes and rename one of the @daily or @weekly ones that appear OK?

Or can I manipulate the subvolumes even if it is mounted ro?

Your guidance will be most welcome

D.

-----Original Message-----
From: David C. Partridge <david.partridge@perdrix.co.uk> 
Sent: 19 June 2022 13:54
To: 'Qu Wenruo' <quwenruo.btrfs@gmx.com>; linux-btrfs@vger.kernel.org
Subject: RE: Problems with BTRFS formatted disk

Here's what the 2022.06.01 version of Archlinux had to say in the log when I issued:

mount -t btrfs -o rescue=all /dev/sdc1 /mnt

Jun 19 12:43:01 archiso kernel: BTRFS info (device sdc1): flagging fs with big metadata feature
Jun 19 12:43:01 archiso kernel: BTRFS info (device sdc1): enabling all of the rescue options
Jun 19 12:43:01 archiso kernel: BTRFS info (device sdc1): ignoring data csums
Jun 19 12:43:01 archiso kernel: BTRFS info (device sdc1): ignoring bad roots
Jun 19 12:43:01 archiso kernel: BTRFS info (device sdc1): disabling log replay at mount time
Jun 19 12:43:01 archiso kernel: BTRFS error (device sdc1): nologreplay must be used with ro mount option
Jun 19 12:43:01 archiso kernel: BTRFS error (device sdc1): open_ctree failed

Did I need to say:

mount -t btrfs -o ro,rescue=all /dev/sdc1 /mnt
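
Going by the kernel messages above, rescue=all turns on nologreplay ("disabling log replay at mount time"), and nologreplay is refused without ro. A small sketch that always pairs the two when building the command; the helper name is hypothetical and nothing here is btrfs-progs API, it only assembles the same command line shown above.

```python
import shlex

def rescue_mount_args(device, mount_point, rescue="all"):
    """Build a mount command that always combines ro with the rescue option."""
    options = ["ro", f"rescue={rescue}"]
    return ["mount", "-t", "btrfs", "-o", ",".join(options), device, mount_point]

# Print the command instead of running it, so it can be reviewed first.
print(shlex.join(rescue_mount_args("/dev/sdc1", "/mnt")))
```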

D.

-----Original Message-----
From: Qu Wenruo <quwenruo.btrfs@gmx.com> 
Sent: 19 June 2022 12:51
To: David C. Partridge <david.partridge@perdrix.co.uk>; linux-btrfs@vger.kernel.org
Subject: Re: Problems with BTRFS formatted disk



On 2022/6/19 19:14, David C. Partridge wrote:
> LUbuntu 22.04 was definitely 5.15 kernel, what alternative distro do you propose I use?

I have no idea why 22.04 doesn't work here.

The upstream commit is 2b29726c473b ("btrfs: rescue: allow ibadroots to
skip bad extent tree when reading block group items"), which is already
in v5.15 kernels.

I double-checked the current code base: as long as the error is in
reading the block group items and rescue=all (which implies ibadroots)
is set, it should go to fill_dummy_bgs().

For alternative distros, try openSUSE Tumbleweed, Archlinux, etc., as
they definitely track upstream and are v5.15+.

For example, Archlinux 2022.06.01 ships a 5.18 kernel:

$ file arch/boot/x86_64/vmlinuz-linux
arch/boot/x86_64/vmlinuz-linux: Linux kernel x86 boot executable
bzImage, version 5.18.1-arch1-1 (linux@archlinux) #1 SMP PREEMPT_DYNAMIC
Mon, 30 May 2022 17:53:11 +0000, RO-rootFS, swap_dev 0XA, Normal VGA

If that still doesn't work, let me create a similar fs with some block
group items corrupted to see why it doesn't work.

Thanks,
Qu
>
> -----Original Message-----
> From: Qu Wenruo <quwenruo.btrfs@gmx.com>
> Sent: 19 June 2022 11:41
> To: David C. Partridge <david.partridge@perdrix.co.uk>; linux-btrfs@vger.kernel.org
> Subject: Re: Problems with BTRFS formatted disk
>
>
>
> On 2022/6/19 18:29, David C. Partridge wrote:
>> Booted from live USB 22.04 LUbuntu.
>
> The Ubuntu kernel version doesn't seem to be that consistent, even for
> its LTS releases:
>
> https://ubuntu.com/about/release-cycle#ubuntu-kernel-release-cycle
>
> Please use a rolling-release distro/branch instead.
>
> Thanks,
> Qu
>>
>> root@lubuntu:/home/lubuntu# mount -t btrfs -o rescue=all /dev/sdc1 /mnt
>> mount: /mnt: wrong fs type, bad option, bad superblock on /dev/sdc1, missing codepage or helper program, or other error.
>> root@lubuntu:/home/lubuntu#
>>
>> Content of system journal
>>
>> Jun 19 10:08:03 lubuntu kernel: BTRFS info (device sdc1): flagging fs with big metadata feature
>> Jun 19 10:08:03 lubuntu kernel: BTRFS info (device sdc1): disk space caching is enabled
>> Jun 19 10:08:03 lubuntu kernel: BTRFS info (device sdc1): has skinny extents
>> Jun 19 10:08:03 lubuntu kernel: BTRFS error (device sdc1): parent transid verify failed on 12554992156672 wanted 130582 found 127355
>> Jun 19 10:08:03 lubuntu kernel: BTRFS error (device sdc1): parent transid verify failed on 12554992156672 wanted 130582 found 127355
>> Jun 19 10:08:03 lubuntu kernel: BTRFS error (device sdc1): failed to read block groups: -5
>> Jun 19 10:08:03 lubuntu kernel: BTRFS error (device sdc1): open_ctree failed
>>
>> David
>>
>> -----Original Message-----
>> From: Qu Wenruo <quwenruo.btrfs@gmx.com>
>> Sent: 19 June 2022 03:02
>> To: David C. Partridge <david.partridge@perdrix.co.uk>; linux-btrfs@vger.kernel.org
>> Subject: Re: Problems with BTRFS formatted disk
>>
>>>> You can try rescue=all mount option, which has the extra handling on
>>>> corrupted extent tree.
>>>
>>>> Although you have to use kernels newer than v5.15 (including v5.15) to
>>>> benefit from the change.
>>>
>>> Unfortunately:
>>> amonra@charon:~$ uname -a
>>> Linux charon 5.4.0-113-generic #127-Ubuntu SMP Wed May 18 14:30:56 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
>>
>> Any special reason that you can not even use a liveUSB to boot a newer
>> kernel to do the salvage?
>>
>>
>> Thanks,
>> Qu
>>
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Problems with BTRFS formatted disk
  2022-06-19 13:26                 ` David C. Partridge
@ 2022-06-19 13:30                   ` Qu Wenruo
  2022-06-19 14:15                     ` David C. Partridge
  0 siblings, 1 reply; 24+ messages in thread
From: Qu Wenruo @ 2022-06-19 13:30 UTC (permalink / raw)
  To: David C. Partridge, linux-btrfs



On 2022/6/19 21:26, David C. Partridge wrote:
> Aha this is much more interesting:
>
> I issued: mount -t btrfs -o ro,rescue=all /dev/sdc1 /mnt
>
> And got this in the system log:
>
> Jun 19 13:04:32 archiso kernel: BTRFS info (device sdc1): flagging fs with big metadata feature
> Jun 19 13:04:32 archiso kernel: BTRFS info (device sdc1): enabling all of the rescue options
> Jun 19 13:04:32 archiso kernel: BTRFS info (device sdc1): ignoring data csums
> Jun 19 13:04:32 archiso kernel: BTRFS info (device sdc1): ignoring bad roots
> Jun 19 13:04:32 archiso kernel: BTRFS info (device sdc1): disabling log replay at mount time
> Jun 19 13:04:32 archiso kernel: BTRFS info (device sdc1): disk space caching is enabled
> Jun 19 13:04:32 archiso kernel: BTRFS info (device sdc1): has skinny extents
> Jun 19 13:04:32 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 12554992156672 wanted 130582 found 127355
> Jun 19 13:04:32 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 12554992156672 wanted 130582 found 127355
> Jun 19 13:05:12 archiso systemd[1]: dev-virtio\x2dports-org.qemu.guest_agent.0.device: Job dev-virtio\x2dports-org.qemu.guest_agent.0.device/start timed out.
> Jun 19 13:05:12 archiso systemd[1]: Timed out waiting for device /dev/virtio-ports/org.qemu.guest_agent.0.
> Jun 19 13:05:12 archiso systemd[1]: Dependency failed for QEMU Guest Agent.
> Jun 19 13:05:12 archiso systemd[1]: qemu-guest-agent.service: Job qemu-guest-agent.service/start failed with result 'dependency'.
> Jun 19 13:05:12 archiso systemd[1]: dev-virtio\x2dports-org.qemu.guest_agent.0.device: Job dev-virtio\x2dports-org.qemu.guest_agent.0.device/start failed with result 'timeout'.
> Jun 19 13:05:12 archiso systemd[1]: Reached target Multi-User System.
> Jun 19 13:05:12 archiso systemd[1]: Reached target Graphical Interface.
> Jun 19 13:05:12 archiso systemd[1]: Startup finished in 1min 4.847s (firmware) + 4.837s (loader) + 9.433s (kernel) + 1min 31.546s (userspace) = 2min 50.664s.
> Jun 19 13:05:38 archiso kernel: BTRFS info (device sda2): flagging fs with big metadata feature
> Jun 19 13:05:38 archiso kernel: BTRFS info (device sda2): disk space caching is enabled
> Jun 19 13:05:38 archiso kernel: BTRFS info (device sda2): has skinny extents
> Jun 19 13:05:38 archiso kernel: BTRFS info (device sda2): enabling ssd optimizations
>
> ll /mnt got me this:
>
> Jun 19 13:08:13 archiso kernel: verify_parent_transid: 4 callbacks suppressed
> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929

This is definitely not a case of just *some* metadata failing to reach
disk; *tons* of metadata didn't reach disk.

Every expected transid is greater than the found transid.

I'm almost certain the RAID card is doing something incorrect related to
FLUSH.
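
For anyone who wants to check the same thing against a longer journal, here is a minimal sketch that pulls the wanted/found pairs out of the log text and tests whether every failing block is older than its parent expects. The regex only assumes the message format quoted in this thread, and the function names are made up for illustration.

```python
import re

# Matches the kernel message format quoted in this thread.
TRANSID_RE = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)

def summarize_transid_failures(log_text):
    """Return {bytenr: (wanted, found)} for each distinct failing block."""
    failures = {}
    for match in TRANSID_RE.finditer(log_text):
        bytenr, wanted, found = (int(g) for g in match.groups())
        failures[bytenr] = (wanted, found)
    return failures

def all_stale(failures):
    """True if every failing block is older on disk than its parent expects."""
    return bool(failures) and all(w > f for w, f in failures.values())

sample = """\
BTRFS error (device sdc1: state C): parent transid verify failed on 12554992156672 wanted 130582 found 127355
BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
"""
failures = summarize_transid_failures(sample)
for bytenr, (wanted, found) in sorted(failures.items()):
    print(f"{bytenr}: wanted {wanted}, found {found}, behind by {wanted - found}")
print("all stale:", all_stale(failures))
```

Feeding it the output of journalctl -k instead of the two sample lines gives the full list of distinct failing blocks.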

>
> ls: cannot access '/mnt/@': Input/output error
> ls: cannot access '/mnt/@_daily.20220525_00:11:01': Input/output error
> ls: cannot access '/mnt/@_daily.20220526_00:11:01': Input/output error
> ls: cannot access '/mnt/@_hourly.20220526_06:00:01': Input/output error
> ls: cannot access '/mnt/@_hourly.20220526_09:00:01': Input/output error
> ls: cannot access '/mnt/@_hourly.20220526_12:00:01': Input/output error
> total 0
> d????????? ? ?    ?      ?            ? @
> drwxrwxr-x 1 root 1000 204 May 15 16:27 @_daily.20220523_00:11:01
> drwxrwxr-x 1 root 1000 204 May 15 16:27 @_daily.20220524_00:11:01
> d????????? ? ?    ?      ?            ? @_daily.20220525_00:11:01
> d????????? ? ?    ?      ?            ? @_daily.20220526_00:11:01
> d????????? ? ?    ?      ?            ? @_hourly.20220526_06:00:01
> d????????? ? ?    ?      ?            ? @_hourly.20220526_09:00:01
> d????????? ? ?    ?      ?            ? @_hourly.20220526_12:00:01
> drwxrwxr-x 1 root 1000 184 Dec 16  2021 @_weekly.20220424_00:12:01
> drwxrwxr-x 1 root 1000 184 Dec 16  2021 @_weekly.20220508_00:12:01
> drwxrwxr-x 1 root 1000 184 Dec 16  2021 @_weekly.20220515_00:12:01
> drwxrwxr-x 1 root 1000 204 May 15 16:27 @_weekly.20220522_00:12:01
>
> So it appears that there may be recoverable sub-volumes there ...
>
> So if I can remount it rw after having mounted it ro,rescue=all I should be able to delete the broken subvolumes and rename one of the @daily or @weekly ones that appear OK?

Nope, rescue=all really just lets you grab what you can. The fs has so
many transid mismatches that there is definitely no way to save it.

And I strongly recommend doing more testing on that RAID5 card later
(power loss tests).
That card doesn't sound cheap at all, and if such a card doesn't do FLUSH
correctly, the vendor really deserves tons of blame.

Or could it be the HDDs? Would you mind providing the model too?

Thanks,
Qu

>
> Or can I manipulate the subvolumes even if it is mounted ro?
>
> Your guidance will be most welcome
>
> D.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: Problems with BTRFS formatted disk
  2022-06-19 13:30                   ` Qu Wenruo
@ 2022-06-19 14:15                     ` David C. Partridge
  2022-06-19 19:06                       ` Andrei Borzenkov
  2022-06-19 21:40                       ` Qu Wenruo
  0 siblings, 2 replies; 24+ messages in thread
From: David C. Partridge @ 2022-06-19 14:15 UTC (permalink / raw)
  To: 'Qu Wenruo', linux-btrfs

I can't "grab what I can" as I don't have enough TB to copy the data I want to save ☹

Does it make any sense to try:

 mount -o remount,rw /mnt
 btrfs subvolume delete /mnt/@
 btrfs subvolume delete /mnt/@_daily.20220525_00:11:01
 btrfs subvolume delete /mnt/@_daily.20220526_00:11:01
 btrfs subvolume delete /mnt/@_hourly.20220526_06:00:01
 btrfs subvolume delete /mnt/@_hourly.20220526_09:00:01
 btrfs subvolume delete /mnt/@_hourly.20220526_12:00:01

 mv /mnt/@_daily.20220524_00:11:01 /mnt/@

or is that doomed to total failure?

The disks behind the raid card are all Western Digital WD4001FYYG SAS drives

David



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Problems with BTRFS formatted disk
  2022-06-19 14:15                     ` David C. Partridge
@ 2022-06-19 19:06                       ` Andrei Borzenkov
  2022-06-19 20:06                         ` David C. Partridge
  2022-06-19 21:40                       ` Qu Wenruo
  1 sibling, 1 reply; 24+ messages in thread
From: Andrei Borzenkov @ 2022-06-19 19:06 UTC (permalink / raw)
  To: David C. Partridge, 'Qu Wenruo', linux-btrfs

On 19.06.2022 17:15, David C. Partridge wrote:
> I can't "grab what I can" as I don't have enough TB to copy the data I want to save ☹
> 
> Does it make any sense to try:
> 
>  mount -o remount,rw /mnt
>  btrfs subvolume delete /mnt/@
>  btrfs subvolume delete /mnt/@_daily.20220525_00:11:01
>  btrfs subvolume delete /mnt/@_daily.20220526_00:11:01
>  btrfs subvolume delete /mnt/@_hourly.20220526_06:00:01
>  btrfs subvolume delete /mnt/@_hourly.20220526_09:00:01
>  btrfs subvolume delete /mnt/@_hourly.20220526_12:00:01
> 
>  mv /mnt/@_daily.20220524_00:11:01 /mnt/@
> 
> or is that doomed to total failure?
> 
> The disks behind the raid card are all Western Digital WD4001FYYG SAS drives
> 

Is write caching enabled for these disks? I know that it is the default
for some RAID cards (at least, for some profiles).

For disks behind a RAID controller, write caching is normally managed by
the RAID controller itself.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: Problems with BTRFS formatted disk
  2022-06-19 19:06                       ` Andrei Borzenkov
@ 2022-06-19 20:06                         ` David C. Partridge
  2022-06-20  0:38                           ` Qu Wenruo
  0 siblings, 1 reply; 24+ messages in thread
From: David C. Partridge @ 2022-06-19 20:06 UTC (permalink / raw)
  To: 'Qu Wenruo', linux-btrfs

Yes, write caching was enabled. I suspect that the way it worked was that on power failure the super-caps held the cached data until power was restored.

Sadly, power wasn't restored for a few weeks, by which time the super-caps had lost their charge.

I've reconfigured to use write-through.
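
To confirm what the kernel itself believes about the cache mode, the block layer exposes a write_cache attribute under /sys/block/<dev>/queue/ which reads "write back" or "write through". A minimal sketch for reading it follows. Note the assumption: behind a hardware RAID card the controller's own cache policy is set with the vendor's tools and may not be reflected in this attribute, so treat it as a sanity check only. The function name is made up for illustration.

```python
from pathlib import Path

def write_cache_mode(device, sysfs_root="/sys/block"):
    """Return the kernel's view of the device write cache, or None if unknown."""
    attr = Path(sysfs_root) / device / "queue" / "write_cache"
    try:
        return attr.read_text().strip()
    except OSError:
        # Attribute missing or unreadable (old kernel, wrong device name).
        return None
```

For example, write_cache_mode("sdc") would be the call for the whole-disk device behind /dev/sdc1.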



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Problems with BTRFS formatted disk
  2022-06-19 14:15                     ` David C. Partridge
  2022-06-19 19:06                       ` Andrei Borzenkov
@ 2022-06-19 21:40                       ` Qu Wenruo
  2022-06-20  8:17                         ` David C. Partridge
  1 sibling, 1 reply; 24+ messages in thread
From: Qu Wenruo @ 2022-06-19 21:40 UTC (permalink / raw)
  To: David C. Partridge, linux-btrfs



On 2022/6/19 22:15, David C. Partridge wrote:
> I can't "grab what I can" as I don't have enough TB to copy the data I want to save ☹
>
> Does it make any sense to try:
>
>   mount -o remount,rw /mnt

Nope, a remount to RW is rejected outright when the fs is mounted with rescue=all.

>   btrfs subvolume delete /mnt/@
>   btrfs subvolume delete /mnt/@_daily.20220525_00:11:01
>   btrfs subvolume delete /mnt/@_daily.20220526_00:11:01
>   btrfs subvolume delete /mnt/@_hourly.20220526_06:00:01
>   btrfs subvolume delete /mnt/@_hourly.20220526_09:00:01
>   btrfs subvolume delete /mnt/@_hourly.20220526_12:00:01

Deleting them won't help; the transid mismatch affects too many
parts of the fs.
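
To get a feel for how widespread the mismatch is, the "parent transid verify failed" lines from the kernel log can be summarised per block. A minimal Python sketch follows. The function name and the sample list are made up for illustration; the two sample lines are copied from the logs quoted in this thread, and the regex assumes the message format shown there.

```python
import re

# Matches the kernel message format seen in the logs in this thread.
TRANSID_RE = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)

def summarize_transid_errors(log_lines):
    """Map each failing block (bytenr) to (wanted, found, gap, count)."""
    summary = {}
    for line in log_lines:
        m = TRANSID_RE.search(line)
        if not m:
            continue  # not a transid error line
        bytenr, wanted, found = (int(g) for g in m.groups())
        prev = summary.get(bytenr)
        count = prev[3] + 1 if prev else 1
        summary[bytenr] = (wanted, found, wanted - found, count)
    return summary

# Two lines taken from the logs in this thread.
sample = [
    "BTRFS error (device sdc1: state C): parent transid verify failed "
    "on 12554992156672 wanted 130582 found 127355",
    "BTRFS error (device sdc1: state C): parent transid verify failed "
    "on 576192512 wanted 129948 found 122929",
]

for bytenr, (wanted, found, gap, count) in summarize_transid_errors(sample).items():
    print(f"block {bytenr}: wanted {wanted}, found {found}, gap {gap}, seen {count}x")
```

A positive gap on every block means the on-disk metadata is older than what its parent expects, which is consistent with writes that never reached the disk.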

>
>   mv /mnt/@_daily.20220524_00:11:01 /mnt/@
>
> or is that doomed to total failure?

Mostly yes. The only thing you can really do is salvage the data.
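
For the salvage itself, something like the sketch below could copy whatever is still readable from the ro,rescue=all mount and record what fails with an I/O error. It is only an illustration: salvage_tree and the destination path in the usage comment are hypothetical, and plain cp or rsync would do the same job.

```python
import os
import shutil

def salvage_tree(src_root, dst_root):
    """Copy every file under src_root that can still be read into dst_root.

    Returns (copied, failed), two lists of source paths.
    """
    copied, failed = [], []

    def note_dir_error(err):
        # os.walk calls this when a directory cannot be listed (e.g. EIO).
        failed.append(err.filename)

    for dirpath, _dirnames, filenames in os.walk(src_root, onerror=note_dir_error):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.join(dst_root, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            try:
                shutil.copy2(src, os.path.join(target_dir, name))
                copied.append(src)
            except OSError:
                # Unreadable entry: record it and keep going.
                failed.append(src)
    return copied, failed

# Hypothetical usage, copying one of the snapshots that still lists cleanly:
# copied, failed = salvage_tree("/mnt/@_daily.20220524_00:11:01", "/path/to/new/disk")
```

The point of collecting failures instead of aborting is that a single unreadable directory should not stop the rest of the copy.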

Thanks,
Qu

>
> The disks behind the raid card are all Western Digital WD4001FYYG SAS drives
>
> David
>
>
> -----Original Message-----
> From: Qu Wenruo <quwenruo.btrfs@gmx.com>
> Sent: 19 June 2022 14:31
> To: David C. Partridge <david.partridge@perdrix.co.uk>; linux-btrfs@vger.kernel.org
> Subject: Re: Problems with BTRFS formatted disk
>
>
>
> On 2022/6/19 21:26, David C. Partridge wrote:
>> Aha this is much more interesting:
>>
>> I issued: mount -t btrfs -o ro,rescue=all /dev/sdc1 /mnt
>>
>> And got this in the system log:
>>
>> Jun 19 13:04:32 archiso kernel: BTRFS info (device sdc1): flagging fs with big metadata feature
>> Jun 19 13:04:32 archiso kernel: BTRFS info (device sdc1): enabling all of the rescue options
>> Jun 19 13:04:32 archiso kernel: BTRFS info (device sdc1): ignoring data csums
>> Jun 19 13:04:32 archiso kernel: BTRFS info (device sdc1): ignoring bad roots
>> Jun 19 13:04:32 archiso kernel: BTRFS info (device sdc1): disabling log replay at mount time
>> Jun 19 13:04:32 archiso kernel: BTRFS info (device sdc1): disk space caching is enabled
>> Jun 19 13:04:32 archiso kernel: BTRFS info (device sdc1): has skinny extents
>> Jun 19 13:04:32 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 12554992156672 wanted 130582 found 127355
>> Jun 19 13:04:32 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 12554992156672 wanted 130582 found 127355
>> Jun 19 13:05:12 archiso systemd[1]: dev-virtio\x2dports-org.qemu.guest_agent.0.device: Job dev-virtio\x2dports-org.qemu.guest_agent.0.device/start timed out.
>> Jun 19 13:05:12 archiso systemd[1]: Timed out waiting for device /dev/virtio-ports/org.qemu.guest_agent.0.
>> Jun 19 13:05:12 archiso systemd[1]: Dependency failed for QEMU Guest Agent.
>> Jun 19 13:05:12 archiso systemd[1]: qemu-guest-agent.service: Job qemu-guest-agent.service/start failed with result 'dependency'.
>> Jun 19 13:05:12 archiso systemd[1]: dev-virtio\x2dports-org.qemu.guest_agent.0.device: Job dev-virtio\x2dports-org.qemu.guest_agent.0.device/start failed with result 'timeout'.
>> Jun 19 13:05:12 archiso systemd[1]: Reached target Multi-User System.
>> Jun 19 13:05:12 archiso systemd[1]: Reached target Graphical Interface.
>> Jun 19 13:05:12 archiso systemd[1]: Startup finished in 1min 4.847s (firmware) + 4.837s (loader) + 9.433s (kernel) + 1min 31.546s (userspace) = 2min 50.664s.
>> Jun 19 13:05:38 archiso kernel: BTRFS info (device sda2): flagging fs with big metadata feature
>> Jun 19 13:05:38 archiso kernel: BTRFS info (device sda2): disk space caching is enabled
>> Jun 19 13:05:38 archiso kernel: BTRFS info (device sda2): has skinny extents
>> Jun 19 13:05:38 archiso kernel: BTRFS info (device sda2): enabling ssd optimizations
>>
>> ll /mnt got me this:
>>
>> Jun 19 13:08:13 archiso kernel: verify_parent_transid: 4 callbacks suppressed
>> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
>> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
>> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
>> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
>> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
>> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
>> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
>> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
>> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
>> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
>
> This is definitely not just *some* metadata didn't reach disk, but
> *tons* of metadata didn't reach disk.
>
> All expected transid > found transid.
>
> Almost certain the RAID card is doing something incorrectly related to
> FLUSH.
>
>>
>> ls: cannot access '/mnt/@': Input/output error
>> ls: cannot access '/mnt/@_daily.20220525_00:11:01': Input/output error
>> ls: cannot access '/mnt/@_daily.20220526_00:11:01': Input/output error
>> ls: cannot access '/mnt/@_hourly.20220526_06:00:01': Input/output error
>> ls: cannot access '/mnt/@_hourly.20220526_09:00:01': Input/output error
>> ls: cannot access '/mnt/@_hourly.20220526_12:00:01': Input/output error
>> total 0
>> d????????? ? ?    ?      ?            ? @
>> drwxrwxr-x 1 root 1000 204 May 15 16:27 @_daily.20220523_00:11:01
>> drwxrwxr-x 1 root 1000 204 May 15 16:27 @_daily.20220524_00:11:01
>> d????????? ? ?    ?      ?            ? @_daily.20220525_00:11:01
>> d????????? ? ?    ?      ?            ? @_daily.20220526_00:11:01
>> d????????? ? ?    ?      ?            ? @_hourly.20220526_06:00:01
>> d????????? ? ?    ?      ?            ? @_hourly.20220526_09:00:01
>> d????????? ? ?    ?      ?            ? @_hourly.20220526_12:00:01
>> drwxrwxr-x 1 root 1000 184 Dec 16  2021 @_weekly.20220424_00:12:01
>> drwxrwxr-x 1 root 1000 184 Dec 16  2021 @_weekly.20220508_00:12:01
>> drwxrwxr-x 1 root 1000 184 Dec 16  2021 @_weekly.20220515_00:12:01
>> drwxrwxr-x 1 root 1000 204 May 15 16:27 @_weekly.20220522_00:12:01
>>
>> So it appears that there may be recoverable sub-volumes there ...
>>
>> So if I can remount it rw after having mounted it ro,rescue=all I should be able to delete the broken subvolumes and rename one of the @daily or @weekly ones that appear OK?
>
> Nope, rescue=all is really just let you to grab what you can, the fs has
> so many transid mismatch, is definitely no way to save.
>
> And I strongly recommend to do more testing on that RAID5 card later
> (for power loss tests).
> That card doesn't sound cheap at all, and if such card doesn't do FLUSH
> correctly, the vendor really deserve tons of blame.
>
> Or it can be the HDDs? Mind to provide the model too?
>
> Thanks,
> Qu
>
>>
>> Or can I manipulate the subvolumes even if it is mounted ro?
>>
>> Your guidance will be most welcome
>>
>> D.
>>
>> -----Original Message-----
>> From: David C. Partridge <david.partridge@perdrix.co.uk>
>> Sent: 19 June 2022 13:54
>> To: 'Qu Wenruo' <quwenruo.btrfs@gmx.com>; linux-btrfs@vger.kernel.org
>> Subject: RE: Problems with BTRFS formatted disk
>>
>> Here's what the 2022.06.01 version of Archlinux had to say in the log when I issued:
>>
>> mount -t btrfs -o rescue=all /dev/sdc1 /mnt
>>
>> Jun 19 12:43:01 archiso kernel: BTRFS info (device sdc1): flagging fs with big metadata feature
>> Jun 19 12:43:01 archiso kernel: BTRFS info (device sdc1): enabling all of the rescue options
>> Jun 19 12:43:01 archiso kernel: BTRFS info (device sdc1): ignoring data csums
>> Jun 19 12:43:01 archiso kernel: BTRFS info (device sdc1): ignoring bad roots
>> Jun 19 12:43:01 archiso kernel: BTRFS info (device sdc1): disabling log replay at mount time
>> Jun 19 12:43:01 archiso kernel: BTRFS error (device sdc1): nologreplay must be used with ro mount option
>> Jun 19 12:43:01 archiso kernel: BTRFS error (device sdc1): open_ctree failed
>>
>> Did I need to say:
>>
>> mount -t btrfs -o ro,rescue=all /dev/sdc1 /mnt
>>
>> D.
>>
>> -----Original Message-----
>> From: Qu Wenruo <quwenruo.btrfs@gmx.com>
>> Sent: 19 June 2022 12:51
>> To: David C. Partridge <david.partridge@perdrix.co.uk>; linux-btrfs@vger.kernel.org
>> Subject: Re: Problems with BTRFS formatted disk
>>
>>
>>
>> On 2022/6/19 19:14, David C. Partridge wrote:
>>> LUbuntu 22.04 was definitely 5.15 kernel, what alternative distro do you propose I use?
>>
>> I have no idea why 22.04 doesn't work here.
>>
>> The upstream commit is 2b29726c473b ("btrfs: rescue: allow ibadroots to
>> skip bad extent tree when reading block group items"), which is already
>> in v5.15 kernels.
>>
>> I double checked the current code base, as long as it's error reading
>> the block group items and rescue=all (implies ibadroots), it should go
>> fill_dummy_bgs().
>>
>> For the alternative distros, OpenSUSE tumbleweed, Archlinux, etc. As
>> they are definitely upstream and v5.15+.
>>
>> For example, Archlinux 2022.06.01, it goes with 5.18 kernel:
>>
>> $ file arch/boot/x86_64/vmlinuz-linux
>> arch/boot/x86_64/vmlinuz-linux: Linux kernel x86 boot executable
>> bzImage, version 5.18.1-arch1-1 (linux@archlinux) #1 SMP PREEMPT_DYNAMIC
>> Mon, 30 May 2022 17:53:11 +0000, RO-rootFS, swap_dev 0XA, Normal VGA
>>
>> If that still doesn't work, let me creating a similar fs with some block
>> groups items corrupted to see why it doesn't work.
>>
>> Thanks,
>> Qu
>>>
>>> -----Original Message-----
>>> From: Qu Wenruo <quwenruo.btrfs@gmx.com>
>>> Sent: 19 June 2022 11:41
>>> To: David C. Partridge <david.partridge@perdrix.co.uk>; linux-btrfs@vger.kernel.org
>>> Subject: Re: Problems with BTRFS formatted disk
>>>
>>>
>>>
>>> On 2022/6/19 18:29, David C. Partridge wrote:
>>>> Booted from live USB 22.04 LUbuntu.
>>>
>>> Ubuntu kernel version doesn't seem to be that consistent even for its
>>> LTS releases:
>>>
>>> https://ubuntu.com/about/release-cycle#ubuntu-kernel-release-cycle
>>>
>>> Please use something rolling released distro/branch instead.
>>>
>>> Thanks,
>>> Qu
>>>>
>>>> root@lubuntu:/home/lubuntu# mount -t btrfs -o rescue=all /dev/sdc1 /mnt
>>>> mount: /mnt: wrong fs type, bad option, bad superblock on /dev/sdc1, missing codepage or helper program, or other error.
>>>> root@lubuntu:/home/lubuntu#
>>>>
>>>> Content of system journal
>>>>
>>>> Jun 19 10:08:03 lubuntu kernel: BTRFS info (device sdc1): flagging fs with big metadata feature
>>>> Jun 19 10:08:03 lubuntu kernel: BTRFS info (device sdc1): disk space caching is enabled
>>>> Jun 19 10:08:03 lubuntu kernel: BTRFS info (device sdc1): has skinny extents
>>>> Jun 19 10:08:03 lubuntu kernel: BTRFS error (device sdc1): parent transid verify failed on 12554992156672 wanted 130582 found 127355
>>>> Jun 19 10:08:03 lubuntu kernel: BTRFS error (device sdc1): parent transid verify failed on 12554992156672 wanted 130582 found 127355
>>>> Jun 19 10:08:03 lubuntu kernel: BTRFS error (device sdc1): failed to read block groups: -5
>>>> Jun 19 10:08:03 lubuntu kernel: BTRFS error (device sdc1): open_ctree failed
>>>>
>>>> David
>>>>
>>>> -----Original Message-----
>>>> From: Qu Wenruo <quwenruo.btrfs@gmx.com>
>>>> Sent: 19 June 2022 03:02
>>>> To: David C. Partridge <david.partridge@perdrix.co.uk>; linux-btrfs@vger.kernel.org
>>>> Subject: Re: Problems with BTRFS formatted disk
>>>>
>>>>>> You can try rescue=all mount option, which has the extra handling on
>>>>>> corrupted extent tree.
>>>>>
>>>>>> Although you have to use kernels newer than v5.15 (including v5.15) to
>>>>>> benefit from the change.
>>>>>
>>>>> Unfortunately:
>>>>> amonra@charon:~$ uname -a
>>>>> Linux charon 5.4.0-113-generic #127-Ubuntu SMP Wed May 18 14:30:56 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
>>>>
>>>> Any special reason that you can not even use a liveUSB to boot a newer
>>>> kernel to do the salvage?
>>>>
>>>>
>>>> Thanks,
>>>> Qu
>>>>
>>>
>>
>


* Re: Problems with BTRFS formatted disk
  2022-06-19 20:06                         ` David C. Partridge
@ 2022-06-20  0:38                           ` Qu Wenruo
  2022-06-20  1:01                             ` Wang Yugui
  2022-06-20  8:19                             ` David C. Partridge
  0 siblings, 2 replies; 24+ messages in thread
From: Qu Wenruo @ 2022-06-20  0:38 UTC (permalink / raw)
  To: David C. Partridge, linux-btrfs



On 2022/6/20 04:06, David C. Partridge wrote:
> Yes write caching was enabled - I suspect that the way it worked was that on power fail the super-caps held the data until power was restored.
>
> Sadly it wasn't restored for a few weeks by which time the super-caps had lost their charge.

A little off-topic here, and I'm not familiar with how those hardware
RAID controllers work.

Yep, those cards should have (super) caps to handle power loss, but for
SCSI SYNC CACHE commands the spec requires that "the device server ensure that
the specified logical blocks have their most recent data values recorded
in non-volatile cache and/or on the medium."

Considering that in a power loss event the juice in those caps is definitely
not enough to power the HDDs, the card should at least have some
non-volatile cache, like NAND, as backup.

But the sites I could find only state that the card has 1024MiB of cache
memory.

Even with caps to keep the memory alive, that's still far from the
"non-volatile cache" required by the SCSI spec.

Or is it common practice in the hardware RAID controller world to use
volatile cache and break the SCSI spec requirement?

Thanks,
Qu
>
> I've reconfigured to use write through.
>
>
> -----Original Message-----
> From: Andrei Borzenkov <arvidjaar@gmail.com>
> Sent: 19 June 2022 20:06
> To: David C. Partridge <david.partridge@perdrix.co.uk>; 'Qu Wenruo' <quwenruo.btrfs@gmx.com>; linux-btrfs@vger.kernel.org
> Subject: Re: Problems with BTRFS formatted disk
>
> On 19.06.2022 17:15, David C. Partridge wrote:
>> I can't "grab what I can" as I don't have enough TB to copy the data I want to save ☹
>>
>> Does it make any sense to try:
>>
>>   mount -o remount,rw /mnt
>>   btrfs subvolume delete /mnt/@
>>   btrfs subvolume delete /mnt/@_daily.20220525_00:11:01
>>   btrfs subvolume delete /mnt/@_daily.20220526_00:11:01
>>   btrfs subvolume delete /mnt/@_hourly.20220526_06:00:01
>>   btrfs subvolume delete /mnt/@_hourly.20220526_09:00:01
>>   btrfs subvolume delete /mnt/@_hourly.20220526_12:00:01
>>
>>   mv /mnt/@_daily.20220524_00:11:01 /mnt/@
>>
>> or is that doomed to total failure?
>>
>> The disks behind the raid card are all Western Digital WD4001FYYG SAS drives
>>
>
> Is write caching enabled for these disks? I know that it is default for
> some RAID cards (at least, for some profiles).
>
> For disks behind RAID controller write caching is normally managed by
> RAID controller itself.
>


* Re: Problems with BTRFS formatted disk
  2022-06-20  0:38                           ` Qu Wenruo
@ 2022-06-20  1:01                             ` Wang Yugui
  2022-06-20  2:04                               ` Qu Wenruo
  2022-06-20  8:19                             ` David C. Partridge
  1 sibling, 1 reply; 24+ messages in thread
From: Wang Yugui @ 2022-06-20  1:01 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: David C. Partridge, linux-btrfs

Hi,

> On 2022/6/20 04:06, David C. Partridge wrote:
> > Yes write caching was enabled - I suspect that the way it worked was that on power fail the super-caps held the data until power was restored.
> >
> > Sadly it wasn't restored for a few weeks by which time the super-caps had lost their charge.
> 
> A little off-topic here, since I'm not familiar with how those hardware
> RAID controllers work.
> 
> Yep, those cards should have (super) caps to handle power loss, but for
>   SCSI SYNC CACHE commands, they should "the device server ensure that
> the specified logical blocks have their most recent data values recorded
> in non-volatile cache and/or on the medium."
> 
> Considering in a power loss event, the juice in those caps is definitely
> not enough to power those HDDs, it should at least have some
> non-volatile cache, like NAND, as backups.
> 
> But from sites I can found, it only states the card has 1024MiB cache
> memory.
> 
> Even with caps to keep the memory alive, it's still far from
> "non-volatile cache" required by SCSI spec.
> 
> Or is this a common practice in hardware RAID controller world to use
> volatile cache and break the SCSI spec requirement?


"Non-volatile cache" requires a battery (internal or separate) and
backup flash (internal or separate).

For Microsemi Adaptec,
the battery and backup flash are in the separate 'Flash Backup Module
AFM-700'.

For Broadcom MegaRAID 9480/9580,
the backup flash is on the card,
but the battery is in the separate CacheVault CVPM05.

For Dell, the H730/H740 have battery and backup flash built in,
but the H330 has no battery and one cannot be added.

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2022/06/20




* Re: Problems with BTRFS formatted disk
  2022-06-20  1:01                             ` Wang Yugui
@ 2022-06-20  2:04                               ` Qu Wenruo
  2022-06-20  2:12                                 ` Wang Yugui
  0 siblings, 1 reply; 24+ messages in thread
From: Qu Wenruo @ 2022-06-20  2:04 UTC (permalink / raw)
  To: Wang Yugui; +Cc: David C. Partridge, linux-btrfs



On 2022/6/20 09:01, Wang Yugui wrote:
> Hi,
>
>> On 2022/6/20 04:06, David C. Partridge wrote:
>>> Yes write caching was enabled - I suspect that the way it worked was that on power fail the super-caps held the data until power was restored.
>>>
>>> Sadly it wasn't restored for a few weeks by which time the super-caps had lost their charge.
>>
>> A little off-topic here, since I'm not familiar with how those hardware
>> RAID controllers work.
>>
>> Yep, those cards should have (super) caps to handle power loss, but for
>>    SCSI SYNC CACHE commands, they should "the device server ensure that
>> the specified logical blocks have their most recent data values recorded
>> in non-volatile cache and/or on the medium."
>>
>> Considering in a power loss event, the juice in those caps is definitely
>> not enough to power those HDDs, it should at least have some
>> non-volatile cache, like NAND, as backups.
>>
>> But from sites I can found, it only states the card has 1024MiB cache
>> memory.
>>
>> Even with caps to keep the memory alive, it's still far from
>> "non-volatile cache" required by SCSI spec.
>>
>> Or is this a common practice in hardware RAID controller world to use
>> volatile cache and break the SCSI spec requirement?
>
>
> "non-volatile cache"  require a battery(inside or separated) and
> Backup Flash(inside or separated).

Then comes the question: why is a battery needed if the
non-volatile cache is NAND flash?
NAND doesn't need power to keep its data.

My guess is that NAND flash is not fast enough, so RAM is used
as the primary cache, and only when power loss happens is the
battery used to power the RAM long enough to dump its contents
into the backup flash?

Thanks,
Qu

>
> For Microsemi Adaptec,
> the battery and Backup Flash are in separated 'Flash Backup Module
> AFM-700'.
>
> For broadcom MegaRAID 9480/9580,
> Backup Flash is inside the card,
> but the battery is in separated CacheVault CVPM05.
>
> For Dell, H730/H740 have battery/backup flash inside,
> but H330 have no battery inside and we can not add battery to H330.
>
> Best Regards
> Wang Yugui (wangyugui@e16-tech.com)
> 2022/06/20
>
>


* Re: Problems with BTRFS formatted disk
  2022-06-20  2:04                               ` Qu Wenruo
@ 2022-06-20  2:12                                 ` Wang Yugui
  2022-06-20  2:12                                   ` Qu Wenruo
  0 siblings, 1 reply; 24+ messages in thread
From: Wang Yugui @ 2022-06-20  2:12 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: David C. Partridge, linux-btrfs

Hi,

> On 2022/6/20 09:01, Wang Yugui wrote:
> > Hi,
> >
> >> On 2022/6/20 04:06, David C. Partridge wrote:
> >>> Yes write caching was enabled - I suspect that the way it worked was that on power fail the super-caps held the data until power was restored.
> >>>
> >>> Sadly it wasn't restored for a few weeks by which time the super-caps had lost their charge.
> >>
> >> A little off-topic here, since I'm not familiar with how those hardware
> >> RAID controllers work.
> >>
> >> Yep, those cards should have (super) caps to handle power loss, but for
> >>    SCSI SYNC CACHE commands, they should "the device server ensure that
> >> the specified logical blocks have their most recent data values recorded
> >> in non-volatile cache and/or on the medium."
> >>
> >> Considering in a power loss event, the juice in those caps is definitely
> >> not enough to power those HDDs, it should at least have some
> >> non-volatile cache, like NAND, as backups.
> >>
> >> But from sites I can found, it only states the card has 1024MiB cache
> >> memory.
> >>
> >> Even with caps to keep the memory alive, it's still far from
> >> "non-volatile cache" required by SCSI spec.
> >>
> >> Or is this a common practice in hardware RAID controller world to use
> >> volatile cache and break the SCSI spec requirement?
> >
> >
> > "non-volatile cache"  require a battery(inside or separated) and
> > Backup Flash(inside or separated).
> 
> Then it comes the question, why there is need for battery if our
> non-volatile cache is NAND flash?
> Since NAND doesn't need power to keep its data.
> 
> My guess is, NAND flash is not fast enough so they have battery for RAM
> as the primary cache, and only when power loss happen, then use the
> battery to power the RAM so that we have enough time to dump the content
> of RAM into the backup flash?

Yes.

RAM is faster than NAND,
and NAND has a limit on total bytes written.
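
The arrangement discussed above (RAM as the primary cache, with backup power used only to dump it to flash on power loss) can be sketched as a toy model. This is not based on any vendor's firmware; the class, method and function names are invented for illustration, and real controllers are far more involved.

```python
class WriteBackCache:
    """Toy model of a RAID card's write-back cache (illustrative only)."""

    def __init__(self, has_backup_flash, backup_power_ok):
        self.has_backup_flash = has_backup_flash
        self.backup_power_ok = backup_power_ok  # battery/super-cap still charged?
        self.ram = {}    # volatile primary cache: block -> data
        self.flash = {}  # non-volatile backup flash
        self.disk = {}   # the medium itself

    def write(self, block, data):
        # The host sees the write as complete once it is in RAM.
        self.ram[block] = data

    def flush(self):
        # Push all cached blocks to the medium.
        self.disk.update(self.ram)
        self.ram.clear()

    def power_loss(self):
        # Backup power only has to last long enough to dump RAM to flash.
        if self.has_backup_flash and self.backup_power_ok:
            self.flash.update(self.ram)
        self.ram.clear()  # RAM contents are gone either way

    def power_restore(self):
        # Replay whatever was saved to flash onto the disks.
        self.disk.update(self.flash)
        self.flash.clear()


def data_survives(has_backup_flash, backup_power_ok):
    """Return True if a cached write is on disk after a power cycle."""
    card = WriteBackCache(has_backup_flash, backup_power_ok)
    card.write(0, b"metadata")
    card.power_loss()
    card.power_restore()
    return card.disk.get(0) == b"metadata"
```

In this model a cached write only survives a power cycle when both the backup flash and working backup power are present; with RAM kept alive by caps alone, the data is lost once the caps run down.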



* Re: Problems with BTRFS formatted disk
  2022-06-20  2:12                                 ` Wang Yugui
@ 2022-06-20  2:12                                   ` Qu Wenruo
  0 siblings, 0 replies; 24+ messages in thread
From: Qu Wenruo @ 2022-06-20  2:12 UTC (permalink / raw)
  To: Wang Yugui; +Cc: David C. Partridge, linux-btrfs



On 2022/6/20 10:12, Wang Yugui wrote:
> Hi,
>
>> On 2022/6/20 09:01, Wang Yugui wrote:
>>> Hi,
>>>
>>>> On 2022/6/20 04:06, David C. Partridge wrote:
>>>>> Yes write caching was enabled - I suspect that the way it worked was that on power fail the super-caps held the data until power was restored.
>>>>>
>>>>> Sadly it wasn't restored for a few weeks by which time the super-caps had lost their charge.
>>>>
>>>> A little off-topic here, since I'm not familiar with how those hardware
>>>> RAID controllers work.
>>>>
>>>> Yep, those cards should have (super) caps to handle power loss, but for
>>>>     SCSI SYNC CACHE commands, they should "the device server ensure that
>>>> the specified logical blocks have their most recent data values recorded
>>>> in non-volatile cache and/or on the medium."
>>>>
>>>> Considering in a power loss event, the juice in those caps is definitely
>>>> not enough to power those HDDs, it should at least have some
>>>> non-volatile cache, like NAND, as backups.
>>>>
>>>> But from sites I can found, it only states the card has 1024MiB cache
>>>> memory.
>>>>
>>>> Even with caps to keep the memory alive, it's still far from
>>>> "non-volatile cache" required by SCSI spec.
>>>>
>>>> Or is this a common practice in hardware RAID controller world to use
>>>> volatile cache and break the SCSI spec requirement?
>>>
>>>
>>> "non-volatile cache"  require a battery(inside or separated) and
>>> Backup Flash(inside or separated).
>>
>> Then it comes the question, why there is need for battery if our
>> non-volatile cache is NAND flash?
>> Since NAND doesn't need power to keep its data.
>>
>> My guess is, NAND flash is not fast enough so they have battery for RAM
>> as the primary cache, and only when power loss happen, then use the
>> battery to power the RAM so that we have enough time to dump the content
>> of RAM into the backup flash?
>
> Yes.
>
> RAM/memory is fast than NAND.
> and NAND have the limit of total write bytes.
>
OK, got it.

Thanks,
Qu


* RE: Problems with BTRFS formatted disk
  2022-06-19 21:40                       ` Qu Wenruo
@ 2022-06-20  8:17                         ` David C. Partridge
  0 siblings, 0 replies; 24+ messages in thread
From: David C. Partridge @ 2022-06-20  8:17 UTC (permalink / raw)
  To: 'Qu Wenruo', linux-btrfs

OK 16TB USB disk on order ...

-----Original Message-----
From: Qu Wenruo <quwenruo.btrfs@gmx.com> 
Sent: 19 June 2022 22:40
To: David C. Partridge <david.partridge@perdrix.co.uk>; linux-btrfs@vger.kernel.org
Subject: Re: Problems with BTRFS formatted disk



On 2022/6/19 22:15, David C. Partridge wrote:
> I can't "grab what I can" as I don't have enough TB to copy the data I want to save ☹
>
> Does it make any sense to try:
>
>   mount -o remount,rw /mnt

Nope, remount RW will be completely rejected for rescue=all case.

>   btrfs subvolume delete /mnt/@
>   btrfs subvolume delete /mnt/@_daily.20220525_00:11:01
>   btrfs subvolume delete /mnt/@_daily.20220526_00:11:01
>   btrfs subvolume delete /mnt/@_hourly.20220526_06:00:01
>   btrfs subvolume delete /mnt/@_hourly.20220526_09:00:01
>   btrfs subvolume delete /mnt/@_hourly.20220526_12:00:01

Deleting them won't help, the transid mismatch is affecting too many
parts of the fs.

>
>   mv /mnt/@_daily.20220524_00:11:01 /mnt/@
>
> or is that doomed to total failure?

Mostly yes. Thus the only thing can do is really data salvage.

Thanks,
Qu

>
> The disks behind the raid card are all Western Digital WD4001FYYG SAS drives
>
> David
>
>
> -----Original Message-----
> From: Qu Wenruo <quwenruo.btrfs@gmx.com>
> Sent: 19 June 2022 14:31
> To: David C. Partridge <david.partridge@perdrix.co.uk>; linux-btrfs@vger.kernel.org
> Subject: Re: Problems with BTRFS formatted disk
>
>
>
> On 2022/6/19 21:26, David C. Partridge wrote:
>> Aha this is much more interesting:
>>
>> I issued: mount -t btrfs -o ro,rescue=all /dev/sdc1 /mnt
>>
>> And got this in the system log:
>>
>> Jun 19 13:04:32 archiso kernel: BTRFS info (device sdc1): flagging fs with big metadata feature
>> Jun 19 13:04:32 archiso kernel: BTRFS info (device sdc1): enabling all of the rescue options
>> Jun 19 13:04:32 archiso kernel: BTRFS info (device sdc1): ignoring data csums
>> Jun 19 13:04:32 archiso kernel: BTRFS info (device sdc1): ignoring bad roots
>> Jun 19 13:04:32 archiso kernel: BTRFS info (device sdc1): disabling log replay at mount time
>> Jun 19 13:04:32 archiso kernel: BTRFS info (device sdc1): disk space caching is enabled
>> Jun 19 13:04:32 archiso kernel: BTRFS info (device sdc1): has skinny extents
>> Jun 19 13:04:32 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 12554992156672 wanted 130582 found 127355
>> Jun 19 13:04:32 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 12554992156672 wanted 130582 found 127355
>> Jun 19 13:05:12 archiso systemd[1]: dev-virtio\x2dports-org.qemu.guest_agent.0.device: Job dev-virtio\x2dports-org.qemu.guest_agent.0.device/start timed out.
>> Jun 19 13:05:12 archiso systemd[1]: Timed out waiting for device /dev/virtio-ports/org.qemu.guest_agent.0.
>> Jun 19 13:05:12 archiso systemd[1]: Dependency failed for QEMU Guest Agent.
>> Jun 19 13:05:12 archiso systemd[1]: qemu-guest-agent.service: Job qemu-guest-agent.service/start failed with result 'dependency'.
>> Jun 19 13:05:12 archiso systemd[1]: dev-virtio\x2dports-org.qemu.guest_agent.0.device: Job dev-virtio\x2dports-org.qemu.guest_agent.0.device/start failed with result 'timeout'.
>> Jun 19 13:05:12 archiso systemd[1]: Reached target Multi-User System.
>> Jun 19 13:05:12 archiso systemd[1]: Reached target Graphical Interface.
>> Jun 19 13:05:12 archiso systemd[1]: Startup finished in 1min 4.847s (firmware) + 4.837s (loader) + 9.433s (kernel) + 1min 31.546s (userspace) = 2min 50.664s.
>> Jun 19 13:05:38 archiso kernel: BTRFS info (device sda2): flagging fs with big metadata feature
>> Jun 19 13:05:38 archiso kernel: BTRFS info (device sda2): disk space caching is enabled
>> Jun 19 13:05:38 archiso kernel: BTRFS info (device sda2): has skinny extents
>> Jun 19 13:05:38 archiso kernel: BTRFS info (device sda2): enabling ssd optimizations
>>
>> ll /mnt got me this:
>>
>> Jun 19 13:08:13 archiso kernel: verify_parent_transid: 4 callbacks suppressed
>> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
>> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
>> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
>> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
>> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
>> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
>> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
>> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
>> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
>> Jun 19 13:08:13 archiso kernel: BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
>
> This is definitely not just *some* metadata didn't reach disk, but
> *tons* of metadata didn't reach disk.
>
> All expected transid > found transid.
>
> Almost certain the RAID card is doing something incorrectly related to
> FLUSH.
>
>>
>> ls: cannot access '/mnt/@': Input/output error
>> ls: cannot access '/mnt/@_daily.20220525_00:11:01': Input/output error
>> ls: cannot access '/mnt/@_daily.20220526_00:11:01': Input/output error
>> ls: cannot access '/mnt/@_hourly.20220526_06:00:01': Input/output error
>> ls: cannot access '/mnt/@_hourly.20220526_09:00:01': Input/output error
>> ls: cannot access '/mnt/@_hourly.20220526_12:00:01': Input/output error
>> total 0
>> d????????? ? ?    ?      ?            ? @
>> drwxrwxr-x 1 root 1000 204 May 15 16:27 @_daily.20220523_00:11:01
>> drwxrwxr-x 1 root 1000 204 May 15 16:27 @_daily.20220524_00:11:01
>> d????????? ? ?    ?      ?            ? @_daily.20220525_00:11:01
>> d????????? ? ?    ?      ?            ? @_daily.20220526_00:11:01
>> d????????? ? ?    ?      ?            ? @_hourly.20220526_06:00:01
>> d????????? ? ?    ?      ?            ? @_hourly.20220526_09:00:01
>> d????????? ? ?    ?      ?            ? @_hourly.20220526_12:00:01
>> drwxrwxr-x 1 root 1000 184 Dec 16  2021 @_weekly.20220424_00:12:01
>> drwxrwxr-x 1 root 1000 184 Dec 16  2021 @_weekly.20220508_00:12:01
>> drwxrwxr-x 1 root 1000 184 Dec 16  2021 @_weekly.20220515_00:12:01
>> drwxrwxr-x 1 root 1000 204 May 15 16:27 @_weekly.20220522_00:12:01
>>
>> So it appears that there may be recoverable sub-volumes there ...
>>
>> So if I can remount it rw after having mounted it ro,rescue=all I should be able to delete the broken subvolumes and rename one of the @daily or @weekly ones that appear OK?
>
> Nope, rescue=all really just lets you grab what you can. The fs has
> so many transid mismatches that there is definitely no way to save it.
>
> And I strongly recommend doing more testing on that RAID5 card later
> (power loss tests).
> That card doesn't sound cheap at all, and if such a card doesn't handle
> FLUSH correctly, the vendor really deserves tons of blame.
>
> Or could it be the HDDs? Mind providing the model too?
>
> Thanks,
> Qu
>
>>
>> Or can I manipulate the subvolumes even if it is mounted ro?
>>
>> Your guidance will be most welcome
>>
>> D.
>>
>> -----Original Message-----
>> From: David C. Partridge <david.partridge@perdrix.co.uk>
>> Sent: 19 June 2022 13:54
>> To: 'Qu Wenruo' <quwenruo.btrfs@gmx.com>; linux-btrfs@vger.kernel.org
>> Subject: RE: Problems with BTRFS formatted disk
>>
>> Here's what the 2022.06.01 version of Archlinux had to say in the log when I issued:
>>
>> mount -t btrfs -o rescue=all /dev/sdc1 /mnt
>>
>> Jun 19 12:43:01 archiso kernel: BTRFS info (device sdc1): flagging fs with big metadata feature
>> Jun 19 12:43:01 archiso kernel: BTRFS info (device sdc1): enabling all of the rescue options
>> Jun 19 12:43:01 archiso kernel: BTRFS info (device sdc1): ignoring data csums
>> Jun 19 12:43:01 archiso kernel: BTRFS info (device sdc1): ignoring bad roots
>> Jun 19 12:43:01 archiso kernel: BTRFS info (device sdc1): disabling log replay at mount time
>> Jun 19 12:43:01 archiso kernel: BTRFS error (device sdc1): nologreplay must be used with ro mount option
>> Jun 19 12:43:01 archiso kernel: BTRFS error (device sdc1): open_ctree failed
>>
>> Did I need to say:
>>
>> mount -t btrfs -o ro,rescue=all /dev/sdc1 /mnt
>>
>> D.
>>
>> -----Original Message-----
>> From: Qu Wenruo <quwenruo.btrfs@gmx.com>
>> Sent: 19 June 2022 12:51
>> To: David C. Partridge <david.partridge@perdrix.co.uk>; linux-btrfs@vger.kernel.org
>> Subject: Re: Problems with BTRFS formatted disk
>>
>>
>>
>> On 2022/6/19 19:14, David C. Partridge wrote:
>>> LUbuntu 22.04 was definitely 5.15 kernel, what alternative distro do you propose I use?
>>
>> I have no idea why 22.04 doesn't work here.
>>
>> The upstream commit is 2b29726c473b ("btrfs: rescue: allow ibadroots to
>> skip bad extent tree when reading block group items"), which is already
>> in v5.15 kernels.
>>
>> I double checked the current code base: as long as the error comes from
>> reading the block group items and rescue=all (which implies ibadroots)
>> is used, it should go to fill_dummy_bgs().
>>
>> As for alternative distros: OpenSUSE Tumbleweed, Archlinux, etc., since
>> they definitely track upstream and are v5.15+.
>>
>> For example, Archlinux 2022.06.01 ships a 5.18 kernel:
>>
>> $ file arch/boot/x86_64/vmlinuz-linux
>> arch/boot/x86_64/vmlinuz-linux: Linux kernel x86 boot executable
>> bzImage, version 5.18.1-arch1-1 (linux@archlinux) #1 SMP PREEMPT_DYNAMIC
>> Mon, 30 May 2022 17:53:11 +0000, RO-rootFS, swap_dev 0XA, Normal VGA
>>
>> If that still doesn't work, let me try creating a similar fs with some
>> block group items corrupted to see why it doesn't work.
>>
>> Thanks,
>> Qu
>>>
>>> -----Original Message-----
>>> From: Qu Wenruo <quwenruo.btrfs@gmx.com>
>>> Sent: 19 June 2022 11:41
>>> To: David C. Partridge <david.partridge@perdrix.co.uk>; linux-btrfs@vger.kernel.org
>>> Subject: Re: Problems with BTRFS formatted disk
>>>
>>>
>>>
>>> On 2022/6/19 18:29, David C. Partridge wrote:
>>>> Booted from live USB 22.04 LUbuntu.
>>>
>>> The Ubuntu kernel version doesn't seem to be that consistent, even for
>>> its LTS releases:
>>>
>>> https://ubuntu.com/about/release-cycle#ubuntu-kernel-release-cycle
>>>
>>> Please use a rolling-release distro/branch instead.
>>>
>>> Thanks,
>>> Qu
>>>>
>>>> root@lubuntu:/home/lubuntu# mount -t btrfs -o rescue=all /dev/sdc1 /mnt
>>>> mount: /mnt: wrong fs type, bad option, bad superblock on /dev/sdc1, missing codepage or helper program, or other error.
>>>> root@lubuntu:/home/lubuntu#
>>>>
>>>> Content of system journal
>>>>
>>>> Jun 19 10:08:03 lubuntu kernel: BTRFS info (device sdc1): flagging fs with big metadata feature
>>>> Jun 19 10:08:03 lubuntu kernel: BTRFS info (device sdc1): disk space caching is enabled
>>>> Jun 19 10:08:03 lubuntu kernel: BTRFS info (device sdc1): has skinny extents
>>>> Jun 19 10:08:03 lubuntu kernel: BTRFS error (device sdc1): parent transid verify failed on 12554992156672 wanted 130582 found 127355
>>>> Jun 19 10:08:03 lubuntu kernel: BTRFS error (device sdc1): parent transid verify failed on 12554992156672 wanted 130582 found 127355
>>>> Jun 19 10:08:03 lubuntu kernel: BTRFS error (device sdc1): failed to read block groups: -5
>>>> Jun 19 10:08:03 lubuntu kernel: BTRFS error (device sdc1): open_ctree failed
>>>>
>>>> David
>>>>
>>>> -----Original Message-----
>>>> From: Qu Wenruo <quwenruo.btrfs@gmx.com>
>>>> Sent: 19 June 2022 03:02
>>>> To: David C. Partridge <david.partridge@perdrix.co.uk>; linux-btrfs@vger.kernel.org
>>>> Subject: Re: Problems with BTRFS formatted disk
>>>>
>>>>>> You can try the rescue=all mount option, which has extra handling
>>>>>> for a corrupted extent tree.
>>>>>
>>>>>> Although you have to use kernel v5.15 or newer to benefit from the
>>>>>> change.
>>>>>
>>>>> Unfortunately:
>>>>> amonra@charon:~$ uname -a
>>>>> Linux charon 5.4.0-113-generic #127-Ubuntu SMP Wed May 18 14:30:56 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
>>>>
>>>> Any special reason that you can not even use a liveUSB to boot a newer
>>>> kernel to do the salvage?
>>>>
>>>>
>>>> Thanks,
>>>> Qu
>>>>
>>>
>>
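The quoted advice above hinges on the running kernel being v5.15 or newer before trying rescue=all. A minimal sketch of that comparison, assuming a release string in the form printed by `uname -r`; the helper name `kernel_at_least` is hypothetical and not part of any btrfs tool:

```python
import re


def kernel_at_least(release, major, minor):
    """Return True if a `uname -r` style release string is >= major.minor.

    Hypothetical helper for illustration only. It compares just the leading
    "X.Y" of the release string and ignores distro suffixes.
    """
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(m.group(1)), int(m.group(2))) >= (major, minor)


# Release strings taken from the messages quoted in this thread.
print(kernel_at_least("5.4.0-113-generic", 5, 15))  # installed Ubuntu kernel
print(kernel_at_least("5.18.1-arch1-1", 5, 15))     # Archlinux 2022.06.01 ISO
```

On a live system the release string could come from `platform.release()`. The version is a necessary condition only: as the thread shows, the mount from the LUbuntu 22.04 live USB still failed, and rescue=all also has to be combined with ro.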
>
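The observation quoted above, that every expected transid is greater than the found one, can be checked mechanically against a saved journal. A minimal sketch, assuming the log text uses the "parent transid verify failed on ... wanted ... found ..." wording seen in this thread; the function names are hypothetical:

```python
import re

# Tolerates the line wrapping that mail clients introduce between the words.
TRANSID_RE = re.compile(
    r"parent\s+transid\s+verify\s+failed\s+on\s+(\d+)\s+wanted\s+(\d+)\s+found\s+(\d+)"
)


def parse_transid_errors(log_text):
    """Return unique (bytenr, wanted, found) tuples in order of appearance."""
    seen = []
    for m in TRANSID_RE.finditer(log_text):
        entry = tuple(int(g) for g in m.groups())
        if entry not in seen:
            seen.append(entry)
    return seen


def all_stale(entries):
    """True if every reported block is older than its parent expects."""
    return bool(entries) and all(found < wanted for _, wanted, found in entries)


# Two lines copied from the logs quoted in this thread.
sample = """\
BTRFS error (device sdc1: state C): parent transid verify failed on 576192512 wanted 129948 found 122929
BTRFS error (device sdb1): parent transid verify failed on 12554992156672 wanted 130582 found 127355
"""

entries = parse_transid_errors(sample)
for bytenr, wanted, found in entries:
    print(f"block {bytenr}: {wanted - found} generations behind")
print("all stale:", all_stale(entries))
```

A result where every block is stale is consistent with writes being lost below the filesystem, which is the reading given in the reply. The script itself cannot tell whether the controller or the drives dropped them.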



* RE: Problems with BTRFS formatted disk
  2022-06-20  0:38                           ` Qu Wenruo
  2022-06-20  1:01                             ` Wang Yugui
@ 2022-06-20  8:19                             ` David C. Partridge
  1 sibling, 0 replies; 24+ messages in thread
From: David C. Partridge @ 2022-06-20  8:19 UTC (permalink / raw)
  To: 'Qu Wenruo', linux-btrfs

I don't know ... The Super-Caps only power the cache memory ...

-----Original Message-----
From: Qu Wenruo <quwenruo.btrfs@gmx.com> 
Sent: 20 June 2022 01:38
To: David C. Partridge <david.partridge@perdrix.co.uk>; linux-btrfs@vger.kernel.org
Subject: Re: Problems with BTRFS formatted disk



On 2022/6/20 04:06, David C. Partridge wrote:
> Yes write caching was enabled - I suspect that the way it worked was that on power fail the super-caps held the data until power was restored.
>
> Sadly it wasn't restored for a few weeks by which time the super-caps had lost their charge.

A little off-topic here, since I'm not familiar with how those hardware
RAID controllers work.

Yep, those cards should have (super) caps to handle power loss, but for
SCSI SYNC CACHE commands the spec requires that "the device server ensure
that the specified logical blocks have their most recent data values
recorded in non-volatile cache and/or on the medium."

Considering that in a power loss event the juice in those caps is
definitely not enough to power the HDDs, the card should at least have
some non-volatile cache, like NAND, as backup.

But the sites I can find only state that the card has 1024MiB of cache
memory.

Even with caps to keep the memory alive, that is still far from the
"non-volatile cache" required by the SCSI spec.

Or is it common practice in the hardware RAID controller world to use
volatile cache and break the SCSI spec requirement?

Thanks,
Qu
>
> I've reconfigured to use write through.
>
>
> -----Original Message-----
> From: Andrei Borzenkov <arvidjaar@gmail.com>
> Sent: 19 June 2022 20:06
> To: David C. Partridge <david.partridge@perdrix.co.uk>; 'Qu Wenruo' <quwenruo.btrfs@gmx.com>; linux-btrfs@vger.kernel.org
> Subject: Re: Problems with BTRFS formatted disk
>
> On 19.06.2022 17:15, David C. Partridge wrote:
>> I can't "grab what I can" as I don't have enough TB to copy the data I want to save ☹
>>
>> Does it make any sense to try:
>>
>>   mount -o remount,rw /mnt
>>   btrfs subvolume delete /mnt/@
>>   btrfs subvolume delete /mnt/@_daily.20220525_00:11:01
>>   btrfs subvolume delete /mnt/@_daily.20220526_00:11:01
>>   btrfs subvolume delete /mnt/@_hourly.20220526_06:00:01
>>   btrfs subvolume delete /mnt/@_hourly.20220526_09:00:01
>>   btrfs subvolume delete /mnt/@_hourly.20220526_12:00:01
>>
>>   mv /mnt/@_daily.20220524_00:11:01 /mnt/@
>>
>> or is that doomed to total failure?
>>
>> The disks behind the raid card are all Western Digital WD4001FYYG SAS drives
>>
>
> Is write caching enabled for these disks? I know that it is default for
> some RAID cards (at least, for some profiles).
>
> For disks behind RAID controller write caching is normally managed by
> RAID controller itself.
>
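After switching the controller to write-through, one way to see which cache mode the kernel believes the block device has is the `queue/write_cache` attribute in sysfs, which reads "write back" or "write through". A minimal sketch, assuming a Linux host; the helper names and the `sysfs_root` parameter are hypothetical and exist only to keep the example testable:

```python
from pathlib import Path


def write_cache_mode(device, sysfs_root="/sys/block"):
    """Read the cache mode the kernel reports for a block device, e.g. "sdc".

    Assumption: the device exposes queue/write_cache under sysfs_root.
    """
    path = Path(sysfs_root) / device / "queue" / "write_cache"
    return path.read_text().strip()


def is_write_back(mode):
    """True if the reported mode is write-back, i.e. flushes are needed."""
    return mode.strip().lower() == "write back"


# Classify the two values the attribute can hold.
for mode in ("write back", "write through"):
    print(mode, "->", "needs FLUSH" if is_write_back(mode) else "no volatile cache reported")
```

Calling `write_cache_mode("sdc")` on the affected machine would show what the controller advertises for the logical volume. That is only what the controller reports to the kernel; it does not show the controller's internal policy or the cache setting of the individual drives behind it, which need the vendor's management tool.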



Thread overview: 24+ messages
2022-06-18 18:55 Problems with BTRFS formatted disk David C. Partridge
2022-06-18 23:00 ` Qu Wenruo
2022-06-19  1:33   ` David C. Partridge
2022-06-19  2:01     ` Qu Wenruo
2022-06-19 10:29       ` David C. Partridge
2022-06-19 10:40         ` Qu Wenruo
2022-06-19 11:14           ` David C. Partridge
2022-06-19 11:51             ` Qu Wenruo
2022-06-19 12:53               ` David C. Partridge
2022-06-19 13:21                 ` Qu Wenruo
2022-06-19 13:26                 ` David C. Partridge
2022-06-19 13:30                   ` Qu Wenruo
2022-06-19 14:15                     ` David C. Partridge
2022-06-19 19:06                       ` Andrei Borzenkov
2022-06-19 20:06                         ` David C. Partridge
2022-06-20  0:38                           ` Qu Wenruo
2022-06-20  1:01                             ` Wang Yugui
2022-06-20  2:04                               ` Qu Wenruo
2022-06-20  2:12                                 ` Wang Yugui
2022-06-20  2:12                                   ` Qu Wenruo
2022-06-20  8:19                             ` David C. Partridge
2022-06-19 21:40                       ` Qu Wenruo
2022-06-20  8:17                         ` David C. Partridge
2022-06-19  1:37   ` David C. Partridge
