linux-btrfs.vger.kernel.org archive mirror
* BTRFS errors, and won't mount
@ 2019-10-04  6:59 Patrick Dijkgraaf
  2019-10-04  7:17 ` Patrick Dijkgraaf
  2019-10-04  7:22 ` Qu Wenruo
  0 siblings, 2 replies; 7+ messages in thread
From: Patrick Dijkgraaf @ 2019-10-04  6:59 UTC (permalink / raw)
  To: linux-btrfs

Hi guys,

During the night, I started getting the following errors and data was
no longer accessible:

[Fri Oct  4 08:04:26 2019] btree_readpage_end_io_hook: 2522 callbacks
suppressed
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
start 17686343003259060482 7808404996096
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
start 254095834002432 7808404996096
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
start 2574563607252646368 7808404996096
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
start 17873260189421384017 7808404996096
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
start 9965805624054187110 7808404996096
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
start 15108378087789580224 7808404996096
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
start 7914705769619568652 7808404996096
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
start 16752645757091223687 7808404996096
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
start 9617669583708276649 7808404996096
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
start 3384408928046898608 7808404996096
[Fri Oct  4 08:04:26 2019] btrfs_dev_stat_print_on_error: 159 callbacks
suppressed
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
errs: wr 7, rd 174280, flush 0, corrupt 0, gen 0
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
errs: wr 7, rd 174281, flush 0, corrupt 0, gen 0
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
errs: wr 7, rd 174282, flush 0, corrupt 0, gen 0
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
errs: wr 7, rd 174283, flush 0, corrupt 0, gen 0
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
errs: wr 7, rd 174284, flush 0, corrupt 0, gen 0
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
errs: wr 7, rd 174285, flush 0, corrupt 0, gen 0
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
errs: wr 7, rd 174286, flush 0, corrupt 0, gen 0
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
errs: wr 7, rd 174287, flush 0, corrupt 0, gen 0
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
errs: wr 7, rd 174288, flush 0, corrupt 0, gen 0
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
errs: wr 7, rd 174289, flush 0, corrupt 0, gen 0
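For reference, the counters in those "errs:" lines are cumulative per-device
totals of write, read, flush, corruption and generation errors; here only the
read counter is climbing, pointing at read failures on /dev/sdw2. A minimal
sketch (my own, not part of any btrfs tool) that pulls the counters out of
such a line:

```python
import re

def parse_dev_stats(line):
    """Extract the per-device error counters from a 'bdev ... errs:' log line."""
    m = re.search(
        r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
        r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)",
        line,
    )
    if not m:
        return None
    return {k: v if k == "dev" else int(v) for k, v in m.groupdict().items()}

line = ("BTRFS error (device sde2): bdev /dev/sdw2 "
        "errs: wr 7, rd 174289, flush 0, corrupt 0, gen 0")
print(parse_dev_stats(line))
```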

Decided to reboot (for another reason) and tried to mount afterwards:

[Fri Oct  4 08:29:42 2019] BTRFS info (device sde2): disk space caching
is enabled
[Fri Oct  4 08:29:42 2019] BTRFS info (device sde2): has skinny extents
[Fri Oct  4 08:29:44 2019] BTRFS error (device sde2): parent transid
verify failed on 5483020828672 wanted 470169 found 470108
[Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
start 2286352011705795888 5483020828672
[Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
start 2286318771218040112 5483020828672
[Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
start 2286363934109025584 5483020828672
[Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
start 2286229742125204784 5483020828672
[Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
start 2286353230849918256 5483020828672
[Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
start 2286246155688035632 5483020828672
[Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
start 2286321695890425136 5483020828672
[Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
start 2286384677254874416 5483020828672
[Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
start 2286386365024912688 5483020828672
[Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
start 2286284400752608560 5483020828672
[Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): failed to recover
balance: -5
[Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): open_ctree failed

The FS info is shown below. It is a RAID6.

Label: 'data'  uuid: 43472491-7bb3-418c-b476-874a52e8b2b0
	Total devices 16 FS bytes used 36.73TiB
	devid    1 size 7.28TiB used 2.66TiB path /dev/sde2
	devid    2 size 3.64TiB used 2.66TiB path /dev/sdf2
	devid    3 size 3.64TiB used 2.66TiB path /dev/sdg2
	devid    4 size 7.28TiB used 2.66TiB path /dev/sdh2
	devid    5 size 3.64TiB used 2.66TiB path /dev/sdi2
	devid    6 size 7.28TiB used 2.66TiB path /dev/sdj2
	devid    7 size 3.64TiB used 2.66TiB path /dev/sdk2
	devid    8 size 3.64TiB used 2.66TiB path /dev/sdl2
	devid    9 size 7.28TiB used 2.66TiB path /dev/sdm2
	devid   10 size 3.64TiB used 2.66TiB path /dev/sdn2
	devid   11 size 7.28TiB used 2.66TiB path /dev/sdo2
	devid   12 size 3.64TiB used 2.66TiB path /dev/sdp2
	devid   13 size 7.28TiB used 2.66TiB path /dev/sdq2
	devid   14 size 7.28TiB used 2.66TiB path /dev/sdr2
	devid   15 size 3.64TiB used 2.66TiB path /dev/sds2
	devid   16 size 3.64TiB used 2.66TiB path /dev/sdt2

The initial error refers to sdw, so possibly something happened that
caused one or more disks in the external cabinet to disappear and
reappear.

Kernel is 4.18.16-arch1-1-ARCH. I'm very hesitant to upgrade it, because
previously I had to downgrade the kernel to get the volume mounted
again.

Question: I know that running checks on BTRFS can be dangerous; what
can you recommend to get the volume back online?

-- 
Groet / Cheers,
Patrick Dijkgraaf





^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: BTRFS errors, and won't mount
  2019-10-04  6:59 BTRFS errors, and won't mount Patrick Dijkgraaf
@ 2019-10-04  7:17 ` Patrick Dijkgraaf
  2019-10-04  7:22 ` Qu Wenruo
  1 sibling, 0 replies; 7+ messages in thread
From: Patrick Dijkgraaf @ 2019-10-04  7:17 UTC (permalink / raw)
  To: linux-btrfs

BTRFS restore shows the following output:

# btrfs restore -D /dev/sde2 /mnt/data
This is a dry-run, no files are going to be restored
parent transid verify failed on 4314638041088 wanted 470169 found
470107
parent transid verify failed on 4314638041088 wanted 470169 found
470107
checksum verify failed on 4314638041088 found 4D792F65 wanted A99A92D3
checksum verify failed on 4314638041088 found 8D966120 wanted 4C528768
checksum verify failed on 4314638041088 found 8D966120 wanted 4C528768
bad tree block 4314638041088, bytenr mismatch, want=4314638041088,
have=20210165085184
Error reading subvolume /mnt/data/00-live/nextcloud:
18446744073709551611
Error searching /mnt/data/00-live/nextcloud
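(Side note: 18446744073709551611 is just the errno -5, i.e. EIO, printed as
an unsigned 64-bit value. A quick sanity check of that two's-complement
reading:)

```python
# -5 (EIO) shown as an unsigned 64-bit integer is 2**64 - 5
err = 18446744073709551611
signed = err - (1 << 64) if err >= (1 << 63) else err
print(signed)  # -5
```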

-- 
Groet / Cheers,
Patrick Dijkgraaf



On Fri, 2019-10-04 at 08:59 +0200, Patrick Dijkgraaf wrote:
> Hi guys,
> 
> During the night, I started getting the following errors and data was
> no longer accessible:
> 
> [...]






* Re: BTRFS errors, and won't mount
  2019-10-04  6:59 BTRFS errors, and won't mount Patrick Dijkgraaf
  2019-10-04  7:17 ` Patrick Dijkgraaf
@ 2019-10-04  7:22 ` Qu Wenruo
  2019-10-04  7:41   ` Patrick Dijkgraaf
  1 sibling, 1 reply; 7+ messages in thread
From: Qu Wenruo @ 2019-10-04  7:22 UTC (permalink / raw)
  To: Patrick Dijkgraaf, linux-btrfs





On 2019/10/4 2:59 PM, Patrick Dijkgraaf wrote:
> Hi guys,
> 
> During the night, I started getting the following errors and data was
> no longer accessible:
> 
> [Fri Oct  4 08:04:26 2019] btree_readpage_end_io_hook: 2522 callbacks
> suppressed
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 17686343003259060482 7808404996096

Tree block at address 7808404996096 is completely broken.

All the other messages mentioning 7808404996096 show that btrfs tried
every possible device combination to rebuild that tree block, but all
of them failed.

I'm not sure why the tree block got corrupted, but it's quite possible
that the RAID5/6 write hole ruined your chance of recovering it.

> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 254095834002432 7808404996096
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 2574563607252646368 7808404996096
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 17873260189421384017 7808404996096
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 9965805624054187110 7808404996096
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 15108378087789580224 7808404996096
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 7914705769619568652 7808404996096
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 16752645757091223687 7808404996096
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 9617669583708276649 7808404996096
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 3384408928046898608 7808404996096
[...]
> Decided to reboot (for another reason) and tried to mount afterwards:
> 
> [Fri Oct  4 08:29:42 2019] BTRFS info (device sde2): disk space caching
> is enabled
> [Fri Oct  4 08:29:42 2019] BTRFS info (device sde2): has skinny extents
> [Fri Oct  4 08:29:44 2019] BTRFS error (device sde2): parent transid
> verify failed on 5483020828672 wanted 470169 found 470108
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286352011705795888 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286318771218040112 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286363934109025584 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286229742125204784 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286353230849918256 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286246155688035632 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286321695890425136 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286384677254874416 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286386365024912688 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286284400752608560 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): failed to recover
> balance: -5
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): open_ctree failed

You're lucky: the problem is in balance recovery. Since your fs can
progress as far as btrfs_recover_relocation(), the most essential
trees should be OK, so you have a chance to mount it read-only (RO).

> 
> The FS info is shown below. It is a RAID6.
> 
> Label: 'data'  uuid: 43472491-7bb3-418c-b476-874a52e8b2b0
> 	Total devices 16 FS bytes used 36.73TiB

You don't want to have to salvage data from a nearly 40TiB fs...

> 	devid    1 size 7.28TiB used 2.66TiB path /dev/sde2
> 	devid    2 size 3.64TiB used 2.66TiB path /dev/sdf2
> 	devid    3 size 3.64TiB used 2.66TiB path /dev/sdg2
> 	devid    4 size 7.28TiB used 2.66TiB path /dev/sdh2
> 	devid    5 size 3.64TiB used 2.66TiB path /dev/sdi2
> 	devid    6 size 7.28TiB used 2.66TiB path /dev/sdj2
> 	devid    7 size 3.64TiB used 2.66TiB path /dev/sdk2
> 	devid    8 size 3.64TiB used 2.66TiB path /dev/sdl2
> 	devid    9 size 7.28TiB used 2.66TiB path /dev/sdm2
> 	devid   10 size 3.64TiB used 2.66TiB path /dev/sdn2
> 	devid   11 size 7.28TiB used 2.66TiB path /dev/sdo2
> 	devid   12 size 3.64TiB used 2.66TiB path /dev/sdp2
> 	devid   13 size 7.28TiB used 2.66TiB path /dev/sdq2
> 	devid   14 size 7.28TiB used 2.66TiB path /dev/sdr2
> 	devid   15 size 3.64TiB used 2.66TiB path /dev/sds2
> 	devid   16 size 3.64TiB used 2.66TiB path /dev/sdt2

And you won't want to use btrfs RAID6 if you're expecting it to
tolerate a two-disk failure.

Because btrfs RAID5/6 has the write-hole problem, any unexpected power
loss or disk error can reduce the error tolerance step by step if
you're not running scrub regularly.
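To make the write-hole failure mode concrete, here is a toy single-parity
(XOR) sketch of my own, not btrfs code: if a crash lands between the data
write and the parity write, the stale parity later reconstructs wrong data
without reporting any error.

```python
def xor(a, b):
    # XOR parity over equal-length blocks
    return bytes(x ^ y for x, y in zip(a, b))

d0, d1 = b"old-blockA", b"old-blockB"
parity = xor(d0, d1)            # stripe starts out consistent

d0 = b"new-blockA"              # crash: data updated, parity write lost
# parity is now stale: it still describes the old stripe

rebuilt_d1 = xor(d0, parity)    # d1's disk dies; rebuild from d0 + parity
print(rebuilt_d1 == b"old-blockB")  # False: silently wrong reconstruction
```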

> 
> The initial error refers to sdw, so possibly something happened that
> caused one or more disks in the external cabinet to disappear and
> reappear.
> 
> Kernel is 4.18.16-arch1-1-ARCH. Very hesitant to upgrade it, because
> previously I had to downgrade the kernel to get the volume mounted
> again.
> 
> Question: I know that running checks on BTRFS can be dangerous; what
> can you recommend to get the volume back online?

"btrfs check" is not dangerous at all. In fact it's pretty safe and it's
the main tool we use to expose any problem.

It's "btrfs check --repair" dangerous, but way less dangerous in recent
years. (although in your case, --repair is completely unrelated and
won't help at all)

"btrfs check" output from latest btrfs-progs would help.

Thanks,
Qu

> 




* Re: BTRFS errors, and won't mount
  2019-10-04  7:22 ` Qu Wenruo
@ 2019-10-04  7:41   ` Patrick Dijkgraaf
  2019-10-04  7:58     ` Qu Wenruo
  2019-10-04  7:59     ` Patrick Dijkgraaf
  0 siblings, 2 replies; 7+ messages in thread
From: Patrick Dijkgraaf @ 2019-10-04  7:41 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Hi Qu,

I know about the RAID5/6 risks, so I won't blame anyone but myself. I'm
currently working on another solution, but I wasn't quite there yet...

mount -o ro /dev/sdh2 /mnt/data gives me:

[Fri Oct  4 09:36:27 2019] BTRFS info (device sde2): disk space caching
is enabled
[Fri Oct  4 09:36:27 2019] BTRFS info (device sde2): has skinny extents
[Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): parent transid
verify failed on 5483020828672 wanted 470169 found 470108
[Fri Oct  4 09:36:27 2019] btree_readpage_end_io_hook: 5 callbacks
suppressed
[Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
start 2286352011705795888 5483020828672
[Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
start 2286318771218040112 5483020828672
[Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
start 2286363934109025584 5483020828672
[Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
start 2286229742125204784 5483020828672
[Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
start 2286353230849918256 5483020828672
[Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
start 2286246155688035632 5483020828672
[Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
start 2286321695890425136 5483020828672
[Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
start 2286384677254874416 5483020828672
[Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
start 2286386365024912688 5483020828672
[Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
start 2286284400752608560 5483020828672
[Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): failed to recover
balance: -5
[Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): open_ctree failed

Do you think there is any chance to recover?

Thanks,
Patrick.


On Fri, 2019-10-04 at 15:22 +0800, Qu Wenruo wrote:
> [...]





* Re: BTRFS errors, and won't mount
  2019-10-04  7:41   ` Patrick Dijkgraaf
@ 2019-10-04  7:58     ` Qu Wenruo
  2019-10-04  7:59     ` Patrick Dijkgraaf
  1 sibling, 0 replies; 7+ messages in thread
From: Qu Wenruo @ 2019-10-04  7:58 UTC (permalink / raw)
  To: Patrick Dijkgraaf, linux-btrfs





On 2019/10/4 3:41 PM, Patrick Dijkgraaf wrote:
> Hi Qu,
> 
> I know about the RAID5/6 risks, so I won't blame anyone but myself. I'm
> currently working on another solution, but I wasn't quite there yet...
> 
> mount -o ro /dev/sdh2 /mnt/data gives me:
> 
> [Fri Oct  4 09:36:27 2019] BTRFS info (device sde2): disk space caching
> is enabled
> [Fri Oct  4 09:36:27 2019] BTRFS info (device sde2): has skinny extents
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): parent transid
> verify failed on 5483020828672 wanted 470169 found 470108
> [Fri Oct  4 09:36:27 2019] btree_readpage_end_io_hook: 5 callbacks
> suppressed
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286352011705795888 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286318771218040112 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286363934109025584 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286229742125204784 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286353230849918256 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286246155688035632 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286321695890425136 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286384677254874416 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286386365024912688 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286284400752608560 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): failed to recover
> balance: -5
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): open_ctree failed
> 
> Do you think there is any chance to recover?

This means it's the tail part of the root tree that got corrupted.

You can comment out the btrfs_recover_balance() call in open_ctree() in
fs/btrfs/disk-io.c, then try mounting RO again.

This means some of your subvolumes can't be read out.

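For reference, the hunk in question looks roughly like this in a 4.18-era
open_ctree(). This is a from-memory sketch, not a verbatim quote of the
kernel source (the exact context and goto label may differ in your tree), so
locate it by grepping for the "failed to recover balance" string:

```c
/* fs/btrfs/disk-io.c, inside open_ctree() -- sketch only; verify against
 * your own kernel source.  Commenting out this call skips balance
 * recovery, which is what fails with -5 (EIO) in your log. */
ret = btrfs_recover_balance(fs_info);
if (ret) {
        btrfs_err(fs_info, "failed to recover balance: %d", ret);
        goto fail_block_groups;        /* label name may differ */
}
```

Remember to restore the original source once you've copied your data off.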

Another way to salvage data is to try the backup roots.

You can get all the backup root bytenrs with "btrfs ins dump-super -f".
E.g.:

$ btrfs ins dump-super -f /dev/nvme/btrfs | grep backup_tree_root
                backup_tree_root:       5259264 gen: 5  level: 0
                backup_tree_root:       24641536        gen: 6  level: 0
                backup_tree_root:       26378240        gen: 7  level: 0
                backup_tree_root:       5341184 gen: 8  level: 0

Then pass each bytenr to "btrfs check --tree-root <bytenr>" to see
which one can progress further.
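To script that, a small sketch of mine that scrapes the backup root bytenrs
out of the dump-super output above and prints the check commands to try (the
device path is just an example):

```python
import re

# Sample output of: btrfs ins dump-super -f <dev> | grep backup_tree_root
dump = """
                backup_tree_root:       5259264 gen: 5  level: 0
                backup_tree_root:       24641536        gen: 6  level: 0
                backup_tree_root:       26378240        gen: 7  level: 0
                backup_tree_root:       5341184 gen: 8  level: 0
"""

roots = [int(n) for n in re.findall(r"backup_tree_root:\s+(\d+)", dump)]
for bytenr in roots:
    # btrfs check without --repair is read-only, so trying each is safe
    print(f"btrfs check --tree-root {bytenr} /dev/sde2")
```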

Thanks,
Qu
> 
> Thanks,
> Patrick.
> 
> 
> On Fri, 2019-10-04 at 15:22 +0800, Qu Wenruo wrote:
>> [...]
>>> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree
>>> block
>>> start 2286384677254874416 5483020828672
>>> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree
>>> block
>>> start 2286386365024912688 5483020828672
>>> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree
>>> block
>>> start 2286284400752608560 5483020828672
>>> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): failed to
>>> recover
>>> balance: -5
>>> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): open_ctree
>>> failed
>>
>> You're lucky: the failure is in balance recovery, so you may have a
>> chance to mount RO. Since your fs progresses as far as
>> btrfs_recover_relocation(), the most essential trees should be OK.
>>
>>> The FS info is shown below. It is a RAID6.
>>>
>>> Label: 'data'  uuid: 43472491-7bb3-418c-b476-874a52e8b2b0
>>> 	Total devices 16 FS bytes used 36.73TiB
>>
>> You won't want to have to salvage data from a nearly 40T fs...
>>
>>> 	devid    1 size 7.28TiB used 2.66TiB path /dev/sde2
>>> 	devid    2 size 3.64TiB used 2.66TiB path /dev/sdf2
>>> 	devid    3 size 3.64TiB used 2.66TiB path /dev/sdg2
>>> 	devid    4 size 7.28TiB used 2.66TiB path /dev/sdh2
>>> 	devid    5 size 3.64TiB used 2.66TiB path /dev/sdi2
>>> 	devid    6 size 7.28TiB used 2.66TiB path /dev/sdj2
>>> 	devid    7 size 3.64TiB used 2.66TiB path /dev/sdk2
>>> 	devid    8 size 3.64TiB used 2.66TiB path /dev/sdl2
>>> 	devid    9 size 7.28TiB used 2.66TiB path /dev/sdm2
>>> 	devid   10 size 3.64TiB used 2.66TiB path /dev/sdn2
>>> 	devid   11 size 7.28TiB used 2.66TiB path /dev/sdo2
>>> 	devid   12 size 3.64TiB used 2.66TiB path /dev/sdp2
>>> 	devid   13 size 7.28TiB used 2.66TiB path /dev/sdq2
>>> 	devid   14 size 7.28TiB used 2.66TiB path /dev/sdr2
>>> 	devid   15 size 3.64TiB used 2.66TiB path /dev/sds2
>>> 	devid   16 size 3.64TiB used 2.66TiB path /dev/sdt2
>>
>> And you won't want to use btrfs RAID6 if you're expecting it to
>> tolerate a two-disk failure.
>>
>> Since btrfs RAID5/6 has the write-hole problem, any unexpected power
>> loss or disk error can reduce the error tolerance step by step if
>> you're not running scrub regularly.
>>
>>> The initial error refers to sdw, so possibly something happened
>>> that
>>> caused one or more disks in the external cabinet to disappear and
>>> reappear.
>>>
>>> Kernel is 4.18.16-arch1-1-ARCH. Very hesitant to upgrade it,
>>> because
>>> previously I had to downgrade the kernel to get the volume mounted
>>> again.
>>>
>>> Question: I know that running checks on BTRFS can be dangerous,
>>> what
>>> can you recommend me doing to get the volume back online?
>>
>> "btrfs check" is not dangerous at all. In fact it's pretty safe, and
>> it's the main tool we use to expose problems.
>>
>> It's "btrfs check --repair" that is dangerous, though much less so in
>> recent years. (Although in your case, --repair is completely
>> unrelated and won't help at all.)
>>
>> "btrfs check" output from latest btrfs-progs would help.
>>
>> Thanks,
>> Qu
> 
> 
> 



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: BTRFS errors, and won't mount
  2019-10-04  7:41   ` Patrick Dijkgraaf
  2019-10-04  7:58     ` Qu Wenruo
@ 2019-10-04  7:59     ` Patrick Dijkgraaf
  2019-10-04 11:46       ` Patrick Dijkgraaf
  1 sibling, 1 reply; 7+ messages in thread
From: Patrick Dijkgraaf @ 2019-10-04  7:59 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Decided to upgrade my system to the latest and give it a shot:

# btrfs check /dev/sde2
Opening filesystem to check...
parent transid verify failed on 4314780106752 wanted 470169 found
470107
checksum verify failed on 4314780106752 found 7077566E wanted 9494EBD8
checksum verify failed on 4314780106752 found 489FC179 wanted 73D057EA
checksum verify failed on 4314780106752 found 489FC179 wanted 73D057EA
bad tree block 4314780106752, bytenr mismatch, want=4314780106752,
have=20212047631104
ERROR: cannot open file system

# uname -r
5.3.1-arch1-1-ARCH

# btrfs --version
btrfs-progs v5.2.2

Does that help at all?

-- 
Groet / Cheers,
Patrick Dijkgraaf


On Fri, 2019-10-04 at 09:41 +0200, Patrick Dijkgraaf wrote:
> Hi Qu,
> 
> I know about the RAID5/6 risks, so I won't blame anyone but myself.
> I'm currently working on another solution, but I was not quite there
> yet...
> 
> mount -o ro /dev/sdh2 /mnt/data gives me:
> 
> [Fri Oct  4 09:36:27 2019] BTRFS info (device sde2): disk space
> caching
> is enabled
> [Fri Oct  4 09:36:27 2019] BTRFS info (device sde2): has skinny
> extents
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): parent transid
> verify failed on 5483020828672 wanted 470169 found 470108
> [Fri Oct  4 09:36:27 2019] btree_readpage_end_io_hook: 5 callbacks
> suppressed
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286352011705795888 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286318771218040112 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286363934109025584 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286229742125204784 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286353230849918256 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286246155688035632 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286321695890425136 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286384677254874416 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286386365024912688 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286284400752608560 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): failed to
> recover
> balance: -5
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): open_ctree
> failed
> 
> Do you think there is any chance to recover?
> 
> Thanks,
> Patrick.
> 
> 




* Re: BTRFS errors, and won't mount
  2019-10-04  7:59     ` Patrick Dijkgraaf
@ 2019-10-04 11:46       ` Patrick Dijkgraaf
  0 siblings, 0 replies; 7+ messages in thread
From: Patrick Dijkgraaf @ 2019-10-04 11:46 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Hi Qu,

Tried it with the backup roots and it won't mount. I have a backup from
a week ago and I'll accept the data loss.

I'll rebuild (this time without BTRFS RAID5/6) and restore.
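
For the rebuild, a minimal mkfs sketch (the profiles and device list are
assumptions; adjust to your disks): raid10 data with raid1 metadata
avoids the RAID5/6 write hole at the cost of capacity.

```shell
# Hypothetical replacement layout; /dev/sd[e-t]2 matches the partitions
# listed earlier in the thread. Adjust profiles to your redundancy needs.
mkfs.btrfs -L data -d raid10 -m raid1 /dev/sd[e-t]2
```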

Thanks for your help!

-- 
Groet / Cheers,
Patrick Dijkgraaf



On Fri, 2019-10-04 at 09:59 +0200, Patrick Dijkgraaf wrote:
> Decided to upgrade my system to the latest and give it a shot:
> 
> # btrfs check /dev/sde2
> Opening filesystem to check...
> parent transid verify failed on 4314780106752 wanted 470169 found
> 470107
> checksum verify failed on 4314780106752 found 7077566E wanted
> 9494EBD8
> checksum verify failed on 4314780106752 found 489FC179 wanted
> 73D057EA
> checksum verify failed on 4314780106752 found 489FC179 wanted
> 73D057EA
> bad tree block 4314780106752, bytenr mismatch, want=4314780106752,
> have=20212047631104
> ERROR: cannot open file system
> 
> # uname -r
> 5.3.1-arch1-1-ARCH
> 
> # btrfs --version
> btrfs-progs v5.2.2
> 
> Does that help anything?
> 
> 



end of thread, other threads:[~2019-10-04 11:46 UTC | newest]

Thread overview: 7+ messages
2019-10-04  6:59 BTRFS errors, and won't mount Patrick Dijkgraaf
2019-10-04  7:17 ` Patrick Dijkgraaf
2019-10-04  7:22 ` Qu Wenruo
2019-10-04  7:41   ` Patrick Dijkgraaf
2019-10-04  7:58     ` Qu Wenruo
2019-10-04  7:59     ` Patrick Dijkgraaf
2019-10-04 11:46       ` Patrick Dijkgraaf
