* raid10 corruption while removing failing disk
@ 2020-08-10  7:03 Agustín DallʼAlba
  2020-08-10  7:22 ` Nikolay Borisov
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Agustín DallʼAlba @ 2020-08-10  7:03 UTC (permalink / raw)
  To: linux-btrfs

Hello!

The last quarterly scrub on our btrfs filesystem found a few bad
sectors in one of its devices (/dev/sdd), and because there's nobody on
site to replace the failing disk I decided to remove it from the array
with `btrfs device remove` before the problem could get worse.

The removal was going relatively well (although slowly, and I had to
reboot a few times due to the bad sectors) until it had about 200 GB
left to move. Now the filesystem turns read-only when I try to finish
the removal, and `btrfs check` complains about wrong metadata checksums.
However, as far as I can tell, none of the copies of the corrupt data
are on the failing disk.

How could this happen? Is it possible to fix this filesystem?

I have refrained from trying anything so far, such as upgrading to a
newer kernel or disconnecting the failing drive, until I can confirm
with you that it's safe.

Kind regards.


# uname -a
Linux susanita 4.15.0-111-generic #112-Ubuntu SMP Thu Jul 9 20:32:34
UTC 2020 x86_64 x86_64 x86_64 GNU/Linux


# btrfs --version
btrfs-progs v4.15.1


# btrfs fi show
Label: 'Susanita'  uuid: 4d3acf20-d408-49ab-b0a6-182396a9f27c
	Total devices 5 FS bytes used 4.90TiB
	devid    1 size 3.64TiB used 3.42TiB path /dev/sda
	devid    2 size 3.64TiB used 3.42TiB path /dev/sde
	devid    3 size 1.82TiB used 1.59TiB path /dev/sdb
	devid    5 size 0.00B used 185.50GiB path /dev/sdd
	devid    6 size 1.82TiB used 1.22TiB path /dev/sdc


# btrfs fi df /
Data, RAID1: total=4.90TiB, used=4.90TiB
System, RAID10: total=64.00MiB, used=880.00KiB
Metadata, RAID10: total=9.00GiB, used=7.57GiB
GlobalReserve, single: total=512.00MiB, used=0.00B


# btrfs check --force --readonly /dev/sda
WARNING: filesystem mounted, continuing because of --force
Checking filesystem on /dev/sda
UUID: 4d3acf20-d408-49ab-b0a6-182396a9f27c
checksum verify failed on 10919566688256 found BAB1746E wanted A8A48266
checksum verify failed on 10919566688256 found BAB1746E wanted A8A48266
bytenr mismatch, want=10919566688256, have=17196831625821864417
ERROR: failed to repair root items: Input/output error

# btrfs-map-logical -l 10919566688256 /dev/sda
mirror 1 logical 10919566688256 physical 394473357312 device /dev/sdc
mirror 2 logical 10919566688256 physical 477218586624 device /dev/sda
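[Editorial note: the "found"/"wanted" values printed by `btrfs check` above are CRC-32C (Castagnoli) checksums of the metadata block (the dmesg output below shows the filesystem loading crc32c-generic). A minimal bit-at-a-time sketch of the algorithm, for anyone who wants to verify such values by hand; this is illustrative only — the kernel uses table-driven or hardware-accelerated code, and btrfs checksums the tree block contents after the embedded checksum field:]

```python
def crc32c(data: bytes) -> int:
    """CRC-32C (Castagnoli): init 0xFFFFFFFF, reflected, final XOR 0xFFFFFFFF.

    Bit-at-a-time reference implementation; real implementations use
    lookup tables or the SSE4.2 crc32 instruction.
    """
    poly = 0x82F63B78  # reflected Castagnoli polynomial
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (poly if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

# Standard check value for CRC-32C:
print(f"{crc32c(b'123456789'):08X}")  # E3069283
```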


Relevant dmesg output:
[    4.963420] Btrfs loaded, crc32c=crc32c-generic
[    5.072878] BTRFS: device label Susanita devid 6 transid 4241535 /dev/sdc
[    5.073165] BTRFS: device label Susanita devid 3 transid 4241535 /dev/sdb
[    5.073713] BTRFS: device label Susanita devid 2 transid 4241535 /dev/sde
[    5.073916] BTRFS: device label Susanita devid 5 transid 4241535 /dev/sdd
[    5.074398] BTRFS: device label Susanita devid 1 transid 4241535 /dev/sda
[    5.152479] BTRFS info (device sda): disk space caching is enabled
[    5.152551] BTRFS info (device sda): has skinny extents
[    5.332538] BTRFS info (device sda): bdev /dev/sdd errs: wr 0, rd 24, flush 0, corrupt 0, gen 0
[   38.869423] BTRFS info (device sda): enabling auto defrag
[   38.869490] BTRFS info (device sda): use lzo compression, level 0
[   38.869547] BTRFS info (device sda): disk space caching is enabled


After running btrfs device remove /dev/sdd /:
[  193.684703] BTRFS info (device sda): relocating block group 10593404846080 flags metadata|raid10
[  312.921934] BTRFS error (device sda): bad tree block start 10597444141056 10919566688256
[  313.034339] BTRFS error (device sda): bad tree block start 17196831625821864417 10919566688256
[  313.034595] BTRFS error (device sda): bad tree block start 10597444141056 10919566688256
[  313.034621] BTRFS: error (device sda) in btrfs_run_delayed_refs:3083: errno=-5 IO failure
[  313.034627] BTRFS info (device sda): forced readonly
[  313.036328] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[  313.036596] IP: merge_reloc_roots+0x19f/0x2c0 [btrfs]
[  313.036650] PGD 0 P4D 0 
[  313.036704] Oops: 0000 [#1] SMP PTI
[  313.036756] Modules linked in: veth nft_fib_ipv4 nft_fib nft_ct nft_meta nf_tables_ipv4 nf_tables nfnetlink wireguard ip6_udp_tunnel udp_tunnel tcp_lp i2c_i801 8021q garp mrp bridge stp llc nfsd auth_rpcgss nfs_acl ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_DSCP xt_TCPMSS iptable_mangle ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_comment xt_tcpudp lockd nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_conntrack nf_conntrack iptable_filter grace sunrpc input_leds shpchp intel_powerclamp serio_raw lpc_ich mac_hid e752x_edac tcp_bbr sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi lm85 hwmon_vid ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy
[  313.037035]  async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mpt3sas 8139too 8139cp pata_acpi mii raid_class sky2 scsi_transport_sas
[  313.037156] CPU: 0 PID: 1173 Comm: btrfs Not tainted 4.15.0-111-generic #112-Ubuntu
[  313.037230] Hardware name:  /SE7525RP2, BIOS SE7525RP20.86B.P.05.00.0020.121620050818 12/16/2005
[  313.037390] RIP: 0010:merge_reloc_roots+0x19f/0x2c0 [btrfs]
[  313.037452] RSP: 0018:ffff98ff4080faf8 EFLAGS: 00010246
[  313.037516] RAX: 0000000000000000 RBX: ffff8dd5b656b000 RCX: 0000000000000000
[  313.037582] RDX: ffff8dd576745800 RSI: 00000000000027e6 RDI: ffff8dd5733a0078
[  313.037658] RBP: ffff98ff4080fb58 R08: ffff8dd5b0c51240 R09: ffff8dd576745800
[  313.037718] R10: 0000000000000040 R11: 0000000000000000 R12: ffff8dd5b656a000
[  313.037777] R13: ffff98ff4080fb18 R14: ffff8dd576745800 R15: ffff8dd5b656b3a0
[  313.037839] FS:  00007f1d2da398c0(0000) GS:ffff8dd5bfc00000(0000) knlGS:0000000000000000
[  313.037912] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  313.037971] CR2: 0000000000000000 CR3: 00000000359c8000 CR4: 00000000000006f0
[  313.038035] Call Trace:
[  313.038153]  relocate_block_group+0x17a/0x640 [btrfs]
[  313.038266]  btrfs_relocate_block_group+0x18f/0x280 [btrfs]
[  313.038377]  btrfs_relocate_chunk+0x38/0xd0 [btrfs]
[  313.038488]  btrfs_shrink_device+0x1d1/0x560 [btrfs]
[  313.038597]  btrfs_rm_device+0x19e/0x590 [btrfs]
[  313.038676]  ? _copy_from_user+0x3e/0x60
[  313.038787]  btrfs_ioctl+0x221c/0x2490 [btrfs]
[  313.038850]  ? _copy_to_user+0x26/0x30
[  313.038914]  ? cp_new_stat+0x152/0x180
[  313.038977]  do_vfs_ioctl+0xa8/0x630
[  313.039082]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
[  313.039146]  ? do_vfs_ioctl+0xa8/0x630
[  313.039206]  ? SYSC_newstat+0x50/0x70
[  313.039266]  SyS_ioctl+0x79/0x90
[  313.039337]  do_syscall_64+0x73/0x130
[  313.039410]  entry_SYSCALL_64_after_hwframe+0x41/0xa6
[  313.039476] RIP: 0033:0x7f1d2c81b6d7
[  313.039533] RSP: 002b:00007ffc0c0bee08 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  313.039606] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1d2c81b6d7
[  313.039661] RDX: 00007ffc0c0bfe28 RSI: 000000005000943a RDI: 0000000000000003
[  313.039717] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[  313.039772] R10: 00007ffc0c0c3e80 R11: 0000000000000246 R12: 00007ffc0c0c0f78
[  313.039829] R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000003
[  313.039884] Code: 08 4c 89 f7 e8 a3 47 ae d5 49 8b 17 49 8b 47 08 48 89 42 08 48 89 10 4d 89 3f 4d 89 7f 08 e9 53 ff ff ff 49 c7 46 18 00 00 00 00 <48> 8b 3c 25 00 00 00 00 e8 74 9a fc ff 49 8b 56 18 48 8b 7a 08 
[  313.040037] RIP: merge_reloc_roots+0x19f/0x2c0 [btrfs] RSP: ffff98ff4080faf8
[  313.040093] CR2: 0000000000000000
[  313.040192] ---[ end trace c981300ad343d57c ]---
[  313.175005] BTRFS error (device sda): pending csums is 323584


The most recent scrub found no problems except for the failing drive:
# btrfs scrub status -dR /
scrub status for 4d3acf20-d408-49ab-b0a6-182396a9f27c
scrub device /dev/sda (id 1) history
	scrub started at Wed Jul  8 19:53:52 2020 and finished after 10:04:57
	data_extents_scrubbed: 57444761
	tree_extents_scrubbed: 299537
	data_bytes_scrubbed: 3633406001152
	tree_bytes_scrubbed: 4907614208
	read_errors: 0
	csum_errors: 0
	verify_errors: 0
	no_csum: 1894424
	csum_discards: 0
	super_errors: 0
	malloc_errors: 0
	uncorrectable_errors: 0
	unverified_errors: 0
	corrected_errors: 0
	last_physical: 3642413285376
scrub device /dev/sde (id 2) history
	scrub started at Wed Jul  8 19:53:52 2020 and finished after 10:17:31
	data_extents_scrubbed: 57533871
	tree_extents_scrubbed: 88610
	data_bytes_scrubbed: 3636789604352
	tree_bytes_scrubbed: 1451786240
	read_errors: 0
	csum_errors: 0
	verify_errors: 0
	no_csum: 3596495
	csum_discards: 0
	super_errors: 0
	malloc_errors: 0
	uncorrectable_errors: 0
	unverified_errors: 0
	corrected_errors: 0
	last_physical: 3641977077760
scrub device /dev/sdb (id 3) history
	scrub started at Wed Jul  8 19:53:52 2020 and finished after 05:15:48
	data_extents_scrubbed: 25189397
	tree_extents_scrubbed: 210630
	data_bytes_scrubbed: 1633732304896
	tree_bytes_scrubbed: 3450961920
	read_errors: 0
	csum_errors: 0
	verify_errors: 0
	no_csum: 1966272
	csum_discards: 0
	super_errors: 0
	malloc_errors: 0
	uncorrectable_errors: 0
	unverified_errors: 0
	corrected_errors: 0
	last_physical: 1640678555648
scrub device /dev/sdd (id 5) history
	scrub started at Wed Jul  8 19:53:52 2020 and finished after 05:00:15
	data_extents_scrubbed: 25301261
	tree_extents_scrubbed: 298654
	data_bytes_scrubbed: 1632169230336
	tree_bytes_scrubbed: 4893147136
	read_errors: 24
	csum_errors: 0
	verify_errors: 0
	no_csum: 1515107
	csum_discards: 0
	super_errors: 0
	malloc_errors: 0
	uncorrectable_errors: 0
	unverified_errors: 0
	corrected_errors: 24
	last_physical: 1640175239168
scrub device /dev/sdc (id 6) history
	scrub started at Wed Jul  8 19:53:52 2020 and finished after 01:58:45
	data_extents_scrubbed: 8887668
	tree_extents_scrubbed: 298747
	data_bytes_scrubbed: 565333995520
	tree_bytes_scrubbed: 4894670848
	read_errors: 0
	csum_errors: 0
	verify_errors: 0
	no_csum: 1723495
	csum_discards: 0
	super_errors: 0
	malloc_errors: 0
	uncorrectable_errors: 0
	unverified_errors: 0
	corrected_errors: 0
	last_physical: 574989795328



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: raid10 corruption while removing failing disk
  2020-08-10  7:03 raid10 corruption while removing failing disk Agustín DallʼAlba
@ 2020-08-10  7:22 ` Nikolay Borisov
  2020-08-10  7:38   ` Martin Steigerwald
  2020-08-10  8:21 ` Nikolay Borisov
  2020-08-11  2:34 ` Chris Murphy
  2 siblings, 1 reply; 17+ messages in thread
From: Nikolay Borisov @ 2020-08-10  7:22 UTC (permalink / raw)
  To: Agustín DallʼAlba, linux-btrfs



On 10.08.20 at 10:03, Agustín DallʼAlba wrote:
> Hello!

<skip>

> 
> 
> # uname -a
> Linux susanita 4.15.0-111-generic #112-Ubuntu SMP Thu Jul 9 20:32:34
> UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
> 

This is a vendor kernel, so you should ideally seek support through the
vendor. This kernel is not even an LTS, so it's not entirely clear which
patches have or have not been backported. With btrfs it's advisable to
use the latest stable kernel, as each release brings bug fixes, or
(because always using the latest is not feasible) at least to stick to
a supported long-term stable kernel - i.e. 4.14, 4.19 or 5.4
(preferably 5.4)





* Re: raid10 corruption while removing failing disk
  2020-08-10  7:22 ` Nikolay Borisov
@ 2020-08-10  7:38   ` Martin Steigerwald
  2020-08-10  7:51     ` Nikolay Borisov
  2020-08-10  7:59     ` Agustín DallʼAlba
  0 siblings, 2 replies; 17+ messages in thread
From: Martin Steigerwald @ 2020-08-10  7:38 UTC (permalink / raw)
  To: Agustín DallʼAlba, linux-btrfs, Nikolay Borisov

Hi Nikolay.

Nikolay Borisov - 10.08.20, 09:22:14 CEST:
> On 10.08.20 at 10:03, Agustín DallʼAlba wrote:
[…]
> > # uname -a
> > Linux susanita 4.15.0-111-generic #112-Ubuntu SMP Thu Jul 9 20:32:34
> > UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
> 
> This is a vendor kernel so you should ideally seek support through the
> vendor. This kernel is not even an LTS so it's not entirely clear
> which patches have/have not been backported. With btrfs it's
> advisable too use the latest stable kernel as each release brings bug
> fixes or at the very least (because always using the latest is not
> feasible) at least stick to a supported long-term stable kernel  -
> i.e 4.14, 4.19 or 5.4 (preferably 5.4)

The interesting thing about this recommendation is that to some extent
it amounts to:

Do not use distro / vendor kernels.

Should vendors consequently *just* exclude BTRFS from being shipped?

At some point in its development, I'd expect BTRFS to be stable enough
to be shipped in distro kernels, like XFS, Ext4 and other filesystems.

For me it appears to be. I use it with a 5.8 stable kernel on this
laptop, but on another laptop and on two server virtual machines I use
the standard Devuan 3 aka Debian 10 kernel (4.19). Without issues so far.

I am just raising this because I believe that at some point in time it
is important to say: it is *okay* to use vendor kernels. Still, probably
ask your vendor for support first, but it is basically *okay* to use
them. On the other hand, regarding Debian, I'd expect to reach far more
BTRFS experts on this mailing list than I would find in the Debian
kernel team. So I'd probably still ask here first.

What would need to happen for it to be okay to use vendor kernels? Is 
there a minimum LTS version where you would say it would be okay?

I am challenging this standard recommendation here because I am not sure
whether it is still accurate or helpful for recent distribution
releases. At some point BTRFS must have become as stable as XFS or Ext4,
I would think. Again, for me it is.

I have no idea why Ubuntu opted for a non-LTS kernel – especially as
4.15 is pretty old and so does not sound like it comes from a supported
Ubuntu release, unless it is an Ubuntu LTS release, in which case I'd
expect an LTS kernel to be used – but "-111" indicates they have added a
lot of patches by now. So maybe they provide some kind of LTS support
themselves.

Best,
-- 
Martin




* Re: raid10 corruption while removing failing disk
  2020-08-10  7:38   ` Martin Steigerwald
@ 2020-08-10  7:51     ` Nikolay Borisov
  2020-08-10  8:57       ` Martin Steigerwald
  2020-08-11  1:30       ` Chris Murphy
  2020-08-10  7:59     ` Agustín DallʼAlba
  1 sibling, 2 replies; 17+ messages in thread
From: Nikolay Borisov @ 2020-08-10  7:51 UTC (permalink / raw)
  To: Martin Steigerwald, Agustín DallʼAlba, linux-btrfs



On 10.08.20 at 10:38, Martin Steigerwald wrote:
> Hi Nikolay.
> 
> Nikolay Borisov - 10.08.20, 09:22:14 CEST:
>> On 10.08.20 at 10:03, Agustín DallʼAlba wrote:
> […]
>>> # uname -a
>>> Linux susanita 4.15.0-111-generic #112-Ubuntu SMP Thu Jul 9 20:32:34
>>> UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>>
>> This is a vendor kernel so you should ideally seek support through the
>> vendor. This kernel is not even an LTS so it's not entirely clear
>> which patches have/have not been backported. With btrfs it's
>> advisable too use the latest stable kernel as each release brings bug
>> fixes or at the very least (because always using the latest is not
>> feasible) at least stick to a supported long-term stable kernel  -
>> i.e 4.14, 4.19 or 5.4 (preferably 5.4)
> 
> The interesting thing with this recommendation is that it to some part 
> equals:
> 
> Do not use distro / vendor kernels.

On the contrary - it means to use kernels for which btrfs support is
provided. Namely, Suse distributes kernels with btrfs and has developers
who are familiar with the state of btrfs in their kernel. So if someone
hits a problem on a Suse kernel, they should report it to Suse and not
to the upstream mailing list. Suse's kernel (or any other vendor's, for
that matter) needn't look anything like the upstream kernel. The same
goes for Fedora or Ubuntu. Since time is limited, I (as an upstream
developer) would prefer to spend my time where it has the most impact -
upstream - and not spend possibly hours looking at some custom kernel.

> 
> Consequently vendors shall *just* exclude BTRFS from being shipped?

No, vendors should verify that what they offer is stable and, where they
diverge from upstream, should provide support. Fedora, for example,
spent the time to test btrfs for their use cases, and they have support
from some of the developers to fix the issues they find, because they
don't (yet) have the development expertise to deal with btrfs
themselves. Furthermore, I *think* Fedora sticks to unadorned upstream
kernels (don't quote me on this, I have never used Fedora).


> 
> At one point in BTRFS development, I'd expect BTRFS to be stable enough 
> to be shipped in distro kernels, like XFS, Ext4 and other filesystems.
> 
> For me it appears to be. I used it on 5.8 stable kernel on this laptop, 
> but on another laptop and on two server virtual machines I used the 
> standard Devuan 3 aka Debian 10 kernel (4.19). Without issues so far.
> 
> I am just raising this, cause I would believe that at one point it time 
> it is important to say: It is *okay* to use vendor kernels. Still 
> probably ask your vendor for support first, but it is basically *okay* to 
> use them. On the other hand, regarding Debian, I'd expect I could reach 
> way more experts regarding BTRFS issues on this mailing list than I 
> would find in the Debian kernel team. So I'd probably still ask here 
> first.

Sure, but what guarantees that Debian (or any other distro) has
backported whatever patches are necessary to reach a certain level of
stability? Upstream is a moving target, and we (at Suse at least) follow
an upstream-first policy, so most effort is spent there.

> 
> What would need to happen for it to be okay to use vendor kernels? Is 
> there a minimum LTS version where you would say it would be okay?
> 
> I am challenging this standard recommendation here, cause I am not sure 
> whether for recent distribution releases it would still be accurate or 
> helpful. At some point BTRFS got to be as stable as XFS or Ext4 I would 
> think. Again, for me it is.
> 
> I have no idea why Ubuntu opted to use a non LTS kernel – especially as 
> 4.15 is pretty old and so does not sound to come from a supported Ubuntu 
> release unless it is some Ubuntu LTS release, but then I'd expect a LTS 
> kernel to be used –, but "-111" indicates they added a lot of patches by 
> now. So maybe they provide some kind of LTS support themselves.

All of those are valid assumptions - however, without direct experience
with the Ubuntu kernel, it's not entirely clear how accurate they are.
Hence my recommendation to ask the Ubuntu kernel people, because they
should know best.

> 
> Best,
> 


* Re: raid10 corruption while removing failing disk
  2020-08-10  7:38   ` Martin Steigerwald
  2020-08-10  7:51     ` Nikolay Borisov
@ 2020-08-10  7:59     ` Agustín DallʼAlba
  1 sibling, 0 replies; 17+ messages in thread
From: Agustín DallʼAlba @ 2020-08-10  7:59 UTC (permalink / raw)
  To: Martin Steigerwald, linux-btrfs, Nikolay Borisov

On Mon, 2020-08-10 at 09:38 +0200, Martin Steigerwald wrote:
> I have no idea why Ubuntu opted to use a non LTS kernel – especially as 
> 4.15 is pretty old and so does not sound to come from a supported Ubuntu 
> release unless it is some Ubuntu LTS release, but then I'd expect a LTS 
> kernel to be used –, but "-111" indicates they added a lot of patches by 
> now. So maybe they provide some kind of LTS support themselves.

It is indeed the kernel of the 18.04 LTS Ubuntu release.

On Mon, 2020-08-10 at 10:22 +0300, Nikolay Borisov wrote:
> This is a vendor kernel so you should ideally seek support through the
> vendor. This kernel is not even an LTS so it's not entirely clear which
> patches have/have not been backported. With btrfs it's advisable too use
> the latest stable kernel as each release brings bug fixes or at the very
> least (because always using the latest is not feasible) at least stick
> to a supported long-term stable kernel  - i.e 4.14, 4.19 or 5.4
> (preferably 5.4)

Would it help to run a mainline kernel and report again, or is it 'too
late' for this filesystem? If so, should I use 5.4.57 or go straight to
5.8? For now I have forwarded my question to the Ubuntu kernel team
mailing list.

Kind regards.



* Re: raid10 corruption while removing failing disk
  2020-08-10  7:03 raid10 corruption while removing failing disk Agustín DallʼAlba
  2020-08-10  7:22 ` Nikolay Borisov
@ 2020-08-10  8:21 ` Nikolay Borisov
  2020-08-10 22:24   ` Zygo Blaxell
  2020-08-11  1:18   ` Agustín DallʼAlba
  2020-08-11  2:34 ` Chris Murphy
  2 siblings, 2 replies; 17+ messages in thread
From: Nikolay Borisov @ 2020-08-10  8:21 UTC (permalink / raw)
  To: Agustín DallʼAlba, linux-btrfs



On 10.08.20 at 10:03, Agustín DallʼAlba wrote:
> Hello!
> 
> The last quarterly scrub on our btrfs filesystem found a few bad
> sectors in one of its devices (/dev/sdd), and because there's nobody on
> site to replace the failing disk I decided to remove it from the array
> with `btrfs device remove` before the problem could get worse.
> 
> The removal was going relatively well (although slowly and I had to
> reboot a few times due to the bad sectors) until it had about 200 GB
> left to move. Now the filesystem turns read only when I try to finish
> the removal and `btrfs check` complains about wrong metadata checksums.
> However as far as I can tell none of the copies of the corrupt data are
> in the failing disk.
> 
> How could this happen? Is it possible to fix this filesystem?
> 
> I have refrained from trying anything so far, like upgrading to a newer
> kernel or disconnecting the failing drive, before confirming with you
> that it's safe.
> 
> Kind regards.
> 
> 
> # uname -a
> Linux susanita 4.15.0-111-generic #112-Ubuntu SMP Thu Jul 9 20:32:34
> UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
> 
> 
> # btrfs --version
> btrfs-progs v4.15.1
> 
> 
> # btrfs fi show
> Label: 'Susanita'  uuid: 4d3acf20-d408-49ab-b0a6-182396a9f27c
> 	Total devices 5 FS bytes used 4.90TiB
> 	devid    1 size 3.64TiB used 3.42TiB path /dev/sda
> 	devid    2 size 3.64TiB used 3.42TiB path /dev/sde
> 	devid    3 size 1.82TiB used 1.59TiB path /dev/sdb
> 	devid    5 size 0.00B used 185.50GiB path /dev/sdd
> 	devid    6 size 1.82TiB used 1.22TiB path /dev/sdc
> 
> 
> # btrfs fi df /
> Data, RAID1: total=4.90TiB, used=4.90TiB
> System, RAID10: total=64.00MiB, used=880.00KiB
> Metadata, RAID10: total=9.00GiB, used=7.57GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
> 
> 
> # btrfs check --force --readonly /dev/sda
> WARNING: filesystem mounted, continuing because of --force
> Checking filesystem on /dev/sda
> UUID: 4d3acf20-d408-49ab-b0a6-182396a9f27c
> checksum verify failed on 10919566688256 found BAB1746E wanted A8A48266
> checksum verify failed on 10919566688256 found BAB1746E wanted A8A48266
> bytenr mismatch, want=10919566688256, have=17196831625821864417
> ERROR: failed to repair root items: Input/output error
> 
> # btrfs-map-logical -l 10919566688256 /dev/sda
> mirror 1 logical 10919566688256 physical 394473357312 device /dev/sdc
> mirror 2 logical 10919566688256 physical 477218586624 device /dev/sda
> 
> 
> Relevant dmesg output:
> [    4.963420] Btrfs loaded, crc32c=crc32c-generic
> [    5.072878] BTRFS: device label Susanita devid 6 transid 4241535 /dev/sdc
> [    5.073165] BTRFS: device label Susanita devid 3 transid 4241535 /dev/sdb
> [    5.073713] BTRFS: device label Susanita devid 2 transid 4241535 /dev/sde
> [    5.073916] BTRFS: device label Susanita devid 5 transid 4241535 /dev/sdd
> [    5.074398] BTRFS: device label Susanita devid 1 transid 4241535 /dev/sda
> [    5.152479] BTRFS info (device sda): disk space caching is enabled
> [    5.152551] BTRFS info (device sda): has skinny extents
> [    5.332538] BTRFS info (device sda): bdev /dev/sdd errs: wr 0, rd 24, flush 0, corrupt 0, gen 0
> [   38.869423] BTRFS info (device sda): enabling auto defrag
> [   38.869490] BTRFS info (device sda): use lzo compression, level 0
> [   38.869547] BTRFS info (device sda): disk space caching is enabled
> 
> 
> After running btrfs device remove /dev/sdd /:
> [  193.684703] BTRFS info (device sda): relocating block group 10593404846080 flags metadata|raid10
> [  312.921934] BTRFS error (device sda): bad tree block start 10597444141056 10919566688256
> [  313.034339] BTRFS error (device sda): bad tree block start 17196831625821864417 10919566688256
> [  313.034595] BTRFS error (device sda): bad tree block start 10597444141056 10919566688256
> [  313.034621] BTRFS: error (device sda) in btrfs_run_delayed_refs:3083: errno=-5 IO failure
> [  313.034627] BTRFS info (device sda): forced readonly
> [  313.036328] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> [  313.036596] IP: merge_reloc_roots+0x19f/0x2c0 [btrfs]

This suggests you are hitting a known problem with reloc roots that has
been fixed in the latest upstream and LTS (5.4) kernels:

044ca910276b btrfs: reloc: fix reloc root leak and NULL pointer dereference (3 months ago) <Qu Wenruo>
707de9c0806d btrfs: relocation: fix reloc_root lifespan and access (7 months ago) <Qu Wenruo>
1fac4a54374f btrfs: relocation: fix use-after-free on dead relocation roots (11 months ago) <Qu Wenruo>


So yes, try to update to the latest stable kernel and re-run the device
remove. Also update your btrfs-progs to the latest 5.6 version and rerun
the check (by default it's a read-only operation, so it shouldn't cause
any more damage).
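
[Editorial note: a rough shell sketch of that sequence. This is a hedged illustration, not commands to paste blindly: the device paths are taken from this thread, the 5.4 threshold comes from the LTS kernel named above, and the version helper is just a convenience. The btrfs commands are left commented out because they act on real devices.]

```shell
# Helper: succeeds when version $1 >= version $2 (uses GNU sort -V).
at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Check that the running kernel is new enough to contain the reloc fixes.
kver=$(uname -r | cut -d- -f1)
if at_least "$kver" 5.4; then
    echo "kernel $kver should contain the relocation fixes"
    # btrfs check --readonly /dev/sda   # read-only by default, safe to repeat
    # btrfs device remove /dev/sdd /    # retry the relocation
else
    echo "kernel $kver is too old; update before retrying the device remove"
fi
```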


<snip>


* Re: raid10 corruption while removing failing disk
  2020-08-10  7:51     ` Nikolay Borisov
@ 2020-08-10  8:57       ` Martin Steigerwald
  2020-08-11  1:30       ` Chris Murphy
  1 sibling, 0 replies; 17+ messages in thread
From: Martin Steigerwald @ 2020-08-10  8:57 UTC (permalink / raw)
  To: Agustín DallʼAlba, linux-btrfs, Nikolay Borisov

Nikolay Borisov - 10.08.20, 09:51:47 CEST:
> On 10.08.20 г. 10:38 ч., Martin Steigerwald wrote:
> > Hi Nikolay.
> > 
> > Nikolay Borisov - 10.08.20, 09:22:14 CEST:
> >> On 10.08.20 at 10:03, Agustín DallʼAlba wrote:
> > […]
> > 
> >>> # uname -a
> >>> Linux susanita 4.15.0-111-generic #112-Ubuntu SMP Thu Jul 9
> >>> 20:32:34
> >>> UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
> >> 
> >> This is a vendor kernel so you should ideally seek support through
> >> the vendor. This kernel is not even an LTS so it's not entirely
> >> clear which patches have/have not been backported. With btrfs it's
> >> advisable too use the latest stable kernel as each release brings
> >> bug fixes or at the very least (because always using the latest is
> >> not feasible) at least stick to a supported long-term stable
> >> kernel  - i.e 4.14, 4.19 or 5.4 (preferably 5.4)
> > 
> > The interesting thing with this recommendation is that it to some
> > part equals:
> > 
> > Do not use distro / vendor kernels.
> 
> On the contrary - it means to use kernels which have support for
> btrfs. Namely - Suse distributes kernels with btrfs + have developers
> who are familiar with the state of btrfs on their kernel. So if
> someone hits a problem on a Suse kernel  - they should report this to
> Suse and not the upstream mailing list. Suse's (or any other vendor
> for that matter) needn't look anything like the upstream kernel. Same
> thing with Fedora or Ubuntu. Since time is limited I (as an upstream
> developer) would prefer to spend my time where it would have the most
> impact - upstream and not spend possibly hours looking at some custom
> kernel.

While I get your argument, and I bet SUSE's support for BTRFS has
improved a lot, I also still remember that SUSE supported BTRFS in SLES
while it was still unstable. With SLES 11 SP2 or SP3 (I am not sure from
memory which one it was) I had 2 GiB free in a BTRFS file system, yet
got "no space left on device" with no ability to delete a file, remove a
snapshot or do anything else about it, except to add a new virtual disk
to BTRFS and then delete something. All of this happened because the
standard settings for Snapper created a huge number of snapshots. I do
not know how many for sure, but I made sure I had quite some free space
when I installed SLES, knowing of the free-space-related issues back
then. All I did to trigger the failure condition was to install some
OpenLDAP for some example solutions for a training of mine and let it
sit overnight. The next morning there was no remote desktop login
anymore due to the out-of-space issue.

So my firm statement here is: SUSE shipped an unstable version of BTRFS
in this older SLES release. In my trainings I recommend at least SLES
12, preferably SLES 15, for BTRFS usage, preferably with the latest
service pack. I cannot prove it anymore without installing such a VM
again, because I no longer have that VM image. But I remember what I
saw. With SLES 11 SP2/3, BTRFS was not really ready for production use –
that was at least my experience.

So I still wonder: when is BTRFS upstream to be considered stable enough
to be used in distro kernels, at least when they are based on LTS
kernels and regularly updated to their latest releases? I believe it
would, or should, be by now. Or, asked differently: since which upstream
LTS kernel can BTRFS be considered stable enough for *production use*?

My last issues were around Linux 4.4 or 4.5. Since 4.6 I personally have
had no issues anymore… but what is the upstream developers' view on this?

[…]
> > What would need to happen for it to be okay to use vendor kernels?
> > Is
> > there a minimum LTS version where you would say it would be okay?
> > 
> > I am challenging this standard recommendation here, cause I am not
> > sure whether for recent distribution releases it would still be
> > accurate or helpful. At some point BTRFS got to be as stable as XFS
> > or Ext4 I would think. Again, for me it is.
> > 
> > I have no idea why Ubuntu opted to use a non LTS kernel – especially
> > as 4.15 is pretty old and so does not sound to come from a
> > supported Ubuntu release unless it is some Ubuntu LTS release, but
> > then I'd expect a LTS kernel to be used –, but "-111" indicates
> > they added a lot of patches by now. So maybe they provide some kind
> > of LTS support themselves.
> All those are valid assumptions - however without direct experience
> with the Ubuntu kernel it's not entirely clear how accurate they are.
> Hence my recommendation to address Ubuntu kernel people because they
> should know best.

Fair enough. I get the recommendation to test against an upstream
kernel.

At the same time, you have already helped Agustín with an idea that the
Ubuntu kernel team may not have been aware of.

So I am still challenging any notion that people should not write to
this list when they run into trouble, even when they use some recent
enough distribution kernel. My question still is: what would be recent
enough? 4.19, 5.4, 4.6, 5.7? When would upstream consider it recent
enough to ask here?

At the same time, of course, you are free not to respond to a mail if
you consider it too much trouble or time…

Ciao,
-- 
Martin




* Re: raid10 corruption while removing failing disk
  2020-08-10  8:21 ` Nikolay Borisov
@ 2020-08-10 22:24   ` Zygo Blaxell
  2020-08-11  1:18   ` Agustín DallʼAlba
  1 sibling, 0 replies; 17+ messages in thread
From: Zygo Blaxell @ 2020-08-10 22:24 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Agustín DallʼAlba, linux-btrfs

On Mon, Aug 10, 2020 at 11:21:29AM +0300, Nikolay Borisov wrote:
> 
> 
> On 10.08.20 г. 10:03 ч., Agustín DallʼAlba wrote:
> > Hello!
> > 
> > The last quarterly scrub on our btrfs filesystem found a few bad
> > sectors in one of its devices (/dev/sdd), and because there's nobody on
> > site to replace the failing disk I decided to remove it from the array
> > with `btrfs device remove` before the problem could get worse.
> > 
> > The removal was going relatively well (although slowly and I had to
> > reboot a few times due to the bad sectors) until it had about 200 GB
> > left to move. Now the filesystem turns read only when I try to finish
> > the removal and `btrfs check` complains about wrong metadata checksums.
> > However as far as I can tell none of the copies of the corrupt data are
> > in the failing disk.
> > 
> > How could this happen? Is it possible to fix this filesystem?
> > 
> > I have refrained from trying anything so far, like upgrading to a newer
> > kernel or disconnecting the failing drive, before confirming with you
> > that it's safe.
> > 
> > Kind regards.
> > 
> > 
> > # uname -a
> > Linux susanita 4.15.0-111-generic #112-Ubuntu SMP Thu Jul 9 20:32:34
> > UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
> > 
> > 
> > # btrfs --version
> > btrfs-progs v4.15.1
> > 
> > 
> > # btrfs fi show
> > Label: 'Susanita'  uuid: 4d3acf20-d408-49ab-b0a6-182396a9f27c
> > 	Total devices 5 FS bytes used 4.90TiB
> > 	devid    1 size 3.64TiB used 3.42TiB path /dev/sda
> > 	devid    2 size 3.64TiB used 3.42TiB path /dev/sde
> > 	devid    3 size 1.82TiB used 1.59TiB path /dev/sdb
> > 	devid    5 size 0.00B used 185.50GiB path /dev/sdd
> > 	devid    6 size 1.82TiB used 1.22TiB path /dev/sdc
> > 
> > 
> > # btrfs fi df /
> > Data, RAID1: total=4.90TiB, used=4.90TiB
> > System, RAID10: total=64.00MiB, used=880.00KiB
> > Metadata, RAID10: total=9.00GiB, used=7.57GiB
> > GlobalReserve, single: total=512.00MiB, used=0.00B
> > 
> > 
> > # btrfs check --force --readonly /dev/sda
> > WARNING: filesystem mounted, continuing because of --force
> > Checking filesystem on /dev/sda
> > UUID: 4d3acf20-d408-49ab-b0a6-182396a9f27c
> > checksum verify failed on 10919566688256 found BAB1746E wanted A8A48266
> > checksum verify failed on 10919566688256 found BAB1746E wanted A8A48266
> > bytenr mismatch, want=10919566688256, have=17196831625821864417
> > ERROR: failed to repair root items: Input/output error
> > 
> > # btrfs-map-logical -l 10919566688256 /dev/sda
> > mirror 1 logical 10919566688256 physical 394473357312 device /dev/sdc
> > mirror 2 logical 10919566688256 physical 477218586624 device /dev/sda
> > 
> > 
> > Relevant dmesg output:
> > [    4.963420] Btrfs loaded, crc32c=crc32c-generic
> > [    5.072878] BTRFS: device label Susanita devid 6 transid 4241535 /dev/sdc
> > [    5.073165] BTRFS: device label Susanita devid 3 transid 4241535 /dev/sdb
> > [    5.073713] BTRFS: device label Susanita devid 2 transid 4241535 /dev/sde
> > [    5.073916] BTRFS: device label Susanita devid 5 transid 4241535 /dev/sdd
> > [    5.074398] BTRFS: device label Susanita devid 1 transid 4241535 /dev/sda
> > [    5.152479] BTRFS info (device sda): disk space caching is enabled
> > [    5.152551] BTRFS info (device sda): has skinny extents
> > [    5.332538] BTRFS info (device sda): bdev /dev/sdd errs: wr 0, rd 24, flush 0, corrupt 0, gen 0
> > [   38.869423] BTRFS info (device sda): enabling auto defrag
> > [   38.869490] BTRFS info (device sda): use lzo compression, level 0
> > [   38.869547] BTRFS info (device sda): disk space caching is enabled
> > 
> > 
> > After running btrfs device remove /dev/sdd /:
> > [  193.684703] BTRFS info (device sda): relocating block group 10593404846080 flags metadata|raid10
> > [  312.921934] BTRFS error (device sda): bad tree block start 10597444141056 10919566688256
> > [  313.034339] BTRFS error (device sda): bad tree block start 17196831625821864417 10919566688256
> > [  313.034595] BTRFS error (device sda): bad tree block start 10597444141056 10919566688256
> > [  313.034621] BTRFS: error (device sda) in btrfs_run_delayed_refs:3083: errno=-5 IO failure
> > [  313.034627] BTRFS info (device sda): forced readonly
> > [  313.036328] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> > [  313.036596] IP: merge_reloc_roots+0x19f/0x2c0 [btrfs]
> 
> This suggests you are hitting a known problem with reloc roots which
> have been fixed in the latest upstream and lts (5.4) kernels:
> 
> 044ca910276b btrfs: reloc: fix reloc root leak and NULL pointer
> dereference (3 months ago) <Qu Wenruo>
> 707de9c0806d btrfs: relocation: fix reloc_root lifespan and access (7
> months ago) <Qu Wenruo>
> 1fac4a54374f btrfs: relocation: fix use-after-free on dead relocation
> roots (11 months ago) <Qu Wenruo>

Those commits fix a bug that did not exist in btrfs before 5.1.  What is
the rationale for these commits being relevant to a 4.15 kernel?

> So yes, try to update to latest stable kernel and re-run the device
> remove. Also update your btrfs progs to latest 5.6 version and rerun
> check again (by default it's a read-only operation so it shouldn't
> cause any more damage).

> 
> <snip>


* Re: raid10 corruption while removing failing disk
  2020-08-10  8:21 ` Nikolay Borisov
  2020-08-10 22:24   ` Zygo Blaxell
@ 2020-08-11  1:18   ` Agustín DallʼAlba
  2020-08-11  1:48     ` Chris Murphy
  1 sibling, 1 reply; 17+ messages in thread
From: Agustín DallʼAlba @ 2020-08-11  1:18 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs, Zygo Blaxell

On Mon, 2020-08-10 at 11:21 +0300, Nikolay Borisov wrote:
> This suggests you are hitting a known problem with reloc roots which
> have been fixed in the latest upstream and lts (5.4) kernels:
> 
> 044ca910276b btrfs: reloc: fix reloc root leak and NULL pointer
> dereference (3 months ago) <Qu Wenruo>
> 707de9c0806d btrfs: relocation: fix reloc_root lifespan and access (7
> months ago) <Qu Wenruo>
> 1fac4a54374f btrfs: relocation: fix use-after-free on dead relocation
> roots (11 months ago) <Qu Wenruo>
> 
> 
> So yes, try to update to latest stable kernel and re-run the device
> remove. Also update your btrfs progs to latest 5.6 version and rerun
> check again (by default it's a read-only operation so it shouldn't
> cause any more damage).

I have tried again with the 5.8.0 kernel and btrfs-progs v5.7 (which
I've compiled statically on a different machine and used only for btrfs
device remove and btrfs check). The system still goes read-only when I
attempt to remove the failing drive, but it doesn't oops in this
version.

This version of btrfs check finds many more problems; however, the
'checksum verify failed' lines look suspicious: instead of `found
BAB1746E wanted A8A48266` it prints `found 0000006E wanted 00000066`,
as if the checksums had been truncated to 8 bits before printing.

I still haven't tried disconnecting the failing disk.

Thanks again.

# mount / -o remount,ro
# /root/btrfs.box.static check --force /dev/sda
Opening filesystem to check...
WARNING: filesystem mounted, continuing because of --force
Checking filesystem on /dev/sda
UUID: 4d3acf20-d408-49ab-b0a6-182396a9f27c
[1/7] checking root items
checksum verify failed on 10919566688256 found 0000006E wanted 00000066
checksum verify failed on 10919566688256 found 0000006E wanted 00000066
bad tree block 10919566688256, bytenr mismatch, want=10919566688256, have=17196831625821864417
ERROR: failed to repair root items: Input/output error
[2/7] checking extents
checksum verify failed on 10919566688256 found 0000006E wanted 00000066
checksum verify failed on 10919566688256 found 0000006E wanted 00000066
bad tree block 10919566688256, bytenr mismatch, want=10919566688256, have=17196831625821864417
ref mismatch on [1927970496512 266240] extent item 0, found 1
data backref 1927970496512 parent 10916974444544 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927970496512 parent 10916974444544 owner 0 offset 0 found 1 wanted 0 back 0x4e8429e0
backpointer mismatch on [1927970496512 266240]
ref mismatch on [1927970762752 262144] extent item 0, found 1
data backref 1927970762752 parent 10916974444544 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927970762752 parent 10916974444544 owner 0 offset 0 found 1 wanted 0 back 0x37cee2e0
backpointer mismatch on [1927970762752 262144]
ref mismatch on [1927971024896 262144] extent item 0, found 1
data backref 1927971024896 parent 10916974444544 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927971024896 parent 10916974444544 owner 0 offset 0 found 1 wanted 0 back 0x4ce044a0
backpointer mismatch on [1927971024896 262144]
ref mismatch on [1927971287040 528384] extent item 0, found 1
data backref 1927971287040 parent 10916974444544 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927971287040 parent 10916974444544 owner 0 offset 0 found 1 wanted 0 back 0x4e848680
backpointer mismatch on [1927971287040 528384]
ref mismatch on [1927971815424 790528] extent item 0, found 1
data backref 1927971815424 parent 10916974444544 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927971815424 parent 10916974444544 owner 0 offset 0 found 1 wanted 0 back 0x4076da70
backpointer mismatch on [1927971815424 790528]
ref mismatch on [1927972605952 524288] extent item 0, found 1
data backref 1927972605952 parent 10916974444544 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927972605952 parent 10916974444544 owner 0 offset 0 found 1 wanted 0 back 0x37cee410
backpointer mismatch on [1927972605952 524288]
ref mismatch on [1927973130240 262144] extent item 0, found 1
data backref 1927973130240 parent 10916974444544 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927973130240 parent 10916974444544 owner 0 offset 0 found 1 wanted 0 back 0x4075bde0
backpointer mismatch on [1927973130240 262144]
ref mismatch on [1927973392384 262144] extent item 0, found 1
data backref 1927973392384 parent 10916974444544 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927973392384 parent 10916974444544 owner 0 offset 0 found 1 wanted 0 back 0x426b5430
backpointer mismatch on [1927973392384 262144]
ref mismatch on [1927973654528 262144] extent item 0, found 1
data backref 1927973654528 parent 10916974444544 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927973654528 parent 10916974444544 owner 0 offset 0 found 1 wanted 0 back 0x4076c510
backpointer mismatch on [1927973654528 262144]
ref mismatch on [1927973916672 262144] extent item 0, found 1
data backref 1927973916672 parent 10916974444544 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927973916672 parent 10916974444544 owner 0 offset 0 found 1 wanted 0 back 0x49784810
backpointer mismatch on [1927973916672 262144]
ref mismatch on [1927974178816 262144] extent item 0, found 1
data backref 1927974178816 parent 10916974444544 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927974178816 parent 10916974444544 owner 0 offset 0 found 1 wanted 0 back 0x37ce3520
backpointer mismatch on [1927974178816 262144]
ref mismatch on [1927974440960 262144] extent item 0, found 1
data backref 1927974440960 parent 10917170806784 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927974440960 parent 10917170806784 owner 0 offset 0 found 1 wanted 0 back 0x426b8f30
backpointer mismatch on [1927974440960 262144]
ref mismatch on [1927974703104 262144] extent item 0, found 1
data backref 1927974703104 parent 10916974444544 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927974703104 parent 10916974444544 owner 0 offset 0 found 1 wanted 0 back 0x4076e2c0
backpointer mismatch on [1927974703104 262144]
ref mismatch on [1927974965248 397312] extent item 0, found 1
data backref 1927974965248 parent 10916975575040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927974965248 parent 10916975575040 owner 0 offset 0 found 1 wanted 0 back 0x4076e520
backpointer mismatch on [1927974965248 397312]
ref mismatch on [1927975362560 262144] extent item 0, found 1
data backref 1927975362560 parent 10916975575040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927975362560 parent 10916975575040 owner 0 offset 0 found 1 wanted 0 back 0x4e84d870
backpointer mismatch on [1927975362560 262144]
ref mismatch on [1927975624704 262144] extent item 0, found 1
data backref 1927975624704 parent 10916975575040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927975624704 parent 10916975575040 owner 0 offset 0 found 1 wanted 0 back 0x4ce04a90
backpointer mismatch on [1927975624704 262144]
ref mismatch on [1927975886848 528384] extent item 0, found 1
data backref 1927975886848 parent 10916975575040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927975886848 parent 10916975575040 owner 0 offset 0 found 1 wanted 0 back 0x37ceeec0
backpointer mismatch on [1927975886848 528384]
ref mismatch on [1927976415232 262144] extent item 0, found 1
data backref 1927976415232 parent 10916975575040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927976415232 parent 10916975575040 owner 0 offset 0 found 1 wanted 0 back 0x4076e650
backpointer mismatch on [1927976415232 262144]
ref mismatch on [1927976677376 524288] extent item 0, found 1
data backref 1927976677376 parent 10916975575040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927976677376 parent 10916975575040 owner 0 offset 0 found 1 wanted 0 back 0x37cef120
backpointer mismatch on [1927976677376 524288]
ref mismatch on [1927977201664 528384] extent item 0, found 1
data backref 1927977201664 parent 10916975575040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927977201664 parent 10916975575040 owner 0 offset 0 found 1 wanted 0 back 0x4076e9e0
backpointer mismatch on [1927977201664 528384]
ref mismatch on [1927977730048 262144] extent item 0, found 1
data backref 1927977730048 parent 10916975575040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927977730048 parent 10916975575040 owner 0 offset 0 found 1 wanted 0 back 0x4076eb10
backpointer mismatch on [1927977730048 262144]
ref mismatch on [1927977992192 528384] extent item 0, found 1
data backref 1927977992192 parent 10916975575040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927977992192 parent 10916975575040 owner 0 offset 0 found 1 wanted 0 back 0x37cef250
backpointer mismatch on [1927977992192 528384]
ref mismatch on [1927978520576 262144] extent item 0, found 1
data backref 1927978520576 parent 10918936985600 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927978520576 parent 10918936985600 owner 0 offset 0 found 1 wanted 0 back 0x1f1d8590
backpointer mismatch on [1927978520576 262144]
ref mismatch on [1927978782720 262144] extent item 0, found 1
data backref 1927978782720 parent 10916975575040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927978782720 parent 10916975575040 owner 0 offset 0 found 1 wanted 0 back 0x4076ec40
backpointer mismatch on [1927978782720 262144]
ref mismatch on [1927979044864 262144] extent item 0, found 1
data backref 1927979044864 parent 10916974444544 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927979044864 parent 10916974444544 owner 0 offset 0 found 1 wanted 0 back 0x4076d810
backpointer mismatch on [1927979044864 262144]
ref mismatch on [1927979307008 262144] extent item 0, found 1
data backref 1927979307008 parent 10916975575040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927979307008 parent 10916975575040 owner 0 offset 0 found 1 wanted 0 back 0x4075cc20
backpointer mismatch on [1927979307008 262144]
ref mismatch on [1927979569152 262144] extent item 0, found 1
data backref 1927979569152 parent 10916975575040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927979569152 parent 10916975575040 owner 0 offset 0 found 1 wanted 0 back 0x4075ce80
backpointer mismatch on [1927979569152 262144]
ref mismatch on [1927979831296 262144] extent item 0, found 1
data backref 1927979831296 parent 10916975575040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927979831296 parent 10916975575040 owner 0 offset 0 found 1 wanted 0 back 0x4076f490
backpointer mismatch on [1927979831296 262144]
ref mismatch on [1927980093440 262144] extent item 0, found 1
data backref 1927980093440 parent 10918936985600 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927980093440 parent 10918936985600 owner 0 offset 0 found 1 wanted 0 back 0x26a1ae90
backpointer mismatch on [1927980093440 262144]
ref mismatch on [1927980355584 266240] extent item 0, found 1
data backref 1927980355584 parent 10916975575040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927980355584 parent 10916975575040 owner 0 offset 0 found 1 wanted 0 back 0x4ce11150
backpointer mismatch on [1927980355584 266240]
ref mismatch on [1927980621824 262144] extent item 0, found 1
data backref 1927980621824 parent 10916975575040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927980621824 parent 10916975575040 owner 0 offset 0 found 1 wanted 0 back 0x4075d210
backpointer mismatch on [1927980621824 262144]
ref mismatch on [1927980883968 262144] extent item 0, found 1
data backref 1927980883968 parent 10916975575040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927980883968 parent 10916975575040 owner 0 offset 0 found 1 wanted 0 back 0x426b57c0
backpointer mismatch on [1927980883968 262144]
ref mismatch on [1927981146112 397312] extent item 0, found 1
data backref 1927981146112 parent 10916975575040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927981146112 parent 10916975575040 owner 0 offset 0 found 1 wanted 0 back 0x37ce45c0
backpointer mismatch on [1927981146112 397312]
ref mismatch on [1927981543424 397312] extent item 0, found 1
data backref 1927981543424 parent 10916975575040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927981543424 parent 10916975575040 owner 0 offset 0 found 1 wanted 0 back 0x4ce04e20
backpointer mismatch on [1927981543424 397312]
ref mismatch on [1927981940736 262144] extent item 0, found 1
data backref 1927981940736 parent 10916975575040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927981940736 parent 10916975575040 owner 0 offset 0 found 1 wanted 0 back 0x4ce04cf0
backpointer mismatch on [1927981940736 262144]
ref mismatch on [1927982202880 262144] extent item 0, found 1
data backref 1927982202880 parent 10918936985600 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927982202880 parent 10918936985600 owner 0 offset 0 found 1 wanted 0 back 0x1f1da010
backpointer mismatch on [1927982202880 262144]
ref mismatch on [1927982465024 131072] extent item 0, found 1
data backref 1927982465024 parent 10916974444544 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927982465024 parent 10916974444544 owner 0 offset 0 found 1 wanted 0 back 0x4e84d150
backpointer mismatch on [1927982465024 131072]
ref mismatch on [1927982596096 131072] extent item 0, found 1
data backref 1927982596096 parent 10917135319040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927982596096 parent 10917135319040 owner 0 offset 0 found 1 wanted 0 back 0x497949b0
backpointer mismatch on [1927982596096 131072]
ref mismatch on [1927982727168 262144] extent item 0, found 1
data backref 1927982727168 parent 10916975575040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927982727168 parent 10916975575040 owner 0 offset 0 found 1 wanted 0 back 0x37ce4950
backpointer mismatch on [1927982727168 262144]
ref mismatch on [1927982989312 262144] extent item 0, found 1
data backref 1927982989312 parent 10916975575040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927982989312 parent 10916975575040 owner 0 offset 0 found 1 wanted 0 back 0x37cefe30
backpointer mismatch on [1927982989312 262144]
ref mismatch on [1927983251456 4096] extent item 0, found 1
data backref 1927983251456 parent 10593646477312 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927983251456 parent 10593646477312 owner 0 offset 0 found 1 wanted 0 back 0x37cf9500
backpointer mismatch on [1927983251456 4096]
ref mismatch on [1927983255552 266240] extent item 0, found 1
data backref 1927983255552 parent 10916975722496 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927983255552 parent 10916975722496 owner 0 offset 0 found 1 wanted 0 back 0x4ce11870
backpointer mismatch on [1927983255552 266240]
ref mismatch on [1927983521792 262144] extent item 0, found 1
data backref 1927983521792 parent 10916975722496 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927983521792 parent 10916975722496 owner 0 offset 0 found 1 wanted 0 back 0x49792ad0
backpointer mismatch on [1927983521792 262144]
ref mismatch on [1927983783936 262144] extent item 0, found 1
data backref 1927983783936 parent 10916974444544 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927983783936 parent 10916974444544 owner 0 offset 0 found 1 wanted 0 back 0x4c7a4ea0
backpointer mismatch on [1927983783936 262144]
ref mismatch on [1927984046080 262144] extent item 0, found 1
data backref 1927984046080 parent 10916975722496 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927984046080 parent 10916975722496 owner 0 offset 0 found 1 wanted 0 back 0x49792f90
backpointer mismatch on [1927984046080 262144]
ref mismatch on [1927984308224 262144] extent item 0, found 1
data backref 1927984308224 parent 10916975722496 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927984308224 parent 10916975722496 owner 0 offset 0 found 1 wanted 0 back 0x497930c0
backpointer mismatch on [1927984308224 262144]
ref mismatch on [1927984570368 262144] extent item 0, found 1
data backref 1927984570368 parent 10916975722496 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927984570368 parent 10916975722496 owner 0 offset 0 found 1 wanted 0 back 0x497931f0
backpointer mismatch on [1927984570368 262144]
ref mismatch on [1927984832512 262144] extent item 0, found 1
data backref 1927984832512 parent 10916975722496 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927984832512 parent 10916975722496 owner 0 offset 0 found 1 wanted 0 back 0x40770b20
backpointer mismatch on [1927984832512 262144]
ref mismatch on [1927985094656 262144] extent item 0, found 1
data backref 1927985094656 parent 10916975722496 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927985094656 parent 10916975722496 owner 0 offset 0 found 1 wanted 0 back 0x37ce51a0
backpointer mismatch on [1927985094656 262144]
ref mismatch on [1927985356800 262144] extent item 0, found 1
data backref 1927985356800 parent 10916975722496 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927985356800 parent 10916975722496 owner 0 offset 0 found 1 wanted 0 back 0x49793450
backpointer mismatch on [1927985356800 262144]
ref mismatch on [1927985618944 397312] extent item 0, found 1
data backref 1927985618944 parent 10916975722496 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927985618944 parent 10916975722496 owner 0 offset 0 found 1 wanted 0 back 0x37cf0a10
backpointer mismatch on [1927985618944 397312]
ref mismatch on [1927986016256 528384] extent item 0, found 1
data backref 1927986016256 parent 10916975722496 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927986016256 parent 10916975722496 owner 0 offset 0 found 1 wanted 0 back 0x40771900
backpointer mismatch on [1927986016256 528384]
ref mismatch on [1927986544640 262144] extent item 0, found 1
data backref 1927986544640 parent 10918936985600 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927986544640 parent 10918936985600 owner 0 offset 0 found 1 wanted 0 back 0x33398bc0
backpointer mismatch on [1927986544640 262144]
ref mismatch on [1927986806784 262144] extent item 0, found 1
data backref 1927986806784 parent 10917170806784 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927986806784 parent 10917170806784 owner 0 offset 0 found 1 wanted 0 back 0x426b9780
backpointer mismatch on [1927986806784 262144]
ref mismatch on [1927987068928 262144] extent item 0, found 1
data backref 1927987068928 parent 10916975722496 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927987068928 parent 10916975722496 owner 0 offset 0 found 1 wanted 0 back 0x40771c90
backpointer mismatch on [1927987068928 262144]
ref mismatch on [1927987331072 397312] extent item 0, found 1
data backref 1927987331072 parent 10916975722496 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927987331072 parent 10916975722496 owner 0 offset 0 found 1 wanted 0 back 0x49794030
backpointer mismatch on [1927987331072 397312]
ref mismatch on [1927987728384 262144] extent item 0, found 1
data backref 1927987728384 parent 10916975722496 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927987728384 parent 10916975722496 owner 0 offset 0 found 1 wanted 0 back 0x37ce59f0
backpointer mismatch on [1927987728384 262144]
ref mismatch on [1927987990528 262144] extent item 0, found 1
data backref 1927987990528 parent 10918936985600 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927987990528 parent 10918936985600 owner 0 offset 0 found 1 wanted 0 back 0x4ce33b10
backpointer mismatch on [1927987990528 262144]
ref mismatch on [1927988252672 262144] extent item 0, found 1
data backref 1927988252672 parent 10917135319040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927988252672 parent 10917135319040 owner 0 offset 0 found 1 wanted 0 back 0x426b7050
backpointer mismatch on [1927988252672 262144]
ref mismatch on [1927988514816 262144] extent item 0, found 1
data backref 1927988514816 parent 10917135319040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927988514816 parent 10917135319040 owner 0 offset 0 found 1 wanted 0 back 0x4e83fb50
backpointer mismatch on [1927988514816 262144]
ref mismatch on [1927988776960 262144] extent item 0, found 1
data backref 1927988776960 parent 10917135319040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927988776960 parent 10917135319040 owner 0 offset 0 found 1 wanted 0 back 0x40772c00
backpointer mismatch on [1927988776960 262144]
ref mismatch on [1927989039104 262144] extent item 0, found 1
data backref 1927989039104 parent 10917135319040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927989039104 parent 10917135319040 owner 0 offset 0 found 1 wanted 0 back 0x37ce6f50
backpointer mismatch on [1927989039104 262144]
ref mismatch on [1927989301248 262144] extent item 0, found 1
data backref 1927989301248 parent 10917135319040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927989301248 parent 10917135319040 owner 0 offset 0 found 1 wanted 0 back 0x407731f0
backpointer mismatch on [1927989301248 262144]
ref mismatch on [1927989563392 262144] extent item 0, found 1
data backref 1927989563392 parent 10917135319040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927989563392 parent 10917135319040 owner 0 offset 0 found 1 wanted 0 back 0x49795590
backpointer mismatch on [1927989563392 262144]
ref mismatch on [1927989825536 397312] extent item 0, found 1
data backref 1927989825536 parent 10917135319040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927989825536 parent 10917135319040 owner 0 offset 0 found 1 wanted 0 back 0x426b7b00
backpointer mismatch on [1927989825536 397312]
ref mismatch on [1927990222848 262144] extent item 0, found 1
data backref 1927990222848 parent 10917135319040 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927990222848 parent 10917135319040 owner 0 offset 0 found 1 wanted 0 back 0x407602c0
backpointer mismatch on [1927990222848 262144]
ref mismatch on [1927990484992 262144] extent item 0, found 1
data backref 1927990484992 parent 10917170806784 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927990484992 parent 10917170806784 owner 0 offset 0 found 1 wanted 0 back 0x37cf33a0
backpointer mismatch on [1927990484992 262144]
ref mismatch on [1927990747136 397312] extent item 0, found 1
data backref 1927990747136 parent 10917170806784 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927990747136 parent 10917170806784 owner 0 offset 0 found 1 wanted 0 back 0x106d0350
backpointer mismatch on [1927990747136 397312]
ref mismatch on [1927991144448 262144] extent item 0, found 1
data backref 1927991144448 parent 10917170806784 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927991144448 parent 10917170806784 owner 0 offset 0 found 1 wanted 0 back 0x37ce98e0
backpointer mismatch on [1927991144448 262144]
ref mismatch on [1927991406592 397312] extent item 0, found 1
data backref 1927991406592 parent 10917170806784 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927991406592 parent 10917170806784 owner 0 offset 0 found 1 wanted 0 back 0x426b8350
backpointer mismatch on [1927991406592 397312]
ref mismatch on [1927991803904 262144] extent item 0, found 1
data backref 1927991803904 parent 10917180702720 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927991803904 parent 10917180702720 owner 0 offset 0 found 1 wanted 0 back 0x49796e80
backpointer mismatch on [1927991803904 262144]
ref mismatch on [1927992066048 397312] extent item 0, found 1
data backref 1927992066048 parent 10917170806784 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927992066048 parent 10917170806784 owner 0 offset 0 found 1 wanted 0 back 0x49795b80
backpointer mismatch on [1927992066048 397312]
ref mismatch on [1927992463360 262144] extent item 0, found 1
data backref 1927992463360 parent 10917180702720 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927992463360 parent 10917180702720 owner 0 offset 0 found 1 wanted 0 back 0x497976d0
backpointer mismatch on [1927992463360 262144]
ref mismatch on [1927992725504 262144] extent item 0, found 1
data backref 1927992725504 parent 10917170806784 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927992725504 parent 10917170806784 owner 0 offset 0 found 1 wanted 0 back 0x37ce7c60
backpointer mismatch on [1927992725504 262144]
ref mismatch on [1927992987648 262144] extent item 0, found 1
data backref 1927992987648 parent 10917170806784 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927992987648 parent 10917170806784 owner 0 offset 0 found 1 wanted 0 back 0x497734f0
backpointer mismatch on [1927992987648 262144]
ref mismatch on [1927993249792 262144] extent item 0, found 1
data backref 1927993249792 parent 10917170806784 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927993249792 parent 10917170806784 owner 0 offset 0 found 1 wanted 0 back 0x49796040
backpointer mismatch on [1927993249792 262144]
ref mismatch on [1927993511936 262144] extent item 0, found 1
data backref 1927993511936 parent 10918936985600 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927993511936 parent 10918936985600 owner 0 offset 0 found 1 wanted 0 back 0x4976e700
backpointer mismatch on [1927993511936 262144]
ref mismatch on [1927993774080 397312] extent item 0, found 1
data backref 1927993774080 parent 10917170806784 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927993774080 parent 10917170806784 owner 0 offset 0 found 1 wanted 0 back 0x40760d70
backpointer mismatch on [1927993774080 397312]
ref mismatch on [1927994171392 397312] extent item 0, found 1
data backref 1927994171392 parent 10917170806784 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927994171392 parent 10917170806784 owner 0 offset 0 found 1 wanted 0 back 0x497969c0
backpointer mismatch on [1927994171392 397312]
ref mismatch on [1927994568704 262144] extent item 0, found 1
data backref 1927994568704 parent 10918936985600 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927994568704 parent 10918936985600 owner 0 offset 0 found 1 wanted 0 back 0x21294ae0
backpointer mismatch on [1927994568704 262144]
ref mismatch on [1927994830848 262144] extent item 0, found 1
data backref 1927994830848 parent 10917180702720 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927994830848 parent 10917180702720 owner 0 offset 0 found 1 wanted 0 back 0x49796d50
backpointer mismatch on [1927994830848 262144]
ref mismatch on [1927995092992 262144] extent item 0, found 1
data backref 1927995092992 parent 10917180702720 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927995092992 parent 10917180702720 owner 0 offset 0 found 1 wanted 0 back 0x426b99e0
backpointer mismatch on [1927995092992 262144]
ref mismatch on [1927995355136 524288] extent item 0, found 1
data backref 1927995355136 parent 10917180702720 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927995355136 parent 10917180702720 owner 0 offset 0 found 1 wanted 0 back 0x37cf34d0
backpointer mismatch on [1927995355136 524288]
ref mismatch on [1927995879424 262144] extent item 0, found 1
data backref 1927995879424 parent 10917180702720 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927995879424 parent 10917180702720 owner 0 offset 0 found 1 wanted 0 back 0x4c7a6400
backpointer mismatch on [1927995879424 262144]
ref mismatch on [1927996141568 397312] extent item 0, found 1
data backref 1927996141568 parent 10917180702720 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927996141568 parent 10917180702720 owner 0 offset 0 found 1 wanted 0 back 0x40761820
backpointer mismatch on [1927996141568 397312]
ref mismatch on [1927996538880 262144] extent item 0, found 1
data backref 1927996538880 parent 10917180702720 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927996538880 parent 10917180702720 owner 0 offset 0 found 1 wanted 0 back 0x4ce159c0
backpointer mismatch on [1927996538880 262144]
ref mismatch on [1927996801024 262144] extent item 0, found 1
data backref 1927996801024 parent 10917180702720 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927996801024 parent 10917180702720 owner 0 offset 0 found 1 wanted 0 back 0x497752a0
backpointer mismatch on [1927996801024 262144]
ref mismatch on [1927997063168 262144] extent item 0, found 1
data backref 1927997063168 parent 10917180702720 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927997063168 parent 10917180702720 owner 0 offset 0 found 1 wanted 0 back 0x4ce16340
backpointer mismatch on [1927997063168 262144]
ref mismatch on [1927997325312 262144] extent item 0, found 1
data backref 1927997325312 parent 10917180702720 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927997325312 parent 10917180702720 owner 0 offset 0 found 1 wanted 0 back 0x40761a80
backpointer mismatch on [1927997325312 262144]
ref mismatch on [1927997587456 262144] extent item 0, found 1
data backref 1927997587456 parent 10917180702720 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927997587456 parent 10917180702720 owner 0 offset 0 found 1 wanted 0 back 0x49775d50
backpointer mismatch on [1927997587456 262144]
ref mismatch on [1927997849600 262144] extent item 0, found 1
data backref 1927997849600 parent 10917180702720 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927997849600 parent 10917180702720 owner 0 offset 0 found 1 wanted 0 back 0x497983e0
backpointer mismatch on [1927997849600 262144]
ref mismatch on [1927998111744 397312] extent item 0, found 1
data backref 1927998111744 parent 10917180702720 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927998111744 parent 10917180702720 owner 0 offset 0 found 1 wanted 0 back 0x49775fb0
backpointer mismatch on [1927998111744 397312]
ref mismatch on [1927998509056 262144] extent item 0, found 1
data backref 1927998509056 parent 10917180702720 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927998509056 parent 10917180702720 owner 0 offset 0 found 1 wanted 0 back 0x37d171e0
backpointer mismatch on [1927998509056 262144]
ref mismatch on [1927998771200 397312] extent item 0, found 1
data backref 1927998771200 parent 10917282856960 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927998771200 parent 10917282856960 owner 0 offset 0 found 1 wanted 0 back 0x497989d0
backpointer mismatch on [1927998771200 397312]
ref mismatch on [1927999168512 262144] extent item 0, found 1
data backref 1927999168512 parent 10917282856960 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927999168512 parent 10917282856960 owner 0 offset 0 found 1 wanted 0 back 0x40762530
backpointer mismatch on [1927999168512 262144]
ref mismatch on [1927999430656 262144] extent item 0, found 1
data backref 1927999430656 parent 10916985520128 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927999430656 parent 10916985520128 owner 0 offset 0 found 1 wanted 0 back 0x4ce3bd40
backpointer mismatch on [1927999430656 262144]
ref mismatch on [1927999692800 262144] extent item 0, found 1
data backref 1927999692800 parent 10917282856960 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927999692800 parent 10917282856960 owner 0 offset 0 found 1 wanted 0 back 0x49798e90
backpointer mismatch on [1927999692800 262144]
ref mismatch on [1927999954944 262144] extent item 0, found 1
data backref 1927999954944 parent 10917180702720 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1927999954944 parent 10917180702720 owner 0 offset 0 found 1 wanted 0 back 0x497753d0
backpointer mismatch on [1927999954944 262144]
ref mismatch on [1928000217088 397312] extent item 0, found 1
data backref 1928000217088 parent 10917282856960 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928000217088 parent 10917282856960 owner 0 offset 0 found 1 wanted 0 back 0x49777250
backpointer mismatch on [1928000217088 397312]
ref mismatch on [1928000614400 262144] extent item 0, found 1
data backref 1928000614400 parent 10917282856960 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928000614400 parent 10917282856960 owner 0 offset 0 found 1 wanted 0 back 0x37cf4900
backpointer mismatch on [1928000614400 262144]
ref mismatch on [1928000876544 266240] extent item 0, found 1
data backref 1928000876544 parent 10917282856960 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928000876544 parent 10917282856960 owner 0 offset 0 found 1 wanted 0 back 0x4ce17050
backpointer mismatch on [1928000876544 266240]
ref mismatch on [1928001142784 262144] extent item 0, found 1
data backref 1928001142784 parent 10917180702720 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928001142784 parent 10917180702720 owner 0 offset 0 found 1 wanted 0 back 0x426ba6f0
backpointer mismatch on [1928001142784 262144]
ref mismatch on [1928001404928 262144] extent item 0, found 1
data backref 1928001404928 parent 10917282856960 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928001404928 parent 10917282856960 owner 0 offset 0 found 1 wanted 0 back 0x37d19190
backpointer mismatch on [1928001404928 262144]
ref mismatch on [1928001667072 262144] extent item 0, found 1
data backref 1928001667072 parent 10917282856960 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928001667072 parent 10917282856960 owner 0 offset 0 found 1 wanted 0 back 0x37cea4c0
backpointer mismatch on [1928001667072 262144]
ref mismatch on [1928001929216 397312] extent item 0, found 1
data backref 1928001929216 parent 10917282856960 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928001929216 parent 10917282856960 owner 0 offset 0 found 1 wanted 0 back 0x49799480
backpointer mismatch on [1928001929216 397312]
ref mismatch on [1928002326528 786432] extent item 0, found 1
data backref 1928002326528 parent 10917282856960 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928002326528 parent 10917282856960 owner 0 offset 0 found 1 wanted 0 back 0x4ce172b0
backpointer mismatch on [1928002326528 786432]
ref mismatch on [1928003112960 262144] extent item 0, found 1
data backref 1928003112960 parent 10917282856960 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928003112960 parent 10917282856960 owner 0 offset 0 found 1 wanted 0 back 0x407662f0
backpointer mismatch on [1928003112960 262144]
ref mismatch on [1928003375104 659456] extent item 0, found 1
data backref 1928003375104 parent 10917282856960 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928003375104 parent 10917282856960 owner 0 offset 0 found 1 wanted 0 back 0x49778a10
backpointer mismatch on [1928003375104 659456]
ref mismatch on [1928004034560 262144] extent item 0, found 1
data backref 1928004034560 parent 10917282856960 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928004034560 parent 10917282856960 owner 0 offset 0 found 1 wanted 0 back 0x37d19d70
backpointer mismatch on [1928004034560 262144]
ref mismatch on [1928004296704 262144] extent item 0, found 1
data backref 1928004296704 parent 10917282856960 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928004296704 parent 10917282856960 owner 0 offset 0 found 1 wanted 0 back 0x37ceaab0
backpointer mismatch on [1928004296704 262144]
ref mismatch on [1928004558848 262144] extent item 0, found 1
data backref 1928004558848 parent 10917282856960 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928004558848 parent 10917282856960 owner 0 offset 0 found 1 wanted 0 back 0x37ceabe0
backpointer mismatch on [1928004558848 262144]
ref mismatch on [1928004820992 262144] extent item 0, found 1
data backref 1928004820992 parent 10917282856960 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928004820992 parent 10917282856960 owner 0 offset 0 found 1 wanted 0 back 0x49799cd0
backpointer mismatch on [1928004820992 262144]
ref mismatch on [1928005083136 262144] extent item 0, found 1
data backref 1928005083136 parent 10916985520128 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928005083136 parent 10916985520128 owner 0 offset 0 found 1 wanted 0 back 0x212bbf00
backpointer mismatch on [1928005083136 262144]
ref mismatch on [1928005345280 262144] extent item 0, found 1
data backref 1928005345280 parent 10917292867584 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928005345280 parent 10917292867584 owner 0 offset 0 found 1 wanted 0 back 0x49779850
backpointer mismatch on [1928005345280 262144]
ref mismatch on [1928005607424 262144] extent item 0, found 1
data backref 1928005607424 parent 10917292867584 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928005607424 parent 10917292867584 owner 0 offset 0 found 1 wanted 0 back 0x40762fe0
backpointer mismatch on [1928005607424 262144]
ref mismatch on [1928005869568 397312] extent item 0, found 1
data backref 1928005869568 parent 10917292867584 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928005869568 parent 10917292867584 owner 0 offset 0 found 1 wanted 0 back 0x49779ab0
backpointer mismatch on [1928005869568 397312]
ref mismatch on [1928006266880 266240] extent item 0, found 1
data backref 1928006266880 parent 10917292867584 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928006266880 parent 10917292867584 owner 0 offset 0 found 1 wanted 0 back 0x4ce01190
backpointer mismatch on [1928006266880 266240]
ref mismatch on [1928006533120 262144] extent item 0, found 1
data backref 1928006533120 parent 10917292867584 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928006533120 parent 10917292867584 owner 0 offset 0 found 1 wanted 0 back 0x4ce0b640
backpointer mismatch on [1928006533120 262144]
ref mismatch on [1928006795264 262144] extent item 0, found 1
data backref 1928006795264 parent 10917292867584 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928006795264 parent 10917292867584 owner 0 offset 0 found 1 wanted 0 back 0x4ce0bd60
backpointer mismatch on [1928006795264 262144]
ref mismatch on [1928007057408 262144] extent item 0, found 1
data backref 1928007057408 parent 10917292867584 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928007057408 parent 10917292867584 owner 0 offset 0 found 1 wanted 0 back 0x4c7a7110
backpointer mismatch on [1928007057408 262144]
ref mismatch on [1928007319552 528384] extent item 0, found 1
data backref 1928007319552 parent 10917292867584 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928007319552 parent 10917292867584 owner 0 offset 0 found 1 wanted 0 back 0x40763cf0
backpointer mismatch on [1928007319552 528384]
ref mismatch on [1928007847936 262144] extent item 0, found 1
data backref 1928007847936 parent 10917292867584 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928007847936 parent 10917292867584 owner 0 offset 0 found 1 wanted 0 back 0x4979b820
backpointer mismatch on [1928007847936 262144]
ref mismatch on [1928008110080 262144] extent item 0, found 1
data backref 1928008110080 parent 10916985520128 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928008110080 parent 10916985520128 owner 0 offset 0 found 1 wanted 0 back 0x4ce4c2e0
backpointer mismatch on [1928008110080 262144]
ref mismatch on [1928008372224 262144] extent item 0, found 1
data backref 1928008372224 parent 10917292867584 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928008372224 parent 10917292867584 owner 0 offset 0 found 1 wanted 0 back 0x4ce19190
backpointer mismatch on [1928008372224 262144]
ref mismatch on [1928008634368 262144] extent item 0, found 1
data backref 1928008634368 parent 10917292867584 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928008634368 parent 10917292867584 owner 0 offset 0 found 1 wanted 0 back 0x4ce0c350
backpointer mismatch on [1928008634368 262144]
ref mismatch on [1928008896512 262144] extent item 0, found 1
data backref 1928008896512 parent 10917292867584 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928008896512 parent 10917292867584 owner 0 offset 0 found 1 wanted 0 back 0x4977fd40
backpointer mismatch on [1928008896512 262144]
ref mismatch on [1928009158656 262144] extent item 0, found 1
data backref 1928009158656 parent 10916985520128 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928009158656 parent 10916985520128 owner 0 offset 0 found 1 wanted 0 back 0x4ce4d120
backpointer mismatch on [1928009158656 262144]
ref mismatch on [1928009420800 262144] extent item 0, found 1
data backref 1928009420800 parent 10917292867584 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928009420800 parent 10917292867584 owner 0 offset 0 found 1 wanted 0 back 0x4979be10
backpointer mismatch on [1928009420800 262144]
ref mismatch on [1928009682944 4096] extent item 0, found 1
data backref 1928009682944 parent 10593646477312 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928009682944 parent 10593646477312 owner 0 offset 0 found 1 wanted 0 back 0x37cf93d0
backpointer mismatch on [1928009682944 4096]
ref mismatch on [1928009687040 262144] extent item 0, found 1
data backref 1928009687040 parent 10917292867584 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928009687040 parent 10917292867584 owner 0 offset 0 found 1 wanted 0 back 0x37cebbb0
backpointer mismatch on [1928009687040 262144]
ref mismatch on [1928009949184 262144] extent item 0, found 1
data backref 1928009949184 parent 10917294374912 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928009949184 parent 10917294374912 owner 0 offset 0 found 1 wanted 0 back 0x106d0b10
backpointer mismatch on [1928009949184 262144]
ref mismatch on [1928010211328 262144] extent item 0, found 1
data backref 1928010211328 parent 10917294374912 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928010211328 parent 10917294374912 owner 0 offset 0 found 1 wanted 0 back 0x4c7a7240
backpointer mismatch on [1928010211328 262144]
ref mismatch on [1928010473472 262144] extent item 0, found 1
data backref 1928010473472 parent 10917180702720 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928010473472 parent 10917180702720 owner 0 offset 0 found 1 wanted 0 back 0x49797b90
backpointer mismatch on [1928010473472 262144]
ref mismatch on [1928010735616 397312] extent item 0, found 1
data backref 1928010735616 parent 10917294374912 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928010735616 parent 10917294374912 owner 0 offset 0 found 1 wanted 0 back 0x4ce19d70
backpointer mismatch on [1928010735616 397312]
ref mismatch on [1928011132928 266240] extent item 0, found 1
data backref 1928011132928 parent 10917294374912 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928011132928 parent 10917294374912 owner 0 offset 0 found 1 wanted 0 back 0x4ce0d060
backpointer mismatch on [1928011132928 266240]
ref mismatch on [1928011399168 262144] extent item 0, found 1
data backref 1928011399168 parent 10917294374912 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928011399168 parent 10917294374912 owner 0 offset 0 found 1 wanted 0 back 0x37cf6ca0
backpointer mismatch on [1928011399168 262144]
ref mismatch on [1928011661312 262144] extent item 0, found 1
data backref 1928011661312 parent 10917294374912 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928011661312 parent 10917294374912 owner 0 offset 0 found 1 wanted 0 back 0x40764d90
backpointer mismatch on [1928011661312 262144]
ref mismatch on [1928011923456 262144] extent item 0, found 1
data backref 1928011923456 parent 10917294374912 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928011923456 parent 10917294374912 owner 0 offset 0 found 1 wanted 0 back 0x4ce1a230
backpointer mismatch on [1928011923456 262144]
ref mismatch on [1928012185600 262144] extent item 0, found 1
data backref 1928012185600 parent 10917294374912 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928012185600 parent 10917294374912 owner 0 offset 0 found 1 wanted 0 back 0x4979c790
backpointer mismatch on [1928012185600 262144]
ref mismatch on [1928012447744 262144] extent item 0, found 1
data backref 1928012447744 parent 10917180702720 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928012447744 parent 10917180702720 owner 0 offset 0 found 1 wanted 0 back 0x37d16990
backpointer mismatch on [1928012447744 262144]
ref mismatch on [1928012709888 262144] extent item 0, found 1
data backref 1928012709888 parent 10917294374912 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928012709888 parent 10917294374912 owner 0 offset 0 found 1 wanted 0 back 0x407654b0
backpointer mismatch on [1928012709888 262144]
ref mismatch on [1928012972032 262144] extent item 0, found 1
data backref 1928012972032 parent 10917294374912 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928012972032 parent 10917294374912 owner 0 offset 0 found 1 wanted 0 back 0x37d1fed0
backpointer mismatch on [1928012972032 262144]
ref mismatch on [1928013234176 262144] extent item 0, found 1
data backref 1928013234176 parent 10916985520128 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928013234176 parent 10916985520128 owner 0 offset 0 found 1 wanted 0 back 0x4ce4cff0
backpointer mismatch on [1928013234176 262144]
ref mismatch on [1928013496320 262144] extent item 0, found 1
data backref 1928013496320 parent 10917294456832 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928013496320 parent 10917294456832 owner 0 offset 0 found 1 wanted 0 back 0x37cec790
backpointer mismatch on [1928013496320 262144]
ref mismatch on [1928013758464 262144] extent item 0, found 1
data backref 1928013758464 parent 10916985520128 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928013758464 parent 10916985520128 owner 0 offset 0 found 1 wanted 0 back 0x4ce4c1b0
backpointer mismatch on [1928013758464 262144]
ref mismatch on [1928014020608 262144] extent item 0, found 1
data backref 1928014020608 parent 10917294456832 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928014020608 parent 10917294456832 owner 0 offset 0 found 1 wanted 0 back 0x49782df0
backpointer mismatch on [1928014020608 262144]
ref mismatch on [1928014282752 262144] extent item 0, found 1
data backref 1928014282752 parent 10916985520128 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928014282752 parent 10916985520128 owner 0 offset 0 found 1 wanted 0 back 0x4ce640d0
backpointer mismatch on [1928014282752 262144]
ref mismatch on [1928014544896 262144] extent item 0, found 1
data backref 1928014544896 parent 10917294456832 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928014544896 parent 10917294456832 owner 0 offset 0 found 1 wanted 0 back 0x4c7a2e90
backpointer mismatch on [1928014544896 262144]
ref mismatch on [1928014807040 262144] extent item 0, found 1
data backref 1928014807040 parent 10917294456832 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928014807040 parent 10917294456832 owner 0 offset 0 found 1 wanted 0 back 0x40766090
backpointer mismatch on [1928014807040 262144]
ref mismatch on [1928015069184 262144] extent item 0, found 1
data backref 1928015069184 parent 10917294456832 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928015069184 parent 10917294456832 owner 0 offset 0 found 1 wanted 0 back 0x426b44c0
backpointer mismatch on [1928015069184 262144]
ref mismatch on [1928015331328 528384] extent item 0, found 1
data backref 1928015331328 parent 10872129732608 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928015331328 parent 10872129732608 owner 0 offset 0 found 1 wanted 0 back 0x4ce0e950
backpointer mismatch on [1928015331328 528384]
ref mismatch on [1928015859712 528384] extent item 0, found 1
data backref 1928015859712 parent 10872129732608 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928015859712 parent 10872129732608 owner 0 offset 0 found 1 wanted 0 back 0x49783c30
backpointer mismatch on [1928015859712 528384]
ref mismatch on [1928016388096 528384] extent item 0, found 1
data backref 1928016388096 parent 10872129732608 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928016388096 parent 10872129732608 owner 0 offset 0 found 1 wanted 0 back 0x426b4850
backpointer mismatch on [1928016388096 528384]
ref mismatch on [1928016916480 262144] extent item 0, found 1
data backref 1928016916480 parent 10872129732608 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928016916480 parent 10872129732608 owner 0 offset 0 found 1 wanted 0 back 0x40766b40
backpointer mismatch on [1928016916480 262144]
ref mismatch on [1928017178624 262144] extent item 0, found 1
data backref 1928017178624 parent 10872129732608 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 1928017178624 parent 10872129732608 owner 0 offset 0 found 1 wanted 0 back 0x4c7a4d70
backpointer mismatch on [1928017178624 262144]
owner ref check failed [10919566688256 16384]
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space cache
[4/7] checking fs roots
root 257 inode 12842679 errors 1040, bad file extent, some csum missing
root 10463 inode 2159253 errors 1040, bad file extent, some csum missing
root 10463 inode 2159255 errors 1040, bad file extent, some csum missing
root 10463 inode 2159356 errors 1040, bad file extent, some csum missing
root 10463 inode 2159414 errors 1040, bad file extent, some csum missing
root 10464 inode 12842679 errors 1040, bad file extent, some csum missing
root 10531 inode 12842679 errors 1040, bad file extent, some csum missing
root 10594 inode 12842679 errors 1040, bad file extent, some csum missing
root 10622 inode 12842679 errors 1040, bad file extent, some csum missing
root 10653 inode 2159253 errors 1040, bad file extent, some csum missing
root 10653 inode 2159255 errors 1040, bad file extent, some csum missing
root 10653 inode 2159356 errors 1040, bad file extent, some csum missing
root 10653 inode 2159414 errors 1040, bad file extent, some csum missing
root 10667 inode 2159253 errors 1040, bad file extent, some csum missing
root 10667 inode 2159255 errors 1040, bad file extent, some csum missing
root 10667 inode 2159356 errors 1040, bad file extent, some csum missing
root 10667 inode 2159414 errors 1040, bad file extent, some csum missing
root 10668 inode 12842679 errors 1040, bad file extent, some csum missing
root 10825 inode 2159253 errors 1040, bad file extent, some csum missing
root 10825 inode 2159255 errors 1040, bad file extent, some csum missing
root 10825 inode 2159356 errors 1040, bad file extent, some csum missing
root 10825 inode 2159414 errors 1040, bad file extent, some csum missing
root 10826 inode 12842679 errors 1040, bad file extent, some csum missing
root 10854 inode 2159253 errors 1040, bad file extent, some csum missing
root 10854 inode 2159255 errors 1040, bad file extent, some csum missing
root 10854 inode 2159356 errors 1040, bad file extent, some csum missing
root 10854 inode 2159414 errors 1040, bad file extent, some csum missing
root 10855 inode 12842679 errors 1040, bad file extent, some csum missing
root 10915 inode 2159253 errors 1040, bad file extent, some csum missing
root 10915 inode 2159255 errors 1040, bad file extent, some csum missing
root 10915 inode 2159356 errors 1040, bad file extent, some csum missing
root 10915 inode 2159414 errors 1040, bad file extent, some csum missing
root 10916 inode 12842679 errors 1040, bad file extent, some csum missing
root 10924 inode 2159253 errors 1040, bad file extent, some csum missing
root 10924 inode 2159255 errors 1040, bad file extent, some csum missing
root 10924 inode 2159356 errors 1040, bad file extent, some csum missing
root 10924 inode 2159414 errors 1040, bad file extent, some csum missing
root 10925 inode 12842679 errors 1040, bad file extent, some csum missing
root 10948 inode 12842679 errors 1040, bad file extent, some csum missing
root 10971 inode 12842679 errors 1040, bad file extent, some csum missing
root 10975 inode 12842679 errors 1040, bad file extent, some csum missing
root 10978 inode 12842679 errors 1040, bad file extent, some csum missing
root 10996 inode 12842679 errors 1040, bad file extent, some csum missing
root 11002 inode 12842679 errors 1040, bad file extent, some csum missing
root 11006 inode 12842679 errors 1040, bad file extent, some csum missing
root 11011 inode 12842679 errors 1040, bad file extent, some csum missing
root 11028 inode 12842679 errors 1040, bad file extent, some csum missing
root 11142 inode 12842679 errors 1040, bad file extent, some csum missing
root 11187 inode 12842679 errors 1040, bad file extent, some csum missing
root 11188 inode 12842679 errors 1040, bad file extent, some csum missing
root 12277 inode 12842679 errors 1040, bad file extent, some csum missing
root 12278 inode 12842679 errors 1040, bad file extent, some csum missing
root 12279 inode 12842679 errors 1040, bad file extent, some csum missing
root 12280 inode 12842679 errors 1040, bad file extent, some csum missing
root 12281 inode 12842679 errors 1040, bad file extent, some csum missing
ERROR: errors found in fs roots
found 5392046571529 bytes used, error(s) found
total csum bytes: 5236510572
total tree bytes: 8053063680
total fs tree bytes: 1788641280
total extent tree bytes: 420773888
btree space waste bytes: 917733620
file data blocks allocated: 57094123556864
 referenced 5448931557376


Relevant dmesg lines for the removal:
[    2.780304] sd 2:0:0:0: Attached scsi generic sg0 type 0
[    2.780433] sd 2:0:0:0: [sda] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[    2.780508] sd 2:0:0:0: [sda] 4096-byte physical blocks
[    2.780595] sd 2:0:0:0: [sda] Write Protect is off
[    2.780650] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    2.780698] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    2.791335]  sda:
[    2.792007] sd 2:0:0:0: [sda] Attached SCSI disk
[    3.585553] sd 5:0:0:0: Attached scsi generic sg1 type 0
[    3.586091] sd 5:0:1:0: Attached scsi generic sg2 type 0
[    3.586598] sd 5:0:2:0: Attached scsi generic sg3 type 0
[    3.587345] sd 5:0:3:0: Attached scsi generic sg4 type 0
[    3.588572] sd 5:0:2:0: [sdd] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
[    3.588642] sd 5:0:2:0: [sdd] 4096-byte physical blocks
[    3.589282] sd 5:0:3:0: [sde] 7814035055 512-byte logical blocks: (4.00 TB/3.64 TiB)
[    3.589349] sd 5:0:3:0: [sde] 4096-byte physical blocks
[    3.591269] sd 5:0:0:0: [sdb] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
[    3.591794] sd 5:0:1:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
[    3.592081] sd 5:0:2:0: [sdd] Write Protect is off
[    3.592136] sd 5:0:2:0: [sdd] Mode Sense: 7f 00 10 08
[    3.592939] sd 5:0:2:0: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    3.593217] sd 5:0:3:0: [sde] Write Protect is off
[    3.593274] sd 5:0:3:0: [sde] Mode Sense: 7f 00 10 08
[    3.594144] sd 5:0:3:0: [sde] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    3.603890] sd 5:0:1:0: [sdc] Write Protect is off
[    3.603947] sd 5:0:1:0: [sdc] Mode Sense: 7f 00 10 08
[    3.606233] sd 5:0:1:0: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    3.610693] sd 5:0:0:0: [sdb] Write Protect is off
[    3.610751] sd 5:0:0:0: [sdb] Mode Sense: 7f 00 10 08
[    3.613028] sd 5:0:0:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    3.622881]  sde:
[    3.630991] sd 5:0:3:0: [sde] Attached SCSI disk
[    3.639546]  sdd:
[    3.646910] sd 5:0:2:0: [sdd] Attached SCSI disk
[    3.653398]  sdc:
[    3.660196]  sdb:
[    3.708399] sd 5:0:1:0: [sdc] Attached SCSI disk
[    3.715218] sd 5:0:0:0: [sdb] Attached SCSI disk
[    5.908498] Btrfs loaded, crc32c=crc32c-generic
[    6.100531] BTRFS: device label Susanita devid 3 transid 4241667 /dev/sdb scanned by btrfs (333)
[    6.100838] BTRFS: device label Susanita devid 6 transid 4241667 /dev/sdc scanned by btrfs (333)
[    6.101405] BTRFS: device label Susanita devid 2 transid 4241667 /dev/sde scanned by btrfs (333)
[    6.101622] BTRFS: device label Susanita devid 5 transid 4241667 /dev/sdd scanned by btrfs (333)
[    6.101931] BTRFS: device label Susanita devid 1 transid 4241667 /dev/sda scanned by btrfs (333)
[    6.118215] process '/bin/fstype' started with executable stack
[    6.176610] BTRFS info (device sda): disk space caching is enabled
[    6.176689] BTRFS info (device sda): has skinny extents
[    6.850895] BTRFS info (device sda): bdev /dev/sdd errs: wr 0, rd 24, flush 0, corrupt 0, gen 0
[   40.273679] BTRFS info (device sda): enabling auto defrag
[   40.273756] BTRFS info (device sda): use lzo compression, level 0
[   40.273824] BTRFS info (device sda): disk space caching is enabled
[ 2192.336693] BTRFS info (device sda): relocating block group 10593404846080 flags metadata|raid10
[ 2239.450909] BTRFS error (device sda): bad tree block start, want 10919566688256 have 10597444141056
[ 2239.481337] BTRFS error (device sda): bad tree block start, want 10919566688256 have 17196831625821864417
[ 2239.481382] BTRFS: error (device sda) in btrfs_run_delayed_refs:2173: errno=-5 IO failure
[ 2239.481388] BTRFS info (device sda): forced readonly
[ 2239.481839] BTRFS error (device sda): bad tree block start, want 10919566688256 have 10597444141056
[ 2239.483125] BTRFS error (device sda): bad tree block start, want 10919566688256 have 17196831625821864417
[ 2239.483166] BTRFS: error (device sda) in btrfs_run_delayed_refs:2173: errno=-5 IO failure


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: raid10 corruption while removing failing disk
  2020-08-10  7:51     ` Nikolay Borisov
  2020-08-10  8:57       ` Martin Steigerwald
@ 2020-08-11  1:30       ` Chris Murphy
  1 sibling, 0 replies; 17+ messages in thread
From: Chris Murphy @ 2020-08-11  1:30 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Martin Steigerwald, Agustín DallʼAlba, Btrfs BTRFS

On Mon, Aug 10, 2020 at 1:51 AM Nikolay Borisov <nborisov@suse.com> wrote:
> Fedora for example
> spent the time to test btrfs for their use cases + they have the support
> from some of the developers to fix issues they find because they
> themselves don't have the development expertise (yet) to deal with
> btrfs. Furthermore I *think* fedora is sticking to using unadorned
> upstream kernels (don't quote me on this, I have never used fedora).

Fedora has explicitly listed btrfs in release blocking criteria,
meaning related bugs (i.e. some btrfs bug related to the installer,
not even necessarily kernel bugs, but that too) get fixed somehow,
since 2012. Fedora btrfs installations use mkfs.btrfs defaults, and
default mount options.

Also Fedora doesn't carry any special patches for btrfs. Although, if
there were an upstream tested important fix that's intended to head to
current stable anyway, it'd likely be taken as a patch and thus early
fix. Right now we're carrying one planned for btrfs-progs 5.8, which
is mkfs.btrfs default for multiple devices, using -d single instead of
-d raid0.

The other thing is that Fedora users have fairly up to date kernels if
they accept the recommended updates. Right now everyone on Fedora 31
and 32 should have 5.7.11. And 5.7.14 is heading to stable updates.
Anyone can opt into updates-testing to get the most current upstream
stable kernel version, or even rawhide kernels to get the upstream
mainline kernels which now have CONFIG_BTRFS_ASSERT=y and
CONFIG_BTRFS_FS=y instead of built as a module.


-- 
Chris Murphy


* Re: raid10 corruption while removing failing disk
  2020-08-11  1:18   ` Agustín DallʼAlba
@ 2020-08-11  1:48     ` Chris Murphy
  0 siblings, 0 replies; 17+ messages in thread
From: Chris Murphy @ 2020-08-11  1:48 UTC (permalink / raw)
  To: Agustín DallʼAlba; +Cc: Nikolay Borisov, Btrfs BTRFS, Zygo Blaxell

On Mon, Aug 10, 2020 at 7:19 PM Agustín DallʼAlba
<agustin@dallalba.com.ar> wrote:
>
> On Mon, 2020-08-10 at 11:21 +0300, Nikolay Borisov wrote:
> > This suggests you are hitting a known problem with reloc roots which
> > have been fixed in the latest upstream and lts (5.4) kernels:
> >
> > 044ca910276b btrfs: reloc: fix reloc root leak and NULL pointer
> > dereference (3 months ago) <Qu Wenruo>
> > 707de9c0806d btrfs: relocation: fix reloc_root lifespan and access (7
> > months ago) <Qu Wenruo>
> > 1fac4a54374f btrfs: relocation: fix use-after-free on dead relocation
> > roots (11 months ago) <Qu Wenruo>
> >
> >
> > So yes, try to update to latest stable kernel and re-run the device
> > remove. Also update your btrfs progs to latest 5.6 version and rerun
> > check again (by default it's a read only operations so it shouldn't
> > cause any more damage).
>
> I have tried again with the 5.8.0 kernel and btrfs-progs v5.7 (which
> I've compiled statically on a different machine and used only for btrfs
> device remove and btrfs check). The system still goes read-only when I
> attempt to remove the failing drive, but it doesn't oops in this
> version.
>
> This version of btrfs check finds many more problems; however, the
> 'checksum verify failed' line looks suspicious: instead of `found
> BAB1746E wanted A8A48266` it prints `found 0000006E wanted 00000066`,
> as if the checksums had been truncated to 8 bits before printing.


6E = 1101110
66 = 1100110

So it's a bit flip. I think newer versions only show the part of
found/wanted that differs, but maybe someone will confirm that. 'btrfs
check --repair' can usually fix most cases of bit flips in metadata.
I'm not sure about the other problems, whether they're related or
fixable.
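
The single-bit difference is easy to verify; a quick sketch, using the
truncated found/wanted values from the check output above:

```shell
# XOR the found/wanted bytes; a power-of-two result means exactly one
# flipped bit. Here 0x6E ^ 0x66 = 0x08, i.e. bit 3.
printf '0x%02X\n' $(( 0x6E ^ 0x66 ))
# -> 0x08
```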

Top candidate is bad RAM but it could be something else. Best to start
doing a memtest86+, newer the better. If it's bad RAM sometimes it'll
find it in a couple hours, and other times, a few days which is
definitely annoying.


-- 
Chris Murphy


* Re: raid10 corruption while removing failing disk
  2020-08-10  7:03 raid10 corruption while removing failing disk Agustín DallʼAlba
  2020-08-10  7:22 ` Nikolay Borisov
  2020-08-10  8:21 ` Nikolay Borisov
@ 2020-08-11  2:34 ` Chris Murphy
  2020-08-11  5:06   ` Agustín DallʼAlba
  2 siblings, 1 reply; 17+ messages in thread
From: Chris Murphy @ 2020-08-11  2:34 UTC (permalink / raw)
  To: Agustín DallʼAlba; +Cc: Btrfs BTRFS

On Mon, Aug 10, 2020 at 1:03 AM Agustín DallʼAlba
<agustin@dallalba.com.ar> wrote:
>
> Hello!
>
> The last quarterly scrub on our btrfs filesystem found a few bad
> sectors in one of its devices (/dev/sdd), and because there's nobody on
> site to replace the failing disk I decided to remove it from the array
> with `btrfs device remove` before the problem could get worse.

It doesn't much matter if it gets worse, because you still have
redundancy on that dying drive until the moment it's completely toast.
And btrfs doesn't care if it's spewing read errors. You're better off
getting a replacement drive and using 'btrfs replace' because it's
faster and a bit safer. A device remove is kinda expensive because it
also involves a file system resize, and a partial balance as it moves
block groups from the device being removed.
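
As a sketch, with a hypothetical replacement device /dev/sdf, the
replace path looks like:

```shell
# Replace the failing device in place; -r tells btrfs to read from the
# source device only when no other good mirror copy exists.
btrfs replace start -r /dev/sdd /dev/sdf /

# Replace runs in the background; check progress with:
btrfs replace status /
```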

>
> The removal was going relatively well (although slowly and I had to
> reboot a few times due to the bad sectors) until it had about 200 GB
> left to move. Now the filesystem turns read only when I try to finish
> the removal and `btrfs check` complains about wrong metadata checksums.
> However as far as I can tell none of the copies of the corrupt data are
> in the failing disk.

Do you have a complete dmesg for this time period? Because (a) bad
sectors should not exist on a recently scrubbed system (b) even if
they do exist, during device removal it's a read error like any other
time, and btrfs grabs the copy instead. Slowness suggests to me there
is a timing mismatch between SCT ERC and the default SCSI command
timer. It leads to lengthy delays and prevents bad sectors from being
properly fixed.


https://raid.wiki.kernel.org/index.php/Timeout_Mismatch

You need to check the SCT ERC for all of your drives, and make sure
they are less than the per device node command timer (this is a kernel
timer). If the drives don't support SCT ERC then you have to increase
the kernel's command timer.
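
A sketch of that check (the device name and the 7-second value are
illustrative, not tuned recommendations):

```shell
# Read the drive's SCT ERC setting (values are in deciseconds).
smartctl -l scterc /dev/sda

# If configurable, make the drive give up on error recovery after 7 s,
# well under the kernel's default 30 s command timer.
smartctl -l scterc,70,70 /dev/sda

# If SCT ERC is unsupported, raise the kernel command timer instead.
echo 180 > /sys/block/sda/device/timeout
```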

This is a common problem. It afflicts users who don't know about this
bullshit kernel obscurity and end up accepting all the defaults, which
with consumer drives on Linux almost inevitably means data loss with
software raid. Doesn't matter if it's mdadm, lvm, or Btrfs. And it's
bullshit. But so far upstream kernel development has argued this
should be changed by distributions rather than by changing the kernel
command timer default. I don't know of any distribution that does.


>
> # btrfs fi df /
> Data, RAID1: total=4.90TiB, used=4.90TiB
> System, RAID10: total=64.00MiB, used=880.00KiB
> Metadata, RAID10: total=9.00GiB, used=7.57GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B

This is not bad but it's an odd duck. Usually you'd raid10 the data,
and raid1 the metadata. There isn't much of an advantage to raid10
metadata, plus it makes things a bit more complex. I think you'd be
better off with raid1 or even raid1c3 for metadata with this many
drives. But you need a newer kernel for btrfs raid1c3 profile.


> # btrfs check --force --readonly /dev/sda
> WARNING: filesystem mounted, continuing because of --force
> Checking filesystem on /dev/sda
> UUID: 4d3acf20-d408-49ab-b0a6-182396a9f27c
> checksum verify failed on 10919566688256 found BAB1746E wanted A8A48266
> checksum verify failed on 10919566688256 found BAB1746E wanted A8A48266

So they aren't at all the same, that's unexpected.


> After running btrfs device remove /dev/sdd /:
> [  193.684703] BTRFS info (device sda): relocating block group 10593404846080 flags metadata|raid10
> [  312.921934] BTRFS error (device sda): bad tree block start 10597444141056 10919566688256
> [  313.034339] BTRFS error (device sda): bad tree block start 17196831625821864417 10919566688256
> [  313.034595] BTRFS error (device sda): bad tree block start 10597444141056 10919566688256
> [  313.034621] BTRFS: error (device sda) in btrfs_run_delayed_refs:3083: errno=-5 IO failure

My take on this is that you've got extent tree corruption from an
older kernel bug or possibly memory corruption, and that's why scrub
didn't catch it. Scrub only verifies against checksums not for the
correctness of file system metadata, i.e. it's not a file system
check.

5.4 and newer kernels have a read and write time tree checker designed
to catch problems like this.

My advice is to mount ro, back up (two copies for important info),
start with a new Btrfs file system, and restore. It's not worth
repairing. When you first mount the new file system, use the mount
option 'space_cache=v2' once - it'll soon be the default and you'll
see slightly better performance. It sets a feature flag, so you don't
need to keep passing the option; it'll be used automatically.
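
For the new filesystem, that first mount might look like this (device
and mount point are illustrative):

```shell
# The v2 free space cache is recorded as a feature flag on first use,
# so subsequent mounts pick it up without the option.
mount -o space_cache=v2 /dev/sda /mnt
```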

-- 
Chris Murphy


* Re: raid10 corruption while removing failing disk
  2020-08-11  2:34 ` Chris Murphy
@ 2020-08-11  5:06   ` Agustín DallʼAlba
  2020-08-11 19:17     ` Chris Murphy
  0 siblings, 1 reply; 17+ messages in thread
From: Agustín DallʼAlba @ 2020-08-11  5:06 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

On Mon, 2020-08-10 at 20:34 -0600, Chris Murphy wrote:
> On Mon, Aug 10, 2020 at 1:03 AM Agustín DallʼAlba
> <agustin@dallalba.com.ar> wrote:
> > Hello!
> > 
> > The last quarterly scrub on our btrfs filesystem found a few bad
> > sectors in one of its devices (/dev/sdd), and because there's nobody on
> > site to replace the failing disk I decided to remove it from the array
> > with `btrfs device remove` before the problem could get worse.
> 
> It doesn't much matter if it gets worse, because you still have
> redundancy on that dying drive until the moment it's completely toast.
> And btrfs doesn't care if it's spewing read errors. 

By 'get worse', I mean another drive failing, and then we'd definitely
lose data. Because of the pandemic there was (and still is) nobody on
site to replace the drive, and I won't be able to go there for who
knows how many months.

> Do you have a complete dmesg for this time period? Because (a) bad
> sectors should not exist on a recently scrubbed system (b) even if
> they do exist, during device removal it's a read error like any other
> time, and btrfs grabs the copy instead. Slowness suggests to me there
> is a timing mismatch between SCT ERC and the default SCSI command
> timer. It leads to lengthy delays and prevents bad sectors from being
> properly fixed.

I have a _partial_ dmesg of this time period. It's got a lot of gaps in
between reboots. I'll send it to you without ccing the list. The
failing drive is an atrocious WD green for which I forgot to set the
idle3 timer; it doesn't support SCT ERC and lately just hangs forever
and requires a power cycle, so there's no way around the slowness. It
was added in a pinch a year ago because we needed more space. I
probably should have asked someone to disconnect it and used 'remove
missing'.

> > # btrfs check --force --readonly /dev/sda
> > WARNING: filesystem mounted, continuing because of --force
> > Checking filesystem on /dev/sda
> > UUID: 4d3acf20-d408-49ab-b0a6-182396a9f27c
> > checksum verify failed on 10919566688256 found BAB1746E wanted A8A48266
> > checksum verify failed on 10919566688256 found BAB1746E wanted A8A48266
> 
> So they aren't at all the same, that's unexpected.

What do you mean by this?

> My advice is to mount ro, backup (or two copies for important info),
> and start with a new Btrfs file system and restore. It's not worth
> repairing.

Sigh, I was expecting I'd have to do this. At least no data was lost,
and the system still functions even though it's read-only. Do you think
check --repair is not worth trying? Everything of value is already
backed up, but restoring it would take many hours of work.

Thanks for all the information, I hope you have a good day.



* Re: raid10 corruption while removing failing disk
  2020-08-11  5:06   ` Agustín DallʼAlba
@ 2020-08-11 19:17     ` Chris Murphy
  2020-08-11 20:40       ` Agustín DallʼAlba
  2020-08-31 20:05       ` Agustín DallʼAlba
  0 siblings, 2 replies; 17+ messages in thread
From: Chris Murphy @ 2020-08-11 19:17 UTC (permalink / raw)
  To: Agustín DallʼAlba; +Cc: Chris Murphy, Btrfs BTRFS

On Mon, Aug 10, 2020 at 11:06 PM Agustín DallʼAlba
<agustin@dallalba.com.ar> wrote:
>
> On Mon, 2020-08-10 at 20:34 -0600, Chris Murphy wrote:
> > On Mon, Aug 10, 2020 at 1:03 AM Agustín DallʼAlba
> > <agustin@dallalba.com.ar> wrote:
> > > Hello!
> > >
> > > The last quarterly scrub on our btrfs filesystem found a few bad
> > > sectors in one of its devices (/dev/sdd), and because there's nobody on
> > > site to replace the failing disk I decided to remove it from the array
> > > with `btrfs device remove` before the problem could get worse.
> >
> > It doesn't much matter if it gets worse, because you still have
> > redundancy on that dying drive until the moment it's completely toast.
> > And btrfs doesn't care if it's spewing read errors.
>
> By 'get worse', I mean another drive failing, and then we'd definitely
> lose data. Because of the pandemic there was (and still is) nobody on
> site to replace the drive, and I won't be able to go there for who
> knows how many months.

Fair point.

> I have a _partial_ dmesg of this time period. It's got a lot of gaps in
> between reboots. I'll send it to you without ccing the list. The
> failing drive is an atrocious WD green for which I forgot to set the
> idle3 timer, that doesn't support SCT ERC and lately just hangs forever
> and requires a power cycle. So there's no way around the slowness. It
> was added on a pinch a year ago because we needed more space. I
> probably should have ask someone to disconnect it and used 'remove
> missing'.

That drive should have '/sys/block/sda/device/timeout' set to at least
120, though I've seen folks on linux-raid@ suggest 180. I don't know
what the actual maximum "deep recovery" time for these drives could be.

As the signal in a sector weakens, the reads get slower. You can
freshen the signal simply by rewriting data. Btrfs doesn't ever do
overwrites, but you can use 'btrfs balance' for this task. Once a year
seems reasonable, or as you notice reads becoming slower. And use a
filtered balance to avoid doing it all at once.
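
A filtered balance along those lines might look like the following
(mount point illustrative; repeat until the whole filesystem has been
rewritten):

```shell
# Rewrite at most 10 data and 10 metadata block groups per invocation,
# spreading the refresh over several sessions.
btrfs balance start -dlimit=10 -mlimit=10 /mnt
btrfs balance status /mnt
```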


>
> > > # btrfs check --force --readonly /dev/sda
> > > WARNING: filesystem mounted, continuing because of --force
> > > Checking filesystem on /dev/sda
> > > UUID: 4d3acf20-d408-49ab-b0a6-182396a9f27c
> > > checksum verify failed on 10919566688256 found BAB1746E wanted A8A48266
> > > checksum verify failed on 10919566688256 found BAB1746E wanted A8A48266
> >
> > So they aren't at all the same, that's unexpected.
>
> What do you mean by this?

I only fully understood what you meant by this:
>instead of `found BAB1746E wanted A8A48266` it prints `found 0000006E wanted 00000066`

once I re-read the first email that had the full 'btrfs check' output
from the old version. And yeah I don't know why they're different now.


> > My advice is to mount ro, backup (or two copies for important info),
> > and start with a new Btrfs file system and restore. It's not worth
> > repairing.
>
> Sigh, I was expecting I'd have to do this. At least no data was lost,
> and the system still functions even though it's read-only. Do you think
> check --repair is not worth trying? Everything of value is already
> backed up, but restoring it would take many hours of work.

Metadata, RAID10: total=9.00GiB, used=7.57GiB

Ballpark 8 hours for --repair given metadata size and spinning drives.
It'll add some time adding --init-extent-tree which... is decently
likely to be needed here. So the gotcha is, see if --repair works, and
it fixes some stuff but still needs extent tree repaired anyway. Now
you have to do that and it could be another 8 hours. Or do you go with
the heavy hammer right away to save time and do both at once? But the
heavy hammer is riskier.

Whether repair or start over, you need to have the backup plus 2x for
important stuff. To do the repair you need to be prepared for the
possibility things get worse. I'll argue strongly that it's a bug if
things get worse (i.e. now you can't mount ro at all) but as a risk
assessment, it has to be considered.


--
Chris Murphy


* Re: raid10 corruption while removing failing disk
  2020-08-11 19:17     ` Chris Murphy
@ 2020-08-11 20:40       ` Agustín DallʼAlba
  2020-08-12  3:03         ` Chris Murphy
  2020-08-31 20:05       ` Agustín DallʼAlba
  1 sibling, 1 reply; 17+ messages in thread
From: Agustín DallʼAlba @ 2020-08-11 20:40 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

On Tue, 2020-08-11 at 13:17 -0600, Chris Murphy wrote:
> That drive should have '/sys/block/sda/device/timeout' at least 120.
> Although I've seen folks on linux-raid@ suggest 180. I don't know what
> the actual maximum time for "deep recovery" these drives could have.

I'll do this. Is there any reason not to set _every_ drive to 180s? As
far as I can tell it doesn't really hurt to have the timeout be very
long when the drives do support SCT ERC and if I simply write an udev
rule that matches all disks I won't have to remember to do this again
in the future.
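
Such a rule could be a one-liner; a sketch (the rules file name is
arbitrary, and ATTR{device/timeout} only exists for SCSI-layer disks):

```shell
# Raise the command timer for every disk as it appears.
cat > /etc/udev/rules.d/60-block-timeout.rules <<'EOF'
ACTION=="add", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", ATTR{device/timeout}="180"
EOF
udevadm control --reload
```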

> As the signal in a sector weakens, the reads get slower. You can
> freshen the signal simply by rewriting data. Btrfs doesn't ever do
> overwrites, but you can use 'btrfs balance' for this task. Once a year
> seems reasonable, or as you notice reads becoming slower. And use a
> filtered balance to avoid doing it all at once.

I suspect it's the head that's damaged, not the sectors. I forgot to
set the idle3 timer on this drive, which is a power saving "feature" of
WD greens, to something reasonable for years and in the meantime the
head has parked 1.7 million times. Keeping this in mind it sounds to me
like a bad idea to write to it.

> I only fully understood what you meant by this:
> > instead of `found BAB1746E wanted A8A48266` it prints `found 0000006E wanted 00000066`
> 
> once I re-read the first email that had the full 'btrfs check' output
> from the old version. And yeah I don't know why they're different now.

I looked at the code and I think it's just a presentation bug. In disk-
io.c:177 both `result` and `buf->data` are arrays of u8, while they
used to be cast to u32 in btrfs-progs v4.15. The memcmp checks the
full checksum anyway, so there's no worry about btrfs check doing the
wrong thing.
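
The effect is easy to mimic. A sketch, assuming a little-endian host
where `result[0]` is the low byte of the u32 the old code printed:

```shell
# Formatting only the first byte of the checksum as a 32-bit value
# reproduces the truncated output seen in the new progs.
printf 'old: %08X  new: %08X\n' $(( 0xBAB1746E )) $(( 0xBAB1746E & 0xFF ))
# -> old: BAB1746E  new: 0000006E
```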

> Ballpark 8 hours for --repair given metadata size and spinning drives.
> It'll add some time adding --init-extent-tree which... is decently
> likely to be needed here. So the gotcha is, see if --repair works, and
> it fixes some stuff but still needs extent tree repaired anyway. Now
> you have to do that and it could be another 8 hours. Or do you go with
> the heavy hammer right away to save time and do both at once? But the
> heavy hammer is riskier.
> 
> Whether repair or start over, you need to have the backup plus 2x for
> important stuff. To do the repair you need to be prepared for the
> possibility things get worse. I'll argue strongly that it's a bug if
> things get worse (i.e. now you can't mount ro at all) but as a risk
> assessment, it has to be considered.

It's 16 hours I can run overnight vs 1 - 2 weeks of copying 4 TB of
non-essential data over the Internet at 100 Mbps. I think I'll make
sure there's two copies of the important stuff somewhere and take the
risk.

Is it worse to do the --repair while degraded? I'm sure the failing
drive will manage to ruin the day if I leave it connected; as I said,
it sometimes decides to hang forever.

Thanks a lot.



* Re: raid10 corruption while removing failing disk
  2020-08-11 20:40       ` Agustín DallʼAlba
@ 2020-08-12  3:03         ` Chris Murphy
  0 siblings, 0 replies; 17+ messages in thread
From: Chris Murphy @ 2020-08-12  3:03 UTC (permalink / raw)
  To: Agustín DallʼAlba; +Cc: Chris Murphy, Btrfs BTRFS

On Tue, Aug 11, 2020 at 2:40 PM Agustín DallʼAlba
<agustin@dallalba.com.ar> wrote:
>
> On Tue, 2020-08-11 at 13:17 -0600, Chris Murphy wrote:
> > That drive should have '/sys/block/sda/device/timeout' at least 120.
> > Although I've seen folks on linux-raid@ suggest 180. I don't know what
> > the actual maximum time for "deep recovery" these drives could have.
>
> I'll do this. Is there any reason not to set _every_ drive to 180s?

Arguably the drive should figure out WTF it wants to do with a command
within a second. Giving it 7 seconds to fail is quite a while. The
very idea a drive could seriously need 180 seconds to read one g.d.
sector and keep iterating on it to recover? It seems curious. The
rationale for 30s is that a drive taking that long to decide is
probably a drive that needs a link reset, which is what the SCSI
driver does when the timeout is reached. And yet that's obviously
wrong for a large number of consumer HDDs that have these lengthy
recoveries, by design.

The ideal timeout would be the one recommended by the manufacturer for
each make/model, i.e. our drive, worst case, will either return data or
an error in X seconds. And you'd set the command timer to that number
+ 1. And it'd be fine to make it different per drive, because any
delay longer than that just means unnecessary waiting.


>As
> far as I can tell it doesn't really hurt to have the timeout be very
> long when the drives do support SCT ERC and if I simply write an udev
> rule that matches all disks I won't have to remember to do this again
> in the future.

If the drive supports configurable SCT ERC, you want to use that. Set
it to 7-10 seconds (the units for SCT ERC are deciseconds), and leave
the SCSI command timer set to a 30s default. It may not be great for
your workload to have an transient delay up to 3 minutes, but that
shouldn't happen in the first place if the sector data signal is good,
the reads should not be slow, they shouldn't need deep recovery.

As I think about it, you might instead of using a filtered balance,
put a spare in this system. And use 'btrfs replace' to replace drives,
round robin style. That'll perform better than balance, and gets you
back to a "normal" state much faster. Plus if you ever have a drive
failure, you've got a drive ready as a persistent replacement.
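
One pass of that rotation might look like this (the devid, spare path,
and mount point are hypothetical):

```shell
# Replace devid 1 with the spare; the freed drive then becomes the
# spare for the next pass, and so on around the array.
btrfs replace start 1 /dev/sdf /mnt
btrfs replace status /mnt
```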



> > As the signal in a sector weakens, the reads get slower. You can
> > freshen the signal simply by rewriting data. Btrfs doesn't ever do
> > overwrites, but you can use 'btrfs balance' for this task. Once a year
> > seems reasonable, or as you notice reads becoming slower. And use a
> > filtered balance to avoid doing it all at once.
>
> I suspect it's the head that's damaged, not the sectors. I forgot to
> set the idle3 timer on this drive, which is a power saving "feature" of
> WD greens, to something reasonable for years and in the meantime the
> head has parked 1.7 million times. Keeping this in mind it sounds to me
> like a bad idea to write to it.

I see. If you think it's bad for reads, you could optionally do

# echo 1 > /sys/block/sda/device/delete

That'll just make it vanish. And then you could do a 'btrfs device
remove missing'. This is ordinarily riskier because it effectively
makes the array degraded. The effect of 'device remove missing' is to
reconstruct the missing data from the remaining drives. If all the
other drives are healthy, this would be a faster way to shrink the
file system by device removal.
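
If the filesystem had to be remounted after that, the follow-up might
look like this sketch (device name illustrative; a degraded mount is
needed once the device is gone):

```shell
# Mount without the vanished device, then rebuild its block groups
# onto the remaining drives.
mount -o degraded /dev/sda /mnt
btrfs device remove missing /mnt
```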


> It's 16 hours I can run overnight vs 1 - 2 weeks of copying 4 TB of
> non-essential data over the Internet at 100 Mbps. I think I'll make
> sure there's two copies of the important stuff somewhere and take the
> risk.

Yeah.

How many changes are happening with this file system? If it's a ton
you probably want a locally writable file system. If it's not a ton,
you could leave this Btrfs read-only, and overlayfs some other file
system on top of it to accept the small number of changes.
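
A sketch of that overlay (paths are illustrative; upperdir and workdir
must live on the same writable filesystem):

```shell
# Damaged btrfs stays read-only at /mnt/btrfs-ro; new writes land in
# the upper layer, and /merged presents the combined view.
mkdir -p /writable/upper /writable/work /merged
mount -t overlay overlay \
      -o lowerdir=/mnt/btrfs-ro,upperdir=/writable/upper,workdir=/writable/work \
      /merged
```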

> Is it worse to do the --repair while degraded?

Not sure about degraded repairs. In particular I don't know how much
testing it gets.

>I'm sure the failing
> drive will manage to ruin the day if I leave it connected, as I said it
> sometimes decides to hang forever.

Yeah that's not a great sign.

--
Chris Murphy


* Re: raid10 corruption while removing failing disk
  2020-08-11 19:17     ` Chris Murphy
  2020-08-11 20:40       ` Agustín DallʼAlba
@ 2020-08-31 20:05       ` Agustín DallʼAlba
  1 sibling, 0 replies; 17+ messages in thread
From: Agustín DallʼAlba @ 2020-08-31 20:05 UTC (permalink / raw)
  Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 3765 bytes --]

[Resent because the message was too long for the list]

On Tue, 2020-08-11 at 13:17 -0600, Chris Murphy wrote:
> > > My advice is to mount ro, backup (or two copies for important info),
> > > and start with a new Btrfs file system and restore. It's not worth
> > > repairing.
> > Sigh, I was expecting I'd have to do this. At least no data was lost,
> > and the system still functions even though it's read-only. Do you think
> > check --repair is not worth trying? Everything of value is already
> > backed up, but restoring it would take many hours of work.
> 
> Metadata, RAID10: total=9.00GiB, used=7.57GiB
> 
> Ballpark 8 hours for --repair given metadata size and spinning drives.
> It'll add some time adding --init-extent-tree which... is decently
> likely to be needed here. So the gotcha is, see if --repair works, and
> it fixes some stuff but still needs extent tree repaired anyway. Now
> you have to do that and it could be another 8 hours. Or do you go with
> the heavy hammer right away to save time and do both at once? But the
> heavy hammer is riskier.
> 
> Whether repair or start over, you need to have the backup plus 2x for
> important stuff. To do the repair you need to be prepared for the
> possibility tihngs get worse. I'll argue strongly that it's a bug if
> things get worse (i.e. now you can't mount ro at all) but as a risk
> assessment, it has to be considered.

So, I've finally managed to get someone to add a disk to this system
and ran a btrfs check --repair. It failed almost immediately with:

Starting repair.
Opening filesystem to check...
Checking filesystem on /dev/disk/by-label/Susanita
UUID: 4d3acf20-d408-49ab-b0a6-182396a9f27c
[1/7] checking root items
checksum verify failed on 10919566688256 found 0000006E wanted 00000066
checksum verify failed on 10919566688256 found 0000006E wanted 00000066
bad tree block 10919566688256, bytenr mismatch, want=10919566688256, have=17196831625821864417
ERROR: failed to repair root items: Input/output error

so I ran btrfs check --init-extent-tree, and it's still running after
24 hours. It seems to have processed 2 GiB of... something:

[2/7] checking extents                         (0:04:22 elapsed, 434185 items checked)
ref mismatch on [331916251136 4096] extent item 0, found 1
data backref 331916251136 parent 10915911958528 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 331916251136 parent 10915911958528 owner 0 offset 0 found 1 wanted 0 back 0x557cdf7560f0
backpointer mismatch on [331916251136 4096]
adding new data backref on 331916251136 parent 10915911958528 owner 0 offset 0 found 1
Repaired extent references for 331916251136

[24 hours later]

[2/7] checking extents                         (23:47:26 elapsed, 434185 items checked)
ref mismatch on [334605303808 188416] extent item 0, found 2
data backref 334605303808 parent 10915986505728 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 334605303808 parent 10915986505728 owner 0 offset 0 found 1 wanted 0 back 0x557ce0ac16c0
data backref 334605303808 root 10455 owner 219090 offset 921600 num_refs 0 not found in extent tree
incorrect local backref count on 334605303808 root 10455 owner 219090 offset 921600 found 1 wanted 0 back 0x557d14faebc0
backpointer mismatch on [334605303808 188416]
adding new data backref on 334605303808 parent 10915986505728 owner 0 offset 0 found 1
adding new data backref on 334605303808 root 10455 owner 219090 offset 921600 found 1
Repaired extent references for 334605303808
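
For what it's worth, the "2 GiB" figure comes from comparing the extent
bytenrs in the two snapshots above; a rough back-of-the-envelope
(hypothetical, and it assumes extents are walked in bytenr order, which
is not guaranteed):

```python
# Extent bytenrs from the two [2/7] progress snapshots above.
first_bytenr = 331916251136   # being repaired at 0:04:22 elapsed
later_bytenr = 334605303808   # being repaired at 23:47:26 elapsed

# Wall time between the two snapshots, in seconds.
elapsed_s = (23 * 3600 + 47 * 60 + 26) - (4 * 60 + 22)
advanced = later_bytenr - first_bytenr

gib = advanced / 2**30
mib_per_hour = advanced / 2**20 / (elapsed_s / 3600)
print(f"advanced ~{gib:.2f} GiB at ~{mib_per_hour:.0f} MiB/h")
```

At that rate, if the remaining ~200 GB had to be walked the same way,
the repair would take weeks, so the estimate mostly tells you how slow
this pass is.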

But now I've got no idea whether it's doing anything useful or whether
I'd better ^C it and give up on this filesystem. I've attached the log
of the ongoing repair and of a read-only check I ran immediately before.

Cheers.

[-- Attachment #2: btrfs-check-3.xz --]
[-- Type: application/x-xz, Size: 23472 bytes --]

[-- Attachment #3: btrfs-init-extent-tree-3-truncated.xz --]
[-- Type: application/x-xz, Size: 40036 bytes --]

Thread overview: 17+ messages
2020-08-10  7:03 raid10 corruption while removing failing disk Agustín DallʼAlba
2020-08-10  7:22 ` Nikolay Borisov
2020-08-10  7:38   ` Martin Steigerwald
2020-08-10  7:51     ` Nikolay Borisov
2020-08-10  8:57       ` Martin Steigerwald
2020-08-11  1:30       ` Chris Murphy
2020-08-10  7:59     ` Agustín DallʼAlba
2020-08-10  8:21 ` Nikolay Borisov
2020-08-10 22:24   ` Zygo Blaxell
2020-08-11  1:18   ` Agustín DallʼAlba
2020-08-11  1:48     ` Chris Murphy
2020-08-11  2:34 ` Chris Murphy
2020-08-11  5:06   ` Agustín DallʼAlba
2020-08-11 19:17     ` Chris Murphy
2020-08-11 20:40       ` Agustín DallʼAlba
2020-08-12  3:03         ` Chris Murphy
2020-08-31 20:05       ` Agustín DallʼAlba
