* Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers")
@ 2021-11-25 18:06 Lukas Pirl
2021-12-02 14:49 ` Lukas Pirl
0 siblings, 1 reply; 8+ messages in thread
From: Lukas Pirl @ 2021-11-25 18:06 UTC (permalink / raw)
To: linux-btrfs
[-- Attachment #1: Type: text/plain, Size: 7485 bytes --]
Dear btrfs community,
this is another report of a balance that apparently loops endlessly on
"found 1 extents, stage: update data pointers".
I observe it on a btrfs RAID 1 across seven LUKS-encrypted spinning
disks (more fs details below) used for storing cold data. One disk
failed physically, and now I am trying to "btrfs device delete missing".
The operation seems to run forever (I once waited more than 30 days,
another time more than 50).
dmesg says:
[ 22:26] BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1
[ 22:27] BTRFS info (device dm-1): found 4164 extents, stage: move data extents
[ +5.476247] BTRFS info (device dm-1): found 4164 extents, stage: update data pointers
[ +2.545299] BTRFS info (device dm-1): found 1 extents, stage: update data pointers
and then the last message repeats every ~0.25 seconds ("forever").
Memory and CPU usage are not excessive (most is IO wait, I assume).
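To quantify the repetition, one can count the looping message in a saved kernel log; the snippet below works on an inlined sample excerpt (on a live system one would pipe `dmesg` or `journalctl -k` into the same grep instead):

```shell
# Count repetitions of the looping relocation message in a captured
# log excerpt (sample data inlined here, taken from the report above).
log='[ 22:26] BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1
[ 22:27] BTRFS info (device dm-1): found 4164 extents, stage: move data extents
[ +5.476247] BTRFS info (device dm-1): found 4164 extents, stage: update data pointers
[ +2.545299] BTRFS info (device dm-1): found 1 extents, stage: update data pointers
[ +0.250000] BTRFS info (device dm-1): found 1 extents, stage: update data pointers'
count=$(printf '%s\n' "$log" | grep -c 'found 1 extents, stage: update data pointers')
echo "$count"
```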
What I have tried:
* Linux 4.x (multiple minor versions, I don't remember exactly which)
* Linux 5.10
* Linux 5.15
* btrfs-progs v5.15
* removing subvolumes (before: ~200, after: ~90)
* free space cache v1, v2, none
* reboot, restart removal/balance (multiple times)
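For reference, the free space cache variants tried above correspond roughly to these mount options; the sketch only prints the commands (device and mount point are placeholders, not the redacted names from this report):

```shell
# Print the mount invocations for the cache variants tried above.
# clear_cache is a one-shot option that rebuilds the selected cache
# version on the next mount; nospace_cache disables it entirely.
cmds=$(for opt in 'clear_cache,space_cache=v1' 'clear_cache,space_cache=v2' 'nospace_cache'; do
  echo "mount -o degraded,$opt /dev/mapper/DEVICE /mnt/pool"
done)
echo "$cmds"
```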
How can we find the problem here to make btrfs an even more stable
file system in the future?
(This particular fs was created in 2016; I am otherwise happy with
btrfs and advocate for it. BTW, I have backups and am ready to use them.)
Another question I was asking myself: can btrfs be forced to forget
about a device (as in "delete it from the metadata") and then just run a
regular balance?
Thanks in advance; I hope we can debug this.
Lukas
======================================================================
filesystem show
===============
Label: 'pool_16-03' uuid: 59301fea-434a-xxxx-bb45-08fcfe8ce113
Total devices 8 FS bytes used 3.84TiB
devid 1 size 931.51GiB used 592.00GiB path /dev/mapper/WD-WCAU45xxxx03
devid 3 size 1.82TiB used 1.37TiB path /dev/mapper/WD-WCAZAFxxxx78
devid 4 size 931.51GiB used 593.00GiB path /dev/mapper/WD-WCC4J7xxxxSZ
devid 5 size 1.82TiB used 1.46TiB path /dev/mapper/WD-WCC4M2xxxxXH
devid 7 size 931.51GiB used 584.00GiB path /dev/mapper/S1xxxxJ3
devid 9 size 2.73TiB used 2.28TiB path /dev/mapper/WD-WCC4N3xxxx17
devid 10 size 3.64TiB used 1.03TiB path /dev/mapper/WD-WCC7K2xxxxNS
*** Some devices missing
subvolumes
==========
~ 90, of which ~ 60 are read-only snapshots of the other ~ 30
filesystem usage
================
Overall:
Device size: 12.74TiB
Device allocated: 8.36TiB
Device unallocated: 4.38TiB
Device missing: 0.00B
Used: 7.69TiB
Free (estimated): 2.50TiB (min: 2.50TiB)
Free (statfs, df): 1.46TiB
Data ratio: 2.00
Metadata ratio: 2.00
Global reserve: 512.00MiB (used: 48.00KiB)
Multiple profiles: no
Data,RAID1: Size:4.14TiB, Used:3.82TiB (92.33%)
/dev/mapper/WD-WCAU45xxxx03 584.00GiB
/dev/mapper/WD-WCAZAFxxxx78 1.35TiB
/dev/mapper/WD-WCC4J7xxxxSZ 588.00GiB
/dev/mapper/WD-WCC4M2xxxxXH 1.44TiB
missing 510.00GiB
/dev/mapper/S1xxxxJ3 579.00GiB
/dev/mapper/WD-WCC4N3xxxx17 2.26TiB
/dev/mapper/WD-WCC7K2xxxxNS 1.01TiB
Metadata,RAID1: Size:41.00GiB, Used:23.14GiB (56.44%)
/dev/mapper/WD-WCAU45xxxx03 8.00GiB
/dev/mapper/WD-WCAZAFxxxx78 17.00GiB
/dev/mapper/WD-WCC4J7xxxxSZ 5.00GiB
/dev/mapper/WD-WCC4M2xxxxXH 13.00GiB
missing 3.00GiB
/dev/mapper/S1xxxxJ3 5.00GiB
/dev/mapper/WD-WCC4N3xxxx17 16.00GiB
/dev/mapper/WD-WCC7K2xxxxNS 15.00GiB
System,RAID1: Size:32.00MiB, Used:848.00KiB (2.59%)
missing 32.00MiB
/dev/mapper/WD-WCC4N3xxxx17 32.00MiB
Unallocated:
/dev/mapper/WD-WCAU45xxxx03 339.51GiB
/dev/mapper/WD-WCAZAFxxxx78 461.01GiB
/dev/mapper/WD-WCC4J7xxxxSZ 338.51GiB
/dev/mapper/WD-WCC4M2xxxxXH 373.01GiB
missing -513.03GiB
/dev/mapper/S1xxxxJ3 347.51GiB
/dev/mapper/WD-WCC4N3xxxx17 460.47GiB
/dev/mapper/WD-WCC7K2xxxxNS 2.61TiB
dump-super
==========
superblock: bytenr=65536, device=/dev/mapper/WD-WCAU45xxxx03
---------------------------------------------------------
csum_type 0 (crc32c)
csum_size 4
csum 0x51beb068 [match]
bytenr 65536
flags 0x1
( WRITTEN )
magic _BHRfS_M [match]
fsid 59301fea-434a-xxxx-bb45-08fcfe8ce113
metadata_uuid 59301fea-434a-xxxx-bb45-08fcfe8ce113
label pool_16-03
generation 113519755
root 15602414796800
sys_array_size 129
chunk_root_generation 63394299
root_level 1
chunk_root 19216820502528
chunk_root_level 1
log_root 0
log_root_transid 0
log_root_level 0
total_bytes 16003136864256
bytes_used 4227124142080
sectorsize 4096
nodesize 16384
leafsize (deprecated) 16384
stripesize 4096
root_dir 6
num_devices 8
compat_flags 0x0
compat_ro_flags 0x0
incompat_flags 0x371
( MIXED_BACKREF |
COMPRESS_ZSTD |
BIG_METADATA |
EXTENDED_IREF |
SKINNY_METADATA |
NO_HOLES )
cache_generation 2975866
uuid_tree_generation 113519755
dev_item.uuid a9b2e4ea-404c-xxxx-a450-dc84b0956ce1
dev_item.fsid 59301fea-434a-xxxx-bb45-08fcfe8ce113 [match]
dev_item.type 0
dev_item.total_bytes 1000201740288
dev_item.bytes_used 635655159808
dev_item.io_align 4096
dev_item.io_width 4096
dev_item.sector_size 4096
dev_item.devid 1
dev_item.dev_group 0
dev_item.seek_speed 0
dev_item.bandwidth 0
dev_item.generation 0
device stats
============
[/dev/mapper/WD-WCAU45xxxx03].write_io_errs 0
[/dev/mapper/WD-WCAU45xxxx03].read_io_errs 0
[/dev/mapper/WD-WCAU45xxxx03].flush_io_errs 0
[/dev/mapper/WD-WCAU45xxxx03].corruption_errs 0
[/dev/mapper/WD-WCAU45xxxx03].generation_errs 0
[/dev/mapper/WD-WCAZAFxxxx78].write_io_errs 0
[/dev/mapper/WD-WCAZAFxxxx78].read_io_errs 0
[/dev/mapper/WD-WCAZAFxxxx78].flush_io_errs 0
[/dev/mapper/WD-WCAZAFxxxx78].corruption_errs 0
[/dev/mapper/WD-WCAZAFxxxx78].generation_errs 0
[/dev/mapper/WD-WCC4J7xxxxSZ].write_io_errs 0
[/dev/mapper/WD-WCC4J7xxxxSZ].read_io_errs 1
[/dev/mapper/WD-WCC4J7xxxxSZ].flush_io_errs 0
[/dev/mapper/WD-WCC4J7xxxxSZ].corruption_errs 0
[/dev/mapper/WD-WCC4J7xxxxSZ].generation_errs 0
[/dev/mapper/WD-WCC4M2xxxxXH].write_io_errs 0
[/dev/mapper/WD-WCC4M2xxxxXH].read_io_errs 0
[/dev/mapper/WD-WCC4M2xxxxXH].flush_io_errs 0
[/dev/mapper/WD-WCC4M2xxxxXH].corruption_errs 0
[/dev/mapper/WD-WCC4M2xxxxXH].generation_errs 0
[devid:6].write_io_errs 0
[devid:6].read_io_errs 0
[devid:6].flush_io_errs 0
[devid:6].corruption_errs 72016
[devid:6].generation_errs 100
[/dev/mapper/S1xxxxJ3].write_io_errs 0
[/dev/mapper/S1xxxxJ3].read_io_errs 0
[/dev/mapper/S1xxxxJ3].flush_io_errs 0
[/dev/mapper/S1xxxxJ3].corruption_errs 2
[/dev/mapper/S1xxxxJ3].generation_errs 0
[/dev/mapper/WD-WCC4N3xxxx17].write_io_errs 0
[/dev/mapper/WD-WCC4N3xxxx17].read_io_errs 0
[/dev/mapper/WD-WCC4N3xxxx17].flush_io_errs 0
[/dev/mapper/WD-WCC4N3xxxx17].corruption_errs 0
[/dev/mapper/WD-WCC4N3xxxx17].generation_errs 0
[/dev/mapper/WD-WCC7K2xxxxNS].write_io_errs 0
[/dev/mapper/WD-WCC7K2xxxxNS].read_io_errs 0
[/dev/mapper/WD-WCC7K2xxxxNS].flush_io_errs 0
[/dev/mapper/WD-WCC7K2xxxxNS].corruption_errs 0
[/dev/mapper/WD-WCC7K2xxxxNS].generation_errs 0
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
* Re: Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers")
2021-11-25 18:06 Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers") Lukas Pirl
@ 2021-12-02 14:49 ` Lukas Pirl
2021-12-02 18:11 ` Zygo Blaxell
0 siblings, 1 reply; 8+ messages in thread
From: Lukas Pirl @ 2021-12-02 14:49 UTC (permalink / raw)
To: linux-btrfs
[-- Attachment #1: Type: text/plain, Size: 9957 bytes --]
(re-post)
Is there no motivation to address this or do I need to supply additional
information?
Cheers
Lukas
On Thu, 2021-11-25 19:06 +0100, Lukas Pirl wrote as excerpted:
> Dear btrfs community,
>
> this is another report of a balance that apparently loops endlessly on
> "found 1 extents, stage: update data pointers".
>
> I observe it on a btrfs RAID 1 across seven LUKS-encrypted spinning
> disks (more fs details below) used for storing cold data. One disk
> failed physically, and now I am trying to "btrfs device delete missing".
> The operation seems to run forever (I once waited more than 30 days,
> another time more than 50).
>
> dmesg says:
> [ 22:26] BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1
> [ 22:27] BTRFS info (device dm-1): found 4164 extents, stage: move data extents
> [ +5.476247] BTRFS info (device dm-1): found 4164 extents, stage: update data pointers
> [ +2.545299] BTRFS info (device dm-1): found 1 extents, stage: update data pointers
>
> and then the last message repeats every ~0.25 seconds ("forever").
> Memory and CPU usage are not excessive (most is IO wait, I assume).
>
> What I have tried:
> * Linux 4.x (multiple minor versions, I don't remember exactly which)
> * Linux 5.10
> * Linux 5.15
> * btrfs-progs v5.15
> * removing subvolumes (before: ~200, after: ~90)
> * free space cache v1, v2, none
> * reboot, restart removal/balance (multiple times)
>
> How can we find the problem here to make btrfs an even more stable
> file system in the future?
> (This particular fs was created in 2016; I am otherwise happy with
> btrfs and advocate for it. BTW, I have backups and am ready to use them.)
>
> Another question I was asking myself: can btrfs be forced to forget
> about a device (as in "delete it from the metadata") and then just run a
> regular balance?
>
> Thanks in advance; I hope we can debug this.
>
> Lukas
>
> [snip: filesystem show / usage / dump-super / device stats, identical to the original post above]
* Re: Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers")
2021-12-02 14:49 ` Lukas Pirl
@ 2021-12-02 18:11 ` Zygo Blaxell
2021-12-03 10:14 ` Lukas Pirl
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Zygo Blaxell @ 2021-12-02 18:11 UTC (permalink / raw)
To: Lukas Pirl; +Cc: linux-btrfs
On Thu, Dec 02, 2021 at 03:49:08PM +0100, Lukas Pirl wrote:
> (re-post)
>
> Is there no motivation to address this or do I need to supply additional
> information?
>
> Cheers
>
> Lukas
>
> On Thu, 2021-11-25 19:06 +0100, Lukas Pirl wrote as excerpted:
> > Dear btrfs community,
> >
> > this is another report of a balance that apparently loops endlessly on
> > "found 1 extents, stage: update data pointers".
> >
> > I observe it on a btrfs RAID 1 across seven LUKS-encrypted spinning
> > disks (more fs details below) used for storing cold data. One disk
> > failed physically, and now I am trying to "btrfs device delete missing".
> > The operation seems to run forever (I once waited more than 30 days,
> > another time more than 50).
> >
> > dmesg says:
> > [ 22:26] BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1
> > [ 22:27] BTRFS info (device dm-1): found 4164 extents, stage: move data extents
> > [ +5.476247] BTRFS info (device dm-1): found 4164 extents, stage: update data pointers
> > [ +2.545299] BTRFS info (device dm-1): found 1 extents, stage: update data pointers
> >
> > and then the last message repeats every ~0.25 seconds ("forever").
> > Memory and CPU usage are not excessive (most is IO wait, I assume).
> >
> > What I have tried:
> > * Linux 4.x (multiple minor versions, I don't remember exactly which)
> > * Linux 5.10
> > * Linux 5.15
> > * btrfs-progs v5.15
> > * removing subvolumes (before: ~200, after: ~90)
> > * free space cache v1, v2, none
> > * reboot, restart removal/balance (multiple times)
Does it always happen on the same block group? If so, that points to
something lurking in your metadata. If a reboot fixes it for one block
group and then it gets stuck on some other block group, it points to
an issue in kernel memory state.
It's more likely something on disk given all the reboots and kernel
versions you have already tried.
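One way to check is to collect the block group numbers from the kernel log across attempts and see whether they are all the same. The snippet below runs on an inlined sample; on the affected box one would feed `journalctl -k` (or saved dmesg captures from several boots) into the same pipeline:

```shell
# Extract the distinct relocated block group numbers from a saved log.
# A single repeating number across reboots points at on-disk metadata;
# changing numbers point at kernel memory state.
log='BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1
BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1'
groups=$(printf '%s\n' "$log" \
  | sed -n 's/.*relocating block group \([0-9][0-9]*\).*/\1/p' \
  | sort -u)
echo "$groups"
```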
> > How can we find the problem here to make btrfs an even more stable
> > file system in the future?
What do you get from 'btrfs check --readonly'?
> > (This particular fs was created in 2016; I am otherwise happy with
> > btrfs and advocate for it. BTW, I have backups and am ready to use them.)
> >
> > Another question I was asking myself: can btrfs be forced to forget
> > about a device (as in "delete it from the metadata") and then just run a
> > regular balance?
It can, but the way you do that is "mount in degraded mode (to force
forget the device), then run btrfs device delete," and you're getting
stuck on the "btrfs device delete" step.
"btrfs device delete" is itself "resize device to zero, then run balance"
and it's the balance step you're stuck on.
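In shell terms, the sequence above looks roughly like this. The commands are only printed here (a dry run), since the device path is a placeholder and none of this should be run against a live pool without backups:

```shell
# Dry-run sketch of the degraded-mount + delete-missing sequence
# described above; commands are echoed, not executed.
MNT=/mnt/pool   # placeholder mount point
plan=$(printf '%s\n' \
  "mount -o degraded /dev/mapper/DEVICE $MNT" \
  "btrfs device delete missing $MNT" \
  "btrfs balance status $MNT")
echo "$plan"
```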
> > Thanks in advance; I hope we can debug this.
> >
> > Lukas
> >
> > [snip: filesystem show / usage / dump-super / device stats, identical to the original post above]
* Re: Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers")
2021-12-02 18:11 ` Zygo Blaxell
@ 2021-12-03 10:14 ` Lukas Pirl
2021-12-05 11:54 ` Lukas Pirl
2021-12-10 13:28 ` Lukas Pirl
2 siblings, 0 replies; 8+ messages in thread
From: Lukas Pirl @ 2021-12-03 10:14 UTC (permalink / raw)
To: Zygo Blaxell; +Cc: linux-btrfs
[-- Attachment #1: Type: text/plain, Size: 1580 bytes --]
Thanks for taking care, Zygo.
On Thu, 2021-12-02 13:11 -0500, Zygo Blaxell wrote as excerpted:
> Does it always happen on the same block group? If so, that points to
> something lurking in your metadata. If a reboot fixes it for one block
> group and then it gets stuck on some other block group, it points to
> an issue in kernel memory state.
Good point, I'll check. I can also do a memory test but the machine runs well
otherwise.
> What do you get from 'btrfs check --readonly'?
Interestingly, the machine disappeared from the network shortly after I issued
the command. :) I'll drive to the machine today or tomorrow, see what is going
on and report back.
> > > (The particular fs has been created 2016, I am otherwise happy with
> > > btrfs and advocating; BTW I have backups and are ready to use them)
> > >
> > > Another question I was asking myself: can btrfs be forced to forget
> > > about a device (as in "delete from meta data) to then just run a
> > > regular balance?
>
> It can, but the way you do that is "mount in degraded mode (to force
> forget the device), then run btrfs device delete," and you're getting
> stuck on the "btrfs device delete" step.
>
> "btrfs device delete" is itself "resize device to zero, then run balance"
> and it's the balance step you're stuck on.
Yes, but btrfs still knows about the drive. However, if it's really the
balance that hangs, it probably wouldn't make much sense to just "delete
the device from the metadata", since one would have to balance afterwards
anyway.
Cheers
Lukas
* Re: Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers")
2021-12-02 18:11 ` Zygo Blaxell
2021-12-03 10:14 ` Lukas Pirl
@ 2021-12-05 11:54 ` Lukas Pirl
2021-12-10 13:28 ` Lukas Pirl
2 siblings, 0 replies; 8+ messages in thread
From: Lukas Pirl @ 2021-12-05 11:54 UTC (permalink / raw)
To: Zygo Blaxell; +Cc: linux-btrfs
Hello Zygo,
it took me (and the disks) a while to report back; here we go:
On Thu, 2021-12-02 13:11 -0500, Zygo Blaxell wrote as excerpted:
> > On Thu, 2021-11-25 19:06 +0100, Lukas Pirl wrote as excerpted:
> > > Dear btrfs community,
> > >
> > > this is another report of a balance that apparently loops endlessly on
> > > "found 1 extents, stage: update data pointers".
> > >
> > > I observe it on a btrfs RAID 1 across seven LUKS-encrypted spinning
> > > disks (more fs details below) used for storing cold data. One disk
> > > failed physically, and now I am trying to "btrfs device delete missing".
> > > The operation seems to run forever (I once waited more than 30 days,
> > > another time more than 50).
> > >
> > > dmesg says:
> > > [ 22:26] BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1
> > > [ 22:27] BTRFS info (device dm-1): found 4164 extents, stage: move data extents
> > > [ +5.476247] BTRFS info (device dm-1): found 4164 extents, stage: update data pointers
> > > [ +2.545299] BTRFS info (device dm-1): found 1 extents, stage: update data pointers
> > >
> > > and then the last message repeats every ~0.25 seconds ("forever").
> > > Memory and CPU usage are not excessive (most is IO wait, I assume).
> > >
> > > What I have tried:
> > > * Linux 4.x (multiple minor versions, I don't remember exactly which)
> > > * Linux 5.10
> > > * Linux 5.15
> > > * btrfs-progs v5.15
> > > * removing subvolumes (before: ~200, after: ~90)
> > > * free space cache v1, v2, none
> > > * reboot, restart removal/balance (multiple times)
>
> Does it always happen on the same block group? If so, that points to
> something lurking in your metadata. If a reboot fixes it for one block
> group and then it gets stuck on some other block group, it points to
> an issue in kernel memory state.
Although I haven't paid attention to the block group number in the past,
another run of ``btrfs dev del`` just now reported the same last block
group number (1109204664320) before, presumably, starting to loop.
> What do you get from 'btrfs check --readonly'?
$ btrfs check --readonly --mode lowmem /dev/disk/by-label/pool_16-03
Opening filesystem to check...
warning, device 6 is missing
Checking filesystem on /dev/disk/by-label/pool_16-03
UUID: 59301fea-434a-4c43-bb45-08fcfe8ce113
[1/7] checking root items
[2/7] checking extents
ERROR: extent[1109584044032, 8192] referencer count mismatch (root: 276,
owner: 1154248, offset: 100401152) wanted: 1, have: 0
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs done with fs roots in lowmem mode, skipping
[7/7] checking quota groups skipped (not enabled on this FS)
found 4252313206784 bytes used, error(s) found
total csum bytes: 4128183360
total tree bytes: 25053184000
total fs tree bytes: 16415014912
total extent tree bytes: 3662594048
btree space waste bytes: 4949241278
file data blocks allocated: 8025128243200
referenced 7552211206144
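If it helps, the extent flagged by the check could be mapped back to file paths with `btrfs inspect-internal logical-resolve`. The mount point below is a placeholder and the command is only printed (a dry run), since it needs the fs mounted and root privileges:

```shell
# Build the inspect command for the extent flagged by btrfs check above;
# printed only, not executed.
LOGICAL=1109584044032   # logical address from the check error above
MNT=/mnt/pool           # placeholder mount point
cmd="btrfs inspect-internal logical-resolve -v $LOGICAL $MNT"
echo "$cmd"
```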
Thanks for your help
Lukas
> > > [snip: filesystem show / usage / dump-super details, identical to the original post above]
> > > dev_item.type 0
> > > dev_item.total_bytes 1000201740288
> > > dev_item.bytes_used 635655159808
> > > dev_item.io_align 4096
> > > dev_item.io_width 4096
> > > dev_item.sector_size 4096
> > > dev_item.devid 1
> > > dev_item.dev_group 0
> > > dev_item.seek_speed 0
> > > dev_item.bandwidth 0
> > > dev_item.generation 0
> > >
> > > device stats
> > > ============
> > >
> > > [/dev/mapper/WD-WCAU45xxxx03].write_io_errs 0
> > > [/dev/mapper/WD-WCAU45xxxx03].read_io_errs 0
> > > [/dev/mapper/WD-WCAU45xxxx03].flush_io_errs 0
> > > [/dev/mapper/WD-WCAU45xxxx03].corruption_errs 0
> > > [/dev/mapper/WD-WCAU45xxxx03].generation_errs 0
> > > [/dev/mapper/WD-WCAZAFxxxx78].write_io_errs 0
> > > [/dev/mapper/WD-WCAZAFxxxx78].read_io_errs 0
> > > [/dev/mapper/WD-WCAZAFxxxx78].flush_io_errs 0
> > > [/dev/mapper/WD-WCAZAFxxxx78].corruption_errs 0
> > > [/dev/mapper/WD-WCAZAFxxxx78].generation_errs 0
> > > [/dev/mapper/WD-WCC4J7xxxxSZ].write_io_errs 0
> > > [/dev/mapper/WD-WCC4J7xxxxSZ].read_io_errs 1
> > > [/dev/mapper/WD-WCC4J7xxxxSZ].flush_io_errs 0
> > > [/dev/mapper/WD-WCC4J7xxxxSZ].corruption_errs 0
> > > [/dev/mapper/WD-WCC4J7xxxxSZ].generation_errs 0
> > > [/dev/mapper/WD-WCC4M2xxxxXH].write_io_errs 0
> > > [/dev/mapper/WD-WCC4M2xxxxXH].read_io_errs 0
> > > [/dev/mapper/WD-WCC4M2xxxxXH].flush_io_errs 0
> > > [/dev/mapper/WD-WCC4M2xxxxXH].corruption_errs 0
> > > [/dev/mapper/WD-WCC4M2xxxxXH].generation_errs 0
> > > [devid:6].write_io_errs 0
> > > [devid:6].read_io_errs 0
> > > [devid:6].flush_io_errs 0
> > > [devid:6].corruption_errs 72016
> > > [devid:6].generation_errs 100
> > > [/dev/mapper/S1xxxxJ3].write_io_errs 0
> > > [/dev/mapper/S1xxxxJ3].read_io_errs 0
> > > [/dev/mapper/S1xxxxJ3].flush_io_errs 0
> > > [/dev/mapper/S1xxxxJ3].corruption_errs 2
> > > [/dev/mapper/S1xxxxJ3].generation_errs 0
> > > [/dev/mapper/WD-WCC4N3xxxx17].write_io_errs 0
> > > [/dev/mapper/WD-WCC4N3xxxx17].read_io_errs 0
> > > [/dev/mapper/WD-WCC4N3xxxx17].flush_io_errs 0
> > > [/dev/mapper/WD-WCC4N3xxxx17].corruption_errs 0
> > > [/dev/mapper/WD-WCC4N3xxxx17].generation_errs 0
> > > [/dev/mapper/WD-WCC7K2xxxxNS].write_io_errs 0
> > > [/dev/mapper/WD-WCC7K2xxxxNS].read_io_errs 0
> > > [/dev/mapper/WD-WCC7K2xxxxNS].flush_io_errs 0
> > > [/dev/mapper/WD-WCC7K2xxxxNS].corruption_errs 0
> > > [/dev/mapper/WD-WCC7K2xxxxNS].generation_errs 0
> >
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers")
2021-12-02 18:11 ` Zygo Blaxell
2021-12-03 10:14 ` Lukas Pirl
2021-12-05 11:54 ` Lukas Pirl
@ 2021-12-10 13:28 ` Lukas Pirl
2021-12-11 2:53 ` Zygo Blaxell
2 siblings, 1 reply; 8+ messages in thread
From: Lukas Pirl @ 2021-12-10 13:28 UTC (permalink / raw)
To: Zygo Blaxell; +Cc: linux-btrfs
[-- Attachment #1: Type: text/plain, Size: 11961 bytes --]
(friendly, humble re-post)
Hello Zygo,
it took me (and the disks) a while to report back; here we go:
On Thu, 2021-12-02 13:11 -0500, Zygo Blaxell wrote as excerpted:
> > On Thu, 2021-11-25 19:06 +0100, Lukas Pirl wrote as excerpted:
> > > Dear btrfs community,
> > >
> > > this is another report of a probably endless balance which loops on
> > > "found 1 extents, stage: update data pointers".
> > >
> > > I observe it on a btrfs RAID 1 on around 7 luks-encrypted spinning
> > > disks (more fs details below) used for storing cold data. One disk
> > > failed physically. Now, I try to "btrfs device delete missing". The
> > > operation runs forever (probably, waited more than 30 days, another
> > > time more than 50 days).
> > >
> > > dmesg says:
> > > [ 22:26] BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1
> > > [ 22:27] BTRFS info (device dm-1): found 4164 extents, stage: move data extents
> > > [ +5.476247] BTRFS info (device dm-1): found 4164 extents, stage: update data pointers
> > > [ +2.545299] BTRFS info (device dm-1): found 1 extents, stage: update data pointers
> > >
> > > and then the last message repeats every ~ .25 seconds ("forever").
> > > Memory and CPU usage are not excessive (most is IO wait, I assume).
> > >
> > > What I have tried:
> > > * Linux 4 (multiple minor versions, don't remember which exactly)
> > > * Linux 5.10
> > > * Linux 5.15
> > > * btrfs-progs v5.15
> > > * remove subvolumes (before: ~ 200, after: ~ 90)
> > > * free space cache v1, v2, none
> > > * reboot, restart removal/balance (multiple times)
>
> Does it always happen on the same block group? If so, that points to
> something lurking in your metadata. If a reboot fixes it for one block
> group and then it gets stuck on some other block group, it points to
> an issue in kernel memory state.
Although I haven't paid attention to the block group number in the past,
another run of ``btrfs dev del`` just now reported the same last block group
number (1109204664320) before it presumably started looping.
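For what it's worth, the relocation can also be retried on just that one block group with a balance vrange filter, which avoids waiting through a whole ``btrfs dev del`` run to see whether the loop reproduces. A hedged sketch (the mount point /mnt/pool is an assumption, not from this thread):

```shell
# Target only the suspect block group with a balance vrange filter.
# 1109204664320 is the block group bytenr reported by dmesg above;
# /mnt/pool is a hypothetical mount point.
BG=1109204664320

# vrange takes a half-open byte range, so BG..BG+1 selects exactly the
# block group containing that virtual address.
btrfs balance start -dvrange="$BG..$((BG + 1))" /mnt/pool

# Watch for the "found 1 extents, stage: update data pointers" loop.
dmesg --follow | grep --line-buffered 'BTRFS info'
```

If the loop reproduces on this single block group, that would support the on-disk-metadata theory rather than a kernel memory-state issue.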
> What do you get from 'btrfs check --readonly'?
$ btrfs check --readonly --mode lowmem /dev/disk/by-label/pool_16-03
[1/7] checking root items
Opening filesystem to check...
warning, device 6 is missing
Checking filesystem on /dev/disk/by-label/pool_16-03
UUID: 59301fea-434a-4c43-bb45-08fcfe8ce113
[2/7] checking extents
ERROR: extent[1109584044032, 8192] referencer count mismatch (root: 276, owner: 1154248, offset: 100401152) wanted: 1, have: 0
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs done with fs roots in lowmem mode, skipping
[7/7] checking quota groups skipped (not enabled on this FS)
found 4252313206784 bytes used, error(s) found
total csum bytes: 4128183360
total tree bytes: 25053184000
total fs tree bytes: 16415014912
total extent tree bytes: 3662594048
btree space waste bytes: 4949241278
file data blocks allocated: 8025128243200
referenced 7552211206144
So what can be done? ``check --repair``? Or too dangerous? :)
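Before reaching for --repair, it may be worth mapping the flagged extent back to the file(s) that reference it; deleting or rewriting that one file is often less risky than a repair. Notably, the extent bytenr 1109584044032 lies about 362MiB into the stuck block group 1109204664320 (assuming the usual 1GiB data block groups), so it is plausibly the extent the relocation loops on. A sketch, with /mnt/pool and the subvolume path as assumptions:

```shell
# Resolve the logical address of the flagged extent to file path(s).
# /mnt/pool is a hypothetical mount point.
LOGICAL=1109584044032
btrfs inspect-internal logical-resolve -v "$LOGICAL" /mnt/pool

# Alternatively, resolve the inode reported as "owner" (1154248) inside
# the subvolume with root id 276; locate that subvolume's path first
# with `btrfs subvolume list /mnt/pool`. "some-subvol" is a placeholder.
btrfs inspect-internal inode-resolve 1154248 /mnt/pool/some-subvol
```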
Thanks for your help
Lukas
> > > [filesystem show / subvolumes / usage / dump-super / device stats output unchanged from above; trimmed]
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers")
2021-12-10 13:28 ` Lukas Pirl
@ 2021-12-11 2:53 ` Zygo Blaxell
2021-12-16 20:52 ` Lukas Pirl
0 siblings, 1 reply; 8+ messages in thread
From: Zygo Blaxell @ 2021-12-11 2:53 UTC (permalink / raw)
To: Lukas Pirl; +Cc: linux-btrfs
On Fri, Dec 10, 2021 at 02:28:28PM +0100, Lukas Pirl wrote:
> (friendly, humble re-post)
>
> Hello Zygo,
>
> it took me (and the disks) a while to report back; here we go:
>
> On Thu, 2021-12-02 13:11 -0500, Zygo Blaxell wrote as excerpted:
> > > On Thu, 2021-11-25 19:06 +0100, Lukas Pirl wrote as excerpted:
> > > > Dear btrfs community,
> > > >
> > > > this is another report of a probably endless balance which loops on
> > > > "found 1 extents, stage: update data pointers".
> > > >
> > > > I observe it on a btrfs RAID 1 on around 7 luks-encrypted spinning
> > > > disks (more fs details below) used for storing cold data. One disk
> > > > failed physically. Now, I try to "btrfs device delete missing". The
> > > > operation runs forever (probably, waited more than 30 days, another
> > > > time more than 50 days).
> > > >
> > > > dmesg says:
> > > > [ 22:26] BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1
> > > > [ 22:27] BTRFS info (device dm-1): found 4164 extents, stage: move data extents
> > > > [ +5.476247] BTRFS info (device dm-1): found 4164 extents, stage: update data pointers
> > > > [ +2.545299] BTRFS info (device dm-1): found 1 extents, stage: update data pointers
> > > >
> > > > and then the last message repeats every ~ .25 seconds ("forever").
> > > > Memory and CPU usage are not excessive (most is IO wait, I assume).
> > > >
> > > > What I have tried:
> > > > * Linux 4 (multiple minor versions, don't remember which exactly)
> > > > * Linux 5.10
> > > > * Linux 5.15
> > > > * btrfs-progs v5.15
> > > > * remove subvolumes (before: ~ 200, after: ~ 90)
> > > > * free space cache v1, v2, none
> > > > * reboot, restart removal/balance (multiple times)
> >
> > Does it always happen on the same block group? If so, that points to
> > something lurking in your metadata. If a reboot fixes it for one block
> > group and then it gets stuck on some other block group, it points to
> > an issue in kernel memory state.
>
> Although I haven't paid attention to the block group number in the past,
> another run of ``btrfs dev del`` just now reported the same last block group
> number (1109204664320) before it presumably started looping.
>
> > What do you get from 'btrfs check --readonly'?
>
> $ btrfs check --readonly --mode lowmem /dev/disk/by-label/pool_16-03
>
> [1/7] checking root items
> Opening filesystem to check...
> warning, device 6 is missing
> Checking filesystem on /dev/disk/by-label/pool_16-03
> UUID: 59301fea-434a-4c43-bb45-08fcfe8ce113
> [2/7] checking extents
> ERROR: extent[1109584044032, 8192] referencer count mismatch (root: 276, owner: 1154248, offset: 100401152) wanted: 1, have: 0
> ERROR: errors found in extent allocation tree or chunk allocation
> [3/7] checking free space tree
> [4/7] checking fs roots
> [5/7] checking only csums items (without verifying data)
> [6/7] checking root refs done with fs roots in lowmem mode, skipping
> [7/7] checking quota groups skipped (not enabled on this FS)
> found 4252313206784 bytes used, error(s) found
> total csum bytes: 4128183360
> total tree bytes: 25053184000
> total fs tree bytes: 16415014912
> total extent tree bytes: 3662594048
> btree space waste bytes: 4949241278
> file data blocks allocated: 8025128243200
> referenced 7552211206144
OK that's not too bad, just one bad reference.
> So what can be done? ``check --repair``? Or too dangerous? :)
If you have backups and you are prepared to restore them, you can try
check --repair.
> Thanks for your help
>
> Lukas
>
> > > > [filesystem show / subvolumes / usage / dump-super / device stats output unchanged from above; trimmed]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers")
2021-12-11 2:53 ` Zygo Blaxell
@ 2021-12-16 20:52 ` Lukas Pirl
0 siblings, 0 replies; 8+ messages in thread
From: Lukas Pirl @ 2021-12-16 20:52 UTC (permalink / raw)
To: Zygo Blaxell; +Cc: linux-btrfs
[-- Attachment #1: Type: text/plain, Size: 13752 bytes --]
On Fri, 2021-12-10 21:53 -0500, Zygo Blaxell wrote as excerpted:
> On Fri, Dec 10, 2021 at 02:28:28PM +0100, Lukas Pirl wrote:
> >
>
> > it took me (and the disks) a while to report back; here we go:
> >
> > On Thu, 2021-12-02 13:11 -0500, Zygo Blaxell wrote as excerpted:
> > > > On Thu, 2021-11-25 19:06 +0100, Lukas Pirl wrote as excerpted:
> > > > > Dear btrfs community,
> > > > >
> > > > > this is another report of a probably endless balance which loops on
> > > > > "found 1 extents, stage: update data pointers".
> > > > >
> > > > > I observe it on a btrfs RAID 1 on around 7 luks-encrypted spinning
> > > > > disks (more fs details below) used for storing cold data. One disk
> > > > > failed physically. Now, I try to "btrfs device delete missing". The
> > > > > operation runs forever (probably, waited more than 30 days, another
> > > > > time more than 50 days).
> > > > >
> > > > > dmesg says:
> > > > > [ 22:26] BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1
> > > > > [ 22:27] BTRFS info (device dm-1): found 4164 extents, stage: move data extents
> > > > > [ +5.476247] BTRFS info (device dm-1): found 4164 extents, stage: update data pointers
> > > > > [ +2.545299] BTRFS info (device dm-1): found 1 extents, stage: update data pointers
> > > > >
> > > > > and then the last message repeats every ~ .25 seconds ("forever").
> > > > > Memory and CPU usage are not excessive (most is IO wait, I assume).
> > > > >
> > > > > What I have tried:
> > > > > * Linux 4 (multiple minor versions, don't remember which exactly)
> > > > > * Linux 5.10
> > > > > * Linux 5.15
> > > > > * btrfs-progs v5.15
> > > > > * remove subvolumes (before: ~ 200, after: ~ 90)
> > > > > * free space cache v1, v2, none
> > > > > * reboot, restart removal/balance (multiple times)
> > >
> > > Does it always happen on the same block group? If so, that points to
> > > something lurking in your metadata. If a reboot fixes it for one block
> > > group and then it gets stuck on some other block group, it points to
> > > an issue in kernel memory state.
> >
> > Although I haven't paid attention to the block group number in the past,
> > another run of ``btrfs dev del`` just now reported the same last block group
> > number (1109204664320) before it presumably started looping.
> >
> > > What do you get from 'btrfs check --readonly'?
> >
> > $ btrfs check --readonly --mode lowmem /dev/disk/by-label/pool_16-03
> >
> > [1/7] checking root items
> > Opening filesystem to check...
> > warning, device 6 is missing
> > Checking filesystem on /dev/disk/by-label/pool_16-03
> > UUID: 59301fea-434a-4c43-bb45-08fcfe8ce113
> > [2/7] checking extents
> > ERROR: extent[1109584044032, 8192] referencer count mismatch (root: 276, owner: 1154248, offset: 100401152) wanted: 1, have: 0
> > ERROR: errors found in extent allocation tree or chunk allocation
> > [3/7] checking free space tree
> > [4/7] checking fs roots
> > [5/7] checking only csums items (without verifying data)
> > [6/7] checking root refs done with fs roots in lowmem mode, skipping
> > [7/7] checking quota groups skipped (not enabled on this FS)
> > found 4252313206784 bytes used, error(s) found
> > total csum bytes: 4128183360
> > total tree bytes: 25053184000
> > total fs tree bytes: 16415014912
> > total extent tree bytes: 3662594048
> > btree space waste bytes: 4949241278
> > file data blocks allocated: 8025128243200
> > referenced 7552211206144
>
> OK that's not too bad, just one bad reference.
>
> > So what can be done? ``check --repair``? Or too dangerous? :)
>
> If you have backups and you are prepared to restore them, you can try
> check --repair.
To report back, all I got from ``check --repair`` was a segfault (progs
v5.15.1, Linux 5.15.0):
…
[2/7] checking extents
ERROR: extent[1109584044032, 8192] referencer count mismatch (root: 276, owner: 1154248, offset: 100401152) wanted: 1, have: 0
zsh: segmentation fault  btrfs check --repair --mode lowmem /dev/disk/by-label/…
Are there any known open bugs, possibly related, whose fixes I should wait
for before trying again? :)
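For reporting the crash, a backtrace would likely help the btrfs-progs developers more than the one-line zsh message; assuming gdb and btrfs-progs debug symbols are available, something along these lines could capture one:

```shell
# Re-run the failing command under gdb and print a backtrace at the
# segfault, suitable for attaching to a bug report.
DEV=/dev/disk/by-label/pool_16-03
gdb --batch -ex run -ex bt --args \
    btrfs check --repair --mode lowmem "$DEV"
```

As discussed above, re-running --repair is only advisable with backups ready to restore.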
Cheers,
Lukas
> > Thanks for your help
> >
> > Lukas
> >
> > > > > [filesystem show / subvolumes / usage output unchanged from above; trimmed]
> > > > >
> > > > > dump-super
> > > > > ==========
> > > > >
> > > > > superblock: bytenr=65536, device=/dev/mapper/WD-WCAU45xxxx03
> > > > > ---------------------------------------------------------
> > > > > csum_type 0 (crc32c)
> > > > > csum_size 4
> > > > > csum 0x51beb068 [match]
> > > > > bytenr 65536
> > > > > flags 0x1
> > > > > ( WRITTEN )
> > > > > magic _BHRfS_M [match]
> > > > > fsid 59301fea-434a-xxxx-bb45-08fcfe8ce113
> > > > > metadata_uuid 59301fea-434a-xxxx-bb45-08fcfe8ce113
> > > > > label pool_16-03
> > > > > generation 113519755
> > > > > root 15602414796800
> > > > > sys_array_size 129
> > > > > chunk_root_generation 63394299
> > > > > root_level 1
> > > > > chunk_root 19216820502528
> > > > > chunk_root_level 1
> > > > > log_root 0
> > > > > log_root_transid 0
> > > > > log_root_level 0
> > > > > total_bytes 16003136864256
> > > > > bytes_used 4227124142080
> > > > > sectorsize 4096
> > > > > nodesize 16384
> > > > > leafsize (deprecated) 16384
> > > > > stripesize 4096
> > > > > root_dir 6
> > > > > num_devices 8
> > > > > compat_flags 0x0
> > > > > compat_ro_flags 0x0
> > > > > incompat_flags 0x371
> > > > > ( MIXED_BACKREF |
> > > > > COMPRESS_ZSTD |
> > > > > BIG_METADATA |
> > > > > EXTENDED_IREF |
> > > > > SKINNY_METADATA |
> > > > > NO_HOLES )
> > > > > cache_generation 2975866
> > > > > uuid_tree_generation 113519755
> > > > > dev_item.uuid a9b2e4ea-404c-xxxx-a450-dc84b0956ce1
> > > > > dev_item.fsid 59301fea-434a-xxxx-bb45-08fcfe8ce113 [match]
> > > > > dev_item.type 0
> > > > > dev_item.total_bytes 1000201740288
> > > > > dev_item.bytes_used 635655159808
> > > > > dev_item.io_align 4096
> > > > > dev_item.io_width 4096
> > > > > dev_item.sector_size 4096
> > > > > dev_item.devid 1
> > > > > dev_item.dev_group 0
> > > > > dev_item.seek_speed 0
> > > > > dev_item.bandwidth 0
> > > > > dev_item.generation 0
> > > > >
> > > > > device stats
> > > > > ============
> > > > >
> > > > > [/dev/mapper/WD-WCAU45xxxx03].write_io_errs 0
> > > > > [/dev/mapper/WD-WCAU45xxxx03].read_io_errs 0
> > > > > [/dev/mapper/WD-WCAU45xxxx03].flush_io_errs 0
> > > > > [/dev/mapper/WD-WCAU45xxxx03].corruption_errs 0
> > > > > [/dev/mapper/WD-WCAU45xxxx03].generation_errs 0
> > > > > [/dev/mapper/WD-WCAZAFxxxx78].write_io_errs 0
> > > > > [/dev/mapper/WD-WCAZAFxxxx78].read_io_errs 0
> > > > > [/dev/mapper/WD-WCAZAFxxxx78].flush_io_errs 0
> > > > > [/dev/mapper/WD-WCAZAFxxxx78].corruption_errs 0
> > > > > [/dev/mapper/WD-WCAZAFxxxx78].generation_errs 0
> > > > > [/dev/mapper/WD-WCC4J7xxxxSZ].write_io_errs 0
> > > > > [/dev/mapper/WD-WCC4J7xxxxSZ].read_io_errs 1
> > > > > [/dev/mapper/WD-WCC4J7xxxxSZ].flush_io_errs 0
> > > > > [/dev/mapper/WD-WCC4J7xxxxSZ].corruption_errs 0
> > > > > [/dev/mapper/WD-WCC4J7xxxxSZ].generation_errs 0
> > > > > [/dev/mapper/WD-WCC4M2xxxxXH].write_io_errs 0
> > > > > [/dev/mapper/WD-WCC4M2xxxxXH].read_io_errs 0
> > > > > [/dev/mapper/WD-WCC4M2xxxxXH].flush_io_errs 0
> > > > > [/dev/mapper/WD-WCC4M2xxxxXH].corruption_errs 0
> > > > > [/dev/mapper/WD-WCC4M2xxxxXH].generation_errs 0
> > > > > [devid:6].write_io_errs 0
> > > > > [devid:6].read_io_errs 0
> > > > > [devid:6].flush_io_errs 0
> > > > > [devid:6].corruption_errs 72016
> > > > > [devid:6].generation_errs 100
> > > > > [/dev/mapper/S1xxxxJ3].write_io_errs 0
> > > > > [/dev/mapper/S1xxxxJ3].read_io_errs 0
> > > > > [/dev/mapper/S1xxxxJ3].flush_io_errs 0
> > > > > [/dev/mapper/S1xxxxJ3].corruption_errs 2
> > > > > [/dev/mapper/S1xxxxJ3].generation_errs 0
> > > > > [/dev/mapper/WD-WCC4N3xxxx17].write_io_errs 0
> > > > > [/dev/mapper/WD-WCC4N3xxxx17].read_io_errs 0
> > > > > [/dev/mapper/WD-WCC4N3xxxx17].flush_io_errs 0
> > > > > [/dev/mapper/WD-WCC4N3xxxx17].corruption_errs 0
> > > > > [/dev/mapper/WD-WCC4N3xxxx17].generation_errs 0
> > > > > [/dev/mapper/WD-WCC7K2xxxxNS].write_io_errs 0
> > > > > [/dev/mapper/WD-WCC7K2xxxxNS].read_io_errs 0
> > > > > [/dev/mapper/WD-WCC7K2xxxxNS].flush_io_errs 0
> > > > > [/dev/mapper/WD-WCC7K2xxxxNS].corruption_errs 0
> > > > > [/dev/mapper/WD-WCC7K2xxxxNS].generation_errs 0
> > > >
> >
> >
>
>
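The device stats quoted above can be narrowed to the suspect device by filtering for nonzero error counters. A minimal sketch (the heredoc reuses sample lines from the stats above purely for illustration; in practice you would pipe `btrfs device stats <mountpoint>` into the same awk filter):

```shell
# Keep only counters whose value (second field) is nonzero.
# Replace the heredoc with:  btrfs device stats /mnt/pool | awk '$2 != 0'
awk '$2 != 0' <<'EOF'
[devid:6].corruption_errs 72016
[devid:6].generation_errs 100
[/dev/mapper/S1xxxxJ3].corruption_errs 2
[/dev/mapper/S1xxxxJ3].flush_io_errs 0
EOF
# prints only the three lines with nonzero counters
```

Here that immediately singles out devid 6 (the missing disk, with 72016 corruption errors) as the device the delete operation is struggling with.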
end of thread, other threads:[~2021-12-16 20:53 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-25 18:06 Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers") Lukas Pirl
2021-12-02 14:49 ` Lukas Pirl
2021-12-02 18:11 ` Zygo Blaxell
2021-12-03 10:14 ` Lukas Pirl
2021-12-05 11:54 ` Lukas Pirl
2021-12-10 13:28 ` Lukas Pirl
2021-12-11 2:53 ` Zygo Blaxell
2021-12-16 20:52 ` Lukas Pirl