* Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers")
@ 2021-11-25 18:06 Lukas Pirl
  2021-12-02 14:49 ` Lukas Pirl
  0 siblings, 1 reply; 8+ messages in thread

From: Lukas Pirl @ 2021-11-25 18:06 UTC (permalink / raw)
To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 7485 bytes --]

Dear btrfs community,

this is another report of a probably endless balance, which loops on
"found 1 extents, stage: update data pointers".

I observe it on a btrfs RAID 1 across around 7 LUKS-encrypted spinning
disks (more fs details below) used for storing cold data. One disk failed
physically. Now I am trying to "btrfs device delete missing". The
operation runs forever (waited more than 30 days on one attempt, more
than 50 days on another).

dmesg says:

[ 22:26] BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1
[ 22:27] BTRFS info (device dm-1): found 4164 extents, stage: move data extents
[ +5.476247] BTRFS info (device dm-1): found 4164 extents, stage: update data pointers
[ +2.545299] BTRFS info (device dm-1): found 1 extents, stage: update data pointers

and then the last message repeats every ~0.25 seconds ("forever").
Memory and CPU usage are not excessive (mostly IO wait, I assume).

What I have tried:
* Linux 4 (multiple minor versions, don't remember which exactly)
* Linux 5.10
* Linux 5.15
* btrfs-progs v5.15
* removing subvolumes (before: ~200, after: ~90)
* free space cache v1, v2, none
* reboot, restart removal/balance (multiple times)

How can we find the problem here to make btrfs an even more stable file
system in the future? (The particular fs was created in 2016; I am
otherwise happy with btrfs and advocate for it. BTW, I have backups and
am ready to use them.)

Another question I was asking myself: can btrfs be forced to forget
about a device (as in "delete from metadata") to then just run a regular
balance?

Thanks in advance; I hope we can debug this.
Lukas

======================================================================

filesystem show
===============

Label: 'pool_16-03'  uuid: 59301fea-434a-xxxx-bb45-08fcfe8ce113
        Total devices 8 FS bytes used 3.84TiB
        devid    1 size 931.51GiB used 592.00GiB path /dev/mapper/WD-WCAU45xxxx03
        devid    3 size   1.82TiB used   1.37TiB path /dev/mapper/WD-WCAZAFxxxx78
        devid    4 size 931.51GiB used 593.00GiB path /dev/mapper/WD-WCC4J7xxxxSZ
        devid    5 size   1.82TiB used   1.46TiB path /dev/mapper/WD-WCC4M2xxxxXH
        devid    7 size 931.51GiB used 584.00GiB path /dev/mapper/S1xxxxJ3
        devid    9 size   2.73TiB used   2.28TiB path /dev/mapper/WD-WCC4N3xxxx17
        devid   10 size   3.64TiB used   1.03TiB path /dev/mapper/WD-WCC7K2xxxxNS
        *** Some devices missing

subvolumes
==========

~90, of which ~60 are read-only snapshots of the other ~30

filesystem usage
================

Overall:
    Device size:                  12.74TiB
    Device allocated:              8.36TiB
    Device unallocated:            4.38TiB
    Device missing:                  0.00B
    Used:                          7.69TiB
    Free (estimated):              2.50TiB  (min: 2.50TiB)
    Free (statfs, df):             1.46TiB
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB  (used: 48.00KiB)
    Multiple profiles:                  no

Data,RAID1: Size:4.14TiB, Used:3.82TiB (92.33%)
    /dev/mapper/WD-WCAU45xxxx03  584.00GiB
    /dev/mapper/WD-WCAZAFxxxx78    1.35TiB
    /dev/mapper/WD-WCC4J7xxxxSZ  588.00GiB
    /dev/mapper/WD-WCC4M2xxxxXH    1.44TiB
    missing                      510.00GiB
    /dev/mapper/S1xxxxJ3         579.00GiB
    /dev/mapper/WD-WCC4N3xxxx17    2.26TiB
    /dev/mapper/WD-WCC7K2xxxxNS    1.01TiB

Metadata,RAID1: Size:41.00GiB, Used:23.14GiB (56.44%)
    /dev/mapper/WD-WCAU45xxxx03    8.00GiB
    /dev/mapper/WD-WCAZAFxxxx78   17.00GiB
    /dev/mapper/WD-WCC4J7xxxxSZ    5.00GiB
    /dev/mapper/WD-WCC4M2xxxxXH   13.00GiB
    missing                        3.00GiB
    /dev/mapper/S1xxxxJ3           5.00GiB
    /dev/mapper/WD-WCC4N3xxxx17   16.00GiB
    /dev/mapper/WD-WCC7K2xxxxNS   15.00GiB

System,RAID1: Size:32.00MiB, Used:848.00KiB (2.59%)
    missing                       32.00MiB
    /dev/mapper/WD-WCC4N3xxxx17   32.00MiB

Unallocated:
    /dev/mapper/WD-WCAU45xxxx03  339.51GiB
    /dev/mapper/WD-WCAZAFxxxx78  461.01GiB
    /dev/mapper/WD-WCC4J7xxxxSZ  338.51GiB
    /dev/mapper/WD-WCC4M2xxxxXH  373.01GiB
    missing                     -513.03GiB
    /dev/mapper/S1xxxxJ3         347.51GiB
    /dev/mapper/WD-WCC4N3xxxx17  460.47GiB
    /dev/mapper/WD-WCC7K2xxxxNS    2.61TiB

dump-super
==========

superblock: bytenr=65536, device=/dev/mapper/WD-WCAU45xxxx03
---------------------------------------------------------
csum_type               0 (crc32c)
csum_size               4
csum                    0x51beb068 [match]
bytenr                  65536
flags                   0x1
                        ( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    59301fea-434a-xxxx-bb45-08fcfe8ce113
metadata_uuid           59301fea-434a-xxxx-bb45-08fcfe8ce113
label                   pool_16-03
generation              113519755
root                    15602414796800
sys_array_size          129
chunk_root_generation   63394299
root_level              1
chunk_root              19216820502528
chunk_root_level        1
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             16003136864256
bytes_used              4227124142080
sectorsize              4096
nodesize                16384
leafsize (deprecated)   16384
stripesize              4096
root_dir                6
num_devices             8
compat_flags            0x0
compat_ro_flags         0x0
incompat_flags          0x371
                        ( MIXED_BACKREF |
                          COMPRESS_ZSTD |
                          BIG_METADATA |
                          EXTENDED_IREF |
                          SKINNY_METADATA |
                          NO_HOLES )
cache_generation        2975866
uuid_tree_generation    113519755
dev_item.uuid           a9b2e4ea-404c-xxxx-a450-dc84b0956ce1
dev_item.fsid           59301fea-434a-xxxx-bb45-08fcfe8ce113 [match]
dev_item.type           0
dev_item.total_bytes    1000201740288
dev_item.bytes_used     635655159808
dev_item.io_align       4096
dev_item.io_width       4096
dev_item.sector_size    4096
dev_item.devid          1
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0

device stats
============

[/dev/mapper/WD-WCAU45xxxx03].write_io_errs    0
[/dev/mapper/WD-WCAU45xxxx03].read_io_errs     0
[/dev/mapper/WD-WCAU45xxxx03].flush_io_errs    0
[/dev/mapper/WD-WCAU45xxxx03].corruption_errs  0
[/dev/mapper/WD-WCAU45xxxx03].generation_errs  0
[/dev/mapper/WD-WCAZAFxxxx78].write_io_errs    0
[/dev/mapper/WD-WCAZAFxxxx78].read_io_errs     0
[/dev/mapper/WD-WCAZAFxxxx78].flush_io_errs    0
[/dev/mapper/WD-WCAZAFxxxx78].corruption_errs  0
[/dev/mapper/WD-WCAZAFxxxx78].generation_errs  0
[/dev/mapper/WD-WCC4J7xxxxSZ].write_io_errs    0
[/dev/mapper/WD-WCC4J7xxxxSZ].read_io_errs     1
[/dev/mapper/WD-WCC4J7xxxxSZ].flush_io_errs    0
[/dev/mapper/WD-WCC4J7xxxxSZ].corruption_errs  0
[/dev/mapper/WD-WCC4J7xxxxSZ].generation_errs  0
[/dev/mapper/WD-WCC4M2xxxxXH].write_io_errs    0
[/dev/mapper/WD-WCC4M2xxxxXH].read_io_errs     0
[/dev/mapper/WD-WCC4M2xxxxXH].flush_io_errs    0
[/dev/mapper/WD-WCC4M2xxxxXH].corruption_errs  0
[/dev/mapper/WD-WCC4M2xxxxXH].generation_errs  0
[devid:6].write_io_errs        0
[devid:6].read_io_errs         0
[devid:6].flush_io_errs        0
[devid:6].corruption_errs      72016
[devid:6].generation_errs      100
[/dev/mapper/S1xxxxJ3].write_io_errs           0
[/dev/mapper/S1xxxxJ3].read_io_errs            0
[/dev/mapper/S1xxxxJ3].flush_io_errs           0
[/dev/mapper/S1xxxxJ3].corruption_errs         2
[/dev/mapper/S1xxxxJ3].generation_errs         0
[/dev/mapper/WD-WCC4N3xxxx17].write_io_errs    0
[/dev/mapper/WD-WCC4N3xxxx17].read_io_errs     0
[/dev/mapper/WD-WCC4N3xxxx17].flush_io_errs    0
[/dev/mapper/WD-WCC4N3xxxx17].corruption_errs  0
[/dev/mapper/WD-WCC4N3xxxx17].generation_errs  0
[/dev/mapper/WD-WCC7K2xxxxNS].write_io_errs    0
[/dev/mapper/WD-WCC7K2xxxxNS].read_io_errs     0
[/dev/mapper/WD-WCC7K2xxxxNS].flush_io_errs    0
[/dev/mapper/WD-WCC7K2xxxxNS].corruption_errs  0
[/dev/mapper/WD-WCC7K2xxxxNS].generation_errs  0

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply [flat|nested] 8+ messages in thread
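[Editorial note: the looping block group can be pulled out of a saved kernel log with a small helper. A minimal sketch, assuming only the dmesg message format quoted in the report; the sample file name is illustrative:]

```shell
#!/bin/sh
# Summarize which block groups btrfs relocation has visited, from a saved
# kernel log (e.g. `dmesg > dmesg.txt`). When "btrfs device delete" loops,
# the block group listed here while the same "found 1 extents, stage:
# update data pointers" line keeps repeating is the stuck one.
relocated_block_groups() {
    # On "relocating block group" lines, the bytenr follows the word "group".
    awk '/relocating block group/ {
             for (i = 1; i <= NF; i++)
                 if ($i == "group") print $(i + 1)
         }' "$1" | sort | uniq -c | sort -rn
}

# Demo on the dmesg excerpt from the report (hypothetical file name):
cat > /tmp/dmesg_sample.txt <<'EOF'
[ 22:26] BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1
[ 22:27] BTRFS info (device dm-1): found 4164 extents, stage: move data extents
[ +2.545299] BTRFS info (device dm-1): found 1 extents, stage: update data pointers
EOF
relocated_block_groups /tmp/dmesg_sample.txt   # one entry: 1109204664320
```

Running the helper after each restart of the removal shows whether the same bytenr comes up every time.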
* Re: Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers")
  2021-11-25 18:06 Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers") Lukas Pirl
@ 2021-12-02 14:49 ` Lukas Pirl
  2021-12-02 18:11   ` Zygo Blaxell
  0 siblings, 1 reply; 8+ messages in thread

From: Lukas Pirl @ 2021-12-02 14:49 UTC (permalink / raw)
To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 9957 bytes --]

(re-post)

Is there no motivation to address this, or do I need to supply additional
information?

Cheers

Lukas

On Thu, 2021-11-25 19:06 +0100, Lukas Pirl wrote as excerpted:
> Dear btrfs community,
>
> this is another report of a probably endless balance which loops on
> "found 1 extents, stage: update data pointers".
>
> I observe it on a btrfs RAID 1 on around 7 luks-encrypted spinning
> disks (more fs details below) used for storing cold data. One disk
> failed physically. Now, I try to "btrfs device delete missing". The
> operation runs forever (probably, waited more than 30 days, another
> time more than 50 days).
>
> dmesg says:
> [ 22:26] BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1
> [ 22:27] BTRFS info (device dm-1): found 4164 extents, stage: move data extents
> [ +5.476247] BTRFS info (device dm-1): found 4164 extents, stage: update data pointers
> [ +2.545299] BTRFS info (device dm-1): found 1 extents, stage: update data pointers
>
> and then the last message repeats every ~ .25 seconds ("forever").
> Memory and CPU usage are not excessive (most is IO wait, I assume).
> [...]

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers")
  2021-12-02 14:49 ` Lukas Pirl
@ 2021-12-02 18:11   ` Zygo Blaxell
  2021-12-03 10:14     ` Lukas Pirl
                       ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread

From: Zygo Blaxell @ 2021-12-02 18:11 UTC (permalink / raw)
To: Lukas Pirl; +Cc: linux-btrfs

On Thu, Dec 02, 2021 at 03:49:08PM +0100, Lukas Pirl wrote:
> (re-post)
>
> Is there no motivation to address this or do I need to supply additional
> information?
>
> On Thu, 2021-11-25 19:06 +0100, Lukas Pirl wrote as excerpted:
> > this is another report of a probably endless balance which loops on
> > "found 1 extents, stage: update data pointers".
> >
> > [...]
> >
> > What I have tried:
> > * Linux 4 (multiple minor versions, don't remember which exactly)
> > * Linux 5.10
> > * Linux 5.15
> > * btrfs-progs v5.15
> > * remove subvolues (before: ~ 200, after: ~ 90)
> > * free space cache v1, v2, none
> > * reboot, restart removal/balance (multiple times)

Does it always happen on the same block group? If so, that points to
something lurking in your metadata. If a reboot fixes it for one block
group and then it gets stuck on some other block group, it points to an
issue in kernel memory state. It's more likely something on disk, given
all the reboots and kernel versions you have already tried.

> > How can we find the problem here to make btrfs an even more stable
> > file system in the future?

What do you get from 'btrfs check --readonly'?

> > (The particular fs has been created 2016, I am otherwise happy with
> > btrfs and advocating; BTW I have backups and are ready to use them)
> >
> > Another question I was asking myself: can btrfs be forced to forget
> > about a device (as in "delete from meta data) to then just run a
> > regular balance?

It can, but the way you do that is "mount in degraded mode (to force
forget the device), then run btrfs device delete," and you're getting
stuck on the "btrfs device delete" step.

"btrfs device delete" is itself "resize device to zero, then run
balance," and it's the balance step you're stuck on.

> > Thanks in advance; I hope we can debug this.
> > [...]

^ permalink raw reply [flat|nested] 8+ messages in thread
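[Editorial note: since "device delete" reduces to a balance, one way to narrow this down is to balance only the suspect block group with a `vrange` filter and see whether that alone loops. A sketch, not run against a real filesystem: `/mnt/pool` is a placeholder mount point, and the 1 GiB length is an assumed data block-group size, so the command is echoed rather than executed:]

```shell
#!/bin/sh
# Build a vrange (logical address range) balance filter covering the block
# group from the dmesg output (1109204664320) and show the command that
# would relocate just that block group.
# /mnt/pool is a placeholder; 1 GiB is an assumed data block-group size.
BG_START=1109204664320
BG_LEN=$((1024 * 1024 * 1024))
RANGE="${BG_START}..$((BG_START + BG_LEN))"
# Echoed rather than executed here; drop the echo to actually run it.
echo "btrfs balance start -dvrange=$RANGE /mnt/pool"
```

If that single-block-group balance loops the same way, the problem is pinned to that block group rather than to the device removal as a whole.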
* Re: Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers")
  2021-12-02 18:11   ` Zygo Blaxell
@ 2021-12-03 10:14     ` Lukas Pirl
  2021-12-05 11:54     ` Lukas Pirl
  2021-12-10 13:28     ` Lukas Pirl
  2 siblings, 0 replies; 8+ messages in thread

From: Lukas Pirl @ 2021-12-03 10:14 UTC (permalink / raw)
To: Zygo Blaxell; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1580 bytes --]

Thanks for taking care, Zygo.

On Thu, 2021-12-02 13:11 -0500, Zygo Blaxell wrote as excerpted:
> Does it always happen on the same block group? If so, that points to
> something lurking in your metadata. If a reboot fixes it for one block
> group and then it gets stuck on some other block group, it points to
> an issue in kernel memory state.

Good point, I'll check. I can also do a memory test, but the machine
runs well otherwise.

> What do you get from 'btrfs check --readonly'?

Interestingly, the machine disappeared from the network shortly after I
issued the command. :)

I'll drive to the machine today or tomorrow, see what is going on, and
report back.

> > > (The particular fs has been created 2016, I am otherwise happy with
> > > btrfs and advocating; BTW I have backups and are ready to use them)
> > >
> > > Another question I was asking myself: can btrfs be forced to forget
> > > about a device (as in "delete from meta data) to then just run a
> > > regular balance?
>
> It can, but the way you do that is "mount in degraded mode (to force
> forget the device), then run btrfs device delete," and you're getting
> stuck on the "btrfs device delete" step.
>
> "btrfs device delete" is itself "resize device to zero, then run balance"
> and it's the balance step you're stuck on.

Yes, but btrfs still knows about the drive. But if it is really the
balance that hangs, it probably would not make much sense to just
"delete the device from the metadata" if one would have to balance
afterwards anyway.

Cheers

Lukas

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers")
  2021-12-02 18:11   ` Zygo Blaxell
  2021-12-03 10:14     ` Lukas Pirl
@ 2021-12-05 11:54     ` Lukas Pirl
  2021-12-10 13:28     ` Lukas Pirl
  2 siblings, 0 replies; 8+ messages in thread

From: Lukas Pirl @ 2021-12-05 11:54 UTC (permalink / raw)
To: Zygo Blaxell; +Cc: linux-btrfs

Hello Zygo,

it took me (and the disks) a while to report back; here we go:

On Thu, 2021-12-02 13:11 -0500, Zygo Blaxell wrote as excerpted:
> > On Thu, 2021-11-25 19:06 +0100, Lukas Pirl wrote as excerpted:
> > > this is another report of a probably endless balance which loops on
> > > "found 1 extents, stage: update data pointers".
> > >
> > > [...]
> > >
> > > dmesg says:
> > > [ 22:26] BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1
> > > [ 22:27] BTRFS info (device dm-1): found 4164 extents, stage: move data extents
> > > [ +5.476247] BTRFS info (device dm-1): found 4164 extents, stage: update data pointers
> > > [ +2.545299] BTRFS info (device dm-1): found 1 extents, stage: update data pointers
> > >
> > > and then the last message repeats every ~ .25 seconds ("forever").
>
> Does it always happen on the same block group? If so, that points to
> something lurking in your metadata. If a reboot fixes it for one block
> group and then it gets stuck on some other block group, it points to
> an issue in kernel memory state.

Although I have not paid attention to the block group number in the
past, another run of ``btrfs dev del`` just now gave the same last block
group number (1109204664320) before, presumably, looping.

> What do you get from 'btrfs check --readonly'?

$ btrfs check --readonly --mode lowmem /dev/disk/by-label/pool_16-03
Opening filesystem to check...
warning, device 6 is missing
Checking filesystem on /dev/disk/by-label/pool_16-03
UUID: 59301fea-434a-4c43-bb45-08fcfe8ce113
[1/7] checking root items
[2/7] checking extents
ERROR: extent[1109584044032, 8192] referencer count mismatch (root: 276, owner: 1154248, offset: 100401152) wanted: 1, have: 0
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs done with fs roots in lowmem mode, skipping
[7/7] checking quota groups skipped (not enabled on this FS)
found 4252313206784 bytes used, error(s) found
total csum bytes: 4128183360
total tree bytes: 25053184000
total fs tree bytes: 16415014912
total extent tree bytes: 3662594048
btree space waste bytes: 4949241278
file data blocks allocated: 8025128243200
 referenced 7552211206144

Thanks for your help

Lukas

> > > [...]
[/dev/mapper/WD-WCC7K2xxxxNS].flush_io_errs 0 > > > [/dev/mapper/WD-WCC7K2xxxxNS].corruption_errs 0 > > > [/dev/mapper/WD-WCC7K2xxxxNS].generation_errs 0 > > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers")

From: Lukas Pirl @ 2021-12-10 13:28 UTC
To: Zygo Blaxell; +Cc: linux-btrfs

(friendly, humble re-post)

Hello Zygo,

it took me (and the disks) a while to report back; here we go:

On Thu, 2021-12-02 13:11 -0500, Zygo Blaxell wrote as excerpted:
> > On Thu, 2021-11-25 19:06 +0100, Lukas Pirl wrote as excerpted:
> > > this is another report of a probably endless balance which loops on
> > > "found 1 extents, stage: update data pointers".
> > > [...]
> > > and then the last message repeats every ~ .25 seconds ("forever").
> > > Memory and CPU usage are not excessive (most is IO wait, I assume).
>
> Does it always happen on the same block group?  If so, that points to
> something lurking in your metadata.  If a reboot fixes it for one block
> group and then it gets stuck on some other block group, it points to
> an issue in kernel memory state.

Although I haven't paid attention to the block group number in the past,
another run of ``btrfs dev del`` just now gave the same last block group
number (1109204664320) before, presumably, looping.

> What do you get from 'btrfs check --readonly'?

$ btrfs check --readonly --mode lowmem /dev/disk/by-label/pool_16-03
Opening filesystem to check...
warning, device 6 is missing
Checking filesystem on /dev/disk/by-label/pool_16-03
UUID: 59301fea-434a-4c43-bb45-08fcfe8ce113
[1/7] checking root items
[2/7] checking extents
ERROR: extent[1109584044032, 8192] referencer count mismatch (root: 276,
owner: 1154248, offset: 100401152) wanted: 1, have: 0
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
done with fs roots in lowmem mode, skipping
[7/7] checking quota groups skipped (not enabled on this FS)
found 4252313206784 bytes used, error(s) found
total csum bytes: 4128183360
total tree bytes: 25053184000
total fs tree bytes: 16415014912
total extent tree bytes: 3662594048
btree space waste bytes: 4949241278
file data blocks allocated: 8025128243200
 referenced 7552211206144

So what can be done? ``check --repair``? Or too dangerous?
:)

Thanks for your help

Lukas

> > > [quoted filesystem details trimmed; identical to the earlier messages]
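The "same block group?" check discussed above can be scripted rather than eyeballed. A minimal sketch, assuming only awk and coreutils; the sample lines reproduce the dmesg excerpt from the original report, and on a live system one would feed the script `dmesg` output instead of the temp file:

```shell
# Minimal sketch: print the last block group btrfs reported relocating,
# so the number can be compared across restarts of "btrfs device delete".
# The sample input reproduces the dmesg lines quoted in the report.
log=$(mktemp)
cat > "$log" <<'EOF'
BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1
BTRFS info (device dm-1): found 4164 extents, stage: move data extents
BTRFS info (device dm-1): found 4164 extents, stage: update data pointers
BTRFS info (device dm-1): found 1 extents, stage: update data pointers
EOF
# On a live system: dmesg | awk '...' instead of reading the sample file.
awk '/relocating block group/ { bg = $(NF - 2) } END { print bg }' "$log"
rm -f "$log"
```

If successive delete attempts print the same number, the loop is pinned to one block group (pointing, per the reply above, at metadata); a different number each time would point at kernel memory state instead.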
* Re: Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers")

From: Zygo Blaxell @ 2021-12-11  2:53 UTC
To: Lukas Pirl; +Cc: linux-btrfs

On Fri, Dec 10, 2021 at 02:28:28PM +0100, Lukas Pirl wrote:
> Although I haven't paid attention to the block group number in the past,
> another run of ``btrfs dev del`` just now gave the same last block group
> number (1109204664320) before, presumably, looping.
>
> $ btrfs check --readonly --mode lowmem /dev/disk/by-label/pool_16-03
> [...]
> [2/7] checking extents
> ERROR: extent[1109584044032, 8192] referencer count mismatch (root: 276,
> owner: 1154248, offset: 100401152) wanted: 1, have: 0
> ERROR: errors found in extent allocation tree or chunk allocation
> [...]
> found 4252313206784 bytes used, error(s) found

OK, that's not too bad: just one bad reference.

> So what can be done? ``check --repair``? Or too dangerous? :)

If you have backups and are prepared to restore them, you can try
check --repair.
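Before attempting ``--repair``, it can help to know which file owns the bad reference: the (root, owner) pair in the lowmem error is a subvolume id plus an inode number. A sketch of extracting them with sed, using the error line quoted above; the commented `inode-resolve` call and its /mnt/pool path are illustrative assumptions, not from the thread:

```shell
# Sketch: parse the subvolume (root) and inode (owner) out of the lowmem
# check error, so the affected file could be located with
# `btrfs inspect-internal inode-resolve` before risking --repair.
err='ERROR: extent[1109584044032, 8192] referencer count mismatch (root: 276, owner: 1154248, offset: 100401152) wanted: 1, have: 0'
root=$(printf '%s\n' "$err" | sed -n 's/.*root: \([0-9]*\),.*/\1/p')
inode=$(printf '%s\n' "$err" | sed -n 's/.*owner: \([0-9]*\),.*/\1/p')
echo "subvolume id: $root, inode: $inode"
# Illustrative only (needs the fs mounted; path is a placeholder):
#   btrfs inspect-internal inode-resolve "$inode" /mnt/pool/<path-of-subvol-276>
```

Knowing the affected path would let one judge whether restoring a single file from backup is an alternative to the riskier whole-filesystem repair.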
* Re: Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers")

From: Lukas Pirl @ 2021-12-16 20:52 UTC
To: Zygo Blaxell; +Cc: linux-btrfs

On Fri, 2021-12-10 21:53 -0500, Zygo Blaxell wrote as excerpted:
> On Fri, Dec 10, 2021 at 02:28:28PM +0100, Lukas Pirl wrote:
> > So what can be done? ``check --repair``? Or too dangerous? :)
>
> If you have backups and are prepared to restore them, you can try
> check --repair.

To report back, all I got from ``check --repair`` was a segfault
(progs v5.15.1, Linux 5.15.0):

…
[2/7] checking extents
ERROR: extent[1109584044032, 8192] referencer count mismatch (root: 276,
owner: 1154248, offset: 100401152) wanted: 1, have: 0
zsh: segmentation fault  btrfs check --repair --mode lowmem /dev/disk/by-label/…

Any open, known, and possibly related bugs I can wait for to be fixed to
try again?
:)

Cheers,

Lukas

> > > [quoted filesystem details trimmed; identical to the earlier messages]
> > > > > [/dev/mapper/WD-WCC4M2xxxxXH].corruption_errs  0
> > > > > [/dev/mapper/WD-WCC4M2xxxxXH].generation_errs  0
> > > > > [devid:6].write_io_errs                        0
> > > > > [devid:6].read_io_errs                         0
> > > > > [devid:6].flush_io_errs                        0
> > > > > [devid:6].corruption_errs                      72016
> > > > > [devid:6].generation_errs                      100
> > > > > [/dev/mapper/S1xxxxJ3].write_io_errs           0
> > > > > [/dev/mapper/S1xxxxJ3].read_io_errs            0
> > > > > [/dev/mapper/S1xxxxJ3].flush_io_errs           0
> > > > > [/dev/mapper/S1xxxxJ3].corruption_errs         2
> > > > > [/dev/mapper/S1xxxxJ3].generation_errs         0
> > > > > [/dev/mapper/WD-WCC4N3xxxx17].write_io_errs    0
> > > > > [/dev/mapper/WD-WCC4N3xxxx17].read_io_errs     0
> > > > > [/dev/mapper/WD-WCC4N3xxxx17].flush_io_errs    0
> > > > > [/dev/mapper/WD-WCC4N3xxxx17].corruption_errs  0
> > > > > [/dev/mapper/WD-WCC4N3xxxx17].generation_errs  0
> > > > > [/dev/mapper/WD-WCC7K2xxxxNS].write_io_errs    0
> > > > > [/dev/mapper/WD-WCC7K2xxxxNS].read_io_errs     0
> > > > > [/dev/mapper/WD-WCC7K2xxxxNS].flush_io_errs    0
> > > > > [/dev/mapper/WD-WCC7K2xxxxNS].corruption_errs  0
> > > > > [/dev/mapper/WD-WCC7K2xxxxNS].generation_errs  0

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
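Worth noting in the stats above: devid 6 — the failed, now-missing disk — is the outlier, with 72016 corruption errors and 100 generation errors. Since `btrfs device stats` emits one `[device].counter value` pair per line, the nonzero counters are easy to filter out mechanically. A minimal sketch (not part of the original thread; `nonzero_counters` is a hypothetical helper, and the sample is a trimmed copy of the output above):

```python
def nonzero_counters(stats_text):
    """Return {counter_name: value} for every counter with a nonzero value.

    Expects "btrfs device stats"-style output: one "<name> <integer>" pair
    per line; anything that doesn't match that shape is skipped.
    """
    counters = {}
    for line in stats_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].lstrip("-").isdigit():
            value = int(parts[1])
            if value != 0:
                counters[parts[0]] = value
    return counters

sample = """\
[devid:6].write_io_errs    0
[devid:6].corruption_errs  72016
[devid:6].generation_errs  100
[/dev/mapper/S1xxxxJ3].corruption_errs  2
"""
print(nonzero_counters(sample))
```

Run against the full stats dump, this would surface only the four nonzero lines (devid 6's corruption/generation errors, plus one read error on WD-WCC4J7xxxxSZ and two corruption errors on S1xxxxJ3).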
Thread overview: 8+ messages

2021-11-25 18:06 Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers") Lukas Pirl
2021-12-02 14:49 ` Lukas Pirl
2021-12-02 18:11   ` Zygo Blaxell
2021-12-03 10:14     ` Lukas Pirl
2021-12-05 11:54       ` Lukas Pirl
2021-12-10 13:28         ` Lukas Pirl
2021-12-11  2:53           ` Zygo Blaxell
2021-12-16 20:52             ` Lukas Pirl