linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers")
@ 2021-11-25 18:06 Lukas Pirl
  2021-12-02 14:49 ` Lukas Pirl
  0 siblings, 1 reply; 8+ messages in thread
From: Lukas Pirl @ 2021-11-25 18:06 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 7485 bytes --]

Dear btrfs community,

this is another report of what appears to be an endless balance, looping
on "found 1 extents, stage: update data pointers".

I observe it on a btrfs RAID 1 across seven LUKS-encrypted spinning
disks (more fs details below) used for storing cold data. One disk
failed physically, and I am now trying to "btrfs device delete missing".
The operation seemingly runs forever (I waited more than 30 days once,
and more than 50 days another time).

dmesg says:
[      22:26] BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1
[      22:27] BTRFS info (device dm-1): found 4164 extents, stage: move data extents
[  +5.476247] BTRFS info (device dm-1): found 4164 extents, stage: update data pointers
[  +2.545299] BTRFS info (device dm-1): found 1 extents, stage: update data pointers

and then the last message repeats every ~0.25 seconds ("forever").
Memory and CPU usage are not excessive (mostly I/O wait, I assume).

What I have tried:
* Linux 4.x (multiple minor versions, I don't remember exactly which)
* Linux 5.10
* Linux 5.15
* btrfs-progs v5.15
* removing subvolumes (before: ~ 200, after: ~ 90)
* free space cache v1, v2, and none
* rebooting and restarting the removal/balance (multiple times)

How can we track down the problem here, to make btrfs an even more
stable file system in the future?
(The particular fs was created in 2016; I am otherwise happy with
btrfs and advocate for it. BTW, I have backups and am ready to use them.)

Another question I was asking myself: can btrfs be forced to forget
about a device (as in "delete from metadata") to then just run a
regular balance?

Thanks in advance; I hope we can debug this.

Lukas

======================================================================

filesystem show
===============

Label: 'pool_16-03'  uuid: 59301fea-434a-xxxx-bb45-08fcfe8ce113
	Total devices 8 FS bytes used 3.84TiB
	devid    1 size 931.51GiB used 592.00GiB path /dev/mapper/WD-WCAU45xxxx03
	devid    3 size 1.82TiB used 1.37TiB path /dev/mapper/WD-WCAZAFxxxx78
	devid    4 size 931.51GiB used 593.00GiB path /dev/mapper/WD-WCC4J7xxxxSZ
	devid    5 size 1.82TiB used 1.46TiB path /dev/mapper/WD-WCC4M2xxxxXH
	devid    7 size 931.51GiB used 584.00GiB path /dev/mapper/S1xxxxJ3
	devid    9 size 2.73TiB used 2.28TiB path /dev/mapper/WD-WCC4N3xxxx17
	devid   10 size 3.64TiB used 1.03TiB path /dev/mapper/WD-WCC7K2xxxxNS
	*** Some devices missing

subvolumes
==========

~ 90, of which ~ 60 are read-only snapshots of the other ~ 30

filesystem usage
================

Overall:
    Device size:		  12.74TiB
    Device allocated:		   8.36TiB
    Device unallocated:		   4.38TiB
    Device missing:		     0.00B
    Used:			   7.69TiB
    Free (estimated):		   2.50TiB	(min: 2.50TiB)
    Free (statfs, df):		   1.46TiB
    Data ratio:			      2.00
    Metadata ratio:		      2.00
    Global reserve:		 512.00MiB	(used: 48.00KiB)
    Multiple profiles:		        no

Data,RAID1: Size:4.14TiB, Used:3.82TiB (92.33%)
   /dev/mapper/WD-WCAU45xxxx03	 584.00GiB
   /dev/mapper/WD-WCAZAFxxxx78	   1.35TiB
   /dev/mapper/WD-WCC4J7xxxxSZ	 588.00GiB
   /dev/mapper/WD-WCC4M2xxxxXH	   1.44TiB
   missing	 510.00GiB
   /dev/mapper/S1xxxxJ3	 579.00GiB
   /dev/mapper/WD-WCC4N3xxxx17	   2.26TiB
   /dev/mapper/WD-WCC7K2xxxxNS	   1.01TiB

Metadata,RAID1: Size:41.00GiB, Used:23.14GiB (56.44%)
   /dev/mapper/WD-WCAU45xxxx03	   8.00GiB
   /dev/mapper/WD-WCAZAFxxxx78	  17.00GiB
   /dev/mapper/WD-WCC4J7xxxxSZ	   5.00GiB
   /dev/mapper/WD-WCC4M2xxxxXH	  13.00GiB
   missing	   3.00GiB
   /dev/mapper/S1xxxxJ3	   5.00GiB
   /dev/mapper/WD-WCC4N3xxxx17	  16.00GiB
   /dev/mapper/WD-WCC7K2xxxxNS	  15.00GiB

System,RAID1: Size:32.00MiB, Used:848.00KiB (2.59%)
   missing	  32.00MiB
   /dev/mapper/WD-WCC4N3xxxx17	  32.00MiB

Unallocated:
   /dev/mapper/WD-WCAU45xxxx03	 339.51GiB
   /dev/mapper/WD-WCAZAFxxxx78	 461.01GiB
   /dev/mapper/WD-WCC4J7xxxxSZ	 338.51GiB
   /dev/mapper/WD-WCC4M2xxxxXH	 373.01GiB
   missing	-513.03GiB
   /dev/mapper/S1xxxxJ3	 347.51GiB
   /dev/mapper/WD-WCC4N3xxxx17	 460.47GiB
   /dev/mapper/WD-WCC7K2xxxxNS	   2.61TiB

dump-super
==========

superblock: bytenr=65536, device=/dev/mapper/WD-WCAU45xxxx03
---------------------------------------------------------
csum_type		0 (crc32c)
csum_size		4
csum			0x51beb068 [match]
bytenr			65536
flags			0x1
			( WRITTEN )
magic			_BHRfS_M [match]
fsid			59301fea-434a-xxxx-bb45-08fcfe8ce113
metadata_uuid		59301fea-434a-xxxx-bb45-08fcfe8ce113
label			pool_16-03
generation		113519755
root			15602414796800
sys_array_size		129
chunk_root_generation	63394299
root_level		1
chunk_root		19216820502528
chunk_root_level	1
log_root		0
log_root_transid	0
log_root_level		0
total_bytes		16003136864256
bytes_used		4227124142080
sectorsize		4096
nodesize		16384
leafsize (deprecated)	16384
stripesize		4096
root_dir		6
num_devices		8
compat_flags		0x0
compat_ro_flags		0x0
incompat_flags		0x371
			( MIXED_BACKREF |
			  COMPRESS_ZSTD |
			  BIG_METADATA |
			  EXTENDED_IREF |
			  SKINNY_METADATA |
			  NO_HOLES )
cache_generation	2975866
uuid_tree_generation	113519755
dev_item.uuid		a9b2e4ea-404c-xxxx-a450-dc84b0956ce1
dev_item.fsid		59301fea-434a-xxxx-bb45-08fcfe8ce113 [match]
dev_item.type		0
dev_item.total_bytes	1000201740288
dev_item.bytes_used	635655159808
dev_item.io_align	4096
dev_item.io_width	4096
dev_item.sector_size	4096
dev_item.devid		1
dev_item.dev_group	0
dev_item.seek_speed	0
dev_item.bandwidth	0
dev_item.generation	0

device stats
============

[/dev/mapper/WD-WCAU45xxxx03].write_io_errs    0
[/dev/mapper/WD-WCAU45xxxx03].read_io_errs     0
[/dev/mapper/WD-WCAU45xxxx03].flush_io_errs    0
[/dev/mapper/WD-WCAU45xxxx03].corruption_errs  0
[/dev/mapper/WD-WCAU45xxxx03].generation_errs  0
[/dev/mapper/WD-WCAZAFxxxx78].write_io_errs    0
[/dev/mapper/WD-WCAZAFxxxx78].read_io_errs     0
[/dev/mapper/WD-WCAZAFxxxx78].flush_io_errs    0
[/dev/mapper/WD-WCAZAFxxxx78].corruption_errs  0
[/dev/mapper/WD-WCAZAFxxxx78].generation_errs  0
[/dev/mapper/WD-WCC4J7xxxxSZ].write_io_errs    0
[/dev/mapper/WD-WCC4J7xxxxSZ].read_io_errs     1
[/dev/mapper/WD-WCC4J7xxxxSZ].flush_io_errs    0
[/dev/mapper/WD-WCC4J7xxxxSZ].corruption_errs  0
[/dev/mapper/WD-WCC4J7xxxxSZ].generation_errs  0
[/dev/mapper/WD-WCC4M2xxxxXH].write_io_errs    0
[/dev/mapper/WD-WCC4M2xxxxXH].read_io_errs     0
[/dev/mapper/WD-WCC4M2xxxxXH].flush_io_errs    0
[/dev/mapper/WD-WCC4M2xxxxXH].corruption_errs  0
[/dev/mapper/WD-WCC4M2xxxxXH].generation_errs  0
[devid:6].write_io_errs    0
[devid:6].read_io_errs     0
[devid:6].flush_io_errs    0
[devid:6].corruption_errs  72016
[devid:6].generation_errs  100
[/dev/mapper/S1xxxxJ3].write_io_errs    0
[/dev/mapper/S1xxxxJ3].read_io_errs     0
[/dev/mapper/S1xxxxJ3].flush_io_errs    0
[/dev/mapper/S1xxxxJ3].corruption_errs  2
[/dev/mapper/S1xxxxJ3].generation_errs  0
[/dev/mapper/WD-WCC4N3xxxx17].write_io_errs    0
[/dev/mapper/WD-WCC4N3xxxx17].read_io_errs     0
[/dev/mapper/WD-WCC4N3xxxx17].flush_io_errs    0
[/dev/mapper/WD-WCC4N3xxxx17].corruption_errs  0
[/dev/mapper/WD-WCC4N3xxxx17].generation_errs  0
[/dev/mapper/WD-WCC7K2xxxxNS].write_io_errs    0
[/dev/mapper/WD-WCC7K2xxxxNS].read_io_errs     0
[/dev/mapper/WD-WCC7K2xxxxNS].flush_io_errs    0
[/dev/mapper/WD-WCC7K2xxxxNS].corruption_errs  0
[/dev/mapper/WD-WCC7K2xxxxNS].generation_errs  0

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers")
  2021-11-25 18:06 Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers") Lukas Pirl
@ 2021-12-02 14:49 ` Lukas Pirl
  2021-12-02 18:11   ` Zygo Blaxell
  0 siblings, 1 reply; 8+ messages in thread
From: Lukas Pirl @ 2021-12-02 14:49 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 9957 bytes --]

(re-post)

Is there no motivation to address this or do I need to supply additional
information?

Cheers

Lukas

On Thu, 2021-11-25 19:06 +0100, Lukas Pirl wrote as excerpted:
> Dear btrfs community,
> 
> this is another report of a probably endless balance which loops on
> "found 1 extents, stage: update data pointers".
> 
> I observe it on a btrfs RAID 1 on around 7 luks-encrypted spinning
> disks (more fs details below) used for storing cold data. One disk
> failed physically. Now, I try to "btrfs device delete missing". The
> operation runs forever (probably, waited more than 30 days, another
> time more than 50 days).
> 
> dmesg says:
> [      22:26] BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1
> [      22:27] BTRFS info (device dm-1): found 4164 extents, stage: move data extents
> [  +5.476247] BTRFS info (device dm-1): found 4164 extents, stage: update data pointers
> [  +2.545299] BTRFS info (device dm-1): found 1 extents, stage: update data pointers
> 
> and then the last message repeats every ~ .25 seconds ("forever").
> Memory and CPU usage are not excessive (most is IO wait, I assume).
> 
> What I have tried:
> * Linux 4 (multiple minor versions, don't remember which exactly)
> * Linux 5.10
> * Linux 5.15
> * btrfs-progs v5.15
> * remove subvolumes (before: ~ 200, after: ~ 90)
> * free space cache v1, v2, none
> * reboot, restart removal/balance (multiple times)
> 
> How can we find the problem here to make btrfs an even more stable
> file system in the future?
> (The particular fs has been created 2016, I am otherwise happy with
> btrfs and advocating; BTW I have backups and am ready to use them)
> 
> Another question I was asking myself: can btrfs be forced to forget
> about a device (as in "delete from metadata") to then just run a
> regular balance?
> 
> Thanks in advance; I hope we can debug this.
> 
> Lukas
> 
> [... quoted filesystem show, usage, dump-super, and device stats snipped; identical to the report above ...]


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers")
  2021-12-02 14:49 ` Lukas Pirl
@ 2021-12-02 18:11   ` Zygo Blaxell
  2021-12-03 10:14     ` Lukas Pirl
                       ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Zygo Blaxell @ 2021-12-02 18:11 UTC (permalink / raw)
  To: Lukas Pirl; +Cc: linux-btrfs

On Thu, Dec 02, 2021 at 03:49:08PM +0100, Lukas Pirl wrote:
> (re-post)
> 
> Is there no motivation to address this or do I need to supply additional
> information?
> 
> Cheers
> 
> Lukas
> 
> On Thu, 2021-11-25 19:06 +0100, Lukas Pirl wrote as excerpted:
> > Dear btrfs community,
> > 
> > this is another report of a probably endless balance which loops on
> > "found 1 extents, stage: update data pointers".
> > 
> > I observe it on a btrfs RAID 1 on around 7 luks-encrypted spinning
> > disks (more fs details below) used for storing cold data. One disk
> > failed physically. Now, I try to "btrfs device delete missing". The
> > operation runs forever (probably, waited more than 30 days, another
> > time more than 50 days).
> > 
> > dmesg says:
> > [      22:26] BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1
> > [      22:27] BTRFS info (device dm-1): found 4164 extents, stage: move data extents
> > [  +5.476247] BTRFS info (device dm-1): found 4164 extents, stage: update data pointers
> > [  +2.545299] BTRFS info (device dm-1): found 1 extents, stage: update data pointers
> > 
> > and then the last message repeats every ~ .25 seconds ("forever").
> > Memory and CPU usage are not excessive (most is IO wait, I assume).
> > 
> > What I have tried:
> > * Linux 4 (multiple minor versions, don't remember which exactly)
> > * Linux 5.10
> > * Linux 5.15
> > * btrfs-progs v5.15
> > * remove subvolumes (before: ~ 200, after: ~ 90)
> > * free space cache v1, v2, none
> > * reboot, restart removal/balance (multiple times)

Does it always happen on the same block group?  If so, that points to
something lurking in your metadata.  If a reboot fixes it for one block
group and then it gets stuck on some other block group, it points to
an issue in kernel memory state.

It's more likely something on disk given all the reboots and kernel
versions you have already tried.
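
A low-effort way to answer the "same block group?" question is to pull
the block group numbers out of the kernel log and count repeats. A
minimal sketch; the sample log below is made up, and on the affected
machine one would pipe `dmesg` (or `journalctl -k`) into the same
filter instead:

```shell
# Count how often each block group shows up in relocation messages.
# "sample" is a hypothetical stand-in for real kernel log output.
sample='BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1
BTRFS info (device dm-1): found 1 extents, stage: update data pointers
BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1'

printf '%s\n' "$sample" |
  grep -o 'relocating block group [0-9]*' |
  sort | uniq -c | sort -rn
```

If one block group number dominates across reboots, the problem is
pinned to that block group on disk rather than to kernel memory state.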

> > How can we find the problem here to make btrfs an even more stable
> > file system in the future?

What do you get from 'btrfs check --readonly'?

> > (The particular fs has been created 2016, I am otherwise happy with
> > btrfs and advocating; BTW I have backups and are ready to use them)
> > 
> > Another question I was asking myself: can btrfs be forced to forget
> > about a device (as in "delete from meta data) to then just run a
> > regular balance?

It can, but the way you do that is "mount in degraded mode (to force
forget the device), then run btrfs device delete," and you're getting
stuck on the "btrfs device delete" step.

"btrfs device delete" is itself "resize device to zero, then run balance"
and it's the balance step you're stuck on.
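
That sequence can be sketched as a dry-run script; it only prints the
commands instead of executing them, and the device path and mount point
are hypothetical placeholders:

```shell
# Dry-run sketch of the degraded-mount + delete sequence described
# above; "run" only echoes, and the paths are placeholders.
run() { echo "would run: $*"; }

run mount -o degraded /dev/mapper/disk1 /mnt/pool   # mount without the dead disk
run btrfs device delete missing /mnt/pool           # = resize missing dev to 0 + balance
```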

> > Thanks in advance; I hope we can debug this.
> > 
> > Lukas
> > 
> > [... quoted filesystem show, usage, dump-super, and device stats snipped; identical to the original report ...]



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers")
  2021-12-02 18:11   ` Zygo Blaxell
@ 2021-12-03 10:14     ` Lukas Pirl
  2021-12-05 11:54     ` Lukas Pirl
  2021-12-10 13:28     ` Lukas Pirl
  2 siblings, 0 replies; 8+ messages in thread
From: Lukas Pirl @ 2021-12-03 10:14 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1580 bytes --]

Thanks for taking care, Zygo.

On Thu, 2021-12-02 13:11 -0500, Zygo Blaxell wrote as excerpted:
> Does it always happen on the same block group?  If so, that points to
> something lurking in your metadata.  If a reboot fixes it for one block
> group and then it gets stuck on some other block group, it points to
> an issue in kernel memory state.

Good point, I'll check. I can also do a memory test but the machine runs well
otherwise.

> What do you get from 'btrfs check --readonly'?

Interestingly, the machine disappeared from the network shortly after I issued
the command. :) I'll drive to the machine today or tomorrow, see what is going
on and report back.

> > > (The particular fs has been created 2016, I am otherwise happy with
> > > btrfs and advocating; BTW I have backups and are ready to use them)
> > > 
> > > Another question I was asking myself: can btrfs be forced to forget
> > > about a device (as in "delete from meta data) to then just run a
> > > regular balance?
> 
> It can, but the way you do that is "mount in degraded mode (to force
> forget the device), then run btrfs device delete," and you're getting
> stuck on the "btrfs device delete" step.
> 
> "btrfs device delete" is itself "resize device to zero, then run balance"
> and it's the balance step you're stuck on.

Yes, but btrfs still knows about the drive. If it's really the balance
that hangs, though, it probably wouldn't make much sense to just "delete
the device from the metadata", since one would have to balance afterwards
anyway.

Cheers

Lukas


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers")
  2021-12-02 18:11   ` Zygo Blaxell
  2021-12-03 10:14     ` Lukas Pirl
@ 2021-12-05 11:54     ` Lukas Pirl
  2021-12-10 13:28     ` Lukas Pirl
  2 siblings, 0 replies; 8+ messages in thread
From: Lukas Pirl @ 2021-12-05 11:54 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: linux-btrfs

Hello Zygo,

it took me (and the disks) a while to report back; here we go:

On Thu, 2021-12-02 13:11 -0500, Zygo Blaxell wrote as excerpted:
> > On Thu, 2021-11-25 19:06 +0100, Lukas Pirl wrote as excerpted:
> > > Dear btrfs community,
> > > 
> > > this is another report of a probably endless balance which loops on
> > > "found 1 extents, stage: update data pointers".
> > > 
> > > I observe it on a btrfs RAID 1 on around 7 luks-encrypted spinning
> > > disks (more fs details below) used for storing cold data. One disk
> > > failed physically. Now, I try to "btrfs device delete missing". The
> > > operation runs forever (probably, waited more than 30 days, another
> > > time more than 50 days).
> > > 
> > > dmesg says:
> > > [      22:26] BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1
> > > [      22:27] BTRFS info (device dm-1): found 4164 extents, stage: move data extents
> > > [  +5.476247] BTRFS info (device dm-1): found 4164 extents, stage: update data pointers
> > > [  +2.545299] BTRFS info (device dm-1): found 1 extents, stage: update data
> > > 
> > > and then the last message repeats every ~ .25 seconds ("forever").
> > > Memory and CPU usage are not excessive (most is IO wait, I assume).
> > > 
> > > What I have tried:
> > > * Linux 4 (multiple minor versions, don't remember which exactly)
> > > * Linux 5.10
> > > * Linux 5.15
> > > * btrfs-progs v5.15
> > > * remove subvolumes (before: ~ 200, after: ~ 90)
> > > * free space cache v1, v2, none
> > > * reboot, restart removal/balance (multiple times)
> 
> Does it always happen on the same block group?  If so, that points to
> something lurking in your metadata.  If a reboot fixes it for one block
> group and then it gets stuck on some other block group, it points to
> an issue in kernel memory state.

Although I haven't paid attention to the block group number in the past,
another run of ``btrfs dev del`` just now gave the same last block group
number (1109204664320) before, presumably, looping.
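(For anyone trying to confirm a similar loop: one quick check is to tally the block-group numbers from the kernel log and see whether relocation keeps retrying the same one. A sketch, run here on sample lines matching the messages above; in practice, pipe saved `dmesg` output in instead.)

```shell
# Count how often each block group appears in "relocating block group <N>"
# messages. The printf lines below stand in for real `dmesg` output.
printf '%s\n' \
  'BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1' \
  'BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1' \
  'BTRFS info (device dm-1): relocating block group 2218409328640 flags data|raid1' |
grep -o 'relocating block group [0-9]*' |
awk '{count[$4]++} END {for (bg in count) print bg, count[bg]}' |
sort -k2,2nr
```

A block group with a retry count far above the rest is the one relocation is stuck on.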

> What do you get from 'btrfs check --readonly'?

$ btrfs check --readonly --mode lowmem /dev/disk/by-label/pool_16-03

Opening filesystem to check...
warning, device 6 is missing
Checking filesystem on /dev/disk/by-label/pool_16-03
UUID: 59301fea-434a-4c43-bb45-08fcfe8ce113
[1/7] checking root items
[2/7] checking extents
ERROR: extent[1109584044032, 8192] referencer count mismatch (root: 276, owner: 1154248, offset: 100401152) wanted: 1, have: 0
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs done with fs roots in lowmem mode, skipping
[7/7] checking quota groups skipped (not enabled on this FS)
found 4252313206784 bytes used, error(s) found
total csum bytes: 4128183360
total tree bytes: 25053184000
total fs tree bytes: 16415014912
total extent tree bytes: 3662594048
btree space waste bytes: 4949241278
file data blocks allocated: 8025128243200
 referenced 7552211206144

Thanks for your help

Lukas

> > > ======================================================================
> > > 
> > > filesystem show
> > > ===============
> > > 
> > > Label: 'pool_16-03'  uuid: 59301fea-434a-xxxx-bb45-08fcfe8ce113
> > >         Total devices 8 FS bytes used 3.84TiB
> > >         devid    1 size 931.51GiB used 592.00GiB path /dev/mapper/WD-WCAU45xxxx03
> > >         devid    3 size 1.82TiB used 1.37TiB path /dev/mapper/WD-WCAZAFxxxx78
> > >         devid    4 size 931.51GiB used 593.00GiB path /dev/mapper/WD-WCC4J7xxxxSZ
> > >         devid    5 size 1.82TiB used 1.46TiB path /dev/mapper/WD-WCC4M2xxxxXH
> > >         devid    7 size 931.51GiB used 584.00GiB path /dev/mapper/S1xxxxJ3
> > >         devid    9 size 2.73TiB used 2.28TiB path /dev/mapper/WD-WCC4N3xxxx17
> > >         devid   10 size 3.64TiB used 1.03TiB path /dev/mapper/WD-WCC7K2xxxxNS
> > >         *** Some devices missing
> > > 
> > > subvolumes
> > > ==========
> > > 
> > > ~ 90, of which ~ 60 are read-only snapshots of the other ~ 30
> > > 
> > > filesystem usage
> > > ================
> > > 
> > > Overall:
> > >     Device size:                  12.74TiB
> > >     Device allocated:              8.36TiB
> > >     Device unallocated:            4.38TiB
> > >     Device missing:                  0.00B
> > >     Used:                          7.69TiB
> > >     Free (estimated):              2.50TiB      (min: 2.50TiB)
> > >     Free (statfs, df):             1.46TiB
> > >     Data ratio:                       2.00
> > >     Metadata ratio:                   2.00
> > >     Global reserve:              512.00MiB      (used: 48.00KiB)
> > >     Multiple profiles:                  no
> > > 
> > > Data,RAID1: Size:4.14TiB, Used:3.82TiB (92.33%)
> > >    /dev/mapper/WD-WCAU45xxxx03   584.00GiB
> > >    /dev/mapper/WD-WCAZAFxxxx78     1.35TiB
> > >    /dev/mapper/WD-WCC4J7xxxxSZ   588.00GiB
> > >    /dev/mapper/WD-WCC4M2xxxxXH     1.44TiB
> > >    missing       510.00GiB
> > >    /dev/mapper/S1xxxxJ3  579.00GiB
> > >    /dev/mapper/WD-WCC4N3xxxx17     2.26TiB
> > >    /dev/mapper/WD-WCC7K2xxxxNS     1.01TiB
> > > 
> > > Metadata,RAID1: Size:41.00GiB, Used:23.14GiB (56.44%)
> > >    /dev/mapper/WD-WCAU45xxxx03     8.00GiB
> > >    /dev/mapper/WD-WCAZAFxxxx78    17.00GiB
> > >    /dev/mapper/WD-WCC4J7xxxxSZ     5.00GiB
> > >    /dev/mapper/WD-WCC4M2xxxxXH    13.00GiB
> > >    missing         3.00GiB
> > >    /dev/mapper/S1xxxxJ3    5.00GiB
> > >    /dev/mapper/WD-WCC4N3xxxx17    16.00GiB
> > >    /dev/mapper/WD-WCC7K2xxxxNS    15.00GiB
> > > 
> > > System,RAID1: Size:32.00MiB, Used:848.00KiB (2.59%)
> > >    missing        32.00MiB
> > >    /dev/mapper/WD-WCC4N3xxxx17    32.00MiB
> > > 
> > > Unallocated:
> > >    /dev/mapper/WD-WCAU45xxxx03   339.51GiB
> > >    /dev/mapper/WD-WCAZAFxxxx78   461.01GiB
> > >    /dev/mapper/WD-WCC4J7xxxxSZ   338.51GiB
> > >    /dev/mapper/WD-WCC4M2xxxxXH   373.01GiB
> > >    missing      -513.03GiB
> > >    /dev/mapper/S1xxxxJ3  347.51GiB
> > >    /dev/mapper/WD-WCC4N3xxxx17   460.47GiB
> > >    /dev/mapper/WD-WCC7K2xxxxNS     2.61TiB
> > > 
> > > dump-super
> > > ==========
> > > 
> > > superblock: bytenr=65536, device=/dev/mapper/WD-WCAU45xxxx03
> > > ---------------------------------------------------------
> > > csum_type               0 (crc32c)
> > > csum_size               4
> > > csum                    0x51beb068 [match]
> > > bytenr                  65536
> > > flags                   0x1
> > >                         ( WRITTEN )
> > > magic                   _BHRfS_M [match]
> > > fsid                    59301fea-434a-xxxx-bb45-08fcfe8ce113
> > > metadata_uuid           59301fea-434a-xxxx-bb45-08fcfe8ce113
> > > label                   pool_16-03
> > > generation              113519755
> > > root                    15602414796800
> > > sys_array_size          129
> > > chunk_root_generation   63394299
> > > root_level              1
> > > chunk_root              19216820502528
> > > chunk_root_level        1
> > > log_root                0
> > > log_root_transid        0
> > > log_root_level          0
> > > total_bytes             16003136864256
> > > bytes_used              4227124142080
> > > sectorsize              4096
> > > nodesize                16384
> > > leafsize (deprecated)   16384
> > > stripesize              4096
> > > root_dir                6
> > > num_devices             8
> > > compat_flags            0x0
> > > compat_ro_flags         0x0
> > > incompat_flags          0x371
> > >                         ( MIXED_BACKREF |
> > >                           COMPRESS_ZSTD |
> > >                           BIG_METADATA |
> > >                           EXTENDED_IREF |
> > >                           SKINNY_METADATA |
> > >                           NO_HOLES )
> > > cache_generation        2975866
> > > uuid_tree_generation    113519755
> > > dev_item.uuid           a9b2e4ea-404c-xxxx-a450-dc84b0956ce1
> > > dev_item.fsid           59301fea-434a-xxxx-bb45-08fcfe8ce113 [match]
> > > dev_item.type           0
> > > dev_item.total_bytes    1000201740288
> > > dev_item.bytes_used     635655159808
> > > dev_item.io_align       4096
> > > dev_item.io_width       4096
> > > dev_item.sector_size    4096
> > > dev_item.devid          1
> > > dev_item.dev_group      0
> > > dev_item.seek_speed     0
> > > dev_item.bandwidth      0
> > > dev_item.generation     0
> > > 
> > > device stats
> > > ============
> > > 
> > > [/dev/mapper/WD-WCAU45xxxx03].write_io_errs    0
> > > [/dev/mapper/WD-WCAU45xxxx03].read_io_errs     0
> > > [/dev/mapper/WD-WCAU45xxxx03].flush_io_errs    0
> > > [/dev/mapper/WD-WCAU45xxxx03].corruption_errs  0
> > > [/dev/mapper/WD-WCAU45xxxx03].generation_errs  0
> > > [/dev/mapper/WD-WCAZAFxxxx78].write_io_errs    0
> > > [/dev/mapper/WD-WCAZAFxxxx78].read_io_errs     0
> > > [/dev/mapper/WD-WCAZAFxxxx78].flush_io_errs    0
> > > [/dev/mapper/WD-WCAZAFxxxx78].corruption_errs  0
> > > [/dev/mapper/WD-WCAZAFxxxx78].generation_errs  0
> > > [/dev/mapper/WD-WCC4J7xxxxSZ].write_io_errs    0
> > > [/dev/mapper/WD-WCC4J7xxxxSZ].read_io_errs     1
> > > [/dev/mapper/WD-WCC4J7xxxxSZ].flush_io_errs    0
> > > [/dev/mapper/WD-WCC4J7xxxxSZ].corruption_errs  0
> > > [/dev/mapper/WD-WCC4J7xxxxSZ].generation_errs  0
> > > [/dev/mapper/WD-WCC4M2xxxxXH].write_io_errs    0
> > > [/dev/mapper/WD-WCC4M2xxxxXH].read_io_errs     0
> > > [/dev/mapper/WD-WCC4M2xxxxXH].flush_io_errs    0
> > > [/dev/mapper/WD-WCC4M2xxxxXH].corruption_errs  0
> > > [/dev/mapper/WD-WCC4M2xxxxXH].generation_errs  0
> > > [devid:6].write_io_errs    0
> > > [devid:6].read_io_errs     0
> > > [devid:6].flush_io_errs    0
> > > [devid:6].corruption_errs  72016
> > > [devid:6].generation_errs  100
> > > [/dev/mapper/S1xxxxJ3].write_io_errs    0
> > > [/dev/mapper/S1xxxxJ3].read_io_errs     0
> > > [/dev/mapper/S1xxxxJ3].flush_io_errs    0
> > > [/dev/mapper/S1xxxxJ3].corruption_errs  2
> > > [/dev/mapper/S1xxxxJ3].generation_errs  0
> > > [/dev/mapper/WD-WCC4N3xxxx17].write_io_errs    0
> > > [/dev/mapper/WD-WCC4N3xxxx17].read_io_errs     0
> > > [/dev/mapper/WD-WCC4N3xxxx17].flush_io_errs    0
> > > [/dev/mapper/WD-WCC4N3xxxx17].corruption_errs  0
> > > [/dev/mapper/WD-WCC4N3xxxx17].generation_errs  0
> > > [/dev/mapper/WD-WCC7K2xxxxNS].write_io_errs    0
> > > [/dev/mapper/WD-WCC7K2xxxxNS].read_io_errs     0
> > > [/dev/mapper/WD-WCC7K2xxxxNS].flush_io_errs    0
> > > [/dev/mapper/WD-WCC7K2xxxxNS].corruption_errs  0
> > > [/dev/mapper/WD-WCC7K2xxxxNS].generation_errs  0
> > 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers")
  2021-12-02 18:11   ` Zygo Blaxell
  2021-12-03 10:14     ` Lukas Pirl
  2021-12-05 11:54     ` Lukas Pirl
@ 2021-12-10 13:28     ` Lukas Pirl
  2021-12-11  2:53       ` Zygo Blaxell
  2 siblings, 1 reply; 8+ messages in thread
From: Lukas Pirl @ 2021-12-10 13:28 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 11961 bytes --]

(friendly, humble re-post)

Hello Zygo,

it took me (and the disks) a while to report back; here we go:

On Thu, 2021-12-02 13:11 -0500, Zygo Blaxell wrote as excerpted:
> > On Thu, 2021-11-25 19:06 +0100, Lukas Pirl wrote as excerpted:
> > > Dear btrfs community,
> > > 
> > > this is another report of a probably endless balance which loops on
> > > "found 1 extents, stage: update data pointers".
> > > 
> > > I observe it on a btrfs RAID 1 on around 7 luks-encrypted spinning
> > > disks (more fs details below) used for storing cold data. One disk
> > > failed physically. Now, I try to "btrfs device delete missing". The
> > > operation runs forever (probably, waited more than 30 days, another
> > > time more than 50 days).
> > > 
> > > dmesg says:
> > > [      22:26] BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1
> > > [      22:27] BTRFS info (device dm-1): found 4164 extents, stage: move data extents
> > > [  +5.476247] BTRFS info (device dm-1): found 4164 extents, stage: update data pointers
> > > [  +2.545299] BTRFS info (device dm-1): found 1 extents, stage: update data pointers
> > > 
> > > and then the last message repeats every ~ .25 seconds ("forever").
> > > Memory and CPU usage are not excessive (most is IO wait, I assume).
> > > 
> > > What I have tried:
> > > * Linux 4 (multiple minor versions, don't remember which exactly)
> > > * Linux 5.10
> > > * Linux 5.15
> > > * btrfs-progs v5.15
> > > * remove subvolumes (before: ~ 200, after: ~ 90)
> > > * free space cache v1, v2, none
> > > * reboot, restart removal/balance (multiple times)
> 
> Does it always happen on the same block group?  If so, that points to
> something lurking in your metadata.  If a reboot fixes it for one block
> group and then it gets stuck on some other block group, it points to
> an issue in kernel memory state.

Although I haven't paid attention to the block group number in the past,
another run of ``btrfs dev del`` just now gave the same last block group
number (1109204664320) before, presumably, looping.

> What do you get from 'btrfs check --readonly'?

$ btrfs check --readonly --mode lowmem /dev/disk/by-label/pool_16-03

Opening filesystem to check...
warning, device 6 is missing
Checking filesystem on /dev/disk/by-label/pool_16-03
UUID: 59301fea-434a-4c43-bb45-08fcfe8ce113
[1/7] checking root items
[2/7] checking extents
ERROR: extent[1109584044032, 8192] referencer count mismatch (root: 276, owner: 1154248, offset: 100401152) wanted: 1, have: 0
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs done with fs roots in lowmem mode, skipping
[7/7] checking quota groups skipped (not enabled on this FS)
found 4252313206784 bytes used, error(s) found
total csum bytes: 4128183360
total tree bytes: 25053184000
total fs tree bytes: 16415014912
total extent tree bytes: 3662594048
btree space waste bytes: 4949241278
file data blocks allocated: 8025128243200
 referenced 7552211206144

So what can be done? ``check --repair``? Or too dangerous? :)

Thanks for your help

Lukas

> > > ======================================================================
> > > 
> > > filesystem show
> > > ===============
> > > 
> > > Label: 'pool_16-03'  uuid: 59301fea-434a-xxxx-bb45-08fcfe8ce113
> > >         Total devices 8 FS bytes used 3.84TiB
> > >         devid    1 size 931.51GiB used 592.00GiB path /dev/mapper/WD-WCAU45xxxx03
> > >         devid    3 size 1.82TiB used 1.37TiB path /dev/mapper/WD-WCAZAFxxxx78
> > >         devid    4 size 931.51GiB used 593.00GiB path /dev/mapper/WD-WCC4J7xxxxSZ
> > >         devid    5 size 1.82TiB used 1.46TiB path /dev/mapper/WD-WCC4M2xxxxXH
> > >         devid    7 size 931.51GiB used 584.00GiB path /dev/mapper/S1xxxxJ3
> > >         devid    9 size 2.73TiB used 2.28TiB path /dev/mapper/WD-WCC4N3xxxx17
> > >         devid   10 size 3.64TiB used 1.03TiB path /dev/mapper/WD-WCC7K2xxxxNS
> > >         *** Some devices missing
> > > 
> > > subvolumes
> > > ==========
> > > 
> > > ~ 90, of which ~ 60 are read-only snapshots of the other ~ 30
> > > 
> > > filesystem usage
> > > ================
> > > 
> > > Overall:
> > >     Device size:                  12.74TiB
> > >     Device allocated:              8.36TiB
> > >     Device unallocated:            4.38TiB
> > >     Device missing:                  0.00B
> > >     Used:                          7.69TiB
> > >     Free (estimated):              2.50TiB      (min: 2.50TiB)
> > >     Free (statfs, df):             1.46TiB
> > >     Data ratio:                       2.00
> > >     Metadata ratio:                   2.00
> > >     Global reserve:              512.00MiB      (used: 48.00KiB)
> > >     Multiple profiles:                  no
> > > 
> > > Data,RAID1: Size:4.14TiB, Used:3.82TiB (92.33%)
> > >    /dev/mapper/WD-WCAU45xxxx03   584.00GiB
> > >    /dev/mapper/WD-WCAZAFxxxx78     1.35TiB
> > >    /dev/mapper/WD-WCC4J7xxxxSZ   588.00GiB
> > >    /dev/mapper/WD-WCC4M2xxxxXH     1.44TiB
> > >    missing       510.00GiB
> > >    /dev/mapper/S1xxxxJ3  579.00GiB
> > >    /dev/mapper/WD-WCC4N3xxxx17     2.26TiB
> > >    /dev/mapper/WD-WCC7K2xxxxNS     1.01TiB
> > > 
> > > Metadata,RAID1: Size:41.00GiB, Used:23.14GiB (56.44%)
> > >    /dev/mapper/WD-WCAU45xxxx03     8.00GiB
> > >    /dev/mapper/WD-WCAZAFxxxx78    17.00GiB
> > >    /dev/mapper/WD-WCC4J7xxxxSZ     5.00GiB
> > >    /dev/mapper/WD-WCC4M2xxxxXH    13.00GiB
> > >    missing         3.00GiB
> > >    /dev/mapper/S1xxxxJ3    5.00GiB
> > >    /dev/mapper/WD-WCC4N3xxxx17    16.00GiB
> > >    /dev/mapper/WD-WCC7K2xxxxNS    15.00GiB
> > > 
> > > System,RAID1: Size:32.00MiB, Used:848.00KiB (2.59%)
> > >    missing        32.00MiB
> > >    /dev/mapper/WD-WCC4N3xxxx17    32.00MiB
> > > 
> > > Unallocated:
> > >    /dev/mapper/WD-WCAU45xxxx03   339.51GiB
> > >    /dev/mapper/WD-WCAZAFxxxx78   461.01GiB
> > >    /dev/mapper/WD-WCC4J7xxxxSZ   338.51GiB
> > >    /dev/mapper/WD-WCC4M2xxxxXH   373.01GiB
> > >    missing      -513.03GiB
> > >    /dev/mapper/S1xxxxJ3  347.51GiB
> > >    /dev/mapper/WD-WCC4N3xxxx17   460.47GiB
> > >    /dev/mapper/WD-WCC7K2xxxxNS     2.61TiB
> > > 
> > > dump-super
> > > ==========
> > > 
> > > superblock: bytenr=65536, device=/dev/mapper/WD-WCAU45xxxx03
> > > ---------------------------------------------------------
> > > csum_type               0 (crc32c)
> > > csum_size               4
> > > csum                    0x51beb068 [match]
> > > bytenr                  65536
> > > flags                   0x1
> > >                         ( WRITTEN )
> > > magic                   _BHRfS_M [match]
> > > fsid                    59301fea-434a-xxxx-bb45-08fcfe8ce113
> > > metadata_uuid           59301fea-434a-xxxx-bb45-08fcfe8ce113
> > > label                   pool_16-03
> > > generation              113519755
> > > root                    15602414796800
> > > sys_array_size          129
> > > chunk_root_generation   63394299
> > > root_level              1
> > > chunk_root              19216820502528
> > > chunk_root_level        1
> > > log_root                0
> > > log_root_transid        0
> > > log_root_level          0
> > > total_bytes             16003136864256
> > > bytes_used              4227124142080
> > > sectorsize              4096
> > > nodesize                16384
> > > leafsize (deprecated)   16384
> > > stripesize              4096
> > > root_dir                6
> > > num_devices             8
> > > compat_flags            0x0
> > > compat_ro_flags         0x0
> > > incompat_flags          0x371
> > >                         ( MIXED_BACKREF |
> > >                           COMPRESS_ZSTD |
> > >                           BIG_METADATA |
> > >                           EXTENDED_IREF |
> > >                           SKINNY_METADATA |
> > >                           NO_HOLES )
> > > cache_generation        2975866
> > > uuid_tree_generation    113519755
> > > dev_item.uuid           a9b2e4ea-404c-xxxx-a450-dc84b0956ce1
> > > dev_item.fsid           59301fea-434a-xxxx-bb45-08fcfe8ce113 [match]
> > > dev_item.type           0
> > > dev_item.total_bytes    1000201740288
> > > dev_item.bytes_used     635655159808
> > > dev_item.io_align       4096
> > > dev_item.io_width       4096
> > > dev_item.sector_size    4096
> > > dev_item.devid          1
> > > dev_item.dev_group      0
> > > dev_item.seek_speed     0
> > > dev_item.bandwidth      0
> > > dev_item.generation     0
> > > 
> > > device stats
> > > ============
> > > 
> > > [/dev/mapper/WD-WCAU45xxxx03].write_io_errs    0
> > > [/dev/mapper/WD-WCAU45xxxx03].read_io_errs     0
> > > [/dev/mapper/WD-WCAU45xxxx03].flush_io_errs    0
> > > [/dev/mapper/WD-WCAU45xxxx03].corruption_errs  0
> > > [/dev/mapper/WD-WCAU45xxxx03].generation_errs  0
> > > [/dev/mapper/WD-WCAZAFxxxx78].write_io_errs    0
> > > [/dev/mapper/WD-WCAZAFxxxx78].read_io_errs     0
> > > [/dev/mapper/WD-WCAZAFxxxx78].flush_io_errs    0
> > > [/dev/mapper/WD-WCAZAFxxxx78].corruption_errs  0
> > > [/dev/mapper/WD-WCAZAFxxxx78].generation_errs  0
> > > [/dev/mapper/WD-WCC4J7xxxxSZ].write_io_errs    0
> > > [/dev/mapper/WD-WCC4J7xxxxSZ].read_io_errs     1
> > > [/dev/mapper/WD-WCC4J7xxxxSZ].flush_io_errs    0
> > > [/dev/mapper/WD-WCC4J7xxxxSZ].corruption_errs  0
> > > [/dev/mapper/WD-WCC4J7xxxxSZ].generation_errs  0
> > > [/dev/mapper/WD-WCC4M2xxxxXH].write_io_errs    0
> > > [/dev/mapper/WD-WCC4M2xxxxXH].read_io_errs     0
> > > [/dev/mapper/WD-WCC4M2xxxxXH].flush_io_errs    0
> > > [/dev/mapper/WD-WCC4M2xxxxXH].corruption_errs  0
> > > [/dev/mapper/WD-WCC4M2xxxxXH].generation_errs  0
> > > [devid:6].write_io_errs    0
> > > [devid:6].read_io_errs     0
> > > [devid:6].flush_io_errs    0
> > > [devid:6].corruption_errs  72016
> > > [devid:6].generation_errs  100
> > > [/dev/mapper/S1xxxxJ3].write_io_errs    0
> > > [/dev/mapper/S1xxxxJ3].read_io_errs     0
> > > [/dev/mapper/S1xxxxJ3].flush_io_errs    0
> > > [/dev/mapper/S1xxxxJ3].corruption_errs  2
> > > [/dev/mapper/S1xxxxJ3].generation_errs  0
> > > [/dev/mapper/WD-WCC4N3xxxx17].write_io_errs    0
> > > [/dev/mapper/WD-WCC4N3xxxx17].read_io_errs     0
> > > [/dev/mapper/WD-WCC4N3xxxx17].flush_io_errs    0
> > > [/dev/mapper/WD-WCC4N3xxxx17].corruption_errs  0
> > > [/dev/mapper/WD-WCC4N3xxxx17].generation_errs  0
> > > [/dev/mapper/WD-WCC7K2xxxxNS].write_io_errs    0
> > > [/dev/mapper/WD-WCC7K2xxxxNS].read_io_errs     0
> > > [/dev/mapper/WD-WCC7K2xxxxNS].flush_io_errs    0
> > > [/dev/mapper/WD-WCC7K2xxxxNS].corruption_errs  0
> > > [/dev/mapper/WD-WCC7K2xxxxNS].generation_errs  0
> > 



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers")
  2021-12-10 13:28     ` Lukas Pirl
@ 2021-12-11  2:53       ` Zygo Blaxell
  2021-12-16 20:52         ` Lukas Pirl
  0 siblings, 1 reply; 8+ messages in thread
From: Zygo Blaxell @ 2021-12-11  2:53 UTC (permalink / raw)
  To: Lukas Pirl; +Cc: linux-btrfs

On Fri, Dec 10, 2021 at 02:28:28PM +0100, Lukas Pirl wrote:
> (friendly, humble re-post)
> 
> Hello Zygo,
> 
> it took me (and the disks) a while to report back; here we go:
> 
> On Thu, 2021-12-02 13:11 -0500, Zygo Blaxell wrote as excerpted:
> > > On Thu, 2021-11-25 19:06 +0100, Lukas Pirl wrote as excerpted:
> > > > Dear btrfs community,
> > > > 
> > > > this is another report of a probably endless balance which loops on
> > > > "found 1 extents, stage: update data pointers".
> > > > 
> > > > I observe it on a btrfs RAID 1 on around 7 luks-encrypted spinning
> > > > disks (more fs details below) used for storing cold data. One disk
> > > > failed physically. Now, I try to "btrfs device delete missing". The
> > > > operation runs forever (probably, waited more than 30 days, another
> > > > time more than 50 days).
> > > > 
> > > > dmesg says:
> > > > [      22:26] BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1
> > > > [      22:27] BTRFS info (device dm-1): found 4164 extents, stage: move data extents
> > > > [  +5.476247] BTRFS info (device dm-1): found 4164 extents, stage: update data pointers
> > > > [  +2.545299] BTRFS info (device dm-1): found 1 extents, stage: update data pointers
> > > > 
> > > > and then the last message repeats every ~ .25 seconds ("forever").
> > > > Memory and CPU usage are not excessive (most is IO wait, I assume).
> > > > 
> > > > What I have tried:
> > > > * Linux 4 (multiple minor versions, don't remember which exactly)
> > > > * Linux 5.10
> > > > * Linux 5.15
> > > > * btrfs-progs v5.15
> > > > * remove subvolumes (before: ~ 200, after: ~ 90)
> > > > * free space cache v1, v2, none
> > > > * reboot, restart removal/balance (multiple times)
> > 
> > Does it always happen on the same block group?  If so, that points to
> > something lurking in your metadata.  If a reboot fixes it for one block
> > group and then it gets stuck on some other block group, it points to
> > an issue in kernel memory state.
> 
> Although I haven't paid attention to the block group number in the past,
> another run of ``btrfs dev del`` just now gave the same last block group
> number (1109204664320) before, presumably, looping.
> 
> > What do you get from 'btrfs check --readonly'?
> 
> $ btrfs check --readonly --mode lowmem /dev/disk/by-label/pool_16-03
> 
> Opening filesystem to check...
> warning, device 6 is missing
> Checking filesystem on /dev/disk/by-label/pool_16-03
> UUID: 59301fea-434a-4c43-bb45-08fcfe8ce113
> [1/7] checking root items
> [2/7] checking extents
> ERROR: extent[1109584044032, 8192] referencer count mismatch (root: 276, owner: 1154248, offset: 100401152) wanted: 1, have: 0
> ERROR: errors found in extent allocation tree or chunk allocation
> [3/7] checking free space tree
> [4/7] checking fs roots
> [5/7] checking only csums items (without verifying data)
> [6/7] checking root refs done with fs roots in lowmem mode, skipping
> [7/7] checking quota groups skipped (not enabled on this FS)
> found 4252313206784 bytes used, error(s) found
> total csum bytes: 4128183360
> total tree bytes: 25053184000
> total fs tree bytes: 16415014912
> total extent tree bytes: 3662594048
> btree space waste bytes: 4949241278
> file data blocks allocated: 8025128243200
>  referenced 7552211206144

OK that's not too bad, just one bad reference.

> So what can be done? ``check --repair``? Or too dangerous? :)

If you have backups and you are prepared to restore them, you can try
check --repair.
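(A cautious sequence for that might look like the sketch below. The mount point is hypothetical and the device path is taken from this thread; --repair must be run on the unmounted filesystem, and it can make things worse, so only attempt it with backups ready. DRY_RUN=1 makes the script echo the commands instead of executing them.)

```shell
# Dry-run sketch of a check --repair workflow. With DRY_RUN=1 the
# commands are only printed; unset it to actually run them as root.
DRY_RUN=1
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

run umount /mnt/pool_16-03                                # fs must be unmounted
run btrfs check --readonly /dev/disk/by-label/pool_16-03  # confirm the error first
run btrfs check --repair /dev/disk/by-label/pool_16-03    # destructive: backups ready
run btrfs check --readonly /dev/disk/by-label/pool_16-03  # verify the fs is clean now
```

If the read-only re-check comes back clean, retry the device delete afterwards.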

> Thanks for your help
> 
> Lukas
> 
> > > > ======================================================================
> > > > 
> > > > filesystem show
> > > > ===============
> > > > 
> > > > Label: 'pool_16-03'  uuid: 59301fea-434a-xxxx-bb45-08fcfe8ce113
> > > >         Total devices 8 FS bytes used 3.84TiB
> > > >         devid    1 size 931.51GiB used 592.00GiB path /dev/mapper/WD-WCAU45xxxx03
> > > >         devid    3 size 1.82TiB used 1.37TiB path /dev/mapper/WD-WCAZAFxxxx78
> > > >         devid    4 size 931.51GiB used 593.00GiB path /dev/mapper/WD-WCC4J7xxxxSZ
> > > >         devid    5 size 1.82TiB used 1.46TiB path /dev/mapper/WD-WCC4M2xxxxXH
> > > >         devid    7 size 931.51GiB used 584.00GiB path /dev/mapper/S1xxxxJ3
> > > >         devid    9 size 2.73TiB used 2.28TiB path /dev/mapper/WD-WCC4N3xxxx17
> > > >         devid   10 size 3.64TiB used 1.03TiB path /dev/mapper/WD-WCC7K2xxxxNS
> > > >         *** Some devices missing
> > > > 
> > > > subvolumes
> > > > ==========
> > > > 
> > > > ~ 90, of which ~ 60 are read-only snapshots of the other ~ 30
> > > > 
> > > > filesystem usage
> > > > ================
> > > > 
> > > > Overall:
> > > >     Device size:                  12.74TiB
> > > >     Device allocated:              8.36TiB
> > > >     Device unallocated:            4.38TiB
> > > >     Device missing:                  0.00B
> > > >     Used:                          7.69TiB
> > > >     Free (estimated):              2.50TiB      (min: 2.50TiB)
> > > >     Free (statfs, df):             1.46TiB
> > > >     Data ratio:                       2.00
> > > >     Metadata ratio:                   2.00
> > > >     Global reserve:              512.00MiB      (used: 48.00KiB)
> > > >     Multiple profiles:                  no
> > > > 
> > > > Data,RAID1: Size:4.14TiB, Used:3.82TiB (92.33%)
> > > >    /dev/mapper/WD-WCAU45xxxx03   584.00GiB
> > > >    /dev/mapper/WD-WCAZAFxxxx78     1.35TiB
> > > >    /dev/mapper/WD-WCC4J7xxxxSZ   588.00GiB
> > > >    /dev/mapper/WD-WCC4M2xxxxXH     1.44TiB
> > > >    missing       510.00GiB
> > > >    /dev/mapper/S1xxxxJ3  579.00GiB
> > > >    /dev/mapper/WD-WCC4N3xxxx17     2.26TiB
> > > >    /dev/mapper/WD-WCC7K2xxxxNS     1.01TiB
> > > > 
> > > > Metadata,RAID1: Size:41.00GiB, Used:23.14GiB (56.44%)
> > > >    /dev/mapper/WD-WCAU45xxxx03     8.00GiB
> > > >    /dev/mapper/WD-WCAZAFxxxx78    17.00GiB
> > > >    /dev/mapper/WD-WCC4J7xxxxSZ     5.00GiB
> > > >    /dev/mapper/WD-WCC4M2xxxxXH    13.00GiB
> > > >    missing         3.00GiB
> > > >    /dev/mapper/S1xxxxJ3    5.00GiB
> > > >    /dev/mapper/WD-WCC4N3xxxx17    16.00GiB
> > > >    /dev/mapper/WD-WCC7K2xxxxNS    15.00GiB
> > > > 
> > > > System,RAID1: Size:32.00MiB, Used:848.00KiB (2.59%)
> > > >    missing        32.00MiB
> > > >    /dev/mapper/WD-WCC4N3xxxx17    32.00MiB
> > > > 
> > > > Unallocated:
> > > >    /dev/mapper/WD-WCAU45xxxx03   339.51GiB
> > > >    /dev/mapper/WD-WCAZAFxxxx78   461.01GiB
> > > >    /dev/mapper/WD-WCC4J7xxxxSZ   338.51GiB
> > > >    /dev/mapper/WD-WCC4M2xxxxXH   373.01GiB
> > > >    missing      -513.03GiB
> > > >    /dev/mapper/S1xxxxJ3  347.51GiB
> > > >    /dev/mapper/WD-WCC4N3xxxx17   460.47GiB
> > > >    /dev/mapper/WD-WCC7K2xxxxNS     2.61TiB
> > > > 
> > > > dump-super
> > > > ==========
> > > > 
> > > > superblock: bytenr=65536, device=/dev/mapper/WD-WCAU45xxxx03
> > > > ---------------------------------------------------------
> > > > csum_type               0 (crc32c)
> > > > csum_size               4
> > > > csum                    0x51beb068 [match]
> > > > bytenr                  65536
> > > > flags                   0x1
> > > >                         ( WRITTEN )
> > > > magic                   _BHRfS_M [match]
> > > > fsid                    59301fea-434a-xxxx-bb45-08fcfe8ce113
> > > > metadata_uuid           59301fea-434a-xxxx-bb45-08fcfe8ce113
> > > > label                   pool_16-03
> > > > generation              113519755
> > > > root                    15602414796800
> > > > sys_array_size          129
> > > > chunk_root_generation   63394299
> > > > root_level              1
> > > > chunk_root              19216820502528
> > > > chunk_root_level        1
> > > > log_root                0
> > > > log_root_transid        0
> > > > log_root_level          0
> > > > total_bytes             16003136864256
> > > > bytes_used              4227124142080
> > > > sectorsize              4096
> > > > nodesize                16384
> > > > leafsize (deprecated)   16384
> > > > stripesize              4096
> > > > root_dir                6
> > > > num_devices             8
> > > > compat_flags            0x0
> > > > compat_ro_flags         0x0
> > > > incompat_flags          0x371
> > > >                         ( MIXED_BACKREF |
> > > >                           COMPRESS_ZSTD |
> > > >                           BIG_METADATA |
> > > >                           EXTENDED_IREF |
> > > >                           SKINNY_METADATA |
> > > >                           NO_HOLES )
> > > > cache_generation        2975866
> > > > uuid_tree_generation    113519755
> > > > dev_item.uuid           a9b2e4ea-404c-xxxx-a450-dc84b0956ce1
> > > > dev_item.fsid           59301fea-434a-xxxx-bb45-08fcfe8ce113 [match]
> > > > dev_item.type           0
> > > > dev_item.total_bytes    1000201740288
> > > > dev_item.bytes_used     635655159808
> > > > dev_item.io_align       4096
> > > > dev_item.io_width       4096
> > > > dev_item.sector_size    4096
> > > > dev_item.devid          1
> > > > dev_item.dev_group      0
> > > > dev_item.seek_speed     0
> > > > dev_item.bandwidth      0
> > > > dev_item.generation     0
> > > > 
> > > > device stats
> > > > ============
> > > > 
> > > > [/dev/mapper/WD-WCAU45xxxx03].write_io_errs    0
> > > > [/dev/mapper/WD-WCAU45xxxx03].read_io_errs     0
> > > > [/dev/mapper/WD-WCAU45xxxx03].flush_io_errs    0
> > > > [/dev/mapper/WD-WCAU45xxxx03].corruption_errs  0
> > > > [/dev/mapper/WD-WCAU45xxxx03].generation_errs  0
> > > > [/dev/mapper/WD-WCAZAFxxxx78].write_io_errs    0
> > > > [/dev/mapper/WD-WCAZAFxxxx78].read_io_errs     0
> > > > [/dev/mapper/WD-WCAZAFxxxx78].flush_io_errs    0
> > > > [/dev/mapper/WD-WCAZAFxxxx78].corruption_errs  0
> > > > [/dev/mapper/WD-WCAZAFxxxx78].generation_errs  0
> > > > [/dev/mapper/WD-WCC4J7xxxxSZ].write_io_errs    0
> > > > [/dev/mapper/WD-WCC4J7xxxxSZ].read_io_errs     1
> > > > [/dev/mapper/WD-WCC4J7xxxxSZ].flush_io_errs    0
> > > > [/dev/mapper/WD-WCC4J7xxxxSZ].corruption_errs  0
> > > > [/dev/mapper/WD-WCC4J7xxxxSZ].generation_errs  0
> > > > [/dev/mapper/WD-WCC4M2xxxxXH].write_io_errs    0
> > > > [/dev/mapper/WD-WCC4M2xxxxXH].read_io_errs     0
> > > > [/dev/mapper/WD-WCC4M2xxxxXH].flush_io_errs    0
> > > > [/dev/mapper/WD-WCC4M2xxxxXH].corruption_errs  0
> > > > [/dev/mapper/WD-WCC4M2xxxxXH].generation_errs  0
> > > > [devid:6].write_io_errs    0
> > > > [devid:6].read_io_errs     0
> > > > [devid:6].flush_io_errs    0
> > > > [devid:6].corruption_errs  72016
> > > > [devid:6].generation_errs  100
> > > > [/dev/mapper/S1xxxxJ3].write_io_errs    0
> > > > [/dev/mapper/S1xxxxJ3].read_io_errs     0
> > > > [/dev/mapper/S1xxxxJ3].flush_io_errs    0
> > > > [/dev/mapper/S1xxxxJ3].corruption_errs  2
> > > > [/dev/mapper/S1xxxxJ3].generation_errs  0
> > > > [/dev/mapper/WD-WCC4N3xxxx17].write_io_errs    0
> > > > [/dev/mapper/WD-WCC4N3xxxx17].read_io_errs     0
> > > > [/dev/mapper/WD-WCC4N3xxxx17].flush_io_errs    0
> > > > [/dev/mapper/WD-WCC4N3xxxx17].corruption_errs  0
> > > > [/dev/mapper/WD-WCC4N3xxxx17].generation_errs  0
> > > > [/dev/mapper/WD-WCC7K2xxxxNS].write_io_errs    0
> > > > [/dev/mapper/WD-WCC7K2xxxxNS].read_io_errs     0
> > > > [/dev/mapper/WD-WCC7K2xxxxNS].flush_io_errs    0
> > > > [/dev/mapper/WD-WCC7K2xxxxNS].corruption_errs  0
> > > > [/dev/mapper/WD-WCC7K2xxxxNS].generation_errs  0
> > > 
> 
> 



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers")
  2021-12-11  2:53       ` Zygo Blaxell
@ 2021-12-16 20:52         ` Lukas Pirl
  0 siblings, 0 replies; 8+ messages in thread
From: Lukas Pirl @ 2021-12-16 20:52 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 13752 bytes --]

On Fri, 2021-12-10 21:53 -0500, Zygo Blaxell wrote as excerpted:
> On Fri, Dec 10, 2021 at 02:28:28PM +0100, Lukas Pirl wrote:
> > 
> 
> > it took me (and the disks) a while to report back; here we go:
> > 
> > On Thu, 2021-12-02 13:11 -0500, Zygo Blaxell wrote as excerpted:
> > > > On Thu, 2021-11-25 19:06 +0100, Lukas Pirl wrote as excerpted:
> > > > > Dear btrfs community,
> > > > > 
> > > > > this is another report of a probably endless balance which loops on
> > > > > "found 1 extents, stage: update data pointers".
> > > > > 
> > > > > I observe it on a btrfs RAID 1 on around 7 luks-encrypted spinning
> > > > > disks (more fs details below) used for storing cold data. One disk
> > > > > failed physically. Now, I try to "btrfs device delete missing". The
> > > > > operation runs forever (probably, waited more than 30 days, another
> > > > > time more than 50 days).
> > > > > 
> > > > > dmesg says:
> > > > > [      22:26] BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1
> > > > > [      22:27] BTRFS info (device dm-1): found 4164 extents, stage: move data extents
> > > > > [  +5.476247] BTRFS info (device dm-1): found 4164 extents, stage: update data pointers
> > > > > [  +2.545299] BTRFS info (device dm-1): found 1 extents, stage: update data pointers
> > > > > 
> > > > > and then the last message repeats every ~ .25 seconds ("forever").
> > > > > Memory and CPU usage are not excessive (most is IO wait, I assume).
> > > > > 
> > > > > What I have tried:
> > > > > * Linux 4 (multiple minor versions, don't remember which exactly)
> > > > > * Linux 5.10
> > > > > * Linux 5.15
> > > > > * btrfs-progs v5.15
> > > > > * remove subvolumes (before: ~ 200, after: ~ 90)
> > > > > * free space cache v1, v2, none
> > > > > * reboot, restart removal/balance (multiple times)
> > > 
> > > Does it always happen on the same block group?  If so, that points to
> > > something lurking in your metadata.  If a reboot fixes it for one block
> > > group and then it gets stuck on some other block group, it points to
> > > an issue in kernel memory state.
> > 
> > Although I haven't paid attention to the block group number in the past,
> > another run of ``btrfs dev del`` just now gave the same last block group
> > number (1109204664320) before, presumably, looping.
> > 
> > > What do you get from 'btrfs check --readonly'?
> > 
> > $ btrfs check --readonly --mode lowmem /dev/disk/by-label/pool_16-03
> > 
> > Opening filesystem to check...
> > warning, device 6 is missing
> > Checking filesystem on /dev/disk/by-label/pool_16-03
> > UUID: 59301fea-434a-4c43-bb45-08fcfe8ce113
> > [1/7] checking root items
> > [2/7] checking extents
> > ERROR: extent[1109584044032, 8192] referencer count mismatch (root: 276, owner: 1154248, offset: 100401152) wanted: 1, have: 0
> > ERROR: errors found in extent allocation tree or chunk allocation
> > [3/7] checking free space tree
> > [4/7] checking fs roots
> > [5/7] checking only csums items (without verifying data)
> > [6/7] checking root refs done with fs roots in lowmem mode, skipping
> > [7/7] checking quota groups skipped (not enabled on this FS)
> > found 4252313206784 bytes used, error(s) found
> > total csum bytes: 4128183360
> > total tree bytes: 25053184000
> > total fs tree bytes: 16415014912
> > total extent tree bytes: 3662594048
> > btree space waste bytes: 4949241278
> > file data blocks allocated: 8025128243200
> >  referenced 7552211206144
> 
> OK that's not too bad, just one bad reference.
> 
> > So what can be done? ``check --repair``? Or too dangerous? :)
> 
> If you have backups and are prepared to restore them, you can try
> check --repair.

To report back, all I got from ``check --repair`` was a segfault (progs
v5.15.1, Linux 5.15.0):

…
[2/7] checking extents
ERROR: extent[1109584044032, 8192] referencer count mismatch (root: 276, owner: 1154248, offset: 100401152) wanted: 1, have: 0
zsh: segmentation fault  btrfs check --repair --mode lowmem /dev/disk/by-label/…

Are there any open, possibly related bugs whose fixes I could wait for
before trying again? :)
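For what it's worth, the extent flagged by ``check --readonly`` lands inside
the very block group the balance keeps relooping on, which plain shell
arithmetic confirms (the 1GiB data block group size below is the usual
default for RAID1 data chunks, not something confirmed in this thread):

```shell
# Plain arithmetic, no btrfs needed; the 1GiB chunk size is an assumption.
bg_start=1109204664320          # "relocating block group" from dmesg
bg_len=$((1024 * 1024 * 1024))  # assumed 1GiB RAID1 data block group
extent=1109584044032            # extent flagged by check --readonly
if [ "$extent" -ge "$bg_start" ] && [ "$extent" -lt $((bg_start + bg_len)) ]; then
    echo "flagged extent is inside the looping block group"
fi
```

So the stale backref and the relocation loop are very likely the same
problem, not two independent ones.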

Cheers,

Lukas

> > Thanks for your help
> > 
> > Lukas
> > 
> > > > > ======================================================================
> > > > > 
> > > > > filesystem show
> > > > > ===============
> > > > > 
> > > > > Label: 'pool_16-03'  uuid: 59301fea-434a-xxxx-bb45-08fcfe8ce113
> > > > >         Total devices 8 FS bytes used 3.84TiB
> > > > >         devid    1 size 931.51GiB used 592.00GiB path /dev/mapper/WD-WCAU45xxxx03
> > > > >         devid    3 size 1.82TiB used 1.37TiB path /dev/mapper/WD-WCAZAFxxxx78
> > > > >         devid    4 size 931.51GiB used 593.00GiB path /dev/mapper/WD-WCC4J7xxxxSZ
> > > > >         devid    5 size 1.82TiB used 1.46TiB path /dev/mapper/WD-WCC4M2xxxxXH
> > > > >         devid    7 size 931.51GiB used 584.00GiB path /dev/mapper/S1xxxxJ3
> > > > >         devid    9 size 2.73TiB used 2.28TiB path /dev/mapper/WD-WCC4N3xxxx17
> > > > >         devid   10 size 3.64TiB used 1.03TiB path /dev/mapper/WD-WCC7K2xxxxNS
> > > > >         *** Some devices missing
> > > > > 
> > > > > subvolumes
> > > > > ==========
> > > > > 
> > > > > ~ 90, of which ~ 60 are read-only snapshots of the other ~ 30
> > > > > 
> > > > > filesystem usage
> > > > > ================
> > > > > 
> > > > > Overall:
> > > > >     Device size:                  12.74TiB
> > > > >     Device allocated:              8.36TiB
> > > > >     Device unallocated:            4.38TiB
> > > > >     Device missing:                  0.00B
> > > > >     Used:                          7.69TiB
> > > > >     Free (estimated):              2.50TiB      (min: 2.50TiB)
> > > > >     Free (statfs, df):             1.46TiB
> > > > >     Data ratio:                       2.00
> > > > >     Metadata ratio:                   2.00
> > > > >     Global reserve:              512.00MiB      (used: 48.00KiB)
> > > > >     Multiple profiles:                  no
> > > > > 
> > > > > Data,RAID1: Size:4.14TiB, Used:3.82TiB (92.33%)
> > > > >    /dev/mapper/WD-WCAU45xxxx03   584.00GiB
> > > > >    /dev/mapper/WD-WCAZAFxxxx78     1.35TiB
> > > > >    /dev/mapper/WD-WCC4J7xxxxSZ   588.00GiB
> > > > >    /dev/mapper/WD-WCC4M2xxxxXH     1.44TiB
> > > > >    missing       510.00GiB
> > > > >    /dev/mapper/S1xxxxJ3  579.00GiB
> > > > >    /dev/mapper/WD-WCC4N3xxxx17     2.26TiB
> > > > >    /dev/mapper/WD-WCC7K2xxxxNS     1.01TiB
> > > > > 
> > > > > Metadata,RAID1: Size:41.00GiB, Used:23.14GiB (56.44%)
> > > > >    /dev/mapper/WD-WCAU45xxxx03     8.00GiB
> > > > >    /dev/mapper/WD-WCAZAFxxxx78    17.00GiB
> > > > >    /dev/mapper/WD-WCC4J7xxxxSZ     5.00GiB
> > > > >    /dev/mapper/WD-WCC4M2xxxxXH    13.00GiB
> > > > >    missing         3.00GiB
> > > > >    /dev/mapper/S1xxxxJ3    5.00GiB
> > > > >    /dev/mapper/WD-WCC4N3xxxx17    16.00GiB
> > > > >    /dev/mapper/WD-WCC7K2xxxxNS    15.00GiB
> > > > > 
> > > > > System,RAID1: Size:32.00MiB, Used:848.00KiB (2.59%)
> > > > >    missing        32.00MiB
> > > > >    /dev/mapper/WD-WCC4N3xxxx17    32.00MiB
> > > > > 
> > > > > Unallocated:
> > > > >    /dev/mapper/WD-WCAU45xxxx03   339.51GiB
> > > > >    /dev/mapper/WD-WCAZAFxxxx78   461.01GiB
> > > > >    /dev/mapper/WD-WCC4J7xxxxSZ   338.51GiB
> > > > >    /dev/mapper/WD-WCC4M2xxxxXH   373.01GiB
> > > > >    missing      -513.03GiB
> > > > >    /dev/mapper/S1xxxxJ3  347.51GiB
> > > > >    /dev/mapper/WD-WCC4N3xxxx17   460.47GiB
> > > > >    /dev/mapper/WD-WCC7K2xxxxNS     2.61TiB
> > > > > 
> > > > > dump-super
> > > > > ==========
> > > > > 
> > > > > superblock: bytenr=65536, device=/dev/mapper/WD-WCAU45xxxx03
> > > > > ---------------------------------------------------------
> > > > > csum_type               0 (crc32c)
> > > > > csum_size               4
> > > > > csum                    0x51beb068 [match]
> > > > > bytenr                  65536
> > > > > flags                   0x1
> > > > >                         ( WRITTEN )
> > > > > magic                   _BHRfS_M [match]
> > > > > fsid                    59301fea-434a-xxxx-bb45-08fcfe8ce113
> > > > > metadata_uuid           59301fea-434a-xxxx-bb45-08fcfe8ce113
> > > > > label                   pool_16-03
> > > > > generation              113519755
> > > > > root                    15602414796800
> > > > > sys_array_size          129
> > > > > chunk_root_generation   63394299
> > > > > root_level              1
> > > > > chunk_root              19216820502528
> > > > > chunk_root_level        1
> > > > > log_root                0
> > > > > log_root_transid        0
> > > > > log_root_level          0
> > > > > total_bytes             16003136864256
> > > > > bytes_used              4227124142080
> > > > > sectorsize              4096
> > > > > nodesize                16384
> > > > > leafsize (deprecated)   16384
> > > > > stripesize              4096
> > > > > root_dir                6
> > > > > num_devices             8
> > > > > compat_flags            0x0
> > > > > compat_ro_flags         0x0
> > > > > incompat_flags          0x371
> > > > >                         ( MIXED_BACKREF |
> > > > >                           COMPRESS_ZSTD |
> > > > >                           BIG_METADATA |
> > > > >                           EXTENDED_IREF |
> > > > >                           SKINNY_METADATA |
> > > > >                           NO_HOLES )
> > > > > cache_generation        2975866
> > > > > uuid_tree_generation    113519755
> > > > > dev_item.uuid           a9b2e4ea-404c-xxxx-a450-dc84b0956ce1
> > > > > dev_item.fsid           59301fea-434a-xxxx-bb45-08fcfe8ce113 [match]
> > > > > dev_item.type           0
> > > > > dev_item.total_bytes    1000201740288
> > > > > dev_item.bytes_used     635655159808
> > > > > dev_item.io_align       4096
> > > > > dev_item.io_width       4096
> > > > > dev_item.sector_size    4096
> > > > > dev_item.devid          1
> > > > > dev_item.dev_group      0
> > > > > dev_item.seek_speed     0
> > > > > dev_item.bandwidth      0
> > > > > dev_item.generation     0
> > > > > 
> > > > > device stats
> > > > > ============
> > > > > 
> > > > > [/dev/mapper/WD-WCAU45xxxx03].write_io_errs    0
> > > > > [/dev/mapper/WD-WCAU45xxxx03].read_io_errs     0
> > > > > [/dev/mapper/WD-WCAU45xxxx03].flush_io_errs    0
> > > > > [/dev/mapper/WD-WCAU45xxxx03].corruption_errs  0
> > > > > [/dev/mapper/WD-WCAU45xxxx03].generation_errs  0
> > > > > [/dev/mapper/WD-WCAZAFxxxx78].write_io_errs    0
> > > > > [/dev/mapper/WD-WCAZAFxxxx78].read_io_errs     0
> > > > > [/dev/mapper/WD-WCAZAFxxxx78].flush_io_errs    0
> > > > > [/dev/mapper/WD-WCAZAFxxxx78].corruption_errs  0
> > > > > [/dev/mapper/WD-WCAZAFxxxx78].generation_errs  0
> > > > > [/dev/mapper/WD-WCC4J7xxxxSZ].write_io_errs    0
> > > > > [/dev/mapper/WD-WCC4J7xxxxSZ].read_io_errs     1
> > > > > [/dev/mapper/WD-WCC4J7xxxxSZ].flush_io_errs    0
> > > > > [/dev/mapper/WD-WCC4J7xxxxSZ].corruption_errs  0
> > > > > [/dev/mapper/WD-WCC4J7xxxxSZ].generation_errs  0
> > > > > [/dev/mapper/WD-WCC4M2xxxxXH].write_io_errs    0
> > > > > [/dev/mapper/WD-WCC4M2xxxxXH].read_io_errs     0
> > > > > [/dev/mapper/WD-WCC4M2xxxxXH].flush_io_errs    0
> > > > > [/dev/mapper/WD-WCC4M2xxxxXH].corruption_errs  0
> > > > > [/dev/mapper/WD-WCC4M2xxxxXH].generation_errs  0
> > > > > [devid:6].write_io_errs    0
> > > > > [devid:6].read_io_errs     0
> > > > > [devid:6].flush_io_errs    0
> > > > > [devid:6].corruption_errs  72016
> > > > > [devid:6].generation_errs  100
> > > > > [/dev/mapper/S1xxxxJ3].write_io_errs    0
> > > > > [/dev/mapper/S1xxxxJ3].read_io_errs     0
> > > > > [/dev/mapper/S1xxxxJ3].flush_io_errs    0
> > > > > [/dev/mapper/S1xxxxJ3].corruption_errs  2
> > > > > [/dev/mapper/S1xxxxJ3].generation_errs  0
> > > > > [/dev/mapper/WD-WCC4N3xxxx17].write_io_errs    0
> > > > > [/dev/mapper/WD-WCC4N3xxxx17].read_io_errs     0
> > > > > [/dev/mapper/WD-WCC4N3xxxx17].flush_io_errs    0
> > > > > [/dev/mapper/WD-WCC4N3xxxx17].corruption_errs  0
> > > > > [/dev/mapper/WD-WCC4N3xxxx17].generation_errs  0
> > > > > [/dev/mapper/WD-WCC7K2xxxxNS].write_io_errs    0
> > > > > [/dev/mapper/WD-WCC7K2xxxxNS].read_io_errs     0
> > > > > [/dev/mapper/WD-WCC7K2xxxxNS].flush_io_errs    0
> > > > > [/dev/mapper/WD-WCC7K2xxxxNS].corruption_errs  0
> > > > > [/dev/mapper/WD-WCC7K2xxxxNS].generation_errs  0
> > > > 
> > 
> > 
> 
> 


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-12-16 20:53 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-25 18:06 Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers") Lukas Pirl
2021-12-02 14:49 ` Lukas Pirl
2021-12-02 18:11   ` Zygo Blaxell
2021-12-03 10:14     ` Lukas Pirl
2021-12-05 11:54     ` Lukas Pirl
2021-12-10 13:28     ` Lukas Pirl
2021-12-11  2:53       ` Zygo Blaxell
2021-12-16 20:52         ` Lukas Pirl

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).