All of lore.kernel.org
 help / color / mirror / Atom feed
* ENOSPC Cannot add device or resize max due to Global reserve hit 512M
@ 2023-07-06 18:15 Jiachen YANG
  0 siblings, 0 replies; only message in thread
From: Jiachen YANG @ 2023-07-06 18:15 UTC (permalink / raw)
  To: linux-btrfs

Hi, dear btrfs developers

I have a server using btrfs RAID1 for metadata and RAID0 for data:

# btrfs filesystem usage /mnt
Overall:
     Device size:                   1.72TiB
     Device allocated:              1.72TiB
     Device unallocated:            2.09MiB
     Device missing:                  0.00B
     Device slack:                 20.00GiB
     Used:                          1.49TiB
     Free (estimated):            238.78GiB      (min: 238.78GiB)
     Free (statfs, df):           238.78GiB
     Data ratio:                       1.00
     Metadata ratio:                   2.00
     Global reserve:              512.00MiB      (used: 0.00B)
     Multiple profiles:                  no

Data,RAID0: Size:1.60TiB, Used:1.36TiB (85.40%)
    /dev/nvme0n1p3        817.50GiB
    /dev/nvme1n1p2        817.50GiB

Metadata,RAID1: Size:64.00GiB, Used:63.30GiB (98.91%)
    /dev/nvme0n1p3         64.00GiB
    /dev/nvme1n1p2         64.00GiB

System,RAID1: Size:8.00MiB, Used:144.00KiB (1.76%)
    /dev/nvme0n1p3          8.00MiB
    /dev/nvme1n1p2          8.00MiB

Unallocated:
    /dev/nvme0n1p3          1.05MiB
    /dev/nvme1n1p2          1.05MiB

It hit ENOSPC and was forced read-only.

I have been trying with these things without success:

1. btrfs rescue zero-log to drop the log tree
2. mount with 
ro,noatime,skip_balance,nodiscard,clear_cache,nospace_cache , and apply 
operations immediately following mount -oremount,rw
3. trying to balance -dusage after remount,rw
4. trying to `device add` 2 other devices after remount,rw
5. moving the partitions 10G forward using sfdisk, and trying to `btrfs 
filesystem resize max` after remount,rw

After mounting rw, the cleaner picked up an orphan snapshot deletion and 
the global reserve spaces started to go up until it hit around 511.48MiB 
and stopped by transaction commit failure.
I can confirm the orphaning by `btrfs-orphan-cleaner-progress` command 
from `python-btrfs`

# btrfs-orphan-cleaner-progress /mnt
1 orphans left to clean
dropping root 36480 for at least 0 sec drop_progress (439534 EXTENT_DATA 0)


`btrfs resize max` can enlarge one device and failed afterwards:

Overall:
     Device size:                   1.73TiB
     Device allocated:              1.72TiB
     Device unallocated:           10.00GiB
     Device missing:                  0.00B
     Device slack:                 10.00GiB
     Used:                          1.49TiB
     Free (estimated):            248.79GiB      (min: 243.79GiB)
     Free (statfs, df):           248.78GiB
     Data ratio:                       1.00
     Metadata ratio:                   2.00
     Global reserve:              512.00MiB      (used: 511.48MiB)
     Multiple profiles:                  no

Data,RAID0: Size:1.60TiB, Used:1.36TiB (85.40%)
    /dev/nvme0n1p3        817.50GiB
    /dev/nvme1n1p2        817.50GiB

Metadata,RAID1: Size:64.00GiB, Used:63.20GiB (98.76%)
    /dev/nvme0n1p3         64.00GiB
    /dev/nvme1n1p2         64.00GiB

System,RAID1: Size:8.00MiB, Used:144.00KiB (1.76%)
    /dev/nvme0n1p3          8.00MiB
    /dev/nvme1n1p2          8.00MiB

Unallocated:
    /dev/nvme0n1p3         10.00GiB
    /dev/nvme1n1p2          1.05MiB


The dmesg output like this:

[Jul 6 17:57] BTRFS info (device nvme0n1p3): using crc32c (crc32c-intel) 
checksum algorithm
[  +0.000019] BTRFS info (device nvme0n1p3): force clearing of disk cache
[  +0.000006] BTRFS info (device nvme0n1p3): disabling tree log
[  +0.373237] BTRFS info (device nvme0n1p3): checking UUID tree
[ +18.385270] ------------[ cut here ]------------
[  +0.000006] BTRFS: Transaction aborted (error -28)
[  +0.002486] WARNING: CPU: 0 PID: 15054 at fs/btrfs/extent-tree.c:3053 
__btrfs_free_extent+0xb26/0x11a0 [btrfs]
[  +0.000215] Modules linked in: pktcdvd ccm qrtr algif_aead cbc 
des_generic libdes ecb algif_skcipher cmac md4 algif_hash af_alg 
intel_rapl_msr intel_rapl_common intel_uncore_frequency
  intel_uncore_frequency_common isst_if_common skx_edac nfit 
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass 
ipmi_ssif crct10dif_pclmul polyval_clmulni polyval_gen
eric gf128mul acpi_ipmi ghash_clmulni_intel cfg80211 iTCO_wdt 
intel_pmc_bxt iTCO_vendor_support rapl mei_me joydev ipmi_si dell_smbios 
rfkill wmi_bmof intel_cstate dell_wmi_descriptor m
ousedev pcspkr ipmi_devintf mei intel_uncore i2c_i801 intel_pch_thermal 
spi_nor dcdbas ipmi_msghandler acpi_power_meter mac_hid lpc_ich mtd 
i2c_smbus pkcs8_key_parser fuse dm_mod bpf_pr
eload ip_tables x_tables overlay squashfs loop isofs sr_mod cdrom uas 
usb_storage usbhid btrfs blake2b_generic xor raid6_pq libcrc32c 
crc32c_generic crc32_pclmul crc32c_intel sha512_sss
e3 ixgbe aesni_intel nvme mdio_devres crypto_simd igb nvme_core cryptd 
spi_intel_pci libphy nvme_common dca
[  +0.000207]  spi_intel mgag200 mdio xhci_pci i2c_algo_bit 
xhci_pci_renesas wmi
[  +0.000020] CPU: 0 PID: 15054 Comm: btrfs Tainted: G        W 
6.4.1-arch2-1 #1 cf34d70ffed66439727ee92a8197bd2b3e0b11de
[  +0.000013] Hardware name: Dell Inc. PowerEdge C6420/0K2TT6, BIOS 
2.4.8 11/27/2019
[  +0.000004] RIP: 0010:__btrfs_free_extent+0xb26/0x11a0 [btrfs]
[  +0.000189] Code: ff ff 84 c0 0f 85 0e 02 00 00 0f 1f 44 00 00 41 b8 
01 00 00 00 e9 cc fd ff ff 8b 74 24 0c 48 c7 c7 e8 e8 c6 c0 e8 4a 77 fa 
c9 <0f> 0b e9 1a fa ff ff 89 df e8 4c 21 f
f ff 84 c0 0f 85 d7 02 00 00
[  +0.000007] RSP: 0018:ffffb49f0cd17a70 EFLAGS: 00010286
[  +0.000008] RAX: 0000000000000000 RBX: 000000a00b4f8000 RCX: 
0000000000000027
[  +0.000006] RDX: ffff9d82ffea16c8 RSI: 0000000000000001 RDI: 
ffff9d82ffea16c0
[  +0.000006] RBP: ffff9d73ee9f7d68 R08: 0000000000000000 R09: 
ffffb49f0cd17900
[  +0.000005] R10: 0000000000000003 R11: ffff9d933ff6ab28 R12: 
0000000000000001
[  +0.000004] R13: 0000000000000000 R14: ffff9d7ee24c6410 R15: 
ffff9d73ce844930
[  +0.000005] FS:  00007f600bb03900(0000) GS:ffff9d82ffe00000(0000) 
knlGS:0000000000000000
[  +0.000007] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000005] CR2: 0000564b74903890 CR3: 00000002acc02005 CR4: 
00000000007706f0
[  +0.000006] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
[  +0.000004] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
0000000000000400
[  +0.000005] PKRU: 55555554
[  +0.000003] Call Trace:
[  +0.000006]  <TASK>
[  +0.000004]  ? __btrfs_free_extent+0xb26/0x11a0 [btrfs 
ba0d149848218bf804d988954b86cfb98d7e0e76]
[  +0.000189]  ? __warn+0x81/0x130
[  +0.000015]  ? __btrfs_free_extent+0xb26/0x11a0 [btrfs 
ba0d149848218bf804d988954b86cfb98d7e0e76]
[  +0.000189]  ? report_bug+0x171/0x1a0
[  +0.000019]  ? handle_bug+0x3c/0x80
[  +0.000010]  ? exc_invalid_op+0x17/0x70
[  +0.000009]  ? asm_exc_invalid_op+0x1a/0x20
[  +0.000020]  ? __btrfs_free_extent+0xb26/0x11a0 [btrfs 
ba0d149848218bf804d988954b86cfb98d7e0e76]
[  +0.000193]  __btrfs_run_delayed_refs+0x7a2/0x11d0 [btrfs 
ba0d149848218bf804d988954b86cfb98d7e0e76]
[  +0.000196]  btrfs_run_delayed_refs+0x91/0x200 [btrfs 
ba0d149848218bf804d988954b86cfb98d7e0e76]
[  +0.000189]  btrfs_commit_transaction+0x654/0xf00 [btrfs 
ba0d149848218bf804d988954b86cfb98d7e0e76]
[  +0.000209]  ? __pfx_autoremove_wake_function+0x10/0x10
[  +0.000019]  btrfs_ioctl_resize+0x450/0x480 [btrfs 
ba0d149848218bf804d988954b86cfb98d7e0e76]
[  +0.000254]  btrfs_ioctl+0x1e5/0x2420 [btrfs 
ba0d149848218bf804d988954b86cfb98d7e0e76]
[  +0.000244]  ? __wake_up_common_lock+0x8f/0xd0
[  +0.000016]  ? file_tty_write.isra.0+0x22a/0x350
[  +0.000012]  ? __pfx_n_tty_write+0x10/0x10
[  +0.000016]  __x64_sys_ioctl+0x91/0xd0
[  +0.000012]  do_syscall_64+0x5d/0x90
[  +0.000018]  ? syscall_exit_to_user_mode+0x1b/0x40
[  +0.000009]  ? do_syscall_64+0x6c/0x90
[  +0.000012]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  +0.000010] RIP: 0033:0x7f600bc5e76f
[  +0.000041] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 
00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 
05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 2
4 18 64 48 2b 04 25 28 00 00
[  +0.000006] RSP: 002b:00007fff6f81dd60 EFLAGS: 00000246 ORIG_RAX: 
0000000000000010
[  +0.000009] RAX: ffffffffffffffda RBX: 00007fff6f81f590 RCX: 
00007f600bc5e76f
[  +0.000005] RDX: 00007fff6f81def0 RSI: 0000000050009403 RDI: 
0000000000000003
[  +0.000005] RBP: 0000000000000003 R08: 0000000000000410 R09: 
0000000000000001
[  +0.000004] R10: 0000000000000003 R11: 0000000000000246 R12: 
00007fff6f81f594
[  +0.000005] R13: 000055a2cd8a3350 R14: 000055a2cca89872 R15: 
00007fff6f81def0
[  +0.000012]  </TASK>
[  +0.000002] ---[ end trace 0000000000000000 ]---
[  +0.000008] BTRFS info (device nvme0n1p3: state A): dumping space info:
[  +0.000007] BTRFS info (device nvme0n1p3: state A): space_info DATA 
has 256392597504 free, is not full
[  +0.000007] BTRFS info (device nvme0n1p3: state A): space_info 
total=1755577581568, used=1499184852992, pinned=0, reserved=0, 
may_use=0, readonly=131072 zone_unusable=0
[  +0.000010] BTRFS info (device nvme0n1p3: state A): space_info 
METADATA has -540672 free, is full
[  +0.000006] BTRFS info (device nvme0n1p3: state A): space_info 
total=68719476736, used=67883679744, pinned=355762176, 
reserved=479903744, may_use=540672, readonly=131072 zone_unusable
=0
[  +0.000009] BTRFS info (device nvme0n1p3: state A): space_info SYSTEM 
has 8208384 free, is not full
[  +0.000006] BTRFS info (device nvme0n1p3: state A): space_info 
total=8388608, used=147456, pinned=32768, reserved=0, may_use=0, 
readonly=0 zone_unusable=0
[  +0.000008] BTRFS info (device nvme0n1p3: state A): global_block_rsv: 
size 536870912 reserved 540672
[  +0.000006] BTRFS info (device nvme0n1p3: state A): trans_block_rsv: 
size 0 reserved 0
[  +0.000005] BTRFS info (device nvme0n1p3: state A): chunk_block_rsv: 
size 0 reserved 0
[  +0.000005] BTRFS info (device nvme0n1p3: state A): delayed_block_rsv: 
size 0 reserved 0
[  +0.000004] BTRFS info (device nvme0n1p3: state A): delayed_refs_rsv: 
size 14152892416 reserved 0
[  +0.000010] BTRFS: error (device nvme0n1p3: state A) in 
__btrfs_free_extent:3053: errno=-28 No space left
[  +0.001688] BTRFS info (device nvme0n1p3: state EA): forced readonly
[  +0.000008] BTRFS error (device nvme0n1p3: state EA): failed to run 
delayed ref for logical 687384526848 num_bytes 16384 type 176 action 2 
ref_mod 1: -28
[  +0.003143] BTRFS: error (device nvme0n1p3: state EA) in 
btrfs_run_delayed_refs:2127: errno=-28 No space left
[  +0.001079] BTRFS warning (device nvme0n1p3: state EA): Skipping 
commit of aborted transaction.
[  +0.000004] BTRFS: error (device nvme0n1p3: state EA) in 
cleanup_transaction:1978: errno=-28 No space left
[  +0.163882] BTRFS info (device nvme0n1p3: state EA): resize device 
/dev/nvme0n1p3 (devid 1) from 946517753856 to 957255172096

Is there anything else I can try to solve this situation? Can I somehow 
suspend the orphaning of the snapshot, to let it add more devices or 
space for the metadata?

I have exported the data through btrfs send when it is read-only.

Thank you


^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2023-07-06 18:16 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-06 18:15 ENOSPC Cannot add device or resize max due to Global reserve hit 512M Jiachen YANG

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.