* ENOSPC while df shows 826.93GiB free
@ 2021-12-07  2:29 Christoph Anton Mitterer
  2021-12-07  2:59 ` Qu Wenruo
  2021-12-07 15:39 ` Phillip Susi
  0 siblings, 2 replies; 20+ messages in thread
From: Christoph Anton Mitterer @ 2021-12-07  2:29 UTC (permalink / raw)
  To: linux-btrfs

Hey.

At the university I'm running a Tier-2 site for the Large Hadron
Collider, with a total storage of about 4 PB.

For a bit more than half of that I use btrfs, with HDDs combined into
hardware RAID volumes that are provided as 16 TiB devices (on which the
btrfs filesystems sit).

It runs Debian bullseye, which has kernel 5.10.70. Oh, and I've used
-R free-space-tree.
I don't use snapshots on these filesystems.


On one of the filesystems I've now run into ENOSPC.

# btrfs filesystem usage /srv/dcache/pools/2
Overall:
    Device size:		  16.00TiB
    Device allocated:		  16.00TiB
    Device unallocated:		   1.00MiB
    Device missing:		     0.00B
    Used:			  15.19TiB
    Free (estimated):		 826.93GiB	(min: 826.93GiB)
    Free (statfs, df):		 826.93GiB
    Data ratio:			      1.00
    Metadata ratio:		      2.00
    Global reserve:		 512.00MiB	(used: 0.00B)
    Multiple profiles:		        no

Data,single: Size:15.97TiB, Used:15.16TiB (94.94%)
   /dev/sdf	  15.97TiB

Metadata,DUP: Size:17.01GiB, Used:16.51GiB (97.06%)
   /dev/sdf	  34.01GiB

System,DUP: Size:8.00MiB, Used:2.12MiB (26.56%)
   /dev/sdf	  16.00MiB

Unallocated:
   /dev/sdf	   1.00MiB


yet:
# /srv/dcache/pools/2/foo
-bash: /srv/dcache/pools/2/foo: No such file or directory


balancing also fails, e.g.:
# btrfs balance start -dusage=50 /srv/dcache/pools/2
ERROR: error during balancing '/srv/dcache/pools/2': No space left on device
There may be more info in syslog - try dmesg | tail
# btrfs balance start -dusage=40 /srv/dcache/pools/2
Done, had to relocate 0 out of 16370 chunks
# btrfs balance start  /srv/dcache/pools/2
WARNING:

	Full balance without filters requested. This operation is very
	intense and takes potentially very long. It is recommended to
	use the balance filters to narrow down the scope of balance.
	Use 'btrfs balance start --full-balance' option to skip this
	warning. The operation will start in 10 seconds.
	Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting balance without any filters.
ERROR: error during balancing '/srv/dcache/pools/2': No space left on device
There may be more info in syslog - try dmesg | tail
# btrfs balance start -dusage=0 /srv/dcache/pools/2
Done, had to relocate 0 out of 16370 chunks




fsck showed no errors.



Any ideas what's going on and how to recover?


Thanks,
Chris.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ENOSPC while df shows 826.93GiB free
  2021-12-07  2:29 ENOSPC while df shows 826.93GiB free Christoph Anton Mitterer
@ 2021-12-07  2:59 ` Qu Wenruo
  2021-12-07  3:06   ` Christoph Anton Mitterer
  2021-12-07 15:39 ` Phillip Susi
  1 sibling, 1 reply; 20+ messages in thread
From: Qu Wenruo @ 2021-12-07  2:59 UTC (permalink / raw)
  To: Christoph Anton Mitterer, linux-btrfs



On 2021/12/7 10:29, Christoph Anton Mitterer wrote:
> Hey.
>
> At the university I'm running a Tier-2 site for the large hadron
> collider, with some total storage of 4 PB.
>
> For a bit more than half of that I use btrfs, with HDDs combined to
> some hardware raid, provided as 16TiB devices (on which the btrfs
> sits).
>
> It runs Debian bullseye, which has 5.10.70. Oh and I've used -R free-
> space-tree.
> I don't use snapshots on these filesystems.
>
>
> On one of the filesystems I've ran now into ENOSPC.
>
> # btrfs filesystem usage /srv/dcache/pools/2
> Overall:
>      Device size:		  16.00TiB
>      Device allocated:		  16.00TiB
>      Device unallocated:		   1.00MiB

All device space is allocated already.

>      Device missing:		     0.00B
>      Used:			  15.19TiB
>      Free (estimated):		 826.93GiB	(min: 826.93GiB)
>      Free (statfs, df):		 826.93GiB
>      Data ratio:			      1.00
>      Metadata ratio:		      2.00
>      Global reserve:		 512.00MiB	(used: 0.00B)
>      Multiple profiles:		        no
>
> Data,single: Size:15.97TiB, Used:15.16TiB (94.94%)
>     /dev/sdf	  15.97TiB
>
> Metadata,DUP: Size:17.01GiB, Used:16.51GiB (97.06%)

Your metadata is full. Although there is some free space (512M), that
is mostly taken by the global reserve, which is kept for very critical
operations.

Thus your metadata is effectively full.

>     /dev/sdf	  34.01GiB
>
> System,DUP: Size:8.00MiB, Used:2.12MiB (26.56%)
>     /dev/sdf	  16.00MiB
>
> Unallocated:
>     /dev/sdf	   1.00MiB
>
>
> yet:
> # /srv/dcache/pools/2/foo
> -bash: /srv/dcache/pools/2/foo: No such file or directory
>
>
> balancing also fails, e.g.:
> # btrfs balance start -dusage=50 /srv/dcache/pools/2

Since your metadata is full, btrfs can't reserve enough metadata space
to relocate a data chunk.

> ERROR: error during balancing '/srv/dcache/pools/2': No space left on device
> There may be more info in syslog - try dmesg | tail
> # btrfs balance start -dusage=40 /srv/dcache/pools/2
> Done, had to relocate 0 out of 16370 chunks
> # btrfs balance start  /srv/dcache/pools/2
> WARNING:
>
> 	Full balance without filters requested. This operation is very
> 	intense and takes potentially very long. It is recommended to
> 	use the balance filters to narrow down the scope of balance.
> 	Use 'btrfs balance start --full-balance' option to skip this
> 	warning. The operation will start in 10 seconds.
> 	Use Ctrl-C to stop it.
> 10 9 8 7 6 5 4 3 2 1
> Starting balance without any filters.
> ERROR: error during balancing '/srv/dcache/pools/2': No space left on device
> There may be more info in syslog - try dmesg | tail
> # btrfs balance start -dusage=0 /srv/dcache/pools/2
> Done, had to relocate 0 out of 16370 chunks
>
>
>
>
> fsck showed no errors.
>
>
>
> Any ideas what's going on and how to recover?

Since your metadata is already full, you may need to delete enough data
to free up sufficient metadata space.

Good candidates include small files (mostly inlined files) and large
files with many checksums.
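
(Rough numbers: with the default crc32c checksums, btrfs stores 4 bytes
of csum per 4KiB data block, i.e. roughly 1GiB of csum items per TiB of
data. For the ~15.2TiB of data here that alone is ~15GiB, which already
accounts for most of the 16.51GiB of metadata in use.)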

Thanks,
Qu

>
>
> Thanks,
> Chris.
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ENOSPC while df shows 826.93GiB free
  2021-12-07  2:59 ` Qu Wenruo
@ 2021-12-07  3:06   ` Christoph Anton Mitterer
  2021-12-07  3:29     ` Qu Wenruo
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Anton Mitterer @ 2021-12-07  3:06 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On Tue, 2021-12-07 at 10:59 +0800, Qu Wenruo wrote:
> 
> Since your metadata is already full, you may need to delete enough
> data
> to free up enough metadata space.
> 
> The candidates includes small files (mostly inlined files), and large
> files with checksums.

On that fs, there are rather many large files (800 MB - 1.5 GB).

Is there any way to get (much?) more space reserved for metadata in the
future, respectively on the other existing filesystems that haven't
deadlocked themselves yet?!


Thanks,
Chris.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ENOSPC while df shows 826.93GiB free
  2021-12-07  3:06   ` Christoph Anton Mitterer
@ 2021-12-07  3:29     ` Qu Wenruo
  2021-12-07  3:44       ` Christoph Anton Mitterer
  0 siblings, 1 reply; 20+ messages in thread
From: Qu Wenruo @ 2021-12-07  3:29 UTC (permalink / raw)
  To: Christoph Anton Mitterer, linux-btrfs



On 2021/12/7 11:06, Christoph Anton Mitterer wrote:
> On Tue, 2021-12-07 at 10:59 +0800, Qu Wenruo wrote:
>>
>> Since your metadata is already full, you may need to delete enough
>> data
>> to free up enough metadata space.
>>
>> The candidates includes small files (mostly inlined files), and large
>> files with checksums.
>
> On that fs, there are rather many large files (800MB - 1.5 GB).
>
> Is there anyway to get (much?) more space reserved for metadata in the
> future respectively on the other existing filesystems that haven't
> deadlocked themselves yet?!

In fact, this is not really a deadlock; only balance is blocked by this
problem.

For other regular operations, you either get ENOSPC just like on any
other fs that has run out of space, or they complete without problem.

Furthermore, balance in this case is not really the preferred way to
free up space; actually deleting data is the correct way to go.

Thanks,
Qu

>
>
> Thanks,
> Chris.
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ENOSPC while df shows 826.93GiB free
  2021-12-07  3:29     ` Qu Wenruo
@ 2021-12-07  3:44       ` Christoph Anton Mitterer
  2021-12-07  4:56         ` Qu Wenruo
  2021-12-07  7:21         ` Zygo Blaxell
  0 siblings, 2 replies; 20+ messages in thread
From: Christoph Anton Mitterer @ 2021-12-07  3:44 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On Tue, 2021-12-07 at 11:29 +0800, Qu Wenruo wrote:
> For other regular operations, you either got ENOSPC just like all
> other
> fses which runs out of space, or do it without problem.
> 
> Furthermore, balance in this case is not really the preferred way to
> free up space, really freeing up data is the correct way to go.

Well, but to be honest... that makes btrfs kinda broken for that
particular purpose.


The software which runs on the storage and provides the data to the
experiments does in fact make sure that the space isn't fully used (by
default, it leaves a gap of 4 GB).

While this gap is configurable, it seems a bit odd if one had to set it
to ~1 TB per fs... just to make sure that btrfs doesn't run out of
space for metadata.


And btrfs *does* show that plenty of space is left (always around 700-
800 GB)... so the application thinks it can happily continue to write,
while in fact it fails (and it cannot even start anymore, as it fails
to create lock files).


My understanding was that when not using --mixed, btrfs has separate
block groups for data and metadata.

And it seems here that the data block groups still have several hundred
GB free, while - AFAIU you - the metadata block groups are already full.



I also wouldn't want to balance regularly (which doesn't really seem to
help that much so far anyway)... because it puts quite some IO load on
the systems.


So if csum data needs so much space... why can't it simply reserve e.g.
60 GB for metadata instead of just 17 GB?



If I really had to reserve ~1 TB of storage to be left unused (per
16 TB fs) just to get that working... I would need to move stuff back
to ext4, because that's such a big loss we couldn't justify it to our
funding agencies.


And we haven't had that issue with e.g. ext4... that seems to reserve
just enough for metadata, so that we could basically fill up the fs
close to the end.



Cheers,
Chris.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ENOSPC while df shows 826.93GiB free
  2021-12-07  3:44       ` Christoph Anton Mitterer
@ 2021-12-07  4:56         ` Qu Wenruo
  2021-12-07 14:30           ` Christoph Anton Mitterer
  2021-12-07  7:21         ` Zygo Blaxell
  1 sibling, 1 reply; 20+ messages in thread
From: Qu Wenruo @ 2021-12-07  4:56 UTC (permalink / raw)
  To: Christoph Anton Mitterer, Qu Wenruo, linux-btrfs



On 2021/12/7 11:44, Christoph Anton Mitterer wrote:
> On Tue, 2021-12-07 at 11:29 +0800, Qu Wenruo wrote:
>> For other regular operations, you either got ENOSPC just like all
>> other
>> fses which runs out of space, or do it without problem.
>>
>> Furthermore, balance in this case is not really the preferred way to
>> free up space, really freeing up data is the correct way to go.
>
> Well but to be honest... that makes btrfs kinda broke for that
> particular purpose.
>
>
> The software which runs on the storage and provides the data to the
> experiments does in fact make sure that the space isn't fully used (per
> default, it leave a gap of 4GB).
>
> While this gap is configurable it seems a bit odd if one would have to
> set it to ~1TB per fs... just to make sure that btrfs doesn't run out
> of space for metadata.
>
>
> And btrfs *does* show that plenty of space is left (always around 700-
> 800 GB)... so the application thinks it can happily continue to write,
> while in fact it fails (and the cannot even start anymore as it fails
> to create lock files).

That's the problem with dynamic chunk allocation, and to be honest, I
don't have any better idea how to make it work just like traditional
fses.

You could consider it to be something like a thin-provisioned device,
which has the same problem (it reports tons of free space, but will
hang if the underlying space is used up).

>
>
> My understanding was the when not using --mixed, btrfs has block groups
> for data and metadata.
>
> And it seems here that the data block groups have several 100 GB still
> free, while - AFAIU you - the metadata block groups are already full.
>
>
>
> I also wouldn't want to regularly balance (which doesn't really seem to
> help that much so far)... cause it puts quite some IO load on the
> systems.
>
>
> So if csum data needs so much space... why can't it simply reserve e.g.
> 60 GB for metadata instead of just 17 GB?

Because all chunks are allocated on demand, if 1) your workload has a
very unbalanced data/metadata usage, like in this case (almost 1000:1),
and 2) you run out of unallocated space, then you will hit this
particular problem.

>
>
>
> If I really had to reserve ~ 1TB of storage to be unused (per 16TB fs)
> just to get that working... I would need to move stuff back to ext4,
> cause that's such a big loss we couldn't justify to our funding
> agencies.

It won't matter whether you reserve 1T for the data or not.

You can still hit the same problem even if there are tons of unused
data space: fragmented data space can still cause it.

>
>
> And we haven't had that issue with e.g. ext4 ... that seems to reserve
> just enough for meta, so that we could basically fill up the fs close
> to the end.

Ext4/XFS have a similar problem that is much harder to hit: inode
limits.

They use pre-determined inode limits (fixed at mkfs time), thus you can
run out of inodes before the free space is used up.

Tools like "df" have ways to report such limits, but unfortunately for
btrfs there is no such way other than using the btrfs-specific tools.
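
For example (the mount point is just a placeholder):

# df -i /mnt/pool                   <- ext4/XFS inode usage and limit
# btrfs filesystem usage /mnt/pool  <- btrfs chunk allocation instead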

Thanks,
Qu

>
>
>
> Cheers,
> Chris.
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ENOSPC while df shows 826.93GiB free
  2021-12-07  3:44       ` Christoph Anton Mitterer
  2021-12-07  4:56         ` Qu Wenruo
@ 2021-12-07  7:21         ` Zygo Blaxell
  2021-12-07 12:31           ` Jorge Bastos
                             ` (2 more replies)
  1 sibling, 3 replies; 20+ messages in thread
From: Zygo Blaxell @ 2021-12-07  7:21 UTC (permalink / raw)
  To: Christoph Anton Mitterer; +Cc: Qu Wenruo, linux-btrfs

On Tue, Dec 07, 2021 at 04:44:13AM +0100, Christoph Anton Mitterer wrote:
> On Tue, 2021-12-07 at 11:29 +0800, Qu Wenruo wrote:
> > For other regular operations, you either got ENOSPC just like all
> > other
> > fses which runs out of space, or do it without problem.
> > 
> > Furthermore, balance in this case is not really the preferred way to
> > free up space, really freeing up data is the correct way to go.
> 
> Well but to be honest... that makes btrfs kinda broke for that
> particular purpose.
> 
> 
> The software which runs on the storage and provides the data to the
> experiments does in fact make sure that the space isn't fully used (per
> default, it leave a gap of 4GB).
> 
> While this gap is configurable it seems a bit odd if one would have to
> set it to ~1TB per fs... just to make sure that btrfs doesn't run out
> of space for metadata.
> 
> 
> And btrfs *does* show that plenty of space is left (always around 700-
> 800 GB)... so the application thinks it can happily continue to write,
> while in fact it fails (and the cannot even start anymore as it fails
> to create lock files).
> 
> 
> My understanding was the when not using --mixed, btrfs has block groups
> for data and metadata.
> 
> And it seems here that the data block groups have several 100 GB still
> free, while - AFAIU you - the metadata block groups are already full.
> 
> 
> 
> I also wouldn't want to regularly balance (which doesn't really seem to
> help that much so far)... cause it puts quite some IO load on the
> systems.

If you minimally balance data (so that you keep 2GB unallocated at all
times) then it works much better: you can allocate the last metadata
chunk that you need to expand, and it requires only a few minutes of IO
per day.  After a while you don't need to do this any more, as a large
buffer of allocated but unused metadata will form.

If you need a drastic intervention, you can mount with metadata_ratio=1
for a short(!) time to allocate a lot of extra metadata block groups.
Combine with a data block group balance for a few blocks (e.g. -dlimit=9).

You need about (3 + number_of_disks) GB of allocated but unused metadata
block groups to handle the worst case (balance, scrub, and discard all
active at the same time, plus the required free metadata space).  Also
leave room for existing metadata to expand by about 50%, especially if
you have snapshots.
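
(For the 16TiB single-device filesystem above that would be, very
roughly: (3 + 1)GB = 4GiB of headroom, plus ~50% of the existing ~17GiB
of metadata = another ~8-9GiB, i.e. something on the order of 29-30GiB
of total metadata chunks rather than the current 17GiB.)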

Never balance metadata.  Balancing metadata will erase existing metadata
allocations, leading directly to this situation.

Free space search time goes up as the filesystem fills up.  The last 1%
of the filesystem will fill up significantly more slowly than the other
99%.  You might need to reserve 3% of the filesystem to keep latencies
down (ironically about the same amount that ext4 reserves).

There are some patches floating around to address these issues.

> So if csum data needs so much space... why can't it simply reserve e.g.
> 60 GB for metadata instead of just 17 GB?

It normally does.  Are you:

	- running metadata balances?  (Stop immediately.)

	- preallocating large files?  Checksums are allocated later, and
	naive usage of prealloc burns metadata space due to fragmentation.

	- modifying snapshots?	Metadata size increases with each
	modified snapshot.

	- replacing large files with a lot of very small ones?	Files
	below 2K are stored in metadata.  max_inline=0 disables this.

> If I really had to reserve ~ 1TB of storage to be unused (per 16TB fs)
> just to get that working... I would need to move stuff back to ext4,
> cause that's such a big loss we couldn't justify to our funding
> agencies.
> 
> 
> And we haven't had that issue with e.g. ext4 ... that seems to reserve
> just enough for meta, so that we could basically fill up the fs close
> to the end.
> 
> 
> 
> Cheers,
> Chris.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ENOSPC while df shows 826.93GiB free
  2021-12-07  7:21         ` Zygo Blaxell
@ 2021-12-07 12:31           ` Jorge Bastos
  2021-12-07 15:07           ` Christoph Anton Mitterer
  2021-12-07 15:10           ` Jorge Bastos
  2 siblings, 0 replies; 20+ messages in thread
From: Jorge Bastos @ 2021-12-07 12:31 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: Christoph Anton Mitterer, Qu Wenruo, Btrfs BTRFS

This looks to me like this issue I reported before:

https://lore.kernel.org/linux-btrfs/CAHzMYBSap30NbnPnv4ka+fDA2nYGHfjYvD-NgT04t4vvN4q2sw@mail.gmail.com/

Data,single: Size:15.97TiB, Used:15.16TiB (94.94%)

When this happens to me I can see that the data usage ratio is lower
than normal (there are mostly large files), and you can balance as much
as you like: the data ratio stays unchanged, and the unallocated space
gets to zero much sooner because of that. Most times there's no issue
and the data usage ratio is much higher, e.g. this filesystem could be
filled up until less than 4GB was available:

Data,RAID0: Size:10.89TiB, Used:10.89TiB (99.97%)

This one could only be filled up to about 300GB available:

Data,RAID0: Size:10.89TiB, Used:10.59TiB (97.26%)

Both contain only large 100GiB size files, both file systems were
filled from new in exactly the same way, one file at a time, no
snapshots, no modifications after the initial data copy.

Regards,
Jorge Bastos

On Tue, Dec 7, 2021 at 9:45 AM Zygo Blaxell
<ce3g8jdj@umail.furryterror.org> wrote:
>
> On Tue, Dec 07, 2021 at 04:44:13AM +0100, Christoph Anton Mitterer wrote:
> > On Tue, 2021-12-07 at 11:29 +0800, Qu Wenruo wrote:
> > > For other regular operations, you either got ENOSPC just like all
> > > other
> > > fses which runs out of space, or do it without problem.
> > >
> > > Furthermore, balance in this case is not really the preferred way to
> > > free up space, really freeing up data is the correct way to go.
> >
> > Well but to be honest... that makes btrfs kinda broke for that
> > particular purpose.
> >
> >
> > The software which runs on the storage and provides the data to the
> > experiments does in fact make sure that the space isn't fully used (per
> > default, it leave a gap of 4GB).
> >
> > While this gap is configurable it seems a bit odd if one would have to
> > set it to ~1TB per fs... just to make sure that btrfs doesn't run out
> > of space for metadata.
> >
> >
> > And btrfs *does* show that plenty of space is left (always around 700-
> > 800 GB)... so the application thinks it can happily continue to write,
> > while in fact it fails (and the cannot even start anymore as it fails
> > to create lock files).
> >
> >
> > My understanding was the when not using --mixed, btrfs has block groups
> > for data and metadata.
> >
> > And it seems here that the data block groups have several 100 GB still
> > free, while - AFAIU you - the metadata block groups are already full.
> >
> >
> >
> > I also wouldn't want to regularly balance (which doesn't really seem to
> > help that much so far)... cause it puts quite some IO load on the
> > systems.
>
> If you minimally balance data (so that you keep 2GB unallocated at all
> times) then it works much better: you can allocate the last metadata
> chunk that you need to expand, and it requires only a few minutes of IO
> per day.  After a while you don't need to do this any more, as a large
> buffer of allocated but unused metadata will form.
>
> If you need a drastic intervention, you can mount with metadata_ratio=1
> for a short(!) time to allocate a lot of extra metadata block groups.
> Combine with a data block group balance for a few blocks (e.g. -dlimit=9).
>
> You need about (3 + number_of_disks) GB of allocated but unused metadata
> block groups to handle the worst case (balance, scrub, and discard all
> active at the same time, plus the required free metadata space).  Also
> leave room for existing metadata to expand by about 50%, especially if
> you have snapshots.
>
> Never balance metadata.  Balancing metadata will erase existing metadata
> allocations, leading directly to this situation.
>
> Free space search time goes up as the filesystem fills up.  The last 1%
> of the filesystem will fill up significantly slower than the other 99%,
> You might need to reserve 3% of the filesystem to keep latencies down
> (ironically about the same amount that ext4 reserves).
>
> There are some patches floating around to address these issues.
>
> > So if csum data needs so much space... why can't it simply reserve e.g.
> > 60 GB for metadata instead of just 17 GB?
>
> It normally does.  Are you:
>
>         - running metadata balances?  (Stop immediately.)
>
>         - preallocating large files?  Checksums are allocated later, and
>         naive usage of prealloc burns metadata space due to fragmentation.
>
>         - modifying snapshots?  Metadata size increases with each
>         modified snapshot.
>
>         - replacing large files with a lot of very small ones?  Files
>         below 2K are stored in metadata.  max_inline=0 disables this.
>
> > If I really had to reserve ~ 1TB of storage to be unused (per 16TB fs)
> > just to get that working... I would need to move stuff back to ext4,
> > cause that's such a big loss we couldn't justify to our funding
> > agencies.
> >
> >
> > And we haven't had that issue with e.g. ext4 ... that seems to reserve
> > just enough for meta, so that we could basically fill up the fs close
> > to the end.
> >
> >
> >
> > Cheers,
> > Chris.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ENOSPC while df shows 826.93GiB free
  2021-12-07  4:56         ` Qu Wenruo
@ 2021-12-07 14:30           ` Christoph Anton Mitterer
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Anton Mitterer @ 2021-12-07 14:30 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On Tue, 2021-12-07 at 12:56 +0800, Qu Wenruo wrote:
> That's the problem with dynamic chunk allocation, and to be honest, I
> don't have any better idea how to make it work just like traditional
> fses.
> 
> You could consider it as something like thin-provisioned device,
> which
> would have the same problem (reporting tons of free space, but will
> hang
> if underlying space is used up).

Well, the first thing I don't understand is that my scenario seems
pretty... simple.

These filesystems have only few files (some 30k to perhaps 200k).
That seems far simpler than e.g. the fs of the system itself, where one
can have many files of completely varying size in /usr, /home, and so
on.

Also, these files (apart from some small metadata files) are *always*
written once and then only read (or deleted).
There is never any random write access... so fragmentation should be
far less than on "normal" systems.

The total size of the fs is obviously known.
You said now that the likely cause is the csum data... but isn't it
then kinda clear from the beginning how much you'd need (at most) if
the filesystem were filled up with data?


Just for my understanding:
How is csum data stored?
Is it one sum per fixed block size of data? Or one sum per (not fixed)
extent size of data?

In both cases I'd have assumed that the maximum space needed for that
is kinda predictable?
Unlike e.g. on a thin-provisioned device, or when using many (rw)
snapshots... where one cannot really predict how much storage will be
needed, because data diverges from the shared copy.



> Because all chunks are allocated on demand, if 1) your workload has
> every unbalanced data/metadata usage, like this case (almost 1000:1).
> 2) You run out of space, then you will hit this particular problem.

I've described the typical workload above:
rather large files (the data sets from the experiments), written once,
never any further writes to them, only deletions.

I'd have expected that this causes *far* less fragmentation than e.g.
filesystems that contain /home or so, where one has many random writes.


> It won't matter if you reserve 1T or not for the data.
> 
> It can still go the same problem even if there are tons of unused
> data
> space.
> Fragmented data space can still cause the same problem.

Let me try to understand this better:

btrfs allocates data block groups and meta-data block groups (both
dynamically), right?

Are these always of the same size (like e.g. always 1G)?

When I now write a 500 MB file... it would e.g. fill one such data
block group with 500 MB (and write some data into a metadata block
group).
And when I next write a 2 GB file... it would write the first 500 MB to
the already allocated data block group... and then allocate more to
write the remaining data.

Does that sound kinda right so far (simplified of course)?

The problem I had now was that the fs filled up more and more and (due
to fragmentation)... all free space is in data block groups... but
since no unallocated storage is left, it could not allocate more
metadata block groups.
So from the data PoV it could still write (i.e. there is free space),
because the fragmented data block groups still have some ~800 GiB
free... but it cannot write any more metadata.

Still kinda right?


So my naive assumption(s) would have been:
1) It's a sign that btrfs doesn't allocate metadata block groups
aggressively enough.

2) If I cure the fragmentation (in the data block groups)... and btrfs
could give those back... there would again be some unallocated space,
which it could use for metadata block groups... and so I could use more
of the remaining 800 GB, right?

Would balance already do this? I guess not, because AFAIU balance just
re-writes block groups as they are, right?
So that's the reason why balancing didn't help in any way?

So the proper way would be btrfs filesystem defragment... to thus
reclaim some unallocated space and get that for the metadata.
Right?


But still... that all seems like quite a lot of manual work (and thus
doesn't scale for a large data centre):
Would the defragmentation even work if the metadata is already out of
space?


Why would it not help if btrfs (pre-)reserved more metadata block
groups?
So maybe of the ~800 GB that are now still free (within data block
groups)... one would use e.g. 100 GB for metadata...
Of these 100 GB... 50 GB might never be used... but overall I could
still use ~700 GB in data block groups - whereas now both are
effectively lost (the full ~800 GB).




Are there any manual ways to say, in e.g. our use case:
don't just allocate 17 GB per fs for metadata... but allocate 80 GB
right away...

And wouldn't that cure our problem... by simply helping to (likely)
never reach the out-of-metadata-space situation?


Thanks,
Chris.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ENOSPC while df shows 826.93GiB free
  2021-12-07  7:21         ` Zygo Blaxell
  2021-12-07 12:31           ` Jorge Bastos
@ 2021-12-07 15:07           ` Christoph Anton Mitterer
  2021-12-07 18:14             ` Zygo Blaxell
  2021-12-07 15:10           ` Jorge Bastos
  2 siblings, 1 reply; 20+ messages in thread
From: Christoph Anton Mitterer @ 2021-12-07 15:07 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: Qu Wenruo, linux-btrfs

On Tue, 2021-12-07 at 02:21 -0500, Zygo Blaxell wrote:
> If you minimally balance data (so that you keep 2GB unallocated at
> all
> times) then it works much better: you can allocate the last metadata
> chunk that you need to expand, and it requires only a few minutes of
> IO
> per day.  After a while you don't need to do this any more, as a
> large
> buffer of allocated but unused metadata will form.

Hm, I've already asked Qu in the other mail just before whether/why
balancing would help there at all.

Doesn't it just re-write the block groups (but not defragment them)...
would that (and why) help to gain back unallocated space (which could
then be allocated for metadata)?

And what exactly do you mean by "minimally"? I mean, of course I can
use -dusage=20 or so... is it that?


But I guess all that wouldn't help now, when the unallocated space is
already used up, right?



> If you need a drastic intervention, you can mount with
> metadata_ratio=1
> for a short(!) time to allocate a lot of extra metadata block groups.
> Combine with a data block group balance for a few blocks (e.g. -
> dlimit=9).

All that seems rather impractical to do, to be honest. At least for a
non-expert admin.

First, these systems are production systems... so one doesn't want to
unmount (and do this procedure) when one sees that unallocated space is
running out.
One would rather want some way so that if one sees: unallocated space
gets low -> allocate this much more for metadata.

I guess there are no real/official tools out there for such
surveillance? Like Nagios/Icinga checks that look at the unallocated
space?



> You need about (3 + number_of_disks) GB of allocated but unused
> metadata
> block groups to handle the worst case (balance, scrub, and discard
> all
> active at the same time, plus the required free metadata space). 
> Also
> leave room for existing metadata to expand by about 50%, especially
> if
> you have snapshots.



> Never balance metadata.  Balancing metadata will erase existing
> metadata
> allocations, leading directly to this situation.

Wouldn't that only unallocate such allocations that are completely
empty?

> > So if csum data needs so much space... why can't it simply reserve
> > e.g. 60 GB for metadata instead of just 17 GB?
> 
> It normally does.  Are you:
> 
>         - running metadata balances?  (Stop immediately.)

Nope, I did that once accidentally (-musage=0 ... copy&pasted the wrong
one), but only *after* the filesystem got stuck...


>         - preallocating large files?  Checksums are allocated later,
> and
>         naive usage of prealloc burns metadata space due to
> fragmentation.

Hmm... not so sure about that... (I mean, I don't know what the storage
middleware, which is www.dcache.org, does)... but it would probably do
this only for one to a few such large files at once, if at all.


>         - modifying snapshots?  Metadata size increases with each
>         modified snapshot.

No snapshots are used at all on these filesystems.


>         - replacing large files with a lot of very small ones?  Files
>         below 2K are stored in metadata.  max_inline=0 disables this.

I guess you mean here:
First many large files were written... unallocated space is used up
(with data and metadata block groups).
Then large files are deleted... data block groups get fragmented (but
are not unallocated again, because they're not empty).

Then loads of small files would be written (inline)... which then
fails, as metadata space would fill up even faster, right?


Well, we do have filesystems where there may be *many* small files...
but I guess still all around the range of 1 MB or more. I don't think
we have lots of files below 2K... if at all.


So I don't think that we have this IO pattern.

It rather seems simply as if btrfs doesn't reserve metadata
aggressively enough (at least not in our case)... and that too much is
allocated for data... and when that is actually filled, it cannot
allocate enough for metadata anymore.



Thanks,
Chris.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ENOSPC while df shows 826.93GiB free
  2021-12-07  7:21         ` Zygo Blaxell
  2021-12-07 12:31           ` Jorge Bastos
  2021-12-07 15:07           ` Christoph Anton Mitterer
@ 2021-12-07 15:10           ` Jorge Bastos
  2021-12-07 15:22             ` Christoph Anton Mitterer
  2 siblings, 1 reply; 20+ messages in thread
From: Jorge Bastos @ 2021-12-07 15:10 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: Christoph Anton Mitterer, Qu Wenruo, Btrfs BTRFS

Hi,

Disregard my last email, it is the same issue of metadata exhaustion; I
just didn't understand why two identical filesystems used in the same
way on the same server behaved so differently. If I convert metadata
from raid1 to single it leaves some extra metadata chunks, and I was
then able to fill up the data chunks. Of course, now I can't balance
metadata back to raid1 again. I wish there were an easy way to allocate
an extra metadata chunk while there's still available space.

Regards,
Jorge Bastos

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ENOSPC while df shows 826.93GiB free
  2021-12-07 15:10           ` Jorge Bastos
@ 2021-12-07 15:22             ` Christoph Anton Mitterer
  2021-12-07 16:11               ` Jorge Bastos
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Anton Mitterer @ 2021-12-07 15:22 UTC (permalink / raw)
  To: Jorge Bastos, Zygo Blaxell; +Cc: Qu Wenruo, Btrfs BTRFS

Hey Jorge.

I've looked at your old mail thread... and the first case you showed:
btrfs fi usage /mnt/disk4
Overall:
    Device size:                   7.28TiB
    Device allocated:              7.28TiB
    Device unallocated:            1.04MiB
    Device missing:                  0.00B
    Used:                          7.24TiB
    Free (estimated):             34.55GiB      (min: 34.55GiB)
    Free (statfs, df):            34.55GiB
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)
    Multiple profiles:                  no

Data,single: Size:7.26TiB, Used:7.22TiB (99.54%)
   /dev/md4        7.26TiB

Metadata,DUP: Size:9.50GiB, Used:8.45GiB (88.93%)
   /dev/md4       19.00GiB

System,DUP: Size:32.00MiB, Used:800.00KiB (2.44%)
   /dev/md4       64.00MiB

Unallocated:
   /dev/md4        1.04MiB


Seems similar to my problem... but far less extreme... so that I
personally would say I could live with that.

Of data you "lose" 0.04 TiB... so ~40 GiB... and of metadata you "lose"
1.45 GiB.


It's a bit strange IMO that you then get ENOSPC when your metadata
still has 1 GB free (I thought it would reserve less?).

But still, out of 7.28 TiB that's ~0.556%... not sooo much.


I, in contrast, have:
829.44 + 0.5 = 829.94 GiB "lost".
Which, out of 16.00 TiB, is some ~5.066% lost... which seems like
rather a lot.


Cheers,
Chris.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ENOSPC while df shows 826.93GiB free
  2021-12-07  2:29 ENOSPC while df shows 826.93GiB free Christoph Anton Mitterer
  2021-12-07  2:59 ` Qu Wenruo
@ 2021-12-07 15:39 ` Phillip Susi
  2021-12-16  3:47   ` Christoph Anton Mitterer
  1 sibling, 1 reply; 20+ messages in thread
From: Phillip Susi @ 2021-12-07 15:39 UTC (permalink / raw)
  To: Christoph Anton Mitterer; +Cc: linux-btrfs


Christoph Anton Mitterer <calestyo@scientia.org> writes:

> yet:
> # /srv/dcache/pools/2/foo
> -bash: /srv/dcache/pools/2/foo: No such file or directory

I'm not sure what you intended this to do or show.  It looks like you
tried to execute a program named /srv/dcache/pools/2/foo, and there is
no such program.  That doesn't say anything about the filesystem.

> balancing also fails, e.g.:
> # btrfs balance start -dusage=50 /srv/dcache/pools/2
> ERROR: error during balancing '/srv/dcache/pools/2': No space left on device

Balance is basically like a defrag.  You have less than 0.01% space
free, which is not enough to do a defrag.  Either free up some space, or
don't bother trying to defrag.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ENOSPC while df shows 826.93GiB free
  2021-12-07 15:22             ` Christoph Anton Mitterer
@ 2021-12-07 16:11               ` Jorge Bastos
  0 siblings, 0 replies; 20+ messages in thread
From: Jorge Bastos @ 2021-12-07 16:11 UTC (permalink / raw)
  To: Christoph Anton Mitterer; +Cc: Zygo Blaxell, Qu Wenruo, Btrfs BTRFS

On Tue, Dec 7, 2021 at 3:22 PM Christoph Anton Mitterer
<calestyo@scientia.org> wrote:
>
> Hey Jorge.
>
> I've looked at your old mail thread... and the first case you've
> showed:
> btrfs fi usage /mnt/disk4

Hi,

Thanks for the reply, that one was fine, it was the "good" example,
the next one was the problem, /mnt/disk3, but like mentioned I figured
out it's the metadata exhaustion issue.

Regards,
Jorge Bastos

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ENOSPC while df shows 826.93GiB free
  2021-12-07 15:07           ` Christoph Anton Mitterer
@ 2021-12-07 18:14             ` Zygo Blaxell
  2021-12-16 23:16               ` Christoph Anton Mitterer
  0 siblings, 1 reply; 20+ messages in thread
From: Zygo Blaxell @ 2021-12-07 18:14 UTC (permalink / raw)
  To: Christoph Anton Mitterer; +Cc: Qu Wenruo, linux-btrfs

On Tue, Dec 07, 2021 at 04:07:32PM +0100, Christoph Anton Mitterer wrote:
> On Tue, 2021-12-07 at 02:21 -0500, Zygo Blaxell wrote:
> > If you minimally balance data (so that you keep 2GB unallocated at
> > all
> > times) then it works much better: you can allocate the last metadata
> > chunk that you need to expand, and it requires only a few minutes of
> > IO
> > per day.  After a while you don't need to do this any more, as a
> > large
> > buffer of allocated but unused metadata will form.
> 
> Hm I've already asked Qu in the other mail just before, whether/why
> balancing would help there at all.
> 
> Doesn't it just re-write the block groups (but not defragment them...)
> would that (and why) help to gain back unallocated space (which could
> then be allocated for meta-data)?

It coalesces the free space in each block group into big contiguous
regions, eventually growing them to regions over 1GB in size.  Usually
this gives back unallocated space.

If balance can't pack the extents in 1GB units without changing their
sizes or crossing a block group boundary, then balance might not be
able to free any block groups this way, so this tends to fail when the
filesystem is over about 97% full.  It's important to run the minimal
data balances _before_ this happens, as it's too late to allocate
metadata after.

> And what exactly do you mean with "minimally"? I mean of course I can
> use -dusage=20 or so... is it that?

Minimal balance is exactly one data block group, i.e.

	btrfs balance start -dlimit=1 /fs

Run it when unallocated space gets low.  The exact threshold is low
enough that the time between new data block group allocations is less
than the balance time.
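
A sketch of what that could look like from cron (untested; the 2GiB
threshold and the parsing of 'btrfs filesystem usage -b' output are
just examples):

	#!/bin/sh
	# rewrite one data block group whenever unallocated space gets low
	FS=/srv/dcache/pools/2
	MIN=$((2 * 1024 * 1024 * 1024))   # keep at least ~2GiB unallocated
	UNALLOC=$(btrfs filesystem usage -b "$FS" |
	          awk '/Device unallocated:/ { print $3 }')
	if [ "$UNALLOC" -lt "$MIN" ]; then
	        btrfs balance start -dlimit=1 "$FS"
	fi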

Usage filter is OK for one-off interventions, but repeated use eventually
leads to a filesystem full of block groups that are filled to the
threshold in the usage filter, and no unallocated space.

> But I guess all that wouldn't help now, when the unallocated space is
> already used up, right?

If you have many GB of free space in the block groups, then usually
one can be freed up.  After that, it's a straightforward slot-puzzle,
packing data into the unallocated space.

If the free space is too fragmented or the extents are too large, then
it will not be possible to recover without adding disk space or deleting
data.

> > If you need a drastic intervention, you can mount with
> > metadata_ratio=1
> > for a short(!) time to allocate a lot of extra metadata block groups.
> > Combine with a data block group balance for a few blocks (e.g. -
> > dlimit=9).
> 
> All that seems rather impractical do to, to be honest. At least for an
> non-expert admin.
> 
> First, these systems are production systems... so one doesn't want to
> unmount (and do this procedure) when one sees that unallocated space
> runs out.

I think remount suffices, but I haven't checked.  The mount option is
checked at block allocation time in the code, so it should be possible
to change it live.

It has to be run for a short time because metadata_ratio=1 means 1:1
metadata to data allocation.  You only want to do this to rescue a
filesystem that has become stuck with too little metadata.  Once the
required amount of metadata is allocated, remove the metadata_ratio
option and do minimal data balancing going forward.
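
Roughly like this (untested sketch; the mount point and the -dlimit
value are just examples):

	# temporarily force 1:1 metadata:data chunk allocation
	mount -o remount,metadata_ratio=1 /srv/dcache/pools/2
	# rewrite a few data block groups so new chunks get allocated
	btrfs balance start -dlimit=9 /srv/dcache/pools/2
	# back to the default behaviour once enough metadata chunks exist
	mount -o remount,metadata_ratio=0 /srv/dcache/pools/2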

> One would rather want some way that if one sees: unallocated space gets
> low -> allocate so and so much for meta data

You can set metadata_ratio=30, which will allocate (100 / 30) = ~3%
of the space for metadata, if you are starting with an empty filesystem.

> I guess there are no real/official tools out there for such
> surveillance? Like Nagios/Icinga checks, that look at the unallocated
> space?

TBH it's never been a problem--but I run the minimal data balance daily,
and scrub every month, and never balance metadata, and have snapshots
and dedupe.  Between these they trigger all the necessary metadata
allocations.

> > You need about (3 + number_of_disks) GB of allocated but unused
> > metadata
> > block groups to handle the worst case (balance, scrub, and discard
> > all
> > active at the same time, plus the required free metadata space). 
> > Also
> > leave room for existing metadata to expand by about 50%, especially
> > if
> > you have snapshots.
> 
> 
> 
> > Never balance metadata.  Balancing metadata will erase existing
> > metadata
> > allocations, leading directly to this situation.
> 
> Wouldn't that only unallocated such allocations, that are completely
> empty?

It will repack existing metadata into existing metadata block groups,
which _creates_ empty block groups (i.e. it removes all the data from
existing groups), then it removes the empty groups.  That's the opposite of
what you want:  you want extra unused space to be kept in the metadata
block groups, so that metadata can expand without having to compete with
data for new block group allocations.

> > > So if csum data needs so much space... why can't it simply reserve
> > > e.g. 60 GB for metadata instead of just 17 GB?
> > 
> > It normally does.  Are you:
> > 
> >         - running metadata balances?  (Stop immediately.)
> 
> Nope, I did once accidentally (-musage=0 ... copy&pasted the wrong one)
> but only *after* the filesystem got stuck...

That can only do one of two things:  have no effect, or make it worse.

> >         - preallocating large files?  Checksums are allocated later,
> > and
> >         naive usage of prealloc burns metadata space due to
> > fragmentation.
> 
> Hmm... not so sure about that... (I mean I don't know what the storage
> middleware, which is www.dcache.org, does)... but it would probably do
> this only for 1 to few such large files at once, if at all.
> 
> 
> >         - modifying snapshots?  Metadata size increases with each
> >         modified snapshot.
> 
> No snapshots are used at all on these filesystems.
> 
> 
> >         - replacing large files with a lot of very small ones?  Files
> >         below 2K are stored in metadata.  max_inline=0 disables this.
> 
> I guess you mean here:
> First many large files were written... unallocated space is used up
> (with data and meta-data block groups).
> Then, large files are deleted... data block groups get fragmented (but
> not unallocated acagain, because they're not empty.
> 
> Then loads of small files would be written (inline)... which then fails
> as meta-data space would fill up even faster, right?

Correct.

> Well we do have filesystems, where there may be *many* small files..
> but I guess still all around the range of 1MB or more. I don't think we
> have lots of files below 2K.. if at all.

In theory if the average file size decreases drastically it can change
the amount of metadata required and maybe require an increase in
metadata ratio after the metadata has been allocated.

Another case happens when you suddenly start using a lot of reflinks
when the filesystem is already completely allocated.

It is possible to contrive cases where metadata usage approaches 100%
of the filesystem, so there's no such thing as allocating "enough"
metadata space for all use cases.

> So I don't think that we have this IO pattern.
> 
> It rather seems simply as if btrfs wouldn't reserve meta-data
> aggressively enough (at least not in our case)... and that to much is
> allocated for data.. and when that is actually filled, it cannot
> allocate anymore enough for metadata.

That's possible (and there are patches attempting to address it).
We don't want to be too aggressive, or the disk fills up with unused
metadata allocations...but we need to be about 5 block groups more
aggressive than we are now to handle special cases like "mount and
write until full without doing any backups or maintenance."

A couple more suggestions (more like exploitable side-effects):

	- Run regular scrubs.  If a write occurs to a block group
	while it's being scrubbed, there's an extra metadata block
	group allocation.

	- Mount with -o ssd.  This makes metadata allocation more
	aggressive (though it also requires more metadata allocation,
	so like metadata_ratio, it might be worth turning off after
	the filesystem fills up).

> 
> 
> Thanks,
> Chris.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ENOSPC while df shows 826.93GiB free
  2021-12-07 15:39 ` Phillip Susi
@ 2021-12-16  3:47   ` Christoph Anton Mitterer
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Anton Mitterer @ 2021-12-16  3:47 UTC (permalink / raw)
  To: Phillip Susi; +Cc: linux-btrfs

On Tue, 2021-12-07 at 10:39 -0500, Phillip Susi wrote:
> I'm not sure what you intended this to do or show.  It looks like you
> tried to execute a program named /srv/dcache/pools/2/foo, and there
> is
> no such program.  That doesn't say anything about the filesystem.

Ooops, sorry... that was some wrong copy&pasting.

What I actually did was merely a
# touch /srv/dcache/pools/2/foo

which already gave me the ENOSPC (for the reasons that have now already
been explained... i.e. no (usable) free space in the metadata and no
unallocated space either).


Thanks,
Chris.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ENOSPC while df shows 826.93GiB free
  2021-12-07 18:14             ` Zygo Blaxell
@ 2021-12-16 23:16               ` Christoph Anton Mitterer
  2021-12-17  2:00                 ` Qu Wenruo
  2021-12-17  5:53                 ` Zygo Blaxell
  0 siblings, 2 replies; 20+ messages in thread
From: Christoph Anton Mitterer @ 2021-12-16 23:16 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: Qu Wenruo, linux-btrfs

On Tue, 2021-12-07 at 13:14 -0500, Zygo Blaxell wrote:
> It coalesces the free space in each block group into big contiguous
> regions, eventually growing them to regions over 1GB in size. 
> Usually
> this gives back unallocated space.

Ah, I see... and yes, that worked.

Not sure if I missed anything, but I think this should somehow be
explained in btrfs-balance(8).
I mean, there *is* the section "MAKING BLOCK GROUP LAYOUT MORE COMPACT",
but that also kinda misses the point that this can be used to get
unallocated space back, doesn't it?


Is there some way to see a distribution of the space usage of the block
groups?
Like some printout that shows me:
- there are n block groups
- xx = 100%
- xx > 90%
- xx > 80%
...
- xx = 0%
?

That would also give a better idea of how worthwhile it is to balance,
and which options to use.
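
(I guess one could hack something together by dumping the extent tree
and aggregating the BLOCK_GROUP_ITEMs, e.g. - completely untested, and
the field positions depend on the btrfs-progs dump-tree output format:

	btrfs inspect-internal dump-tree -t extent /dev/sdf | awk '
	    /BLOCK_GROUP_ITEM/ { gsub(/\)/, ""); len = $6 }  # key offset = block group length
	    /block group used/ { print int(100 * $4 / len / 10) * 10 }  # used -> 10% bucket
	' | sort -n | uniq -c

...but a proper option in the btrfs tools would of course be nicer.)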


> If balance can't pack the extents in 1GB units without changing their
> sizes or crossing a block group boundary, then balance might not be
> able to free any block groups this way, so this tends to fail when
> the
> filesystem is over about 97% full.

So that's basically the point where one can only move data away... do
the balance, and move it back afterwards.

Which, btw, worked quite nicely (so thanks to all the people involved
for the help with that).


> Minimal balance is exactly one data block group, i.e.
> 
>         btrfs balance start -dlimit=1 /fs
> 
> Run it when unallocated space gets low.  The exact threshold is low
> enough that the time between new data block group allocations is less
> than the balance time.

What the sysadmin of large storage farms needs is something that one
can run basically all the time (so even if unallocated space is NOT
low), which kinda works out of the box and automatically (run via
cron?), and doesn't impact the IO too much.
Or one would need some daemon which monitors unallocated space and
kicks in if necessary.

Does it make sense to use -dusage=xx in addition to -dlimit?
I mean, if space is already tight... would just -dlimit=1 try to find a
block group that it can balance (because its usage is low enough)... or
might it just fail when the first one tried is nearly full (and not
enough space is left for it in the other block groups)?


> It has to be run for a short time because metadata_ratio=1 means 1:1
> metadata to data allocation.  You only want to do this to rescue a
> filesystem that has become stuck with too little metadata.  Once the
> required amount of metadata is allocated, remove the metadata_ratio
> option and do minimal data balancing going forward.

But that's also something only really suitable for "rescuing"... one
wouldn't want to do that on big storage systems with hundreds of
filesystems just to make sure that btrfs doesn't run into that
situation in the first place.

For that it would be much nicer if one had other means to tell btrfs to
allocate more for metadata... like either a command to reserve xx GB
that one can run when one sees that space is getting tight... or some
better logic by which btrfs does that automatically.


> You can set metadata_ratio=30, which will allocate (100 / 30) = ~3%
> of the space for metadata, if you are starting with an empty
> filesystem.

Okay that sounds more like a way...


> TBH it's never been a problem--but I run the minimal data balance
> daily,
> and scrub every month, and never balance metadata, and have snapshots
> and dedupe.  Between these they trigger all the necessary metadata
> allocations.

I'm also still not really sure why this happened here.

I've asked the developers of our storage middleware software in the
meantime, and it seems in fact that dCache does pre-allocate the space
of files that it wants to write.

But even then, shouldn't btrfs be able to know how much it will
generally need for csum metadata?

I can only think of IO patterns where one would end up with too
aggressive meta-data allocation (e.g. when writing lots of directories
or XATTRS) and where not enough data block groups are left.

But the other way round?
If one writes very small files (so that they are inlined) -> meta-data
should grow.

If one writes non-inlined files, regardless of whether small or big...
shouldn't it always be clear how much space could be needed for csum
meta-data, when a new block group is allocated for data and if that
would be fully written?


> In theory if the average file size decreases drastically it can
> change
> the amount of metadata required and maybe require an increase in
> metadata ratio after the metadata has been allocated.

I cannot totally rule this out, but it's pretty unlikely.


> Another case happens when you suddenly start using a lot of reflinks
> when the filesystem is already completely allocated.

That I can rule out, we didn't make any snapshots or ref-copies.


> That's possible (and there are patches attempting to address it).
> We don't want to be too aggressive, or the disk fills up with unused
> metadata allocations...but we need to be about 5 block groups more
> aggressive than we are now to handle special cases like "mount and
> write until full without doing any backups or maintenance."

Wouldn't a "simple" (at least in my mind ;-) ) solution be, that:
- if the case arises, that either data or meta-data block groups are
  full
- and not unallocated space is left
- and if the other kind of block groups has plenty of free space left
  (say in total something like > 10 times the size of a block group...
  or maybe more (depending on the total filesystem size), cause one
  probably doesn't want to shuffle loads of data around, just for the
  last 0.005% to be squeezed out.)
then:
- btrfs automatically does the balance?
  Or maybe something "better" that also works when it would need to
  break up extents?

If there are cases where one doesn't like that automatic shuffling, one
could make it opt-in via some mount option.


> A couple more suggestions (more like exploitable side-effects):
> 
>         - Run regular scrubs.  If a write occurs to a block group
>         while it's being scrubbed, there's an extra metadata block
>         group allocation.

But writes during scrubs would only happen when it finds any corrupted
blocks?


Thanks,
Chris.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ENOSPC while df shows 826.93GiB free
  2021-12-16 23:16               ` Christoph Anton Mitterer
@ 2021-12-17  2:00                 ` Qu Wenruo
  2021-12-17  3:10                   ` Christoph Anton Mitterer
  2021-12-17  5:53                 ` Zygo Blaxell
  1 sibling, 1 reply; 20+ messages in thread
From: Qu Wenruo @ 2021-12-17  2:00 UTC (permalink / raw)
  To: Christoph Anton Mitterer, Zygo Blaxell; +Cc: linux-btrfs



[...]
>> That's possible (and there are patches attempting to address it).
>> We don't want to be too aggressive, or the disk fills up with unused
>> metadata allocations...but we need to be about 5 block groups more
>> aggressive than we are now to handle special cases like "mount and
>> write until full without doing any backups or maintenance."
>
> Wouldn't a "simple" (at least in my mind ;-) ) solution be, that:
> - if the case arises, that either data or meta-data block groups are
>    full
> - and not unallocated space is left
> - and if the other kind of block groups has plenty of free space left
>    (say in total something like > 10 times the size of a block group...
>    or maybe more (depending on the total filesystem size), cause one
>    probably doesn't want to shuffle loads of data around, just for the
>    last 0.005% to be squeezed out.)
> then:
> - btrfs automatically does the balance?
>    Or maybe something "better" that also works when it would need to
>    break up extents?

Or, let's change what our vanilla `df` command outputs, by taking
metadata free space and unallocated space into consideration, like:

- If there is plenty of unallocated space:
   Keep the current output.

- If there is no more unallocated space that can be utilized:
   Then take metadata free space into consideration; e.g. if there is
   only 1G of free metadata space but several TB of free data space,
   we only report free metadata space * some ratio as free data space.

   And if by some magic calculation we determine that even balance
   won't free up any space, we return available space as 0 directly.

By this we under-report the amount of available space. Although users
may (and in most cases indeed can) write way more than the reported
available space, we have done our best to show end users that they
need to take care of the fs - either by deleting unused data, or by
doing proper maintenance before the reported available space reaches 0.

By this, your existing space reservation tool will work much better
than in your current situation, and you will get enough early warning
before reaching the current situation.

But I'm afraid this could noticeably drop the disk utilization, as we
would become too cautious in reporting available space.

Thanks,
Qu

>
> If there are cases where one doesn't like that automatic shuffling, one
> could make it opt-in via some mount option.
>
>
>> A couple more suggestions (more like exploitable side-effects):
>>
>>          - Run regular scrubs.  If a write occurs to a block group
>>          while it's being scrubbed, there's an extra metadata block
>>          group allocation.
>
> But writes during scrubs would only happen when it finds any corrupted
> blocks?
>
>
> Thanks,
> Chris.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ENOSPC while df shows 826.93GiB free
  2021-12-17  2:00                 ` Qu Wenruo
@ 2021-12-17  3:10                   ` Christoph Anton Mitterer
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Anton Mitterer @ 2021-12-17  3:10 UTC (permalink / raw)
  To: Qu Wenruo, Zygo Blaxell; +Cc: linux-btrfs

On Fri, 2021-12-17 at 10:00 +0800, Qu Wenruo wrote:
> Or, let's change how we compute our vanilla `df` command output, by
> taking metadata free space and unallocated space into consideration,
> like:

Actually I was thinking about this before as well, but that would
rather just remedy the consequences of that particular ENOSPC situation
and not prevent it.



> - If there is no more unallocated space that can be utilized
>    Then take metadata free space into consideration: e.g. if there is
>    only 1G of free metadata space but several TiB of free data space,
>    we only report free metadata space * some ratio as free data space.

Not sure whether this is so good... because then the shown free space
is completely made up... it could end up being that value if the
remaining unallocated space and the remaining meta-data space get eaten
up as "anticipated"... but it could also be much more or much less
(depending on what actually happens), right?

What I'd rather do is:
*If* btrfs realises that there's still free space in the data block
groups... but really nothing at all (that isn't reserved for special
operations) is left in the meta-data block groups AND nothing more
could be allocated... then suddenly drop the shown free space to
exactly 0.

Because from a classic program's point of view, that's effectively the
case: it cannot add any further files (not even empty ones).


This would also allow programs like dCache to better deal with that
situation.
What dCache does is laid out here:
https://github.com/dCache/dcache/issues/5352#issuecomment-989793555

Perhaps some background... dCache is a distributed storage system, so
it runs on multiple nodes managing files placed in many filesystems (on
so-called pools).
Clients first connect via some protocol to a "door node", from which
they are (at least if the respective protocol supports it) redirected
to a pool where dCache thinks the file can be written (in the write
case, obviously).

dCache decides that by knowing all its pools and monitoring their
(filesystems') free space. It also has a configurable gap value
(defaulting to 4GB), which it will try to leave free on a pool.

If the file is expected to fit (I think it again depends on the
protocol whether it really knows in advance how much the client will
write) while still observing the gap... plus several more load
balancing metrics... a pool may be selected and the client redirected.

Seems to me like a fairly reasonable process.
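
Very roughly (this is just how I understand it; invented names, not
actual dCache code), the selection boils down to something like:

# Simplified illustration of the pool selection as I understand it;
# NOT actual dCache code.
GAP = 4 * 1024**3   # configurable gap dCache tries to leave free (default 4GB)

def select_pool(pools, expected_file_size):
    """pools: list of (name, reported_free_bytes, load_metric) tuples."""
    candidates = [p for p in pools
                  if p[1] - expected_file_size >= GAP]  # fits, keeping the gap
    if not candidates:
        return None                 # no pool can take the file
    # In reality several load-balancing metrics are combined; here we
    # simply pick the least loaded candidate.
    return min(candidates, key=lambda p: p[2])[0]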


So as things currently stand with btrfs, when that particular
situation arises that I've hit now (plenty of free space in the data
block groups, but zero in the meta-data block groups plus zero
unallocated space), dCache cannot really deal properly with it:

- df (respectively the usual syscalls) will show it that much more
space is available than what the gap could protect against

- the client tries to write to the pool, there's immediately ENOSPC and
the transfer is properly aborted with some failure

- but dCache cannot really tell whether the situation is still there or
not... so it will run into broken write transfers over and over

- typically also, once a client is redirected to a pool, there is no
going back and retrying the same on another one (at least not
automatically from within the protocol)... so the failure is really
"permanent", unless the client itself tries again and then (by chance)
lands on another pool where the btrfs is still good


If df (respectively the syscalls) returned 0 free space in that
situation, we'd still have ~800 GB lost (without manual
intervention)... but at least the middleware should be able to deal
with that.



> By this we under-report the amount of available space. Although users
> may (and in most cases, they indeed can) write way more data than the
> reported available space, we have done our best to show end users that
> they need to take care of the fs, either by deleting unused data or by
> doing proper maintenance before the reported available space reaches 0.

Well, but at least when the problem has happened, then - without any
further intervention - no further writes (of new files respectively new
data) will be possible... so the "under-reporting" is only true if one
assumes that this intervention will happen.

If it does, like by some "minimal" maintenance balance as Zygo
suggested, then the whole situation should not happen anyway, AFAIU.
And if it's by some intervention after the ENOSPC, then the "under-
reporting" would also go away as soon as the problem was fixed
(manually).




But what do you think about my idea of btrfs automatically solving the
situation by doing a balance on its own, once the problem has arisen?


One could also think of something like the following:
Add some 2nd-level global reserve, which is much bigger than the
current one... at least big enough that one could manually balance the
fs (or btrfs does that automatically if it decides it needs to).

If the problem from this mail thread occurs, it could be used to solve
it more easily, without the need to move data somewhere else (which may
not always be feasible), because that space would be reserved to be
used e.g. for such a balance.

One could make its size dependent on the size of the fs. If the fs has
e.g. 1TB, then reserving e.g. 4GB is barely noticeable. And if the fs
should be too small, one simply doesn't have the 2nd-level global
reserve.

If(!) the fs runs full in a proper way (i.e. no more unallocated space
and meta-data and data block groups are equally full), then btrfs could
decide to release that 2nd-level global reserve back for normal use, to
squeeze out as much space as possible without losing too much.

Once it's really full, it's full and not much new could happen
anyway... and the normal global reserve would still be there for the
*very* important things.

If files should later on be deleted, btrfs could decide to try to re-
establish the 2nd-level global reserve... again to be "reserved" until
the fs is really, really full again and it would otherwise just be
wasted space.


Cheers,
Chris.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ENOSPC while df shows 826.93GiB free
  2021-12-16 23:16               ` Christoph Anton Mitterer
  2021-12-17  2:00                 ` Qu Wenruo
@ 2021-12-17  5:53                 ` Zygo Blaxell
  1 sibling, 0 replies; 20+ messages in thread
From: Zygo Blaxell @ 2021-12-17  5:53 UTC (permalink / raw)
  To: Christoph Anton Mitterer; +Cc: Qu Wenruo, linux-btrfs

On Fri, Dec 17, 2021 at 12:16:21AM +0100, Christoph Anton Mitterer wrote:
> On Tue, 2021-12-07 at 13:14 -0500, Zygo Blaxell wrote:

[snip]

> Is there some way to see a distribution of the space usage of block
> groups?
> Like some print out that shows me:
> - there are n block groups
> - xx = 100%
> - xx > 90%
> - xx > 80%
> ...
> - xx = 0%
> ?
> 
> That would also give a better idea of how worthwhile it is to balance,
> and which options to use.

Python-btrfs lets you access btrfs data structures from python scripts.
There might even be an existing example script that does this.
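
Untested sketch of what that could look like (the exact attribute and
constant names may differ between python-btrfs versions):

#!/usr/bin/env python3
# Untested sketch: histogram of data block group usage via python-btrfs.
import sys
import btrfs

buckets = [0] * 11          # 0-9%, 10-19%, ..., 90-99%, 100%
fs = btrfs.FileSystem(sys.argv[1])
for chunk in fs.chunks():
    if not chunk.type & btrfs.ctree.BLOCK_GROUP_DATA:
        continue            # only look at data block groups
    bg = fs.block_group(chunk.vaddr, chunk.length)
    pct = bg.used * 100 // chunk.length
    buckets[min(pct // 10, 10)] += 1

for i, count in enumerate(buckets):
    label = "100%" if i == 10 else "%d-%d%%" % (i * 10, i * 10 + 9)
    print("%7s: %4d block groups" % (label, count))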

> > Minimal balance is exactly one data block group, i.e.
> > 
> >         btrfs balance start -dlimit=1 /fs
> > 
> > Run it when unallocated space gets low.  The exact threshold is low
> > enough that the time between new data block group allocations is less
> > than the balance time.
> 
> What the sysadmin of large storage farms needs is something that one
> can run basically always (so even if unallocated space is NOT low),
> which kinda works out of the box and automatically (run via cron?) and
> doesn't impact the IO too much.
> Or one would need some daemon, which monitors unallocated space and
> kicks in if necessary.

That's the theory, and it's what packages like btrfsmaintenance try to do.

The practice is...more complicated.
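
For what it's worth, the core of such a monitor can be quite small; a
rough sketch (the threshold is arbitrary, and it simply shells out to
the btrfs CLI):

#!/usr/bin/env python3
# Rough sketch of a cron-driven "balance one block group when
# unallocated space gets low" job.  The 16GiB threshold is arbitrary;
# pick one so that new block groups get allocated more slowly than one
# balance run takes.
import subprocess
import sys

THRESHOLD = 16 * 1024**3    # start a minimal balance below 16GiB unallocated

def unallocated_bytes(path):
    out = subprocess.run(["btrfs", "filesystem", "usage", "-b", path],
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        if "Device unallocated:" in line:
            return int(line.split()[-1])
    raise RuntimeError("could not parse 'btrfs filesystem usage' output")

path = sys.argv[1]
if unallocated_bytes(path) < THRESHOLD:
    # Minimal balance: relocate exactly one data block group.
    subprocess.run(["btrfs", "balance", "start", "-dlimit=1", path], check=True)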

> Does it make sense to use -dusage=xx in addition to -dlimit?
> I mean if space is already tight... would just -dlimit=1 try to find a
> block group that it can balance (because its usage is low enough)...
> or might it just fail when the first one tried is nearly full (and not
> enough space is left for that in other block groups)?

The best strategy I've found so far is to choose block groups entirely
at random (a rough sketch follows this list), because:

	* the benefit is fixed:  after a successful block group balance,
	you will have 1GB of unallocated space on all disks in the
	block group.  In that sense it doesn't matter which block groups
	you balance, only the number that you balance.  If you pick
	a full block group, btrfs will pack the data into emptier block
	groups.  If you pick an empty block group, btrfs will pack the
	data into other empty block groups, or create a new empty block
	group and just shuffle the data around.

	* the cost of computing the cost of relocating a block group is
	proportional to doing the work of relocating the block group.  The
	data movement for 1GB takes 12 seconds on modern spinning drives
	and 1 second or less on NVMe.  The other 60-seconds-to-an-hour
	of relocating a block group is updating all the data references,
	and the parent nodes that reference them, recursively.	If you
	had some clever caching and precomputation scheme you could
	maybe choose a good block group to balance in less time than
	it takes to balance it, but if you predict wrong, you're stuck
	doing the extra work with no benefit.  Also because this is a
	deterministic algorithm, you run into the next problem:

	* choosing block groups by a deterministic algorithm (e.g. number
	of free bytes, percentage of free space, fullest/emptiest device,
	largest vaddr, smallest vaddr) eventually runs into adverse
	selection, and gets stuck on a block group that doesn't fit into
	the available free space, but it's always the "next" block group
	according to the selecting algorithm, so it can make no further
	progress.  Choosing a completely random block group (from the
	target devices where unallocated space is required) may or may
	not succeed, but it's a cheap algorithm to run and it's very
	good at avoiding adverse selection.
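
The sketch mentioned above (untested; it uses python-btrfs to enumerate
data block groups and the vrange balance filter to relocate exactly the
chosen one; attribute names may differ between versions):

#!/usr/bin/env python3
# Untested sketch: balance one randomly chosen data block group.
import random
import subprocess
import sys
import btrfs

path = sys.argv[1]
fs = btrfs.FileSystem(path)
data_chunks = [c for c in fs.chunks()
               if c.type & btrfs.ctree.BLOCK_GROUP_DATA]
chunk = random.choice(data_chunks)

# The vrange filter picks block groups overlapping the given vaddr range,
# so vaddr..vaddr+1 relocates exactly the chosen block group.
vrange = "-dvrange=%d..%d" % (chunk.vaddr, chunk.vaddr + 1)
subprocess.run(["btrfs", "balance", "start", vrange, path], check=True)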

> > TBH it's never been a problem--but I run the minimal data balance
> > daily,
> > and scrub every month, and never balance metadata, and have snapshots
> > and dedupe.  Between these they trigger all the necessary metadata
> > allocations.
> 
> I'm also still not really sure why this happened here.
> 
> I've asked the developers of our storage middleware software in the
> meantime, and it seems in fact that dCache does pre-allocate the space
> of files that it wants to write.
> 
> But even then, shouldn't btrfs be able to know how much it will
> generally need for csum metadata?

It varies a lot.  Checksum items have variable overheads as they are
packed into pages.  There is some heuristic based on a constant ratio
but maybe it's a little too low.

It does seem to be prone to rounding error, as I've seen a lot of
users presenting filesystems that have exactly 1GB too little metadata
allocated.

> I can only think of IO patterns where one would end up with too
> aggressive meta-data allocation (e.g. when writing lots of directories
> or XATTRS) and where not enough data block groups are left.
> 
> But the other way round?
> If one writes very small files (so that they are inlined) -> meta-data
> should grow.
> 
> If one writes non-inlined files, regardless of whether small or big...
> shouldn't it always be clear how much space could be needed for csum
> meta-data, when a new block group is allocated for data and if that
> would be fully written?

It's not even clear how much space is needed for the data.  Extents are
immutable, so if you overwrite part of a large extent, you will need more
space for the new data even though the old data is no longer reachable
through any file.

Checksums can vary in density from 779 (if there are a lot of holes in
files) to 4090 blocks per metadata page (if they're all contiguous).
That's a 5:1 size ratio between the extremes.
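
Back-of-the-envelope, assuming 4KiB data blocks and 16KiB metadata
pages, those two densities work out to roughly 1 GiB vs. ~5.3 GiB of
csum metadata per TiB of data:

# Rough csum metadata overhead per TiB of data at the two densities above.
TIB = 2**40
PAGE = 16 * 1024            # metadata page size
BLOCK = 4 * 1024            # data block size

for blocks_per_page in (4090, 779):
    pages = TIB / (blocks_per_page * BLOCK)
    print("%4d blocks/page -> %.2f GiB csum metadata per TiB of data"
          % (blocks_per_page, pages * PAGE / 2**30))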

> > That's possible (and there are patches attempting to address it).
> > We don't want to be too aggressive, or the disk fills up with unused
> > metadata allocations...but we need to be about 5 block groups more
> > aggressive than we are now to handle special cases like "mount and
> > write until full without doing any backups or maintenance."
> 
> Wouldn't a "simple" (at least in my mind ;-) ) solution be, that:
> - if the case arises, that either data or meta-data block groups are
>   full
> - and not unallocated space is left
> - and if the other kind of block groups has plenty of free space left
>   (say in total something like > 10 times the size of a block group...
>   or maybe more (depending on the total filesystem size), cause one
>   probably doesn't want to shuffle loads of data around, just for the
>   last 0.005% to be squeezed out.)
> then:
> - btrfs automatically does the balance?
>   Or maybe something "better" that also works when it would need to
>   break up extents?

The problem is in the definitions of things like "plenty" and "not a lot",
and expectations like "last 0.005%."  We all know balancing automatically
solves the problem, but all the algorithms we use to trigger it are wrong
in some edge case.

Balance is a big and complex thing that operates on big filesystem
allocation objects, too big to run automatically at the moment a critical
failure is detected.  The challenge is to predict the future well enough
to know when to run balance to avoid it.  In these early days, everybody
seems to be rolling their own solutions and discovering surprising
implications of their choices.

Also there are much simpler solutions, like "put all the metadata on
SSD", where the administrator picks the metadata size and btrfs works
(or doesn't work) with it.

Rewriting the extent tree is also on the table, though people have
recently worked on that (extent tree v2) and the ability to change
allocated extent lengths after the fact was dropped from the proposal.

> If there are cases where one doesn't like that automatic shuffling, one
> could make it opt-in via some mount option.

In theory a garbage collection tool can be written today to manage
this, but it's only a theory until somebody writes it.  It's possible
to break up extents by running a combination of defrag and dedupe over
them using existing userspace interfaces.  Once such a tool exists, the
kernel interfaces could be improved for performance.  That tool would
essentially be data balance in userspace, so the kernel data balance
would no longer be needed.  It's not clear that this would be able to
perform any better than the current data balance scheme, though, except
for being slightly more flexible on extremely full filesystems.

> > A couple more suggestions (more like exploitable side-effects):
> > 
> >         - Run regular scrubs.  If a write occurs to a block group
> >         while it's being scrubbed, there's an extra metadata block
> >         group allocation.
> 
> But writes during scrubs would only happen when it finds any corrupted
> blocks?

Each block group is made read-only while it is scrubbed to prevent
modification while scrub verifies it.  If some process wants to modify
data on the filesystem during the scrub, it must allocate its new data
in some block group that is not being scrubbed.  If all the existing
block groups are either full or read-only, then a new block group must be
allocated.  If this is not possible, the writing process will hit ENOSPC.
In other words, scrub effectively decreases free space while it runs by
locking some of it away temporarily, and this forces btrfs to allocate
a little more space for data and metadata.

This is one of the many triggers for btrfs to require and allocate another
GB of metadata apparently at random.  It's never random, but there are
a lot of different triggering conditions in the implementation.

Only a few spare block groups are usually needed, so people running
scrub regularly work around the metadata problem without knowing they're
working around a problem.

> Thanks,
> Chris.
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2021-12-17  5:53 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-07  2:29 ENOSPC while df shows 826.93GiB free Christoph Anton Mitterer
2021-12-07  2:59 ` Qu Wenruo
2021-12-07  3:06   ` Christoph Anton Mitterer
2021-12-07  3:29     ` Qu Wenruo
2021-12-07  3:44       ` Christoph Anton Mitterer
2021-12-07  4:56         ` Qu Wenruo
2021-12-07 14:30           ` Christoph Anton Mitterer
2021-12-07  7:21         ` Zygo Blaxell
2021-12-07 12:31           ` Jorge Bastos
2021-12-07 15:07           ` Christoph Anton Mitterer
2021-12-07 18:14             ` Zygo Blaxell
2021-12-16 23:16               ` Christoph Anton Mitterer
2021-12-17  2:00                 ` Qu Wenruo
2021-12-17  3:10                   ` Christoph Anton Mitterer
2021-12-17  5:53                 ` Zygo Blaxell
2021-12-07 15:10           ` Jorge Bastos
2021-12-07 15:22             ` Christoph Anton Mitterer
2021-12-07 16:11               ` Jorge Bastos
2021-12-07 15:39 ` Phillip Susi
2021-12-16  3:47   ` Christoph Anton Mitterer
