* ENOSPC on a 10% used disk
@ 2018-10-17  7:52 Avi Kivity
  2018-10-17  8:47 ` Christoph Hellwig
                   ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Avi Kivity @ 2018-10-17  7:52 UTC (permalink / raw)
  To: linux-xfs

I have a user running a 1.7TB filesystem with ~10% usage (as shown by 
df), getting sporadic ENOSPC errors. The disk is mounted with inode64 
and has a relatively small number of large files. The disk is a 
single-member RAID0 array, with 1MB chunk size. There are 32 AGs. 
Running Linux 4.9.17.


The write load consists of AIO/DIO writes, followed by unlinks of these 
files. The writes are non-size-changing (we truncate ahead) and we use 
XFS_IOC_FSSETXATTR/XFS_FLAG_EXTSIZE with a hint size of 32MB. The errors 
happen on commit logs, which have a target size of 32MB (but may exceed 
it a little).
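
For reference, the same hint can also be set from the shell with xfs_io 
(a sketch; the path is a placeholder):

    # set a 32MB extent size hint, equivalent to the
    # XFS_IOC_FSSETXATTR/XFS_FLAG_EXTSIZE ioctl
    xfs_io -c "extsize 32m" /var/lib/scylla/commitlog/CommitLog.log
    xfs_io -c "extsize" /var/lib/scylla/commitlog/CommitLog.log   # read it back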


The errors are sporadic and after restarting the workload they go away 
for a few hours to a few days, but then return. During one of the 
crashes I used xfs_db to look at fragmentation and saw that most AGs had 
free extents of size categories up to 128-255, but a few had more. I 
tried xfs_fsr but it did not help.
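
The xfs_db check amounts to something like this (a sketch; the device 
name is assumed):

    # free space histograms; -r (read-only) is safe on a mounted fs
    xfs_db -r -c "freesp -s" /dev/md0         # whole filesystem
    xfs_db -r -c "freesp -s -a 0" /dev/md0    # per-AG, here AG 0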


Is this a known issue? Would upgrading the kernel help?


I'll try to get a metadata dump next time this happens, and I'll be 
happy to supply more information.


* Re: ENOSPC on a 10% used disk
  2018-10-17  7:52 ENOSPC on a 10% used disk Avi Kivity
@ 2018-10-17  8:47 ` Christoph Hellwig
  2018-10-17  8:57   ` Avi Kivity
  2018-10-18  1:37 ` Dave Chinner
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2018-10-17  8:47 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linux-xfs

On Wed, Oct 17, 2018 at 10:52:48AM +0300, Avi Kivity wrote:
> I have a user running a 1.7TB filesystem with ~10% usage (as shown by df),
> getting sporadic ENOSPC errors. The disk is mounted with inode64 and has a
> relatively small number of large files. The disk is a single-member RAID0
> array, with 1MB chunk size. There are 32 AGs. Running Linux 4.9.17.

4.9.17 is rather old and you'll have a hard time finding someone
familiar with it..

> Is this a known issue? Would upgrading the kernel help?

A few things that come to mind:

 - are you sure there is no open fd to the unlinked files?  That would
   keep the space allocated until the last link is dropped.
 - even once we drop the inode the space only becomes available once
   the transaction has committed.  We do force the log if we found
   a busy extent, but there might be some issues.  Try seeing if you
   hit the xfs_extent_busy_force trace point with your workload.
 - if you have online discard (-o discard) enabled there might be
   more issues like the above, especially on old kernels.
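
A quick way to check the first two points from userspace (a sketch; the 
mountpoint is an assumption, and the tracepoint is assumed to live under 
events/xfs/):

    # 1. any unlinked-but-still-open files pinning space?
    lsof +L1 /var/lib/scylla

    # 2. watch for busy-extent log forces
    cd /sys/kernel/debug/tracing
    echo 1 > events/xfs/xfs_extent_busy_force/enable
    cat trace_pipe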


* Re: ENOSPC on a 10% used disk
  2018-10-17  8:47 ` Christoph Hellwig
@ 2018-10-17  8:57   ` Avi Kivity
  2018-10-17 10:54     ` Avi Kivity
  0 siblings, 1 reply; 26+ messages in thread
From: Avi Kivity @ 2018-10-17  8:57 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs


On 17/10/2018 11.47, Christoph Hellwig wrote:
> On Wed, Oct 17, 2018 at 10:52:48AM +0300, Avi Kivity wrote:
>> I have a user running a 1.7TB filesystem with ~10% usage (as shown by df),
>> getting sporadic ENOSPC errors. The disk is mounted with inode64 and has a
>> relatively small number of large files. The disk is a single-member RAID0
>> array, with 1MB chunk size. There are 32 AGs. Running Linux 4.9.17.
> 4.9.17 is rather old and you'll have a hard time finding someone
> familiar with it..


Yes. I expect my user will agree to upgrade, but I'd like to recommend 
this only if we know there was a real issue and it was resolved, not on 
general principles.


>> Is this a known issue? Would upgrading the kernel help?
> A few things that come to mind:
>
>   - are you sure there is no open fd to the unlinked files?  That would
>     keep the space allocated until the last link is dropped.


"df" would report that space as occupied, no?


I believe a colleague verified there were no deleted files but I'm not 
100% sure.


>   - even once we drop the inode the space only becomes available once
>     the transaction has committed.  We do force the log if we found
>     a busy extent, but there might be some issues.  Try seeing if you
>     hit the xfs_extent_busy_force trace point with your workload.


I'll ask permission to check this and report.


>   - if you have online discard (-o discard) enabled there might be
>     more issues like the above, especially on old kernels.


Online discard is not enabled:


/dev/md0 on /var/lib/scylla type xfs 
(rw,noatime,attr2,inode64,sunit=2048,swidth=2048,noquota)

btw, we've seen fstrim on an old disk (that was likely never trimmed) 
improving its performance by a factor of ~100, so my interest in -o 
discard is re-awakening. Is it good enough now to run on aio 
workloads (assuming nvme) or is more work needed? My prime concern is to 
avoid io_submit sleeping.
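
For reference, the one-off trim amounts to something like this, and a 
periodic timer is the usual alternative to -o discard (mountpoint 
assumed):

    fstrim -v /var/lib/scylla             # trim all free space once
    systemctl enable --now fstrim.timer   # or trim on a weekly schedule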


* Re: ENOSPC on a 10% used disk
  2018-10-17  8:57   ` Avi Kivity
@ 2018-10-17 10:54     ` Avi Kivity
  0 siblings, 0 replies; 26+ messages in thread
From: Avi Kivity @ 2018-10-17 10:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs


On 17/10/2018 11.57, Avi Kivity wrote:
>
>>   - even once we drop the inode the space only becomes available once
>>     the transaction has committed.  We do force the log if we found
>>     a busy extent, but there might be some issues.  Try seeing if you
>>     hit the xfs_extent_busy_force trace point with your workload.
>
>
> I'll ask permission to check this and report.
>
>


An hour's tracing yielded zero hits. Of course, that says nothing about 
other times; I'll continue to trace.


* Re: ENOSPC on a 10% used disk
  2018-10-17  7:52 ENOSPC on a 10% used disk Avi Kivity
  2018-10-17  8:47 ` Christoph Hellwig
@ 2018-10-18  1:37 ` Dave Chinner
  2018-10-18  7:55   ` Avi Kivity
  2018-10-18 15:54 ` Eric Sandeen
  2019-02-05 21:48 ` Dave Chinner
  3 siblings, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2018-10-18  1:37 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linux-xfs

On Wed, Oct 17, 2018 at 10:52:48AM +0300, Avi Kivity wrote:
> I have a user running a 1.7TB filesystem with ~10% usage (as shown
> by df), getting sporadic ENOSPC errors. The disk is mounted with
> inode64 and has a relatively small number of large files. The disk
> is a single-member RAID0 array, with 1MB chunk size. There are 32
> AGs. Running Linux 4.9.17.

ENOSPC on what operation? write? open(O_CREAT)? something else?

What's the filesystem config (xfs_info output)?

> The write load consists of AIO/DIO writes, followed by unlinks of
> these files. The writes are non-size-changing (we truncate ahead)
> and we use XFS_IOC_FSSETXATTR/XFS_FLAG_EXTSIZE with a hint size of
> 32MB. The errors happen on commit logs, which have a target size of
> 32MB (but may exceed it a little).
> 
> 
> The errors are sporadic and after restarting the workload they go
> away for a few hours to a few days, but then return. During one of
> the crashes I used xfs_db to look at fragmentation and saw that most
> AGs had free extents of size categories up to 128-255, but a few had
> more. I tried xfs_fsr but it did not help.

32MB extents are 8192 blocks. The bucket 128-255 records extents
between 512k and 1MB in size, so it sounds like free space has been
fragmented to death. Has xfs_fsr been run on this filesystem
regularly?

If the ENOSPC errors are only from files with a 32MB extent size
hints on them, then it may be that there isn't sufficient contiguous
free space to allocate an entire 32MB extent. I'm not sure what the
allocator behaviour here is (the code is a maze of twisty passages),
so I'll have to look more into this.

In the mean time, can you post the output of the freespace command
(both global and per-ag) so we can see just how much free space
there is and how badly fragmented it has become? I might be able to
reproduce the behaviour if I know the conditions under which it is
occurring.

> Is this a known issue? Would upgrading the kernel help?

Not that I know of. If it's an extszhint vs free space fragmentation
issue, then a kernel upgrade is unlikely to fix it.

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com


* Re: ENOSPC on a 10% used disk
  2018-10-18  1:37 ` Dave Chinner
@ 2018-10-18  7:55   ` Avi Kivity
  2018-10-18 10:05     ` Dave Chinner
  0 siblings, 1 reply; 26+ messages in thread
From: Avi Kivity @ 2018-10-18  7:55 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs


On 18/10/2018 04.37, Dave Chinner wrote:
> On Wed, Oct 17, 2018 at 10:52:48AM +0300, Avi Kivity wrote:
>> I have a user running a 1.7TB filesystem with ~10% usage (as shown
>> by df), getting sporadic ENOSPC errors. The disk is mounted with
>> inode64 and has a relatively small number of large files. The disk
>> is a single-member RAID0 array, with 1MB chunk size. There are 32
>> AGs. Running Linux 4.9.17.
> ENOSPC on what operation? write? open(O_CREAT)? something else?


Unknown.


> What's the filesystem config (xfs_info output)?


(restored from metadata dump)


meta-data=/dev/loop2             isize=512    agcount=32, 
agsize=14494720 blks
          =                       sectsz=512   attr=2, projid32bit=1
          =                       crc=1        finobt=0 spinodes=0 rmapbt=0
          =                       reflink=0
data     =                       bsize=4096   blocks=463831040, imaxpct=5
          =                       sunit=256    swidth=256 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=226480, version=2
          =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0


>> The write load consists of AIO/DIO writes, followed by unlinks of
>> these files. The writes are non-size-changing (we truncate ahead)
>> and we use XFS_IOC_FSSETXATTR/XFS_FLAG_EXTSIZE with a hint size of
>> 32MB. The errors happen on commit logs, which have a target size of
>> 32MB (but may exceed it a little).
>>
>>
>> The errors are sporadic and after restarting the workload they go
>> away for a few hours to a few days, but then return. During one of
>> the crashes I used xfs_db to look at fragmentation and saw that most
>> AGs had free extents of size categories up to 128-255, but a few had
>> more. I tried xfs_fsr but it did not help.
> 32MB extents are 8192 blocks. The bucket 128-255 records extents
> between 512k and 1MB in size, so it sounds like free space has been
> fragmented to death. Has xfs_fsr been run on this filesystem
> regularly?


xfs_fsr has never been run, until we saw the problem (and then did not 
fix it). IIUC the workload should be self-defragmenting: it consists of 
writing large files, then erasing them. I estimate that around 100 files 
are written concurrently (from 14 threads), and they are written with 
large extent hints. With every large file, another smaller (but still 
large) file is written, and a few smallish metadata files.


I understood from xfs_fsr that it attempts to defragment files, not free 
space, although that may come as a side effect. In any case I ran xfs_db 
after xfs_fsr and did not see an improvement.


>
> If the ENOSPC errors are only from files with a 32MB extent size
> hints on them, then it may be that there isn't sufficient contiguous
> free space to allocate an entire 32MB extent. I'm not sure what the
> allocator behaviour here is (the code is a maze of twisty passages),
> so I'll have to look more into this.


There are other files with 32MB hints that do not show the error (but on 
the other hand, the error has been observed few enough times for that to 
be a fluke).


>
> In the mean time, can you post the output of the freespace command
> (both global and per-ag) so we can see just how much free space
> there is and how badly fragmented it has become? I might be able to
> reproduce the behaviour if I know the conditions under which it is
> occurring.


xfs_db> freesp
    from      to extents  blocks    pct
       1       1    5916    5916   0.00
       2       3   10235   22678   0.01
       4       7   12251   66829   0.02
       8      15    5521   59556   0.01
      16      31    5703  132031   0.03
      32      63    9754  463825   0.11
      64     127   16742 1590339   0.37
     128     255 1550511 390108625  89.87
     256     511   71516 29178504   6.72
     512    1023      19   15355   0.00
    1024    2047     287  461824   0.11
    2048    4095     528 1611413   0.37
    4096    8191    1537 10352304   2.38
    8192   16383       2   19015   0.00

Just 2 extents >= 32MB (and they may have been freed after the error).


Per-ag:


    from      to extents  blocks    pct
       1       1     390     390   0.00
       2       3     542    1215   0.01
       4       7     590    3211   0.02
       8      15     265    2735   0.02
      16      31     219    5000   0.04
      32      63     323   15530   0.11
      64     127     620   58217   0.43
     128     255   48677 12254686  90.27
     256     511    2981 1234365   9.09
    from      to extents  blocks    pct
       1       1     542     542   0.00
       2       3     646    1495   0.01
       4       7     592    3122   0.02
       8      15     525    5937   0.04
      16      31     539   12280   0.09
      32      63     691   33226   0.25
      64     127     851   78277   0.59
     128     255   46390 11658684  88.21
     256     511    3335 1422955  10.77
    from      to extents  blocks    pct
       1       1     560     560   0.00
       2       3     642    1454   0.01
       4       7     483    2552   0.02
       8      15     368    4020   0.03
      16      31     440    9947   0.08
      32      63     540   25347   0.21
      64     127     733   67944   0.56
     128     255   42337 10632366  87.06
     256     511    3386 1438609  11.78
     512    1023       5    4423   0.04
    1024    2047       5    8649   0.07
    2048    4095       3    9205   0.08
    4096    8191       1    8191   0.07
    from      to extents  blocks    pct
       1       1     662     662   0.01
       2       3     675    1545   0.02
       4       7     490    2483   0.03
       8      15     414    4485   0.05
      16      31     445    9915   0.11
      32      63     540   25279   0.29
      64     127     683   63014   0.72
     128     255   10061 2483774  28.34
     256     511    1498  574685   6.56
     512    1023       9    6715   0.08
    1024    2047       5    6967   0.08
    2048    4095     100  354101   4.04
    4096    8191     786 5229818  59.68
    from      to extents  blocks    pct
       1       1     642     642   0.01
       2       3     705    1599   0.02
       4       7     545    2801   0.04
       8      15     407    4320   0.05
      16      31     410    9396   0.12
      32      63     513   24294   0.31
      64     127     528   48217   0.61
     128     255    2723  644939   8.17
     256     511     875  326064   4.13
     512    1023       5    4217   0.05
    1024    2047     277  446208   5.65
    2048    4095     425 1248107  15.81
    4096    8191     750 5114295  64.79
    8192   16383       2   19015   0.24
    from      to extents  blocks    pct
       1       1     176     176   0.00
       2       3     484    1228   0.01
       4       7     825    4277   0.03
       8      15      73     870   0.01
      16      31     174    4155   0.03
      32      63     356   16746   0.12
      64     127     597   58761   0.42
     128     255   55401 13814803  99.38
    from      to extents  blocks    pct
       1       1     182     182   0.00
       2       3     212     444   0.00
       4       7      32     188   0.00
       8      15      58     692   0.00
      16      31     102    2369   0.02
      32      63     243   11756   0.08
      64     127     449   43271   0.30
     128     255   53882 13618288  95.22
     256     511    1550  625387   4.37
    from      to extents  blocks    pct
       1       1     147     147   0.00
       2       3     203     426   0.00
       4       7     287    1585   0.01
       8      15      84     958   0.01
      16      31     105    2370   0.02
      32      63     243   12073   0.09
      64     127     497   47704   0.34
     128     255   51847 13080484  94.15
     256     511    1897  747986   5.38
    from      to extents  blocks    pct
       1       1      81      81   0.00
       2       3     129     262   0.00
       4       7     186    1070   0.01
       8      15     148    1781   0.01
      16      31     225    5411   0.04
      32      63     257   12226   0.09
      64     127     492   46230   0.33
     128     255   53802 13533984  95.16
     256     511    1574  621876   4.37
    from      to extents  blocks    pct
       1       1     159     159   0.00
       2       3     191     398   0.00
       4       7     182    1009   0.01
       8      15      63     730   0.01
      16      31      88    2006   0.01
      32      63     191    9044   0.06
      64     127     494   46669   0.33
     128     255   53441 13451913  94.51
     256     511    1850  720941   5.07
    from      to extents  blocks    pct
       1       1     156     156   0.00
       2       3     192     397   0.00
       4       7     169     948   0.01
       8      15      67     780   0.01
      16      31     115    2948   0.02
      32      63     272   12564   0.09
      64     127     511   49124   0.35
     128     255   53339 13427444  94.42
     256     511    1866  726347   5.11
    from      to extents  blocks    pct
       1       1     157     157   0.00
       2       3     171     364   0.00
       4       7     221    1215   0.01
       8      15      45     504   0.00
      16      31     116    2628   0.02
      32      63     249   11827   0.08
      64     127     474   47158   0.33
     128     255   53261 13409025  94.35
     256     511    1886  738689   5.20
    from      to extents  blocks    pct
       1       1     142     142   0.00
       2       3     181     395   0.00
       4       7     323    1753   0.01
       8      15     108    1176   0.01
      16      31     134    3069   0.02
      32      63     260   12055   0.08
      64     127     411   39107   0.28
     128     255   53197 13389340  94.39
     256     511    1877  737582   5.20
    from      to extents  blocks    pct
       1       1     137     137   0.00
       2       3     174     386   0.00
       4       7     222    1232   0.01
       8      15      93    1012   0.01
      16      31      96    2192   0.02
      32      63     223   10763   0.08
      64     127     493   47665   0.34
     128     255   53125 13374075  94.17
     256     511    1949  764710   5.38
    from      to extents  blocks    pct
       1       1      59      59   0.00
       2       3     138     309   0.00
       4       7     224    1217   0.01
       8      15     104    1211   0.01
      16      31     138    3352   0.02
      32      63     337   16480   0.12
      64     127     585   55922   0.39
     128     255   53654 13487724  95.05
     256     511    1589  623688   4.40
    from      to extents  blocks    pct
       1       1     121     121   0.00
       2       3     264     597   0.00
       4       7     706    3907   0.03
       8      15     174    1802   0.01
      16      31      94    2243   0.02
      32      63     228   10806   0.08
      64     127     495   47228   0.34
     128     255   52078 13106646  93.94
     256     511    1953  779417   5.59
    from      to extents  blocks    pct
       1       1     107     107   0.00
       2       3     174     370   0.00
       4       7     248    1401   0.01
       8      15     115    1318   0.01
      16      31     111    2561   0.02
      32      63     218   10243   0.07
      64     127     443   42493   0.30
     128     255   52320 13168357  94.43
     256     511    1828  717948   5.15
    from      to extents  blocks    pct
       1       1     126     126   0.00
       2       3     353     793   0.01
       4       7     774    4297   0.03
       8      15     174    1767   0.01
      16      31     129    3135   0.02
      32      63     317   14569   0.11
      64     127     506   48326   0.35
     128     255   51507 12956078  93.58
     256     511    2055  815607   5.89
    from      to extents  blocks    pct
       1       1     118     118   0.00
       2       3     207     448   0.00
       4       7     299    1694   0.01
       8      15      91     960   0.01
      16      31     104    2394   0.02
      32      63     358   17378   0.12
      64     127     497   47351   0.34
     128     255   52540 13229046  93.84
     256     511    1971  798192   5.66
    from      to extents  blocks    pct
       1       1     105     105   0.00
       2       3     261     571   0.00
       4       7     333    1851   0.01
       8      15     100    1009   0.01
      16      31     137    3323   0.02
      32      63     261   12069   0.09
      64     127     482   45103   0.32
     128     255   51909 13060192  93.20
     256     511    2226  889345   6.35
    from      to extents  blocks    pct
       1       1     111     111   0.00
       2       3     221     471   0.00
       4       7     243    1341   0.01
       8      15     101    1002   0.01
      16      31      87    2145   0.02
      32      63     265   12987   0.09
      64     127     429   41335   0.29
     128     255   51818 13031610  92.85
     256     511    2312  944418   6.73
    from      to extents  blocks    pct
       1       1      89      89   0.00
       2       3     245     542   0.00
       4       7     383    2114   0.02
       8      15     107    1117   0.01
      16      31     153    3505   0.03
      32      63     237   11431   0.08
      64     127     489   46582   0.33
     128     255   51377 12929850  92.48
     256     511    2412  986093   7.05
    from      to extents  blocks    pct
       1       1      83      83   0.00
       2       3     253     536   0.00
       4       7     341    1902   0.01
       8      15     118    1269   0.01
      16      31     137    3201   0.02
      32      63     235   11096   0.08
      64     127     432   41041   0.30
     128     255   51165 12882960  92.73
     256     511    2348  951207   6.85
    from      to extents  blocks    pct
       1       1      63      63   0.00
       2       3     263     570   0.00
       4       7     427    2392   0.02
       8      15     143    1536   0.01
      16      31     117    2714   0.02
      32      63     217   10510   0.08
      64     127     402   38021   0.27
     128     255   50857 12803884  91.91
     256     511    2583 1071722   7.69
    from      to extents  blocks    pct
       1       1      69      69   0.00
       2       3     302     645   0.00
       4       7     343    1884   0.01
       8      15     120    1234   0.01
      16      31     133    3184   0.02
      32      63     215    9971   0.07
      64     127     506   49464   0.35
     128     255   49778 12542384  89.34
     256     511    3333 1429372  10.18
    from      to extents  blocks    pct
       1       1      62      62   0.00
       2       3     300     652   0.00
       4       7     432    2413   0.02
       8      15     173    1814   0.01
      16      31      92    2119   0.02
      32      63     253   12006   0.09
      64     127     439   43006   0.31
     128     255   49809 12539975  89.53
     256     511    3298 1403687  10.02
    from      to extents  blocks    pct
       1       1      52      52   0.00
       2       3     283     608   0.00
       4       7     253    1382   0.01
       8      15     126    1353   0.01
      16      31     117    2653   0.02
      32      63     226   10856   0.08
      64     127     462   43181   0.31
     128     255   50799 12805008  90.86
     256     511    2899 1228715   8.72
    from      to extents  blocks    pct
       1       1      53      53   0.00
       2       3     322     683   0.00
       4       7     473    2658   0.02
       8      15     206    2134   0.02
      16      31     149    3494   0.03
      32      63     251   12271   0.09
      64     127     548   52541   0.38
     128     255   50353 12685959  91.22
     256     511    2753 1146454   8.24
    from      to extents  blocks    pct
       1       1      46      46   0.00
       2       3     309     655   0.00
       4       7     373    2108   0.02
       8      15     181    1951   0.01
      16      31     161    3795   0.03
      32      63     270   12433   0.09
      64     127     434   41689   0.30
     128     255   50963 12821420  91.99
     256     511    2604 1054433   7.56
    from      to extents  blocks    pct
       1       1     121     121   0.00
       2       3     357     779   0.01
       4       7     337    1825   0.01
       8      15     220    2378   0.02
      16      31     181    4124   0.03
      32      63     297   13987   0.10
      64     127     571   53694   0.39
     128     255   49880 12560088  91.06
     256     511    2792 1155483   8.38
    from      to extents  blocks    pct
       1       1     235     235   0.00
       2       3     439     964   0.01
       4       7     448    2445   0.02
       8      15     275    2842   0.02
      16      31     221    4979   0.04
      32      63     332   15967   0.12
      64     127     596   56251   0.41
     128     255   48484 12208089  89.11
     256     511    3341 1408614  10.28
    from      to extents  blocks    pct
       1       1     163     163   0.00
       2       3     397     877   0.01
       4       7     467    2552   0.02
       8      15     275    2859   0.02
      16      31     234    5424   0.04
      32      63     336   16035   0.12
      64     127     593   55753   0.41
     128     255   49737 12515550  91.40
     256     511    2695 1093913   7.99

>> Is this a known issue? Would upgrading the kernel help?
> Not that I know of. If it's an extszhint vs free space fragmentation
> issue, then a kernel upgrade is unlikely to fix it.
>
> Cheers,
>
> Dave.
>


* Re: ENOSPC on a 10% used disk
  2018-10-18  7:55   ` Avi Kivity
@ 2018-10-18 10:05     ` Dave Chinner
  2018-10-18 11:00       ` Avi Kivity
  0 siblings, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2018-10-18 10:05 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linux-xfs

[ hmmm, there's some whacky utf-8 whitespace characters in the
 copy-n-pasted text... ]

On Thu, Oct 18, 2018 at 10:55:18AM +0300, Avi Kivity wrote:
> 
> On 18/10/2018 04.37, Dave Chinner wrote:
> >On Wed, Oct 17, 2018 at 10:52:48AM +0300, Avi Kivity wrote:
> >>I have a user running a 1.7TB filesystem with ~10% usage (as shown
> >>by df), getting sporadic ENOSPC errors. The disk is mounted with
> >>inode64 and has a relatively small number of large files. The disk
> >>is a single-member RAID0 array, with 1MB chunk size. There are 32

Ok, now I need to know what "single member RAID0 array" means,
because this is clearly related to allocation alignment and I need
to know why the FS was configured the way it was.

Is it one disk? Or is it a hardware RAID0 array that presents as a
single lun with a stripe width of 1MB? If so, how many disks are in
it? Is the chunk size the stripe unit (per-disk chunk size) or the
stripe width (all disks get hit by a 1MB IO)?

Or something else? 

> >>AGs. Running Linux 4.9.17.
> >ENOSPC on what operation? write? open(O_CREAT)? something else?
> 
> 
> Unknown.
> 
> 
> >What's the filesystem config (xfs_info output)?
> 
> 
> (restored from metadata dump)
> 
> 
> meta-data=/dev/loop2		isize=512 agcount=32, agsize=14494720 blks
>          =                    sectsz=512 attr=2, projid32bit=1
>          =                    crc=1 finobt=0 spinodes=0 rmapbt=0
>          =                    reflink=0
> data     =                    bsize=4096 blocks=463831040, imaxpct=5
>          =                    sunit=256 swidth=256 blks

sunit=swidth is unusual for a RAID0 array, unless it's hardware RAID
and the array only reports one number to mkfs. Was this chosen by
mkfs, or specifically configured by the user? If specifically
configured, why?

What is important is that it means aligned allocations will be used
for any allocation that is over sunit (1MB) and that's where all the
problems seem to come from.

> naming   =version 2           bsize=4096 ascii-ci=0 ftype=1
> log      =internal            bsize=4096 blocks=226480, version=2
>          =                    sectsz=512 sunit=8 blks, lazy-count=1
> realtime =none                extsz=4096 blocks=0, rtextents=0
> 
> > Has xfs_fsr been run on this filesystem
> >regularly?
> 
> 
> xfs_fsr has never been run, until we saw the problem (and then did
> not fix it).  IIUC the workload should be self-defragmenting: it
> consists of writing large files, then erasing them. I estimate that
> around 100 files are written concurrently (from 14 threads), and
> they are written with large extent hints. With every large file,
> another smaller (but still large) file is written, and a few
> smallish metadata files.

Do those smaller files get removed when the big files are removed?

> I understood from xfs_fsr that it attempts to defragment files, not
> free space, although that may come as a side effect. In any case I
> ran xfs_db after xfs_fsr and did not see an improvement.

xfs_fsr takes fragmented files and contiguous free space and turns
it into contiguous files and fragmented free space. You have
fragmented free space, so I needed to know if xfs_fsr was
responsible for that....

> >If the ENOSPC errors are only from files with a 32MB extent size
> >hints on them, then it may be that there isn't sufficient contiguous
> >free space to allocate an entire 32MB extent. I'm not sure what the
> >allocator behaviour here is (the code is a maze of twisty passages),
> >so I'll have to look more into this.
> 
> There are other files with 32MB hints that do not show the error
> (but on the other hand, the error has been observed few enough times
> for that to be a fluke).

*nod*

> >In the mean time, can you post the output of the freespace command
> >(both global and per-ag) so we can see just how much free space
> >there is and how badly fragmented it has become? I might be able to
> >reproduce the behaviour if I know the conditions under which it is
> >occurring.
> 
> 
> xfs_db> freesp
>  from      to  extents    blocks    pct
>  1          1     5916      5916   0.00
>  2          3    10235     22678   0.01
>  4          7    12251     66829   0.02
>  8         15     5521     59556   0.01
>  16        31     5703    132031   0.03
>  32        63     9754    463825   0.11
>  64       127    16742   1590339   0.37
>  128      255  1550511 390108625  89.87
>  256      511    71516  29178504   6.72
>  512     1023       19     15355   0.00
>  1024    2047      287    461824   0.11
>  2048    4095      528   1611413   0.37
>  4096    8191     1537  10352304   2.38
>  8192   16383        2     19015   0.00
> 
> Just 2 extents >= 32MB (and they may have been freed after the error).

Yes, and the vast majority of free space is in lengths between 512kB
and 1020kB. This is what I'd expect if you have large, stripe
aligned allocations interleaved with smaller, sub-stripe unit
allocations.

As an example of behaviour that can lead to this sort of free space
fragmentation, start with 10 stripe units of contiguous free space:

  0    1    2    3    4    5    6    7    8    9    10
  +----+----+----+----+----+----+----+----+----+----+----+

Now allocate a > stripe unit extent (say 2 units):

  0    1    2    3    4    5    6    7    8    9    10
  LLLLLLLLLL+----+----+----+----+----+----+----+----+----+

Now allocate a small file A:

  0    1    2    3    4    5    6    7    8    9    10
  LLLLLLLLLLAA---+----+----+----+----+----+----+----+----+

Now allocate another large extent:

  0    1    2    3    4    5    6    7    8    9    10
  LLLLLLLLLLAA---LLLLLLLLLL+----+----+----+----+----+----+

After a while, a significant part of your filesystem looks like
this repeating pattern:

  0    1    2    3    4    5    6    7    8    9    10
  LLLLLLLLLLAA---LLLLLLLLLLBB---LLLLLLLLLLCC---LLLLLLLLLLDD---+

i.e. there are lots of small, isolated sub stripe unit free spaces.
If you now start removing large extents but leaving the small
files behind, you end up with this:

  0    1    2    3    4    5    6    7    8    9    10
  LLLLLLLLLLAA---+---------BB---LLLLLLLLLLCC---+----+----DD---+

And now we go to allocate a new large+small file pair (M+n)
they'll get laid out like this:

  0    1    2    3    4    5    6    7    8    9    10
  LLLLLLLLLLAA---MMMMMMMMMMBB---LLLLLLLLLLCC---nn---+----DD---+

See how we lost a large aligned 2MB freespace @ 9 when the small
file "nn" was laid down? repeat this fill and free pattern over and
over again, and eventually it fragments the free space until there's
no large contiguous free spaces left, and large aligned extents can
no longer be allocated.

For this to trigger you need the small files to be larger than 1
stripe unit, but still much smaller than the extent size hint, and
the small files need to hang around as the large files come and go.
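
A toy reproducer of that pattern might look like this (a sketch only; 
sizes chosen to match the geometry above, paths hypothetical):

    # interleave large aligned files with small >sunit files,
    # then delete only the large ones
    for i in $(seq 1 1000); do
        xfs_io -f -c "extsize 32m" -c "falloc 0 32m" /mnt/test/big.$i
        xfs_io -f -c "falloc 0 2m" /mnt/test/small.$i  # >1MB sunit, <<hint
    done
    rm -f /mnt/test/big.*
    xfs_db -r -c "freesp -s" /dev/md0   # free space is now chopped up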

> >>Is this a known issue?

The effect and symptom are known: it's a generic large aligned extent
vs small unaligned extent issue, but I've never seen it manifest in a
user workload outside of
a very constrained multistream realtime video ingest/playout
workload (i.e. the workload the filestreams allocator was written
for). And before you ask, no, the filestreams allocator does not
solve this problem.

The most common manifestation of this problem has been inode
allocation on filesystems full of small files - inodes are allocated
in large aligned extents compared to small files, and so eventually
the filesystem runs out of large contiguous freespace and inodes
can't be allocated. The sparse inodes mkfs option fixed this by
allowing inodes to be allocated as sparse chunks so they could
interleave into any free space available....
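
For reference, that option is set at mkfs time and needs a v5 (CRC) 
filesystem and a 4.2+ kernel:

    mkfs.xfs -i sparse=1 /dev/md0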

> >>Would upgrading the kernel help?
> >Not that I know of. If it's an extszhint vs free space fragmentation
> >issue, then a kernel upgrade is unlikely to fix it.

Upgrading the kernel won't fix it, because it's an extszhint vs free
space fragmentation issue.

Filesystems that get into this state are generally considered
unrecoverable.  Well, you can recover them by deleting everything
from them to reform contiguous free space, but you may as well just
mkfs and restore from backup because it's much, much faster than
waiting for rm -rf....

And, really, I expect that a different filesystem geometry and/or
mount options are going to be needed to avoid getting into this
state again. However, I don't yet know enough about what in the
workload and allocator is triggering the issue to say for certain.

Can I get access to the metadump to dig around in the filesystem
directly so I can see how everything has ended up laid out? that
will help me work out what is actually occurring and determine if
mkfs/mount options can address the problem or whether deeper
allocator algorithm changes may be necessary....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: ENOSPC on a 10% used disk
  2018-10-18 10:05     ` Dave Chinner
@ 2018-10-18 11:00       ` Avi Kivity
  2018-10-18 13:36         ` Avi Kivity
                           ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Avi Kivity @ 2018-10-18 11:00 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs


On 18/10/2018 13.05, Dave Chinner wrote:
> [ hmmm, there's some whacky utf-8 whitespace characters in the
>   copy-n-pasted text... ]


It's a brave new world out there.


> On Thu, Oct 18, 2018 at 10:55:18AM +0300, Avi Kivity wrote:
>> On 18/10/2018 04.37, Dave Chinner wrote:
>>> On Wed, Oct 17, 2018 at 10:52:48AM +0300, Avi Kivity wrote:
>>>> I have a user running a 1.7TB filesystem with ~10% usage (as shown
>>>> by df), getting sporadic ENOSPC errors. The disk is mounted with
>>>> inode64 and has a relatively small number of large files. The disk
>>>> is a single-member RAID0 array, with 1MB chunk size. There are 32
> Ok, now I need to know what "single member RAID0 array" means,
> because this is clearly related to allocation alignment and I need
> to know why the FS was configured the way it was.


It's a Linux RAID device, /dev/md0.


We configure it this way so that it's easy to add storage (okay, the 
real reason is probably to avoid special casing one drive).
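
Roughly like this, assuming mdadm (device names hypothetical; IIRC 
mdadm wants --force for a single-member RAID0):

    mdadm --create /dev/md0 --level=0 --raid-devices=1 --force \
          --chunk=1024 /dev/nvme0n1   # 1024KiB chunk = the 1MB above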


>
> Is it one disk? Or is it a hardware RAID0 array that presents as a
> single lun with a stripe width of 1MB? If so, how many disks are in
> it? Is the chunk size the stripe unit (per-disk chunk size) or the
> stripe width (all disks get hit by a 1MB IO)?
>
> Or something else?


One disk, organized into a Linux RAID device with just one member.


>
>>>> AGs. Running Linux 4.9.17.
>>> ENOSPC on what operation? write? open(O_CREAT)? something else?
>>
>> Unknown.
>>
>>
>>> What's the filesystem config (xfs_info output)?
>>
>> (restored from metadata dump)
>>
>>
>> meta-data=/dev/loop2		isize=512 agcount=32, agsize=14494720 blks
>>           =                    sectsz=512 attr=2, projid32bit=1
>>           =                    crc=1 finobt=0 spinodes=0 rmapbt=0
>>           =                    reflink=0
>> data     =                    bsize=4096 blocks=463831040, imaxpct=5
>>           =                    sunit=256 swidth=256 blks
> sunit=swidth is unusual for a RAID0 array, unless it's hardware RAID
> and the array only reports one number to mkfs. Was this chosen by
> mkfs, or specifically configured by the user? If specifically
> configured, why?


I'm guessing it's because it has one member? I'm guessing the usual is 
swidth=sunit*nmembers?


Maybe that configuration confused xfs? Although we've been using it on 
many instances.


>
> What is important is that it means aligned allocations will be used
> for any allocation that is over sunit (1MB) and that's where all the
> problems seem to come from.


Do these aligned allocations not fall back to non-aligned allocations if 
they fail?


>
>> naming   =version 2           bsize=4096 ascii-ci=0 ftype=1
>> log      =internal            bsize=4096 blocks=226480, version=2
>>           =                    sectsz=512 sunit=8 blks, lazy-count=1
>> realtime =none                extsz=4096 blocks=0, rtextents=0
>>
>>> Has xfs_fsr been run on this filesystem
>>> regularly?
>>
>> xfs_fsr has never been run, until we saw the problem (and then did
>> not fix it).  IIUC the workload should be self-defragmenting: it
>> consists of writing large files, then erasing them. I estimate that
>> around 100 files are written concurrently (from 14 threads), and
>> they are written with large extent hints. With every large file,
>> another smaller (but still large) file is written, and a few
>> smallish metadata files.
> Do those smaller files get removed when the big files are removed?


Yes. It's more or less like this:


1. Create two big files, with 32MB hints

2. Append to the two files, using 128k AIO/DIO writes. We truncate ahead 
so those writes are not size-changing.

3. Truncate those files to their final size, write ~5 much smaller files 
using the same pattern

4. A bunch of fdatasyncs, renames, and directory fdatasyncs

5. The two big files get random reads for a random while

6. All files are unlinked (with some rename and directory fdatasyncs so 
we can recover if we crash while doing that)

7. Rinse, repeat. The whole thing happens in parallel for similar and 
different filesizes and lifetimes.


The commitlog files (for which we've seen the error) are simpler: create 
a file with 32MB extent hint, truncate to 32MB size, lots of writes 
(which may not all be 128k).
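
In xfs_io terms the commitlog life cycle is roughly (path hypothetical, 
and the AIO writes simplified to synchronous DIO):

    xfs_io -f -d -c "extsize 32m" -c "truncate 32m" \
           -c "pwrite -b 128k 0 32m" CommitLog-new
    # ...then renames/fdatasyncs, a period of reads, and finally unlink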


>
>> I understood from xfs_fsr that it attempts to defragment files, not
>> free space, although that may come as a side effect. In any case I
>> ran xfs_db after xfs_fsr and did not see an improvement.
> xfs_fsr takes fragmented files and contiguous free space and turns
> it into contiguous files and fragmented free space. You have
> fragmented free space, so I needed to know if xfs_fsr was
> responsible for that....


I see.


>
>>> If the ENOSPC errors are only from files with a 32MB extent size
>>> hints on them, then it may be that there isn't sufficient contiguous
>>> free space to allocate an entire 32MB extent. I'm not sure what the
>>> allocator behaviour here is (the code is a maze of twisty passages),
>>> so I'll have to look more into this.
>> There are other files with 32MB hints that do not show the error
>> (but on the other hand, the error has been observed few enough times
>> for that to be a fluke).
> *nod*
>
>>> In the mean time, can you post the output of the freespace command
>>> (both global and per-ag) so we can see just how much free space
>>> there is and how badly fragmented it has become? I might be able to
>>> reproduce the behaviour if I know the conditions under which it is
>>> occurring.
>>
>> xfs_db> freesp
>>   from      to  extents    blocks    pct
>>   1          1     5916      5916   0.00
>>   2          3    10235     22678   0.01
>>   4          7    12251     66829   0.02
>>   8         15     5521     59556   0.01
>>   16        31     5703    132031   0.03
>>   32        63     9754    463825   0.11
>>   64       127    16742   1590339   0.37
>>   128      255  1550511 390108625  89.87
>>   256      511    71516  29178504   6.72
>>   512     1023       19     15355   0.00
>>   1024    2047      287    461824   0.11
>>   2048    4095      528   1611413   0.37
>>   4096    8191     1537  10352304   2.38
>>   8192   16383        2     19015   0.00
>>
>> Just 2 extents >= 32MB (and they may have been freed after the error).
> Yes, and the vast majority of free space is in lengths between 512kB
> and 1020kB. This is what I'd expect if you have large, stripe
> aligned allocations interleaved with smaller, sub-stripe unit
> allocations.
>
> As an example of behaviour that can lead to this sort of free space
> fragmentation, start with 10 stripe units of contiguous free space:
>
>    0    1    2    3    4    5    6    7    8    9    10
>    +----+----+----+----+----+----+----+----+----+----+----+
>
> Now allocate a > stripe unit extent (say 2 units):
>
>    0    1    2    3    4    5    6    7    8    9    10
>    LLLLLLLLLL+----+----+----+----+----+----+----+----+----+
>
> Now allocate a small file A:
>
>    0    1    2    3    4    5    6    7    8    9    10
>    LLLLLLLLLLAA---+----+----+----+----+----+----+----+----+
>
> Now allocate another large extent:
>
>    0    1    2    3    4    5    6    7    8    9    10
>    LLLLLLLLLLAA---LLLLLLLLLL+----+----+----+----+----+----+
>
> After a while, a significant part of your filesystem looks like
> this repeating pattern:
>
>    0    1    2    3    4    5    6    7    8    9    10
>    LLLLLLLLLLAA---LLLLLLLLLLBB---LLLLLLLLLLCC---LLLLLLLLLLDD---+
>
> i.e. there are lots of small, isolated sub stripe unit free spaces.
> If you now start removing large extents but leaving the small
> files behind, you end up with this:
>
>    0    1    2    3    4    5    6    7    8    9    10
>    LLLLLLLLLLAA---+---------BB---LLLLLLLLLLCC---+----+----DD---+
>
> And now we go to allocate a new large+small file pair (M+n)
> they'll get laid out like this:
>
>    0    1    2    3    4    5    6    7    8    9    10
>    LLLLLLLLLLAA---MMMMMMMMMMBB---LLLLLLLLLLCC---nn---+----DD---+
>
> See how we lost a large aligned 2MB freespace @ 9 when the small
> file "nn" was laid down? repeat this fill and free pattern over and
> over again, and eventually it fragments the free space until there's
> no large contiguous free spaces left, and large aligned extents can
> no longer be allocated.
>
> For this to trigger you need the small files to be larger than 1
> stripe unit, but still much smaller than the extent size hint, and
> the small files need to hang around as the large files come and go.


This can happen, and indeed I see our default hint is 1MB, so our small 
files use a 1MB hint. Looks like we should remove that 1MB hint since 
it's reducing allocation flexibility for XFS without a good return. On 
the other hand, I worry that because we bypass the page cache, XFS 
doesn't get to see the entire file at one time and so it will get 
fragmented.


Suppose I write a 4k file with a 1MB hint. How is that trailing (1MB-4k) 
marked? Free extent, free extent with extra annotation, or allocated 
extent? We may need to deallocate those extents? (will 
FALLOC_FL_PUNCH_HOLE do the trick?)
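
As it turns out below, the speculative preallocation is freed on close, 
so no punching is needed. For reference, punching such a tail would look 
like this (offsets hypothetical for a 4k file with a 1MB hint):

    xfs_io -c "fpunch 4k 1020k" smallfile   # free the (1MB-4k) tail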




>
>>>> Is this a known issue?
> The effect and symptom are known: it's a generic large aligned extent
> vs small unaligned extent issue, but I've never seen it manifest in a
> user workload outside of
> a very constrained multistream realtime video ingest/playout
> workload (i.e. the workload the filestreams allocator was written
> for). And before you ask, no, the filestreams allocator does not
> solve this problem.
>
> The most common manifestation of this problem has been inode
> allocation on filesystems full of small files - inodes are allocated
> in large aligned extents compared to small files, and so eventually
> the filesystem runs out of large contiguous freespace and inodes
> can't be allocated. The sparse inodes mkfs option fixed this by
> allowing inodes to be allocated as sparse chunks so they could
> interleave into any free space available....


Shouldn't XFS fall back to a non-aligned allocation rather than 
returning ENOSPC on a filesystem with 90% free space?


>
>>>> Would upgrading the kernel help?
>>> Not that I know of. If it's an extszhint vs free space fragmentation
>>> issue, then a kernel upgrade is unlikely to fix it.
> Upgrading the kernel won't fix it, because it's an extszhint vs free
> space fragmentation issue.
>
> Filesystems that get into this state are generally considered
> unrecoverable.  Well, you can recover them by deleting everything
> from them to reform contiguous free space, but you may as well just
> mkfs and restore from backup because it's much, much faster than
> waiting for rm -rf....
>
> And, really, I expect that a different filesystem geometry and/or
> mount options are going to be needed to avoid getting into this
> state again. However, I don't yet know enough about what in the
> workload and allocator is triggering the issue to say for certain.
>
> Can I get access to the metadump to dig around in the filesystem
> directly so I can see how everything has ended up laid out? that
> will help me work out what is actually occurring and determine if
> mkfs/mount options can address the problem or whether deeper
> allocator algorithm changes may be necessary....


I will ask permission to share the dump.


Thanks a lot for all the explanations and help.


* Re: ENOSPC on a 10% used disk
  2018-10-18 11:00       ` Avi Kivity
@ 2018-10-18 13:36         ` Avi Kivity
  2018-10-19  7:51           ` Dave Chinner
  2018-10-18 15:44         ` Avi Kivity
  2018-10-19  1:15         ` Dave Chinner
  2 siblings, 1 reply; 26+ messages in thread
From: Avi Kivity @ 2018-10-18 13:36 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs


On 18/10/2018 14.00, Avi Kivity wrote:
>
>> Can I get access to the metadump to dig around in the filesystem
>> directly so I can see how everything has ended up laid out? that
>> will help me work out what is actually occurring and determine if
>> mkfs/mount options can address the problem or whether deeper
>> allocator algorithm changes may be necessary....
>
>
> I will ask permission to share the dump.
>
>
>

I'll send you a link privately.


* Re: ENOSPC on a 10% used disk
  2018-10-18 11:00       ` Avi Kivity
  2018-10-18 13:36         ` Avi Kivity
@ 2018-10-18 15:44         ` Avi Kivity
  2018-10-18 16:11           ` Avi Kivity
  2018-10-19  1:24           ` Dave Chinner
  2018-10-19  1:15         ` Dave Chinner
  2 siblings, 2 replies; 26+ messages in thread
From: Avi Kivity @ 2018-10-18 15:44 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs


On 18/10/2018 14.00, Avi Kivity wrote:
>
>
> This can happen, and indeed I see our default hint is 1MB, so our 
> small files use a 1MB hint. Looks like we should remove that 1MB hint 
> since it's reducing allocation flexibility for XFS without a good return.


I convinced myself that this is the root cause; it fits perfectly with 
your explanation. I still think that XFS should allocate *something* 
rather than ENOSPC, but I can also understand someone wanting a guarantee.


> On the other hand, I worry that because we bypass the page cache, XFS 
> doesn't get to see the entire file at one time and so it will get 
> fragmented.


That's what happens. I wrote 1000 4k writes to 400 files, in parallel, 
AIO+DIO, and got 400 perfectly-fragmented files, each with 1000 extents.
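
The experiment can be approximated with fio; the parameters mirror the 
description above, everything else is an assumption:

    fio --name=frag --directory=/mnt/test --numjobs=400 --size=4000k \
        --bs=4k --rw=write --ioengine=libaio --iodepth=16 --direct=1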


So I'll remove the default hint for small files, and replace it with 
larger buffer sizes so we batch more and don't get 8k-sized extents 
(which is our default buffer size).


>
>
> Suppose I write a 4k file with a 1MB hint. How is that trailing 
> (1MB-4k) marked? Free extent, free extent with extra annotation, or 
> allocated extent? We may need to deallocate those extents? (will 
> FALLOC_FL_PUNCH_HOLE do the trick?)
>

I found an 11-year-old post from you that says those reservations are 
freed on close:


https://linux-xfs.oss.sgi.narkive.com/Bpctu4DN/reducing-memory-requirements-for-high-extent-xfs-files#post6


This is consistent with xfs_db reporting those areas are free.


* Re: ENOSPC on a 10% used disk
  2018-10-17  7:52 ENOSPC on a 10% used disk Avi Kivity
  2018-10-17  8:47 ` Christoph Hellwig
  2018-10-18  1:37 ` Dave Chinner
@ 2018-10-18 15:54 ` Eric Sandeen
  2018-10-21 11:49   ` Avi Kivity
  2019-02-05 21:48 ` Dave Chinner
  3 siblings, 1 reply; 26+ messages in thread
From: Eric Sandeen @ 2018-10-18 15:54 UTC (permalink / raw)
  To: Avi Kivity, linux-xfs

On 10/17/18 2:52 AM, Avi Kivity wrote:
> I have a user running a 1.7TB filesystem with ~10% usage (as shown by df), getting sporadic ENOSPC errors. The disk is mounted with inode64 and has a relatively small number of large files. The disk is a single-member RAID0 array, with 1MB chunk size. There are 32 AGs. Running Linux 4.9.17.
> 
> 
> The write load consists of AIO/DIO writes, followed by unlinks of these files. The writes are non-size-changing (we truncate ahead) and we use XFS_IOC_FSSETXATTR/XFS_FLAG_EXTSIZE with a hint size of 32MB. The errors happen on commit logs, which have a target size of 32MB (but may exceed it a little).
> 
> 
> The errors are sporadic and after restarting the workload they go away for a few hours to a few days, but then return. During one of the crashes I used xfs_db to look at fragmentation and saw that most AGs had free extents of size categories up to 128-255, but a few had more. I tried xfs_fsr but it did not help.
> 
> 
> Is this a known issue? Would upgrading the kernel help?
> 
> 
> I'll try to get a metadata dump next time this happens, and I'll be happy to supply more information.

It sounds like you all figured this out, but I'll drop a reference to
One Weird Trick to figure out just what function is returning a specific
error value (the example below is EINVAL)

First is my hack; what follows is Dave's refinement.  We should get this
into scripts/ some day.

> # for FUNCTION in `grep "t xfs_" /proc/kallsyms | awk '{print $3}'`; do echo "r:ret_$FUNCTION $FUNCTION \$retval" >> /sys/kernel/debug/tracing/kprobe_events; done
> 
> # for ENABLE in /sys/kernel/debug/tracing/events/kprobes/ret_xfs_*/enable; do echo 1 > $ENABLE; done
> 
> run a test that fails:
> 
> # dd if=/dev/zero of=newfile bs=513 oflag=direct
> dd: writing `newfile': Invalid argument
> 
> # for ENABLE in /sys/kernel/debug/tracing/events/kprobes/ret_xfs_*/enable; do echo 0 > $ENABLE; done
> 
> # cat /sys/kernel/debug/tracing/trace
> <snip>
>            <...>-63791 [000] d... 705435.568913: ret_xfs_vn_mknod: (xfs_vn_create+0x13/0x20 [xfs] <- xfs_vn_mknod) arg1=0
>            <...>-63791 [000] d... 705435.568913: ret_xfs_vn_create: (vfs_create+0xdb/0x100 <- xfs_vn_create) arg1=0
>            <...>-63791 [000] d... 705435.568918: ret_xfs_file_open: (do_dentry_open+0x24e/0x2e0 <- xfs_file_open) arg1=0
>            <...>-63791 [000] d... 705435.568934: ret_xfs_file_dio_aio_write: (xfs_file_aio_write+0x147/0x150 [xfs] <- xfs_file_dio_aio_write) arg1=ffffffffffffffea
> 
> Hey look, it's "-22" in hex!  
> 
> so it's possible, but bleah.

Dave later refined that to:

> #!/bin/bash
> 
> TRACEDIR=/sys/kernel/debug/tracing
> 
> grep -i 't xfs_' /proc/kallsyms | awk '{print $3}' | while read F; do
> 	echo "r:ret_$F $F \$retval" >> $TRACEDIR/kprobe_events
> done
> 
> for E in $TRACEDIR/events/kprobes/ret_xfs_*/enable; do
> 	echo 1 > $E
> done;
> 
> echo 'arg1 > 0xffffffffffffff00' > $TRACEDIR/events/kprobes/filter
> 
> for T in $TRACEDIR/events/kprobes/ret_xfs_*/trigger; do
> 	echo 'traceoff if arg1 > 0xffffffffffffff00' > $T
> done



> And that gives:
> 
> # dd if=/dev/zero of=/mnt/scratch/newfile bs=513 oflag=direct
> dd: error writing '/mnt/scratch/newfile': Invalid argument
> 1+0 records in
> 0+0 records out
> 0 bytes (0 B) copied, 0.000259882 s, 0.0 kB/s
> root@test4:~# cat /sys/kernel/debug/tracing/trace
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 1/1   #P:16
> #
> #                              _-----=> irqs-off
> #                             / _----=> need-resched
> #                            | / _---=> hardirq/softirq
> #                            || / _--=> preempt-depth
> #                            ||| /     delay
> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> #              | |       |   ||||       |         |
>            <...>-8073  [006] d... 145740.460546: ret_xfs_file_dio_aio_write: (xfs_file_aio_write+0x170/0x180 <- xfs_file_dio_aio_write) arg1=0xffffffffffffffea
> 
> Which is precisely the detection that XFS_ERROR would have given us.
> Ok, so I guess we can now add whatever we need to that trigger...
> 
> Basically, pass in the XFS function names you want to trace, the
> script sets up the events and whatever trigger behaviour you want, and
> we're off to the races...


* Re: ENOSPC on a 10% used disk
  2018-10-18 15:44         ` Avi Kivity
@ 2018-10-18 16:11           ` Avi Kivity
  2018-10-19  1:24           ` Dave Chinner
  1 sibling, 0 replies; 26+ messages in thread
From: Avi Kivity @ 2018-10-18 16:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs


On 18/10/2018 18.44, Avi Kivity wrote:
>
> On 18/10/2018 14.00, Avi Kivity wrote:
>>
>>
>> This can happen, and indeed I see our default hint is 1MB, so our 
>> small files use a 1MB hint. Looks like we should remove that 1MB hint 
>> since it's reducing allocation flexibility for XFS without a good 
>> return.
>
>
> I convinced myself that this is the root cause, it fits perfectly with 
> your explanation. I still think that XFS should allocate *something* 
> rather than ENOSPC, but I can also understand someone wanting a 
> guarantee.
>

A small twist: there were in fact lots of small files on that system, 
caused by snapshots that the user did not remove. But I think the 
explanation still holds.


* Re: ENOSPC on a 10% used disk
  2018-10-18 11:00       ` Avi Kivity
  2018-10-18 13:36         ` Avi Kivity
  2018-10-18 15:44         ` Avi Kivity
@ 2018-10-19  1:15         ` Dave Chinner
  2018-10-21  9:21           ` Avi Kivity
  2 siblings, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2018-10-19  1:15 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linux-xfs

On Thu, Oct 18, 2018 at 02:00:19PM +0300, Avi Kivity wrote:
> On 18/10/2018 13.05, Dave Chinner wrote:
> >On Thu, Oct 18, 2018 at 10:55:18AM +0300, Avi Kivity wrote:
> >>On 18/10/2018 04.37, Dave Chinner wrote:
> >>>On Wed, Oct 17, 2018 at 10:52:48AM +0300, Avi Kivity wrote:
> >>>>I have a user running a 1.7TB filesystem with ~10% usage (as shown
> >>>>by df), getting sporadic ENOSPC errors. The disk is mounted with
> >>>>inode64 and has a relatively small number of large files. The disk
> >>>>is a single-member RAID0 array, with 1MB chunk size. There are 32
> >Ok, now I need to know what "single member RAID0 array" means,
> >becuase this is clearly related to allocation alignment and I need
> >to know why the FS was configured the way it was.
> 
> 
> It's a Linux RAID device, /dev/md0.
> 
> 
> We configure it this way so that it's easy to add storage (okay, the
> real reason is probably to avoid special casing one drive).

As a stripe? That requires resilvering to expand, which is a slow,
messy operation. There have also been too many horror stories about
crashes during resilvering causing unrecoverable corruptions for my
liking...

> One disk, organized into a Linux RAID device with just one member.

So there's no real need for IO alignment at all. Unaligned writes
to RAID0 don't require RMW cycles, so alignment is really only used
to avoid hotspotting a disk in the stripe. Which isn't an issue
here, either.

> >>meta-data=/dev/loop2		isize=512 agcount=32, agsize=14494720 blks
> >>          =                    sectsz=512 attr=2, projid32bit=1
> >>          =                    crc=1 finobt=0 spinodes=0 rmapbt=0
> >>          =                    reflink=0
> >>data     =                    bsize=4096 blocks=463831040, imaxpct=5
> >>          =                    sunit=256 swidth=256 blks
> >sunit=swidth is unusual for a RAID0 array, unless it's hardware RAID
> >and the array only reports one number to mkfs. Was this chosen by
> >mkfs, or specifically configured by the user? If specifically
> >configured, why?
> 
> 
> I'm guessing it's because it has one member? I'm guessing the usual
> is swidth=sunit*nmembers?

*nod*. Which is unusual for a RAID0 device.

> >What is important is that it means aligned allocations will be used
> >for any allocation that is over sunit (1MB) and that's where all the
> >problems seem to come from.
> 
> Do these aligned allocations not fall back to non-aligned
> allocations if they fail?

They do, but extent size hints change the fallback behaviour...

> >See how we lost a large aligned 2MB freespace @ 9 when the small
> >file "nn" was laid down? repeat this fill and free pattern over and
> >over again, and eventually it fragments the free space until there's
> >no large contiguous free spaces left, and large aligned extents can
> >no longer be allocated.
> >
> >For this to trigger you need the small files to be larger than 1
> >stripe unit, but still much smaller than the extent size hint, and
> >the small files need to hang around as the large files come and go.
> 
> 
> This can happen, and indeed I see our default hint is 1MB, so our
> small files use a 1MB hint.

Ok, which forces all allocations to be at least stripe unit (1MB)
aligned. 

>
> Looks like we should remove that 1MB
> hint since it's reducing allocation flexibility for XFS without a
> good return. On the other hand, I worry that because we bypass the
> page cache, XFS doesn't get to see the entire file at one time and
> so it will get fragmented.

Yes. Your other option is to use an extent size hint that is smaller
than the sunit. That should not align to 1MB because the initial
data allocation size is not large enough to trigger stripe
alignment.
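
A minimal sketch of setting such a sub-sunit hint from userspace via
the fsxattr ioctls in <linux/fs.h> (the 512KB value is illustrative,
error handling is elided, and the hint must be set while the file is
still empty):

#include <sys/ioctl.h>
#include <linux/fs.h>

/* Set a 512KB extent size hint, below the 1MB sunit, so small file
 * allocations are no longer large enough to trigger stripe alignment.
 */
static int set_small_extsize(int fd)
{
	struct fsxattr fsx;

	if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
		return -1;
	fsx.fsx_xflags |= FS_XFLAG_EXTSIZE;	/* the 0x800 xflag */
	fsx.fsx_extsize = 512 * 1024;		/* hint in bytes */
	return ioctl(fd, FS_IOC_FSSETXATTR, &fsx);
}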

> Suppose I write a 4k file with a 1MB hint. How is that trailing
> (1MB-4k) marked? Free extent, free extent with extra annotation, or
> allocated extent? We may need to deallocate those extents? (will
> FALLOC_FL_PUNCH_HOLE do the trick?)

It's an unwritten extent beyond EOF, and how that is treated when
the file is last closed depends on how that extent was allocated.
But, yes, punching the range beyond EOF will definitely free it.
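
A minimal sketch of that punch via fallocate(2) (PUNCH_HOLE must be
paired with KEEP_SIZE; the 1MB hint size is assumed here rather than
queried, and error handling is elided):

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>

/* Free the preallocated space between EOF and the assumed 1MB extent
 * size hint, returning it to the free space pool.
 */
static int punch_tail(int fd)
{
	const off_t hint = 1024 * 1024;
	struct stat st;

	if (fstat(fd, &st) < 0)
		return -1;
	if (st.st_size >= hint)
		return 0;	/* nothing beyond the hint to punch */
	return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			 st.st_size, hint - st.st_size);
}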

> >>>>Is this a known issue?
> >The effect and symptom: it's a generic large aligned extent vs small unaligned extent
> >issue, but I've never seen it manifest in a user workload outside of
> >a very constrained multistream realtime video ingest/playout
> >workload (i.e. the workload the filestreams allocator was written
> >for). And before you ask, no, the filestreams allocator does not
> >solve this problem.
> >
> >The most common manifestation of this problem has been inode
> >allocation on filesystems full of small files - inodes are allocated
> >in large aligned extents compared to small files, and so eventually
> >the filesystem runs out of large contiguous freespace and inodes
> >can't be allocated. The sparse inodes mkfs option fixed this by
> >allowing inodes to be allocated as sparse chunks so they could
> >interleave into any free space available....
> 
> Shouldn't XFS fall back to a non-aligned allocation rather that
> returning ENOSPC on a filesystem with 90% free space?

The filesystem does fall back to unaligned allocation - there's ~5
separate, progressively less strict allocation attempts on failure.

The problem is that the extent size hint is asking to allocate a
contiguous 32MB extent and there's no contiguous 32MB free space
extent available, aligned or not.  That's what I think is generating
the ENOSPC error, but it's not clear to me from the code whether it
is supposed to ignore the extent size hint on failure and allocate a
set of shorter unaligned extents or not....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ENSOPC on a 10% used disk
  2018-10-18 15:44         ` Avi Kivity
  2018-10-18 16:11           ` Avi Kivity
@ 2018-10-19  1:24           ` Dave Chinner
  2018-10-21  9:00             ` Avi Kivity
  1 sibling, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2018-10-19  1:24 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linux-xfs

On Thu, Oct 18, 2018 at 06:44:54PM +0300, Avi Kivity wrote:
> 
> On 18/10/2018 14.00, Avi Kivity wrote:
> >
> >
> >This can happen, and indeed I see our default hint is 1MB, so our
> >small files use a 1MB hint. Looks like we should remove that 1MB
> >hint since it's reducing allocation flexibility for XFS without a
> >good return.
> 
> 
> I convinced myself that this is the root cause, it fits perfectly
> with your explanation. I still think that XFS should allocate
> *something* rather than ENOSPC, but I can also understand someone
> wanting a guarantee.

Yup, it's a classic catch 22.

> >On the other hand, I worry that because we bypass the page cache,
> >XFS doesn't get to see the entire file at one time and so it will
> >get fragmented.
> 
> 
> That's what happens. I write 1000 4k writes to 400 files, in
> parallel, AIO+DIO. I got 400 perfectly-fragmented files, each had
> 1000 extents.

Yup, you wrote them all in the one directory, didn't you? :)

> So I'll remove the default hint for small files, and replace it with
> larger buffer sizes so we batch more and don't get 8k-sized extents
> (which is our default buffer size).

Or you could just mount with the "noalign" mount option to turn off
stripe alignment. After all, you don't need stripe alignment for a
single spindle....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ENSOPC on a 10% used disk
  2018-10-18 13:36         ` Avi Kivity
@ 2018-10-19  7:51           ` Dave Chinner
  2018-10-21  8:55             ` Avi Kivity
  0 siblings, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2018-10-19  7:51 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linux-xfs

On Thu, Oct 18, 2018 at 04:36:42PM +0300, Avi Kivity wrote:
> On 18/10/2018 14.00, Avi Kivity wrote:
> >>Can I get access to the metadump to dig around in the filesystem
> >>directly so I can see how everything has ended up laid out? that
> >>will help me work out what is actually occurring and determine if
> >>mkfs/mount options can address the problem or whether deeper
> >>allocator algorithm changes may be necessary....
> >
> >I will ask permission to share the dump.
> 
> I'll send you a link privately.

Thanks - I've started looking at this - the information here is
just layout stuff - I've omitted filenames and anything else that
might be identifying from the output.

Looking at a commit log file:

stat.size = 33554432
stat.blocks = 34720
fsxattr.xflags = 0x800 [----------e-----]
fsxattr.projid = 0
fsxattr.extsize = 33554432
fsxattr.cowextsize = 0
fsxattr.nextents = 14


and the layout:

EXT: FILE-OFFSET      BLOCK-RANGE            AG AG-OFFSET            TOTAL FLAGS
  0: [0..4079]:       2646677520..2646681599 22 (95606800..95610879)  4080 001010
  1: [4080..8159]:    2643130384..2643134463 22 (92059664..92063743)  4080 001010
  2: [8160..12239]:   2642124816..2642128895 22 (91054096..91058175)  4080 001010
  3: [12240..16319]:  2640666640..2640670719 22 (89595920..89599999)  4080 001010
  4: [16320..18367]:  2640523264..2640525311 22 (89452544..89454591)  2048 000000
  5: [18368..20415]:  2640119808..2640121855 22 (89049088..89051135)  2048 000000
  6: [20416..21287]:  2639874064..2639874935 22 (88803344..88804215)   872 001111
  7: [21288..21295]:  2639874936..2639874943 22 (88804216..88804223)     8 011111
  8: [21296..24495]:  2639874944..2639878143 22 (88804224..88807423)  3200 001010
  9: [24496..26543]:  2639427584..2639429631 22 (88356864..88358911)  2048 000000
 10: [26544..28591]:  2638981120..2638983167 22 (87910400..87912447)  2048 000000
 11: [28592..30639]:  2638770176..2638772223 22 (87699456..87701503)  2048 000000
 12: [30640..31279]:  2638247952..2638248591 22 (87177232..87177871)   640 001111
 13: [31280..34719]:  2638248592..2638252031 22 (87177872..87181311)  3440 011010
 14: [34720..65535]:  hole                                           30816

The first thing I note is the initial allocations are just short of
2MB and so the extent size hint is, indeed, being truncated here
according to contiguous free space limitations. I had thought that
should occur from reading the code, but it's complex and I wasn't
100% certain what minimum allocation length would be used.

Looking at the system batchlog files, I'm guessing the filesystem
ran out of contiguous 32MB free space extents some time around
September 25. The *Data.db files from 24 Sep and earlier are all
nice 32MB extents; from 25 Sep onwards they never make the full
32MB (30-31MB max). eg, good:

 EXT: FILE-OFFSET       BLOCK-RANGE          AG AG-OFFSET            TOTAL FLAGS
   0: [0..65535]:       350524552..350590087  3 (2651272..2716807)   65536 001111
   1: [65536..131071]:  353378024..353443559  3 (5504744..5570279)   65536 001111
   2: [131072..196607]: 355147016..355212551  3 (7273736..7339271)   65536 001111
   3: [196608..262143]: 360029416..360094951  3 (12156136..12221671) 65536 001111
   4: [262144..327679]: 362244144..362309679  3 (14370864..14436399) 65536 001111
   5: [327680..343415]: 365809456..365825191  3 (17936176..17951911) 15736 001111

bad:

EXT: FILE-OFFSET      BLOCK-RANGE            AG AG-OFFSET            TOTAL FLAGS
  0: [0..64127]:       512855496..512919623  4 (49024456..49088583) 64128 001111
  1: [64128..128247]:  266567048..266631167  2 (34651528..34715647) 64120 001010
  2: [128248..142327]: 264401888..264415967  2 (32486368..32500447) 14080 001111
 
Hmmm - there's 2 million files in this filesystem. That is quite a
lot...

Ok... I see where all the files are - there's a db that was
snapshotted every half hour going back to December 19 2017. There's
55GB of snapshot data there: 14362 snapshots holding 1.8 million
files.

Ok, now I understand how the filesystem got into this mess. It has
nothing really to do with the filesystem allocator, geometry, extent
size hints, etc. It isn't really even an XFS specific problem - I
think most filesystems would be in trouble if you did this to them.

First, let me demonstrate that the freespace fragmentation is caused
by these snapshots by removing them all:

before:
   from      to extents  blocks    pct
      1       1    5916    5916   0.00
      2       3   10235   22678   0.01
      4       7   12251   66829   0.02
      8      15    5521   59556   0.01
     16      31    5703  132031   0.03
     32      63    9754  463825   0.11
     64     127   16742 1590339   0.37
    128     255 1550511 390108625  89.87
    256     511   71516 29178504   6.72
    512    1023      19   15355   0.00
   1024    2047     287  461824   0.11
   2048    4095     528 1611413   0.37
   4096    8191    1537 10352304   2.38
   8192   16383       2   19015   0.00

Run a delete:

for d in snapshots/*; do
	rm -rf $d &
done

<cranking along at ~12,000 write iops>

# uptime
17:41:08 up 22:07,  1 user,  load average: 14293.17, 13840.37, 9517.14
#

500,000 files removed:
   from      to extents  blocks    pct
     64     127   22564 2054234   0.47
    128     255  900480 226428059  51.43
    256     511  189904 91033237  20.68
    512    1023   68304 54958788  12.48
   1024    2047   25187 38284024   8.70
   2048    4095    5508 15204528   3.45
   4096    8191    1665 10999789   2.50
   8192   16383      15  139424   0.03

1m files removed:
  from      to extents  blocks    pct
     64     127   21940 1991685   0.45
    128     255  536985 134731402  30.35
    256     511  152092 73465972  16.55
    512    1023  100471 82971130  18.69
   1024    2047   48519 74016490  16.67
   2048    4095   17272 49209538  11.09
   4096    8191    4307 25135374   5.66
   8192   16383     135 1254037   0.28

1.5m files removed:
  from      to extents  blocks    pct
     64     127    9851  924782   0.20
    128     255  227945 57079302  12.32
    256     511   38723 18129086   3.91
    512    1023   33547 28027554   6.05
   1024    2047   31904 50171699  10.83
   2048    4095   25263 75381887  16.27
   4096    8191   16885 102836365  22.19
   8192   16383    6367 68809645  14.85
  16384   32767    1862 40183775   8.67
  32768   65535     385 16228869   3.50
  65536  131071      51 4213237   0.91
 131072  262143       6  958528   0.21

after:
  from      to extents  blocks    pct
    128     255  154063 38785829   8.64
    256     511   11037 4942114   1.10
    512    1023    8576 6930035   1.54
   1024    2047    8496 13464298   3.00
   2048    4095    7664 23034455   5.13
   4096    8191    8497 55217061  12.31
   8192   16383    4233 45867691  10.22
  16384   32767    1533 33488995   7.46
  32768   65535     520 23924895   5.33
  65536  131071     305 28675646   6.39
 131072  262143     230 42411732   9.45
 262144  524287      98 37213190   8.29
 524288 1048575      41 29163579   6.50
1048576 2097151      27 40502889   9.03
2097152 4194303       5 14576157   3.25
4194304 8388607       2 10005670   2.23

Ok, so the result is not perfect, but there are now huge contiguous
free space extents available again - ~70% of the free space is now
contiguous extents >=32MB in length. There's every chance that the
fs would continue to help reform large contiguous free spaces as the
database files come and go now, as long as the snapshot problem is
dealt with. 

So, what's the problem? Well, it's simply that the workload is
mixing data with vastly different temporal characteristics in the
same physical locality. Every half an hour, a set of ~100 smallish
files are written into a new directory which lands them at the low
end of the largest free space extent in that AG. Each new snapshot
directory ends up in a different AG, so it slowly spreads the
snapshots across all the AGs in the filesystem.

Each snapshot effectively appends to the current working area in the
AG, chopping it out of the largest contiguous free space. By the
time the next snapshot in that AG comes around, there's other new
short term data between the old snapshot and the new one. The new
snapshot chops up the largest freespace, and on goes the cycle.

Eventually the short term data between the snapshots gets removed,
but this doesn't reform large contiguous free spaces because the
snapshot data is in the way. And so this cycle continues with the
snapshot data chopping up the largest freespace extents in the
filesystem until there are no more large free space extents to be
found.

The solution is to manage the snapshot data better. We need to keep
all the long term data physically isolated from the short term data
so they don't fragment free space. A short term application level
solution would require migrating the snapshot data out of the
filesystem to somewhere else and pointing to it with symlinks.

From the filesystem POV, I'm not sure that there is much we can do
about this directly - we have no idea what the lifetime of the data
is going to be....

<ding>

Hold on....

<rummage in code>

....we already have an interface for setting those sorts of hints.

fcntl(F_SET_RW_HINT, rw_hint)

/*
 * Valid hint values for F_{GET,SET}_RW_HINT. 0 is "not set", or can be
 * used to clear any hints previously set.
 */
#define RWF_WRITE_LIFE_NOT_SET  0
#define RWH_WRITE_LIFE_NONE     1
#define RWH_WRITE_LIFE_SHORT    2
#define RWH_WRITE_LIFE_MEDIUM   3
#define RWH_WRITE_LIFE_LONG     4
#define RWH_WRITE_LIFE_EXTREME  5
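
A minimal sketch of using it (glibc 2.27 and later expose
F_SET_RW_HINT; the hint is passed by pointer as a uint64_t):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>

/* Tag an fd as holding short-lived data, e.g. a commit log that will
 * be unlinked shortly after it is written.
 */
static int tag_short_lived(int fd)
{
	uint64_t hint = RWH_WRITE_LIFE_SHORT;

	return fcntl(fd, F_SET_RW_HINT, &hint);
}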

Avi, does this sound like something that you could use to
classify the different types of data the database writes out?

I'll need to have a think about how to apply this to the allocator
policy algorithms before going any further, but I suspect making use
of this hint interface will allow us to prevent interleaving of short
and long term data and so avoid the freespace fragmentation it is
causing here....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ENSOPC on a 10% used disk
  2018-10-19  7:51           ` Dave Chinner
@ 2018-10-21  8:55             ` Avi Kivity
  2018-10-21 14:28               ` Dave Chinner
  0 siblings, 1 reply; 26+ messages in thread
From: Avi Kivity @ 2018-10-21  8:55 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs


On 19/10/2018 10.51, Dave Chinner wrote:
> On Thu, Oct 18, 2018 at 04:36:42PM +0300, Avi Kivity wrote:
>> On 18/10/2018 14.00, Avi Kivity wrote:
>>>> Can I get access to the metadump to dig around in the filesystem
>>>> directly so I can see how everything has ended up laid out? that
>>>> will help me work out what is actually occurring and determine if
>>>> mkfs/mount options can address the problem or whether deeper
>>>> allocator algorithm changes may be necessary....
>>> I will ask permission to share the dump.
>> I'll send you a link privately.
> Thanks - I've started looking at this - the information here is
> just layout stuff - I've omitted filenames and anything else that
> might be identifying from the output.
>
> Looking at a commit log file:
>
> stat.size = 33554432
> stat.blocks = 34720
> fsxattr.xflags = 0x800 [----------e-----]
> fsxattr.projid = 0
> fsxattr.extsize = 33554432
> fsxattr.cowextsize = 0
> fsxattr.nextents = 14
>
>
> and the layout:
>
> EXT: FILE-OFFSET      BLOCK-RANGE            AG AG-OFFSET            TOTAL FLAGS
>    0: [0..4079]:       2646677520..2646681599 22 (95606800..95610879)  4080 001010
>    1: [4080..8159]:    2643130384..2643134463 22 (92059664..92063743)  4080 001010
>    2: [8160..12239]:   2642124816..2642128895 22 (91054096..91058175)  4080 001010
>    3: [12240..16319]:  2640666640..2640670719 22 (89595920..89599999)  4080 001010
>    4: [16320..18367]:  2640523264..2640525311 22 (89452544..89454591)  2048 000000
>    5: [18368..20415]:  2640119808..2640121855 22 (89049088..89051135)  2048 000000
>    6: [20416..21287]:  2639874064..2639874935 22 (88803344..88804215)   872 001111
>    7: [21288..21295]:  2639874936..2639874943 22 (88804216..88804223)     8 011111
>    8: [21296..24495]:  2639874944..2639878143 22 (88804224..88807423)  3200 001010
>    9: [24496..26543]:  2639427584..2639429631 22 (88356864..88358911)  2048 000000
>   10: [26544..28591]:  2638981120..2638983167 22 (87910400..87912447)  2048 000000
>   11: [28592..30639]:  2638770176..2638772223 22 (87699456..87701503)  2048 000000
>   12: [30640..31279]:  2638247952..2638248591 22 (87177232..87177871)   640 001111
>   13: [31280..34719]:  2638248592..2638252031 22 (87177872..87181311)  3440 011010
>   14: [34720..65535]:  hole                                           30816
>
> The first thing I note is the initial allocations are just short of
> 2MB and so the extent size hint is, indeed, being truncated here
> according to contiguous free space limitations. I had thought that
> should occur from reading the code, but it's complex and I wasn't
> 100% certain what minimum allocation length would be used.
>
> Looking at the system batchlog files, I'm guessing the filesystem
> ran out of contiguous 32MB free space extents some time around
> September 25. The *Data.db files from 24 Sep and earlier are all
> nice 32MB extents; from 25 Sep onwards they never make the full
> 32MB (30-31MB max). eg, good:
>
>   EXT: FILE-OFFSET       BLOCK-RANGE          AG AG-OFFSET            TOTAL FLAGS
>     0: [0..65535]:       350524552..350590087  3 (2651272..2716807)   65536 001111
>     1: [65536..131071]:  353378024..353443559  3 (5504744..5570279)   65536 001111
>     2: [131072..196607]: 355147016..355212551  3 (7273736..7339271)   65536 001111
>     3: [196608..262143]: 360029416..360094951  3 (12156136..12221671) 65536 001111
>     4: [262144..327679]: 362244144..362309679  3 (14370864..14436399) 65536 001111
>     5: [327680..343415]: 365809456..365825191  3 (17936176..17951911) 15736 001111
>
> bad:
>
> EXT: FILE-OFFSET      BLOCK-RANGE            AG AG-OFFSET            TOTAL FLAGS
>    0: [0..64127]:       512855496..512919623  4 (49024456..49088583) 64128 001111
>    1: [64128..128247]:  266567048..266631167  2 (34651528..34715647) 64120 001010
>    2: [128248..142327]: 264401888..264415967  2 (32486368..32500447) 14080 001111
>   


So extent size is a hint but the extent alignment is a hard requirement. 
Since eventually the ENOSPC happened due to the alignment restriction, I 
think the alignment requirement should be made a hint too.


> Hmmm - there's 2 million files in this filesystem. That is quite a
> lot...
>
> Ok... I see where all the files are - there's a db that was
> snapshotted every half hour going back to December 19 2017. There's
> 55GB of snapshot data there: 14362 snapshots holding 1.8 million
> files.
>
> Ok, now I understand how the filesystem got into this mess. It has
> nothing really to do with the filesystem allocator, geometry, extent
> size hints, etc. It isn't really even an XFS specific problem - I
> think most filesystems would be in trouble if you did this to them.


Well, if you create snapshots and never delete them you'd run into a 
real ENOSPC sooner or later, so the main problem was lack of snapshot 
hygiene. But it did trigger a premature ENOSPC due to the alignment
restriction on those small files with hints (which I'm going to remove).


>
> First, let me demonstrate that the freespace fragmentation is caused
> by these snapshots by removing them all:
>
> before:
>     from      to extents  blocks    pct
>        1       1    5916    5916   0.00
>        2       3   10235   22678   0.01
>        4       7   12251   66829   0.02
>        8      15    5521   59556   0.01
>       16      31    5703  132031   0.03
>       32      63    9754  463825   0.11
>       64     127   16742 1590339   0.37
>      128     255 1550511 390108625  89.87
>      256     511   71516 29178504   6.72
>      512    1023      19   15355   0.00
>     1024    2047     287  461824   0.11
>     2048    4095     528 1611413   0.37
>     4096    8191    1537 10352304   2.38
>     8192   16383       2   19015   0.00
>
> Run a delete:
>
> for d in snapshots/*; do
> 	rm -rf $d &
> done
>
> <cranking along at ~12,000 write iops>
>
> # uptime
> 17:41:08 up 22:07,  1 user,  load average: 14293.17, 13840.37, 9517.14
> #
>
> 500,000 files removed:
>     from      to extents  blocks    pct
>       64     127   22564 2054234   0.47
>      128     255  900480 226428059  51.43
>      256     511  189904 91033237  20.68
>      512    1023   68304 54958788  12.48
>     1024    2047   25187 38284024   8.70
>     2048    4095    5508 15204528   3.45
>     4096    8191    1665 10999789   2.50
>     8192   16383      15  139424   0.03
>
> 1m files removed:
>    from      to extents  blocks    pct
>       64     127   21940 1991685   0.45
>      128     255  536985 134731402  30.35
>      256     511  152092 73465972  16.55
>      512    1023  100471 82971130  18.69
>     1024    2047   48519 74016490  16.67
>     2048    4095   17272 49209538  11.09
>     4096    8191    4307 25135374   5.66
>     8192   16383     135 1254037   0.28
>
> 1.5m files removed:
>    from      to extents  blocks    pct
>       64     127    9851  924782   0.20
>      128     255  227945 57079302  12.32
>      256     511   38723 18129086   3.91
>      512    1023   33547 28027554   6.05
>     1024    2047   31904 50171699  10.83
>     2048    4095   25263 75381887  16.27
>     4096    8191   16885 102836365  22.19
>     8192   16383    6367 68809645  14.85
>    16384   32767    1862 40183775   8.67
>    32768   65535     385 16228869   3.50
>    65536  131071      51 4213237   0.91
>   131072  262143       6  958528   0.21
>
> after:
>    from      to extents  blocks    pct
>      128     255  154063 38785829   8.64
>      256     511   11037 4942114   1.10
>      512    1023    8576 6930035   1.54
>     1024    2047    8496 13464298   3.00
>     2048    4095    7664 23034455   5.13
>     4096    8191    8497 55217061  12.31
>     8192   16383    4233 45867691  10.22
>    16384   32767    1533 33488995   7.46
>    32768   65535     520 23924895   5.33
>    65536  131071     305 28675646   6.39
>   131072  262143     230 42411732   9.45
>   262144  524287      98 37213190   8.29
>   524288 1048575      41 29163579   6.50
> 1048576 2097151      27 40502889   9.03
> 2097152 4194303       5 14576157   3.25
> 4194304 8388607       2 10005670   2.23
>
> >Ok, so the result is not perfect, but there are now huge contiguous
> free space extents available again - ~70% of the free space is now
> contiguous extents >=32MB in length. There's every chance that the
> >fs would continue to help reform large contiguous free spaces as the
> database files come and go now, as long as the snapshot problem is
> dealt with.
>
> So, what's the problem? Well, it's simply that the workload is
> mixing data with vastly different temporal characteristics in the
> same physical locality. Every half an hour, a set of ~100 smallish
> files are written into a new directory which lands them at the low
> >end of the largest free space extent in that AG. Each new snapshot
> directory ends up in a different AG, so it slowly spreads the
> snapshots across all the AGs in the filesystem.


Not exactly - those snapshots are hard links into the live database 
files, which eventually get removed. Usually, small files get removed 
early, but with the snapshots they get to live forever.


> >Each snapshot effectively appends to the current working area in the
> AG, chopping it out of the largest contiguous free space. By the
> time the next snapshot in that AG comes around, there's other new
> short term data between the old snapshot and the new one. The new
> snapshot chops up the largest freespace, and on goes the cycle.
>
> Eventually the short term data between the snapshots gets removed,
> but this doesn't reform large contiguous free spaces because the
> snapshot data is in the way. And so this cycle continues with the
> snapshot data chopping up the largest freespace extents in the
> >filesystem until there are no more large free space extents to be
> found.
>
> The solution is to manage the snapshot data better. We need to keep
> all the long term data physically isolated from the short term data
> so they don't fragment free space. A short term application level
> solution would require migrating the snapshot data out of the
> >filesystem to somewhere else and pointing to it with symlinks.


Snapshots should not live forever on the disk. The procedure is to 
create a snapshot, copy it away, and then delete the snapshot. It's okay 
to let snapshots live for a while, but not all of them and not without a 
bound on their lifetime.


The filesystem did have a role in this, by requiring alignment of the
extent to the RAID stripe size. Now, given that this was a RAID with one
member, alignment is pointless, but most of our deployments are to RAID
arrays with >1 members, and alignment does save 12.5% of IOPS compared
to un-aligned extents for compactions and writes (our scans/writes use
128k buffers, and the alignment is to 1MB; in an unaligned extent, one
in eight 128k I/Os can straddle a 1MB chunk boundary and be split in
two, turning 8 device ops into 9). The database caused the problem by
indirectly requiring 1MB alignment for files that are much smaller than
1MB, and the user contributed to the problem by causing millions of
such small files to be kept.


>
>  From the filesystem POV, I'm not sure that there is much we can do
> about this directly - we have no idea what the lifetime of the data
> is going to be....
>
> <ding>
>
> Hold on....
>
> <rummage in code>
>
> >....we already have an interface for setting those sorts of hints.
>
> fcntl(F_SET_RW_HINT, rw_hint)
>
> /*
>   * Valid hint values for F_{GET,SET}_RW_HINT. 0 is "not set", or can be
>   * used to clear any hints previously set.
>   */
> #define RWF_WRITE_LIFE_NOT_SET  0
> #define RWH_WRITE_LIFE_NONE     1
> #define RWH_WRITE_LIFE_SHORT    2
> #define RWH_WRITE_LIFE_MEDIUM   3
> #define RWH_WRITE_LIFE_LONG     4
> #define RWH_WRITE_LIFE_EXTREME  5
>
> Avi, does this sound like something that you could use to
> >classify the different types of data the database writes out?


So long as the penalty for a mis-classification is not too large, we
can for sure. Commitlog files have a short lifespan, and so do newly
born small data files. Those small data files are compacted into
increasingly large, long-lived files, and this information is known at
the time of creation.


Even without the filesystem altering its allocation according to the 
hint, this is still useful, since the disk will alter its internal 
allocation and maybe do something useful with it (as long as the 
filesystem passes the hint to the disk).


>
> I'll need to have a think about how to apply this to the allocator
> policy algorithms before going any further, but I suspect making use
> >of this hint interface will allow us to prevent interleaving of short
> >and long term data and so avoid the freespace fragmentation it is
> causing here....


IIUC, the problem (of having ENOSPC on a 10% used disk) is not 
fragmentation per se, it's the alignment requirement. To take it to the
extreme, a 1TB disk can only hold a million files if those files must be
aligned to 1MB, even if everything is perfectly laid out. For sure 
fragmentation would have degraded performance sooner or later, but 
that's not as bad as that ENOSPC.


I'm addressing the ENOSPC by removing the extent allocation hint on
files that are known to be small (and increasing their application
buffer sizes). In fact that will increase fragmentation, as the
filesystem will allocate one extent per buffer rather than one extent
for the entire file. But given that the extent size is treated as a
hint (or so I infer from the fact that we have <32MB extents), I think
the alignment should be treated as a hint too. Perhaps allocation with
a hint should be performed in two passes, first trying to match size
and alignment, and second relaxing both restrictions.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ENSOPC on a 10% used disk
  2018-10-19  1:24           ` Dave Chinner
@ 2018-10-21  9:00             ` Avi Kivity
  2018-10-21 14:34               ` Dave Chinner
  0 siblings, 1 reply; 26+ messages in thread
From: Avi Kivity @ 2018-10-21  9:00 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs


On 19/10/2018 04.24, Dave Chinner wrote:
> On Thu, Oct 18, 2018 at 06:44:54PM +0300, Avi Kivity wrote:
>> On 18/10/2018 14.00, Avi Kivity wrote:
>>>
>>> This can happen, and indeed I see our default hint is 1MB, so our
>>> small files use a 1MB hint. Looks like we should remove that 1MB
>>> hint since it's reducing allocation flexibility for XFS without a
>>> good return.
>>
>> I convinced myself that this is the root cause, it fits perfectly
>> with your explanation. I still think that XFS should allocate
>> *something* rather than ENOSPC, but I can also understand someone
>> wanting a guarantee.
> Yup, it's a classic catch 22.
>
>>> On the other hand, I worry that because we bypass the page cache,
>>> XFS doesn't get to see the entire file at one time and so it will
>>> get fragmented.
>>
>> That's what happens. I write 1000 4k writes to 400 files, in
>> parallel, AIO+DIO. I got 400 perfectly-fragmented files, each had
>> 1000 extents.
> Yup, you wrote them all in the one directory, didn't you? :)


Yes :(


But if I have more concurrently-written files than AGs, I'd get the same 
behavior with multiple directories, no?


>> So I'll remove the default hint for small files, and replace it with
>> larger buffer sizes so we batch more and don't get 8k-sized extents
>> (which is our default buffer size).
> Or you could just mount with the "noalign" mount option to turn off
> stripe alignment. After all, you don't need stripe alignment for a
> single spindle....


For a single spindle, sure. But most deployments have multiple spindles.


Since these aren't real spindles, the advantages of alignment are not as 
great, but they still exist. The files are written with aligned offsets, 
and some of the reads are also aligned, so it saves IOPS whenever we 
cross an alignment boundary.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ENSOPC on a 10% used disk
  2018-10-19  1:15         ` Dave Chinner
@ 2018-10-21  9:21           ` Avi Kivity
  2018-10-21 15:06             ` Dave Chinner
  0 siblings, 1 reply; 26+ messages in thread
From: Avi Kivity @ 2018-10-21  9:21 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs


On 19/10/2018 04.15, Dave Chinner wrote:
> On Thu, Oct 18, 2018 at 02:00:19PM +0300, Avi Kivity wrote:
>> On 18/10/2018 13.05, Dave Chinner wrote:
>>> On Thu, Oct 18, 2018 at 10:55:18AM +0300, Avi Kivity wrote:
>>>> On 18/10/2018 04.37, Dave Chinner wrote:
>>>>> On Wed, Oct 17, 2018 at 10:52:48AM +0300, Avi Kivity wrote:
>>>>>> I have a user running a 1.7TB filesystem with ~10% usage (as shown
>>>>>> by df), getting sporadic ENOSPC errors. The disk is mounted with
>>>>>> inode64 and has a relatively small number of large files. The disk
>>>>>> is a single-member RAID0 array, with 1MB chunk size. There are 32
>>> Ok, now I need to know what "single member RAID0 array" means,
>>> becuase this is clearly related to allocation alignment and I need
>>> to know why the FS was configured the way it was.
>>
>> It's a Linux RAID device, /dev/md0.
>>
>>
>> We configure it this way so that it's easy to add storage (okay, the
>> real reason is probably to avoid special casing one drive).
> As a stripe? That requires resilvering to expand, which is a slow,
> messy operation. There have also been too many horror stories about
> crashes during resilvering causing unrecoverable corruptions for my
> liking...


Like I said, the real reason is to avoid a special case for one disk. I 
don't think we, or one of our users, ever expanded a RAID array in this way.


>
>> One disk, organized into a Linux RAID device with just one member.
> So there's no real need for IO alignment at all. Unaligned writes
> to RAID0 don't require RMW cycles, so alignment is really only used
> to avoid hotspotting a disk in the stripe. Which isn't an issue
> here, either.


It does help (for >1 member arrays) by avoiding a logically aligned read
or write being split into two ops targeting two disks.


>>>> meta-data=/dev/loop2		isize=512 agcount=32, agsize=14494720 blks
>>>>           =                    sectsz=512 attr=2, projid32bit=1
>>>>           =                    crc=1 finobt=0 spinodes=0 rmapbt=0
>>>>           =                    reflink=0
>>>> data     =                    bsize=4096 blocks=463831040, imaxpct=5
>>>>           =                    sunit=256 swidth=256 blks
>>> sunit=swidth is unusual for a RAID0 array, unless it's hardware RAID
>>> and the array only reports one number to mkfs. Was this chosen by
>>> mkfs, or specifically configured by the user? If specifically
>>> configured, why?
>>
>> I'm guessing it's because it has one member? I'm guessing the usual
>> is swidth=sunit*nmembers?
> *nod*. Which is unusual for a RAID0 device.
>
>>> What is important is that it means aligned allocations will be used
>>> for any allocation that is over sunit (1MB) and that's where all the
>>> problems seem to come from.
>> Do these aligned allocations not fall back to non-aligned
>> allocations if they fail?
> They do, but extent size hints change the fallback behaviour...
>
>>> See how we lost a large aligned 2MB freespace @ 9 when the small
>>> file "nn" was laid down? repeat this fill and free pattern over and
>>> over again, and eventually it fragments the free space until there's
>>> no large contiguous free spaces left, and large aligned extents can
>>> no longer be allocated.
>>>
>>> For this to trigger you need the small files to be larger than 1
>>> stripe unit, but still much smaller than the extent size hint, and
>>> the small files need to hang around as the large files come and go.
>>
>> This can happen, and indeed I see our default hint is 1MB, so our
>> small files use a 1MB hint.
> Ok, which forces all allocations to be at least stripe unit (1MB)
> aligned.


If the hint were smaller than the stripe unit, would it remove the 
alignment requirement? I see you answered below.




>> Looks like we should remove that 1MB
>> hint since it's reducing allocation flexibility for XFS without a
>> good return. On the other hand, I worry that because we bypass the
>> page cache, XFS doesn't get to see the entire file at one time and
>> so it will get fragmented.
> Yes. Your other option is to use an extent size hint that is smaller
> than the sunit. That should not align to 1MB because the initial
> data allocation size is not large enough to trigger stripe
> alignment.


Wow, so we had so many factors leading to this:

- 1-disk installations arranged as RAID0 even though not strictly needed

- having a default extent allocation hint, even for small files

- having that default hint be >= the stripe unit size

- the user not removing snapshots

- XFS not falling back to unaligned allocations


>> Suppose I write a 4k file with a 1MB hint. How is that trailing
>> (1MB-4k) marked? Free extent, free extent with extra annotation, or
>> allocated extent? We may need to deallocate those extents? (will
>> FALLOC_FL_PUNCH_HOLE do the trick?)
> It's an unwritten extent beyond EOF, and how that is treated when
> the file is last closed depends on how that extent was allocated.
> But, yes, punching the range beyond EOF will definitely free it.


I think we can conclude from the dump that the filesystem freed it?


>>>>>> Is this a known issue?
>>> The effect and symptom: it's a generic large aligned extent vs small unaligned extent
>>> issue, but I've never seen it manifest in a user workload outside of
>>> a very constrained multistream realtime video ingest/playout
>>> workload (i.e. the workload the filestreams allocator was written
>>> for). And before you ask, no, the filestreams allocator does not
>>> solve this problem.
>>>
>>> The most common manifestation of this problem has been inode
>>> allocation on filesystems full of small files - inodes are allocated
>>> in large aligned extents compared to small files, and so eventually
>>> the filesystem runs out of large contiguous freespace and inodes
>>> can't be allocated. The sparse inodes mkfs option fixed this by
>>> allowing inodes to be allocated as sparse chunks so they could
>>> interleave into any free space available....
>> Shouldn't XFS fall back to a non-aligned allocation rather that
>> returning ENOSPC on a filesystem with 90% free space?
> The filesystem does fall back to unaligned allocation - there's ~5
> separate, progressively less strict allocation attempts on failure.
>
> The problem is that the extent size hint is asking to allocate a
> contiguous 32MB extent and there's no contiguous 32MB free space
> extent available, aligned or not.  That's what I think is generating
> the ENOSPC error, but it's not clear to me from the code whether it
> is supposed to ignore the extent size hint on failure and allocate a
> set of shorter unaligned extents or not....


Here's a file from the dump:


  ext:     logical_offset:        physical_offset: length: expected: flags:
    0:        0..    1eb2:    3928e00..   392acb2:   1eb3:
    1:     1eb3..    3cb2:    3c91200..   3c92fff:   1e00: 392acb3:
    2:     3cb3..    57b2:    3454100..   3455bff:   1b00: 3c93000:
    3:     57b3..    6fb2:    34ecd00..   34ee4ff:   1800: 3455c00:
    4:     6fb3..    85fe:    3386a00..   338804b:   164c: 34ee500:
    5:     85ff..    9c0b:    2c85c00..   2c8720c:   160d: 338804c:
    6:     9c0c..    b217:    3099900..   309af0b:   160c: 2c8720d:
    7:     b218..    c823:    34fb300..   34fc90b:   160c: 309af0c:
    8:     c824..    de2b:    315ef00..   3160507:   1608: 34fc90c:
    9:     de2c..    f42f:    36adc00..   36af203:   1604: 3160508:
   10:     f430..   10a30:    2cf4400..   2cf5a00:   1601: 36af204:
   11:    10a31..   12030:    2e03300..   2e048ff:   1600: 2cf5a01:
   12:    12031..   13630:    2ff5200..   2ff67ff:   1600: 2e04900:
   13:    13631..   14c30:    3199e00..   319b3ff:   1600: 2ff6800:
   14:    14c31..   16230:    32ed500..   32eeaff:   1600: 319b400:
   15:    16231..   17830:    34a0b00..   34a20ff:   1600: 32eeb00:
   16:    17831..   18e30:    354e700..   354fcff:   1600: 34a2100:
   17:    18e31..   1a430:    362c400..   362d9ff:   1600: 354fd00:
   18:    1a431..   1ba1d:    3192b00..   31940ec:   15ed: 362da00:
   19:    1ba1e..   1d05c:    4228500..   4229b3e:   163f: 31940ed:
   20:    1d05d..   1e692:    3f6c900..   3f6df35:   1636: 4229b3f:
   21:    1e693..   1fcc0:    37d4400..   37d5a2d:   162e: 3f6df36:
   22:    1fcc1..   212e4:    43f9c00..   43fb223:   1624: 37d5a2e:
   23:    212e5..   22905:    4003500..   4004b20:   1621: 43fb224:
   24:    22906..   23803:    1fdb900..   1fdc7fd:    efe: 4004b21: last,eof


So, lengths are not always aligned, but physical_offset always is. So 
XFS relaxes the extent size hint but not alignment.


It looks like XFS allocates one extent and moves on, not trying to 
allocate all the way to the 32MB hint size. If that were the case, we'd 
see logical_offset restore alignment every 32MB.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ENSOPC on a 10% used disk
  2018-10-18 15:54 ` Eric Sandeen
@ 2018-10-21 11:49   ` Avi Kivity
  0 siblings, 0 replies; 26+ messages in thread
From: Avi Kivity @ 2018-10-21 11:49 UTC (permalink / raw)
  To: Eric Sandeen, linux-xfs


On 18/10/2018 18.54, Eric Sandeen wrote:
> On 10/17/18 2:52 AM, Avi Kivity wrote:
>> I have a user running a 1.7TB filesystem with ~10% usage (as shown by df), getting sporadic ENOSPC errors. The disk is mounted with inode64 and has a relatively small number of large files. The disk is a single-member RAID0 array, with 1MB chunk size. There are 32 AGs. Running Linux 4.9.17.
>>
>>
>> The write load consists of AIO/DIO writes, followed by unlinks of these files. The writes are non-size-changing (we truncate ahead) and we use XFS_IOC_FSSETXATTR/XFS_FLAG_EXTSIZE with a hint size of 32MB. The errors happen on commit logs, which have a target size of 32MB (but may exceed it a little).
>>
>>
>> The errors are sporadic and after restarting the workload they go away for a few hours to a few days, but then return. During one of the crashes I used xfs_db to look at fragmentation and saw that most AGs had free extents of size categories up to 128-255, but a few had more. I tried xfs_fsr but it did not help.
>>
>>
>> Is this a known issue? Would upgrading the kernel help?
>>
>>
>> I'll try to get a metadata dump next time this happens, and I'll be happy to supply more information.
> It sounds like you all figured this out, but I'll drop a reference to
> One Weird Trick to figure out just what function is returning a specific
> error value (the example below is EINVAL)
>
> First is my hack; what follows is Dave's refinement.  We should get this
> into scripts/ some day.


Cool, although to get noticed these days you have to put in bpf 
somewhere (and probably it can help with some kernel-side filtering - 
start logging as soon as you see the error, and hopefully you can 
recover the path from the returns).


>> # for FUNCTION in `grep "t xfs_" /proc/kallsyms | awk '{print $3}'`; do echo "r:ret_$FUNCTION $FUNCTION \$retval" >> /sys/kernel/debug/tracing/kprobe_events; done
>>
>> # for ENABLE in /sys/kernel/debug/tracing/events/kprobes/ret_xfs_*/enable; do echo 1 > $ENABLE; done
>>
>> run a test that fails:
>>
>> # dd if=/dev/zero of=newfile bs=513 oflag=direct
>> dd: writing `newfile': Invalid argument
>>
>> # for ENABLE in /sys/kernel/debug/tracing/events/kprobes/ret_xfs_*/enable; do echo 0 > $ENABLE; done
>>
>> # cat /sys/kernel/debug/tracing/trace
>> <snip>
>>             <...>-63791 [000] d... 705435.568913: ret_xfs_vn_mknod: (xfs_vn_create+0x13/0x20 [xfs] <- xfs_vn_mknod) arg1=0
>>             <...>-63791 [000] d... 705435.568913: ret_xfs_vn_create: (vfs_create+0xdb/0x100 <- xfs_vn_create) arg1=0
>>             <...>-63791 [000] d... 705435.568918: ret_xfs_file_open: (do_dentry_open+0x24e/0x2e0 <- xfs_file_open) arg1=0
>>             <...>-63791 [000] d... 705435.568934: ret_xfs_file_dio_aio_write: (xfs_file_aio_write+0x147/0x150 [xfs] <- xfs_file_dio_aio_write) arg1=ffffffffffffffea
>>
>> Hey look, it's "-22" in hex!
>>
>> so it's possible, but bleah.
> Dave later refined that to:
>
>> #!/bin/bash
>>
>> TRACEDIR=/sys/kernel/debug/tracing
>>
>> grep -i 't xfs_' /proc/kallsyms | awk '{print $3}' | while read F; do
>> 	echo "r:ret_$F $F \$retval" >> $TRACEDIR/kprobe_events
>> done
>>
>> for E in $TRACEDIR/events/kprobes/ret_xfs_*/enable; do
>> 	echo 1 > $E
>> done;
>>
>> echo 'arg1 > 0xffffffffffffff00' > $TRACEDIR/events/kprobes/filter
>>
>> for T in $TRACEDIR/events/kprobes/ret_xfs_*/trigger; do
>> 	echo 'traceoff if arg1 > 0xffffffffffffff00' > $T
>> done
>
>
>> And that gives:
>>
>> # dd if=/dev/zero of=/mnt/scratch/newfile bs=513 oflag=direct
>> dd: error writing '/mnt/scratch/newfile': Invalid argument
>> 1+0 records in
>> 0+0 records out
>> 0 bytes (0 B) copied, 0.000259882 s, 0.0 kB/s
>> root@test4:~# cat /sys/kernel/debug/tracing/trace
>> # tracer: nop
>> #
>> # entries-in-buffer/entries-written: 1/1   #P:16
>> #
>> #                              _-----=> irqs-off
>> #                             / _----=> need-resched
>> #                            | / _---=> hardirq/softirq
>> #                            || / _--=> preempt-depth
>> #                            ||| /     delay
>> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
>> #              | |       |   ||||       |         |
>>             <...>-8073  [006] d... 145740.460546: ret_xfs_file_dio_aio_write: (xfs_file_aio_write+0x170/0x180 <- xfs_file_dio_aio_write) arg1=0xffffffffffffffea
>>
>> Which is precisely the detection that XFS_ERROR would have given us.
>> Ok, so I guess we can now add whatever we need to that trigger...
>>
>> Basically, pass in the XFS function names you want to trace, that
>> sets up the events and whatever trigger behaviour you want, and
>> we're off to the races...
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ENSOPC on a 10% used disk
  2018-10-21  8:55             ` Avi Kivity
@ 2018-10-21 14:28               ` Dave Chinner
  2018-10-22  8:35                 ` Avi Kivity
  0 siblings, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2018-10-21 14:28 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linux-xfs

On Sun, Oct 21, 2018 at 11:55:47AM +0300, Avi Kivity wrote:
> 
> On 19/10/2018 10.51, Dave Chinner wrote:
> >On Thu, Oct 18, 2018 at 04:36:42PM +0300, Avi Kivity wrote:
> >>On 18/10/2018 14.00, Avi Kivity wrote:
> >>>>Can I get access to the metadump to dig around in the filesystem
> >>>>directly so I can see how everything has ended up laid out? that
> >>>>will help me work out what is actually occurring and determine if
> >>>>mkfs/mount options can address the problem or whether deeper
> >>>>allocator algorithm changes may be necessary....
> >>>I will ask permission to share the dump.
> >>I'll send you a link privately.
> >Thanks - I've started looking at this - the information here is
> >just layout stuff - I've omitted filenames and anything else that
> >might be identifying from the output.
> >
> >Looking at a commit log file:
> >
> >stat.size = 33554432
> >stat.blocks = 34720
> >fsxattr.xflags = 0x800 [----------e-----]
> >fsxattr.projid = 0
> >fsxattr.extsize = 33554432
> >fsxattr.cowextsize = 0
> >fsxattr.nextents = 14
> >
> >
> >and the layout:
> >
> >EXT: FILE-OFFSET      BLOCK-RANGE            AG AG-OFFSET            TOTAL FLAGS
> >   0: [0..4079]:       2646677520..2646681599 22 (95606800..95610879)  4080 001010
> >   1: [4080..8159]:    2643130384..2643134463 22 (92059664..92063743)  4080 001010
> >   2: [8160..12239]:   2642124816..2642128895 22 (91054096..91058175)  4080 001010
> >   3: [12240..16319]:  2640666640..2640670719 22 (89595920..89599999)  4080 001010
> >   4: [16320..18367]:  2640523264..2640525311 22 (89452544..89454591)  2048 000000
> >   5: [18368..20415]:  2640119808..2640121855 22 (89049088..89051135)  2048 000000
> >   6: [20416..21287]:  2639874064..2639874935 22 (88803344..88804215)   872 001111
> >   7: [21288..21295]:  2639874936..2639874943 22 (88804216..88804223)     8 011111
> >   8: [21296..24495]:  2639874944..2639878143 22 (88804224..88807423)  3200 001010
> >   9: [24496..26543]:  2639427584..2639429631 22 (88356864..88358911)  2048 000000
> >  10: [26544..28591]:  2638981120..2638983167 22 (87910400..87912447)  2048 000000
> >  11: [28592..30639]:  2638770176..2638772223 22 (87699456..87701503)  2048 000000
> >  12: [30640..31279]:  2638247952..2638248591 22 (87177232..87177871)   640 001111
> >  13: [31280..34719]:  2638248592..2638252031 22 (87177872..87181311)  3440 011010
> >  14: [34720..65535]:  hole                                           30816
> >
> >The first thing I note is the initial allocations are just short of
> >2MB and so the extent size hint is, indeed, being truncated here
> >according to contiguous free space limitations. I had thought that
> >should occur from reading the code, but it's complex and I wasn't
> >100% certain what minimum allocation length would be used.
> >
> >Looking at the system batchlog files, I'm guessing the filesystem
> >ran out of contiguous 32MB free space extents some time around
> >September 25. The *Data.db files from 24 Sep and earlier are all
> >nice 32MB extents; from 25 Sep onwards they never make the full
> >32MB (30-31MB max). eg, good:
> >
> >  EXT: FILE-OFFSET       BLOCK-RANGE          AG AG-OFFSET            TOTAL FLAGS
> >    0: [0..65535]:       350524552..350590087  3 (2651272..2716807)   65536 001111
> >    1: [65536..131071]:  353378024..353443559  3 (5504744..5570279)   65536 001111
> >    2: [131072..196607]: 355147016..355212551  3 (7273736..7339271)   65536 001111
> >    3: [196608..262143]: 360029416..360094951  3 (12156136..12221671) 65536 001111
> >    4: [262144..327679]: 362244144..362309679  3 (14370864..14436399) 65536 001111
> >    5: [327680..343415]: 365809456..365825191  3 (17936176..17951911) 15736 001111
> >
> >bad:
> >
> >EXT: FILE-OFFSET      BLOCK-RANGE            AG AG-OFFSET            TOTAL FLAGS
> >   0: [0..64127]:       512855496..512919623  4 (49024456..49088583) 64128 001111
> >   1: [64128..128247]:  266567048..266631167  2 (34651528..34715647) 64120 001010
> >   2: [128248..142327]: 264401888..264415967  2 (32486368..32500447) 14080 001111
> 
> 
> So extent size is a hint but the extent alignment is a hard
> requirement.

No, physical alignment is being ignored here, too. Those flags on
the end?

 FLAG Values:
    0100000 Shared extent
    0010000 Unwritten preallocated extent
    0001000 Doesn't begin on stripe unit
    0000100 Doesn't end   on stripe unit
    0000010 Doesn't begin on stripe width
    0000001 Doesn't end   on stripe width

When you have 001111, the allocation was completely unaligned.
When you have 001010, the tail is stripe aligned
When you have 000000, the head and tail are stripe aligned

As you can see, there is a mix of aligned, tail aligned and
completely unaligned extents.

So, no, XFS is dropping both size hints and alignment hints when
it starts running out of aligned contiguous free space extents.

> >Ok, so the result is not perfect, but there are now huge contiguous
> >free space extents available again - ~70% of the free space is now
> >contiguous extents >=32MB in length. There's every chance that the
> >fs would continue to help reform large contiguous free spaces as the
> >database files come and go now, as long as the snapshot problem is
> >dealt with.
> >
> >So, what's the problem? Well, it's simply that the workload is
> >mixing data with vastly different temporal characteristics in the
> >same physical locality. Every half an hour, a set of ~100 smallish
> >files are written into a new directory which lands them at the low
> >end of the largest free space extent in that AG. Each new snapshot
> >directory ends up in a different AG, so it slowly spreads the
> >snapshots across all the AGs in the filesystem.
> 
> 
> Not exactly - those snapshots are hard links into the live database
> files, which eventually get removed. Usually, small files get
> removed early, but with the snapshots they get to live forever.

They might be created as hard links, but the effect when the
original database file links are removed is the same - the snapshotted
data lives forever, interleaved amongst short term data.

> >Each snapshot effectively appends to the current working area in the
> >AG, chopping it out of the largest contiguous free space. By the
> >time the next snapshot in that AG comes around, there's other new
> >short term data between the old snapshot and the new one. The new
> >snapshot chops up the largest freespace, and on goes the cycle.
> >
> >Eventually the short term data between the snapshots gets removed,
> >but this doesn't reform large contiguous free spaces because the
> >snapshot data is in the way. And so this cycle continues with the
> >snapshot data chopping up the largest freespace extents in the
> >filesystem until there are no more large free space extents to be
> >found.
> >
> >The solution is to manage the snapshot data better. We need to keep
> >all the long term data physically isolated from the short term data
> >so they don't fragment free space. A short term application level
> >solution would require migrating the snapshot data out of the
> >filesystem to somewhere else and pointing to it with symlinks.
> 
> 
> Snapshots should not live forever on the disk. The procedure is to
> create a snapshot, copy it away, and then delete the snapshot. It's
> okay to let snapshots live for a while, but not all of them and not
> without a bound on their lifetime.
> 
> 
> The filesystem did have a role in this, by requiring alignment of
> the extent to the RAID stripe size.

No, in the end it didn't.

> Now, given that this was a RAID
> with one member, alignment is pointless, but most of our deployments
> are to RAID arrays with >1 members, and alignment does save 12.5% of
> IOPS compared to un-aligned extents for compactions and writes (our
> scans/writes use 128k buffers, and the alignment is to 1MB). The
> database caused the problem by indirectly requiring 1MB alignment
> for files that are much smaller than 1MB, and the user contributed
> to the problem by causing millions of such small files to be kept.

*nod*

> >
> ><ding>
> >
> >Hold on....
> >
> ><rummage in code>
> >
> >....we already have an interface for setting those sorts of hints.
> >
> >fcntl(F_SET_RW_HINT, rw_hint)
> >
> >/*
> >  * Valid hint values for F_{GET,SET}_RW_HINT. 0 is "not set", or can be
> >  * used to clear any hints previously set.
> >  */
> >#define RWF_WRITE_LIFE_NOT_SET  0
> >#define RWH_WRITE_LIFE_NONE     1
> >#define RWH_WRITE_LIFE_SHORT    2
> >#define RWH_WRITE_LIFE_MEDIUM   3
> >#define RWH_WRITE_LIFE_LONG     4
> >#define RWH_WRITE_LIFE_EXTREME  5
> >
> >Avi, does this sound like something that you could use to
> >classify the different types of data the data base writes out?
> 
> 
> So long as the penalty for a mis-classification is not too large, we
> can for sure.

OK.
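
For reference, here's a minimal userspace sketch of driving that hint
interface (assuming Linux 4.13+; the constants are defined inline from
the header snippet quoted above in case the libc doesn't expose them,
and the file name is made up):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>

#ifndef F_SET_RW_HINT
#define F_SET_RW_HINT		1036	/* F_LINUX_SPECIFIC_BASE + 12 */
#endif
#ifndef RWH_WRITE_LIFE_SHORT
#define RWH_WRITE_LIFE_SHORT	2	/* from the header snippet above */
#endif

int main(void)
{
	/* hypothetical commit log file; short-lived data in this workload */
	int fd = open("commitlog.db", O_WRONLY | O_CREAT, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* the fcntl argument is a pointer to a 64-bit hint value */
	uint64_t hint = RWH_WRITE_LIFE_SHORT;
	if (fcntl(fd, F_SET_RW_HINT, &hint) < 0)
		perror("F_SET_RW_HINT");	/* EINVAL on kernels without it */
	return 0;
}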

> >I'll need to have a think about how to apply this to the allocator
> >policy algorithms before going any further, but I suspect making use
> >of this hint interface will allow us to prevent interleaving of short
> >and long term data and so avoid the freespace fragmentation it is
> >causing here....
> 
> 
> IIUC, the problem (of having ENOSPC on a 10% used disk) is not
> fragmentation per se, it's the alignment requirement.

Which, as I've noted above, is a hint, not a requirement.

> To take it to
> extreme, a 1TB disk can only hold a million files if those files
> must be aligned to 1MB, even if everything is perfectly laid out.
> For sure fragmentation would have degraded performance sooner or
> later, but that's not as bad as that ENOSPC.

What it comes down to is that having looked into it, I don't know
why that ENOSPC error occurred.

Alignment didn't cause it because alignment was being dropped - that
just caused free space fragmentation.  Extent size hints didn't
cause it because the size hints were dropped - that just caused
freespace fragmentation. A lack of free space
didn't cause it, because there was heaps of free space in all
allocation groups.

But something tickled a corner case that triggered an allocation
failure that was interpreted as ENOSPC rather than retrying the
allocation.  Until I can reproduce the ENOSPC allocation failure
(and I tried!) then it'll be a mystery as to what caused it.

> entire file. But I think that, given that the extent size is treated
> as a hint (or so I infer from the fact that we have <32MB extents),
> so should the alignment be. Perhaps allocation with a hint should be
> performed in two passes, first trying to match size and alignment,
> and second relaxing both restrictions.

I think I already mentioned there were 5 separate attempts to
allocate, each failure reducing restrictions (sketched in code below):

1. extent sized and contiguous to adjacent block in file
2. extent sized and aligned, at higher block in AG
3. extent sized, not aligned, at higher block in AG
4. >= minimum length, not aligned, anywhere in AG >= target AG 
5. minimum length, not aligned, in any AG
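
An illustrative-only model of that ladder (hypothetical names; the real
code in xfs_bmap_btalloc and friends is far more involved):

#include <errno.h>
#include <stdio.h>

enum ag_scope { TARGET_AG, FORWARD_AGS, ALL_AGS };

struct alloc_args {
	int want_contig;	/* adjacent to the file's last allocated block */
	int want_aligned;	/* stripe aligned */
	int want_extsize;	/* full extent size hint vs. >= minimum length */
	enum ag_scope scope;	/* which AGs may be searched */
};

/* stub standing in for the actual AG search; always fails here */
static int try_alloc(const struct alloc_args *args)
{
	(void)args;
	return -1;
}

static int allocate_with_fallback(void)
{
	struct alloc_args a = { 1, 1, 1, TARGET_AG };

	for (int attempt = 1; attempt <= 5; attempt++) {
		if (try_alloc(&a) == 0)
			return 0;	/* allocated under current constraints */
		switch (attempt) {
		case 1: a.want_contig = 0; break;	/* -> 2: sized + aligned */
		case 2: a.want_aligned = 0; break;	/* -> 3: sized only */
		case 3: a.want_extsize = 0;		/* -> 4: >= min length, */
			a.scope = FORWARD_AGS; break;	/*    AGs >= target */
		case 4: a.scope = ALL_AGS; break;	/* -> 5: min length, any AG */
		}
	}
	return -ENOSPC;		/* only after all five attempts fail */
}

int main(void)
{
	printf("%d\n", allocate_with_fallback());
	return 0;
}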

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ENSOPC on a 10% used disk
  2018-10-21  9:00             ` Avi Kivity
@ 2018-10-21 14:34               ` Dave Chinner
  0 siblings, 0 replies; 26+ messages in thread
From: Dave Chinner @ 2018-10-21 14:34 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linux-xfs

On Sun, Oct 21, 2018 at 12:00:16PM +0300, Avi Kivity wrote:
> 
> On 19/10/2018 04.24, Dave Chinner wrote:
> >On Thu, Oct 18, 2018 at 06:44:54PM +0300, Avi Kivity wrote:
> >>On 18/10/2018 14.00, Avi Kivity wrote:
> >>>
> >>>This can happen, and indeed I see our default hint is 1MB, so our
> >>>small files use a 1MB hint. Looks like we should remove that 1MB
> >>>hint since it's reducing allocation flexibility for XFS without a
> >>>good return.
> >>
>>I convinced myself that this is the root cause; it fits perfectly
> >>with your explanation. I still think that XFS should allocate
> >>*something* rather than ENOSPC, but I can also understand someone
> >>wanting a guarantee.
> >Yup, it's a classic catch 22.
> >
> >>>On the other hand, I worry that because we bypass the page cache,
> >>>XFS doesn't get to see the entire file at one time and so it will
> >>>get fragmented.
> >>
>>That's what happens. I wrote 1000 4k writes to 400 files, in
>>parallel, AIO+DIO. I got 400 perfectly-fragmented files, each with
>>1000 extents.
> >Yup, you wrote them all in the one directory, didn't you? :)
> 
> 
> Yes :(
> 
> But if I have more concurrently-written files than AGs, I'd get the
> same behavior with multiple directories, no?

Up to a point. At which point, I'd say you're doing it wrong and
tell you to use extent size hints or buffered IO so the filesystem
can turn the small random writes into nicely formed large IOs via
delayed allocation. :)

Remember the first rule of storage: Garbage In, Garbage Out.

With direct IO, it's the responsibility of the application to give
the filesystem and storage layers well formed IOs. If the app doesn't
play nice, there's nothing the filesystem or storage layers can do
to make it better....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ENSOPC on a 10% used disk
  2018-10-21  9:21           ` Avi Kivity
@ 2018-10-21 15:06             ` Dave Chinner
  0 siblings, 0 replies; 26+ messages in thread
From: Dave Chinner @ 2018-10-21 15:06 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linux-xfs

On Sun, Oct 21, 2018 at 12:21:33PM +0300, Avi Kivity wrote:
> 
> On 19/10/2018 04.15, Dave Chinner wrote:
> >On Thu, Oct 18, 2018 at 02:00:19PM +0300, Avi Kivity wrote:
> >>On 18/10/2018 13.05, Dave Chinner wrote:
> >>>On Thu, Oct 18, 2018 at 10:55:18AM +0300, Avi Kivity wrote:
> >>>>On 18/10/2018 04.37, Dave Chinner wrote:

> >>Looks like we should remove that 1MB
> >>hint since it's reducing allocation flexibility for XFS without a
> >>good return. On the other hand, I worry that because we bypass the
> >>page cache, XFS doesn't get to see the entire file at one time and
> >>so it will get fragmented.
> >Yes. Your other option is to use an extent size hint that is smaller
> >than the sunit. That should not align to 1MB because the initial
> >data allocation size is not large enough to trigger stripe
> >alignment.
> 
> 
> Wow, so we had so many factors leading to this:
> 
> - 1-disk installations arranged as RAID0 even though not strictly needed
> 
> - having a default extent allocation hint, even for small files
> 
> - having that default hint be >= the stripe unit size
> 
> - the user not removing snapshots
> 
> - XFS not falling back to unaligned allocations

Everything but the last is true. XFS is definitely dropping the
alignment hint once there are no more aligned contiguous free space
extents.

> >>Suppose I write a 4k file with a 1MB hint. How is that trailing
> >>(1MB-4k) marked? Free extent, free extent with extra annotation, or
> >>allocated extent? We may need to deallocate those extents? (will
> >>FALLOC_FL_PUNCH_HOLE do the trick?)
> >It's an unwritten extent beyond EOF, and how that is treated when
> >the file is last closed depends on how that extent was allocated.
> >But, yes, punching the range beyond EOF will definitely free it.
> 
> I think we can conclude from the dump that the filesystem freed it?

*nod*
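
A minimal sketch of that punch from userspace, assuming the usual
fallocate(2) interface (offsets are made up, and PUNCH_HOLE must be
paired with KEEP_SIZE so the file length doesn't change):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>

int main(void)
{
	int fd = open("smallfile.db", O_RDWR);	/* hypothetical 4k file */
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* free the speculative tail: from the end of the data out to the
	 * 1MB the extent size hint caused to be allocated */
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      4096, 1048576 - 4096) < 0)
		perror("fallocate");
	return 0;
}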

>  ext:    logical_offset:      physical_offset: length: expected: flags:
>   0:     0..    1eb2:    3928e00..   392acb2:   1eb3:
>   1:     1eb3..    3cb2:    3c91200..   3c92fff:   1e00: 392acb3:
>   2:     3cb3..    57b2:    3454100..   3455bff:   1b00: 3c93000:
>   3:     57b3..    6fb2:    34ecd00..   34ee4ff:   1800: 3455c00:
>   4:     6fb3..    85fe:    3386a00..   338804b:   164c: 34ee500:
>   5:     85ff..    9c0b:    2c85c00..   2c8720c:   160d: 338804c:
>   6:     9c0c..    b217:    3099900..   309af0b:   160c: 2c8720d:
>   7:     b218..    c823:    34fb300..   34fc90b:   160c: 309af0c:
>   8:     c824..    de2b:    315ef00..   3160507:   1608: 34fc90c:
>   9:     de2c..    f42f:    36adc00..   36af203:   1604: 3160508:
>   10:    f430..    10a30:    2cf4400..   2cf5a00:   1601: 36af204:
>   11:    10a31..   12030:    2e03300..   2e048ff:   1600: 2cf5a01:
>   12:    12031..   13630:    2ff5200..   2ff67ff:   1600: 2e04900:
>   13:    13631..   14c30:    3199e00..   319b3ff:   1600: 2ff6800:
>   14:    14c31..   16230:    32ed500..   32eeaff:   1600: 319b400:
>   15:    16231..   17830:    34a0b00..   34a20ff:   1600: 32eeb00:
>   16:    17831..   18e30:    354e700..   354fcff:   1600: 34a2100:
>   17:    18e31..   1a430:    362c400..   362d9ff:   1600: 354fd00:
>   18:    1a431..   1ba1d:    3192b00..   31940ec:   15ed: 362da00:
>   19:    1ba1e..   1d05c:    4228500..   4229b3e:   163f: 31940ed:
>   20:    1d05d..   1e692:    3f6c900..   3f6df35:   1636: 4229b3f:
>   21:    1e693..   1fcc0:    37d4400..   37d5a2d:   162e: 3f6df36:
>   22:    1fcc1..   212e4:    43f9c00..   43fb223:   1624: 37d5a2e:
>   23:    212e5..   22905:    4003500..   4004b20:   1621: 43fb224:
>   24:    22906..   23803:    1fdb900..   1fdc7fd:    efe: 4004b21: last,eof

filefrag? I find that utterly unreadable, and without the command
line I don't know what the units are.  Can you use 'xfs_bmap -vvp'
so that all the units are known and it automatically calculates
whether extents are aligned or not?

> So, lengths are not always aligned, but physical_offset always is.
> So XFS relaxes the extent size hint but not alignment.

No, that is incorrect. 

Filesystems never do what people expect them to.

i.e. what you see above is because the filesystem could not find
large enough contiguous free spaces to align both the ends of the
allocation:


Freespace looks like:
	+----FF+FFFFFF+FFFFFF+FFFF-+------+

Alloc aligned w/ min len and max len

	+----FF+FFFFFF+FFFFFF+FFFF-+------+
               +WANT-THIS-BIT_HERE-+ 

But the nearest target free space extent returns:

	     fffffffffffffffffffff

So we trim the front
	       fffffffffffffffffff

if len < min len, fail (didn't happen)

if > max len, trim end (no trim, not long enough)

And so we end up allocating front aligned and short:

               +WANT-THIS-BIT_HER+ 

Leaving behind:

	+----FF+------+------+-----+------+

That's why it looks like there are aligned extents remaining, even
when there aren't.

The allocation logic is horrifically complex - it has 20-something
controlling parameters and a heap of logic, maths and fallback paths
around them. Unless you're intimately familiar with the code,
you're unlikely to infer the allocator decisions from an extent
list....
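
Still, a toy model of just the trimming step above may help -- this is
not the real allocator, just the front-align/length-trim arithmetic
with made-up types (units are filesystem blocks):

#include <stdio.h>

struct candidate {
	unsigned long start, len;	/* one free space extent */
};

/* returns 1 if the trimmed candidate is usable, 0 if this attempt fails */
static int trim_candidate(struct candidate *c, unsigned long align,
			  unsigned long minlen, unsigned long maxlen)
{
	unsigned long aligned = (c->start + align - 1) / align * align;
	unsigned long skip = aligned - c->start;

	if (skip >= c->len)
		return 0;		/* aligning the front consumed it all */
	c->start = aligned;		/* front trimmed to stripe alignment */
	c->len -= skip;
	if (c->len < minlen)
		return 0;		/* too short: relax constraints, retry */
	if (c->len > maxlen)
		c->len = maxlen;	/* tail trimmed only when long enough, */
	return 1;			/* otherwise: front aligned and short */
}

int main(void)
{
	struct candidate c = { 1000, 500 };

	if (trim_candidate(&c, 256, 1, 512))	/* 256 blocks = 1MB at 4k */
		printf("allocated [%lu, +%lu)\n", c.start, c.len);
	return 0;
}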

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ENSOPC on a 10% used disk
  2018-10-21 14:28               ` Dave Chinner
@ 2018-10-22  8:35                 ` Avi Kivity
  2018-10-22  9:52                   ` Dave Chinner
  0 siblings, 1 reply; 26+ messages in thread
From: Avi Kivity @ 2018-10-22  8:35 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs


On 21/10/2018 17.28, Dave Chinner wrote:
> On Sun, Oct 21, 2018 at 11:55:47AM +0300, Avi Kivity wrote:
>> On 19/10/2018 10.51, Dave Chinner wrote:
>>> On Thu, Oct 18, 2018 at 04:36:42PM +0300, Avi Kivity wrote:
>>>> On 18/10/2018 14.00, Avi Kivity wrote:
>>>>>> Can I get access to the metadump to dig around in the filesystem
>>>>>> directly so I can see how everything has ended up laid out? That
>>>>>> will help me work out what is actually occurring and determine if
>>>>>> mkfs/mount options can address the problem or whether deeper
>>>>>> allocator algorithm changes may be necessary....
>>>>> I will ask permission to share the dump.
>>>> I'll send you a link privately.
>>> Thanks - I've started looking at this - the information here is
>>> just layout stuff - I've omitted filenames and anything else that
>>> might be identifying from the output.
>>>
>>> Looking at a commit log file:
>>>
>>> stat.size = 33554432
>>> stat.blocks = 34720
>>> fsxattr.xflags = 0x800 [----------e-----]
>>> fsxattr.projid = 0
>>> fsxattr.extsize = 33554432
>>> fsxattr.cowextsize = 0
>>> fsxattr.nextents = 14
>>>
>>>
>>> and the layout:
>>>
>>> EXT: FILE-OFFSET      BLOCK-RANGE            AG AG-OFFSET            TOTAL FLAGS
>>>    0: [0..4079]:       2646677520..2646681599 22 (95606800..95610879)  4080 001010
>>>    1: [4080..8159]:    2643130384..2643134463 22 (92059664..92063743)  4080 001010
>>>    2: [8160..12239]:   2642124816..2642128895 22 (91054096..91058175)  4080 001010
>>>    3: [12240..16319]:  2640666640..2640670719 22 (89595920..89599999)  4080 001010
>>>    4: [16320..18367]:  2640523264..2640525311 22 (89452544..89454591)  2048 000000
>>>    5: [18368..20415]:  2640119808..2640121855 22 (89049088..89051135)  2048 000000
>>>    6: [20416..21287]:  2639874064..2639874935 22 (88803344..88804215)   872 001111
>>>    7: [21288..21295]:  2639874936..2639874943 22 (88804216..88804223)     8 011111
>>>    8: [21296..24495]:  2639874944..2639878143 22 (88804224..88807423)  3200 001010
>>>    9: [24496..26543]:  2639427584..2639429631 22 (88356864..88358911)  2048 000000
>>>   10: [26544..28591]:  2638981120..2638983167 22 (87910400..87912447)  2048 000000
>>>   11: [28592..30639]:  2638770176..2638772223 22 (87699456..87701503)  2048 000000
>>>   12: [30640..31279]:  2638247952..2638248591 22 (87177232..87177871)   640 001111
>>>   13: [31280..34719]:  2638248592..2638252031 22 (87177872..87181311)  3440 011010
>>>   14: [34720..65535]:  hole                                           30816
>>>
>>> The first thing I note is the initial allocations are just short of
>>> 2MB and so the extent size hint is, indeed, being truncated here
>>> according to contiguous free space limitations. I had thought that
>>> should occur from reading the code, but it's complex and I wasn't
>>> 100% certain what minimum allocation length would be used.
>>>
>>> Looking at the system batchlog files, I'm guessing the filesystem
>>> ran out of contiguous 32MB free space extents some time around
>>> September 25. The *Data.db files from 24 Sep and earlier then are
>>> all nice 32MB extents, from 25 sep onwards they never make the full
>>> 32MB (30-31MB max). eg, good:
>>>
>>>   EXT: FILE-OFFSET       BLOCK-RANGE          AG AG-OFFSET            TOTAL FLAGS
>>>     0: [0..65535]:       350524552..350590087  3 (2651272..2716807)   65536 001111
>>>     1: [65536..131071]:  353378024..353443559  3 (5504744..5570279)   65536 001111
>>>     2: [131072..196607]: 355147016..355212551  3 (7273736..7339271)   65536 001111
>>>     3: [196608..262143]: 360029416..360094951  3 (12156136..12221671) 65536 001111
>>>     4: [262144..327679]: 362244144..362309679  3 (14370864..14436399) 65536 001111
>>>     5: [327680..343415]: 365809456..365825191  3 (17936176..17951911) 15736 001111
>>>
>>> bad:
>>>
>>> EXT: FILE-OFFSET      BLOCK-RANGE            AG AG-OFFSET            TOTAL FLAGS
>>>    0: [0..64127]:       512855496..512919623  4 (49024456..49088583) 64128 001111
>>>    1: [64128..128247]:  266567048..266631167  2 (34651528..34715647) 64120 001010
>>>    2: [128248..142327]: 264401888..264415967  2 (32486368..32500447) 14080 001111
>>
>> So extent size is a hint but the extent alignment is a hard
>> requirement.
> No, physical alignment is being ignored here, too. Those flags on
> the end?
>
>   FLAG Values:
>      0100000 Shared extent
>      0010000 Unwritten preallocated extent
>      0001000 Doesn't begin on stripe unit
>      0000100 Doesn't end   on stripe unit
>      0000010 Doesn't begin on stripe width
>      0000001 Doesn't end   on stripe width
>
> When you have 001111, the allocation was completely unaligned.
> When you have 001010, the tail is stripe aligned
> When you ahve 000000, the head and tail are stripe aligned
>
> As you can see, there is a mix of aligned, tail aligned and
> completely unaligned extents.
>
> So, no, XFS is dropping both size hints and alignment hints when
> it starts running out of aligned contiguous free space extents.


You are right; I searched for and found some files that were
head-aligned, and jumped to conclusions, but there are many that are
not. Those head-aligned files probably belonged to an era in that
filesystem's life when head-aligned extents less than 1MB were available.
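
For anyone decoding those FLAGS columns by hand, here's a tiny helper
matching the bit table quoted above. Since the six digits are only ever
0 or 1, reading the field as a C octal literal happens to work:

#include <stdio.h>

static void decode_flags(unsigned int flags)
{
	if (flags & 0100000) puts("shared extent");
	if (flags & 0010000) puts("unwritten preallocated extent");
	if (flags & 0001000) puts("doesn't begin on stripe unit");
	if (flags & 0000100) puts("doesn't end on stripe unit");
	if (flags & 0000010) puts("doesn't begin on stripe width");
	if (flags & 0000001) puts("doesn't end on stripe width");
}

int main(void)
{
	decode_flags(001111);	/* a completely unaligned extent */
	return 0;
}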


>>> Ok, so the result is not perfect, but there are now huge contiguous
>>> free space extents available again - ~70% of the free space is now
>>> contiguous extents >=32MB in length. There's every chance that the
>>> fs would continue to help reform large contiguous free spaces as the
>>> database files come and go now, as long as the snapshot problem is
>>> dealt with.
>>>
>>> So, what's the problem? Well, it's simply that the workload is
>>> mixing data with vastly different temporal characteristics in the
>>> same physical locality. Every half an hour, a set of ~100 smallish
>>> files are written into a new directory which lands them at the low
>>> end of the largest free space extent in that AG. Each new snapshot
>>> directory ends up in a different AG, so it slowly spreads the
>>> snapshots across all the AGs in the filesystem.
>>
>> Not exactly - those snapshots are hard links into the live database
>> files, which eventually get removed. Usually, small files get
>> removed early, but with the snapshots they get to live forever.
> They might be created as hard links, but the effect when the
> original database file links are removed is the same - the snapshotted
> data lives forever, interleaved amongst short term data.


Yes.


>>> Each snapshot effectively appends to the current working area in the
>>> AG, chopping it out of the largest contiguous free space. By the
>>> time the next snapshot in that AG comes around, there's other new
>>> short term data between the old snapshot and the new one. The new
>>> snapshot chops up the largest freespace, and on goes the cycle.
>>>
>>> Eventually the short term data between the snapshots gets removed,
>>> but this doesn't reform large contiguous free spaces because the
>>> snapshot data is in the way. And so this cycle continues with the
>>> snapshot data chopping up the largest freespace extents in the
>>> filesystem until there are no more large free space extents to be
>>> found.
>>>
>>> The solution is to manage the snapshot data better. We need to keep
>>> all the long term data physically isolated from the short term data
>>> so they don't fragment free space. A short term application level
>>> solution would require migrating the snapshot data out of the
>>> filesystem to somewhere else and pointing to it with symlinks.
>>
>> Snapshots should not live forever on the disk. The procedure is to
>> create a snapshot, copy it away, and then delete the snapshot. It's
>> okay to let snapshots live for a while, but not all of them and not
>> without a bound on their lifetime.
>>
>>
>> The filesystem did have a role in this, by requiring alignment of
>> the extent to the RAID stripe size.
> No, in the end it didn't.


Right.


>
>> Now, given that this was a RAID
>> with one member, alignment is pointless, but most of our deployments
>> are to RAID arrays with >1 members, and alignment does save 12.5% of
>> IOPS compared to un-aligned extents for compactions and writes (our
>> scans/writes use 128k buffers, and the alignment is to 1MB). The
>> database caused the problem by indirectly requiring 1MB alignment
>> for files that are much smaller than 1MB, and the user contributed
>> to the problem by causing millions of such small files to be kept.
> *nod*
>
>>> <ding>
>>>
>>> Hold on....
>>>
>>> <rummage in code>
>>>
>>> ....we already have an interface for setting those sorts of hints.
>>>
>>> fcntl(F_SET_RW_HINT, rw_hint)
>>>
>>> /*
>>>   * Valid hint values for F_{GET,SET}_RW_HINT. 0 is "not set", or can be
>>>   * used to clear any hints previously set.
>>>   */
>>> #define RWF_WRITE_LIFE_NOT_SET  0
>>> #define RWH_WRITE_LIFE_NONE     1
>>> #define RWH_WRITE_LIFE_SHORT    2
>>> #define RWH_WRITE_LIFE_MEDIUM   3
>>> #define RWH_WRITE_LIFE_LONG     4
>>> #define RWH_WRITE_LIFE_EXTREME  5
>>>
>>> Avi, does this sound like something that you could use to
>>> classify the different types of data the data base writes out?
>>
>> So long as the penalty for a mis-classification is not too large, we
>> can for sure.
> OK.
>
>>> I'll need to have a think about how to apply this to the allocator
>>> policy algorithms before going any further, but I suspect making use
>>> of this hint interface will allow us to prevent interleaving of short
>>> and long term data and so avoid the freespace fragmentation it is
>>> causing here....
>>
>> IIUC, the problem (of having ENOSPC on a 10% used disk) is not
>> fragmentation per se, it's the alignment requirement.
> Which, as I've noted above, is a hint, not a requirement.
>
>> To take it to
>> extreme, a 1TB disk can only hold a million files if those files
>> must be aligned to 1MB, even if everything is perfectly laid out.
>> For sure fragmentation would have degraded performance sooner or
>> later, but that's not as bad as that ENOSPC.
> What it comes down to is that having looked into it, I don't know
> why that ENOSPC error occurred.
>
> Alignment didn't cause it because alignment was being dropped - that
> just caused free space fragmentation.  Extent size hints didn't
> cause it because the size hints were dropped - that just caused
> freespace fragmentation. A lack of free space
> didn't cause it, because there was heaps of free space in all
> allocation groups.
>
> But something tickled a corner case that triggered an allocation
> failure that was interpreted as ENOSPC rather than retrying the
> allocation.  Until I can reproduce the ENOSPC allocation failure
> (and I tried!) then it'll be a mystery as to what caused it.


The user reported the error happening multiple times, taking many hours 
to reproduce, but on more than one node. So it's an obscure corner case 
but not obscure enough to be a one-off event.


I've asked the user to regularly trim their snapshots (they were not 
aware of the snapshots actually - they were performed as a side effect 
of a TRUNCATE operation), and we'll remove the default extent hint for 
small files. I'll also consider noalign - the 12.5% reduction in IOPS is 
perhaps not worth the fragmentation it generates.
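
For the record, a sketch of what removing the hint looks like at the
ioctl level -- this assumes the generic FS_IOC_*/FS_XFLAG_* spellings
from <linux/fs.h> (Linux 4.5+; older code uses the XFS_IOC_FSGETXATTR/
XFS_IOC_FSSETXATTR names), and on XFS the hint should be changed before
the file has any data:

#include <fcntl.h>
#include <linux/fs.h>
#include <stdio.h>
#include <sys/ioctl.h>

int main(void)
{
	int fd = open("smallfile.db", O_RDONLY);	/* hypothetical */
	if (fd < 0) {
		perror("open");
		return 1;
	}

	struct fsxattr fsx;
	if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0) {
		perror("FS_IOC_FSGETXATTR");
		return 1;
	}
	fsx.fsx_xflags &= ~FS_XFLAG_EXTSIZE;	/* stop forcing a fixed hint */
	fsx.fsx_extsize = 0;			/* 0 = let the fs decide */
	if (ioctl(fd, FS_IOC_FSSETXATTR, &fsx) < 0)
		perror("FS_IOC_FSSETXATTR");
	return 0;
}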


>
>> entire file. But I think that, given that the extent size is treated
>> as a hint (or so I infer from the fact that we have <32MB extents),
>> so should the alignment be. Perhaps allocation with a hint should be
>> performed in two passes, first trying to match size and alignment,
>> and second relaxing both restrictions.
> I think I already mentioned there were 5 separate attempts to
> allocate, each failure reducing restrictions:
>
> 1. extent sized and contiguous to adjacent block in file
> 2. extent sized and aligned, at higher block in AG
> 3. extent sized, not aligned, at higher block in AG
> 4. >= minimum length, not aligned, anywhere in AG >= target AG


Surprised at this one. Won't it skew usage in high AGs?


Perhaps it's rare enough not to matter.


Perhaps those higher-block/higher-AG heuristics can be improved for 
non-rotational media.


> 5. minimum length, not aligned, in any AG


Thanks for your patience in helping me understand this issue.


Avi


> Cheers,
>
> Dave.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ENSOPC on a 10% used disk
  2018-10-22  8:35                 ` Avi Kivity
@ 2018-10-22  9:52                   ` Dave Chinner
  0 siblings, 0 replies; 26+ messages in thread
From: Dave Chinner @ 2018-10-22  9:52 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linux-xfs

On Mon, Oct 22, 2018 at 11:35:26AM +0300, Avi Kivity wrote:
> 
> On 21/10/2018 17.28, Dave Chinner wrote:
> >On Sun, Oct 21, 2018 at 11:55:47AM +0300, Avi Kivity wrote:
> >>For sure fragmentation would have degraded performance sooner or
> >>later, but that's not as bad as that ENOSPC.
> >What it comes down to is that having looked into it, I don't know
> >why that ENOSPC error occurred.
> >
> >Alignment didn't cause it because alignment was being dropped - that
> >just caused free space fragmentation.  Extent size hints didn't
> >cause it because the size hints were dropped - that just caused
> >freespace fragmentation. A lack of free space
> >didn't cause it, because there was heaps of free space in all
> >allocation groups.
> >
> >But something tickled a corner case that triggered an allocation
> >failure that was interpreted as ENOSPC rather than retrying the
> >allocation.  Until I can reproduce the ENOSPC allocation failure
> >(and I tried!) then it'll be a mystery as to what caused it.
> 
> 
> The user reported the error happening multiple times, taking many
> hours to reproduce, but on more than one node. So it's an obscure
> corner case but not obscure enough to be a one-off event.

Yeah, as with all these sorts of things, the difficulty is in
reproducing it. I'll have a look through some of the higher level
code during the week to see if there's a min/max len condition I
missed somewhere that might lead to failure instead of a retry.
It shouldn't really fail at all, because in the end a single
block allocation is allowable for normal extent size w/ alignment
allocation and there is heaps of free space available.

> >>entire file. But I think that, given that the extent size is treated
> >>as a hint (or so I infer from the fact that we have <32MB extents),
> >>so should the alignment be. Perhaps allocation with a hint should be
> >>performed in two passes, first trying to match size and alignment,
> >>and second relaxing both restrictions.
> >I think I already mentioned there were 5 separate attempts to
> >allocate, each failure reducing restrictions:
> >
> >1. extent sized and contiguous to adjacent block in file
> >2. extent sized and aligned, at higher block in AG
> >3. extent sized, not aligned, at higher block in AG
> >4. >= minimum length, not aligned, anywhere in AG >= target AG
> 
> 
> Surprised at this one. Won't it skew usage in high AGs?

It's a constraint based on AG locking order. We always lock in
ascending AG order, so if we've locked AG 4 and modified the free
list in preparation for allocation, then failed to find an aligned
extent, that AG will remain locked until we finish the allocation
process, and hence we can't lock AGs <= AG 4, otherwise we risk
deadlocking the allocator.....
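
A hypothetical one-function sketch of that ordering rule -- not the
XFS code, just the invariant it enforces:

#include <stdbool.h>

struct alloc_cursor {
	int highest_locked_agno;	/* -1 if nothing locked yet */
	int ag_count;
};

/* Within one allocation, locking only ever moves to higher AG numbers,
 * which is what prevents two allocators holding different AGs from
 * deadlocking against each other. */
static bool may_lock_ag(const struct alloc_cursor *cur, int agno)
{
	return agno > cur->highest_locked_agno && agno < cur->ag_count;
}

int main(void)
{
	struct alloc_cursor cur = { 4, 32 };	/* AG 4 locked, 32 AGs */

	return may_lock_ag(&cur, 2);	/* AG 2: not allowed -> returns 0 */
}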

> Perhaps it's rare enough not to matter.

It tends to be rare because we choose the AG ahead of time to ensure
that the majority of the time there is space available.

> Thanks for your patience in helping me understand this issue.

No worries, that's what I'm here for :)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ENSOPC on a 10% used disk
  2018-10-17  7:52 ENSOPC on a 10% used disk Avi Kivity
                   ` (2 preceding siblings ...)
  2018-10-18 15:54 ` Eric Sandeen
@ 2019-02-05 21:48 ` Dave Chinner
  2019-02-07 10:51   ` Avi Kivity
  3 siblings, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2019-02-05 21:48 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linux-xfs

Hi Avi,

On Wed, Oct 17, 2018 at 10:52:48AM +0300, Avi Kivity wrote:
> I have a user running a 1.7TB filesystem with ~10% usage (as shown
> by df), getting sporadic ENOSPC errors. The disk is mounted with
> inode64 and has a relatively small number of large files. The disk
> is a single-member RAID0 array, with 1MB chunk size. There are 32
> AGs. Running Linux 4.9.17.
> 
> 
> The write load consists of AIO/DIO writes, followed by unlinks of
> these files. The writes are non-size-changing (we truncate ahead)
> and we use XFS_IOC_FSSETXATTR/XFS_FLAG_EXTSIZE with a hint size of
> 32MB. The errors happen on commit logs, which have a target size of
> 32MB (but may exceed it a little).
> 
> 
> The errors are sporadic and after restarting the workload they go
> away for a few hours to a few days, but then return. During one of
> the crashes I used xfs_db to look at fragmentation and saw that most
> AGs had free extents of size categories up to 128-255, but a few had
> more. I tried xfs_fsr but it did not help.
> 
> 
> Is this a known issue? Would upgrading the kernel help?

Long time, I know, but Brian has just made me aware of this commit
from early 2018 that went into 4.16 that might be relevant and so I
thought it best to close the loop:

commit 6d8a45ce29c7d67cc4fc3016dc2a07660c62482a
Author: Darrick J. Wong <darrick.wong@oracle.com>
Date:   Fri Jan 19 17:47:36 2018 -0800

    xfs: don't screw up direct writes when freesp is fragmented
    
    xfs_bmap_btalloc is given a range of file offset blocks that must be
    allocated to some data/attr/cow fork.  If the fork has an extent size
    hint associated with it, the request will be enlarged on both ends to
    try to satisfy the alignment hint.  If free space is fragmentated,
    sometimes we can allocate some blocks but not enough to fulfill any of
    the requested range.  Since bmapi_allocate always trims the new extent
    mapping to match the originally requested range, this results in
    bmapi_write returning zero and no mapping.
    
    The consequences of this vary -- buffered writes will simply re-call
    bmapi_write until it can satisfy at least one block from the original
    request.  Direct IO overwrites notice nmaps == 0 and return -ENOSPC
    through the dio mechanism out to userspace with the weird result that
    writes fail even when we have enough space because the ENOSPC return
    overrides any partial write status.  For direct CoW writes the situation
    was disastrous because nobody notices us returning an invalid zero-length
    wrong-offset mapping to iomap and the write goes off into space.
    
    Therefore, if free space is so fragmented that we managed to allocate
    some space but not enough to map into even a single block of the
    original allocation request range, we should break the alignment hint in
    order to guarantee at least some forward progress for the direct write.
    If we return a short allocation to iomap_apply it'll call back about the
    remaining blocks.
    
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

The spurious ENOSPC symptoms seem to match what you are seeing here
on your customer's 4.9 kernel, so it may be that this is the fix for
the ENOSPC problem that was reported. If this comes up again, then
perhaps it would be worth either upgrading the kernel to 4.16+ or
backporting this commit to see if it fixes the problem.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ENSOPC on a 10% used disk
  2019-02-05 21:48 ` Dave Chinner
@ 2019-02-07 10:51   ` Avi Kivity
  0 siblings, 0 replies; 26+ messages in thread
From: Avi Kivity @ 2019-02-07 10:51 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs


On 05/02/2019 23.48, Dave Chinner wrote:
> Hi Avi,
>
> On Wed, Oct 17, 2018 at 10:52:48AM +0300, Avi Kivity wrote:
>> I have a user running a 1.7TB filesystem with ~10% usage (as shown
>> by df), getting sporadic ENOSPC errors. The disk is mounted with
>> inode64 and has a relatively small number of large files. The disk
>> is a single-member RAID0 array, with 1MB chunk size. There are 32
>> AGs. Running Linux 4.9.17.
>>
>>
>> The write load consists of AIO/DIO writes, followed by unlinks of
>> these files. The writes are non-size-changing (we truncate ahead)
>> and we use XFS_IOC_FSSETXATTR/XFS_FLAG_EXTSIZE with a hint size of
>> 32MB. The errors happen on commit logs, which have a target size of
>> 32MB (but may exceed it a little).
>>
>>
>> The errors are sporadic and after restarting the workload they go
>> away for a few hours to a few days, but then return. During one of
>> the crashes I used xfs_db to look at fragmentation and saw that most
>> AGs had free extents of size categories up to 128-255, but a few had
>> more. I tried xfs_fsr but it did not help.
>>
>>
>> Is this a known issue? Would upgrading the kernel help?
> Long time, I know, but Brian has just made me aware of this commit
> from early 2018 that went into 4.16 that might be relevant and so I
> thought it best to close the loop:
>
> commit 6d8a45ce29c7d67cc4fc3016dc2a07660c62482a
> Author: Darrick J. Wong <darrick.wong@oracle.com>
> Date:   Fri Jan 19 17:47:36 2018 -0800
>
>      xfs: don't screw up direct writes when freesp is fragmented
>      
>      xfs_bmap_btalloc is given a range of file offset blocks that must be
>      allocated to some data/attr/cow fork.  If the fork has an extent size
>      hint associated with it, the request will be enlarged on both ends to
>      try to satisfy the alignment hint.  If free space is fragmentated,
>      sometimes we can allocate some blocks but not enough to fulfill any of
>      the requested range.  Since bmapi_allocate always trims the new extent
>      mapping to match the originally requested range, this results in
>      bmapi_write returning zero and no mapping.
>      
>      The consequences of this vary -- buffered writes will simply re-call
>      bmapi_write until it can satisfy at least one block from the original
>      request.  Direct IO overwrites notice nmaps == 0 and return -ENOSPC
>      through the dio mechanism out to userspace with the weird result that
>      writes fail even when we have enough space because the ENOSPC return
>      overrides any partial write status.  For direct CoW writes the situation
>      was disastrous because nobody notices us returning an invalid zero-length
>      wrong-offset mapping to iomap and the write goes off into space.
>      
>      Therefore, if free space is so fragmented that we managed to allocate
>      some space but not enough to map into even a single block of the
>      original allocation request range, we should break the alignment hint in
>      order to guarantee at least some forward progress for the direct write.
>      If we return a short allocation to iomap_apply it'll call back about the
>      remaining blocks.
>      
>      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
>      Reviewed-by: Christoph Hellwig <hch@lst.de>
>
> The spurious ENOSPC symptoms seem to match what you are seeing here
> on your customer's 4.9 kernel, so it may be that this is the fix for
> the ENOSPC problem that was reported. If this comes up again, then
> perhaps it would be worth either upgrading the kernel to 4.16+ or
> backporting this commit to see if it fixes the problem.


Thanks for remembering. Indeed it looks like a good match for the 
problem. We did not see the problem again (it took quite a combination 
of screwups to achieve), but I'll remember this in case we do.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2019-02-07 10:51 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-17  7:52 ENSOPC on a 10% used disk Avi Kivity
2018-10-17  8:47 ` Christoph Hellwig
2018-10-17  8:57   ` Avi Kivity
2018-10-17 10:54     ` Avi Kivity
2018-10-18  1:37 ` Dave Chinner
2018-10-18  7:55   ` Avi Kivity
2018-10-18 10:05     ` Dave Chinner
2018-10-18 11:00       ` Avi Kivity
2018-10-18 13:36         ` Avi Kivity
2018-10-19  7:51           ` Dave Chinner
2018-10-21  8:55             ` Avi Kivity
2018-10-21 14:28               ` Dave Chinner
2018-10-22  8:35                 ` Avi Kivity
2018-10-22  9:52                   ` Dave Chinner
2018-10-18 15:44         ` Avi Kivity
2018-10-18 16:11           ` Avi Kivity
2018-10-19  1:24           ` Dave Chinner
2018-10-21  9:00             ` Avi Kivity
2018-10-21 14:34               ` Dave Chinner
2018-10-19  1:15         ` Dave Chinner
2018-10-21  9:21           ` Avi Kivity
2018-10-21 15:06             ` Dave Chinner
2018-10-18 15:54 ` Eric Sandeen
2018-10-21 11:49   ` Avi Kivity
2019-02-05 21:48 ` Dave Chinner
2019-02-07 10:51   ` Avi Kivity
