heavy xfsaild I/O blocking process exit

* heavy xfsaild I/O blocking process exit
@ 2021-09-08  8:15 Momtchil Momtchev
  2021-09-08 21:27 ` Dave Chinner
  0 siblings, 1 reply; 4+ messages in thread
From: Momtchil Momtchev @ 2021-09-08  8:15 UTC (permalink / raw)
  To: linux-xfs

Hello,

I have a puzzling problem with XFS on Debian 10. I am running 
number-crunching driven by Node.js - I have a process that creates about 
2 million 1MB to 5MB files per day with an about 24h lifespan (weather 
forecasting). The file system is obviously heavily fragmented. I have 
absolutely no problems when running this in cruise mode, but every time 
I decide to stop that process, especially when it has been running for a 
few weeks or months, the process will become a zombie (freeing all its 
user memory and file descriptors) and then xfsaild/kworker will continue 
flushing the log for about 30-45 minutes before the process really 
quits. It will keep its binds to network ports (which is my main 
problem) but the system will remain responsive and usable. The I/O 
pattern is several seconds of random reading then a second or two of 
sequential writing.

The kernel functions that are running in the zombie process context are 
mainly xfs_btree_lookup, xfs_log_commit_cil, xfs_next_bit, 
xfs_buf_find_isra.26

xfsaild is spending time in radix_tree_next_chunk, xfs_inode_buf_verify

kworker is in xfs_reclaim_inode, radix_tree_next_chunk

This is on (standard up-to date Debian 10):

Linux version 4.19.0-16-amd64 (debian-kernel@lists.debian.org) (gcc 
version 8.3.0 (Debian 8.3.0-6)) #1 SMP Debian 4.19.181-1 (2021-03-19)

xfs_progs 4.20.0-1

File system is RAID-0, 2x2TB disks with LVM over md (512k chunks)

meta-data=/dev/mapper/vg0-home   isize=512    agcount=32, 
agsize=29849728 blks
          =                       sectsz=4096  attr=2, projid32bit=1
          =                       crc=1        finobt=1, sparse=1, rmapbt=0
          =                       reflink=0
data     =                       bsize=4096   blocks=955191296, imaxpct=5
          =                       sunit=128    swidth=256 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=466402, version=2
          =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

MemTotal:       32800968 kB
MemFree:          759308 kB
MemAvailable:   27941208 kB
Buffers:           43900 kB
Cached:         26504332 kB
SwapCached:         7560 kB
Active:         16101380 kB
Inactive:       11488252 kB
Active(anon):     813424 kB
Inactive(anon):   228180 kB
Active(file):   15287956 kB
Inactive(file): 11260072 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      16777212 kB
SwapFree:       16715524 kB
Dirty:              2228 kB
Writeback:             0 kB
AnonPages:       1034280 kB
Mapped:            89660 kB
Shmem:               188 kB
Slab:            1508868 kB
SReclaimable:    1097804 kB
SUnreclaim:       411064 kB
KernelStack:        3792 kB
PageTables:         5872 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    33177696 kB
Committed_AS:    1394296 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
Percpu:             7776 kB
HardwareCorrupted:     0 kB
AnonHugePages:    215040 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:    11682188 kB
DirectMap2M:    21731328 kB
DirectMap1G:     1048576 kB

-- 
Momtchil Momtchev <momtchil@momtchev.com>

^ permalink raw reply	[flat|nested] 4+ messages in thread