I've recently found that the performance of our development Swift system is degrading over time as the number of objects/files increases.  This is a relatively small system; each server has three 400GB disks.  The system I'm currently looking at has about 70GB tied up in slabs alone, close to 55GB of that in the XFS inode and ili caches, and only about 2GB free.  The kernel is 3.14.57-1-amd64-hlinux.

Here's the way the filesystems are mounted:

/dev/sdb1 on /srv/node/disk0 type xfs (rw,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,logbsize=256k,sunit=512,swidth=1536,noquota)

I can do about 2000 1K file creates/sec when running 2-minute PUT tests with 100 threads.  If I repeat that test for multiple hours, I see the number of IOPS steadily decrease to about 770, and on the very next run it drops to 260 and continues to fall from there.  This happens at about 12M files.
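For anyone who wants to reproduce the shape of this test without Swift, here is a minimal sketch of a multithreaded 1K file-create benchmark.  It is not my actual test script: the function name `run_put_test`, the `.data` suffix, the optional fsync, and the two-tier directory mapping are all hypothetical choices made for illustration.

```python
import os
import tempfile
import threading
import time
import uuid

PAYLOAD = b"x" * 1024  # 1 KiB body, matching the 1K files in the test above
TIER_WIDTH = 1000      # hypothetical: 1000 directories per tier


def _worker(root, deadline, counts, idx, do_fsync):
    """Create 1 KiB files until the deadline; record how many were made."""
    n = 0
    while time.monotonic() < deadline:
        u = uuid.uuid4()
        top = u.int % TIER_WIDTH
        sub = (u.int // TIER_WIDTH) % TIER_WIDTH
        d = os.path.join(root, f"{top:03d}", f"{sub:03d}")
        os.makedirs(d, exist_ok=True)  # safe if another thread created it first
        with open(os.path.join(d, u.hex + ".data"), "wb") as f:
            f.write(PAYLOAD)
            if do_fsync:
                f.flush()
                os.fsync(f.fileno())
        n += 1
    counts[idx] = n  # each thread writes only its own slot


def run_put_test(root, threads=100, seconds=120.0, do_fsync=False):
    """Return file creates/sec achieved by `threads` writers over `seconds`."""
    counts = [0] * threads
    deadline = time.monotonic() + seconds
    workers = [
        threading.Thread(target=_worker, args=(root, deadline, counts, i, do_fsync))
        for i in range(threads)
    ]
    start = time.monotonic()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return sum(counts) / (time.monotonic() - start)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(f"{run_put_test(tmp, threads=8, seconds=1.0):.0f} creates/sec")
```

Running it repeatedly against the same root (instead of a fresh temporary directory) is what would mimic the multi-hour degradation described above.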

The directory structure is two-tiered, with 1000 directories per tier, so we can have about 1M leaf directories, though they don't all exist yet.
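The arithmetic behind the layout is easy to sketch.  Assuming object names are hashed uniformly into two tiers of 1000 directories (the MD5-based `leaf_dir` mapping below is hypothetical and is not how Swift itself builds its paths), 12M files works out to about 12 files per leaf once every leaf exists.

```python
import hashlib

TIER_WIDTH = 1000  # directories per tier, as described above


def leaf_dir(name: str) -> str:
    """Map an object name to a hypothetical two-tier path 'NNN/MMM'."""
    h = int(hashlib.md5(name.encode()).hexdigest(), 16)
    top = h % TIER_WIDTH
    sub = (h // TIER_WIDTH) % TIER_WIDTH
    return f"{top:03d}/{sub:03d}"


leaf_dirs = TIER_WIDTH * TIER_WIDTH  # 1,000,000 possible leaf directories
files = 12_000_000                   # roughly where throughput collapses
avg_per_leaf = files / leaf_dirs     # 12.0 files per leaf, if all leaves exist
```

Because not all leaves exist yet, the real per-directory count is higher than this average.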

I've written a collectl plugin that lets me watch many of the XFS stats in real time, and I also have a test script that exercises the Swift PUT code directly and so eliminates all the inter-node communication.  This script can write either into the existing Swift directories or into an empty structure, which mimics a clean environment with no existing subdirectories.
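The core of a per-second XFS stats view can be sketched without collectl.  This is not the plugin itself, just a minimal sketch assuming the Linux global counters file `/proc/fs/xfs/stat`, where each line is a name followed by integer counters; the function names are hypothetical.  As far as I know the `rw` row carries the write and read call counts, which is where a 5x jump would show up.

```python
import time

STAT_PATH = "/proc/fs/xfs/stat"  # global XFS counters on Linux


def parse_xfs_stat(text):
    """Parse 'name v1 v2 ...' lines into {name: [ints]}."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            stats[parts[0]] = [int(v) for v in parts[1:]]
        except ValueError:
            continue  # skip any row that is not purely integer counters
    return stats


def rates(before, after, interval):
    """Per-second deltas for every counter row present in both snapshots."""
    out = {}
    for name, new in after.items():
        old = before.get(name)
        if old is None or len(old) != len(new):
            continue
        out[name] = [(n - o) / interval for o, n in zip(old, new)]
    return out


def sample(interval=1.0, path=STAT_PATH):
    """Read the counters twice, about `interval` seconds apart; return rates."""
    with open(path) as f:
        before = parse_xfs_stat(f.read())
    t0 = time.monotonic()
    time.sleep(interval)
    with open(path) as f:
        after = parse_xfs_stat(f.read())
    return rates(before, after, time.monotonic() - t0)
```

Calling `sample()` once a second and printing `sample().get("rw")` gives one line per second, similar to the attached output.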

I'm attaching some XFS stats collected during the run and hope they're readable.  The values are in operations/sec and each line is one second's worth of data.  The first set of numbers is from the clean directory and the second from the existing 12M-file one.  At the bottom of these stats are the XFS slab allocations as reported by collectl.  I can also watch these during a test and see the number of inode and ili objects grow steadily at about 1K/sec, which is curious since I'm only creating about 300 files/sec.
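The slab growth can also be watched directly.  A minimal sketch, assuming the standard `/proc/slabinfo` layout (name, active_objs, num_objs, objsize, ...) and the cache names `xfs_inode` and `xfs_ili`; the helper names are hypothetical, and reading the file usually requires root.  The footprint estimate is num_objs times objsize, which ignores per-slab overhead.

```python
SLAB_PATH = "/proc/slabinfo"
CACHES = ("xfs_inode", "xfs_ili")  # assumed XFS cache names


def parse_slabinfo(text, caches=CACHES):
    """Return {cache: (active_objs, num_objs, objsize)} for the listed caches."""
    out = {}
    for line in text.splitlines():
        if line.startswith(("slabinfo", "#")):
            continue  # version line and column header
        parts = line.split()
        if len(parts) < 4 or parts[0] not in caches:
            continue
        out[parts[0]] = (int(parts[1]), int(parts[2]), int(parts[3]))
    return out


def growth_per_sec(before, after, interval):
    """Change in active objects per second for caches in both snapshots."""
    return {k: (after[k][0] - before[k][0]) / interval for k in after if k in before}


def footprint_bytes(snap):
    """Approximate memory per cache: allocated objects times object size."""
    return {k: num * size for k, (_, num, size) in snap.items()}
```

Dividing the `growth_per_sec` figure by the create rate gives the objects-per-create ratio; at 1K/sec against about 300 creates/sec that is roughly 3.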

If there is anything else I can provide, just let me know.

I don't fully understand all the XFS stats, but what does jump out at me is that the XFS read/write ops increase by a factor of about 5 when the system is slower.  Right now the collectl plugin is not something I've released, but if there is interest and someone would like to help me present the data in a more organized/meaningful way, just let me know.

If there are any tuning suggestions, I'm more than happy to try them out.

-mark