Dave, thanks for getting back to me, and for the pointer to the config doc.  Lots to absorb and play with.

The real challenge for me is that I'm doing testing at different levels.  While I realize running 100 parallel swift PUT threads on a small system is not the ideal way to do things, it's the only easy way to get massive numbers of objects into the filesystem.  Once they're there, the performance of a single stream is pretty poor, and by instrumenting the swift code I can clearly see excess time being spent creating/writing the objects, which has led us to believe the problem lies in the way xfs is configured.  Creating a new directory structure on that same mount point immediately restores high levels of performance.

In an attempt to reproduce the problem without swift, I wrote a little python script that simply creates files in a 2-tier structure: the first tier consists of 1024 directories, each containing 4096 subdirectories, into which 1KB files are created.  I'm creating 10000 objects at a time, timing each batch and reporting the times 10 per line, so each line represents 100 thousand file creates.
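
In case it helps, here's a minimal sketch of what that script does.  The mount point, directory naming scheme, and random placement below are assumptions on my part, not the exact code:

#!/usr/bin/env python
# Minimal sketch of the 2-tier create benchmark described above.  The
# mount point, directory names, and random placement policy are assumed.
import os
import random
import time

TOP = "/srv/node/disk0/bench"   # assumed test directory on the xfs mount
TIER1 = 1024                    # first-tier directories
TIER2 = 4096                    # subdirectories per first-tier directory
BATCH = 10000                   # files per timed batch
BATCHES = 100                   # 100 x 10K = 1M files total
PAYLOAD = b"x" * 1024           # 1KB of data per file

def create_one(seq):
    # pick a random tier1/tier2 directory, creating it on first use
    d = os.path.join(TOP,
                     "%04d" % random.randrange(TIER1),
                     "%04d" % random.randrange(TIER2))
    if not os.path.isdir(d):
        os.makedirs(d)
    with open(os.path.join(d, "obj%08d" % seq), "wb") as f:
        f.write(PAYLOAD)

seq = 0
times = []
for _ in range(BATCHES):
    start = time.time()
    for _ in range(BATCH):
        create_one(seq)
        seq += 1
    times.append(time.time() - start)
    if len(times) == 10:        # report 10 batch times per line (100K creates)
        print(" ".join("%9.6f" % t for t in times))
        times = []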

Here too I'm seeing degradation.  If I look at what happens when there are already 3M files and I write 1M more, I see these creation times per 10 thousand files:

 1.004236  0.961419  0.996514  1.012150  1.101794  0.999422  0.994796  1.214535  0.997276  1.306736
 2.793429  1.201471  1.133576  1.069682  1.030985  1.096341  1.052602  1.391364  0.999480  1.914125
 1.193892  0.967206  1.263310  0.890472  1.051962  4.253694  1.145573  1.528848 13.586892  4.925790
 3.975442  8.896552  1.197005  3.904226  7.503806  1.294842  1.816422  9.329792  7.270323  5.936545
 7.058685  5.516841  4.527271  1.956592  1.382551  1.510339  1.318341 13.255939  6.938845  4.106066
 2.612064  2.028795  4.647980  7.371628  5.473423  5.823201 14.229120  0.899348  3.539658  8.501498
 4.662593  6.423530  7.980757  6.367012  3.414239  7.364857  4.143751  6.317348 11.393067  1.273371
146.067300  1.317814  1.176529  1.177830 52.206605  1.112854  2.087990 42.328220  1.178436  1.335202
49.118140  1.368696  1.515826 44.690431  0.927428  0.920801  0.985965  1.000591  1.027458 60.650443
 1.771318  2.690499  2.262868  1.061343  0.932998 64.064210 37.726213  1.245129  0.743771  0.996683

Note that one set of 10K creates took nearly 2.5 minutes!

My main questions at this point are: is this performance expected, and might a newer kernel help?  And might it be possible to significantly improve things via tuning, or is it what it is?  I do realize I'm starting with an empty directory tree whose performance degrades as it fills, but if I wanted to tune for, say, 10M or maybe 100M files, might I be able to expect more consistent numbers (perhaps starting out at lower performance) as the number of objects grows?  I'm basically looking for more consistency over a broader range of file counts.

-mark

On Wed, Jan 6, 2016 at 5:10 PM, Dave Chinner <david@fromorbit.com> wrote:
On Thu, Jan 07, 2016 at 09:04:54AM +1100, Dave Chinner wrote:
> On Wed, Jan 06, 2016 at 10:15:25AM -0500, Mark Seger wrote:
> > I've recently found the performance our development swift system is
> > degrading over time as the number of objects/files increases.  This is a
> > relatively small system; each server has three 400GB disks.  The system I'm
> > currently looking at has about 70GB tied up in slabs alone, close to 55GB
> > in xfs inodes and ili, and about 2GB free.  The kernel
> > is 3.14.57-1-amd64-hlinux.
>
> So you've got 50M cached inodes in memory, and a relatively old kernel.
>
> > Here's the way the filesystems are mounted:
> >
> > /dev/sdb1 on /srv/node/disk0 type xfs
> > (rw,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,logbsize=256k,sunit=512,swidth=1536,noquota)
> >
> > I can do about 2000 1K file creates/sec when running 2 minute PUT tests at
> > 100 threads.  If I repeat that test for multiple hours, I see the number
> > of IOPS steadily decreasing to about 770, and on the very next run it drops to
> > 260 and continues to fall from there.  This happens at about 12M files.
>
> According to the numbers you've provided:
>
>       lookups         creates         removes
> Fast: 1550            1350            300
> Slow: 1000             900            250
>
> This is pretty much what I'd expect on the XFS level when going from
> a small empty filesystem to one containing 12M 1k files.
>
> That does not correlate to your numbers above, so it's not at all
> clear that there is really a problem here at the XFS level.
>
> > The directory structure is 2 tiered, with 1000 directories per tier so we
> > can have about 1M of them, though they don't currently all exist.
>
> That's insane.
>
> The xfs directory structure is much, much more space, time, IO and
> memory efficient than a directory hierarchy like this. The only thing
> you need a directory hash hierarchy for is to provide sufficient
> concurrency for your operations, which you would probably get with a
> single level with one or two subdirs per filesystem AG.

BTW, you might want to read the section on directory block size for
a quick introduction to XFS directory design and scalability:

https://git.kernel.org/cgit/fs/xfs/xfs-documentation.git/tree/admin/XFS_Performance_Tuning/filesystem_tunables.asciidoc

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com