dave, thanks for getting back to me and for the pointer to the config doc. lots to absorb and play with.
The real challenge for me is that I'm doing testing at different levels. While I realize running 100 parallel swift PUT threads on a small system is not the ideal way to do things, it's the only easy way to get massive numbers of objects into the filesystem, and once they're there, the performance of a single stream is pretty poor. By instrumenting the swift code I can clearly see excess time being spent creating/writing the objects, which has led us to believe the problem lies in the way xfs is configured. Creating a new directory structure on that same mount point immediately results in high levels of performance.
As an attempt to reproduce the problem without swift, I wrote a little python script that simply creates files in a 2-tier structure: the first tier consists of 1024 directories, and each of those contains 4096 subdirectories into which 1K files are created. I'm doing this 10000 objects at a time, timing each batch and reporting the times 10 per line, so each line represents 100 thousand file creates.
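For reference, here's a simplified sketch of what the script does (the random directory selection, file naming, and lazy directory creation here are just illustrative; the real script may pick directories differently):

#!/usr/bin/env python
# sketch of the file-create test described above: 1024 x 4096 directory
# tiers, 1K files, timed in batches of 10000, 10 batch times per line
import os, random, sys, time

TOP = sys.argv[1] if len(sys.argv) > 1 else "/mnt/test"
TIER1, TIER2 = 1024, 4096      # 2-tier layout: 1024 dirs x 4096 subdirs
BATCH = 10000                  # time every 10000 creates
TOTAL = 1000000                # e.g. 1M files per run
DATA = b"x" * 1024             # 1K payload per file

times = []
for batch in range(TOTAL // BATCH):
    start = time.time()
    for i in range(BATCH):
        d1 = random.randrange(TIER1)
        d2 = random.randrange(TIER2)
        dirpath = os.path.join(TOP, "%04d" % d1, "%04d" % d2)
        os.makedirs(dirpath, exist_ok=True)   # create dirs lazily as needed
        name = os.path.join(dirpath, "%d-%d" % (batch, i))
        with open(name, "wb") as f:
            f.write(DATA)
    times.append(time.time() - start)
    if len(times) == 10:                      # report 10 batch times per line
        print(" ".join("%.6f" % t for t in times))
        times = []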
Here too I'm seeing degradation. If I look at what happens when there are already 3M files and I write 1M more, I see these creation times per 10 thousand files (in seconds):
1.004236 0.961419 0.996514 1.012150 1.101794 0.999422 0.994796 1.214535 0.997276 1.306736
2.793429 1.201471 1.133576 1.069682 1.030985 1.096341 1.052602 1.391364 0.999480 1.914125
1.193892 0.967206 1.263310 0.890472 1.051962 4.253694 1.145573 1.528848 13.586892 4.925790
3.975442 8.896552 1.197005 3.904226 7.503806 1.294842 1.816422 9.329792 7.270323 5.936545
7.058685 5.516841 4.527271 1.956592 1.382551 1.510339 1.318341 13.255939 6.938845 4.106066
2.612064 2.028795 4.647980 7.371628 5.473423 5.823201 14.229120 0.899348 3.539658 8.501498
4.662593 6.423530 7.980757 6.367012 3.414239 7.364857 4.143751 6.317348 11.393067 1.273371
146.067300 1.317814 1.176529 1.177830 52.206605 1.112854 2.087990 42.328220 1.178436 1.335202
49.118140 1.368696 1.515826 44.690431 0.927428 0.920801 0.985965 1.000591 1.027458 60.650443
1.771318 2.690499 2.262868 1.061343 0.932998 64.064210 37.726213 1.245129 0.743771 0.996683
note that one set of 10K took almost 3 minutes!
My main questions at this point are: is this performance expected, and might a newer kernel help? And might it be possible to significantly improve things via tuning, or is it what it is? I do realize I'm starting with an empty directory tree whose performance degrades as it fills, but if I wanted to tune for, say, 10M or maybe 100M files, might I be able to expect more consistent numbers (perhaps starting out at lower performance) as the number of objects grows? I'm basically looking for more consistency over a broader range of file counts.
-mark