On Nov 23, 2004 00:02 +1100, tridge@samba.org wrote:
> I've put up graphs of the first set of dbench3 results for various
> filesystems at:
>
>   http://samba.org/~tridge/xattr_results/
>
> The results show that the ext3 large inode patch is extremely
> worthwhile. Using a 256 byte inode on ext3 gained a factor of up to 7x
> in performance, and only lost a very small amount when xattrs were not
> used. It took ext3 from a very mediocre performance to being the clear
> winner among current Linux journaled filesystems for performance when
> xattrs are used. Eventually I think that larger inodes should become
> the default.

For Lustre we tune the inode size at format time to allow the storing of
the "default" EA data within the larger inode. Is this the case with
samba and 256-byte inodes (i.e. is your EA data all going to fit within
the extra 124 bytes of space for storing EAs)? If you have to put any of
the commonly-used EA data into an external block the benefits are lost.

> The massive gap between ext2 and the other filesystems really shows
> clearly how much we are paying for journaling. I haven't tried any
> journal on external device or journal on nvram card tricks yet, but it
> looks like those will be worth pursuing.

One of the other things we do for Lustre right away is create the ext3
filesystem with larger journal sizes so that for the many-client cases we
do not get synchronous journal flushing if there are lots of active
threads. This can make a huge difference in overall performance at high
loads. Use "mke2fs -J size=400 ..." to create a 400MB journal (assuming
you have at least that much RAM and a large enough block device, at least
4x the journal size just from a "don't waste space" point of view).
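As a rough illustration of the journal sizing rule of thumb above, here is a minimal Python sketch. It assumes only the two constraints stated in the message (journal no larger than RAM, block device at least 4x the journal size). The function name `ext3_journal_args`, its parameters, and the example device path are hypothetical; the function only builds the mke2fs argument list and does not run mke2fs.

```python
def ext3_journal_args(journal_mb: int, ram_mb: int, device_mb: int, device: str) -> list[str]:
    """Build an mke2fs command line for a journal of journal_mb megabytes.

    Hypothetical helper: enforces the rule of thumb from the message
    (journal no larger than RAM, device at least 4x the journal size).
    It only returns the argument list; it does not run mke2fs.
    """
    if journal_mb <= 0:
        raise ValueError("journal size must be positive")
    if journal_mb > ram_mb:
        raise ValueError(f"{journal_mb}MB journal is larger than {ram_mb}MB of RAM")
    if device_mb < 4 * journal_mb:
        raise ValueError(
            f"{device_mb}MB device is less than 4x the {journal_mb}MB journal"
        )
    # "-J size=N" requests an N-megabyte journal, as in "mke2fs -J size=400 ...".
    return ["mke2fs", "-J", f"size={journal_mb}", device]


if __name__ == "__main__":
    # Example values are illustrative only; prints: mke2fs -J size=400 /dev/sdb1
    print(" ".join(ext3_journal_args(400, 1024, 36 * 1024, "/dev/sdb1")))
```

Raising an error rather than quietly picking a smaller journal leaves the trade-off with whoever formats the filesystem.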
Part of the reason is not that you necessarily write so much data at one
time, but that ext3 needs to reserve journal space for the worst-case
usage, so you get 40-100 threads allocating "worst case", then "filling"
the journal (causing new operations to block), and finally completing
with only a small fraction of those reserved journal blocks actually
used.

Having an external journal device also generally gives you a large
journal (by default it is the full size of the block device specified),
so sometimes the effects of the large journal are confused with the fact
that it is external. I haven't seen any perf numbers recently on what
kind of effect having an external journal has.

I highly doubt that NVRAM cards are any better than a dedicated disk for
the journal, since journal IO is write-only (except during recovery) and
virtually seek-free.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://members.shaw.ca/adilger/             http://members.shaw.ca/golinux/