* LWN.net article: creating 1 billion files -> XFS looses
From: Michael Monnerie @ 2010-08-19 11:12 UTC
To: xfs

The subject is a bit harsh, but overall the article says:
- XFS is slowest at creating and deleting a billion files
- XFS fsck needs 30 GB of RAM to fsck that 100 TB filesystem

http://lwn.net/SubscriberLink/400629/3fb4bc34d6223b32/

-- 
with kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services
http://proteger.at [pronounced: Prot-e-schee]
Tel: 0660 / 415 65 31

****** Current radio interview! ******
http://www.it-podcast.at/aktuelle-sendung.html
// We currently have two houses for sale:
// http://zmi.at/langegg/
// http://zmi.at/haus2009/

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: LWN.net article: creating 1 billion files -> XFS looses
From: Christoph Hellwig @ 2010-08-19 12:05 UTC
To: Michael Monnerie; +Cc: xfs

On Thu, Aug 19, 2010 at 01:12:45PM +0200, Michael Monnerie wrote:
> The subject is a bit harsh, but overall the article says:
> XFS is slowest on creating and deleting a billion files
> XFS fsck needs 30GB RAM to fsck that 100TB filesystem.
>
> http://lwn.net/SubscriberLink/400629/3fb4bc34d6223b32/

The creation and deletion performance is a known issue, and to a large
extent fixed by the new delaylog code.  We're not quite as fast as ext4
yet, but it's getting close.

The repair result looks a lot like the pre-3.1.0 xfsprogs repair.
* Re: LWN.net article: creating 1 billion files -> XFS looses
From: Michael Monnerie @ 2010-08-19 12:45 UTC
To: xfs

On Thursday, 19 August 2010, Christoph Hellwig wrote:
> The creation and deletion performance is a known issue, and to a
> large extent fixed by the new delaylog code.  We're not quite as
> fast as ext4 yet, but it's getting close.
>
> The repair result looks a lot like the pre-3.1.0 xfsprogs repair.

Yes, I know.  I thought some XFS dev might contact the author to do
some re-testing, as a reputation is quickly destroyed by such articles
and takes a long time to rebuild.  Just this week I had a friend in a
filesystem discussion say, "isn't XFS destroying/zeroing files on power
failure?".  That information is ancient, but things like that stay in
people's brains (almost) forever.
* Re: LWN.net article: creating 1 billion files -> XFS looses
From: Stan Hoeppner @ 2010-08-19 13:55 UTC
To: xfs

Michael Monnerie put forth on 8/19/2010 7:45 AM:
> Just this week I had a friend in a FS discussion saying "isn't XFS
> destroying/zeroing files on power failure?".  That information is
> ancient, but things like that stay in people's brains (almost)
> forever.

Had a similar lengthy discussion over on debian-users not more than a
month or so ago.  Same thing there.  Of the 10 or so people active in
the thread, I'd say 8 of them were anti-XFS because of the "corruption
due to power failure" issue that they'd "read about" years before.  Not
a single one of them had ever used XFS.  A couple of them considered it
a "hobbyist quality" filesystem that might be ready for production use
in a few years.  Ah, the ignorance that abounds in our world...

I did my best to educate them, sending them to the Wikipedia page on
XFS and to the xfs.org site, specifically the relevant sections of the
FAQ.  Unfortunately there are some people who simply refuse to be
educated.  But that type of person isn't a candidate for XFS anyway,
thankfully. ;)

-- 
Stan
* Re: LWN.net article: creating 1 billion files -> XFS looses
From: Dave Chinner @ 2010-08-20 7:55 UTC
To: Michael Monnerie; +Cc: xfs

On Thu, Aug 19, 2010 at 02:45:22PM +0200, Michael Monnerie wrote:
> Yes, I know.  I thought some XFS dev might contact the author to do
> some re-testing, as a reputation is quickly destroyed by such
> articles and takes a long time to rebuild.

Don't worry too much - I have the details of the test that was run and
already know why XFS appeared so slow: it was single threaded.
Immediately, that means XFS will be slower to create 1 billion files
regardless of any other detail.

Look at it this way - the initial numbers I'm seeing on my test rig are
sustained create rates of about 8,000/s with default mkfs/mount options
(i.e. no tuning, no delayed logging, 32k logbsize, etc.) and it is
burning exactly one of the 8 CPUs in the VM.  I know I can get an order
of magnitude better performance out of XFS on this VM....

It'll take me a few days to run the numbers to be able to write a solid
reply, but I have every confidence that a "create 1B inodes" benchmark
tuned to XFS's strengths rather than one designed to avoid ext4's
weaknesses will show very, very different results.

In the meantime, there is no need to start a flamewar. ;)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
* Re: LWN.net article: creating 1 billion files -> XFS looses
From: Emmanuel Florac @ 2010-08-19 13:10 UTC
To: xfs

On Thu, 19 Aug 2010 13:12:45 +0200, Michael Monnerie
<michael.monnerie@is.it-management.at> wrote:

> The subject is a bit harsh, but overall the article says:
> XFS is slowest on creating and deleting a billion files
> XFS fsck needs 30GB RAM to fsck that 100TB filesystem.

Too bad I haven't got a 100 TB machine at hand.  However, I have a
24 TB system dedicated to tests.  I'm pretty sure we can do much better
with XFS and the proper mount options :)  In fact, I have an unused
40 TB array too.  More on that one later...  Stay tuned :)

-- 
Emmanuel Florac | Direction technique | Intellique
<eflorac@intellique.com> | +33 1 78 94 84 02
* Re: LWN.net article: creating 1 billion files -> XFS looses
From: Emmanuel Florac @ 2010-09-06 13:42 UTC
To: xfs

On Thu, 19 Aug 2010 13:12:45 +0200, Michael Monnerie
<michael.monnerie@is.it-management.at> wrote:

> The subject is a bit harsh, but overall the article says:
> XFS is slowest on creating and deleting a billion files
> XFS fsck needs 30GB RAM to fsck that 100TB filesystem.

Just to follow up on this subject: a colleague (following my
suggestion :) tried to create 1 billion files in a single XFS
directory.  Unfortunately the directories themselves don't scale well
that far: after 1 million files in the first 30 minutes, file creation
slows down gradually, so after 100 hours we had about 230 million
files.  The directory size at that point was 5.3 GB.

Now we're starting afresh with 1000 directories of 1 million files
each :)

(Kernel version used: vanilla 2.6.32.11 x86_64 SMP)
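The 1000-directory retest above is the classic fan-out workaround for a single huge directory. As a rough illustration only (not the poster's actual tooling), a hash-sharded layout might look like the following Python sketch; the two-level, 256-way fan-out and all function names are assumptions:

```python
# Hypothetical sketch: shard file names into hashed subdirectories so
# no single directory grows into the hundreds of millions of entries.
import hashlib
import os

def shard_path(root, name, levels=2):
    """Map a name to root/xx/yy/name using hex pairs of its MD5 digest."""
    h = hashlib.md5(name.encode()).hexdigest()
    parts = [h[2 * i:2 * i + 2] for i in range(levels)]  # 256 dirs per level
    return os.path.join(root, *parts, name)

def create_sharded(root, name, levels=2):
    """Create an empty file at its sharded location, making dirs as needed."""
    path = shard_path(root, name, levels)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()
    return path
```

With two levels of 256 directories, a billion files average roughly 15,000 entries per leaf directory, well inside the range where directory operations stay fast.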
* Re: LWN.net article: creating 1 billion files -> XFS looses
From: Dave Chinner @ 2010-09-06 22:04 UTC
To: Emmanuel Florac; +Cc: xfs

On Mon, Sep 06, 2010 at 03:42:54PM +0200, Emmanuel Florac wrote:
> Just to follow up on this subject: a colleague (following my
> suggestion :) tried to create 1 billion files in a single XFS
> directory.  Unfortunately the directories themselves don't scale
> well that far: after 1 million files in the first 30 minutes, file
> creation slows down gradually, so after 100 hours we had about 230
> million files.  The directory size at that point was 5.3 GB.

Oh, that's larger than I've ever run before ;)

Try using:

# mkfs.xfs -d size=64k

Will speed up large directory operations by at least an order of
magnitude.

> Now we're starting afresh with 1000 directories of 1 million files
> each :)

Which is exactly the test that was used to generate the numbers that
were published.

> (Kernel version used: vanilla 2.6.32.11 x86_64 SMP)

Not much point in testing that kernel - delayed logging is where the
future is for this sort of workload, and that is what I'm testing.

FWIW, I'm able to create 50 million inodes in under 14 minutes with
delayed logging and 8 threads using directories of 100k entries.  The
run to 1 billion inodes that I started late last night (10 hours in)
has just passed 700M inodes on a 16 TB filesystem.

It's running at about 25,000 creates/s, but it is limited by bad
shrinker behaviour causing the dentry cache to be completely trashed,
causing ~3000 read IOPS to reload dentries that are still necessary
for operation.  It should be running about 3-4x faster than that.

FYI, the reason I'm taking a while to get numbers is that parallel
create workloads of this scale are showing significant problems (VM
livelocks, shrinker misbehaviour, lock contention in IO completion
processing, buffer cache hash scaling issues, etc.) and I'm trying to
fix them as I go - these metadata workloads are completely unexplored
territory....

Cheers,

Dave.
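Dave's point that thread count dominates this workload can be sketched at toy scale. The following is a hypothetical Python microbenchmark, not the tool used in the LWN article or in Dave's runs; the worker/file names and the 4-thread, 1000-file defaults are made up for illustration:

```python
# Hypothetical sketch of a multi-threaded create workload: N worker
# threads each fill their own directory with empty files, and we report
# an aggregate create rate.  Real runs at the 50M-1B scale need a
# dedicated rig; this only shows the shape of the workload.
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

def create_files(root, worker, count):
    d = os.path.join(root, "dir%02d" % worker)
    os.mkdir(d)
    for i in range(count):
        # open+close creates an empty file: a pure metadata operation
        open(os.path.join(d, "file%d" % i), "w").close()
    return count

def run_bench(workers=4, files_per_worker=1000):
    root = tempfile.mkdtemp(prefix="createbench-")
    start = time.time()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(create_files, root, w, files_per_worker)
                   for w in range(workers)]
        total = sum(f.result() for f in futures)
    elapsed = time.time() - start
    return root, total, total / elapsed   # creates per second

if __name__ == "__main__":
    _, total, rate = run_bench()
    print("%d files created, ~%.0f creates/s" % (total, rate))
```

A single-threaded run (`workers=1`) exercises one allocation group at a time, which is exactly the handicap Dave describes for the published test.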
* Re: LWN.net article: creating 1 billion files -> XFS looses
From: Michael Monnerie @ 2010-09-06 22:58 UTC
To: xfs

On Tuesday, 7 September 2010, Dave Chinner wrote:
> # mkfs.xfs -d size=64k
>
> Will speed up large directory operations by at least an order of
> magnitude.

I've read the man page for mkfs.xfs, but I couldn't find out whether

  mkfs -d su=64k,sw=2

would be a redundant (and superior) option for that.  I'd guess so,
reading the description of sunit:

  sunit=value
    This is used to specify the stripe unit for a RAID device or a
    logical volume.  The value has to be specified in 512-byte block
    units.  Use the su suboption to specify the stripe unit size in
    bytes.  This suboption ensures that data allocations will be
    stripe unit aligned when the current end of file is being extended
    and the file size is larger than 512KiB.  Also inode allocations
    and the internal log will be stripe unit aligned.

Or would I still need to use size=64k?
* Re: LWN.net article: creating 1 billion files -> XFS looses
From: Dave Chinner @ 2010-09-07 3:31 UTC
To: Michael Monnerie; +Cc: xfs

On Tue, Sep 07, 2010 at 12:58:40AM +0200, Michael Monnerie wrote:
> On Tuesday, 7 September 2010, Dave Chinner wrote:
> > # mkfs.xfs -d size=64k
> >
> > Will speed up large directory operations by at least an order of
> > magnitude.
>
> I've read the man page for mkfs.xfs, but I couldn't find out whether
> using mkfs -d su=64k,sw=2

Sorry, I screwed that up, it should have read:

# mkfs.xfs -n size=64k

(-n = naming = directories.  -d = data != directories)

Cheers,

Dave.
* Re: LWN.net article: creating 1 billion files -> XFS looses
From: Michael Monnerie @ 2010-09-07 6:20 UTC
To: xfs

On Tuesday, 7 September 2010, Dave Chinner wrote:
> # mkfs.xfs -n size=64k
> (-n = naming = directories.  -d = data != directories)

Thank you, Dave.  Do I interpret that parameter right:

When a new directory is created, by default it would occupy only 4KB;
with -n size=64k, 64KB would be reserved.  As the directory fills,
space within that block will be used, so in the default case after 4KB
(how many inodes would that be, roughly?  256 bytes/inode, so 16
entries?) XFS would reserve the next block, but in your case 256
entries would fit.

That would keep directory fragmentation lower and, with today's disks,
take only minimally more space, so it sounds very good to use that
option - especially with RAIDs, where stripes are usually 64KB or
bigger.  Or would the waste of space be so big that it could hurt?

Last question: is there a way to set that option on an existing XFS
filesystem?
* Re: LWN.net article: creating 1 billion files -> XFS looses
From: Dave Chinner @ 2010-09-07 7:01 UTC
To: Michael Monnerie; +Cc: xfs

On Tue, Sep 07, 2010 at 08:20:07AM +0200, Michael Monnerie wrote:
> On Tuesday, 7 September 2010, Dave Chinner wrote:
> > # mkfs.xfs -n size=64k
> > (-n = naming = directories.  -d = data != directories)
>
> Thank you, Dave.  Do I interpret that parameter right:
>
> When a new directory is created, by default it would occupy only
> 4KB; with -n size=64k, 64KB would be reserved.

No, it allocates 64k blocks for the directory instead of 4k blocks.

> As the directory fills, space within that block will be used, so in
> the default case after 4KB (how many inodes would that be, roughly?
> 256 bytes/inode, so 16 entries?) XFS would reserve the next block,
> but in your case 256 entries would fit.

Inodes are not stored in the directory structure, only the directory
entry name and the inode number.  Hence the amount of space used by a
directory entry is determined by the length of the name.

> That would keep directory fragmentation lower and, with today's
> disks, take only minimally more space, so it sounds very good to use
> that option - especially with RAIDs, where stripes are usually 64KB
> or bigger.  Or would the waste of space be so big that it could
> hurt?

Well, there is extra overhead to allocate large directory blocks (16
pages instead of one, to begin with, then there's the vmap overhead,
etc.), so for small directories smaller block sizes are faster for
create and unlink operations.  For empty directories, operations on 4k
block size directories consume roughly 50% less CPU than 64k block
size directories.  The 4k block size directories consume less CPU out
to roughly 1.5 million entries, where the two are roughly equal.

At directory sizes of 10 million entries, 64k directory block
operations are consuming about 15% of the CPU that 4k directory block
operations consume.

In terms of lookups, the 64k block directory will take less IO but
consume more CPU for a given lookup.  Hence it depends on your IO
latency and whether directory readahead can hide that latency as to
which will be faster.  e.g. for SSDs, CPU usage might be the limiting
factor, not the IO.  Right now I don't have any numbers on what the
difference might be - I'm getting the 1B inode population issues
worked out first, before I start on measuring cold cache lookup times
on 1B files....

> Last question: is there a way to set that option on an existing XFS
> filesystem?

No, it is a mkfs-time parameter, though we have been discussing the
possibility of being able to set it per-directory (at mkdir time, when
no blocks have been allocated).

Cheers,

Dave.
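To get a rough feel for how many entries fit in a directory block, here is a simplified model of Dave's description (an entry stores the name plus the inode number). The 8-byte inode + 1-byte name length + name + 2-byte tag layout with 8-byte alignment approximates the XFS v2 directory data entry, but block headers and free-space indexes are ignored, so treat the counts as upper bounds:

```python
# Simplified model of XFS v2 directory data entries: 8-byte inode
# number + 1-byte namelen + name bytes + 2-byte offset tag, padded to
# 8-byte alignment.  Block headers and free-space indexes are ignored,
# so entries_per_block() is an upper bound, not the exact on-disk count.
def dirent_size(name_len):
    raw = 8 + 1 + name_len + 2
    return (raw + 7) & ~7            # round up to a multiple of 8

def entries_per_block(block_size, avg_name_len):
    return block_size // dirent_size(avg_name_len)

if __name__ == "__main__":
    for bs in (4096, 65536):
        print("%6d-byte dir block: ~%d entries of 12-char names"
              % (bs, entries_per_block(bs, 12)))
```

By this model, a 64k directory block holds roughly 16x the entries of a 4k block for a given name length, which is why the larger block keeps the btree shallower for huge directories.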
* Re: LWN.net article: creating 1 billion files -> XFS looses
From: Michael Monnerie @ 2010-09-08 5:42 UTC
To: xfs

On Tuesday, 7 September 2010, Dave Chinner wrote:
> # mkfs.xfs -n size=64k

That explanation was worth a FAQ entry, to keep it as a reference:
http://xfs.org/index.php/XFS_FAQ#Q:_Performance:_mkfs.xfs_-n_size.3D64k_option

Thanks, Dave.
* Re: LWN.net article: creating 1 billion files -> XFS looses
From: Emmanuel Florac @ 2010-09-07 6:46 UTC
To: Dave Chinner; +Cc: xfs

On Tue, 7 Sep 2010 08:04:10 +1000, you wrote:

> Oh, that's larger than I've ever run before ;)

Excellent :)  It still works fine afterwards; mount, umount, etc. all
work flawlessly.  Memory consumption, though, is huge :)

> Try using:
>
> # mkfs.xfs -d size=64k
>
> Will speed up large directory operations by at least an order of
> magnitude.

OK, we'll try that too :)

> > Now we're starting afresh with 1000 directories of 1 million files
> > each :)
>
> Which is exactly the test that was used to generate the numbers that
> were published.
>
> > (Kernel version used: vanilla 2.6.32.11 x86_64 SMP)
>
> Not much point in testing that kernel - delayed logging is where the
> future is for this sort of workload, and that is what I'm testing.

I'll compile a 2.6.36rc for comparison.
* Re: LWN.net article: creating 1 billion files -> Tests we did
From: Emmanuel Florac @ 2010-09-16 10:13 UTC
To: Michael Monnerie; +Cc: xfs

On Thu, 19 Aug 2010 13:12:45 +0200, Michael Monnerie
<michael.monnerie@is.it-management.at> wrote:

> The subject is a bit harsh, but overall the article says:
> XFS is slowest on creating and deleting a billion files
> XFS fsck needs 30GB RAM to fsck that 100TB filesystem.
>
> http://lwn.net/SubscriberLink/400629/3fb4bc34d6223b32/

So we've made a test with 1 KB files (space, space...) and a production
kernel: 2.6.32.11 (yeah, I know, 2.6.38 should be faster, but you know,
we upgrade our production kernels prudently :).

mk1BFiles will create and delete 1000000000 files with 32 threads
Version: v0.2.4-10-gf6decd3, build: Sep 7 2010 13:39:34

Creating 1000000000 files, started at 2010-09-07 13:45:16...
Done, time spent: 89:35:12.262

Doing `ls -R`, started at 2010-09-11 07:20:28...
Stat: ls (pid: 18844) status: ok, returned value: 0
Cpu usage: user: 1:27:47.242, system: 20:18:21.689
Max rss: 229.01 MBytes, page fault: major: 4, minor: 58694

Compute size used by 1000000000 files, started at 2010-09-12 09:30:52...
Size used by files: 11.1759 TBytes
Size used by directory: 32.897 GBytes
Size used (total): 11.2080 TBytes
Done, time spent: 25:50:32.355

Deleting 1000000000 files, started at 2010-09-13 11:21:24...
Done, time spent: 68:37:38.117

Test run on a dual Opteron quad core, 16 GB RAM, kernel 2.6.32.11
x86_64...
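For reference, the durations in the log above work out to sustained rates of roughly 3,100 creates/s and 4,050 deletes/s. A quick check from the printed H:MM:SS.mmm values:

```python
# Back-of-the-envelope rates from the durations reported in the
# mk1BFiles log above (H:MM:SS.mmm as printed by the benchmark).
def hms_to_seconds(h, m, s):
    return h * 3600 + m * 60 + s

FILES = 10**9
create_secs = hms_to_seconds(89, 35, 12.262)   # Creating: 89:35:12.262
delete_secs = hms_to_seconds(68, 37, 38.117)   # Deleting: 68:37:38.117

print("create: ~%.0f files/s" % (FILES / create_secs))
print("delete: ~%.0f files/s" % (FILES / delete_secs))
```

That is an aggregate across all 32 threads against a RAID-6 array, so per-thread rates are far lower, consistent with the seek-bound nature of the workload.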
* Re: LWN.net article: creating 1 billion files -> Tests we did
From: Stan Hoeppner @ 2010-09-16 21:53 UTC
To: xfs

Emmanuel Florac put forth on 9/16/2010 5:13 AM:
> Test run on a dual Opteron quad core, 16 GB RAM, kernel 2.6.32.11
> x86_64...

This is a test of storage system performance, and you left out the
storage array specs?  By doing so it seems you're stating the
underlying storage is not relevant to the results.

So, are you saying I should be able to duplicate your results with
that dual Opty system, but using an md RAID0 stripe over 8x2TB SATA
disks connected to two $60 4-port SiI 3124 PCIe x1 cards?

-- 
Stan
* Re: LWN.net article: creating 1 billion files -> Tests we did
From: Michael Monnerie @ 2010-09-17 7:54 UTC
To: xfs; +Cc: Stan Hoeppner

On Thursday, 16 September 2010, Stan Hoeppner wrote:
> So, are you saying I should be able to duplicate your results with
> that dual Opty system, but using an md RAID0 stripe over 8x2TB SATA
> disks connected to two $60 4-port SiI 3124 PCIe x1 cards?

According to Dave, with his patches you should even outperform that if
you've got faster CPUs :-)

Emmanuel, where is the "mk1BFiles" benchmark?  We're planning for new
hardware this year, so this would be a good time to run it.  Could I
have the script?

The output misses the time "ls" took, but one can calculate it from
the start of the next test: 2010-09-12 09:30 minus 2010-09-11 07:20 is
about 26 hours.  More than a day just to list all the files, ugh.

I guess it will take some years until we want to have such a
filesystem.  Either hardware must become quicker, or another wonderful
new patch is needed.

-- 
// Michael Monnerie, Ing. BSc
----------------------------------
Sorcerers have their magic wands: powerful, potentially dangerous
tools with a life of their own.  Witches have their familiars:
creatures disguised as household beasts that could, if they chose,
wreak the witches' havoc.  Mystics have their golems: beings built of
wood and tin, brought to life to do their masters' bidding.  I have
Linux.
----------------------------------
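A quick check of the `ls -R` duration from the timestamps in Emmanuel's log, measured from the start of `ls` to the start of the next test (an upper bound, on the assumption the steps ran back to back):

```python
# Checking the `ls -R` duration from the log's timestamps.  Note the
# subtraction crosses a day boundary, so the elapsed time is about 26
# hours, not the 2:10 a naive clock-time subtraction suggests.
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
ls_start   = datetime.strptime("2010-09-11 07:20:28", FMT)
next_start = datetime.strptime("2010-09-12 09:30:52", FMT)

elapsed = next_start - ls_start
print(elapsed)   # prints "1 day, 2:10:24"
```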
* Re: LWN.net article: creating 1 billion files -> Tests we did
From: Peter Grandi @ 2010-09-17 19:29 UTC
To: Linux XFS

[ ... useless run of something misrepresented as a test ... ]

>> Test run on a dual Opteron quad core, 16 GB RAM, kernel 2.6.32.11
>> x86_64...

> This is a test of storage system performance, and you left out the
> storage array specs?  By doing so it seems you're stating the
> underlying storage is not relevant to the results. [ ... ]

This is only one of the several aspects of the waste of time that was
misrepresented as a storage test, even if it is one of the funniest.
* Re: LWN.net article: creating 1 billion files -> Tests we did
From: Emmanuel Florac @ 2010-09-18 11:25 UTC
To: Peter Grandi; +Cc: Linux XFS

On Fri, 17 Sep 2010 20:29:01 +0100, you wrote:

> [ ... useless run of something misrepresented as a test ... ]

I won't comment on the usefulness of your rant, though, because I'd
rather stay amiable.  If you don't mind, I'll however post
complementary results of the same test running on different
filesystems on the very same hardware, because apparently you missed
the fact that this was about tracking comparative XFS progress in some
metadata-intensive workloads.
* Re: LWN.net article: creating 1 billion files -> Tests we did
From: Emmanuel Florac @ 2010-09-18 11:16 UTC
To: Stan Hoeppner; +Cc: xfs

On Thu, 16 Sep 2010 16:53:07 -0500, you wrote:

> This is a test of storage system performance, and you left out the
> storage array specs?  By doing so it seems you're stating the
> underlying storage is not relevant to the results.

Sorry :)  The storage is a 24x 2TB-disk RAID-6 array on a 3ware 9650.
Not exactly stellar at IOPS performance.

> So, are you saying I should be able to duplicate your results with
> that dual Opty system, but using an md RAID0 stripe over 8x2TB SATA
> disks connected to two $60 4-port SiI 3124 PCIe x1 cards?

Your setup may be slightly slower.
* Re: LWN.net article: creating 1 billion files -> Tests we did
  2010-09-16 10:13 ` LWN.net article: creating 1 billion files -> Tests we did Emmanuel Florac
  2010-09-16 21:53 ` Stan Hoeppner
@ 2010-09-17 19:57 ` Peter Grandi
  2010-09-18 11:39 ` Emmanuel Florac
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Grandi @ 2010-09-17 19:57 UTC (permalink / raw)
To: Linux XFS

>> The subject is a bit harsh, but overall the article says: XFS
>> is slowest on creating and deleting a billion files XFS fsck
>> needs 30GB RAM to fsck that 100TB filesystem.

Hahahaha. Very funny. So what?

>> http://lwn.net/SubscriberLink/400629/3fb4bc34d6223b32/

LWN is usually fairly decent, but I have noticed it does occasionally
waste pixels/bits on things that the author(s) misrepresent as storage
or file system tests. However, in this case the main takeaway of the
presentation reported on is that it is simply a bad idea to assume that
file systems can scale to large collections of small files as well as
DBMSes designed for that purpose can.

So what?

> So we've made a test with 1KB files (space, space...) and a
> production kernel: 2.6.32.11 (yeah I know, 2.6.38 should be
> faster but you know, we upgrade our production kernels prudently :).

Why is this a test of anything other than how to waste time?

> mk1BFiles will create and delete 1000000000 files with 32 threads
> Version: v0.2.4-10-gf6decd3, build: Sep 7 2010 13:39:34
>
> Creating 1000000000 files, started at 2010-09-07 13:45:16...
> Done, time spent: 89:35:12.262

Was there any intervening cache flush?

> Doing `ls -R`, started at 2010-09-11 07:20:28...
> Stat: ls (pid: 18844) status: ok, returned value: 0
> Cpu usage: user: 1:27:47.242, system: 20:18:21.689
> Max rss: 229.01 MBytes, page fault: major: 4, minor: 58694

Was there any intervening cache flush?

> Compute size used by 1000000000 files, started at 2010-09-12 09:30:52...
> Size used by files: 11.1759 TBytes
> Size used by directory: 32.897 GBytes
> Size used (total): 11.2080 TBytes
> Done, time spent: 25:50:32.355

Was there any intervening cache flush?

> Deleting 1000000000 files, started at 2010-09-13 11:21:24...
> Done, time spent: 68:37:38.117

Was there any intervening cache flush?

Why would anybody with even a little knowledge of computers and
systems want to use a filesystem as a database for small records?

> Test run on a dual Opteron quad core, 16 GB RAM, kernel 2.6.32.11
> x86_64...

So what?

Some of the most amusing quotes from the LWN article are from the
comments:

  "Recently I did similar tests for determining how well PostgreSQL
  would be able to deal with databases with potentially hundreds of
  thousands of tables. From what I found out, it's only limited by the
  file system's ability to work with that many files in a single
  directory."

HAHAHAHAHAHAHA.

  "> But in what situations will it make more sense to not group a
  > billion of file items into logical groups?

  Things like squid cache directories, git object directories, ccache
  cache directories, that hidden thumbnails directory in your $HOME...
  They all have in common that the files are named by a hash or
  something similar. There is no logical grouping at all here; it is a
  completely flat namespace."

AAAAAAGGGGGHHHHHHHHHHH.

But the original presentation has absolutely the funniest bit:

  "Why Not Use a Database?
   ● Users and system administrators are familiar with file systems
     Backup, creation, etc. are all well understood
   ● File systems handle partial failures pretty well
     Being able to recover part of the stored data is useful for some
     applications
   ● File systems are "cheap" since they come with your operating
     system!"

My evil translation of that is "because so many sysadmins and
programmers are incompetent and stupid and wish for ponies".

Of course the best bit is where someone :-) was quoted making sense:

  "Millions of files may work; but 1 billion is an utter absurdity.
  A filesystem that can reasonably store 1 billion small files in 7TB
  is an unsolved research issue...,"

The stupidest bit of the presentation was part of the quoted reply:

  "Strangely enough, I have been testing ext4 and stopped filling it
  at a bit over 1 billion 20KB files on Monday (with 60TB of storage).
  Running fsck on it took only 2.4 hours."

Where the idea that the 'fsck' time that matters is that of a freshly
created (and was the page cache flushed?), uncorrupted filesystem is
intensely comical. "Possible" does not mean "reasonable". Just
delirious.
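[Editor's note: a scaled-down sketch of the kind of phased metadata benchmark quoted above (create / walk / delete with multiple threads). All names here are illustrative stand-ins — mk1BFiles itself is a separate tool whose source is not shown in this thread. As the repeated "cache flush?" objection points out, between phases a real benchmark should flush the page cache (as root: `sync; echo 3 > /proc/sys/vm/drop_caches`), otherwise the later phases largely measure cached dentries and inodes rather than the filesystem.]

```python
import os
import shutil
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

def create_files(root, n_files, n_threads=4, size=1024):
    """Create n_files small files of `size` bytes each, split across threads."""
    payload = b"\0" * size
    def worker(tid):
        # One subdirectory per thread limits contention on a single directory.
        d = os.path.join(root, "t%d" % tid)
        os.makedirs(d, exist_ok=True)
        for i in range(tid, n_files, n_threads):
            with open(os.path.join(d, "f%08d" % i), "wb") as f:
                f.write(payload)
    with ThreadPoolExecutor(n_threads) as ex:
        list(ex.map(worker, range(n_threads)))  # consuming the map surfaces worker exceptions

def timed(label, fn):
    """Run fn, print elapsed wall-clock time, return its result."""
    t0 = time.monotonic()
    result = fn()
    print("%s: %.2fs" % (label, time.monotonic() - t0))
    return result

root = tempfile.mkdtemp(prefix="mkfiles-")
timed("create", lambda: create_files(root, 10_000))
# A real run would flush the page cache here before each later phase.
timed("walk",   lambda: sum(len(fs) for _, _, fs in os.walk(root)))
timed("delete", lambda: shutil.rmtree(root))
```

At 10,000 files this mostly exercises the page cache; the thread's point stands that only cold-cache runs at much larger scale say anything about the filesystem itself.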
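[Editor's note: the "completely flat namespace" claim in the quoted comment is not quite right in practice — squid and git both fan their hash-named files out into subdirectories precisely to keep any single directory's entry count manageable. A minimal git-style fanout sketch (the helper name is hypothetical):]

```python
import hashlib
import os

def fanout_path(root, key, levels=1, width=2):
    """Map a key to a git-style fanned-out path such as root/aa/f4c6...
    Each level spreads entries across 16**width subdirectories, so no
    single directory has to hold the whole collection."""
    h = hashlib.sha1(key).hexdigest()
    parts = [h[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, h[levels * width:])
```

For example, `fanout_path("objects", b"hello")` yields `objects/aa/f4c61d...` (on POSIX), mirroring how git lays out its loose object store.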
* Re: LWN.net article: creating 1 billion files -> Tests we did
  2010-09-17 19:57 ` Peter Grandi
@ 2010-09-18 11:39 ` Emmanuel Florac
  0 siblings, 0 replies; 22+ messages in thread
From: Emmanuel Florac @ 2010-09-18 11:39 UTC (permalink / raw)
To: Peter Grandi; +Cc: Linux XFS

On Fri, 17 Sep 2010 20:57:48 +0100, you wrote:

> LWN is usually fairly decent, but I have noticed it does
> occasionally waste pixels/bits on things that the author(s)
> misrepresent as storage or file system tests.

How unfortunate we missed your precious stance on that matter.
Everybody knows that benchmarks are mostly useless /per se/; however,
comparative benchmarks can easily reveal some interesting differences.
As for the interest of pushing something to the limit for its own
sake: such an experiment may equally reveal interesting bugs. The fact
that none of the filesystems in this test simply *failed* under the
load is in itself revealing of the overall robustness and stability of
the filesystems, the VFS, and the Linux kernel.

As a side note, were you to use a slightly less harsh tone, people
would probably be less reluctant to discuss those points more deeply.

-- 
------------------------------------------------------------------------
Emmanuel Florac | Direction technique | Intellique |
<eflorac@intellique.com> | +33 1 78 94 84 02
------------------------------------------------------------------------
end of thread, other threads:[~2010-09-18 11:38 UTC | newest]

Thread overview: 22+ messages:
2010-08-19 11:12 LWN.net article: creating 1 billion files -> XFS looses Michael Monnerie
2010-08-19 12:05 ` Christoph Hellwig
2010-08-19 12:45 ` Michael Monnerie
2010-08-19 13:55 ` Stan Hoeppner
2010-08-20  7:55 ` Dave Chinner
2010-08-19 13:10 ` Emmanuel Florac
2010-09-06 13:42 ` Emmanuel Florac
2010-09-06 22:04 ` Dave Chinner
2010-09-06 22:58 ` Michael Monnerie
2010-09-07  3:31 ` Dave Chinner
2010-09-07  6:20 ` Michael Monnerie
2010-09-07  7:01 ` Dave Chinner
2010-09-08  5:42 ` Michael Monnerie
2010-09-07  6:46 ` Emmanuel Florac
2010-09-16 10:13 ` LWN.net article: creating 1 billion files -> Tests we did Emmanuel Florac
2010-09-16 21:53 ` Stan Hoeppner
2010-09-17  7:54 ` Michael Monnerie
2010-09-17 19:29 ` Peter Grandi
2010-09-18 11:25 ` Emmanuel Florac
2010-09-18 11:16 ` Emmanuel Florac
2010-09-17 19:57 ` Peter Grandi
2010-09-18 11:39 ` Emmanuel Florac