* [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1
@ 2002-10-01 19:59 Paul P Komkoff Jr
  2002-10-01 20:43 ` Hans Reiser
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Paul P Komkoff Jr @ 2002-10-01 19:59 UTC (permalink / raw)
  To: Linux Kernel Mailing List

This is the stupidest testcase I've done but it's worth seeing (maybe)

We create 300000 files named from 00000000 to 000493E0 in one
directory, then delete them in order.

Tests taken on ext3+htree and reiserfs. ext3 w/o htree wasn't
evaluated because it would take a long, long time ...

both filesystems were mounted with noatime,nodiratime and ext3 was
data=writeback to be somewhat fair ...

             real        user       sys
reiserfs:
Creating:    3m13.208s   0m4.412s   2m54.404s
Deleting:    4m41.250s   0m4.206s   4m17.926s

Ext3:
Creating:    4m9.331s    0m3.927s   2m21.757s
Deleting:    9m14.838s   0m3.446s   1m39.508s

htree improved this a lot but it's still beaten by reiserfs. It seems odd
to me that deleting takes twice as long as creating ...

-- 
Paul P 'Stingray' Komkoff 'Greatest' Jr /// (icq)23200764 /// (http)stingr.net
  When you're invisible, the only one really watching you is you (my keychain)

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1
  2002-10-01 19:59 [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1 Paul P Komkoff Jr
@ 2002-10-01 20:43 ` Hans Reiser
  2002-10-01 20:49   ` Hans Reiser
  2002-10-01 20:43 ` Andreas Dilger
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 20+ messages in thread
From: Hans Reiser @ 2002-10-01 20:43 UTC (permalink / raw)
  To: Paul P Komkoff Jr; +Cc: Linux Kernel Mailing List, god

Paul P Komkoff Jr wrote:

>This is the stupidest testcase I've done but it's worth seeing (maybe)
>
>We create 300000 files named from 00000000 to 000493E0 in one
>directory, then delete them in order.
>
>Tests taken on ext3+htree and reiserfs. ext3 w/o htree wasn't
>evaluated because it would take a long, long time ...
>
>both filesystems were mounted with noatime,nodiratime and ext3 was
>data=writeback to be somewhat fair ...
>
>             real        user       sys
>reiserfs:
>Creating:    3m13.208s   0m4.412s   2m54.404s
>Deleting:    4m41.250s   0m4.206s   4m17.926s
>
>Ext3:
>Creating:    4m9.331s    0m3.927s   2m21.757s
>Deleting:    9m14.838s   0m3.446s   1m39.508s
>
>htree improved this a lot but it's still beaten by reiserfs. It seems odd
>to me that deleting takes twice as long as creating ...
>

Can you send us the code so we can try it on reiser4?  We are going to
release reiser4 sometime this month (don't ask me when), and we'd be
happy to see you run it when you do.

Hans

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1 2002-10-01 20:43 ` Hans Reiser @ 2002-10-01 20:49 ` Hans Reiser 2002-10-01 21:17 ` Rik van Riel 2002-10-01 21:31 ` Daniel Phillips 0 siblings, 2 replies; 20+ messages in thread From: Hans Reiser @ 2002-10-01 20:49 UTC (permalink / raw) To: Hans Reiser; +Cc: Paul P Komkoff Jr, Linux Kernel Mailing List, god Hans Reiser wrote: > Paul P Komkoff Jr wrote: > >> This is the stupidiest testcase I've done but it worth seeing (maybe) >> >> We create 300000 files named from 00000000 to 000493E0 in one >> directory, then delete it in order. >> >> Tests taken on ext3+htree and reiserfs. ext3 w/o htree hadn't >> evaluated because it will take long long time ... >> >> both filesystems was mounted with noatime,nodiratime and ext3 was >> data=writeback to be somewhat fair ... >> >> real user sys >> reiserfs: >> Creating: 3m13.208s 0m4.412s 2m54.404s >> Deleting: 4m41.250s 0m4.206s 4m17.926s >> >> Ext3: >> Creating: 4m9.331s 0m3.927s 2m21.757s >> Deleting: 9m14.838s 0m3.446s 1m39.508s >> >> htree improved this a much but it still beaten by reiserfs. seems odd >> to me - deleting taking twice time then creating ... >> >> >> > Can you send us the code so we can try it on reiser4? We are going to > release reiser4 sometime this month (don't ask me when), and we'd be > happy to see you run it when you do. ^you^we Sorry to list for bandwidth waste. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1 2002-10-01 20:49 ` Hans Reiser @ 2002-10-01 21:17 ` Rik van Riel 2002-10-01 21:31 ` Daniel Phillips 1 sibling, 0 replies; 20+ messages in thread From: Rik van Riel @ 2002-10-01 21:17 UTC (permalink / raw) To: Hans Reiser; +Cc: Paul P Komkoff Jr, Linux Kernel Mailing List, god On Wed, 2 Oct 2002, Hans Reiser wrote: > Hans Reiser wrote: [snip 50 lines] > ^you^we > > Sorry to list for bandwidth waste. So learn quoting ;) Rik -- A: No. Q: Should I include quotations after my reply? http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1 2002-10-01 20:49 ` Hans Reiser 2002-10-01 21:17 ` Rik van Riel @ 2002-10-01 21:31 ` Daniel Phillips 1 sibling, 0 replies; 20+ messages in thread From: Daniel Phillips @ 2002-10-01 21:31 UTC (permalink / raw) To: Hans Reiser; +Cc: Paul P Komkoff Jr, Linux Kernel Mailing List, god Hi Hans, On Tuesday 01 October 2002 22:49, Hans Reiser wrote: > > Can you send us the code so we can try it on reiser4? We are going to > > release reiser4 sometime this month (don't ask me when), and we'd be > > happy to see you run it when you do. > > ^you^we > > Sorry to list for bandwidth waste. Can be much reduced by selective quoting... -- Daniel ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1
  2002-10-01 19:59 [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1 Paul P Komkoff Jr
  2002-10-01 20:43 ` Hans Reiser
@ 2002-10-01 20:43 ` Andreas Dilger
  2002-10-01 21:19   ` Hans Reiser
                     ` (2 more replies)
  2002-10-01 21:27 ` Daniel Phillips
  2002-10-02  6:39 ` Nikita Danilov
  3 siblings, 3 replies; 20+ messages in thread
From: Andreas Dilger @ 2002-10-01 20:43 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: ext2-devel

On Oct 01, 2002  23:59 +0400, Paul P Komkoff Jr wrote:
> This is the stupidest testcase I've done but it's worth seeing (maybe)
>
> We create 300000 files named from 00000000 to 000493E0 in one
> directory, then delete them in order.
>
> Tests taken on ext3+htree and reiserfs. ext3 w/o htree wasn't
> evaluated because it would take a long, long time ...
>
> both filesystems were mounted with noatime,nodiratime and ext3 was
> data=writeback to be somewhat fair ...

Why do you think data=writeback is better than data=journal?  If the
files have no data then it should not make a difference.

>              real        user       sys
> reiserfs:
> Creating:    3m13.208s   0m4.412s   2m54.404s
> Deleting:    4m41.250s   0m4.206s   4m17.926s
>
> Ext3:
> Creating:    4m9.331s    0m3.927s   2m21.757s
> Deleting:    9m14.838s   0m3.446s   1m39.508s
>
> htree improved this a lot but it's still beaten by reiserfs. It seems odd
> to me that deleting takes twice as long as creating ...

This is a known issue with the current htree code (not the algorithm or
the on-disk format, luckily).  The problem is that inodes are being
allocated essentially sequentially on disk.  If you are deleting in
creation order (as you are) then you are randomly dirtying directory
leaf blocks, and if you are deleting in readdir() order, then you are
randomly dirtying inode blocks.

As a result, if the size of the directory + inode table blocks is larger
than memory, and also larger than 1/4 of the journal, you are essentially
seek-bound because of random block dirtying.

This can be fixed by changing the inode allocation routines to allocate
inodes in "chunks" which correspond to the leaf page for which the
dirent is being allocated.  This will try to keep the inodes for a given
directory block relatively close together on disk and greatly improve
delete performance.

You should see what the size of the directory is at its peak (probably
16 bytes * 300k ~= 5MB), add in the size of the directory blocks
(128 bytes * 300k ~= 38MB), make the journal 4x as large as that,
so 192MB (mke2fs -j -J size=192), and re-run the test (I assume you have
at least 256MB+ of RAM on the test system).

What is very interesting from the above results is that the CPU usage
is _much_ smaller for ext3+htree than for reiserfs.  It looks like
reiserfs is nearly CPU-bound by the tests, so it is unlikely that they
can run much faster.  In theory, ext3+htree could run at the CPU time if
we fixed the allocation and/or seeking issues.

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1 2002-10-01 20:43 ` Andreas Dilger @ 2002-10-01 21:19 ` Hans Reiser 2002-10-02 10:48 ` Paul P Komkoff Jr 2002-10-04 15:53 ` Oleg Drokin 2 siblings, 0 replies; 20+ messages in thread From: Hans Reiser @ 2002-10-01 21:19 UTC (permalink / raw) To: Andreas Dilger; +Cc: Linux Kernel Mailing List, ext2-devel, god Andreas Dilger wrote: > > > It looks like >reiserfs is nearly CPU-bound by the tests, so it is unlikely that they >can run much faster. > Um, usually being CPU bound is easier to fix. We have probably not CPU profiled this code path, and after Halloween we probably should (but for reiser4, since reiser3 is soon to be obsoleted). It is being IO bound that is usually hard to fix, though since I haven't read the htree code I trust you that it is different in this case.... >In theory, ext3+htree run at the CPU time if we >fixed the allocation and/or seeking issues. > >Cheers, Andreas >-- >Andreas Dilger >http://www-mddsp.enel.ucalgary.ca/People/adilger/ >http://sourceforge.net/projects/ext2resize/ > >- >To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >the body of a message to majordomo@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html >Please read the FAQ at http://www.tux.org/lkml/ > > > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1
  2002-10-01 20:43 ` Andreas Dilger
  2002-10-01 21:19   ` Hans Reiser
@ 2002-10-02 10:48   ` Paul P Komkoff Jr
  2002-10-02 16:54     ` Andreas Dilger
  2002-10-04 15:53   ` Oleg Drokin
  2 siblings, 1 reply; 20+ messages in thread
From: Paul P Komkoff Jr @ 2002-10-02 10:48 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: ext2-devel

Replying to Andreas Dilger:
> Why do you think data=writeback is better than data=journal?  If the
> files have no data then it should not make a difference.

It is better than default data=ordered I think :)

Thanks for the detailed explanation - it saved me a lot of time, and
according to your directions I have re-run my test. Now ext3 is
better :)

             real        user       sys
e3
create       2m49.545s   0m4.162s   2m20.766s
delete       2m8.155s    0m3.614s   1m34.945s

reiser
create       3m13.577s   0m4.338s   2m54.026s
delete       4m39.249s   0m3.968s   4m16.297s

e3
create       2m50.766s   0m4.024s   2m21.197s
delete       2m8.755s    0m3.501s   1m35.737s

reiser
create       3m13.015s   0m4.432s   2m53.412s
delete       4m41.011s   0m3.893s   4m16.845s

These are two typical runs. Now I am creating ext3 with

mke2fs -j -O dir_index -J size=192 -T news /dev/sda4

as you can see, this improves performance by 1/4

Unfortunately, there is still one issue in ext3. It is called "inode limit".
Initially I wanted to run this test on 1000000 files but ... I hit the
inode limit and don't want to increase it artificially yet.

Reiserfs worked fine because it doesn't have that kind of limit ...

-- 
Paul P 'Stingray' Komkoff 'Greatest' Jr /// (icq)23200764 /// (http)stingr.net
  When you're invisible, the only one really watching you is you (my keychain)

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1 2002-10-02 10:48 ` Paul P Komkoff Jr @ 2002-10-02 16:54 ` Andreas Dilger 2002-10-03 0:37 ` [Ext2-devel] " Theodore Ts'o 0 siblings, 1 reply; 20+ messages in thread From: Andreas Dilger @ 2002-10-02 16:54 UTC (permalink / raw) To: Linux Kernel Mailing List, ext2-devel On Oct 02, 2002 14:48 +0400, Paul P Komkoff Jr wrote: > Unfortunately, there still one issue in ext3. It called "inode limit". > Initially I wanted to run this test on 1000000 files but ... I hit > inode limit and don't want to increase it artificially yet. > > Reiserfs worked fine because it don't have such kind of limit ... We have plans to fix this already, but it is not high enough on anyones priority list quite yet (most filesystems have enough inodes for regular usage). Cheers, Andreas -- Andreas Dilger http://www-mddsp.enel.ucalgary.ca/People/adilger/ http://sourceforge.net/projects/ext2resize/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Ext2-devel] Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1 2002-10-02 16:54 ` Andreas Dilger @ 2002-10-03 0:37 ` Theodore Ts'o 2002-10-03 12:04 ` Hans Reiser 0 siblings, 1 reply; 20+ messages in thread From: Theodore Ts'o @ 2002-10-03 0:37 UTC (permalink / raw) To: Linux Kernel Mailing List, ext2-devel On Wed, Oct 02, 2002 at 10:54:54AM -0600, Andreas Dilger wrote: > On Oct 02, 2002 14:48 +0400, Paul P Komkoff Jr wrote: > > Unfortunately, there still one issue in ext3. It called "inode limit". > > Initially I wanted to run this test on 1000000 files but ... I hit > > inode limit and don't want to increase it artificially yet. > > > > Reiserfs worked fine because it don't have such kind of limit ... > > We have plans to fix this already, but it is not high enough on anyones > priority list quite yet (most filesystems have enough inodes for regular > usage). Just to be clear, the limit which Paul is referring to is just simply a matter of creating the filesystem with a sufficient number of inodes. (i.e., mke2fs -N 1200000). Yes, having a dynamic inode table would be good, but in practice sysadmins know how many inodes are needed in advance. - Ted ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Ext2-devel] Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1 2002-10-03 0:37 ` [Ext2-devel] " Theodore Ts'o @ 2002-10-03 12:04 ` Hans Reiser 2002-10-03 19:40 ` Theodore Ts'o 0 siblings, 1 reply; 20+ messages in thread From: Hans Reiser @ 2002-10-03 12:04 UTC (permalink / raw) To: Theodore Ts'o; +Cc: Linux Kernel Mailing List, ext2-devel Theodore Ts'o wrote: > > >Just to be clear, the limit which Paul is referring to is just simply >a matter of creating the filesystem with a sufficient number of >inodes. (i.e., mke2fs -N 1200000). Yes, having a dynamic inode table >would be good, but in practice sysadmins know how many inodes are >needed in advance. > > - Ted > > > No they don't. Average space wastage is more than 50% because sysadmins have to be conservative. Hans ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Ext2-devel] Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1 2002-10-03 12:04 ` Hans Reiser @ 2002-10-03 19:40 ` Theodore Ts'o 2002-10-03 19:44 ` Hans Reiser 0 siblings, 1 reply; 20+ messages in thread From: Theodore Ts'o @ 2002-10-03 19:40 UTC (permalink / raw) To: Hans Reiser, G; +Cc: Linux Kernel Mailing List, ext2-devel On Thu, Oct 03, 2002 at 04:04:12PM +0400, Hans Reiser wrote: > > No they don't. Average space wastage is more than 50% because sysadmins > have to be conservative. Sure, but even a hundred megabytes or two out of a 100 gigabyte drive is cheap. (Specifically, about fifty cents' worth.) - Ted ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Ext2-devel] Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1 2002-10-03 19:40 ` Theodore Ts'o @ 2002-10-03 19:44 ` Hans Reiser 0 siblings, 0 replies; 20+ messages in thread From: Hans Reiser @ 2002-10-03 19:44 UTC (permalink / raw) To: Theodore Ts'o; +Cc: G, Linux Kernel Mailing List, ext2-devel Theodore Ts'o wrote: >On Thu, Oct 03, 2002 at 04:04:12PM +0400, Hans Reiser wrote: > > >>No they don't. Average space wastage is more than 50% because sysadmins >>have to be conservative. >> >> > >Sure, but even a hundred megabytes or two out of a 100 gigabyte drive >is cheap. (Specifically, about fifty cents' worth.) > > - Ted > > > > Usual space wastage is on the order of 5% of total partition size, yes? Allocating 0.1% of your drive for inodes will get you into trouble if a user does something like use mh or read news, etc. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1 2002-10-01 20:43 ` Andreas Dilger 2002-10-01 21:19 ` Hans Reiser 2002-10-02 10:48 ` Paul P Komkoff Jr @ 2002-10-04 15:53 ` Oleg Drokin 2002-10-04 17:09 ` [Ext2-devel] " Andreas Dilger 2 siblings, 1 reply; 20+ messages in thread From: Oleg Drokin @ 2002-10-04 15:53 UTC (permalink / raw) To: Linux Kernel Mailing List, ext2-devel Hello! On Tue, Oct 01, 2002 at 02:43:30PM -0600, Andreas Dilger wrote: > As a result, if the size of the directory + inode table blocks is larger > than memory, and also larger than 1/4 of the journal, you are essentially > seek-bound because of random block dirtying. > You should see what the size of the directory is at its peak (probably > 16 bytes * 300k ~= 5MB, and add in the size of the directory blocks > (128 bytes * 300k ~= 38MB) and make the journal 4x as large as that, > so 192MB (mke2fs -j -J size=192) and re-run the test (I assume you have > at least 256MB+ of RAM on the test system). Hm. But all of that won't help if you need to read inodes from disk first, right? (until that inode allocation in chunks implemented, of course). BTW, in case of inode allocation in chunks attached to directory blocks, you won't get any benefit in case if application creates file in some tempoarry dir and then rename()s it to its proper place, or am I missing something? > What is very interesting from the above results is that the CPU usage > is _much_ smaller for ext3+htree than for reiserfs. It looks like This is only in case of deletion, probably somehow related to constant item shifting when some of the items are deleted. Bye, Oleg ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Ext2-devel] Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1
  2002-10-04 15:53   ` Oleg Drokin
@ 2002-10-04 17:09     ` [Ext2-devel] " Andreas Dilger
  2002-10-07  6:54       ` Oleg Drokin
  2002-10-10  0:27       ` Daniel Phillips
  0 siblings, 2 replies; 20+ messages in thread
From: Andreas Dilger @ 2002-10-04 17:09 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: Linux Kernel Mailing List, ext2-devel

On Oct 04, 2002  19:53 +0400, Oleg Drokin wrote:
> On Tue, Oct 01, 2002 at 02:43:30PM -0600, Andreas Dilger wrote:
> > As a result, if the size of the directory + inode table blocks is larger
> > than memory, and also larger than 1/4 of the journal, you are essentially
> > seek-bound because of random block dirtying.
> > You should see what the size of the directory is at its peak (probably
> > 16 bytes * 300k ~= 5MB), add in the size of the directory blocks
> > (128 bytes * 300k ~= 38MB), make the journal 4x as large as that,
> > so 192MB (mke2fs -j -J size=192), and re-run the test (I assume you have
> > at least 256MB+ of RAM on the test system).
>
> Hm. But all of that won't help if you need to read inodes from disk first,
> right? (until that inode allocation in chunks is implemented, of course).

Ah, but see the follow-up reply - increasing the size of the journal as
advised improved the htree performance to 15% and 55% faster than
reiserfs for creates and deletes, respectively:

On Wed, 2 Oct 2002 14:48:59 +0400 Paul P Komkoff Jr replied:
> Thanks for the detailed explanation - it saved me a lot of time, and
> according to your directions I have re-run my test. Now ext3 is
> better :)
>
>              real        user       cpu
> e3
> create       2m49.545s   0m4.162s   2m20.766s
> delete       2m8.155s    0m3.614s   1m34.945s
>
> reiser
> create       3m13.577s   0m4.338s   2m54.026s
> delete       4m39.249s   0m3.968s   4m16.297s
>
> e3
> create       2m50.766s   0m4.024s   2m21.197s
> delete       2m8.755s    0m3.501s   1m35.737s
>
> reiser
> create       3m13.015s   0m4.432s   2m53.412s
> delete       4m41.011s   0m3.893s   4m16.845s

On Oct 04, 2002  19:53 +0400, Oleg Drokin wrote some more:
> BTW, in case of inode allocation in chunks attached to directory blocks,
> you won't get any benefit if an application creates a file in some
> temporary dir and then rename()s it to its proper place, or am I missing
> something?

No, you are correct.  Renaming the files will randomly re-hash the names
and break any coherency between the directory leaf blocks and the inode
blocks.  However, such files are often short-lived anyways (mail spools
and such), and for the normal case (e.g. untar of a file) the names are
constant, so there should be a benefit for smaller journals from this.

> > What is very interesting from the above results is that the CPU usage
> > is _much_ smaller for ext3+htree than for reiserfs.  It looks like
>
> This is only in case of deletion, probably somehow related to constant item
> shifting when some of the items are deleted.

Well, even for creates it is 19% less CPU.  The re-tested wall-clock
time for htree creates is now less than the CPU usage of reiserfs, so
it is impossible for reiserfs to achieve this number without
optimization of the code somehow.  For deletes the cpu usage of htree
is 40% less, but we are currently not doing leaf block compaction, so
there would probably be a slight performance hit to merge blocks
(although we have some plans to do that efficiently also).

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [Ext2-devel] Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1 2002-10-04 17:09 ` [Ext2-devel] " Andreas Dilger @ 2002-10-07 6:54 ` Oleg Drokin 2002-10-10 0:27 ` Daniel Phillips 1 sibling, 0 replies; 20+ messages in thread From: Oleg Drokin @ 2002-10-07 6:54 UTC (permalink / raw) To: Linux Kernel Mailing List, ext2-devel Hello! On Fri, Oct 04, 2002 at 11:09:35AM -0600, Andreas Dilger wrote: > > > As a result, if the size of the directory + inode table blocks is larger > > > than memory, and also larger than 1/4 of the journal, you are essentially > > > seek-bound because of random block dirtying. > > > You should see what the size of the directory is at its peak (probably > > > 16 bytes * 300k ~= 5MB, and add in the size of the directory blocks > > > (128 bytes * 300k ~= 38MB) and make the journal 4x as large as that, > > > so 192MB (mke2fs -j -J size=192) and re-run the test (I assume you have > > > at least 256MB+ of RAM on the test system). > > Hm. But all of that won't help if you need to read inodes from disk first, > > right? (until that inode allocation in chunks implemented, of course). > Ah, but see the follow-up reply - increasing the size of the journal as > advised improved the htree performance to 15% and 55% faster than > reiserfs for creates and deletes, respectively: Yes, but that was the case with warm caches, as I understand it. Usually you cannot count that all inodes of large file set are already present and should not be read. > > > What is very interesting from the above results is that the CPU usage > > > is _much_ smaller for ext3+htree than for reiserfs. It looks like > > This is only in case of deletion, probably somehow related to constant item > > shifting when some of the items are deleted. > Well, even for creates it is 19% less CPU. The re-tested wall-clock I afraid other parts of code might have contributed there. Like setting s_dirt way more often than needed. 
Bye, Oleg ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Ext2-devel] Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1 2002-10-04 17:09 ` [Ext2-devel] " Andreas Dilger 2002-10-07 6:54 ` Oleg Drokin @ 2002-10-10 0:27 ` Daniel Phillips 1 sibling, 0 replies; 20+ messages in thread From: Daniel Phillips @ 2002-10-10 0:27 UTC (permalink / raw) To: Andreas Dilger, Oleg Drokin; +Cc: Linux Kernel Mailing List, ext2-devel On Friday 04 October 2002 19:09, Andreas Dilger wrote: > On Oct 04, 2002 19:53 +0400, Oleg Drokin wrote: > > On Tue, Oct 01, 2002 at 02:43:30PM -0600, Andreas Dilger wrote: > > > What is very interesting from the above results is that the CPU usage > > > is _much_ smaller for ext3+htree than for reiserfs. It looks like > > > > This is only in case of deletion, probably somehow related to constant item > > shifting when some of the items are deleted. > > Well, even for creates it is 19% less CPU. The re-tested wall-clock > time for htree creates is now less than the CPU usage of reiserfs, so > it is impossible for reiserfs to achieve this number without > optimization of the code somehow. For deletes the cpu usage of htree > is 40% less, but we are currently not doing leaf block compaction, so > there would probably be a slight performance hit to merge blocks > (although we have some plans to do that efficiently also). I convinced myself at some point that compaction will cost no more than a couple of percent for deletes and nothing for creates. -- Daniel ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1
  2002-10-01 19:59 [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1 Paul P Komkoff Jr
  2002-10-01 20:43 ` Hans Reiser
  2002-10-01 20:43 ` Andreas Dilger
@ 2002-10-01 21:27 ` Daniel Phillips
  2002-10-02 16:38   ` Paul P Komkoff Jr
  2002-10-02  6:39 ` Nikita Danilov
  3 siblings, 1 reply; 20+ messages in thread
From: Daniel Phillips @ 2002-10-01 21:27 UTC (permalink / raw)
  To: Paul P Komkoff Jr, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1525 bytes --]

On Tuesday 01 October 2002 21:59, Paul P Komkoff Jr wrote:
> This is the stupidest testcase I've done but it's worth seeing (maybe)
>
> We create 300000 files

How big are the files?

> named from 00000000 to 000493E0 in one directory, then delete them in order.

You probably want to try creating the files in random order as well.  A
program to do that is attached, use in the form:

    randfiles <basename> <count> y

where 'y' means 'print the names', for debugging purposes.

What did your delete command look like, "rm -rf" or "echo * | xargs rm"?

> Tests taken on ext3+htree and reiserfs. ext3 w/o htree wasn't
> evaluated because it would take a long, long time ...
>
> both filesystems were mounted with noatime,nodiratime and ext3 was
> data=writeback to be somewhat fair ...
>
>              real        user       sys
> reiserfs:
> Creating:    3m13.208s   0m4.412s   2m54.404s
> Deleting:    4m41.250s   0m4.206s   4m17.926s
>
> Ext3:
> Creating:    4m9.331s    0m3.927s   2m21.757s
> Deleting:    9m14.838s   0m3.446s   1m39.508s
>
> htree improved this a lot but it's still beaten by reiserfs. It seems odd
> to me that deleting takes twice as long as creating ...

Only 300,000 files, you haven't got enough to cause inode table thrashing,
though some kernels shrink the inode cache too aggressively and that can
cause thrashing at lower numbers.  Maybe a bottleneck in the journal?

Not that anybody is going to complain about any of the above - it's still
running less than 1 ms/create, 2 ms/delete.  Still, it's slower than I'm
used to.

-- 
Daniel

[-- Attachment #2: randfiles.c --]
[-- Type: text/x-c, Size: 539 bytes --]

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define swap(x, y) do { typeof(x) z = x; x = y; y = z; } while (0)

int main (int argc, char *argv[])
{
	if (argc < 3) {
		fprintf(stderr, "usage: %s <basename> <count> [y]\n", argv[0]);
		return 1;
	}
	int n = strtol(argv[2], 0, 10);
	if (n < 1)
		return 1;
	int i, size = 50, show = argc > 3 && !strncmp(argv[3], "y", 1);
	char name[size];
	int choose[n];

	/* Shuffle 0..n-1 so that the names are created in random order */
	for (i = 0; i < n; i++)
		choose[i] = i;
	for (i = n; i; i--)
		swap(choose[i-1], choose[rand() % i]);

	for (i = 0; i < n; i++) {
		snprintf(name, size, "%s%i", argv[1], choose[i]);
		if (show)
			printf("create %s\n", name);
		/* O_CREAT needs an explicit mode for the new file */
		close(open(name, O_CREAT, 0644));
	}
	return 0;
}

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1
  2002-10-01 21:27 ` Daniel Phillips
@ 2002-10-02 16:38   ` Paul P Komkoff Jr
  0 siblings, 0 replies; 20+ messages in thread
From: Paul P Komkoff Jr @ 2002-10-02 16:38 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Replying to Daniel Phillips:
> How big are the files?

0.

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char* argv[]) {
  int i, j, k = argc > 1 ? atoi(argv[1]) : 0;
  char t[128];

  /* create k empty files named 00000000, 00000001, ... */
  for (i = 0; i < k; i++) {
    snprintf(t, sizeof t, "%08X", (unsigned)i);
    if (-1 == (j = creat(t, S_IRWXU))) {
      perror("Create file");
      printf("no: %d\n", i);
      return 1;
    }
    close(j);
  }
  return 0;
}

> You probably want to try creating the files in random order as well.  A
> program to do that is attached, use in the form:
>
>     randfiles <basename> <count> y
>
> where 'y' means 'print the names', for debugging purposes.

this will be the next series of tests :)

> What did your delete command look like, "rm -rf" or "echo * | xargs rm"?

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char* argv[]) {
  int i, k = argc > 1 ? atoi(argv[1]) : 0;
  char t[128];

  /* unlink the same k names, in creation order */
  for (i = 0; i < k; i++) {
    snprintf(t, sizeof t, "%08X", (unsigned)i);
    if (-1 == unlink(t)) {
      perror("unlink");
      printf("no: %d\n", i);
      return 1;
    }
  }
  return 0;
}

> Only 300,000 files, you haven't got enough to cause inode table thrashing,
> though some kernels shrink the inode cache too aggressively and that can
> cause thrashing at lower numbers.  Maybe a bottleneck in the journal?

Yes, increasing the journal to fit the whole directory in it (as Andreas
Dilger said) improved results by 1/4.

But. Initially my test was 1000000 files. /dev/sda4 in my tests 1882844.
And I am quickly hitting the inode limit on a -T news ext3 filesystem, so I
would need to artificially increase it at mke2fs time, but I decided not to
do so (yet).

> Not that anybody is going to complain about any of the above - it's still
> running less than 1 ms/create, 2 ms/delete.  Still, it's slower than I'm
> used to.

I am just trying to write a caching proxy-like application and not reinvent
the wheel (aka design my own filesystem and store it in a big file just
because some filesystem is so slow on large directories/cannot make more
than N empty objects etc).

-- 
Paul P 'Stingray' Komkoff 'Greatest' Jr /// (icq)23200764 /// (http)stingr.net
  When you're invisible, the only one really watching you is you (my keychain)

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1
  2002-10-01 19:59 [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1 Paul P Komkoff Jr
                   ` (2 preceding siblings ...)
  2002-10-01 21:27 ` Daniel Phillips
@ 2002-10-02  6:39 ` Nikita Danilov
  3 siblings, 0 replies; 20+ messages in thread
From: Nikita Danilov @ 2002-10-02 6:39 UTC (permalink / raw)
  To: Paul P Komkoff Jr; +Cc: Linux Kernel Mailing List

Paul P Komkoff Jr writes:
 > This is the stupidest testcase I've done but it's worth seeing (maybe)
 >
 > We create 300000 files named from 00000000 to 000493E0 in one
 > directory, then delete them in order.
 >
 > Tests taken on ext3+htree and reiserfs. ext3 w/o htree wasn't
 > evaluated because it would take a long, long time ...
 >
 > both filesystems were mounted with noatime,nodiratime and ext3 was
 > data=writeback to be somewhat fair ...
 >
 >              real        user       sys
 > reiserfs:
 > Creating:    3m13.208s   0m4.412s   2m54.404s
 > Deleting:    4m41.250s   0m4.206s   4m17.926s
 >
 > Ext3:
 > Creating:    4m9.331s    0m3.927s   2m21.757s
 > Deleting:    9m14.838s   0m3.446s   1m39.508s

Why are the user times so different?

 >
 > htree improved this a lot but it's still beaten by reiserfs. It seems odd
 > to me that deleting takes twice as long as creating ...
 >
 > --
 > Paul P 'Stingray' Komkoff 'Greatest' Jr /// (icq)23200764 /// (http)stingr.net
 >   When you're invisible, the only one really watching you is you (my keychain)

Nikita.

^ permalink raw reply	[flat|nested] 20+ messages in thread