linux-kernel.vger.kernel.org archive mirror
* [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1
@ 2002-10-01 19:59 Paul P Komkoff Jr
  2002-10-01 20:43 ` Hans Reiser
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Paul P Komkoff Jr @ 2002-10-01 19:59 UTC (permalink / raw)
  To: Linux Kernel Mailing List

This is the stupidest testcase I've done, but it may be worth seeing.

We create 300000 files named from 00000000 to 000493DF in one
directory, then delete them in creation order.

Tests were run on ext3+htree and reiserfs. ext3 without htree was not
evaluated because it would take a very long time ...

Both filesystems were mounted with noatime,nodiratime, and ext3 used
data=writeback to be somewhat fair ...

            real        user        sys
reiserfs:
Creating:   3m13.208s   0m4.412s    2m54.404s
Deleting:   4m41.250s   0m4.206s    4m17.926s

Ext3:
Creating:   4m9.331s    0m3.927s    2m21.757s
Deleting:   9m14.838s   0m3.446s    1m39.508s

htree improved this a lot, but it is still beaten by reiserfs. It seems
odd to me that deleting takes twice as long as creating ...

-- 
Paul P 'Stingray' Komkoff 'Greatest' Jr /// (icq)23200764 /// (http)stingr.net
  When you're invisible, the only one really watching you is you (my keychain)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1
  2002-10-01 19:59 [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1 Paul P Komkoff Jr
@ 2002-10-01 20:43 ` Hans Reiser
  2002-10-01 20:49   ` Hans Reiser
  2002-10-01 20:43 ` Andreas Dilger
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 20+ messages in thread
From: Hans Reiser @ 2002-10-01 20:43 UTC (permalink / raw)
  To: Paul P Komkoff Jr; +Cc: Linux Kernel Mailing List, god

Paul P Komkoff Jr wrote:

>This is the stupidiest testcase I've done but it worth seeing (maybe)
>
>We create 300000 files named from 00000000 to 000493E0 in one
>directory, then delete it in order.
>
>Tests taken on ext3+htree and reiserfs. ext3 w/o htree hadn't
>evaluated because it will take long long time ...
>
>both filesystems was mounted with noatime,nodiratime and ext3 was
>data=writeback to be somewhat fair ...
>
>	       	real 	      	user  		sys
>reiserfs:
>Creating: 	3m13.208s	0m4.412s	2m54.404s
>Deleting:	4m41.250s	0m4.206s	4m17.926s
>
>Ext3:
>Creating:	4m9.331s	0m3.927s	2m21.757s
>Deleting:	9m14.838s	0m3.446s	1m39.508s
>
>htree improved this a much but it still beaten by reiserfs. seems odd
>to me - deleting taking twice time then creating ...
>
>  
>
Can you send us the code so we can try it on reiser4?  We are going to 
release reiser4 sometime this month (don't ask me when), and we'd be 
happy to see you run it when you do.

Hans


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1
  2002-10-01 19:59 [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1 Paul P Komkoff Jr
  2002-10-01 20:43 ` Hans Reiser
@ 2002-10-01 20:43 ` Andreas Dilger
  2002-10-01 21:19   ` Hans Reiser
                     ` (2 more replies)
  2002-10-01 21:27 ` Daniel Phillips
  2002-10-02  6:39 ` Nikita Danilov
  3 siblings, 3 replies; 20+ messages in thread
From: Andreas Dilger @ 2002-10-01 20:43 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: ext2-devel

On Oct 01, 2002  23:59 +0400, Paul P Komkoff Jr wrote:
> This is the stupidiest testcase I've done but it worth seeing (maybe)
> 
> We create 300000 files named from 00000000 to 000493E0 in one
> directory, then delete it in order.
> 
> Tests taken on ext3+htree and reiserfs. ext3 w/o htree hadn't
> evaluated because it will take long long time ...
> 
> both filesystems was mounted with noatime,nodiratime and ext3 was
> data=writeback to be somewhat fair ...

Why do you think data=writeback is better than data=journal?  If the
files have no data then it should not make a difference.

> 	       	real 	      	user  		sys
> reiserfs:
> Creating: 	3m13.208s	0m4.412s	2m54.404s
> Deleting:	4m41.250s	0m4.206s	4m17.926s
> 
> Ext3:
> Creating:	4m9.331s	0m3.927s	2m21.757s
> Deleting:	9m14.838s	0m3.446s	1m39.508s
> 
> htree improved this a much but it still beaten by reiserfs. seems odd
> to me - deleting taking twice time then creating ...

This is a known issue with the current htree code (not the algorithm
or the on-disk format, luckily).  The problem is that inodes are being
allocated essentially sequentially on disk.  If you are deleting in
creation order (as you are) then you are randomly dirtying directory
leaf blocks, and if you are deleting in readdir() order, then you are
randomly dirtying inode blocks.

As a result, if the size of the directory + inode table blocks is larger
than memory, and also larger than 1/4 of the journal, you are essentially
seek-bound because of random block dirtying.

This can be fixed by changing the inode allocation routines to allocate
inodes in "chunks" which correspond to the leaf page for which the
dirent is being allocated.  This will try to keep the inodes for a given
directory block relatively close together on disk and greatly improve
delete performance.

You should see what the size of the directory is at its peak (probably
16 bytes * 300k ~= 5MB), add in the size of the inode table blocks
(128 bytes * 300k ~= 38MB), and make the journal 4x as large as that,
so 192MB (mke2fs -j -J size=192), and re-run the test (I assume you have
at least 256MB+ of RAM on the test system).

What is very interesting from the above results is that the CPU usage
is _much_ smaller for ext3+htree than for reiserfs.  It looks like
reiserfs is nearly CPU-bound by the tests, so it is unlikely that they
can run much faster.  In theory, ext3+htree could run at the CPU time if
we fixed the allocation and/or seeking issues.

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1
  2002-10-01 20:43 ` Hans Reiser
@ 2002-10-01 20:49   ` Hans Reiser
  2002-10-01 21:17     ` Rik van Riel
  2002-10-01 21:31     ` Daniel Phillips
  0 siblings, 2 replies; 20+ messages in thread
From: Hans Reiser @ 2002-10-01 20:49 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Paul P Komkoff Jr, Linux Kernel Mailing List, god

Hans Reiser wrote:

> Paul P Komkoff Jr wrote:
>
>> This is the stupidiest testcase I've done but it worth seeing (maybe)
>>
>> We create 300000 files named from 00000000 to 000493E0 in one
>> directory, then delete it in order.
>>
>> Tests taken on ext3+htree and reiserfs. ext3 w/o htree hadn't
>> evaluated because it will take long long time ...
>>
>> both filesystems was mounted with noatime,nodiratime and ext3 was
>> data=writeback to be somewhat fair ...
>>
>>                real               user          sys
>> reiserfs:
>> Creating:     3m13.208s    0m4.412s    2m54.404s
>> Deleting:    4m41.250s    0m4.206s    4m17.926s
>>
>> Ext3:
>> Creating:    4m9.331s    0m3.927s    2m21.757s
>> Deleting:    9m14.838s    0m3.446s    1m39.508s
>>
>> htree improved this a much but it still beaten by reiserfs. seems odd
>> to me - deleting taking twice time then creating ...
>>
>>  
>>
> Can you send us the code so we can try it on reiser4?  We are going to 
> release reiser4 sometime this month (don't ask me when), and we'd be 
> happy to see you run it when you do.

^you^we

Sorry to the list for the bandwidth waste.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1
  2002-10-01 20:49   ` Hans Reiser
@ 2002-10-01 21:17     ` Rik van Riel
  2002-10-01 21:31     ` Daniel Phillips
  1 sibling, 0 replies; 20+ messages in thread
From: Rik van Riel @ 2002-10-01 21:17 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Paul P Komkoff Jr, Linux Kernel Mailing List, god

On Wed, 2 Oct 2002, Hans Reiser wrote:
> Hans Reiser wrote:

[snip 50 lines]

> ^you^we
>
> Sorry to list for bandwidth waste.

So learn quoting ;)

Rik
-- 
A: No.
Q: Should I include quotations after my reply?

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1
  2002-10-01 20:43 ` Andreas Dilger
@ 2002-10-01 21:19   ` Hans Reiser
  2002-10-02 10:48   ` Paul P Komkoff Jr
  2002-10-04 15:53   ` Oleg Drokin
  2 siblings, 0 replies; 20+ messages in thread
From: Hans Reiser @ 2002-10-01 21:19 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Linux Kernel Mailing List, ext2-devel, god

Andreas Dilger wrote:

>  
>
> It looks like
>reiserfs is nearly CPU-bound by the tests, so it is unlikely that they
>can run much faster.  
>
Um, usually being CPU bound is easier to fix.  We have probably not CPU 
profiled this code path, and after Halloween we probably should (but for 
reiser4, since reiser3 is soon to be obsoleted).  It is being IO bound 
that is usually hard to fix, though since I haven't read the htree code 
I trust you that it is different in this case....

>In theory, ext3+htree run at the CPU time if we
>fixed the allocation and/or seeking issues.
>
>Cheers, Andreas
>--
>Andreas Dilger
>http://www-mddsp.enel.ucalgary.ca/People/adilger/
>http://sourceforge.net/projects/ext2resize/
>




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1
  2002-10-01 19:59 [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1 Paul P Komkoff Jr
  2002-10-01 20:43 ` Hans Reiser
  2002-10-01 20:43 ` Andreas Dilger
@ 2002-10-01 21:27 ` Daniel Phillips
  2002-10-02 16:38   ` Paul P Komkoff Jr
  2002-10-02  6:39 ` Nikita Danilov
  3 siblings, 1 reply; 20+ messages in thread
From: Daniel Phillips @ 2002-10-01 21:27 UTC (permalink / raw)
  To: Paul P Komkoff Jr, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1525 bytes --]

On Tuesday 01 October 2002 21:59, Paul P Komkoff Jr wrote:
> This is the stupidiest testcase I've done but it worth seeing (maybe)
> 
> We create 300000 files

How big are the files?

> named from 00000000 to 000493E0 in one directory, then delete it in order.

You probably want to try creating the files in random order as well.  A
program to do that is attached, use in the form:

    randfiles <basename> <count> y

where 'y' means 'print the names', for debugging purposes.

What did your delete command look like, "rm -rf" or "echo * | xargs rm"?

> Tests taken on ext3+htree and reiserfs. ext3 w/o htree hadn't
> evaluated because it will take long long time ...
> 
> both filesystems was mounted with noatime,nodiratime and ext3 was
> data=writeback to be somewhat fair ...
> 
> 	       	real 	      	user  		sys
> reiserfs:
> Creating: 	3m13.208s	0m4.412s	2m54.404s
> Deleting:	4m41.250s	0m4.206s	4m17.926s
> 
> Ext3:
> Creating:	4m9.331s	0m3.927s	2m21.757s
> Deleting:	9m14.838s	0m3.446s	1m39.508s
> 
> htree improved this a much but it still beaten by reiserfs. seems odd
> to me - deleting taking twice time then creating ...

With only 300,000 files, you haven't got enough to cause inode table
thrashing, though some kernels shrink the inode cache too aggressively and
that can cause thrashing at lower numbers.  Maybe a bottleneck in the journal?

Not that anybody is going to complain about any of the above - it's still
running less than 1 ms/create, 2 ms/delete.  Still, it's slower than I'm
used to.

-- 
Daniel

[-- Attachment #2: randfiles.c --]
[-- Type: text/x-c, Size: 539 bytes --]

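#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* typeof is a GNU C extension (gcc, clang) */
#define swap(x, y) do { typeof(x) z = x; x = y; y = z; } while (0)

int main (int argc, char *argv[])
{
	int n = (argc > 2)? strtol(argv[2], 0, 10): 0;
	int i, fd, size = 50, show = argc > 3 && !strncmp(argv[3], "y", 1);
	char name[size];
	int *choose;

	if (n <= 0) return 1;
	choose = malloc(n * sizeof *choose);	/* heap, not stack: n may be large */
	if (!choose) return 1;

	/* shuffle the indices 0..n-1, then create the files in that order */
	for (i = 0; i < n; i++) choose[i] = i;
	for (i = n; i; i--) swap(choose[i-1], choose[rand() % i]);
	for (i = 0; i < n; i++)
	{
		snprintf(name, size, "%s%i", argv[1], choose[i]);
		if (show) printf("create %s\n", name);
		/* the original passed 0100, which is O_CREAT on Linux; O_CREAT needs a mode */
		fd = open(name, O_CREAT | O_WRONLY, 0644);
		if (fd >= 0) close(fd);
	}
	free(choose);
	return 0;
}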



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1
  2002-10-01 20:49   ` Hans Reiser
  2002-10-01 21:17     ` Rik van Riel
@ 2002-10-01 21:31     ` Daniel Phillips
  1 sibling, 0 replies; 20+ messages in thread
From: Daniel Phillips @ 2002-10-01 21:31 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Paul P Komkoff Jr, Linux Kernel Mailing List, god

Hi Hans,

On Tuesday 01 October 2002 22:49, Hans Reiser wrote:
> > Can you send us the code so we can try it on reiser4?  We are going to 
> > release reiser4 sometime this month (don't ask me when), and we'd be 
> > happy to see you run it when you do.
> 
> ^you^we
> 
> Sorry to list for bandwidth waste.

Can be much reduced by selective quoting...

-- 
Daniel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1
  2002-10-01 19:59 [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1 Paul P Komkoff Jr
                   ` (2 preceding siblings ...)
  2002-10-01 21:27 ` Daniel Phillips
@ 2002-10-02  6:39 ` Nikita Danilov
  3 siblings, 0 replies; 20+ messages in thread
From: Nikita Danilov @ 2002-10-02  6:39 UTC (permalink / raw)
  To: Paul P Komkoff Jr; +Cc: Linux Kernel Mailing List

Paul P Komkoff Jr writes:
 > This is the stupidiest testcase I've done but it worth seeing (maybe)
 > 
 > We create 300000 files named from 00000000 to 000493E0 in one
 > directory, then delete it in order.
 > 
 > Tests taken on ext3+htree and reiserfs. ext3 w/o htree hadn't
 > evaluated because it will take long long time ...
 > 
 > both filesystems was mounted with noatime,nodiratime and ext3 was
 > data=writeback to be somewhat fair ...
 > 
 > 	       	real 	      	user  		sys
 > reiserfs:
 > Creating: 	3m13.208s	0m4.412s	2m54.404s
 > Deleting:	4m41.250s	0m4.206s	4m17.926s
 > 
 > Ext3:
 > Creating:	4m9.331s	0m3.927s	2m21.757s
 > Deleting:	9m14.838s	0m3.446s	1m39.508s

Why are the user times so different?

 > 
 > htree improved this a much but it still beaten by reiserfs. seems odd
 > to me - deleting taking twice time then creating ...
 > 
 > -- 
 > Paul P 'Stingray' Komkoff 'Greatest' Jr /// (icq)23200764 /// (http)stingr.net
 >   When you're invisible, the only one really watching you is you (my keychain)

Nikita.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1
  2002-10-01 20:43 ` Andreas Dilger
  2002-10-01 21:19   ` Hans Reiser
@ 2002-10-02 10:48   ` Paul P Komkoff Jr
  2002-10-02 16:54     ` Andreas Dilger
  2002-10-04 15:53   ` Oleg Drokin
  2 siblings, 1 reply; 20+ messages in thread
From: Paul P Komkoff Jr @ 2002-10-02 10:48 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: ext2-devel

Replying to Andreas Dilger:
> Why do you think data=writeback is better than data=journal?  If the
> files have no data then it should not make a difference.

It is better than the default data=ordered, I think :)

Thanks for the detailed explanation - it saved me a lot of time, and
following your directions I have re-run my test. Now ext3 is
better :)

            real        user        sys
e3
create      2m49.545s   0m4.162s    2m20.766s
delete      2m8.155s    0m3.614s    1m34.945s

reiser
create      3m13.577s   0m4.338s    2m54.026s
delete      4m39.249s   0m3.968s    4m16.297s

e3
create      2m50.766s   0m4.024s    2m21.197s
delete      2m8.755s    0m3.501s    1m35.737s

reiser
create      3m13.015s   0m4.432s    2m53.412s
delete      4m41.011s   0m3.893s    4m16.845s


These are two typical runs. I now create the ext3 filesystem with
mke2fs -j -O dir_index -J size=192 -T news /dev/sda4

As you can see, this improves performance by 1/4.

Unfortunately, there is still one issue with ext3. It is called the "inode
limit". Initially I wanted to run this test on 1000000 files but ... I hit
the inode limit and don't want to increase it artificially yet.

Reiserfs worked fine because it doesn't have this kind of limit ...

-- 
Paul P 'Stingray' Komkoff 'Greatest' Jr /// (icq)23200764 /// (http)stingr.net
  When you're invisible, the only one really watching you is you (my keychain)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1
  2002-10-01 21:27 ` Daniel Phillips
@ 2002-10-02 16:38   ` Paul P Komkoff Jr
  0 siblings, 0 replies; 20+ messages in thread
From: Paul P Komkoff Jr @ 2002-10-02 16:38 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Replying to Daniel Phillips:
> How big are the files?

0.

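#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Create <count> empty files named 00000000, 00000001, ... in the current directory. */
int main(int argc, char* argv[]) {

	int i, j, k;
	char t[128];

	if (argc < 2) {
		fprintf(stderr, "usage: %s <count>\n", argv[0]);
		return 1;
	}
	k = atoi(argv[1]);

	for (i = 0; i < k; i++) {
		snprintf(t, sizeof(t), "%08X", i);
		if (-1 == (j = creat(t, S_IRWXU))) {
			perror("Create file");
			printf("no: %d\n", i);
			return 1;
		}
		close(j);
	}

	return 0;
}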


> You probably want to try creating the files in random order as well.  A
> program to do that is attached, use in the form:
> 
>     randfiles <basename> <count> y
> 
> where 'y' means 'print the names', for debugging purposes.

this will be the next series of tests :)

> What did your delete command look like, "rm -rf" or "echo * | xargs rm"?

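#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Unlink <count> files named 00000000, 00000001, ... in creation order. */
int main(int argc, char* argv[]) {

	int i, k;
	char t[128];

	if (argc < 2) {
		fprintf(stderr, "usage: %s <count>\n", argv[0]);
		return 1;
	}
	k = atoi(argv[1]);

	for (i = 0; i < k; i++) {
		snprintf(t, sizeof(t), "%08X", i);
		if (-1 == unlink(t)) {
			perror("unlink");
			printf("no: %d\n", i);
			return 1;
		}
	}

	return 0;
}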

> Only 300,000 files, you haven't got enough to cause inode table thrashing,
> though some kernels shrink the inode cache too agressively and that can
> cause thrashing at lower numbers.  Maybe a bottleneck in the journal?

Yes, increasing the journal to fit the whole directory in it (as Andreas
Dilger said) improved results by 1/4. But initially my test was
1000000 files. /dev/sda4 in my tests is 1882844, and I quickly hit the
inode limit on a -T news ext3 filesystem, so I would need to artificially
increase it at mke2fs time, but I decided not to do so (yet).

> Not that anybody is going to complain about any of the above - it's still
> running less than 1 ms/create, 2 ms/delete.  Still, it's slower than I'm
> used to.

I am just trying to write a caching proxy-like application without
reinventing the wheel (i.e. designing my own filesystem and storing it in
a big file just because some filesystem is so slow on large directories,
cannot make more than N empty objects, etc).

-- 
Paul P 'Stingray' Komkoff 'Greatest' Jr /// (icq)23200764 /// (http)stingr.net
  When you're invisible, the only one really watching you is you (my keychain)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1
  2002-10-02 10:48   ` Paul P Komkoff Jr
@ 2002-10-02 16:54     ` Andreas Dilger
  2002-10-03  0:37       ` [Ext2-devel] " Theodore Ts'o
  0 siblings, 1 reply; 20+ messages in thread
From: Andreas Dilger @ 2002-10-02 16:54 UTC (permalink / raw)
  To: Linux Kernel Mailing List, ext2-devel

On Oct 02, 2002  14:48 +0400, Paul P Komkoff Jr wrote:
> Unfortunately, there still one issue in ext3. It called "inode limit".
> Initially I wanted to run this test on 1000000 files but ... I hit
> inode limit and don't want to increase it artificially yet.
> 
> Reiserfs worked fine because it don't have such kind of limit ...

We already have plans to fix this, but it is not high enough on anyone's
priority list quite yet (most filesystems have enough inodes for regular
usage).

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Ext2-devel] Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1
  2002-10-02 16:54     ` Andreas Dilger
@ 2002-10-03  0:37       ` Theodore Ts'o
  2002-10-03 12:04         ` Hans Reiser
  0 siblings, 1 reply; 20+ messages in thread
From: Theodore Ts'o @ 2002-10-03  0:37 UTC (permalink / raw)
  To: Linux Kernel Mailing List, ext2-devel

On Wed, Oct 02, 2002 at 10:54:54AM -0600, Andreas Dilger wrote:
> On Oct 02, 2002  14:48 +0400, Paul P Komkoff Jr wrote:
> > Unfortunately, there still one issue in ext3. It called "inode limit".
> > Initially I wanted to run this test on 1000000 files but ... I hit
> > inode limit and don't want to increase it artificially yet.
> > 
> > Reiserfs worked fine because it don't have such kind of limit ...
> 
> We have plans to fix this already, but it is not high enough on anyones
> priority list quite yet (most filesystems have enough inodes for regular
> usage).

Just to be clear, the limit which Paul is referring to is just simply
a matter of creating the filesystem with a sufficient number of
inodes.  (i.e., mke2fs -N 1200000).  Yes, having a dynamic inode table
would be good, but in practice sysadmins know how many inodes are
needed in advance.

						- Ted

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Ext2-devel] Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1
  2002-10-03  0:37       ` [Ext2-devel] " Theodore Ts'o
@ 2002-10-03 12:04         ` Hans Reiser
  2002-10-03 19:40           ` Theodore Ts'o
  0 siblings, 1 reply; 20+ messages in thread
From: Hans Reiser @ 2002-10-03 12:04 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Linux Kernel Mailing List, ext2-devel

Theodore Ts'o wrote:

>  
>
>Just to be clear, the limit which Paul is referring to is just simply
>a matter of creating the filesystem with a sufficient number of
>inodes.  (i.e., mke2fs -N 1200000).  Yes, having a dynamic inode table
>would be good, but in practice sysadmins know how many inodes are
>needed in advance.
>
>						- Ted
>
>  
>

No they don't.  Average space wastage is more than 50% because sysadmins 
have to be conservative.

Hans


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Ext2-devel] Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1
  2002-10-03 12:04         ` Hans Reiser
@ 2002-10-03 19:40           ` Theodore Ts'o
  2002-10-03 19:44             ` Hans Reiser
  0 siblings, 1 reply; 20+ messages in thread
From: Theodore Ts'o @ 2002-10-03 19:40 UTC (permalink / raw)
  To: Hans Reiser, G; +Cc: Linux Kernel Mailing List, ext2-devel

On Thu, Oct 03, 2002 at 04:04:12PM +0400, Hans Reiser wrote:
> 
> No they don't.  Average space wastage is more than 50% because sysadmins 
> have to be conservative.

Sure, but even a hundred megabytes or two out of a 100 gigabyte drive
is cheap.  (Specifically, about fifty cents' worth.)

						- Ted

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Ext2-devel] Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1
  2002-10-03 19:40           ` Theodore Ts'o
@ 2002-10-03 19:44             ` Hans Reiser
  0 siblings, 0 replies; 20+ messages in thread
From: Hans Reiser @ 2002-10-03 19:44 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: G, Linux Kernel Mailing List, ext2-devel

Theodore Ts'o wrote:

>On Thu, Oct 03, 2002 at 04:04:12PM +0400, Hans Reiser wrote:
>  
>
>>No they don't.  Average space wastage is more than 50% because sysadmins 
>>have to be conservative.
>>    
>>
>
>Sure, but even a hundred megabytes or two out of a 100 gigabyte drive
>is cheap.  (Specifically, about fifty cents' worth.)
>
>						- Ted
>
>
>  
>
Usual space wastage is on the order of 5% of total partition size, yes? 
 Allocating 0.1% of your drive for inodes will get you into trouble if a 
user does something like use mh or read news, etc.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1
  2002-10-01 20:43 ` Andreas Dilger
  2002-10-01 21:19   ` Hans Reiser
  2002-10-02 10:48   ` Paul P Komkoff Jr
@ 2002-10-04 15:53   ` Oleg Drokin
  2002-10-04 17:09     ` [Ext2-devel] " Andreas Dilger
  2 siblings, 1 reply; 20+ messages in thread
From: Oleg Drokin @ 2002-10-04 15:53 UTC (permalink / raw)
  To: Linux Kernel Mailing List, ext2-devel

Hello!

On Tue, Oct 01, 2002 at 02:43:30PM -0600, Andreas Dilger wrote:

> As a result, if the size of the directory + inode table blocks is larger
> than memory, and also larger than 1/4 of the journal, you are essentially
> seek-bound because of random block dirtying.
> You should see what the size of the directory is at its peak (probably
> 16 bytes * 300k ~= 5MB, and add in the size of the directory blocks
> (128 bytes * 300k ~= 38MB) and make the journal 4x as large as that,
> so 192MB (mke2fs -j -J size=192) and re-run the test (I assume you have
> at least 256MB+ of RAM on the test system).

Hm. But all of that won't help if you need to read inodes from disk first,
right? (Until that inode allocation in chunks is implemented, of course.)

BTW, with inode allocation in chunks attached to directory blocks,
you won't get any benefit if an application creates a file in some
temporary dir and then rename()s it to its proper place, or am I missing
something?

> What is very interesting from the above results is that the CPU usage
> is _much_ smaller for ext3+htree than for reiserfs.  It looks like

This is only in case of deletion, probably somehow related to constant item
shifting when some of the items are deleted.

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Ext2-devel] Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1
  2002-10-04 15:53   ` Oleg Drokin
@ 2002-10-04 17:09     ` Andreas Dilger
  2002-10-07  6:54       ` Oleg Drokin
  2002-10-10  0:27       ` Daniel Phillips
  0 siblings, 2 replies; 20+ messages in thread
From: Andreas Dilger @ 2002-10-04 17:09 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: Linux Kernel Mailing List, ext2-devel

On Oct 04, 2002  19:53 +0400, Oleg Drokin wrote:
> On Tue, Oct 01, 2002 at 02:43:30PM -0600, Andreas Dilger wrote:
> > As a result, if the size of the directory + inode table blocks is larger
> > than memory, and also larger than 1/4 of the journal, you are essentially
> > seek-bound because of random block dirtying.
> > You should see what the size of the directory is at its peak (probably
> > 16 bytes * 300k ~= 5MB, and add in the size of the directory blocks
> > (128 bytes * 300k ~= 38MB) and make the journal 4x as large as that,
> > so 192MB (mke2fs -j -J size=192) and re-run the test (I assume you have
> > at least 256MB+ of RAM on the test system).
> 
> Hm. But all of that won't help if you need to read inodes from disk first,
> right? (until that inode allocation in chunks implemented, of course).

Ah, but see the follow-up reply - increasing the size of the journal as
advised improved the htree performance to 15% and 55% faster than
reiserfs for creates and deletes, respectively:

On Wed, 2 Oct 2002 14:48:59 +0400 Paul P Komkoff Jr replied:
> Thanks for detailed explanation - it saved much time for me and
> accortind to yours directions I have recalculated my test. Now ext3 is
> better :)
> 
>                 real            user            sys
> e3
> create          2m49.545s       0m4.162s        2m20.766s
> delete          2m8.155s        0m3.614s        1m34.945s
> 
> reiser
> create          3m13.577s       0m4.338s        2m54.026s
> delete          4m39.249s       0m3.968s        4m16.297s
> 
> e3
> create          2m50.766s       0m4.024s        2m21.197s
> delete          2m8.755s        0m3.501s        1m35.737s
> 
> reiser
> create          3m13.015s       0m4.432s        2m53.412s
> delete          4m41.011s       0m3.893s        4m16.845s


On Oct 04, 2002  19:53 +0400, Oleg Drokin wrote some more:
> BTW, in case of inode allocation in chunks attached to directory blocks,
> you won't get any benefit in case if application creates file in some
> tempoarry dir and then rename()s it to its proper place, or am I missing
> something?

No, you are correct.  Renaming the files will randomly re-hash the names
and break any coherency between the directory leaf blocks and the inode
blocks.  However, such files are often short-lived anyways (mail spools
and such), and for the normal case (e.g. untar of a file) the names are
constant, so there should be a benefit for smaller journals from this.

> > What is very interesting from the above results is that the CPU usage
> > is _much_ smaller for ext3+htree than for reiserfs.  It looks like
> 
> This is only in case of deletion, probably somehow related to constant item
> shifting when some of the items are deleted.

Well, even for creates it is 19% less CPU.  The re-tested wall-clock
time for htree creates is now less than the CPU usage of reiserfs, so
it is impossible for reiserfs to achieve this number without
optimization of the code somehow.  For deletes the cpu usage of htree
is 40% less, but we are currently not doing leaf block compaction, so
there would probably be a slight performance hit to merge blocks
(although we have some plans to do that efficiently also).

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Ext2-devel] Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1
  2002-10-04 17:09     ` [Ext2-devel] " Andreas Dilger
@ 2002-10-07  6:54       ` Oleg Drokin
  2002-10-10  0:27       ` Daniel Phillips
  1 sibling, 0 replies; 20+ messages in thread
From: Oleg Drokin @ 2002-10-07  6:54 UTC (permalink / raw)
  To: Linux Kernel Mailing List, ext2-devel

Hello!

On Fri, Oct 04, 2002 at 11:09:35AM -0600, Andreas Dilger wrote:
> > > As a result, if the size of the directory + inode table blocks is larger
> > > than memory, and also larger than 1/4 of the journal, you are essentially
> > > seek-bound because of random block dirtying.
> > > You should see what the size of the directory is at its peak (probably
> > > 16 bytes * 300k ~= 5MB, and add in the size of the directory blocks
> > > (128 bytes * 300k ~= 38MB) and make the journal 4x as large as that,
> > > so 192MB (mke2fs -j -J size=192) and re-run the test (I assume you have
> > > at least 256MB+ of RAM on the test system).
> > Hm. But all of that won't help if you need to read inodes from disk first,
> > right? (until that inode allocation in chunks implemented, of course).
> Ah, but see the follow-up reply - increasing the size of the journal as
> advised improved the htree performance to 15% and 55% faster than
> reiserfs for creates and deletes, respectively:

Yes, but that was the case with warm caches, as I understand it.
Usually you cannot count on all inodes of a large file set already being
cached, so that none of them need to be read.

> > > What is very interesting from the above results is that the CPU usage
> > > is _much_ smaller for ext3+htree than for reiserfs.  It looks like
> > This is only in case of deletion, probably somehow related to constant item
> > shifting when some of the items are deleted.
> Well, even for creates it is 19% less CPU.  The re-tested wall-clock

I'm afraid other parts of the code might have contributed there,
like setting s_dirt way more often than needed.

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Ext2-devel] Re: [STUPID TESTCASE] ext3 htree vs. reiserfs on 2.5.40-mm1
  2002-10-04 17:09     ` [Ext2-devel] " Andreas Dilger
  2002-10-07  6:54       ` Oleg Drokin
@ 2002-10-10  0:27       ` Daniel Phillips
  1 sibling, 0 replies; 20+ messages in thread
From: Daniel Phillips @ 2002-10-10  0:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin; +Cc: Linux Kernel Mailing List, ext2-devel

On Friday 04 October 2002 19:09, Andreas Dilger wrote:
> On Oct 04, 2002  19:53 +0400, Oleg Drokin wrote:
> > On Tue, Oct 01, 2002 at 02:43:30PM -0600, Andreas Dilger wrote:
> > > What is very interesting from the above results is that the CPU usage
> > > is _much_ smaller for ext3+htree than for reiserfs.  It looks like
> > 
> > This is only in case of deletion, probably somehow related to constant item
> > shifting when some of the items are deleted.
> 
> Well, even for creates it is 19% less CPU.  The re-tested wall-clock
> time for htree creates is now less than the CPU usage of reiserfs, so
> it is impossible for reiserfs to achieve this number without
> optimization of the code somehow.  For deletes the cpu usage of htree
> is 40% less, but we are currently not doing leaf block compaction, so
> there would probably be a slight performance hit to merge blocks
> (although we have some plans to do that efficiently also).

I convinced myself at some point that compaction will cost no more
than a couple of percent for deletes and nothing for creates.

-- 
Daniel

^ permalink raw reply	[flat|nested] 20+ messages in thread
