linux-kernel.vger.kernel.org archive mirror
* ext3 performance inconsistencies, 2.4/2.6
@ 2003-11-04 19:10 Paul Venezia
  2003-11-04 19:36 ` Linus Torvalds
  0 siblings, 1 reply; 17+ messages in thread
From: Paul Venezia @ 2003-11-04 19:10 UTC (permalink / raw)
  To: linux-kernel

I've been running bonnie++ filesystems testing on an IBM x335 server
recently. This box uses the MPT RAID controller, but I've disabled the
RAID and am addressing the disks individually. I'm getting wildly
different results between 2.4.20-20-9 (RedHat mod), 2.4.22 (stock), and
2.6.0-test9.

The full results are here: http://groove.jpj.net/x335-test.html

The base distro is RedHat 9, there are no extraneous daemons running or
modules loaded. I'm using a dedicated drive as the scratch directory.
I'm looking for some insight as to why I'm seeing such a disparity in
performance.

The server has dual P4 3.06GHz CPUs, 1.5GB RAM, two 36GB Ultra320 disks.

bonnie++ is run as

bonnie++ -d /test -s 3g -m x335-`uname -r` -n 200 -x 2 -u root -q 

Thanks 

-Paul



* Re: ext3 performance inconsistencies, 2.4/2.6
  2003-11-04 19:10 ext3 performance inconsistencies, 2.4/2.6 Paul Venezia
@ 2003-11-04 19:36 ` Linus Torvalds
  2003-11-04 20:20   ` Bill Rugolsky Jr.
  0 siblings, 1 reply; 17+ messages in thread
From: Linus Torvalds @ 2003-11-04 19:36 UTC (permalink / raw)
  To: Paul Venezia; +Cc: linux-kernel


On Tue, 4 Nov 2003, Paul Venezia wrote:
>
> I've been running bonnie++ filesystems testing on an IBM x335 server
> recently. This box uses the MPT RAID controller, but I've disabled the
> RAID and am addressing the disks individually. I'm getting wildly
> different results between 2.4.20-20-9 (RedHat mod), 2.4.22 (stock), and
> 2.6.0-test9.

Interesting. The 2.4.22 sequential "per char" results are totally out of
line with anything else.

The thing is, the overhead for the per-char stuff really should be almost 
all in user space unless I'm mistaken. It's just using getc/putc, no?

Which makes me suspect that either the libc does something different
depending on kernel version, _or_ 2.4.22 returns a different st_blksize
thing, causing stdio to use a different blocking size.

Have you tried stracing the "per char" parts of the benchmark to see what 
the system call patterns are? That should show both effects.

			Linus
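
A minimal sketch of how the st_blksize hypothesis could be checked directly, without going through bonnie++: call stat() on the scratch directory under each kernel and compare the reported value. The function name preferred_blksize is made up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Return the preferred I/O block size the kernel reports for `path`,
 * or -1 if stat() fails.  stdio implementations commonly size their
 * buffers from this field. */
long preferred_blksize(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	return (long)st.st_blksize;
}
```

Calling preferred_blksize("/test") from a small main() under each kernel and printing the result would show whether the value differs between 2.4.22 and the others.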



* Re: ext3 performance inconsistencies, 2.4/2.6
  2003-11-04 19:36 ` Linus Torvalds
@ 2003-11-04 20:20   ` Bill Rugolsky Jr.
  2003-11-04 20:30     ` Linus Torvalds
  0 siblings, 1 reply; 17+ messages in thread
From: Bill Rugolsky Jr. @ 2003-11-04 20:20 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul Venezia, linux-kernel

On Tue, Nov 04, 2003 at 11:36:55AM -0800, Linus Torvalds wrote:
> > I've been running bonnie++ filesystems testing on an IBM x335 server
> > recently. This box uses the MPT RAID controller, but I've disabled the
> > RAID and am addressing the disks individually. I'm getting wildly
> > different results between 2.4.20-20-9 (RedHat mod), 2.4.22 (stock), and
> > 2.6.0-test9.
> 
> Interesting. The 2.4.22 sequential "per char" results are totally out of
> line with anything else.
> 
> The thing is, the overhead for the per-char stuff really should be almost 
> all in user space unless I'm mistaken. It's just using getc/putc, no?

Unless bonnie++ is using the _unlocked() variants, it might be an issue of
the mutex overhead from NPTL v. LinuxThreads.  Red Hat 9 has its share
of NPTL bugs.

It is probably worth rerunning the tests with LD_ASSUME_KERNEL=2.4.1 on
the Red Hat kernel.

Regards,
	
	Bill Rugolsky
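
To separate the stdio locking cost from everything else, the same per-character loop can be timed with putc() and with POSIX putc_unlocked(), which skips the per-call FILE lock. The sketch below is not bonnie++ code; the function name time_per_char_output and its parameters are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Write n zero bytes to `out` one character at a time and return the
 * elapsed wall-clock seconds, or -1.0 on a write or clock error.
 * If `unlocked` is nonzero, putc_unlocked() is used, which does not
 * take the FILE lock; otherwise the locking putc() is used. */
double time_per_char_output(FILE *out, long n, int unlocked)
{
	struct timespec t0, t1;
	long i;

	if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
		return -1.0;
	if (unlocked) {
		for (i = 0; i < n; i++)
			if (putc_unlocked(0, out) == EOF)
				return -1.0;
	} else {
		for (i = 0; i < n; i++)
			if (putc(0, out) == EOF)
				return -1.0;
	}
	if (fflush(out) != 0 || clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
		return -1.0;
	return (double)(t1.tv_sec - t0.tv_sec) +
	       (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

If the two timings are close under LinuxThreads but far apart under NPTL, the difference is in the lock path rather than in the buffering or the kernel.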


* Re: ext3 performance inconsistencies, 2.4/2.6
  2003-11-04 20:20   ` Bill Rugolsky Jr.
@ 2003-11-04 20:30     ` Linus Torvalds
  2003-11-04 21:07       ` Paul Venezia
  2003-11-04 21:39       ` Bill Rugolsky Jr.
  0 siblings, 2 replies; 17+ messages in thread
From: Linus Torvalds @ 2003-11-04 20:30 UTC (permalink / raw)
  To: Bill Rugolsky Jr.; +Cc: Paul Venezia, linux-kernel


On Tue, 4 Nov 2003, Bill Rugolsky Jr. wrote:
> 
> Unless bonnie++ is using the _unlocked() variants, it might be an issue of
> the mutex overhead from NPTL v. LinuxThreads.  Red Hat 9 has its share
> of NPTL bugs.

Hmm.. That would easily explain the differences, since NPTL will trigger 
both on 2.6.0 and the RH-2.4 kernel, but not on the standard 2.4.22 
kernel.

But there really should be zero contention on the stdio data structures, 
so the locking would have to be _seriously_ broken to make that kind of
difference (not necessarily buggy, but seriously badly implemented).

A non-contended lock should be at most one locked instruction if well 
done, both on LinuxThreads and NPTL.

> It is probably worth rerunning the tests with LD_ASSUME_KERNEL=2.4.1 on
> the Red Hat kernel.

That would be interesting.

		Linus
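
For illustration, the uncontended fast path described above can be sketched with C11 atomics: taking the lock is a single compare-and-swap, and releasing it is a plain release store. This is not the glibc implementation; the type and function names are hypothetical, and the contended slow path (sleeping and waking waiters) is left out entirely.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
	atomic_int locked;	/* 0 = free, 1 = held */
} sketch_lock;

/* Try to take the lock with one compare-and-swap.  On x86 this
 * typically compiles to a single `lock cmpxchg`.  Returns true if
 * the lock was free and is now held by the caller. */
bool sketch_trylock(sketch_lock *l)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(
		&l->locked, &expected, 1,
		memory_order_acquire, memory_order_relaxed);
}

/* Release the lock with a release store; no waiter handling here. */
void sketch_unlock(sketch_lock *l)
{
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```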



* Re: ext3 performance inconsistencies, 2.4/2.6
  2003-11-04 20:30     ` Linus Torvalds
@ 2003-11-04 21:07       ` Paul Venezia
  2003-11-04 21:28         ` Bill Rugolsky Jr.
  2003-11-04 21:39       ` Bill Rugolsky Jr.
  1 sibling, 1 reply; 17+ messages in thread
From: Paul Venezia @ 2003-11-04 21:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Bill Rugolsky Jr., linux-kernel

On Tue, 2003-11-04 at 15:30, Linus Torvalds wrote:
> On Tue, 4 Nov 2003, Bill Rugolsky Jr. wrote:
> > 
> > Unless bonnie++ is using the _unlocked() variants, it might be an issue of
> > the mutex overhead from NPTL v. LinuxThreads.  Red Hat 9 has its share
> > of NPTL bugs.
> 
> Hmm.. That would easily explain the differences, since NPTL will trigger 
> both on 2.6.0 and the RH-2.4 kernel, but not on the standard 2.4.22 
> kernel.
> 
> But there really should be zero contention on the stdio data structures, 
> so the locking would have to be _seriously_ broken to make that kind of
> difference (not necessarily buggy, but seriously badly implemented).
> 
> A non-contended lock should be at most one locked instruction if well 
> done, both on LinuxThreads and NPTL.

Good point... 

A truncated strace under 2.4.22 is here:
http://groove.jpj.net/bonnie-strace-trunc

It's incomplete, but shows the putc calls.

> > It is probably worth rerunning the tests with LD_ASSUME_KERNEL=2.4.1 on
> > the Red Hat kernel.
> 
> That would be interesting.

Tests are running now. Updates as events warrant.

-Paul



* Re: ext3 performance inconsistencies, 2.4/2.6
  2003-11-04 21:07       ` Paul Venezia
@ 2003-11-04 21:28         ` Bill Rugolsky Jr.
  2003-11-04 21:40           ` Linus Torvalds
  0 siblings, 1 reply; 17+ messages in thread
From: Bill Rugolsky Jr. @ 2003-11-04 21:28 UTC (permalink / raw)
  To: Paul Venezia; +Cc: Linus Torvalds, linux-kernel

On Tue, Nov 04, 2003 at 04:07:43PM -0500, Paul Venezia wrote:
> Tests are running now. Updates as events warrant.

Well, I'm too lazy to wait for a long test, but with a mere
100MB file, on 1GHz P3:

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
NPTL          100M  7735  99 127068  98 63048  84  7890  98 +++++ +++ +++++ +++
LinuxThreads  100M 11000  99 127928  97 59075  84 11290  98 +++++ +++ +++++ +++

So something is amiss.

Regards,

	Bill Rugolsky


* Re: ext3 performance inconsistencies, 2.4/2.6
  2003-11-04 20:30     ` Linus Torvalds
  2003-11-04 21:07       ` Paul Venezia
@ 2003-11-04 21:39       ` Bill Rugolsky Jr.
  1 sibling, 0 replies; 17+ messages in thread
From: Bill Rugolsky Jr. @ 2003-11-04 21:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul Venezia, linux-kernel

On Tue, Nov 04, 2003 at 12:30:23PM -0800, Linus Torvalds wrote:
> But there really should be zero contention on the stdio data structures, 
> so the locking would have to be _seriously_ broken to make that kind of
> difference (not necessarily buggy, but seriously badly implemented).
> 
> A non-contended lock should be at most one locked instruction if well 
> done, both on LinuxThreads and NPTL.

The results that I just posted are also for Red Hat 9, kernel 2.4.20-20.9.

rugolsky@ti31: getconf GNU_LIBPTHREAD_VERSION
NPTL 0.34

Ulrich's release notes for nptl-0.57 say:

   The changes are numerous and most of them were made by Jakub:

   ...

   ~ better stdio locking

I don't have my laptop running Fedora handy, but that's the next thing
to test.

Regards,

	Bill Rugolsky
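
The same check that getconf performs can be done from inside a test program, so that each benchmark run records which thread library it actually used. A small sketch, assuming glibc's _CS_GNU_LIBPTHREAD_VERSION confstr() name is available (the function name libpthread_version is made up here; on other C libraries it simply reports failure):

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <unistd.h>

/* Copy the thread library identification string (for example
 * "NPTL 0.34") into buf.  Returns 0 on success, -1 if the C library
 * does not provide the value or buf is too small to hold it. */
int libpthread_version(char *buf, size_t len)
{
#ifdef _CS_GNU_LIBPTHREAD_VERSION
	size_t n = confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, len);

	return (n > 0 && n <= len) ? 0 : -1;
#else
	(void)buf;
	(void)len;
	return -1;
#endif
}
```

Running it once normally and once with LD_ASSUME_KERNEL=2.4.1 set should confirm that the environment variable really switches the library being loaded.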


* Re: ext3 performance inconsistencies, 2.4/2.6
  2003-11-04 21:28         ` Bill Rugolsky Jr.
@ 2003-11-04 21:40           ` Linus Torvalds
  2003-11-04 22:00             ` Ulrich Drepper
  2003-11-04 22:19             ` Bill Rugolsky Jr.
  0 siblings, 2 replies; 17+ messages in thread
From: Linus Torvalds @ 2003-11-04 21:40 UTC (permalink / raw)
  To: Bill Rugolsky Jr.; +Cc: Paul Venezia, Kernel Mailing List, Ulrich Drepper


On Tue, 4 Nov 2003, Bill Rugolsky Jr. wrote:
> 
> Well, I'm too lazy to wait for a long test, but with a mere
> 100MB file, on 1GHz P3:
> 
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> NPTL          100M  7735  99 127068  98 63048  84  7890  98 +++++ +++ +++++ +++
> LinuxThreads  100M 11000  99 127928  97 59075  84 11290  98 +++++ +++ +++++ +++
> 
> So something is amiss.

Ok, so NPTL locking (even in the absence of any threads and thus any 
contention) seems to be noticeably higher-overhead than the old 
LinuxThreads. 

90% of the overhead of a putc()/getc() implementation these days is likely
just locking. Even so, this implies that NPTL locking is about twice as 
expensive as the old LinuxThreads one.

Don't ask me why. But I'm cc'ing Uli, who can probably tell us. Maybe the 
RH-9 libraries are just not very good, and LinuxThreads has had a lot 
longer to optimize their lock behaviour..

			Linus
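
The estimate above can be written out as a small calculation. The model is an assumption, not something measured in this thread: a fraction lock_fraction of the faster library's per-character time is locking, everything else costs the same in both libraries, and locking alone accounts for the throughput gap. The function name is hypothetical.

```c
/* fast_rate and slow_rate are per-character throughputs in the same
 * unit (e.g. K/sec).  lock_fraction is the assumed share of the fast
 * library's per-character time spent on locking (0 < f <= 1).
 * Returns how many times more expensive locking must be in the slow
 * library under this model, or -1.0 on invalid input. */
double implied_lock_cost_ratio(double fast_rate, double slow_rate,
			       double lock_fraction)
{
	double time_ratio;

	if (fast_rate <= 0.0 || slow_rate <= 0.0 ||
	    lock_fraction <= 0.0 || lock_fraction > 1.0)
		return -1.0;
	/* Per-character time is inversely proportional to throughput. */
	time_ratio = fast_rate / slow_rate;
	/* Subtract the non-lock share, then normalise by the lock share. */
	return (time_ratio - (1.0 - lock_fraction)) / lock_fraction;
}
```

With the per-character output figures quoted above (11000 K/sec for LinuxThreads, 7735 K/sec for NPTL) and a 90% lock share, this gives a ratio of about 1.47.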




* Re: ext3 performance inconsistencies, 2.4/2.6
  2003-11-04 21:40           ` Linus Torvalds
@ 2003-11-04 22:00             ` Ulrich Drepper
  2003-11-04 22:31               ` Linus Torvalds
  2003-11-04 22:19             ` Bill Rugolsky Jr.
  1 sibling, 1 reply; 17+ messages in thread
From: Ulrich Drepper @ 2003-11-04 22:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Bill Rugolsky Jr., Paul Venezia, Kernel Mailing List

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Linus Torvalds wrote:

> Don't ask me why. But I'm cc'ing Uli, who can probably tell us. Maybe the 
> RH-9 libraries are just not very good, and LinuxThreads has had a lot 
> longer to optimize their lock behaviour..

I don't see any version numbers mentioned.  If you want to benchmark
NPTL use the recent code, e.g., from Fedora Core 1 or RHEL3.  Nothing
else makes any sense since there have been countless changes since the
early releases.

- -- 
- --------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/qCFh2ijCOnn/RHQRApi1AKCaU7vBtJsATDmx2dStMYishtbF9wCaAvOe
kNaoizj4xtUNU4TV2wH5GAw=
=0kb0
-----END PGP SIGNATURE-----



* Re: ext3 performance inconsistencies, 2.4/2.6
  2003-11-04 21:40           ` Linus Torvalds
  2003-11-04 22:00             ` Ulrich Drepper
@ 2003-11-04 22:19             ` Bill Rugolsky Jr.
  2003-11-04 22:26               ` Bill Rugolsky Jr.
  2003-11-05  7:14               ` Jakub Jelinek
  1 sibling, 2 replies; 17+ messages in thread
From: Bill Rugolsky Jr. @ 2003-11-04 22:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul Venezia, Kernel Mailing List, Ulrich Drepper

On Tue, Nov 04, 2003 at 01:40:51PM -0800, Linus Torvalds wrote:
> On Tue, 4 Nov 2003, Bill Rugolsky Jr. wrote:
> > 
> > Well, I'm too lazy to wait for a long test, but with a mere
> > 100MB file, on 1GHz P3:
> > 
> > Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
> >                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> > NPTL          100M  7735  99 127068  98 63048  84  7890  98 +++++ +++ +++++ +++
> > LinuxThreads  100M 11000  99 127928  97 59075  84 11290  98 +++++ +++ +++++ +++
> > 
> > So something is amiss.
> 
> Ok, so NPTL locking (even in the absence of any threads and thus any 
> contention) seems to be noticeably higher-overhead than the old 
> LinuxThreads. 
> 
> 90% of the overhead of a putc()/getc() implementation these days is likely
> just locking. Even so, this implies that NPTL locking is about twice as 
> expensive as the old LinuxThreads one.

On Fedora 0.95, Pentium M 1.6GHz, 2.4.22-1.2115.nptl, glibc-2.3.2-10, (NPTL 0.60),
I get:

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
NPTL           100M 13070 100 +++++ +++ 14141   4 13099 100 +++++ +++ +++++ +++
LinuxThreads   100M 25957 100 +++++ +++ 20037   5 26777  99 +++++ +++ +++++ +++

Ugh, still there.

	Bill Rugolsky


* Re: ext3 performance inconsistencies, 2.4/2.6
  2003-11-04 22:19             ` Bill Rugolsky Jr.
@ 2003-11-04 22:26               ` Bill Rugolsky Jr.
  2003-11-05  7:14               ` Jakub Jelinek
  1 sibling, 0 replies; 17+ messages in thread
From: Bill Rugolsky Jr. @ 2003-11-04 22:26 UTC (permalink / raw)
  To: Linus Torvalds, Paul Venezia, Kernel Mailing List, Ulrich Drepper

On Tue, Nov 04, 2003 at 05:19:04PM -0500, Bill Rugolsky Jr. wrote:
> On Fedora 0.95, Pentium M 1.6GHz, 2.4.22-1.2115.nptl, glibc-2.3.2-10, (NPTL 0.60),
> I get:
> 
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> NPTL           100M 13070 100 +++++ +++ 14141   4 13099 100 +++++ +++ +++++ +++
> LinuxThreads   100M 25957 100 +++++ +++ 20037   5 26777  99 +++++ +++ +++++ +++
 

Eek, that's glibc-2.3.2-101.
                          ^

 	- Bill Rugolsky


* Re: ext3 performance inconsistencies, 2.4/2.6
  2003-11-04 22:00             ` Ulrich Drepper
@ 2003-11-04 22:31               ` Linus Torvalds
  2003-11-04 23:48                 ` Ulrich Drepper
  0 siblings, 1 reply; 17+ messages in thread
From: Linus Torvalds @ 2003-11-04 22:31 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Bill Rugolsky Jr., Paul Venezia, Kernel Mailing List


On Tue, 4 Nov 2003, Ulrich Drepper wrote:
> 
> I don't see any version numbers mentioned.  If you want to benchmark
> NPTL use the recent code, e.g., from Fedora Core 1 or RHEL3.  Nothing
> else makes any sense since there have been countless changes since the
> early releases.

This is actually _really_ trivial to see with a simple test program.

This is Fedora Core test3:

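	#include <stdio.h>	/* putchar() is declared here */

	/* Change this to match your CPU */
	#define NR (10*1000*1000)

	int main(int argc, char **argv)
	{
	        int i;

	        /* Call putchar() NR times; run with stdout redirected to /dev/null */
	        for (i = 0; i < NR; i++)
	                putchar(0);
	        return 0;
	}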

and then just time it.

I get:

	torvalds@home:~> time ./a.out > /dev/null 

	real    0m1.305s
	user    0m1.283s
	sys     0m0.004s

and

	torvalds@home:~> time LD_ASSUME_KERNEL=2.4.1 ./a.out > /dev/null 
	
	real    0m0.321s
	user    0m0.318s
	sys     0m0.003s

ie a factor of _four_ difference in the speed of "putchar()".

Interestingly, if I compile the program statically, I don't see this 
effect, and it's noticeably faster still:

	torvalds@home:~> gcc -O2 -static test.c 
	torvalds@home:~> time ./a.out > /dev/null 

	real    0m0.193s
	user    0m0.191s
	sys     0m0.002s

	torvalds@home:~> time LD_ASSUME_KERNEL=2.4.1 ./a.out > /dev/null 

	real    0m0.194s
	user    0m0.190s
	sys     0m0.004s

Is the TLS stuff done through an extra dynamically loaded indirection or
something?

		Linus



* Re: ext3 performance inconsistencies, 2.4/2.6
  2003-11-04 22:31               ` Linus Torvalds
@ 2003-11-04 23:48                 ` Ulrich Drepper
  2003-11-04 23:56                   ` Linus Torvalds
  2003-11-05  0:58                   ` jlnance
  0 siblings, 2 replies; 17+ messages in thread
From: Ulrich Drepper @ 2003-11-04 23:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Bill Rugolsky Jr., Paul Venezia, Kernel Mailing List

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Linus Torvalds wrote:

> Is the TLS stuff done through an extra dynamically loaded indirection or
> something?

This has nothing to do with TLS.  The code currently in use ends up using
the general libpthread locking code.  This was, I think, the result of one
of the last changes in the locking code where the libc side wasn't
updated correctly.  I've done this and this is what I see:

drepper@ht 20031104-2$ time ./u > /dev/null
real    0m1.272s
user    0m1.270s
sys     0m0.000s

drepper@ht 20031104-2$ time LD_ASSUME_KERNEL=2.4.1 ./u > /dev/null
real    0m0.316s
user    0m0.320s
sys     0m0.000s

drepper@ht 20031104-2$ time LD_LIBRARY_PATH=. ./u > /dev/null
real    0m0.207s
user    0m0.210s
sys     0m0.000s



The first is the old nptl code, the second LinuxThreads, the third the
current nptl code.

- -- 
- --------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/qDrM2ijCOnn/RHQRAnAyAJ48OxeRGWefxHMVImZMiuZ2YaueOwCgk+8A
9k3SC5sMLghNmlMmzKwWv/E=
=UT6g
-----END PGP SIGNATURE-----



* Re: ext3 performance inconsistencies, 2.4/2.6
  2003-11-04 23:48                 ` Ulrich Drepper
@ 2003-11-04 23:56                   ` Linus Torvalds
  2003-11-05  0:58                   ` jlnance
  1 sibling, 0 replies; 17+ messages in thread
From: Linus Torvalds @ 2003-11-04 23:56 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Bill Rugolsky Jr., Paul Venezia, Kernel Mailing List


On Tue, 4 Nov 2003, Ulrich Drepper wrote:
>
>  This was, I think, the result of one of the last changes in the locking
> code where the libc side wasn't updated correctly.  I've done this and
> this is what I see:

Goodie. 

> drepper@ht 20031104-2$ time ./u > /dev/null
> real    0m1.272s
> user    0m1.270s
> sys     0m0.000s
> 
> drepper@ht 20031104-2$ time LD_ASSUME_KERNEL=2.4.1 ./u > /dev/null
> real    0m0.316s
> user    0m0.320s
> sys     0m0.000s
> 
> drepper@ht 20031104-2$ time LD_LIBRARY_PATH=. ./u > /dev/null
> real    0m0.207s
> user    0m0.210s
> sys     0m0.000s
> 
> The first is the old nptl code, the second LinuxThreads, the third the
> current nptl code.

Now _that_ looks a hell of a lot better. Thanks.

		Linus



* Re: ext3 performance inconsistencies, 2.4/2.6
  2003-11-04 23:48                 ` Ulrich Drepper
  2003-11-04 23:56                   ` Linus Torvalds
@ 2003-11-05  0:58                   ` jlnance
  2003-11-05  7:08                     ` Jakub Jelinek
  1 sibling, 1 reply; 17+ messages in thread
From: jlnance @ 2003-11-05  0:58 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: linux-kernel

On Tue, Nov 04, 2003 at 03:48:28PM -0800, Ulrich Drepper wrote:
> 
> The first is the old nptl code, the second LinuxThreads, the third the
> current nptl code.

By current, do you mean what is in Fedora, or your personal development copy?

Thanks,

Jim


* Re: ext3 performance inconsistencies, 2.4/2.6
  2003-11-05  0:58                   ` jlnance
@ 2003-11-05  7:08                     ` Jakub Jelinek
  0 siblings, 0 replies; 17+ messages in thread
From: Jakub Jelinek @ 2003-11-05  7:08 UTC (permalink / raw)
  To: jlnance; +Cc: Ulrich Drepper, linux-kernel

On Tue, Nov 04, 2003 at 07:58:16PM -0500, jlnance@unity.ncsu.edu wrote:
> On Tue, Nov 04, 2003 at 03:48:28PM -0800, Ulrich Drepper wrote:
> > 
> > The first is the old nptl code, the second LinuxThreads, the third the
> > current nptl code.
> 
> By current, do you mean what is in Fedora, or your personal development copy?

Ulrich meant glibc CVS HEAD.
For some reason, stdio locking was not using the jump around lock prefix
variant of locking:
            __asm __volatile ("cmpl $0, %%gs:%P6\n\t"                         \
                              "je,pt 0f\n\t"                                  \
                              "lock\n"                                        \
                              "0:\tcmpxchgl %1, %2\n\t"                       \
                              "jnz _L_mutex_lock_%=\n\t"                      \
                              ".subsection 1\n\t" ...
but one without the first 2 insns, so there were 2 instructions with lock
prefix in putc and similar functions even when only one thread was running.

	Jakub
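
The idea behind the jump-around-lock-prefix variant can be sketched in portable C: consult a "more than one thread exists" flag first, and only pay for a bus-locked instruction when the flag is set. This is an illustration, not the glibc code. The global sketch_multiple_threads is a hypothetical stand-in for the per-thread field the assembly reads through %gs, and where the assembly still executes cmpxchgl without the lock prefix, the sketch uses a relaxed load and store instead.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag: set once a second thread has been created. */
bool sketch_multiple_threads = false;

/* Try to take the lock word (0 = free, 1 = held).  Returns true on
 * success.  In the single-threaded case no atomic read-modify-write
 * is issued at all. */
bool fastpath_trylock(atomic_int *word)
{
	int expected = 0;

	if (!sketch_multiple_threads) {
		/* Nobody can race with us: plain load, then plain store. */
		if (atomic_load_explicit(word, memory_order_relaxed) != 0)
			return false;
		atomic_store_explicit(word, 1, memory_order_relaxed);
		return true;
	}
	/* Other threads may exist: use a real atomic compare-and-swap. */
	return atomic_compare_exchange_strong_explicit(
		word, &expected, 1,
		memory_order_acquire, memory_order_relaxed);
}
```

Both branches give the same result for a single caller; the difference is only in the cost of the instructions executed, which is what the putchar() timings earlier in the thread measure.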


* Re: ext3 performance inconsistencies, 2.4/2.6
  2003-11-04 22:19             ` Bill Rugolsky Jr.
  2003-11-04 22:26               ` Bill Rugolsky Jr.
@ 2003-11-05  7:14               ` Jakub Jelinek
  1 sibling, 0 replies; 17+ messages in thread
From: Jakub Jelinek @ 2003-11-05  7:14 UTC (permalink / raw)
  To: Bill Rugolsky Jr.,
	Linus Torvalds, Paul Venezia, Kernel Mailing List,
	Ulrich Drepper

On Tue, Nov 04, 2003 at 05:19:04PM -0500, Bill Rugolsky Jr. wrote:
> On Fedora 0.95, Pentium M 1.6GHz, 2.4.22-1.2115.nptl, glibc-2.3.2-10, (NPTL 0.60),
> I get:
> 
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> NPTL           100M 13070 100 +++++ +++ 14141   4 13099 100 +++++ +++ +++++ +++
> LinuxThreads   100M 25957 100 +++++ +++ 20037   5 26777  99 +++++ +++ +++++ +++
> 
> Ugh, still there.

BTW, there are 3 different cases where locking might be different in glibc.
When -lpthread is not linked in, when -lpthread is linked in but
pthread_create hasn't been called yet, and when the first pthread_create
has already been called.

Could you post numbers for all these cases (ie. run the benchmark, then link
the benchmark against -lpthread as well and rerun it and last link it
against -lpthread and add:
static void * tf (void *a) { return NULL; }

...
pthread_t pt;
pthread_create (&pt, NULL, tf, 0);
pthread_join (pt, NULL);
...
to benchmark's main (in each case NPTL and LinuxThreads)?

	Jakub


Thread overview: 17+ messages
2003-11-04 19:10 ext3 performance inconsistencies, 2.4/2.6 Paul Venezia
2003-11-04 19:36 ` Linus Torvalds
2003-11-04 20:20   ` Bill Rugolsky Jr.
2003-11-04 20:30     ` Linus Torvalds
2003-11-04 21:07       ` Paul Venezia
2003-11-04 21:28         ` Bill Rugolsky Jr.
2003-11-04 21:40           ` Linus Torvalds
2003-11-04 22:00             ` Ulrich Drepper
2003-11-04 22:31               ` Linus Torvalds
2003-11-04 23:48                 ` Ulrich Drepper
2003-11-04 23:56                   ` Linus Torvalds
2003-11-05  0:58                   ` jlnance
2003-11-05  7:08                     ` Jakub Jelinek
2003-11-04 22:19             ` Bill Rugolsky Jr.
2003-11-04 22:26               ` Bill Rugolsky Jr.
2003-11-05  7:14               ` Jakub Jelinek
2003-11-04 21:39       ` Bill Rugolsky Jr.
