linux-kernel.vger.kernel.org archive mirror
* Re: threading question
@ 2001-06-13 19:05 Hubertus Franke
  0 siblings, 0 replies; 34+ messages in thread
From: Hubertus Franke @ 2001-06-13 19:05 UTC (permalink / raw)
  To: bert hubert; +Cc: linux-kernel



>I got that response too. When I pressed kernel people for details it turns
>out that they think having hundreds of runnable threads/processes (mostly
>the same thing under Linux) is wasteful. The scheduler is just not optimised
>for that.

Try out the http://lse.sourceforge.net/scheduling patches. The MQ kernel
scheduler sure can handle this kind of load :-)

Hubertus Franke
Enterprise Linux Group (Mgr), Linux Technology Center (Member Scalability),
OS-PIC (Chair)
email: frankeh@us.ibm.com
(w) 914-945-2003    (fax) 914-945-4425   TL: 862-2003



bert hubert <ahu@ds9a.nl>@vger.kernel.org on 06/13/2001 01:31:39 PM

Sent by:  linux-kernel-owner@vger.kernel.org


To:   linux-kernel@vger.kernel.org
cc:
Subject:  Re: threading question



On Tue, Jun 12, 2001 at 12:06:40PM -0700, Kip Macy wrote:
> This may sound like flamebait, but its not. Linux threads are basically
> just processes that share the same address space. Their performance is
> measurably worse than it is on most commercial Unixes and FreeBSD.

Thread creation may be a bit slow. But the kludges to provide posix threads
completely from userspace also hurt. Notably, they do not scale over
multiple CPUs.

> They are not, or at least two years ago, were not POSIX compliant
> (they behaved badly with respect to signals). The impoverished

POSIX threads are silly with respect to signals. I do almost all my
programming these days with pthreads and I find that I really do not miss
signals at all.

> from Larry McVoy's home page attributed to Alan Cox illustrates this
> reasonably well: "A computer is a state machine. Threads are for people
> who can't program state machines." Sorry for not being more helpful.

I got that response too. When I pressed kernel people for details it turns
out that they think having hundreds of runnable threads/processes (mostly
the same thing under Linux) is wasteful. The scheduler is just not optimised
for that.

Regards,

bert

--
http://www.PowerDNS.com      Versatile DNS Services
Trilab                       The Technology People
'SYN! .. SYN|ACK! .. ACK!' - the mating call of the internet
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
       [not found] ` <fa.e54jbkv.kg4r99@ifi.uio.no>
@ 2001-06-16 22:22   ` Dan Maas
  0 siblings, 0 replies; 34+ messages in thread
From: Dan Maas @ 2001-06-16 22:22 UTC (permalink / raw)
  To: linux-kernel; +Cc: Michael Rothwell, russell.leighton

> Is there a user-space implementation (library?) for
> coroutines that would work from C?

Here is another one:

http://oss.sgi.com/projects/state-threads/


Regards,
Dan


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-16 15:19       ` Alan Cox
  2001-06-16 18:33         ` Russell Leighton
@ 2001-06-16 19:06         ` Michael Rothwell
  1 sibling, 0 replies; 34+ messages in thread
From: Michael Rothwell @ 2001-06-16 19:06 UTC (permalink / raw)
  To: Russell Leighton; +Cc: linux-kernel

Try this:

http://lecker.essen.de/~froese/coro/

-M

On 16 Jun 2001 14:33:50 -0400, Russell Leighton wrote:
> 
> Is there a user-space implementation (library?) for coroutines that would work from C?
> 
> 
> Alan Cox wrote:
> 
> > > Can you provide any info and/or examples of co-routines? I'm curious to
> > > see a good example of co-routines' "betterness."
> >
> > With co-routines you don't need
> >
> >         8K of kernel stack
> >         Scheduler overhead
> >         Fancy locking
> >
> > You don't get the automatic thread switching stuff though.
> >
> > So you might get code that reads like this (note that aio_ stuff works rather
> > well combined with co-routines as it fixes a lack of asynchronicity in the
> > unix disk I/O world)
> >
> >         select(....)
> >
> >         if(FD_ISSET(copier_fd))
> >                 run_coroutine(&copier_state);
> >
> >         ...
> >
> > and the copier might be something like
> >
> >         while(1)
> >         {
> >                 // Yes 1 at a time is dumb but this is an example..
> >                 // Yes Im ignoring EOF for this
> >                 if(read(copier_fd, &buf[bufptr], 1)==-1)
> >                 {
> >                         if(errno==EWOULDBLOCK)
> >                         {
> >                                 coroutine_return();
> >                                 continue;
> >                         }
> >                 }
> >                 if(bufptr==255  || buf[bufptr]=='\n')
> >                 {
> >                         run_coroutine(run_command, buf);
> >                         bufptr=0;
> >                 }
> >                 else
> >                         bufptr++;
> >         }
> >
> > it lets you express a state machine as a set of multiple such small state
> > machines instead.  run_coroutine() will continue a routine where it last
> > coroutine_return()'d from. Thus in the above case we are expressing read
> > bytes until you see a new line cleanly - not mangled in with keeping state
> > in global structures but by using natural C local variables and code flow
> >
> > Alan
> >
> 
> --
> ---------------------------------------------------
> Russell Leighton    russell.leighton@247media.com
> ---------------------------------------------------
> 
> 
--
Michael Rothwell
rothwell@holly-springs.nc.us



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-16 15:19       ` Alan Cox
@ 2001-06-16 18:33         ` Russell Leighton
  2001-06-16 19:06         ` Michael Rothwell
  1 sibling, 0 replies; 34+ messages in thread
From: Russell Leighton @ 2001-06-16 18:33 UTC (permalink / raw)
  To: Alan Cox, linux-kernel


Is there a user-space implementation (library?) for coroutines that would work from C?


Alan Cox wrote:

> > Can you provide any info and/or examples of co-routines? I'm curious to
> > see a good example of co-routines' "betterness."
>
> With co-routines you don't need
>
>         8K of kernel stack
>         Scheduler overhead
>         Fancy locking
>
> You don't get the automatic thread switching stuff though.
>
> So you might get code that reads like this (note that aio_ stuff works rather
> well combined with co-routines as it fixes a lack of asynchronicity in the
> unix disk I/O world)
>
>         select(....)
>
>         if(FD_ISSET(copier_fd))
>                 run_coroutine(&copier_state);
>
>         ...
>
> and the copier might be something like
>
>         while(1)
>         {
>                 // Yes 1 at a time is dumb but this is an example..
>                 // Yes Im ignoring EOF for this
>                 if(read(copier_fd, &buf[bufptr], 1)==-1)
>                 {
>                         if(errno==EWOULDBLOCK)
>                         {
>                                 coroutine_return();
>                                 continue;
>                         }
>                 }
>                 if(bufptr==255  || buf[bufptr]=='\n')
>                 {
>                         run_coroutine(run_command, buf);
>                         bufptr=0;
>                 }
>                 else
>                         bufptr++;
>         }
>
> it lets you express a state machine as a set of multiple such small state
> machines instead.  run_coroutine() will continue a routine where it last
> coroutine_return()'d from. Thus in the above case we are expressing read
> bytes until you see a new line cleanly - not mangled in with keeping state
> in global structures but by using natural C local variables and code flow
>
> Alan
>

--
---------------------------------------------------
Russell Leighton    russell.leighton@247media.com
---------------------------------------------------



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-16 14:16     ` Michael Rothwell
@ 2001-06-16 15:19       ` Alan Cox
  2001-06-16 18:33         ` Russell Leighton
  2001-06-16 19:06         ` Michael Rothwell
  0 siblings, 2 replies; 34+ messages in thread
From: Alan Cox @ 2001-06-16 15:19 UTC (permalink / raw)
  To: Michael Rothwell; +Cc: Alan Cox, linux-kernel

> Can you provide any info and/or examples of co-routines? I'm curious to
> see a good example of co-routines' "betterness."

With co-routines you don't need

	8K of kernel stack
	Scheduler overhead
	Fancy locking

You don't get the automatic thread switching stuff though.

So you might get code that reads like this (note that aio_ stuff works rather
well combined with co-routines as it fixes a lack of asynchronicity in the
unix disk I/O world)


	select(....)

	if(FD_ISSET(copier_fd))
		run_coroutine(&copier_state);

	...


and the copier might be something like

	while(1)
	{
		// Yes 1 at a time is dumb but this is an example..
		// Yes Im ignoring EOF for this
		if(read(copier_fd, &buf[bufptr], 1)==-1)
		{
			if(errno==EWOULDBLOCK)
			{
				coroutine_return();
				continue;
			}
		}
		if(bufptr==255  || buf[bufptr]=='\n')
		{
			run_coroutine(run_command, buf);
			bufptr=0;
		}
		else
			bufptr++;
	}


it lets you express a state machine as a set of multiple such small state
machines instead.  run_coroutine() will continue a routine where it last
coroutine_return()'d from. Thus in the above case we are expressing read
bytes until you see a new line cleanly - not mangled in with keeping state
in global structures but by using natural C local variables and code flow
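
A minimal sketch of how run_coroutine() and coroutine_return() could be built
in user space, assuming the <ucontext.h> calls (getcontext, makecontext,
swapcontext) are available. Everything else here (struct coroutine,
coroutine_init, struct feed, line_counter) is a hypothetical name made up for
illustration, and the in-memory feed merely stands in for a non-blocking fd.

```c
#include <assert.h>
#include <stddef.h>
#include <ucontext.h>

/* All names here are hypothetical; only the ucontext calls are real APIs. */
struct coroutine {
    ucontext_t caller;          /* where to go back to on coroutine_return() */
    ucontext_t self;            /* saved state of the coroutine itself */
    void (*fn)(void *);
    void *arg;
    int finished;
    char stack[64 * 1024];      /* private user-space stack */
};

static struct coroutine *current;   /* coroutine that is running right now */

/* First frame on the coroutine stack: run fn, then mark it finished. */
static void trampoline(void)
{
    struct coroutine *co = current;
    co->fn(co->arg);
    co->finished = 1;           /* returning resumes uc_link, i.e. the caller */
}

static int coroutine_init(struct coroutine *co, void (*fn)(void *), void *arg)
{
    if (getcontext(&co->self) == -1)
        return -1;
    co->self.uc_stack.ss_sp = co->stack;
    co->self.uc_stack.ss_size = sizeof co->stack;
    co->self.uc_link = &co->caller;
    co->fn = fn;
    co->arg = arg;
    co->finished = 0;
    makecontext(&co->self, trampoline, 0);
    return 0;
}

/* Continue the coroutine where it last called coroutine_return(). */
static void run_coroutine(struct coroutine *co)
{
    if (co->finished)
        return;
    current = co;
    swapcontext(&co->caller, &co->self);
    current = NULL;
}

/* Give control back to whoever called run_coroutine(). */
static void coroutine_return(void)
{
    struct coroutine *co = current;
    assert(co != NULL);
    swapcontext(&co->self, &co->caller);
}

/* Input handed to the coroutine; stands in for a non-blocking fd. */
struct feed { const char *data; size_t len; int lines; };

/* Counts lines; buf and bufptr stay in ordinary local variables. */
static void line_counter(void *arg)
{
    struct feed *f = arg;
    char buf[256];
    size_t bufptr = 0;

    for (;;) {
        while (f->len == 0)
            coroutine_return();     /* the "would block" case */
        buf[bufptr] = *f->data++;
        f->len--;
        if (bufptr == sizeof buf - 1 || buf[bufptr] == '\n') {
            f->lines++;             /* a full line (or full buffer) is ready */
            bufptr = 0;
        } else {
            bufptr++;
        }
    }
}
```

The caller refills the feed and calls run_coroutine() again whenever more
input is available; the partially read line survives in buf and bufptr on the
coroutine's own stack, with no kernel thread or locking involved.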

Alan




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-14 18:28   ` Alan Cox
  2001-06-14 19:01     ` bert hubert
  2001-06-14 23:05     ` J . A . Magallon
@ 2001-06-16 14:16     ` Michael Rothwell
  2001-06-16 15:19       ` Alan Cox
  2 siblings, 1 reply; 34+ messages in thread
From: Michael Rothwell @ 2001-06-16 14:16 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

On 14 Jun 2001 19:28:32 +0100, Alan Cox wrote:

> Co-routines or better language choices are much more efficient ways to express
> the event handling problem.

Can you provide any info and/or examples of co-routines? I'm curious to
see a good example of co-routines' "betterness."

Thanks,

--
Michael Rothwell
rothwell@holly-springs.nc.us



^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: threading question
  2001-06-14 19:01     ` bert hubert
  2001-06-14 19:22       ` Russell Leighton
@ 2001-06-15 11:29       ` Anil Kumar
  1 sibling, 0 replies; 34+ messages in thread
From: Anil Kumar @ 2001-06-15 11:29 UTC (permalink / raw)
  To: bert hubert, Alan Cox; +Cc: Kip Macy, ognen, linux-kernel

When an application uses only a small subset of the primitives provided by
pthreads, it still carries the maintenance burden of all the others. So I too
feel that, when only a small part is needed, it is better to implement that
part in our own way for performance reasons.

-----Original Message-----
From: linux-kernel-owner@vger.kernel.org
[mailto:linux-kernel-owner@vger.kernel.org]On Behalf Of bert hubert
Sent: Friday, June 15, 2001 12:32 AM
To: Alan Cox
Cc: Kip Macy; ognen@gene.pbi.nrc.ca; linux-kernel@vger.kernel.org
Subject: Re: threading question


On Thu, Jun 14, 2001 at 07:28:32PM +0100, Alan Cox wrote:

> There are really only two reasons for threaded programming.
>
> - Poor programmer skills/language expression of event handling

The converse is that pthreads are:

 - Very easy to use from C at a reasonable runtime overhead

It is very convenient for a userspace coder to be able to just start a
function in a different thread. Now it might be so that a kernel is not
there to provide ease of use for userspace coders but it is a factor.

I see lots of people only using:
	pthread_create()/pthread_join()
	mutex_lock/unlock
	sem_post/sem_wait
	no signals

My gut feeling is that you could implement this subset in a way that is both
fast and right - although it would not be 'pthreads compliant'. Can anybody
confirm this feeling?

Regards,

bert

--
http://www.PowerDNS.com      Versatile DNS Services
Trilab                       The Technology People
'SYN! .. SYN|ACK! .. ACK!' - the mating call of the internet



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-14 18:28   ` Alan Cox
  2001-06-14 19:01     ` bert hubert
@ 2001-06-14 23:05     ` J . A . Magallon
  2001-06-16 14:16     ` Michael Rothwell
  2 siblings, 0 replies; 34+ messages in thread
From: J . A . Magallon @ 2001-06-14 23:05 UTC (permalink / raw)
  To: Alan Cox; +Cc: Kip Macy, ognen, linux-kernel


On 20010614 Alan Cox wrote:
>
>So you have two choices
>1.	Pthread performance is poorer due to library glue
>2.	Every single signal delivery is 20% slower threaded or otherwise due
>	to all the crap that it adds 
>	And it does damage to other calls too.
>

Pthreads are a standard. You say 'use linux native calls, are faster and
make signal management efficient'. But then portability goes to hell. Now
I can run the same code on Linux, Irix and Solaris. Your way, I would
have to write three versions with clone(), sproc() and lwp_xxxx().
Take the example of OpenGL on IRIX boxes. Some time ago it was a wrapper over
IrisGL. Now it is native. If you have a notably poor implementation of
a standard nobody will use your system.

>In the big picture #1 is definitely preferable. 
>
>There are really only two reasons for threaded programming. 
>
>- Poor programmer skills/language expression of event handling
>
>- OS implementation flaws (and yes the posix/sus unix api has some of these)
>
>Co-routines or better language choices are much more efficient ways to express
>the event handling problem.
>
>fork() is often a better approach than pthreads at least for the design of an
>SMP threaded application because unless you explicitly think about what you
>share you will never get the cache affinity you need for good performance.
>

Joking? That only works if your most complex structure is an array. Try
taking a rendering program with a complex linked list/tree data structure
for the geometry, materials, textures, etc. and thinking about cache
affinity. You can only think about that locally: mmm, I need a counter for
each thread, I will not put them all in an array because that will thrash
the caches, let's put them in separate variables; I need to return data to
a segment of a big array, let's use a local copy and then pass it back.
But no more. Yes, you can change all your malloc() or new calls for shm's,
but what is the gain? That is the beauty of shared memory boxes.

What linux needs is a good implementation for POSIX threads. I do not mean
putting pthreads right into the kernel, but perhaps some small change or
addition can make the user space much much faster. There are many apps that
can benefit much from using threads, use a big data segment in ro mode,
and just communicate a bit between them (a threaded web server, a rendering
program).

-- 
J.A. Magallon                           #  Let the source be with you...        
mailto:jamagallon@able.es
Linux Mandrake release 8.1 (Cooker) for i586
Linux werewolf 2.4.5-ac13 #1 SMP Sun Jun 10 21:42:28 CEST 2001 i686

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-14 19:01     ` bert hubert
@ 2001-06-14 19:22       ` Russell Leighton
  2001-06-15 11:29       ` Anil Kumar
  1 sibling, 0 replies; 34+ messages in thread
From: Russell Leighton @ 2001-06-14 19:22 UTC (permalink / raw)
  To: linux-kernel


bert hubert wrote:

> <stuff deleted>
>
> I see lots of people only using:
>         pthread_create()/pthread_join()
>         mutex_lock/unlock
>         sem_post/sem_wait
>         no signals
>
> My gut feeling is that you could implement this subset in a way that is both
> fast and right - although it would not be 'pthreads compliant'. Can anybody
> confirm this feeling?

... add condition variables (maybe a small per-thread storage area)
and I'd toss out pthreads for most apps I write...especially if it is very efficient.

--
---------------------------------------------------
Russell Leighton    russell.leighton@247media.com
---------------------------------------------------



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-14 18:28   ` Alan Cox
@ 2001-06-14 19:01     ` bert hubert
  2001-06-14 19:22       ` Russell Leighton
  2001-06-15 11:29       ` Anil Kumar
  2001-06-14 23:05     ` J . A . Magallon
  2001-06-16 14:16     ` Michael Rothwell
  2 siblings, 2 replies; 34+ messages in thread
From: bert hubert @ 2001-06-14 19:01 UTC (permalink / raw)
  To: Alan Cox; +Cc: Kip Macy, ognen, linux-kernel

On Thu, Jun 14, 2001 at 07:28:32PM +0100, Alan Cox wrote:

> There are really only two reasons for threaded programming. 
> 
> - Poor programmer skills/language expression of event handling

The converse is that pthreads are:

 - Very easy to use from C at a reasonable runtime overhead

It is very convenient for a userspace coder to be able to just start a
function in a different thread. Now it might be so that a kernel is not
there to provide ease of use for userspace coders but it is a factor.

I see lots of people only using:
	pthread_create()/pthread_join()
	mutex_lock/unlock
	sem_post/sem_wait
	no signals
	
My gut feeling is that you could implement this subset in a way that is both
fast and right - although it would not be 'pthreads compliant'. Can anybody
confirm this feeling?
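
For reference, a small sketch of a program that stays inside that subset,
using the standard POSIX names (pthread_create/pthread_join,
pthread_mutex_lock/unlock, sem_post/sem_wait) and no signals. The names
struct shared, bump and count_with_threads are made up for this example, and
MAX_THREADS is an arbitrary cap.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

#define MAX_THREADS 16      /* arbitrary cap for this sketch */

struct shared {
    pthread_mutex_t lock;
    sem_t finished;         /* posted once by every worker when it is done */
    long counter;
    int iterations;
};

static void *bump(void *p)
{
    struct shared *s = p;

    for (int i = 0; i < s->iterations; i++) {
        pthread_mutex_lock(&s->lock);
        s->counter++;
        pthread_mutex_unlock(&s->lock);
    }
    sem_post(&s->finished);
    return NULL;
}

/* Starts nthreads workers; each adds `iterations` to a shared counter. */
static long count_with_threads(int nthreads, int iterations)
{
    struct shared s;
    pthread_t t[MAX_THREADS];
    int started = 0;

    assert(nthreads >= 1 && nthreads <= MAX_THREADS);
    pthread_mutex_init(&s.lock, NULL);
    sem_init(&s.finished, 0, 0);
    s.counter = 0;
    s.iterations = iterations;

    while (started < nthreads &&
           pthread_create(&t[started], NULL, bump, &s) == 0)
        started++;

    for (int i = 0; i < started; i++)       /* one wait per worker's post */
        while (sem_wait(&s.finished) == -1 && errno == EINTR)
            ;
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);

    sem_destroy(&s.finished);
    pthread_mutex_destroy(&s.lock);
    return s.counter;
}
```

Calling count_with_threads(4, 10000) should return 40000 as long as all four
threads could be created.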

Regards,

bert

-- 
http://www.PowerDNS.com      Versatile DNS Services  
Trilab                       The Technology People   
'SYN! .. SYN|ACK! .. ACK!' - the mating call of the internet

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-12 19:06 ` Kip Macy
  2001-06-12 19:14   ` Alexander Viro
  2001-06-13 17:31   ` bert hubert
@ 2001-06-14 18:28   ` Alan Cox
  2001-06-14 19:01     ` bert hubert
                       ` (2 more replies)
  2 siblings, 3 replies; 34+ messages in thread
From: Alan Cox @ 2001-06-14 18:28 UTC (permalink / raw)
  To: Kip Macy; +Cc: ognen, linux-kernel

> just processes that share the same address space. Their performance is
> measurably worse than it is on most commercial Unixes and FreeBSD.

Actually their performance is massively superior. But that is because we were
not stupid enough to burden the kernel with all of the posix pthread crap.
Pthreads is an ugly compromise API that can be badly implemented in both
userland and kernel space. Unfortunately it's also a standard.

So you have two choices
1.	Pthread performance is poorer due to library glue
2.	Every single signal delivery is 20% slower threaded or otherwise due
	to all the crap that it adds 
	And it does damage to other calls too.

In the big picture #1 is definitely preferable. 

There are really only two reasons for threaded programming. 

- Poor programmer skills/language expression of event handling

- OS implementation flaws (and yes the posix/sus unix api has some of these)

Co-routines or better language choices are much more efficient ways to express
the event handling problem.

fork() is often a better approach than pthreads at least for the design of an
SMP threaded application because unless you explicitly think about what you
share you will never get the cache affinity you need for good performance.

And if you don't care about cache affinity then you shouldn't care about
pthread_create overhead because quite frankly pthread_create overhead is easily
mitigated (thread cache) and in most real world applications considerably less
of a performance hit

Alan


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-12 21:48     ` ognen
@ 2001-06-14 18:15       ` Alan Cox
  0 siblings, 0 replies; 34+ messages in thread
From: Alan Cox @ 2001-06-14 18:15 UTC (permalink / raw)
  To: ognen; +Cc: Davide Libenzi, linux-kernel

> they are done. This should help it (and avoid the pthread_create,
> pthread_exit). I will implement this and report my results if there is
> interest.

You should also check up the cache colouring. X86 boxes have relatively poor
memory performance and most x86 chips have lousy behaviour when data bounces
between processors or is driven out of cache

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-13 17:31   ` bert hubert
@ 2001-06-14  6:45     ` Helge Hafting
  0 siblings, 0 replies; 34+ messages in thread
From: Helge Hafting @ 2001-06-14  6:45 UTC (permalink / raw)
  To: bert hubert; +Cc: linux-kernel

bert hubert wrote:

> > from Larry McVoy's home page attributed to Alan Cox illustrates this
> > reasonably well: "A computer is a state machine. Threads are for people
> > who can't program state machines." Sorry for not being more helpful.
> 
> I got that response too. When I pressed kernel people for details it turns
> out that they think having hundreds of runnable threads/processes (mostly
> the same thing under Linux) is wasteful. The scheduler is just not optimised
> for that.

The scheduler can be optimized for that, so far at the cost of pessimizing
the common case with few threads.  The bigger problem here is that
your cpu (particularly its TLBs and caches) isn't optimized for switching
between a lot of threads either.  This will always be a problem as long
as cpus have level 1 caches much smaller than the combined working
set of your threads.  So run one thread per cpu, perhaps two if you expect
io stalls.  The task at hand may easily be divided into many more parts,
but serializing those extra parts will be better for performance.
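
As a rough sketch of that sizing rule: the helper below (worker_count is a
made-up name) asks the system how many CPUs are online and doubles the count
when io stalls are expected. It assumes sysconf(_SC_NPROCESSORS_ONLN), which
glibc and the BSDs provide but which is not part of base POSIX.

```c
#include <assert.h>
#include <unistd.h>

/* One thread per online cpu, two per cpu if io stalls are expected. */
static long worker_count(int expect_io_stalls)
{
    long ncpu = sysconf(_SC_NPROCESSORS_ONLN);

    assert(expect_io_stalls == 0 || expect_io_stalls == 1);
    if (ncpu < 1)
        ncpu = 1;           /* fall back if the count is unavailable */
    return expect_io_stalls ? 2 * ncpu : ncpu;
}
```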

Helge Hafting

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-12 19:06 ` Kip Macy
  2001-06-12 19:14   ` Alexander Viro
@ 2001-06-13 17:31   ` bert hubert
  2001-06-14  6:45     ` Helge Hafting
  2001-06-14 18:28   ` Alan Cox
  2 siblings, 1 reply; 34+ messages in thread
From: bert hubert @ 2001-06-13 17:31 UTC (permalink / raw)
  To: linux-kernel

On Tue, Jun 12, 2001 at 12:06:40PM -0700, Kip Macy wrote:
> This may sound like flamebait, but its not. Linux threads are basically
> just processes that share the same address space. Their performance is
> measurably worse than it is on most commercial Unixes and FreeBSD.

Thread creation may be a bit slow. But the kludges to provide posix threads
completely from userspace also hurt. Notably, they do not scale over
multiple CPUs.

> They are not, or at least two years ago, were not POSIX compliant
> (they behaved badly with respect to signals). The impoverished

POSIX threads are silly with respect to signals. I do almost all my
programming these days with pthreads and I find that I really do not miss
signals at all.

> from Larry McVoy's home page attributed to Alan Cox illustrates this
> reasonably well: "A computer is a state machine. Threads are for people
> who can't program state machines." Sorry for not being more helpful.

I got that response too. When I pressed kernel people for details it turns
out that they think having hundreds of runnable threads/processes (mostly
the same thing under Linux) is wasteful. The scheduler is just not optimised
for that.

Regards,

bert

-- 
http://www.PowerDNS.com      Versatile DNS Services  
Trilab                       The Technology People   
'SYN! .. SYN|ACK! .. ACK!' - the mating call of the internet

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-13 14:17         ` Philips
@ 2001-06-13 15:06           ` ognen
  0 siblings, 0 replies; 34+ messages in thread
From: ognen @ 2001-06-13 15:06 UTC (permalink / raw)
  To: Philips; +Cc: linux-kernel

Solaris has pset_create() and pset_bind() where you can bind LWPs to
specific processors, but I doubt this works on anything else....

Best regards,
Ognen

On Wed, 13 Jun 2001, Philips wrote:

> 	BTW.
> 	A question kept popping up in my mind, and my mind finally answered it in the negative ;-)
>
> 	Is it possible to do something like this:
>
>
> 	char a[100] = {...}
> 	char b[100] = {...}
> 	char c[100];
> 	char d[100];
>
> 	1: { // run this on first CPU
> 		for (int i=0; i<100; i++) c[i] = a[i] + b[i];
> 	};
> 	2: { // run this on any other CPU
> 		for (int i=0; i<100; i++) d[i] = a[i] * b[i];
> 	};
>
> 	...
> 	// do something else...
> 	...
>
> 	wait 1,2; // to be sure c[] and d[] are ready.
>
>
> 	What was popping up in my mind: some prefix (like the 0x66 that Intel uses
> for 32-bit instructions) to say that this instruction should run on another CPU?
> 	I know - stupid idea. Too many questions would arise.
> 	Say we do
>
> 	PREFIX jmp far some_routine
>
> 	and this routine runs on another CPU without blocking the current execution thread.
> 	(Who will clean the stack? When? Questions without answers...)
>
> 	Is there anything like this in the computer world? I have heard about old computers
> that had a special instruction set to implicitly run code on a given processor.
> 	Is it possible to emulate this behavior on PCs?

-- 
Ognen Duzlevski
Plant Biotechnology Institute
National Research Council of Canada
Bioinformatics team


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-13 13:35       ` J . A . Magallon
@ 2001-06-13 14:17         ` Philips
  2001-06-13 15:06           ` ognen
  0 siblings, 1 reply; 34+ messages in thread
From: Philips @ 2001-06-13 14:17 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1818 bytes --]

"J . A . Magallon" wrote:
> 
> On 20010613 Kurt Garloff wrote:
> >
> > What I do in my numerics code to avoid this problem, is to create all the
> > threads (as many as there are CPUs) on program startup and have them wait
> > (block) for a condition. As soon as there's something to do, variables for
> > the thread are set up (protected by a mutex) and the thread gets signalled
> > (cond_signal).
> > If you're interested in the code, tell me.
> >
> 
> I use the reverse approach. You feed work to the threads; I create the threads
> and let them ask a master for work until it says 'done'. When the
> master is queried for work, it locks a mutex, decides the next piece of work
> for that thread, and unlocks it. I think it gives less contention and
> is simpler to manage.
> 

	BTW.
	A question kept popping up in my mind, and my mind finally answered it in the negative ;-)

	Is it possible to do something like this:


	char a[100] = {...}
	char b[100] = {...}
	char c[100];
	char d[100];
	
	1: { // run this on first CPU
		for (int i=0; i<100; i++) c[i] = a[i] + b[i];
	};
	2: { // run this on any other CPU
		for (int i=0; i<100; i++) d[i] = a[i] * b[i];
	};
	
	...
	// do something else...
	...
	
	wait 1,2; // to be sure c[] and d[] are ready.


	What was popping up in my mind: some prefix (like the 0x66 that Intel uses
for 32-bit instructions) to say that this instruction should run on another CPU?
	I know - stupid idea. Too many questions would arise.
	Say we do

	PREFIX jmp far some_routine

	and this routine runs on another CPU without blocking the current execution thread.
	(Who will clean the stack? When? Questions without answers...)

	Is there anything like this in the computer world? I have heard about old computers
that had a special instruction set to implicitly run code on a given processor.
	Is it possible to emulate this behavior on PCs?
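
Without any new instruction prefix, the two labelled blocks and the final
"wait 1,2" can be approximated with plain pthreads, as in the sketch below.
This is only an illustration: the names vec_job, add_block, mul_block and
add_and_mul are made up, the arrays are int rather than char to avoid
overflow in the products, and the scheduler (not the program) decides which
CPU each thread runs on.

```c
#include <assert.h>
#include <pthread.h>

#define N 100

struct vec_job { const int *a, *b; int *out; };

/* Block 1: c[i] = a[i] + b[i] */
static void *add_block(void *p)
{
    struct vec_job *j = p;

    for (int i = 0; i < N; i++)
        j->out[i] = j->a[i] + j->b[i];
    return NULL;
}

/* Block 2: d[i] = a[i] * b[i] */
static void *mul_block(void *p)
{
    struct vec_job *j = p;

    for (int i = 0; i < N; i++)
        j->out[i] = j->a[i] * j->b[i];
    return NULL;
}

/* Runs both blocks concurrently; returns 0 once c[] and d[] are ready. */
static int add_and_mul(const int *a, const int *b, int *c, int *d)
{
    struct vec_job j1 = { a, b, c }, j2 = { a, b, d };
    pthread_t t1, t2;

    assert(c != d);                 /* the two blocks must not share output */
    if (pthread_create(&t1, NULL, add_block, &j1) != 0)
        return -1;
    if (pthread_create(&t2, NULL, mul_block, &j2) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }

    /* ... do something else here ... */

    pthread_join(t1, NULL);         /* "wait 1,2" */
    pthread_join(t2, NULL);
    return 0;
}
```

For arrays this small the cost of creating the threads will dominate the
arithmetic, so this only pays off for much larger blocks of work.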

[-- Attachment #2: Card for Philips --]
[-- Type: text/x-vcard, Size: 407 bytes --]

begin:vcard 
n:Filiapau;Ihar
tel;pager:+375 (0) 17 2850000#6683
tel;fax:+375 (0) 17 2841537
tel;home:+375 (0) 17 2118441
tel;work:+375 (0) 17 2841371
x-mozilla-html:TRUE
url:www.iph.to
org:Enformatica Ltd.;Linux Developement Department
adr:;;Kalinine str. 19-18;Minsk;BY;220012;Belarus
version:2.1
email;internet:philips@iph.to
title:Software Developer
note:(none)
x-mozilla-cpt:;18368
fn:Philips
end:vcard

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-13 12:20     ` Kurt Garloff
@ 2001-06-13 13:35       ` J . A . Magallon
  2001-06-13 14:17         ` Philips
  0 siblings, 1 reply; 34+ messages in thread
From: J . A . Magallon @ 2001-06-13 13:35 UTC (permalink / raw)
  To: Kurt Garloff; +Cc: ognen, Christoph Hellwig, linux-kernel


On 20010613 Kurt Garloff wrote:
> 
> What I do in my numerics code to avoid this problem, is to create all the
> threads (as many as there are CPUs) on program startup and have them wait
> (block) for a condition. As soon as there's something to do, variables for
> the thread are set up (protected by a mutex) and the thread gets signalled
> (cond_signal).
> If you're interested in the code, tell me.
> 

I use the reverse approach. You feed work to the threads; I create the threads
and let them ask a master for work until it says 'done'. When the
master is queried for work, it locks a mutex, decides the next piece of work
for that thread, and unlocks it. I think it gives less contention and
is simpler to manage.
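
A minimal sketch of that pull model, under the assumption that the "work" is
simply squaring one array element. The names struct master,
master_next_work, pull_worker and square_all are invented for the example;
only the pthread calls are real.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 16      /* arbitrary cap for this sketch */

struct master {
    pthread_mutex_t lock;
    int next;               /* next unassigned work item */
    int total;              /* number of work items */
    const int *in;
    long *out;
};

/* Ask the master for work: returns an item index, or -1 for 'done'. */
static int master_next_work(struct master *m)
{
    int item;

    pthread_mutex_lock(&m->lock);
    item = (m->next < m->total) ? m->next++ : -1;
    pthread_mutex_unlock(&m->lock);
    return item;
}

static void *pull_worker(void *p)
{
    struct master *m = p;
    int i;

    while ((i = master_next_work(m)) != -1)
        m->out[i] = (long)m->in[i] * m->in[i];  /* the "work": square it */
    return NULL;
}

/* Squares in[0..n-1] into out[] using up to nthreads pulling workers. */
static int square_all(int nthreads, const int *in, long *out, int n)
{
    struct master m;
    pthread_t threads[MAX_THREADS];
    int started = 0;

    assert(nthreads >= 1 && nthreads <= MAX_THREADS);
    pthread_mutex_init(&m.lock, NULL);
    m.next = 0;
    m.total = n;
    m.in = in;
    m.out = out;

    while (started < nthreads &&
           pthread_create(&threads[started], NULL, pull_worker, &m) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    pthread_mutex_destroy(&m.lock);
    return started > 0 ? 0 : -1;
}
```

The mutex is held only long enough to hand out an index, so faster threads
simply come back for more work and the load balances itself.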

-- 
J.A. Magallon                           #  Let the source be with you...        
mailto:jamagallon@able.es
Linux Mandrake release 8.1 (Cooker) for i586
Linux werewolf 2.4.5-ac13 #1 SMP Sun Jun 10 21:42:28 CEST 2001 i686

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-12 19:07   ` ognen
  2001-06-12 19:15     ` Kip Macy
  2001-06-12 19:15     ` Christoph Hellwig
@ 2001-06-13 12:20     ` Kurt Garloff
  2001-06-13 13:35       ` J . A . Magallon
  2 siblings, 1 reply; 34+ messages in thread
From: Kurt Garloff @ 2001-06-13 12:20 UTC (permalink / raw)
  To: ognen; +Cc: Christoph Hellwig, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1162 bytes --]

On Tue, Jun 12, 2001 at 01:07:11PM -0600, ognen@gene.pbi.nrc.ca wrote:
> due to the nature of the problem (a pairwise mutual alignment of n
> sequences results in max. n^2 alignments which can each be done in a
> separate thread), I need to create and destroy the threads frequently.
> 
> I am not really comfortable with 1.4 - 1.5 speedups since the solution was
> intended as a Linux one primarily and it just happened that it works (and
> now even better) on Solaris/SGI/OSF...

Nor would I. 

What I do in my numerics code to avoid this problem, is to create all the
threads (as many as there are CPUs) on program startup and have them wait
(block) for a condition. As soon as there's something to do, variables for
the thread are set up (protected by a mutex) and the thread gets signalled
(cond_signal).
If you're interested in the code, tell me.

This is supposed to be much faster than thread creation.
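
The code itself is not part of this mail, so what follows is only a sketch of
the scheme as described: worker threads are created once, block on a
condition variable, and are woken with pthread_cond_signal() when a job has
been set up under the mutex. The struct pool layout, the pool_* functions,
the single job slot, square_job and the fixed NWORKERS are all assumptions
made for the example.

```c
#include <assert.h>
#include <pthread.h>

#define NWORKERS 4      /* assumption; the text suggests one per CPU */

struct pool {
    pthread_mutex_t lock;
    pthread_cond_t work_ready;  /* a job was posted, or shutdown requested */
    pthread_cond_t changed;     /* the job slot emptied or a job finished */
    void (*job)(void *);
    void *arg;
    int pending;                /* 1 while the slot holds an unclaimed job */
    int active;                 /* jobs currently running */
    int shutdown;
    pthread_t threads[NWORKERS];
};

static void *pool_worker(void *p)
{
    struct pool *pl = p;

    pthread_mutex_lock(&pl->lock);
    for (;;) {
        while (!pl->pending && !pl->shutdown)
            pthread_cond_wait(&pl->work_ready, &pl->lock);
        if (!pl->pending)
            break;                          /* shutdown, nothing left to do */
        void (*job)(void *) = pl->job;
        void *arg = pl->arg;
        pl->pending = 0;
        pl->active++;
        pthread_cond_broadcast(&pl->changed);
        pthread_mutex_unlock(&pl->lock);
        job(arg);                           /* run outside the lock */
        pthread_mutex_lock(&pl->lock);
        pl->active--;
        pthread_cond_broadcast(&pl->changed);
    }
    pthread_mutex_unlock(&pl->lock);
    return NULL;
}

static int pool_init(struct pool *pl)
{
    pthread_mutex_init(&pl->lock, NULL);
    pthread_cond_init(&pl->work_ready, NULL);
    pthread_cond_init(&pl->changed, NULL);
    pl->pending = pl->active = pl->shutdown = 0;
    for (int i = 0; i < NWORKERS; i++)
        if (pthread_create(&pl->threads[i], NULL, pool_worker, pl) != 0)
            return -1;      /* sketch only: no cleanup of earlier threads */
    return 0;
}

static void pool_submit(struct pool *pl, void (*job)(void *), void *arg)
{
    assert(job != NULL);
    pthread_mutex_lock(&pl->lock);
    while (pl->pending)                     /* wait for a free slot */
        pthread_cond_wait(&pl->changed, &pl->lock);
    pl->job = job;
    pl->arg = arg;
    pl->pending = 1;
    pthread_cond_signal(&pl->work_ready);
    pthread_mutex_unlock(&pl->lock);
}

/* Block until every submitted job has finished. */
static void pool_wait(struct pool *pl)
{
    pthread_mutex_lock(&pl->lock);
    while (pl->pending || pl->active)
        pthread_cond_wait(&pl->changed, &pl->lock);
    pthread_mutex_unlock(&pl->lock);
}

static void pool_destroy(struct pool *pl)
{
    pthread_mutex_lock(&pl->lock);
    pl->shutdown = 1;
    pthread_cond_broadcast(&pl->work_ready);
    pthread_mutex_unlock(&pl->lock);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(pl->threads[i], NULL);
}

/* Example job: square one int in place. */
static void square_job(void *arg)
{
    int *x = arg;
    *x = *x * *x;
}
```

Typical use is pool_init() once at startup, any number of pool_submit() calls
followed by pool_wait(), and pool_destroy() at exit, so pthread_create() is
paid only NWORKERS times over the life of the program.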

Regards,
-- 
Kurt Garloff  <garloff@suse.de>                          Eindhoven, NL
GPG key: See mail header, key servers         Linux kernel development
SuSE GmbH, Nuernberg, FRG                               SCSI, Security

[-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-12 21:58     ` Albert D. Cahalan
@ 2001-06-12 23:48       ` J . A . Magallon
  0 siblings, 0 replies; 34+ messages in thread
From: J . A . Magallon @ 2001-06-12 23:48 UTC (permalink / raw)
  To: Albert D . Cahalan; +Cc: Davide Libenzi, Christoph Hellwig, linux-kernel, ognen


On 20010612 Albert D. Cahalan wrote:
> 
> In that case, this could be a hardware issue. Note that he seems
> to be comparing an x86 PC against SGI MIPS, Sun SPARC, and Compaq
> Alpha hardware.
> 
> His data set is most likely huge. It's DNA data.
> 
> The x86 box likely has small caches, a fast core, and a slow bus.
> So most of the time the CPU will be stalled waiting for a memory
> operation.
> 

Perhaps it is just synchronization of caches.
Say you want to sum all the elements of a vector in parallel, split into
two pieces:

int total=0;
thread 1:
	for first half
		total += v[i]
thread 2:
	for second half
		total += v[i]

and you thought: 'Well, I need a mutex for access to total. That will slow
things down, so let's use separate counters':

int bigtotal;
int total[2];
thread 1:
	for first half
		total[0] += v[i]
thread 2:
	for second half
		total[1] += v[i]

bigtotal = total[0]+total[1]

The problem? total[0] and total[1] are right next to each other, so they sit
in the same cache line. On every write to total[?], even though the two are
independent, the system has to synchronize caches.

Big iron (SGI, Sparc) has special hardware, but cheap PC mobos...
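
One common way around this, sketched below under stated assumptions, is
to accumulate in a thread-local variable and to pad the per-thread
totals so they cannot share a cache line. The 64-byte line size is an
assumption (it varies by CPU), and the names padded_sum, sum_part and
parallel_sum are made up for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumption: a typical line size; check your CPU */

/* One partial sum per thread. The padding keeps two adjacent values at
   least CACHE_LINE bytes apart, so they cannot share a cache line. */
struct padded_sum {
    long value;
    char pad[CACHE_LINE - sizeof(long)];
};

struct part {
    const int *v;
    size_t begin, end;          /* half-open range [begin, end) */
    struct padded_sum *out;
};

static void *sum_part(void *arg)
{
    struct part *p = arg;
    long local = 0;             /* accumulate privately */
    for (size_t i = p->begin; i < p->end; i++)
        local += p->v[i];
    p->out->value = local;      /* a single write to shared memory */
    return NULL;
}

/* Sums v[0..n-1] with two threads, one per half. */
long parallel_sum(const int *v, size_t n)
{
    struct padded_sum total[2] = { {0}, {0} };
    struct part part[2] = {
        { v, 0,     n / 2, &total[0] },
        { v, n / 2, n,     &total[1] },
    };
    pthread_t tid[2];

    for (int t = 0; t < 2; t++) {
        int rc = pthread_create(&tid[t], NULL, sum_part, &part[t]);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < 2; t++)
        pthread_join(tid[t], NULL);
    return total[0].value + total[1].value;   /* bigtotal */
}
```

With the local accumulator each thread touches its shared slot once, so
the padding matters mostly for code that must update the shared
counters inside the loop.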

-- 
J.A. Magallon                           #  Let the source be with you...        
mailto:jamagallon@able.es
Linux Mandrake release 8.1 (Cooker) for i586
Linux werewolf 2.4.5-ac13 #1 SMP Sun Jun 10 21:42:28 CEST 2001 i686

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-12 19:25     ` Russell Leighton
@ 2001-06-12 23:27       ` Mike Castle
  0 siblings, 0 replies; 34+ messages in thread
From: Mike Castle @ 2001-06-12 23:27 UTC (permalink / raw)
  To: linux-kernel

On Tue, Jun 12, 2001 at 03:25:54PM -0400, Russell Leighton wrote:
> Any recommendations for alternate threading packages?

Does NSPR use native methods (ie, clone), or just ride on top of pthreads?

What about the gnu threading package?

mrc
-- 
     Mike Castle      dalgoda@ix.netcom.com      www.netcom.com/~dalgoda/
    We are all of us living in the shadow of Manhattan.  -- Watchmen
fatal ("You are in a maze of twisty compiler features, all different"); -- gcc

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-12 18:24 ognen
                   ` (2 preceding siblings ...)
  2001-06-12 19:06 ` Kip Macy
@ 2001-06-12 22:41 ` Pavel Machek
  3 siblings, 0 replies; 34+ messages in thread
From: Pavel Machek @ 2001-06-12 22:41 UTC (permalink / raw)
  To: ognen, linux-kernel

Hi!

> I am a summer student implementing a multi-threaded version of a very
> popular bioinformatics tool. So far it compiles and runs without problems
> (as far as I can tell ;) on Linux 2.2.x, Sun Solaris, SGI IRIX and Compaq
> OSF/1 running on Alpha. I have run a lot of timing tests compared to the
> sequential version of the tool on all of these machines (most of them are
> dual-CPU, although I am also running tests on 12-CPU Solaris and 108-CPU
> SGI IRIX). On dual-CPU machines the speedups are as follows: my version
> is 1.88 times faster than the sequential one on IRIX, 1.81 times on Solaris,
> 1.8 times on OSF/1, 1.43 times on Linux 2.2.x and 1.52 times on Linux 2.4
> kernel. Why are the numbers on Linux machines so much lower? It is
> the

But this is all different hw, no?

So dual cpu SPARC is more efficient than dual cpu i686. Maybe SPARCs
have faster RAM and slower cpus... 
								Pavel
-- 
I'm pavel@ucw.cz. "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at discuss@linmodems.org

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-12 21:44   ` Davide Libenzi
  2001-06-12 21:48     ` ognen
@ 2001-06-12 21:58     ` Albert D. Cahalan
  2001-06-12 23:48       ` J . A . Magallon
  1 sibling, 1 reply; 34+ messages in thread
From: Albert D. Cahalan @ 2001-06-12 21:58 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Christoph Hellwig, linux-kernel, ognen

Davide Libenzi writes:
> On 12-Jun-2001 Christoph Hellwig wrote:
>> In article <Pine.LNX.4.30.0106121213570.24593-100000@gene.pbi.nrc.ca> you
>> wrote:

>>> On dual-CPU machines the speedups are as follows: my version
>>> is 1.88 times faster than the sequential one on IRIX, 1.81 times on Solaris,
>>> 1.8 times on OSF/1, 1.43 times on Linux 2.2.x and 1.52 times on Linux
>>> 2.4 kernel. Why are the numbers on Linux machines so much lower?
...
> The context switch rate is very low and the user CPU utilization is 100%,
> so I don't think the system is responsible here ( clearly a CPU bound
> program ).  Even if the runqueue is long, the context switch rate is low.
> I have right next to me a dual PIII 1GHz workstation that runs an MTA
> that uses linux pthreads with context switching ranging between 5000
> and 11000 with a thread creation rate of about 300 threads/sec (
> relaying 600000 msg/hour ).  No problem at all with the system even
> if the load avg is a bit high ( about 8 ).

In that case, this could be a hardware issue. Note that he seems
to be comparing an x86 PC against SGI MIPS, Sun SPARC, and Compaq
Alpha hardware.

His data set is most likely huge. It's DNA data.

The x86 box likely has small caches, a fast core, and a slow bus.
So most of the time the CPU will be stalled waiting for a memory
operation.

Maybe there are performance monitor registers that could be used
to determine if this is the case.

(Not that the app design is sane though.)


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-12 21:44   ` Davide Libenzi
@ 2001-06-12 21:48     ` ognen
  2001-06-14 18:15       ` Alan Cox
  2001-06-12 21:58     ` Albert D. Cahalan
  1 sibling, 1 reply; 34+ messages in thread
From: ognen @ 2001-06-12 21:48 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: linux-kernel

Hello,

a good suggestion was given to me: create as many threads as
there are CPUs (or a few more) and then keep them asking for work when
they are done. This should help (and avoid the repeated pthread_create and
pthread_exit calls). I will implement this and report my results if there is
interest.

Thank you all,
Ognen

On Tue, 12 Jun 2001, Davide Libenzi wrote:

>
> On 12-Jun-2001 Christoph Hellwig wrote:
> > In article <Pine.LNX.4.30.0106121213570.24593-100000@gene.pbi.nrc.ca> you
> > wrote:
> >> On dual-CPU machines the speedups are as follows: my version
> >> is 1.88 times faster than the sequential one on IRIX, 1.81 times on Solaris,
> >> 1.8 times on OSF/1, 1.43 times on Linux 2.2.x and 1.52 times on Linux 2.4
> >> kernel. Why are the numbers on Linux machines so much lower?
> >
> > Does your measurement include the time needed to actually create the
> > threads or do you even frequently create and destroy threads?
>
> This is an extract of the busiest vmstat report while running his tool:
>
> 12  0  0  15508  40980  24880 355480   0   0     0     0  141   481 100   0   0
> 19  0  0  15508  40248  24880 355480   0   0     0     0  142   564 100   0   0
> 12  0  0  15508  40112  24880 355480   0   0     0     0  150   543 100   0   0
> 11  0  0  15508  41272  24880 355480   0   0     0     0  156   594  99   1   0
> 17  0  0  15508  40408  24880 355480   0   0     0     0  156   474  99   1   0
> 17  0  0  15508  39840  24880 355480   0   0     0     0  135   475 100   0   0
> 21  0  0  15508  39568  24880 355480   0   0     0     0  125   409 100   0   0
> 21  0  0  15508  39668  24880 355480   0   0     0     0  135   420 100   0   0
> 16  0  0  15508  39760  24880 355480   0   0     0     0  149   486 100   0   0
>
>
> The context switch rate is very low and the user CPU utilization is 100%, so I
> don't think the system is responsible here ( clearly a CPU bound program ).
> Even if the runqueue is long, the context switch rate is low.
> I have right next to me a dual PIII 1GHz workstation that runs an MTA that uses
> linux pthreads with context switching ranging between 5000 and 11000 with a
> thread creation rate of about 300 threads/sec ( relaying 600000 msg/hour ).
> No problem at all with the system even if the load avg is a bit high
> ( about 8 ).

-- 
Ognen Duzlevski
Plant Biotechnology Institute
National Research Council of Canada
Bioinformatics team


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-12 18:58 ` Christoph Hellwig
  2001-06-12 19:07   ` ognen
@ 2001-06-12 21:44   ` Davide Libenzi
  2001-06-12 21:48     ` ognen
  2001-06-12 21:58     ` Albert D. Cahalan
  1 sibling, 2 replies; 34+ messages in thread
From: Davide Libenzi @ 2001-06-12 21:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-kernel, ognen


On 12-Jun-2001 Christoph Hellwig wrote:
> In article <Pine.LNX.4.30.0106121213570.24593-100000@gene.pbi.nrc.ca> you
> wrote:
>> On dual-CPU machines the speedups are as follows: my version
>> is 1.88 times faster than the sequential one on IRIX, 1.81 times on Solaris,
>> 1.8 times on OSF/1, 1.43 times on Linux 2.2.x and 1.52 times on Linux 2.4
>> kernel. Why are the numbers on Linux machines so much lower?
> 
> Does your measurement include the time needed to actually create the
> threads or do you even frequently create and destroy threads?

This is an extract of the busiest vmstat report while running his tool:

12  0  0  15508  40980  24880 355480   0   0     0     0  141   481 100   0   0
19  0  0  15508  40248  24880 355480   0   0     0     0  142   564 100   0   0
12  0  0  15508  40112  24880 355480   0   0     0     0  150   543 100   0   0
11  0  0  15508  41272  24880 355480   0   0     0     0  156   594  99   1   0
17  0  0  15508  40408  24880 355480   0   0     0     0  156   474  99   1   0
17  0  0  15508  39840  24880 355480   0   0     0     0  135   475 100   0   0
21  0  0  15508  39568  24880 355480   0   0     0     0  125   409 100   0   0
21  0  0  15508  39668  24880 355480   0   0     0     0  135   420 100   0   0
16  0  0  15508  39760  24880 355480   0   0     0     0  149   486 100   0   0


The context switch rate is very low and the user CPU utilization is 100%, so I
don't think the system is responsible here ( clearly a CPU bound program ).
Even if the runqueue is long, the context switch rate is low.
I have right next to me a dual PIII 1GHz workstation that runs an MTA that uses
linux pthreads with context switching ranging between 5000 and 11000 with a
thread creation rate of about 300 threads/sec ( relaying 600000 msg/hour ).
No problem at all with the system even if the load avg is a bit high
( about 8 ).




- Davide


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-12 19:15     ` Kip Macy
@ 2001-06-12 19:29       ` Christoph Hellwig
  0 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2001-06-12 19:29 UTC (permalink / raw)
  To: Kip Macy; +Cc: linux-kernel

In article <Pine.GSO.4.10.10106121214380.20809-100000@orbit-fe.eng.netapp.com> you wrote:
> For heavy threading, try a user-level threads package.

Sure, userlevel threading is the best way to get SMP-scalability...

-- 
Of course it doesn't work. We've performed a software upgrade.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-12 19:14   ` Alexander Viro
@ 2001-06-12 19:25     ` Russell Leighton
  2001-06-12 23:27       ` Mike Castle
  0 siblings, 1 reply; 34+ messages in thread
From: Russell Leighton @ 2001-06-12 19:25 UTC (permalink / raw)
  To: linux-kernel




Any recommendations for alternate threading packages?

Alexander Viro wrote:

> On Tue, 12 Jun 2001, Kip Macy wrote:
>
> > implementation of threads is not an accidental oversight; threads are not
> > looked upon favorably by most of the core linux kernel hackers. A quote
>
> s/threads/POSIX threads/.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

--
---------------------------------------------------
Russell Leighton    russell.leighton@247media.com
---------------------------------------------------



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-12 19:07   ` ognen
  2001-06-12 19:15     ` Kip Macy
@ 2001-06-12 19:15     ` Christoph Hellwig
  2001-06-13 12:20     ` Kurt Garloff
  2 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2001-06-12 19:15 UTC (permalink / raw)
  To: ognen; +Cc: linux-kernel

On Tue, Jun 12, 2001 at 01:07:11PM -0600, ognen@gene.pbi.nrc.ca wrote:
> Hello,
> 
> due to the nature of the problem (a pairwise mutual alignment of n
> sequences results in max. n^2 alignments which can each be done in a
> separate thread), I need to create and destroy the threads frequently.
> 
> I am not really comfortable with 1.4 - 1.5 speedups since the solution was
> intended as a Linux one primarily and it just happened that it works (and
> now even better) on Solaris/SGI/OSF...

If you heavily create threads under load you're rather screwed.  If you want
to stay with the (IMHO rather suboptimal) posix threads API you might want
to take a look at the stuff IBM has produced:

	http://oss.software.ibm.com/developerworks/projects/pthreads/

Otherwise a simple wrapper for clone might be a _lot_ faster, but has its
own disadvantages: no ready-to-use locking primitives, no cross-platform
support (ok, it should be portable to the FreeBSD rfork easily).
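
As a rough illustration of what such a wrapper involves, here is a
Linux-only sketch built on the glibc clone() wrapper. The names
struct task and run_in_clone, the flag set, and the 64 KiB stack are
assumptions chosen for the example, not anything from an existing
package.

```c
#define _GNU_SOURCE             /* must come before any #include */
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)   /* assumption: plenty for this task */

struct task {
    int input;
    int output;   /* written by the child; the parent sees it via CLONE_VM */
};

static int task_main(void *arg)
{
    struct task *t = arg;
    t->output = t->input + 1;   /* stand-in for the real work */
    return 0;                   /* becomes the child's exit status */
}

/* Runs task_main in a clone()d child that shares our address space,
   then waits for it. Returns 0 on success, -1 on failure. */
int run_in_clone(struct task *t)
{
    char *stack;
    pid_t pid;
    int status = 0, rc = -1;

    assert(t != NULL);
    stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downwards on x86, so pass the top of the buffer.
       SIGCHLD as the exit signal lets a plain waitpid() reap the child. */
    pid = clone(task_main, stack + CHILD_STACK_SIZE,
                CLONE_VM | CLONE_FS | CLONE_FILES | SIGCHLD, t);
    if (pid != -1 && waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0)
        rc = 0;

    free(stack);
    return rc;
}
```

Note what is missing compared with pthreads: the caller owns the stack,
waitpid() is the only synchronization, and any mutex or condition
variable would have to be built by hand.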

	Christoph

-- 
Of course it doesn't work. We've performed a software upgrade.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-12 19:07   ` ognen
@ 2001-06-12 19:15     ` Kip Macy
  2001-06-12 19:29       ` Christoph Hellwig
  2001-06-12 19:15     ` Christoph Hellwig
  2001-06-13 12:20     ` Kurt Garloff
  2 siblings, 1 reply; 34+ messages in thread
From: Kip Macy @ 2001-06-12 19:15 UTC (permalink / raw)
  To: ognen; +Cc: linux-kernel

For heavy threading, try a user-level threads package.

		-Kip


On Tue, 12 Jun 2001 ognen@gene.pbi.nrc.ca wrote:

> Hello,
> 
> due to the nature of the problem (a pairwise mutual alignment of n
> sequences results in max. n^2 alignments which can each be done in a
> separate thread), I need to create and destroy the threads frequently.
> 
> I am not really comfortable with 1.4 - 1.5 speedups since the solution was
> intended as a Linux one primarily and it just happened that it works (and
> now even better) on Solaris/SGI/OSF...
> 
> Best regards,
> Ognen
> 
> On Tue, 12 Jun 2001, Christoph Hellwig wrote:
> 
> > In article <Pine.LNX.4.30.0106121213570.24593-100000@gene.pbi.nrc.ca> you wrote:
> > > On dual-CPU machines the speedups are as follows: my version
> > > is 1.88 times faster than the sequential one on IRIX, 1.81 times on Solaris,
> > > 1.8 times on OSF/1, 1.43 times on Linux 2.2.x and 1.52 times on Linux 2.4
> > > kernel. Why are the numbers on Linux machines so much lower?
> >
> > Does your measurement include the time needed to actually create the
> > threads or do you even frequently create and destroy threads?
> >
> > The code for creating threads is _horribly_ slow in Linuxthreads due
> > to the way it is implemented.
> >
> > > It is the
> > > same multi-threaded code, I am not using any tricks, the code basically
> > > uses PTHREAD_CREATE_DETACHED and PTHREAD_SCOPE_SYSTEM and the thread stack
> > > size is set to 8K (but the numbers are the same with larger/smaller stack
> > > sizes).
> > >
> > > Is there anything I am missing? Is this to be expected due to the Linux way of
> > > handling threads (clone call)? I am just trying to explain the numbers and
> > > nothing else comes to mind....
> >
> > Linuxthreads is a rather bad pthreads implementation performance-wise,
> > mostly due to the rather different linux-native threading model.
> >
> > 	Christoph
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-12 19:06 ` Kip Macy
@ 2001-06-12 19:14   ` Alexander Viro
  2001-06-12 19:25     ` Russell Leighton
  2001-06-13 17:31   ` bert hubert
  2001-06-14 18:28   ` Alan Cox
  2 siblings, 1 reply; 34+ messages in thread
From: Alexander Viro @ 2001-06-12 19:14 UTC (permalink / raw)
  To: Kip Macy; +Cc: ognen, linux-kernel



On Tue, 12 Jun 2001, Kip Macy wrote:

> implementation of threads is not an accidental oversight; threads are not
> looked upon favorably by most of the core linux kernel hackers. A quote

s/threads/POSIX threads/.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-12 18:58 ` Christoph Hellwig
@ 2001-06-12 19:07   ` ognen
  2001-06-12 19:15     ` Kip Macy
                       ` (2 more replies)
  2001-06-12 21:44   ` Davide Libenzi
  1 sibling, 3 replies; 34+ messages in thread
From: ognen @ 2001-06-12 19:07 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-kernel

Hello,

due to the nature of the problem (a pairwise mutual alignment of n
sequences results in max. n^2 alignments which can each be done in a
separate thread), I need to create and destroy the threads frequently.

I am not really comfortable with 1.4 - 1.5 speedups since the solution was
intended as a Linux one primarily and it just happened that it works (and
now even better) on Solaris/SGI/OSF...

Best regards,
Ognen

On Tue, 12 Jun 2001, Christoph Hellwig wrote:

> In article <Pine.LNX.4.30.0106121213570.24593-100000@gene.pbi.nrc.ca> you wrote:
> > On dual-CPU machines the speedups are as follows: my version
> > is 1.88 times faster than the sequential one on IRIX, 1.81 times on Solaris,
> > 1.8 times on OSF/1, 1.43 times on Linux 2.2.x and 1.52 times on Linux 2.4
> > kernel. Why are the numbers on Linux machines so much lower?
>
> Does your measurement include the time needed to actually create the
> threads or do you even frequently create and destroy threads?
>
> The code for creating threads is _horribly_ slow in Linuxthreads due
> to the way it is implemented.
>
> > It is the
> > same multi-threaded code, I am not using any tricks, the code basically
> > uses PTHREAD_CREATE_DETACHED and PTHREAD_SCOPE_SYSTEM and the thread stack
> > size is set to 8K (but the numbers are the same with larger/smaller stack
> > sizes).
> >
> > Is there anything I am missing? Is this to be expected due to the Linux way of
> > handling threads (clone call)? I am just trying to explain the numbers and
> > nothing else comes to mind....
>
> Linuxthreads is a rather bad pthreads implementation performance-wise,
> mostly due to the rather different linux-native threading model.
>
> 	Christoph


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-12 18:24 ognen
  2001-06-12 18:39 ` Davide Libenzi
  2001-06-12 18:58 ` Christoph Hellwig
@ 2001-06-12 19:06 ` Kip Macy
  2001-06-12 19:14   ` Alexander Viro
                     ` (2 more replies)
  2001-06-12 22:41 ` Pavel Machek
  3 siblings, 3 replies; 34+ messages in thread
From: Kip Macy @ 2001-06-12 19:06 UTC (permalink / raw)
  To: ognen; +Cc: linux-kernel

This may sound like flamebait, but it's not. Linux threads are basically
just processes that share the same address space. Their performance is
measurably worse than it is on most commercial Unixes and FreeBSD.
They are not, or at least two years ago were not, POSIX compliant
(they behaved badly with respect to signals). The impoverished
implementation of threads is not an accidental oversight; threads are not
looked upon favorably by most of the core linux kernel hackers. A quote
from Larry McVoy's home page attributed to Alan Cox illustrates this
reasonably well: "A computer is a state machine. Threads are for people
who can't program state machines." Sorry for not being more helpful.

		-Kip


On Tue, 12 Jun 2001 ognen@gene.pbi.nrc.ca wrote:

> Hello,
> 
> I am a summer student implementing a multi-threaded version of a very
> popular bioinformatics tool. So far it compiles and runs without problems
> (as far as I can tell ;) on Linux 2.2.x, Sun Solaris, SGI IRIX and Compaq
> OSF/1 running on Alpha. I have run a lot of timing tests compared to the
> sequential version of the tool on all of these machines (most of them are
> dual-CPU, although I am also running tests on 12-CPU Solaris and 108-CPU
> SGI IRIX). On dual-CPU machines the speedups are as follows: my version
> is 1.88 times faster than the sequential one on IRIX, 1.81 times on Solaris,
> 1.8 times on OSF/1, 1.43 times on Linux 2.2.x and 1.52 times on Linux 2.4
> kernel. Why are the numbers on Linux machines so much lower? It is the
> same multi-threaded code, I am not using any tricks, the code basically
> uses PTHREAD_CREATE_DETACHED and PTHREAD_SCOPE_SYSTEM and the thread stack
> size is set to 8K (but the numbers are the same with larger/smaller stack
> sizes).
> 
> Is there anything I am missing? Is this to be expected due to the Linux way of
> handling threads (clone call)? I am just trying to explain the numbers and
> nothing else comes to mind....
> 
> Best regards,
> Ognen Duzlevski
> -- 
> ognen@gene.pbi.nrc.ca
> Plant Biotechnology Institute
> National Research Council of Canada
> Bioinformatics team
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: threading question
  2001-06-12 18:24 ognen
  2001-06-12 18:39 ` Davide Libenzi
@ 2001-06-12 18:58 ` Christoph Hellwig
  2001-06-12 19:07   ` ognen
  2001-06-12 21:44   ` Davide Libenzi
  2001-06-12 19:06 ` Kip Macy
  2001-06-12 22:41 ` Pavel Machek
  3 siblings, 2 replies; 34+ messages in thread
From: Christoph Hellwig @ 2001-06-12 18:58 UTC (permalink / raw)
  To: ognen; +Cc: linux-kernel

In article <Pine.LNX.4.30.0106121213570.24593-100000@gene.pbi.nrc.ca> you wrote:
> On dual-CPU machines the speedups are as follows: my version
> is 1.88 times faster than the sequential one on IRIX, 1.81 times on Solaris,
> 1.8 times on OSF/1, 1.43 times on Linux 2.2.x and 1.52 times on Linux 2.4
> kernel. Why are the numbers on Linux machines so much lower?

Does your measurement include the time needed to actually create the
threads or do you even frequently create and destroy threads?

The code for creating threads is _horribly_ slow in Linuxthreads due
to the way it is implemented.

> It is the
> same multi-threaded code, I am not using any tricks, the code basically
> uses PTHREAD_CREATE_DETACHED and PTHREAD_SCOPE_SYSTEM and the thread stack
> size is set to 8K (but the numbers are the same with larger/smaller stack
> sizes).
>
> Is there anything I am missing? Is this to be expected due to the Linux way of
> handling threads (clone call)? I am just trying to explain the numbers and
> nothing else comes to mind....

Linuxthreads is a rather bad pthreads implementation performance-wise,
mostly due to the rather different linux-native threading model.

	Christoph

-- 
Of course it doesn't work. We've performed a software upgrade.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: threading question
  2001-06-12 18:24 ognen
@ 2001-06-12 18:39 ` Davide Libenzi
  2001-06-12 18:58 ` Christoph Hellwig
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 34+ messages in thread
From: Davide Libenzi @ 2001-06-12 18:39 UTC (permalink / raw)
  To: ognen; +Cc: linux-kernel


On 12-Jun-2001 ognen@gene.pbi.nrc.ca wrote:
> Hello,
> 
> I am a summer student implementing a multi-threaded version of a very
> popular bioinformatics tool. So far it compiles and runs without problems
> (as far as I can tell ;) on Linux 2.2.x, Sun Solaris, SGI IRIX and Compaq
> OSF/1 running on Alpha. I have run a lot of timing tests compared to the
> sequential version of the tool on all of these machines (most of them are
> dual-CPU, although I am also running tests on 12-CPU Solaris and 108-CPU
> SGI IRIX). On dual-CPU machines the speedups are as follows: my version
> is 1.88 times faster than the sequential one on IRIX, 1.81 times on Solaris,
> 1.8 times on OSF/1, 1.43 times on Linux 2.2.x and 1.52 times on Linux 2.4
> kernel. Why are the numbers on Linux machines so much lower? It is the
> same multi-threaded code, I am not using any tricks, the code basically
> uses PTHREAD_CREATE_DETACHED and PTHREAD_SCOPE_SYSTEM and the thread stack
> size is set to 8K (but the numbers are the same with larger/smaller stack
> sizes).
> 
> Is there anything I am missing? Is this to be expected due to the Linux way of
> handling threads (clone call)? I am just trying to explain the numbers and
> nothing else comes to mind....

What does your vmstat output look like while your tool is running?



- Davide


^ permalink raw reply	[flat|nested] 34+ messages in thread

* threading question
@ 2001-06-12 18:24 ognen
  2001-06-12 18:39 ` Davide Libenzi
                   ` (3 more replies)
  0 siblings, 4 replies; 34+ messages in thread
From: ognen @ 2001-06-12 18:24 UTC (permalink / raw)
  To: linux-kernel

Hello,

I am a summer student implementing a multi-threaded version of a very
popular bioinformatics tool. So far it compiles and runs without problems
(as far as I can tell ;) on Linux 2.2.x, Sun Solaris, SGI IRIX and Compaq
OSF/1 running on Alpha. I have run a lot of timing tests compared to the
sequential version of the tool on all of these machines (most of them are
dual-CPU, although I am also running tests on 12-CPU Solaris and 108-CPU
SGI IRIX). On dual-CPU machines the speedups are as follows: my version
is 1.88 times faster than the sequential one on IRIX, 1.81 times on Solaris,
1.8 times on OSF/1, 1.43 times on Linux 2.2.x and 1.52 times on Linux 2.4
kernel. Why are the numbers on Linux machines so much lower? It is the
same multi-threaded code, I am not using any tricks, the code basically
uses PTHREAD_CREATE_DETACHED and PTHREAD_SCOPE_SYSTEM and the thread stack
size is set to 8K (but the numbers are the same with larger/smaller stack
sizes).

Is there anything I am missing? Is this to be expected due to the Linux way of
handling threads (clone call)? I am just trying to explain the numbers and
nothing else comes to mind....

Best regards,
Ognen Duzlevski
-- 
ognen@gene.pbi.nrc.ca
Plant Biotechnology Institute
National Research Council of Canada
Bioinformatics team


^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2001-06-16 22:13 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-06-13 19:05 threading question Hubertus Franke
     [not found] <fa.f6da6av.agod3u@ifi.uio.no>
     [not found] ` <fa.e54jbkv.kg4r99@ifi.uio.no>
2001-06-16 22:22   ` Dan Maas
  -- strict thread matches above, loose matches on Subject: below --
2001-06-12 18:24 ognen
2001-06-12 18:39 ` Davide Libenzi
2001-06-12 18:58 ` Christoph Hellwig
2001-06-12 19:07   ` ognen
2001-06-12 19:15     ` Kip Macy
2001-06-12 19:29       ` Christoph Hellwig
2001-06-12 19:15     ` Christoph Hellwig
2001-06-13 12:20     ` Kurt Garloff
2001-06-13 13:35       ` J . A . Magallon
2001-06-13 14:17         ` Philips
2001-06-13 15:06           ` ognen
2001-06-12 21:44   ` Davide Libenzi
2001-06-12 21:48     ` ognen
2001-06-14 18:15       ` Alan Cox
2001-06-12 21:58     ` Albert D. Cahalan
2001-06-12 23:48       ` J . A . Magallon
2001-06-12 19:06 ` Kip Macy
2001-06-12 19:14   ` Alexander Viro
2001-06-12 19:25     ` Russell Leighton
2001-06-12 23:27       ` Mike Castle
2001-06-13 17:31   ` bert hubert
2001-06-14  6:45     ` Helge Hafting
2001-06-14 18:28   ` Alan Cox
2001-06-14 19:01     ` bert hubert
2001-06-14 19:22       ` Russell Leighton
2001-06-15 11:29       ` Anil Kumar
2001-06-14 23:05     ` J . A . Magallon
2001-06-16 14:16     ` Michael Rothwell
2001-06-16 15:19       ` Alan Cox
2001-06-16 18:33         ` Russell Leighton
2001-06-16 19:06         ` Michael Rothwell
2001-06-12 22:41 ` Pavel Machek
