* Developing multi-threading applications
@ 2002-06-13  8:13 Roberto Fichera
  2002-06-13  8:26 ` David Schwartz
  0 siblings, 1 reply; 19+ messages in thread
From: Roberto Fichera @ 2002-06-13  8:13 UTC (permalink / raw)
  To: linux-kernel

Hi All,

I'm designing a multithreaded application with many threads,
from ~100 up to 300-400. I need to decide which threading
library to use and which kernel patches I need to improve
scheduler performance. The machines will be SMP Xeon boxes
with 4 or 8 processors and 4 GB of RAM.
Almost all threads are computationally intensive, and the library
needs fast interprocess communication and synchronization,
because there are many synchronous and asynchronous threads that
are time-dependent and/or time-critical. In the future I plan to
distribute the threads across a pool of SMP boxes.

Thanks in advance.

Roberto Fichera.



* Re: Developing multi-threading applications
  2002-06-13  8:13 Developing multi-threading applications Roberto Fichera
@ 2002-06-13  8:26 ` David Schwartz
  2002-06-13  9:08   ` Roberto Fichera
  0 siblings, 1 reply; 19+ messages in thread
From: David Schwartz @ 2002-06-13  8:26 UTC (permalink / raw)
  To: kernel, linux-kernel


On Thu, 13 Jun 2002 10:13:35 +0200, Roberto Fichera wrote:

>I'm designing a multithreding application with many threads,
>from ~100 to 300/400. I need to take some decisions about
>which threading library use, and which patch I need for the
>kernel to improve the scheduler performances. The machines
>will be a SMP Xeon with 4/8 processors with 4Gb RAM.
>All threads are almost computational intensive and the library
>need a fast interprocess comunication and syncronization
>because there are many sync & async threads time
>dependent and/or critical. I'm planning, in the future, to distribuite
>all the threads in a pool of SMP box.

	With 4/8 processors, you don't want to create 100-400 threads doing 
computation-intensive tasks. So redesign things so that the number of threads 
you create is more in line with the number of CPUs you have available. That 
is, use a 'thread per CPU' approach (or slightly more threads than there are 
CPUs per node) and you'll perform a lot better. Distribute the available work 
over the available threads.
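
A minimal sketch of the 'thread per CPU' idea, written in Python only for
brevity (the thread names no language). The names compute() and
run_partitioned() are hypothetical, and the 300 work units stand in for the
logical tasks. Note that in CPython the GIL serializes pure-Python
computation, so this shows the structure rather than a real speedup; the same
shape applies to pthreads in C.

```python
import os
import threading

def compute(unit):
    """Hypothetical stand-in for one computation-heavy unit of work."""
    return sum(i * i for i in range(unit + 1))

def run_partitioned(units, n_threads=None):
    # One thread per CPU; fall back to 1 if the CPU count is unknown.
    n_threads = n_threads or os.cpu_count() or 1
    results = {}

    def run_slice(k):
        # Thread k takes every n_threads-th unit, so each key in
        # results is written by exactly one thread.
        for unit in units[k::n_threads]:
            results[unit] = compute(unit)

    threads = [threading.Thread(target=run_slice, args=(k,))
               for k in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, n_threads

# 300 logical units of work spread over a CPU-sized set of threads.
results, n_threads = run_partitioned(list(range(300)))
```

Static slicing like this suits units of similar cost; when costs vary a lot, a
shared work queue usually balances better.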

	DS




* Re: Developing multi-threading applications
  2002-06-13  8:26 ` David Schwartz
@ 2002-06-13  9:08   ` Roberto Fichera
  2002-06-13  9:44     ` Peter Wächtler
  2002-06-13 10:13     ` David Schwartz
  0 siblings, 2 replies; 19+ messages in thread
From: Roberto Fichera @ 2002-06-13  9:08 UTC (permalink / raw)
  To: David Schwartz; +Cc: linux-kernel

At 01.26 13/06/02 -0700, you wrote:

>On Thu, 13 Jun 2002 10:13:35 +0200, Roberto Fichera wrote:
>
> >I'm designing a multithreding application with many threads,
> >from ~100 to 300/400. I need to take some decisions about
> >which threading library use, and which patch I need for the
> >kernel to improve the scheduler performances. The machines
> >will be a SMP Xeon with 4/8 processors with 4Gb RAM.
> >All threads are almost computational intensive and the library
> >need a fast interprocess comunication and syncronization
> >because there are many sync & async threads time
> >dependent and/or critical. I'm planning, in the future, to distribuite
> >all the threads in a pool of SMP box.
>
>         With 4/8 processors, you don't want to create 100-400 threads doing
>computation intensive tasks. So redesign things so that the number of threads
>you create is more in line with the number of CPUs you have available. That
>is, use a 'thread per CPU' (or slightly more threads than their are CPUs per
>node) approach and you'll perform a lot better. Distribute the available work
>over the available threads.

You are right! But "computationally intensive" is not quite what I meant ;-),
because most of the threads are waiting for I/O. Once its I/O completes, a
thread performs its computationally intensive task and, when it finishes, sends
its result to its parent thread. The parent collects all of its children's
results, performs some computation of its own, and sends its result to its own
parent, and so on, with many parent threads each controlling other children.
So I think the main problem/overhead is thread creation and the number of
threads.


>         DS

Roberto Fichera.



* Re: Developing multi-threading applications
  2002-06-13  9:08   ` Roberto Fichera
@ 2002-06-13  9:44     ` Peter Wächtler
  2002-06-13  9:52       ` Roberto Fichera
  2002-06-13 10:13     ` David Schwartz
  1 sibling, 1 reply; 19+ messages in thread
From: Peter Wächtler @ 2002-06-13  9:44 UTC (permalink / raw)
  To: Roberto Fichera; +Cc: David Schwartz, linux-kernel

Roberto Fichera wrote:
> At 01.26 13/06/02 -0700, you wrote:
> 
>> On Thu, 13 Jun 2002 10:13:35 +0200, Roberto Fichera wrote:
>>
>> >I'm designing a multithreding application with many threads,
>> >from ~100 to 300/400. I need to take some decisions about
>> >which threading library use, and which patch I need for the
>> >kernel to improve the scheduler performances. The machines
>> >will be a SMP Xeon with 4/8 processors with 4Gb RAM.
>> >All threads are almost computational intensive and the library
>> >need a fast interprocess comunication and syncronization
>> >because there are many sync & async threads time
>> >dependent and/or critical. I'm planning, in the future, to distribuite
>> >all the threads in a pool of SMP box.
>>
>>         With 4/8 processors, you don't want to create 100-400 threads doing
>> computation intensive tasks. So redesign things so that the number of threads
>> you create is more in line with the number of CPUs you have available. That
>> is, use a 'thread per CPU' (or slightly more threads than their are CPUs per
>> node) approach and you'll perform a lot better. Distribute the available work
>> over the available threads.
>
> You are right! But "computational intensive" is not totaly right as I say ;-),
> because most of thread are waiting for I/O, after I/O are performed the
> computational intensive tasks, finished its work all the result are sent
> to thread-father, the father collect all the child's result and perform some
> computational work and send its result to its father and so on with many
> thread-father controlling other child. So I think the main problem/overhead
> is thread creation and the thread's numbers.
> 

Have a look at http://www-124.ibm.com/developerworks/opensource/pthreads/

They provide an M:N threading model in which threads can live in userspace.




* Re: Developing multi-threading applications
  2002-06-13  9:44     ` Peter Wächtler
@ 2002-06-13  9:52       ` Roberto Fichera
  2002-06-13 10:16         ` Peter Wächtler
  0 siblings, 1 reply; 19+ messages in thread
From: Roberto Fichera @ 2002-06-13  9:52 UTC (permalink / raw)
  To: Peter Wächtler; +Cc: linux-kernel

At 11.44 13/06/02 +0200, Peter Wächtler wrote:

>>You are right! But "computational intensive" is not totaly right as I say 
>>;-),
>>because most of thread are waiting for I/O, after I/O are performed the
>>computational intensive tasks, finished its work all the result are sent
>>to thread-father, the father collect all the child's result and perform some
>>computational work and send its result to its father and so on with many
>>thread-father controlling other child. So I think the main problem/overhead
>>is thread creation and the thread's numbers.
>
>Have a look at http://www-124.ibm.com/developerworks/opensource/pthreads/
>
>they provide M:N threading model where threads can live in userspace.

Yes! I'm looking at it. But I want to evaluate some others first.

Roberto Fichera.



* Re: Developing multi-threading applications
  2002-06-13  9:08   ` Roberto Fichera
  2002-06-13  9:44     ` Peter Wächtler
@ 2002-06-13 10:13     ` David Schwartz
  2002-06-13 11:21       ` Roberto Fichera
  1 sibling, 1 reply; 19+ messages in thread
From: David Schwartz @ 2002-06-13 10:13 UTC (permalink / raw)
  To: kernel; +Cc: linux-kernel


On Thu, 13 Jun 2002 11:08:27 +0200, Roberto Fichera wrote:
>You are right! But "computational intensive" is not totaly right as I say ;-
>),

	It's really not fair to change the premises in the middle of an argument.

>because most of thread are waiting for I/O,

	Still wrong. You don't tie up threads waiting for I/O. You can wait without 
having a thread doing the waiting.

>after I/O are performed the
>computational intensive tasks, finished its work all the result are sent
>to thread-father,

	Okay, so you need a new abstraction -- separate the waiting from the 
working. Create as many threads to do the work as you have processors to do 
the work on. As for the waiting, minimize threads waiting, they're pure 
overhead. If it's sockets, use 'poll' so one thread can do lots of waiting.
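
A minimal sketch of separating the waiting from the working, in Python for
brevity and assuming a Unix system where select.poll() is available. One
thread blocks in poll() on every socket; a small fixed set of workers does the
computation. The socket pairs, NUM_WORKERS, worker() and waiter() are all
hypothetical stand-ins, and each source sends exactly one message to keep the
demo short.

```python
import queue
import select
import socket
import threading

NUM_WORKERS = 4              # assumption: roughly one per CPU
work = queue.Queue()         # ready data waiting for a worker
done = queue.Queue()         # finished results
STOP = object()              # sentinel telling a worker to exit

def worker():
    # Workers never wait on sockets; they only process data that is ready.
    while True:
        item = work.get()
        if item is STOP:
            break
        fd, data = item
        done.put((fd, data.upper()))     # placeholder for the real computation

def waiter(sockets):
    # A single thread waits on all sockets at once.
    poller = select.poll()
    by_fd = {s.fileno(): s for s in sockets}
    for fd in by_fd:
        poller.register(fd, select.POLLIN)
    pending = len(by_fd)
    while pending:
        for fd, events in poller.poll():
            if events & select.POLLIN:
                work.put((fd, by_fd[fd].recv(4096)))
                poller.unregister(fd)    # one message per source in this demo
                pending -= 1

pairs = [socket.socketpair() for _ in range(50)]
workers = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in workers:
    t.start()
for i, (_, writer) in enumerate(pairs):
    writer.send(b"sample %d" % i)
waiter([reader for reader, _ in pairs])
for _ in workers:
    work.put(STOP)
for t in workers:
    t.join()
for reader, writer in pairs:
    reader.close()
    writer.close()
outputs = sorted(done.get()[1] for _ in range(len(pairs)))
```

Fifty sources are served here by one waiting thread and four workers instead
of fifty blocked threads.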

>the father collect all the child's result and perform some
>computational work and send its result to its father and so on with many
>thread-father controlling other child. So I think the main problem/overhead
>is thread creation and the thread's numbers.

	So get rid of the problem! Don't create so many threads, create only as many 
threads as can do useful work and reuse them rather than destroying and 
recreating them. Solve the actual problem/overhead since it's totally 
artificial and due to your model rather than your problem!
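
A minimal sketch of reusing threads instead of creating one per task, in
Python for brevity. The worker count and the 300 dummy tasks are assumptions
for illustration; each result records which thread handled it, so the reuse is
visible.

```python
import queue
import threading

NUM_WORKERS = 4              # assumption: roughly one per CPU
tasks = queue.Queue()
results = queue.Queue()
STOP = object()              # sentinel telling a worker to exit

def worker():
    # A worker is created once and then loops over many tasks.
    while True:
        item = tasks.get()
        if item is STOP:
            break
        results.put((item, threading.get_ident()))

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for n in range(300):         # 300 logical tasks, no new threads
    tasks.put(n)
for _ in threads:
    tasks.put(STOP)
for t in threads:
    t.join()

handled = [results.get() for _ in range(300)]
distinct_threads = {ident for _, ident in handled}
```

All 300 tasks complete while the number of threads ever created stays at
NUM_WORKERS.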

	DS




* Re: Developing multi-threading applications
  2002-06-13  9:52       ` Roberto Fichera
@ 2002-06-13 10:16         ` Peter Wächtler
  2002-06-13 10:42           ` Roberto Fichera
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Wächtler @ 2002-06-13 10:16 UTC (permalink / raw)
  To: Roberto Fichera; +Cc: linux-kernel

Roberto Fichera wrote:
> At 11.44 13/06/02 +0200, Peter Wächtler wrote:
> 
>>> You are right! But "computational intensive" is not totaly right as I 
>>> say ;-),
>>> because most of thread are waiting for I/O, after I/O are performed the
>>> computational intensive tasks, finished its work all the result are sent
>>> to thread-father, the father collect all the child's result and 
>>> perform some
>>> computational work and send its result to its father and so on with many
>>> thread-father controlling other child. So I think the main 
>>> problem/overhead
>>> is thread creation and the thread's numbers.
>>
>>
>> Have a look at http://www-124.ibm.com/developerworks/opensource/pthreads/
>>
>> they provide M:N threading model where threads can live in userspace.
> 
> 
> Yes! I'm looking for it. But I want evaluate some other before.
> 

There is a paper rse-pmt.ps included in the tar archives from Ralf Engelschall
(author of GNU portable threads).

There you will find lots of interesting pointers to other thread packages.




* Re: Developing multi-threading applications
  2002-06-13 10:16         ` Peter Wächtler
@ 2002-06-13 10:42           ` Roberto Fichera
  0 siblings, 0 replies; 19+ messages in thread
From: Roberto Fichera @ 2002-06-13 10:42 UTC (permalink / raw)
  To: Peter Wächtler; +Cc: linux-kernel

At 12.16 13/06/02 +0200, Peter Wächtler wrote:
>Roberto Fichera wrote:
>>At 11.44 13/06/02 +0200, Peter Wächtler wrote:
>>
>>>>You are right! But "computational intensive" is not totaly right as I 
>>>>say ;-),
>>>>because most of thread are waiting for I/O, after I/O are performed the
>>>>computational intensive tasks, finished its work all the result are sent
>>>>to thread-father, the father collect all the child's result and perform 
>>>>some
>>>>computational work and send its result to its father and so on with many
>>>>thread-father controlling other child. So I think the main problem/overhead
>>>>is thread creation and the thread's numbers.
>>>
>>>
>>>Have a look at http://www-124.ibm.com/developerworks/opensource/pthreads/
>>>
>>>they provide M:N threading model where threads can live in userspace.
>>
>>Yes! I'm looking for it. But I want evaluate some other before.

And I don't want to use a library that lives entirely in userspace.


>There is a paper rse-pmt.ps included in the tar archives from Ralf Engelschall
>(author of GNU portable threads).
>
>There you will find lots of interesting pointers to other thread packages.

I'll take a look. Thanks!




Roberto Fichera.



* Re: Developing multi-threading applications
  2002-06-13 10:13     ` David Schwartz
@ 2002-06-13 11:21       ` Roberto Fichera
  2002-06-13 11:58         ` David Schwartz
  0 siblings, 1 reply; 19+ messages in thread
From: Roberto Fichera @ 2002-06-13 11:21 UTC (permalink / raw)
  To: David Schwartz; +Cc: linux-kernel

At 03.13 13/06/02 -0700, you wrote:


>On Thu, 13 Jun 2002 11:08:27 +0200, Roberto Fichera wrote:
> >You are right! But "computational intensive" is not totaly right as I say ;-
> >),
>
>         It's really not fair to change the premises in the middle of an 
> argument.

Sorry ;-)!


> >because most of thread are waiting for I/O,
>
>         Still wrong. You don't tie up threads waiting for I/O. You can 
> wait without
>having a thread doing the waiting.
>
> >after I/O are performed the
> >computational intensive tasks, finished its work all the result are sent
> >to thread-father,
>
>         Okay, so you need a new abstraction -- separate the waiting from the
>working. Create as many threads to do the work as you have processors to do
>the work on. As for the waiting, minimize threads waiting, they're pure
>overhead. If it's sockets, use 'poll' so one thread can do lots of waiting.

That's a possible solution.

> >the father collect all the child's result and perform some
> >computational work and send its result to its father and so on with many
> >thread-father controlling other child. So I think the main problem/overhead
> >is thread creation and the thread's numbers.
>
>         So get rid of the problem! Don't create so many threads, create 
> only as many
>threads as can do useful work and reuse them rather than destroying and
>recreating them. Solve the actual problem/overhead since it's totally
>artificial and due to your model rather than your problem!

It depends on the application. With my simulation/emulation program I need to
create many threads because each thread resolves/manages/computes a specific
problem, and its lifetime depends on several factors. Each thread is created
only when needed, to avoid overhead. The simulation/emulation is a "merge" of
a great many objects, each of which works to resolve/manage/compute a specific
problem. The low-level objects are grouped to solve a specific problem and are
managed by a thread controller that has to make decisions or do some work.
Some thread controllers are in turn grouped and managed by another thread
controller, and so on. Don't think that I always need 400 active threads; they
are created only when a controller needs them. Think of this
simulation/emulation as a collection of a great many objects that must
interoperate; the model is designed to scale easily to a distributed
environment.


>         DS

Roberto Fichera.



* Re: Developing multi-threading applications
  2002-06-13 11:21       ` Roberto Fichera
@ 2002-06-13 11:58         ` David Schwartz
  2002-06-13 16:26           ` Roberto Fichera
  0 siblings, 1 reply; 19+ messages in thread
From: David Schwartz @ 2002-06-13 11:58 UTC (permalink / raw)
  To: kernel; +Cc: linux-kernel


>Depending by the applications. With my simulation/emulation program I need to
>create many thread because each thread resolve/manage/compute a specific
>problem and it's live depend by some factors. Each thread is create only if
>needed to avoid the overhead. The simulation/emulation is a "merge" of many
>and many object, each object work to resolve/manage/compute a specific
>problem. All the low objects are grouped to resolve a specific problem and are
>managed by a thread controller that should take some decision or doing some
>work. Some thread controller are grouped and managed by another thread
>controller and so on. Do not think that I need always 400 threads active they
>are create only if need by the controller. You must thinks this
>simulation/emulation as collection of many and many object that should
>interoperate, and the model is designed to scale easily on a distribuite
>environment.

	If it's a simulation, you don't *really* need the threads, you just need to 
be able to act as if you had them. After all, what are you simulating if what 
work gets done when is up to the random vagaries of the OS scheduler?
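
A minimal sketch of acting as if you had threads without creating any, in
Python for brevity. Each logical task is a generator and a round-robin loop
decides, deterministically, who runs next. The names logical_task() and
run_round_robin() and the three sample tasks are hypothetical.

```python
from collections import deque

def logical_task(name, steps, log):
    """One simulated activity; each yield is a point where it gives up control."""
    for step in range(steps):
        log.append((name, step))
        yield

def run_round_robin(tasks):
    # The simulation, not the OS scheduler, decides the order of execution.
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
            ready.append(task)   # not finished: back of the queue
        except StopIteration:
            pass                 # finished: drop it

log = []
run_round_robin([
    logical_task("A", 2, log),
    logical_task("B", 3, log),
    logical_task("C", 1, log),
])
# log is now:
# [("A", 0), ("B", 0), ("C", 0), ("A", 1), ("B", 1), ("B", 2)]
```

Because the interleaving is fixed by the loop, every run of the simulation
produces the same order of events.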

	If it's a real application wanting real performance, the suggestions I made 
stand -- you don't want many more threads working than you have CPUs and you 
don't want a lot of threads sitting around waiting for work (and thus forcing 
bazillions of extra context switches).

	It sounds to me like your design is broken, needlessly mapping threads to 
I/Os that are being waited for one-to-one. This is a common error among 
programmers who consciously or subconsciously have accepted the 'more threads 
can do more work' philosophy.

	What you need to do is take whatever it is you're thinking of as a 'thread' 
right now, which I'd roughly define as 'one logical task, from start to 
completion' and realize that there is absolutely no reason to map this 
one-to-one to actual pthreads threads and every reason in the world not to.

	This will conserve resources (12 thread stacks instead of 300, 12 KSEs 
instead of 300), reduce context switches (context switches will only occur 
when there's no work to do at all or a thread uses up its entire timeslice 
rather than every time we change which client/task we're doing work for/on), 
improve scheduler efficiency (because the number of ready threads will not 
exceed the number of CPUs by much) and more often than not, clean up a lot of 
ugliness in your architecture (because threads are probably being used 
instead of a sane abstraction for 'work to be done' or 'a client I'm doing 
work for').

	DS




* Re: Developing multi-threading applications
  2002-06-13 11:58         ` David Schwartz
@ 2002-06-13 16:26           ` Roberto Fichera
  2002-06-14 20:56             ` David Schwartz
  0 siblings, 1 reply; 19+ messages in thread
From: Roberto Fichera @ 2002-06-13 16:26 UTC (permalink / raw)
  To: David Schwartz; +Cc: linux-kernel

At 04.58 13/06/02 -0700, David Schwartz wrote:

>         If it's a simulation, you don't *really* need the threads, you 
> just need to
>be able to act as if you had them. After all, what are you simulating if what
>work gets done when is up to the random vagaries of the OS scheduler?
>
>         If it's a real application wanting real performance, the 
> suggestions I made
>stand -- you don't want many more threads working than you have CPUs and you
>don't want a lot of threads sitting around waiting for work (and thus forcing
>bazillions of extra context switches).

This is a scheduler problem! All threads waiting for I/O are blocked by
the scheduler, and this has no impact on context switches; it only
lengthens the wait queue. Using Ingo's O(1) scheduler (a big piece of
code), for example, should make a big difference.

>         It sounds to me like your design is broken, needlessly mapping 
> threads to
>I/Os that are being waited for one-to-one. This is a common error among
>programmers who consciously or subconsciously have accepted the 'more threads
>can do more work' philosophy.

I don't think "more threads == more work done"! With the threaded approach it's
possible to split a big sequential program into a number of concurrent logical
programs, which is a big win for code revisions and new implementations.

>         What you need to do is take whatever it is you're thinking of as 
> a 'thread'
>right now, which I'd roughly define as 'one logical task, from start to
>completion' and realize that there is absolutely no reason to map this
>one-to-one to actual pthreads threads and every reason in the world not to.
>
>         This will conserve resources (12 thread stacks instead of 300, 12 
> KSEs
>instead of 300), reduce context switches (context switches will only occur
>when there's no work to do at all or a thread uses up its entire timeslice
>rather than every time we change which client/task we're doing work for/on),
>improve scheduler efficiency (because the number of ready threads will not
>exceed the number of CPUs by much) and more often than not, clean up a lot of
>ugliness in your architecture (because threads are probably being used
>instead of a sane abstraction for 'work to be done' or 'a client I'm doing
>work for').

You are right! But it depends on the application! If you have to do I/O such
as signal acquisition, sensor acquisition and so on, you must have one thread
for each type of data acquisition, and you must have a thread that performs
some computation on, for example, a subset of this data and generates output
that could be new input for another thread. This makes the environment more
realistic. I agree with you that if we increase the number of threads the
system could collapse (= context switches become expensive = we must add CPUs,
add another box, or take a new approach).


Roberto Fichera.



* Re: Developing multi-threading applications
  2002-06-13 16:26           ` Roberto Fichera
@ 2002-06-14 20:56             ` David Schwartz
  2002-06-15  9:01               ` Roberto Fichera
  0 siblings, 1 reply; 19+ messages in thread
From: David Schwartz @ 2002-06-14 20:56 UTC (permalink / raw)
  To: kernel; +Cc: linux-kernel


On Thu, 13 Jun 2002 18:26:54 +0200, Roberto Fichera wrote:
>At 04.58 13/06/02 -0700, David Schwartz wrote:

>This is a scheduler problem! All threads waiting for I/O are blocked by
>the scheduler, and this doesn't have any impact for the context switches
>it increase only the waitqueue, using the Ingo's O(1) scheduler, a big piece
>of code, it should make a big difference for example.

	You are incorrect. If you have ten threads each waiting for an I/O and all 
ten I/Os are ready, then ten context switches are needed. If you have one 
thread waiting for ten I/Os, and the ten I/Os become ready, one context switch 
is needed.
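
A minimal sketch of the second case, in Python for brevity and assuming a Unix
system with select.poll(). Ten socket pairs stand in for the ten I/Os (an
assumption for illustration); all of them are made ready first, and a single
poll() call then reports every one of them to the one waiting thread.

```python
import select
import socket

pairs = [socket.socketpair() for _ in range(10)]
poller = select.poll()
for reader, _ in pairs:
    poller.register(reader.fileno(), select.POLLIN)

# Make all ten "I/Os" ready before the waiter looks at them.
for _, writer in pairs:
    writer.send(b"x")

# One blocking call (timeout in milliseconds) returns every ready descriptor.
ready = poller.poll(1000)
ready_fds = {fd for fd, events in ready if events & select.POLLIN}

for reader, writer in pairs:
    reader.close()
    writer.close()
```

The waiting thread is woken once and finds all ten descriptors in ready_fds,
rather than ten separate threads each being woken for one descriptor.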

[snip]

>I don't think "more threads == more work done"! With the thread's approch
>it's
>possible to split a big sequential program in a variety of concurrent 
>logical
>programs with a big win for code revisions and new implementation.

	I'm not advising eliminating the threads approach. I'm only advising not 
using threads as your abstraction for clients or work to be done. Use threads 
as the execution vehicles that pick up work when there's work to be done. 
(Think thread pools, think separating I/O from computation.)

[snip]
>You are right! But depend by the application! If you have todo I/O like
>signal acquisition,
>sensors acquisitions and so on, you must have a one thread for each type of
>data acquisition,

	Even if that's true, and it's often not, how many different types of data 
acquisition can you have? Ten? Twenty? That's a far cry from 300.

	DS




* Re: Developing multi-threading applications
  2002-06-14 20:56             ` David Schwartz
@ 2002-06-15  9:01               ` Roberto Fichera
  2002-06-15 10:30                 ` Ingo Oeser
  0 siblings, 1 reply; 19+ messages in thread
From: Roberto Fichera @ 2002-06-15  9:01 UTC (permalink / raw)
  To: David Schwartz; +Cc: linux-kernel

At 13.56 14/06/02 -0700, David Schwartz wrote:


>On Thu, 13 Jun 2002 18:26:54 +0200, Roberto Fichera wrote:
> >At 04.58 13/06/02 -0700, David Schwartz wrote:
>
> >This is a scheduler problem! All threads waiting for I/O are blocked by
> >the scheduler, and this doesn't have any impact for the context switches
> >it increase only the waitqueue, using the Ingo's O(1) scheduler, a big piece
> >of code, it should make a big difference for example.
>
>         You are incorrect. If you have ten threads each waiting for an 
> I/O and all
>ten I/Os are ready, then ten context switches are needed. If you have one
>thread waiting for ten I/Os, and then I/Os come ready, one context switch is
>needed.

You are right in this specific case, but it always depends on what kind of I/O
has to be done. Not every case can be reduced to your logic, only specific
ones. It's only a "local" optimization.

>[snip]
>
> >I don't think "more threads == more work done"! With the thread's approch
> >it's
> >possible to split a big sequential program in a variety of concurrent
> >logical
> >programs with a big win for code revisions and new implementation.
>
>         I'm not advising eliminating the threads approach. I'm only 
> advising not
>using threads as your abstraction for clients or work to be done. Use threads
>as the execution vehicles that pick up work when there's work to be done.
>(Think thread pools, think separating I/O from computation.)

Yes! This is what I want!

>[snip]
> >You are right! But depend by the application! If you have todo I/O like
> >signal acquisition,
> >sensors acquisitions and so on, you must have a one thread for each type of
> >data acquisition,
>
>         Even if that's true, and it's often not, how many different types 
> of data
>acquisition can you have? Ten? Twenty? That's a far cry from 300.

Currently there are 190! About 110 are always active! So if we separate I/O
from computation, we double the number of threads.

Roberto Fichera.



* Re: Developing multi-threading applications
  2002-06-15  9:01               ` Roberto Fichera
@ 2002-06-15 10:30                 ` Ingo Oeser
  2002-06-17  8:17                   ` Roberto Fichera
  0 siblings, 1 reply; 19+ messages in thread
From: Ingo Oeser @ 2002-06-15 10:30 UTC (permalink / raw)
  To: Roberto Fichera; +Cc: David Schwartz, linux-kernel

On Sat, Jun 15, 2002 at 11:01:44AM +0200, Roberto Fichera wrote:
> >         Even if that's true, and it's often not, how many different types 
> > of data
> >acquisition can you have? Ten? Twenty? That's a far cry from 300.
> 
> Currently are 190! Always active are ~110! So thinking by separating I/O from
> the computation we double the threads.

So basically you are just traversing your data dependency graph
the wrong way. Do a level-order traversal if it is a dependency forest,
or a breadth-first traversal if not.

If a node requires I/O -> schedule the I/O, tell the upper level that
you want to be woken when the I/O has finished, and return to it.

If a node requires computation -> do it if this CPU is the one with the
lowest load; otherwise schedule it on the CPU with the lowest load.

Continue with the next node.

("Load" here means "number of computations, measured with the same
metric, scheduled on this thread".)

Use only one thread per CPU. Try to concentrate the I/O waiting in as few
places as possible (a single poll would be perfect).
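
A minimal sketch of the compute side of this scheme, in Python for brevity.
It groups a dependency graph into levels and gives each node to the worker
with the lowest load so far. The functions level_order() and assign(), the
sample graph and the cost numbers are all hypothetical; I/O nodes and wakeups
are left out.

```python
def level_order(deps):
    """Group the nodes of a dependency DAG into levels (Kahn's algorithm).

    deps maps each node to the set of nodes it depends on.
    """
    remaining = {node: set(d) for node, d in deps.items()}
    levels = []
    while remaining:
        ready = sorted(n for n, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle")
        levels.append(ready)
        for n in ready:
            del remaining[n]
        for d in remaining.values():
            d.difference_update(ready)
    return levels

def assign(levels, cost, n_workers):
    """Give each node to the worker (one per CPU) with the lowest load so far."""
    load = [0] * n_workers
    placement = {}
    for level in levels:
        for node in level:
            w = load.index(min(load))
            placement[node] = w
            load[w] += cost[node]
    return placement, load

deps = {"in1": set(), "in2": set(), "mix": {"in1", "in2"}, "out": {"mix"}}
cost = {"in1": 3, "in2": 1, "mix": 2, "out": 1}
levels = level_order(deps)
placement, load = assign(levels, cost, n_workers=2)
# levels    -> [["in1", "in2"], ["mix"], ["out"]]
# placement -> {"in1": 0, "in2": 1, "mix": 1, "out": 0}
# load      -> [4, 3]
```

The load here is just the summed cost per worker, which matches the "same
metric" idea above as long as the costs are comparable.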


So this is all doable once you analyze your data dependency
graph properly and make the simulation data-driven (which it
usually is).

Regards

Ingo Oeser
-- 
Science is what we can tell a computer. Art is everything else. --- D.E.Knuth


* Re: Developing multi-threading applications
  2002-06-15 10:30                 ` Ingo Oeser
@ 2002-06-17  8:17                   ` Roberto Fichera
  2002-06-17 16:07                     ` Marco Colombo
  0 siblings, 1 reply; 19+ messages in thread
From: Roberto Fichera @ 2002-06-17  8:17 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: David Schwartz, linux-kernel

At 12.30 15/06/02 +0200, Ingo Oeser wrote:

>On Sat, Jun 15, 2002 at 11:01:44AM +0200, Roberto Fichera wrote:
> > >         Even if that's true, and it's often not, how many different types
> > > of data
> > >acquisition can you have? Ten? Twenty? That's a far cry from 300.
> >
> > Currently are 190! Always active are ~110! So thinking by separating I/O from
> > the computation we double the threads.
>
>So basically you are just traversing your data depedency graph
>wrongly. Do a level order traversion if it is a dependency forest
>or an breadth first traversion if not.

OK! I've simplified too much ;-)!

>If this node require IO -> schedule the IO and return back to the upper
>level noticing it, that you like to be woken, if the IO is
>finished.
>
>If this node require Computation -> do it, if this CPU is the one with
>lowest load, else schedule it for the CPU with lowest load.

How can I do that? Shouldn't it be the kernel's problem? I could collect
the various patches floating around that implement CPU binding/affinity
for processes and CPU load balancing, but how can I determine which CPU
has the lowest load at a given time?

>Continue with next node.
>
>(load is meant "number of compuations with same metric scheduled
>on this thread")
>
>Use only one thread per CPU. Try to make the IO-Waiting as unique
>as possible (poll would be perfect).

This could be implemented with process affinity, binding the
process to a CPU. But I still don't understand why
I must have only one thread per CPU. Is there a URL
where I can see some kernel/sched/vm/I-O/other graphs on
this point?


>So this is all doable, once you analyze your data dependency
>graph properly and make the simulation data driven (which it
>usally is).
>
>Regards
>
>Ingo Oeser
>--
>Science is what we can tell a computer. Art is everything else. --- D.E.Knuth

Roberto Fichera.



* Re: Developing multi-threading applications
  2002-06-17  8:17                   ` Roberto Fichera
@ 2002-06-17 16:07                     ` Marco Colombo
  2002-06-17 18:00                       ` Roberto Fichera
  2002-06-17 18:55                       ` Jakob Oestergaard
  0 siblings, 2 replies; 19+ messages in thread
From: Marco Colombo @ 2002-06-17 16:07 UTC (permalink / raw)
  To: Roberto Fichera; +Cc: Ingo Oeser, David Schwartz, linux-kernel

On Mon, 17 Jun 2002, Roberto Fichera wrote:

[...]
> process to a CPU. But I continue to not hunderstand why
> I must have only one thread per CPU. There is some URL
> where can I see some kernel/sched/vm/I-O/other-think graph about
> this point ?

To put it simply, because you have only one PC (program counter) per CPU.
It's not really an OS thing.

Every time you save the PC (and SP, and the rest of the "thread context")
you're "emulating" more CPUs on just one. And what you get is just...
an emulation. A thread is an execution abstraction, and a CPU is an
execution actor. It sounds sensible to match the two. Use functions instead
to group instructions by their (functional) meaning.

It makes much more sense, on a 4-way system, to have 4 rather complex
threads that are able to execute different functions, as in
a data-driven or event-driven model, than to run 400 simpler threads
that implement one function each, IMHO.

.TM.




* Re: Developing multi-threading applications
  2002-06-17 16:07                     ` Marco Colombo
@ 2002-06-17 18:00                       ` Roberto Fichera
  2002-06-17 18:55                       ` Jakob Oestergaard
  1 sibling, 0 replies; 19+ messages in thread
From: Roberto Fichera @ 2002-06-17 18:00 UTC (permalink / raw)
  To: Marco Colombo; +Cc: Ingo Oeser, David Schwartz, linux-kernel

At 18.07 17/06/02 +0200, Marco Colombo wrote:

>On Mon, 17 Jun 2002, Roberto Fichera wrote:
>
>[...]
> > process to a CPU. But I still don't understand why
> > I must have only one thread per CPU. Is there a URL
> > where I can see some kernel/sched/vm/I-O/other graphs about
> > this point?
>
>To put it simply, because you have only one PC per CPU. It's not
>really an OS thing.
>
>Every time you save the PC (and SP, and all the "thread context")
>you're "emulating" more CPUs on just one. And what you get is just...
>an emulation. A thread is an execution abstraction, and a CPU is an
>execution actor. It sounds sensible to match the two. Use functions instead
>to group instructions by their (functional) meaning.

Yes! I know ;-)!

>It makes much more sense, on a 4-way system, to have 4 rather complex
>threads that are able to execute different functions, as in
>a data-driven or event-driven model, than to run 400 simpler threads
>which implement one function each, IMHO.

To keep it simple, I'll try both solutions!


>.TM.

Roberto Fichera.



* Re: Developing multi-threading applications
  2002-06-17 16:07                     ` Marco Colombo
  2002-06-17 18:00                       ` Roberto Fichera
@ 2002-06-17 18:55                       ` Jakob Oestergaard
  1 sibling, 0 replies; 19+ messages in thread
From: Jakob Oestergaard @ 2002-06-17 18:55 UTC (permalink / raw)
  To: Marco Colombo; +Cc: Roberto Fichera, Ingo Oeser, David Schwartz, linux-kernel

On Mon, Jun 17, 2002 at 06:07:51PM +0200, Marco Colombo wrote:
> On Mon, 17 Jun 2002, Roberto Fichera wrote:
> 
> [...]
> > process to a CPU. But I still don't understand why
> > I must have only one thread per CPU. Is there a URL
> > where I can see some kernel/sched/vm/I-O/other graphs about
> > this point?
> 
> To put it simply, because you have only one PC per CPU. It's not
> really an OS thing.
> 
> Every time you save the PC (and SP, and all the "thread context")
> you're "emulating" more CPUs on just one. And what you get is just...
> an emulation. A thread is an execution abstraction, and a CPU is an
> execution actor. It sounds sensible to match the two. Use functions instead
> to group instructions by their (functional) meaning.

It is common to use many threads per processor on some operating
systems. But this is (in my experience) because of the lack of proper
non-blocking APIs on said OS.

You can emulate non-blocking APIs with threads and a blocking API. And
on some systems you simply have to.

On GNU/Linux this is not generally a problem.  And as Marco said, you
really shouldn't have to do that.
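
As a rough illustration of what a non-blocking API buys you, here is a
sketch that assumes the standard POSIX poll() and fcntl() calls: a single
thread watches several descriptors at once instead of parking one blocked
thread on each. The helpers set_nonblocking and wait_readable, and the
MAX_SOURCES limit, are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

#define MAX_SOURCES 16 /* assumption: small fixed upper bound for the sketch */

/* Put a descriptor into non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Wait up to timeout_ms for any of fds[0..n-1] to become readable.
 * Returns the index of the first readable descriptor,
 * or -1 on timeout or error. */
int wait_readable(const int *fds, int n, int timeout_ms)
{
    struct pollfd pfds[MAX_SOURCES];

    assert(n > 0 && n <= MAX_SOURCES);
    for (int i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    if (poll(pfds, (nfds_t)n, timeout_ms) <= 0)
        return -1;

    for (int i = 0; i < n; i++)
        if (pfds[i].revents & POLLIN)
            return i;
    return -1;
}
```

A worker would call wait_readable() in a loop, read from whichever
descriptor is ready, and go straight back to computing, so the number of
threads no longer has to track the number of I/O sources.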

-- 
................................................................
:   jakob@unthought.net   : And I see the elder races,         :
:.........................: putrid forms of man                :
:   Jakob Østergaard      : See him rise and claim the earth,  :
:        OZ9ABN           : his downfall is at hand.           :
:.........................:............{Konkhra}...............:


* Re: Developing multi-threading applications
       [not found] <20020613113158.I22429@nightmaster.csn.tu-chemnitz.de>
@ 2002-06-13 10:25 ` Roberto Fichera
  0 siblings, 0 replies; 19+ messages in thread
From: Roberto Fichera @ 2002-06-13 10:25 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: linux-kernel

At 11.31 13/06/02 +0200, Ingo Oeser wrote:

>On Thu, Jun 13, 2002 at 11:08:27AM +0200, Roberto Fichera wrote:
> > You are right! But "computationally intensive" is not quite what I
> > meant ;-), because most of the threads are waiting for I/O. Once the I/O
> > completes, each thread runs its computationally intensive task and sends
> > its result to its parent thread. The parent collects the results of all
> > its children, does some computation of its own, and sends its result to
> > its own parent, and so on, with many parent threads controlling other
> > children. So I think the main problem/overhead is thread creation and
> > the number of threads.
>
>So you are creating a simulation/emulation application/engine, right?
>Or a measured data analysis engine? (which is basically the same
>task)

Yes! It's a simulation/emulation application.

>For this kind of task, creating your own kind of "threads" is
>probably better.
>
>Split it in the following data structure:
>
>struct my_thread {
>    actor_function_t actor;
>    input_t inbuf;
>    output_t outbuf;
>    state_t statebuf;
>};
>
>And provide rules and primitives for accessing inbuf/outbuf, if
>they might be shared (which is probable).

This can be a solution.
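
To make that concrete, here is one possible way to flesh out the struct,
offered as a sketch only. The field names come from the struct above, but
the concrete buffer types, BUF_CAP, sum_actor, my_thread_push and
my_thread_step are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define BUF_CAP 8 /* assumption: small fixed-size buffers for illustration */

/* Hypothetical concrete types for the fields named in the struct above. */
typedef struct { double v[BUF_CAP]; size_t len; } input_t;
typedef struct { double v[BUF_CAP]; size_t len; } output_t;
typedef struct { unsigned long runs; } state_t;

struct my_thread;
typedef void (*actor_function_t)(struct my_thread *self);

struct my_thread {
    actor_function_t actor;
    input_t  inbuf;
    output_t outbuf;
    state_t  statebuf;
};

/* Example actor: sums its inputs into a single output value.
 * Its cost depends only on inbuf.len, not on the values stored. */
void sum_actor(struct my_thread *self)
{
    double total = 0.0;

    for (size_t i = 0; i < self->inbuf.len; i++)
        total += self->inbuf.v[i];
    self->outbuf.v[0] = total;
    self->outbuf.len = 1;
    self->statebuf.runs++;
}

/* Access primitive: append one value to inbuf.
 * Returns 0 on success, -1 if the buffer is full. */
int my_thread_push(struct my_thread *t, double value)
{
    assert(t != NULL);
    if (t->inbuf.len == BUF_CAP)
        return -1;
    t->inbuf.v[t->inbuf.len++] = value;
    return 0;
}

/* Run the actor once on whatever is in inbuf, then empty inbuf. */
void my_thread_step(struct my_thread *t)
{
    assert(t != NULL && t->actor != NULL);
    t->actor(t);
    t->inbuf.len = 0;
}
```

This version is not thread-safe. If inbuf or outbuf is shared between
worker threads, the push and step primitives would need a lock or a
hand-off rule around them.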


>Now you can easily build a dependency tree/graph for the whole thing
>and schedule work items at the same level onto real worker
>threads (possibly on different machines), one per CPU.
>
>The problem is to build the actor as a REAL primitive, whose cost
>scales only with the size of inbuf and not with its contents.

Yes!
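
A small sketch of the "same level" idea, under the assumption that the
dependency graph is acyclic and each node lists the nodes it depends on.
Nodes that end up on the same level do not depend on each other, so they
can be handed to the per-CPU workers together. struct node,
compute_levels, MAX_NODES and MAX_DEPS are hypothetical names for this
example.

```c
#include <assert.h>

#define MAX_NODES 16 /* assumption: small fixed limits for the sketch */
#define MAX_DEPS  4

struct node {
    int ndeps;
    int deps[MAX_DEPS]; /* indices of nodes whose output this node consumes */
};

/* Compute level[i] = 0 for nodes without dependencies, otherwise
 * 1 + the highest level among the node's dependencies.
 * Returns the number of levels, or -1 if the graph contains a cycle. */
int compute_levels(const struct node *nodes, int n, int *level)
{
    int done = 0;
    int nlevels = 0;

    assert(n >= 0 && n <= MAX_NODES);
    for (int i = 0; i < n; i++)
        level[i] = -1; /* -1 means "not placed yet" */

    while (done < n) {
        int placed = 0;

        for (int i = 0; i < n; i++) {
            int lvl = 0;
            int ready = 1;

            if (level[i] != -1)
                continue;
            for (int d = 0; d < nodes[i].ndeps; d++) {
                int dl = level[nodes[i].deps[d]];

                if (dl == -1) { /* a dependency is not placed yet */
                    ready = 0;
                    break;
                }
                if (dl + 1 > lvl)
                    lvl = dl + 1;
            }
            if (!ready)
                continue;
            level[i] = lvl;
            if (lvl + 1 > nlevels)
                nlevels = lvl + 1;
            placed++;
        }
        if (placed == 0)
            return -1; /* no progress in a full pass: there is a cycle */
        done += placed;
    }
    return nlevels;
}
```

A scheduler would then run all nodes of level 0, wait for them to finish,
run level 1, and so on. This is a simple barrier-style schedule; a finer
one could start a node as soon as its own dependencies are done.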

>Everything else is going to be bloated and not really scalable,
>but can be implemented by every "Joe Programmer" after finishing
>high school ;-)

That depends on the threading library, and on whether it is entirely in
userspace or not! With this many threads that are not purely userspace,
the scheduler's performance and characteristics matter a lot. I would
prefer a mixed solution, for example, because some problems are easily
solved with userspace threads and others are not.


>Regards
>
>Ingo Oeser
>--
>Science is what we can tell a computer. Art is everything else. --- D.E.Knuth

Roberto Fichera.



end of thread, other threads:[~2002-06-17 18:55 UTC | newest]

Thread overview: 19+ messages
2002-06-13  8:13 Developing multi-threading applications Roberto Fichera
2002-06-13  8:26 ` David Schwartz
2002-06-13  9:08   ` Roberto Fichera
2002-06-13  9:44     ` Peter Wächtler
2002-06-13  9:52       ` Roberto Fichera
2002-06-13 10:16         ` Peter Wächtler
2002-06-13 10:42           ` Roberto Fichera
2002-06-13 10:13     ` David Schwartz
2002-06-13 11:21       ` Roberto Fichera
2002-06-13 11:58         ` David Schwartz
2002-06-13 16:26           ` Roberto Fichera
2002-06-14 20:56             ` David Schwartz
2002-06-15  9:01               ` Roberto Fichera
2002-06-15 10:30                 ` Ingo Oeser
2002-06-17  8:17                   ` Roberto Fichera
2002-06-17 16:07                     ` Marco Colombo
2002-06-17 18:00                       ` Roberto Fichera
2002-06-17 18:55                       ` Jakob Oestergaard
     [not found] <20020613113158.I22429@nightmaster.csn.tu-chemnitz.de>
2002-06-13 10:25 ` Roberto Fichera
