linux-kernel.vger.kernel.org archive mirror
* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
@ 2002-09-22 18:55 Peter Waechtler
  2002-09-22 21:32 ` Larry McVoy
                   ` (2 more replies)
  0 siblings, 3 replies; 60+ messages in thread
From: Peter Waechtler @ 2002-09-22 18:55 UTC (permalink / raw)
  To: linux-kernel; +Cc: ingo Molnar

 > The true cost of M:N shows up when threading is actually used
 > for what it's intended to be used :-)

 > M:N's big mistake is that it concentrates on what
 > matters the least: user-user context switches.

Well, from the perspective of the kernel, userspace is a black box.
Is that also true for kernel developers?

If you, as an application engineer, decide to use a multithreaded
design, it could be that a) you want to learn, or b) you have some
good reasons to choose it.

Having multiple threads doing real work, including IO, means more
blocking IO and therefore more context switches. One reason to
choose threading is to _not_ have to use select/poll in app code.
If you gather more IO requests and multiplex them with select/poll,
the chances are higher that the syscall returns without a context
switch. Therefore you _save_ some real context switches by replacing
them with user-user context switches.
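
To see why, picture how an M:N library can service many blocked user
threads with a single syscall and then dispatch between them in user
space. A minimal sketch (hypothetical code, not NGPT's - the uthread
structure and resume hook are invented for illustration):

    #include <poll.h>
    #include <stdio.h>

    #define NTHREADS 3

    /* one user thread, parked until its fd becomes readable */
    struct uthread {
        int fd;                           /* fd it is "blocked" on */
        void (*resume)(struct uthread *); /* user-space switch */
    };

    static void on_ready(struct uthread *t)
    {
        printf("resuming user thread blocked on fd %d\n", t->fd);
    }

    /* one poll() covers all blocked user threads; the switches
       that follow never re-enter the kernel */
    static void schedule(struct uthread *t, int n)
    {
        struct pollfd pfd[NTHREADS];
        int i;

        for (i = 0; i < n; i++) {
            pfd[i].fd = t[i].fd;
            pfd[i].events = POLLIN;
        }
        if (poll(pfd, n, -1) > 0)
            for (i = 0; i < n; i++)
                if (pfd[i].revents & POLLIN)
                    t[i].resume(&t[i]);
    }

    int main(void)
    {
        /* three user threads all waiting on stdin, for simplicity */
        struct uthread t[NTHREADS] = {
            { 0, on_ready }, { 0, on_ready }, { 0, on_ready },
        };

        schedule(t, NTHREADS);
        return 0;
    }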

Don't make the mistake of thinking too much about the optimal case.
(As Linus told us: optimize for the _common_ case. :)

You think that one should have an almost equal number of threads
and processors. That is unrealistic: apart from some server apps
running on 4- or 8-way systems, with this assumption nobody would
write a multithreaded desktop app (>90% of desktops are UP).

The effect of M:N on UP systems should be even clearer. Your
multithreaded apps can't profit from parallelism, but they do not
add load to the system scheduler. The drawback: more syscalls
(I am thinking about removing the need for
flags=fcntl(fd,F_GETFL); fcntl(fd,F_SETFL,O_NONBLOCK); write(fd,...);
fcntl(fd,F_SETFL,flags))
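
Spelled out, that dance costs four syscalls per write (a sketch,
error handling omitted):

    #include <fcntl.h>
    #include <unistd.h>

    ssize_t write_nonblocking(int fd, const void *buf, size_t len)
    {
        int flags = fcntl(fd, F_GETFL);          /* syscall 1: save    */
        ssize_t n;

        fcntl(fd, F_SETFL, flags | O_NONBLOCK);  /* syscall 2: set     */
        n = write(fd, buf, len);                 /* syscall 3: returns
                                                    EAGAIN instead of
                                                    blocking */
        fcntl(fd, F_SETFL, flags);               /* syscall 4: restore */
        return n;
    }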

Until we have some numbers we can't say which approach is better.
I'm convinced that apps exist that run better on one and others
on the other.

AIX and Irix deploy M:N - I guess for a good reason: it's more
flexible and combines both approaches, with easy runtime tuning if
the app happens to run on SMP (the uncommon case).

Your great work on the scheduler and the tuning of exit are highly
appreciated. Both models profit from it - 1:1 much more, of course.



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-22 18:55 [ANNOUNCE] Native POSIX Thread Library 0.1 Peter Waechtler
@ 2002-09-22 21:32 ` Larry McVoy
  2002-09-23 10:05   ` Bill Davidsen
  2002-09-23 21:03 ` Bill Huey
  2002-09-24 20:19 ` [ANNOUNCE] Native POSIX Thread Library 0.1 David Schwartz
  2 siblings, 1 reply; 60+ messages in thread
From: Larry McVoy @ 2002-09-22 21:32 UTC (permalink / raw)
  To: Peter Waechtler; +Cc: linux-kernel, ingo Molnar

On Sun, Sep 22, 2002 at 08:55:39PM +0200, Peter Waechtler wrote:
> AIX and Irix deploy M:N - I guess for a good reason: it's more
> flexible and combines both approaches, with easy runtime tuning if
> the app happens to run on SMP (the uncommon case).

No, AIX and IRIX do it that way because their processes are so bloated
that it would be unthinkable to do a 1:1 model.

Instead of taking the traditional "we've screwed up the normal system 
primitives so we'll invent new lightweight ones" try this:

We depend on the system primitives to not be broken or slow.

If that's a true statement, and in Linux it tends to be far more true
than other operating systems, then there is no reason to have M:N.
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 


* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-22 21:32 ` Larry McVoy
@ 2002-09-23 10:05   ` Bill Davidsen
  2002-09-23 11:55     ` Peter Waechtler
                       ` (2 more replies)
  0 siblings, 3 replies; 60+ messages in thread
From: Bill Davidsen @ 2002-09-23 10:05 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Peter Waechtler, linux-kernel, ingo Molnar

On Sun, 22 Sep 2002, Larry McVoy wrote:

> On Sun, Sep 22, 2002 at 08:55:39PM +0200, Peter Waechtler wrote:
> > AIX and Irix deploy M:N - I guess for a good reason: it's more
> > flexible and combines both approaches, with easy runtime tuning if
> > the app happens to run on SMP (the uncommon case).
> 
> No, AIX and IRIX do it that way because their processes are so bloated
> that it would be unthinkable to do a 1:1 model.

And BSD? And Solaris?
 
> Instead of taking the traditional "we've screwed up the normal system 
> primitives so we'll invent new lightweight ones" try this:
> 
> We depend on the system primitives to not be broken or slow.
> 
> If that's a true statement, and in Linux it tends to be far more true
> than other operating systems, then there is no reason to have M:N.

No matter how fast you do a context switch into and out of the kernel and
a scheduling pass to see what runs next, it can't be done as fast as it can
be avoided. Being M:N doesn't mean all implementations must be faster, just
that doing it all in user mode CAN be faster.

Benchmarks are nice, but I await results from a loaded production threaded
DNS/mail/web/news/database server. Well, I guess production and 2.5 don't
really go together, do they, but maybe some experimental site could
use 2.5 long enough to get numbers. If you could get a threaded database
to run, that would be a good test of shared resources rather than a bunch
of independent activities doing I/O.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 10:05   ` Bill Davidsen
@ 2002-09-23 11:55     ` Peter Waechtler
  2002-09-23 19:14       ` Bill Davidsen
  2002-09-23 15:30     ` Larry McVoy
  2002-09-23 21:22     ` Bill Huey
  2 siblings, 1 reply; 60+ messages in thread
From: Peter Waechtler @ 2002-09-23 11:55 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Larry McVoy, linux-kernel, ingo Molnar

On Monday, 23 September 2002, at 12:05, Bill Davidsen wrote:

> On Sun, 22 Sep 2002, Larry McVoy wrote:
>
>> On Sun, Sep 22, 2002 at 08:55:39PM +0200, Peter Waechtler wrote:
>>> AIX and Irix deploy M:N - I guess for a good reason: it's more
>>> flexible and combines both approaches, with easy runtime tuning if
>>> the app happens to run on SMP (the uncommon case).
>>
>> No, AIX and IRIX do it that way because their processes are so bloated
>> that it would be unthinkable to do a 1:1 model.
>
> And BSD? And Solaris?

Don't know. I don't have access to all those Unices. I could try FreeBSD.

According to http://www.kegel.com/c10k.html, Sun is moving to 1:1
and FreeBSD still believes in M:N.

Mac OS X 10.1 does not support PROCESS_SHARED locks; I tried that
5 minutes ago.
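
A probe along these lines (a guess at the test; not necessarily what
was actually run) shows whether an implementation accepts
process-shared mutexes:

    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        pthread_mutexattr_t attr;
        int err;

        pthread_mutexattr_init(&attr);
        /* ask for a mutex usable across process boundaries */
        err = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        printf("PROCESS_SHARED mutexes: %s\n",
               err ? strerror(err) : "supported");
        pthread_mutexattr_destroy(&attr);
        return 0;
    }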



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 10:05   ` Bill Davidsen
  2002-09-23 11:55     ` Peter Waechtler
@ 2002-09-23 15:30     ` Larry McVoy
  2002-09-23 19:44       ` Olivier Galibert
                         ` (5 more replies)
  2002-09-23 21:22     ` Bill Huey
  2 siblings, 6 replies; 60+ messages in thread
From: Larry McVoy @ 2002-09-23 15:30 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Larry McVoy, Peter Waechtler, linux-kernel, ingo Molnar

> > Instead of taking the traditional "we've screwed up the normal system 
> > primitives so we'll invent new lightweight ones" try this:
> > 
> > We depend on the system primitives to not be broken or slow.
> > 
> > If that's a true statement, and in Linux it tends to be far more true
> > than other operating systems, then there is no reason to have M:N.
> 
> No matter how fast you do a context switch into and out of the kernel and
> a scheduling pass to see what runs next, it can't be done as fast as it
> can be avoided.

You are arguing about how many angels can dance on the head of a pin.
Sure, there are lots of benchmarks which show how fast user-level threads
can context switch amongst each other, and it is always faster than going
into the kernel.  So what?  What do you think causes a context switch in
a threaded program?  What?  Could it be blocking on I/O?  Like 99.999%
of the time?  And doesn't that mean you already went into the kernel to
see if the I/O was ready?  And doesn't that mean that in all the real
world applications they are already doing all the work you are arguing
to avoid?
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 


* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 11:55     ` Peter Waechtler
@ 2002-09-23 19:14       ` Bill Davidsen
  2002-09-29 23:26         ` Buddy Lumpkin
  0 siblings, 1 reply; 60+ messages in thread
From: Bill Davidsen @ 2002-09-23 19:14 UTC (permalink / raw)
  To: Peter Waechtler; +Cc: Larry McVoy, linux-kernel, ingo Molnar

On Mon, 23 Sep 2002, Peter Waechtler wrote:

> On Monday, 23 September 2002, at 12:05, Bill Davidsen wrote:
> 
> > On Sun, 22 Sep 2002, Larry McVoy wrote:
> >
> >> On Sun, Sep 22, 2002 at 08:55:39PM +0200, Peter Waechtler wrote:
> >>> AIX and Irix deploy M:N - I guess for a good reason: it's more
> >>> flexible and combines both approaches, with easy runtime tuning if
> >>> the app happens to run on SMP (the uncommon case).
> >>
> >> No, AIX and IRIX do it that way because their processes are so bloated
> >> that it would be unthinkable to do a 1:1 model.
> >
> > And BSD? And Solaris?
> 
> Don't know. I don't have access to all those Unices. I could try FreeBSD.

At your convenience.
 
> According to http://www.kegel.com/c10k.html, Sun is moving to 1:1
> and FreeBSD still believes in M:N.

Sun is total news to me; "moving to" may mean Solaris 9, Sol8 seems to
still be M:N. BSD is as I thought.
> 
> Mac OS X 10.1 does not support PROCESS_SHARED locks; I tried that
> 5 minutes ago.

Thank you for the effort. Hmm, that's a bit of a surprise, at least to me.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 15:30     ` Larry McVoy
@ 2002-09-23 19:44       ` Olivier Galibert
  2002-09-23 19:48       ` Bill Davidsen
                         ` (4 subsequent siblings)
  5 siblings, 0 replies; 60+ messages in thread
From: Olivier Galibert @ 2002-09-23 19:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: Larry McVoy, Bill Davidsen, Larry McVoy, Peter Waechtler, ingo Molnar

On Mon, Sep 23, 2002 at 08:30:04AM -0700, Larry McVoy wrote:
> What do you think causes a context switch in
> a threaded program?  What?  Could it be blocking on I/O?  Like 99.999%
> of the time?  And doesn't that mean you already went into the kernel to
> see if the I/O was ready?  And doesn't that mean that in all the real
> world applications they are already doing all the work you are arguing
> to avoid?

I suspect a fair number of cases are preemption too, when you fire up
computation threads in the background.  Of course, the preemption
event always goes through the kernel at some point, even if it's only
a SIGALRM.

Actually, in normal programs (even Java ones), _when_ does a thread
voluntarily give up control?  Locks?

  OG.



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 15:30     ` Larry McVoy
  2002-09-23 19:44       ` Olivier Galibert
@ 2002-09-23 19:48       ` Bill Davidsen
  2002-09-23 20:32         ` Ingo Molnar
  2002-09-23 22:35         ` Mark Mielke
  2002-09-23 19:59       ` Peter Waechtler
                         ` (3 subsequent siblings)
  5 siblings, 2 replies; 60+ messages in thread
From: Bill Davidsen @ 2002-09-23 19:48 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Peter Waechtler, linux-kernel, ingo Molnar

On Mon, 23 Sep 2002, Larry McVoy wrote:

> > No matter how fast you do a context switch into and out of the kernel and
> > a scheduling pass to see what runs next, it can't be done as fast as it
> > can be avoided.
> 
> You are arguing about how many angels can dance on the head of a pin.

Then you have sadly misunderstood the discussion.

> Sure, there are lots of benchmarks which show how fast user-level threads
> can context switch amongst each other, and it is always faster than going
> into the kernel.  So what?  What do you think causes a context switch in
> a threaded program?  What?  Could it be blocking on I/O?  Like 99.999%
> of the time?  And doesn't that mean you already went into the kernel to
> see if the I/O was ready?  And doesn't that mean that in all the real
> world applications they are already doing all the work you are arguing
> to avoid?

Actually you have it just backward. Let me try to explain how this works.
The programs which benefit from M:N are exactly those which don't behave
the way you describe. Think of programs using locking to access shared
memory, or other fast resources which don't require a visit to the kernel.
It would seem that the switch could be done much faster without the
transition into and out of the kernel.
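
For a feel of the kernel-free switch being described, here is a minimal
sketch using the portable ucontext API (note that glibc's swapcontext()
itself makes a sigprocmask syscall to save the signal mask, which is why
real M:N libraries hand-roll the switch in assembly):

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t uctx_sched, uctx_worker;

    static void worker(void)
    {
        printf("worker: running\n");
        /* "block": swap registers and stack back to the scheduler */
        swapcontext(&uctx_worker, &uctx_sched);
        printf("worker: resumed\n");
    }

    int main(void)
    {
        static char stack[64 * 1024];

        getcontext(&uctx_worker);
        uctx_worker.uc_stack.ss_sp = stack;
        uctx_worker.uc_stack.ss_size = sizeof(stack);
        uctx_worker.uc_link = &uctx_sched;  /* return here at exit */
        makecontext(&uctx_worker, worker, 0);

        swapcontext(&uctx_sched, &uctx_worker);  /* dispatch */
        swapcontext(&uctx_sched, &uctx_worker);  /* resume   */
        printf("scheduler: worker done\n");
        return 0;
    }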

Looking for data before forming an opinion has always seemed to be
reasonable, and the way design decisions are usually made in Linux, based
on the performance of actual code. The benchmark numbers reported are
encouraging, but actual production loads may not show the huge improvement
seen in the benchmarks. And I don't think anyone is implying that they
will.

Given how small the overhead of threading is on a typical I/O-bound
application such as you mentioned, I'm not sure the improvement will be
above the noise. The major improvement from NGPT is not performance in
many cases, but elimination of unexpected application behaviour.

When someone responds to a technical question with an attack on the
question instead of a technical response I always wonder why. In this case
other people have provided technical feedback and I'm sure we will see
some actual application numbers in a short time. I have an IPC benchmark
I'd like to try if I could get any of my test servers to boot a recent
kernel :-(

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 15:30     ` Larry McVoy
  2002-09-23 19:44       ` Olivier Galibert
  2002-09-23 19:48       ` Bill Davidsen
@ 2002-09-23 19:59       ` Peter Waechtler
  2002-09-23 20:36         ` Ingo Molnar
  2002-09-23 21:32       ` Bill Huey
                         ` (2 subsequent siblings)
  5 siblings, 1 reply; 60+ messages in thread
From: Peter Waechtler @ 2002-09-23 19:59 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Bill Davidsen, linux-kernel, ingo Molnar


On Monday, 23 September 2002, at 17:30, Larry McVoy wrote:

>>> Instead of taking the traditional "we've screwed up the normal system
>>> primitives so we'll invent new lightweight ones" try this:
>>>
>>> We depend on the system primitives to not be broken or slow.
>>>
>>> If that's a true statement, and in Linux it tends to be far more true
>>> than other operating systems, then there is no reason to have M:N.
>>
>> No matter how fast you do a context switch into and out of the kernel
>> and a scheduling pass to see what runs next, it can't be done as fast
>> as it can be avoided.
>
> You are arguing about how many angels can dance on the head of a pin.
> Sure, there are lots of benchmarks which show how fast user-level threads
> can context switch amongst each other, and it is always faster than going
> into the kernel.  So what?  What do you think causes a context switch in
> a threaded program?  What?  Could it be blocking on I/O?  Like 99.999%
> of the time?  And doesn't that mean you already went into the kernel to
> see if the I/O was ready?  And doesn't that mean that in all the real
> world applications they are already doing all the work you are arguing
> to avoid?

Getting into the kernel is not the same as a context switch.
Returning EAGAIN or EWOULDBLOCK is definitely _not_ causing a context switch.

Is sys_getpid() causing a context switch? Unlikely.
Do you know what blocking IO means?  M:N is about avoiding blocking IO!



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 19:48       ` Bill Davidsen
@ 2002-09-23 20:32         ` Ingo Molnar
  2002-09-24  0:03           ` Andy Isaacson
  2002-09-24  7:12           ` Thunder from the hill
  2002-09-23 22:35         ` Mark Mielke
  1 sibling, 2 replies; 60+ messages in thread
From: Ingo Molnar @ 2002-09-23 20:32 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Larry McVoy, Peter Waechtler, linux-kernel, Ingo Molnar


On Mon, 23 Sep 2002, Bill Davidsen wrote:

> The programs which benefit from M:N are exactly those which don't behave
> the way you describe. [...]

90% of the programs that matter behave exactly like Larry has described.
IO is the main source of blocking. Go and profile a busy webserver or
mailserver or database server yourself if you don't believe it.

> [...] Think of programs using locking to access shared memory, or other
> fast resources which don't require a visit to the kernel. [...]

oh - actually, such things are quite rare, it turns out. And even if it
happens, the 1:1 model handles this perfectly fine via futexes, as
long as the contention on the shared resource is light. Which it had
better be ...
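
The fast path never enters the kernel at all. A sketch of the idea (a
hypothetical mutex in the style of the futex work, written with GCC
atomic builtins; error handling omitted):

    #include <linux/futex.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* lock word: 0 = free, 1 = locked, 2 = locked with waiters */
    static long sys_futex(int *uaddr, int op, int val)
    {
        return syscall(SYS_futex, uaddr, op, val, 0, 0, 0);
    }

    static void futex_lock(int *w)
    {
        /* fast path: one atomic op, no kernel entry */
        if (__sync_val_compare_and_swap(w, 0, 1) == 0)
            return;
        /* slow path: mark contended and sleep in the kernel */
        while (__sync_lock_test_and_set(w, 2) != 0)
            sys_futex(w, FUTEX_WAIT, 2);
    }

    static void futex_unlock(int *w)
    {
        /* enter the kernel only if someone may be sleeping */
        if (__sync_fetch_and_sub(w, 1) != 1) {
            *w = 0;
            sys_futex(w, FUTEX_WAKE, 1);
        }
    }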

any application with heavy contention over some global shared resource is
serializing itself already and has much bigger problems than that of the
threading model ... its performance will be bad under both the M:N and 1:1
models - think about it.

so a threading abstraction must concentrate on what really matters:
performing actual useful tasks - most of those tasks involve the use of
some resource - block IO, network IO, user IO - and each of them involves
entry into the kernel - at which point the 1:1 design fits much better.

(and all your followup arguments are void due to this basic
misunderstanding.)

	Ingo



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 19:59       ` Peter Waechtler
@ 2002-09-23 20:36         ` Ingo Molnar
  2002-09-23 21:08           ` Peter Wächtler
  2002-09-23 23:57           ` Andy Isaacson
  0 siblings, 2 replies; 60+ messages in thread
From: Ingo Molnar @ 2002-09-23 20:36 UTC (permalink / raw)
  To: Peter Waechtler; +Cc: Larry McVoy, Bill Davidsen, linux-kernel, ingo Molnar


On Mon, 23 Sep 2002, Peter Waechtler wrote:

> Getting into the kernel is not the same as a context switch. Returning
> EAGAIN or EWOULDBLOCK is definitely _not_ causing a context switch.

this is a common misunderstanding. When switching from thread to thread in
the 1:1 model, most of the cost comes from entering/exiting the kernel. So
*once* we are in the kernel, the cheapest way is not to bounce back to
userspace to do some userspace context switch - but to do it right in the
kernel.

in the kernel we can make much higher quality scheduling decisions than in
userspace. SMP affinity and various statistics are readily available in
kernel-space - userspace does not have any of that. Not to mention
preemption.

	Ingo



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-22 18:55 [ANNOUNCE] Native POSIX Thread Library 0.1 Peter Waechtler
  2002-09-22 21:32 ` Larry McVoy
@ 2002-09-23 21:03 ` Bill Huey
  2002-09-24 12:03   ` Michael Sinz
  2002-09-24 20:19 ` [ANNOUNCE] Native POSIX Thread Library 0.1 David Schwartz
  2 siblings, 1 reply; 60+ messages in thread
From: Bill Huey @ 2002-09-23 21:03 UTC (permalink / raw)
  To: Peter Waechtler; +Cc: linux-kernel, ingo Molnar, Bill Huey (Hui)

On Sun, Sep 22, 2002 at 08:55:39PM +0200, Peter Waechtler wrote:
> AIX and Irix deploy M:N - I guess for a good reason: it's more
> flexible and combines both approaches, with easy runtime tuning if
> the app happens to run on SMP (the uncommon case).

Also, for process-scoped scheduling, so that system-wide threads
don't have an impact on a process's timeslice. Folks have piped up about
that being important.

bill



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 20:36         ` Ingo Molnar
@ 2002-09-23 21:08           ` Peter Wächtler
  2002-09-23 22:44             ` Mark Mielke
  2002-09-23 23:57           ` Andy Isaacson
  1 sibling, 1 reply; 60+ messages in thread
From: Peter Wächtler @ 2002-09-23 21:08 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Larry McVoy, Bill Davidsen, linux-kernel

Ingo Molnar wrote:
> 
> On Mon, 23 Sep 2002, Peter Waechtler wrote:
> 
> > Getting into the kernel is not the same as a context switch. Returning
> > EAGAIN or EWOULDBLOCK is definitely _not_ causing a context switch.
> 
> this is a common misunderstanding. When switching from thread to thread in
> the 1:1 model, most of the cost comes from entering/exiting the kernel. So
> *once* we are in the kernel, the cheapest way is not to bounce back to
> userspace to do some userspace context switch - but to do it right in the
> kernel.
> 
> in the kernel we can make much higher quality scheduling decisions than in
> userspace. SMP affinity and various statistics are readily available in
> kernel-space - userspace does not have any of that. Not to mention
> preemption.
> 

I'm already almost convinced of the NPTL way of doing threading.
But still: the timeslice is per process (and kernel thread).
You still have other processes running.
With 1:1, on "hitting" a blocking condition the kernel will
switch to a different beast (yes, a thread gets a bonus for
using the same MM and the same CPU).
But under M:N the "user process" makes some more progress in its
timeslice (does it even get punished for eating up its
timeslice?). I would think that it tends to cause fewer context
switches but tends to do more syscalls :-(

I already had a closer look at NGPT before reading Ulrich's
comments on the phil-list and on his website. I already thought
"phew, that's a complicated beast", and then I saw the
fcntl(F_GETFL); fcntl(O_NONBLOCK); write(); fcntl(oldflags); thingy...

Well, with an O(1) scheduler and faster thread creation and exit,
NPTL has a good chance of performing faster.

Now I'm just curious about the context switch time argument.
Is Linux really that much faster than Solaris, Irix, etc.?

Do you have numbers (or a hint) on comparable (ideally: identical)
hardware? Is LMbench a good starting point?


* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 10:05   ` Bill Davidsen
  2002-09-23 11:55     ` Peter Waechtler
  2002-09-23 15:30     ` Larry McVoy
@ 2002-09-23 21:22     ` Bill Huey
  2 siblings, 0 replies; 60+ messages in thread
From: Bill Huey @ 2002-09-23 21:22 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Larry McVoy, Peter Waechtler, linux-kernel, ingo Molnar, Bill Huey (Hui)

On Mon, Sep 23, 2002 at 06:05:18AM -0400, Bill Davidsen wrote:
> And BSD? And Solaris?

A buddy of mine and I ran lmbench on NetBSD 1.6 (ppc) and a recent version
of Linux (same machine) and found that NetBSD was about 2x slower than
Linux at context switching. That's really not that bad, considering that it
was worse at one point. It might affect things like inter-process pipe
communication performance, but it's not outside of reasonability to use
a 1:1 system in that case.

BTW, NetBSD is moving to a scheduler activations threading system and
they have some preliminary stuff in the works and working. ;)

> > If that's a true statement, and in Linux it tends to be far more true
> > than other operating systems, then there is no reason to have M:N.
> 
> No matter how fast you do a context switch into and out of the kernel and
> a scheduling pass to see what runs next, it can't be done as fast as it
> can be avoided. Being M:N doesn't mean all implementations must be faster,
> just that doing it all in user mode CAN be faster.

Unless you have a broken architecture like the x86. The FPU in that case
can be problematic, and folks were playing around with adding a syscall
to query the status of the FPU. Then things might be more even, but...
it is also unclear how these variables are going to play out.

> Benchmarks are nice, but I await results from a loaded production threaded
> DNS/mail/web/news/database server. Well, I guess production and 2.5 don't
> really go together, do they, but maybe some experimental site could
> use 2.5 long enough to get numbers. If you could get a threaded database
> to run, that would be a good test of shared resources rather than a bunch
> of independent activities doing I/O.

I think that would be interesting too.

bill



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 15:30     ` Larry McVoy
                         ` (2 preceding siblings ...)
  2002-09-23 19:59       ` Peter Waechtler
@ 2002-09-23 21:32       ` Bill Huey
  2002-09-23 21:41       ` dean gaudet
  2002-09-24 10:02       ` Nikita Danilov
  5 siblings, 0 replies; 60+ messages in thread
From: Bill Huey @ 2002-09-23 21:32 UTC (permalink / raw)
  To: Larry McVoy
  Cc: Larry McVoy, Bill Davidsen, Peter Waechtler, linux-kernel,
	ingo Molnar, Bill Huey (Hui)

On Mon, Sep 23, 2002 at 08:30:04AM -0700, Larry McVoy wrote:
> > No matter how fast you do a context switch into and out of the kernel and
> > a scheduling pass to see what runs next, it can't be done as fast as it
> > can be avoided.
> 
> You are arguing about how many angels can dance on the head of a pin.
> Sure, there are lots of benchmarks which show how fast user-level threads
> can context switch amongst each other, and it is always faster than going
> into the kernel.  So what?  What do you think causes a context switch in
> a threaded program?  What?  Could it be blocking on I/O?  Like 99.999%

That's just for traditional Unix applications, which are only one category.
You exclude CPU-intensive applications in that criticism, media-related
and otherwise. What about cases where you need to balance a large data
structure across a large number of threads, or something like that?

> of the time?  And doesn't that mean you already went into the kernel to
> see if the I/O was ready?  And doesn't that mean that in all the real
> world applications they are already doing all the work you are arguing
> to avoid?

IO isn't the only thing that's event driven. What about event-driven
systems that depend on a fast condition variable? That's very cheap in
a UTS (userspace thread system): 2 context switches, a call to the
thread-kernel to dequeue a waiter, and releasing/acquiring some very
lightweight userspace locks. And difficult to beat if you think about it.

So that level of confidence in 1:1 is intuitively presumptuous for those
reasons.

But if your architecture is broken or exotic... then it gets more complicated ;)

bill



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 15:30     ` Larry McVoy
                         ` (3 preceding siblings ...)
  2002-09-23 21:32       ` Bill Huey
@ 2002-09-23 21:41       ` dean gaudet
  2002-09-23 22:10         ` Bill Huey
  2002-09-23 22:56         ` Mark Mielke
  2002-09-24 10:02       ` Nikita Danilov
  5 siblings, 2 replies; 60+ messages in thread
From: dean gaudet @ 2002-09-23 21:41 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Bill Davidsen, Peter Waechtler, linux-kernel, ingo Molnar

On Mon, 23 Sep 2002, Larry McVoy wrote:

> What do you think causes a context switch in
> a threaded program?  What?  Could it be blocking on I/O?

unfortunately java was originally designed with a thread-per-connection
model as the *only* method of implementing servers.  there wasn't a
non-blocking network API ... and i hear that such an API is in the works,
but i've no idea where it is yet.

so while this is I/O, it's certainly less efficient to have thousands of
tasks blocked in read(2) versus having thousands of entries in <pick your
favourite poll/select/etc. mechanism>.

this is a java problem though... i posted a jvm straw-man proposal years
ago when IBM posted some "linux threading isn't efficient" paper.  since
java threads are way less painful to implement than pthreads, i suggested
the jvm do the M part of M:N.

-dean



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 21:41       ` dean gaudet
@ 2002-09-23 22:10         ` Bill Huey
  2002-09-23 22:56         ` Mark Mielke
  1 sibling, 0 replies; 60+ messages in thread
From: Bill Huey @ 2002-09-23 22:10 UTC (permalink / raw)
  To: dean gaudet
  Cc: Larry McVoy, Bill Davidsen, Peter Waechtler, linux-kernel,
	ingo Molnar, Bill Huey (Hui)

On Mon, Sep 23, 2002 at 02:41:33PM -0700, dean gaudet wrote:
> so while this is I/O, it's certainly less efficient to have thousands of
> tasks blocked in read(2) versus having thousands of entries in <pick your
> favourite poll/select/etc. mechanism>.

NIO in the recent 1.4 J2SE solves this problem now and threads don't have to
be abused.

bill



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 19:48       ` Bill Davidsen
  2002-09-23 20:32         ` Ingo Molnar
@ 2002-09-23 22:35         ` Mark Mielke
  1 sibling, 0 replies; 60+ messages in thread
From: Mark Mielke @ 2002-09-23 22:35 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Larry McVoy, Peter Waechtler, linux-kernel, ingo Molnar

On Mon, Sep 23, 2002 at 03:48:58PM -0400, Bill Davidsen wrote:
> On Mon, 23 Sep 2002, Larry McVoy wrote:
> > Sure, there are lots of benchmarks which show how fast user-level threads
> > can context switch amongst each other, and it is always faster than going
> > into the kernel.  So what?  What do you think causes a context switch in
> > a threaded program?  What?  Could it be blocking on I/O?  Like 99.999%
> > of the time?  And doesn't that mean you already went into the kernel to
> > see if the I/O was ready?  And doesn't that mean that in all the real
> > world applications they are already doing all the work you are arguing
> > to avoid?
> Actually you have it just backward. Let me try to explain how this works.
> The programs which benefit from M:N are exactly those which don't behave
> the way you describe. Think of programs using locking to access shared
> memory, or other fast resources which don't require a visit to the kernel.
> It would seem that the switch could be done much faster without the
> transition into and out of the kernel.

For operating systems that require cross-process locks to always be
kernel ops, yes. For operating systems that provide _any_ way for most
cross-process locks to be performed completely in user space (e.g. futexes),
the entire argument very quickly disappears.

Is there a situation you can think of that requires M:N threading
because accessing user space is cheaper than accessing kernel space?
What this really means is that the design of the kernel-space
primitives is not optimal, and that a potentially better solution, one
that would benefit more people by a great amount, would be to redesign
the kernel primitives (e.g. futexes).

> Looking for data before forming an opinion has always seemed to be
> reasonable, and the way design decisions are usually made in Linux, based
> on the performance of actual code. The benchmark numbers reports are
> encouraging, but actual production loads may not show the huge improvement
> seen in the benchmarks. And I don't think anyone is implying that they
> will.

You say that people should look to data before forming an opinion, but
you also say that benchmarks mean little and you *suspect* real loads may
be different. It seems to me that you might take your own advice, and
use 'real data' before reaching your own conclusion.

> Given how small the overhead of threading is on a typical i/o bound
> application such as you mentioned, I'm not sure the improvement will be
> above the noise. The major improvement from NGPT is not performance in
> many cases, but elimination of unexpected application behaviour.

Many people would argue that threading overhead has traditionally been
quite high. They would have 'real data' to substantiate their claims.

> When someone responds to a technical question with an attack on the
> question instead of a technical response I always wonder why. In this case
> other people have provided technical feedback and I'm sure we will see
> some actual application numbers in a short time. I have an IPC benchmark
> I'd like to try if I could get any of my test servers to boot a recent
> kernel :-(

I've always considered 1:1 to be an optimal model, but an unreachable
model, like cold fusion. :-)

If the kernel can manage the tasks such that they can be very quickly
switched between queues, and the run queue can be minimized to
contain only tasks that need to run, or that have a very high
probability of needing to run, and if operations such as locks can be
done, at least in the common case, completely in user space, there
is no reason why 1:1 could not be better than M:N.

There _are_ reasons why OS threads could be better than user space
threads, and the reasons all relate to threads that do actual work.

The line between 1:1 and M:N is artificially bold. M:N is a necessity
where 1:1 is inefficient. Where 1:1 is efficient, M:N ceases to be a
necessity.

mark

-- 
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 21:08           ` Peter Wächtler
@ 2002-09-23 22:44             ` Mark Mielke
  2002-09-23 23:01               ` Bill Huey
  0 siblings, 1 reply; 60+ messages in thread
From: Mark Mielke @ 2002-09-23 22:44 UTC (permalink / raw)
  To: Peter Wächtler; +Cc: Ingo Molnar, Larry McVoy, Bill Davidsen, linux-kernel

On Mon, Sep 23, 2002 at 11:08:53PM +0200, Peter Wächtler wrote:
> With 1:1, on "hitting" a blocking condition the kernel will
> switch to a different beast (yes, a thread gets a bonus for
> using the same MM and the same CPU).

> But under M:N the "user process" makes some more progress in its
> timeslice (does it even get punished for eating up its
> timeslice?). I would think that it tends to cause fewer context
> switches but tends to do more syscalls :-(

Think of it this way... two threads are blocked on different resources...
The currently executing thread reaches a point where it blocks.

    OS threads:
        1) thread#1 invokes a system call
        2) OS switches tasks to thread#2 and returns from blocking

    user-space threads:
        1) thread#1 invokes a system call
        2) thread#1 returns from system call, EWOULDBLOCK
        3) thread#1 invokes poll(), select(), ioctl() to determine state
        4) thread#1 returns from system call
        5) thread#1 switches stack pointer to be thread#2 upon determination
           that the resource thread#2 was waiting on is ready.

Certainly the above descriptions are not fully accurate or complete,
and it is possible that M:N threading would make a fair compromise
between OS threads and user-space threads. However, if user-space threads
require all this extra work, and M:N threads require some extra work,
some less work, plus extra bookkeeping and system calls, why couldn't
OS threads by themselves be more efficient?

mark

-- 
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 21:41       ` dean gaudet
  2002-09-23 22:10         ` Bill Huey
@ 2002-09-23 22:56         ` Mark Mielke
  1 sibling, 0 replies; 60+ messages in thread
From: Mark Mielke @ 2002-09-23 22:56 UTC (permalink / raw)
  To: dean gaudet
  Cc: Larry McVoy, Bill Davidsen, Peter Waechtler, linux-kernel, ingo Molnar

On Mon, Sep 23, 2002 at 02:41:33PM -0700, dean gaudet wrote:
> so while this is I/O, it's certainly less efficient to have thousands of
> tasks blocked in read(2) versus having thousands of entries in <pick your
> favourite poll/select/etc. mechanism>.

In terms of kernel memory, perhaps. In terms of 'efficiency', I
wouldn't be so sure. Java uses a whack of user-space storage to
represent threads regardless of whether they are green or native.  The
only difference is: is Java calling poll()/select()/ioctl()
routinely, or are the tasks sitting in an efficient kernel task queue?

Which has a better chance of being more efficient, in terms of dispatching,
(especially considering that most of the time, most java threads are idle),
and which has a better chance of being more efficient in terms of the
overhead of querying whether a task is ready to run? I lean towards the OS
on both counts.

mark

-- 
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 22:44             ` Mark Mielke
@ 2002-09-23 23:01               ` Bill Huey
  2002-09-23 23:11                 ` Mark Mielke
  0 siblings, 1 reply; 60+ messages in thread
From: Bill Huey @ 2002-09-23 23:01 UTC (permalink / raw)
  To: Mark Mielke
  Cc: Peter Wächtler, Ingo Molnar, Larry McVoy, Bill Davidsen,
	linux-kernel, Bill Huey (Hui)

On Mon, Sep 23, 2002 at 06:44:23PM -0400, Mark Mielke wrote:
> Think of it this way... two threads are blocked on different resources...
> The currently executing thread reaches a point where it blocks.
> 
>     OS threads:
>         1) thread#1 invokes a system call
>         2) OS switches tasks to thread#2 and returns from blocking
> 
>     user-space threads:
>         1) thread#1 invokes a system call
>         2) thread#1 returns from system call, EWOULDBLOCK

>         3) thread#1 invokes poll(), select(), ioctl() to determine state
>         4) thread#1 returns from system call

More like the UTS blocks the thread and waits for an IO upcall to notify
it of the change of state in the kernel. It's equivalent to a signal in
overhead, something like a SIGIO or async IO notification.

Delete steps 3 and 4. It's certainly much faster than select() and family.

>         5) thread#1 switches stack pointer to be thread#2 upon determination
>            that the resource thread#2 was waiting on is ready.

Right, then marks it running and runs it.

> Certainly the above descriptions are not fully accurate or complete,
> and it is possible that M:N threading would make a fair compromise
> between OS threads and user-space threads. However, if user-space threads
> require all this extra work, and M:N threads require some extra work,
> some less work, plus extra bookkeeping and system calls, why couldn't
> OS threads by themselves be more efficient?

Crazy synchronization by non-web-server-like applications. Who knows. I
personally can't think up a really clear example at this time since I don't
do that kind of programming, but I'm sure concurrency experts can...

I'm just not one of those people.

bill



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 23:01               ` Bill Huey
@ 2002-09-23 23:11                 ` Mark Mielke
  2002-09-24  0:21                   ` Bill Huey
  0 siblings, 1 reply; 60+ messages in thread
From: Mark Mielke @ 2002-09-23 23:11 UTC (permalink / raw)
  To: Bill Huey
  Cc: Peter Wächtler, Ingo Molnar, Larry McVoy, Bill Davidsen, linux-kernel

On Mon, Sep 23, 2002 at 04:01:22PM -0700, Bill Huey wrote:
> On Mon, Sep 23, 2002 at 06:44:23PM -0400, Mark Mielke wrote:
> > Certainly the above descriptions are not fully accurate or complete,
> > and it is possible that M:N threading would make a fair compromise
> > between OS threads and user-space threads. However, if user-space threads
> > require all this extra work, and M:N threads require some extra work,
> > some less work, plus extra bookkeeping and system calls, why couldn't
> > OS threads by themselves be more efficient?
> Crazy synchronization by non-web-server-like applications. Who knows. I
> personally can't think up a really clear example at this time since I don't
> do that kind of programming, but I'm sure concurrency experts can...
> I'm just not one of those people.

I do not find it to be profitable to discourage the people working on
this project. If they fail, nobody loses. If they succeed, they can
re-invent the math behind threading, and Linux ends up at the forefront
of operating systems offering the technology.

As for 'crazy synchronization', solutions such as the FUTEX have no
real negative aspects. It wasn't long ago that the FUTEX did not
exist. Why couldn't innovation make 'crazy synchronization by
non-web-server like applications' more efficient using kernel threads?

Concurrency experts would welcome the change. Concurrent 'experts'
would not welcome the change, as it would force them to have to
re-learn everything they know, effectively obsoleting their 'expert'
status. (note the difference between the unquoted, and the quoted...)

Cheers, and good luck...
mark

-- 
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 20:36         ` Ingo Molnar
  2002-09-23 21:08           ` Peter Wächtler
@ 2002-09-23 23:57           ` Andy Isaacson
  2002-09-24  6:32             ` 1:1 threading vs. scheduler activations (was: Re: [ANNOUNCE] Native POSIX Thread Library 0.1) Ingo Molnar
  2002-09-24 18:10             ` [ANNOUNCE] Native POSIX Thread Library 0.1 Christoph Hellwig
  1 sibling, 2 replies; 60+ messages in thread
From: Andy Isaacson @ 2002-09-23 23:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Waechtler, Larry McVoy, Bill Davidsen, linux-kernel, ingo Molnar

I hate big CC lists like this, but I don't know that everyone will see
this if I don't keep the CC list.  Sigh.

On Mon, Sep 23, 2002 at 10:36:28PM +0200, Ingo Molnar wrote:
> On Mon, 23 Sep 2002, Peter Waechtler wrote:
> > Getting into the kernel is not the same as a context switch. Returning
> > EAGAIN or EWOULDBLOCK is definitely _not_ causing a context switch.
> 
> this is a common misunderstanding. When switching from thread to thread in
> the 1:1 model, most of the cost comes from entering/exiting the kernel. So
> *once* we are in the kernel, the cheapest way is not to bounce back to
> userspace to do some userspace context switch - but to do it right in the
> kernel.
> 
> in the kernel we can make much higher quality scheduling decisions than in
> userspace. SMP affinity and various statistics are readily available in
> kernel-space - userspace does not have any of that. Not to mention
> preemption.

Excellent points, Ingo.  An alternative that I haven't seen considered
is the M:N threading model that NetBSD is adopting, called Scheduler
Activations.  The paper makes excellent reading.

http://web.mit.edu/nathanw/www/usenix/freenix-sa/freenix-sa.html

One advantage of an SA-style system is that the kernel automatically and
very cleanly has a lot of information about the job as a single unit,
for purposes such as signal delivery, scheduling decisions, (and if it
came to that) paging/swapping.  The original Linus-dogma (as I
understood it -- I may well be misrepresenting things here) is that "a
thread is a process, and that's all there is to it".  This has a lovely
clarity, but it ignores the fact that there are times when it's
*important* that the kernel know that "these N threads belong to a
single job".  It appears that the NPTL work is creating a new
"collection-of-threads" object, which will fulfill the role I mention
above...  and this isn't a lot different from the end result of Nathan
Williams' SA work.

Another advantage of keeping a "process" concept is that things like CSA
(Compatible System Accounting, nee Cray System Accounting) need to add
some overhead to process startup/teardown.  If a "thread" can be created
without creating a new "process", this overhead is not needlessly
present at thread-startup time.

-andy


* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 20:32         ` Ingo Molnar
@ 2002-09-24  0:03           ` Andy Isaacson
  2002-09-24  0:10             ` Jeff Garzik
                               ` (2 more replies)
  2002-09-24  7:12           ` Thunder from the hill
  1 sibling, 3 replies; 60+ messages in thread
From: Andy Isaacson @ 2002-09-24  0:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Bill Davidsen, Larry McVoy, Peter Waechtler, linux-kernel, Ingo Molnar

On Mon, Sep 23, 2002 at 10:32:00PM +0200, Ingo Molnar wrote:
> On Mon, 23 Sep 2002, Bill Davidsen wrote:
> > The programs which benefit from M:N are exactly those which don't behave
> > the way you describe. [...]
> 
> 90% of the programs that matter behave exactly like Larry has described.
> IO is the main source of blocking. Go and profile a busy webserver or
> mailserver or database server yourself if you don't believe it.

There are heavily-threaded programs out there that do not behave this
way, and for which an M:N thread model is completely appropriate.  For
example, simulation codes in operations research are most naturally
implemented as one thread per object being simulated, with virtually no
IO outside the simulation.  The vast majority of the computation time in
such a simulation is spent doing small amounts of work local to the
thread, then sending small messages to another thread via a FIFO, then
going to sleep waiting for more work.

Of course this can be (and frequently is) implemented such that there is
not one Pthreads thread per object; given simulation environments with 1
million objects, and the current crappy state of Pthreads
implementations, the researchers have no choice.

-andy


* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24  0:03           ` Andy Isaacson
@ 2002-09-24  0:10             ` Jeff Garzik
  2002-09-24  0:14               ` Andy Isaacson
  2002-09-24  5:53             ` Ingo Molnar
  2002-09-24 20:34             ` David Schwartz
  2 siblings, 1 reply; 60+ messages in thread
From: Jeff Garzik @ 2002-09-24  0:10 UTC (permalink / raw)
  To: Andy Isaacson
  Cc: Ingo Molnar, Bill Davidsen, Larry McVoy, Peter Waechtler,
	linux-kernel, Ingo Molnar

Andy Isaacson wrote:
> Of course this can be (and frequently is) implemented such that there is
> not one Pthreads thread per object; given simulation environments with 1
> million objects, and the current crappy state of Pthreads
> implementations, the researchers have no choice.


Are these object threads mostly active or inactive?

Regardless, it seems obvious with today's hardware that 1 million
objects should never be one-thread-per-object, pthreads or no.  That's
just lazy programming.

	Jeff





* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24  0:10             ` Jeff Garzik
@ 2002-09-24  0:14               ` Andy Isaacson
  0 siblings, 0 replies; 60+ messages in thread
From: Andy Isaacson @ 2002-09-24  0:14 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel

On Mon, Sep 23, 2002 at 08:10:24PM -0400, Jeff Garzik wrote:
> Andy Isaacson wrote:
> > Of course this can be (and frequently is) implemented such that there is
> > not one Pthreads thread per object; given simulation environments with 1
> > million objects, and the current crappy state of Pthreads
> > implementations, the researchers have no choice.
> 
> Are these object threads mostly active or inactive?

Mostly inactive (waiting on a semaphore or FIFO).

> Regardless, it seems obvious with today's hardware, that 1 million 
> objects should never be one-thread-per-object, pthreads or no.  That's 
> just lazy programming.

You can call it lazy if you want, but I call it natural.

(Of course I realize that practical considerations prevent users from
creating 1 million kernel threads, or even user threads, today.
Unfortunate, that.)

-andy


* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 23:11                 ` Mark Mielke
@ 2002-09-24  0:21                   ` Bill Huey
  2002-09-24  3:20                     ` Mark Mielke
  0 siblings, 1 reply; 60+ messages in thread
From: Bill Huey @ 2002-09-24  0:21 UTC (permalink / raw)
  To: Mark Mielke
  Cc: Peter Wächtler, Ingo Molnar, Larry McVoy, Bill Davidsen,
	linux-kernel, Bill Huey (Hui)

On Mon, Sep 23, 2002 at 07:11:32PM -0400, Mark Mielke wrote:
> I do not find it to be profitable to discourage the people working on
> this project. If they fail, nobody loses. If they succeed, they can
> re-invent the math behind threading, and Linux ends up on the forefront
> of operating systems offering the technology.

Math, unlikely. Performance issues, maybe. Overall kernel technology,
highly unlikely, and bordering on a preposterous claim.

This forum, like anything else, is here to propose a new infrastructure
for something that's very important to the functioning of this operating
system. For this project to succeed, it must address possible problems
that various folks bring up in examining what's been proposed or built.
That's the role of these discussions.

> As for 'crazy synchronization', solutions such as the FUTEX have no
> real negative aspects. It wasn't long ago that the FUTEX did not
> exist. Why couldn't innovation make 'crazy synchronization by
> non-web-server like applications' more efficient using kernel threads?

To be blunt, I don't believe it. That comes from a technical point of view,
from my bias toward FreeBSD's scheduler-activations threading, and because
people are too easily dismissing M:N performance issues while reaching
conclusions about it that seem presumptuous.

The incorrect example where you outline what you think is an M:N call
conversion (traditional async wrappers instead of upcalls) is something
that I don't want to become a future technical strawman that folks in this
community create to attack M:N threading. It may very well still have
legitimacy, in the same way that part of the performance of the JVM depends
on accessibility to a thread's ucontext and run state, which seems to have
been an initial oversight (for unknown reasons) when this was originally
conceived.

Those kinds of things are what I'm most worried about: things that
eventually hurt what application folks are building on top of Linux and
its kernel facilities.

> Concurrency experts would welcome the change. Concurrent 'experts'
> would not welcome the change, as it would force them to have to
> re-learn everything they know, effectively obsoleting their 'expert'
> status. (note the difference between the unquoted, and the quoted...)

Well, what I mean by concurrency experts is that there can be specialized
applications where people must become experts in concurrency to solve
difficult problems that might not be known to this group at this time.
Dismissing that in the above paragraph doesn't negate that need.

The bottom line here is that ultimately the kernel is providing usable
primitives for application programmers. It should not be the scenario
where kernel folks just build something that's conceptually awkward and
then it's up to applications people to work around the bogus design
problems that result. That is what I meant by folks who have applications
that might push the limits of what the current synchronization model
offers.

That's the core of my rant and it took quite a while to write up. ;)

bill



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24  0:21                   ` Bill Huey
@ 2002-09-24  3:20                     ` Mark Mielke
  0 siblings, 0 replies; 60+ messages in thread
From: Mark Mielke @ 2002-09-24  3:20 UTC (permalink / raw)
  To: Bill Huey
  Cc: Peter Wächtler, Ingo Molnar, Larry McVoy, Bill Davidsen, linux-kernel

On Mon, Sep 23, 2002 at 05:21:35PM -0700, Bill Huey wrote:
> ...
> The incorrect example where you outline what you think is an M:N call
> conversion (traditional async wrappers instead of upcalls) is something
> that I don't want to become a future technical strawman that folks in this
> community create to attack M:N threading. It may very well still have
> legitimacy, in the same way that part of the performance of the JVM depends
> on accessibility to a thread's ucontext and run state, which seems to have
> been an initial oversight (for unknown reasons) when this was originally
> conceived.
> Those kinds of things are what I'm most worried about: things that
> eventually hurt what application folks are building on top of Linux and
> its kernel facilities.
> ...
> That's the core of my rant and it took quite a while to write up. ;)

My part in the rant (really somebody else's rant...) is that if kernel
threads can be made to out-perform current implementations of M:N
threading, then all that has really been proven is that current M:N
practices are not fully optimal. 1:1 in an N:N system is just one face
of M:N in an N:N system. A fully functional M:N system _may choose_ to
allow M to equal N.

The worst possible cases that I expect to see from people experimenting
with this stuff and having a 1:1 system that out-performs commonly
available M:N systems: 1) the M:N people innovate, potentially using
the new technology made available by the 1:1 people, making a
_better_ M:N system; 2) the 1:1 system is better, and people use it.

As long as they all use a POSIX, or other standard interface, there
isn't a problem.

If the changes to the kernel made by the 1:1 people are bad, they will
be stopped by Linus and many other people, probably including
yourself... :-)

In any case, I see the 1:1 vs. M:N debate as a distraction from the *actual*
enhancements being designed, which seem to be support for cheaper
kernel threads - something that benefits both parties.

mark

-- 
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24  0:03           ` Andy Isaacson
  2002-09-24  0:10             ` Jeff Garzik
@ 2002-09-24  5:53             ` Ingo Molnar
  2002-09-24 20:34             ` David Schwartz
  2 siblings, 0 replies; 60+ messages in thread
From: Ingo Molnar @ 2002-09-24  5:53 UTC (permalink / raw)
  To: Andy Isaacson; +Cc: Larry McVoy, linux-kernel


On Mon, 23 Sep 2002, Andy Isaacson wrote:

> > 90% of the programs that matter behave exactly like Larry has described.
> > IO is the main source of blocking. Go and profile a busy webserver or
> > mailserver or database server yourself if you don't believe it.
> 
> There are heavily-threaded programs out there that do not behave this
> way, and for which a N:M thread model is completely appropriate. [...]

of course, that's the other 10%. [or even smaller.] I never claimed M:N
cannot be viable for certain specific applications. But a generic
threading library should rather concentrate on the common 90% of the
applications.

(obviously for simulations the absolute fastest implementation would be a
pure userspace state-machine, not a threaded application - M:N or 1:1.)

	Ingo


^ permalink raw reply	[flat|nested] 60+ messages in thread

* 1:1 threading vs. scheduler activations (was: Re: [ANNOUNCE] Native POSIX Thread Library 0.1)
  2002-09-23 23:57           ` Andy Isaacson
@ 2002-09-24  6:32             ` Ingo Molnar
  2002-09-25  3:08               ` Bill Huey
  2002-09-24 18:10             ` [ANNOUNCE] Native POSIX Thread Library 0.1 Christoph Hellwig
  1 sibling, 1 reply; 60+ messages in thread
From: Ingo Molnar @ 2002-09-24  6:32 UTC (permalink / raw)
  To: Andy Isaacson
  Cc: Larry McVoy, Peter Wächtler, Bill Davidsen, linux-kernel


On Mon, 23 Sep 2002, Andy Isaacson wrote:

> Excellent points, Ingo.  An alternative that I haven't seen considered
> is the M:N threading model that NetBSD is adopting, called Scheduler
> Activations. [...]

yes, SA's (and KSA's) are an interesting concept, but i personally think
they are way too much complexity - and history has shown that complexity
never leads to anything good, especially not in OS design.

Eg. SA's, like every M:N concept, must have a userspace component of the
scheduler, which gets very funny when you try to implement all the things
the kernel scheduler has had for years: fairness, SMP balancing, RT
scheduling (!), preemption and more.

Eg. 2.0.2 NGPT's current userspace scheduler is still cooperative - a
single userspace thread burning CPU cycles monopolizes the full context.
Obviously this can be fixed, but it gets nastier and nastier as you add
the features - for no good reason, since the same functionality is already
present in the kernel's scheduler, which can generally make much better
scheduling decisions - it has direct and reliable access to various
statistics, it knows exactly how much CPU time has been used up. To give
all this information to the userspace scheduler takes a lot of effort. I'm
no wimp when it comes to scheduler complexity, but a coupled kernel/user
scheduling concept scares the *hit out of me.

And then i haven't even mentioned things like upcall costs - what's the
point in upcalling userspace, which then has to schedule, instead of doing
this stuff right in the kernel? Scheduler activations concentrate too much
on the 5% of cases that have more userspace<->userspace context switching
than some sort of kernel-provoked context switching. Sure, scheduler
activations can be done, but i cannot see how they can be any better than
'just as fast' as a 1:1 implementation - at a much higher complexity and
robustness cost.

the biggest part of Linux's kernel-space context switching cost is the
cost of kernel entry - and the cost of kernel entry gets cheaper with
every new generation of CPUs. Basing the whole threading design on
avoiding the kernel scheduler is like pitching your tent on a glacier on
a hot summer day.

Plus in an M:N model the whole development toolchain suddenly has to
understand the new set of contexts: debuggers, tracers, everything.

Plus there are other issues like security - it's perfectly reasonable in
the 1:1 model for a certain set of server threads to drop all privileges
to do the more dangerous stuff. (while there is no such thing as absolute
security and separation in a threaded app, dropping privileges can avoid
certain classes of exploits.)

generally the whole SA/M:N concept creaks under the huge change that is
introduced by having multiple userspace contexts of execution per single
kernel-space context of execution. Such a detaching of concepts, no
matter which kernel subsystem you look at, causes problems everywhere.

eg. the VM. There's no way you can get an 'upcall' from the VM that you
need to wait for free RAM - most of the related kernel code is simply not
ready and restartable. So VM load can end up blocking kernel contexts
without giving any chance to user contexts to be 'scheduled' by the
userspace scheduler. This happens exactly in the worst moment, when load
increases and stuff starts swapping.

and there are some things that i'm not at all sure can be fixed in any
reasonable way - eg. RT scheduling. [the userspace library would have to
raise/drop the priority of threads in the userspace scheduler, causing an
additional kernel entry/exit, eliminating even the theoretical advantage
it had for pure user<->user context switches.]

plus basic performance issues. If you have a healthy mix of userspace and
kernelspace scheduler activity then you've at least doubled your icache
footprint by having two schedulers - the dcache footprint is higher as
well. A *single* bad cachemiss on a P4 is already almost as expensive as a
kernel entry - and it's not like the growing gap between RAM access
latency and CPU performance will shrink in the future. And we aren't even
using SYSENTER/SYSEXIT in the Linux kernel yet, which will shave off
another 40% from the syscall entry (and kernel context switching) cost.
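
A minimal sketch for putting a number on the raw kernel-entry cost on a
given box (results vary wildly by CPU and kernel; SYS_getpid is called
directly to bypass glibc's PID caching):

#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/syscall.h>

int main(void)
{
	struct timeval t0, t1;
	long i, n = 10000000;
	double usecs;

	gettimeofday(&t0, NULL);
	for (i = 0; i < n; i++)
		syscall(SYS_getpid);	/* cheapest real kernel entry */
	gettimeofday(&t1, NULL);

	usecs = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
	printf("%.1f ns per syscall\n", usecs * 1000.0 / n);
	return 0;
}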

so my current take on threading models is: if you *can* do a really fast
and lightweight kernel-based 1:1 threading implementation then you have
won. Anything else is barely more than a workaround for (fixable)
architectural problems. Concentrate your execution abstraction in the
kernel and make it *really* fast and scalable - that will improve
everything else. OTOH any improvement to the userspace thread scheduler
only improves threaded applications - which are still the minority. Sure,
some of the above problems can be helped, but it's not trivial - and some
problems i don't think can be solved at all.

But we'll see, the FreeBSD folks i think are working on KSA's so we'll
know for sure in a couple of years.

	Ingo


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 20:32         ` Ingo Molnar
  2002-09-24  0:03           ` Andy Isaacson
@ 2002-09-24  7:12           ` Thunder from the hill
  2002-09-24  7:30             ` Ingo Molnar
  1 sibling, 1 reply; 60+ messages in thread
From: Thunder from the hill @ 2002-09-24  7:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Bill Davidsen, Larry McVoy, Peter Waechtler, linux-kernel, Ingo Molnar

Hi,

On Mon, 23 Sep 2002, Ingo Molnar wrote:
> 90% of the programs that matter behave exactly like Larry has described.
> IO is the main source of blocking. Go and profile a busy webserver or
> mailserver or database server yourself if you don't believe it.

Well, I guess Java Web Server behaves the same?

			Thunder
-- 
assert(typeof((fool)->next) == typeof(fool));	/* wrong */


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24  7:12           ` Thunder from the hill
@ 2002-09-24  7:30             ` Ingo Molnar
  0 siblings, 0 replies; 60+ messages in thread
From: Ingo Molnar @ 2002-09-24  7:30 UTC (permalink / raw)
  To: Thunder from the hill
  Cc: Bill Davidsen, Larry McVoy, Peter Waechtler, linux-kernel, Ingo Molnar


On Tue, 24 Sep 2002, Thunder from the hill wrote:

> > 90% of the programs that matter behave exactly like Larry has described.
> > IO is the main source of blocking. Go and profile a busy webserver or
> > mailserver or database server yourself if you don't believe it.
>
> Well, I guess Java Web Server behaves the same?

yes. The most common case is that it either blocks on the external network
connection (IO), or on some internal database connection (IO as well). The
JVMs themselves had better be well-threaded internally, with not much
contention on any internal lock. What the 1:1 model really does in the
case of internal synchronization is make 'bad parallelism' more visible:
when there's contention. It's quite rare that heavy synchronization and
heavy lock contention cannot be avoided, and it mostly involves simulation
projects - which often do this because they simulate real-world IO :-)

(but, all of this thread is becoming pretty theoretical - the current fact
is that the 1:1 library is more than 4 times faster than the only M:N
library we were able to run the test on, using the same kernel, on M:N's
'home turf'. So anyone who thinks the M:N library should perform faster
is welcome to improve it and send in results.)

	Ingo


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 15:30     ` Larry McVoy
                         ` (4 preceding siblings ...)
  2002-09-23 21:41       ` dean gaudet
@ 2002-09-24 10:02       ` Nikita Danilov
  5 siblings, 0 replies; 60+ messages in thread
From: Nikita Danilov @ 2002-09-24 10:02 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Bill Davidsen, Peter Waechtler, linux-kernel, ingo Molnar

Larry McVoy writes:
 > > > Instead of taking the traditional "we've screwed up the normal system 
 > > > primitives so we'll invent new lightweight ones" try this:
 > > > 
 > > > We depend on the system primitives to not be broken or slow.
 > > > 
 > > > If that's a true statement, and in Linux it tends to be far more true
 > > > than other operating systems, then there is no reason to have M:N.
 > > 
 > > No matter how fast you do context switch in and out of kernel and a sched
 > > to see what runs next, it can't be done as fast as it can be avoided.
 > 
 > You are arguing about how many angels can dance on the head of a pin.
 > Sure, there are lots of benchmarks which show how fast user level threads
 > can context switch amongst each other and it is always faster than going
 > into the kernel.  So what?  What do you think causes a context switch in
 > a threaded program?  What?  Could it be blocking on I/O?  Like 99.999%
 > of the time?  And doesn't that mean you already went into the kernel to
 > see if the I/O was ready?  And doesn't that mean that in all the real
 > world applications they are already doing all the work you are arguing
 > to avoid?

M:N threads are supposed to have other advantages besides fast context
switches. The original paper on scheduler activations mentions the case
where a kernel thread is preempted while the user-level thread it is
running holds a spin lock. When the kernel notifies the user-level
scheduler about the preemption (through an upcall), it can de-schedule
all user-level threads spinning on that lock, so that they will not
waste their time slices burning CPU.
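
For illustration, a minimal user-level spinlock sketch (a toy using
modern GCC atomic builtins, not any particular library's code):

static volatile int lock;

static void user_spin_lock(void)
{
	while (__sync_lock_test_and_set(&lock, 1))
		;	/* if the holder's kernel thread was preempted,
			 * this burns the spinner's whole timeslice;
			 * the upcall lets the user-level scheduler
			 * park these spinners instead */
}

static void user_spin_unlock(void)
{
	__sync_lock_release(&lock);
}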

 > -- 
 > ---
 > Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 

Nikita.

 > -

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 21:03 ` Bill Huey
@ 2002-09-24 12:03   ` Michael Sinz
  2002-09-24 13:40     ` Peter Svensson
  0 siblings, 1 reply; 60+ messages in thread
From: Michael Sinz @ 2002-09-24 12:03 UTC (permalink / raw)
  To: Bill Huey (Hui); +Cc: Peter Waechtler, linux-kernel, ingo Molnar

Bill Huey (Hui) wrote:
> On Sun, Sep 22, 2002 at 08:55:39PM +0200, Peter Waechtler wrote:
> 
>>AIX and Irix deploy M:N - I guess for a good reason: it's more
>>flexible and combine both approaches with easy runtime tuning if
>>the app happens to run on SMP (the uncommon case).
> 
> Also, for process-scoped scheduling, so that system-wide threads
> don't have an impact on a process's slice. Folks have piped up about
> that being important.

This is the one major complaint I have with the "a thread is a process"
implementation in Linux.  The scheduler does not take process vs. thread
into account.

A simple example:  Two users (or two different programs - same user)
are running.  Both could use all of the CPU resources (for whatever
reason).

One of the programs (A) has N threads (where N >> 1) and the other
program (B) has only 1 thread.  If M of the N threads in (A) are not
blocked (where M > 1), then (A) gets an M:1 CPU usage advantage
over (B).

This means that two processes/programs that should be scheduled equally
are not, and the one with many threads is "effectively" stealing cycles
from the other.

In a multi-user environment (a server with multiple processes), this
means you can just write your program with lots of threads to grab more
than your fair share of the CPU bandwidth.
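
A hypothetical demo of the effect (call it hog.c, compiled with
-lpthread): run it with an argument of 8 next to a single-threaded
CPU-bound program and compare the shares in top - under pure per-thread
scheduling the hog gets roughly 8/9 of the CPU:

#include <pthread.h>
#include <stdlib.h>

static void *burn(void *arg)
{
	(void)arg;
	for (;;)
		;	/* pure CPU burn, never blocks */
	return NULL;	/* not reached */
}

int main(int argc, char **argv)
{
	int i, n = (argc > 1) ? atoi(argv[1]) : 8;
	pthread_t t;

	for (i = 0; i < n; i++)
		pthread_create(&t, NULL, burn, NULL);
	pthread_join(t, NULL);	/* the threads never exit */
	return 0;
}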

A real-world (albeit not great) example from many years ago:

A program that does ray-tracing can very easily split the process up
into very small bits.  This is great on multi-processor systems as you
can get each CPU to do part of the work in parallel.  There is almost
no I/O involved in such a system other than initial load and final save.

It turned out that on non-dedicated (multi-user) systems you could
actually get your work done faster by having the program create many
threads (100, in this case) even though there was only one big CPU.
The reason was that the OS also did not (yet) understand process
scheduling fairness, and the student who did this had effectively
found a way around the fair scheduling of system resources.

The problem was very quickly noticed, as other students quickly learned
how to make use of such "solutions" to their performance wants.  We
fairly quickly had to add process-level accounting of thread CPU
usage, such that any thread in a process counted toward that process's
CPU usage/timeslice/etc.  It basically made the scheduler into a
2-stage device - much like user threads, but with the kernel doing
the work and with all of the benefits of kernel threads.  (And it did
not require any recompile, except that the people doing the
many-threads CPU-hog trick ended up having to revert, as it was now
slower than the single-thread-per-CPU code...)

Now, computer hardware has changed a lot.  Back then, a branch took
longer than today's kernel syscall overhead, and memory was faster
than the CPU.  The scheduler was complex, so I cannot claim it was as
efficient as the Linux kernel's.  But we did have real threads, and we
did quickly get real process accounting after someone "pointed out"
the problem of not doing so :-)

-- 
Michael Sinz -- Director, Systems Engineering -- Worldgate Communications
A master's secrets are only as good as
	the master's ability to explain them to others.



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24 12:03   ` Michael Sinz
@ 2002-09-24 13:40     ` Peter Svensson
  2002-09-24 14:20       ` Michael Sinz
  0 siblings, 1 reply; 60+ messages in thread
From: Peter Svensson @ 2002-09-24 13:40 UTC (permalink / raw)
  To: Michael Sinz; +Cc: Bill Huey (Hui), Peter Waechtler, linux-kernel, ingo Molnar

On Tue, 24 Sep 2002, Michael Sinz wrote:

> The problem was very quickly noticed, as other students quickly learned
> how to make use of such "solutions" to their performance wants.  We
> fairly quickly had to add process-level accounting of thread CPU
> usage, such that any thread in a process counted toward that process's
> CPU usage/timeslice/etc.  It basically made the scheduler into a
> 2-stage device - much like user threads, but with the kernel doing
> the work and with all of the benefits of kernel threads.  (And it did
> not require any recompile, except that the people doing the
> many-threads CPU-hog trick ended up having to revert, as it was now
> slower than the single-thread-per-CPU code...)

Then you can just as well use fork(2) and split into processes, with the
same result. The solution is not thread-specific; it is resource limits
and/or per-user CPU accounting.

Several raytracers can (could?) split the workload into multiple 
processes, some being started on other computers over rsh or similar.

Peter
--
Peter Svensson      ! Pgp key available by finger, fingerprint:
<petersv@psv.nu>    ! 8A E9 20 98 C1 FF 43 E3  07 FD B9 0A 80 72 70 AF
------------------------------------------------------------------------
Remember, Luke, your source will be with you... always...



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24 13:40     ` Peter Svensson
@ 2002-09-24 14:20       ` Michael Sinz
  2002-09-24 14:50         ` Offtopic: (was Re: [ANNOUNCE] Native POSIX Thread Library 0.1) Peter Svensson
  0 siblings, 1 reply; 60+ messages in thread
From: Michael Sinz @ 2002-09-24 14:20 UTC (permalink / raw)
  To: Peter Svensson
  Cc: Bill Huey (Hui), Peter Waechtler, linux-kernel, ingo Molnar

Peter Svensson wrote:
> On Tue, 24 Sep 2002, Michael Sinz wrote:
> 
> 
>>The problem was very quickly noticed, as other students quickly learned
>>how to make use of such "solutions" to their performance wants.  We
>>fairly quickly had to add process-level accounting of thread CPU
>>usage, such that any thread in a process counted toward that process's
>>CPU usage/timeslice/etc.  It basically made the scheduler into a
>>2-stage device - much like user threads, but with the kernel doing
>>the work and with all of the benefits of kernel threads.  (And it did
>>not require any recompile, except that the people doing the
>>many-threads CPU-hog trick ended up having to revert, as it was now
>>slower than the single-thread-per-CPU code...)
> 
> 
> Then you can just as well use fork(2) and split into processes with the 
> same result. The solution is not thread specific, it is resource limits 
> and/or per user cpu accounting. 

I understand that point - but the basic question is whether you schedule
based on the process or based on the thread.  On an interactive multi-
user system you may even want to go all the way out to the user level
(so that no user can hog the system by doing many things).  But that is
usually not the target of Linux systems (yet?)

The problem then is inter-process communication.  (At least on that
system - Linux has many better solutions.)  That system did not have
shared memory, and thus coordination between processes was difficult
at best.

> Several raytracers can (could?) split the workload into multiple 
> processes, some being started on other computers over rsh or similar.

And they exist - but the I/O overhead makes it "not a win" on a
single machine.  (It hurts too much)

-- 
Michael Sinz -- Director, Systems Engineering -- Worldgate Communications
A master's secrets are only as good as
	the master's ability to explain them to others.



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Offtopic: (was Re: [ANNOUNCE] Native POSIX Thread Library 0.1)
  2002-09-24 14:20       ` Michael Sinz
@ 2002-09-24 14:50         ` Peter Svensson
  2002-09-24 15:19           ` Mark Veltzer
  2002-09-24 16:31           ` Rik van Riel
  0 siblings, 2 replies; 60+ messages in thread
From: Peter Svensson @ 2002-09-24 14:50 UTC (permalink / raw)
  To: Michael Sinz; +Cc: Bill Huey (Hui), Peter Waechtler, linux-kernel, ingo Molnar

On Tue, 24 Sep 2002, Michael Sinz wrote:

> > Several raytracers can (could?) split the workload into multiple 
> > processes, some being started on other computers over rsh or similar.
> 
> And they exist - but the I/O overhead makes it "not a win" on a
> single machine.  (It hurts too much)

For raytracers (which was the example) you need almost no coordination at
all. Just partition the scene and you are done. This is going offtopic
fast. The point I was making is that there is really no great reward in
grouping threads. Either you need to educate your users and trust them to
behave, or you need per user scheduling.

Peter
--
Peter Svensson      ! Pgp key available by finger, fingerprint:
<petersv@psv.nu>    ! 8A E9 20 98 C1 FF 43 E3  07 FD B9 0A 80 72 70 AF
------------------------------------------------------------------------
Remember, Luke, your source will be with you... always...




^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: Offtopic: (was Re: [ANNOUNCE] Native POSIX Thread Library 0.1)
  2002-09-24 14:50         ` Offtopic: (was Re: [ANNOUNCE] Native POSIX Thread Library 0.1) Peter Svensson
@ 2002-09-24 15:19           ` Mark Veltzer
  2002-09-24 17:29             ` Rik van Riel
  2002-09-24 16:31           ` Rik van Riel
  1 sibling, 1 reply; 60+ messages in thread
From: Mark Veltzer @ 2002-09-24 15:19 UTC (permalink / raw)
  To: Peter Svensson, Linux kernel mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tuesday 24 September 2002 17:50, Peter Svensson wrote:
> Either you need to educate your users and trust them to
> behave, or you need per user scheduling.

It is obvious that on high-end systems you MUST have per-user scheduling,
since users will rob each other of cycles.... If Linux is to be a general
purpose operating system it MUST have this feature (otherwise it will only
be considered fit for lower-end systems), and trusting your users on this
is no better than trusting your users when they promise not to segfault
or peek into memory pages that are not theirs. It simply isn't done.
Besides, using the CPU in an abusive manner can happen as a result of a
bug as much as a result of malicious intent (exactly like a segfault).

Ok. Here's an idea. Why not have both?!?

There is no real reason why I should have per-user scheduling on my
machine at home (I don't really need a just division of labour between
the root user and myself, which are almost the only users on my system).
Why not have the default compilation of the kernel be without per-user
scheduling, and enable it for high-end systems (like a university
machine where all the students are at each other's throats for a few
CPU cycles...)? So how about making this a compile option and letting
the users decide what they like?

Mark
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE9kIKHxlxDIcceXTgRAjGTAJ9bj1t2QV3zaDheO3GQpvJxxjDSIQCggESi
yqE29XtjTL3VDBu15VTQ0Qc=
=oueS
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: Offtopic: (was Re: [ANNOUNCE] Native POSIX Thread Library 0.1)
  2002-09-24 14:50         ` Offtopic: (was Re: [ANNOUNCE] Native POSIX Thread Library 0.1) Peter Svensson
  2002-09-24 15:19           ` Mark Veltzer
@ 2002-09-24 16:31           ` Rik van Riel
  2002-09-24 18:49             ` Michael Sinz
  1 sibling, 1 reply; 60+ messages in thread
From: Rik van Riel @ 2002-09-24 16:31 UTC (permalink / raw)
  To: Peter Svensson
  Cc: Michael Sinz, Bill Huey (Hui),
	Peter Waechtler, linux-kernel, ingo Molnar

On Tue, 24 Sep 2002, Peter Svensson wrote:

> For raytracers (which was the example) you need almost no coordination at
> all. Just partition the scene and you are done. This is going offtopic
> fast. The point I was making is that there is really no great reward in
> grouping threads. Either you need to educate your users and trust them to
> behave, or you need per user scheduling.

I've got per user scheduling.  I'm currently porting it to 2.4.19
(and having fun with some very subtle bugs) and am thinking about
how to port this beast to the O(1) scheduler in a clean way.

Note that it's not necessarily per user, it's trivial to adapt
the code to use any other resource container instead.

regards,

Rik
-- 
A: No.
Q: Should I include quotations after my reply?

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: Offtopic: (was Re: [ANNOUNCE] Native POSIX Thread Library 0.1)
  2002-09-24 15:19           ` Mark Veltzer
@ 2002-09-24 17:29             ` Rik van Riel
  2002-09-25 18:57               ` Mark Mielke
  0 siblings, 1 reply; 60+ messages in thread
From: Rik van Riel @ 2002-09-24 17:29 UTC (permalink / raw)
  To: Mark Veltzer; +Cc: Peter Svensson, Linux kernel mailing list

On Tue, 24 Sep 2002, Mark Veltzer wrote:
> On Tuesday 24 September 2002 17:50, Peter Svensson wrote:
> > Either you need to educate your users and trust them to
> > behave, or you need per user scheduling.
>
> It is obvious that in high end systems you MUST have per user scheduling
> since users will rob each other of cycles.... If Linux is to be a
> general purpose operation system it MUST have this feature

I just posted a patch for this and will upload the patch to
my home page:

Subject: [PATCH] per user scheduling for 2.4.19


My patch also allows you to switch the per user scheduling
on/off with /proc/sys/kernel/fairsched and has been tested
on both UP and SMP.

kind regards,

Rik
-- 
A: No.
Q: Should I include quotations after my reply?

http://www.surriel.com/		http://distro.conectiva.com/



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 23:57           ` Andy Isaacson
  2002-09-24  6:32             ` 1:1 threading vs. scheduler activations (was: Re: [ANNOUNCE] Native POSIX Thread Library 0.1) Ingo Molnar
@ 2002-09-24 18:10             ` Christoph Hellwig
  1 sibling, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2002-09-24 18:10 UTC (permalink / raw)
  To: Andy Isaacson; +Cc: linux-kernel

> Another advantage of keeping a "process" concept is that things like CSA
> (Compatible System Accounting, nee Cray System Accounting)

Which has been ported to Linux now, btw (rather poorly integrated, though):

	http://oss.sgi.com/projects/csa/

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: Offtopic: (was Re: [ANNOUNCE] Native POSIX Thread Library 0.1)
  2002-09-24 16:31           ` Rik van Riel
@ 2002-09-24 18:49             ` Michael Sinz
  2002-09-24 19:12               ` PATCH: per user fair scheduler 2.4.19 (cleaned up, thanks hch) (was: Re: Offtopic: (was Re: [ANNOUNCE] Native POSIX Thread Library 0.1)) Rik van Riel
  0 siblings, 1 reply; 60+ messages in thread
From: Michael Sinz @ 2002-09-24 18:49 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Peter Svensson, Bill Huey (Hui),
	Peter Waechtler, linux-kernel, ingo Molnar

Rik van Riel wrote:
> On Tue, 24 Sep 2002, Peter Svensson wrote:
> 
> 
>>For raytracers (which was the example) you need almost no coordination at
>>all. Just partition the scene and you are done. This is going offtopic
>>fast. The point I was making is that there is really no great reward in
>>grouping threads. Either you need to educate your users and trust them to
>>behave, or you need per user scheduling.
> 
> 
> I've got per user scheduling.  I'm currently porting it to 2.4.19
> (and having fun with some very subtle bugs) and am thinking about
> how to port this beast to the O(1) scheduler in a clean way.
> 
> Note that it's not necessarily per user, it's trivial to adapt
> the code to use any other resource container instead.

Doing it per process prevents one class of problems.  Doing it
per user prevents another class.

Note that this does not limit the user (as ulimit-type solutions do)
when the system is not under stress.  What it does do is make sure
that no one user (or process) can DOS the system.  In other words, it
implements fairness among peers (users or processes).

Currently, Linux has fairness among threads (since threads and
processes are basically the same as far as the scheduler is
concerned.)

I would be interested in seeing this patch.  A per-process version
would be really cool for what I am building (WorldGate), since many of
the processes all run as the same user, but a per-user version would
also be interesting.

(Or both - processes within a user are fair with each other, and
users are fair among the other users...)

-- 
Michael Sinz -- Director, Systems Engineering -- Worldgate Communications
A master's secrets are only as good as
	the master's ability to explain them to others.



^ permalink raw reply	[flat|nested] 60+ messages in thread

* PATCH: per user fair scheduler 2.4.19 (cleaned up, thanks hch)  (was: Re: Offtopic: (was Re: [ANNOUNCE] Native POSIX Thread Library 0.1))
  2002-09-24 18:49             ` Michael Sinz
@ 2002-09-24 19:12               ` Rik van Riel
  0 siblings, 0 replies; 60+ messages in thread
From: Rik van Riel @ 2002-09-24 19:12 UTC (permalink / raw)
  To: Michael Sinz
  Cc: Peter Svensson, Bill Huey (Hui),
	Peter Waechtler, linux-kernel, ingo Molnar

On Tue, 24 Sep 2002, Michael Sinz wrote:
> Rik van Riel wrote:

> > I've got per user scheduling.  I'm currently porting it to 2.4.19
> > (and having fun with some very subtle bugs) and am thinking about
> > how to port this beast to the O(1) scheduler in a clean way.

> I would be interested in seeing this patch.

Here it is again, this version has undergone some cleanups
after complaints by Christoph Hellwig ;)

Rik
-- 
A: No.
Q: Should I include quotations after my reply?

http://www.surriel.com/		http://distro.conectiva.com/


--- linux/kernel/sched.c.fair	2002-09-20 10:58:49.000000000 -0300
+++ linux/kernel/sched.c	2002-09-24 14:46:31.000000000 -0300
@@ -45,31 +45,33 @@

 extern void mem_use(void);

+#ifdef CONFIG_FAIRSCHED
+/* Toggle the per-user fair scheduler on/off */
+int fairsched = 1;
+
+/* Move a task to the tail of the tasklist */
+static inline void move_last_tasklist(struct task_struct * p)
+{
+	/* list_del */
+	p->next_task->prev_task = p->prev_task;
+	p->prev_task->next_task = p->next_task;
+
+	/* list_add_tail */
+	p->next_task = &init_task;
+	p->prev_task = init_task.prev_task;
+	init_task.prev_task->next_task = p;
+	init_task.prev_task = p;
+}
+
 /*
- * Scheduling quanta.
- *
- * NOTE! The unix "nice" value influences how long a process
- * gets. The nice value ranges from -20 to +19, where a -20
- * is a "high-priority" task, and a "+10" is a low-priority
- * task.
- *
- * We want the time-slice to be around 50ms or so, so this
- * calculation depends on the value of HZ.
+ * Remember p->next, in case we call move_last_tasklist(p) in the
+ * fairsched recalculation code.
  */
-#if HZ < 200
-#define TICK_SCALE(x)	((x) >> 2)
-#elif HZ < 400
-#define TICK_SCALE(x)	((x) >> 1)
-#elif HZ < 800
-#define TICK_SCALE(x)	(x)
-#elif HZ < 1600
-#define TICK_SCALE(x)	((x) << 1)
-#else
-#define TICK_SCALE(x)	((x) << 2)
-#endif
-
-#define NICE_TO_TICKS(nice)	(TICK_SCALE(20-(nice))+1)
+#define safe_for_each_task(p) \
+	for (p = init_task.next_task, next = p->next_task ; p != &init_task ; \
+			p = next, next = p->next_task)

+#endif /* CONFIG_FAIRSCHED */

 /*
  *	Init task must be ok at boot for the ix86 as we will check its signals
@@ -460,6 +462,54 @@
 }

 /*
+ * If the remaining timeslice lengths of all runnable tasks reach 0
+ * the scheduler recalculates the priority of all processes.
+ */
+static void recalculate_priorities(void)
+{
+#ifdef CONFIG_FAIRSCHED
+	if (fairsched) {
+		struct task_struct *p, *next;
+		struct user_struct *up;
+		long oldcounter, newcounter;
+
+		recalculate_peruser_cputicks();
+
+		write_lock_irq(&tasklist_lock);
+		safe_for_each_task(p) {
+			up = p->user;
+			if (up->cpu_ticks > 0) {
+				oldcounter = p->counter;
+				newcounter = (oldcounter >> 1) +
+					NICE_TO_TICKS(p->nice);
+				up->cpu_ticks += oldcounter;
+				up->cpu_ticks -= newcounter;
+				/*
+				 * If a user is very busy, only some of its
+				 * processes can get CPU time. We move those
+				 * processes out of the way to prevent
+				 * starvation of others.
+				 */
+				if (oldcounter != newcounter) {
+					p->counter = newcounter;
+					move_last_tasklist(p);
+				}
+			}
+		}
+		write_unlock_irq(&tasklist_lock);
+	} else
+#endif /* CONFIG_FAIRSCHED */
+	{
+		struct task_struct *p;
+
+		read_lock(&tasklist_lock);
+		for_each_task(p)
+			p->counter = (p->counter >> 1) + NICE_TO_TICKS(p->nice);
+		read_unlock(&tasklist_lock);
+	}
+}
+
+/*
  * schedule_tail() is getting called from the fork return path. This
  * cleans up all remaining scheduler things, without impacting the
  * common case.
@@ -616,13 +666,10 @@

 	/* Do we need to re-calculate counters? */
 	if (unlikely(!c)) {
-		struct task_struct *p;
-
 		spin_unlock_irq(&runqueue_lock);
-		read_lock(&tasklist_lock);
-		for_each_task(p)
-			p->counter = (p->counter >> 1) + NICE_TO_TICKS(p->nice);
-		read_unlock(&tasklist_lock);
+
+		recalculate_priorities();
+
 		spin_lock_irq(&runqueue_lock);
 		goto repeat_schedule;
 	}
--- linux/kernel/user.c.fair	2002-09-20 11:50:56.000000000 -0300
+++ linux/kernel/user.c	2002-09-24 16:06:11.000000000 -0300
@@ -29,9 +29,12 @@
 struct user_struct root_user = {
 	__count:	ATOMIC_INIT(1),
 	processes:	ATOMIC_INIT(1),
-	files:		ATOMIC_INIT(0)
+	files:		ATOMIC_INIT(0),
+	cpu_ticks:	NICE_TO_TICKS(0)
 };

+static LIST_HEAD(user_list);
+
 /*
  * These routines must be called with the uidhash spinlock held!
  */
@@ -44,6 +47,8 @@
 		next->pprev = &up->next;
 	up->pprev = hashent;
 	*hashent = up;
+
+	list_add(&up->list, &user_list);
 }

 static inline void uid_hash_remove(struct user_struct *up)
@@ -54,6 +59,8 @@
 	if (next)
 		next->pprev = pprev;
 	*pprev = next;
+
+	list_del(&up->list);
 }

 static inline struct user_struct *uid_hash_find(uid_t uid, struct user_struct **hashent)
@@ -101,6 +108,7 @@
 		atomic_set(&new->__count, 1);
 		atomic_set(&new->processes, 0);
 		atomic_set(&new->files, 0);
+		new->cpu_ticks = NICE_TO_TICKS(0);

 		/*
 		 * Before adding this, check whether we raced
@@ -120,6 +128,21 @@
 	return up;
 }

+/* Fair scheduler, recalculate the per user cpu time cap. */
+void recalculate_peruser_cputicks(void)
+{
+	struct list_head * entry;
+	struct user_struct * user;
+
+	spin_lock(&uidhash_lock);
+	list_for_each(entry, &user_list) {
+		user = list_entry(entry, struct user_struct, list);
+		user->cpu_ticks = (user->cpu_ticks / 2) + NICE_TO_TICKS(0);
+	}
+	/* Needed hack, we can get called before uid_cache_init ... */
+	root_user.cpu_ticks = (root_user.cpu_ticks / 2) + NICE_TO_TICKS(0);
+	spin_unlock(&uidhash_lock);
+}

 static int __init uid_cache_init(void)
 {
--- linux/kernel/sysctl.c.fair	2002-09-21 00:00:36.000000000 -0300
+++ linux/kernel/sysctl.c	2002-09-21 20:40:49.000000000 -0300
@@ -50,6 +50,7 @@
 extern int sysrq_enabled;
 extern int core_uses_pid;
 extern int cad_pid;
+extern int fairsched;

 /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
 static int maxolduid = 65535;
@@ -256,6 +257,10 @@
 	{KERN_S390_USER_DEBUG_LOGGING,"userprocess_debug",
 	 &sysctl_userprocess_debug,sizeof(int),0644,NULL,&proc_dointvec},
 #endif
+#ifdef CONFIG_FAIRSCHED
+	{KERN_FAIRSCHED, "fairsched", &fairsched, sizeof(int),
+	 0644, NULL, &proc_dointvec},
+#endif
 	{0}
 };

--- linux/include/linux/sched.h.fair	2002-09-20 10:59:03.000000000 -0300
+++ linux/include/linux/sched.h	2002-09-24 15:12:50.000000000 -0300
@@ -275,6 +275,10 @@
 	/* Hash table maintenance information */
 	struct user_struct *next, **pprev;
 	uid_t uid;
+
+	/* Linked list for for_each_user */
+	struct list_head list;
+	long cpu_ticks;
 };

 #define get_current_user() ({ 				\
@@ -282,6 +286,7 @@
 	atomic_inc(&__user->__count);			\
 	__user; })

+extern void recalculate_peruser_cputicks(void);
 extern struct user_struct root_user;
 #define INIT_USER (&root_user)

@@ -422,6 +427,31 @@
 };

 /*
+ * Scheduling quanta.
+ *
+ * NOTE! The unix "nice" value influences how long a process
+ * gets. The nice value ranges from -20 to +19, where a -20
+ * is a "high-priority" task, and a "+10" is a low-priority
+ * task.
+ *
+ * We want the time-slice to be around 50ms or so, so this
+ * calculation depends on the value of HZ.
+ */
+#if HZ < 200
+#define TICK_SCALE(x)	((x) >> 2)
+#elif HZ < 400
+#define TICK_SCALE(x)	((x) >> 1)
+#elif HZ < 800
+#define TICK_SCALE(x)	(x)
+#elif HZ < 1600
+#define TICK_SCALE(x)	((x) << 1)
+#else
+#define TICK_SCALE(x)	((x) << 2)
+#endif
+
+#define NICE_TO_TICKS(nice)	(TICK_SCALE(20-(nice))+1)
+
+/*
  * Per process flags
  */
 #define PF_ALIGNWARN	0x00000001	/* Print alignment warning msgs */
--- linux/include/linux/sysctl.h.fair	2002-09-21 20:41:18.000000000 -0300
+++ linux/include/linux/sysctl.h	2002-09-21 20:41:43.000000000 -0300
@@ -124,6 +124,7 @@
 	KERN_CORE_USES_PID=52,		/* int: use core or core.%pid */
 	KERN_TAINTED=53,	/* int: various kernel tainted flags */
 	KERN_CADPID=54,		/* int: PID of the process to notify on CAD */
+	KERN_FAIRSCHED=55,	/* int: turn fair scheduler on/off */
 };


--- linux/arch/i386/config.in.fair	2002-09-21 20:42:06.000000000 -0300
+++ linux/arch/i386/config.in	2002-09-21 20:42:35.000000000 -0300
@@ -261,6 +261,9 @@
 bool 'System V IPC' CONFIG_SYSVIPC
 bool 'BSD Process Accounting' CONFIG_BSD_PROCESS_ACCT
 bool 'Sysctl support' CONFIG_SYSCTL
+if [ "$CONFIG_EXPERIMENTAL" = "y" ] ; then
+   bool 'Fair scheduler' CONFIG_FAIRSCHED
+fi
 if [ "$CONFIG_PROC_FS" = "y" ]; then
    choice 'Kernel core (/proc/kcore) format' \
 	"ELF		CONFIG_KCORE_ELF	\
--- linux/arch/alpha/config.in.fair	2002-08-02 21:39:42.000000000 -0300
+++ linux/arch/alpha/config.in	2002-09-21 20:43:21.000000000 -0300
@@ -273,6 +273,9 @@
 bool 'System V IPC' CONFIG_SYSVIPC
 bool 'BSD Process Accounting' CONFIG_BSD_PROCESS_ACCT
 bool 'Sysctl support' CONFIG_SYSCTL
+if [ "$CONFIG_EXPERIMENTAL" = "y" ] ; then
+   bool 'Fair scheduler' CONFIG_FAIRSCHED
+fi
 if [ "$CONFIG_PROC_FS" = "y" ]; then
    choice 'Kernel core (/proc/kcore) format' \
 	"ELF		CONFIG_KCORE_ELF	\


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-22 18:55 [ANNOUNCE] Native POSIX Thread Library 0.1 Peter Waechtler
  2002-09-22 21:32 ` Larry McVoy
  2002-09-23 21:03 ` Bill Huey
@ 2002-09-24 20:19 ` David Schwartz
  2002-09-24 21:10   ` Chris Friesen
  2002-09-24 23:16   ` Peter Waechtler
  2 siblings, 2 replies; 60+ messages in thread
From: David Schwartz @ 2002-09-24 20:19 UTC (permalink / raw)
  To: pwaechtler, linux-kernel


>The effect of M:N on UP systems should be even more clear. Your
>multithreaded apps can't profit of parallelism but they do not
>add load to the system scheduler. The drawback: more syscalls
>(I think about removing the need for
>flags=fcntl(GETFLAGS);fcntl(fd,NONBLOCK);write(fd);fcntl(fd,flags))
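
Spelled out, the quoted sequence is roughly the following per-write
wrapper (a sketch; error handling omitted, and write_nonblocking is just
an illustrative name) - it is what an M:N library that must never block
its kernel vehicle ends up doing around every write():

#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

ssize_t write_nonblocking(int fd, const void *buf, size_t len)
{
	int flags = fcntl(fd, F_GETFL);		/* syscall 1 */
	ssize_t n;

	fcntl(fd, F_SETFL, flags | O_NONBLOCK);	/* syscall 2 */
	n = write(fd, buf, len);		/* syscall 3 */
	fcntl(fd, F_SETFL, flags);		/* syscall 4 */
	return n;
}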

	The main reason I write multithreaded apps for single CPU systems is to 
protect against ambush. Consider, for example, a web server. Someone sends it 
an obscure request that triggers some code that's never run before and has to 
fault in. If my application were single-threaded, no work could be done until 
that page faulted in from disk. This is why select-loop and poll-loop type 
servers are bursty.

	DS



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24  0:03           ` Andy Isaacson
  2002-09-24  0:10             ` Jeff Garzik
  2002-09-24  5:53             ` Ingo Molnar
@ 2002-09-24 20:34             ` David Schwartz
  2 siblings, 0 replies; 60+ messages in thread
From: David Schwartz @ 2002-09-24 20:34 UTC (permalink / raw)
  To: adi, Ingo Molnar; +Cc: linux-kernel



On Mon, 23 Sep 2002 19:03:06 -0500, Andy Isaacson wrote:

>Of course this can be (and frequently is) implemented such that there is
>not one Pthreads thread per object; given simulation environments with 1
>million objects, and the current crappy state of Pthreads
>implementations, the researchers have no choice.

	It may well be handy to have a threads implementation that makes these 
kinds of programs easy to write, but an OS's preferred pthreads implementation 
is not, and should not be, that implementation. A platform's default/preferred 
pthreads implementation should be one that allows well-designed, 
high-performance I/O-intensive and compute-intensive tasks to run extremely 
well.

	DS



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24 20:19 ` [ANNOUNCE] Native POSIX Thread Library 0.1 David Schwartz
@ 2002-09-24 21:10   ` Chris Friesen
  2002-09-24 21:22     ` Rik van Riel
  2002-09-25 19:02     ` David Schwartz
  2002-09-24 23:16   ` Peter Waechtler
  1 sibling, 2 replies; 60+ messages in thread
From: Chris Friesen @ 2002-09-24 21:10 UTC (permalink / raw)
  To: David Schwartz; +Cc: pwaechtler, linux-kernel

David Schwartz wrote:

> 	The main reason I write multithreaded apps for single CPU systems is to 
> protect against ambush. Consider, for example, a web server. Someone sends it 
> an obscure request that triggers some code that's never run before and has to 
> fault in. If my application were single-threaded, no work could be done until 
> that page faulted in from disk.

This is interesting--I hadn't considered this, as most of my work for the 
past while has been on embedded systems with everything pinned in RAM.

Have you benchmarked this?  I was under the impression that the very 
fastest webservers were still single-threaded, using non-blocking I/O.

Chris


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24 21:10   ` Chris Friesen
@ 2002-09-24 21:22     ` Rik van Riel
  2002-09-24 21:35       ` Roberto Peon
  2002-09-24 21:35       ` Chris Friesen
  2002-09-25 19:02     ` David Schwartz
  1 sibling, 2 replies; 60+ messages in thread
From: Rik van Riel @ 2002-09-24 21:22 UTC (permalink / raw)
  To: Chris Friesen; +Cc: David Schwartz, pwaechtler, linux-kernel

On Tue, 24 Sep 2002, Chris Friesen wrote:
> David Schwartz wrote:
>
> > 	The main reason I write multithreaded apps for single CPU systems is to
> > protect against ambush. Consider, for example, a web server. Someone sends it
> > an obscure request that triggers some code that's never run before and has to
> > fault in. If my application were single-threaded, no work could be done until
> > that page faulted in from disk.
>
> This is interesting--I hadn't considered this as most of my work for the
> past while has been on embedded systems with everything pinned in ram.

On an ftp server (or movie server, or ...) you CAN'T pin everything
in RAM.

Rik
-- 
A: No.
Q: Should I include quotations after my reply?

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24 21:22     ` Rik van Riel
@ 2002-09-24 21:35       ` Roberto Peon
  2002-09-24 21:35       ` Chris Friesen
  1 sibling, 0 replies; 60+ messages in thread
From: Roberto Peon @ 2002-09-24 21:35 UTC (permalink / raw)
  To: Rik van Riel, Chris Friesen; +Cc: David Schwartz, pwaechtler, linux-kernel


On Tuesday 24 September 2002 02:22 pm, Rik van Riel wrote:
> On Tue, 24 Sep 2002, Chris Friesen wrote:
> > David Schwartz wrote:
> > > 	The main reason I write multithreaded apps for single CPU systems is
> > > to protect against ambush. Consider, for example, a web server. Someone
> > > sends it an obscure request that triggers some code that's never run
> > > before and has to fault in. If my application were single-threaded, no
> > > work could be done until that page faulted in from disk.

This is similar to the problems we face doing realtime virtual video
enhancements.

We have to log camera data (to know where things are pointed) by video
timecode, since the camera data and the video are asynchronous
(especially in replay).

These (mmapped) logs can get relatively large (100+ MB each) and access
into them is relatively random (i.e. determined by the director of the
show), so the process reading the log (and suffering the fault) is in a
different thread, in order not to stall other important tasks such as
video output.  (Mis-estimating the position of an enhancement is much
less of an issue than dropping the video frame itself. We don't want
10,000,000 people seeing pure-green frames popping up in the middle of
the broadcast.)

-Roberto JP
robertopeon@sportvision.com



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24 21:22     ` Rik van Riel
  2002-09-24 21:35       ` Roberto Peon
@ 2002-09-24 21:35       ` Chris Friesen
  1 sibling, 0 replies; 60+ messages in thread
From: Chris Friesen @ 2002-09-24 21:35 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Christopher Friesen, David Schwartz, pwaechtler, linux-kernel

Rik van Riel wrote:
> On Tue, 24 Sep 2002, Chris Friesen wrote:

>>This is interesting--I hadn't considered this as most of my work for the
>>past while has been on embedded systems with everything pinned in ram.
>>
> 
> On an ftp server (or movie server, or ...) you CAN'T pin everything
> in RAM.

Yes, but you can use aio to issue the request for data and then go do 
other stuff, even with a single thread.  David's case was faulting in 
little-used application code.
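
A minimal sketch of that pattern (POSIX aio; start_read and read_done
are just illustrative helpers, error handling omitted): one thread
queues the read and keeps running its event loop while the data comes in.

#include <aio.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

static struct aiocb cb;
static char buf[65536];

int start_read(int fd, off_t offset)
{
	memset(&cb, 0, sizeof(cb));
	cb.aio_fildes = fd;
	cb.aio_buf    = buf;
	cb.aio_nbytes = sizeof(buf);
	cb.aio_offset = offset;
	return aio_read(&cb);	/* queues the read, returns at once */
}

int read_done(void)
{
	return aio_error(&cb) != EINPROGRESS;	/* poll from the event loop */
}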

Or am I missing something?

Chris


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24 20:19 ` [ANNOUNCE] Native POSIX Thread Library 0.1 David Schwartz
  2002-09-24 21:10   ` Chris Friesen
@ 2002-09-24 23:16   ` Peter Waechtler
  2002-09-24 23:23     ` Rik van Riel
  2002-09-25 19:05     ` David Schwartz
  1 sibling, 2 replies; 60+ messages in thread
From: Peter Waechtler @ 2002-09-24 23:16 UTC (permalink / raw)
  To: David Schwartz; +Cc: linux-kernel

On Tuesday, 24 September 2002, at 22:19, David Schwartz wrote:

>
>> The effect of M:N on UP systems should be even more clear. Your
>> multithreaded apps can't profit of parallelism but they do not
>> add load to the system scheduler. The drawback: more syscalls
>> (I think about removing the need for
>> flags=fcntl(GETFLAGS);fcntl(fd,NONBLOCK);write(fd);fcntl(fd,flags))
>
> 	The main reason I write multithreaded apps for single CPU systems is
> to protect against ambush. Consider, for example, a web server. Someone
> sends it an obscure request that triggers some code that's never run
> before and has to fault in. If my application were single-threaded, no
> work could be done until that page faulted in from disk. This is why
> select-loop and poll-loop type servers are bursty.

With the current NGPT design your threads would be blocked (all that
are scheduled on this kernel vehicle).

With Scheduler Activations this could also be avoided: the thread
scheduler could get an upcall - but this will stay theory on Linux
for a long time.
But this is a somewhat far-fetched example (for arguing for 1:1),
isn't it?

There are other means of DoS..




^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24 23:16   ` Peter Waechtler
@ 2002-09-24 23:23     ` Rik van Riel
  2002-09-25 19:05     ` David Schwartz
  1 sibling, 0 replies; 60+ messages in thread
From: Rik van Riel @ 2002-09-24 23:23 UTC (permalink / raw)
  To: Peter Waechtler; +Cc: David Schwartz, linux-kernel

On Wed, 25 Sep 2002, Peter Waechtler wrote:

> With Scheduler Activations this could also be avoided. The thread
> scheduler could get an upcall - but this will stay theory for a long
> time on Linux. But this is a somewhat far fetched example (for arguing
> for 1:1), isn't it?

Actually, the upcalls in a N:M scheme with scheduler activations
seem like a pretty good argument for 1:1 to me ;)

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/

Spamtraps of the month:  september@surriel.com trac@trac.org


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: 1:1 threading vs. scheduler activations (was: Re: [ANNOUNCE] Native POSIX Thread Library 0.1)
  2002-09-24  6:32             ` 1:1 threading vs. scheduler activations (was: Re: [ANNOUNCE] Native POSIX Thread Library 0.1) Ingo Molnar
@ 2002-09-25  3:08               ` Bill Huey
  0 siblings, 0 replies; 60+ messages in thread
From: Bill Huey @ 2002-09-25  3:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andy Isaacson, Larry McVoy, Peter Wächtler, Bill Davidsen,
	linux-kernel, Bill Huey (Hui)

On Tue, Sep 24, 2002 at 08:32:16AM +0200, Ingo Molnar wrote:
> yes, SA's (and KSA's) are an interesting concept, but i personally think
> they are way too much complexity - and history has shown that complexity
> never leads to anything good, especially not in OS design.

FreeBSD's KSEs .;)

> Eg. SA's, like every M:N concept, must have a userspace component of the
> scheduler, which gets very funny when you try to implement all the things
> the kernel scheduler has had for years: fairness, SMP balancing, RT
> scheduling (!), preemption and more.

Yeah, I understand. These folks are doing some interesting stuff and
might provide some answers for you:

	http://www.research.ibm.com/K42/

This paper specifically:

	http://www.research.ibm.com/K42/white-papers/Scheduling.pdf

Their stuff isn't too much different from FreeBSD's KSE project -
different names for the primitives, different communication, etc...

> And then i haven't mentioned things like upcall costs - what's the point in
> upcalling userspace which then has to schedule, instead of doing this
> stuff right in the kernel? Scheduler activations concentrate too much on
> the 5% of cases that have more userspace<->userspace context switching
> than some sort of kernel-provoked context switching. Sure, scheduler
> activations can be done, but i cannot see how they can be any better than
> 'just as fast' as a 1:1 implementation - at a much higher complexity and
> robustness cost.

Folks have been experimenting with other means of kernel/userspace
communication: a chunk of shared memory, with notification and polling,
entered by the UTS on a block on a mutex or other operation. Upcalls are
what the original Topaz OS paper that implemented SAs used; Mach was the
other. That doesn't mean they are used universally in all
implementations.

> the biggest part of Linux's kernel-space context switching is the cost of
> kernel entry - and the cost of kernel entry gets cheaper with every new
> generation of CPUs. Basing the whole threading design on the avoidance of
> the kernel scheduler is like basing your tent on a glacier, in a hot
> summer day.
> 
> Plus in an M:N model all the development toolchain suddenly has to
> understand the new set of contexts, debuggers, tracers, everything.

That's not an issue. Folks expect that to be so when working with any
new threading system.

> Plus there are other issues like security - it's perfectly reasonable in
> the 1:1 model for a certain set of server threads to drop all privileges
> to do the more dangerous stuff. (while there is no such thing as absolute
> security and separation in a threaded app, dropping privileges can avoid
> certain classes of exploits.)

> generally the whole SA/M:N concept creaks under the huge change that is
> introduced by having multiple userspace contexts of execution per a single
> kernel-space context of execution. Such detaching of concepts, no matter
> which kernel subsystem you look at, causes problems everywhere.

Maybe; it's probably implementation-specific. I'm curious how K42
performs.

> eg. the VM. There's no way you can get an 'upcall' from the VM that you
> need to wait for free RAM - most of the related kernel code is simply not
> ready and restartable. So VM load can end up blocking kernel contexts
> without giving any chance to user contexts to be 'scheduled' by the
> userspace scheduler. This happens exactly in the worst moment, when load
> increases and stuff starts swapping.

That's solved by refashioning the kernel to pump out a blocking
notification to the UTS for that backing kernel thread. It's expected
of an SA-style system.

> and there are some things that i'm not at all sure can be fixed in any
> reasonable way - eg. RT scheduling. [the userspace library would have to
> raise/drop the priority of threads in the userspace scheduler, causing an
> additional kernel entry/exit, eliminating even the theoretical advantage
> it had for pure user<->user context switches.]

KSEs have an RT scheduling category, but I don't yet clearly understand
the preemption issue, so I can't comment on it. I was trying to
understand this stuff at one point, since I was thinking about working
on that project.

> plus basic performance issues. If you have a healthy mix of userspace and
> kernelspace scheduler activity then you've at least doubled your icache
> footprint by having two schedulers - the dcache footprint is higher as
> well. A *single* bad cachemiss on a P4 is already almost as expensive as a
> kernel entry - and it's not like the growing gap between RAM access
> latency and CPU performance will shrink in the future. And we aren't even
> using SYSENTER/SYSEXIT in the Linux kernel yet, which will shave off
> another 40% from the syscall entry (and kernel context switching) cost.

It'll be localized to the UTS, while threads that block in the kernel
are mostly going to be IO-driven. I don't know about the situation where
you might have a mixture of those activities.

The infrastructure for the upcalls might incur significant overhead.

> so my current take on threading models is: if you *can* do a really fast
> and lightweight kernel based 1:1 threading implementation then you have
> won. Anything else is barely more than workarounds for (fixable)  
> architectural problems. Concentrate your execution abstraction into the
> kernel and make it *really* fast and scalable - that will improve
> everything else. OTOH any improvement to the userspace thread scheduler
> only improves threaded applications - which are still the minority. Sure,
> some of the above problems can be helped, but it's not trivial - and some
> problems i don't think can be solved at all.

> But we'll see, the FreeBSD folks i think are working on KSA's so we'll
> know for sure in a couple of years.

There are a lot of ways folks can do this kind of stuff. Who knows? The
current method you folks are using could very well be the best for Linux.

I don't have much more to say about this topic.

bill


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: Offtopic: (was Re: [ANNOUNCE] Native POSIX Thread Library 0.1)
  2002-09-24 17:29             ` Rik van Riel
@ 2002-09-25 18:57               ` Mark Mielke
  2002-09-25 19:04                 ` Rik van Riel
  0 siblings, 1 reply; 60+ messages in thread
From: Mark Mielke @ 2002-09-25 18:57 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Mark Veltzer, Peter Svensson, Linux kernel mailing list

On Tue, Sep 24, 2002 at 02:29:30PM -0300, Rik van Riel wrote:
> I just posted a patch for this and will upload the patch to
> my home page:
> Subject: [PATCH] per user scheduling for 2.4.19
> My patch also allows you to switch the per user scheduling
> on/off with /proc/sys/kernel/fairsched and has been tested
> on both UP and SMP.

I missed this one. Does this mean that fork() bombs will have limited
effect on root? :-)

I definitely want this, even on my home machine. I've always found it
to be a sort of fatal flaw that per-user resource scheduling did not
exist on the platforms of my choice.

mark

-- 
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24 21:10   ` Chris Friesen
  2002-09-24 21:22     ` Rik van Riel
@ 2002-09-25 19:02     ` David Schwartz
  1 sibling, 0 replies; 60+ messages in thread
From: David Schwartz @ 2002-09-25 19:02 UTC (permalink / raw)
  To: cfriesen; +Cc: pwaechtler, linux-kernel


On Tue, 24 Sep 2002 17:10:17 -0400, Chris Friesen wrote:
>David Schwartz wrote:

>>The main reason I write multithreaded apps for single CPU systems is to
>>protect against ambush. Consider, for example, a web server. Someone
>>sends it an obscure request that triggers some code that's never run
>>before and has to fault in. If my application were single-threaded, no
>>work could be done until that page faulted in from disk.

>This is interesting--I hadn't considered this as most of my work for the
>past while has been on embedded systems with everything pinned in ram.

	In the usual, non-embedded case, the code has to fault in.

>Have you benchmarked this?  I was under the impression that the very
>fastest webservers were still single-threaded using non-blocking io.

	It's all about how you define "fastest". If speed means being able to do the 
same thing over and over really quickly, yes. But I also want uniform 
(non-bursty) performance in the face of an unpredictable set of jobs.
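
	To make that concrete, here is a minimal sketch of the design I'm
arguing for: a small worker pool where a request that stalls parks one
thread while the rest keep serving. The queue and the simulated stall
are invented for illustration; this is not code from any real server.

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define WORKERS 4
#define QSIZE   128

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  more = PTHREAD_COND_INITIALIZER;
static int queue[QSIZE];
static unsigned int head, tail;

static void *worker(void *arg)
{
    long id = (long)(intptr_t)arg;
    int req;

    for (;;) {
        pthread_mutex_lock(&lock);
        while (head == tail)
            pthread_cond_wait(&more, &lock);
        req = queue[head++ % QSIZE];
        pthread_mutex_unlock(&lock);

        if (req % 10 == 0)      /* stand-in for a page fault / disk stall */
            usleep(100000);
        printf("worker %ld handled request %d\n", id, req);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[WORKERS];
    long i;
    int req;

    for (i = 0; i < WORKERS; i++)
        pthread_create(&t[i], NULL, worker, (void *)(intptr_t)i);

    for (req = 0; req < 40; req++) {
        pthread_mutex_lock(&lock);
        queue[tail++ % QSIZE] = req;
        pthread_cond_signal(&more);
        pthread_mutex_unlock(&lock);
    }
    sleep(2);               /* let the pool drain; a real server loops */
    return 0;
}

	Even while one worker sleeps in its "fault", the other three keep
draining the queue, which is exactly the uniformity I care about.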

	DS




^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: Offtopic: (was Re: [ANNOUNCE] Native POSIX Thread Library 0.1)
  2002-09-25 18:57               ` Mark Mielke
@ 2002-09-25 19:04                 ` Rik van Riel
  2002-09-25 19:29                   ` Mark Veltzer
  0 siblings, 1 reply; 60+ messages in thread
From: Rik van Riel @ 2002-09-25 19:04 UTC (permalink / raw)
  To: Mark Mielke; +Cc: Mark Veltzer, Peter Svensson, Linux kernel mailing list

On Wed, 25 Sep 2002, Mark Mielke wrote:

> I missed this one. Does this mean that fork() bombs will have limited
> effect on root? :-)

Indeed. A user can easily run 100 while(1) {} loops, but to the
other users in the system it'll feel just like 1 loop...
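
For the curious, flipping the knob at run time is just a write to the
proc file. A minimal sketch, assuming the fair scheduler patch is
applied and you are root (without the patch the file simply won't
exist, and the "1"/"0" values are the obvious on/off convention):

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const char *val = (argc > 1) ? argv[1] : "1";
    FILE *f = fopen("/proc/sys/kernel/fairsched", "w");

    if (f == NULL) {                    /* patch not applied? */
        perror("/proc/sys/kernel/fairsched");
        return EXIT_FAILURE;
    }
    fprintf(f, "%s\n", val);  /* "1" = per-user scheduling on, "0" = off */
    fclose(f);
    return EXIT_SUCCESS;
}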

> I definitely want this, even on my home machine. I've always found it
> to be something of a fatal flaw that per-user resource scheduling did not
> exist on the platforms of my choice.

It has existed since 2.2.14 or so ;)

I just didn't get around to forward-porting it to newer 2.4
kernels, until last weekend.

cheers,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/

Spamtraps of the month:  september@surriel.com trac@trac.org


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24 23:16   ` Peter Waechtler
  2002-09-24 23:23     ` Rik van Riel
@ 2002-09-25 19:05     ` David Schwartz
  1 sibling, 0 replies; 60+ messages in thread
From: David Schwartz @ 2002-09-25 19:05 UTC (permalink / raw)
  To: pwaechtler; +Cc: linux-kernel


>With Scheduler Activations this could also be avoided.
>The thread scheduler could get an upcall - but this will stay theory
>for a long time on Linux.
>But this is a somewhat far-fetched example (for arguing for 1:1),
>isn't it?

	No, it's not. I write high-performance servers and my main enemy is
burstiness. One significant cause of burstiness is code faulting in. This is
especially true because many of my servers support adding code to them
through user-supplied shared object files.

>There are other means of DoS..

	I'm not talking about deliberate attempts at harming the server. Those
won't work over and over, because the code faults in once and then stays
resident. I'm talking about smooth performance in the face of
unpredictable loads, and that means not stalling on every page fault.
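
	As an illustration, keeping the load of a user-supplied module off
the request path is as simple as doing the dlopen() on its own thread.
A sketch, not my server code, and the file name is hypothetical:

#include <dlfcn.h>
#include <pthread.h>
#include <stdio.h>

static void *load_plugin(void *path)
{
    /* dlopen() can fault a lot of fresh code in from disk; done on
       its own thread, only this loader stalls, not the request path */
    void *handle = dlopen((const char *)path, RTLD_NOW);

    if (handle == NULL)
        fprintf(stderr, "dlopen: %s\n", dlerror());
    return handle;
}

int main(void)
{
    pthread_t loader;

    pthread_create(&loader, NULL, load_plugin, "./user_module.so");
    /* ... the main loop would keep serving requests here ... */
    pthread_join(loader, NULL);
    return 0;
}

	Build with something like "gcc plugin_load.c -ldl -lpthread".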

	DS



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: Offtopic: (was Re: [ANNOUNCE] Native POSIX Thread Library 0.1)
  2002-09-25 19:29                   ` Mark Veltzer
@ 2002-09-25 19:23                     ` Rik van Riel
  0 siblings, 0 replies; 60+ messages in thread
From: Rik van Riel @ 2002-09-25 19:23 UTC (permalink / raw)
  To: Mark Veltzer; +Cc: Mark Mielke, Linux kernel mailing list, Peter Svensson

On Wed, 25 Sep 2002, Mark Veltzer wrote:

> This is terrific!!! How come something like this was not merged
> earlier?! This seems like an absolute necessity!!! I'm willing to test
> it if that is what is needed to get it merged.

You can grab the fair scheduler patch from my home page:

	http://surriel.com/patches/

> What do Linus and others feel about this and, most importantly, when
> will we see it in? (Hopefully in this development cycle.)

I have no idea what Linus and others think about this patch,
but I know I'll need to forward-port the thing to the O(1)
scheduler first; we can ask them after that is done ;)

regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/

Spamtraps of the month:  september@surriel.com trac@trac.org


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: Offtopic: (was Re: [ANNOUNCE] Native POSIX Thread Library 0.1)
  2002-09-25 19:04                 ` Rik van Riel
@ 2002-09-25 19:29                   ` Mark Veltzer
  2002-09-25 19:23                     ` Rik van Riel
  0 siblings, 1 reply; 60+ messages in thread
From: Mark Veltzer @ 2002-09-25 19:29 UTC (permalink / raw)
  To: Rik van Riel, Mark Mielke, Linux kernel mailing list, Peter Svensson

On Wednesday 25 September 2002 22:04, you wrote:
> On Wed, 25 Sep 2002, Mark Mielke wrote:
> > I missed this one. Does this mean that fork() bombs will have limited
> > effect on root? :-)
>
> Indeed. A user can easily run 100 while(1) {} loops, but to the
> other users in the system it'll feel just like 1 loop...

Rik!

This is terrific!!! How come something like this was not merged earlier?!
This seems like an absolute necessity!!! I'm willing to test it if that
is what is needed to get it merged. What do Linus and others feel about
this and, most importantly, when will we see it in? (Hopefully in this
development cycle.)

	Regards,
	Mark

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 19:14       ` Bill Davidsen
@ 2002-09-29 23:26         ` Buddy Lumpkin
  2002-09-30 14:54           ` Corey Minyard
  0 siblings, 1 reply; 60+ messages in thread
From: Buddy Lumpkin @ 2002-09-29 23:26 UTC (permalink / raw)
  To: 'Bill Davidsen', 'Peter Waechtler'
  Cc: 'Larry McVoy', linux-kernel, 'ingo Molnar'

Sun introduced a new thread library in Solaris 8 that is 1:1, but it did
not replace the default N:M version; you have to link against the one in
/usr/lib/lwp.

http://supportforum.sun.com/freesolaris/techfaqs.html?techfaqs_2957
http://www.itworld.com/AppDev/1170/swol-1218-insidesolaris/

I was at a USENIX BOF on threads in Boston the year before last, and Bill
Lewis was ranting about how the N:M model sucks. Christopher Provenzano
was right there and didn't seem to offer any feelings one way or the
other.

Regards,

--Buddy

-----Original Message-----
From: linux-kernel-owner@vger.kernel.org
[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Bill Davidsen
Sent: Monday, September 23, 2002 12:15 PM
To: Peter Waechtler
Cc: Larry McVoy; linux-kernel@vger.kernel.org; ingo Molnar
Subject: Re: [ANNOUNCE] Native POSIX Thread Library 0.1

On Mon, 23 Sep 2002, Peter Waechtler wrote:

> On Monday, 23 September 2002, at 12:05, Bill Davidsen wrote:
> 
> > On Sun, 22 Sep 2002, Larry McVoy wrote:
> >
> >> On Sun, Sep 22, 2002 at 08:55:39PM +0200, Peter Waechtler wrote:
> >>> AIX and Irix deploy M:N - I guess for a good reason: it's more
> >>> flexible and combine both approaches with easy runtime tuning if
> >>> the app happens to run on SMP (the uncommon case).
> >>
> >> No, AIX and IRIX do it that way because their processes are so
> >> bloated that it would be unthinkable to do a 1:1 model.
> >
> > And BSD? And Solaris?
> 
> Don't know. I don't have access to all those Unices. I could try
> FreeBSD.

At your convenience.
 
> According to http://www.kegel.com/c10k.html  Sun is moving to 1:1
> and FreeBSD still believes in M:N

Sun is total news to me; "moving to" may be in Solaris 9, while Sol8
seems to still be N:M. BSD is as I thought.
> 
> MacOSX 10.1 does not support PROCESS_SHARED locks, tried that 5
> minutes ago.

Thank you for the effort. Hmm, that's a bit of a surprise, at least to
me.
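
Presumably the five-minute test was something like this sketch (my
guess at it, using the portable attribute call; on a platform that
lacks the call it won't even build):

#include <pthread.h>
#include <stdio.h>

int main(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    /* a nonzero return means cross-process mutexes are unsupported */
    if (pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) != 0)
        printf("PTHREAD_PROCESS_SHARED: not supported\n");
    else
        printf("PTHREAD_PROCESS_SHARED: supported\n");
    pthread_mutexattr_destroy(&attr);
    return 0;
}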

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-29 23:26         ` Buddy Lumpkin
@ 2002-09-30 14:54           ` Corey Minyard
  0 siblings, 0 replies; 60+ messages in thread
From: Corey Minyard @ 2002-09-30 14:54 UTC (permalink / raw)
  To: Buddy Lumpkin
  Cc: 'Bill Davidsen', 'Peter Waechtler',
	'Larry McVoy', linux-kernel, 'ingo Molnar'

Buddy Lumpkin wrote:

>Sun introduced a new thread library in Solaris 8 that is 1:1, but it did
>not replace the default N:M version, you have to link against
>/usr/lib/lwp.
>
>http://supportforum.sun.com/freesolaris/techfaqs.html?techfaqs_2957
>http://www.itworld.com/AppDev/1170/swol-1218-insidesolaris/
>
>I was at a USENIX BOF on threads in Boston year before last and Bill
>Lewis was ranting about how the N:M model sucks. Christopher Provenzano
>was right there and didn't seem to add any feelings one way or the
>other.
>
>Regards,
>
>--Buddy
>
I heard this a while ago, and talked with someone I knew who had inside
information about this.  According to that person, Sun will be switching
the default threads library to 1:1 (from the document referenced below,
it looks like that happens in Solaris 9).  In various benchmarks,
sometimes M:N won and sometimes 1:1 won, so performance was a wash.  The
main problem was that they could never get certain things to work "just
right" under an M:N model; the complexity of M:N was just too high to
get it working 100% correctly.  He didn't have specific details, though.

Having implemented a threads package with priority inheritance, I expect
that doing that with an M:N thread model will be extremely complex.
With activations it is possible, but that doesn't mean it's easy.  It's
hard enough with a 1:1 model.  A scheduler with good "global" properties
(for example, a scheduler that guarantees a time share to classes of
threads that occur in different processes) would be difficult to
implement properly, too.
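
For reference, in a 1:1 world the application-visible half of priority
inheritance is just the optional POSIX mutex protocol attribute.  A
minimal sketch, assuming the thread library advertises
_POSIX_THREAD_PRIO_INHERIT (LinuxThreads, for one, does not):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;

    pthread_mutexattr_init(&attr);
#ifdef _POSIX_THREAD_PRIO_INHERIT
    if (pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT) != 0)
        fprintf(stderr, "PTHREAD_PRIO_INHERIT not supported here\n");
#else
    fprintf(stderr, "no _POSIX_THREAD_PRIO_INHERIT on this system\n");
#endif
    pthread_mutex_init(&m, &attr);
    /* while a high-priority thread waits on m, the scheduler boosts
       the low-priority holder so it can't be starved indefinitely */
    pthread_mutex_destroy(&m);
    pthread_mutexattr_destroy(&attr);
    return 0;
}

The hard part, of course, is everything behind that call - which is
exactly where M:N multiplies the complexity.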

Complexity is the enemy of reliability.  Even if the M:N model could get
slightly better performance, it's going to be very hard to make it work
100% correctly.  I personally think NPTL is going in the right direction
on this one.

-Corey

>
>-----Original Message-----
>From: linux-kernel-owner@vger.kernel.org
>[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Bill Davidsen
>Sent: Monday, September 23, 2002 12:15 PM
>To: Peter Waechtler
>Cc: Larry McVoy; linux-kernel@vger.kernel.org; ingo Molnar
>Subject: Re: [ANNOUNCE] Native POSIX Thread Library 0.1
>
>On Mon, 23 Sep 2002, Peter Waechtler wrote:
>
>>On Monday, 23 September 2002, at 12:05, Bill Davidsen wrote:
>>
>>>On Sun, 22 Sep 2002, Larry McVoy wrote:
>>>
>>>>On Sun, Sep 22, 2002 at 08:55:39PM +0200, Peter Waechtler wrote:
>>>>
>>>>>AIX and Irix deploy M:N - I guess for a good reason: it's more
>>>>>flexible and combine both approaches with easy runtime tuning if
>>>>>the app happens to run on SMP (the uncommon case).
>>>>
>>>>No, AIX and IRIX do it that way because their processes are so
>>>>bloated that it would be unthinkable to do a 1:1 model.
>>>
>>>And BSD? And Solaris?
>>
>>Don't know. I don't have access to all those Unices. I could try
>>FreeBSD.
>
>At your convenience.
>
>>According to http://www.kegel.com/c10k.html  Sun is moving to 1:1
>>and FreeBSD still believes in M:N
>
>Sun is total news to me; "moving to" may be in Solaris 9, while Sol8
>seems to still be N:M. BSD is as I thought.
>
>>MacOSX 10.1 does not support PROCESS_SHARED locks, tried that 5
>>minutes ago.
>
>Thank you for the effort. Hmm, that's a bit of a surprise, at least to
>me.




^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2002-09-30 14:49 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-09-22 18:55 [ANNOUNCE] Native POSIX Thread Library 0.1 Peter Waechtler
2002-09-22 21:32 ` Larry McVoy
2002-09-23 10:05   ` Bill Davidsen
2002-09-23 11:55     ` Peter Waechtler
2002-09-23 19:14       ` Bill Davidsen
2002-09-29 23:26         ` Buddy Lumpkin
2002-09-30 14:54           ` Corey Minyard
2002-09-23 15:30     ` Larry McVoy
2002-09-23 19:44       ` Olivier Galibert
2002-09-23 19:48       ` Bill Davidsen
2002-09-23 20:32         ` Ingo Molnar
2002-09-24  0:03           ` Andy Isaacson
2002-09-24  0:10             ` Jeff Garzik
2002-09-24  0:14               ` Andy Isaacson
2002-09-24  5:53             ` Ingo Molnar
2002-09-24 20:34             ` David Schwartz
2002-09-24  7:12           ` Thunder from the hill
2002-09-24  7:30             ` Ingo Molnar
2002-09-23 22:35         ` Mark Mielke
2002-09-23 19:59       ` Peter Waechtler
2002-09-23 20:36         ` Ingo Molnar
2002-09-23 21:08           ` Peter Wächtler
2002-09-23 22:44             ` Mark Mielke
2002-09-23 23:01               ` Bill Huey
2002-09-23 23:11                 ` Mark Mielke
2002-09-24  0:21                   ` Bill Huey
2002-09-24  3:20                     ` Mark Mielke
2002-09-23 23:57           ` Andy Isaacson
2002-09-24  6:32             ` 1:1 threading vs. scheduler activations (was: Re: [ANNOUNCE] Native POSIX Thread Library 0.1) Ingo Molnar
2002-09-25  3:08               ` Bill Huey
2002-09-24 18:10             ` [ANNOUNCE] Native POSIX Thread Library 0.1 Christoph Hellwig
2002-09-23 21:32       ` Bill Huey
2002-09-23 21:41       ` dean gaudet
2002-09-23 22:10         ` Bill Huey
2002-09-23 22:56         ` Mark Mielke
2002-09-24 10:02       ` Nikita Danilov
2002-09-23 21:22     ` Bill Huey
2002-09-23 21:03 ` Bill Huey
2002-09-24 12:03   ` Michael Sinz
2002-09-24 13:40     ` Peter Svensson
2002-09-24 14:20       ` Michael Sinz
2002-09-24 14:50         ` Offtopic: (was Re: [ANNOUNCE] Native POSIX Thread Library 0.1) Peter Svensson
2002-09-24 15:19           ` Mark Veltzer
2002-09-24 17:29             ` Rik van Riel
2002-09-25 18:57               ` Mark Mielke
2002-09-25 19:04                 ` Rik van Riel
2002-09-25 19:29                   ` Mark Veltzer
2002-09-25 19:23                     ` Rik van Riel
2002-09-24 16:31           ` Rik van Riel
2002-09-24 18:49             ` Michael Sinz
2002-09-24 19:12               ` PATCH: per user fair scheduler 2.4.19 (cleaned up, thanks hch) (was: Re: Offtopic: (was Re: [ANNOUNCE] Native POSIX Thread Library 0.1)) Rik van Riel
2002-09-24 20:19 ` [ANNOUNCE] Native POSIX Thread Library 0.1 David Schwartz
2002-09-24 21:10   ` Chris Friesen
2002-09-24 21:22     ` Rik van Riel
2002-09-24 21:35       ` Roberto Peon
2002-09-24 21:35       ` Chris Friesen
2002-09-25 19:02     ` David Schwartz
2002-09-24 23:16   ` Peter Waechtler
2002-09-24 23:23     ` Rik van Riel
2002-09-25 19:05     ` David Schwartz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).