linux-kernel.vger.kernel.org archive mirror
* [ANNOUNCE] Native POSIX Thread Library 0.1
@ 2002-09-20  0:41 Ulrich Drepper
  2002-09-20  0:51 ` William Lee Irwin III
                   ` (8 more replies)
  0 siblings, 9 replies; 114+ messages in thread
From: Ulrich Drepper @ 2002-09-20  0:41 UTC (permalink / raw)
  To: linux-kernel

We are pleased to announce the first publicly available source
release of a new POSIX thread library for Linux.  As part of the
continuous effort to improve Linux's capabilities as a client, server,
and computing platform, Red Hat sponsored the development of this
completely new implementation of a POSIX thread library, called the
Native POSIX Thread Library (NPTL).

Unless major flaws in the design are found, this code is intended to
become the standard POSIX thread library on Linux systems, and it will
be included in the GNU C library distribution.

The work visible here is the result of close collaboration between
kernel and runtime developers.  The collaboration proceeded by
developing the kernel changes while writing the corresponding parts of
the thread library.  Whenever something could not be implemented
optimally, an interface was changed to eliminate the issue.  The
result is a thread library which, unlike previous attempts, is a very
thin layer on top of the kernel.  This helps achieve maximum
performance at minimal cost.


A white paper (still in its draft stage, though) describing the design
is available at

   http://people.redhat.com/drepper/nptl-design.pdf

It provides a large number of details on the design and insight into
the design process.  At this point we want to repeat only a few
important points:

- - the new library is based on a 1-on-1 model.  Earlier design
   documents stated that an M-on-N implementation was necessary to
   support a scalable thread library.  This was claimed especially for
   the IA-32 and x86-64 platforms, since the ABI with respect to
   threads forces the use of segment registers, and the only way to use
   those registers was through the processor's Local Descriptor Table
   (LDT) data structure.

   The kernel limitations the earlier designs were based on have been
   eliminated as part of this project, opening the road to a 1-on-1
   implementation, which has many advantages such as:

   + a less complex implementation;
   + avoidance of two-level scheduling, enabling the kernel to make all
     scheduling decisions;
   + direct interaction between kernel and user-level code (e.g., when
     delivering signals);
   + and more.

   It is not generally accepted that a 1-on-1 model is superior, but
   our tests showed the viability of this approach, and comparing it
   with the overhead added by existing M-on-N implementations convinced
   us that 1-on-1 is the right approach.

   Initial confirmation came from test runs with huge numbers of
   threads.  Even on IA-32, with its limited address space and memory
   handling, running 100,000 concurrent threads was no problem at all:
   creating and destroying the threads took no more than two seconds.
   All of this was made possible by the kernel work performed as part
   of this project.

   The only limiting factors on the number of threads today are
   resource availability (RAM and processor resources) and architecture
   limitations.  Since every thread needs at least a stack and data
   structures describing the thread, the number is capped.  On 64-bit
   machines the architecture does not add any limitations anymore (at
   least for the moment), and with enough resources the number of
   threads can grow arbitrarily.

   This does not mean that using hundreds of thousands of threads is a
   desirable design for the majority of applications, at least not
   unless the number of processors matches the number of threads.  But
   it is important to note that the design of the library does not
   impose a fixed limit.

   The kernel work to optimize for a high thread count is still
   ongoing.  Some places in which the kernel iterates over processes
   and threads remain, and other places need to be cleaned up.  But it
   has already been shown that, given sufficient resources and a
   reasonable architecture, an order of magnitude more threads can be
   created than in our tests on IA-32.


- - The futex system call is used extensively in all synchronization
   primitives and other places which need some kind of
   synchronization.  The futex mechanism is generic enough to support
   the standard POSIX synchronization mechanisms with very little
   effort.

   The fact that this is possible was also essential for the selection
   of the 1-on-1 model: only if the kernel sees all the waiters, and
   knows that they are blocked for synchronization purposes, can the
   scheduler make decisions as good as those a thread library could
   make in an M-on-N implementation.

   Futexes also allow the implementation of inter-process
   synchronization primitives, a sorely missed feature in the old
   LinuxThreads implementation (Hi jbj!).


- - Substantial effort went into making thread creation and
   destruction as fast as possible.  Extensions to the clone(2) system
   call were introduced to eliminate the need for a helper thread in
   either creation or destruction.  The exit path in the kernel was
   optimized (previously not a high priority).  The library itself
   optimizes memory allocation so that in many cases a new thread can
   be created with a single system call.

   On an old IA-32 dual 450MHz PII Xeon system, 100,000 threads can be
   created and destroyed in 2.3 seconds (with up to 50 threads running
   at any one time).


- - Programs indirectly linked against the thread library had problems
   with the old implementation because of the way symbols are looked
   up. This should not be a problem anymore.


The thread library is designed to be binary compatible with the old
LinuxThreads implementation.  This compatibility obviously has some
limitations.  Incompatibilities exist in places where the LinuxThreads
implementation diverged from the POSIX standard.  Users of the old
library have been warned from day one that this day would come, and
code which added workarounds for the POSIX non-compliance had better
be prepared to remove them.  The visible changes in the library
include:


- - The signal handling changes from per-thread signal handling to
   POSIX process signal handling.  This change will require changes in
   programs which exploit the non-conformance of the old implementation.

   One consequence of this is that SIGSTOP works on the process.  Job
   control in the shell and stopping the whole process in a debugger
   now work.

- - getpid() now returns the same value in all threads

- - the exec functions are implemented correctly: the exec'ed process
   keeps the PID of the original process.  The parent of the
   multi-threaded application is notified only when the exec'ed process
   terminates.

- - handlers registered with pthread_atfork are no longer run if
   vfork is used.  This is not required by the standard (which does
   not define vfork), and all that is allowed in the child is calling
   _exit() or an exec function.  Users of vfork had better know what
   they are doing.

- - libpthread should now be much more resistant to linking problems:
   even if the application doesn't list libpthread as a direct
   dependency, functions which are extended by libpthread should work
   correctly.

- - no manager thread

- - inter-process mutex, read-write lock, condition variable, and
   barrier implementations are available

- - the pthread_kill_other_threads_np function is not available.  It was
   needed to work around the broken signal handling.  If somebody shows
   some existing code which makes legitimate use of this function we
   might add it back.

- - requires a kernel with the threading capabilities of Linux 2.5.36.



The sources for the new library are for the time being available at

   ftp://people.redhat.com/drepper/nptl/

The current sources contain support only for IA-32, but this will
change very quickly.  The thread library is built as part of glibc, so
the complete set of glibc sources is available as well.  A current
snapshot of glibc 2.3 (or glibc 2.3 itself, once released) is
necessary.  You can find it at

   ftp://sources.redhat.com/pub/glibc/snapshots

Final releases will be available on ftp.gnu.org and its mirrors.


Building glibc with the new thread library is demanding on the
compilation environment.

- - The 2.5.36 kernel or above must be installed and used.  To compile
   glibc it is necessary to create the symbolic link

      /lib/modules/$(uname -r)/build

   to point to the build directory.

- - The general compiler requirement for glibc is at least gcc 3.2.  For
   the new thread code, working support for the __thread keyword is
   additionally necessary.

   Similarly, a binutils with functioning TLS support is needed.

   The (Null) beta release of the upcoming Red Hat Linux product is
   known to have the necessary tools available after updating from the
   latest binaries on the FTP site.  This is no ploy to force everybody
   to use Red Hat Linux; it's just the only environment known to date
   which works.  If alternatives are known, they can be announced on
   the mailing list.

- - To configure glibc it is necessary to run in the build directory
   (which always should be separate from the source directory):

    /path/to/glibc/configure --prefix=/usr --enable-add-ons=linuxthreads2 \
       --enable-kernel=current --with-tls

   The --enable-kernel parameter requires that the 2.5.36+ kernel is
   running.  It is not strictly necessary but helps to avoid mistakes.
   It might also be a good idea to add --disable-profile, just to speed
   up the compilation.

   When configured as above, the library must not be installed, since
   it would overwrite the system's library.  If you want to install the
   resulting library, choose a different --prefix value.  Otherwise
   the new code can be used without installation.  Running existing
   binaries is possible with

    elf/ld.so --library-path .:linuxthreads2:dlfcn:math <binary> <args>...

   Alternatively, the binary could be built to find the dynamic linker
   and DSOs by itself.  This is a much easier way to debug the code
   since gdb can start the binary.  Compiling is a bit more complicated
   in this case:

    gcc -nostdlib -nostartfiles -o <OUTPUT> csu/crt1.o csu/crti.o \
      $(gcc --print-file-name=crtbegin.o) <INPUTS> \
      -Wl,-rpath,$PWD,-dynamic-linker,$PWD/ld-linux.so.2 \
      linuxthreads2/libpthread.so.0 ./libc.so.6 ./libc_nonshared.a \
      elf/ld-linux.so.2 $(gcc --print-file-name=crtend.o) csu/crtn.o

   This command assumes that it is run in the build directory.  Correct
   the paths if necessary.  The compilation will use the system's
   headers, which is a good test but might lead to strange effects if
   there are compatibility bugs left.


Once all these prerequisites are met compiling glibc should be easy.
But there are some tests which will flunk.  For good reasons we aren't
officially releasing the code yet.  The bugs are either in the TLS
code which is not enabled in the standard glibc build, or obviously in
the thread library itself.  To run the tests for the thread library
run

   make subdirs=linuxthreads2 check

One word on the name 'linuxthreads2' of the directory.  This is only a
convenience so that the glibc configure scripts don't complain about
missing thread support.  It will be changed to reflect the real name
of the library ASAP.


What can you expect?

This is a very early version of the code, so the obvious answer is:
some problems.  The test suite for the new thread code should pass,
but besides that and some performance measurement tools we haven't run
much code.  Ideally we would get people to write many more of the
small test programs which are included in the sources.  Compiling big
programs would mean not being able to locate problems easily.  But I
certainly won't object to people running and debugging bigger
applications.  Please report successes and failures to the mailing
list.

People who are interested in contributing must be aware that for any
non-trivial change we need an assignment of the code to the FSF.  The
process is unfortunately necessary in today's world.

People who are contaminated by having worked on proprietary thread
library implementations should not participate in discussions on the
mailing list unless they are willing to disclose the information.
Every bit of information is publicly available from the mailing list
archive.


Which brings us to the final point: the mailing list for *all*
discussions related to this thread library implementation is

   phil-list@redhat.com

Go to

   https://listman.redhat.com/mailman/listinfo/phil-list

to subscribe, unsubscribe, or review the archive.

- -- 
- ---------------.                          ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Red Hat          `--' drepper at redhat.com   `------------------------



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20  0:41 [ANNOUNCE] Native POSIX Thread Library 0.1 Ulrich Drepper
@ 2002-09-20  0:51 ` William Lee Irwin III
  2002-09-20  1:35   ` Ulrich Drepper
  2002-09-20  1:56 ` Larry McVoy
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 114+ messages in thread
From: William Lee Irwin III @ 2002-09-20  0:51 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: linux-kernel

On Thu, Sep 19, 2002 at 05:41:37PM -0700, Ulrich Drepper wrote:
>   Initial confirmations were test runs with huge numbers of threads.
>   Even on IA-32 with its limited address space and memory handling
>   running 100,000 concurrent threads was no problem at all, creating
>   and destroying the threads did not take more than two seconds.  This
>   all was made possible by the kernel work performed as part of this
>   project.

What stress tests and/or benchmarks are you using?


Thanks,
Bill


* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20  0:51 ` William Lee Irwin III
@ 2002-09-20  1:35   ` Ulrich Drepper
  2002-09-20  1:42     ` William Lee Irwin III
  0 siblings, 1 reply; 114+ messages in thread
From: Ulrich Drepper @ 2002-09-20  1:35 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: linux-kernel

William Lee Irwin III wrote:

> What stress tests and/or benchmarks are you using?

We have developed a little benchmark in parallel with the library.
Nothing special, but you've seen Ingo using it in his arguments
(usually called p3).

This does not in any way remove the need for more benchmarks.

- -- 
- ---------------.                          ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Red Hat          `--' drepper at redhat.com   `------------------------



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20  1:35   ` Ulrich Drepper
@ 2002-09-20  1:42     ` William Lee Irwin III
  0 siblings, 0 replies; 114+ messages in thread
From: William Lee Irwin III @ 2002-09-20  1:42 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: linux-kernel

> William Lee Irwin III wrote:
>> What stress tests and/or benchmarks are you using?

On Thu, Sep 19, 2002 at 06:35:17PM -0700, Ulrich Drepper wrote:
> We have developed a little benchmark in parallel to the library. 
> Nothing special, but you've seen Ingo using it in his argumentations 
> (usually called p3).
> This does not in any way removes the need for more benchmarks.

If you could pass that along for me (or others) to add to the list of
things to bench I'd be much obliged.


Thanks,
Bill


* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20  0:41 [ANNOUNCE] Native POSIX Thread Library 0.1 Ulrich Drepper
  2002-09-20  0:51 ` William Lee Irwin III
@ 2002-09-20  1:56 ` Larry McVoy
  2002-09-20  2:01 ` Rik van Riel
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 114+ messages in thread
From: Larry McVoy @ 2002-09-20  1:56 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: linux-kernel

> - - the new library is based on an 1-on-1 model.  Earlier design
>    documents stated that an M-on-N implementation was necessary to
>    support a scalable thread library.  This was especially true for
>    the IA-32 and x86-64 platforms since the ABI with respect to threads
>    forces the use of segment registers and the only way to use those
>    registers was with the Local Descriptor Table (LDT) data structure
>    of the processor.
> 
>    The kernel limitations the earlier designs were based on have been
>    eliminated as part of this project, opening the road to a 1-on-1
>    implementation which has many advantages such as
> 
>    + less complex implementation;
>    + avoidance of two-level scheduling, enabling the kernel to make all
>      scheduling decisions;
>    + direct interaction between kernel and user-level code (e.g., when
>      delivering signals);
>    + and more and more.

I'm just starting to look at this...  Without digging into it, my
impression is that this is 100% the right way to go.  Rob Pike (Mr Plan
9, which while it hasn't had the impact of Linux is actually a fantastic
chunk of work) once said "If you think you need threads, your processes
are too fat".  I believe that's another way of stating that a 1-on-1
model is the right approach.  He's saying "don't make threads to make
things lighter, that's bullshit, use processes as threads, that will
force the processes to be light and that's a good thing for processes
*and* threads".

My only issue with that approach (I'm a big time advocate of that
approach) is TLB & page table sharing.  My understanding (weak as it is)
of Linux is that it does the right thing here so there isn't much of
an issue.  In Linux the address space is a first class object so the
id in the TLB is an address space ID, not a process id, which means
a pile of unrelated processes could, in theory, share the same chunk
of address space.  That's cool.  A lot of processor folks have danced
around that issue for years, I fought with Mash at MIPS about it, he
knew it was something that was needed.  But Linux, as far as I can tell,
got it right in a different way that made the issue go away.  Which means
1-on-1 threads are the right approach for reasons which have nothing to
do with threads, as well as reasons which have to do with threads.

Kudos to Ulrich & team for getting it right.  I'll go dig into it and
see if I've missed the point or not, but this sounds really good.
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 


* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20  0:41 [ANNOUNCE] Native POSIX Thread Library 0.1 Ulrich Drepper
  2002-09-20  0:51 ` William Lee Irwin III
  2002-09-20  1:56 ` Larry McVoy
@ 2002-09-20  2:01 ` Rik van Riel
  2002-09-20  2:15   ` Benjamin LaHaise
                     ` (3 more replies)
  2002-09-20  9:53 ` [ANNOUNCE] Native POSIX Thread Library 0.1 Padraig Brady
                   ` (5 subsequent siblings)
  8 siblings, 4 replies; 114+ messages in thread
From: Rik van Riel @ 2002-09-20  2:01 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: linux-kernel

On Thu, 19 Sep 2002, Ulrich Drepper wrote:

>    Initial confirmations were test runs with huge numbers of threads.
>    Even on IA-32 with its limited address space and memory handling
>    running 100,000 concurrent threads was no problem at all,

So, where did you put those 800 MB of kernel stacks needed for
100,000 threads ?

If you used the standard 3:1 user/kernel split you'd be using
all of ZONE_NORMAL for kernel stacks, but if you use a 2:2 split
you'll end up with a lot less user space (bad if you want to
have many threads in the same address space).

Do you have some special solutions up your sleeve or is this
in the category of as-of-yet unsolved problems ?

regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/

Spamtraps of the month:  september@surriel.com trac@trac.org



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20  2:01 ` Rik van Riel
@ 2002-09-20  2:15   ` Benjamin LaHaise
  2002-09-20  2:40     ` Dave Hansen
  2002-09-20  2:47     ` William Lee Irwin III
  2002-09-20  2:17   ` Larry McVoy
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 114+ messages in thread
From: Benjamin LaHaise @ 2002-09-20  2:15 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Ulrich Drepper, linux-kernel

On Thu, Sep 19, 2002 at 11:01:33PM -0300, Rik van Riel wrote:
> So, where did you put those 800 MB of kernel stacks needed for
> 100,000 threads ?

That's what the 4KB stack patch is for. ;-)

		-ben


* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20  2:01 ` Rik van Riel
  2002-09-20  2:15   ` Benjamin LaHaise
@ 2002-09-20  2:17   ` Larry McVoy
  2002-09-20  2:24     ` Rik van Riel
  2002-09-20  2:23   ` Anton Blanchard
  2002-09-20  7:52   ` 100,000 threads? [was: [ANNOUNCE] Native POSIX Thread Library 0.1] Ingo Molnar
  3 siblings, 1 reply; 114+ messages in thread
From: Larry McVoy @ 2002-09-20  2:17 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Ulrich Drepper, linux-kernel

On Thu, Sep 19, 2002 at 11:01:33PM -0300, Rik van Riel wrote:
> On Thu, 19 Sep 2002, Ulrich Drepper wrote:
> 
> >    Initial confirmations were test runs with huge numbers of threads.
> >    Even on IA-32 with its limited address space and memory handling
> >    running 100,000 concurrent threads was no problem at all,
> 
> So, where did you put those 800 MB of kernel stacks needed for
> 100,000 threads ?

Come on, you and I normally agree, but 100,000 threads?  Where is the need
for that?  More importantly, is there any realistic application that can 
use 100,000 threads where the kernel stack is 0 but the user level stack
doesn't have exactly the same problem?  The kernel can be perfect, i.e.,
cost zero, and you still have a problem.
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 


* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20  2:01 ` Rik van Riel
  2002-09-20  2:15   ` Benjamin LaHaise
  2002-09-20  2:17   ` Larry McVoy
@ 2002-09-20  2:23   ` Anton Blanchard
  2002-09-20  7:52   ` 100,000 threads? [was: [ANNOUNCE] Native POSIX Thread Library 0.1] Ingo Molnar
  3 siblings, 0 replies; 114+ messages in thread
From: Anton Blanchard @ 2002-09-20  2:23 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Ulrich Drepper, linux-kernel


> So, where did you put those 800 MB of kernel stacks needed for
> 100,000 threads ?

I hope no one is going to run x86 boxes with 100,000 threads, but
it's nice to know we can do it (just to have an upper limit).

If they want 100k threads they should start thinking about a 64-bit
box.  I've already tested 1 million kernel threads with 24GB, and we
make machines with more than 10 times that memory... (and no, I can't
think of any possible reason someone would want that many threads :)

Anton


* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20  2:17   ` Larry McVoy
@ 2002-09-20  2:24     ` Rik van Riel
  2002-09-20  2:32       ` Ulrich Drepper
  2002-09-20  6:01       ` Linus Torvalds
  0 siblings, 2 replies; 114+ messages in thread
From: Rik van Riel @ 2002-09-20  2:24 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Ulrich Drepper, linux-kernel

On Thu, 19 Sep 2002, Larry McVoy wrote:
> On Thu, Sep 19, 2002 at 11:01:33PM -0300, Rik van Riel wrote:

> > So, where did you put those 800 MB of kernel stacks needed for
> > 100,000 threads ?
>
> Come on, you and I normally agree, but 100,000 threads?  Where is the
> need for that?

I agree, it's pretty silly. But still, I was curious how they
managed to achieve it ;)

OTOH, some applications are known for silliness ...

cheers,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/

Spamtraps of the month:  september@surriel.com trac@trac.org



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20  2:24     ` Rik van Riel
@ 2002-09-20  2:32       ` Ulrich Drepper
  2002-09-20  6:01       ` Linus Torvalds
  1 sibling, 0 replies; 114+ messages in thread
From: Ulrich Drepper @ 2002-09-20  2:32 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Larry McVoy, linux-kernel

Rik van Riel wrote:

> I agree, it's pretty silly. But still, I was curious how they
> managed to achieve it ;)

Ingo will be able to tell you when he gets up.  This is not my area of
expertise.  AFAIK there were no special changes involved; Ben's irq
stack patch would add to this number (I think Ingo said something
about 188,000 threads or so).

- -- 
- ---------------.                          ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Red Hat          `--' drepper at redhat.com   `------------------------



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20  2:15   ` Benjamin LaHaise
@ 2002-09-20  2:40     ` Dave Hansen
  2002-09-20  2:47     ` William Lee Irwin III
  1 sibling, 0 replies; 114+ messages in thread
From: Dave Hansen @ 2002-09-20  2:40 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Rik van Riel, Ulrich Drepper, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 441 bytes --]

Benjamin LaHaise wrote:
> On Thu, Sep 19, 2002 at 11:01:33PM -0300, Rik van Riel wrote:
> 
>>So, where did you put those 800 MB of kernel stacks needed for
>>100,000 threads ?
> 
> That's what the 4KB stack patch is for. ;-)

It's shameless plug time...

Just in case anybody is interested, I updated Ben's patch for 2.5.34. 
  I just resynced against current -bk, which is attached.  Works For 
Me (tm)

-- 
Dave Hansen
haveblue@us.ibm.com

[-- Attachment #2: irqstack-2.5.35+bk-0.patch --]
[-- Type: text/plain, Size: 15532 bytes --]

# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	           ChangeSet	1.552   -> 1.553  
#	arch/i386/kernel/process.c	1.40    -> 1.41   
#	arch/i386/kernel/irq.c	1.18    -> 1.19   
#	arch/i386/kernel/head.S	1.15    -> 1.16   
#	include/asm-i386/thread_info.h	1.7     -> 1.8    
#	include/asm-i386/page.h	1.16.1.1 -> 1.18   
#	arch/i386/kernel/entry.S	1.41.1.3 -> 1.44   
#	 arch/i386/config.in	1.47.1.2 -> 1.49   
#	  arch/i386/Makefile	1.17.1.3 -> 1.19   
#	arch/i386/kernel/i386_ksyms.c	1.30    -> 1.31   
#	arch/i386/kernel/smpboot.c	1.33.1.2 -> 1.35   
#	arch/i386/boot/compressed/misc.c	1.7     -> 1.8    
#	arch/i386/kernel/init_task.c	1.6     -> 1.7    
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/09/19	haveblue@elm3b96.(none)	1.553
# Merge elm3b96.(none):/work/dave/bk/linux-2.5
# into elm3b96.(none):/work/dave/bk/linux-2.5-irqstack
# --------------------------------------------
#
diff -Nru a/arch/i386/Makefile b/arch/i386/Makefile
--- a/arch/i386/Makefile	Thu Sep 19 19:38:27 2002
+++ b/arch/i386/Makefile	Thu Sep 19 19:38:27 2002
@@ -85,6 +85,10 @@
 CFLAGS += -march=i586
 endif
 
+ifdef CONFIG_X86_STACK_CHECK
+CFLAGS += -p
+endif
+
 HEAD := arch/i386/kernel/head.o arch/i386/kernel/init_task.o
 
 libs-y 					+= arch/i386/lib/
diff -Nru a/arch/i386/boot/compressed/misc.c b/arch/i386/boot/compressed/misc.c
--- a/arch/i386/boot/compressed/misc.c	Thu Sep 19 19:38:27 2002
+++ b/arch/i386/boot/compressed/misc.c	Thu Sep 19 19:38:27 2002
@@ -377,3 +377,7 @@
 	if (high_loaded) close_output_buffer_if_we_run_high(mv);
 	return high_loaded;
 }
+
+/* We don't actually check for stack overflows this early. */
+__asm__(".globl mcount ; mcount: ret\n");
+
diff -Nru a/arch/i386/config.in b/arch/i386/config.in
--- a/arch/i386/config.in	Thu Sep 19 19:38:27 2002
+++ b/arch/i386/config.in	Thu Sep 19 19:38:27 2002
@@ -35,6 +35,7 @@
 #
 # Define implied options from the CPU selection here
 #
+define_bool CONFIG_X86_HAVE_CMOV n
 
 if [ "$CONFIG_M386" = "y" ]; then
    define_bool CONFIG_X86_CMPXCHG n
@@ -91,18 +92,21 @@
    define_bool CONFIG_X86_GOOD_APIC y
    define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
    define_bool CONFIG_X86_PPRO_FENCE y
+   define_bool CONFIG_X86_HAVE_CMOV y
 fi
 if [ "$CONFIG_MPENTIUMIII" = "y" ]; then
    define_int  CONFIG_X86_L1_CACHE_SHIFT 5
    define_bool CONFIG_X86_TSC y
    define_bool CONFIG_X86_GOOD_APIC y
    define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
+   define_bool CONFIG_X86_HAVE_CMOV y
 fi
 if [ "$CONFIG_MPENTIUM4" = "y" ]; then
    define_int  CONFIG_X86_L1_CACHE_SHIFT 7
    define_bool CONFIG_X86_TSC y
    define_bool CONFIG_X86_GOOD_APIC y
    define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
+   define_bool CONFIG_X86_HAVE_CMOV y
 fi
 if [ "$CONFIG_MK6" = "y" ]; then
    define_int  CONFIG_X86_L1_CACHE_SHIFT 5
@@ -116,6 +120,7 @@
    define_bool CONFIG_X86_GOOD_APIC y
    define_bool CONFIG_X86_USE_3DNOW y
    define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
+   define_bool CONFIG_X86_HAVE_CMOV y
 fi
 if [ "$CONFIG_MELAN" = "y" ]; then
    define_int  CONFIG_X86_L1_CACHE_SHIFT 4
@@ -132,6 +137,7 @@
 if [ "$CONFIG_MCRUSOE" = "y" ]; then
    define_int  CONFIG_X86_L1_CACHE_SHIFT 5
    define_bool CONFIG_X86_TSC y
+   define_bool CONFIG_X86_HAVE_CMOV y
 fi
 if [ "$CONFIG_MWINCHIPC6" = "y" ]; then
    define_int  CONFIG_X86_L1_CACHE_SHIFT 5
@@ -435,6 +441,7 @@
    if [ "$CONFIG_HIGHMEM" = "y" ]; then
       bool '  Highmem debugging' CONFIG_DEBUG_HIGHMEM
    fi
+   bool '  Check for stack overflows' CONFIG_X86_STACK_CHECK
 fi
 
 endmenu
diff -Nru a/arch/i386/kernel/entry.S b/arch/i386/kernel/entry.S
--- a/arch/i386/kernel/entry.S	Thu Sep 19 19:38:27 2002
+++ b/arch/i386/kernel/entry.S	Thu Sep 19 19:38:27 2002
@@ -136,7 +136,7 @@
 	movl %ecx,CS(%esp)	#
 	movl %esp, %ebx
 	pushl %ebx
-	andl $-8192, %ebx	# GET_THREAD_INFO
+	GET_THREAD_INFO_WITH_ESP(%ebx)
 	movl TI_EXEC_DOMAIN(%ebx), %edx	# Get the execution domain
 	movl 4(%edx), %edx	# Get the lcall7 handler for the domain
 	pushl $0x7
@@ -158,7 +158,7 @@
 	movl %ecx,CS(%esp)	#
 	movl %esp, %ebx
 	pushl %ebx
-	andl $-8192, %ebx	# GET_THREAD_INFO
+	GET_THREAD_INFO_WITH_ESP(%ebx)
 	movl TI_EXEC_DOMAIN(%ebx), %edx	# Get the execution domain
 	movl 4(%edx), %edx	# Get the lcall7 handler for the domain
 	pushl $0x27
@@ -334,7 +334,39 @@
 	ALIGN
 common_interrupt:
 	SAVE_ALL
+
+	GET_THREAD_INFO(%ebx)
+	movl TI_IRQ_STACK(%ebx),%ecx
+	movl TI_TASK(%ebx),%edx
+	movl %esp,%eax
+	leal (THREAD_SIZE-4)(%ecx),%esi
+	testl %ecx,%ecx			# is there a valid irq_stack?
+
+	# switch to the irq stack
+#ifdef CONFIG_X86_HAVE_CMOV
+	cmovnz %esi,%esp
+#else
+	jnz 1f
+	mov %esi,%esp
+1:
+#endif
+
+	# update the task pointer in the irq stack
+	GET_THREAD_INFO(%esi)
+	movl %edx,TI_TASK(%esi)
+
 	call do_IRQ
+
+	movl %eax,%esp			# potentially restore non-irq stack
+
+	# copy flags from the irq stack back into the task's thread_info
+	# %esi is saved over the do_IRQ call and contains the irq stack
+	# thread_info pointer
+	# %ebx contains the original thread_info pointer
+	movl TI_FLAGS(%esi),%eax
+	movl $0,TI_FLAGS(%esi)
+	LOCK orl %eax,TI_FLAGS(%ebx)
+
 	jmp ret_from_intr
 
 #define BUILD_INTERRUPT(name, nr)	\
@@ -506,6 +538,61 @@
 	pushl $0
 	pushl $do_spurious_interrupt_bug
 	jmp error_code
+
+#ifdef CONFIG_X86_STACK_CHECK
+.data
+	.globl	stack_overflowed
+stack_overflowed:
+	.long	0
+
+.text
+
+ENTRY(mcount)
+	push %eax
+	movl $(THREAD_SIZE - 1),%eax
+	andl %esp,%eax
+	cmpl $0x200,%eax        /* 512 byte danger zone */
+	jle 1f
+2:
+	popl %eax
+	ret
+1:
+	lock; btsl $0,stack_overflowed	/* Prevent reentry via printk */
+	jc      2b
+
+	# switch to overflow stack
+	movl	%esp,%eax
+	movl	$(stack_overflow_stack + THREAD_SIZE - 4),%esp
+
+	pushf
+	cli
+	pushl	%eax
+
+	# push eip then esp of error for stack_overflow_panic
+	pushl	4(%eax)
+	pushl	%eax
+
+	# update the task pointer and cpu in the overflow stack's thread_info.
+	GET_THREAD_INFO_WITH_ESP(%eax)
+	movl	TI_TASK(%eax),%ebx
+	movl	%ebx,stack_overflow_stack+TI_TASK
+	movl	TI_CPU(%eax),%ebx
+	movl	%ebx,stack_overflow_stack+TI_CPU
+
+	# never neverland
+	call	stack_overflow_panic
+
+	addl	$8,%esp
+
+	popf
+	popl	%eax
+	movl	%eax,%esp
+	popl	%eax
+	movl	$0,stack_overflowed
+	ret
+
+#warning stack check enabled
+#endif
 
 .data
 ENTRY(sys_call_table)
diff -Nru a/arch/i386/kernel/head.S b/arch/i386/kernel/head.S
--- a/arch/i386/kernel/head.S	Thu Sep 19 19:38:27 2002
+++ b/arch/i386/kernel/head.S	Thu Sep 19 19:38:27 2002
@@ -15,6 +15,7 @@
 #include <asm/page.h>
 #include <asm/pgtable.h>
 #include <asm/desc.h>
+#include <asm/thread_info.h>
 
 #define OLD_CL_MAGIC_ADDR	0x90020
 #define OLD_CL_MAGIC		0xA33F
@@ -305,7 +306,7 @@
 	ret
 
 ENTRY(stack_start)
-	.long init_thread_union+8192
+	.long init_thread_union+THREAD_SIZE
 	.long __KERNEL_DS
 
 /* This is the default interrupt "handler" :-) */
diff -Nru a/arch/i386/kernel/i386_ksyms.c b/arch/i386/kernel/i386_ksyms.c
--- a/arch/i386/kernel/i386_ksyms.c	Thu Sep 19 19:38:27 2002
+++ b/arch/i386/kernel/i386_ksyms.c	Thu Sep 19 19:38:27 2002
@@ -172,3 +172,8 @@
 EXPORT_SYMBOL(is_sony_vaio_laptop);
 
 EXPORT_SYMBOL(__PAGE_KERNEL);
+
+#ifdef CONFIG_X86_STACK_CHECK
+extern void mcount(void);
+EXPORT_SYMBOL(mcount);
+#endif
diff -Nru a/arch/i386/kernel/init_task.c b/arch/i386/kernel/init_task.c
--- a/arch/i386/kernel/init_task.c	Thu Sep 19 19:38:27 2002
+++ b/arch/i386/kernel/init_task.c	Thu Sep 19 19:38:27 2002
@@ -13,6 +13,14 @@
 static struct signal_struct init_signals = INIT_SIGNALS(init_signals);
 struct mm_struct init_mm = INIT_MM(init_mm);
 
+union thread_union init_irq_union
+	__attribute__((__section__(".data.init_task")));
+
+#ifdef CONFIG_X86_STACK_CHECK
+union thread_union stack_overflow_stack
+	__attribute__((__section__(".data.init_task")));
+#endif
+
 /*
  * Initial thread structure.
  *
@@ -22,7 +30,15 @@
  */
 union thread_union init_thread_union 
 	__attribute__((__section__(".data.init_task"))) =
-		{ INIT_THREAD_INFO(init_task) };
+		{ { 
+			task:		&init_task,
+			exec_domain:	&default_exec_domain,
+			flags:		0,
+			cpu:		0,
+			addr_limit:	KERNEL_DS,
+			irq_stack:	&init_irq_union,
+		} };
+
 
 /*
  * Initial task structure.
diff -Nru a/arch/i386/kernel/irq.c b/arch/i386/kernel/irq.c
--- a/arch/i386/kernel/irq.c	Thu Sep 19 19:38:27 2002
+++ b/arch/i386/kernel/irq.c	Thu Sep 19 19:38:27 2002
@@ -311,7 +311,8 @@
  * SMP cross-CPU interrupts have their own specific
  * handlers).
  */
-asmlinkage unsigned int do_IRQ(struct pt_regs regs)
+struct pt_regs *do_IRQ(struct pt_regs *regs) __attribute__((regparm(1)));
+struct pt_regs *do_IRQ(struct pt_regs *regs)
 {	
 	/* 
 	 * We ack quickly, we don't want the irq controller
@@ -323,7 +324,7 @@
 	 * 0 return value means that this irq is already being
 	 * handled by some other CPU. (or is disabled)
 	 */
-	int irq = regs.orig_eax & 0xff; /* high bits used in ret_from_ code  */
+	int irq = regs->orig_eax & 0xff; /* high bits used in ret_from_ code  */
 	int cpu = smp_processor_id();
 	irq_desc_t *desc = irq_desc + irq;
 	struct irqaction * action;
@@ -373,7 +374,7 @@
 	 */
 	for (;;) {
 		spin_unlock(&desc->lock);
-		handle_IRQ_event(irq, &regs, action);
+		handle_IRQ_event(irq, regs, action);
 		spin_lock(&desc->lock);
 		
 		if (likely(!(desc->status & IRQ_PENDING)))
@@ -392,7 +393,7 @@
 
 	irq_exit();
 
-	return 1;
+	return regs;
 }
 
 /**
diff -Nru a/arch/i386/kernel/process.c b/arch/i386/kernel/process.c
--- a/arch/i386/kernel/process.c	Thu Sep 19 19:38:27 2002
+++ b/arch/i386/kernel/process.c	Thu Sep 19 19:38:27 2002
@@ -438,6 +438,16 @@
 
 extern void show_trace(unsigned long* esp);
 
+#ifdef CONFIG_X86_STACK_CHECK
+void stack_overflow_panic(void *esp, void *eip)
+{
+	printk("stack overflow from %p.  esp: %p\n", eip, esp);
+	show_trace(esp);
+	panic("stack overflow\n");
+}
+
+#endif
+
 void show_regs(struct pt_regs * regs)
 {
 	unsigned long cr0 = 0L, cr2 = 0L, cr3 = 0L, cr4 = 0L;
@@ -693,6 +703,7 @@
 
 	/* never put a printk in __switch_to... printk() calls wake_up*() indirectly */
 
+	next_p->thread_info->irq_stack = prev_p->thread_info->irq_stack;
 	unlazy_fpu(prev_p);
 
 	/*
diff -Nru a/arch/i386/kernel/smpboot.c b/arch/i386/kernel/smpboot.c
--- a/arch/i386/kernel/smpboot.c	Thu Sep 19 19:38:27 2002
+++ b/arch/i386/kernel/smpboot.c	Thu Sep 19 19:38:27 2002
@@ -66,6 +66,10 @@
 /* Per CPU bogomips and other parameters */
 struct cpuinfo_x86 cpu_data[NR_CPUS] __cacheline_aligned;
 
+extern union thread_union init_irq_union;
+union thread_union *irq_stacks[NR_CPUS] __cacheline_aligned =
+	{ &init_irq_union, };
+
 /* Set when the idlers are all forked */
 int smp_threads_ready;
 
@@ -760,6 +764,27 @@
 	return (send_status | accept_status);
 }
 
+static void __init setup_irq_stack(struct task_struct *p, int cpu)
+{
+	unsigned long stk;
+
+	stk = __get_free_pages(GFP_KERNEL, THREAD_ORDER);
+	if (!stk)
+		panic("I can't seem to allocate my irq stack.  Oh well, giving up.");
+
+	irq_stacks[cpu] = (void *)stk;
+	memset(irq_stacks[cpu], 0, THREAD_SIZE);
+	irq_stacks[cpu]->thread_info.cpu = cpu;
+	irq_stacks[cpu]->thread_info.preempt_count = 1;
+					/* interrupts are not preemptable */
+	p->thread_info->irq_stack = irq_stacks[cpu];
+
+	/* If we want to make the irq stack more than one unit
+	 * deep, we can chain them off of the irq_stack pointer
+	 * here.
+	 */
+}
+
 extern unsigned long cpu_initialized;
 
 static void __init do_boot_cpu (int apicid) 
@@ -783,6 +808,8 @@
 	if (IS_ERR(idle))
 		panic("failed fork for CPU %d", cpu);
 
+	setup_irq_stack(idle, cpu);
+
 	/*
 	 * We remove it from the pidhash and the runqueue
 	 * once we got the process:
@@ -800,7 +827,13 @@
 
 	/* So we see what's up   */
 	printk("Booting processor %d/%d eip %lx\n", cpu, apicid, start_eip);
-	stack_start.esp = (void *) (1024 + PAGE_SIZE + (char *)idle->thread_info);
+
+	/* The -4 is to correct for the fact that the stack pointer
+	 * is used to find the location of the thread_info structure
+	 * by masking off several of the LSBs.  Without the -4, esp
+	 * is pointing to the page after the one the stack is on.
+	 */
+	stack_start.esp = (void *)(THREAD_SIZE - 4 + (char *)idle->thread_info);
 
 	/*
 	 * This grunge runs the startup process for
diff -Nru a/include/asm-i386/page.h b/include/asm-i386/page.h
--- a/include/asm-i386/page.h	Thu Sep 19 19:38:27 2002
+++ b/include/asm-i386/page.h	Thu Sep 19 19:38:27 2002
@@ -3,7 +3,11 @@
 
 /* PAGE_SHIFT determines the page size */
 #define PAGE_SHIFT	12
+#ifndef __ASSEMBLY__
 #define PAGE_SIZE	(1UL << PAGE_SHIFT)
+#else
+#define PAGE_SIZE	(1 << PAGE_SHIFT)
+#endif
 #define PAGE_MASK	(~(PAGE_SIZE-1))
 
 #define LARGE_PAGE_MASK (~(LARGE_PAGE_SIZE-1))
diff -Nru a/include/asm-i386/thread_info.h b/include/asm-i386/thread_info.h
--- a/include/asm-i386/thread_info.h	Thu Sep 19 19:38:27 2002
+++ b/include/asm-i386/thread_info.h	Thu Sep 19 19:38:27 2002
@@ -9,6 +9,7 @@
 
 #ifdef __KERNEL__
 
+#include <asm/page.h>
 #ifndef __ASSEMBLY__
 #include <asm/processor.h>
 #endif
@@ -28,9 +29,11 @@
 	__s32			preempt_count; /* 0 => preemptable, <0 => BUG */
 
 	mm_segment_t		addr_limit;	/* thread address space:
+						   0 for interrupts: illegal
 					 	   0-0xBFFFFFFF for user-thead
 						   0-0xFFFFFFFF for kernel-thread
 						*/
+	struct thread_info	*irq_stack;	/* pointer to cpu irq stack */
 
 	__u8			supervisor_stack[0];
 };
@@ -44,6 +47,7 @@
 #define TI_CPU		0x0000000C
 #define TI_PRE_COUNT	0x00000010
 #define TI_ADDR_LIMIT	0x00000014
+#define TI_IRQ_STACK	0x00000018
 
 #endif
 
@@ -54,42 +58,42 @@
  *
  * preempt_count needs to be 1 initially, until the scheduler is functional.
  */
+#define THREAD_ORDER  0
+#define INIT_THREAD_SIZE      THREAD_SIZE
+
 #ifndef __ASSEMBLY__
-#define INIT_THREAD_INFO(tsk)			\
-{						\
-	.task		= &tsk,			\
-	.exec_domain	= &default_exec_domain,	\
-	.flags		= 0,			\
-	.cpu		= 0,			\
-	.preempt_count	= 1,			\
-	.addr_limit	= KERNEL_DS,		\
-}
 
 #define init_thread_info	(init_thread_union.thread_info)
 #define init_stack		(init_thread_union.stack)
 
+/* thread information allocation */
+#define THREAD_SIZE (PAGE_SIZE << THREAD_ORDER)
+#define alloc_thread_info() ((struct thread_info *) __get_free_pages(GFP_KERNEL,THREAD_ORDER))
+#define free_thread_info(ti) free_pages((unsigned long) (ti), THREAD_ORDER)
+#define get_thread_info(ti) get_task_struct((ti)->task)
+#define put_thread_info(ti) put_task_struct((ti)->task)
+ 
+	
 /* how to get the thread information struct from C */
 static inline struct thread_info *current_thread_info(void)
 {
 	struct thread_info *ti;
-	__asm__("andl %%esp,%0; ":"=r" (ti) : "0" (~8191UL));
+	__asm__("andl %%esp,%0; ":"=r" (ti) : "0" (~(THREAD_SIZE - 1)));
 	return ti;
 }
 
-/* thread information allocation */
-#define THREAD_SIZE (2*PAGE_SIZE)
-#define alloc_thread_info() ((struct thread_info *) __get_free_pages(GFP_KERNEL,1))
-#define free_thread_info(ti) free_pages((unsigned long) (ti), 1)
-#define get_thread_info(ti) get_task_struct((ti)->task)
-#define put_thread_info(ti) put_task_struct((ti)->task)
-
 #else /* !__ASSEMBLY__ */
 
+#define THREAD_SIZE (PAGE_SIZE << THREAD_ORDER)
+
 /* how to get the thread information struct from ASM */
 #define GET_THREAD_INFO(reg) \
-	movl $-8192, reg; \
+	movl $-THREAD_SIZE, reg; \
 	andl %esp, reg
-
+/* use this one if reg already contains %esp */
+#define GET_THREAD_INFO_WITH_ESP(reg) \
+	andl $-THREAD_SIZE, reg
+ 
 #endif
 
 /*

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20  2:15   ` Benjamin LaHaise
  2002-09-20  2:40     ` Dave Hansen
@ 2002-09-20  2:47     ` William Lee Irwin III
  1 sibling, 0 replies; 114+ messages in thread
From: William Lee Irwin III @ 2002-09-20  2:47 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Rik van Riel, Ulrich Drepper, linux-kernel

On Thu, Sep 19, 2002 at 11:01:33PM -0300, Rik van Riel wrote:
>> So, where did you put those 800 MB of kernel stacks needed for
>> 100,000 threads ?

On Thu, Sep 19, 2002 at 10:15:46PM -0400, Benjamin LaHaise wrote:
> That's what the 4KB stack patch is for. ;-)
> 		-ben

The task_struct isn't particularly slim either. I heard rumblings
that way back, when it was shared with the stack, the NR_CPUS arrays
filled (and overflowed) the entire stack... It's also enough to
put a wee bit of pressure on ZONE_NORMAL even on smaller task-count
workloads.

Perhaps something like this is in order? vs. 2.5.33:

 fs/proc/array.c       |   22 ----------------------
 fs/proc/base.c        |   11 +----------
 include/linux/sched.h |    1 -
 kernel/fork.c         |   11 +----------
 kernel/timer.c        |    3 ---
 5 files changed, 2 insertions(+), 46 deletions(-)



Cheers,
Bill


===== fs/proc/array.c 1.24 vs edited =====
--- 1.24/fs/proc/array.c	Thu Jul  4 22:54:38 2002
+++ edited/fs/proc/array.c	Tue Jul 16 00:35:26 2002
@@ -592,25 +592,3 @@
 out:
 	return retval;
 }
-
-#ifdef CONFIG_SMP
-int proc_pid_cpu(struct task_struct *task, char * buffer)
-{
-	int i, len;
-
-	len = sprintf(buffer,
-		"cpu  %lu %lu\n",
-		jiffies_to_clock_t(task->utime),
-		jiffies_to_clock_t(task->stime));
-		
-	for (i = 0 ; i < NR_CPUS; i++) {
-		if (cpu_online(i))
-		len += sprintf(buffer + len, "cpu%d %lu %lu\n",
-			i,
-			jiffies_to_clock_t(task->per_cpu_utime[i]),
-			jiffies_to_clock_t(task->per_cpu_stime[i]));
-
-	}
-	return len;
-}
-#endif
===== fs/proc/base.c 1.26 vs edited =====
--- 1.26/fs/proc/base.c	Wed May 22 08:48:14 2002
+++ edited/fs/proc/base.c	Tue Jul 16 00:36:12 2002
@@ -52,7 +52,6 @@
 	PROC_PID_STAT,
 	PROC_PID_STATM,
 	PROC_PID_MAPS,
-	PROC_PID_CPU,
 	PROC_PID_MOUNTS,
 	PROC_PID_FD_DIR = 0x8000,	/* 0x8000-0xffff */
 };
@@ -72,9 +71,6 @@
   E(PROC_PID_CMDLINE,	"cmdline",	S_IFREG|S_IRUGO),
   E(PROC_PID_STAT,	"stat",		S_IFREG|S_IRUGO),
   E(PROC_PID_STATM,	"statm",	S_IFREG|S_IRUGO),
-#ifdef CONFIG_SMP
-  E(PROC_PID_CPU,	"cpu",		S_IFREG|S_IRUGO),
-#endif
   E(PROC_PID_MAPS,	"maps",		S_IFREG|S_IRUGO),
   E(PROC_PID_MEM,	"mem",		S_IFREG|S_IRUSR|S_IWUSR),
   E(PROC_PID_CWD,	"cwd",		S_IFLNK|S_IRWXUGO),
@@ -1003,12 +999,7 @@
 		case PROC_PID_MAPS:
 			inode->i_fop = &proc_maps_operations;
 			break;
-#ifdef CONFIG_SMP
-		case PROC_PID_CPU:
-			inode->i_fop = &proc_info_file_operations;
-			ei->op.proc_read = proc_pid_cpu;
-			break;
-#endif
+
 		case PROC_PID_MEM:
 			inode->i_op = &proc_mem_inode_operations;
 			inode->i_fop = &proc_mem_operations;
===== include/linux/sched.h 1.70 vs edited =====
--- 1.70/include/linux/sched.h	Thu Jul  4 22:33:26 2002
+++ edited/include/linux/sched.h	Tue Jul 16 00:35:26 2002
@@ -325,7 +325,6 @@
 	struct timer_list real_timer;
 	unsigned long utime, stime, cutime, cstime;
 	unsigned long start_time;
-	long per_cpu_utime[NR_CPUS], per_cpu_stime[NR_CPUS];
 /* mm fault and swap info: this can arguably be seen as either mm-specific or thread-specific */
 	unsigned long min_flt, maj_flt, nswap, cmin_flt, cmaj_flt, cnswap;
 	int swappable:1;
===== kernel/fork.c 1.49 vs edited =====
--- 1.49/kernel/fork.c	Mon Jul  1 14:41:36 2002
+++ edited/kernel/fork.c	Tue Jul 16 00:35:26 2002
@@ -725,16 +725,7 @@
 	p->tty_old_pgrp = 0;
 	p->utime = p->stime = 0;
 	p->cutime = p->cstime = 0;
-#ifdef CONFIG_SMP
-	{
-		int i;
-
-		/* ?? should we just memset this ?? */
-		for(i = 0; i < NR_CPUS; i++)
-			p->per_cpu_utime[i] = p->per_cpu_stime[i] = 0;
-		spin_lock_init(&p->sigmask_lock);
-	}
-#endif
+	spin_lock_init(&p->sigmask_lock);
 	p->array = NULL;
 	p->lock_depth = -1;		/* -1 = no lock */
 	p->start_time = jiffies;
===== kernel/timer.c 1.17 vs edited =====
--- 1.17/kernel/timer.c	Mon Jul  1 14:41:36 2002
+++ edited/kernel/timer.c	Tue Jul 16 00:35:26 2002
@@ -569,8 +569,6 @@
 void update_one_process(struct task_struct *p, unsigned long user,
 			unsigned long system, int cpu)
 {
-	p->per_cpu_utime[cpu] += user;
-	p->per_cpu_stime[cpu] += system;
 	do_process_times(p, user, system);
 	do_it_virt(p, user);
 	do_it_prof(p);


* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20  2:24     ` Rik van Riel
  2002-09-20  2:32       ` Ulrich Drepper
@ 2002-09-20  6:01       ` Linus Torvalds
  2002-09-20  8:02         ` Ingo Molnar
  1 sibling, 1 reply; 114+ messages in thread
From: Linus Torvalds @ 2002-09-20  6:01 UTC (permalink / raw)
  To: linux-kernel

In article <Pine.LNX.4.44L.0209192323530.1857-100000@imladris.surriel.com>,
Rik van Riel  <riel@conectiva.com.br> wrote:
>On Thu, 19 Sep 2002, Larry McVoy wrote:
>> On Thu, Sep 19, 2002 at 11:01:33PM -0300, Rik van Riel wrote:
>
>> > So, where did you put those 800 MB of kernel stacks needed for
>> > 100,000 threads ?
>>
>> Come on, you and I normally agree, but 100,000 threads?  Where is the
>> need for that?
>
>I agree, it's pretty silly. But still, I was curious how they
>managed to achieve it ;)

You didn't read the post carefully.

They started and waited for 100,000 threads.

They did not have them all running at the same time. I think the
original post said something like "up to 50 at a time".

Basically, the benchmark was how _fast_ thread creation is, not how many
you can run at the same time. 100k threads at once is crazy, but you can
do it now on 64-bit architectures if you really want to.

		Linus


* Re: 100,000 threads? [was: [ANNOUNCE] Native POSIX Thread Library 0.1]
  2002-09-20  2:01 ` Rik van Riel
                     ` (2 preceding siblings ...)
  2002-09-20  2:23   ` Anton Blanchard
@ 2002-09-20  7:52   ` Ingo Molnar
  2002-09-20 15:47     ` Bill Davidsen
  3 siblings, 1 reply; 114+ messages in thread
From: Ingo Molnar @ 2002-09-20  7:52 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Ulrich Drepper, linux-kernel


On Thu, 19 Sep 2002, Rik van Riel wrote:

> So, where did you put those 800 MB of kernel stacks needed for 100,000
> threads ?

With the default split and kernel stack we can start up 94,000 threads on
x86. With Ben's/Dave's patch we can have up to 188,000 threads. With a 2:2
GB VM split configured we can start 376,000 threads. If someone's that
desperate then with a 1:3 split we can start up 564,000 threads.

Anton tested 1 million concurrent threads on one of his bigger PowerPC
boxes, which started up in around 30 seconds. I think he saw a load
average of around 200 thousand. [ie. the runqueue was probably a few
hundred thousand entries long at times.]

> If you used the standard 3:1 user/kernel split you'd be using all of
> ZONE_NORMAL for kernel stacks, but if you use a 2:2 split you'll end up
> with a lot less user space (bad if you want to have many threads in the
> same address space).

the extreme high-end of threading typically uses very controlled
applications and very small user level stacks.

as to the question of why so many threads, the answer is because we can :)
This, besides demonstrating some of the recent scalability advances, gives
us the warm fuzzy feeling that things are right in this area. I mean,
there are architectures where Linux could map a petabyte of RAM just fine,
even though that might not be something we desperately need today.

	Ingo



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20  6:01       ` Linus Torvalds
@ 2002-09-20  8:02         ` Ingo Molnar
  0 siblings, 0 replies; 114+ messages in thread
From: Ingo Molnar @ 2002-09-20  8:02 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel


On Fri, 20 Sep 2002, Linus Torvalds wrote:

> They did not have them all running at the same time. I think the
> original post said something like "up to 50 at a time".

actually, that was Ulrich's other test, which tests the serial starting of 
100,000 threads.

the test i did started up 100,000 concurrent threads which shot up the
load-average to a couple of thousands. [the default timeslice the parent
has is enough to start more than 50,000 parallel threads a pop or so.]

> Basically, the benchmark was how _fast_ thread creation is, not how many
> you can run at the same time. 100k threads at once is crazy, but you can
> do it now on 64-bit architectures if you really want to.

we did both, and on the dual-P4 testbox i have started and stopped 100,000
*parallel* threads in less than 2 seconds. Ie. starting up 100,000 threads
without any throttling, waiting for all of them to start up, then killing
them all. It needs roughly 1 GB of RAM to do this test on the default x86
kernel; it needs roughly 500 MB of RAM to do this test with the IRQ-stacks
patch applied.

with 2.5.31 this test would have taken roughly 15 minutes, on the same
box, provided the NMI watchdog is turned off.

with 100,000 threads started up and idling silently the system is
completely usable - all the critical for_each_task loops have been fixed.  
Obviously with 100,000 threads running at once there's some shortage in
CPU power :-) [ I will perhaps try that once, at SCHED_BATCH priority,
just for kicks. Not that it makes much sense - they will get a 3 seconds
worth of timeslice every 3 days. ]

	Ingo



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20  0:41 [ANNOUNCE] Native POSIX Thread Library 0.1 Ulrich Drepper
                   ` (2 preceding siblings ...)
  2002-09-20  2:01 ` Rik van Riel
@ 2002-09-20  9:53 ` Padraig Brady
  2002-09-20 13:28   ` Robert Love
  2002-09-20  9:54 ` Adrian Bunk
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 114+ messages in thread
From: Padraig Brady @ 2002-09-20  9:53 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: linux-kernel

Ulrich Drepper wrote:
> We are pleased to announce the first publically available source
> release of a new POSIX thread library for Linux
[snip]
> called Native POSIX Thread Library, NPTL.

Great! Where does this leave NGPT though? I had assumed that
this was going to be the next pthread implementation in glibc.

also:

-------- Original Message --------
Subject: glibc threading performance
Date: Mon, 16 Sep 2002 10:42:42 +0100
From: Padraig Brady <padraig.brady@corvil.com>
To: Ingo Molnar <mingo@redhat.com>, Ulrich Drepper <drepper@redhat.com>

Hey guys,

I noticed you're looking at threading stuff lately,
and was wondering about this thread:
http://sources.redhat.com/ml/bug-glibc/2001-12/msg00048.html

In summary, wouldn't it be better to have a per-process
flag that is only set when pthread_create() is called?
If the flag is not set, then you don't need to do locking.
This locking seems to have huge overhead. For example, I
patched uniq in textutils to use getc_unlocked() rather
than getc() and got a 300% performance increase!

cheers,
Pádraig.



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20  0:41 [ANNOUNCE] Native POSIX Thread Library 0.1 Ulrich Drepper
                   ` (3 preceding siblings ...)
  2002-09-20  9:53 ` [ANNOUNCE] Native POSIX Thread Library 0.1 Padraig Brady
@ 2002-09-20  9:54 ` Adrian Bunk
  2002-09-20 10:53   ` Ingo Molnar
  2002-09-20 19:04   ` Ulrich Drepper
  2002-09-20 10:20 ` Bill Huey
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 114+ messages in thread
From: Adrian Bunk @ 2002-09-20  9:54 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: linux-kernel

On Thu, 19 Sep 2002, Ulrich Drepper wrote:

>...
> Unless major flaws in the design are found this code is intended to
> become the standard POSIX thread library on Linux system and it will
> be included in the GNU C library distribution.
>...
> - - requires a kernel with the threading capabilities of Linux 2.5.36.
>...


My personal estimation is that Debian will support kernel 2.4 in its
stable distribution until 2006 or 2007 (this is based on the experience
that Debian usually supports two stable kernel series and the time between
stable releases of Debian is > 1 year). What is the proposed way for
distributions to deal with this?


cu
Adrian

-- 

You only think this is a free country. Like the US the UK spends a lot of
time explaining its a free country because its a police state.
								Alan Cox




* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20  0:41 [ANNOUNCE] Native POSIX Thread Library 0.1 Ulrich Drepper
                   ` (4 preceding siblings ...)
  2002-09-20  9:54 ` Adrian Bunk
@ 2002-09-20 10:20 ` Bill Huey
  2002-09-20 10:47   ` Ingo Molnar
  2002-09-20 10:35 ` Luca Barbieri
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 114+ messages in thread
From: Bill Huey @ 2002-09-20 10:20 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: linux-kernel, Bill Huey (Hui)

On Thu, Sep 19, 2002 at 05:41:37PM -0700, Ulrich Drepper wrote:
>   It is not generally accepted that a 1-on-1 model is superior but our
>   tests showed the viability of this approach and by comparing it with
>   the overhead added by existing M-on-N implementations we became
>   convinced that 1-on-1 is the right approach.

Maybe not but...

You might like to try a context switching/thread wakeup performance
measurement against FreeBSD's libc_r. I'd imagine that it's difficult
to beat a system like that since they keep all of that stuff in
userspace since it's just 2 context switches and a call to their
thread-kernel.

I'm curious as to the rough numbers you got doing the 1:1 and M:N
comparison.

bill



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20  0:41 [ANNOUNCE] Native POSIX Thread Library 0.1 Ulrich Drepper
                   ` (5 preceding siblings ...)
  2002-09-20 10:20 ` Bill Huey
@ 2002-09-20 10:35 ` Luca Barbieri
  2002-09-20 11:19   ` Ingo Molnar
  2002-09-20 12:37 ` jlnance
  2002-09-20 15:43 ` Bill Davidsen
  8 siblings, 1 reply; 114+ messages in thread
From: Luca Barbieri @ 2002-09-20 10:35 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Linux-Kernel ML

Great, but how about using code similar to the following rather than
hand-coded asm operations?

extern struct pthread __pt_current_struct asm("%gs:0");
#define __pt_current (&__pt_current_struct)

#define THREAD_GETMEM(descr, member) (__pt_current->member)
#define THREAD_SETMEM(descr, member, value) ((__pt_current->member) =
value)
#define THREAD_MASKMEM(descr, member, mask) ((__pt_current->member) &=
mask)
...

Of course, it doesn't work if you try to take the address of a member.




* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20 10:20 ` Bill Huey
@ 2002-09-20 10:47   ` Ingo Molnar
  2002-09-20 12:06     ` Bill Huey
  0 siblings, 1 reply; 114+ messages in thread
From: Ingo Molnar @ 2002-09-20 10:47 UTC (permalink / raw)
  To: Bill Huey; +Cc: Ulrich Drepper, linux-kernel


On Fri, 20 Sep 2002, Bill Huey wrote:

> You might like to try a context switching/thread wakeup performance
> measurement against FreeBSD's libc_r. I'd imagine that it's difficult to
> beat a system like that since they keep all of that stuff in userspace
> since it's just 2 context switches and a call to their thread-kernel.

our kernel thread context switch latency is below 1 usec on a typical P4
box, so our NPT library should compare pretty favorably even in such
benchmarks. We get from the pthread_create() call to the first user
instruction of the specified thread-function code in less than 2 usecs,
and we get from pthread_exit() to the thread that does the pthread_join()
in less than 2 usecs as well - all of these operations are done via a
single system-call and a single context switch.

also consider the fact that the true cost of M:N threading does not show
up with just one or two threads running. The true cost comes when
thousands of threads are running, each of them doing nontrivial work that
matters, ie. IO. The true cost of M:N shows up when threading is actually
used for what it's intended to be used :-) And basically nothing offloads
work to threads for them to just do userspace synchronization - real,
useful work always involves some sort of IO and kernel calls. At which
point M:N loses out badly.

M:N's big mistake is that it concentrates on what matters the least:
user<->user context switches. Nothing really wants to do that. And if it
does, it's contended on some userspace locking object, at which point it
doesn't really matter whether the cost of switching is 1 usec or 0.5 usecs;
the main application cost is the lost parallelism and increased cache
thrashing due to the serialization - independently of what kind of
threading abstraction is used.

and since our NPT library uses futexes for *all* userspace synchronization
primitives (including internal glibc locks), all uncontended
synchronization is done purely in user-space. [and for the contended case
we *want* to switch into the kernel.]

	Ingo



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20  9:54 ` Adrian Bunk
@ 2002-09-20 10:53   ` Ingo Molnar
  2002-09-20 19:04   ` Ulrich Drepper
  1 sibling, 0 replies; 114+ messages in thread
From: Ingo Molnar @ 2002-09-20 10:53 UTC (permalink / raw)
  To: Adrian Bunk; +Cc: Ulrich Drepper, linux-kernel


On Fri, 20 Sep 2002, Adrian Bunk wrote:

> > Unless major flaws in the design are found this code is intended to
> > become the standard POSIX thread library on Linux system and it will
> > be included in the GNU C library distribution.
> >...
> > - - requires a kernel with the threading capabilities of Linux 2.5.36.
> >...
> 
> My personal estimation is that Debian will support kernel 2.4 in it's
> stable distribution until 2006 or 2007 (this is based on the experience
> that Debian usually supports two stable kernel series and the time
> between stable releases of Debian is > 1 year). What is the proposed way
> for distributions to deal with this?

Ulrich will give a fuller reply i guess, but the new threading code in 2.5
does not disable (or in any way obsolete) the old glibc threading library.
So by doing boot-time kernel version checks glibc can decide whether it
wants to provide the new library or the old library.

	Ingo



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20 10:35 ` Luca Barbieri
@ 2002-09-20 11:19   ` Ingo Molnar
  2002-09-20 18:40     ` Roland McGrath
  0 siblings, 1 reply; 114+ messages in thread
From: Ingo Molnar @ 2002-09-20 11:19 UTC (permalink / raw)
  To: Luca Barbieri; +Cc: Ulrich Drepper, Linux-Kernel ML, NPT library mailing list


On 20 Sep 2002, Luca Barbieri wrote:

> Great, but how about using code similar to the following rather than
> hand-coded asm operations?
> 
> extern struct pthread __pt_current_struct asm("%gs:0");
> #define __pt_current (&__pt_current_struct)
> 
> #define THREAD_GETMEM(descr, member) (__pt_current->member)
> #define THREAD_SETMEM(descr, member, value) ((__pt_current->member) =
> value)
> #define THREAD_MASKMEM(descr, member, mask) ((__pt_current->member) &=
> mask)
> ...

it's a good idea i think. Ulrich has an obsession with writing code in
assembly though :-)

	Ingo




* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20 10:47   ` Ingo Molnar
@ 2002-09-20 12:06     ` Bill Huey
  2002-09-20 16:20       ` Ingo Molnar
  0 siblings, 1 reply; 114+ messages in thread
From: Bill Huey @ 2002-09-20 12:06 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Ulrich Drepper, linux-kernel

On Fri, Sep 20, 2002 at 12:47:12PM +0200, Ingo Molnar wrote:
> our kernel thread context switch latency is below 1 usec on a typical P4
> box, so our NPT library should compare pretty favorably even in such
> benchmarks. We get from the pthread_create() call to the first user
> instruction of the specified thread-function code in less than 2 usecs,
> and we get from pthread_exit() to the thread that does the pthread_join()
> in less than 2 usecs as well - all of these operations are done via a
> single system-call and a single context switch.

That's outstanding...

> also consider the fact that the true cost of M:N threading does not show
> up with just one or two threads running. The true cost comes when
> thousands of threads are running, each of them doing nontrivial work that
> matters, ie. IO. The true cost of M:N shows up when threading is actually
> used for what it's intended to be used :-) And basically nothing offloads
> work to threads for them to just do userspace synchronization - real,
> useful work always involves some sort of IO and kernel calls. At which
> point M:N loses out badly.

It can. Certainly, if IO upcall overhead is greater than just running the
thread that's blocked inside the kernel, then yes. Not sure how this is all
going to play out...
 
> M:N's big mistake is that it concentrates on what matters the least:  
> user<->user context switches. Nothing really wants to do that. And if it
> does, it's contended on some userspace locking object, at which point it
> doesn't really matter whether the cost of switching is 1 usec or 0.5 usecs,
> the main application cost is the lost parallelism and increased cache
> thrashing due to the serialization - independently of what kind of
> threading abstraction is used.

Yeah, that's not a new argument and is a solid criticism...

Hmmm, random thoughts... This is probably outside the scope of lkml,
but...

I'm trying to think up a possible problem with how the JVM does threading that
might be able to exploit this kind of situation... Hmm, there are locks on
the method dictionary, but that's not something that's generally changing a
lot of the time... I'll give it some thought.

The JVM needs a couple of pretty critical things that are a bit off from
the normal Posix threading standard. One of them is very fast thread
suspension, for both individual threads and all threads except the
currently running one...

In the Solaris threads implementation of JVM/HotSpot there are two methods
of getting a ucontext for doing GC and weird exception/signal handling via
safepoints (a JIT compiler goody), and it would be nice to have...

1) Slow Version. Throw a SIGUSR1 at a thread and read/write the ucontext on
	the signal frame itself.

2) Fast Version. The thread state and ucontext are examined directly to
	determine the validity of the stored thread context: whether it's
	blocked on a syscall (ignore it) or was doing a CPU-intensive
	operation (use it).

That ucontext is used for various things:

a) Proper GC so that registers that might contain valid references are
	taken into account properly to maintain the correctness of the
	mark/sweep algorithms.
 
b) The thread's program counter value is altered to deal with safepoints.

(2) above being the most desirable since it's a kind of fast path for
	(a) and (b).

So userspace exposure to the thread's ucontext would be a good thing.
I'm not sure how this is dealt within the current implementation of
what you folks are doing at this moment.

> primitives (including internal glibc locks), all uncontended
> synchronization is done purely in user-space. [and for the contended case
> we *want* to switch into the kernel.]

If there's anything on this planet that's going to stress a threading
system, it's going to be the JVM. I'll give what you've said some
thought. My bias has been toward FreeBSD's KSE project for the most part
over this last threading/development run.

/me thinks...

bill


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20  0:41 [ANNOUNCE] Native POSIX Thread Library 0.1 Ulrich Drepper
                   ` (6 preceding siblings ...)
  2002-09-20 10:35 ` Luca Barbieri
@ 2002-09-20 12:37 ` jlnance
  2002-09-20 16:42   ` Ingo Molnar
  2002-09-20 15:43 ` Bill Davidsen
  8 siblings, 1 reply; 114+ messages in thread
From: jlnance @ 2002-09-20 12:37 UTC (permalink / raw)
  To: linux-kernel

On Thu, Sep 19, 2002 at 05:41:37PM -0700, Ulrich Drepper wrote:

> We are pleased to announce the first publically available source
> release of a new POSIX thread library for Linux.  As part of the
> continuous effort to improve Linux's capabilities as a client, server,
> and computing platform Red Hat sponsored the development of this
> completely new implementation of a POSIX thread library, called Native
> POSIX Thread Library, NPTL.

Is this related to the thread library work that IBM was doing
or was this independently developed?

Thanks,

Jim

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20  9:53 ` [ANNOUNCE] Native POSIX Thread Library 0.1 Padraig Brady
@ 2002-09-20 13:28   ` Robert Love
  2002-09-20 16:01     ` Bill Davidsen
  0 siblings, 1 reply; 114+ messages in thread
From: Robert Love @ 2002-09-20 13:28 UTC (permalink / raw)
  To: Padraig Brady; +Cc: Ulrich Drepper, linux-kernel

On Fri, 2002-09-20 at 05:53, Padraig Brady wrote:

> Great! Where does this leave NGPT though? I had assumed that
> this was going to be the next pthread implementation in glibc.

This was never the intention of the glibc people.

	Robert Love


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20  0:41 [ANNOUNCE] Native POSIX Thread Library 0.1 Ulrich Drepper
                   ` (7 preceding siblings ...)
  2002-09-20 12:37 ` jlnance
@ 2002-09-20 15:43 ` Bill Davidsen
  2002-09-20 16:15   ` Jakub Jelinek
  8 siblings, 1 reply; 114+ messages in thread
From: Bill Davidsen @ 2002-09-20 15:43 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: linux-kernel

On Thu, 19 Sep 2002, Ulrich Drepper wrote:

> We are pleased to announce the first publically available source
> release of a new POSIX thread library for Linux.  As part of the
> continuous effort to improve Linux's capabilities as a client, server,
> and computing platform Red Hat sponsored the development of this
> completely new implementation of a POSIX thread library, called Native
> POSIX Thread Library, NPTL.
> 
> Unless major flaws in the design are found this code is intended to
> become the standard POSIX thread library on Linux system and it will
> be included in the GNU C library distribution.

If the comment that this doesn't work with the stable kernel is correct, I
consider that a pretty major flaw. Unlike the kernel and NGPT which are
developed using an open source model with lots of eyes on the WIP, this
was done and then released whole with the decision to include it in the
standard library already made. Having any part of glibc not work with the
current stable kernel doesn't seem like such a hot idea, honestly.
 
> The work visible here is the result of close collaboration of kernel
> and runtime developers.  The collaboration proceeded by developing the
> kernel changes while writing the appropriate parts of the thread
> library.  Whenever something couldn't be implemented optimally some
> interface was changed to eliminate the issue.  The result is this
> thread library which is, unlike previous attempts, a very thin layer
> on top of the kernel.  This helps to achieve a maximum of performance
> for a minimal price.

>    Initial confirmations were test runs with huge numbers of threads.
>    Even on IA-32 with its limited address space and memory handling
>    running 100,000 concurrent threads was no problem at all, creating
>    and destroying the threads did not take more than two seconds.  This
>    all was made possible by the kernel work performed as part of this
>    project.

Is there a performance comparison with current pthreads and NGPT at more
typical levels of 5-10k threads, as seen on news/mail/dns/web servers?
Eliminating overhead is good, but in most cases there just isn't all that
much overhead in NGPT. I haven't measured Linux threads, but there are a
lot of bad urban legends about them ;-)

> Building glibc with the new thread library is demanding on the
> compilation environment.
> 
> - - The 2.5.36 kernel or above must be installed and used.  To compile
>    glibc it is necessary to create the symbolic link
> 
>       /lib/modules/$(uname -r)/build
> 
>    to point to the build directory.
> 
> - - The general compiler requirement for glibc is at least gcc 3.2.  For
>    the new thread code it is even necessary to have working support for
>    the __thread keyword.
> 
>    Similarly, binutils with functioning TLS support are needed.
> 
>    The (Null) beta release of the upcoming Red Hat Linux product is
>    known to have the necessary tools available after updating from the
>    latest binaries on the FTP site.  This is no ploy to force everybody
>    to use Red Hat Linux, it's just the only environment known to date
>    which works.

Of course not, it's coincidence that only Redhat has these things readily
available, perhaps because this was developed where no other vendor knew
it existed and could have support ready for it.

Modulo my comments on not putting things in libraries which don't widely
work, this sounds as if it is less complex and hopefully more stable (not
that NGPT isn't) and lower maintenance. I'd love to see comparisons of the
three libraries under some typical load; how about a Redhat DNS server
running threaded bind? Run a day with each library and look at response
time, load average, and of course stability.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: 100,000 threads? [was: [ANNOUNCE] Native POSIX Thread Library 0.1]
  2002-09-20  7:52   ` 100,000 threads? [was: [ANNOUNCE] Native POSIX Thread Library 0.1] Ingo Molnar
@ 2002-09-20 15:47     ` Bill Davidsen
  0 siblings, 0 replies; 114+ messages in thread
From: Bill Davidsen @ 2002-09-20 15:47 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

On Fri, 20 Sep 2002, Ingo Molnar wrote:


> the extreme high-end of threading typically uses very controlled
> applications and very small user level stacks.
> 
> as to the question of why so many threads, the answer is because we can :)
> This, besides demonstrating some of the recent scalability advances, gives
> us the warm fuzzy feeling that things are right in this area. I mean,
> there are architectures where Linux could map a petabyte of RAM just fine,
> even though that might not be something we desperately need today.

I think testing at these high numbers is a good proof of scalability,
although response and stability are also important. Before I went to NGPT
I had a fair bit of trouble with learning experiences after threads got
beyond 200 or so.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20 13:28   ` Robert Love
@ 2002-09-20 16:01     ` Bill Davidsen
  0 siblings, 0 replies; 114+ messages in thread
From: Bill Davidsen @ 2002-09-20 16:01 UTC (permalink / raw)
  To: Robert Love; +Cc: Padraig Brady, Ulrich Drepper, linux-kernel

On 20 Sep 2002, Robert Love wrote:

> On Fri, 2002-09-20 at 05:53, Padraig Brady wrote:
> 
> > Great! Where does this leave NGPT though? I had assumed that
> > this was going to be the next pthread implementation in glibc.
> 
> This was never the intention of the glibc people.

Was there some shortcoming in NGPT? Clearly someone provided a good bit of
funding for this, so there must have been motivation beyond NIH or funding
someone's honors thesis.

I also expected NGPT to be the next step, not a library which requires a
kernel which is unlikely to be stable for 18 months.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20 15:43 ` Bill Davidsen
@ 2002-09-20 16:15   ` Jakub Jelinek
  2002-09-20 17:16     ` Bill Davidsen
  0 siblings, 1 reply; 114+ messages in thread
From: Jakub Jelinek @ 2002-09-20 16:15 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Ulrich Drepper, linux-kernel

On Fri, Sep 20, 2002 at 11:43:15AM -0400, Bill Davidsen wrote:
> > Unless major flaws in the design are found this code is intended to
> > become the standard POSIX thread library on Linux system and it will
> > be included in the GNU C library distribution.
> 
> If the comment that this doesn't work with the stable kernel is correct, I
> consider that a pretty major flaw. Unlike the kernel and NGPT which are
> developed using an open source model with lots of eyes on the WIP, this
> was done and then released whole with the decision to include it in the
> standard library already made. Having any part of glibc not work with the
> current stable kernel doesn't seem like such a hot idea, honestly.

glibc supports .note.ABI-tag notes for libraries, so there is no problem
with having NPTL libpthread.so.0 --enable-kernel=2.5.36 in say
/lib/i686/libpthread.so.0 and linuxthreads --enable-kernel=2.2.1 in
/lib/libpthread.so.0. The dynamic linker will then choose based
on currently running kernel.
(well, ATM because of libc tsd DL_ERROR, --without-tls ld.so cannot be used
with --with-tls libs and vice versa, but that is being worked on).

That's similar to non-FLOATING_STACK and FLOATING_STACK linuxthreads,
the latter can be used with 2.4.8+ or something kernels on IA-32.

> > - - The general compiler requirement for glibc is at least gcc 3.2.  For
> >    the new thread code it is even necessary to have working support for
> >    the __thread keyword.
> > 
> >    Similarly, binutils with functioning TLS support are needed.
> > 
> >    The (Null) beta release of the upcoming Red Hat Linux product is
> >    known to have the necessary tools available after updating from the
> >    latest binaries on the FTP site.  This is no ploy to force everybody
> >    to use Red Hat Linux, it's just the only environment known to date
> >    which works.
> 
> Of course not, it's coincidence that only Redhat has these things readily
> available, perhaps because this was developed where no other vendor knew
> it existed and could have support ready for it.

Because all of glibc/gcc/binutils TLS support was developed together (and
still is)? All the changes are publicly available, mostly in the
corresponding CVS archives.

	Jakub

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20 12:06     ` Bill Huey
@ 2002-09-20 16:20       ` Ingo Molnar
  2002-09-20 21:50         ` Bill Huey
  0 siblings, 1 reply; 114+ messages in thread
From: Ingo Molnar @ 2002-09-20 16:20 UTC (permalink / raw)
  To: Bill Huey; +Cc: Ulrich Drepper, linux-kernel


On Fri, 20 Sep 2002, Bill Huey wrote:

> The JVM needs a couple of pretty critical things that are a bit off from
> the normal Posix threading standard. One of them is very fast thread
> suspension for both individual threads and the all threads accept the
> currently running one...

the user contexts for active but preempted threads are stored in the
kernel stack. To support GC safepoints we need fast access to the current
state of every not voluntarily preempted thread. This is admittedly easier
if threads are abstracted in user-space [in which case the context is
stored in user-space], but the question is, what is more important, an
occasional pass of garbage collection, or the cost of doing IO?

until then it can be done via sending SIGSTOP/SIGCONT to the process PID
from the garbage collection thread, which should stop all threads pretty
efficiently in 2.5.35+ kernels. Then all threads that are not voluntarily
sleeping can be fixed up via ptrace calls.

and it can be further improved by tracking preempted user contexts in the
scheduler and giving fast access to them via a syscall. (all voluntarily
sleeping contexts can properly prepare their suspension state in
userspace.) So it's possible to do it efficiently.

how frequently does the GC thread run?

	Ingo


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20 12:37 ` jlnance
@ 2002-09-20 16:42   ` Ingo Molnar
  2002-09-24  0:40     ` Rusty Russell
  0 siblings, 1 reply; 114+ messages in thread
From: Ingo Molnar @ 2002-09-20 16:42 UTC (permalink / raw)
  To: jlnance; +Cc: linux-kernel


On Fri, 20 Sep 2002 jlnance@intrex.net wrote:

> > We are pleased to announce the first publically available source
> > release of a new POSIX thread library for Linux.  As part of the
> > continuous effort to improve Linux's capabilities as a client, server,
> > and computing platform Red Hat sponsored the development of this
> > completely new implementation of a POSIX thread library, called Native
> > POSIX Thread Library, NPTL.
> 
> Is this related to the thread library work that IBM was doing or was
> this independently developed?

independently developed.

	Ingo


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20 16:15   ` Jakub Jelinek
@ 2002-09-20 17:16     ` Bill Davidsen
  0 siblings, 0 replies; 114+ messages in thread
From: Bill Davidsen @ 2002-09-20 17:16 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Ulrich Drepper, linux-kernel

On Fri, 20 Sep 2002, Jakub Jelinek wrote:

> On Fri, Sep 20, 2002 at 11:43:15AM -0400, Bill Davidsen wrote:
> > > Unless major flaws in the design are found this code is intended to
> > > become the standard POSIX thread library on Linux system and it will
> > > be included in the GNU C library distribution.
> > 
> > If the comment that this doesn't work with the stable kernel is correct, I
> > consider that a pretty major flaw. Unlike the kernel and NGPT which are
> > developed using an open source model with lots of eyes on the WIP, this
> > was done and then released whole with the decision to include it in the
> > standard library already made. Having any part of glibc not work with the
> > current stable kernel doesn't seem like such a hot idea, honestly.
> 
> glibc supports .note.ABI-tag notes for libraries, so there is no problem
> with having NPTL libpthread.so.0 --enable-kernel=2.5.36 in say
> /lib/i686/libpthread.so.0 and linuxthreads --enable-kernel=2.2.1 in
> /lib/libpthread.so.0. The dynamic linker will then choose based
> on currently running kernel.
> (well, ATM because of libc tsd DL_ERROR --without-tls ld.so cannot be used
> with --with-tls libs and vice versa, but that is beeing worked on).
> 
> That's similar to non-FLOATING_STACK and FLOATING_STACK linuxthreads,
> the latter can be used with 2.4.8+ or something kernels on IA-32.

Good point, I had forgotten that! It will be somewhat large having both,
but presumably someone who really worried about it would build a subset.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20 11:19   ` Ingo Molnar
@ 2002-09-20 18:40     ` Roland McGrath
  2002-09-20 21:21       ` Luca Barbieri
  0 siblings, 1 reply; 114+ messages in thread
From: Roland McGrath @ 2002-09-20 18:40 UTC (permalink / raw)
  To: phil-list; +Cc: Luca Barbieri, Ulrich Drepper, Linux-Kernel ML

> On 20 Sep 2002, Luca Barbieri wrote:
> 
> > Great, but how about using code similar to the following rather than
> > hand-coded asm operations?
> > 
> > extern struct pthread __pt_current_struct asm("%gs:0");
> > #define __pt_current (&__pt_current_struct)

Try that under -fpic and you will see the problem.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20  9:54 ` Adrian Bunk
  2002-09-20 10:53   ` Ingo Molnar
@ 2002-09-20 19:04   ` Ulrich Drepper
  2002-09-20 23:06     ` J.A. Magallon
  1 sibling, 1 reply; 114+ messages in thread
From: Ulrich Drepper @ 2002-09-20 19:04 UTC (permalink / raw)
  To: Adrian Bunk; +Cc: linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Adrian Bunk wrote:

> My personal estimation is that Debian will support kernel 2.4 in it's
> stable distribution until 2006 or 2007 (this is based on the experience
> that Debian usually supports two stable kernel series and the time between
> stable releases of Debian is > 1 year). What is the proposed way for
> distributions to deal with this?

Two ways:

- - continue to use the old code

- - backport the required functionality


Note that not all the changes Ingo made have to be ported back to 2.4. 
Only those required for correct execution, not the optimizations.

Whether Marcelo is interested in this I cannot say; I doubt it, though.
But this does not mean you cannot have such a kernel in Debian.

- -- 
- ---------------.                          ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Red Hat          `--' drepper at redhat.com   `------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQE9i3Fb2ijCOnn/RHQRAlC+AJ9kXWMdkfuORtodijTXQ+Hnah0ZYQCfZkOT
Axzw/z1VEFVXIQdZ4d8PLe4=
=ptvg
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20 18:40     ` Roland McGrath
@ 2002-09-20 21:21       ` Luca Barbieri
  0 siblings, 0 replies; 114+ messages in thread
From: Luca Barbieri @ 2002-09-20 21:21 UTC (permalink / raw)
  To: Roland McGrath; +Cc: phil-list, Ulrich Drepper, Linux-Kernel ML

[-- Attachment #1: Type: text/plain, Size: 180 bytes --]

> Try that under -fpic and you will see the problem.
Unfortunately it tries to get it using the GOT and I can't find any
practical workaround, so ignore my broken suggestion.


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20 16:20       ` Ingo Molnar
@ 2002-09-20 21:50         ` Bill Huey
  2002-09-20 22:30           ` dean gaudet
                             ` (3 more replies)
  0 siblings, 4 replies; 114+ messages in thread
From: Bill Huey @ 2002-09-20 21:50 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Ulrich Drepper, linux-kernel

On Fri, Sep 20, 2002 at 06:20:10PM +0200, Ingo Molnar wrote:
> the user contexts for active but preempted threads are stored in the
> kernel stack. To support GC safepoints we need fast access to the current
> state of every not voluntarily preempted thread. This is admittedly easier
> if threads are abstracted in user-space [in which case the context is
> stored in user-space], but the question is, what is more important, an
> occasional pass of garbage collection, or the cost of doing IO?

The GC is generational and incremental, so it must deal with a lot of short
term objects that need to be collected. GC performance is a sore point in the
JVM (and language runtimes in general), and slow access here is potentially
crippling for high-load machines. The HotSpot/JVM is a bit overzealous with
safepoints which opens this area to optimization, but the problem exists now.

> until then it can be done via sending SIGSTOP/SIGCONT to the process PID
> from the garbage collection thread, which should stop all threads pretty
> efficiently in 2.5.35+ kernels. Then all threads that are not voluntarily
> sleeping can be fixed up via ptrace calls.

It's better to have an explicit pthread_suspend_[thread,all]() function
since this kind of thing is becoming more and more common in
thread-heavy language runtimes. The Posix thread spec was built without
regard to this and it's definitely become an important issue these days.

> and it can be further improved by tracking preempted user contexts in the
> scheduler and giving fast access to them via a syscall. (all voluntarily
> sleeping contexts can properly prepare their suspension state in
> userspace.) So it's possible to do it efficiently.
> 
> how frequently does the GC thread run?

Don't remember off hand, but it's likely to be several times a second, which
is often enough to be a problem, especially on large systems with high load.

The JVM with incremental GC is being targeted for media-oriented tasks
using the new NIO, 3d library, etc... slowness in safepoints would cripple it
for these tasks. It's a critical item and not easily addressed by the current
1:1 model.

bill


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20 21:50         ` Bill Huey
@ 2002-09-20 22:30           ` dean gaudet
  2002-09-20 23:11             ` Bill Huey
  2002-09-20 23:45           ` Bill Huey
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 114+ messages in thread
From: dean gaudet @ 2002-09-20 22:30 UTC (permalink / raw)
  To: Bill Huey; +Cc: Ingo Molnar, Ulrich Drepper, linux-kernel



On Fri, 20 Sep 2002, Bill Huey wrote:

> It's better to have an explicit pthread_suspend_[thread,all]() function

could this be implemented by having a gc thread in a unique process group
and then suspending the jvm process group?

-dean


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20 19:04   ` Ulrich Drepper
@ 2002-09-20 23:06     ` J.A. Magallon
  2002-09-20 23:33       ` Ulrich Drepper
  0 siblings, 1 reply; 114+ messages in thread
From: J.A. Magallon @ 2002-09-20 23:06 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: linux-kernel


On 2002.09.20 Ulrich Drepper wrote:
>-----BEGIN PGP SIGNED MESSAGE-----
>Hash: SHA1
>
>Adrian Bunk wrote:
>
>> My personal estimation is that Debian will support kernel 2.4 in it's
>> stable distribution until 2006 or 2007 (this is based on the experience
>> that Debian usually supports two stable kernel series and the time between
>> stable releases of Debian is > 1 year). What is the proposed way for
>> distributions to deal with this?
>
>Two ways:
>
>- - continue to use the old code
>
>- - backport the required functionality
>

Could you post a list of requirements? For example:
- kernel: futexes, per_cpu_areas
- toolchain: binutils version + RH-patches, gcc version
- glibc: 2.2.xxxx
etc...

Perhaps it is not so difficult, for example futexes are in -aa for 2.4,
Mandrake has gcc-3.2, etc...

Are you pushing hard for the infrastructure you need to get in standard
source trees (ie, changes to gcc, binutils...) ??

Thanks.

-- 
J.A. Magallon <jamagallon@able.es>      \                 Software is like sex:
werewolf.able.es                         \           It's better when it's free
Mandrake Linux release 9.0 (Cooker) for i586
Linux 2.4.20-pre7-jam0 (gcc 3.2 (Mandrake Linux 9.0 3.2-1mdk))

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20 22:30           ` dean gaudet
@ 2002-09-20 23:11             ` Bill Huey
  2002-09-21  3:38               ` dean gaudet
  0 siblings, 1 reply; 114+ messages in thread
From: Bill Huey @ 2002-09-20 23:11 UTC (permalink / raw)
  To: dean gaudet; +Cc: Ingo Molnar, Ulrich Drepper, linux-kernel, Bill Huey (Hui)

On Fri, Sep 20, 2002 at 03:30:19PM -0700, dean gaudet wrote:
> > It's better to have an explicit pthread_suspend_[thread,all]() function
> 
> could this be implemented by having a gc thread in a unique process group
> and then suspending the jvm process group?

Suspending how? Via signal?

Possibly, but having an explicit syscall() call is important since interrupts
are also suspended under that condition, pthread_cond_timedwait(), etc...
It really needs to be suspended in a way that's different than the SIGSOMETHING
mechanism. I was fixing bugs in libc_r, so I know the issues to a certain degree
and bad logic in those particular corner cases was screwing me up.

bill


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20 23:06     ` J.A. Magallon
@ 2002-09-20 23:33       ` Ulrich Drepper
  2002-09-20 23:42         ` J.A. Magallon
  0 siblings, 1 reply; 114+ messages in thread
From: Ulrich Drepper @ 2002-09-20 23:33 UTC (permalink / raw)
  To: J.A. Magallon; +Cc: linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

J.A. Magallon wrote:

> Could you post a list of requirements ? For example:
> - kernel: futexes, per_cpu_areas

There's a lot more.  The signal handling, the exec handling, the exit 
handling, the clone extensions.


> - toolchain: binutils version + RH-patches, gcc version

No RH patches.  Don't spread misinformation.

All the changes needed for kernel, glibc, and the tools are in the 
official sources trees.  It's just that nobody else ships those versions 
or versions with the necessary features backported.


> - glibc: 2.2.xxxx

The announcement said it clearly: you need at least the 2.3 prerelease
as of yesterday.   Again, all in the public archive.

- -- 
- ---------------.                          ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Red Hat          `--' drepper at redhat.com   `------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQE9i7BI2ijCOnn/RHQRAn0nAKCF4cZO9K2/vhNbCuawvk0ecM2SCQCeOKE9
Qg19wkleGKXFmr3plY1dbho=
=oWVM
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20 23:33       ` Ulrich Drepper
@ 2002-09-20 23:42         ` J.A. Magallon
  0 siblings, 0 replies; 114+ messages in thread
From: J.A. Magallon @ 2002-09-20 23:42 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: linux-kernel


On 2002.09.21 Ulrich Drepper wrote:
>-----BEGIN PGP SIGNED MESSAGE-----
>> - toolchain: binutils version + RH-patches, gcc version
>
>No RH patches.  Don't spread misinformation.
>

Oh, sorry, no misinformation intended. As I read previous posts
I understood that there were things developed that are still not in standard trees.
Sometimes I see in Mandrake changelogs 'patch-X, from RH', and then
'patch-X removed, merged upstream'

/by

-- 
J.A. Magallon <jamagallon@able.es>      \                 Software is like sex:
werewolf.able.es                         \           It's better when it's free
Mandrake Linux release 9.0 (Cooker) for i586
Linux 2.4.20-pre7-jam0 (gcc 3.2 (Mandrake Linux 9.0 3.2-1mdk))

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20 21:50         ` Bill Huey
  2002-09-20 22:30           ` dean gaudet
@ 2002-09-20 23:45           ` Bill Huey
  2002-09-21  4:58             ` Ingo Molnar
  2002-09-21  4:48           ` Ingo Molnar
  2002-09-22 13:38           ` Bill Davidsen
  3 siblings, 1 reply; 114+ messages in thread
From: Bill Huey @ 2002-09-20 23:45 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Ulrich Drepper, linux-kernel, Bill Huey (Hui)

On Fri, Sep 20, 2002 at 02:50:29PM -0700, Bill Huey wrote:
> > how frequently does the GC thread run?
> 
> Don't remember off hand, but it's likely to be several times a second which is
> often enough to be a problem especially on large systems with high load.
> 
> The JVM with incremental GC is being targeted for media oriented tasks
> using the new NIO, 3d library, etc... slowness in safepoints would cripple it
> for these tasks. It's a critical item and not easily addressed by the current
> 1:1 model.

Also, throwing a signal to get the ucontext is a pretty expensive way of getting
it. But you folks know this already. Solaris threading has this via some special
libraries. For a large number of actively running threads, say, executing in the
middle of a method block, it is potentially a huge problem for scalability.

Again, it's a critical issue from what I see of this.

bill


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20 23:11             ` Bill Huey
@ 2002-09-21  3:38               ` dean gaudet
  2002-09-21  4:01                 ` Bill Huey
  0 siblings, 1 reply; 114+ messages in thread
From: dean gaudet @ 2002-09-21  3:38 UTC (permalink / raw)
  To: Bill Huey; +Cc: Ingo Molnar, Ulrich Drepper, linux-kernel

On Fri, 20 Sep 2002, Bill Huey wrote:

> On Fri, Sep 20, 2002 at 03:30:19PM -0700, dean gaudet wrote:
> > > It's better to have an explict pthread_suspend_[thread,all]() function
> >
> > could this be implemented by having a gc thread in a unique process group
> > and then suspending the jvm process group?
>
> Suspending how ? via signal ?

yeah SIGSTOP to the jvm process group.

> Possibly, but having an explicit syscall() call is important since interrupts
> are also suspended under that condition, pthread_cond_timedwait(), etc...
> It really needs to be suspended in a way that's different than the SIGSOMETHING
> mechanism. I was fixing bugs in libc_r, so I know the issues to a certain degree
> and bad logic those particular corner cases was screwing me up.

SIGSTOP is different from other signals because it will stop the whole
process group from continuing.  i am completely aware of how much of a
pain it is to actually trap signals and do something (for apache 2.0's
design i outlawed the use of signals because of the pains of getting
things working in 1.3.x :).

doesn't the hotspot GC work something like this:

- stop all threads
- go read each thread's $pc, and find its nearest "safety point"
- go overwrite that safety point (YUCK SELF MODIFYING CODE!! :) with
  something which will stop the thread
- start the threads and wait for them all to get to their safety points
- perform gc
- undo the above mess

the only part of that which looks challenging with kernel threads is the
$pc reading part...  ptrace will certainly get it for you, but that's a
lot of syscall overhead.

or am i missing something?

-dean


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-21  3:38               ` dean gaudet
@ 2002-09-21  4:01                 ` Bill Huey
  2002-09-21  5:06                   ` Ingo Molnar
  0 siblings, 1 reply; 114+ messages in thread
From: Bill Huey @ 2002-09-21  4:01 UTC (permalink / raw)
  To: dean gaudet; +Cc: Ingo Molnar, Ulrich Drepper, linux-kernel

On Fri, Sep 20, 2002 at 08:38:20PM -0700, dean gaudet wrote:
> SIGSTOP is different from other signals because it will stop the whole
> process group from continuing.  i am completely aware of how much of a
> pain it is to actually trap signals and do something (for apache 2.0's
> design i outlawed the use of signals because of the pains of getting
> things working in 1.3.x :).

There's definitely a need for a pthread_suspend_something() call...

> doesn't the hotspot GC work something like this:
> 
> - stop all threads
> - go read each thread's $pc, and find its nearest "safety point"
> - go overwrite that safety point (YUCK SELF MODIFYING CODE!! :) with
>   something which will stop the thread
> - start the threads and wait for them all to get to their safety points
> - perform gc
> - undo the above mess

+ read the entire ucontext for EAX, etc... so that it can be used for GC
roots. It could be allocating something in an executing method block
that hasn't hit the stack or any kind of variable storage known to the GC.

> the only part of that which looks challenging with kernel threads is the
> $pc reading part...  ptrace will certainly get it for you, but that's a
> lot of syscall overhead.

And the entire ucontext.

> or am i missing something?

The ucontext. ;)

bill


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20 21:50         ` Bill Huey
  2002-09-20 22:30           ` dean gaudet
  2002-09-20 23:45           ` Bill Huey
@ 2002-09-21  4:48           ` Ingo Molnar
  2002-09-22  1:38             ` Bill Huey
  2002-09-22 13:38           ` Bill Davidsen
  3 siblings, 1 reply; 114+ messages in thread
From: Ingo Molnar @ 2002-09-21  4:48 UTC (permalink / raw)
  To: Bill Huey; +Cc: Ulrich Drepper, linux-kernel


On Fri, 20 Sep 2002, Bill Huey wrote:

> The JVM with incremental GC is being targetted for media oriented tasks
> using the new NIO, 3d library, etc... slowness in safepoints would
> cripple it for these tasks. It's a critical item and not easily address
> by the current 1:1 model.

actually, in the previous mail i've outlined a sensible way to help
safepoints in the kernel, for the case of the 1:1 model. I'd not call that
'not easily addressed' :-)

there's an even more advanced way to expose preempted user contexts in the
1:1 model: by putting most of the register info (which is now dumped
into the kernel stack) into a page that is also mapped into user-space.
This too introduces (constant) syscall entry/exit overhead, but can be
done if justified.

	Ingo



^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20 23:45           ` Bill Huey
@ 2002-09-21  4:58             ` Ingo Molnar
  2002-09-22  2:51               ` Bill Huey
  0 siblings, 1 reply; 114+ messages in thread
From: Ingo Molnar @ 2002-09-21  4:58 UTC (permalink / raw)
  To: Bill Huey; +Cc: Ulrich Drepper, linux-kernel


On Fri, 20 Sep 2002, Bill Huey wrote:

> Also throwing a signal to get the ucontext is pretty a expensive way of
> getting it. But you folks know this already. [...]

as i've mentioned in the previous mail, 2.5.35+ kernels have a very fast
SIGSTOP/SIGCONT implementation, a change that was done as part of this
project - a few orders of magnitude faster than throwing/catching SIGUSR1
to every single thread, for example.

so right now we first need to get some results back about how big the GC
problem is with the new SIGSTOP/SIGCONT implementation. If it's still not
fast enough then we still have a number of options.

	Ingo


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-21  4:01                 ` Bill Huey
@ 2002-09-21  5:06                   ` Ingo Molnar
  0 siblings, 0 replies; 114+ messages in thread
From: Ingo Molnar @ 2002-09-21  5:06 UTC (permalink / raw)
  To: Bill Huey; +Cc: dean gaudet, Ulrich Drepper, linux-kernel


On Fri, 20 Sep 2002, Bill Huey wrote:

> > doesn't the hotspot GC work something like this:
> > 
> > - stop all threads
> > - go read each thread's $pc, and find its nearest "safety point"
> > - go overwrite that safety point (YUCK SELF MODIFYING CODE!! :) with
> >   something which will stop the thread
> > - start the threads and wait for them all to get to their safety points
> > - perform gc
> > - undo the above mess
> 
> + read the entire ucontext for EAX, etc... so that it can be used for GC
> roots. It could be allocating something in an executing method block
> that hasn't hit stack or any kind of variable storage known to the GC.

PTRACE_GETREGS. Yeah, it's overhead; see my previous mails about various
levels of kernel features for how to do it potentially cheaper. Not like
the above process is particularly fast.

One more method to speed it up: to amortize the kernel entry overhead we
could introduce a new PTRACE_GETREGS_GROUP to get a full array of user
contexts from all group member threads, via a single system-call.

> > the only part of that which looks challenging with kernel threads is the
> > $pc reading part...  ptrace will certainly get it for you, but that's a
> > lot of syscall overhead.
> 
> And the entire ucontext.

PTRACE_GETREGS gets you the instruction pointer and all general purpose 
registers, eax and the rest.

	Ingo


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-21  4:48           ` Ingo Molnar
@ 2002-09-22  1:38             ` Bill Huey
  0 siblings, 0 replies; 114+ messages in thread
From: Bill Huey @ 2002-09-22  1:38 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Ulrich Drepper, linux-kernel, Bill Huey (Hui)

On Sat, Sep 21, 2002 at 06:48:40AM +0200, Ingo Molnar wrote:
> actually, in the previous mail i've outlined a sensible way to help
> safepoints in the kernel, for the case of the 1:1 model. I'd not call that
> 'not easily addressed' :-)
> 
> there's an even more advanced way to expose preempted user contexts in the
> 1:1 model: by putting most of the the register info (which is now dumped
> into the kernel stack) into a page that is also mapped into user-space.
> This too introduces (constant) syscall entry/exit overhead, but can be
> done if justified.

Maybe mmapping a special device into memory ? /proc/satanic_procID_666* ???

A method needs to be considered, definitely. Getting some Sun/Blackdown folks on
this thread wouldn't be bad either.

I'm not exactly sure what Solaris does in this case, so it might be worth
investigating so that this is conceptually regular across various Unix
variants, to a certain degree.

It's also essential to have the run states along with the ucontext to determine
the validity of the ucontext backing the thread. Obviously, being blocked in the
kernel on an IO request isn't going to result in anything usable for GC. And that's
because libc library calls are external symbols and don't preserve registers
across calls.

Also, permissions on the PTRACE* interface shouldn't conflict with debuggers...
Complete multithreaded debugging is important, of course. I would expect GDB to be
modified so that it can get and examine those register values and thread run
states too.

Uh, the JVM also has a habit of checksumming the register contents over a window
of time to see if a thread is actively running, through its internal debugging
facilities... It shouldn't have to lock or suspend thread execution to examine it.
Now that I think of it, the syscall overlay technique isn't going to work, since
nothing of interest to the GC is going to be present in the ucontext; so the above
suggestion of exporting the ucontext at syscall points is out.

More to come...

bill


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-21  4:58             ` Ingo Molnar
@ 2002-09-22  2:51               ` Bill Huey
  0 siblings, 0 replies; 114+ messages in thread
From: Bill Huey @ 2002-09-22  2:51 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Ulrich Drepper, linux-kernel

On Sat, Sep 21, 2002 at 06:58:15AM +0200, Ingo Molnar wrote:
> as i've mentioned in the previous mail, 2.5.35+ kernels have a very fast
> SIGSTOP/SIGCONT implementation, which change was done as part of this
> project - a few orders faster than throwing/catching SIGUSR1 to every
> single thread for example.

That's good, but having an explicit API for suspending threads is very useful,
since it can greatly simplify the already complicated signal handling in
highly threaded systems. It's something that your group should seriously
consider, since I expect some explicit thread suspension call to be implemented
in the POSIX threading standard, via their committee - mainly because of the
advent of heavily threaded language runtimes as a standard programming staple.

It's a good thing to have regardless.

> so right now we first need to get some results back about how big the GC
> problem is with the new SIGSTOP/SIGCONT implementation. If it's still not
> fast enough then we still have a number of options.

I'm running out of things to say. ;)

bill


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20 21:50         ` Bill Huey
                             ` (2 preceding siblings ...)
  2002-09-21  4:48           ` Ingo Molnar
@ 2002-09-22 13:38           ` Bill Davidsen
  2002-09-22 18:41             ` Eric W. Biederman
  2002-09-23 21:12             ` Bill Huey
  3 siblings, 2 replies; 114+ messages in thread
From: Bill Davidsen @ 2002-09-22 13:38 UTC (permalink / raw)
  To: Bill Huey; +Cc: Ingo Molnar, Ulrich Drepper, linux-kernel

On Fri, 20 Sep 2002, Bill Huey wrote:


> Don't remember off hand, but it's like to be several times a second which is
> often enough to be a problem especially on large systems with high load.
> 
> The JVM with incremental GC is being targetted for media oriented tasks
> using the new NIO, 3d library, etc... slowness in safepoints would cripple it
> for these tasks. It's a critical item and not easily address by the current
> 1:1 model.

Could you comment on how well this works (or not) with linuxthreads,
Solaris, and NGPT? I realize you probably haven't had time to look at NPTL
yet. If an N:M model is really better for your application you might be
able to just run NGPT.

Since preempted threads seem to be a problem, could a dedicated machine run
w/o preempt? I assume when you say "high load" that you would be talking about
a server, where performance is critical.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-22 13:38           ` Bill Davidsen
@ 2002-09-22 18:41             ` Eric W. Biederman
  2002-09-22 22:13               ` dean gaudet
  2002-09-23  0:11               ` Bill Huey
  2002-09-23 21:12             ` Bill Huey
  1 sibling, 2 replies; 114+ messages in thread
From: Eric W. Biederman @ 2002-09-22 18:41 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Bill Huey, Ingo Molnar, Ulrich Drepper, linux-kernel

Bill Davidsen <davidsen@tmr.com> writes:

> On Fri, 20 Sep 2002, Bill Huey wrote:
> 
> 
> > Don't remember off hand, but it's like to be several times a second which is
> > often enough to be a problem especially on large systems with high load.
> > 
> > The JVM with incremental GC is being targetted for media oriented tasks
> > using the new NIO, 3d library, etc... slowness in safepoints would cripple it
> > for these tasks. It's a critical item and not easily address by the current
> > 1:1 model.
> 
> Could you comment on how whell this works (or not) with linuxthreads,
> Solaris, and NGPT? I realize you probably haven't had time to look at NPTL
> yet. If an N:M model is really better for your application you might be
> able to just run NGPT.
> 
> Since preempt threads seem a problem, cound a dedicated machine run w/o
> preempt? I assume when you say "high load" that you would be talking a
> server, where performance is critical.

From 10,000 feet out I have one comment: if the VM has safe points, it sounds
like the problem is more that the safepoints don't provide the register
dumps than anything else.

They are talking about an incremental GC routine, so it does not need to stop
all threads simultaneously.  Threads only need to be stopped when the GC is
gathering a root set.  This is what the safe points are for, right?  And it does
not need to be 100% accurate in finding all of the garbage.  The
collector just needs to not make mistakes in the other direction.

I fail to see why:

/* This is a safe point ... */
if (needs to be suspended) {
        save_all_registers_on_the_stack()
        flag_gc_thread()
        wait_until_gc_thread_has_what_it_needs()
}

Needs kernel support.

Eric

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-22 18:41             ` Eric W. Biederman
@ 2002-09-22 22:13               ` dean gaudet
  2002-09-26 17:21                 ` Alan Cox
  2002-09-23  0:11               ` Bill Huey
  1 sibling, 1 reply; 114+ messages in thread
From: dean gaudet @ 2002-09-22 22:13 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Bill Davidsen, Bill Huey, Ingo Molnar, Ulrich Drepper, linux-kernel

On 22 Sep 2002, Eric W. Biederman wrote:

> I fail to see why:
>
> /* This is a safe point ... */
> if (needs to be suspended) {
>         save_all_registers_on_the_stack()
>         flag_gc_thread()
>         wait_until_gc_thread_has_what_it_needs()
> }
>
> Needs kernel support.

given that the existing code uses self-modifying-code for the safe-points
i'm guessing there are so many safe-points that the above if statement
would be excessive overhead (and the save/flag/wait stuff would probably
cause a huge amount of code bloat -- but could probably be a subroutine).

there was some really interesting GC work i heard about years ago where
the compiler generated GC code alongside the normal executable code.
the GC code understood the structure of the function and could make much
better choices of GC targets than a generic routine could.  when GC needs
to occur, a walk up the stack in each thread executing the
routine-specific GC stubs would be performed.  (given just the stack
frames you can index into a lookup table for the GC stubs... so there's no
overhead when GC isn't occurring.)  i don't have a reference handy though.

anyhow, this is probably getting off-topic :)

-dean





^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-22 18:41             ` Eric W. Biederman
  2002-09-22 22:13               ` dean gaudet
@ 2002-09-23  0:11               ` Bill Huey
  2002-09-24 16:07                 ` Eric W. Biederman
  1 sibling, 1 reply; 114+ messages in thread
From: Bill Huey @ 2002-09-23  0:11 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Bill Davidsen, Ingo Molnar, Ulrich Drepper, linux-kernel,
	Bill Huey (Hui)

On Sun, Sep 22, 2002 at 12:41:40PM -0600, Eric W. Biederman wrote:
> They are talking about an incremental GC routine so it does not need to stop
> all threads simultaneously.  Threads only need to be stopped when the GC is gather
> a root set.  This is what the safe points are for right?  And it does
> not need to be 100% accurate in finding all of the garbage.  The
> collector just needs to not make mistakes in the other direction.

There's a mixture of GC algorithms in HotSpot including generational and I
believe a traditional mark/sweep. GC isn't my expertise per se.

Think, you have a compiled code block and you suspend/interrupt threads when
you either start hitting the stack yellow guard or by a periodic GC thread...

That can happen anytime, so you can't just expect things to drop onto a
regular boundary in the compiled code block. It's for that reason that
you have to have some kind of OS-level threading support to get the ucontext.

bill


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-22 13:38           ` Bill Davidsen
  2002-09-22 18:41             ` Eric W. Biederman
@ 2002-09-23 21:12             ` Bill Huey
  1 sibling, 0 replies; 114+ messages in thread
From: Bill Huey @ 2002-09-23 21:12 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Ingo Molnar, Ulrich Drepper, linux-kernel, Bill Huey (Hui)

On Sun, Sep 22, 2002 at 09:38:52AM -0400, Bill Davidsen wrote:
> Could you comment on how whell this works (or not) with linuxthreads,
> Solaris, and NGPT? I realize you probably haven't had time to look at NPTL
> yet. If an N:M model is really better for your application you might be
> able to just run NGPT.

I can't. I'm in a different OS community, FreeBSD, and I deal with issues
related to threading systems there. There are many variables that could be
at play for various performance categories.

> Since preempt threads seem a problem, cound a dedicated machine run w/o
> preempt? I assume when you say "high load" that you would be talking a
> server, where performance is critical.

The JVM itself has a habit of really stretching the amount of resources
available in many areas and hitting fringe logic in commonly used systems. I can't
really say what the problems are until the Blackdown folks start integrating
the new threading model and then start testing it.

However, there is a mutex fast path in the code itself that can be optionally
used in place of the OS-backed version. They felt it was significant to do
the work for that for some reason, so I'm just going to assume that this is
important until otherwise noted.

bill


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-20 16:42   ` Ingo Molnar
@ 2002-09-24  0:40     ` Rusty Russell
  2002-09-24  5:47       ` Ingo Molnar
  0 siblings, 1 reply; 114+ messages in thread
From: Rusty Russell @ 2002-09-24  0:40 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: jlnance, linux-kernel, Ulrich Drepper

On Fri, 20 Sep 2002 18:42:48 +0200 (CEST)
Ingo Molnar <mingo@elte.hu> wrote:
> 
> On Fri, 20 Sep 2002 jlnance@intrex.net wrote:
> > Is this related to the thread library work that IBM was doing or was
> > this independently developed?
> 
> independently developed.

And, ironically, using the futex implementation developed on IBM time 8).

Of course, the time I spent on futexes would have been completely wasted
without the 95% done by Ingo and Uli to reach normal user programs and
address the other scalability problems.

Thanks guys!  IOU each a beer or local equiv...
Rusty.
-- 
   there are those who do and those who hang on and you don't see too
   many doers quoting their contemporaries.  -- Larry McVoy

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24  0:40     ` Rusty Russell
@ 2002-09-24  5:47       ` Ingo Molnar
  2002-09-24  6:15         ` Rusty Russell
  0 siblings, 1 reply; 114+ messages in thread
From: Ingo Molnar @ 2002-09-24  5:47 UTC (permalink / raw)
  To: Rusty Russell; +Cc: jlnance, linux-kernel, Ulrich Drepper


On Tue, 24 Sep 2002, Rusty Russell wrote:

> > > Is this related to the thread library work that IBM was doing or was
> > > this independently developed?
> > 
> > independently developed.
> 
> And, ironically, using the futex implementation developed on IBM time 8).

you are right, futexes are really important for all the userspace locking
primitives and thread-joining. And like basically all core kernel code,
futexes were a collaborative effort as well:

 *  Thanks to Ben LaHaise for yelling "hashed waitqueues" loudly
 *  enough at me, Linus for the original (flawed) idea, Matthew
 *  Kirkwood for proof-of-concept implementation.

there are so many prerequisites to this that it's impossible to list them
all. What i meant above were the specific patches developed for recent 2.5
kernels, and the library itself.

	Ingo


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24  5:47       ` Ingo Molnar
@ 2002-09-24  6:15         ` Rusty Russell
  0 siblings, 0 replies; 114+ messages in thread
From: Rusty Russell @ 2002-09-24  6:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: jlnance, linux-kernel, Ulrich Drepper

In message <Pine.LNX.4.44.0209240741270.8943-100000@localhost.localdomain> you 
write:
> > And, ironically, using the futex implementation developed on IBM time 8).
> 
> you are right, futexes are really important for all the userspace locking
> primitives and thread-joining. And like basically all core kernel code,
> futexes were a collaborative effort as well:
> 
>  *  Thanks to Ben LaHaise for yelling "hashed waitqueues" loudly
>  *  enough at me, Linus for the original (flawed) idea, Matthew
>  *  Kirkwood for proof-of-concept implementation.

And yourself, Robert Love, Paul Mackerras and Hubertus Franke all
contributed to futexes directly, too.  I wasn't complaining about
credit, I just found the IBM involvement worth noting (in case someone
thought we were one-sided).

> there are so many prerequisites to this that it's impossible to list them
> all.

True here, but in general: almost all the order-of-magnitude
scalability jumps in 2.5 can be traced back to you or Andrew Morton.
I wouldn't want a casual reader to miss that fact 8)

Cheers,
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23  0:11               ` Bill Huey
@ 2002-09-24 16:07                 ` Eric W. Biederman
  2002-09-24 23:21                   ` Bill Huey
  0 siblings, 1 reply; 114+ messages in thread
From: Eric W. Biederman @ 2002-09-24 16:07 UTC (permalink / raw)
  To: Bill Huey; +Cc: Bill Davidsen, Ingo Molnar, Ulrich Drepper, linux-kernel

Bill Huey (Hui) <billh@gnuppy.monkey.org> writes:

> On Sun, Sep 22, 2002 at 12:41:40PM -0600, Eric W. Biederman wrote:
> > They are talking about an incremental GC routine so it does not need to stop
> > all threads simultaneously.  Threads only need to be stopped when the GC is
> gather
> 
> > a root set.  This is what the safe points are for right?  And it does
> > not need to be 100% accurate in finding all of the garbage.  The
> > collector just needs to not make mistakes in the other direction.
> 
> There's a mixture of GC algorithms in HotSpot including generational and I
> believe a traditional mark/sweep. GC isn't my expertise per se.

If they have any sense they also have an incremental GC algorithm, so that
the GC thread can sit around all day executing.  If they are actually
using a stop and collect algorithm there are real issues.  Though I would
love to see the Java guys justify a copy collector...

> Think, you have a compiled code block and you suspend/interrupt threads when
> you either start hitting the stack yellow guard or by a periodic GC thread...
> 
> That can happen anytime, so you can't just expect things to drop onto a
> regular boundary in the compiled code block.

Agreed, but what was this talk earlier about safe points?

>  It's for that reason that
> you have to some kind of OS level threading support to get the ucontext.

I don't quite follow the need, and I'm not certain you do either.  A full
GC pass is very expensive.  So saving a thread's context in user space
should not be a big deal.  It is very minor compared to the rest
of the work going on.  Especially in a language like java where practically
everything lives on the heap.

The thing that sounds sensible to me is that before a thread makes a blocking
call it can be certain to save relevant bits of information to the stack.  But
x86 is easy; what to do with the pointer-heavy architectures, where pushing
all of the registers onto the stack starts getting expensive, is an entirely
different question.

But beyond that, the most sensible algorithm I can see is a
generational incremental collector where each thread has its own
local heap and does its own local garbage collection, and only the
boundary where the local heap meets the global heap needs to be collected
by the collector for all threads.  This preserves a lot of cache
locality as well as circumventing the whole ucontext issue.

If getting the registers is really a bottleneck in the garbage
collector, I suspect it can probably share some generic primitives
with user mode linux.

If support really needs to happen, I suspect this case is close
enough to what user mode linux is doing that someone should
look at how the same mechanism to get the register state can be
shared.

Eric

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24 16:07                 ` Eric W. Biederman
@ 2002-09-24 23:21                   ` Bill Huey
  2002-09-25  3:06                     ` Eric W. Biederman
  0 siblings, 1 reply; 114+ messages in thread
From: Bill Huey @ 2002-09-24 23:21 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Bill Davidsen, Ingo Molnar, Ulrich Drepper, linux-kernel,
	Bill Huey (Hui)

On Tue, Sep 24, 2002 at 10:07:29AM -0600, Eric W. Biederman wrote:
> If they have any sense they also have an incremental GC algorithm.  That
> the GC thread can sit around all day and executing.  If they are actually
> using a stop and collect algorithm there are real issues.  Though I would
> love to see the Java guys justify a copy collector...

They have that and other crazy things. Thread local storage facilities, etc...

> > regular boundary in the compiled code block.
> Agreed, but what was this talk earlier about safe points?

It's needed to deal with exceptions and GC.

> >  It's for that reason that
> > you have to some kind of OS level threading support to get the ucontext.
> 
> I don't quite follow the need and I'm not certain you do either.  A full
> GC pass is very expensive.  So saving a threads context in user space

But you haven't read the code and the supporting white papers for this
JIT compiler, otherwise you wouldn't be whining about this. HotSpot is
no wimp when it comes to these issues. Compilers execute code, code is
stored somewhere and does memory allocation. Geez.

> should not be a big deal.  It is very minor compared to the rest
> of the work going on.  Especially in a language like java where practically
> everything lives on the heap.

Then you need to look at the compilation architecture of HotSpot. It's pretty
different from your picture of it, which is a gross oversimplification. Hell,
I don't know what much of it does since it's so freaking large.

> The thing that sounds sensible to me is that before a threads makes a blocking
> call it can be certain to save relevant bits of information to the stack.  But
> x86 is easy what to do with the pointer heavy architectures where pushing
> all of the registers onto the stack starts getting expensive is an entirely
> different question.

Dude, you have not been reading this thread... Nothing valuable is live at syscall
time per thread; those are external symbol calls that contain nothing of value
to the GC. It's allowed to execute at that point, so any allocated chunk of memory
is going to be properly shoved somewhere known to the GC.

The stuff that's of value to maintain the correctness of this is within executing
code blocks in the method dictionary... move the program counter to a specific
place, funny execution stuff that I haven't looked at yet since I was pretty happy
about getting the threading glue to the OS working, etc...

> But beyond that.  The most sensible algorithm I can see is a
> generational incremental collector where each thread has it's own
> local heap, and does it's own local garbage collection. And only the
> boundary where the local heap meets the global heap needs to collected
> by the collector for all threads.  This preserves a lot of cache
> locality as well as circumventing the whole ucontext issue.

Which HotSpot has...Read the papers... Thread local storage exists.

This isn't just a GC library isolated from a compiler. This is a very sophisticated
compiler with a heavy threading infrastructure, and you have to take into account
various kinds of interaction with the GC. It is non-trivial stuff.

> If getting the registers is really a bottle neck in the garbage
> collector I suspect it can probably share some generic primitives
> with user mode linux.

It can be, if you have to send a signal to each thread to get the ucontext.

> If support really needs to happen I suspect this case is close
> enough to what that user mode linux is doing that someone should
> look at how the same mechanism to get the register state can be
> shared.

With a non-PTRACE interface to those ucontexts...

bill


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24 23:21                   ` Bill Huey
@ 2002-09-25  3:06                     ` Eric W. Biederman
  0 siblings, 0 replies; 114+ messages in thread
From: Eric W. Biederman @ 2002-09-25  3:06 UTC (permalink / raw)
  To: Bill Huey
  Cc: Eric W. Biederman, Bill Davidsen, Ingo Molnar, Ulrich Drepper,
	linux-kernel

Bill Huey (Hui) <billh@gnuppy.monkey.org> writes:
> Which HotSpot has...Read the papers... Thread local storage exists.

I'd love to. Do you have a URL? I didn't see one in my quick look.

Eric


* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-22 22:13               ` dean gaudet
@ 2002-09-26 17:21                 ` Alan Cox
  0 siblings, 0 replies; 114+ messages in thread
From: Alan Cox @ 2002-09-26 17:21 UTC (permalink / raw)
  To: dean gaudet
  Cc: Eric W. Biederman, Bill Davidsen, Bill Huey, Ingo Molnar,
	Ulrich Drepper, linux-kernel

On Sun, 2002-09-22 at 23:13, dean gaudet wrote:
> given that the existing code uses self-modifying-code for the safe-points
> i'm guessing there are so many safe-points that the above if statement
> would be excessive overhead (and the save/flag/wait stuff would probably
> cause a huge amount of code bloat -- but could probably be a subroutine).

It might be worth reminding people here that you cannot implement
self-modifying code safely on x86 SMP systems without a lot of care. Several
common chips take a long walk off a short bus when the code they are
currently executing is modified as they execute it. Not just because of
write atomicity (which could be fixed) but because of hardware errata.

So if you are patching something that another cpu could be executing at
the same time - you already lost.
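One way to avoid patching code that another CPU may be executing is the polled safe-point dean alludes to: every thread tests a shared flag at compiler-inserted points. A minimal sketch, with invented names standing in for a real runtime's park-and-report machinery:

```c
/* Polled safe-point sketch: rather than self-modifying code, the JIT
 * emits a cheap flag test at loop back-edges and call returns. Requesting
 * a stop is an ordinary atomic store from the GC thread, so no
 * cross-modification of executing code is needed. */
#include <stdatomic.h>

static atomic_int safepoint_requested;
static atomic_int threads_parked;   /* stands in for "park and report" */

/* Inserted by the compiler at every safe-point. */
static inline void safepoint_poll(void)
{
    if (atomic_load_explicit(&safepoint_requested, memory_order_acquire)) {
        /* A real runtime would save registers for the GC here and block
         * until the collector releases the world. */
        atomic_fetch_add(&threads_parked, 1);
    }
}
```

The cost is the extra load-and-branch per safe-point, which is the overhead dean was worried about; the benefit is that no live instruction stream is ever rewritten.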



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-29 23:26         ` Buddy Lumpkin
@ 2002-09-30 14:54           ` Corey Minyard
  0 siblings, 0 replies; 114+ messages in thread
From: Corey Minyard @ 2002-09-30 14:54 UTC (permalink / raw)
  To: Buddy Lumpkin
  Cc: 'Bill Davidsen', 'Peter Waechtler',
	'Larry McVoy', linux-kernel, 'ingo Molnar'

Buddy Lumpkin wrote:

>Sun introduced a new thread library in Solaris 8 that is 1:1, but it did
>not replace the default N:M version, you have to link against
>/usr/lib/lwp.
>
>http://supportforum.sun.com/freesolaris/techfaqs.html?techfaqs_2957
>http://www.itworld.com/AppDev/1170/swol-1218-insidesolaris/
>
>I was at a USENIX BOF on threads in Boston year before last and Bill
>Lewis was ranting about how the N:M model sucks. Christopher Provenzano
>was right there and didn't seem to add any feelings one way or the
>other.
>
>Regards,
>
>--Buddy
>
I heard this a while ago, and talked with someone I knew who had inside 
information about this.  According to that person, Sun will be switching 
the default threads library to 1:1 (from the document referenced below, 
it looks like it is Solaris 9).  In various benchmarks, sometimes M:N 
won and sometimes 1:1 won, so performance was a wash.  The main problem 
was that they could never get certain things to work "just right" under 
an M:N model; the complexity of M:N was just too high to be able to get 
it working 100% correctly.  He didn't have specific details, though.

Having implemented a threads package with priority inheritance, I expect 
that doing that with an M:N thread model will be extremely complex. 
With activations it is possible, but that doesn't mean it's easy.  It's 
hard enough with a 1:1 model.  A scheduler with good "global" properties 
(for example, a scheduler that guaranteed time share to classes of 
threads that occur in different processes) would be difficult to 
implement properly, too.

Complexity is the enemy of reliability.  Even if the M:N model could get 
slightly better performance, it's going to be very hard to make it work 
100% correctly.  I personally think NPTL is going in the right 
direction on this one.

-Corey

>-----Original Message-----
>From: linux-kernel-owner@vger.kernel.org
>[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Bill Davidsen
>Sent: Monday, September 23, 2002 12:15 PM
>To: Peter Waechtler
>Cc: Larry McVoy; linux-kernel@vger.kernel.org; ingo Molnar
>Subject: Re: [ANNOUNCE] Native POSIX Thread Library 0.1
>
>On Mon, 23 Sep 2002, Peter Waechtler wrote:
>
>>Am Montag den, 23. September 2002, um 12:05, schrieb Bill Davidsen:
>>
>>>On Sun, 22 Sep 2002, Larry McVoy wrote:
>>>
>>>>On Sun, Sep 22, 2002 at 08:55:39PM +0200, Peter Waechtler wrote:
>>>>>AIX and Irix deploy M:N - I guess for a good reason: it's more
>>>>>flexible and combine both approaches with easy runtime tuning if
>>>>>the app happens to run on SMP (the uncommon case).
>>>>
>>>>No, AIX and IRIX do it that way because their processes are so bloated
>>>>that it would be unthinkable to do a 1:1 model.
>>>
>>>And BSD? And Solaris?
>>
>>Don't know. I don't have access to all those Unices. I could try FreeBSD.
>
>At your convenience.
>
>>According to http://www.kegel.com/c10k.html  Sun is moving to 1:1
>>and FreeBSD still believes in M:N
>
>Sun is total news to me, "moving to" may be in Solaris 9, Sol8 seems to
>still be N:M. BSD is as I thought.
>
>>MacOSX 10.1 does not support PROCESS_SHARED locks, tried that 5 minutes
>>ago.
>
>Thank you for the effort. Hum, that's a bit of a surprise, at least to
>me.





* RE: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 19:14       ` Bill Davidsen
@ 2002-09-29 23:26         ` Buddy Lumpkin
  2002-09-30 14:54           ` Corey Minyard
  0 siblings, 1 reply; 114+ messages in thread
From: Buddy Lumpkin @ 2002-09-29 23:26 UTC (permalink / raw)
  To: 'Bill Davidsen', 'Peter Waechtler'
  Cc: 'Larry McVoy', linux-kernel, 'ingo Molnar'

Sun introduced a new thread library in Solaris 8 that is 1:1, but it did
not replace the default N:M version, you have to link against
/usr/lib/lwp.

http://supportforum.sun.com/freesolaris/techfaqs.html?techfaqs_2957
http://www.itworld.com/AppDev/1170/swol-1218-insidesolaris/

I was at a USENIX BOF on threads in Boston year before last and Bill
Lewis was ranting about how the N:M model sucks. Christopher Provenzano
was right there and didn't seem to add any feelings one way or the
other.

Regards,

--Buddy

-----Original Message-----
From: linux-kernel-owner@vger.kernel.org
[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Bill Davidsen
Sent: Monday, September 23, 2002 12:15 PM
To: Peter Waechtler
Cc: Larry McVoy; linux-kernel@vger.kernel.org; ingo Molnar
Subject: Re: [ANNOUNCE] Native POSIX Thread Library 0.1

On Mon, 23 Sep 2002, Peter Waechtler wrote:

> Am Montag den, 23. September 2002, um 12:05, schrieb Bill Davidsen:
> 
> > On Sun, 22 Sep 2002, Larry McVoy wrote:
> >
> >> On Sun, Sep 22, 2002 at 08:55:39PM +0200, Peter Waechtler wrote:
> >>> AIX and Irix deploy M:N - I guess for a good reason: it's more
> >>> flexible and combine both approaches with easy runtime tuning if
> >>> the app happens to run on SMP (the uncommon case).
> >>
> >> No, AIX and IRIX do it that way because their processes are so bloated
> >> that it would be unthinkable to do a 1:1 model.
> >
> > And BSD? And Solaris?
> 
> Don't know. I don't have access to all those Unices. I could try FreeBSD.

At your convenience.
 
> According to http://www.kegel.com/c10k.html  Sun is moving to 1:1
> and FreeBSD still believes in M:N

Sun is total news to me, "moving to" may be in Solaris 9, Sol8 seems to
still be N:M. BSD is as I thought.
> 
> MacOSX 10.1 does not support PROCESS_SHARED locks, tried that 5 minutes
> ago.

Thank you for the effort. Hum, that's a bit of a surprise, at least to
me. 

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.




* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24 23:16   ` Peter Waechtler
  2002-09-24 23:23     ` Rik van Riel
@ 2002-09-25 19:05     ` David Schwartz
  1 sibling, 0 replies; 114+ messages in thread
From: David Schwartz @ 2002-09-25 19:05 UTC (permalink / raw)
  To: pwaechtler; +Cc: linux-kernel


>With Scheduler Activations this could also be avoided.
>The thread scheduler could get an upcall - but this will stay theory for
>a long
>time on Linux.
>But this is a somewhat far fetched example (for arguing for 1:1), isn't
>it?

	No, it's not. I write high-performance servers and my main enemy is 
burstiness. One significant cause of burstiness is code faulting in. This is 
especially true because many of my servers support adding code to them 
through user-supplied shared object files.

>There are other means of DoS..

	I'm not talking about deliberate attempts at harming the server. These won't 
work over and over because the code will fault in once and be in. I'm talking 
about smooth performance in the face of unpredictable loads, and that means 
not stalling on every page fault.

	DS




* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24 21:10   ` Chris Friesen
  2002-09-24 21:22     ` Rik van Riel
@ 2002-09-25 19:02     ` David Schwartz
  1 sibling, 0 replies; 114+ messages in thread
From: David Schwartz @ 2002-09-25 19:02 UTC (permalink / raw)
  To: cfriesen; +Cc: pwaechtler, linux-kernel


On Tue, 24 Sep 2002 17:10:17 -0400, Chris Friesen wrote:
>David Schwartz wrote:

>>The main reason I write multithreaded apps for single CPU systems is to
>>protect against ambush. Consider, for example, a web server. Someone sends
>>it
>>an obscure request that triggers some code that's never run before and has
>>to
>>fault in. If my application were single-threaded, no work could be done
>>until
>>that page faulted in from disk.

>This is interesting--I hadn't considered this as most of my work for the
>past while has been on embedded systems with everything pinned in ram.

	In the usual case, the code faults in.

>Have you benchmarked this?  I was under the impression that the very
>fastest webservers were still single-threaded using non-blocking io.

	It's all about how you define "fastest". If speed means being able to do the 
same thing over and over really quickly, yes. But I also want uniform 
(non-bursty) performance in the face of an unpredictable set of jobs.

	DS





* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24 23:16   ` Peter Waechtler
@ 2002-09-24 23:23     ` Rik van Riel
  2002-09-25 19:05     ` David Schwartz
  1 sibling, 0 replies; 114+ messages in thread
From: Rik van Riel @ 2002-09-24 23:23 UTC (permalink / raw)
  To: Peter Waechtler; +Cc: David Schwartz, linux-kernel

On Wed, 25 Sep 2002, Peter Waechtler wrote:

> With Scheduler Activations this could also be avoided. The thread
> scheduler could get an upcall - but this will stay theory for a long
> time on Linux. But this is a somewhat far fetched example (for arguing
> for 1:1), isn't it?

Actually, the upcalls in a N:M scheme with scheduler activations
seem like a pretty good argument for 1:1 to me ;)

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/

Spamtraps of the month:  september@surriel.com trac@trac.org



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24 20:19 ` David Schwartz
  2002-09-24 21:10   ` Chris Friesen
@ 2002-09-24 23:16   ` Peter Waechtler
  2002-09-24 23:23     ` Rik van Riel
  2002-09-25 19:05     ` David Schwartz
  1 sibling, 2 replies; 114+ messages in thread
From: Peter Waechtler @ 2002-09-24 23:16 UTC (permalink / raw)
  To: David Schwartz; +Cc: linux-kernel

Am Dienstag den, 24. September 2002, um 22:19, schrieb David Schwartz:

>
>> The effect of M:N on UP systems should be even more clear. Your
>> multithreaded apps can't profit of parallelism but they do not
>> add load to the system scheduler. The drawback: more syscalls
>> (I think about removing the need for
>> flags=fcntl(GETFLAGS);fcntl(fd,NONBLOCK);write(fd);fcntl(fd,flags))
>
> 	The main reason I write multithreaded apps for single CPU systems 
> is to
> protect against ambush. Consider, for example, a web server. Someone 
> sends it
> an obscure request that triggers some code that's never run before and 
> has to
> fault in. If my application were single-threaded, no work could be done 
> until
> that page faulted in from disk. This is why select-loop and poll-loop 
> type
> servers are bursty.

With the current NGPT design your threads would be blocked (all that are
scheduled on this kernel vehicle).

With Scheduler Activations this could also be avoided: the thread
scheduler could get an upcall - but this will stay theory for a long
time on Linux. But this is a somewhat far fetched example (for arguing
for 1:1), isn't it?

There are other means of DoS..





* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24 21:22     ` Rik van Riel
  2002-09-24 21:35       ` Roberto Peon
@ 2002-09-24 21:35       ` Chris Friesen
  1 sibling, 0 replies; 114+ messages in thread
From: Chris Friesen @ 2002-09-24 21:35 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Christopher Friesen, David Schwartz, pwaechtler, linux-kernel

Rik van Riel wrote:
> On Tue, 24 Sep 2002, Chris Friesen wrote:

>>This is interesting--I hadn't considered this as most of my work for the
>>past while has been on embedded systems with everything pinned in ram.
>>
> 
> On an ftp server (or movie server, or ...) you CAN'T pin everything
> in RAM.

Yes, but you can use aio to issue the request for data and then go do 
other stuff even with a single thread.  David's case was faulting in 
little-used application code.
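The aio pattern above can be sketched with POSIX AIO. This is a minimal illustration (the helper name is invented, and it busy-polls for brevity where a real server would do useful work or use completion notification):

```c
/* Minimal POSIX AIO sketch: queue a read, and the single thread is then
 * free to do other work instead of blocking in the fault/read. May need
 * -lrt on older glibc. */
#include <aio.h>
#include <errno.h>
#include <string.h>

ssize_t aio_read_all(int fd, void *buf, size_t len, off_t off)
{
    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = off;

    if (aio_read(&cb) != 0)               /* queue the request */
        return -1;
    while (aio_error(&cb) == EINPROGRESS)
        ;                                 /* ...do other work here... */
    return aio_return(&cb);               /* bytes read, or -1 */
}
```

Note that this covers explicit reads; it does nothing for the page faults on application *code* that David's example is about, which is the distinction being drawn here.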

Or am I missing something?

Chris



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24 21:22     ` Rik van Riel
@ 2002-09-24 21:35       ` Roberto Peon
  2002-09-24 21:35       ` Chris Friesen
  1 sibling, 0 replies; 114+ messages in thread
From: Roberto Peon @ 2002-09-24 21:35 UTC (permalink / raw)
  To: Rik van Riel, Chris Friesen; +Cc: David Schwartz, pwaechtler, linux-kernel


On Tuesday 24 September 2002 02:22 pm, Rik van Riel wrote:
> On Tue, 24 Sep 2002, Chris Friesen wrote:
> > David Schwartz wrote:
> > > 	The main reason I write multithreaded apps for single CPU systems is
> > > to protect against ambush. Consider, for example, a web server. Someone
> > > sends it an obscure request that triggers some code that's never run
> > > before and has to fault in. If my application were single-threaded, no
> > > work could be done until that page faulted in from disk.

This is similar to the problems that we face doing realtime virtual video
enhancements.

We have to log camera data (to know where things are pointed) by video
timecode, since the data for the camera and the video are asynchronous
(especially in replay).

These (mmaped) logs can get relatively large (100+ MB each) and access into them
is relatively random (i.e. determined by the director of the show), so the
code reading the log (and suffering the fault) runs in a different thread in
order not to stall the other important tasks such as video output.
(Mis-estimating the position for the enhancement is much less of an issue than
dropping the video frame itself. We don't want 10,000,000 people seeing
pure-green frames popping up in the middle of the broadcast.)

-Roberto JP
robertopeon@sportvision.com




* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24 21:10   ` Chris Friesen
@ 2002-09-24 21:22     ` Rik van Riel
  2002-09-24 21:35       ` Roberto Peon
  2002-09-24 21:35       ` Chris Friesen
  2002-09-25 19:02     ` David Schwartz
  1 sibling, 2 replies; 114+ messages in thread
From: Rik van Riel @ 2002-09-24 21:22 UTC (permalink / raw)
  To: Chris Friesen; +Cc: David Schwartz, pwaechtler, linux-kernel

On Tue, 24 Sep 2002, Chris Friesen wrote:
> David Schwartz wrote:
>
> > 	The main reason I write multithreaded apps for single CPU systems is to
> > protect against ambush. Consider, for example, a web server. Someone sends it
> > an obscure request that triggers some code that's never run before and has to
> > fault in. If my application were single-threaded, no work could be done until
> > that page faulted in from disk.
>
> This is interesting--I hadn't considered this as most of my work for the
> past while has been on embedded systems with everything pinned in ram.

On an ftp server (or movie server, or ...) you CAN'T pin everything
in RAM.

Rik
-- 
A: No.
Q: Should I include quotations after my reply?

http://www.surriel.com/		http://distro.conectiva.com/



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24 20:19 ` David Schwartz
@ 2002-09-24 21:10   ` Chris Friesen
  2002-09-24 21:22     ` Rik van Riel
  2002-09-25 19:02     ` David Schwartz
  2002-09-24 23:16   ` Peter Waechtler
  1 sibling, 2 replies; 114+ messages in thread
From: Chris Friesen @ 2002-09-24 21:10 UTC (permalink / raw)
  To: David Schwartz; +Cc: pwaechtler, linux-kernel

David Schwartz wrote:

> 	The main reason I write multithreaded apps for single CPU systems is to 
> protect against ambush. Consider, for example, a web server. Someone sends it 
> an obscure request that triggers some code that's never run before and has to 
> fault in. If my application were single-threaded, no work could be done until 
> that page faulted in from disk.

This is interesting--I hadn't considered this as most of my work for the 
past while has been on embedded systems with everything pinned in ram.

Have you benchmarked this?  I was under the impression that the very 
fastest webservers were still single-threaded using non-blocking io.

Chris



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24  0:03           ` Andy Isaacson
  2002-09-24  0:10             ` Jeff Garzik
  2002-09-24  5:53             ` Ingo Molnar
@ 2002-09-24 20:34             ` David Schwartz
  2 siblings, 0 replies; 114+ messages in thread
From: David Schwartz @ 2002-09-24 20:34 UTC (permalink / raw)
  To: adi, Ingo Molnar; +Cc: linux-kernel



On Mon, 23 Sep 2002 19:03:06 -0500, Andy Isaacson wrote:

>Of course this can be (and frequently is) implemented such that there is
>not one Pthreads thread per object; given simulation environments with 1
>million objects, and the current crappy state of Pthreads
>implementations, the researchers have no choice.

	It may well be handy to have a threads implementation that makes these kinds 
of programs easy to write, but an OS's preferred pthreads implementation is not 
and should not be that one. A platform's default/preferred pthreads 
implementation should be one that allows well-designed, high-performance 
I/O-intensive and compute-intensive tasks to run extremely well.

	DS




* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-22 18:55 Peter Waechtler
  2002-09-22 21:32 ` Larry McVoy
  2002-09-23 21:03 ` Bill Huey
@ 2002-09-24 20:19 ` David Schwartz
  2002-09-24 21:10   ` Chris Friesen
  2002-09-24 23:16   ` Peter Waechtler
  2 siblings, 2 replies; 114+ messages in thread
From: David Schwartz @ 2002-09-24 20:19 UTC (permalink / raw)
  To: pwaechtler, linux-kernel


>The effect of M:N on UP systems should be even more clear. Your
>multithreaded apps can't profit of parallelism but they do not
>add load to the system scheduler. The drawback: more syscalls
>(I think about removing the need for
>flags=fcntl(GETFLAGS);fcntl(fd,NONBLOCK);write(fd);fcntl(fd,flags))
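The sequence Peter quotes can be spelled out as follows; this is a sketch with an invented helper name, and error handling is omitted:

```c
/* The save/set/write/restore dance an M:N library may perform to keep a
 * write from blocking its kernel vehicle - four syscalls per I/O, which
 * is the overhead being complained about. Error handling omitted. */
#include <fcntl.h>
#include <unistd.h>

ssize_t write_nonblocking(int fd, const void *buf, size_t len)
{
    int flags = fcntl(fd, F_GETFL, 0);        /* save current flags   */
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);   /* force non-blocking   */
    ssize_t n = write(fd, buf, len);          /* may fail with EAGAIN */
    fcntl(fd, F_SETFL, flags);                /* restore saved flags  */
    return n;
}
```

With a 1:1 model none of this is needed: the thread simply blocks in write() and the kernel schedules another thread.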

	The main reason I write multithreaded apps for single CPU systems is to 
protect against ambush. Consider, for example, a web server. Someone sends it 
an obscure request that triggers some code that's never run before and has to 
fault in. If my application were single-threaded, no work could be done until 
that page faulted in from disk. This is why select-loop and poll-loop type 
servers are bursty.

	DS




* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 23:57           ` Andy Isaacson
@ 2002-09-24 18:10             ` Christoph Hellwig
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Hellwig @ 2002-09-24 18:10 UTC (permalink / raw)
  To: Andy Isaacson; +Cc: linux-kernel

> Another advantage of keeping a "process" concept is that things like CSA
> (Compatible System Accounting, nee Cray System Accounting)

Which has been ported to Linux now, btw (rather poorly integrated, though):

	http://oss.sgi.com/projects/csa/


* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24 13:40     ` Peter Svensson
@ 2002-09-24 14:20       ` Michael Sinz
  0 siblings, 0 replies; 114+ messages in thread
From: Michael Sinz @ 2002-09-24 14:20 UTC (permalink / raw)
  To: Peter Svensson
  Cc: Bill Huey (Hui), Peter Waechtler, linux-kernel, ingo Molnar

Peter Svensson wrote:
> On Tue, 24 Sep 2002, Michael Sinz wrote:
> 
> 
>>The problem was very quickly noticed as other students quickly learned
>>how to make use of such "solutions" to their performance wants.  We
>>relatively quickly had to add process level accounting of thread CPU
>>usage such that any thread in a process counted to that process's
>>CPU usage/timeslice/etc.  It basically made the scheduler into a
>>2-stage device - much like user threads but with the kernel doing
>>the work and all of the benefits of kernel threads.  (And did not
>>require any code recompile other than those people who were doing
>>the many-threads CPU hog type of thing ended up having to revert as
>>it was now slower than the single thread-per-CPU code...)
> 
> 
> Then you can just as well use fork(2) and split into processes with the 
> same result. The solution is not thread specific, it is resource limits 
> and/or per user cpu accounting. 

I understand that point - but the basic question is whether you schedule
based on the process or based on the thread.  In an interactive
multi-user system, you may even want to back out to the user level
(thus no user can hog the system by doing many things).  But that is
usually not the target of Linux systems (yet?)

The problem then is the inter-process communications.  (At least on
that system - Linux has many better solutions)  That system did not
have shared memory and thus the coordination between processes was
difficult at best.

> Several raytracers can (could?) split the workload into multiple 
> processes, some being started on other computers over rsh or similar.

And they exist - but the I/O overhead makes it "not a win" on a
single machine.  (It hurts too much)

-- 
Michael Sinz -- Director, Systems Engineering -- Worldgate Communications
A master's secrets are only as good as
	the master's ability to explain them to others.




* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24 12:03   ` Michael Sinz
@ 2002-09-24 13:40     ` Peter Svensson
  2002-09-24 14:20       ` Michael Sinz
  0 siblings, 1 reply; 114+ messages in thread
From: Peter Svensson @ 2002-09-24 13:40 UTC (permalink / raw)
  To: Michael Sinz; +Cc: Bill Huey (Hui), Peter Waechtler, linux-kernel, ingo Molnar

On Tue, 24 Sep 2002, Michael Sinz wrote:

> The problem was very quickly noticed as other students quickly learned
> how to make use of such "solutions" to their performance wants.  We
> relatively quickly had to add process level accounting of thread CPU
> usage such that any thread in a process counted to that process's
> CPU usage/timeslice/etc.  It basically made the scheduler into a
> 2-stage device - much like user threads but with the kernel doing
> the work and all of the benefits of kernel threads.  (And did not
> require any code recompile other than those people who were doing
> the many-threads CPU hog type of thing ended up having to revert as
> it was now slower than the single thread-per-CPU code...)

Then you can just as well use fork(2) and split into processes with the 
same result. The solution is not thread specific, it is resource limits 
and/or per user cpu accounting. 

Several raytracers can (could?) split the workload into multiple 
processes, some being started on other computers over rsh or similar.

Peter
--
Peter Svensson      ! Pgp key available by finger, fingerprint:
<petersv@psv.nu>    ! 8A E9 20 98 C1 FF 43 E3  07 FD B9 0A 80 72 70 AF
------------------------------------------------------------------------
Remember, Luke, your source will be with you... always...




* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 21:03 ` Bill Huey
@ 2002-09-24 12:03   ` Michael Sinz
  2002-09-24 13:40     ` Peter Svensson
  0 siblings, 1 reply; 114+ messages in thread
From: Michael Sinz @ 2002-09-24 12:03 UTC (permalink / raw)
  To: Bill Huey (Hui); +Cc: Peter Waechtler, linux-kernel, ingo Molnar

Bill Huey (Hui) wrote:
> On Sun, Sep 22, 2002 at 08:55:39PM +0200, Peter Waechtler wrote:
> 
>>AIX and Irix deploy M:N - I guess for a good reason: it's more
>>flexible and combine both approaches with easy runtime tuning if
>>the app happens to run on SMP (the uncommon case).
> 
> Also, for process scoped scheduling in a way so that system wide threads
> don't have an impact on a process slice. Folks have piped up about that
> being important.

This is the one major complaint I have with "a thread is a process"
implementation in Linux.  The scheduler does not take process vs thread
into account.

A simple example:  Two users (or two different programs - same user)
are running.  Both could use all of the CPU resources (for whatever
reason).

One of the programs (A) has N threads (where N >> 1) and the other
program (B) has only 1 thread.  If, of the N threads in (A), M of them
are not blocked (where M > 1), then (A) will get an M:1 CPU usage advantage
over (B).

This means that two processes/programs that should be scheduled equally
are not and the one with many threads "effectively" is stealing cycles
from the other.
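The arithmetic of that advantage, under the simplifying assumption that every runnable thread receives an equal timeslice:

```c
/* If the scheduler hands out timeslices per thread rather than per
 * process, M runnable threads in A against one thread in B give A the
 * fraction M/(M+1) of the CPU: 50% at M=1, 90% at M=9, ~99% at M=100. */
double cpu_share_of_A(int m)
{
    return (double)m / (m + 1);
}
```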

In a multi-user (server with multiple processes) environment, this
means that you just write with lots of threads to get more of the
bandwidth out of the scheduled processes.

A real-world (albeit not great) example from many years ago:

A program that does ray-tracing can very easily split the process up
into very small bits.  This is great on multi-processor systems as you
can get each CPU to do part of the work in parallel.  There is almost
no I/O involved in such a system other than initial load and final save.

It turned out that on non-dedicated systems (multi-user systems) that
you could actually get your work done faster by having the program
create many (100, in this case) threads even though there was only
one big CPU.  The reason was that that OS also did not (yet) understand
process scheduling fairness and the student who did this effectively
made a way around the fair scheduling of system resources.

The problem was very quickly noticed as other students quickly learned
how to make use of such "solutions" to their performance wants.  We
relatively quickly had to add process level accounting of thread CPU
usage such that any thread in a process counted to that process's
CPU usage/timeslice/etc.  It basically made the scheduler into a
2-stage device - much like user threads but with the kernel doing
the work and all of the benefits of kernel threads.  (And did not
require any code recompile other than those people who were doing
the many-threads CPU hog type of thing ended up having to revert as
it was now slower than the single thread-per-CPU code...)

Now, computer hardware has changed a lot.  Back then, a branch took
longer than current kernel syscall overhead.  Memory was faster
than the CPU.  The scheduler was complex, so I could not say that
it was as efficient as the Linux kernel's.  However, we did have real
threads, and we did quickly get real process accounting after someone
"pointed out" the problem of not doing so :-)

-- 
Michael Sinz -- Director, Systems Engineering -- Worldgate Communications
A master's secrets are only as good as
	the master's ability to explain them to others.




* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 15:30     ` Larry McVoy
                         ` (4 preceding siblings ...)
  2002-09-23 21:41       ` dean gaudet
@ 2002-09-24 10:02       ` Nikita Danilov
  5 siblings, 0 replies; 114+ messages in thread
From: Nikita Danilov @ 2002-09-24 10:02 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Bill Davidsen, Peter Waechtler, linux-kernel, ingo Molnar

Larry McVoy writes:
 > > > Instead of taking the traditional "we've screwed up the normal system 
 > > > primitives so we'll event new lightweight ones" try this:
 > > > 
 > > > We depend on the system primitives to not be broken or slow.
 > > > 
 > > > If that's a true statement, and in Linux it tends to be far more true
 > > > than other operating systems, then there is no reason to have M:N.
 > > 
 > > No matter how fast you do context switch in and out of kernel and a sched
 > > to see what runs next, it can't be done as fast as it can be avoided.
 > 
 > You are arguing about how many angels can dance on the head of a pin.
 > Sure, there are lotso benchmarks which show how fast user level threads
 > can context switch amongst each other and it is always faster than going
 > into the kernel.  So what?  What do you think causes a context switch in
 > a threaded program?  What?  Could it be blocking on I/O?  Like 99.999%
 > of the time?  And doesn't that mean you already went into the kernel to
 > see if the I/O was ready?  And doesn't that mean that in all the real
 > world applications they are already doing all the work you are arguing
 > to avoid?

M:N threads are supposed to have other advantages beside fast context
switches. The original paper on scheduler activations mentioned the case
where a kernel thread is preempted while the user-level thread it runs
holds a spin lock. When the kernel notifies the user-level scheduler
about the preemption (through an upcall) it can de-schedule all
user-level threads spinning on this lock, so that they will not waste
their time slices burning CPU.
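Without such upcalls, a user-level lock can only guess that the holder's kernel thread was preempted; a common fallback is spin-then-yield. A sketch (names and the spin budget of 100 are invented for illustration):

```c
/* Spin-then-yield lock sketch: with scheduler activations the user-level
 * scheduler learns the holder was preempted and can de-schedule the
 * spinners; without upcalls, the best the lock can do is burn a bounded
 * number of spins and then give the CPU back. */
#include <stdatomic.h>
#include <sched.h>

typedef struct { atomic_flag held; } ulock_t;

static void ulock_acquire(ulock_t *l)
{
    int spins = 0;
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        if (++spins >= 100) {   /* arbitrary budget: assume preemption */
            sched_yield();      /* let the (possibly preempted) holder run */
            spins = 0;
        }
    }
}

static void ulock_release(ulock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```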

 > -- 
 > ---
 > Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 

Nikita.

 > -


* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24  7:12           ` Thunder from the hill
@ 2002-09-24  7:30             ` Ingo Molnar
  0 siblings, 0 replies; 114+ messages in thread
From: Ingo Molnar @ 2002-09-24  7:30 UTC (permalink / raw)
  To: Thunder from the hill
  Cc: Bill Davidsen, Larry McVoy, Peter Waechtler, linux-kernel, Ingo Molnar


On Tue, 24 Sep 2002, Thunder from the hill wrote:

> > 90% of the programs that matter behave exactly like Larry has described.
> > IO is the main source of blocking. Go and profile a busy webserver or
> > mailserver or database server yourself if you dont believe it.
>
> Well, I guess Java Web Server behaves the same?

yes. The most common case is that it either blocks on the external network
connection (IO), or on some internal database connection (IO as well). The
JVMs themselves had better be well-threaded internally, with not much
contention on any internal lock. The case of internal synchronization is
really one where the 1:1 model makes 'bad parallelism' more visible: when
there's contention. It's quite rare that heavy synchronization and heavy
lock contention cannot be avoided, and it mostly involves simulation
projects, which often do this because they simulate real-world IO :-)

(but, all this thread is becoming pretty theoretical - current fact is
that the 1:1 library is currently more than 4 times faster than the only
M:N library that we were able to run the test on using the same kernel, on
M:N's 'home turf'. So anyone who thinks the M:N library should perform
faster is welcome to improve it and send in results.)

	Ingo



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 20:32         ` Ingo Molnar
  2002-09-24  0:03           ` Andy Isaacson
@ 2002-09-24  7:12           ` Thunder from the hill
  2002-09-24  7:30             ` Ingo Molnar
  1 sibling, 1 reply; 114+ messages in thread
From: Thunder from the hill @ 2002-09-24  7:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Bill Davidsen, Larry McVoy, Peter Waechtler, linux-kernel, Ingo Molnar

Hi,

On Mon, 23 Sep 2002, Ingo Molnar wrote:
> 90% of the programs that matter behave exactly like Larry has described.
> IO is the main source of blocking. Go and profile a busy webserver or
> mailserver or database server yourself if you dont believe it.

Well, I guess Java Web Server behaves the same?

			Thunder
-- 
assert(typeof((fool)->next) == typeof(fool));	/* wrong */



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24  0:03           ` Andy Isaacson
  2002-09-24  0:10             ` Jeff Garzik
@ 2002-09-24  5:53             ` Ingo Molnar
  2002-09-24 20:34             ` David Schwartz
  2 siblings, 0 replies; 114+ messages in thread
From: Ingo Molnar @ 2002-09-24  5:53 UTC (permalink / raw)
  To: Andy Isaacson; +Cc: Larry McVoy, linux-kernel


On Mon, 23 Sep 2002, Andy Isaacson wrote:

> > 90% of the programs that matter behave exactly like Larry has described.
> > IO is the main source of blocking. Go and profile a busy webserver or
> > mailserver or database server yourself if you dont believe it.
> 
> There are heavily-threaded programs out there that do not behave this
> way, and for which a N:M thread model is completely appropriate. [...]

of course, that's the other 10%. [or even smaller.] I never claimed M:N
cannot be viable for certain specific applications. But a generic
threading library should rather concentrate on the common 90% of the
applications.

(obviously for simulations the absolute fastest implementation would be a
pure userspace state-machine, not a threaded application - M:N or 1:1.)

	Ingo



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24  2:48 ` Peter Chubb
@ 2002-09-24  3:37   ` Mark Mielke
  0 siblings, 0 replies; 114+ messages in thread
From: Mark Mielke @ 2002-09-24  3:37 UTC (permalink / raw)
  To: Peter Chubb
  Cc: Peter Wächtler, Ingo Molnar, Larry McVoy, Bill Davidsen,
	linux-kernel

On Tue, Sep 24, 2002 at 12:48:40PM +1000, Peter Chubb wrote:
> >>>>> "Mark" == Mark Mielke <mark@mark.mielke.cc> writes:
> Mark>     OS threads: 1) thread#1 invokes a system call 2) OS switches
> Mark> tasks to thread#2 and returns from blocking
> Mark>     user-space threads: 1) thread#1 invokes a system call 2)
> Mark> thread#1 returns from system call, EWOULDBLOCK 3) thread#1
> Mark> invokes poll(), select(), ioctl() to determine state 4) thread#1
> Mark> returns from system call 5) thread#1 switches stack pointer to
> Mark> be thread#2 upon determination that the resource thread#2 was
> Mark> waiting on is ready.
> No way!  The Solaris M:N model notices when all threads belonging to a
> process have blocked, and wakes up the master thread, which can then
> create a new kernel thread if there are any user-mode threads that can
> do work.

As I said, far from accurate, and M:N is really a compromise between
the two approaches, it is not an extreme.

M:N really doesn't mean much at all except that it makes no guarantees
that each thread requires an OS thread, or that only one thread will
be active at any given time.

M:N makes no promises to be faster, although, as with any innovation
designed by engineers who take pride in their work, it is a method of
achieving a goal... that is, getting around costly system invocations
by optimizing them in user space. Is it better? Maybe.
Has it traditionally allowed standard applications that make use of
threads to perform better than if the application used only kernel
threads? Yes.

Can the rules be broken? I have not seen a single reason why they
cannot be. M:N is a necessity for kernels that have heavyweight
thread synchronization primitives, or heavyweight context
switching. Are the rules the same with thread synchronization primitives
that have similar weight whether 1:1 or M:N (i.e. FUTEX)? Are the
rules the same if context switching in kernel space can be made cheaper,
if the scheduling issues can be addressed, or if M:N must rely on just as
many kernel invocations?

Is M:N really cheaper in your Solaris example, where a new thread is
created by the master thread on demand? If threads were sufficiently
light-weight, I do not see how a master thread sitting on
SIGIO/select()/poll()/ioctl() and switching to a thread in the
pool could be cheaper than the kernel pulling a stopped thread into
the run queue.

This is one of those things where the proof is in the pudding. It is
difficult to theorize, since almost all theory on this subject is
based on comparing performance under a different set of rules. M:N was
necessary before, when 1:1 was not feasible. Now that 1:1 may be reaching
the state of being feasible, the rules change, and previous attempts
at analyzing the data mean very little. Previous conclusions mean very
little.

I am one who wants to see what happens. Worst case, M:N
implementations can use the same enhancements that were designed for
1:1, and benefit. The most obvious example, which needs to be
mentioned once again, is FUTEX. Depending on the application, it might
benefit 1:1 or M:N, but as a kernel feature, it benefits anybody who can
invent a use for it.

Fast thread switching? This provides a benefit for M:N. In fact, I would
suspect that the people comparing the best 1:1 implementation with the best
M:N implementation will find that once all the patches are applied, the
race will be closer than most people thought, but that BOTH will perform
better on 2.5.x than either ever did on 2.4.x and earlier.

mark

-- 
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24  0:21                   ` Bill Huey
@ 2002-09-24  3:20                     ` Mark Mielke
  0 siblings, 0 replies; 114+ messages in thread
From: Mark Mielke @ 2002-09-24  3:20 UTC (permalink / raw)
  To: Bill Huey
  Cc: Peter Wächtler, Ingo Molnar, Larry McVoy, Bill Davidsen, linux-kernel

On Mon, Sep 23, 2002 at 05:21:35PM -0700, Bill Huey wrote:
> ...
> The incorrect example where you outline what you think is an M:N call
> conversion (traditional async wrappers instead of upcalls) is something
> that I don't want to become a future technical strawman that folks in this
> community create to attack M:N threading. It may very well still have
> legitimacy, in the same way that part of the performance of the JVM depends
> on accessibility to a thread's ucontext and run state, which seems to have
> been an initial oversight (for unknown reasons) when this was originally
> conceived. Those are the kinds of things I'm most worried about: the ones
> that eventually hurt what application folks are building on top of Linux
> and its kernel facilities.
> ...
> That's the core of my rant and it took quite a while to write up. ;)

My part in the rant (really somebody else's rant...) is that if kernel
threads can be made to out-perform current implementations of M:N
threading, then all that has really been proven is that current M:N
practices are not fully optimal. 1:1 in an N:N system is just one face
of M:N in an N:N system. A fully functional M:N system _may choose_ to
allow M to equal N.

The worst possible cases that I expect to see from people experimenting
with this stuff, given a 1:1 system that out-performs commonly
available M:N systems, are: 1) The M:N people innovate, potentially using
the new technology made available by the 1:1 people, making a
_better_ M:N system. 2) The 1:1 system is better, and people use it.

As long as they all use POSIX, or another standard interface, there
isn't a problem.

If the changes to the kernel made by the 1:1 people are bad, they will
be stopped by Linus and many other people, probably including
yourself... :-)

In any case, I see the 1:1 vs. M:N as a distraction from the *actual*
enhancements being designed, which seem to be, support for cheaper
kernel threads, something that benefits both parties.

mark

-- 
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
       [not found] <987738530@toto.iv>
@ 2002-09-24  2:48 ` Peter Chubb
  2002-09-24  3:37   ` Mark Mielke
  0 siblings, 1 reply; 114+ messages in thread
From: Peter Chubb @ 2002-09-24  2:48 UTC (permalink / raw)
  To: Mark Mielke
  Cc: Peter Wächtler, Ingo Molnar, Larry McVoy, Bill Davidsen,
	linux-kernel

>>>>> "Mark" == Mark Mielke <mark@mark.mielke.cc> writes:

Mark> On Mon, Sep 23, 2002 at 11:08:53PM +0200, Peter Wächtler wrote:
Mark> Think of it this way... two threads are blocked on different
Mark> resources...  The currently executing thread reaches a point
Mark> where it blocks.

Mark>     OS threads: 1) thread#1 invokes a system call 2) OS switches
Mark> tasks to thread#2 and returns from blocking

Mark>     user-space threads: 1) thread#1 invokes a system call 2)
Mark> thread#1 returns from system call, EWOULDBLOCK 3) thread#1
Mark> invokes poll(), select(), ioctl() to determine state 4) thread#1
Mark> returns from system call 5) thread#1 switches stack pointer to
Mark> be thread#2 upon determination that the resource thread#2 was
Mark> waiting on is ready.

No way!  The Solaris M:N model notices when all threads belonging to a
process have blocked, and wakes up the master thread, which can then
create a new kernel thread if there are any user-mode threads that can
do work.

PeterC


* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 23:11                 ` Mark Mielke
@ 2002-09-24  0:21                   ` Bill Huey
  2002-09-24  3:20                     ` Mark Mielke
  0 siblings, 1 reply; 114+ messages in thread
From: Bill Huey @ 2002-09-24  0:21 UTC (permalink / raw)
  To: Mark Mielke
  Cc: Peter Wächtler, Ingo Molnar, Larry McVoy, Bill Davidsen,
	linux-kernel, Bill Huey (Hui)

On Mon, Sep 23, 2002 at 07:11:32PM -0400, Mark Mielke wrote:
> I do not find it to be profitable to discourage the people working on
> this project. If they fail, nobody loses. If they succeed, they can
> re-invent the math behind threading, and Linux ends up on the forefront
> of operating systems offering the technology.

Math, unlikely. Performance issues, maybe. Overall kernel technology,
highly unlikely, and bordering on a preposterous claim.

This forum, like anything else, exists to propose new infrastructure
for something that's very important to the functioning of this operating
system. For this project to succeed, it must address the possible problems
that various folks bring up when examining what's been proposed or built.
That's the role of these discussions.

> As for 'crazy synchronization', solutions such as the FUTEX have no
> real negative aspects. It wasn't long ago that the FUTEX did not
> exist. Why couldn't innovation make 'crazy synchronization by
> non-web-server like applications' more efficient using kernel threads?

To be blunt, I don't believe it. That comes from a technical point of view,
from my bias toward FreeBSD's scheduler-activations threading, and because
people are too easily dismissing M:N performance issues while reaching
conclusions about it that seem presumptuous.

The incorrect example where you outline what you think is an M:N call
conversion (traditional async wrappers instead of upcalls) is something
that I don't want to become a future technical strawman that folks in this
community create to attack M:N threading. It may very well still have
legitimacy, in the same way that part of the performance of the JVM depends
on accessibility to a thread's ucontext and run state, which seems to have
been an initial oversight (for unknown reasons) when this was originally
conceived.

Those are the kinds of things I'm most worried about: the ones that
eventually hurt what application folks are building on top of Linux and
its kernel facilities.

> Concurrency experts would welcome the change. Concurrent 'experts'
> would not welcome the change, as it would force them to have to
> re-learn everything they know, effectively obsoleting their 'expert'
> status. (note the difference between the unquoted, and the quoted...)

Well, what I mean by concurrency experts is that there can be specialized
applications where people must become experts in concurrency to solve
difficult problems that might not be known to this group at this time.
Dismissing that in the above paragraph doesn't negate that need.

The bottom line here is that ultimately the kernel is providing usable
primitives/terms for application programmers. It's not a scenario where
kernel folks just build something that's conceptually awkward and then it's
up to applications people to work around the bogus design problems that
result. That is what I meant by folks who have applications that might push
the limits of what the current synchronization model offers.

That's the core of my rant and it took quite a while to write up. ;)

bill



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24  0:10             ` Jeff Garzik
@ 2002-09-24  0:14               ` Andy Isaacson
  0 siblings, 0 replies; 114+ messages in thread
From: Andy Isaacson @ 2002-09-24  0:14 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel

On Mon, Sep 23, 2002 at 08:10:24PM -0400, Jeff Garzik wrote:
> Andy Isaacson wrote:
> > Of course this can be (and frequently is) implemented such that there is
> > not one Pthreads thread per object; given simulation environments with 1
> > million objects, and the current crappy state of Pthreads
> > implementations, the researchers have no choice.
> 
> Are these object threads mostly active or inactive?

Mostly inactive (waiting on a semaphore or FIFO).

> Regardless, it seems obvious with today's hardware, that 1 million 
> objects should never be one-thread-per-object, pthreads or no.  That's 
> just lazy programming.

You can call it lazy if you want, but I call it natural.

(Of course I realize that practical considerations prevent users from
creating 1 million kernel threads, or even user threads, today.
Unfortunate, that.)

-andy


* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-24  0:03           ` Andy Isaacson
@ 2002-09-24  0:10             ` Jeff Garzik
  2002-09-24  0:14               ` Andy Isaacson
  2002-09-24  5:53             ` Ingo Molnar
  2002-09-24 20:34             ` David Schwartz
  2 siblings, 1 reply; 114+ messages in thread
From: Jeff Garzik @ 2002-09-24  0:10 UTC (permalink / raw)
  To: Andy Isaacson
  Cc: Ingo Molnar, Bill Davidsen, Larry McVoy, Peter Waechtler,
	linux-kernel, Ingo Molnar

Andy Isaacson wrote:
> Of course this can be (and frequently is) implemented such that there is
> not one Pthreads thread per object; given simulation environments with 1
> million objects, and the current crappy state of Pthreads
> implementations, the researchers have no choice.


Are these object threads mostly active or inactive?

Regardless, it seems obvious with today's hardware, that 1 million 
objects should never be one-thread-per-object, pthreads or no.  That's 
just lazy programming.

	Jeff





* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 20:32         ` Ingo Molnar
@ 2002-09-24  0:03           ` Andy Isaacson
  2002-09-24  0:10             ` Jeff Garzik
                               ` (2 more replies)
  2002-09-24  7:12           ` Thunder from the hill
  1 sibling, 3 replies; 114+ messages in thread
From: Andy Isaacson @ 2002-09-24  0:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Bill Davidsen, Larry McVoy, Peter Waechtler, linux-kernel, Ingo Molnar

On Mon, Sep 23, 2002 at 10:32:00PM +0200, Ingo Molnar wrote:
> On Mon, 23 Sep 2002, Bill Davidsen wrote:
> > The programs which benefit from N:M are exactly those which don't behave
> > the way you describe. [...]
> 
> 90% of the programs that matter behave exactly like Larry has described.
> IO is the main source of blocking. Go and profile a busy webserver or
> mailserver or database server yourself if you dont believe it.

There are heavily-threaded programs out there that do not behave this
way, and for which a N:M thread model is completely appropriate.  For
example, simulation codes in operations research are most naturally
implemented as one thread per object being simulated, with virtually no
IO outside the simulation.  The vast majority of the computation time in
such a simulation is spent doing small amounts of work local to the
thread, then sending small messages to another thread via a FIFO, then
going to sleep waiting for more work.

Of course this can be (and frequently is) implemented such that there is
not one Pthreads thread per object; given simulation environments with 1
million objects, and the current crappy state of Pthreads
implementations, the researchers have no choice.

-andy


* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 20:36         ` Ingo Molnar
  2002-09-23 21:08           ` Peter Wächtler
@ 2002-09-23 23:57           ` Andy Isaacson
  2002-09-24 18:10             ` Christoph Hellwig
  1 sibling, 1 reply; 114+ messages in thread
From: Andy Isaacson @ 2002-09-23 23:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Waechtler, Larry McVoy, Bill Davidsen, linux-kernel, ingo Molnar

I hate big CC lists like this, but I don't know that everyone will see
this if I don't keep the CC list.  Sigh.

On Mon, Sep 23, 2002 at 10:36:28PM +0200, Ingo Molnar wrote:
> On Mon, 23 Sep 2002, Peter Waechtler wrote:
> > Getting into kernel is not the same as a context switch. Return EAGAIN
> > or EWOULDBLOCK is definetly _not_ causing a context switch.
> 
> this is a common misunderstanding. When switching from thread to thread in
> the 1:1 model, most of the cost comes from entering/exiting the kernel. So
> *once* we are in the kernel the cheapest way is not to piggyback to
> userspace to do some userspace context-switch - but to do it right in the
> kernel.
> 
> in the kernel we can do much higher quality scheduling decisions than in
> userspace. SMP affinity, various statistics are right available in
> kernel-space - userspace does not have any of that. Not to talk about
> preemption.

Excellent points, Ingo.  An alternative that I haven't seen considered
is the M:N threading model that NetBSD is adopting, called Scheduler
Activations.  The paper makes excellent reading.

http://web.mit.edu/nathanw/www/usenix/freenix-sa/freenix-sa.html

One advantage of a SA-style system is that the kernel automatically and
very cleanly has a lot of information about the job as a single unit,
for purposes such as signal delivery, scheduling decisions, (and if it
came to that) paging/swapping.  The original Linus-dogma (as I
understood it -- I may well be misrepresenting things here) is that "a
thread is a process, and that's all there is to it".  This has a lovely
clarity, but it ignores the fact that there are times when it's
*important* that the kernel know that "these N threads belong to a
single job".  It appears that the NPTL work is creating a new
"collection-of-threads" object, which will fulfill the role I mention
above...  and this isn't a lot different from the end result of Nathan
Williams' SA work.

Another advantage of keeping a "process" concept is that things like CSA
(Compatible System Accounting, nee Cray System Accounting) need to add
some overhead to process startup/teardown.  If a "thread" can be created
without creating a new "process", this overhead is not needlessly
present at thread-startup time.

-andy


* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 23:01               ` Bill Huey
@ 2002-09-23 23:11                 ` Mark Mielke
  2002-09-24  0:21                   ` Bill Huey
  0 siblings, 1 reply; 114+ messages in thread
From: Mark Mielke @ 2002-09-23 23:11 UTC (permalink / raw)
  To: Bill Huey
  Cc: Peter Wächtler, Ingo Molnar, Larry McVoy, Bill Davidsen, linux-kernel

On Mon, Sep 23, 2002 at 04:01:22PM -0700, Bill Huey wrote:
> On Mon, Sep 23, 2002 at 06:44:23PM -0400, Mark Mielke wrote:
> > Certainly the above descriptions are not fully accurate, or complete,
> > and it is possible that M:N threading would make a fair compromise
> > between OS threads and user-space threads. However, if user-space threads
> > require all this extra work, and M:N threads require some extra work,
> > some less work, and extra book-keeping and system calls, why couldn't
> > OS threads by themselves be more efficient?
> Crazy synchronization by non-web-server-like applications. Who knows. I
> personally can't think up a really clear example at this time since I don't
> do that kind of programming, but I'm sure concurrency experts can...
> I'm just not one of those people.

I do not find it to be profitable to discourage the people working on
this project. If they fail, nobody loses. If they succeed, they can
re-invent the math behind threading, and Linux ends up on the forefront
of operating systems offering the technology.

As for 'crazy synchronization', solutions such as the FUTEX have no
real negative aspects. It wasn't long ago that the FUTEX did not
exist. Why couldn't innovation make 'crazy synchronization by
non-web-server like applications' more efficient using kernel threads?

Concurrency experts would welcome the change. Concurrent 'experts'
would not welcome the change, as it would force them to have to
re-learn everything they know, effectively obsoleting their 'expert'
status. (note the difference between the unquoted, and the quoted...)

Cheers, and good luck...
mark

-- 
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 22:44             ` Mark Mielke
@ 2002-09-23 23:01               ` Bill Huey
  2002-09-23 23:11                 ` Mark Mielke
  0 siblings, 1 reply; 114+ messages in thread
From: Bill Huey @ 2002-09-23 23:01 UTC (permalink / raw)
  To: Mark Mielke
  Cc: Peter Wächtler, Ingo Molnar, Larry McVoy, Bill Davidsen,
	linux-kernel, Bill Huey (Hui)

On Mon, Sep 23, 2002 at 06:44:23PM -0400, Mark Mielke wrote:
> Think of it this way... two threads are blocked on different resources...
> The currently executing thread reaches a point where it blocks.
> 
>     OS threads:
>         1) thread#1 invokes a system call
>         2) OS switches tasks to thread#2 and returns from blocking
> 
>     user-space threads:
>         1) thread#1 invokes a system call
>         2) thread#1 returns from system call, EWOULDBLOCK

>         3) thread#1 invokes poll(), select(), ioctl() to determine state
>         4) thread#1 returns from system call

More like the UTS blocks the thread and waits for an IO upcall to notify
it of the change of state in the kernel. It's equivalent to a signal in
overhead, something like SIGIO or an async IO notification.

Delete 3 and 4. It's certainly much faster than select() and family.

>         5) thread#1 switches stack pointer to be thread#2 upon determination
>            that the resource thread#2 was waiting on is ready.

Right, then marks it running and runs it.

> Certainly the above descriptions are not fully accurate, or complete,
> and it is possible that M:N threading would make a fair compromise
> between OS threads and user-space threads. However, if user-space threads
> require all this extra work, and M:N threads require some extra work,
> some less work, and extra book-keeping and system calls, why couldn't
> OS threads by themselves be more efficient?

Crazy synchronization by non-web-server-like applications. Who knows. I
personally can't think up a really clear example at this time since I don't
do that kind of programming, but I'm sure concurrency experts can...

I'm just not one of those people.

bill



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 21:41       ` dean gaudet
  2002-09-23 22:10         ` Bill Huey
@ 2002-09-23 22:56         ` Mark Mielke
  1 sibling, 0 replies; 114+ messages in thread
From: Mark Mielke @ 2002-09-23 22:56 UTC (permalink / raw)
  To: dean gaudet
  Cc: Larry McVoy, Bill Davidsen, Peter Waechtler, linux-kernel, ingo Molnar

On Mon, Sep 23, 2002 at 02:41:33PM -0700, dean gaudet wrote:
> so while this is I/O, it's certainly less efficient to have thousands of
> tasks blocked in read(2) versus having thousands of entries in <pick your
> favourite poll/select/etc. mechanism>.

In terms of kernel memory, perhaps. In terms of 'efficiency', I
wouldn't be so sure. Java uses a whack of user-space storage to
represent threads regardless of whether they are green or native. The
only difference is: is Java calling poll()/select()/ioctl()
routinely? Or are the tasks sitting in an efficient kernel task queue?

Which has a better chance of being more efficient, in terms of dispatching,
(especially considering that most of the time, most java threads are idle),
and which has a better chance of being more efficient in terms of the
overhead of querying whether a task is ready to run? I lean towards the OS
on both counts.

mark

-- 
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 21:08           ` Peter Wächtler
@ 2002-09-23 22:44             ` Mark Mielke
  2002-09-23 23:01               ` Bill Huey
  0 siblings, 1 reply; 114+ messages in thread
From: Mark Mielke @ 2002-09-23 22:44 UTC (permalink / raw)
  To: Peter Wächtler; +Cc: Ingo Molnar, Larry McVoy, Bill Davidsen, linux-kernel

On Mon, Sep 23, 2002 at 11:08:53PM +0200, Peter Wächtler wrote:
> With 1:1 on "hitting" a blocking condition the kernel will
> switch to a different beast (yes, a thread gets a bonus for
> using the same MM and the same cpu).

> But on M:N the "user process" makes some more progress in its
> timeslice (does it even get punished for eating up its
> timeslice?) I would think that it tends to cause fewer context
> switches but tends to do more syscalls :-(

Think of it this way... two threads are blocked on different resources...
The currently executing thread reaches a point where it blocks.

    OS threads:
        1) thread#1 invokes a system call
        2) OS switches tasks to thread#2 and returns from blocking

    user-space threads:
        1) thread#1 invokes a system call
        2) thread#1 returns from system call, EWOULDBLOCK
        3) thread#1 invokes poll(), select(), ioctl() to determine state
        4) thread#1 returns from system call
        5) thread#1 switches stack pointer to be thread#2 upon determination
           that the resource thread#2 was waiting on is ready.

Certainly the above descriptions are not fully accurate, or complete,
and it is possible that M:N threading would make a fair compromise
between OS threads and user-space threads. However, if user-space threads
require all this extra work, and M:N threads require some extra work,
some less work, and extra book-keeping and system calls, why couldn't
OS threads by themselves be more efficient?

mark

-- 
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/



* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 19:48       ` Bill Davidsen
  2002-09-23 20:32         ` Ingo Molnar
@ 2002-09-23 22:35         ` Mark Mielke
  1 sibling, 0 replies; 114+ messages in thread
From: Mark Mielke @ 2002-09-23 22:35 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Larry McVoy, Peter Waechtler, linux-kernel, ingo Molnar

On Mon, Sep 23, 2002 at 03:48:58PM -0400, Bill Davidsen wrote:
> On Mon, 23 Sep 2002, Larry McVoy wrote:
> > Sure, there are lotso benchmarks which show how fast user level threads
> > can context switch amongst each other and it is always faster than going
> > into the kernel.  So what?  What do you think causes a context switch in
> > a threaded program?  What?  Could it be blocking on I/O?  Like 99.999%
> > of the time?  And doesn't that mean you already went into the kernel to
> > see if the I/O was ready?  And doesn't that mean that in all the real
> > world applications they are already doing all the work you are arguing
> > to avoid?
> Actually you have it just backward. Let me try to explain how this works.
> The programs which benefit from N:M are exactly those which don't behave
> the way you describe. Think of programs using locking to access shared
> memory, or other fast resources which don't require a visit to the kernel.
> It would seem that the switch could be done much faster without the
> transition into and out of the kernel.

For operating systems that require cross-process locks to always be
kernel ops, yes. For operating systems that provide _any_ way for most
cross-process locks to be performed completely in user space (i.e. FUTEX),
the entire argument very quickly disappears.

Is there a situation you can think of that requires M:N threading
because accessing user space is cheaper than accessing kernel space?
What this really means is that the design of the kernel-space
primitives is not optimal, and that a potentially better solution, one
that would benefit far more people, would be to redesign the kernel
primitives. (i.e. FUTEX)

> Looking for data before forming an opinion has always seemed to be
> reasonable, and the way design decisions are usually made in Linux, based
> on the performance of actual code. The benchmark numbers reports are
> encouraging, but actual production loads may not show the huge improvement
> seen in the benchmarks. And I don't think anyone is implying that they
> will.

You say that people should look to data before forming an opinion, but
you also say that benchmarks mean little and you *suspect* real loads may
be different. It seems to me that you might take your own advice, and
use 'real data' before reaching your own conclusion.

> Given how small the overhead of threading is on a typical i/o bound
> application such as you mentioned, I'm not sure the improvement will be
> above the noise. The major improvement from NGPT is not performance in
> many cases, but elimination of unexpected application behaviour.

Many people would argue that threading overhead has been traditionally
quite high. They would have 'real data' to substantiate their claims.

> When someone responds to a technical question with an attack on the
> question instead of a technical response I always wonder why. In this case
> other people have provided technical feedback and I'm sure we will see
> some actual application numbers in a short time. I have an IPC benchmark
> I'd like to try if I could get any of my test servers to boot a recent
> kernel :-(

I've always considered 1:1 to be an optimal model, but an unreachable
model, like cold fusion. :-)

If the kernel can manage the tasks such that they can be very quickly
switched between queues, and the run queue can be minimized to
contain only tasks that need to run, or that have a very high
probability of needing to run, and if operations such as locks can be
done, at least in the common case, completely in user space, there
is no reason why 1:1 could not be better than M:N.

There _are_ reasons why OS threads could be better than user space
threads, and the reasons all relate to threads that do actual work.

The line between 1:1 and M:N is artificially bold. M:N is a necessity
where 1:1 is inefficient. Where 1:1 is efficient, M:N ceases to be a
necessity.

mark

-- 
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 21:41       ` dean gaudet
@ 2002-09-23 22:10         ` Bill Huey
  2002-09-23 22:56         ` Mark Mielke
  1 sibling, 0 replies; 114+ messages in thread
From: Bill Huey @ 2002-09-23 22:10 UTC (permalink / raw)
  To: dean gaudet
  Cc: Larry McVoy, Bill Davidsen, Peter Waechtler, linux-kernel,
	ingo Molnar, Bill Huey (Hui)

On Mon, Sep 23, 2002 at 02:41:33PM -0700, dean gaudet wrote:
> so while this is I/O, it's certainly less efficient to have thousands of
> tasks blocked in read(2) versus having thousands of entries in <pick your
> favourite poll/select/etc. mechanism>.

NIO in the recent 1.4 J2SE solves this problem now and threads don't have to
be abused.

bill


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 15:30     ` Larry McVoy
                         ` (3 preceding siblings ...)
  2002-09-23 21:32       ` Bill Huey
@ 2002-09-23 21:41       ` dean gaudet
  2002-09-23 22:10         ` Bill Huey
  2002-09-23 22:56         ` Mark Mielke
  2002-09-24 10:02       ` Nikita Danilov
  5 siblings, 2 replies; 114+ messages in thread
From: dean gaudet @ 2002-09-23 21:41 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Bill Davidsen, Peter Waechtler, linux-kernel, ingo Molnar

On Mon, 23 Sep 2002, Larry McVoy wrote:

> What do you think causes a context switch in
> a threaded program?  What?  Could it be blocking on I/O?

unfortunately java was originally designed with a thread-per-connection
model as the *only* method of implementing servers.  there wasn't a
non-blocking network API ... and i hear that such an API is in the works,
but i've no idea where it is yet.

so while this is I/O, it's certainly less efficient to have thousands of
tasks blocked in read(2) versus having thousands of entries in <pick your
favourite poll/select/etc. mechanism>.

this is a java problem though... i posted a jvm straw-man proposal years
ago when IBM posted some "linux threading isn't efficient" paper.  since
java threads are way less painful to implement than pthreads, i suggested
the jvm do the M part of M:N.

-dean


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 15:30     ` Larry McVoy
                         ` (2 preceding siblings ...)
  2002-09-23 19:59       ` Peter Waechtler
@ 2002-09-23 21:32       ` Bill Huey
  2002-09-23 21:41       ` dean gaudet
  2002-09-24 10:02       ` Nikita Danilov
  5 siblings, 0 replies; 114+ messages in thread
From: Bill Huey @ 2002-09-23 21:32 UTC (permalink / raw)
  To: Larry McVoy
  Cc: Larry McVoy, Bill Davidsen, Peter Waechtler, linux-kernel,
	ingo Molnar, Bill Huey (Hui)

On Mon, Sep 23, 2002 at 08:30:04AM -0700, Larry McVoy wrote:
> > No matter how fast you do context switch in and out of kernel and a sched
> > to see what runs next, it can't be done as fast as it can be avoided.
> 
> You are arguing about how many angels can dance on the head of a pin.
> Sure, there are lotso benchmarks which show how fast user level threads
> can context switch amongst each other and it is always faster than going
> into the kernel.  So what?  What do you think causes a context switch in
> a threaded program?  What?  Could it be blocking on I/O?  Like 99.999%

That's just for traditional Unix applications, which are only one category.
You exclude CPU-intensive applications in that criticism, media-related
and otherwise. What about cases where you need to balance a large data
structure across a large number of threads, or something like that?

> of the time?  And doesn't that mean you already went into the kernel to
> see if the I/O was ready?  And doesn't that mean that in all the real
> world applications they are already doing all the work you are arguing
> to avoid?

IO isn't the only thing that's event driven. What about event-driven
systems that depend on a fast condition variable? That's very cheap in
a UTS (userspace thread system): 2 context switches, a call to the
thread kernel to dequeue a waiter, and releasing/acquiring some very
lightweight userspace locks. Difficult to beat, if you think about it.

So that level of confidence in 1:1 is intuitively presumptuous, for
those reasons.

But if your architecture is broken or exotic... then it gets more complicated ;)

bill


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 10:05   ` Bill Davidsen
  2002-09-23 11:55     ` Peter Waechtler
  2002-09-23 15:30     ` Larry McVoy
@ 2002-09-23 21:22     ` Bill Huey
  2 siblings, 0 replies; 114+ messages in thread
From: Bill Huey @ 2002-09-23 21:22 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Larry McVoy, Peter Waechtler, linux-kernel, ingo Molnar, Bill Huey (Hui)

On Mon, Sep 23, 2002 at 06:05:18AM -0400, Bill Davidsen wrote:
> And BSD? And Solaris?

A buddy of mine and I ran lmbench on NetBSD 1.6 (ppc) and a recent version
of Linux (same machine) and found that NetBSD was about 2x slower than
Linux for context switching. That's really not that bad, considering that
it was worse at one point. It might affect things like inter-process pipe
communication performance, but it's not outside of reasonability to use
a 1:1 system in that case.

BTW, NetBSD is moving to a scheduler activations threading system and
they have some preliminary stuff in the works and working. ;)

> > If that's a true statement, and in Linux it tends to be far more true
> > than other operating systems, then there is no reason to have M:N.
> 
> No matter how fast you do context switch in and out of kernel and a sched
> to see what runs next, it can't be done as fast as it can be avoided.
> Being N:M doesn't mean all implementations must be faster, just that doing
> it all in user mode CAN be faster.

Unless you have a broken architecture like the x86. The FPU in that case
can be problematic, and folks were playing around with adding a syscall
to query the status of the FPU. Then things might be more even, but...
it is also unclear how these variables are going to play out.

> Benchmarks are nice, I await results from a loaded production threaded
> DNS/mail/web/news/database server. Well, I guess production and 2.5 don't
> really go together, do they, but maybe some experimental site which could
> use 2.5 long enough to get numbers. If you could get a threaded database
> to run, that would be a good test of shared resources rather than a bunch
> of independent activities doing i/o. 

I think that would be interesting too.

bill


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 20:36         ` Ingo Molnar
@ 2002-09-23 21:08           ` Peter Wächtler
  2002-09-23 22:44             ` Mark Mielke
  2002-09-23 23:57           ` Andy Isaacson
  1 sibling, 1 reply; 114+ messages in thread
From: Peter Wächtler @ 2002-09-23 21:08 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Larry McVoy, Bill Davidsen, linux-kernel

Ingo Molnar schrieb:
> 
> On Mon, 23 Sep 2002, Peter Waechtler wrote:
> 
> > Getting into kernel is not the same as a context switch. Return EAGAIN
> > or EWOULDBLOCK is definetly _not_ causing a context switch.
> 
> this is a common misunderstanding. When switching from thread to thread in
> the 1:1 model, most of the cost comes from entering/exiting the kernel. So
> *once* we are in the kernel the cheapest way is not to piggyback to
> userspace to do some userspace context-switch - but to do it right in the
> kernel.
> 
> in the kernel we can do much higher quality scheduling decisions than in
> userspace. SMP affinity, various statistics are right available in
> kernel-space - userspace does not have any of that. Not to talk about
> preemption.
> 

I'm already almost convinced of the NPTL way of doing threading.
But still: the timeslice is per process (and kernel thread).
You still have other processes running.
With 1:1, on "hitting" a blocking condition the kernel will
switch to a different beast (yes, a thread gets a bonus for
using the same MM and the same cpu).
But on M:N the "user process" makes some more progress in its
timeslice (does it even get punished for eating up its
timeslice?). I would think that it tends to cause fewer context
switches but tends to do more syscalls :-(

I already had a closer look at NGPT before reading Ulrich's
comments on the phil-list and on his website. I already thought
"puh, that's a complicated beast", and then I saw the
fcntl(GETFL);fcntl(O_NONBLOCK);write();fcntl(oldflags); thingy..

Well, with an O(1) scheduler and faster thread creation and exit,
NPTL has a good chance of performing faster.

Now I'm just curious about the argument about context switch
times. Is Linux really that much faster than Solaris, Irix etc.?

Do you have numbers (or a hint) on comparable (ideal: identical) 
hardware? Is LMbench a good starting point?

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-22 18:55 Peter Waechtler
  2002-09-22 21:32 ` Larry McVoy
@ 2002-09-23 21:03 ` Bill Huey
  2002-09-24 12:03   ` Michael Sinz
  2002-09-24 20:19 ` David Schwartz
  2 siblings, 1 reply; 114+ messages in thread
From: Bill Huey @ 2002-09-23 21:03 UTC (permalink / raw)
  To: Peter Waechtler; +Cc: linux-kernel, ingo Molnar, Bill Huey (Hui)

On Sun, Sep 22, 2002 at 08:55:39PM +0200, Peter Waechtler wrote:
> AIX and Irix deploy M:N - I guess for a good reason: it's more
> flexible and combine both approaches with easy runtime tuning if
> the app happens to run on SMP (the uncommon case).

Also, for process-scoped scheduling, so that system-wide threads
don't have an impact on a process's slice. Folks have piped up about
that being important.

bill


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 19:59       ` Peter Waechtler
@ 2002-09-23 20:36         ` Ingo Molnar
  2002-09-23 21:08           ` Peter Wächtler
  2002-09-23 23:57           ` Andy Isaacson
  0 siblings, 2 replies; 114+ messages in thread
From: Ingo Molnar @ 2002-09-23 20:36 UTC (permalink / raw)
  To: Peter Waechtler; +Cc: Larry McVoy, Bill Davidsen, linux-kernel, ingo Molnar


On Mon, 23 Sep 2002, Peter Waechtler wrote:

> Getting into kernel is not the same as a context switch. Return EAGAIN
> or EWOULDBLOCK is definetly _not_ causing a context switch.

this is a common misunderstanding. When switching from thread to thread in
the 1:1 model, most of the cost comes from entering/exiting the kernel. So
*once* we are in the kernel the cheapest way is not to piggyback to
userspace to do some userspace context-switch - but to do it right in the
kernel.

in the kernel we can do much higher quality scheduling decisions than in
userspace. SMP affinity, various statistics are right available in
kernel-space - userspace does not have any of that. Not to talk about
preemption.

	Ingo


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 19:48       ` Bill Davidsen
@ 2002-09-23 20:32         ` Ingo Molnar
  2002-09-24  0:03           ` Andy Isaacson
  2002-09-24  7:12           ` Thunder from the hill
  2002-09-23 22:35         ` Mark Mielke
  1 sibling, 2 replies; 114+ messages in thread
From: Ingo Molnar @ 2002-09-23 20:32 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Larry McVoy, Peter Waechtler, linux-kernel, Ingo Molnar


On Mon, 23 Sep 2002, Bill Davidsen wrote:

> The programs which benefit from N:M are exactly those which don't behave
> the way you describe. [...]

90% of the programs that matter behave exactly like Larry has described.
IO is the main source of blocking. Go and profile a busy webserver or
mailserver or database server yourself if you don't believe it.

> [...] Think of programs using locking to access shared memory, or other
> fast resources which don't require a visit to the kernel. [...]

oh - actually, such things are quite rare it turns out. And even if it
happens, the 1:1 model is handling this perfectly fine via futexes, as
long as the contention of the shared resource is light. Which it better be
...

any application with heavy contention over some global shared resource is
serializing itself already and has much bigger problems than that of the
threading model ... Its performance will be bad both under M:N and 1:1
models - think about it.

so a threading abstraction must concentrate on what really matters:  
performing actual useful tasks - most of those tasks involve the use of
some resource, block IO, network IO, user IO - each of them involve entry
into the kernel - at which point the 1:1 design fits much better.

(and all your followup arguments are void due to this basic
misunderstanding.)

	Ingo


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 15:30     ` Larry McVoy
  2002-09-23 19:44       ` Olivier Galibert
  2002-09-23 19:48       ` Bill Davidsen
@ 2002-09-23 19:59       ` Peter Waechtler
  2002-09-23 20:36         ` Ingo Molnar
  2002-09-23 21:32       ` Bill Huey
                         ` (2 subsequent siblings)
  5 siblings, 1 reply; 114+ messages in thread
From: Peter Waechtler @ 2002-09-23 19:59 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Bill Davidsen, linux-kernel, ingo Molnar


Am Montag den, 23. September 2002, um 17:30, schrieb Larry McVoy:

>>> Instead of taking the traditional "we've screwed up the normal system
>>> primitives so we'll event new lightweight ones" try this:
>>>
>>> We depend on the system primitives to not be broken or slow.
>>>
>>> If that's a true statement, and in Linux it tends to be far more true
>>> than other operating systems, then there is no reason to have M:N.
>>
>> No matter how fast you do context switch in and out of kernel and a 
>> sched
>> to see what runs next, it can't be done as fast as it can be avoided.
>
> You are arguing about how many angels can dance on the head of a pin.
> Sure, there are lotso benchmarks which show how fast user level threads
> can context switch amongst each other and it is always faster than going
> into the kernel.  So what?  What do you think causes a context switch in
> a threaded program?  What?  Could it be blocking on I/O?  Like 99.999%
> of the time?  And doesn't that mean you already went into the kernel to
> see if the I/O was ready?  And doesn't that mean that in all the real
> world applications they are already doing all the work you are arguing
> to avoid?

Getting into the kernel is not the same as a context switch.
Returning EAGAIN or EWOULDBLOCK is definitely _not_ causing a context switch.

Is sys_getpid() causing a context switch? Unlikely.
Do you know what blocking IO means?  M:N is about avoiding blocking IO!


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 15:30     ` Larry McVoy
  2002-09-23 19:44       ` Olivier Galibert
@ 2002-09-23 19:48       ` Bill Davidsen
  2002-09-23 20:32         ` Ingo Molnar
  2002-09-23 22:35         ` Mark Mielke
  2002-09-23 19:59       ` Peter Waechtler
                         ` (3 subsequent siblings)
  5 siblings, 2 replies; 114+ messages in thread
From: Bill Davidsen @ 2002-09-23 19:48 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Peter Waechtler, linux-kernel, ingo Molnar

On Mon, 23 Sep 2002, Larry McVoy wrote:

> > No matter how fast you do context switch in and out of kernel and a sched
> > to see what runs next, it can't be done as fast as it can be avoided.
> 
> You are arguing about how many angels can dance on the head of a pin.

Then you have sadly misunderstood the discussion.

> Sure, there are lotso benchmarks which show how fast user level threads
> can context switch amongst each other and it is always faster than going
> into the kernel.  So what?  What do you think causes a context switch in
> a threaded program?  What?  Could it be blocking on I/O?  Like 99.999%
> of the time?  And doesn't that mean you already went into the kernel to
> see if the I/O was ready?  And doesn't that mean that in all the real
> world applications they are already doing all the work you are arguing
> to avoid?

Actually you have it just backward. Let me try to explain how this works.
The programs which benefit from N:M are exactly those which don't behave
the way you describe. Think of programs using locking to access shared
memory, or other fast resources which don't require a visit to the kernel.
It would seem that the switch could be done much faster without the
transition into and out of the kernel.

Looking for data before forming an opinion has always seemed to be
reasonable, and the way design decisions are usually made in Linux, based
on the performance of actual code. The benchmark numbers reported are
encouraging, but actual production loads may not show the huge improvement
seen in the benchmarks. And I don't think anyone is implying that they
will.

Given how small the overhead of threading is on a typical i/o bound
application such as you mentioned, I'm not sure the improvement will be
above the noise. The major improvement from NGPT is not performance in
many cases, but elimination of unexpected application behaviour.

When someone responds to a technical question with an attack on the
question instead of a technical response I always wonder why. In this case
other people have provided technical feedback and I'm sure we will see
some actual application numbers in a short time. I have an IPC benchmark
I'd like to try if I could get any of my test servers to boot a recent
kernel :-(

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 15:30     ` Larry McVoy
@ 2002-09-23 19:44       ` Olivier Galibert
  2002-09-23 19:48       ` Bill Davidsen
                         ` (4 subsequent siblings)
  5 siblings, 0 replies; 114+ messages in thread
From: Olivier Galibert @ 2002-09-23 19:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: Larry McVoy, Bill Davidsen, Larry McVoy, Peter Waechtler, ingo Molnar

On Mon, Sep 23, 2002 at 08:30:04AM -0700, Larry McVoy wrote:
> What do you think causes a context switch in
> a threaded program?  What?  Could it be blocking on I/O?  Like 99.999%
> of the time?  And doesn't that mean you already went into the kernel to
> see if the I/O was ready?  And doesn't that mean that in all the real
> world applications they are already doing all the work you are arguing
> to avoid?

I suspect a fair number of cases are preemption too, when you fire up
computation threads in the background.  Of course, the preemption
event always goes through the kernel at some point, even if it's only
a SIGALRM.

Actually, in normal programs (even java ones), _when_ is a thread
voluntarily giving up control?  Locks?

  OG.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 11:55     ` Peter Waechtler
@ 2002-09-23 19:14       ` Bill Davidsen
  2002-09-29 23:26         ` Buddy Lumpkin
  0 siblings, 1 reply; 114+ messages in thread
From: Bill Davidsen @ 2002-09-23 19:14 UTC (permalink / raw)
  To: Peter Waechtler; +Cc: Larry McVoy, linux-kernel, ingo Molnar

On Mon, 23 Sep 2002, Peter Waechtler wrote:

> Am Montag den, 23. September 2002, um 12:05, schrieb Bill Davidsen:
> 
> > On Sun, 22 Sep 2002, Larry McVoy wrote:
> >
> >> On Sun, Sep 22, 2002 at 08:55:39PM +0200, Peter Waechtler wrote:
> >>> AIX and Irix deploy M:N - I guess for a good reason: it's more
> >>> flexible and combine both approaches with easy runtime tuning if
> >>> the app happens to run on SMP (the uncommon case).
> >>
> >> No, AIX and IRIX do it that way because their processes are so bloated
> >> that it would be unthinkable to do a 1:1 model.
> >
> > And BSD? And Solaris?
> 
> Don't know. I don't have access to all those Unices. I could try FreeBSD.

At your convenience.
 
> According to http://www.kegel.com/c10k.html  Sun is moving to 1:1
> and FreeBSD still believes in M:N

Sun is total news to me, "moving to" may be in Solaris 9, Sol8 seems to
still be N:M. BSD is as I thought.
> 
> MacOSX 10.1 does not support PROCESS_SHARED locks, tried that 5 minutes 
> ago.

Thank you for the effort. Hum, that's a bit of a surprise, at least to me. 

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
@ 2002-09-23 16:36 Matthias Urlichs
  0 siblings, 0 replies; 114+ messages in thread
From: Matthias Urlichs @ 2002-09-23 16:36 UTC (permalink / raw)
  To: linux-kernel

Peter Waechtler:
>  [ unattributed -- please don't discard attributions ]

>  Having multiple threads doing real work including IO means more
>  blocking IO and therefore more context switches.

On the other hand, having to multiplex in userspace requires calls to 
poll() et al, _and_ explicitly handling state which the kernel needs 
to handle anyway -- including locking and all that crap.

Given that an efficient and fairly-low-cost 1:1 implementation is 
demonstrably possible ;-) the necessity to do any kind of n:m work 
strikes me as extremely low.
-- 
Matthias Urlichs      http://smurf.noris.de     ICQ:20193661    AIM:smurfixx

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 10:05   ` Bill Davidsen
  2002-09-23 11:55     ` Peter Waechtler
@ 2002-09-23 15:30     ` Larry McVoy
  2002-09-23 19:44       ` Olivier Galibert
                         ` (5 more replies)
  2002-09-23 21:22     ` Bill Huey
  2 siblings, 6 replies; 114+ messages in thread
From: Larry McVoy @ 2002-09-23 15:30 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Larry McVoy, Peter Waechtler, linux-kernel, ingo Molnar

> > Instead of taking the traditional "we've screwed up the normal system 
> > primitives so we'll event new lightweight ones" try this:
> > 
> > We depend on the system primitives to not be broken or slow.
> > 
> > If that's a true statement, and in Linux it tends to be far more true
> > than other operating systems, then there is no reason to have M:N.
> 
> No matter how fast you do context switch in and out of kernel and a sched
> to see what runs next, it can't be done as fast as it can be avoided.

You are arguing about how many angels can dance on the head of a pin.
Sure, there are lotso benchmarks which show how fast user level threads
can context switch amongst each other and it is always faster than going
into the kernel.  So what?  What do you think causes a context switch in
a threaded program?  What?  Could it be blocking on I/O?  Like 99.999%
of the time?  And doesn't that mean you already went into the kernel to
see if the I/O was ready?  And doesn't that mean that in all the real
world applications they are already doing all the work you are arguing
to avoid?
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-23 10:05   ` Bill Davidsen
@ 2002-09-23 11:55     ` Peter Waechtler
  2002-09-23 19:14       ` Bill Davidsen
  2002-09-23 15:30     ` Larry McVoy
  2002-09-23 21:22     ` Bill Huey
  2 siblings, 1 reply; 114+ messages in thread
From: Peter Waechtler @ 2002-09-23 11:55 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Larry McVoy, linux-kernel, ingo Molnar

Am Montag den, 23. September 2002, um 12:05, schrieb Bill Davidsen:

> On Sun, 22 Sep 2002, Larry McVoy wrote:
>
>> On Sun, Sep 22, 2002 at 08:55:39PM +0200, Peter Waechtler wrote:
>>> AIX and Irix deploy M:N - I guess for a good reason: it's more
>>> flexible and combine both approaches with easy runtime tuning if
>>> the app happens to run on SMP (the uncommon case).
>>
>> No, AIX and IRIX do it that way because their processes are so bloated
>> that it would be unthinkable to do a 1:1 model.
>
> And BSD? And Solaris?

Don't know. I don't have access to all those Unices. I could try FreeBSD.

According to http://www.kegel.com/c10k.html  Sun is moving to 1:1
and FreeBSD still believes in M:N

MacOSX 10.1 does not support PROCESS_SHARED locks, tried that 5 minutes 
ago.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-22 21:32 ` Larry McVoy
@ 2002-09-23 10:05   ` Bill Davidsen
  2002-09-23 11:55     ` Peter Waechtler
                       ` (2 more replies)
  0 siblings, 3 replies; 114+ messages in thread
From: Bill Davidsen @ 2002-09-23 10:05 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Peter Waechtler, linux-kernel, ingo Molnar

On Sun, 22 Sep 2002, Larry McVoy wrote:

> On Sun, Sep 22, 2002 at 08:55:39PM +0200, Peter Waechtler wrote:
> > AIX and Irix deploy M:N - I guess for a good reason: it's more
> > flexible and combine both approaches with easy runtime tuning if
> > the app happens to run on SMP (the uncommon case).
> 
> No, AIX and IRIX do it that way because their processes are so bloated
> that it would be unthinkable to do a 1:1 model.

And BSD? And Solaris?
 
> Instead of taking the traditional "we've screwed up the normal system 
> primitives so we'll event new lightweight ones" try this:
> 
> We depend on the system primitives to not be broken or slow.
> 
> If that's a true statement, and in Linux it tends to be far more true
> than other operating systems, then there is no reason to have M:N.

No matter how fast you do context switch in and out of kernel and a sched
to see what runs next, it can't be done as fast as it can be avoided.
Being N:M doesn't mean all implementations must be faster, just that doing
it all in user mode CAN be faster.

Benchmarks are nice, I await results from a loaded production threaded
DNS/mail/web/news/database server. Well, I guess production and 2.5 don't
really go together, do they, but maybe some experimental site which could
use 2.5 long enough to get numbers. If you could get a threaded database
to run, that would be a good test of shared resources rather than a bunch
of independent activities doing i/o. 

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
  2002-09-22 18:55 Peter Waechtler
@ 2002-09-22 21:32 ` Larry McVoy
  2002-09-23 10:05   ` Bill Davidsen
  2002-09-23 21:03 ` Bill Huey
  2002-09-24 20:19 ` David Schwartz
  2 siblings, 1 reply; 114+ messages in thread
From: Larry McVoy @ 2002-09-22 21:32 UTC (permalink / raw)
  To: Peter Waechtler; +Cc: linux-kernel, ingo Molnar

On Sun, Sep 22, 2002 at 08:55:39PM +0200, Peter Waechtler wrote:
> AIX and Irix deploy M:N - I guess for a good reason: it's more
> flexible and combine both approaches with easy runtime tuning if
> the app happens to run on SMP (the uncommon case).

No, AIX and IRIX do it that way because their processes are so bloated
that it would be unthinkable to do a 1:1 model.

Instead of taking the traditional "we've screwed up the normal system 
primitives so we'll invent new lightweight ones" approach, try this:

We depend on the system primitives to not be broken or slow.

If that's a true statement, and in Linux it tends to be far more true
than other operating systems, then there is no reason to have M:N.
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
@ 2002-09-22 18:55 Peter Waechtler
  2002-09-22 21:32 ` Larry McVoy
                   ` (2 more replies)
  0 siblings, 3 replies; 114+ messages in thread
From: Peter Waechtler @ 2002-09-22 18:55 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar

 > The true cost of M:N shows up when threading is actually used
 > for what it's intended to be used :-)

 > M:N's big mistake is that it concentrates on what
 > matters the least: user-to-user context switches.

Well, from the perspective of the kernel, userspace is a black box.
Is that also true for kernel developers?

If you, as an application engineer, decide on a multithreaded
design, it is either because a) you want to learn, or b) you have
some good reasons to choose it.

Having multiple threads doing real work, including IO, means more
blocking IO and therefore more context switches. One reason to
choose threading is to _not_ have to use select/poll in application
code. If you gather more IO requests and multiplex them with
select/poll, the chances are higher that the syscall returns without
a context switch. So you _save_ some real context switches by
replacing them with user-to-user context switches.
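The select/poll multiplexing described above can be sketched in a few
lines of C. This is my illustration, not code from the thread; the
helper name wait_readable is hypothetical:

```c
#include <poll.h>
#include <unistd.h>

/* Wait until at least one of nfds file descriptors is readable,
 * using a single poll() syscall instead of blocking one thread
 * per fd.  If data is already pending, poll() returns immediately,
 * so no context switch into a sleeping state is needed at all. */
static int wait_readable(const int *fds, int nfds, int timeout_ms)
{
    struct pollfd pfd[16];          /* sketch: assume nfds <= 16 */
    for (int i = 0; i < nfds; i++) {
        pfd[i].fd = fds[i];
        pfd[i].events = POLLIN;
        pfd[i].revents = 0;
    }
    /* Returns the number of ready fds, 0 on timeout, -1 on error. */
    return poll(pfd, nfds, timeout_ms);
}
```

An M:N library can use exactly this trick internally: park user-level
threads on an event loop and only pay a real kernel context switch when
nothing is ready.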

Don't make the mistake of thinking too much about the optimal case
(as Linus told us: optimize for the _common_ case :)

You think that one should have an almost equal number of threads
and processors. That is unrealistic, apart from some server apps
running on 4- or 8-way systems. Under that assumption nobody would
write a multithreaded desktop app (>90% of desktops are UP).

The effect of M:N on UP systems should be even clearer. Your
multithreaded apps can't profit from parallelism, but they also don't
add load to the system scheduler. The drawback: more syscalls
(I am thinking about removing the need for the
flags=fcntl(fd,F_GETFL); fcntl(fd,F_SETFL,flags|O_NONBLOCK);
write(fd,...); fcntl(fd,F_SETFL,flags) dance)
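Spelled out, the fcntl dance referred to above looks roughly like the
following. This is my sketch of the pattern, not code from the thread;
the helper name nonblocking_write is hypothetical, and error handling
is deliberately minimal:

```c
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

/* Temporarily switch an fd to non-blocking mode around a write(),
 * so a user-level thread library can avoid putting the whole
 * process to sleep in the kernel.  These are the extra syscalls
 * (two fcntl()s per IO operation) the paragraph above complains
 * about. */
static ssize_t nonblocking_write(int fd, const void *buf, size_t len)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    ssize_t n = write(fd, buf, len);   /* may fail with EAGAIN */
    int saved = errno;
    fcntl(fd, F_SETFL, flags);         /* restore original flags */
    errno = saved;
    return n;
}
```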

Until we have some numbers we can't say which approach is better.
I'm convinced that apps exist that run better on one and others
on the other.

AIX and Irix deploy M:N - I guess for a good reason: it's more
flexible and combine both approaches with easy runtime tuning if
the app happens to run on SMP (the uncommon case).

Your great work on the scheduler and the exit-path tuning is highly
appreciated. Both models profit - of course 1:1 much more.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [ANNOUNCE] Native POSIX Thread Library 0.1
@ 2002-09-20  7:46 Joerg Pommnitz
  0 siblings, 0 replies; 114+ messages in thread
From: Joerg Pommnitz @ 2002-09-20  7:46 UTC (permalink / raw)
  To: linux-kernel

Linus Torvalds wrote:
 > They started and waited for 100,000 threads.
 > 
 > They did not have them all running at the same time. I think the
 > original post said something like "up to 50 at a time".

To quote Ulrich:

"Even on IA-32 with its limited address space and memory handling
 running 100,000 concurrent threads was no problem at all, creating
 and destroying the threads did not take more than two seconds."

It clearly states 100,000 CONCURRENT threads. So, it really seems to
work (not that I have the hardware to verify this claim).
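The kind of test being argued about here is easy to sketch with plain
pthreads. This is my illustration of the create-and-join benchmark, not
Ulrich's actual test code; the helper name run_threads is hypothetical,
and the count is kept far below 100,000:

```c
#include <pthread.h>
#include <stdlib.h>

static void *worker(void *arg)
{
    (void)arg;          /* the benchmark measures create/join cost,
                           so the threads do no real work */
    return NULL;
}

/* Create n threads, then join them all.  Returns the number of
 * threads successfully created and joined. */
static int run_threads(int n)
{
    pthread_t *t = malloc((size_t)n * sizeof *t);
    int created = 0, joined = 0;

    if (t == NULL)
        return 0;
    for (int i = 0; i < n; i++) {
        if (pthread_create(&t[i], NULL, worker, NULL) != 0)
            break;
        created++;
    }
    for (int i = 0; i < created; i++)
        if (pthread_join(t[i], NULL) == 0)
            joined++;
    free(t);
    return joined;
}
```

Whether all n threads ever run concurrently depends on how fast they
exit relative to creation, which is exactly the ambiguity Linus and
Joerg are debating.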

Regards
  Jörg


^ permalink raw reply	[flat|nested] 114+ messages in thread

end of thread, other threads:[~2002-09-30 14:49 UTC | newest]

Thread overview: 114+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-09-20  0:41 [ANNOUNCE] Native POSIX Thread Library 0.1 Ulrich Drepper
2002-09-20  0:51 ` William Lee Irwin III
2002-09-20  1:35   ` Ulrich Drepper
2002-09-20  1:42     ` William Lee Irwin III
2002-09-20  1:56 ` Larry McVoy
2002-09-20  2:01 ` Rik van Riel
2002-09-20  2:15   ` Benjamin LaHaise
2002-09-20  2:40     ` Dave Hansen
2002-09-20  2:47     ` William Lee Irwin III
2002-09-20  2:17   ` Larry McVoy
2002-09-20  2:24     ` Rik van Riel
2002-09-20  2:32       ` Ulrich Drepper
2002-09-20  6:01       ` Linus Torvalds
2002-09-20  8:02         ` Ingo Molnar
2002-09-20  2:23   ` Anton Blanchard
2002-09-20  7:52   ` 100,000 threads? [was: [ANNOUNCE] Native POSIX Thread Library 0.1] Ingo Molnar
2002-09-20 15:47     ` Bill Davidsen
2002-09-20  9:53 ` [ANNOUNCE] Native POSIX Thread Library 0.1 Padraig Brady
2002-09-20 13:28   ` Robert Love
2002-09-20 16:01     ` Bill Davidsen
2002-09-20  9:54 ` Adrian Bunk
2002-09-20 10:53   ` Ingo Molnar
2002-09-20 19:04   ` Ulrich Drepper
2002-09-20 23:06     ` J.A. Magallon
2002-09-20 23:33       ` Ulrich Drepper
2002-09-20 23:42         ` J.A. Magallon
2002-09-20 10:20 ` Bill Huey
2002-09-20 10:47   ` Ingo Molnar
2002-09-20 12:06     ` Bill Huey
2002-09-20 16:20       ` Ingo Molnar
2002-09-20 21:50         ` Bill Huey
2002-09-20 22:30           ` dean gaudet
2002-09-20 23:11             ` Bill Huey
2002-09-21  3:38               ` dean gaudet
2002-09-21  4:01                 ` Bill Huey
2002-09-21  5:06                   ` Ingo Molnar
2002-09-20 23:45           ` Bill Huey
2002-09-21  4:58             ` Ingo Molnar
2002-09-22  2:51               ` Bill Huey
2002-09-21  4:48           ` Ingo Molnar
2002-09-22  1:38             ` Bill Huey
2002-09-22 13:38           ` Bill Davidsen
2002-09-22 18:41             ` Eric W. Biederman
2002-09-22 22:13               ` dean gaudet
2002-09-26 17:21                 ` Alan Cox
2002-09-23  0:11               ` Bill Huey
2002-09-24 16:07                 ` Eric W. Biederman
2002-09-24 23:21                   ` Bill Huey
2002-09-25  3:06                     ` Eric W. Biederman
2002-09-23 21:12             ` Bill Huey
2002-09-20 10:35 ` Luca Barbieri
2002-09-20 11:19   ` Ingo Molnar
2002-09-20 18:40     ` Roland McGrath
2002-09-20 21:21       ` Luca Barbieri
2002-09-20 12:37 ` jlnance
2002-09-20 16:42   ` Ingo Molnar
2002-09-24  0:40     ` Rusty Russell
2002-09-24  5:47       ` Ingo Molnar
2002-09-24  6:15         ` Rusty Russell
2002-09-20 15:43 ` Bill Davidsen
2002-09-20 16:15   ` Jakub Jelinek
2002-09-20 17:16     ` Bill Davidsen
2002-09-20  7:46 Joerg Pommnitz
2002-09-22 18:55 Peter Waechtler
2002-09-22 21:32 ` Larry McVoy
2002-09-23 10:05   ` Bill Davidsen
2002-09-23 11:55     ` Peter Waechtler
2002-09-23 19:14       ` Bill Davidsen
2002-09-29 23:26         ` Buddy Lumpkin
2002-09-30 14:54           ` Corey Minyard
2002-09-23 15:30     ` Larry McVoy
2002-09-23 19:44       ` Olivier Galibert
2002-09-23 19:48       ` Bill Davidsen
2002-09-23 20:32         ` Ingo Molnar
2002-09-24  0:03           ` Andy Isaacson
2002-09-24  0:10             ` Jeff Garzik
2002-09-24  0:14               ` Andy Isaacson
2002-09-24  5:53             ` Ingo Molnar
2002-09-24 20:34             ` David Schwartz
2002-09-24  7:12           ` Thunder from the hill
2002-09-24  7:30             ` Ingo Molnar
2002-09-23 22:35         ` Mark Mielke
2002-09-23 19:59       ` Peter Waechtler
2002-09-23 20:36         ` Ingo Molnar
2002-09-23 21:08           ` Peter Wächtler
2002-09-23 22:44             ` Mark Mielke
2002-09-23 23:01               ` Bill Huey
2002-09-23 23:11                 ` Mark Mielke
2002-09-24  0:21                   ` Bill Huey
2002-09-24  3:20                     ` Mark Mielke
2002-09-23 23:57           ` Andy Isaacson
2002-09-24 18:10             ` Christoph Hellwig
2002-09-23 21:32       ` Bill Huey
2002-09-23 21:41       ` dean gaudet
2002-09-23 22:10         ` Bill Huey
2002-09-23 22:56         ` Mark Mielke
2002-09-24 10:02       ` Nikita Danilov
2002-09-23 21:22     ` Bill Huey
2002-09-23 21:03 ` Bill Huey
2002-09-24 12:03   ` Michael Sinz
2002-09-24 13:40     ` Peter Svensson
2002-09-24 14:20       ` Michael Sinz
2002-09-24 20:19 ` David Schwartz
2002-09-24 21:10   ` Chris Friesen
2002-09-24 21:22     ` Rik van Riel
2002-09-24 21:35       ` Roberto Peon
2002-09-24 21:35       ` Chris Friesen
2002-09-25 19:02     ` David Schwartz
2002-09-24 23:16   ` Peter Waechtler
2002-09-24 23:23     ` Rik van Riel
2002-09-25 19:05     ` David Schwartz
2002-09-23 16:36 Matthias Urlichs
     [not found] <987738530@toto.iv>
2002-09-24  2:48 ` Peter Chubb
2002-09-24  3:37   ` Mark Mielke
