linux-kernel.vger.kernel.org archive mirror
* Re: debate on 700 threads vs asynchronous code
@ 2003-01-24  0:07 Lee Chin
  0 siblings, 0 replies; 19+ messages in thread
From: Lee Chin @ 2003-01-24  0:07 UTC
  To: lm, leechin; +Cc: linux-kernel, linux-newbie

Hi,
Thanks for the reply... my question was more that, with setcontext and swapcontext, I will still be messing with the data cache, right?

In other words, as long as I have an async system without setcontext, I know I am good... but with it, haven't I degraded to a threaded environment?

Thanks
Lee
----- Original Message -----
From: Larry McVoy <lm@bitmover.com>
Date: Thu, 23 Jan 2003 15:28:34 -0800
To: Lee Chin <leechin@mail.com>
Subject: Re: debate on 700 threads vs asynchronous code

> > b) Write an asynchronous system with only 2 or three threads where I manage the connections and stack (via setcontext swapcontext etc), which is programmatically a little harder
> > 
> > Which way will yield me better performance, considering both approaches are implemented optimally?
> 
> If this is a serious question, an async system will by definition do better.
> You have either 700 stacks screwing up the data cache or 2-3 stacks nicely
> fitting in the data cache.  Ditto for instruction cache, etc.
> -- 
> ---
> Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 


* Re: debate on 700 threads vs asynchronous code
  2003-01-29 17:26 Lee Chin
@ 2003-01-30  9:36 ` Terje Eggestad
  0 siblings, 0 replies; 19+ messages in thread
From: Terje Eggestad @ 2003-01-30  9:36 UTC
  To: Lee Chin; +Cc: linux-kernel, linux-newbie

On Wed, 2003-01-29 at 18:26, Lee Chin wrote:
> Today I do method (C)... but many people seem to say that, hey,
> pthreads does almost just that with a constant memory overhead of
> remembering the stack per blocking thread... so there is no time
> difference, just that pthreads consumes slightly more memory.  That is
> the issue I am trying to get my head around.
> 
> That particular question, no one has answered... in Linux, the
> scheduler will not go around crazy trying to schedule processes that
> are all waiting on IO.  Now the only time I see a degradation with threads
> would be if all are runnable.... in that case an async scheme with two
> threads would let each task run to completion, not thrashing the
> kernel.  Is that correct to say?


Yes

And you can add that if you have many runnable threads, there will be
extra overhead from context switching.


> ----- Original Message -----
> From: Terje Eggestad <terje.eggestad@scali.com>
> Date: 27 Jan 2003 10:48:22 +0100
> To: Lee Chin <leechin@mail.com>
> Subject: Re: debate on 700 threads vs asynchronous code
> 
> > Apart from the arguments already given in other replies, you should
> > keep in mind that you probably need to give priority to doing receive.
> > That includes your clients, but if you don't, you run the risk of
> > significantly limiting your bandwidth since the send queues around your
> > system fill up. 
> > 
> > Try doing that with threads. 
> > 
> > 
> > Actually I would recommend the approach c)
> > 
> > c)  Write an asynchronous system with only 2 or three threads where I
> > manage the connections and keep the state of each connection in a data
> > structure.  
> > 
> > 
> > On Fri, 2003-01-24 at 00:19, Lee Chin wrote:
> > > Hi
> > > I am discussing with a few people on different approaches to solving a scale problem I am having, and have gotten vastly different views
> > > 
> > > In a nutshell, as far as this debate is concerned, I can say I am writing a web server.
> > > 
> > > Now, to cater to 700 clients, I can
> > > a) launch 700 threads that each block on I/O to disk and to the client (in reading and writing on the socket)
> > > 
> > > OR
> > > 
> > > b) Write an asynchronous system with only 2 or three threads where I manage the connections and stack (via setcontext swapcontext etc), which is programmatically a little harder
> > > 
> > > Which way will yield me better performance, considering both approaches are implemented optimally?
> > > 
> > > Thanks
> > > Lee
> > -- 
> > _________________________________________________________________________
> > 
> > Terje Eggestad                  mailto:terje.eggestad@scali.no
> > Scali Scalable Linux Systems    http://www.scali.com
> > 
> > Olaf Helsets Vei 6              tel:    +47 22 62 89 61 (OFFICE)
> > P.O.Box 150, Oppsal                     +47 975 31 574  (MOBILE)
> > N-0619 Oslo                     fax:    +47 22 62 89 51
> > NORWAY            
> > _________________________________________________________________________
> > 

* Re: debate on 700 threads vs asynchronous code
@ 2003-01-29 21:32 Dan Kegel
  0 siblings, 0 replies; 19+ messages in thread
From: Dan Kegel @ 2003-01-29 21:32 UTC
  To: linux-kernel

Lee Chin wrote:
> Terje Eggestad <terje.eggestad@scali.com> wrote:
>> Apart from the arguments already given in other replies, you should
>> keep in mind that you probably need to give priority to doing receive.
>> That includes your clients, but if you don't, you run the risk of
>> significantly limiting your bandwidth since the send queues around your
>> system fill up.
>>
>> Try doing that with threads.
>>
>> Actually I would recommend the approach c)
>>
>> c)  Write an asynchronous system with only 2 or three threads where I
>> manage the connections and keep the state of each connection in a data
>> structure.
>
> Today I do method (C)... but many people seem to say that, hey, pthreads does almost
> just that with a constant memory overhead of remembering the stack per blocking
> thread... so there is no time difference, just that pthreads consumes slightly more
> memory.  That is the issue I am trying to get my head around.

The best way to get your head around it is to
benchmark both approaches, and spend some time
refining your implementation of each so you
understand where the bottlenecks are.

> That particular question, no one has answered... in Linux, the scheduler will not go 
> around crazy trying to schedule processes that are all waiting on IO.  Now the only 
> time I see a degradation with threads would be if all are runnable.... in that case an async
> scheme with two threads would let each task run to completion, not thrashing the
> kernel.  Is that correct to say?

There are lots of other issues, too.
Talk is cheap and fun, but only coding will give the real answer.
Go forth and code...

- Dan

* Re: debate on 700 threads vs asynchronous code
@ 2003-01-29 17:26 Lee Chin
  2003-01-30  9:36 ` Terje Eggestad
  0 siblings, 1 reply; 19+ messages in thread
From: Lee Chin @ 2003-01-29 17:26 UTC
  To: terje.eggestad, leechin; +Cc: linux-kernel, linux-newbie

Today I do method (C)... but many people seem to say that, hey, pthreads does almost just that with a constant memory overhead of remembering the stack per blocking thread... so there is no time difference, just that pthreads consumes slightly more memory.  That is the issue I am trying to get my head around.

That particular question, no one has answered... in Linux, the scheduler will not go around crazy trying to schedule processes that are all waiting on IO.  Now the only time I see a degradation with threads would be if all are runnable.... in that case an async scheme with two threads would let each task run to completion, not thrashing the kernel.  Is that correct to say?
----- Original Message -----
From: Terje Eggestad <terje.eggestad@scali.com>
Date: 27 Jan 2003 10:48:22 +0100
To: Lee Chin <leechin@mail.com>
Subject: Re: debate on 700 threads vs asynchronous code

> Apart from the arguments already given in other replies, you should
> keep in mind that you probably need to give priority to doing receive.
> That includes your clients, but if you don't, you run the risk of
> significantly limiting your bandwidth since the send queues around your
> system fill up. 
> 
> Try doing that with threads. 
> 
> 
> Actually I would recommend the approach c)
> 
> c)  Write an asynchronous system with only 2 or three threads where I
> manage the connections and keep the state of each connection in a data
> structure.  
> 
> 
> On Fri, 2003-01-24 at 00:19, Lee Chin wrote:
> > Hi
> > I am discussing with a few people on different approaches to solving a scale problem I am having, and have gotten vastly different views
> > 
> > In a nutshell, as far as this debate is concerned, I can say I am writing a web server.
> > 
> > Now, to cater to 700 clients, I can
> > a) launch 700 threads that each block on I/O to disk and to the client (in reading and writing on the socket)
> > 
> > OR
> > 
> > b) Write an asynchronous system with only 2 or three threads where I manage the connections and stack (via setcontext swapcontext etc), which is programmatically a little harder
> > 
> > Which way will yield me better performance, considering both approaches are implemented optimally?
> > 
> > Thanks
> > Lee

* Re: debate on 700 threads vs asynchronous code
  2003-01-23 23:19 Lee Chin
                   ` (2 preceding siblings ...)
  2003-01-27  9:48 ` Terje Eggestad
@ 2003-01-27 22:08 ` Bill Davidsen
  3 siblings, 0 replies; 19+ messages in thread
From: Bill Davidsen @ 2003-01-27 22:08 UTC
  To: Lee Chin; +Cc: linux-kernel, linux-newbie

On Thu, 23 Jan 2003, Lee Chin wrote:

> I am discussing with a few people on different approaches to solving a
> scale problem I am having, and have gotten vastly different views
> 
> In a nutshell, as far as this debate is concerned, I can say I am writing a web server.
> 
> Now, to cater to 700 clients, I can a) launch 700 threads that each
> block on I/O to disk and to the client (in reading and writing on the
> socket) 
> 
> OR
> 
> b) Write an asynchronous system with only 2 or three threads where I
> manage the connections and stack (via setcontext swapcontext etc), which
> is programmatically a little harder

There are many other ways, involving use of async io for disk and select
on some limited number of sockets per thread. If you want to wallow in
analysis paralysis you can certainly do it. Take a look at existing
usenet, mail, web and dns servers and you will see a number of ways to
attack this problem, and correctly implemented most of them work fine.

I believe Ingo mentioned some huge number of practical threads when he was
first talking about the latest thread library. If you believe it, or if
you really will be happy at 700 tasks per server, then thread per socket
is the easiest to implement, at least IMHO.

I'm using various news software which does most combinations of threading,
select, and even full processes per client, and none of them strike me as
being inherently better (as opposed to some being better implementations). 
Ask Ingo how many threads you can really run in six months when the new
kernel and thread bits are more stable, that's the only scaling bit I
can't even guess. Pick one method, write code. I believe implementation
will be more important than method, unless you make a *really* bad choice.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.



* Re: debate on 700 threads vs asynchronous code
  2003-01-27  9:48 ` Terje Eggestad
@ 2003-01-27 21:48   ` Bill Davidsen
  0 siblings, 0 replies; 19+ messages in thread
From: Bill Davidsen @ 2003-01-27 21:48 UTC
  To: Terje Eggestad; +Cc: Lee Chin, linux-kernel, linux-newbie

On 27 Jan 2003, Terje Eggestad wrote:

> Apart from the arguments already given in other replies, you should
> keep in mind that you probably need to give priority to doing receive.
> That includes your clients, but if you don't, you run the risk of
> significantly limiting your bandwidth since the send queues around your
> system fill up. 
> 
> Try doing that with threads.

Okay, I'm running my usenet exchange machines on Linux with Earthquake,
one thread per socket, 300-500 sockets, 700-800GB/day with incoming rate
spikes to 130Mbit on two 100Mbit NICs. What is it I'm supposed to try
doing with threads?

And if this is a webserver or anything like it, the incoming bandwidth is
probably orders of magnitude below the outgoing... Hum, like a usenet
reader server. Below, from a Linux box running Twister, also threaded per
feed in and per reader socket out.

 load  free  buffs  swap  pgin  pgou  dk0  dk1  dk2  dk3  ipkt  opkt    int    ctx  usr  sys  idl  i_netK  o_netK
 2.98   5.0   1807   0.0   544  2220   71   66   21    0  6173  3390   9600  17983    3   17   80  7170.8   941.9
 4.77   4.5   1805   0.0  1117  6267   39  134  134    0  5403  3212   8780  20663    8   34   58  6645.4   978.9
 2.35   4.3   1802   0.0  1529  6900   37  176  189    0  6134  3648  10007  18492    9   25   66  7470.4  1087.9
 1.10   4.8   1800   0.0  1428  5609   33  149  150    0  5871  3447   9505  18028    9   25   66  7235.2   961.0
 1.38   6.7   1798   0.0   970  6671   34  139  134    0  6250  3685  10051  20210    9   26   65  7503.4  1088.8
 6.57   5.0   1797   0.0  1589  7673   89  184  188    0  5912  3571   9732  20165    8   33   59  7003.7  1169.3
 2.30   4.6   1799   0.0  1648  5900   44  154  146    0  6539  3998  10660  17975    9   27   64  7631.0  1382.6

Forgive the formatting; it kind of breaks with larger numbers...


* Re: debate on 700 threads vs asynchronous code
  2003-01-23 23:19 Lee Chin
  2003-01-23 23:28 ` Larry McVoy
  2003-01-23 23:31 ` Ben Greear
@ 2003-01-27  9:48 ` Terje Eggestad
  2003-01-27 21:48   ` Bill Davidsen
  2003-01-27 22:08 ` Bill Davidsen
  3 siblings, 1 reply; 19+ messages in thread
From: Terje Eggestad @ 2003-01-27  9:48 UTC
  To: Lee Chin; +Cc: linux-kernel, linux-newbie

Apart from the arguments already given in other replies, you should
keep in mind that you probably need to give priority to doing receive.
That includes your clients, but if you don't, you run the risk of
significantly limiting your bandwidth since the send queues around your
system fill up. 

Try doing that with threads. 


Actually I would recommend the approach c)

c)  Write an asynchronous system with only 2 or three threads where I
manage the connections and keep the state of each connection in a data
structure.  


On Fri, 2003-01-24 at 00:19, Lee Chin wrote:
> Hi
> I am discussing with a few people on different approaches to solving a scale problem I am having, and have gotten vastly different views
> 
> In a nutshell, as far as this debate is concerned, I can say I am writing a web server.
> 
> Now, to cater to 700 clients, I can
> a) launch 700 threads that each block on I/O to disk and to the client (in reading and writing on the socket)
> 
> OR
> 
> b) Write an asynchronous system with only 2 or three threads where I manage the connections and stack (via setcontext swapcontext etc), which is programmatically a little harder
> 
> Which way will yield me better performance, considering both approaches are implemented optimally?
> 
> Thanks
> Lee

* Re: debate on 700 threads vs asynchronous code
       [not found] <Pine.LNX.4.44.0301241840450.11758-100000@coffee.psychology.mcmaster.ca>
@ 2003-01-25  0:24 ` Dan Kegel
  0 siblings, 0 replies; 19+ messages in thread
From: Dan Kegel @ 2003-01-25  0:24 UTC
  To: Mark Hahn; +Cc: linux-kernel

Mark Hahn wrote:
>>>>>does epoll provide a thunk (callback and state variable) as well as the 
>>>>>IO completion status?
>>>>
>>>>No.  It provides an event record containing a user-defined state pointer
>>>>plus the IO readiness status change (different from IO completion status).
>>>>But that's what you need; you can do the call yourself.
>>>
>>>well, that means another syscall, which makes a footprint claim kind of moot,
>>>no?
>>
>>What syscall?  You call sys_epoll once for every thousand events or so,
>>then you call your handler, which does a write or whatever.  No
>>extra syscall.
> 
> before a client can be sent the next chunk, the IO status of the last 
> chunk must be tested.  with the simple blocking, thread-per-client approach,
> this happens automaticaly (write returns the number of bytes written).
> 
> with epoll, don't you have to do a syscall to query the status of 
> the just-completed IO?

Nope.  Just go ahead and write.  (Same as with poll(), except that
with epoll, you only get notified once.)  Any errors are reported
immediately by write(), so there's no more status to get.
- Dan


-- 
Dan Kegel
http://www.kegel.com
http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045



* Re: debate on 700 threads vs asynchronous code
  2003-01-24 23:29         ` Randy.Dunlap
@ 2003-01-25  0:11           ` Dan Kegel
  0 siblings, 0 replies; 19+ messages in thread
From: Dan Kegel @ 2003-01-25  0:11 UTC
  To: Randy.Dunlap
  Cc: Matti Aarnio, Corey Minyard, Mark Mielke, Mark Hahn, linux-kernel

Randy.Dunlap wrote:
> On Sat, 25 Jan 2003, Matti Aarnio wrote:
> 
> | On Fri, Jan 24, 2003 at 04:53:46PM -0600, Corey Minyard wrote:
> | ...
> | > I would disagree.  One thread per connection is easier to conceptually
> | > understand.  In my experience, an event-driven model (which is what you
> | > end up with if you use one or a few threads) is actually easier to
> | > correctly implement and it tends to make your code more modular and
> | > portable.
> |
> |   An old thing from early annals of computer science (I browsed Knuth's
> | "The Art" again..) is called   Coroutine.
> |
> | Gives you "one thread per connection" programming model, but without
> | actual multiple scheduling threads in the kernel side.  ...
> | Writing a coroutine library entirely in portable C (by means of setjmp()/longjmp())
> | is possible, but not very efficient.  A bit of assembly helps a lot.

There's also an elegant implementation that uses switch statements
or computed gotos; see http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html
I'm using it.  It's a bit limited, but hey, it works for me.

> Davide Libenzi (epoll) likes and discusses coroutines on one of his
> web pages:  http://www.xmailserver.org/linux-patches/nio-improve.html
> (search for /coroutine/)

IMHO coroutines are harder to use than either threads or nonblocking I/O.
Then again, I don't like Scheme; many things in this world are a matter of taste.
- Dan

* Re: debate on 700 threads vs asynchronous code
  2003-01-24 23:21       ` Matti Aarnio
@ 2003-01-24 23:29         ` Randy.Dunlap
  2003-01-25  0:11           ` Dan Kegel
  0 siblings, 1 reply; 19+ messages in thread
From: Randy.Dunlap @ 2003-01-24 23:29 UTC
  To: Matti Aarnio
  Cc: Corey Minyard, Mark Mielke, Dan Kegel, Mark Hahn, linux-kernel

On Sat, 25 Jan 2003, Matti Aarnio wrote:

| On Fri, Jan 24, 2003 at 04:53:46PM -0600, Corey Minyard wrote:
| ...
| > I would disagree.  One thread per connection is easier to conceptually
| > understand.  In my experience, an event-driven model (which is what you
| > end up with if you use one or a few threads) is actually easier to
| > correctly implement and it tends to make your code more modular and
| > portable.
|
|   An old thing from early annals of computer science (I browsed Knuth's
| "The Art" again..) is called   Coroutine.
|
| Gives you "one thread per connection" programming model, but without
| actual multiple scheduling threads in the kernel side.
|
| The simplest coroutine implementations are truly simple: a pageful of C.
| Knuth shows it with very few MIX (assembly) instructions.
|
| Throwing in non-blocking socket/filedescriptor access, and in event
| of "EAGAIN", coroutine-yielding to some other coroutine, does complicate
| things, naturally.
|
| A good coder finds a balance between various methods, possibly using
| both coroutine "userspace threads" and actual kernel threads.
|
| Writing a coroutine library entirely in portable C (by means of setjmp()/longjmp())
| is possible, but not very efficient.  A bit of assembly helps a lot.
|
| > -Corey
|
| /Matti Aarnio
| -

Davide Libenzi (epoll) likes and discusses coroutines on one of his
web pages:  http://www.xmailserver.org/linux-patches/nio-improve.html
(search for /coroutine/)

-- 
~Randy



* Re: debate on 700 threads vs asynchronous code
  2003-01-24 22:53     ` Corey Minyard
@ 2003-01-24 23:21       ` Matti Aarnio
  2003-01-24 23:29         ` Randy.Dunlap
  0 siblings, 1 reply; 19+ messages in thread
From: Matti Aarnio @ 2003-01-24 23:21 UTC
  To: Corey Minyard; +Cc: Mark Mielke, Dan Kegel, Mark Hahn, linux-kernel

On Fri, Jan 24, 2003 at 04:53:46PM -0600, Corey Minyard wrote:
...
> I would disagree.  One thread per connection is easier to conceptually 
> understand.  In my experience, an event-driven model (which is what you 
> end up with if you use one or a few threads) is actually easier to 
> correctly implement and it tends to make your code more modular and 
> portable.

  An old thing from early annals of computer science (I browsed Knuth's
"The Art" again..) is called   Coroutine.

Gives you "one thread per connection" programming model, but without
actual multiple scheduling threads in the kernel side.

The simplest coroutine implementations are truly simple: a pageful of C.
Knuth shows it with very few MIX (assembly) instructions.

Throwing in non-blocking socket/filedescriptor access, and in event
of "EAGAIN", coroutine-yielding to some other coroutine, does complicate
things, naturally.

A good coder finds a balance between various methods, possibly using
both coroutine "userspace threads" and actual kernel threads.

Writing a coroutine library entirely in portable C (by means of setjmp()/longjmp())
is possible, but not very efficient.  A bit of assembly helps a lot.

> -Corey

/Matti Aarnio


* Re: debate on 700 threads vs asynchronous code
  2003-01-24  8:26   ` Mark Mielke
@ 2003-01-24 22:53     ` Corey Minyard
  2003-01-24 23:21       ` Matti Aarnio
  0 siblings, 1 reply; 19+ messages in thread
From: Corey Minyard @ 2003-01-24 22:53 UTC
  To: Mark Mielke; +Cc: Dan Kegel, Mark Hahn, linux-kernel

Mark Mielke wrote:

>>And, for what it's worth, programmer productivity is sometimes
>>more important than all the above.  I happen to work
>>at a place where performance is worth a lot of extra effort,
>>but other shops prefer to throw hardware at the problem and
>>not worry about that last 10%.
>>    
>>
>
>Definitely an argument for the one thread per connection model. :-)
>
I would disagree.  One thread per connection is easier to conceptually 
understand.  In my experience, an event-driven model (which is what you 
end up with if you use one or a few threads) is actually easier to 
correctly implement and it tends to make your code more modular and 
portable.

-Corey



* Re: debate on 700 threads vs asynchronous code
  2003-01-24  8:21 ` Dan Kegel
@ 2003-01-24  8:26   ` Mark Mielke
  2003-01-24 22:53     ` Corey Minyard
  0 siblings, 1 reply; 19+ messages in thread
From: Mark Mielke @ 2003-01-24  8:26 UTC
  To: Dan Kegel; +Cc: Mark Hahn, linux-kernel

On Fri, Jan 24, 2003 at 12:21:49AM -0800, Dan Kegel wrote:
> In any case, benchmarking's the only way to go.  No amount of talk will
> substitute for a good real-life measurement.  That's what convinced
> me that epoll was faster than sigio, and that sigio was
> sometimes slower than select() !

Also, anybody can write a poor implementation of each, so even
benchmarks are suspect...

My personal favourite model currently is switched I/O, but prioritized
threads per expected event frequency or event priority. For example,
events that won't likely occur for some time, or have a low priority,
can all be pushed to a low priority thread. Not only does this keep
the operating system free to give the CPU's to higher priority
threads, but the higher priority threads have fewer resources to
manage, leading to more efficient operation. Also, event handling code
that may take some time to complete should be moved to its own thread
in a thread pool, allowing the dispatching to fully complete without
needing to actually execute all of the (expensive) handlers.

> And, for what it's worth, programmer productivity is sometimes
> more important than all the above.  I happen to work
> at a place where performance is worth a lot of extra effort,
> but other shops prefer to throw hardware at the problem and
> not worry about that last 10%.

Definitely an argument for the one thread per connection model. :-)

mark

-- 
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/



* Re: debate on 700 threads vs asynchronous code
       [not found] <Pine.LNX.4.44.0301232144470.8203-100000@coffee.psychology.mcmaster.ca>
@ 2003-01-24  8:21 ` Dan Kegel
  2003-01-24  8:26   ` Mark Mielke
  0 siblings, 1 reply; 19+ messages in thread
From: Dan Kegel @ 2003-01-24  8:21 UTC
  To: Mark Hahn, linux-kernel

Mark Hahn wrote:
> in principle, why should the footprint be large?  it's a register set
> and at most a couple cachelines of stack frame.

... but all the threads' cachelines will collide, whereas if
you're using nonblocking I/O, session state might be staggered better.
This is just a guess; I haven't measured it.

>>>does epoll provide a thunk (callback and state variable) as well as the 
>>>IO completion status?
>>
>>No.  It provides an event record containing a user-defined state pointer
>>plus the IO readiness status change (different from IO completion status).
>>But that's what you need; you can do the call yourself.
> 
> well, that means another syscall, which makes a footprint claim kind of moot,
> no?

What syscall?  You call sys_epoll once for every thousand events or so,
then you call your handler, which does a write or whatever.  No
extra syscall.

>>>>See http://www.kegel.com/c10k.html for an overview of the issue and some links.
>>>
>>>
>>>it's a great resource, except that for 700 clients, the difference
>>>between select, poll, epoll, aio are pretty moot.  no?
>>
>>Depends on how close to maximal performance you need, and whether
>>you might later need to scale to more clients.
> 
> 
> no, I'm suggesting the choice is nonlinear: that for moderately large loads,
> like 700 clients, there is no advantage to traditional approaches.

I agree that for 700 clients the answer may be different than for 2000 clients.
However, if you have to handle 700 clients, how do you know you won't
have to handle 2000 later?

In any case, benchmarking's the only way to go.  No amount of talk will
substitute for a good real-life measurement.  That's what convinced
me that epoll was faster than sigio, and that sigio was
sometimes slower than select() !

And, for what it's worth, programmer productivity is sometimes
more important than all the above.  I happen to work
at a place where performance is worth a lot of extra effort,
but other shops prefer to throw hardware at the problem and
not worry about that last 10%.

- Dan

* Re: debate on 700 threads vs asynchronous code
       [not found] <Pine.LNX.4.44.0301232028480.980-100000@coffee.psychology.mcmaster.ca>
@ 2003-01-24  2:04 ` Dan Kegel
  0 siblings, 0 replies; 19+ messages in thread
From: Dan Kegel @ 2003-01-24  2:04 UTC
  To: Mark Hahn, linux-kernel

Mark Hahn wrote:
>>Nonblocking I/O is totally the way to go if you have full control over your
>>source code and want the maximal performance in userspace.  The best way
> 
> why do you think it's better for user-space?  I was trying to explain
> it to someone this afternoon, and we couldn't find any reason for 
> threads/blocking to be slow.  IO-completion wakes up the thread, which
> goes through the scheduler right back to the user's stack-frame,
> even providing the io-completion status.  no large cache footprint 
> anywhere (at least with a lightweight thread library), no multiplexing
> like for select/poll, etc.

I suspect the thread *does* have a larger cache footprint,
since in nonblocking I/O, session state is stored more compactly.
Also, the threaded approach involves lots more context switches.

> does epoll provide a thunk (callback and state variable) as well as the 
> IO completion status?

No.  It provides an event record containing a user-defined state pointer
plus the IO readiness status change (different from IO completion status).
But that's what you need; you can do the call yourself.

>>See http://www.kegel.com/c10k.html for an overview of the issue and some links.
> 
> 
> it's a great resource, except that for 700 clients, the difference
> between select, poll, epoll, aio are pretty moot.  no?

Depends on how close to maximal performance you need, and whether
you might later need to scale to more clients.

The average server is so lightly loaded that it really doesn't matter which approach you use.
- Dan


-- 
Dan Kegel
http://www.kegel.com
http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045


* Re: debate on 700 threads vs asynchronous code
@ 2003-01-24  1:46 Dan Kegel
  0 siblings, 0 replies; 19+ messages in thread
From: Dan Kegel @ 2003-01-24  1:46 UTC
  To: Lee Chin, linux-kernel

Lee Chin <leechin@mail.com> wrote:
> Larry McVoy wrote:
>> > Now, to cater to 700 clients, I can
>> > a) launch 700 threads that each block on I/O to disk and to the client (in 
>> > reading and writing on the socket) 
>> > OR
>> > b) Write an asynchronous system with only two or three threads where I manage the
>> > connections and stack (via setcontext, swapcontext, etc.), which is 
>> > programmatically a little harder 
>> > Which way will yield me better performance, considering both approaches are
>> > implemented optimally?
>> 
>> If this is a serious question, an async system will by definition do better.
>> You have either 700 stacks screwing up the data cache or 2-3 stacks nicely
>> fitting in the data cache.  Ditto for instruction cache, etc.
>
> Thanks for the reply. My question was more this: with setcontext and swapcontext, I
> will still be messing with the data cache, right?  
> 
> In other words, as long as I have an async system without setcontext, I know I am
> good... but with it, haven't I degraded to a threaded environment?

I suspect Linux's implementation of async I/O isn't able to handle sockets yet.
Thus the choice is between nonblocking I/O and blocking I/O.

Nonblocking I/O is totally the way to go if you have full control over your
source code and want the maximal performance in userspace.  The best way
to get good performance with nonblocking I/O in Linux is to use the sys_epoll
system call; it's part of the 2.5 kernel, but a backport to 2.4 is available.

See http://www.kegel.com/c10k.html for an overview of the issue and some links.
- Dan

-- 
Dan Kegel
http://www.kegel.com
http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045


* Re: debate on 700 threads vs asynchronous code
  2003-01-23 23:19 Lee Chin
  2003-01-23 23:28 ` Larry McVoy
@ 2003-01-23 23:31 ` Ben Greear
  2003-01-27  9:48 ` Terje Eggestad
  2003-01-27 22:08 ` Bill Davidsen
  3 siblings, 0 replies; 19+ messages in thread
From: Ben Greear @ 2003-01-23 23:31 UTC
  To: Lee Chin; +Cc: linux-kernel, linux-newbie

Lee Chin wrote:
> Hi
> I have been discussing with a few people how to solve a scaling problem I am having, and have gotten vastly different views.
> 
> In a nutshell, as far as this debate is concerned, I can say I am writing a web server.
> 
> Now, to cater to 700 clients, I can
> a) launch 700 threads that each block on I/O to disk and to the client (in reading and writing on the socket)
> 
> OR
> 
> b) Write an asynchronous system with only two or three threads where I manage the connections and stack (via setcontext, swapcontext, etc.), which is programmatically a little harder

You could also write something with async non-blocking IO and use NO threads
(i.e., just a single process), which may greatly simplify the debugging of your
program (unless the developer(s) on your project are already very good at
threaded programming).

I suspect the async IO will perform better as well, but that is just an
unfounded opinion based on not wanting to think about scheduling 700 processes
that want to do IO :)

> 
> Which way will yield me better performance, considering both approaches are implemented optimally?
> 
> Thanks
> Lee


-- 
Ben Greear <greearb@candelatech.com>       <Ben_Greear AT excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear



* Re: debate on 700 threads vs asynchronous code
  2003-01-23 23:19 Lee Chin
@ 2003-01-23 23:28 ` Larry McVoy
  2003-01-23 23:31 ` Ben Greear
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 19+ messages in thread
From: Larry McVoy @ 2003-01-23 23:28 UTC
  To: Lee Chin; +Cc: linux-kernel, linux-newbie

> b) Write an asynchronous system with only two or three threads where I manage the connections and stack (via setcontext, swapcontext, etc.), which is programmatically a little harder
> 
> Which way will yield me better performance, considering both approaches are implemented optimally?

If this is a serious question, an async system will by definition do better.
You have either 700 stacks screwing up the data cache or 2-3 stacks nicely
fitting in the data cache.  Ditto for instruction cache, etc.
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 

* debate on 700 threads vs asynchronous code
@ 2003-01-23 23:19 Lee Chin
  2003-01-23 23:28 ` Larry McVoy
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Lee Chin @ 2003-01-23 23:19 UTC
  To: linux-kernel, linux-newbie

Hi
I have been discussing with a few people how to solve a scaling problem I am having, and have gotten vastly different views.

In a nutshell, as far as this debate is concerned, I can say I am writing a web server.

Now, to cater to 700 clients, I can
a) launch 700 threads that each block on I/O to disk and to the client (in reading and writing on the socket)

OR

b) Write an asynchronous system with only two or three threads where I manage the connections and stack (via setcontext, swapcontext, etc.), which is programmatically a little harder

Which way will yield me better performance, considering both approaches are implemented optimally?

Thanks
Lee

Thread overview: 19+ messages
2003-01-24  0:07 debate on 700 threads vs asynchronous code Lee Chin
  -- strict thread matches above, loose matches on Subject: below --
2003-01-29 21:32 Dan Kegel
2003-01-29 17:26 Lee Chin
2003-01-30  9:36 ` Terje Eggestad
     [not found] <Pine.LNX.4.44.0301241840450.11758-100000@coffee.psychology.mcmaster.ca>
2003-01-25  0:24 ` Dan Kegel
     [not found] <Pine.LNX.4.44.0301232144470.8203-100000@coffee.psychology.mcmaster.ca>
2003-01-24  8:21 ` Dan Kegel
2003-01-24  8:26   ` Mark Mielke
2003-01-24 22:53     ` Corey Minyard
2003-01-24 23:21       ` Matti Aarnio
2003-01-24 23:29         ` Randy.Dunlap
2003-01-25  0:11           ` Dan Kegel
     [not found] <Pine.LNX.4.44.0301232028480.980-100000@coffee.psychology.mcmaster.ca>
2003-01-24  2:04 ` Dan Kegel
2003-01-24  1:46 Dan Kegel
2003-01-23 23:19 Lee Chin
2003-01-23 23:28 ` Larry McVoy
2003-01-23 23:31 ` Ben Greear
2003-01-27  9:48 ` Terje Eggestad
2003-01-27 21:48   ` Bill Davidsen
2003-01-27 22:08 ` Bill Davidsen
