linux-kernel.vger.kernel.org archive mirror
From: Davide Libenzi <davidel@xmailserver.org>
To: David Schwartz <davids@webmaster.com>
Cc: Eric Varsanyi <e0206@foo21.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: RE: [Patch][RFC] epoll and half closed TCP connections
Date: Sun, 13 Jul 2003 14:14:33 -0700 (PDT)
Message-ID: <Pine.LNX.4.55.0307131334380.15022@bigblue.dev.mcafeelabs.com>
In-Reply-To: <MDEHLPKNGKAHNMBLJOLKIEEPEFAA.davids@webmaster.com>

On Sun, 13 Jul 2003, David Schwartz wrote:

> 	For most real-world loads, M is some fraction of N. The fraction
> asymptotically approaches 1 as load increases because under load it takes
> you longer to get back to polling, so a higher fraction of the descriptors
> will be ready when you do.
>
> 	Even if you argue that most real-world loads consist of a few very busy
> file descriptors and a lot of idle file descriptors, why would you think
> that this ratio changes as the number of connections increases? Say a group
> of two servers is handling a bunch of connections. Some of those connections
> will be very active and some will be very idle. But surely the *percentage*
> of active connections won't change just because the connections are split
> over the servers 50/50 rather than 10/90.
>
> 	If a particular protocol and usage sees 10 idle connections for every
> active one, then N will be ten times M, and O(M) will be the same as O(N).
> It's only if a higher percentage of connections are idle when there are more
> connections (which seems an extreme rarity to me) that O(M) is better than
> O(N).

Apologies, I abused O(*) (hopefully none of my math profs are on lkml :).
Yes, N/M fluctuates little, if at all, over the N domain. So, using O(*)
correctly, they both scale O(N). But we can trivially say that, if we call
CP the cost of poll() in CPU cycles and CE the cost of epoll:

CP(N, M) = KP * N
CE(N, M) = KE * M

where KP and KE are constants that depend on the code architecture of the
two systems. If we fix KA (the active coefficient):

KA = M / N

we can write the scalability coefficient as:

         KP * N          KP
KS = ------------- = ---------
      KE * KA * N     KE * KA
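
As a rough sketch of the formula above, the ratio can be computed directly. The function names are hypothetical and the constants used in the note below are made-up numbers, not measurements of either implementation:

```c
#include <assert.h>

/* Cost models from the text: CP(N, M) = KP * N and CE(N, M) = KE * M. */
double poll_cost(double kp, double n)
{
    return kp * n;
}

double epoll_cost(double ke, double m)
{
    return ke * m;
}

/* KS = CP / CE with M = KA * N; the N cancels, leaving KP / (KE * KA). */
double scalability_coefficient(double kp, double ke, double ka)
{
    /* KA = M / N is a fraction of active fds, so it must lie in (0, 1]. */
    assert(ke > 0.0 && ka > 0.0 && ka <= 1.0);
    return kp / (ke * ka);
}
```

With the illustrative values KP = 8, KE = 2 and KA = 0.25, scalability_coefficient(8.0, 2.0, 0.25) gives 16, and halving KA to 0.125 doubles it to 32, independent of N.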

The scalability coefficient is clearly inversely proportional to KA. Let's
look at what the poll code does:

1) It has to allocate the kernel buffer for events

2) It has to copy it from userspace

3) It has to allocate the wait queue buffer by calling get_free_page (possibly
	multiple times when we talk about a decent number of fds)

4) It has to loop N times calling f_op->poll(), which in turn adds to
	the wait queue, getting/releasing IRQ locks

5) It has to loop another M times to copy events to userspace

6) It has to call kfree() for all blocks allocated

7) It has to call poll_freewait(), which runs another N loop to unregister
	the poll waits, which in turn takes another N IRQ locks
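
Seen from userspace, the difference in shape looks roughly like the sketch below. It is only an illustration: the helper names are hypothetical, it uses the userspace API as found in current Linux (epoll_create1() postdates this thread), and it says nothing about the kernel internals listed above beyond the fact that poll() is handed all N descriptors on every call while epoll registers them once.

```c
#include <assert.h>
#include <poll.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <unistd.h>

/* poll(): the whole N-entry array goes to the kernel on every call, and
 * the caller scans all N entries afterwards to find the ready ones. */
int count_ready_poll(const int *fds, int n)
{
    struct pollfd *pfds = calloc((size_t)n, sizeof(*pfds));
    int i, ready = 0;

    if (pfds == NULL)
        return -1;
    for (i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
    }
    if (poll(pfds, (nfds_t)n, 0) < 0) {
        free(pfds);
        return -1;
    }
    for (i = 0; i < n; i++)
        if (pfds[i].revents & POLLIN)
            ready++;
    free(pfds);
    return ready;
}

/* epoll: the interest set is registered once and kept by the kernel.
 * Returns the epoll fd, or -1 on error. */
int epoll_register_all(const int *fds, int n)
{
    int i, epfd = epoll_create1(0);

    if (epfd < 0)
        return -1;
    for (i = 0; i < n; i++) {
        struct epoll_event ev = {0};

        ev.events = EPOLLIN;
        ev.data.fd = fds[i];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev) < 0) {
            close(epfd);
            return -1;
        }
    }
    return epfd;
}

/* Each later wait passes in no fd list and gets back only ready entries
 * (at most max_events of them, which must be greater than zero). */
int count_ready_epoll(int epfd, int max_events)
{
    struct epoll_event *evs = calloc((size_t)max_events, sizeof(*evs));
    int n;

    if (evs == NULL)
        return -1;
    n = epoll_wait(epfd, evs, max_events, 0);
    free(evs);
    return n;
}
```

A caller would call epoll_register_all() once at setup and then count_ready_epoll() in its event loop, whereas count_ready_poll() rebuilds and rescans the full array on each iteration.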

The epoll code remembers/caches things so that KE is much lower than
KP, and this, together with a pretty low KA, explains the results on poll
scalability against epoll.



> 	Is there any actual evidence to suggest that epoll scales better than poll
> for "real loads"? Tests with increasing numbers of idle file descriptors as
> the active count stays constant are not real loads.

Yes, of course. The time spent inside poll/select becomes a PITA when you
start dealing with a huge number of fds. And this is kernel time. This
obviously does not mean that if epoll is 10 times faster than poll under load,
and you switch your app to epoll, it'll be ten times faster. It means that
the kernel time spent inside poll will be 1/10. And many of the operations
done by poll require IRQ locks, which increases the time the kernel
spends with IRQs disabled, and that is never a good thing.
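
To put a number on that distinction, here is a small sketch of the usual fixed-fraction speedup arithmetic. The function name and the fractions in the note below are assumptions chosen for illustration, not measurements:

```c
#include <assert.h>

/* Overall application speedup when only the share of run time spent in
 * poll (poll_fraction, between 0 and 1) gets poll_speedup times faster. */
double overall_speedup(double poll_fraction, double poll_speedup)
{
    assert(poll_fraction >= 0.0 && poll_fraction <= 1.0);
    assert(poll_speedup > 0.0);
    return 1.0 / ((1.0 - poll_fraction) + poll_fraction / poll_speedup);
}
```

If, say, 20% of the run time is kernel time inside poll and that part becomes 10 times faster, overall_speedup(0.2, 10.0) is about 1.22, not 10.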



- Davide

