From: Jamie Lokier <jamie@shareable.org>
To: Eric Varsanyi <e0206@foo21.com>
Cc: linux-kernel@vger.kernel.org, davidel@xmailserver.org
Subject: Re: [Patch][RFC] epoll and half closed TCP connections
Date: Sun, 13 Jul 2003 14:12:10 +0100
Message-ID: <20030713131210.GA19132@mail.jlokier.co.uk>
In-Reply-To: <20030712205114.GC15643@srv.foo21.com>

Eric Varsanyi wrote:
> > Well then, use epoll's level-triggered mode.  It's quite easy - it's
> > the default now. :)
> 
> The problem with all the level triggered schemes (poll, select, epoll w/o
> EPOLLET) is that they call every driver and poll status for every call into
> the kernel. This appeared to be killing my app's performance and I verified
> by writing some simple micro benchmarks.

OH! :-O

Level-triggered epoll_wait() time _should_ be scalable - proportional
to the number of ready events, not the number of listening events.  If
this is not the case then it's a bug in epoll.

In principle, you will see a large delay only if you don't handle
those events (e.g. by calling read() on each ready fd), so that they
are still ready.
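The level-triggered semantics described above can be seen from userspace. A minimal sketch (Python's stdlib `select.epoll` wrapper on Linux; the pipes and the `ready_fds` helper are illustrative, not from the original discussion): an fd with pending data is reported on every `epoll_wait` until it is drained, while idle fds are never reported at all.

```python
import os
import select

def ready_fds(ep, timeout=0.1):
    """Return the fds that epoll reports ready right now."""
    return [fd for fd, _ in ep.poll(timeout)]

# One pipe with unread data, one pipe that stays idle.
r_busy, w_busy = os.pipe()
r_idle, w_idle = os.pipe()

ep = select.epoll()
ep.register(r_busy, select.EPOLLIN)   # level-triggered is the default
ep.register(r_idle, select.EPOLLIN)

os.write(w_busy, b"x")

# The fd with pending data is reported; the idle fd is not...
first = ready_fds(ep)
# ...and it is reported again if we don't drain it (level-triggered)...
second = ready_fds(ep)
os.read(r_busy, 1)
# ...but not once the data has been consumed.
third = ready_fds(ep)
```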

Reading the code in eventpoll.c et al, I think that some time will
be taken for fds that are transitioning on events which you're not
interested in.  Notably, each time a TCP segment is sent and
acknowledged by the other end, poll-waiters are woken, your task will
be woken and do some work in epoll_wait(), but no events are returned
if you are only listening for read availability.

I'm not 100% sure of this, but tracing through

    skb->destructor
    -> sock_wfree()
    -> tcp_write_space()
    -> wake_up_interruptible()
    -> ep_poll_callback()

it looks as though _every_ TCP ACK you receive will cause epoll to wake up
a task which is interested in _any_ socket events, but then in

    <context switch>
    ep_poll()
    -> ep_events_transfer()
    -> ep_send_events()

no events are transferred, so ep_poll() will loop and try again.  This
is quite unfortunate if true, as many of the apps which need to scale
write a lot of segments without receiving very much.

> As we start to scale up to production sized fd sets it gets crazy: around
> 8000 completely idle fd's the cost is 4ms per syscall. At this point
> even a high real load (which gathers lots of I/O per call) doesn't cover the
> now very high latency for each trip into the kernel to gather more work.

It should only be 4ms per syscall if it's actually returning ~8000
ready events.  If you're listening to 8000 but only, say, 10 are
ready, it should be fast.

> What was interesting is the response time was non-linear up to around 400-500
> fd's, then went steep and linear after that, so you pay much more (maybe due
> to some cache effects, I didn't pursue) for each connecting client in a light
> load environment.

> This is not web traffic, the clients typically connect and sit mostly idle.

Can you post your code?

(Btw, I don't disagree with POLLRDHUP - I think it's a fine idea.  I'd
use it.  It'd be unfortunate if it only worked with some socket types
and was not set by others, though.  Global search and replace POLLHUP
with "POLLHUP | POLLRDHUP" in most setters?  Following that a bit
further, we might as well admit that POLLHUP should be called
POLLWRHUP.)
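(As it turned out, POLLRDHUP was merged into Linux later, in 2.6.17, and is exposed to epoll users as EPOLLRDHUP. A sketch of the usage anticipated here, assuming a kernel and Python build that expose the flag: the peer half-closes its write side with shutdown(), and the reader is told the read direction is hung up without having to read() to EOF.

```python
import select
import socket

def detect_half_close():
    """Peer shuts down its write side; EPOLLRDHUP reports the half-close."""
    a, b = socket.socketpair()
    ep = select.epoll()
    ep.register(a.fileno(), select.EPOLLIN | select.EPOLLRDHUP)
    b.shutdown(socket.SHUT_WR)      # half-close: peer will send no more data
    events = dict(ep.poll(1.0))
    mask = events.get(a.fileno(), 0)
    ep.close(); a.close(); b.close()
    return bool(mask & select.EPOLLRDHUP)
```

Without the flag, the only portable way to learn this is a read() returning 0, which is exactly the extra syscall the proposal avoids.)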

-- Jamie

Thread overview: 58+ messages
2003-07-12 18:16 [Patch][RFC] epoll and half closed TCP connections Eric Varsanyi
2003-07-12 19:44 ` Jamie Lokier
2003-07-12 20:51   ` Eric Varsanyi
2003-07-12 20:48     ` Davide Libenzi
2003-07-12 21:19       ` Eric Varsanyi
2003-07-12 21:20         ` Davide Libenzi
2003-07-12 21:41         ` Davide Libenzi
2003-07-12 23:11           ` Eric Varsanyi
2003-07-12 23:55             ` Davide Libenzi
2003-07-13  1:05               ` Eric Varsanyi
2003-07-13 20:32       ` David Schwartz
2003-07-13 21:10         ` Jamie Lokier
2003-07-13 23:05           ` David Schwartz
2003-07-13 23:09             ` Davide Libenzi
2003-07-14  8:14               ` Alan Cox
2003-07-14 15:03                 ` Davide Libenzi
2003-07-14  1:27             ` Jamie Lokier
2003-07-13 21:14         ` Davide Libenzi
2003-07-13 23:05           ` David Schwartz
2003-07-13 23:11             ` Davide Libenzi
2003-07-13 23:52             ` Entrope
2003-07-14  6:14               ` David Schwartz
2003-07-14  7:20                 ` Jamie Lokier
2003-07-14  1:51             ` Jamie Lokier
2003-07-14  6:14               ` David Schwartz
2003-07-15 20:27             ` James Antill
2003-07-16  1:46               ` David Schwartz
2003-07-16  2:09                 ` James Antill
2003-07-13 13:12     ` Jamie Lokier [this message]
2003-07-13 16:55       ` Davide Libenzi
2003-07-12 20:01 ` Davide Libenzi
2003-07-13  5:24   ` David S. Miller
2003-07-13 14:07     ` Jamie Lokier
2003-07-13 17:00       ` Davide Libenzi
2003-07-13 19:15         ` Jamie Lokier
2003-07-13 23:03           ` Davide Libenzi
2003-07-14  1:41             ` Jamie Lokier
2003-07-14  2:24               ` POLLRDONCE optimisation for epoll users (was: epoll and half closed TCP connections) Jamie Lokier
2003-07-14  2:37                 ` Davide Libenzi
2003-07-14  2:43                   ` Davide Libenzi
2003-07-14  2:56                   ` Jamie Lokier
2003-07-14  3:02                     ` Davide Libenzi
2003-07-14  3:16                       ` Jamie Lokier
2003-07-14  3:21                         ` Davide Libenzi
2003-07-14  3:42                           ` Jamie Lokier
2003-07-14  4:00                             ` Davide Libenzi
2003-07-14  5:51                               ` Jamie Lokier
2003-07-14  6:24                                 ` Davide Libenzi
2003-07-14  6:57                                   ` Jamie Lokier
2003-07-14  3:12                     ` Jamie Lokier
2003-07-14  3:17                       ` Davide Libenzi
2003-07-14  3:35                         ` Jamie Lokier
2003-07-14  3:04                   ` Jamie Lokier
2003-07-14  3:12                     ` Davide Libenzi
2003-07-14  3:27                       ` Jamie Lokier
2003-07-14 17:09     ` [Patch][RFC] epoll and half closed TCP connections kuznet
2003-07-14 17:09       ` Davide Libenzi
2003-07-14 21:45       ` Jamie Lokier
