From: Jamie Lokier <jamie@shareable.org>
To: Eric Varsanyi <e0206@foo21.com>
Cc: linux-kernel@vger.kernel.org, davidel@xmailserver.org
Subject: Re: [Patch][RFC] epoll and half closed TCP connections
Date: Sun, 13 Jul 2003 14:12:10 +0100 [thread overview]
Message-ID: <20030713131210.GA19132@mail.jlokier.co.uk> (raw)
In-Reply-To: <20030712205114.GC15643@srv.foo21.com>
Eric Varsanyi wrote:
> > Well then, use epoll's level-triggered mode. It's quite easy - it's
> > the default now. :)
>
> The problem with all the level triggered schemes (poll, select, epoll w/o
> EPOLLET) is that they call every driver and poll status for every call into
> the kernel. This appeared to be killing my app's performance and I verified
> by writing some simple micro benchmarks.
OH! :-O
Level-triggered epoll_wait() time _should_ scale with the number of
ready events, not with the number of fds being monitored. If that's
not the case, it's a bug in epoll.
In principle, you will see a large delay only if you don't handle
those events (e.g. by calling read() on each ready fd), so that they
are still ready.
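To make that concrete, here's a minimal sketch of the "handle it so it
stops being ready" pattern under level-triggered epoll. A pipe stands in
for a TCP socket for brevity, and the function name is made up for
illustration:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Under level-triggered epoll, an fd is re-reported on every
 * epoll_wait() until it is drained.  Returns 0 on success. */
static int drain_demo(void)
{
    int pfd[2];
    if (pipe(pfd) != 0)
        return -1;
    fcntl(pfd[0], F_SETFL, O_NONBLOCK);

    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data = { .fd = pfd[0] } };
    epoll_ctl(ep, EPOLL_CTL_ADD, pfd[0], &ev);

    write(pfd[1], "hi", 2);

    struct epoll_event out[8];
    if (epoll_wait(ep, out, 8, 0) != 1)     /* reported: data pending */
        return -1;

    /* Drain the fd to EAGAIN; only then does the fd stop being ready. */
    char buf[64];
    while (read(out[0].data.fd, buf, sizeof buf) > 0)
        ;

    /* Level-triggered epoll now has nothing left to report. */
    int quiet = (epoll_wait(ep, out, 8, 0) == 0);
    close(pfd[0]); close(pfd[1]); close(ep);
    return quiet ? 0 : -1;
}
```

If you skip the read() loop, the second epoll_wait() keeps returning the
same fd, and the per-call cost grows with the number of undrained fds.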
Reading the code in eventpoll.c et al, I think that some time will
be taken for fds that are transitioning on events which you're not
interested in. Notably, each time a TCP segment is sent and
acknowledged by the other end, poll-waiters are woken, your task will
be woken and do some work in epoll_wait(), but no events are returned
if you are only listening for read availability.
I'm not 100% sure of this, but tracing through
skb->destructor
-> sock_wfree()
-> tcp_write_space()
-> wake_up_interruptible()
-> ep_poll_callback()
it looks as though _every_ TCP ACK you receive will cause epoll to wake up
a task which is interested in _any_ socket events, but then in
<context switch>
ep_poll()
-> ep_events_transfer()
-> ep_send_events()
no events are transferred, so ep_poll() will loop and try again. This
is quite unfortunate if true, as many of the apps which need to scale
write a lot of segments without receiving very much.
> As we start to scale up to production sized fd sets it gets crazy: around
> 8000 completely idle fd's the cost is 4ms per syscall. At this point
> even a high real load (which gathers lots of I/O per call) doesn't cover the
> now very high latency for each trip into the kernel to gather more work.
It should only be 4ms per syscall if it's actually returning ~8000
ready events. If you're listening to 8000 but only, say, 10 are
ready, it should be fast.
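A micro-benchmark along those lines is easy to sketch, for what it's
worth. Pipes stand in for idle TCP connections and the function name is
made up; level-triggered epoll_wait() should report exactly one event
here no matter how many idle fds are registered, so any growth in its
elapsed time is pure per-fd overhead:

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Register `nidle` idle pipe read-ends plus one ready one, then ask
 * how many events epoll_wait() reports.  The answer should be 1
 * regardless of nidle; only the elapsed time should vary. */
static int ready_events_with_idle_fds(int nidle)
{
    int ep = epoll_create1(0);
    if (ep < 0)
        return -1;

    for (int i = 0; i < nidle; i++) {
        int p[2];
        if (pipe(p) != 0)
            return -1;                  /* likely hit RLIMIT_NOFILE */
        struct epoll_event ev = { .events = EPOLLIN, .data = { .fd = p[0] } };
        epoll_ctl(ep, EPOLL_CTL_ADD, p[0], &ev);
    }

    int p[2];
    if (pipe(p) != 0)
        return -1;
    struct epoll_event ev = { .events = EPOLLIN, .data = { .fd = p[0] } };
    epoll_ctl(ep, EPOLL_CTL_ADD, p[0], &ev);
    write(p[1], "x", 1);                /* make exactly one fd ready */

    struct epoll_event out[16];
    return epoll_wait(ep, out, 16, 0);
}
```

Wrapping the final epoll_wait() in gettimeofday() calls and sweeping
nidle from a few hundred up to ~8000 would show whether the per-call
cost really grows with idle fds, as reported above.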
> What was interesting is the response time was non-linear up to around 400-500
> fd's, then went steep and linear after that, so you pay much more (maybe due
> to some cache effects, I didn't pursue) for each connecting client in a light
> load environment.
> This is not web traffic, the clients typically connect and sit mostly idle.
Can you post your code?
(Btw, I don't disagree with POLLRDHUP - I think it's a fine idea. I'd
use it. It'd be unfortunate if it only worked with some socket types
and was not set by others, though. Global search and replace POLLHUP
with "POLLHUP | POLLRDHUP" in most setters? Following that a bit
further, we might as well admit that POLLHUP should be called
POLLWRHUP.)
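(Assuming a kernel that actually implements the proposed flag -
EPOLLRDHUP on the epoll side - consuming it would look something like
the sketch below. A UNIX socketpair stands in for a TCP connection, and
the function name is made up for illustration:)

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the peer's half-close (shutdown(SHUT_WR)) is reported
 * as EPOLLRDHUP, 0 if not, -1 on setup error. */
static int sees_peer_half_close(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP,
                              .data = { .fd = sv[0] } };
    epoll_ctl(ep, EPOLL_CTL_ADD, sv[0], &ev);

    shutdown(sv[1], SHUT_WR);           /* peer stops sending: half close */

    struct epoll_event out;
    if (epoll_wait(ep, &out, 1, 100) != 1)
        return 0;
    return (out.events & EPOLLRDHUP) ? 1 : 0;
}
```

(The point being that the reader learns about the half-close from the
event mask directly, instead of having to read() to EOF to discover it.)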
-- Jamie