From: Davide Libenzi <davidel@xmailserver.org>
To: Jamie Lokier <lk@tantalophile.demon.co.uk>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
<linux-aio@kvack.org>, <lse-tech@lists.sourceforge.net>,
Linus Torvalds <torvalds@transmeta.com>,
Andrew Morton <akpm@digeo.com>,
Alan Cox <alan@lxorguk.ukuu.org.uk>
Subject: Re: Unifying epoll,aio,futexes etc. (What I really want from epoll)
Date: Thu, 31 Oct 2002 12:28:11 -0800 (PST) [thread overview]
Message-ID: <Pine.LNX.4.44.0210311211160.1562-100000@blue1.dev.mcafeelabs.com> (raw)
In-Reply-To: <20021031154112.GB27801@bjl1.asuk.net>
On Thu, 31 Oct 2002, Jamie Lokier wrote:
> ps. I thought I should explain what bothers me most about epoll at the
> moment. It's good at what it does, but it's so very limited in what
> it supports.
>
> I have a high performance server application in mind, that epoll is
> _almost_ perfect for but not quite.
>
> Davide, you like coroutines, so perhaps you will appreciate a web
> server that serves a mixture of dynamic and static content, using
> coroutines and user+kernel threading in a carefully balanced way.
> Dynamic content is cached, accurately (taking advantage of nanosecond
> mtimes if possible), yet served as fast as static pages (using a
> clever cache validation method), and is built from files (read using
> aio to improve throughput) and subrequests to other servers just like
> a proxy. Data is served zero-copy using sendfile and /dev/shm.
>
> A top quality server like that, optimised for performance, has to
> respond to these events:
>
> - network accept()
> - read/write/exception on sockets and pipes
> - timers
> - aio
> - futexes
> - dnotify events
>
> See how epoll only helps with the first two? And this is the very
> application space that epoll could _almost_ be perfect for.
>
> Btw, it doesn't _have_ to be a web server. Enterprise scale Java
> runtimes, database servers, spider clients, network load generators,
> proxies, even humble X servers - also have very similar requirements.
>
> There are several scalable and fast event queuing mechanisms in the
> kernel now: rt-signals, aio and epoll, yet each of them is limited by
> only keeping track of a few kinds of possible event.
>
> Technically, it's possible to use them all together. If you want to
> react to all the kinds of events I listed above, you have to. But
> it's mighty ugly code to use them all at once, and it's certainly not
> the "lean and mean" event loop that everyone aspires to.
>
> By adding yet another mechanism without solving the general problem,
> epoll just makes the mighty ugly userspace more ugly. (But it's
> probably worth using - socket notifcation through rt-signals has its
> own problems).
>
> I would very much like to see a general solution to the problem of all
> different kinds of events being queued to userspace efficiently,
> through one mechanism ("to bind them all..."). Every piece of this puzzle
> has been written already, they're just not joined up very well.
>
> I'm giving this serious thought now, if anyone wants to offer input.
Jamie, the fact that epoll supports a limited number of "objects" was an
as-designed at that time. I see it quite easy to extend it to support
other objects. Futexes are a matter of one line of code int :
/* Waiter either waiting in FUTEX_WAIT or poll(), or expecting signal */
static inline void tell_waiter(struct futex_q *q)
{
wake_up_all(&q->waiters);
if (q->filp) {
send_sigio(&q->filp->f_owner, q->fd, POLL_IN);
+ file_notify_send(q->filp, ION_IN, POLLIN | POLLRDNORM);
}
}
Timer, as long as you access them through a file* interface ( like futexes )
will become trivial too. Another line should be sufficent for dnotify :
void __inode_dir_notify(struct inode *inode, unsigned long event)
{
struct dnotify_struct * dn;
struct dnotify_struct **prev;
struct fown_struct * fown;
int changed = 0;
write_lock(&dn_lock);
prev = &inode->i_dnotify;
while ((dn = *prev) != NULL) {
if ((dn->dn_mask & event) == 0) {
prev = &dn->dn_next;
continue;
}
fown = &dn->dn_filp->f_owner;
send_sigio(fown, dn->dn_fd, POLL_MSG);
+ file_notify_send(dn->dn_filp, ION_IN, POLLIN | POLLRDNORM | POLLMSG);
if (dn->dn_mask & DN_MULTISHOT)
prev = &dn->dn_next;
else {
*prev = dn->dn_next;
changed = 1;
kmem_cache_free(dn_cache, dn);
}
}
if (changed)
redo_inode_mask(inode);
write_unlock(&dn_lock);
}
This is the result of a quite quick analysis, but I do not expect it to be
much more difficult than that.
- Davide
next prev parent reply other threads:[~2002-10-31 20:13 UTC|newest]
Thread overview: 117+ messages / expand[flat|nested] mbox.gz Atom feed top
2002-10-28 19:14 [PATCH] epoll more scalable than poll Hanna Linder
2002-10-28 20:10 ` Hanna Linder
2002-10-28 20:56 ` Martin Waitz
2002-10-28 22:02 ` bert hubert
2002-10-28 22:15 ` bert hubert
2002-10-28 22:17 ` Davide Libenzi
2002-10-28 22:08 ` bert hubert
2002-10-28 22:12 ` [Lse-tech] " Shailabh Nagar
2002-10-28 22:37 ` Davide Libenzi
2002-10-28 22:29 ` Davide Libenzi
2002-10-28 22:58 ` and nicer too - " bert hubert
2002-10-28 23:23 ` Davide Libenzi
2002-10-28 23:45 ` John Gardiner Myers
2002-10-29 0:08 ` Davide Libenzi
2002-10-29 12:59 ` Martin Waitz
2002-10-29 15:19 ` bert hubert
2002-10-29 22:54 ` Martin Waitz
2002-10-30 2:24 ` Davide Libenzi
2002-10-30 19:38 ` Martin Waitz
2002-10-31 5:04 ` Davide Libenzi
2002-10-29 0:18 ` bert hubert
2002-10-29 0:32 ` Davide Libenzi
2002-10-29 0:40 ` bert hubert
2002-10-29 0:57 ` Davide Libenzi
2002-10-29 0:53 ` bert hubert
2002-10-29 1:13 ` Davide Libenzi
2002-10-29 1:08 ` [Lse-tech] " Hanna Linder
2002-10-29 1:39 ` Davide Libenzi
2002-10-29 2:05 ` Jamie Lokier
2002-10-29 2:44 ` Davide Libenzi
2002-10-29 4:01 ` [PATCH] Updated sys_epoll now with man pages Hanna Linder
2002-10-29 5:09 ` Andrew Morton
2002-10-29 5:28 ` [Lse-tech] " Randy.Dunlap
2002-10-29 5:47 ` Davide Libenzi
2002-10-29 5:41 ` Randy.Dunlap
2002-10-29 6:12 ` Davide Libenzi
2002-10-29 6:03 ` Randy.Dunlap
2002-10-29 6:23 ` Davide Libenzi
2002-10-29 14:59 ` Paul Larson
2002-10-29 5:31 ` Davide Libenzi
2002-10-29 7:34 ` Davide Libenzi
2002-10-29 11:04 ` bert hubert
2002-10-29 15:30 ` [Lse-tech] " Shailabh Nagar
2002-10-29 17:45 ` Davide Libenzi
2002-10-29 19:30 ` Hanna Linder
2002-10-29 19:49 ` Davide Libenzi
2002-10-29 13:09 ` and nicer too - Re: [PATCH] epoll more scalable than poll bert hubert
2002-10-29 21:25 ` Davide Libenzi
2002-10-29 21:23 ` Hanna Linder
2002-10-29 21:41 ` Davide Libenzi
2002-10-29 23:06 ` Hanna Linder
2002-10-29 23:14 ` [Lse-tech] " Randy.Dunlap
2002-10-29 23:25 ` Davide Libenzi
2002-10-29 1:47 ` Security critical race condition in epoll code John Gardiner Myers
2002-10-29 2:13 ` Davide Libenzi
2002-10-29 3:38 ` Davide Libenzi
2002-10-29 19:49 ` and nicer too - Re: [PATCH] epoll more scalable than poll John Gardiner Myers
2002-10-29 21:03 ` Davide Libenzi
2002-10-30 0:26 ` Jamie Lokier
2002-10-30 2:09 ` Davide Libenzi
2002-10-30 5:51 ` Davide Libenzi
2002-10-30 2:22 ` John Gardiner Myers
2002-10-30 3:51 ` Davide Libenzi
2002-10-31 2:07 ` John Gardiner Myers
2002-10-31 3:21 ` Davide Libenzi
2002-10-31 11:10 ` [Lse-tech] " Suparna Bhattacharya
2002-10-31 18:42 ` Davide Libenzi
2002-10-30 23:01 ` Jamie Lokier
2002-10-30 23:53 ` Davide Libenzi
2002-10-31 0:52 ` Jamie Lokier
2002-10-31 4:15 ` Davide Libenzi
2002-10-31 15:07 ` Jamie Lokier
2002-10-31 19:10 ` Davide Libenzi
2002-11-01 17:42 ` Dan Kegel
2002-11-01 17:45 ` Davide Libenzi
2002-11-01 18:41 ` Dan Kegel
2002-11-01 19:16 ` Jamie Lokier
2002-11-01 20:04 ` Charlie Krasic
2002-11-01 20:14 ` Jamie Lokier
2002-11-01 20:22 ` Mark Mielke
2002-10-31 15:41 ` Unifying epoll,aio,futexes etc. (What I really want from epoll) Jamie Lokier
2002-10-31 15:48 ` bert hubert
2002-10-31 16:45 ` Alan Cox
2002-10-31 22:00 ` Rusty Russell
2002-11-01 0:32 ` Jamie Lokier
2002-11-01 13:23 ` Alan Cox
2002-10-31 20:28 ` Davide Libenzi [this message]
2002-10-31 23:02 ` Jamie Lokier
2002-11-01 1:01 ` Davide Libenzi
2002-11-01 2:01 ` Jamie Lokier
2002-11-01 17:36 ` Davide Libenzi
2002-11-01 23:27 ` John Gardiner Myers
2002-11-02 4:55 ` Mark Mielke
2002-11-05 18:15 ` pipe POLLOUT oddity John Gardiner Myers
2002-11-05 18:18 ` Benjamin LaHaise
2002-11-02 15:41 ` Unifying epoll,aio,futexes etc. (What I really want from epoll) Jamie Lokier
2002-11-01 20:45 ` Jamie Lokier
2002-11-01 1:55 ` Matthew D. Hall
2002-11-01 2:54 ` Davide Libenzi
2002-11-01 18:18 ` Dan Kegel
2002-11-01 2:56 ` Jamie Lokier
2002-11-01 23:16 ` John Gardiner Myers
2002-11-01 4:29 ` Mark Mielke
2002-11-01 4:59 ` Jamie Lokier
2002-10-30 18:59 ` and nicer too - Re: [PATCH] epoll more scalable than poll Zach Brown
2002-10-30 19:25 ` Davide Libenzi
2002-10-31 16:54 ` Davide Libenzi
2002-10-28 23:44 ` Jamie Lokier
2002-10-29 0:02 ` Davide Libenzi
2002-10-29 1:51 ` Jamie Lokier
2002-10-29 5:06 ` Davide Libenzi
2002-10-29 11:20 ` Jamie Lokier
2002-10-30 0:16 ` Davide Libenzi
2002-10-29 0:03 ` bert hubert
2002-10-29 0:20 ` Davide Libenzi
2002-10-29 0:48 ` Jamie Lokier
2002-10-29 1:53 ` Jamie Lokier
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Pine.LNX.4.44.0210311211160.1562-100000@blue1.dev.mcafeelabs.com \
--to=davidel@xmailserver.org \
--cc=akpm@digeo.com \
--cc=alan@lxorguk.ukuu.org.uk \
--cc=linux-aio@kvack.org \
--cc=linux-kernel@vger.kernel.org \
--cc=lk@tantalophile.demon.co.uk \
--cc=lse-tech@lists.sourceforge.net \
--cc=torvalds@transmeta.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).