linux-kernel.vger.kernel.org archive mirror
From: Davide Libenzi <davidel@xmailserver.org>
To: Jamie Lokier <lk@tantalophile.demon.co.uk>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	<linux-aio@kvack.org>, <lse-tech@lists.sourceforge.net>,
	Linus Torvalds <torvalds@transmeta.com>,
	Andrew Morton <akpm@digeo.com>,
	Alan Cox <alan@lxorguk.ukuu.org.uk>
Subject: Re: Unifying epoll,aio,futexes etc. (What I really want from epoll)
Date: Thu, 31 Oct 2002 12:28:11 -0800 (PST)
Message-ID: <Pine.LNX.4.44.0210311211160.1562-100000@blue1.dev.mcafeelabs.com>
In-Reply-To: <20021031154112.GB27801@bjl1.asuk.net>

On Thu, 31 Oct 2002, Jamie Lokier wrote:

> ps. I thought I should explain what bothers me most about epoll at the
> moment.  It's good at what it does, but it's so very limited in what
> it supports.
>
> I have a high performance server application in mind, that epoll is
> _almost_ perfect for but not quite.
>
> Davide, you like coroutines, so perhaps you will appreciate a web
> server that serves a mixture of dynamic and static content, using
> coroutines and user+kernel threading in a carefully balanced way.
> Dynamic content is cached, accurately (taking advantage of nanosecond
> mtimes if possible), yet served as fast as static pages (using a
> clever cache validation method), and is built from files (read using
> aio to improve throughput) and subrequests to other servers just like
> a proxy.  Data is served zero-copy using sendfile and /dev/shm.
>
> A top quality server like that, optimised for performance, has to
> respond to these events:
>
> 	- network accept()
> 	- read/write/exception on sockets and pipes
> 	- timers
> 	- aio
> 	- futexes
> 	- dnotify events
>
> See how epoll only helps with the first two?  And this is the very
> application space that epoll could _almost_ be perfect for.
>
> Btw, it doesn't _have_ to be a web server.  Enterprise scale Java
> runtimes, database servers, spider clients, network load generators,
> proxies, even humble X servers - also have very similar requirements.
>
> There are several scalable and fast event queuing mechanisms in the
> kernel now: rt-signals, aio and epoll, yet each of them is limited by
> only keeping track of a few kinds of possible event.
>
> Technically, it's possible to use them all together.  If you want to
> react to all the kinds of events I listed above, you have to.  But
> it's mighty ugly code to use them all at once, and it's certainly not
> the "lean and mean" event loop that everyone aspires to.
>
> By adding yet another mechanism without solving the general problem,
> epoll just makes the mighty ugly userspace more ugly.  (But it's
> probably worth using - socket notification through rt-signals has its
> own problems).
>
> I would very much like to see a general solution to the problem of all
> different kinds of events being queued to userspace efficiently,
> through one mechanism ("to bind them all...").  Every piece of this puzzle
> has been written already, they're just not joined up very well.
>
> I'm giving this serious thought now, if anyone wants to offer input.

Jamie, the fact that epoll supports a limited number of "objects" was by
design at that time. I see it as quite easy to extend it to support
other objects. Futexes are a matter of one line of code:

/* Waiter either waiting in FUTEX_WAIT or poll(), or expecting signal */
static inline void tell_waiter(struct futex_q *q)
{
        wake_up_all(&q->waiters);
        if (q->filp) {
                send_sigio(&q->filp->f_owner, q->fd, POLL_IN);
+               file_notify_send(q->filp, ION_IN, POLLIN | POLLRDNORM);
        }
}

Timers, as long as you access them through a file* interface (like futexes),
will become trivial too. Another line should be sufficient for dnotify:

void __inode_dir_notify(struct inode *inode, unsigned long event)
{
        struct dnotify_struct * dn;
        struct dnotify_struct **prev;
        struct fown_struct *    fown;
        int                     changed = 0;

        write_lock(&dn_lock);
        prev = &inode->i_dnotify;
        while ((dn = *prev) != NULL) {
                if ((dn->dn_mask & event) == 0) {
                        prev = &dn->dn_next;
                        continue;
                }
                fown = &dn->dn_filp->f_owner;
                send_sigio(fown, dn->dn_fd, POLL_MSG);
+               file_notify_send(dn->dn_filp, ION_IN, POLLIN | POLLRDNORM | POLLMSG);
                if (dn->dn_mask & DN_MULTISHOT)
                        prev = &dn->dn_next;
                else {
                        *prev = dn->dn_next;
                        changed = 1;
                        kmem_cache_free(dn_cache, dn);
                }
        }
        if (changed)
                redo_inode_mask(inode);
        write_unlock(&dn_lock);
}

This is the result of a quite quick analysis, but I do not expect it to be
much more difficult than that.




- Davide



