* sigopen() vs. /dev/sigtimedwait
@ 2001-08-04  1:32 Dan Kegel
  2001-08-04  1:38 ` Petru Paler
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Kegel @ 2001-08-04  1:32 UTC (permalink / raw)
  To: Christopher Smith, linux-kernel; +Cc: Michael Elkins, Zach Brown

So I've been thinking about the sigopen() system call I proposed.
(To recap: sigopen() would let you use read() instead of sigwaitinfo()
 to retrieve lots of realtime signals at one go, AND would
 protect your signal from being swiped by hostile code elsewhere
 in the application, a la Sun's JDK.)

Upon further consideration, maybe I should model it after
/dev/epoll.  That would get rid of nagging questions like
"but read() can't leave holes the way sigtimedwait() could",
and would perform even better than read()
(see the graphs at http://www.xmailserver.org/linux-patches/nio-improve.html).

So I'm proposing the following user story:

  // open a fd linked to signal mysignum
  int fd = open("/dev/sigtimedwait", O_RDWR);
  int sigs[1]; sigs[0] = mysignum;
  write(fd, sigs, sizeof(sigs[0]));

  // memory map a result buffer
  struct siginfo_t *map = mmap(NULL, mapsize, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);

  for (;;) {
      // grab recent siginfo_t's
      struct devsiginfo dsi;
      dsi.dsi_nsis = 1000;
      dsi.dsi_sis = NULL;      // NULL means "use map instead of buffer"
      dsi.dsi_timeout = 1;
      int nsis = ioctl(fd, DS_SIGTIMEDWAIT, &dsi);

      // use 'em.  Some might be completion notifications; some might be readiness notifications.
      for (i=0; i<nsis; i++)
          handle_siginfo(map+i);
  }

Sure, the interface is crap, but it's fast, and at least it doesn't
add any syscalls (the sigopen() proposal required two new syscalls: sigopen()
and timedread()).

Comments?

BTW I'm halfway thru "Understanding the Linux Kernel" and it's
a very good read (modulo some strange lingo, e.g. "cycle" for "loop"
and "table" for "record" or "struct").
So since I only halfway understand the linux kernel, the above proposal
may be half baked.
- Dan

-- 
"I have seen the future, and it licks itself clean." -- Bucky Katt


* Re: sigopen() vs. /dev/sigtimedwait
  2001-08-04  1:32 sigopen() vs. /dev/sigtimedwait Dan Kegel
@ 2001-08-04  1:38 ` Petru Paler
  2001-08-04  2:10   ` Dan Kegel
  0 siblings, 1 reply; 8+ messages in thread
From: Petru Paler @ 2001-08-04  1:38 UTC (permalink / raw)
  To: Dan Kegel; +Cc: Christopher Smith, linux-kernel, Michael Elkins, Zach Brown

On Fri, Aug 03, 2001 at 06:32:52PM -0700, Dan Kegel wrote:
> So I'm proposing the following user story:
> 
>   // open a fd linked to signal mysignum
>   int fd = open("/dev/sigtimedwait", O_RDWR);
>   int sigs[1]; sigs[0] = mysignum;
>   write(fd, sigs, sizeof(sigs[0]));
> 
>   // memory map a result buffer
>   struct siginfo_t *map = mmap(NULL, mapsize, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
> 
>   for (;;) {
>       // grab recent siginfo_t's
>       struct devsiginfo dsi;
>       dsi.dsi_nsis = 1000;
>       dsi.dsi_sis = NULL;      // NULL means "use map instead of buffer"
>       dsi.dsi_timeout = 1;
>       int nsis = ioctl(fd, DS_SIGTIMEDWAIT, &dsi);
> 
>       // use 'em.  Some might be completion notifications; some might be readiness notifications.
>       for (i=0; i<nsis; i++)
>           handle_siginfo(map+i);
>   }

And the advantage of this over /dev/epoll would be that you don't have to
explicitly add/remove fd's?

I ask because yesterday I used /dev/epoll in a project and it behaves *very*
well, so I'm wondering what advantages your interface would bring.

> Comments?

How do you handle signal queue overflow? signal-per-fd helps, but you still
have to make the queue as big as the maximum number of fds...

Petru


* Re: sigopen() vs. /dev/sigtimedwait
  2001-08-04  1:38 ` Petru Paler
@ 2001-08-04  2:10   ` Dan Kegel
  2001-08-04  3:04     ` Could /dev/epoll deliver aio completion notifications? (was: Re: sigopen() vs. /dev/sigtimedwait) Dan Kegel
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Kegel @ 2001-08-04  2:10 UTC (permalink / raw)
  To: Petru Paler; +Cc: Christopher Smith, linux-kernel, Zach Brown

Petru Paler wrote:
> 
> On Fri, Aug 03, 2001 at 06:32:52PM -0700, Dan Kegel wrote:
> > So I'm proposing the following user story:
> >
> >   // open a fd linked to signal mysignum
> >   int fd = open("/dev/sigtimedwait", O_RDWR);
> >   int sigs[1]; sigs[0] = mysignum;
> >   write(fd, sigs, sizeof(sigs[0]));
> >
> >   // memory map a result buffer
> >   struct siginfo_t *map = mmap(NULL, mapsize, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
> >
> >   for (;;) {
> >       // grab recent siginfo_t's
> >       struct devsiginfo dsi;
> >       dsi.dsi_nsis = 1000;
> >       dsi.dsi_sis = NULL;      // NULL means "use map instead of buffer"
> >       dsi.dsi_timeout = 1;
> >       int nsis = ioctl(fd, DS_SIGTIMEDWAIT, &dsi);
> >
> >       // use 'em.  Some might be completion notifications; some might be readiness notifications.
> >       for (i=0; i<nsis; i++)
> >           handle_siginfo(map+i);
> >   }
> 
> And the advantage of this over /dev/epoll would be that you don't have to
> explicitly add/remove fd's?

The advantage is that it can be used to collect
completion notifications for aio.  (It can also be
used to collect readiness notifications via either
Linux's traditional rtsig stuff or the signal-per-fd stuff,
so this unifies readiness notification and completion notification,
in case you happen to want to use both in the same thread.)
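
For what it's worth, both kinds of notification can already be pointed
at one realtime signal today.  Here's a rough sketch (untested, error
handling omitted, and the signal number and function names are my own
arbitrary choices) of that setup; /dev/sigtimedwait would just be a
cheaper way to drain the shared queue:

  #include <aio.h>
  #include <fcntl.h>
  #include <signal.h>
  #include <unistd.h>

  #define MYSIG (SIGRTMIN + 3)   // arbitrary realtime signal, illustration only

  // readiness notifications: edge events on sockfd arrive as MYSIG
  void arm_readiness(int sockfd)
  {
      fcntl(sockfd, F_SETOWN, getpid());
      fcntl(sockfd, F_SETSIG, MYSIG);
      fcntl(sockfd, F_SETFL, fcntl(sockfd, F_GETFL) | O_ASYNC | O_NONBLOCK);
  }

  // completion notifications: when the aio_read() finishes, MYSIG fires
  // with the aiocb pointer in si_value, so both kinds of event land in
  // the same signal queue.  cb is assumed zeroed by the caller.
  void start_read(int diskfd, struct aiocb *cb, void *buf, size_t len)
  {
      cb->aio_fildes = diskfd;
      cb->aio_buf = buf;
      cb->aio_nbytes = len;
      cb->aio_offset = 0;
      cb->aio_sigevent.sigev_notify = SIGEV_SIGNAL;
      cb->aio_sigevent.sigev_signo = MYSIG;
      cb->aio_sigevent.sigev_value.sival_ptr = cb;
      aio_read(cb);
  }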
 
> I ask because yesterday I used /dev/epoll in a project and it behaves *very*
> well, so I'm wondering what advantages your interface would bring.

I am a huge fan of /dev/epoll and would like to see it integrated
into the ac series.  /dev/epoll doesn't address the needs of those
who are doing aio, though.
 
> How do you handle signal queue overflow? signal-per-fd helps, but you still
> have to have the queue as big as the maximum number of fds is...

I am not addressing that issue.  However, when doing aio, the
application can simply avoid issuing more than N I/O operations,
where N is comfortably lower than the current size of the signal queue.
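
A rough sketch of that throttle (untested; MAX_INFLIGHT is a number
I'm making up, to be picked comfortably below the rtsig queue limit):

  #include <aio.h>

  #define MAX_INFLIGHT 512        // made-up cap, below the signal queue size

  static int inflight;            // completions still outstanding

  // refuse to submit once MAX_INFLIGHT requests are in flight, so
  // completion signals can never outrun the realtime signal queue
  int submit_read(struct aiocb *cb)
  {
      if (inflight >= MAX_INFLIGHT)
          return -1;              // caller parks the request in user space
      if (aio_read(cb) < 0)
          return -1;
      inflight++;
      return 0;
  }

  void on_completion(struct aiocb *cb)
  {
      aio_return(cb);             // reap the result
      inflight--;                 // frees a slot for the next submission
  }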

When I finally get around to reading the kernel source, maybe I'll
have a look at what large signal queues cost.

- Dan

-- 
"I have seen the future, and it licks itself clean." -- Bucky Katt


* Could /dev/epoll deliver aio completion notifications? (was: Re:  sigopen() vs. /dev/sigtimedwait)
  2001-08-04  2:10   ` Dan Kegel
@ 2001-08-04  3:04     ` Dan Kegel
  2001-08-04  5:18       ` Zach Brown
  2001-08-23 21:17       ` Could /dev/epoll deliver aio completion notifications? (was: Davide Libenzi
  0 siblings, 2 replies; 8+ messages in thread
From: Dan Kegel @ 2001-08-04  3:04 UTC (permalink / raw)
  To: Petru Paler, Christopher Smith, linux-kernel, Zach Brown, Davide Libenzi

Dan Kegel wrote:
> Petru Paler wrote:
> > And the advantage of this over /dev/epoll would be that you don't have to
> > explicitly add/remove fd's?
> 
> The advantage is that it can be used to collect
> completion notifications for aio.  (It can also be
> used to collect readiness notification via either
> linux's traditional rtsig stuff, or the signal-per-fd stuff,
> so this unifies readiness notification and completion notification,
> in case you happen to want to use both in the same thread.)
> 
> > I ask because yesterday I used /dev/epoll in a project and it behaves *very*
> > well, so I'm wondering what advantages your interface would bring.
> 
> I am a huge fan of /dev/epoll and would like to see it integrated
> into the ac series.  /dev/epoll doesn't address the needs of those
> who are doing aio, though.

On the other hand, if /dev/epoll were flexible enough that it could
deliver AIO completion notifications, then /dev/sigtimedwait
would not be needed.  For instance:

// extend bits/poll.h
#define POLLAIO 0x800   // aio completion event; pollfd.fd contains aiocb *

      // open /dev/epoll and set up map as usual
      kdpfd = open("/dev/epoll", O_RDWR);
      char *map = mmap(NULL, mapsize, PROT_READ | PROT_WRITE, MAP_PRIVATE, kdpfd, 0);

      // tell our /dev/epoll fd that we're interested in events on fd diskfd
      struct pollfd pfd;
      pfd.fd = diskfd;
      pfd.events = POLLAIO;
      pfd.revents = 0;
      write(kdpfd, &pfd, sizeof(pfd));

      // set up an asynchronous read from 'diskfd'
      struct aiocb *r = malloc(sizeof(*r));
      r->aio_fildes = diskfd;
      r->aio_... = ...
      // when read is finished, have it notify the /dev/epoll device 
      // interested in diskfd rather than sending a signal
      r->aio_sigevent.sigev_notify = SIGEV_NONE;
      aio_read(r);
      ...

      // Pick up events
      for (;;) {
          struct devpoll dvp;
          dvp.dp_nfds = 1000;
          dvp.dp_fds = NULL;      // NULL means "use map instead of buffer"
          dvp.dp_timeout = 1;
          int nevents = ioctl(kdpfd, DP_POLL, &dvp);
          struct pollfd *result = (struct pollfd *) (map + dvp.result_offset);

          // use 'em.  Some might be aio completion notifications; 
          // some might be traditional poll notifications
          // (and if this is AIX, some might be sysv message queue notifications!)
          for (i=0; i<nevents; i++)
             if (result[i].revents & POLLAIO)
                  handle_aio_completion((struct aiocb *)result[i].fd);
             else 
                  handle_readiness(&result[i]);
      }

Davide, is that along the lines of what you were thinking of
for /dev/epoll and disk files?   (Plain old polling of disk
files doesn't make much sense unless you're just interested in
them growing, I suppose; aio completion notification is what you 
really want.)

- Dan

-- 
"I have seen the future, and it licks itself clean." -- Bucky Katt


* Re: Could /dev/epoll deliver aio completion notifications? (was: Re: sigopen() vs. /dev/sigtimedwait)
  2001-08-04  3:04     ` Could /dev/epoll deliver aio completion notifications? (was: Re: sigopen() vs. /dev/sigtimedwait) Dan Kegel
@ 2001-08-04  5:18       ` Zach Brown
  2001-08-04  6:27         ` Dan Kegel
  2001-08-23 21:17       ` Could /dev/epoll deliver aio completion notifications? (was: Davide Libenzi
  1 sibling, 1 reply; 8+ messages in thread
From: Zach Brown @ 2001-08-04  5:18 UTC (permalink / raw)
  To: Dan Kegel; +Cc: Petru Paler, Christopher Smith, linux-kernel, Davide Libenzi

> On the other hand, if /dev/epoll were flexible enough that it could
> deliver AIO completion notifications, 

As far as I know, Ben LaHaise (bcrl@redhat.com) already has a fine
method conceived for receiving batches of async completion, including an
"async poll".  It should give the sort of behaviour you want and is also
useful for other AIO things, obviously :)

You should really chat with him.

- z


* Re: Could /dev/epoll deliver aio completion notifications? (was: Re:  sigopen() vs. /dev/sigtimedwait)
  2001-08-04  5:18       ` Zach Brown
@ 2001-08-04  6:27         ` Dan Kegel
  0 siblings, 0 replies; 8+ messages in thread
From: Dan Kegel @ 2001-08-04  6:27 UTC (permalink / raw)
  To: linux-kernel, Davide Libenzi; +Cc: Christopher Smith, Zach Brown

Zach Brown wrote:
> 
> > On the other hand, if /dev/epoll were flexible enough that it could
> > deliver AIO completion notifications,
> 
> As far as I know, Ben LaHaise (bcrl@redhat.com) already has a fine
> method conceived for receiving batches of async completion, including an
> "async poll".  It should give the sort of behaviour you want and is also
> useful for other AIO things, obviously :)
> 
> You should really chat with him.

I suppose it shouldn't surprise me that real kernel hackers like Ben
just work on cool things quietly, and buffoons like me get excited
at the merest thought and have to broadcast them on l-k.  Sigh.

A little digging finds a few references to Ben's aio work:

http://uwsg.iu.edu/hypermail/linux/kernel/0102.0/0384.html
http://www.kvack.org/~blah/aio/v2.4.5-ac9-bcrl4-aio.diff

His patch creates /dev/aio, among other things, and includes
the wonderful excerpt

diff -urN /md0/kernels/2.4/v2.4.5-ac9/include/asm-i386/errno.h aio-2.4.5-ac9/include/asm-i386/errno.h
--- /md0/kernels/2.4/v2.4.5-ac9/include/asm-i386/errno.h        Mon Feb 26 10:20:14 2001
+++ aio-2.4.5-ac9/include/asm-i386/errno.h      Wed Jun 13 17:08:50 2001
@@ -128,5 +128,6 @@
 
 #define        ENOMEDIUM       123     /* No medium found */
 #define        EMEDIUMTYPE     124     /* Wrong medium type */
+#define        ENOCLUE         125     /* userland programmer induced race condition */

:-)

Given how occupied Ben is with urgent matters like tracking down the
VM suckage, I don't expect he has much time for chatting about this stuff.

OK, guess I'll content myself with working through Ben's (and Davide's)
code and understanding it.  Sorry for all the line noise, folks.
- Dan

-- 
"I have seen the future, and it licks itself clean." -- Bucky Katt


* RE: Could /dev/epoll deliver aio completion notifications? (was:
  2001-08-04  3:04     ` Could /dev/epoll deliver aio completion notifications? (was: Re: sigopen() vs. /dev/sigtimedwait) Dan Kegel
  2001-08-04  5:18       ` Zach Brown
@ 2001-08-23 21:17       ` Davide Libenzi
  1 sibling, 0 replies; 8+ messages in thread
From: Davide Libenzi @ 2001-08-23 21:17 UTC (permalink / raw)
  To: Dan Kegel
  Cc: Zach Brown, Zach Brown, linux-kernel, Christopher Smith, Petru Paler


On 04-Aug-2001 Dan Kegel wrote:
> Davide, is that along the lines of what you were thinking of
> for /dev/epoll and disk files?   (Plain old polling of disk
> files doesn't make much sense unless you're just interested in
> them growing, I suppose; aio completion notification is what you 
> really want.)

Dan, sorry for the response delay, but I was in Vacation Mode (a new CPU execution mode
that will be included in the next x86 generation).
I trashed my original idea of extending /dev/epoll to other kinds of files because it would
make the patch way more intrusive.  The only easy extension that comes to mind is pipes.
As soon as I finish reading the remaining 4231 messages in my mbox, I'll fix /dev/epoll
to get rid of the stale events that I (and Erich Nahum) noticed in the current implementation.



- Davide



* Re: sigopen() vs. /dev/sigtimedwait
@ 2001-08-07 14:20 Erich Nahum
  0 siblings, 0 replies; 8+ messages in thread
From: Erich Nahum @ 2001-08-07 14:20 UTC (permalink / raw)
  To: linux-kernel


Abhishek Chandra and I are benchmarking /dev/epoll vs. RT signals
with signal-per-FD, and we wanted to chip in some thoughts along
these lines.

First of all, Davide Libenzi's /dev/epoll does not have the same
semantics as the original /dev/poll that Sun did.  Select, poll,
and the original /dev/poll are all state-based mechanisms, whereas 
/dev/epoll and RT signals are event-based mechanisms.  In the state-based
approach, the application can ask the kernel which file descriptors
are ready for reading or writing.  In the event-based approach, the
kernel notifies the application when something changes.  This has
serious implications for how one develops the server;  in the 
event-based case, the server has to keep track of the state of the 
connections more carefully.  For more discussion of event-based vs. 
state-based, see the original Banga/Druschel/Mogul work via Dan 
Kegel's c10k page (http://www.kegel.com/c10k.html).
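
To make the bookkeeping concrete, here is a rough sketch (untested, and
the names are ours, not part of either interface) of the per-connection
state an event-based server ends up carrying, since the kernel only
reports changes and the application must remember the readiness it has
already been told about:

  #include <sys/poll.h>

  /* made-up per-connection record */
  struct connection {
      int fd;
      int readable;      /* got a read event, socket not yet drained */
      int writable;      /* got a write event, socket not yet filled */
      /* ... protocol state, buffers, etc. ... */
  };

  void on_event(struct connection *c, int revents)
  {
      if (revents & POLLIN)
          c->readable = 1;
      if (revents & POLLOUT)
          c->writable = 1;
      /* the main loop then read()s/write()s until EAGAIN, clears the
         corresponding flag, and only afterwards waits for the next
         event on this fd */
  }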

Both /dev/epoll and RT signals with sig-per-fd have the property that
the event queue never overflows, since events are coalesced on a per-fd
basis, assuming the user-space server isn't broken.  If the server 
underestimates the max number of file descriptors it can use, the
event queue can overflow in either scenario.

Event-based interfaces have some conditions that the server developer 
has to be aware of.  For example, when a server writes to a
socket for the first time, /dev/poll will tell you the socket is ready,
whereas no event will show up on /dev/epoll, since the socket write 
state hasn't changed.  If you naively wait for a write event to happen 
(as we did before we realized this), you'll wait a long time.  
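
The pattern we ended up with looks roughly like the sketch below
(untested): try the write first, and only wait for a write event once
the socket has pushed back with EAGAIN.

  #include <errno.h>
  #include <unistd.h>

  /* returns bytes written, 0 if the caller should now wait for a
     write event, or -1 on a real error */
  ssize_t send_some(int fd, const char *buf, size_t len, int *want_write_event)
  {
      ssize_t n = write(fd, buf, len);
      if (n >= 0)
          return n;                   /* made progress; keep writing */
      if (errno == EAGAIN) {
          *want_write_event = 1;      /* only now is waiting worthwhile */
          return 0;
      }
      return -1;
  }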

Some race conditions can also occur.  One is when the data arrives
on the socket after the accept but before the kernel is notified via
/dev/epoll, thus never generating an event.  Another involves getting 
stray events after the fd is closed (soon to be fixed according to 
Davide Libenzi).  A third is when you have simultaneous reads and writes 
going on a socket, as happens with HTTP 1.1.
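
For the first race, a workaround is sketched below (untested;
handle_readable() is a placeholder for the server's normal read path):
register the new fd, then immediately attempt a non-blocking read, so
data that slipped in between accept() and registration isn't silently
missed.

  #include <unistd.h>
  #include <sys/poll.h>

  void handle_readable(int fd);         /* the server's read path (placeholder) */

  void add_connection(int epfd, int newfd)
  {
      struct pollfd pfd;

      pfd.fd = newfd;
      pfd.events = POLLIN | POLLOUT;
      pfd.revents = 0;
      write(epfd, &pfd, sizeof(pfd));   /* interest registered from here on */

      /* drain anything that raced in before registration; harmless if
         the socket is still empty, since the read just returns EAGAIN */
      handle_readable(newfd);
  }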

As far as we can tell, /dev/epoll has the same semantics as
RT signals with signal-per-fd.  The differences are in the interfaces,
which may have some performance implications.  For example,
/dev/epoll can return batches of events through a single read on
/dev/epoll, whereas RT signals are retrieved one at a time through
sigtimedwait().  On the other hand, with RT signals the application
doesn't have to explicitly notify the kernel of each new or closed
connection the way it does with /dev/epoll.  Instead, registration
happens implicitly through the setsockopt/fcntl calls that make the
socket asynchronous/non-blocking, which the server has to do anyway.
As I mentioned earlier, we're benchmarking these to see what the 
performance difference is, if any.  Davide Libenzi is also pursuing
this comparison.
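
For concreteness, the RT-signal side looks roughly like the sketch
below (untested; handle_event() is a placeholder): the fcntl() calls
are the implicit registration, and sigtimedwait() then hands back one
queued siginfo per call.

  #include <fcntl.h>
  #include <signal.h>
  #include <time.h>
  #include <unistd.h>

  #define IOSIG (SIGRTMIN + 1)               /* arbitrary signal number */

  void handle_event(int fd, long band);      /* the server's dispatcher (placeholder) */

  void register_socket(int fd)
  {
      fcntl(fd, F_SETOWN, getpid());
      fcntl(fd, F_SETSIG, IOSIG);            /* queue one realtime signal per event */
      fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_ASYNC | O_NONBLOCK);
  }

  void event_loop(void)
  {
      sigset_t set;
      siginfo_t si;
      struct timespec ts = { 1, 0 };

      sigemptyset(&set);
      sigaddset(&set, IOSIG);
      sigprocmask(SIG_BLOCK, &set, NULL);    /* block it so events queue up */

      for (;;) {
          if (sigtimedwait(&set, &si, &ts) < 0)
              continue;                      /* timeout or EINTR */
          handle_event(si.si_fd, si.si_band); /* one event per call */
      }
  }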

So far Abhishek and I haven't looked at Ben LaHaise's async I/O interface,
but it's on our schedule.

-Erich

-- 
Erich M. Nahum                  IBM T.J. Watson Research Center
Networking Research             P.O. Box 704
nahum@watson.ibm.com            Yorktown Heights NY 10598

