linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* epoll vs stdin/stdout
@ 2003-07-07 15:48 Eric Varsanyi
  2003-07-07 18:57 ` Davide Libenzi
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Varsanyi @ 2003-07-07 15:48 UTC (permalink / raw)
  To: linux-kernel; +Cc: davidel

It seems reasonable to register for read events on stdin and write events
on stdout.  In an earlier posting on the epoll API it was asserted that 
anyone registering for events on 2 fd's that shared the same file * was 
asking for trouble.

I can imagine many apps that might want to proxy async traffic thru
stdin/stdout, what is the intended general solution for this with epoll?

FWIW in my app I'm just assuming that fd0 is a dup of fd1 if EPOLL_CTL_ADD 
on fd1 fails with EEXISTS, then I EPOLL_CTL_MOD on fd0 to add the write event.
This seems like a bit of a hack tho.

Example, 2nd ADD fails with EEXIST:

int
main(int ac, char **av)
{
        int pfd;
        struct epoll_event ev;

        pfd = epoll_create(10);
        ev.events = EPOLLIN | EPOLLET;
        ev.data.u32 = 0xdeadbee0;
        if (epoll_ctl(pfd, EPOLL_CTL_ADD, 0, &ev) < 0) {
                perror("EPOLL_CTL_ADD stdin");
                exit(1);
        }
        ev.events = EPOLLOUT | EPOLLET;
        ev.data.u32 = 0xdeadbee1;
        if (epoll_ctl(pfd, EPOLL_CTL_ADD, 1, &ev) < 0) {
                perror("EPOLL_CTL_ADD stdout");
                exit(1);
        }
}

-Eric Varsanyi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: epoll vs stdin/stdout
  2003-07-07 15:48 epoll vs stdin/stdout Eric Varsanyi
@ 2003-07-07 18:57 ` Davide Libenzi
  2003-07-07 19:47   ` Eric Varsanyi
  0 siblings, 1 reply; 23+ messages in thread
From: Davide Libenzi @ 2003-07-07 18:57 UTC (permalink / raw)
  To: Eric Varsanyi; +Cc: Linux Kernel Mailing List

On Mon, 7 Jul 2003, Eric Varsanyi wrote:

> It seems reasonable to register for read events on stdin and write events
> on stdout.  In an earlier posting on the epoll API it was asserted that
> anyone registering for events on 2 fd's that shared the same file * was
> asking for trouble.
>
> I can imagine many apps that might want to proxy async traffic thru
> stdin/stdout, what is the intended general solution for this with epoll?
>
> FWIW in my app I'm just assuming that fd0 is a dup of fd1 if EPOLL_CTL_ADD
> on fd1 fails with EEXISTS, then I EPOLL_CTL_MOD on fd0 to add the write event.
> This seems like a bit of a hack tho.

Events caught by epoll comes from a file* since that is the abstraction
the kernel handles. Events really happen on the file* and there's no way
if you dup()ing 1000 times a single fd, to say that events are for fd = 122.



- Davide


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: epoll vs stdin/stdout
  2003-07-07 18:57 ` Davide Libenzi
@ 2003-07-07 19:47   ` Eric Varsanyi
  2003-07-07 20:03     ` Jamie Lokier
  2003-07-07 22:12     ` Davide Libenzi
  0 siblings, 2 replies; 23+ messages in thread
From: Eric Varsanyi @ 2003-07-07 19:47 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Linux Kernel Mailing List

On Mon, Jul 07, 2003 at 11:57:02AM -0700, Davide Libenzi wrote:
> Events caught by epoll comes from a file* since that is the abstraction
> the kernel handles. Events really happen on the file* and there's no way
> if you dup()ing 1000 times a single fd, to say that events are for fd = 122.

It is useful/mildly common at the app level to want to get read events
for fd0 and write avail events for fd1. An app that might want to deal
with reads from stdin in one process and writes to stdout in another
(something like "team" perhaps) would have trouble here too.

Epoll's API/impl is great as it is IMO, not suggesting need for change, I was
hoping there was a good standard trick someone worked up to get around
this specifc end case of stdin/stdout usually being dups but sometimes
not. Porting my event system over to use epoll was easy/straightforward
except for this one minor hitch.

I considered:
	- using a second epoll object just for one of the fd's (to inspire
	  delivery of the event to 2 wait queues in the kernel); a little
	  ugly because of need to stack another epfd under the main one
	  just for stdout write events

	- select() on (0, 1, epfd) and just use epoll with a timeout of 0
	  when select fires to gather bulk of events; morally similar to 
	  previous but using select (which I want to just get away from)

	- make the app use stdin as its output (this is what I ended up doing);
	  breaks redirection of stdout but that doesn't matter to this app

Thanks,
-Eric Varsanyi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: epoll vs stdin/stdout
  2003-07-07 19:47   ` Eric Varsanyi
@ 2003-07-07 20:03     ` Jamie Lokier
  2003-07-07 20:18       ` Miquel van Smoorenburg
  2003-07-07 22:11       ` Davide Libenzi
  2003-07-07 22:12     ` Davide Libenzi
  1 sibling, 2 replies; 23+ messages in thread
From: Jamie Lokier @ 2003-07-07 20:03 UTC (permalink / raw)
  To: Eric Varsanyi; +Cc: Davide Libenzi, Linux Kernel Mailing List

Eric Varsanyi wrote:
> Epoll's API/impl is great as it is IMO, not suggesting need for change, I was
> hoping there was a good standard trick someone worked up to get around
> this specifc end case of stdin/stdout usually being dups but sometimes
> not. Porting my event system over to use epoll was easy/straightforward
> except for this one minor hitch.

Easy: if it's a read event, it's stdin; if it's a write event, it's stdout :)

You've raised an interesting problem.  It is easy to fix this in the
specific case of stdin/stdout, however what happens when your process
is passed a pair of fds from some other process (or more than one
process, using AF_UNIX), and told to read one and write the other?
What happens when you have 10 fds from different sources, some for
reading and some for writing (quite typical in a complex server)?

With the epoll API, your process has to know whether any paids or fds
correspond to the same file *, in order to decide whether to register
one interested in READ+WRITE or two interests separately.

Unfortunately I cannot think of a way for a process to know, in
general, whether two fds that it is passed correspond to the same file
*.  Well, apart from trying epoll on it and seeing what happens :/

Perhaps this indicates the epoll API _is_ flawed.  Epoll maintains
this state mapping:

	file * -> (event mask, event states)

when it ought to maintain this:

	(file *, event type) -> event state

In other words, perhaps epoll should be keeping registered interest in
read events and registered interest in write events completely
separate.

I suspect changing the API to do that wouldn't even break any of the
existing apps.

Davide, what do you think?

-- Jamie

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: epoll vs stdin/stdout
  2003-07-07 20:03     ` Jamie Lokier
@ 2003-07-07 20:18       ` Miquel van Smoorenburg
  2003-07-07 21:20         ` H. Peter Anvin
  2003-07-07 22:11       ` Davide Libenzi
  1 sibling, 1 reply; 23+ messages in thread
From: Miquel van Smoorenburg @ 2003-07-07 20:18 UTC (permalink / raw)
  To: linux-kernel

In article <20030707200315.GA10939@mail.jlokier.co.uk>,
Jamie Lokier  <jamie@shareable.org> wrote:
>Unfortunately I cannot think of a way for a process to know, in
>general, whether two fds that it is passed correspond to the same file
>*.  Well, apart from trying epoll on it and seeing what happens :/

fstat() and compare st_dev and st_ino

Mike.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: epoll vs stdin/stdout
  2003-07-07 20:18       ` Miquel van Smoorenburg
@ 2003-07-07 21:20         ` H. Peter Anvin
  0 siblings, 0 replies; 23+ messages in thread
From: H. Peter Anvin @ 2003-07-07 21:20 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <beckj5$e2a$1@news.cistron.nl>
By author:    "Miquel van Smoorenburg" <miquels@cistron.nl>
In newsgroup: linux.dev.kernel
>
> In article <20030707200315.GA10939@mail.jlokier.co.uk>,
> Jamie Lokier  <jamie@shareable.org> wrote:
> >Unfortunately I cannot think of a way for a process to know, in
> >general, whether two fds that it is passed correspond to the same file
> >*.  Well, apart from trying epoll on it and seeing what happens :/
> 
> fstat() and compare st_dev and st_ino
> 

That doesn't show if they share the same file *.  Two fd's can be open
to the same file or other object without sharing the same file *.

Seriously, this is not something userspace should have to worry
about.  The interface should handle it, otherwise it's broken.

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: epoll vs stdin/stdout
  2003-07-07 20:03     ` Jamie Lokier
  2003-07-07 20:18       ` Miquel van Smoorenburg
@ 2003-07-07 22:11       ` Davide Libenzi
  2003-07-08  0:24         ` Jamie Lokier
  1 sibling, 1 reply; 23+ messages in thread
From: Davide Libenzi @ 2003-07-07 22:11 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Eric Varsanyi, Linux Kernel Mailing List

On Mon, 7 Jul 2003, Jamie Lokier wrote:

> Eric Varsanyi wrote:
> > Epoll's API/impl is great as it is IMO, not suggesting need for change, I was
> > hoping there was a good standard trick someone worked up to get around
> > this specifc end case of stdin/stdout usually being dups but sometimes
> > not. Porting my event system over to use epoll was easy/straightforward
> > except for this one minor hitch.
>
> Easy: if it's a read event, it's stdin; if it's a write event, it's stdout :)
>
> You've raised an interesting problem.  It is easy to fix this in the
> specific case of stdin/stdout, however what happens when your process
> is passed a pair of fds from some other process (or more than one
> process, using AF_UNIX), and told to read one and write the other?
> What happens when you have 10 fds from different sources, some for
> reading and some for writing (quite typical in a complex server)?
>
> With the epoll API, your process has to know whether any paids or fds
> correspond to the same file *, in order to decide whether to register
> one interested in READ+WRITE or two interests separately.
>
> Unfortunately I cannot think of a way for a process to know, in
> general, whether two fds that it is passed correspond to the same file
> *.  Well, apart from trying epoll on it and seeing what happens :/
>
> Perhaps this indicates the epoll API _is_ flawed.  Epoll maintains
> this state mapping:
>
> 	file * -> (event mask, event states)
>
> when it ought to maintain this:
>
> 	(file *, event type) -> event state
>
> In other words, perhaps epoll should be keeping registered interest in
> read events and registered interest in write events completely
> separate.

It has to keep (file*, fd) as hashing key. That will work out just fine.


> I suspect changing the API to do that wouldn't even break any of the
> existing apps.
>
> Davide, what do you think?

Not even thinking changing the API since it'll break existing apps. The
above trick will do it. Going to test it ...



- Davide


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: epoll vs stdin/stdout
  2003-07-07 19:47   ` Eric Varsanyi
  2003-07-07 20:03     ` Jamie Lokier
@ 2003-07-07 22:12     ` Davide Libenzi
  2003-07-07 23:26       ` Davide Libenzi
  1 sibling, 1 reply; 23+ messages in thread
From: Davide Libenzi @ 2003-07-07 22:12 UTC (permalink / raw)
  To: Eric Varsanyi; +Cc: Linux Kernel Mailing List

On Mon, 7 Jul 2003, Eric Varsanyi wrote:

> On Mon, Jul 07, 2003 at 11:57:02AM -0700, Davide Libenzi wrote:
> > Events caught by epoll comes from a file* since that is the abstraction
> > the kernel handles. Events really happen on the file* and there's no way
> > if you dup()ing 1000 times a single fd, to say that events are for fd = 122.
>
> It is useful/mildly common at the app level to want to get read events
> for fd0 and write avail events for fd1. An app that might want to deal
> with reads from stdin in one process and writes to stdout in another
> (something like "team" perhaps) would have trouble here too.
>
> Epoll's API/impl is great as it is IMO, not suggesting need for change, I was
> hoping there was a good standard trick someone worked up to get around
> this specifc end case of stdin/stdout usually being dups but sometimes
> not. Porting my event system over to use epoll was easy/straightforward
> except for this one minor hitch.
>
> I considered:
> 	- using a second epoll object just for one of the fd's (to inspire
> 	  delivery of the event to 2 wait queues in the kernel); a little
> 	  ugly because of need to stack another epfd under the main one
> 	  just for stdout write events
>
> 	- select() on (0, 1, epfd) and just use epoll with a timeout of 0
> 	  when select fires to gather bulk of events; morally similar to
> 	  previous but using select (which I want to just get away from)
>
> 	- make the app use stdin as its output (this is what I ended up doing);
> 	  breaks redirection of stdout but that doesn't matter to this app

Any of the above. Pls wait for an incoming patch ...



- Davide


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: epoll vs stdin/stdout
  2003-07-07 22:12     ` Davide Libenzi
@ 2003-07-07 23:26       ` Davide Libenzi
  2003-07-08  0:32         ` Jamie Lokier
  2003-07-08 15:46         ` Eric Varsanyi
  0 siblings, 2 replies; 23+ messages in thread
From: Davide Libenzi @ 2003-07-07 23:26 UTC (permalink / raw)
  To: Eric Varsanyi; +Cc: Linux Kernel Mailing List

On Mon, 7 Jul 2003, Davide Libenzi wrote:

> On Mon, 7 Jul 2003, Eric Varsanyi wrote:
>
> > On Mon, Jul 07, 2003 at 11:57:02AM -0700, Davide Libenzi wrote:
> > > Events caught by epoll comes from a file* since that is the abstraction
> > > the kernel handles. Events really happen on the file* and there's no way
> > > if you dup()ing 1000 times a single fd, to say that events are for fd = 122.
> >
> > It is useful/mildly common at the app level to want to get read events
> > for fd0 and write avail events for fd1. An app that might want to deal
> > with reads from stdin in one process and writes to stdout in another
> > (something like "team" perhaps) would have trouble here too.
> >
> > Epoll's API/impl is great as it is IMO, not suggesting need for change, I was
> > hoping there was a good standard trick someone worked up to get around
> > this specifc end case of stdin/stdout usually being dups but sometimes
> > not. Porting my event system over to use epoll was easy/straightforward
> > except for this one minor hitch.
> >
> > I considered:
> > 	- using a second epoll object just for one of the fd's (to inspire
> > 	  delivery of the event to 2 wait queues in the kernel); a little
> > 	  ugly because of need to stack another epfd under the main one
> > 	  just for stdout write events
> >
> > 	- select() on (0, 1, epfd) and just use epoll with a timeout of 0
> > 	  when select fires to gather bulk of events; morally similar to
> > 	  previous but using select (which I want to just get away from)
> >
> > 	- make the app use stdin as its output (this is what I ended up doing);
> > 	  breaks redirection of stdout but that doesn't matter to this app
>
> Any of the above. Pls wait for an incoming patch ...

Try out this one, either over 2.5.74 or over an existing epoll-patched
2.4.{20,21} ...



- Davide




diff -Nru linux-2.5.74.vanilla/fs/eventpoll.c linux-2.5.74.epoll/fs/eventpoll.c
--- linux-2.5.74.vanilla/fs/eventpoll.c	2003-07-07 15:22:37.000000000 -0700
+++ linux-2.5.74.epoll/fs/eventpoll.c	2003-07-07 15:40:44.000000000 -0700
@@ -245,6 +245,9 @@
 	/* The "container" of this item */
 	struct eventpoll *ep;

+	/* The file descriptor this item refers to */
+	int fd;
+
 	/* The file this item refers to */
 	struct file *file;

@@ -285,15 +288,17 @@
 static int ep_alloc_pages(char **pages, int numpages);
 static int ep_free_pages(char **pages, int numpages);
 static int ep_file_init(struct file *file, unsigned int hashbits);
-static unsigned int ep_hash_index(struct eventpoll *ep, struct file *file);
+static unsigned int ep_hash_index(struct eventpoll *ep, struct file *file, int fd);
 static struct list_head *ep_hash_entry(struct eventpoll *ep, unsigned int index);
 static int ep_init(struct eventpoll *ep, unsigned int hashbits);
 static void ep_free(struct eventpoll *ep);
-static struct epitem *ep_find(struct eventpoll *ep, struct file *file);
+static struct epitem *ep_find(struct eventpoll *ep, struct file *file, int fd);
 static void ep_use_epitem(struct epitem *epi);
 static void ep_release_epitem(struct epitem *epi);
-static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead, poll_table *pt);
-static int ep_insert(struct eventpoll *ep, struct epoll_event *event, struct file *tfile);
+static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
+				 poll_table *pt);
+static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
+		     struct file *tfile, int fd);
 static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_event *event);
 static void ep_unregister_pollwait(struct eventpoll *ep, struct epitem *epi);
 static int ep_unlink(struct eventpoll *ep, struct epitem *epi);
@@ -580,7 +585,7 @@
 	down_write(&ep->sem);

 	/* Try to lookup the file inside our hash table */
-	epi = ep_find(ep, tfile);
+	epi = ep_find(ep, tfile, fd);

 	error = -EINVAL;
 	switch (op) {
@@ -588,7 +593,7 @@
 		if (!epi) {
 			epds.events |= POLLERR | POLLHUP;

-			error = ep_insert(ep, &epds, tfile);
+			error = ep_insert(ep, &epds, tfile, fd);
 		} else
 			error = -EEXIST;
 		break;
@@ -814,10 +819,11 @@
 /*
  * Calculate the index of the hash relative to "file".
  */
-static unsigned int ep_hash_index(struct eventpoll *ep, struct file *file)
+static unsigned int ep_hash_index(struct eventpoll *ep, struct file *file, int fd)
 {
+	unsigned long ptr = (unsigned long) file ^ (fd << ep->hashbits);

-	return (unsigned int) hash_ptr(file, ep->hashbits);
+	return (unsigned int) hash_ptr((void *) ptr, ep->hashbits);
 }


@@ -920,7 +926,7 @@
  * the returned item, so the caller must call ep_release_epitem()
  * after finished using the "struct epitem".
  */
-static struct epitem *ep_find(struct eventpoll *ep, struct file *file)
+static struct epitem *ep_find(struct eventpoll *ep, struct file *file, int fd)
 {
 	unsigned long flags;
 	struct list_head *lsthead, *lnk;
@@ -928,11 +934,11 @@

 	read_lock_irqsave(&ep->lock, flags);

-	lsthead = ep_hash_entry(ep, ep_hash_index(ep, file));
+	lsthead = ep_hash_entry(ep, ep_hash_index(ep, file, fd));
 	list_for_each(lnk, lsthead) {
 		epi = list_entry(lnk, struct epitem, llink);

-		if (epi->file == file) {
+		if (epi->file == file && epi->fd == fd) {
 			ep_use_epitem(epi);
 			break;
 		}
@@ -976,7 +982,8 @@
  * This is the callback that is used to add our wait queue to the
  * target file wakeup lists.
  */
-static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead, poll_table *pt)
+static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
+				 poll_table *pt)
 {
 	struct epitem *epi = EP_ITEM_FROM_EPQUEUE(pt);
 	struct eppoll_entry *pwq;
@@ -995,7 +1002,8 @@
 }


-static int ep_insert(struct eventpoll *ep, struct epoll_event *event, struct file *tfile)
+static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
+		     struct file *tfile, int fd)
 {
 	int error, revents, pwake = 0;
 	unsigned long flags;
@@ -1014,6 +1022,7 @@
 	INIT_LIST_HEAD(&epi->pwqlist);
 	epi->ep = ep;
 	epi->file = tfile;
+	epi->fd = fd;
 	epi->event = *event;
 	atomic_set(&epi->usecnt, 1);
 	epi->nwait = 0;
@@ -1046,7 +1055,7 @@
 	write_lock_irqsave(&ep->lock, flags);

 	/* Add the current item to the hash table */
-	list_add(&epi->llink, ep_hash_entry(ep, ep_hash_index(ep, tfile)));
+	list_add(&epi->llink, ep_hash_entry(ep, ep_hash_index(ep, tfile, fd)));

 	/* If the file is already "ready" we drop it inside the ready list */
 	if ((revents & event->events) && !EP_IS_LINKED(&epi->rdllink)) {
@@ -1065,8 +1074,8 @@
 	if (pwake)
 		ep_poll_safewake(&psw, &ep->poll_wait);

-	DNPRINTK(3, (KERN_INFO "[%p] eventpoll: ep_insert(%p, %p)\n",
-		     current, ep, tfile));
+	DNPRINTK(3, (KERN_INFO "[%p] eventpoll: ep_insert(%p, %p, %d)\n",
+		     current, ep, tfile, fd));

 	return 0;


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: epoll vs stdin/stdout
  2003-07-08  0:24         ` Jamie Lokier
@ 2003-07-08  0:23           ` Davide Libenzi
  0 siblings, 0 replies; 23+ messages in thread
From: Davide Libenzi @ 2003-07-08  0:23 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Eric Varsanyi, Linux Kernel Mailing List

On Tue, 8 Jul 2003, Jamie Lokier wrote:

> Davide Libenzi wrote:
> > It has to keep (file*, fd) as hashing key. That will work out just fine.
>
> Do you mean epoll has to use (file*,fd) as the hash key?

Basically it wasn't a limit of the architecture itself. It was an
artificial constraint added. I'll post the patch to Andrew->Linus tonight.



- Davide


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: epoll vs stdin/stdout
  2003-07-07 22:11       ` Davide Libenzi
@ 2003-07-08  0:24         ` Jamie Lokier
  2003-07-08  0:23           ` Davide Libenzi
  0 siblings, 1 reply; 23+ messages in thread
From: Jamie Lokier @ 2003-07-08  0:24 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Eric Varsanyi, Linux Kernel Mailing List

Davide Libenzi wrote:
> It has to keep (file*, fd) as hashing key. That will work out just fine.

Do you mean epoll has to use (file*,fd) as the hash key?

> Not even thinking changing the API since it'll break existing apps.

Oh, you're right.  I forgot that apps wait for both read & write on
the same network fd... duh! :)

> The above trick will do it. Going to test it ...

Good-oh.

-- Jamie

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: epoll vs stdin/stdout
  2003-07-08  0:32         ` Jamie Lokier
@ 2003-07-08  0:32           ` Davide Libenzi
  2003-07-08  0:52             ` Jamie Lokier
  0 siblings, 1 reply; 23+ messages in thread
From: Davide Libenzi @ 2003-07-08  0:32 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Eric Varsanyi, Linux Kernel Mailing List

On Tue, 8 Jul 2003, Jamie Lokier wrote:

> Davide Libenzi wrote:
> > Try out this one, either over 2.5.74 or over an existing epoll-patched
> > 2.4.{20,21} ...
>
> Sorry, can't try it out.
> But I have a question anyway :)
>
> Does this correctly free everything when you:
>
> 	declare interest in some events on fd 3
> 	dup2(3,4)
> 	close(3)
> ?

Neither the old and the new code does it. Cleanup is done either with an
EPOLL_CTL_DEL (explicitly) or with the file* removal (__fput()). In you
case when you close(4) if you do not do it explicitly.



- Davide


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: epoll vs stdin/stdout
  2003-07-07 23:26       ` Davide Libenzi
@ 2003-07-08  0:32         ` Jamie Lokier
  2003-07-08  0:32           ` Davide Libenzi
  2003-07-08 15:46         ` Eric Varsanyi
  1 sibling, 1 reply; 23+ messages in thread
From: Jamie Lokier @ 2003-07-08  0:32 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Eric Varsanyi, Linux Kernel Mailing List

Davide Libenzi wrote:
> Try out this one, either over 2.5.74 or over an existing epoll-patched
> 2.4.{20,21} ...

Sorry, can't try it out.
But I have a question anyway :)

Does this correctly free everything when you:

	declare interest in some events on fd 3
	dup2(3,4)
	close(3)
?

-- Jamie

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: epoll vs stdin/stdout
  2003-07-08  0:32           ` Davide Libenzi
@ 2003-07-08  0:52             ` Jamie Lokier
  2003-07-08  1:13               ` Davide Libenzi
  0 siblings, 1 reply; 23+ messages in thread
From: Jamie Lokier @ 2003-07-08  0:52 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Eric Varsanyi, Linux Kernel Mailing List

Davide Libenzi wrote:
> > Does this correctly free everything when you:
> >
> > 	declare interest in some events on fd 3
> > 	dup2(3,4)
> > 	close(3)
> > ?
> 
> Neither the old and the new code does it. Cleanup is done either with an
> EPOLL_CTL_DEL (explicitly) or with the file* removal (__fput()). In you
> case when you close(4) if you do not do it explicitly.

The old code didn't need to do it, because the events were registered
for all the fd values pointing to a single file *.  That's cool -
that's exactly what anyone would expect.  The point of dup2() is that
the fds are equivalent, and share state (such as seek pointer).

Now you have two strange (IMHO unclean) behaviours, that an
application programmer needs to be aware of:

   1. Duplicate fds don't share registered event state.

Theoretically this could break an existing app, but I doubt if any
depend on this behaviour at present.  This is not likely to confuse anyone, as long as it is documented.

   2. When process A sends an fd to process B, the events will appear
      in process B _iff_ the fd number in B happens to be the same
      value as in process A.  (Without your patch, the events will always
      appear in process B).

      Furthermore, when process B dups the fd or passes it elsewhere,
      events will appear if the new fd happens to be the same as the
      original fd number in A.

      The only correct application code in this case is to use
      EPOLL_CTL_DEL in A and EPOLL_CTL_ADD in B, although it is
      confusing because you'll have programs which _sometimes_ work
      without that.

Oh, and:

   3. It's almost a memory leak.  Not quite because it's cleaned up
      eventually.  But it looks and feels just like one.

I guess what I'm saying is that hashing on fd number is quite simply
wrong.  The fundamental object is the file *, that's how its meant to be.
:)

-- Jamie

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: epoll vs stdin/stdout
  2003-07-08  0:52             ` Jamie Lokier
@ 2003-07-08  1:13               ` Davide Libenzi
  2003-07-08 12:34                 ` Jamie Lokier
  0 siblings, 1 reply; 23+ messages in thread
From: Davide Libenzi @ 2003-07-08  1:13 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Eric Varsanyi, Linux Kernel Mailing List

On Tue, 8 Jul 2003, Jamie Lokier wrote:

> The old code didn't need to do it, because the events were registered
> for all the fd values pointing to a single file *.  That's cool -
> that's exactly what anyone would expect.  The point of dup2() is that
> the fds are equivalent, and share state (such as seek pointer).
>
> Now you have two strange (IMHO unclean) behaviours, that an
> application programmer needs to be aware of:
>
>    1. Duplicate fds don't share registered event state.

That's no unclean. It is the purpose of the patch. You can do now :

	/* Remotely */
	dup2(3, 4);
	...
	evt.data.ptr = mydata0;
	evt.event = EPOLLIN | EPOLLET;
	epoll_ctl(epfd, EPOLL_CTL_ADD, 3, &evt);
	evt.data.ptr = mydata1;
	evt.event = EPOLLOUT | EPOLLET;
	epoll_ctl(epfd, EPOLL_CTL_ADD, 4, &evt);

And receive two different events for "mydata0" and "mydata1". That's what
you really link to.



>    2. When process A sends an fd to process B, the events will appear
>       in process B _iff_ the fd number in B happens to be the same
>       value as in process A.  (Without your patch, the events will always
>       appear in process B).
>
>       Furthermore, when process B dups the fd or passes it elsewhere,
>       events will appear if the new fd happens to be the same as the
>       original fd number in A.
>
>       The only correct application code in this case is to use
>       EPOLL_CTL_DEL in A and EPOLL_CTL_ADD in B, although it is
>       confusing because you'll have programs which _sometimes_ work
>       without that.

This is false. Internally it's the file* that is the sink for events. So
they will receive events as long as the originator file* is alive.



> Oh, and:
>
>    3. It's almost a memory leak.  Not quite because it's cleaned up
>       eventually.  But it looks and feels just like one.

This is false too. When the last user (file count) of the underneath file*
will drop it, it will be cleaned. This is not a memory leak in books where
I studied. Is like saying that :

int main() {
	int i;
	for (i = 0; i < 1000; i++)
		open(...);
	for (;;)
		sleep(1);
	return 0;
}

is a memory leak ;)



> I guess what I'm saying is that hashing on fd number is quite simply
> wrong.  The fundamental object is the file *, that's how its meant to be.

The architecture is all based on the file*, it is there that events shows
up. The (file*, fd) key is a constraint.



- Davide


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: epoll vs stdin/stdout
  2003-07-08  1:13               ` Davide Libenzi
@ 2003-07-08 12:34                 ` Jamie Lokier
  2003-07-08 13:51                   ` Jamie Lokier
  0 siblings, 1 reply; 23+ messages in thread
From: Jamie Lokier @ 2003-07-08 12:34 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Eric Varsanyi, Linux Kernel Mailing List

Davide Libenzi wrote:
> >    2. When process A sends an fd to process B, the events will appear
> >       in process B _iff_ the fd number in B happens to be the same
> >       value as in process A.  (Without your patch, the events will always
> >       appear in process B).
> >
> >       Furthermore, when process B dups the fd or passes it elsewhere,
> >       events will appear if the new fd happens to be the same as the
> >       original fd number in A.
> >
> >       The only correct application code in this case is to use
> >       EPOLL_CTL_DEL in A and EPOLL_CTL_ADD in B, although it is
> >       confusing because you'll have programs which _sometimes_ work
> >       without that.
> 
> This is false.

Actually it's true :)

> Is like saying that :
> [...]
> is a memory leak ;)

Well, yeah :)

You'll have to document this loud and clear: anyone who closes, dup2s
over or passed an epoll-activated file descriptor _must_ use
EPOLL_CTL_DEL first, otherwise heisenbugs may eventually follow.

> > I guess what I'm saying is that hashing on fd number is quite simply
> > wrong.  The fundamental object is the file *, that's how its meant to be.
> 
> The architecture is all based on the file*, it is there that events shows
> up. The (file*, fd) key is a constraint.

Unfortunately I just thought of a real problem :(

What happens when process A sends (or forks) a dup of its fd 3 to
process B which also has it as fd 3, and both processes use epoll on
it?  (This is a fairly common scenario, to have one process/thread
reading and the other writing a tty or socket).  The two processes
will clash because they have the same (file *, fd) pair, yet there is
no way for them to know they are clashing.

With current epoll code this is a problem already, because keying on
(file *) alone is not enough.  Unfortunately (file *,fd) key doesn't
fix this one.

-- Jamie

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: epoll vs stdin/stdout
  2003-07-08 12:34                 ` Jamie Lokier
@ 2003-07-08 13:51                   ` Jamie Lokier
  2003-07-08 15:20                     ` Davide Libenzi
  0 siblings, 1 reply; 23+ messages in thread
From: Jamie Lokier @ 2003-07-08 13:51 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Eric Varsanyi, Linux Kernel Mailing List

Doh!  I'm sorry.  I forgot that the lookup key is actually (epoll_fd,
file *, fd).

So all I said in the parent mail about problems with fds shared among
multiple processes is nonsense - they will each have a different
epoll_fd, so maintain separate epoll state.

Remember not to take me seriously in future.
(Oh, you weren't... :)

-- Jamie (blushing)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: epoll vs stdin/stdout
  2003-07-08 13:51                   ` Jamie Lokier
@ 2003-07-08 15:20                     ` Davide Libenzi
  0 siblings, 0 replies; 23+ messages in thread
From: Davide Libenzi @ 2003-07-08 15:20 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Eric Varsanyi, Linux Kernel Mailing List

On Tue, 8 Jul 2003, Jamie Lokier wrote:

> Doh!  I'm sorry.  I forgot that the lookup key is actually (epoll_fd,
> file *, fd).
>
> So all I said in the parent mail about problems with fds shared among
> multiple processes is nonsense - they will each have a different
> epoll_fd, so maintain separate epoll state.
>
> Remember not to take me seriously in future.
> (Oh, you weren't... :)

I was indeed wondering what stuff you were smoking :)



- Davide


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: epoll vs stdin/stdout
  2003-07-08 15:46         ` Eric Varsanyi
@ 2003-07-08 15:42           ` Davide Libenzi
  2003-07-08 16:02             ` Eric Varsanyi
  0 siblings, 1 reply; 23+ messages in thread
From: Davide Libenzi @ 2003-07-08 15:42 UTC (permalink / raw)
  To: Eric Varsanyi; +Cc: Linux Kernel Mailing List

On Tue, 8 Jul 2003, Eric Varsanyi wrote:

> On Mon, Jul 07, 2003 at 04:26:21PM -0700, Davide Libenzi wrote:
> > On Mon, 7 Jul 2003, Davide Libenzi wrote:
> > Try out this one, either over 2.5.74 or over an existing epoll-patched
> > 2.4.{20,21} ...
>
> This appears to be working as advertised, thanks!
>
> IMO it doesn't seem that evil to deliver events per-fd rather than
> per-file, this is similar to the semantic you get from select() on
> fd's sharing an object. To be surprised someone would have to have
> coded to the (previous) sharing visible nature of epoll and be expecting
> the EEXIST back when sharing was in play.

It is not that events are delivered per-fd. If 3 and 4 refer to the same
file* and you register both 3 and 4 with EPOLLIN, you'll get two events if
an EPOLLIN happen. One for 3 and one for 4.



- Davide


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: epoll vs stdin/stdout
  2003-07-07 23:26       ` Davide Libenzi
  2003-07-08  0:32         ` Jamie Lokier
@ 2003-07-08 15:46         ` Eric Varsanyi
  2003-07-08 15:42           ` Davide Libenzi
  1 sibling, 1 reply; 23+ messages in thread
From: Eric Varsanyi @ 2003-07-08 15:46 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Linux Kernel Mailing List

On Mon, Jul 07, 2003 at 04:26:21PM -0700, Davide Libenzi wrote:
> On Mon, 7 Jul 2003, Davide Libenzi wrote:
> Try out this one, either over 2.5.74 or over an existing epoll-patched
> 2.4.{20,21} ...

This appears to be working as advertised, thanks! 

IMO it doesn't seem that evil to deliver events per-fd rather than
per-file, this is similar to the semantic you get from select() on
fd's sharing an object. To be surprised someone would have to have
coded to the (previous) sharing visible nature of epoll and be expecting 
the EEXIST back when sharing was in play.

-Eric Varsanyi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: epoll vs stdin/stdout
  2003-07-08 15:42           ` Davide Libenzi
@ 2003-07-08 16:02             ` Eric Varsanyi
  2003-07-08 17:06               ` Davide Libenzi
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Varsanyi @ 2003-07-08 16:02 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Linux Kernel Mailing List

On Tue, Jul 08, 2003 at 08:42:29AM -0700, Davide Libenzi wrote:
> It is not that events are delivered per-fd. If 3 and 4 refer to the same
> file* and you register both 3 and 4 with EPOLLIN, you'll get two events if
> an EPOLLIN happen. One for 3 and one for 4.

Agreed 100%, this is roughly what would happen with select() as well which      
IMO is good (not surprising behaviour) for event loop writers: it would         
return with both bits set. The EEXIST we were getting before this patch         
would be analogous to select() returning an error if you set 2 bits that        
where for fd's sharing an object (even across read/write bit vectors).          
                                                                                
One could argue at the logic of having 2 fd's get read events on a              
shared underlying object, but one read and the other write certainly            
makes sense as discussed earlier.                                               
                                                                                
Thanks again,                                                                   
-Eric Varsanyi                                                                  

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: epoll vs stdin/stdout
  2003-07-08 16:02             ` Eric Varsanyi
@ 2003-07-08 17:06               ` Davide Libenzi
  2003-07-08 18:40                 ` Eric Varsanyi
  0 siblings, 1 reply; 23+ messages in thread
From: Davide Libenzi @ 2003-07-08 17:06 UTC (permalink / raw)
  To: Eric Varsanyi; +Cc: Linux Kernel Mailing List

On Tue, 8 Jul 2003, Eric Varsanyi wrote:

> On Tue, Jul 08, 2003 at 08:42:29AM -0700, Davide Libenzi wrote:
> > It is not that events are delivered per-fd. If 3 and 4 refer to the same
> > file* and you register both 3 and 4 with EPOLLIN, you'll get two events if
> > an EPOLLIN happen. One for 3 and one for 4.
>
> Agreed 100%, this is roughly what would happen with select() as well which
> IMO is good (not surprising behaviour) for event loop writers: it would
> return with both bits set. The EEXIST we were getting before this patch
> would be analogous to select() returning an error if you set 2 bits that
> where for fd's sharing an object (even across read/write bit vectors).
>
> One could argue at the logic of having 2 fd's get read events on a
> shared underlying object, but one read and the other write certainly
> makes sense as discussed earlier.

I did not have the time to test the patch in your scenario, but if you can
confirm me it is working fine I'll push it.


- Davide


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: epoll vs stdin/stdout
  2003-07-08 17:06               ` Davide Libenzi
@ 2003-07-08 18:40                 ` Eric Varsanyi
  0 siblings, 0 replies; 23+ messages in thread
From: Eric Varsanyi @ 2003-07-08 18:40 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Linux Kernel Mailing List

> I did not have the time to test the patch in your scenario, but if you can
> confirm me it is working fine I'll push it.

Tested this morning, works fine for me. I haven't run any complex loads on
my new event loop, so I can't say it is exhaustively tested, but it passes
basic unit tests.

-Eric

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2003-07-08 18:25 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-07-07 15:48 epoll vs stdin/stdout Eric Varsanyi
2003-07-07 18:57 ` Davide Libenzi
2003-07-07 19:47   ` Eric Varsanyi
2003-07-07 20:03     ` Jamie Lokier
2003-07-07 20:18       ` Miquel van Smoorenburg
2003-07-07 21:20         ` H. Peter Anvin
2003-07-07 22:11       ` Davide Libenzi
2003-07-08  0:24         ` Jamie Lokier
2003-07-08  0:23           ` Davide Libenzi
2003-07-07 22:12     ` Davide Libenzi
2003-07-07 23:26       ` Davide Libenzi
2003-07-08  0:32         ` Jamie Lokier
2003-07-08  0:32           ` Davide Libenzi
2003-07-08  0:52             ` Jamie Lokier
2003-07-08  1:13               ` Davide Libenzi
2003-07-08 12:34                 ` Jamie Lokier
2003-07-08 13:51                   ` Jamie Lokier
2003-07-08 15:20                     ` Davide Libenzi
2003-07-08 15:46         ` Eric Varsanyi
2003-07-08 15:42           ` Davide Libenzi
2003-07-08 16:02             ` Eric Varsanyi
2003-07-08 17:06               ` Davide Libenzi
2003-07-08 18:40                 ` Eric Varsanyi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).