linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* epoll gives broken results when interrupted with a signal
@ 2003-10-29 15:00 Ben Mansell
  2003-10-29 17:26 ` Davide Libenzi
  0 siblings, 1 reply; 9+ messages in thread
From: Ben Mansell @ 2003-10-29 15:00 UTC (permalink / raw)
  To: linux-kernel

I'm using the epoll system interface on a 2.6.0-test9 kernel, but I hit
a problem if the process calling epoll_wait() gets interrupted. The
epoll_wait() returns with several events, but the last event of which
contains junk (e.g. typically reports that a file descriptor like
-91534560 received an event)

The epoll is being used to monitor only a handful of file descriptors.
Some of these however are TCP network sockets that were bound to a port
by a parent process, and then passed on to the process doing the epoll.
Another file descriptor is that of a socket connected to the parent
process. The epoll failure is brought about when the parent process
tries to kill off the child with a SIGTERM. The parent then exits.

The final (interrupted) epoll returns two events - the first is that of
the socket to the dead parent, receiving EPOLLIN | EPOLLHUP, which seems
reasonable. The next event is then random garbage. Perhaps epoll is just
returning one too many results?

Unfortunately, I've no simple test case code to demonstrate this -
but if there's any other information I can provide to help track this
down then let me know.


(Please CC: me on replies to the list)

Cheers,
Ben

-- 
Ben Mansell, <ben@zeus.com>                       Zeus Technology Ltd
Download the world's fastest webserver!   Universally Serving the Net
T:+44(0)1223 525000 F:+44(0)1223 525100           http://www.zeus.com
Zeus House, Cowley Road, Cambridge, CB4 0ZT, ENGLAND

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: epoll gives broken results when interrupted with a signal
  2003-10-29 15:00 epoll gives broken results when interrupted with a signal Ben Mansell
@ 2003-10-29 17:26 ` Davide Libenzi
  2003-10-29 17:38   ` Ben Mansell
  0 siblings, 1 reply; 9+ messages in thread
From: Davide Libenzi @ 2003-10-29 17:26 UTC (permalink / raw)
  To: Ben Mansell; +Cc: linux-kernel

On Wed, 29 Oct 2003, Ben Mansell wrote:

> I'm using the epoll system interface on a 2.6.0-test9 kernel, but I hit
> a problem if the process calling epoll_wait() gets interrupted. The
> epoll_wait() returns with several events, but the last event of which
> contains junk (e.g. typically reports that a file descriptor like
> -91534560 received an event)
>
> The epoll is being used to monitor only a handful of file descriptors.
> Some of these however are TCP network sockets that were bound to a port
> by a parent process, and then passed on to the process doing the epoll.
> Another file descriptor is that of a socket connected to the parent
> process. The epoll failure is brought about when the parent process
> tries to kill off the child with a SIGTERM. The parent then exits.
>
> The final (interrupted) epoll returns two events - the first is that of
> the socket to the dead parent, receiving EPOLLIN | EPOLLHUP, which seems
> reasonable. The next event is then random garbage. Perhaps epoll is just
> returning one too many results?

Is it an UP or an SMP machine? The descriptor is passed how? fork? If I'll
send you a debug patch for epoll will you be able to run it?



- Davide


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: epoll gives broken results when interrupted with a signal
  2003-10-29 17:26 ` Davide Libenzi
@ 2003-10-29 17:38   ` Ben Mansell
  2003-10-29 19:23     ` Davide Libenzi
  0 siblings, 1 reply; 9+ messages in thread
From: Ben Mansell @ 2003-10-29 17:38 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: linux-kernel

On Wed, 29 Oct 2003, Davide Libenzi wrote:

> On Wed, 29 Oct 2003, Ben Mansell wrote:
>
> > I'm using the epoll system interface on a 2.6.0-test9 kernel, but I hit
> > a problem if the process calling epoll_wait() gets interrupted. The
> > epoll_wait() returns with several events, but the last event of which
> > contains junk (e.g. typically reports that a file descriptor like
> > -91534560 received an event)
> >
> > The epoll is being used to monitor only a handful of file descriptors.
> > Some of these however are TCP network sockets that were bound to a port
> > by a parent process, and then passed on to the process doing the epoll.
> > Another file descriptor is that of a socket connected to the parent
> > process. The epoll failure is brought about when the parent process
> > tries to kill off the child with a SIGTERM. The parent then exits.
> >
> > The final (interrupted) epoll returns two events - the first is that of
> > the socket to the dead parent, receiving EPOLLIN | EPOLLHUP, which seems
> > reasonable. The next event is then random garbage. Perhaps epoll is just
> > returning one too many results?
>
> Is it an UP or an SMP machine? The descriptor is passed how? fork? If I'll
> send you a debug patch for epoll will you be able to run it?

UP machine - x86_64 however, if that makes any difference (sorry, my
initial message didn't contain much information, as I wasn't sure what
was relevant and what was not). The descriptor is passed over the unix
domain socket between the two processes.

I'm willing to try any debug patch that might help track down the
problem.


Thanks,
Ben


-- 
Ben Mansell, <ben@zeus.com>                       Zeus Technology Ltd
Download the world's fastest webserver!   Universally Serving the Net
T:+44(0)1223 525000 F:+44(0)1223 525100           http://www.zeus.com
Zeus House, Cowley Road, Cambridge, CB4 0ZT, ENGLAND

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: epoll gives broken results when interrupted with a signal
  2003-10-29 17:38   ` Ben Mansell
@ 2003-10-29 19:23     ` Davide Libenzi
  2003-10-30 11:22       ` Ben Mansell
  0 siblings, 1 reply; 9+ messages in thread
From: Davide Libenzi @ 2003-10-29 19:23 UTC (permalink / raw)
  To: Ben Mansell; +Cc: Linux Kernel Mailing List

On Wed, 29 Oct 2003, Ben Mansell wrote:

> UP machine - x86_64 however, if that makes any difference (sorry, my
> initial message didn't contain much information, as I wasn't sure what
> was relevant and what was not). The descriptor is passed over the unix
> domain socket between the two processes.
>
> I'm willing to try any debug patch that might help track down the
> problem.

Can you try the patch below and show me a dmesg when this happen?
Also, it shouldn't change a whit, but are you able to try on a x86 UP?



- Davide



diff -Nru linux-2.6.0-test9.vanilla/fs/eventpoll.c linux-2.6.0-test9.mod/fs/eventpoll.c
--- linux-2.6.0-test9.vanilla/fs/eventpoll.c	2003-10-25 11:43:26.000000000 -0700
+++ linux-2.6.0-test9.mod/fs/eventpoll.c	2003-10-29 11:21:27.849490320 -0800
@@ -75,7 +75,7 @@

 #define EVENTPOLLFS_MAGIC 0x03111965 /* My birthday should work for this :) */

-#define DEBUG_EPOLL 0
+#define DEBUG_EPOLL 1

 #if DEBUG_EPOLL > 0
 #define DPRINTK(x) printk x
@@ -462,6 +462,9 @@
 	struct eventpoll *ep;
 	struct epitem *epi;

+	DNPRINTK(3, (KERN_INFO "[%p] eventpoll: eventpoll_release_file(%p)\n",
+		     current, file));
+
 	/*
 	 * We don't want to get "file->f_ep_lock" because it is not
 	 * necessary. It is not necessary because we're in the "struct file"
@@ -478,6 +481,10 @@

 		ep = epi->ep;
 		EP_LIST_DEL(&epi->fllink);
+
+		DNPRINTK(3, (KERN_INFO "[%p] eventpoll: remove ep=%p epi=%p\n",
+			     current, ep, epi));
+
 		down_write(&ep->sem);
 		ep_remove(ep, epi);
 		up_write(&ep->sem);
@@ -1432,6 +1439,9 @@
 	list_for_each(lnk, txlist) {
 		epi = list_entry(lnk, struct epitem, txlink);

+		DNPRINTK(3, (KERN_INFO "[%p] eventpoll: polling file=%p ep=%p epi=%p\n",
+			     current, epi->file, ep, epi));
+
 		/*
 		 * Get the ready file event set. We can safely use the file
 		 * because we are holding the "sem" in read and this will
@@ -1447,6 +1457,9 @@
 		epi->revents = revents & epi->event.events;

 		if (epi->revents) {
+			DNPRINTK(3, (KERN_INFO "[%p] eventpoll: pollres file=%p ep=%p epi=%p events=%u\n",
+				     current, epi->file, ep, epi, epi->revents));
+
 			event[eventbuf] = epi->event;
 			event[eventbuf].events &= revents;
 			eventbuf++;

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: epoll gives broken results when interrupted with a signal
  2003-10-29 19:23     ` Davide Libenzi
@ 2003-10-30 11:22       ` Ben Mansell
  2003-10-30 14:48         ` Ben Mansell
  0 siblings, 1 reply; 9+ messages in thread
From: Ben Mansell @ 2003-10-30 11:22 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Linux Kernel Mailing List

On Wed, 29 Oct 2003, Davide Libenzi wrote:

> Can you try the patch below and show me a dmesg when this happen?

Ok, patch applied. (I changed DEBUG_EPOLL to 10 however, otherwise
nothing would be printed). Now, epoll appears to behave perfectly and I
can't re-create the problem :(
Maybe it is a race of some kind and the printks change the behaviour?
The bug still occured the first time I tried the patch with DEBUG_EPOLL
1.

> Also, it shouldn't change a whit, but are you able to try on a x86 UP?

I could set one up if needed; we have a x86 build machine running
2.6.0-test3 here which I can upgrade (epoll seems fine on that btw)

Here's the dmesg output anyway:


[0000010009d462c0] eventpoll: sys_epoll_create(10)
[0000010009d462c0] eventpoll: ep_file_init() ep=000001000475e000
[0000010009d462c0] eventpoll: sys_epoll_create(10) = 4
[0000010009d462c0] eventpoll: close() ep=000001000475e000
[0000010009d462c0] eventpoll: sys_epoll_create(256)
[0000010009d462c0] eventpoll: ep_file_init() ep=000001000475e000
[0000010009d462c0] eventpoll: sys_epoll_create(256) = 4
[0000010009d462c0] eventpoll: sys_epoll_ctl(4, 1, 5, 0000007fbffff070)
[0000010009d462c0] eventpoll: ep_find(00000100059c9580) -> 0000000000000000
[0000010009d462c0] eventpoll: ep_insert(000001000475e000, 00000100059c9580, 5)
[0000010009d462c0] eventpoll: sys_epoll_ctl(4, 1, 5, 0000007fbffff070) = 0
[0000010009d462c0] eventpoll: sys_epoll_ctl(4, 1, 6, 0000007fbffff290)
[0000010009d462c0] eventpoll: ep_find(000001000cc3a980) -> 0000000000000000
[0000010009d462c0] eventpoll: ep_insert(000001000475e000, 000001000cc3a980, 6)
[0000010009d462c0] eventpoll: sys_epoll_ctl(4, 1, 6, 0000007fbffff290) = 0
[0000010009d46b50] eventpoll: sys_epoll_create(1024)
[0000010009d46b50] eventpoll: ep_file_init() ep=000001000ccf2000
[0000010009d46b50] eventpoll: sys_epoll_create(1024) = 3
[0000010009d46b50] eventpoll: sys_epoll_ctl(3, 1, 7, 0000007fbffff1c0)
[0000010009d46b50] eventpoll: ep_find(0000010005e2c180) -> 0000000000000000
[0000010009d46b50] eventpoll: ep_insert(000001000ccf2000, 0000010005e2c180, 7)
[0000010009d46b50] eventpoll: sys_epoll_ctl(3, 1, 7, 0000007fbffff1c0) = 0
[0000010009d46b50] eventpoll: sys_epoll_ctl(3, 3, 7, 000000000070fcc0)
[0000010009d46b50] eventpoll: ep_find(0000010005e2c180) -> 00000100091cde00
[0000010009d46b50] eventpoll: sys_epoll_ctl(3, 3, 7, 000000000070fcc0) = 0
[0000010009d46b50] eventpoll: sys_epoll_wait(3, 000000000073e3b0, 32, 1000)
[0000010009d462c0] eventpoll: poll_callback(0000010005e2c180) epi=00000100091cde00 ep=000001000ccf2000
[0000010009d46b50] eventpoll: polling file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00
[0000010009d46b50] eventpoll: pollres file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00 events=1
[0000010009d46b50] eventpoll: sys_epoll_wait(3, 000000000073e3b0, 32, 1000) = 1
[0000010009d46b50] eventpoll: sys_epoll_wait(3, 000000000073e3b0, 32, 1000)
[0000010009d46b50] eventpoll: polling file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00
[0000010009d462c0] eventpoll: poll_callback(0000010005e2c180) epi=00000100091cde00 ep=000001000ccf2000
[0000010009d462c0] eventpoll: sys_epoll_ctl(4, 3, 6, 000000000070fcc0)
[0000010009d462c0] eventpoll: ep_find(000001000cc3a980) -> 00000100091cdec0
[0000010009d462c0] eventpoll: sys_epoll_ctl(4, 3, 6, 000000000070fcc0) = 0
[0000010009d462c0] eventpoll: sys_epoll_ctl(4, 3, 5, 000000000070fcc0)
[0000010009d462c0] eventpoll: ep_find(00000100059c9580) -> 00000100091cd980
[0000010009d462c0] eventpoll: sys_epoll_ctl(4, 3, 5, 000000000070fcc0) = 0
[0000010009d462c0] eventpoll: sys_epoll_wait(4, 0000000000734510, 32, 0)
[0000010009d462c0] eventpoll: sys_epoll_wait(4, 0000000000734510, 32, 0) = 0
[0000010009d462c0] eventpoll: poll_callback(0000010005e2c180) epi=00000100091cde00 ep=000001000ccf2000
[0000010009d46b50] eventpoll: polling file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00
[0000010009d46b50] eventpoll: pollres file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00 events=1
[0000010009d46b50] eventpoll: sys_epoll_wait(3, 000000000073e3b0, 32, 1000) = 1
[0000010009d46b50] eventpoll: sys_epoll_wait(3, 000000000073e3b0, 32, 1000)
[0000010009d46b50] eventpoll: polling file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00
[0000010009d462c0] eventpoll: poll_callback(0000010005e2c180) epi=00000100091cde00 ep=000001000ccf2000
[0000010009d46b50] eventpoll: polling file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00
[0000010009d462c0] eventpoll: poll_callback(0000010005e2c180) epi=00000100091cde00 ep=000001000ccf2000
[0000010009d46b50] eventpoll: polling file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00
[0000010009d46b50] eventpoll: pollres file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00 events=1
[0000010009d46b50] eventpoll: sys_epoll_wait(3, 000000000073e3b0, 32, 1000) = 1
[0000010009d46b50] eventpoll: sys_epoll_wait(3, 000000000073e3b0, 32, 1000)
[0000010009d46b50] eventpoll: polling file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00
[0000010009d462c0] eventpoll: poll_callback(0000010005e2c180) epi=00000100091cde00 ep=000001000ccf2000
[0000010009d462c0] eventpoll: poll_callback(0000010005e2c180) epi=00000100091cde00 ep=000001000ccf2000
[0000010009d462c0] eventpoll: poll_callback(0000010005e2c180) epi=00000100091cde00 ep=000001000ccf2000
[0000010009d46b50] eventpoll: polling file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00
[0000010009d46b50] eventpoll: pollres file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00 events=1
[0000010009d46b50] eventpoll: sys_epoll_wait(3, 000000000073e3b0, 32, 1000) = 1
[0000010009d46b50] eventpoll: sys_epoll_ctl(3, 1, 4, 0000007fbfffe9a0)
[0000010009d46b50] eventpoll: ep_find(000001000eb6a080) -> 0000000000000000
[0000010009d46b50] eventpoll: ep_insert(000001000ccf2000, 000001000eb6a080, 4)
[0000010009d46b50] eventpoll: sys_epoll_ctl(3, 1, 4, 0000007fbfffe9a0) = 0
[0000010009d46b50] eventpoll: sys_epoll_ctl(3, 3, 4, 000000000070fcc0)
[0000010009d46b50] eventpoll: ep_find(000001000eb6a080) -> 00000100091cdd40
[0000010009d46b50] eventpoll: sys_epoll_ctl(3, 3, 4, 000000000070fcc0) = 0
[0000010009d46b50] eventpoll: sys_epoll_wait(3, 000000000073e3b0, 32, 1000)
[0000010009d46b50] eventpoll: polling file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00
[0000010009d462c0] eventpoll: poll_callback(0000010005e2c180) epi=00000100091cde00 ep=000001000ccf2000
[0000010009d462c0] eventpoll: sys_epoll_wait(4, 0000000000734510, 32, 3000)
[0000010009d46b50] eventpoll: polling file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00
[0000010009d462c0] eventpoll: sys_epoll_wait(4, 0000000000734510, 32, 3000) = -4
[0000010009d462c0] eventpoll: sys_epoll_wait(4, 0000000000734510, 32, 2000)
[0000010009d46b50] eventpoll: sys_epoll_wait(3, 000000000073e3b0, 32, 1000) = 0
[0000010009d46b50] eventpoll: sys_epoll_wait(3, 000000000073e3b0, 32, 1000)
[0000010009d462c0] eventpoll: sys_epoll_wait(4, 0000000000734510, 32, 2000) = -4
[0000010009d462c0] eventpoll: sys_epoll_wait(4, 0000000000734510, 32, 1000)
[0000010009d46b50] eventpoll: sys_epoll_wait(3, 000000000073e3b0, 32, 1000) = 0
[0000010009d46b50] eventpoll: sys_epoll_wait(3, 000000000073e3b0, 32, 1000)
[0000010009d462c0] eventpoll: sys_epoll_wait(4, 0000000000734510, 32, 1000) = -4
[0000010009d462c0] eventpoll: poll_callback(000001000eb6a080) epi=00000100091cdd40 ep=000001000ccf2000
[0000010009d462c0] eventpoll: poll_callback(000001000eb6a080) epi=00000100091cdd40 ep=000001000ccf2000
[0000010009d462c0] eventpoll: poll_callback(00000100059c9580) epi=00000100091cd980 ep=000001000475e000
[0000010009d462c0] eventpoll: poll_callback(00000100059c9580) epi=00000100091cd980 ep=000001000475e000
[0000010009d462c0] eventpoll: eventpoll_release_file(00000100059c9580)
[0000010009d462c0] eventpoll: remove ep=000001000475e000 epi=00000100091cd980
[0000010009d462c0] eventpoll: ep_unlink(000001000475e000, 00000100059c9580) = 0
[0000010009d462c0] eventpoll: ep_remove(000001000475e000, 00000100059c9580) = 0
[0000010009d462c0] eventpoll: eventpoll_release_file(000001000cc3a980)
[0000010009d462c0] eventpoll: remove ep=000001000475e000 epi=00000100091cdec0
[0000010009d462c0] eventpoll: ep_unlink(000001000475e000, 000001000cc3a980) = 0
[0000010009d462c0] eventpoll: ep_remove(000001000475e000, 000001000cc3a980) = 0
[0000010009d462c0] eventpoll: close() ep=000001000475e000
[0000010009d462c0] eventpoll: poll_callback(0000010005e2c180) epi=00000100091cde00 ep=000001000ccf2000
[0000010009d46b50] eventpoll: polling file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00
[0000010009d46b50] eventpoll: pollres file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00 events=17
[0000010009d46b50] eventpoll: polling file=000001000eb6a080 ep=000001000ccf2000 epi=00000100091cdd40
[0000010009d46b50] eventpoll: pollres file=000001000eb6a080 ep=000001000ccf2000 epi=00000100091cdd40 events=16
[0000010009d46b50] eventpoll: sys_epoll_wait(3, 000000000073e3b0, 32, 1000) = 2
[0000010009d46b50] eventpoll: eventpoll_release_file(0000010005e2c180)
[0000010009d46b50] eventpoll: remove ep=000001000ccf2000 epi=00000100091cde00
[0000010009d46b50] eventpoll: ep_unlink(000001000ccf2000, 0000010005e2c180) = 0
[0000010009d46b50] eventpoll: ep_remove(000001000ccf2000, 0000010005e2c180) = 0
[0000010009d46b50] eventpoll: eventpoll_release_file(000001000eb6a080)
[0000010009d46b50] eventpoll: remove ep=000001000ccf2000 epi=00000100091cdd40
[0000010009d46b50] eventpoll: ep_unlink(000001000ccf2000, 000001000eb6a080) = 0
[0000010009d46b50] eventpoll: ep_remove(000001000ccf2000, 000001000eb6a080) = 0
[0000010009d46b50] eventpoll: close() ep=000001000ccf2000



-- 
Ben Mansell, <ben@zeus.com>                       Zeus Technology Ltd
Download the world's fastest webserver!   Universally Serving the Net
T:+44(0)1223 525000 F:+44(0)1223 525100           http://www.zeus.com
Zeus House, Cowley Road, Cambridge, CB4 0ZT, ENGLAND

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: epoll gives broken results when interrupted with a signal
  2003-10-30 11:22       ` Ben Mansell
@ 2003-10-30 14:48         ` Ben Mansell
  2003-10-30 23:36           ` Davide Libenzi
  2003-10-31 23:00           ` Davide Libenzi
  0 siblings, 2 replies; 9+ messages in thread
From: Ben Mansell @ 2003-10-30 14:48 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Linux Kernel Mailing List

On Thu, 30 Oct 2003, Ben Mansell wrote:

> On Wed, 29 Oct 2003, Davide Libenzi wrote:
>
> > Can you try the patch below and show me a dmesg when this happen?
>
> Ok, patch applied. (I changed DEBUG_EPOLL to 10 however, otherwise
> nothing would be printed). Now, epoll appears to behave perfectly and I
> can't re-create the problem :(

Got it! I was missing the problem because I had removed some debug
messages in my own code. Here's another run, this time the
final epoll_wait() call of the child process brings back 2 events:
 Event 0 fd: 7 events: 17
 Event 1 fd: -2095926561 events: 0

I've added the debug to the end of this message.

If I modify the code so there are several 'child' processes, all
monitoring the same sockets with their own epolls, they all seem to get
the same results from epoll_wait().

> > Also, it shouldn't change a whit, but are you able to try on a x86 UP?

On a UP x86, 2.6.0-test9, I can't reproduce the problem at all.

[0000010017126e50] eventpoll: sys_epoll_create(10)
[0000010017126e50] eventpoll: ep_file_init() ep=0000010008964000
[0000010017126e50] eventpoll: sys_epoll_create(10) = 4
[0000010017126e50] eventpoll: close() ep=0000010008964000
[0000010017126e50] eventpoll: sys_epoll_create(256)
[0000010017126e50] eventpoll: ep_file_init() ep=0000010008964000
[0000010017126e50] eventpoll: sys_epoll_create(256) = 4
[0000010017126e50] eventpoll: sys_epoll_ctl(4, 1, 5, 0000007fbffff070)
[0000010017126e50] eventpoll: ep_find(0000010005e2cc80) -> 0000000000000000
[0000010017126e50] eventpoll: ep_insert(0000010008964000, 0000010005e2cc80, 5)
[0000010017126e50] eventpoll: sys_epoll_ctl(4, 1, 5, 0000007fbffff070) = 0
[0000010017126e50] eventpoll: sys_epoll_ctl(4, 1, 6, 0000007fbffff290)
[0000010017126e50] eventpoll: ep_find(000001000402b2c0) -> 0000000000000000
[0000010017126e50] eventpoll: ep_insert(0000010008964000, 000001000402b2c0, 6)
[0000010017126e50] eventpoll: sys_epoll_ctl(4, 1, 6, 0000007fbffff290) = 0
[0000010002ba7520] eventpoll: sys_epoll_create(1024)
[0000010002ba7520] eventpoll: ep_file_init() ep=000001001f928000
[0000010002ba7520] eventpoll: sys_epoll_create(1024) = 3
[0000010002ba7520] eventpoll: sys_epoll_ctl(3, 1, 7, 0000007fbffff1c0)
[0000010002ba7520] eventpoll: ep_find(00000100099822c0) -> 0000000000000000
[0000010002ba7520] eventpoll: ep_insert(000001001f928000, 00000100099822c0, 7)
[0000010002ba7520] eventpoll: sys_epoll_ctl(3, 1, 7, 0000007fbffff1c0) = 0
[0000010002ba7520] eventpoll: sys_epoll_ctl(3, 3, 7, 000000000070fd80)
[0000010002ba7520] eventpoll: ep_find(00000100099822c0) -> 000001000d9c6a80
[0000010002ba7520] eventpoll: sys_epoll_ctl(3, 3, 7, 000000000070fd80) = 0
[0000010002ba7520] eventpoll: sys_epoll_wait(3, 000000000073e390, 32, 1000)
[0000010017126e50] eventpoll: poll_callback(00000100099822c0) epi=000001000d9c6a80 ep=000001001f928000
[0000010002ba7520] eventpoll: polling file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80
[0000010002ba7520] eventpoll: pollres file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80 events=1
[0000010002ba7520] eventpoll: sys_epoll_wait(3, 000000000073e390, 32, 1000) = 1
[0000010002ba7520] eventpoll: sys_epoll_wait(3, 000000000073e390, 32, 1000)
[0000010002ba7520] eventpoll: polling file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80
[0000010017126e50] eventpoll: poll_callback(00000100099822c0) epi=000001000d9c6a80 ep=000001001f928000
[0000010017126e50] eventpoll: sys_epoll_ctl(4, 3, 6, 000000000070fd80)
[0000010017126e50] eventpoll: ep_find(000001000402b2c0) -> 000001000d9c6b40
[0000010017126e50] eventpoll: sys_epoll_ctl(4, 3, 6, 000000000070fd80) = 0
[0000010017126e50] eventpoll: sys_epoll_ctl(4, 3, 5, 000000000070fd80)
[0000010017126e50] eventpoll: ep_find(0000010005e2cc80) -> 000001000d9c6c00
[0000010017126e50] eventpoll: sys_epoll_ctl(4, 3, 5, 000000000070fd80) = 0
[0000010017126e50] eventpoll: sys_epoll_wait(4, 0000000000734510, 32, 0)
[0000010017126e50] eventpoll: sys_epoll_wait(4, 0000000000734510, 32, 0) = 0
[0000010017126e50] eventpoll: poll_callback(00000100099822c0) epi=000001000d9c6a80 ep=000001001f928000
[0000010002ba7520] eventpoll: polling file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80
[0000010002ba7520] eventpoll: pollres file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80 events=1
[0000010002ba7520] eventpoll: sys_epoll_wait(3, 000000000073e390, 32, 1000) = 1
[0000010002ba7520] eventpoll: sys_epoll_wait(3, 000000000073e390, 32, 1000)
[0000010002ba7520] eventpoll: polling file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80
[0000010017126e50] eventpoll: poll_callback(00000100099822c0) epi=000001000d9c6a80 ep=000001001f928000
[0000010002ba7520] eventpoll: polling file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80
[0000010017126e50] eventpoll: poll_callback(00000100099822c0) epi=000001000d9c6a80 ep=000001001f928000
[0000010002ba7520] eventpoll: polling file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80
[0000010002ba7520] eventpoll: pollres file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80 events=1
[0000010002ba7520] eventpoll: sys_epoll_wait(3, 000000000073e390, 32, 1000) = 1
[0000010002ba7520] eventpoll: sys_epoll_wait(3, 000000000073e390, 32, 1000)
[0000010002ba7520] eventpoll: polling file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80
[0000010017126e50] eventpoll: poll_callback(00000100099822c0) epi=000001000d9c6a80 ep=000001001f928000
[0000010017126e50] eventpoll: poll_callback(00000100099822c0) epi=000001000d9c6a80 ep=000001001f928000
[0000010017126e50] eventpoll: poll_callback(00000100099822c0) epi=000001000d9c6a80 ep=000001001f928000
[0000010002ba7520] eventpoll: polling file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80
[0000010002ba7520] eventpoll: pollres file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80 events=1
[0000010002ba7520] eventpoll: sys_epoll_wait(3, 000000000073e390, 32, 1000) = 1
[0000010002ba7520] eventpoll: sys_epoll_ctl(3, 1, 4, 0000007fbfffe850)
[0000010002ba7520] eventpoll: ep_find(000001000615d980) -> 0000000000000000
[0000010002ba7520] eventpoll: ep_insert(000001001f928000, 000001000615d980, 4)
[0000010002ba7520] eventpoll: sys_epoll_ctl(3, 1, 4, 0000007fbfffe850) = 0
[0000010002ba7520] eventpoll: sys_epoll_ctl(3, 3, 4, 000000000070fd80)
[0000010002ba7520] eventpoll: ep_find(000001000615d980) -> 000001000d9c69c0
[0000010002ba7520] eventpoll: sys_epoll_ctl(3, 3, 4, 000000000070fd80) = 0
[0000010002ba7520] eventpoll: sys_epoll_wait(3, 000000000073e390, 32, 1000)
[0000010002ba7520] eventpoll: polling file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80
[0000010017126e50] eventpoll: poll_callback(00000100099822c0) epi=000001000d9c6a80 ep=000001001f928000
[0000010017126e50] eventpoll: sys_epoll_wait(4, 0000000000734510, 32, 3000)
[0000010002ba7520] eventpoll: polling file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80
[0000010017126e50] eventpoll: sys_epoll_wait(4, 0000000000734510, 32, 3000) = -4
[0000010017126e50] eventpoll: sys_epoll_wait(4, 0000000000734510, 32, 2000)
[0000010002ba7520] eventpoll: sys_epoll_wait(3, 000000000073e390, 32, 1000) = 0
[0000010002ba7520] eventpoll: sys_epoll_wait(3, 000000000073e390, 32, 1000)
[0000010017126e50] eventpoll: sys_epoll_wait(4, 0000000000734510, 32, 2000) = -4
[0000010017126e50] eventpoll: poll_callback(000001000615d980) epi=000001000d9c69c0 ep=000001001f928000
[0000010017126e50] eventpoll: poll_callback(000001000615d980) epi=000001000d9c69c0 ep=000001001f928000
[0000010017126e50] eventpoll: poll_callback(0000010005e2cc80) epi=000001000d9c6c00 ep=0000010008964000
[0000010017126e50] eventpoll: poll_callback(0000010005e2cc80) epi=000001000d9c6c00 ep=0000010008964000
[0000010017126e50] eventpoll: eventpoll_release_file(0000010005e2cc80)
[0000010017126e50] eventpoll: remove ep=0000010008964000 epi=000001000d9c6c00
[0000010017126e50] eventpoll: ep_unlink(0000010008964000, 0000010005e2cc80) = 0
[0000010017126e50] eventpoll: ep_remove(0000010008964000, 0000010005e2cc80) = 0
[0000010017126e50] eventpoll: eventpoll_release_file(000001000402b2c0)
[0000010017126e50] eventpoll: remove ep=0000010008964000 epi=000001000d9c6b40
[0000010017126e50] eventpoll: ep_unlink(0000010008964000, 000001000402b2c0) = 0
[0000010017126e50] eventpoll: ep_remove(0000010008964000, 000001000402b2c0) = 0
[0000010017126e50] eventpoll: close() ep=0000010008964000
[0000010017126e50] eventpoll: poll_callback(00000100099822c0) epi=000001000d9c6a80 ep=000001001f928000
[0000010002ba7520] eventpoll: polling file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80
[0000010002ba7520] eventpoll: pollres file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80 events=17
[0000010002ba7520] eventpoll: polling file=000001000615d980 ep=000001001f928000 epi=000001000d9c69c0
[0000010002ba7520] eventpoll: pollres file=000001000615d980 ep=000001001f928000 epi=000001000d9c69c0 events=16
[0000010002ba7520] eventpoll: sys_epoll_wait(3, 000000000073e390, 32, 1000) = 2
[0000010002ba7520] eventpoll: eventpoll_release_file(00000100099822c0)
[0000010002ba7520] eventpoll: remove ep=000001001f928000 epi=000001000d9c6a80
[0000010002ba7520] eventpoll: ep_unlink(000001001f928000, 00000100099822c0) = 0
[0000010002ba7520] eventpoll: ep_remove(000001001f928000, 00000100099822c0) = 0
[0000010002ba7520] eventpoll: eventpoll_release_file(000001000615d980)
[0000010002ba7520] eventpoll: remove ep=000001001f928000 epi=000001000d9c69c0
[0000010002ba7520] eventpoll: ep_unlink(000001001f928000, 000001000615d980) = 0
[0000010002ba7520] eventpoll: ep_remove(000001001f928000, 000001000615d980) = 0
[0000010002ba7520] eventpoll: close() ep=000001001f928000


-- 
Ben Mansell, <ben@zeus.com>                       Zeus Technology Ltd
Download the world's fastest webserver!   Universally Serving the Net
T:+44(0)1223 525000 F:+44(0)1223 525100           http://www.zeus.com
Zeus House, Cowley Road, Cambridge, CB4 0ZT, ENGLAND


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: epoll gives broken results when interrupted with a signal
  2003-10-30 14:48         ` Ben Mansell
@ 2003-10-30 23:36           ` Davide Libenzi
  2003-11-03 11:07             ` Ben Mansell
  2003-10-31 23:00           ` Davide Libenzi
  1 sibling, 1 reply; 9+ messages in thread
From: Davide Libenzi @ 2003-10-30 23:36 UTC (permalink / raw)
  To: Ben Mansell; +Cc: Linux Kernel Mailing List

On Thu, 30 Oct 2003, Ben Mansell wrote:

> On Thu, 30 Oct 2003, Ben Mansell wrote:
>
> > On Wed, 29 Oct 2003, Davide Libenzi wrote:
> >
> > > Can you try the patch below and show me a dmesg when this happen?
> >
> > Ok, patch applied. (I changed DEBUG_EPOLL to 10 however, otherwise
> > nothing would be printed). Now, epoll appears to behave perfectly and I
> > can't re-create the problem :(
>
> Got it! I was missing the problem because I had removed some debug
> messages in my own code. Here's another run, this time the
> final epoll_wait() call of the child process brings back 2 events:
>  Event 0 fd: 7 events: 17
>  Event 1 fd: -2095926561 events: 0
>
> I've added the debug to the end of this message.
>
> If I modify the code so there are several 'child' processes, all
> monitoring the same sockets with their own epolls, they all seem to get
> the same results from epoll_wait().
>
> > > Also, it shouldn't change a whit, but are you able to try on a x86 UP?
>
> On a UP x86, 2.6.0-test9, I can't reproduce the problem at all.

Could you try to poison the event buffer before an epoll_wait() to see how
many bytes are effectively written by the function?

	memset(events, 'e', num * sizeof(epoll_event));




- Davide


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: epoll gives broken results when interrupted with a signal
  2003-10-30 14:48         ` Ben Mansell
  2003-10-30 23:36           ` Davide Libenzi
@ 2003-10-31 23:00           ` Davide Libenzi
  1 sibling, 0 replies; 9+ messages in thread
From: Davide Libenzi @ 2003-10-31 23:00 UTC (permalink / raw)
  To: Ben Mansell; +Cc: Linux Kernel Mailing List

On Thu, 30 Oct 2003, Ben Mansell wrote:

> On Thu, 30 Oct 2003, Ben Mansell wrote:
>
> > On Wed, 29 Oct 2003, Davide Libenzi wrote:
> >
> > > Can you try the patch below and show me a dmesg when this happen?
> >
> > Ok, patch applied. (I changed DEBUG_EPOLL to 10 however, otherwise
> > nothing would be printed). Now, epoll appears to behave perfectly and I
> > can't re-create the problem :(
>
> Got it! I was missing the problem because I had removed some debug
> messages in my own code. Here's another run, this time the
> final epoll_wait() call of the child process brings back 2 events:
>  Event 0 fd: 7 events: 17
>  Event 1 fd: -2095926561 events: 0

It is really strage. If you look what epoll sees:

> [0000010002ba7520] eventpoll: polling file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80
> [0000010002ba7520] eventpoll: pollres file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80 events=17
> [0000010002ba7520] eventpoll: polling file=000001000615d980 ep=000001001f928000 epi=000001000d9c69c0
> [0000010002ba7520] eventpoll: pollres file=000001000615d980 ep=000001001f928000 epi=000001000d9c69c0 events=16
> [0000010002ba7520] eventpoll: sys_epoll_wait(3, 000000000073e390, 32, 1000) = 2
> [0000010002ba7520] eventpoll: eventpoll_release_file(00000100099822c0)
> [0000010002ba7520] eventpoll: remove ep=000001001f928000 epi=000001000d9c6a80
> [0000010002ba7520] eventpoll: ep_unlink(000001001f928000, 00000100099822c0) = 0
> [0000010002ba7520] eventpoll: ep_remove(000001001f928000, 00000100099822c0) = 0
> [0000010002ba7520] eventpoll: eventpoll_release_file(000001000615d980)
> [0000010002ba7520] eventpoll: remove ep=000001001f928000 epi=000001000d9c69c0
> [0000010002ba7520] eventpoll: ep_unlink(000001001f928000, 000001000615d980) = 0
> [0000010002ba7520] eventpoll: ep_remove(000001001f928000, 000001000615d980) = 0
> [0000010002ba7520] eventpoll: close() ep=000001001f928000

It clearly sees two events with masks 16 and 17. And at this points events
are already inside a buffer ready to be pushed to usespace with a
copy_to_user().



- Davide


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: epoll gives broken results when interrupted with a signal
  2003-10-30 23:36           ` Davide Libenzi
@ 2003-11-03 11:07             ` Ben Mansell
  0 siblings, 0 replies; 9+ messages in thread
From: Ben Mansell @ 2003-11-03 11:07 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Linux Kernel Mailing List

On Thu, 30 Oct 2003, Davide Libenzi wrote:

> Could you try to poison the event buffer before an epoll_wait() to see how
> many bytes are effectively written by the function?
>
> 	memset(events, 'e', num * sizeof(epoll_event));

aha! This is interesting. When epoll seems to go wrong, the
epoll_event.data of the second event is unwritten, but the .events field
is zeroed. Sometimes though, the final call to epoll_wait() returns just
one event instead of the two, and in that case, the high 32 bits of the
first epoll_event.data are unwritten. The lower 32 bits contain the
right data, and as the code treats this as a (int) FD, the top 32 bits
are discarded anyway.

Drat! I've just looked back at the kernel headers and noticed the
following, prior to the epoll_event definition:

/*
 * On x86-64 make the 64bit structure have the same alignment as the
 * 32bit structure. This makes 32bit emulation easier.
 */
#ifdef __x86_64__
#define EPOLL_PACKED __attribute__((packed))
#else
#define EPOLL_PACKED
#endif

struct epoll_event {
        __u32 events;
        __u64 data;
} EPOLL_PACKED;


Now, my code can't pull in the kernel headers so it has its own
epoll_event definition, which is missing the packed attribute. D'oh!
Guess what, everything works properly when I add it in. Looks like this
bug has been of my own making all along...

Sorry for taking up your time!


Ben

-- 
Ben Mansell, <ben@zeus.com>                       Zeus Technology Ltd
Download the world's fastest webserver!   Universally Serving the Net
T:+44(0)1223 525000 F:+44(0)1223 525100           http://www.zeus.com
Zeus House, Cowley Road, Cambridge, CB4 0ZT, ENGLAND

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2003-11-03 11:07 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-10-29 15:00 epoll gives broken results when interrupted with a signal Ben Mansell
2003-10-29 17:26 ` Davide Libenzi
2003-10-29 17:38   ` Ben Mansell
2003-10-29 19:23     ` Davide Libenzi
2003-10-30 11:22       ` Ben Mansell
2003-10-30 14:48         ` Ben Mansell
2003-10-30 23:36           ` Davide Libenzi
2003-11-03 11:07             ` Ben Mansell
2003-10-31 23:00           ` Davide Libenzi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).