* epoll gives broken results when interrupted with a signal @ 2003-10-29 15:00 Ben Mansell 2003-10-29 17:26 ` Davide Libenzi 0 siblings, 1 reply; 9+ messages in thread From: Ben Mansell @ 2003-10-29 15:00 UTC (permalink / raw) To: linux-kernel I'm using the epoll system interface on a 2.6.0-test9 kernel, but I hit a problem if the process calling epoll_wait() gets interrupted. The epoll_wait() returns with several events, but the last event of which contains junk (e.g. typically reports that a file descriptor like -91534560 received an event) The epoll is being used to monitor only a handful of file descriptors. Some of these however are TCP network sockets that were bound to a port by a parent process, and then passed on to the process doing the epoll. Another file descriptor is that of a socket connected to the parent process. The epoll failure is brought about when the parent process tries to kill off the child with a SIGTERM. The parent then exits. The final (interrupted) epoll returns two events - the first is that of the socket to the dead parent, receiving EPOLLIN | EPOLLHUP, which seems reasonable. The next event is then random garbage. Perhaps epoll is just returning one too many results? Unfortunately, I've no simple test case code to demonstrate this - but if there's any other information I can provide to help track this down then let me know. (Please CC: me on replies to the list) Cheers, Ben -- Ben Mansell, <ben@zeus.com> Zeus Technology Ltd Download the world's fastest webserver! Universally Serving the Net T:+44(0)1223 525000 F:+44(0)1223 525100 http://www.zeus.com Zeus House, Cowley Road, Cambridge, CB4 0ZT, ENGLAND ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: epoll gives broken results when interrupted with a signal 2003-10-29 15:00 epoll gives broken results when interrupted with a signal Ben Mansell @ 2003-10-29 17:26 ` Davide Libenzi 2003-10-29 17:38 ` Ben Mansell 0 siblings, 1 reply; 9+ messages in thread From: Davide Libenzi @ 2003-10-29 17:26 UTC (permalink / raw) To: Ben Mansell; +Cc: linux-kernel On Wed, 29 Oct 2003, Ben Mansell wrote: > I'm using the epoll system interface on a 2.6.0-test9 kernel, but I hit > a problem if the process calling epoll_wait() gets interrupted. The > epoll_wait() returns with several events, but the last event of which > contains junk (e.g. typically reports that a file descriptor like > -91534560 received an event) > > The epoll is being used to monitor only a handful of file descriptors. > Some of these however are TCP network sockets that were bound to a port > by a parent process, and then passed on to the process doing the epoll. > Another file descriptor is that of a socket connected to the parent > process. The epoll failure is brought about when the parent process > tries to kill off the child with a SIGTERM. The parent then exits. > > The final (interrupted) epoll returns two events - the first is that of > the socket to the dead parent, receiving EPOLLIN | EPOLLHUP, which seems > reasonable. The next event is then random garbage. Perhaps epoll is just > returning one too many results? Is it an UP or an SMP machine? The descriptor is passed how? fork? If I'll send you a debug patch for epoll will you be able to run it? - Davide ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: epoll gives broken results when interrupted with a signal 2003-10-29 17:26 ` Davide Libenzi @ 2003-10-29 17:38 ` Ben Mansell 2003-10-29 19:23 ` Davide Libenzi 0 siblings, 1 reply; 9+ messages in thread From: Ben Mansell @ 2003-10-29 17:38 UTC (permalink / raw) To: Davide Libenzi; +Cc: linux-kernel On Wed, 29 Oct 2003, Davide Libenzi wrote: > On Wed, 29 Oct 2003, Ben Mansell wrote: > > > I'm using the epoll system interface on a 2.6.0-test9 kernel, but I hit > > a problem if the process calling epoll_wait() gets interrupted. The > > epoll_wait() returns with several events, but the last event of which > > contains junk (e.g. typically reports that a file descriptor like > > -91534560 received an event) > > > > The epoll is being used to monitor only a handful of file descriptors. > > Some of these however are TCP network sockets that were bound to a port > > by a parent process, and then passed on to the process doing the epoll. > > Another file descriptor is that of a socket connected to the parent > > process. The epoll failure is brought about when the parent process > > tries to kill off the child with a SIGTERM. The parent then exits. > > > > The final (interrupted) epoll returns two events - the first is that of > > the socket to the dead parent, receiving EPOLLIN | EPOLLHUP, which seems > > reasonable. The next event is then random garbage. Perhaps epoll is just > > returning one too many results? > > Is it an UP or an SMP machine? The descriptor is passed how? fork? If I'll > send you a debug patch for epoll will you be able to run it? UP machine - x86_64 however, if that makes any difference (sorry, my initial message didn't contain much information, as I wasn't sure what was relevant and what was not). The descriptor is passed over the unix domain socket between the two processes. I'm willing to try any debug patch that might help track down the problem. Thanks, Ben -- Ben Mansell, <ben@zeus.com> Zeus Technology Ltd Download the world's fastest webserver! Universally Serving the Net T:+44(0)1223 525000 F:+44(0)1223 525100 http://www.zeus.com Zeus House, Cowley Road, Cambridge, CB4 0ZT, ENGLAND ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: epoll gives broken results when interrupted with a signal 2003-10-29 17:38 ` Ben Mansell @ 2003-10-29 19:23 ` Davide Libenzi 2003-10-30 11:22 ` Ben Mansell 0 siblings, 1 reply; 9+ messages in thread From: Davide Libenzi @ 2003-10-29 19:23 UTC (permalink / raw) To: Ben Mansell; +Cc: Linux Kernel Mailing List On Wed, 29 Oct 2003, Ben Mansell wrote: > UP machine - x86_64 however, if that makes any difference (sorry, my > initial message didn't contain much information, as I wasn't sure what > was relevant and what was not). The descriptor is passed over the unix > domain socket between the two processes. > > I'm willing to try any debug patch that might help track down the > problem. Can you try the patch below and show me a dmesg when this happen? Also, it shouldn't change a whit, but are you able to try on a x86 UP? - Davide diff -Nru linux-2.6.0-test9.vanilla/fs/eventpoll.c linux-2.6.0-test9.mod/fs/eventpoll.c --- linux-2.6.0-test9.vanilla/fs/eventpoll.c 2003-10-25 11:43:26.000000000 -0700 +++ linux-2.6.0-test9.mod/fs/eventpoll.c 2003-10-29 11:21:27.849490320 -0800 @@ -75,7 +75,7 @@ #define EVENTPOLLFS_MAGIC 0x03111965 /* My birthday should work for this :) */ -#define DEBUG_EPOLL 0 +#define DEBUG_EPOLL 1 #if DEBUG_EPOLL > 0 #define DPRINTK(x) printk x @@ -462,6 +462,9 @@ struct eventpoll *ep; struct epitem *epi; + DNPRINTK(3, (KERN_INFO "[%p] eventpoll: eventpoll_release_file(%p)\n", + current, file)); + /* * We don't want to get "file->f_ep_lock" because it is not * necessary. It is not necessary because we're in the "struct file" @@ -478,6 +481,10 @@ ep = epi->ep; EP_LIST_DEL(&epi->fllink); + + DNPRINTK(3, (KERN_INFO "[%p] eventpoll: remove ep=%p epi=%p\n", + current, ep, epi)); + down_write(&ep->sem); ep_remove(ep, epi); up_write(&ep->sem); @@ -1432,6 +1439,9 @@ list_for_each(lnk, txlist) { epi = list_entry(lnk, struct epitem, txlink); + DNPRINTK(3, (KERN_INFO "[%p] eventpoll: polling file=%p ep=%p epi=%p\n", + current, epi->file, ep, epi)); + /* * Get the ready file event set. We can safely use the file * because we are holding the "sem" in read and this will @@ -1447,6 +1457,9 @@ epi->revents = revents & epi->event.events; if (epi->revents) { + DNPRINTK(3, (KERN_INFO "[%p] eventpoll: pollres file=%p ep=%p epi=%p events=%u\n", + current, epi->file, ep, epi, epi->revents)); + event[eventbuf] = epi->event; event[eventbuf].events &= revents; eventbuf++; ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: epoll gives broken results when interrupted with a signal 2003-10-29 19:23 ` Davide Libenzi @ 2003-10-30 11:22 ` Ben Mansell 2003-10-30 14:48 ` Ben Mansell 0 siblings, 1 reply; 9+ messages in thread From: Ben Mansell @ 2003-10-30 11:22 UTC (permalink / raw) To: Davide Libenzi; +Cc: Linux Kernel Mailing List On Wed, 29 Oct 2003, Davide Libenzi wrote: > Can you try the patch below and show me a dmesg when this happen? Ok, patch applied. (I changed DEBUG_EPOLL to 10 however, otherwise nothing would be printed). Now, epoll appears to behave perfectly and I can't re-create the problem :( Maybe it is a race of some kind and the printks change the behaviour? The bug still occured the first time I tried the patch with DEBUG_EPOLL 1. > Also, it shouldn't change a whit, but are you able to try on a x86 UP? I could set one up if needed; we have a x86 build machine running 2.6.0-test3 here which I can upgrade (epoll seems fine on that btw) Here's the dmesg output anyway: [0000010009d462c0] eventpoll: sys_epoll_create(10) [0000010009d462c0] eventpoll: ep_file_init() ep=000001000475e000 [0000010009d462c0] eventpoll: sys_epoll_create(10) = 4 [0000010009d462c0] eventpoll: close() ep=000001000475e000 [0000010009d462c0] eventpoll: sys_epoll_create(256) [0000010009d462c0] eventpoll: ep_file_init() ep=000001000475e000 [0000010009d462c0] eventpoll: sys_epoll_create(256) = 4 [0000010009d462c0] eventpoll: sys_epoll_ctl(4, 1, 5, 0000007fbffff070) [0000010009d462c0] eventpoll: ep_find(00000100059c9580) -> 0000000000000000 [0000010009d462c0] eventpoll: ep_insert(000001000475e000, 00000100059c9580, 5) [0000010009d462c0] eventpoll: sys_epoll_ctl(4, 1, 5, 0000007fbffff070) = 0 [0000010009d462c0] eventpoll: sys_epoll_ctl(4, 1, 6, 0000007fbffff290) [0000010009d462c0] eventpoll: ep_find(000001000cc3a980) -> 0000000000000000 [0000010009d462c0] eventpoll: ep_insert(000001000475e000, 000001000cc3a980, 6) [0000010009d462c0] eventpoll: sys_epoll_ctl(4, 1, 6, 0000007fbffff290) = 0 [0000010009d46b50] eventpoll: sys_epoll_create(1024) [0000010009d46b50] eventpoll: ep_file_init() ep=000001000ccf2000 [0000010009d46b50] eventpoll: sys_epoll_create(1024) = 3 [0000010009d46b50] eventpoll: sys_epoll_ctl(3, 1, 7, 0000007fbffff1c0) [0000010009d46b50] eventpoll: ep_find(0000010005e2c180) -> 0000000000000000 [0000010009d46b50] eventpoll: ep_insert(000001000ccf2000, 0000010005e2c180, 7) [0000010009d46b50] eventpoll: sys_epoll_ctl(3, 1, 7, 0000007fbffff1c0) = 0 [0000010009d46b50] eventpoll: sys_epoll_ctl(3, 3, 7, 000000000070fcc0) [0000010009d46b50] eventpoll: ep_find(0000010005e2c180) -> 00000100091cde00 [0000010009d46b50] eventpoll: sys_epoll_ctl(3, 3, 7, 000000000070fcc0) = 0 [0000010009d46b50] eventpoll: sys_epoll_wait(3, 000000000073e3b0, 32, 1000) [0000010009d462c0] eventpoll: poll_callback(0000010005e2c180) epi=00000100091cde00 ep=000001000ccf2000 [0000010009d46b50] eventpoll: polling file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00 [0000010009d46b50] eventpoll: pollres file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00 events=1 [0000010009d46b50] eventpoll: sys_epoll_wait(3, 000000000073e3b0, 32, 1000) = 1 [0000010009d46b50] eventpoll: sys_epoll_wait(3, 000000000073e3b0, 32, 1000) [0000010009d46b50] eventpoll: polling file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00 [0000010009d462c0] eventpoll: poll_callback(0000010005e2c180) epi=00000100091cde00 ep=000001000ccf2000 [0000010009d462c0] eventpoll: sys_epoll_ctl(4, 3, 6, 000000000070fcc0) [0000010009d462c0] eventpoll: ep_find(000001000cc3a980) -> 00000100091cdec0 [0000010009d462c0] eventpoll: sys_epoll_ctl(4, 3, 6, 000000000070fcc0) = 0 [0000010009d462c0] eventpoll: sys_epoll_ctl(4, 3, 5, 000000000070fcc0) [0000010009d462c0] eventpoll: ep_find(00000100059c9580) -> 00000100091cd980 [0000010009d462c0] eventpoll: sys_epoll_ctl(4, 3, 5, 000000000070fcc0) = 0 [0000010009d462c0] eventpoll: sys_epoll_wait(4, 0000000000734510, 32, 0) [0000010009d462c0] eventpoll: sys_epoll_wait(4, 0000000000734510, 32, 0) = 0 [0000010009d462c0] eventpoll: poll_callback(0000010005e2c180) epi=00000100091cde00 ep=000001000ccf2000 [0000010009d46b50] eventpoll: polling file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00 [0000010009d46b50] eventpoll: pollres file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00 events=1 [0000010009d46b50] eventpoll: sys_epoll_wait(3, 000000000073e3b0, 32, 1000) = 1 [0000010009d46b50] eventpoll: sys_epoll_wait(3, 000000000073e3b0, 32, 1000) [0000010009d46b50] eventpoll: polling file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00 [0000010009d462c0] eventpoll: poll_callback(0000010005e2c180) epi=00000100091cde00 ep=000001000ccf2000 [0000010009d46b50] eventpoll: polling file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00 [0000010009d462c0] eventpoll: poll_callback(0000010005e2c180) epi=00000100091cde00 ep=000001000ccf2000 [0000010009d46b50] eventpoll: polling file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00 [0000010009d46b50] eventpoll: pollres file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00 events=1 [0000010009d46b50] eventpoll: sys_epoll_wait(3, 000000000073e3b0, 32, 1000) = 1 [0000010009d46b50] eventpoll: sys_epoll_wait(3, 000000000073e3b0, 32, 1000) [0000010009d46b50] eventpoll: polling file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00 [0000010009d462c0] eventpoll: poll_callback(0000010005e2c180) epi=00000100091cde00 ep=000001000ccf2000 [0000010009d462c0] eventpoll: poll_callback(0000010005e2c180) epi=00000100091cde00 ep=000001000ccf2000 [0000010009d462c0] eventpoll: poll_callback(0000010005e2c180) epi=00000100091cde00 ep=000001000ccf2000 [0000010009d46b50] eventpoll: polling file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00 [0000010009d46b50] eventpoll: pollres file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00 events=1 [0000010009d46b50] eventpoll: sys_epoll_wait(3, 000000000073e3b0, 32, 1000) = 1 [0000010009d46b50] eventpoll: sys_epoll_ctl(3, 1, 4, 0000007fbfffe9a0) [0000010009d46b50] eventpoll: ep_find(000001000eb6a080) -> 0000000000000000 [0000010009d46b50] eventpoll: ep_insert(000001000ccf2000, 000001000eb6a080, 4) [0000010009d46b50] eventpoll: sys_epoll_ctl(3, 1, 4, 0000007fbfffe9a0) = 0 [0000010009d46b50] eventpoll: sys_epoll_ctl(3, 3, 4, 000000000070fcc0) [0000010009d46b50] eventpoll: ep_find(000001000eb6a080) -> 00000100091cdd40 [0000010009d46b50] eventpoll: sys_epoll_ctl(3, 3, 4, 000000000070fcc0) = 0 [0000010009d46b50] eventpoll: sys_epoll_wait(3, 000000000073e3b0, 32, 1000) [0000010009d46b50] eventpoll: polling file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00 [0000010009d462c0] eventpoll: poll_callback(0000010005e2c180) epi=00000100091cde00 ep=000001000ccf2000 [0000010009d462c0] eventpoll: sys_epoll_wait(4, 0000000000734510, 32, 3000) [0000010009d46b50] eventpoll: polling file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00 [0000010009d462c0] eventpoll: sys_epoll_wait(4, 0000000000734510, 32, 3000) = -4 [0000010009d462c0] eventpoll: sys_epoll_wait(4, 0000000000734510, 32, 2000) [0000010009d46b50] eventpoll: sys_epoll_wait(3, 000000000073e3b0, 32, 1000) = 0 [0000010009d46b50] eventpoll: sys_epoll_wait(3, 000000000073e3b0, 32, 1000) [0000010009d462c0] eventpoll: sys_epoll_wait(4, 0000000000734510, 32, 2000) = -4 [0000010009d462c0] eventpoll: sys_epoll_wait(4, 0000000000734510, 32, 1000) [0000010009d46b50] eventpoll: sys_epoll_wait(3, 000000000073e3b0, 32, 1000) = 0 [0000010009d46b50] eventpoll: sys_epoll_wait(3, 000000000073e3b0, 32, 1000) [0000010009d462c0] eventpoll: sys_epoll_wait(4, 0000000000734510, 32, 1000) = -4 [0000010009d462c0] eventpoll: poll_callback(000001000eb6a080) epi=00000100091cdd40 ep=000001000ccf2000 [0000010009d462c0] eventpoll: poll_callback(000001000eb6a080) epi=00000100091cdd40 ep=000001000ccf2000 [0000010009d462c0] eventpoll: poll_callback(00000100059c9580) epi=00000100091cd980 ep=000001000475e000 [0000010009d462c0] eventpoll: poll_callback(00000100059c9580) epi=00000100091cd980 ep=000001000475e000 [0000010009d462c0] eventpoll: eventpoll_release_file(00000100059c9580) [0000010009d462c0] eventpoll: remove ep=000001000475e000 epi=00000100091cd980 [0000010009d462c0] eventpoll: ep_unlink(000001000475e000, 00000100059c9580) = 0 [0000010009d462c0] eventpoll: ep_remove(000001000475e000, 00000100059c9580) = 0 [0000010009d462c0] eventpoll: eventpoll_release_file(000001000cc3a980) [0000010009d462c0] eventpoll: remove ep=000001000475e000 epi=00000100091cdec0 [0000010009d462c0] eventpoll: ep_unlink(000001000475e000, 000001000cc3a980) = 0 [0000010009d462c0] eventpoll: ep_remove(000001000475e000, 000001000cc3a980) = 0 [0000010009d462c0] eventpoll: close() ep=000001000475e000 [0000010009d462c0] eventpoll: poll_callback(0000010005e2c180) epi=00000100091cde00 ep=000001000ccf2000 [0000010009d46b50] eventpoll: polling file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00 [0000010009d46b50] eventpoll: pollres file=0000010005e2c180 ep=000001000ccf2000 epi=00000100091cde00 events=17 [0000010009d46b50] eventpoll: polling file=000001000eb6a080 ep=000001000ccf2000 epi=00000100091cdd40 [0000010009d46b50] eventpoll: pollres file=000001000eb6a080 ep=000001000ccf2000 epi=00000100091cdd40 events=16 [0000010009d46b50] eventpoll: sys_epoll_wait(3, 000000000073e3b0, 32, 1000) = 2 [0000010009d46b50] eventpoll: eventpoll_release_file(0000010005e2c180) [0000010009d46b50] eventpoll: remove ep=000001000ccf2000 epi=00000100091cde00 [0000010009d46b50] eventpoll: ep_unlink(000001000ccf2000, 0000010005e2c180) = 0 [0000010009d46b50] eventpoll: ep_remove(000001000ccf2000, 0000010005e2c180) = 0 [0000010009d46b50] eventpoll: eventpoll_release_file(000001000eb6a080) [0000010009d46b50] eventpoll: remove ep=000001000ccf2000 epi=00000100091cdd40 [0000010009d46b50] eventpoll: ep_unlink(000001000ccf2000, 000001000eb6a080) = 0 [0000010009d46b50] eventpoll: ep_remove(000001000ccf2000, 000001000eb6a080) = 0 [0000010009d46b50] eventpoll: close() ep=000001000ccf2000 -- Ben Mansell, <ben@zeus.com> Zeus Technology Ltd Download the world's fastest webserver! Universally Serving the Net T:+44(0)1223 525000 F:+44(0)1223 525100 http://www.zeus.com Zeus House, Cowley Road, Cambridge, CB4 0ZT, ENGLAND ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: epoll gives broken results when interrupted with a signal 2003-10-30 11:22 ` Ben Mansell @ 2003-10-30 14:48 ` Ben Mansell 2003-10-30 23:36 ` Davide Libenzi 2003-10-31 23:00 ` Davide Libenzi 0 siblings, 2 replies; 9+ messages in thread From: Ben Mansell @ 2003-10-30 14:48 UTC (permalink / raw) To: Davide Libenzi; +Cc: Linux Kernel Mailing List On Thu, 30 Oct 2003, Ben Mansell wrote: > On Wed, 29 Oct 2003, Davide Libenzi wrote: > > > Can you try the patch below and show me a dmesg when this happen? > > Ok, patch applied. (I changed DEBUG_EPOLL to 10 however, otherwise > nothing would be printed). Now, epoll appears to behave perfectly and I > can't re-create the problem :( Got it! I was missing the problem because I had removed some debug messages in my own code. Here's another run, this time the final epoll_wait() call of the child process brings back 2 events: Event 0 fd: 7 events: 17 Event 1 fd: -2095926561 events: 0 I've added the debug to the end of this message. If I modify the code so there are several 'child' processes, all monitoring the same sockets with their own epolls, they all seem to get the same results from epoll_wait(). > > Also, it shouldn't change a whit, but are you able to try on a x86 UP? On a UP x86, 2.6.0-test9, I can't reproduce the problem at all. [0000010017126e50] eventpoll: sys_epoll_create(10) [0000010017126e50] eventpoll: ep_file_init() ep=0000010008964000 [0000010017126e50] eventpoll: sys_epoll_create(10) = 4 [0000010017126e50] eventpoll: close() ep=0000010008964000 [0000010017126e50] eventpoll: sys_epoll_create(256) [0000010017126e50] eventpoll: ep_file_init() ep=0000010008964000 [0000010017126e50] eventpoll: sys_epoll_create(256) = 4 [0000010017126e50] eventpoll: sys_epoll_ctl(4, 1, 5, 0000007fbffff070) [0000010017126e50] eventpoll: ep_find(0000010005e2cc80) -> 0000000000000000 [0000010017126e50] eventpoll: ep_insert(0000010008964000, 0000010005e2cc80, 5) [0000010017126e50] eventpoll: sys_epoll_ctl(4, 1, 5, 0000007fbffff070) = 0 [0000010017126e50] eventpoll: sys_epoll_ctl(4, 1, 6, 0000007fbffff290) [0000010017126e50] eventpoll: ep_find(000001000402b2c0) -> 0000000000000000 [0000010017126e50] eventpoll: ep_insert(0000010008964000, 000001000402b2c0, 6) [0000010017126e50] eventpoll: sys_epoll_ctl(4, 1, 6, 0000007fbffff290) = 0 [0000010002ba7520] eventpoll: sys_epoll_create(1024) [0000010002ba7520] eventpoll: ep_file_init() ep=000001001f928000 [0000010002ba7520] eventpoll: sys_epoll_create(1024) = 3 [0000010002ba7520] eventpoll: sys_epoll_ctl(3, 1, 7, 0000007fbffff1c0) [0000010002ba7520] eventpoll: ep_find(00000100099822c0) -> 0000000000000000 [0000010002ba7520] eventpoll: ep_insert(000001001f928000, 00000100099822c0, 7) [0000010002ba7520] eventpoll: sys_epoll_ctl(3, 1, 7, 0000007fbffff1c0) = 0 [0000010002ba7520] eventpoll: sys_epoll_ctl(3, 3, 7, 000000000070fd80) [0000010002ba7520] eventpoll: ep_find(00000100099822c0) -> 000001000d9c6a80 [0000010002ba7520] eventpoll: sys_epoll_ctl(3, 3, 7, 000000000070fd80) = 0 [0000010002ba7520] eventpoll: sys_epoll_wait(3, 000000000073e390, 32, 1000) [0000010017126e50] eventpoll: poll_callback(00000100099822c0) epi=000001000d9c6a80 ep=000001001f928000 [0000010002ba7520] eventpoll: polling file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80 [0000010002ba7520] eventpoll: pollres file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80 events=1 [0000010002ba7520] eventpoll: sys_epoll_wait(3, 000000000073e390, 32, 1000) = 1 [0000010002ba7520] eventpoll: sys_epoll_wait(3, 000000000073e390, 32, 1000) [0000010002ba7520] eventpoll: polling file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80 [0000010017126e50] eventpoll: poll_callback(00000100099822c0) epi=000001000d9c6a80 ep=000001001f928000 [0000010017126e50] eventpoll: sys_epoll_ctl(4, 3, 6, 000000000070fd80) [0000010017126e50] eventpoll: ep_find(000001000402b2c0) -> 000001000d9c6b40 [0000010017126e50] eventpoll: sys_epoll_ctl(4, 3, 6, 000000000070fd80) = 0 [0000010017126e50] eventpoll: sys_epoll_ctl(4, 3, 5, 000000000070fd80) [0000010017126e50] eventpoll: ep_find(0000010005e2cc80) -> 000001000d9c6c00 [0000010017126e50] eventpoll: sys_epoll_ctl(4, 3, 5, 000000000070fd80) = 0 [0000010017126e50] eventpoll: sys_epoll_wait(4, 0000000000734510, 32, 0) [0000010017126e50] eventpoll: sys_epoll_wait(4, 0000000000734510, 32, 0) = 0 [0000010017126e50] eventpoll: poll_callback(00000100099822c0) epi=000001000d9c6a80 ep=000001001f928000 [0000010002ba7520] eventpoll: polling file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80 [0000010002ba7520] eventpoll: pollres file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80 events=1 [0000010002ba7520] eventpoll: sys_epoll_wait(3, 000000000073e390, 32, 1000) = 1 [0000010002ba7520] eventpoll: sys_epoll_wait(3, 000000000073e390, 32, 1000) [0000010002ba7520] eventpoll: polling file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80 [0000010017126e50] eventpoll: poll_callback(00000100099822c0) epi=000001000d9c6a80 ep=000001001f928000 [0000010002ba7520] eventpoll: polling file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80 [0000010017126e50] eventpoll: poll_callback(00000100099822c0) epi=000001000d9c6a80 ep=000001001f928000 [0000010002ba7520] eventpoll: polling file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80 [0000010002ba7520] eventpoll: pollres file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80 events=1 [0000010002ba7520] eventpoll: sys_epoll_wait(3, 000000000073e390, 32, 1000) = 1 [0000010002ba7520] eventpoll: sys_epoll_wait(3, 000000000073e390, 32, 1000) [0000010002ba7520] eventpoll: polling file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80 [0000010017126e50] eventpoll: poll_callback(00000100099822c0) epi=000001000d9c6a80 ep=000001001f928000 [0000010017126e50] eventpoll: poll_callback(00000100099822c0) epi=000001000d9c6a80 ep=000001001f928000 [0000010017126e50] eventpoll: poll_callback(00000100099822c0) epi=000001000d9c6a80 ep=000001001f928000 [0000010002ba7520] eventpoll: polling file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80 [0000010002ba7520] eventpoll: pollres file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80 events=1 [0000010002ba7520] eventpoll: sys_epoll_wait(3, 000000000073e390, 32, 1000) = 1 [0000010002ba7520] eventpoll: sys_epoll_ctl(3, 1, 4, 0000007fbfffe850) [0000010002ba7520] eventpoll: ep_find(000001000615d980) -> 0000000000000000 [0000010002ba7520] eventpoll: ep_insert(000001001f928000, 000001000615d980, 4) [0000010002ba7520] eventpoll: sys_epoll_ctl(3, 1, 4, 0000007fbfffe850) = 0 [0000010002ba7520] eventpoll: sys_epoll_ctl(3, 3, 4, 000000000070fd80) [0000010002ba7520] eventpoll: ep_find(000001000615d980) -> 000001000d9c69c0 [0000010002ba7520] eventpoll: sys_epoll_ctl(3, 3, 4, 000000000070fd80) = 0 [0000010002ba7520] eventpoll: sys_epoll_wait(3, 000000000073e390, 32, 1000) [0000010002ba7520] eventpoll: polling file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80 [0000010017126e50] eventpoll: poll_callback(00000100099822c0) epi=000001000d9c6a80 ep=000001001f928000 [0000010017126e50] eventpoll: sys_epoll_wait(4, 0000000000734510, 32, 3000) [0000010002ba7520] eventpoll: polling file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80 [0000010017126e50] eventpoll: sys_epoll_wait(4, 0000000000734510, 32, 3000) = -4 [0000010017126e50] eventpoll: sys_epoll_wait(4, 0000000000734510, 32, 2000) [0000010002ba7520] eventpoll: sys_epoll_wait(3, 000000000073e390, 32, 1000) = 0 [0000010002ba7520] eventpoll: sys_epoll_wait(3, 000000000073e390, 32, 1000) [0000010017126e50] eventpoll: sys_epoll_wait(4, 0000000000734510, 32, 2000) = -4 [0000010017126e50] eventpoll: poll_callback(000001000615d980) epi=000001000d9c69c0 ep=000001001f928000 [0000010017126e50] eventpoll: poll_callback(000001000615d980) epi=000001000d9c69c0 ep=000001001f928000 [0000010017126e50] eventpoll: poll_callback(0000010005e2cc80) epi=000001000d9c6c00 ep=0000010008964000 [0000010017126e50] eventpoll: poll_callback(0000010005e2cc80) epi=000001000d9c6c00 ep=0000010008964000 [0000010017126e50] eventpoll: eventpoll_release_file(0000010005e2cc80) [0000010017126e50] eventpoll: remove ep=0000010008964000 epi=000001000d9c6c00 [0000010017126e50] eventpoll: ep_unlink(0000010008964000, 0000010005e2cc80) = 0 [0000010017126e50] eventpoll: ep_remove(0000010008964000, 0000010005e2cc80) = 0 [0000010017126e50] eventpoll: eventpoll_release_file(000001000402b2c0) [0000010017126e50] eventpoll: remove ep=0000010008964000 epi=000001000d9c6b40 [0000010017126e50] eventpoll: ep_unlink(0000010008964000, 000001000402b2c0) = 0 [0000010017126e50] eventpoll: ep_remove(0000010008964000, 000001000402b2c0) = 0 [0000010017126e50] eventpoll: close() ep=0000010008964000 [0000010017126e50] eventpoll: poll_callback(00000100099822c0) epi=000001000d9c6a80 ep=000001001f928000 [0000010002ba7520] eventpoll: polling file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80 [0000010002ba7520] eventpoll: pollres file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80 events=17 [0000010002ba7520] eventpoll: polling file=000001000615d980 ep=000001001f928000 epi=000001000d9c69c0 [0000010002ba7520] eventpoll: pollres file=000001000615d980 ep=000001001f928000 epi=000001000d9c69c0 events=16 [0000010002ba7520] eventpoll: sys_epoll_wait(3, 000000000073e390, 32, 1000) = 2 [0000010002ba7520] eventpoll: eventpoll_release_file(00000100099822c0) [0000010002ba7520] eventpoll: remove ep=000001001f928000 epi=000001000d9c6a80 [0000010002ba7520] eventpoll: ep_unlink(000001001f928000, 00000100099822c0) = 0 [0000010002ba7520] eventpoll: ep_remove(000001001f928000, 00000100099822c0) = 0 [0000010002ba7520] eventpoll: eventpoll_release_file(000001000615d980) [0000010002ba7520] eventpoll: remove ep=000001001f928000 epi=000001000d9c69c0 [0000010002ba7520] eventpoll: ep_unlink(000001001f928000, 000001000615d980) = 0 [0000010002ba7520] eventpoll: ep_remove(000001001f928000, 000001000615d980) = 0 [0000010002ba7520] eventpoll: close() ep=000001001f928000 -- Ben Mansell, <ben@zeus.com> Zeus Technology Ltd Download the world's fastest webserver! Universally Serving the Net T:+44(0)1223 525000 F:+44(0)1223 525100 http://www.zeus.com Zeus House, Cowley Road, Cambridge, CB4 0ZT, ENGLAND ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: epoll gives broken results when interrupted with a signal 2003-10-30 14:48 ` Ben Mansell @ 2003-10-30 23:36 ` Davide Libenzi 2003-11-03 11:07 ` Ben Mansell 2003-10-31 23:00 ` Davide Libenzi 1 sibling, 1 reply; 9+ messages in thread From: Davide Libenzi @ 2003-10-30 23:36 UTC (permalink / raw) To: Ben Mansell; +Cc: Linux Kernel Mailing List On Thu, 30 Oct 2003, Ben Mansell wrote: > On Thu, 30 Oct 2003, Ben Mansell wrote: > > > On Wed, 29 Oct 2003, Davide Libenzi wrote: > > > > > Can you try the patch below and show me a dmesg when this happen? > > > > Ok, patch applied. (I changed DEBUG_EPOLL to 10 however, otherwise > > nothing would be printed). Now, epoll appears to behave perfectly and I > > can't re-create the problem :( > > Got it! I was missing the problem because I had removed some debug > messages in my own code. Here's another run, this time the > final epoll_wait() call of the child process brings back 2 events: > Event 0 fd: 7 events: 17 > Event 1 fd: -2095926561 events: 0 > > I've added the debug to the end of this message. > > If I modify the code so there are several 'child' processes, all > monitoring the same sockets with their own epolls, they all seem to get > the same results from epoll_wait(). > > > > Also, it shouldn't change a whit, but are you able to try on a x86 UP? > > On a UP x86, 2.6.0-test9, I can't reproduce the problem at all. Could you try to poison the event buffer before an epoll_wait() to see how many bytes are effectively written by the function? memset(events, 'e', num * sizeof(epoll_event)); - Davide ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: epoll gives broken results when interrupted with a signal 2003-10-30 23:36 ` Davide Libenzi @ 2003-11-03 11:07 ` Ben Mansell 0 siblings, 0 replies; 9+ messages in thread From: Ben Mansell @ 2003-11-03 11:07 UTC (permalink / raw) To: Davide Libenzi; +Cc: Linux Kernel Mailing List On Thu, 30 Oct 2003, Davide Libenzi wrote: > Could you try to poison the event buffer before an epoll_wait() to see how > many bytes are effectively written by the function? > > memset(events, 'e', num * sizeof(epoll_event)); aha! This is interesting. When epoll seems to go wrong, the epoll_event.data of the second event is unwritten, but the .events field is zeroed. Sometimes though, the final call to epoll_wait() returns just one event instead of the two, and in that case, the high 32 bits of the first epoll_event.data are unwritten. The lower 32 bits contain the right data, and as the code treats this as a (int) FD, the top 32 bits are discarded anyway. Drat! I've just looked back at the kernel headers and noticed the following, prior to the epoll_event definition: /* * On x86-64 make the 64bit structure have the same alignment as the * 32bit structure. This makes 32bit emulation easier. */ #ifdef __x86_64__ #define EPOLL_PACKED __attribute__((packed)) #else #define EPOLL_PACKED #endif struct epoll_event { __u32 events; __u64 data; } EPOLL_PACKED; Now, my code can't pull in the kernel headers so it has its own epoll_event definition, which is missing the packed attribute. D'oh! Guess what, everything works properly when I add it in. Looks like this bug has been of my own making all along... Sorry for taking up your time! Ben -- Ben Mansell, <ben@zeus.com> Zeus Technology Ltd Download the world's fastest webserver! Universally Serving the Net T:+44(0)1223 525000 F:+44(0)1223 525100 http://www.zeus.com Zeus House, Cowley Road, Cambridge, CB4 0ZT, ENGLAND ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: epoll gives broken results when interrupted with a signal 2003-10-30 14:48 ` Ben Mansell 2003-10-30 23:36 ` Davide Libenzi @ 2003-10-31 23:00 ` Davide Libenzi 1 sibling, 0 replies; 9+ messages in thread From: Davide Libenzi @ 2003-10-31 23:00 UTC (permalink / raw) To: Ben Mansell; +Cc: Linux Kernel Mailing List On Thu, 30 Oct 2003, Ben Mansell wrote: > On Thu, 30 Oct 2003, Ben Mansell wrote: > > > On Wed, 29 Oct 2003, Davide Libenzi wrote: > > > > > Can you try the patch below and show me a dmesg when this happen? > > > > Ok, patch applied. (I changed DEBUG_EPOLL to 10 however, otherwise > > nothing would be printed). Now, epoll appears to behave perfectly and I > > can't re-create the problem :( > > Got it! I was missing the problem because I had removed some debug > messages in my own code. Here's another run, this time the > final epoll_wait() call of the child process brings back 2 events: > Event 0 fd: 7 events: 17 > Event 1 fd: -2095926561 events: 0 It is really strage. If you look what epoll sees: > [0000010002ba7520] eventpoll: polling file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80 > [0000010002ba7520] eventpoll: pollres file=00000100099822c0 ep=000001001f928000 epi=000001000d9c6a80 events=17 > [0000010002ba7520] eventpoll: polling file=000001000615d980 ep=000001001f928000 epi=000001000d9c69c0 > [0000010002ba7520] eventpoll: pollres file=000001000615d980 ep=000001001f928000 epi=000001000d9c69c0 events=16 > [0000010002ba7520] eventpoll: sys_epoll_wait(3, 000000000073e390, 32, 1000) = 2 > [0000010002ba7520] eventpoll: eventpoll_release_file(00000100099822c0) > [0000010002ba7520] eventpoll: remove ep=000001001f928000 epi=000001000d9c6a80 > [0000010002ba7520] eventpoll: ep_unlink(000001001f928000, 00000100099822c0) = 0 > [0000010002ba7520] eventpoll: ep_remove(000001001f928000, 00000100099822c0) = 0 > [0000010002ba7520] eventpoll: eventpoll_release_file(000001000615d980) > [0000010002ba7520] eventpoll: remove ep=000001001f928000 epi=000001000d9c69c0 > [0000010002ba7520] eventpoll: ep_unlink(000001001f928000, 000001000615d980) = 0 > [0000010002ba7520] eventpoll: ep_remove(000001001f928000, 000001000615d980) = 0 > [0000010002ba7520] eventpoll: close() ep=000001001f928000 It clearly sees two events with masks 16 and 17. And at this points events are already inside a buffer ready to be pushed to usespace with a copy_to_user(). - Davide ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2003-11-03 11:07 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2003-10-29 15:00 epoll gives broken results when interrupted with a signal Ben Mansell 2003-10-29 17:26 ` Davide Libenzi 2003-10-29 17:38 ` Ben Mansell 2003-10-29 19:23 ` Davide Libenzi 2003-10-30 11:22 ` Ben Mansell 2003-10-30 14:48 ` Ben Mansell 2003-10-30 23:36 ` Davide Libenzi 2003-11-03 11:07 ` Ben Mansell 2003-10-31 23:00 ` Davide Libenzi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).