* sys_epoll_wait high CPU load in 2.6.37 @ 2011-01-26 0:09 Simon Kirby 2011-01-26 7:18 ` Eric Dumazet 0 siblings, 1 reply; 14+ messages in thread From: Simon Kirby @ 2011-01-26 0:09 UTC (permalink / raw) To: linux-kernel, Shawn Bohrer, Eric Dumazet Hello! Since upgrading 2.6.36 -> 2.6.37, dovecot's "anvil" process seems to end up taking a lot more time in "top", and "perf top" shows output like this (system-wide): samples pcnt function DSO _______ _____ _____________________________ __________________________ 2405.00 68.8% sys_epoll_wait [kernel.kallsyms] 33.00 0.9% mail_cache_lookup_iter_next libdovecot-storage.so.0.0.0 30.00 0.9% _raw_spin_lock [kernel.kallsyms] ...etc... It only wakes up 5-10 times per second or so (on this box), and does stuff like this: epoll_wait(12, {{EPOLLIN, {u32=19417616, u64=19417616}}}, 25, 2147483647) = 1 read(29, "PENALTY-GET\t192.168.31.10\n"..., 738) = 26 write(29, "0 0\n"..., 4) = 4 epoll_wait(12, {{EPOLLIN, {u32=19395632, u64=19395632}}}, 25, 2147483647) = 1 read(18, "LOOKUP\tpop3/192.168.31.10/tshield"..., 668) = 58 write(18, "0\n"..., 2) = 2 epoll_wait(12, {{EPOLLIN, {u32=19373072, u64=19373072}}}, 25, 2147483647) = 1 read(7, "CONNECT\t3490\tpop3/192.168.31.10/t"..., 254) = 64 epoll_wait(12, {{EPOLLIN, {u32=19373072, u64=19373072}}}, 25, 2147483647) = 1 read(7, "DISCONNECT\t3482\tpop3/192.168.31.1"..., 190) = 62 Anything obvious here? anvil talks over UNIX sockets to the rest of dovecot, and uses epoll_wait. So, suspect commits might be: 95aac7b1cd224f568fb83937044cd303ff11b029 5456f09aaf88731e16dbcea7522cb330b6846415 or other bits from git log v2.6.36..v2.6.37 net/unix/af_unix.c fs/eventpoll.c I suspect it has something to do with that "infinite value" check removal in that first commit. It doesn't show up easily on a test box, but I can try reverting 95aac7b1cd in production if it's not obvious. Simon- ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: sys_epoll_wait high CPU load in 2.6.37 2011-01-26 0:09 sys_epoll_wait high CPU load in 2.6.37 Simon Kirby @ 2011-01-26 7:18 ` Eric Dumazet 2011-01-26 11:16 ` Eric Dumazet 0 siblings, 1 reply; 14+ messages in thread From: Eric Dumazet @ 2011-01-26 7:18 UTC (permalink / raw) To: Simon Kirby; +Cc: linux-kernel, Shawn Bohrer, Davide Libenzi Le mardi 25 janvier 2011 à 16:09 -0800, Simon Kirby a écrit : > Hello! > > Since upgrading 2.6.36 -> 2.6.37, dovecot's "anvil" process seems to end > up taking a lot more time in "top", and "perf top" shows output like this > (system-wide): > > samples pcnt function DSO > _______ _____ _____________________________ __________________________ > > 2405.00 68.8% sys_epoll_wait [kernel.kallsyms] > 33.00 0.9% mail_cache_lookup_iter_next libdovecot-storage.so.0.0.0 > 30.00 0.9% _raw_spin_lock [kernel.kallsyms] > ...etc... > > It only wakes up 5-10 times per second or so (on this box), and does > stuff like this: > > epoll_wait(12, {{EPOLLIN, {u32=19417616, u64=19417616}}}, 25, 2147483647) = 1 > read(29, "PENALTY-GET\t192.168.31.10\n"..., 738) = 26 > write(29, "0 0\n"..., 4) = 4 > epoll_wait(12, {{EPOLLIN, {u32=19395632, u64=19395632}}}, 25, 2147483647) = 1 > read(18, "LOOKUP\tpop3/192.168.31.10/tshield"..., 668) = 58 > write(18, "0\n"..., 2) = 2 > epoll_wait(12, {{EPOLLIN, {u32=19373072, u64=19373072}}}, 25, 2147483647) = 1 > read(7, "CONNECT\t3490\tpop3/192.168.31.10/t"..., 254) = 64 > epoll_wait(12, {{EPOLLIN, {u32=19373072, u64=19373072}}}, 25, 2147483647) = 1 > read(7, "DISCONNECT\t3482\tpop3/192.168.31.1"..., 190) = 62 > > Anything obvious here? anvil talks over UNIX sockets to the rest of > dovecot, and uses epoll_wait. So, suspect commits might be: > > 95aac7b1cd224f568fb83937044cd303ff11b029 > 5456f09aaf88731e16dbcea7522cb330b6846415 > or other bits from > git log v2.6.36..v2.6.37 net/unix/af_unix.c fs/eventpoll.c > > I suspect it has something to do with that "infinite value" check removal > in that first commit. It doesn't show up easily on a test box, but I can > try reverting 95aac7b1cd in production if it's not obvious. > > Simon- Yes, 95aac7b1cd is the problem, but anvil should use a 0 (no) timeout instead of 2147483647 ms : epoll_wait() doesnt have to arm a timer in this case, it is a bit faster. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: sys_epoll_wait high CPU load in 2.6.37 2011-01-26 7:18 ` Eric Dumazet @ 2011-01-26 11:16 ` Eric Dumazet 2011-01-26 15:31 ` Davide Libenzi 0 siblings, 1 reply; 14+ messages in thread From: Eric Dumazet @ 2011-01-26 11:16 UTC (permalink / raw) To: Simon Kirby; +Cc: linux-kernel, Shawn Bohrer, Davide Libenzi, Andrew Morton Le mercredi 26 janvier 2011 à 08:18 +0100, Eric Dumazet a écrit : > Le mardi 25 janvier 2011 à 16:09 -0800, Simon Kirby a écrit : > > Hello! > > > > Since upgrading 2.6.36 -> 2.6.37, dovecot's "anvil" process seems to end > > up taking a lot more time in "top", and "perf top" shows output like this > > (system-wide): > > > > samples pcnt function DSO > > _______ _____ _____________________________ __________________________ > > > > 2405.00 68.8% sys_epoll_wait [kernel.kallsyms] > > 33.00 0.9% mail_cache_lookup_iter_next libdovecot-storage.so.0.0.0 > > 30.00 0.9% _raw_spin_lock [kernel.kallsyms] > > ...etc... > > > > It only wakes up 5-10 times per second or so (on this box), and does > > stuff like this: > > > > epoll_wait(12, {{EPOLLIN, {u32=19417616, u64=19417616}}}, 25, 2147483647) = 1 > > read(29, "PENALTY-GET\t192.168.31.10\n"..., 738) = 26 > > write(29, "0 0\n"..., 4) = 4 > > epoll_wait(12, {{EPOLLIN, {u32=19395632, u64=19395632}}}, 25, 2147483647) = 1 > > read(18, "LOOKUP\tpop3/192.168.31.10/tshield"..., 668) = 58 > > write(18, "0\n"..., 2) = 2 > > epoll_wait(12, {{EPOLLIN, {u32=19373072, u64=19373072}}}, 25, 2147483647) = 1 > > read(7, "CONNECT\t3490\tpop3/192.168.31.10/t"..., 254) = 64 > > epoll_wait(12, {{EPOLLIN, {u32=19373072, u64=19373072}}}, 25, 2147483647) = 1 > > read(7, "DISCONNECT\t3482\tpop3/192.168.31.1"..., 190) = 62 > > > > Anything obvious here? anvil talks over UNIX sockets to the rest of > > dovecot, and uses epoll_wait. So, suspect commits might be: > > > > 95aac7b1cd224f568fb83937044cd303ff11b029 > > 5456f09aaf88731e16dbcea7522cb330b6846415 > > or other bits from > > git log v2.6.36..v2.6.37 net/unix/af_unix.c fs/eventpoll.c > > > > I suspect it has something to do with that "infinite value" check removal > > in that first commit. It doesn't show up easily on a test box, but I can > > try reverting 95aac7b1cd in production if it's not obvious. > > > > Simon- > > Yes, 95aac7b1cd is the problem, but anvil should use a 0 (no) timeout > instead of 2147483647 ms : epoll_wait() doesnt have to arm a timer in > this case, it is a bit faster. > > Slowness comes from timespec_add_ns() : This one assumed small 'ns' argument, since it wants to avoid a divide instruction. static __always_inline void timespec_add_ns(struct timespec *a, u64 ns) { a->tv_sec += __iter_div_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns); a->tv_nsec = ns; } We should do this differently for epoll usage ;) Please try following patch : [PATCH] epoll: epoll_wait() should be careful in timespec_add_ns use commit 95aac7b1cd224f (epoll: make epoll_wait() use the hrtimer range feature) added a performance regression because it used timespec_add_ns() with potential very large 'ns' values. Reported-by: Simon Kirby <sim@hostway.ca> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Shawn Bohrer <shawn.bohrer@gmail.com> CC: Davide Libenzi <davidel@xmailserver.org> CC: Andrew Morton <akpm@linux-foundation.org> --- fs/eventpoll.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/eventpoll.c b/fs/eventpoll.c index cc8a9b7..7ec0890 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -1126,7 +1126,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events, if (timeout > 0) { ktime_get_ts(&end_time); - timespec_add_ns(&end_time, (u64)timeout * NSEC_PER_MSEC); + end_time.tv_sec += timeout / MSEC_PER_SEC; + timeout %= MSEC_PER_SEC; + timespec_add_ns(&end_time, timeout * NSEC_PER_MSEC); slack = select_estimate_accuracy(&end_time); to = &expires; *to = timespec_to_ktime(end_time); ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: sys_epoll_wait high CPU load in 2.6.37 2011-01-26 11:16 ` Eric Dumazet @ 2011-01-26 15:31 ` Davide Libenzi 2011-01-26 15:43 ` Eric Dumazet 0 siblings, 1 reply; 14+ messages in thread From: Davide Libenzi @ 2011-01-26 15:31 UTC (permalink / raw) To: Eric Dumazet Cc: Simon Kirby, Linux Kernel Mailing List, Shawn Bohrer, Andrew Morton [-- Attachment #1: Type: TEXT/PLAIN, Size: 4048 bytes --] On Wed, 26 Jan 2011, Eric Dumazet wrote: > Le mercredi 26 janvier 2011 à 08:18 +0100, Eric Dumazet a écrit : > > Le mardi 25 janvier 2011 à 16:09 -0800, Simon Kirby a écrit : > > > Hello! > > > > > > Since upgrading 2.6.36 -> 2.6.37, dovecot's "anvil" process seems to end > > > up taking a lot more time in "top", and "perf top" shows output like this > > > (system-wide): > > > > > > samples pcnt function DSO > > > _______ _____ _____________________________ __________________________ > > > > > > 2405.00 68.8% sys_epoll_wait [kernel.kallsyms] > > > 33.00 0.9% mail_cache_lookup_iter_next libdovecot-storage.so.0.0.0 > > > 30.00 0.9% _raw_spin_lock [kernel.kallsyms] > > > ...etc... > > > > > > It only wakes up 5-10 times per second or so (on this box), and does > > > stuff like this: > > > > > > epoll_wait(12, {{EPOLLIN, {u32=19417616, u64=19417616}}}, 25, 2147483647) = 1 > > > read(29, "PENALTY-GET\t192.168.31.10\n"..., 738) = 26 > > > write(29, "0 0\n"..., 4) = 4 > > > epoll_wait(12, {{EPOLLIN, {u32=19395632, u64=19395632}}}, 25, 2147483647) = 1 > > > read(18, "LOOKUP\tpop3/192.168.31.10/tshield"..., 668) = 58 > > > write(18, "0\n"..., 2) = 2 > > > epoll_wait(12, {{EPOLLIN, {u32=19373072, u64=19373072}}}, 25, 2147483647) = 1 > > > read(7, "CONNECT\t3490\tpop3/192.168.31.10/t"..., 254) = 64 > > > epoll_wait(12, {{EPOLLIN, {u32=19373072, u64=19373072}}}, 25, 2147483647) = 1 > > > read(7, "DISCONNECT\t3482\tpop3/192.168.31.1"..., 190) = 62 > > > > > > Anything obvious here? anvil talks over UNIX sockets to the rest of > > > dovecot, and uses epoll_wait. So, suspect commits might be: > > > > > > 95aac7b1cd224f568fb83937044cd303ff11b029 > > > 5456f09aaf88731e16dbcea7522cb330b6846415 > > > or other bits from > > > git log v2.6.36..v2.6.37 net/unix/af_unix.c fs/eventpoll.c > > > > > > I suspect it has something to do with that "infinite value" check removal > > > in that first commit. It doesn't show up easily on a test box, but I can > > > try reverting 95aac7b1cd in production if it's not obvious. > > > > > > Simon- > > > > Yes, 95aac7b1cd is the problem, but anvil should use a 0 (no) timeout > > instead of 2147483647 ms : epoll_wait() doesnt have to arm a timer in > > this case, it is a bit faster. > > > > > > Slowness comes from timespec_add_ns() : This one assumed small 'ns' > argument, since it wants to avoid a divide instruction. > > static __always_inline void timespec_add_ns(struct timespec *a, u64 ns) > { > a->tv_sec += __iter_div_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns); > a->tv_nsec = ns; > } > > We should do this differently for epoll usage ;) > > Please try following patch : > > [PATCH] epoll: epoll_wait() should be careful in timespec_add_ns use > > commit 95aac7b1cd224f (epoll: make epoll_wait() use the hrtimer range > feature) added a performance regression because it used > timespec_add_ns() with potential very large 'ns' values. > > Reported-by: Simon Kirby <sim@hostway.ca> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > CC: Shawn Bohrer <shawn.bohrer@gmail.com> > CC: Davide Libenzi <davidel@xmailserver.org> > CC: Andrew Morton <akpm@linux-foundation.org> > --- > fs/eventpoll.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/fs/eventpoll.c b/fs/eventpoll.c > index cc8a9b7..7ec0890 100644 > --- a/fs/eventpoll.c > +++ b/fs/eventpoll.c > @@ -1126,7 +1126,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events, > > if (timeout > 0) { > ktime_get_ts(&end_time); > - timespec_add_ns(&end_time, (u64)timeout * NSEC_PER_MSEC); > + end_time.tv_sec += timeout / MSEC_PER_SEC; > + timeout %= MSEC_PER_SEC; > + timespec_add_ns(&end_time, timeout * NSEC_PER_MSEC); > slack = select_estimate_accuracy(&end_time); > to = &expires; > *to = timespec_to_ktime(end_time); Yep, we can overflow the timeout, with the calculation above. A timespec_add_ms()? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: sys_epoll_wait high CPU load in 2.6.37 2011-01-26 15:31 ` Davide Libenzi @ 2011-01-26 15:43 ` Eric Dumazet 2011-01-26 15:52 ` Davide Libenzi 0 siblings, 1 reply; 14+ messages in thread From: Eric Dumazet @ 2011-01-26 15:43 UTC (permalink / raw) To: Davide Libenzi Cc: Simon Kirby, Linux Kernel Mailing List, Shawn Bohrer, Andrew Morton Le mercredi 26 janvier 2011 à 07:31 -0800, Davide Libenzi a écrit : > On Wed, 26 Jan 2011, Eric Dumazet wrote: > > > diff --git a/fs/eventpoll.c b/fs/eventpoll.c > > index cc8a9b7..7ec0890 100644 > > --- a/fs/eventpoll.c > > +++ b/fs/eventpoll.c > > @@ -1126,7 +1126,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events, > > > > if (timeout > 0) { > > ktime_get_ts(&end_time); > > - timespec_add_ns(&end_time, (u64)timeout * NSEC_PER_MSEC); > > + end_time.tv_sec += timeout / MSEC_PER_SEC; > > + timeout %= MSEC_PER_SEC; > > + timespec_add_ns(&end_time, timeout * NSEC_PER_MSEC); > > slack = select_estimate_accuracy(&end_time); > > to = &expires; > > *to = timespec_to_ktime(end_time); > > Yep, we can overflow the timeout, with the calculation above. > A timespec_add_ms()? Well, given timeout after modulo contains a number between 0 and 999, multiply by 1.000.000 (NSEC_PER_MSEC) cant overflow. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: sys_epoll_wait high CPU load in 2.6.37 2011-01-26 15:43 ` Eric Dumazet @ 2011-01-26 15:52 ` Davide Libenzi 2011-01-26 15:59 ` Eric Dumazet 0 siblings, 1 reply; 14+ messages in thread From: Davide Libenzi @ 2011-01-26 15:52 UTC (permalink / raw) To: Eric Dumazet Cc: Simon Kirby, Linux Kernel Mailing List, Shawn Bohrer, Andrew Morton [-- Attachment #1: Type: TEXT/PLAIN, Size: 1285 bytes --] On Wed, 26 Jan 2011, Eric Dumazet wrote: > Le mercredi 26 janvier 2011 à 07:31 -0800, Davide Libenzi a écrit : > > On Wed, 26 Jan 2011, Eric Dumazet wrote: > > > > > > diff --git a/fs/eventpoll.c b/fs/eventpoll.c > > > index cc8a9b7..7ec0890 100644 > > > --- a/fs/eventpoll.c > > > +++ b/fs/eventpoll.c > > > @@ -1126,7 +1126,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events, > > > > > > if (timeout > 0) { > > > ktime_get_ts(&end_time); > > > - timespec_add_ns(&end_time, (u64)timeout * NSEC_PER_MSEC); > > > + end_time.tv_sec += timeout / MSEC_PER_SEC; > > > + timeout %= MSEC_PER_SEC; > > > + timespec_add_ns(&end_time, timeout * NSEC_PER_MSEC); > > > slack = select_estimate_accuracy(&end_time); > > > to = &expires; > > > *to = timespec_to_ktime(end_time); > > > > Yep, we can overflow the timeout, with the calculation above. > > A timespec_add_ms()? > > Well, given timeout after modulo contains a number between 0 and 999, > multiply by 1.000.000 (NSEC_PER_MSEC) cant overflow. For "above", I meant the current epoll expire time calculation, which was described above in the message ;) The hint for a timespec_add_ms() was because we must be doing something similar in poll, don't we (/me got no code in front ATM)? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: sys_epoll_wait high CPU load in 2.6.37 2011-01-26 15:52 ` Davide Libenzi @ 2011-01-26 15:59 ` Eric Dumazet 2011-01-26 16:13 ` Eric Dumazet 0 siblings, 1 reply; 14+ messages in thread From: Eric Dumazet @ 2011-01-26 15:59 UTC (permalink / raw) To: Davide Libenzi Cc: Simon Kirby, Linux Kernel Mailing List, Shawn Bohrer, Andrew Morton Le mercredi 26 janvier 2011 à 07:52 -0800, Davide Libenzi a écrit : > For "above", I meant the current epoll expire time calculation, which was > described above in the message ;) Well, problem was not an overflow, but doing a loop 2.000.000 times ;) > The hint for a timespec_add_ms() was because we must be doing something > similar in poll, don't we (/me got no code in front ATM)? Apparently its done differently in poll(), using poll_select_set_timeout() helper. Give me some minutes I'll try to cook an alternate patch ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: sys_epoll_wait high CPU load in 2.6.37 2011-01-26 15:59 ` Eric Dumazet @ 2011-01-26 16:13 ` Eric Dumazet 2011-01-26 17:20 ` Shawn Bohrer 0 siblings, 1 reply; 14+ messages in thread From: Eric Dumazet @ 2011-01-26 16:13 UTC (permalink / raw) To: Davide Libenzi Cc: Simon Kirby, Linux Kernel Mailing List, Shawn Bohrer, Andrew Morton, Thomas Gleixner Le mercredi 26 janvier 2011 à 16:59 +0100, Eric Dumazet a écrit : > Le mercredi 26 janvier 2011 à 07:52 -0800, Davide Libenzi a écrit : > > > For "above", I meant the current epoll expire time calculation, which was > > described above in the message ;) > > Well, problem was not an overflow, but doing a loop 2.000.000 times ;) > > > The hint for a timespec_add_ms() was because we must be doing something > > similar in poll, don't we (/me got no code in front ATM)? > > Apparently its done differently in poll(), using > poll_select_set_timeout() helper. > > > Give me some minutes I'll try to cook an alternate patch > Here is the alternate patch, using poll_select_set_timeout() helper Thanks [PATCH v2] epoll: epoll_wait() should not use timespec_add_ns() commit 95aac7b1cd224f (epoll: make epoll_wait() use the hrtimer range feature) added a performance regression because it uses timespec_add_ns() with potential very large 'ns' values. Use poll_select_set_timeout() helper like poll()/select() Reported-by: Simon Kirby <sim@hostway.ca> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Shawn Bohrer <shawn.bohrer@gmail.com> CC: Davide Libenzi <davidel@xmailserver.org> CC: Thomas Gleixner <tglx@linutronix.de> CC: Andrew Morton <akpm@linux-foundation.org> --- fs/eventpoll.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/eventpoll.c b/fs/eventpoll.c index cc8a9b7..94d887b 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -1125,8 +1125,8 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events, ktime_t expires, *to = NULL; if (timeout > 0) { - ktime_get_ts(&end_time); - timespec_add_ns(&end_time, (u64)timeout * NSEC_PER_MSEC); + poll_select_set_timeout(&end_time, timeout / MSEC_PER_SEC, + NSEC_PER_MSEC * (timeout % MSEC_PER_SEC)); slack = select_estimate_accuracy(&end_time); to = &expires; *to = timespec_to_ktime(end_time); ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: sys_epoll_wait high CPU load in 2.6.37 2011-01-26 16:13 ` Eric Dumazet @ 2011-01-26 17:20 ` Shawn Bohrer 2011-01-26 17:51 ` Eric Dumazet 0 siblings, 1 reply; 14+ messages in thread From: Shawn Bohrer @ 2011-01-26 17:20 UTC (permalink / raw) To: Eric Dumazet Cc: Davide Libenzi, Simon Kirby, Linux Kernel Mailing List, Andrew Morton, Thomas Gleixner On Wed, Jan 26, 2011 at 05:13:29PM +0100, Eric Dumazet wrote: > Le mercredi 26 janvier 2011 à 16:59 +0100, Eric Dumazet a écrit : > > Le mercredi 26 janvier 2011 à 07:52 -0800, Davide Libenzi a écrit : > > > > > For "above", I meant the current epoll expire time calculation, which was > > > described above in the message ;) > > > > Well, problem was not an overflow, but doing a loop 2.000.000 times ;) > > > > > The hint for a timespec_add_ms() was because we must be doing something > > > similar in poll, don't we (/me got no code in front ATM)? > > > > Apparently its done differently in poll(), using > > poll_select_set_timeout() helper. > > > > > > Give me some minutes I'll try to cook an alternate patch > > > > Here is the alternate patch, using poll_select_set_timeout() helper > > Thanks > > [PATCH v2] epoll: epoll_wait() should not use timespec_add_ns() > > commit 95aac7b1cd224f (epoll: make epoll_wait() use the hrtimer range > feature) added a performance regression because it uses > timespec_add_ns() with potential very large 'ns' values. > > Use poll_select_set_timeout() helper like poll()/select() > > Reported-by: Simon Kirby <sim@hostway.ca> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > CC: Shawn Bohrer <shawn.bohrer@gmail.com> > CC: Davide Libenzi <davidel@xmailserver.org> > CC: Thomas Gleixner <tglx@linutronix.de> > CC: Andrew Morton <akpm@linux-foundation.org> > --- > fs/eventpoll.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/fs/eventpoll.c b/fs/eventpoll.c > index cc8a9b7..94d887b 100644 > --- a/fs/eventpoll.c > +++ b/fs/eventpoll.c > @@ -1125,8 +1125,8 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events, > ktime_t expires, *to = NULL; > > if (timeout > 0) { > - ktime_get_ts(&end_time); > - timespec_add_ns(&end_time, (u64)timeout * NSEC_PER_MSEC); > + poll_select_set_timeout(&end_time, timeout / MSEC_PER_SEC, > + NSEC_PER_MSEC * (timeout % MSEC_PER_SEC)); > slack = select_estimate_accuracy(&end_time); > to = &expires; > *to = timespec_to_ktime(end_time); poll_select_set_timeout() jumps through some extra hoops that aren't necessary in the epoll case so I actually like your previous patch better. -- Shawn ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: sys_epoll_wait high CPU load in 2.6.37 2011-01-26 17:20 ` Shawn Bohrer @ 2011-01-26 17:51 ` Eric Dumazet 2011-01-26 18:06 ` Davide Libenzi 0 siblings, 1 reply; 14+ messages in thread From: Eric Dumazet @ 2011-01-26 17:51 UTC (permalink / raw) To: Shawn Bohrer Cc: Davide Libenzi, Simon Kirby, Linux Kernel Mailing List, Andrew Morton, Thomas Gleixner Le mercredi 26 janvier 2011 à 11:20 -0600, Shawn Bohrer a écrit : > On Wed, Jan 26, 2011 at 05:13:29PM +0100, Eric Dumazet wrote: > > Le mercredi 26 janvier 2011 à 16:59 +0100, Eric Dumazet a écrit : > > > Le mercredi 26 janvier 2011 à 07:52 -0800, Davide Libenzi a écrit : > > > > > > > For "above", I meant the current epoll expire time calculation, which was > > > > described above in the message ;) > > > > > > Well, problem was not an overflow, but doing a loop 2.000.000 times ;) > > > > > > > The hint for a timespec_add_ms() was because we must be doing something > > > > similar in poll, don't we (/me got no code in front ATM)? > > > > > > Apparently its done differently in poll(), using > > > poll_select_set_timeout() helper. > > > > > > > > > Give me some minutes I'll try to cook an alternate patch > > > > > > > Here is the alternate patch, using poll_select_set_timeout() helper > > > > Thanks > > > > [PATCH v2] epoll: epoll_wait() should not use timespec_add_ns() > > > > commit 95aac7b1cd224f (epoll: make epoll_wait() use the hrtimer range > > feature) added a performance regression because it uses > > timespec_add_ns() with potential very large 'ns' values. > > > > Use poll_select_set_timeout() helper like poll()/select() > > > > Reported-by: Simon Kirby <sim@hostway.ca> > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > > CC: Shawn Bohrer <shawn.bohrer@gmail.com> > > CC: Davide Libenzi <davidel@xmailserver.org> > > CC: Thomas Gleixner <tglx@linutronix.de> > > CC: Andrew Morton <akpm@linux-foundation.org> > > --- > > fs/eventpoll.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/fs/eventpoll.c b/fs/eventpoll.c > > index cc8a9b7..94d887b 100644 > > --- a/fs/eventpoll.c > > +++ b/fs/eventpoll.c > > @@ -1125,8 +1125,8 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events, > > ktime_t expires, *to = NULL; > > > > if (timeout > 0) { > > - ktime_get_ts(&end_time); > > - timespec_add_ns(&end_time, (u64)timeout * NSEC_PER_MSEC); > > + poll_select_set_timeout(&end_time, timeout / MSEC_PER_SEC, > > + NSEC_PER_MSEC * (timeout % MSEC_PER_SEC)); > > slack = select_estimate_accuracy(&end_time); > > to = &expires; > > *to = timespec_to_ktime(end_time); > > poll_select_set_timeout() jumps through some extra hoops that > aren't necessary in the epoll case so I actually like your previous > patch better. Well, I dont care, I let Davide decide, he is the boss ;) This is a stable candidate, so adding timespec_add_ms() sounds overkill. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: sys_epoll_wait high CPU load in 2.6.37 2011-01-26 17:51 ` Eric Dumazet @ 2011-01-26 18:06 ` Davide Libenzi 2011-01-26 18:18 ` Eric Dumazet 0 siblings, 1 reply; 14+ messages in thread From: Davide Libenzi @ 2011-01-26 18:06 UTC (permalink / raw) To: Eric Dumazet Cc: Shawn Bohrer, Simon Kirby, Linux Kernel Mailing List, Andrew Morton, Thomas Gleixner On Wed, 26 Jan 2011, Eric Dumazet wrote: > > > --- a/fs/eventpoll.c > > > +++ b/fs/eventpoll.c > > > @@ -1125,8 +1125,8 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events, > > > ktime_t expires, *to = NULL; > > > > > > if (timeout > 0) { > > > - ktime_get_ts(&end_time); > > > - timespec_add_ns(&end_time, (u64)timeout * NSEC_PER_MSEC); > > > + poll_select_set_timeout(&end_time, timeout / MSEC_PER_SEC, > > > + NSEC_PER_MSEC * (timeout % MSEC_PER_SEC)); > > > slack = select_estimate_accuracy(&end_time); > > > to = &expires; > > > *to = timespec_to_ktime(end_time); > > > > poll_select_set_timeout() jumps through some extra hoops that > > aren't necessary in the epoll case so I actually like your previous > > patch better. > > Well, I dont care, I let Davide decide, he is the boss ;) > > This is a stable candidate, so adding timespec_add_ms() sounds overkill. Eric, if you look at fs/select.c (~line 925), poll does exactly the same thing as epoll do. It too, ignores the eventual return value of poll_select_set_timeout(), so maybe a little bit more optimized ktime_get_ts+timespec_add_ms could make sense. - Davide ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: sys_epoll_wait high CPU load in 2.6.37 2011-01-26 18:06 ` Davide Libenzi @ 2011-01-26 18:18 ` Eric Dumazet 2011-01-26 18:45 ` Eric Dumazet 0 siblings, 1 reply; 14+ messages in thread From: Eric Dumazet @ 2011-01-26 18:18 UTC (permalink / raw) To: Davide Libenzi Cc: Shawn Bohrer, Simon Kirby, Linux Kernel Mailing List, Andrew Morton, Thomas Gleixner Le mercredi 26 janvier 2011 à 10:06 -0800, Davide Libenzi a écrit : > Eric, if you look at fs/select.c (~line 925), poll does exactly the same > thing as epoll do. > It too, ignores the eventual return value of poll_select_set_timeout(), so > maybe a little bit more optimized ktime_get_ts+timespec_add_ms could make > sense. > OK, I'll post a V3 ;) Thanks ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: sys_epoll_wait high CPU load in 2.6.37 2011-01-26 18:18 ` Eric Dumazet @ 2011-01-26 18:45 ` Eric Dumazet 2011-01-28 0:17 ` Davide Libenzi 0 siblings, 1 reply; 14+ messages in thread From: Eric Dumazet @ 2011-01-26 18:45 UTC (permalink / raw) To: Davide Libenzi Cc: Shawn Bohrer, Simon Kirby, Linux Kernel Mailing List, Andrew Morton, Thomas Gleixner Le mercredi 26 janvier 2011 à 19:18 +0100, Eric Dumazet a écrit : > Le mercredi 26 janvier 2011 à 10:06 -0800, Davide Libenzi a écrit : > > > Eric, if you look at fs/select.c (~line 925), poll does exactly the same > > thing as epoll do. > > It too, ignores the eventual return value of poll_select_set_timeout(), so > > maybe a little bit more optimized ktime_get_ts+timespec_add_ms could make > > sense. > > > > OK, I'll post a V3 ;) Well in the poll() case we handle a zero timeout, not in epoll(). So the helper function cannot be shared and can be static to epoll. [PATCH v3] epoll: epoll_wait() should not use timespec_add_ns() commit 95aac7b1cd224f (epoll: make epoll_wait() use the hrtimer range feature) added a performance regression because it uses timespec_add_ns() with potential very large 'ns' values. Reported-by: Simon Kirby <sim@hostway.ca> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Shawn Bohrer <shawn.bohrer@gmail.com> CC: Davide Libenzi <davidel@xmailserver.org> CC: Thomas Gleixner <tglx@linutronix.de> CC: Andrew Morton <akpm@linux-foundation.org> --- fs/eventpoll.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/fs/eventpoll.c b/fs/eventpoll.c index cc8a9b7..d517aa3 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -1114,6 +1114,17 @@ static int ep_send_events(struct eventpoll *ep, return ep_scan_ready_list(ep, ep_send_events_proc, &esed); } +static inline struct timespec epoll_set_mstimeout(long ms) +{ + struct timespec now, ts = { + .tv_sec = ms / MSEC_PER_SEC, + .tv_nsec = NSEC_PER_MSEC * (ms % MSEC_PER_SEC), + }; + + ktime_get_ts(&now); + return timespec_add_safe(now, ts); +} + static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events, int maxevents, long timeout) { @@ -1121,12 +1132,11 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events, unsigned long flags; long slack; wait_queue_t wait; - struct timespec end_time; ktime_t expires, *to = NULL; if (timeout > 0) { - ktime_get_ts(&end_time); - timespec_add_ns(&end_time, (u64)timeout * NSEC_PER_MSEC); + struct timespec end_time = epoll_set_mstimeout(timeout); + slack = select_estimate_accuracy(&end_time); to = &expires; *to = timespec_to_ktime(end_time); ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: sys_epoll_wait high CPU load in 2.6.37 2011-01-26 18:45 ` Eric Dumazet @ 2011-01-28 0:17 ` Davide Libenzi 0 siblings, 0 replies; 14+ messages in thread From: Davide Libenzi @ 2011-01-28 0:17 UTC (permalink / raw) To: Eric Dumazet Cc: Shawn Bohrer, Simon Kirby, Linux Kernel Mailing List, Andrew Morton, Thomas Gleixner [-- Attachment #1: Type: TEXT/PLAIN, Size: 2658 bytes --] On Wed, 26 Jan 2011, Eric Dumazet wrote: > Le mercredi 26 janvier 2011 à 19:18 +0100, Eric Dumazet a écrit : > > Le mercredi 26 janvier 2011 à 10:06 -0800, Davide Libenzi a écrit : > > > > > Eric, if you look at fs/select.c (~line 925), poll does exactly the same > > > thing as epoll do. > > > It too, ignores the eventual return value of poll_select_set_timeout(), so > > > maybe a little bit more optimized ktime_get_ts+timespec_add_ms could make > > > sense. > > > > > > > OK, I'll post a V3 ;) > > Well in the poll() case we handle a zero timeout, not in epoll(). > > So the helper function cannot be shared and can be static to epoll. > > > [PATCH v3] epoll: epoll_wait() should not use timespec_add_ns() > > commit 95aac7b1cd224f (epoll: make epoll_wait() use the hrtimer range > feature) added a performance regression because it uses > timespec_add_ns() with potential very large 'ns' values. > > Reported-by: Simon Kirby <sim@hostway.ca> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > CC: Shawn Bohrer <shawn.bohrer@gmail.com> > CC: Davide Libenzi <davidel@xmailserver.org> > CC: Thomas Gleixner <tglx@linutronix.de> > CC: Andrew Morton <akpm@linux-foundation.org> > --- > fs/eventpoll.c | 16 +++++++++++++--- > 1 file changed, 13 insertions(+), 3 deletions(-) > > diff --git a/fs/eventpoll.c b/fs/eventpoll.c > index cc8a9b7..d517aa3 100644 > --- a/fs/eventpoll.c > +++ b/fs/eventpoll.c > @@ -1114,6 +1114,17 @@ static int ep_send_events(struct eventpoll *ep, > return ep_scan_ready_list(ep, ep_send_events_proc, &esed); > } > > +static inline struct timespec epoll_set_mstimeout(long ms) > +{ > + struct timespec now, ts = { > + .tv_sec = ms / MSEC_PER_SEC, > + .tv_nsec = NSEC_PER_MSEC * (ms % MSEC_PER_SEC), > + }; > + > + ktime_get_ts(&now); > + return timespec_add_safe(now, ts); > +} > + > static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events, > int maxevents, long timeout) > { > @@ -1121,12 +1132,11 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events, > unsigned long flags; > long slack; > wait_queue_t wait; > - struct timespec end_time; > ktime_t expires, *to = NULL; > > if (timeout > 0) { > - ktime_get_ts(&end_time); > - timespec_add_ns(&end_time, (u64)timeout * NSEC_PER_MSEC); > + struct timespec end_time = epoll_set_mstimeout(timeout); > + > slack = select_estimate_accuracy(&end_time); > to = &expires; > *to = timespec_to_ktime(end_time); Looks OK for me. The epoll_ prefix fights with the ep_ standard on the original file, but I will post a one-liner for it, given Andrew already got this. - Davide ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2011-01-28 0:17 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2011-01-26 0:09 sys_epoll_wait high CPU load in 2.6.37 Simon Kirby 2011-01-26 7:18 ` Eric Dumazet 2011-01-26 11:16 ` Eric Dumazet 2011-01-26 15:31 ` Davide Libenzi 2011-01-26 15:43 ` Eric Dumazet 2011-01-26 15:52 ` Davide Libenzi 2011-01-26 15:59 ` Eric Dumazet 2011-01-26 16:13 ` Eric Dumazet 2011-01-26 17:20 ` Shawn Bohrer 2011-01-26 17:51 ` Eric Dumazet 2011-01-26 18:06 ` Davide Libenzi 2011-01-26 18:18 ` Eric Dumazet 2011-01-26 18:45 ` Eric Dumazet 2011-01-28 0:17 ` Davide Libenzi
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.