* sys_epoll_wait high CPU load in 2.6.37
@ 2011-01-26 0:09 Simon Kirby
2011-01-26 7:18 ` Eric Dumazet
0 siblings, 1 reply; 14+ messages in thread
From: Simon Kirby @ 2011-01-26 0:09 UTC (permalink / raw)
To: linux-kernel, Shawn Bohrer, Eric Dumazet
Hello!
Since upgrading 2.6.36 -> 2.6.37, dovecot's "anvil" process seems to end
up taking a lot more time in "top", and "perf top" shows output like this
(system-wide):
samples pcnt function DSO
_______ _____ _____________________________ __________________________
2405.00 68.8% sys_epoll_wait [kernel.kallsyms]
33.00 0.9% mail_cache_lookup_iter_next libdovecot-storage.so.0.0.0
30.00 0.9% _raw_spin_lock [kernel.kallsyms]
...etc...
It only wakes up 5-10 times per second or so (on this box), and does
stuff like this:
epoll_wait(12, {{EPOLLIN, {u32=19417616, u64=19417616}}}, 25, 2147483647) = 1
read(29, "PENALTY-GET\t192.168.31.10\n"..., 738) = 26
write(29, "0 0\n"..., 4) = 4
epoll_wait(12, {{EPOLLIN, {u32=19395632, u64=19395632}}}, 25, 2147483647) = 1
read(18, "LOOKUP\tpop3/192.168.31.10/tshield"..., 668) = 58
write(18, "0\n"..., 2) = 2
epoll_wait(12, {{EPOLLIN, {u32=19373072, u64=19373072}}}, 25, 2147483647) = 1
read(7, "CONNECT\t3490\tpop3/192.168.31.10/t"..., 254) = 64
epoll_wait(12, {{EPOLLIN, {u32=19373072, u64=19373072}}}, 25, 2147483647) = 1
read(7, "DISCONNECT\t3482\tpop3/192.168.31.1"..., 190) = 62
Anything obvious here? anvil talks over UNIX sockets to the rest of
dovecot, and uses epoll_wait. So, suspect commits might be:
95aac7b1cd224f568fb83937044cd303ff11b029
5456f09aaf88731e16dbcea7522cb330b6846415
or other bits from
git log v2.6.36..v2.6.37 net/unix/af_unix.c fs/eventpoll.c
I suspect it has something to do with that "infinite value" check removal
in that first commit. It doesn't show up easily on a test box, but I can
try reverting 95aac7b1cd in production if it's not obvious.
Simon-
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: sys_epoll_wait high CPU load in 2.6.37
2011-01-26 0:09 sys_epoll_wait high CPU load in 2.6.37 Simon Kirby
@ 2011-01-26 7:18 ` Eric Dumazet
2011-01-26 11:16 ` Eric Dumazet
0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2011-01-26 7:18 UTC (permalink / raw)
To: Simon Kirby; +Cc: linux-kernel, Shawn Bohrer, Davide Libenzi
Le mardi 25 janvier 2011 à 16:09 -0800, Simon Kirby a écrit :
> Hello!
>
> Since upgrading 2.6.36 -> 2.6.37, dovecot's "anvil" process seems to end
> up taking a lot more time in "top", and "perf top" shows output like this
> (system-wide):
>
> samples pcnt function DSO
> _______ _____ _____________________________ __________________________
>
> 2405.00 68.8% sys_epoll_wait [kernel.kallsyms]
> 33.00 0.9% mail_cache_lookup_iter_next libdovecot-storage.so.0.0.0
> 30.00 0.9% _raw_spin_lock [kernel.kallsyms]
> ...etc...
>
> It only wakes up 5-10 times per second or so (on this box), and does
> stuff like this:
>
> epoll_wait(12, {{EPOLLIN, {u32=19417616, u64=19417616}}}, 25, 2147483647) = 1
> read(29, "PENALTY-GET\t192.168.31.10\n"..., 738) = 26
> write(29, "0 0\n"..., 4) = 4
> epoll_wait(12, {{EPOLLIN, {u32=19395632, u64=19395632}}}, 25, 2147483647) = 1
> read(18, "LOOKUP\tpop3/192.168.31.10/tshield"..., 668) = 58
> write(18, "0\n"..., 2) = 2
> epoll_wait(12, {{EPOLLIN, {u32=19373072, u64=19373072}}}, 25, 2147483647) = 1
> read(7, "CONNECT\t3490\tpop3/192.168.31.10/t"..., 254) = 64
> epoll_wait(12, {{EPOLLIN, {u32=19373072, u64=19373072}}}, 25, 2147483647) = 1
> read(7, "DISCONNECT\t3482\tpop3/192.168.31.1"..., 190) = 62
>
> Anything obvious here? anvil talks over UNIX sockets to the rest of
> dovecot, and uses epoll_wait. So, suspect commits might be:
>
> 95aac7b1cd224f568fb83937044cd303ff11b029
> 5456f09aaf88731e16dbcea7522cb330b6846415
> or other bits from
> git log v2.6.36..v2.6.37 net/unix/af_unix.c fs/eventpoll.c
>
> I suspect it has something to do with that "infinite value" check removal
> in that first commit. It doesn't show up easily on a test box, but I can
> try reverting 95aac7b1cd in production if it's not obvious.
>
> Simon-
Yes, 95aac7b1cd is the problem, but anvil should use a 0 (no) timeout
instead of 2147483647 ms : epoll_wait() doesnt have to arm a timer in
this case, it is a bit faster.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: sys_epoll_wait high CPU load in 2.6.37
2011-01-26 7:18 ` Eric Dumazet
@ 2011-01-26 11:16 ` Eric Dumazet
2011-01-26 15:31 ` Davide Libenzi
0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2011-01-26 11:16 UTC (permalink / raw)
To: Simon Kirby; +Cc: linux-kernel, Shawn Bohrer, Davide Libenzi, Andrew Morton
Le mercredi 26 janvier 2011 à 08:18 +0100, Eric Dumazet a écrit :
> Le mardi 25 janvier 2011 à 16:09 -0800, Simon Kirby a écrit :
> > Hello!
> >
> > Since upgrading 2.6.36 -> 2.6.37, dovecot's "anvil" process seems to end
> > up taking a lot more time in "top", and "perf top" shows output like this
> > (system-wide):
> >
> > samples pcnt function DSO
> > _______ _____ _____________________________ __________________________
> >
> > 2405.00 68.8% sys_epoll_wait [kernel.kallsyms]
> > 33.00 0.9% mail_cache_lookup_iter_next libdovecot-storage.so.0.0.0
> > 30.00 0.9% _raw_spin_lock [kernel.kallsyms]
> > ...etc...
> >
> > It only wakes up 5-10 times per second or so (on this box), and does
> > stuff like this:
> >
> > epoll_wait(12, {{EPOLLIN, {u32=19417616, u64=19417616}}}, 25, 2147483647) = 1
> > read(29, "PENALTY-GET\t192.168.31.10\n"..., 738) = 26
> > write(29, "0 0\n"..., 4) = 4
> > epoll_wait(12, {{EPOLLIN, {u32=19395632, u64=19395632}}}, 25, 2147483647) = 1
> > read(18, "LOOKUP\tpop3/192.168.31.10/tshield"..., 668) = 58
> > write(18, "0\n"..., 2) = 2
> > epoll_wait(12, {{EPOLLIN, {u32=19373072, u64=19373072}}}, 25, 2147483647) = 1
> > read(7, "CONNECT\t3490\tpop3/192.168.31.10/t"..., 254) = 64
> > epoll_wait(12, {{EPOLLIN, {u32=19373072, u64=19373072}}}, 25, 2147483647) = 1
> > read(7, "DISCONNECT\t3482\tpop3/192.168.31.1"..., 190) = 62
> >
> > Anything obvious here? anvil talks over UNIX sockets to the rest of
> > dovecot, and uses epoll_wait. So, suspect commits might be:
> >
> > 95aac7b1cd224f568fb83937044cd303ff11b029
> > 5456f09aaf88731e16dbcea7522cb330b6846415
> > or other bits from
> > git log v2.6.36..v2.6.37 net/unix/af_unix.c fs/eventpoll.c
> >
> > I suspect it has something to do with that "infinite value" check removal
> > in that first commit. It doesn't show up easily on a test box, but I can
> > try reverting 95aac7b1cd in production if it's not obvious.
> >
> > Simon-
>
> Yes, 95aac7b1cd is the problem, but anvil should use a 0 (no) timeout
> instead of 2147483647 ms : epoll_wait() doesnt have to arm a timer in
> this case, it is a bit faster.
>
>
Slowness comes from timespec_add_ns() : This one assumed small 'ns'
argument, since it wants to avoid a divide instruction.
static __always_inline void timespec_add_ns(struct timespec *a, u64 ns)
{
a->tv_sec += __iter_div_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
a->tv_nsec = ns;
}
We should do this differently for epoll usage ;)
Please try following patch :
[PATCH] epoll: epoll_wait() should be careful in timespec_add_ns use
commit 95aac7b1cd224f (epoll: make epoll_wait() use the hrtimer range
feature) added a performance regression because it used
timespec_add_ns() with potential very large 'ns' values.
Reported-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Shawn Bohrer <shawn.bohrer@gmail.com>
CC: Davide Libenzi <davidel@xmailserver.org>
CC: Andrew Morton <akpm@linux-foundation.org>
---
fs/eventpoll.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index cc8a9b7..7ec0890 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1126,7 +1126,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
if (timeout > 0) {
ktime_get_ts(&end_time);
- timespec_add_ns(&end_time, (u64)timeout * NSEC_PER_MSEC);
+ end_time.tv_sec += timeout / MSEC_PER_SEC;
+ timeout %= MSEC_PER_SEC;
+ timespec_add_ns(&end_time, timeout * NSEC_PER_MSEC);
slack = select_estimate_accuracy(&end_time);
to = &expires;
*to = timespec_to_ktime(end_time);
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: sys_epoll_wait high CPU load in 2.6.37
2011-01-26 11:16 ` Eric Dumazet
@ 2011-01-26 15:31 ` Davide Libenzi
2011-01-26 15:43 ` Eric Dumazet
0 siblings, 1 reply; 14+ messages in thread
From: Davide Libenzi @ 2011-01-26 15:31 UTC (permalink / raw)
To: Eric Dumazet
Cc: Simon Kirby, Linux Kernel Mailing List, Shawn Bohrer, Andrew Morton
[-- Attachment #1: Type: TEXT/PLAIN, Size: 4048 bytes --]
On Wed, 26 Jan 2011, Eric Dumazet wrote:
> Le mercredi 26 janvier 2011 à 08:18 +0100, Eric Dumazet a écrit :
> > Le mardi 25 janvier 2011 à 16:09 -0800, Simon Kirby a écrit :
> > > Hello!
> > >
> > > Since upgrading 2.6.36 -> 2.6.37, dovecot's "anvil" process seems to end
> > > up taking a lot more time in "top", and "perf top" shows output like this
> > > (system-wide):
> > >
> > > samples pcnt function DSO
> > > _______ _____ _____________________________ __________________________
> > >
> > > 2405.00 68.8% sys_epoll_wait [kernel.kallsyms]
> > > 33.00 0.9% mail_cache_lookup_iter_next libdovecot-storage.so.0.0.0
> > > 30.00 0.9% _raw_spin_lock [kernel.kallsyms]
> > > ...etc...
> > >
> > > It only wakes up 5-10 times per second or so (on this box), and does
> > > stuff like this:
> > >
> > > epoll_wait(12, {{EPOLLIN, {u32=19417616, u64=19417616}}}, 25, 2147483647) = 1
> > > read(29, "PENALTY-GET\t192.168.31.10\n"..., 738) = 26
> > > write(29, "0 0\n"..., 4) = 4
> > > epoll_wait(12, {{EPOLLIN, {u32=19395632, u64=19395632}}}, 25, 2147483647) = 1
> > > read(18, "LOOKUP\tpop3/192.168.31.10/tshield"..., 668) = 58
> > > write(18, "0\n"..., 2) = 2
> > > epoll_wait(12, {{EPOLLIN, {u32=19373072, u64=19373072}}}, 25, 2147483647) = 1
> > > read(7, "CONNECT\t3490\tpop3/192.168.31.10/t"..., 254) = 64
> > > epoll_wait(12, {{EPOLLIN, {u32=19373072, u64=19373072}}}, 25, 2147483647) = 1
> > > read(7, "DISCONNECT\t3482\tpop3/192.168.31.1"..., 190) = 62
> > >
> > > Anything obvious here? anvil talks over UNIX sockets to the rest of
> > > dovecot, and uses epoll_wait. So, suspect commits might be:
> > >
> > > 95aac7b1cd224f568fb83937044cd303ff11b029
> > > 5456f09aaf88731e16dbcea7522cb330b6846415
> > > or other bits from
> > > git log v2.6.36..v2.6.37 net/unix/af_unix.c fs/eventpoll.c
> > >
> > > I suspect it has something to do with that "infinite value" check removal
> > > in that first commit. It doesn't show up easily on a test box, but I can
> > > try reverting 95aac7b1cd in production if it's not obvious.
> > >
> > > Simon-
> >
> > Yes, 95aac7b1cd is the problem, but anvil should use a 0 (no) timeout
> > instead of 2147483647 ms : epoll_wait() doesnt have to arm a timer in
> > this case, it is a bit faster.
> >
> >
>
> Slowness comes from timespec_add_ns() : This one assumed small 'ns'
> argument, since it wants to avoid a divide instruction.
>
> static __always_inline void timespec_add_ns(struct timespec *a, u64 ns)
> {
> a->tv_sec += __iter_div_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
> a->tv_nsec = ns;
> }
>
> We should do this differently for epoll usage ;)
>
> Please try following patch :
>
> [PATCH] epoll: epoll_wait() should be careful in timespec_add_ns use
>
> commit 95aac7b1cd224f (epoll: make epoll_wait() use the hrtimer range
> feature) added a performance regression because it used
> timespec_add_ns() with potential very large 'ns' values.
>
> Reported-by: Simon Kirby <sim@hostway.ca>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Shawn Bohrer <shawn.bohrer@gmail.com>
> CC: Davide Libenzi <davidel@xmailserver.org>
> CC: Andrew Morton <akpm@linux-foundation.org>
> ---
> fs/eventpoll.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index cc8a9b7..7ec0890 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -1126,7 +1126,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
>
> if (timeout > 0) {
> ktime_get_ts(&end_time);
> - timespec_add_ns(&end_time, (u64)timeout * NSEC_PER_MSEC);
> + end_time.tv_sec += timeout / MSEC_PER_SEC;
> + timeout %= MSEC_PER_SEC;
> + timespec_add_ns(&end_time, timeout * NSEC_PER_MSEC);
> slack = select_estimate_accuracy(&end_time);
> to = &expires;
> *to = timespec_to_ktime(end_time);
Yep, we can overflow the timeout, with the calculation above.
A timespec_add_ms()?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: sys_epoll_wait high CPU load in 2.6.37
2011-01-26 15:31 ` Davide Libenzi
@ 2011-01-26 15:43 ` Eric Dumazet
2011-01-26 15:52 ` Davide Libenzi
0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2011-01-26 15:43 UTC (permalink / raw)
To: Davide Libenzi
Cc: Simon Kirby, Linux Kernel Mailing List, Shawn Bohrer, Andrew Morton
Le mercredi 26 janvier 2011 à 07:31 -0800, Davide Libenzi a écrit :
> On Wed, 26 Jan 2011, Eric Dumazet wrote:
>
> > diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> > index cc8a9b7..7ec0890 100644
> > --- a/fs/eventpoll.c
> > +++ b/fs/eventpoll.c
> > @@ -1126,7 +1126,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
> >
> > if (timeout > 0) {
> > ktime_get_ts(&end_time);
> > - timespec_add_ns(&end_time, (u64)timeout * NSEC_PER_MSEC);
> > + end_time.tv_sec += timeout / MSEC_PER_SEC;
> > + timeout %= MSEC_PER_SEC;
> > + timespec_add_ns(&end_time, timeout * NSEC_PER_MSEC);
> > slack = select_estimate_accuracy(&end_time);
> > to = &expires;
> > *to = timespec_to_ktime(end_time);
>
> Yep, we can overflow the timeout, with the calculation above.
> A timespec_add_ms()?
Well, given timeout after modulo contains a number between 0 and 999,
multiply by 1.000.000 (NSEC_PER_MSEC) cant overflow.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: sys_epoll_wait high CPU load in 2.6.37
2011-01-26 15:43 ` Eric Dumazet
@ 2011-01-26 15:52 ` Davide Libenzi
2011-01-26 15:59 ` Eric Dumazet
0 siblings, 1 reply; 14+ messages in thread
From: Davide Libenzi @ 2011-01-26 15:52 UTC (permalink / raw)
To: Eric Dumazet
Cc: Simon Kirby, Linux Kernel Mailing List, Shawn Bohrer, Andrew Morton
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1285 bytes --]
On Wed, 26 Jan 2011, Eric Dumazet wrote:
> Le mercredi 26 janvier 2011 à 07:31 -0800, Davide Libenzi a écrit :
> > On Wed, 26 Jan 2011, Eric Dumazet wrote:
> >
>
> > > diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> > > index cc8a9b7..7ec0890 100644
> > > --- a/fs/eventpoll.c
> > > +++ b/fs/eventpoll.c
> > > @@ -1126,7 +1126,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
> > >
> > > if (timeout > 0) {
> > > ktime_get_ts(&end_time);
> > > - timespec_add_ns(&end_time, (u64)timeout * NSEC_PER_MSEC);
> > > + end_time.tv_sec += timeout / MSEC_PER_SEC;
> > > + timeout %= MSEC_PER_SEC;
> > > + timespec_add_ns(&end_time, timeout * NSEC_PER_MSEC);
> > > slack = select_estimate_accuracy(&end_time);
> > > to = &expires;
> > > *to = timespec_to_ktime(end_time);
> >
> > Yep, we can overflow the timeout, with the calculation above.
> > A timespec_add_ms()?
>
> Well, given timeout after modulo contains a number between 0 and 999,
> multiply by 1.000.000 (NSEC_PER_MSEC) cant overflow.
For "above", I meant the current epoll expire time calculation, which was
described above in the message ;)
The hint for a timespec_add_ms() was because we must be doing something
similar in poll, don't we (/me got no code in front ATM)?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: sys_epoll_wait high CPU load in 2.6.37
2011-01-26 15:52 ` Davide Libenzi
@ 2011-01-26 15:59 ` Eric Dumazet
2011-01-26 16:13 ` Eric Dumazet
0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2011-01-26 15:59 UTC (permalink / raw)
To: Davide Libenzi
Cc: Simon Kirby, Linux Kernel Mailing List, Shawn Bohrer, Andrew Morton
Le mercredi 26 janvier 2011 à 07:52 -0800, Davide Libenzi a écrit :
> For "above", I meant the current epoll expire time calculation, which was
> described above in the message ;)
Well, problem was not an overflow, but doing a loop 2.000.000 times ;)
> The hint for a timespec_add_ms() was because we must be doing something
> similar in poll, don't we (/me got no code in front ATM)?
Apparently its done differently in poll(), using
poll_select_set_timeout() helper.
Give me some minutes I'll try to cook an alternate patch
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: sys_epoll_wait high CPU load in 2.6.37
2011-01-26 15:59 ` Eric Dumazet
@ 2011-01-26 16:13 ` Eric Dumazet
2011-01-26 17:20 ` Shawn Bohrer
0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2011-01-26 16:13 UTC (permalink / raw)
To: Davide Libenzi
Cc: Simon Kirby, Linux Kernel Mailing List, Shawn Bohrer,
Andrew Morton, Thomas Gleixner
Le mercredi 26 janvier 2011 à 16:59 +0100, Eric Dumazet a écrit :
> Le mercredi 26 janvier 2011 à 07:52 -0800, Davide Libenzi a écrit :
>
> > For "above", I meant the current epoll expire time calculation, which was
> > described above in the message ;)
>
> Well, problem was not an overflow, but doing a loop 2.000.000 times ;)
>
> > The hint for a timespec_add_ms() was because we must be doing something
> > similar in poll, don't we (/me got no code in front ATM)?
>
> Apparently its done differently in poll(), using
> poll_select_set_timeout() helper.
>
>
> Give me some minutes I'll try to cook an alternate patch
>
Here is the alternate patch, using poll_select_set_timeout() helper
Thanks
[PATCH v2] epoll: epoll_wait() should not use timespec_add_ns()
commit 95aac7b1cd224f (epoll: make epoll_wait() use the hrtimer range
feature) added a performance regression because it uses
timespec_add_ns() with potential very large 'ns' values.
Use poll_select_set_timeout() helper like poll()/select()
Reported-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Shawn Bohrer <shawn.bohrer@gmail.com>
CC: Davide Libenzi <davidel@xmailserver.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Andrew Morton <akpm@linux-foundation.org>
---
fs/eventpoll.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index cc8a9b7..94d887b 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1125,8 +1125,8 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
ktime_t expires, *to = NULL;
if (timeout > 0) {
- ktime_get_ts(&end_time);
- timespec_add_ns(&end_time, (u64)timeout * NSEC_PER_MSEC);
+ poll_select_set_timeout(&end_time, timeout / MSEC_PER_SEC,
+ NSEC_PER_MSEC * (timeout % MSEC_PER_SEC));
slack = select_estimate_accuracy(&end_time);
to = &expires;
*to = timespec_to_ktime(end_time);
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: sys_epoll_wait high CPU load in 2.6.37
2011-01-26 16:13 ` Eric Dumazet
@ 2011-01-26 17:20 ` Shawn Bohrer
2011-01-26 17:51 ` Eric Dumazet
0 siblings, 1 reply; 14+ messages in thread
From: Shawn Bohrer @ 2011-01-26 17:20 UTC (permalink / raw)
To: Eric Dumazet
Cc: Davide Libenzi, Simon Kirby, Linux Kernel Mailing List,
Andrew Morton, Thomas Gleixner
On Wed, Jan 26, 2011 at 05:13:29PM +0100, Eric Dumazet wrote:
> Le mercredi 26 janvier 2011 à 16:59 +0100, Eric Dumazet a écrit :
> > Le mercredi 26 janvier 2011 à 07:52 -0800, Davide Libenzi a écrit :
> >
> > > For "above", I meant the current epoll expire time calculation, which was
> > > described above in the message ;)
> >
> > Well, problem was not an overflow, but doing a loop 2.000.000 times ;)
> >
> > > The hint for a timespec_add_ms() was because we must be doing something
> > > similar in poll, don't we (/me got no code in front ATM)?
> >
> > Apparently its done differently in poll(), using
> > poll_select_set_timeout() helper.
> >
> >
> > Give me some minutes I'll try to cook an alternate patch
> >
>
> Here is the alternate patch, using poll_select_set_timeout() helper
>
> Thanks
>
> [PATCH v2] epoll: epoll_wait() should not use timespec_add_ns()
>
> commit 95aac7b1cd224f (epoll: make epoll_wait() use the hrtimer range
> feature) added a performance regression because it uses
> timespec_add_ns() with potential very large 'ns' values.
>
> Use poll_select_set_timeout() helper like poll()/select()
>
> Reported-by: Simon Kirby <sim@hostway.ca>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Shawn Bohrer <shawn.bohrer@gmail.com>
> CC: Davide Libenzi <davidel@xmailserver.org>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Andrew Morton <akpm@linux-foundation.org>
> ---
> fs/eventpoll.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index cc8a9b7..94d887b 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -1125,8 +1125,8 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
> ktime_t expires, *to = NULL;
>
> if (timeout > 0) {
> - ktime_get_ts(&end_time);
> - timespec_add_ns(&end_time, (u64)timeout * NSEC_PER_MSEC);
> + poll_select_set_timeout(&end_time, timeout / MSEC_PER_SEC,
> + NSEC_PER_MSEC * (timeout % MSEC_PER_SEC));
> slack = select_estimate_accuracy(&end_time);
> to = &expires;
> *to = timespec_to_ktime(end_time);
poll_select_set_timeout() jumps through some extra hoops that
aren't necessary in the epoll case so I actually like your previous
patch better.
--
Shawn
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: sys_epoll_wait high CPU load in 2.6.37
2011-01-26 17:20 ` Shawn Bohrer
@ 2011-01-26 17:51 ` Eric Dumazet
2011-01-26 18:06 ` Davide Libenzi
0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2011-01-26 17:51 UTC (permalink / raw)
To: Shawn Bohrer
Cc: Davide Libenzi, Simon Kirby, Linux Kernel Mailing List,
Andrew Morton, Thomas Gleixner
Le mercredi 26 janvier 2011 à 11:20 -0600, Shawn Bohrer a écrit :
> On Wed, Jan 26, 2011 at 05:13:29PM +0100, Eric Dumazet wrote:
> > Le mercredi 26 janvier 2011 à 16:59 +0100, Eric Dumazet a écrit :
> > > Le mercredi 26 janvier 2011 à 07:52 -0800, Davide Libenzi a écrit :
> > >
> > > > For "above", I meant the current epoll expire time calculation, which was
> > > > described above in the message ;)
> > >
> > > Well, problem was not an overflow, but doing a loop 2.000.000 times ;)
> > >
> > > > The hint for a timespec_add_ms() was because we must be doing something
> > > > similar in poll, don't we (/me got no code in front ATM)?
> > >
> > > Apparently its done differently in poll(), using
> > > poll_select_set_timeout() helper.
> > >
> > >
> > > Give me some minutes I'll try to cook an alternate patch
> > >
> >
> > Here is the alternate patch, using poll_select_set_timeout() helper
> >
> > Thanks
> >
> > [PATCH v2] epoll: epoll_wait() should not use timespec_add_ns()
> >
> > commit 95aac7b1cd224f (epoll: make epoll_wait() use the hrtimer range
> > feature) added a performance regression because it uses
> > timespec_add_ns() with potential very large 'ns' values.
> >
> > Use poll_select_set_timeout() helper like poll()/select()
> >
> > Reported-by: Simon Kirby <sim@hostway.ca>
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > CC: Shawn Bohrer <shawn.bohrer@gmail.com>
> > CC: Davide Libenzi <davidel@xmailserver.org>
> > CC: Thomas Gleixner <tglx@linutronix.de>
> > CC: Andrew Morton <akpm@linux-foundation.org>
> > ---
> > fs/eventpoll.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> > index cc8a9b7..94d887b 100644
> > --- a/fs/eventpoll.c
> > +++ b/fs/eventpoll.c
> > @@ -1125,8 +1125,8 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
> > ktime_t expires, *to = NULL;
> >
> > if (timeout > 0) {
> > - ktime_get_ts(&end_time);
> > - timespec_add_ns(&end_time, (u64)timeout * NSEC_PER_MSEC);
> > + poll_select_set_timeout(&end_time, timeout / MSEC_PER_SEC,
> > + NSEC_PER_MSEC * (timeout % MSEC_PER_SEC));
> > slack = select_estimate_accuracy(&end_time);
> > to = &expires;
> > *to = timespec_to_ktime(end_time);
>
> poll_select_set_timeout() jumps through some extra hoops that
> aren't necessary in the epoll case so I actually like your previous
> patch better.
Well, I dont care, I let Davide decide, he is the boss ;)
This is a stable candidate, so adding timespec_add_ms() sounds overkill.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: sys_epoll_wait high CPU load in 2.6.37
2011-01-26 17:51 ` Eric Dumazet
@ 2011-01-26 18:06 ` Davide Libenzi
2011-01-26 18:18 ` Eric Dumazet
0 siblings, 1 reply; 14+ messages in thread
From: Davide Libenzi @ 2011-01-26 18:06 UTC (permalink / raw)
To: Eric Dumazet
Cc: Shawn Bohrer, Simon Kirby, Linux Kernel Mailing List,
Andrew Morton, Thomas Gleixner
On Wed, 26 Jan 2011, Eric Dumazet wrote:
> > > --- a/fs/eventpoll.c
> > > +++ b/fs/eventpoll.c
> > > @@ -1125,8 +1125,8 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
> > > ktime_t expires, *to = NULL;
> > >
> > > if (timeout > 0) {
> > > - ktime_get_ts(&end_time);
> > > - timespec_add_ns(&end_time, (u64)timeout * NSEC_PER_MSEC);
> > > + poll_select_set_timeout(&end_time, timeout / MSEC_PER_SEC,
> > > + NSEC_PER_MSEC * (timeout % MSEC_PER_SEC));
> > > slack = select_estimate_accuracy(&end_time);
> > > to = &expires;
> > > *to = timespec_to_ktime(end_time);
> >
> > poll_select_set_timeout() jumps through some extra hoops that
> > aren't necessary in the epoll case so I actually like your previous
> > patch better.
>
> Well, I dont care, I let Davide decide, he is the boss ;)
>
> This is a stable candidate, so adding timespec_add_ms() sounds overkill.
Eric, if you look at fs/select.c (~line 925), poll does exactly the same
thing as epoll do.
It too, ignores the eventual return value of poll_select_set_timeout(), so
maybe a little bit more optimized ktime_get_ts+timespec_add_ms could make
sense.
- Davide
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: sys_epoll_wait high CPU load in 2.6.37
2011-01-26 18:06 ` Davide Libenzi
@ 2011-01-26 18:18 ` Eric Dumazet
2011-01-26 18:45 ` Eric Dumazet
0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2011-01-26 18:18 UTC (permalink / raw)
To: Davide Libenzi
Cc: Shawn Bohrer, Simon Kirby, Linux Kernel Mailing List,
Andrew Morton, Thomas Gleixner
Le mercredi 26 janvier 2011 à 10:06 -0800, Davide Libenzi a écrit :
> Eric, if you look at fs/select.c (~line 925), poll does exactly the same
> thing as epoll do.
> It too, ignores the eventual return value of poll_select_set_timeout(), so
> maybe a little bit more optimized ktime_get_ts+timespec_add_ms could make
> sense.
>
OK, I'll post a V3 ;)
Thanks
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: sys_epoll_wait high CPU load in 2.6.37
2011-01-26 18:18 ` Eric Dumazet
@ 2011-01-26 18:45 ` Eric Dumazet
2011-01-28 0:17 ` Davide Libenzi
0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2011-01-26 18:45 UTC (permalink / raw)
To: Davide Libenzi
Cc: Shawn Bohrer, Simon Kirby, Linux Kernel Mailing List,
Andrew Morton, Thomas Gleixner
Le mercredi 26 janvier 2011 à 19:18 +0100, Eric Dumazet a écrit :
> Le mercredi 26 janvier 2011 à 10:06 -0800, Davide Libenzi a écrit :
>
> > Eric, if you look at fs/select.c (~line 925), poll does exactly the same
> > thing as epoll do.
> > It too, ignores the eventual return value of poll_select_set_timeout(), so
> > maybe a little bit more optimized ktime_get_ts+timespec_add_ms could make
> > sense.
> >
>
> OK, I'll post a V3 ;)
Well in the poll() case we handle a zero timeout, not in epoll().
So the helper function cannot be shared and can be static to epoll.
[PATCH v3] epoll: epoll_wait() should not use timespec_add_ns()
commit 95aac7b1cd224f (epoll: make epoll_wait() use the hrtimer range
feature) added a performance regression because it uses
timespec_add_ns() with potential very large 'ns' values.
Reported-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Shawn Bohrer <shawn.bohrer@gmail.com>
CC: Davide Libenzi <davidel@xmailserver.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Andrew Morton <akpm@linux-foundation.org>
---
fs/eventpoll.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index cc8a9b7..d517aa3 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1114,6 +1114,17 @@ static int ep_send_events(struct eventpoll *ep,
return ep_scan_ready_list(ep, ep_send_events_proc, &esed);
}
+static inline struct timespec epoll_set_mstimeout(long ms)
+{
+ struct timespec now, ts = {
+ .tv_sec = ms / MSEC_PER_SEC,
+ .tv_nsec = NSEC_PER_MSEC * (ms % MSEC_PER_SEC),
+ };
+
+ ktime_get_ts(&now);
+ return timespec_add_safe(now, ts);
+}
+
static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
int maxevents, long timeout)
{
@@ -1121,12 +1132,11 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
unsigned long flags;
long slack;
wait_queue_t wait;
- struct timespec end_time;
ktime_t expires, *to = NULL;
if (timeout > 0) {
- ktime_get_ts(&end_time);
- timespec_add_ns(&end_time, (u64)timeout * NSEC_PER_MSEC);
+ struct timespec end_time = epoll_set_mstimeout(timeout);
+
slack = select_estimate_accuracy(&end_time);
to = &expires;
*to = timespec_to_ktime(end_time);
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: sys_epoll_wait high CPU load in 2.6.37
2011-01-26 18:45 ` Eric Dumazet
@ 2011-01-28 0:17 ` Davide Libenzi
0 siblings, 0 replies; 14+ messages in thread
From: Davide Libenzi @ 2011-01-28 0:17 UTC (permalink / raw)
To: Eric Dumazet
Cc: Shawn Bohrer, Simon Kirby, Linux Kernel Mailing List,
Andrew Morton, Thomas Gleixner
[-- Attachment #1: Type: TEXT/PLAIN, Size: 2658 bytes --]
On Wed, 26 Jan 2011, Eric Dumazet wrote:
> Le mercredi 26 janvier 2011 à 19:18 +0100, Eric Dumazet a écrit :
> > Le mercredi 26 janvier 2011 à 10:06 -0800, Davide Libenzi a écrit :
> >
> > > Eric, if you look at fs/select.c (~line 925), poll does exactly the same
> > > thing as epoll do.
> > > It too, ignores the eventual return value of poll_select_set_timeout(), so
> > > maybe a little bit more optimized ktime_get_ts+timespec_add_ms could make
> > > sense.
> > >
> >
> > OK, I'll post a V3 ;)
>
> Well in the poll() case we handle a zero timeout, not in epoll().
>
> So the helper function cannot be shared and can be static to epoll.
>
>
> [PATCH v3] epoll: epoll_wait() should not use timespec_add_ns()
>
> commit 95aac7b1cd224f (epoll: make epoll_wait() use the hrtimer range
> feature) added a performance regression because it uses
> timespec_add_ns() with potential very large 'ns' values.
>
> Reported-by: Simon Kirby <sim@hostway.ca>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Shawn Bohrer <shawn.bohrer@gmail.com>
> CC: Davide Libenzi <davidel@xmailserver.org>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Andrew Morton <akpm@linux-foundation.org>
> ---
> fs/eventpoll.c | 16 +++++++++++++---
> 1 file changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index cc8a9b7..d517aa3 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -1114,6 +1114,17 @@ static int ep_send_events(struct eventpoll *ep,
> return ep_scan_ready_list(ep, ep_send_events_proc, &esed);
> }
>
> +static inline struct timespec epoll_set_mstimeout(long ms)
> +{
> + struct timespec now, ts = {
> + .tv_sec = ms / MSEC_PER_SEC,
> + .tv_nsec = NSEC_PER_MSEC * (ms % MSEC_PER_SEC),
> + };
> +
> + ktime_get_ts(&now);
> + return timespec_add_safe(now, ts);
> +}
> +
> static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
> int maxevents, long timeout)
> {
> @@ -1121,12 +1132,11 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
> unsigned long flags;
> long slack;
> wait_queue_t wait;
> - struct timespec end_time;
> ktime_t expires, *to = NULL;
>
> if (timeout > 0) {
> - ktime_get_ts(&end_time);
> - timespec_add_ns(&end_time, (u64)timeout * NSEC_PER_MSEC);
> + struct timespec end_time = epoll_set_mstimeout(timeout);
> +
> slack = select_estimate_accuracy(&end_time);
> to = &expires;
> *to = timespec_to_ktime(end_time);
Looks OK for me. The epoll_ prefix fights with the ep_ standard on the
original file, but I will post a one-liner for it, given Andrew already
got this.
- Davide
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2011-01-28 0:17 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-26 0:09 sys_epoll_wait high CPU load in 2.6.37 Simon Kirby
2011-01-26 7:18 ` Eric Dumazet
2011-01-26 11:16 ` Eric Dumazet
2011-01-26 15:31 ` Davide Libenzi
2011-01-26 15:43 ` Eric Dumazet
2011-01-26 15:52 ` Davide Libenzi
2011-01-26 15:59 ` Eric Dumazet
2011-01-26 16:13 ` Eric Dumazet
2011-01-26 17:20 ` Shawn Bohrer
2011-01-26 17:51 ` Eric Dumazet
2011-01-26 18:06 ` Davide Libenzi
2011-01-26 18:18 ` Eric Dumazet
2011-01-26 18:45 ` Eric Dumazet
2011-01-28 0:17 ` Davide Libenzi
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.