linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch] fs, epoll: short circuit fetching events if thread has been killed
@ 2017-05-04  0:22 David Rientjes
  2017-05-09 23:05 ` Andrew Morton
  2017-05-10 13:07 ` Michal Hocko
  0 siblings, 2 replies; 4+ messages in thread
From: David Rientjes @ 2017-05-04  0:22 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Alexander Viro, linux-kernel

We've encountered zombies that are waiting for a thread to exit that are
looping in ep_poll() almost endlessly although there is a pending SIGKILL
as a result of a group exit.

This happens because we always find ep_events_available() and fetch more
events and never are able to check for signal_pending() that would break
from the loop and return -EINTR.

Special case fatal signals and break immediately to guarantee that we
loop to fetch more events and delay making a timely exit.

It would also be possible to simply move the check for signal_pending()
higher than checking for ep_events_available(), but there have been no
reports of delayed signal handling other than SIGKILL preventing zombies
from exiting that would be fixed by this.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 fs/eventpoll.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1748,6 +1748,16 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 			 * to TASK_INTERRUPTIBLE before doing the checks.
 			 */
 			set_current_state(TASK_INTERRUPTIBLE);
+			/*
+			 * Always short-circuit for fatal signals to allow
+			 * threads to make a timely exit without the chance of
+			 * finding more events available and fetching
+			 * repeatedly.
+			 */
+			if (fatal_signal_pending(current)) {
+				res = -EINTR;
+				break;
+			}
 			if (ep_events_available(ep) || timed_out)
 				break;
 			if (signal_pending(current)) {

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [patch] fs, epoll: short circuit fetching events if thread has been killed
  2017-05-04  0:22 [patch] fs, epoll: short circuit fetching events if thread has been killed David Rientjes
@ 2017-05-09 23:05 ` Andrew Morton
  2017-05-10  0:08   ` David Rientjes
  2017-05-10 13:07 ` Michal Hocko
  1 sibling, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2017-05-09 23:05 UTC (permalink / raw)
  To: David Rientjes; +Cc: Alexander Viro, linux-kernel, Jan Kara, Davide Libenzi

On Wed, 3 May 2017 17:22:53 -0700 (PDT) David Rientjes <rientjes@google.com> wrote:

> We've encountered zombies that are waiting for a thread to exit that are
> looping in ep_poll() almost endlessly although there is a pending SIGKILL
> as a result of a group exit.
> 
> This happens because we always find ep_events_available() and fetch more
> events and never are able to check for signal_pending() that would break
> from the loop and return -EINTR.
> 
> Special case fatal signals and break immediately to guarantee that we
> loop to fetch more events and delay making a timely exit.
> 
> It would also be possible to simply move the check for signal_pending()
> higher than checking for ep_events_available(), but there have been no
> reports of delayed signal handling other than SIGKILL preventing zombies
> from exiting that would be fixed by this.

Any thoughts on the priority of this?  -stable?  If so, why?

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [patch] fs, epoll: short circuit fetching events if thread has been killed
  2017-05-09 23:05 ` Andrew Morton
@ 2017-05-10  0:08   ` David Rientjes
  0 siblings, 0 replies; 4+ messages in thread
From: David Rientjes @ 2017-05-10  0:08 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Alexander Viro, linux-kernel, Jan Kara, Davide Libenzi

On Tue, 9 May 2017, Andrew Morton wrote:

> > We've encountered zombies that are waiting for a thread to exit that are
> > looping in ep_poll() almost endlessly although there is a pending SIGKILL
> > as a result of a group exit.
> > 
> > This happens because we always find ep_events_available() and fetch more
> > events and never are able to check for signal_pending() that would break
> > from the loop and return -EINTR.
> > 
> > Special case fatal signals and break immediately to guarantee that we
> > loop to fetch more events and delay making a timely exit.
> > 
> > It would also be possible to simply move the check for signal_pending()
> > higher than checking for ep_events_available(), but there have been no
> > reports of delayed signal handling other than SIGKILL preventing zombies
> > from exiting that would be fixed by this.
> 
> Any thoughts on the priority of this?  -stable?  If so, why?
> 

It fixes an issue for us where we have witnessed zombies sticking around 
for at least O(minutes), but considering the code has been like this 
forever and nobody else has complained that I have found, I would simply 
queue it up for 4.12.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [patch] fs, epoll: short circuit fetching events if thread has been killed
  2017-05-04  0:22 [patch] fs, epoll: short circuit fetching events if thread has been killed David Rientjes
  2017-05-09 23:05 ` Andrew Morton
@ 2017-05-10 13:07 ` Michal Hocko
  1 sibling, 0 replies; 4+ messages in thread
From: Michal Hocko @ 2017-05-10 13:07 UTC (permalink / raw)
  To: David Rientjes; +Cc: Andrew Morton, Alexander Viro, linux-kernel

On Wed 03-05-17 17:22:53, David Rientjes wrote:
[...]
> @@ -1748,6 +1748,16 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
>  			 * to TASK_INTERRUPTIBLE before doing the checks.
>  			 */
>  			set_current_state(TASK_INTERRUPTIBLE);
> +			/*
> +			 * Always short-circuit for fatal signals to allow
> +			 * threads to make a timely exit without the chance of
> +			 * finding more events available and fetching
> +			 * repeatedly.
> +			 */
> +			if (fatal_signal_pending(current)) {
> +				res = -EINTR;
> +				break;
> +			}
>  			if (ep_events_available(ep) || timed_out)
>  				break;
>  			if (signal_pending(current)) {

I am wondering. Is there any specific reason why we do not break out of
the loop before checking ep_events_available on any pending signal? Is
there any advantage to preempt signal handling by too many events?

I've tried to dig it out from the full history git tree but it goes all
the way down to "[PATCH] epoll update r3".

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-05-10 13:07 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-04  0:22 [patch] fs, epoll: short circuit fetching events if thread has been killed David Rientjes
2017-05-09 23:05 ` Andrew Morton
2017-05-10  0:08   ` David Rientjes
2017-05-10 13:07 ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).