linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] khungtaskd: Kick stuck processes
@ 2018-02-08 19:07 Chris Wilson
  2018-02-08 23:10 ` Tetsuo Handa
  2018-02-13 11:56 ` Michal Hocko
  0 siblings, 2 replies; 6+ messages in thread
From: Chris Wilson @ 2018-02-08 19:07 UTC (permalink / raw)
  To: linux-kernel; +Cc: Chris Wilson, Ingo Molnar, Tetsuo Handa, Andrew Morton

After spotting a stuck process, and having decided not to panic, give
the task a kick to see if that helps it to recover (e.g. to paper over a
missed wake up).

References: https://bugs.freedesktop.org/show_bug.cgi?id=104009
References: https://bugs.freedesktop.org/show_bug.cgi?id=104682
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 kernel/hung_task.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 751593ed7c0b..b32acb6bcc63 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -132,6 +132,8 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
 		trigger_all_cpu_backtrace();
 		panic("hung_task: blocked tasks");
 	}
+
+	wake_up_process(t);
 }
 
 /*
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] khungtaskd: Kick stuck processes
  2018-02-08 19:07 [PATCH] khungtaskd: Kick stuck processes Chris Wilson
@ 2018-02-08 23:10 ` Tetsuo Handa
  2018-02-08 23:15   ` Chris Wilson
  2018-02-13 11:56 ` Michal Hocko
  1 sibling, 1 reply; 6+ messages in thread
From: Tetsuo Handa @ 2018-02-08 23:10 UTC (permalink / raw)
  To: chris, linux-kernel; +Cc: mingo, akpm, ak, jack, aryabinin, dvyukov

Chris Wilson wrote:
> After spotting a stuck process, and having decided not to panic, give
> the task a kick to see if that helps it to recover (e.g. to paper over a
> missed wake up).

Yes, we are seeing hangs at io_schedule(), but doesn't optionally allowing
io_schedule() be replaced with timeout version (e.g. dump_page() upon timeout
if io_schedule() was called for e.g. wait_on_page_bit()) give us more clue?

> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=104009
> References: https://bugs.freedesktop.org/show_bug.cgi?id=104682
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> ---
>  kernel/hung_task.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 751593ed7c0b..b32acb6bcc63 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -132,6 +132,8 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
>  		trigger_all_cpu_backtrace();
>  		panic("hung_task: blocked tasks");
>  	}
> +
> +	wake_up_process(t);
>  }
>  
>  /*
> -- 
> 2.16.1

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] khungtaskd: Kick stuck processes
  2018-02-08 23:10 ` Tetsuo Handa
@ 2018-02-08 23:15   ` Chris Wilson
  2018-02-09 11:30     ` Tetsuo Handa
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Wilson @ 2018-02-08 23:15 UTC (permalink / raw)
  To: Tetsuo Handa, linux-kernel; +Cc: mingo, akpm, ak, jack, aryabinin, dvyukov

Quoting Tetsuo Handa (2018-02-08 23:10:43)
> Chris Wilson wrote:
> > After spotting a stuck process, and having decided not to panic, give
> > the task a kick to see if that helps it to recover (e.g. to paper over a
> > missed wake up).
> 
> Yes, we are seeing hangs at io_schedule(), but doesn't optionally allowing
> io_schedule() be replaced with timeout version (e.g. dump_page() upon timeout
> if io_schedule() was called for e.g. wait_on_page_bit()) give us more clue?

Yes, this isn't for debugging who left the page locked (or the exact
root cause), this is just trying to allow the system to limp along
afterwards :) From personal experience, I know how easy it is to lose a
wakeup and the only thing to notice is khungtaskd shouting every 120s.
-Chris

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] khungtaskd: Kick stuck processes
  2018-02-08 23:15   ` Chris Wilson
@ 2018-02-09 11:30     ` Tetsuo Handa
  0 siblings, 0 replies; 6+ messages in thread
From: Tetsuo Handa @ 2018-02-09 11:30 UTC (permalink / raw)
  To: chris, linux-kernel; +Cc: mingo, akpm, ak, jack, aryabinin, dvyukov

Chris Wilson wrote:
> Quoting Tetsuo Handa (2018-02-08 23:10:43)
> > Chris Wilson wrote:
> > > After spotting a stuck process, and having decided not to panic, give
> > > the task a kick to see if that helps it to recover (e.g. to paper over a
> > > missed wake up).
> > 
> > Yes, we are seeing hangs at io_schedule(), but doesn't optionally allowing
> > io_schedule() be replaced with timeout version (e.g. dump_page() upon timeout
> > if io_schedule() was called for e.g. wait_on_page_bit()) give us more clue?
> 
> Yes, this isn't for debugging who left the page locked (or the exact
> root cause), this is just trying to allow the system to limp along
> afterwards :) From personal experience, I know how easy it is to lose a
> wakeup and the only thing to notice is khungtaskd shouting every 120s.

Calling wake_up_process() does not sleep, does it? Then, I think you can
do it using SystemTap, for SystemTap gives you ability to call exported
functions at (almost) arbitrary line of (almost) arbitrary file. You can find
https://events.static.linuxfound.org/sites/events/files/slides/LCJ2014-en_0.pdf
for an example.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] khungtaskd: Kick stuck processes
  2018-02-08 19:07 [PATCH] khungtaskd: Kick stuck processes Chris Wilson
  2018-02-08 23:10 ` Tetsuo Handa
@ 2018-02-13 11:56 ` Michal Hocko
       [not found]   ` <151852369261.8633.4809735220536862770@mail.alporthouse.com>
  1 sibling, 1 reply; 6+ messages in thread
From: Michal Hocko @ 2018-02-13 11:56 UTC (permalink / raw)
  To: Chris Wilson; +Cc: linux-kernel, Ingo Molnar, Tetsuo Handa, Andrew Morton

On Thu 08-02-18 19:07:53, Chris Wilson wrote:
> After spotting a stuck process, and having decided not to panic, give
> the task a kick to see if that helps it to recover (e.g. to paper over a
> missed wake up).

huh, this is just no-no. watchdog is there to report problems not
interfere. You cannot never know whether the sleeper is prepared for
spurious wakeups. Do not paper over bugs...

> References: https://bugs.freedesktop.org/show_bug.cgi?id=104009
> References: https://bugs.freedesktop.org/show_bug.cgi?id=104682
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> ---
>  kernel/hung_task.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 751593ed7c0b..b32acb6bcc63 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -132,6 +132,8 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
>  		trigger_all_cpu_backtrace();
>  		panic("hung_task: blocked tasks");
>  	}
> +
> +	wake_up_process(t);
>  }
>  
>  /*
> -- 
> 2.16.1

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] khungtaskd: Kick stuck processes
       [not found]   ` <151852369261.8633.4809735220536862770@mail.alporthouse.com>
@ 2018-02-13 12:29     ` Michal Hocko
  0 siblings, 0 replies; 6+ messages in thread
From: Michal Hocko @ 2018-02-13 12:29 UTC (permalink / raw)
  To: Chris Wilson; +Cc: linux-kernel, Ingo Molnar, Tetsuo Handa, Andrew Morton

On Tue 13-02-18 12:08:12, Chris Wilson wrote:
> Quoting Michal Hocko (2018-02-13 11:56:42)
> > On Thu 08-02-18 19:07:53, Chris Wilson wrote:
> > > After spotting a stuck process, and having decided not to panic, give
> > > the task a kick to see if that helps it to recover (e.g. to paper over a
> > > missed wake up).
> > 
> > huh, this is just no-no. watchdog is there to report problems not
> > interfere. You cannot never know whether the sleeper is prepared for
> > spurious wakeups. Do not paper over bugs...
> 
> Aside from khungtaskd being a debug feature, we want to identify the bug
> by kicking the stuck process and seeing what squeals. Being told that
> khugepaged is stuck over and over again doesn't help resolve who is
> holding onto that lock_page, or if it was just a missed wakeup as all
> other processes are asleep.

And how exactly does kicking helps here? If the waiter uses lock_page
then it would go sleep again because of PG_locked. If the page is not
locked and this is a missed wake up then either unlock_page is wrong
(which doesn't seem to be the case AFAICS) or somebody messes up with
the page locking and this patch doesn't achieve anything.

> We are trying to paper over other bugs so that we can fix ours.

But you do not want to break existing code which might be sensible to
spurious wakeups. You could argue that such a code is broken already and
I would tend to agree, but an artificial wake up is just nogo.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-02-13 12:29 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-08 19:07 [PATCH] khungtaskd: Kick stuck processes Chris Wilson
2018-02-08 23:10 ` Tetsuo Handa
2018-02-08 23:15   ` Chris Wilson
2018-02-09 11:30     ` Tetsuo Handa
2018-02-13 11:56 ` Michal Hocko
     [not found]   ` <151852369261.8633.4809735220536862770@mail.alporthouse.com>
2018-02-13 12:29     ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).