linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] kernel/hung_task.c: allow to set period separately from timeout
@ 2018-06-08 13:30 Dmitry Vyukov
  2018-06-08 21:58 ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Vyukov @ 2018-06-08 13:30 UTC (permalink / raw)
  To: akpm, penguin-kernel, paulmck; +Cc: Dmitry Vyukov, linux-kernel, syzkaller

Currently task hung checking period is equal to timeout,
as the result hung is detected anywhere between timeout and 2*timeout.
This is fine for most interactive environments, but this hurts automated
testing setups (syzbot). In an automated setup we need to strictly order
CPU lockup < RCU stall < workqueue lockup < task hung < silent loss,
so that RCU stall is not detected as task hung and task hung is not
detected as silent machine loss. The large variance in task hung
detection timeout requires setting silent machine loss timeout to
a very large value (e.g. if task hung is 3 mins, then silent loss
need to be set to ~7 mins). The additional 3 minutes significantly
reduce testing efficiency because usually we crash kernel within
a minute, and this can add hours to bug localization process as it
needs to do dozens of tests.

Allow setting checking period separately from timeout.
This allows to set timeout to, say, 3 minutes, but period to 10 secs.

The period is controlled via a new hung_task_period_secs sysctl,
similar to the existing hung_task_timeout_secs sysctl.
The default value of 0 results in the current behavior.

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: linux-kernel@vger.kernel.org
Cc: syzkaller@googlegroups.com
---
 include/linux/sched.h        |  1 +
 include/linux/sched/sysctl.h |  1 +
 kernel/fork.c                |  1 +
 kernel/hung_task.c           | 15 ++++++++++++++-
 kernel/sysctl.c              |  8 ++++++++
 5 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 14e4f9c12337..520032e87c9e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -848,6 +848,7 @@ struct task_struct {
 #endif
 #ifdef CONFIG_DETECT_HUNG_TASK
 	unsigned long			last_switch_count;
+	unsigned long			last_switch_time;
 #endif
 	/* Filesystem information: */
 	struct fs_struct		*fs;
diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index 1c1a1512ec55..1db0fc4cb134 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -10,6 +10,7 @@ struct ctl_table;
 extern int	     sysctl_hung_task_check_count;
 extern unsigned int  sysctl_hung_task_panic;
 extern unsigned long sysctl_hung_task_timeout_secs;
+extern unsigned long sysctl_hung_task_period_secs;
 extern int sysctl_hung_task_warnings;
 extern int proc_dohung_task_timeout_secs(struct ctl_table *table, int write,
 					 void __user *buffer,
diff --git a/kernel/fork.c b/kernel/fork.c
index c6d1c1ce9ed7..f393ed8ae15b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1262,6 +1262,7 @@ static int copy_mm(unsigned long clone_flags, struct task_struct *tsk)
 	tsk->nvcsw = tsk->nivcsw = 0;
 #ifdef CONFIG_DETECT_HUNG_TASK
 	tsk->last_switch_count = tsk->nvcsw + tsk->nivcsw;
+	tsk->last_switch_time = 0;
 #endif
 
 	tsk->mm = NULL;
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 32b479468e4d..55a7e84c3315 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -40,6 +40,11 @@ int __read_mostly sysctl_hung_task_check_count = PID_MAX_LIMIT;
  */
 unsigned long __read_mostly sysctl_hung_task_timeout_secs = CONFIG_DEFAULT_HUNG_TASK_TIMEOUT;
 
+/*
+ * Zero (default value) means use sysctl_hung_task_timeout_secs:
+ */
+unsigned long __read_mostly sysctl_hung_task_period_secs;
+
 int __read_mostly sysctl_hung_task_warnings = 10;
 
 static int __read_mostly did_panic;
@@ -98,8 +103,11 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
 
 	if (switch_count != t->last_switch_count) {
 		t->last_switch_count = switch_count;
+		t->last_switch_time = jiffies;
 		return;
 	}
+	if (time_is_after_jiffies(t->last_switch_time + timeout * HZ))
+		return;
 
 	trace_sched_process_hang(t);
 
@@ -245,8 +253,13 @@ static int watchdog(void *dummy)
 
 	for ( ; ; ) {
 		unsigned long timeout = sysctl_hung_task_timeout_secs;
-		long t = hung_timeout_jiffies(hung_last_checked, timeout);
+		unsigned long period = sysctl_hung_task_period_secs;
+		long t;
 
+		if (period == 0)
+			period = timeout;
+		period = min_t(unsigned long, period, timeout);
+		t = hung_timeout_jiffies(hung_last_checked, period);
 		if (t <= 0) {
 			if (!atomic_xchg(&reset_hung_task, 0))
 				check_hung_uninterruptible_tasks(timeout);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 6a78cf70761d..029cfed5189c 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1098,6 +1098,14 @@ static struct ctl_table kern_table[] = {
 		.proc_handler	= proc_dohung_task_timeout_secs,
 		.extra2		= &hung_task_timeout_max,
 	},
+	{
+		.procname	= "hung_task_period_secs",
+		.data		= &sysctl_hung_task_period_secs,
+		.maxlen		= sizeof(unsigned long),
+		.mode		= 0644,
+		.proc_handler	= proc_dohung_task_timeout_secs,
+		.extra2		= &hung_task_timeout_max,
+	},
 	{
 		.procname	= "hung_task_warnings",
 		.data		= &sysctl_hung_task_warnings,
-- 
2.18.0.rc1.242.g61856ae69a-goog

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] kernel/hung_task.c: allow to set period separately from timeout
  2018-06-08 13:30 [PATCH] kernel/hung_task.c: allow to set period separately from timeout Dmitry Vyukov
@ 2018-06-08 21:58 ` Andrew Morton
  2018-06-09  7:00   ` Tetsuo Handa
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2018-06-08 21:58 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: penguin-kernel, paulmck, linux-kernel, syzkaller

On Fri,  8 Jun 2018 15:30:43 +0200 Dmitry Vyukov <dvyukov@google.com> wrote:

> Currently task hung checking period is equal to timeout,
> as the result hung is detected anywhere between timeout and 2*timeout.
> This is fine for most interactive environments, but this hurts automated
> testing setups (syzbot). In an automated setup we need to strictly order
> CPU lockup < RCU stall < workqueue lockup < task hung < silent loss,
> so that RCU stall is not detected as task hung and task hung is not
> detected as silent machine loss. The large variance in task hung
> detection timeout requires setting silent machine loss timeout to
> a very large value (e.g. if task hung is 3 mins, then silent loss
> need to be set to ~7 mins). The additional 3 minutes significantly
> reduce testing efficiency because usually we crash kernel within
> a minute, and this can add hours to bug localization process as it
> needs to do dozens of tests.
> 
> Allow setting checking period separately from timeout.
> This allows to set timeout to, say, 3 minutes, but period to 10 secs.
> 
> The period is controlled via a new hung_task_period_secs sysctl,
> similar to the existing hung_task_timeout_secs sysctl.
> The default value of 0 results in the current behavior.

I'm rather struggling to understand the difference between "period" and
"timeout".  We would benefit from a clear description of what these two
things do.  An appropriate place for this description is
Documentation/sysctl/kernel.txt, which this patch forgot to update.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] kernel/hung_task.c: allow to set period separately from timeout
  2018-06-08 21:58 ` Andrew Morton
@ 2018-06-09  7:00   ` Tetsuo Handa
  2018-06-11 11:16     ` Dmitry Vyukov
  0 siblings, 1 reply; 6+ messages in thread
From: Tetsuo Handa @ 2018-06-09  7:00 UTC (permalink / raw)
  To: Andrew Morton, Dmitry Vyukov; +Cc: paulmck, linux-kernel, syzkaller

On 2018/06/09 6:58, Andrew Morton wrote:
> On Fri,  8 Jun 2018 15:30:43 +0200 Dmitry Vyukov <dvyukov@google.com> wrote:
> 
>> Currently task hung checking period is equal to timeout,
>> as the result hung is detected anywhere between timeout and 2*timeout.
>> This is fine for most interactive environments, but this hurts automated
>> testing setups (syzbot). In an automated setup we need to strictly order
>> CPU lockup < RCU stall < workqueue lockup < task hung < silent loss,
>> so that RCU stall is not detected as task hung and task hung is not
>> detected as silent machine loss. The large variance in task hung
>> detection timeout requires setting silent machine loss timeout to
>> a very large value (e.g. if task hung is 3 mins, then silent loss
>> need to be set to ~7 mins). The additional 3 minutes significantly
>> reduce testing efficiency because usually we crash kernel within
>> a minute, and this can add hours to bug localization process as it
>> needs to do dozens of tests.
>>
>> Allow setting checking period separately from timeout.
>> This allows to set timeout to, say, 3 minutes, but period to 10 secs.
>>
>> The period is controlled via a new hung_task_period_secs sysctl,
>> similar to the existing hung_task_timeout_secs sysctl.
>> The default value of 0 results in the current behavior.
> 
> I'm rather struggling to understand the difference between "period" and
> "timeout".  We would benefit from a clear description of what these two
> things do.  An appropriate place for this description is
> Documentation/sysctl/kernel.txt, which this patch forgot to update.

My understanding is that "period" is "how frequently we should check"
and "timeout" is "how long a thread remained uninterruptible". Maybe
hung_task_check_interval_secs would be better than hung_task_period_secs.

timeout = 60 and period = 1 would allow hung task to be reported as soon
as it remained uninterruptible for 60 seconds. That makes me easier to
narrow down relevant kernel messages and syzbot program.

Well, showing exact slept time, along with all threads which slept more
than some threshold (e.g. timeout / 2), might be helpful.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] kernel/hung_task.c: allow to set period separately from timeout
  2018-06-09  7:00   ` Tetsuo Handa
@ 2018-06-11 11:16     ` Dmitry Vyukov
  2018-06-11 12:48       ` Tetsuo Handa
  0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Vyukov @ 2018-06-11 11:16 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Andrew Morton, Paul McKenney, LKML, syzkaller

On Sat, Jun 9, 2018 at 9:00 AM, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
> On 2018/06/09 6:58, Andrew Morton wrote:
>> On Fri,  8 Jun 2018 15:30:43 +0200 Dmitry Vyukov <dvyukov@google.com> wrote:
>>
>>> Currently task hung checking period is equal to timeout,
>>> as the result hung is detected anywhere between timeout and 2*timeout.
>>> This is fine for most interactive environments, but this hurts automated
>>> testing setups (syzbot). In an automated setup we need to strictly order
>>> CPU lockup < RCU stall < workqueue lockup < task hung < silent loss,
>>> so that RCU stall is not detected as task hung and task hung is not
>>> detected as silent machine loss. The large variance in task hung
>>> detection timeout requires setting silent machine loss timeout to
>>> a very large value (e.g. if task hung is 3 mins, then silent loss
>>> need to be set to ~7 mins). The additional 3 minutes significantly
>>> reduce testing efficiency because usually we crash kernel within
>>> a minute, and this can add hours to bug localization process as it
>>> needs to do dozens of tests.
>>>
>>> Allow setting checking period separately from timeout.
>>> This allows to set timeout to, say, 3 minutes, but period to 10 secs.
>>>
>>> The period is controlled via a new hung_task_period_secs sysctl,
>>> similar to the existing hung_task_timeout_secs sysctl.
>>> The default value of 0 results in the current behavior.
>>
>> I'm rather struggling to understand the difference between "period" and
>> "timeout".  We would benefit from a clear description of what these two
>> things do.  An appropriate place for this description is
>> Documentation/sysctl/kernel.txt, which this patch forgot to update.
>
> My understanding is that "period" is "how frequently we should check"
> and "timeout" is "how long a thread remained uninterruptible". Maybe
> hung_task_check_interval_secs would be better than hung_task_period_secs.

Hi Tetsuo, Andrew,

I've just mailed v2:

    Changes since v1:
     - add entry to Documentation/sysctl/kernel.txt
     - rename hung_task_period_secs sysctl to hung_task_check_interval_sec

Hopefully now it's more clear what's the difference and what it is doing.



> timeout = 60 and period = 1 would allow hung task to be reported as soon
> as it remained uninterruptible for 60 seconds. That makes me easier to
> narrow down relevant kernel messages and syzbot program.
>
> Well, showing exact slept time, along with all threads which slept more
> than some threshold (e.g. timeout / 2), might be helpful.

You mean if we report any task, then scan all tasks second time and
additionally report tasks that are blocked for (timeout/2 : timeout)?

Should we do this when hung_task_show_lock? Or only when
sysctl_hung_task_panic? Or when?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] kernel/hung_task.c: allow to set period separately from timeout
  2018-06-11 11:16     ` Dmitry Vyukov
@ 2018-06-11 12:48       ` Tetsuo Handa
  2018-06-11 13:16         ` Dmitry Vyukov
  0 siblings, 1 reply; 6+ messages in thread
From: Tetsuo Handa @ 2018-06-11 12:48 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: Andrew Morton, Paul McKenney, LKML, syzkaller

On 2018/06/11 20:16, Dmitry Vyukov wrote:
>> timeout = 60 and period = 1 would allow hung task to be reported as soon
>> as it remained uninterruptible for 60 seconds. That makes me easier to
>> narrow down relevant kernel messages and syzbot program.
>>
>> Well, showing exact slept time, along with all threads which slept more
>> than some threshold (e.g. timeout / 2), might be helpful.
> 
> You mean if we report any task, then scan all tasks second time and
> additionally report tasks that are blocked for (timeout/2 : timeout)?

Yes. Something like check_memalloc_stalling_tasks() in
http://lkml.kernel.org/r/1495331504-12480-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp .

> 
> Should we do this when hung_task_show_lock? Or only when
> sysctl_hung_task_panic? Or when?
> 

I think always is more useful. That is, first round only checks whether
there is at least one stalling task, and second round reports stalling tasks
if at least one task is stalling.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] kernel/hung_task.c: allow to set period separately from timeout
  2018-06-11 12:48       ` Tetsuo Handa
@ 2018-06-11 13:16         ` Dmitry Vyukov
  0 siblings, 0 replies; 6+ messages in thread
From: Dmitry Vyukov @ 2018-06-11 13:16 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Andrew Morton, Paul McKenney, LKML, syzkaller

On Mon, Jun 11, 2018 at 2:48 PM, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
> On 2018/06/11 20:16, Dmitry Vyukov wrote:
>>> timeout = 60 and period = 1 would allow hung task to be reported as soon
>>> as it remained uninterruptible for 60 seconds. That makes me easier to
>>> narrow down relevant kernel messages and syzbot program.
>>>
>>> Well, showing exact slept time, along with all threads which slept more
>>> than some threshold (e.g. timeout / 2), might be helpful.
>>
>> You mean if we report any task, then scan all tasks second time and
>> additionally report tasks that are blocked for (timeout/2 : timeout)?
>
> Yes. Something like check_memalloc_stalling_tasks() in
> http://lkml.kernel.org/r/1495331504-12480-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp .
>
>>
>> Should we do this when hung_task_show_lock? Or only when
>> sysctl_hung_task_panic? Or when?
>>
>
> I think always is more useful. That is, first round only checks whether
> there is at least one stalling task, and second round reports stalling tasks
> if at least one task is stalling.


Agree that it would be useful. But can't promise to work on this soon.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-06-11 13:16 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-08 13:30 [PATCH] kernel/hung_task.c: allow to set period separately from timeout Dmitry Vyukov
2018-06-08 21:58 ` Andrew Morton
2018-06-09  7:00   ` Tetsuo Handa
2018-06-11 11:16     ` Dmitry Vyukov
2018-06-11 12:48       ` Tetsuo Handa
2018-06-11 13:16         ` Dmitry Vyukov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).