* [PATCH 0/1] hung_task debugging: Add tracepoint to report the hang
@ 2013-10-19 16:18 Oleg Nesterov
2013-10-19 16:18 ` [PATCH 1/1] " Oleg Nesterov
0 siblings, 1 reply; 5+ messages in thread
From: Oleg Nesterov @ 2013-10-19 16:18 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Steven Rostedt; +Cc: Dave Sullivan, linux-kernel
Hi,
We have a feature request, the customer needs something more hookable
than just printk's from check_hung_task() to implement the user-space
watchdog which can potentially resolve the problems which caused the
hang.
The patch simply adds a tracepoint into check_hung_task(), do you
think we can (ab)use tp this way?
And I am just curious, perhaps another patch (below) makes sense too?
Oleg.
------------------------------------------------------------------------------
[PATCH] hung_task debugging: Do not report the same task twice
If a task hangs, check_hung_task() will blame it again and again
until it wakes up or sysctl_hung_task_warnings becomes zero. IMO,
this just adds the unnecessary noise and if another task hangs
after sysctl_hung_task_timeout_secs * sysctl_hung_task_warnings
it won't be reported.
With this patch check_hung_task() simply sets the most significant
bit in ->last_switch_count to mark this task as "already reported".
This bit is cleared if we notice the change in nvcsw/nivcsw, so we
do not skip this task if it hangs again later.
Note that we also ignore the MSB in switch_count, we need this to
avoid the false-positive "already reported". This means that
->last_switch_count is not necessarily equal to ->nvcsw + ->nivcsw
but we do not care, we have enough bits to notice the change.
And this allows to remove the special switch_count == 0 case, just
we need to initialize ->last_switch_count = LONG_MIN.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/fork.c | 2 +-
kernel/hung_task.c | 19 +++++++++----------
2 files changed, 10 insertions(+), 11 deletions(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index 8531609..aea397b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -866,7 +866,7 @@ static int copy_mm(unsigned long clone_flags, struct task_struct *tsk)
tsk->min_flt = tsk->maj_flt = 0;
tsk->nvcsw = tsk->nivcsw = 0;
#ifdef CONFIG_DETECT_HUNG_TASK
- tsk->last_switch_count = tsk->nvcsw + tsk->nivcsw;
+ tsk->last_switch_count = LONG_MIN; /* see check_hung_task() */
#endif
tsk->mm = NULL;
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 3952ab1..0f6233c 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -72,7 +72,7 @@ static struct notifier_block panic_block = {
static void check_hung_task(struct task_struct *t, unsigned long timeout)
{
- unsigned long switch_count = t->nvcsw + t->nivcsw;
+ unsigned long switch_count;
/*
* Ensure the task is not frozen.
@@ -81,19 +81,18 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
if (unlikely(t->flags & (PF_FROZEN | PF_FREEZER_SKIP)))
return;
- /*
- * When a freshly created task is scheduled once, changes its state to
- * TASK_UNINTERRUPTIBLE without having ever been switched out once, it
- * musn't be checked.
- */
- if (unlikely(!switch_count))
- return;
-
- if (switch_count != t->last_switch_count) {
+ /* Ignore MSB, see below. LONG_MAX = ~LONG_MIN. */
+ switch_count = (t->nvcsw + t->nivcsw) & LONG_MAX;
+ if (switch_count != (LONG_MAX & t->last_switch_count)) {
t->last_switch_count = switch_count;
return;
}
+ /* We use MSB to mark this task as already reported. */
+ if (t->last_switch_count & LONG_MIN)
+ return;
+ t->last_switch_count |= LONG_MIN;
+
trace_sched_process_hang(t);
if (!sysctl_hung_task_warnings)
--
1.5.5.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 1/1] hung_task debugging: Add tracepoint to report the hang
2013-10-19 16:18 [PATCH 0/1] hung_task debugging: Add tracepoint to report the hang Oleg Nesterov
@ 2013-10-19 16:18 ` Oleg Nesterov
2013-10-20 8:48 ` Ingo Molnar
2013-10-31 10:53 ` [tip:core/locking] " tip-bot for Oleg Nesterov
0 siblings, 2 replies; 5+ messages in thread
From: Oleg Nesterov @ 2013-10-19 16:18 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Steven Rostedt; +Cc: Dave Sullivan, linux-kernel
Currently check_hung_task() prints a warning if it detects the
problem, but it is not convenient to watch the system logs if
user-space wants to be notified about the hang.
Add the new trace_sched_process_hang() into check_hung_task(),
this way a user-space monitor can easily wait for the hang and
potentially resolve a problem.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
include/trace/events/sched.h | 19 +++++++++++++++++++
kernel/hung_task.c | 4 ++++
2 files changed, 23 insertions(+), 0 deletions(-)
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 2e7d994..2a652d1 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -424,6 +424,25 @@ TRACE_EVENT(sched_pi_setprio,
__entry->oldprio, __entry->newprio)
);
+#ifdef CONFIG_DETECT_HUNG_TASK
+TRACE_EVENT(sched_process_hang,
+ TP_PROTO(struct task_struct *tsk),
+ TP_ARGS(tsk),
+
+ TP_STRUCT__entry(
+ __array( char, comm, TASK_COMM_LEN )
+ __field( pid_t, pid )
+ ),
+
+ TP_fast_assign(
+ memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
+ __entry->pid = tsk->pid;
+ ),
+
+ TP_printk("comm=%s pid=%d", __entry->comm, __entry->pid)
+);
+#endif /* CONFIG_DETECT_HUNG_TASK */
+
#endif /* _TRACE_SCHED_H */
/* This part must be outside protection */
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 3e97fb1..3952ab1 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -16,6 +16,7 @@
#include <linux/export.h>
#include <linux/sysctl.h>
#include <linux/utsname.h>
+#include <trace/events/sched.h>
/*
* The number of tasks checked:
@@ -92,6 +93,9 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
t->last_switch_count = switch_count;
return;
}
+
+ trace_sched_process_hang(t);
+
if (!sysctl_hung_task_warnings)
return;
sysctl_hung_task_warnings--;
--
1.5.5.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 1/1] hung_task debugging: Add tracepoint to report the hang
2013-10-19 16:18 ` [PATCH 1/1] " Oleg Nesterov
@ 2013-10-20 8:48 ` Ingo Molnar
2013-10-29 19:10 ` Oleg Nesterov
2013-10-31 10:53 ` [tip:core/locking] " tip-bot for Oleg Nesterov
1 sibling, 1 reply; 5+ messages in thread
From: Ingo Molnar @ 2013-10-20 8:48 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: Peter Zijlstra, Steven Rostedt, Dave Sullivan, linux-kernel
* Oleg Nesterov <oleg@redhat.com> wrote:
> Currently check_hung_task() prints a warning if it detects the problem,
> but it is not convenient to watch the system logs if user-space wants to
> be notified about the hang.
>
> Add the new trace_sched_process_hang() into check_hung_task(), this way
> a user-space monitor can easily wait for the hang and potentially
> resolve a problem.
I'm wondering, is the data of trace_console() in kernel/printk/printk.c
not sufficient?
If it's not enough then it might be better to add a higher level printk
tracepoint instead - that can catch hung_task messages and (much) more.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 1/1] hung_task debugging: Add tracepoint to report the hang
2013-10-20 8:48 ` Ingo Molnar
@ 2013-10-29 19:10 ` Oleg Nesterov
0 siblings, 0 replies; 5+ messages in thread
From: Oleg Nesterov @ 2013-10-29 19:10 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Peter Zijlstra, Steven Rostedt, Dave Sullivan, linux-kernel
On 10/20, Ingo Molnar wrote:
>
> * Oleg Nesterov <oleg@redhat.com> wrote:
>
> > Currently check_hung_task() prints a warning if it detects the problem,
> > but it is not convenient to watch the system logs if user-space wants to
> > be notified about the hang.
> >
> > Add the new trace_sched_process_hang() into check_hung_task(), this way
> > a user-space monitor can easily wait for the hang and potentially
> > resolve a problem.
>
> I'm wondering, is the data of trace_console() in kernel/printk/printk.c
> not sufficient?
Probably yes... I do not think they disable CONFIG_PRINTK.
But this is obviously much less convenient, they will need to parse the
text. And the user-space watchdog will be woken up much more often than
necessary. And they could probably simply read /var/log or interact with
syslogd somehow, but they specially asked for something better and more
robust.
But of course, I understand that every tracepoint should be justified.
So if you do not like this change I try to convince them to use
trace_console().
> If it's not enough then it might be better to add a higher level printk
> tracepoint instead - that can catch hung_task messages and (much) more.
Not sure I understand... I mean I do not understand why this is really
better for them, except this will simplify the parsing a bit. Anyway
I'd prefer to not send another doubtful patch ;)
Thanks.
Oleg.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [tip:core/locking] hung_task debugging: Add tracepoint to report the hang
2013-10-19 16:18 ` [PATCH 1/1] " Oleg Nesterov
2013-10-20 8:48 ` Ingo Molnar
@ 2013-10-31 10:53 ` tip-bot for Oleg Nesterov
1 sibling, 0 replies; 5+ messages in thread
From: tip-bot for Oleg Nesterov @ 2013-10-31 10:53 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, dsulliva, peterz, rostedt, tglx, oleg
Commit-ID: 6a716c90a51338009c3bc1f460829afaed8f922d
Gitweb: http://git.kernel.org/tip/6a716c90a51338009c3bc1f460829afaed8f922d
Author: Oleg Nesterov <oleg@redhat.com>
AuthorDate: Sat, 19 Oct 2013 18:18:28 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 31 Oct 2013 11:16:18 +0100
hung_task debugging: Add tracepoint to report the hang
Currently check_hung_task() prints a warning if it detects the
problem, but it is not convenient to watch the system logs if
user-space wants to be notified about the hang.
Add the new trace_sched_process_hang() into check_hung_task(),
this way a user-space monitor can easily wait for the hang and
potentially resolve a problem.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Dave Sullivan <dsulliva@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20131019161828.GA7439@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
include/trace/events/sched.h | 19 +++++++++++++++++++
kernel/hung_task.c | 4 ++++
2 files changed, 23 insertions(+)
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 2e7d994..2a652d1 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -424,6 +424,25 @@ TRACE_EVENT(sched_pi_setprio,
__entry->oldprio, __entry->newprio)
);
+#ifdef CONFIG_DETECT_HUNG_TASK
+TRACE_EVENT(sched_process_hang,
+ TP_PROTO(struct task_struct *tsk),
+ TP_ARGS(tsk),
+
+ TP_STRUCT__entry(
+ __array( char, comm, TASK_COMM_LEN )
+ __field( pid_t, pid )
+ ),
+
+ TP_fast_assign(
+ memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
+ __entry->pid = tsk->pid;
+ ),
+
+ TP_printk("comm=%s pid=%d", __entry->comm, __entry->pid)
+);
+#endif /* CONFIG_DETECT_HUNG_TASK */
+
#endif /* _TRACE_SCHED_H */
/* This part must be outside protection */
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 0422523..8807061 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -16,6 +16,7 @@
#include <linux/export.h>
#include <linux/sysctl.h>
#include <linux/utsname.h>
+#include <trace/events/sched.h>
/*
* The number of tasks checked:
@@ -92,6 +93,9 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
t->last_switch_count = switch_count;
return;
}
+
+ trace_sched_process_hang(t);
+
if (!sysctl_hung_task_warnings)
return;
sysctl_hung_task_warnings--;
^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2013-10-31 10:54 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-19 16:18 [PATCH 0/1] hung_task debugging: Add tracepoint to report the hang Oleg Nesterov
2013-10-19 16:18 ` [PATCH 1/1] " Oleg Nesterov
2013-10-20 8:48 ` Ingo Molnar
2013-10-29 19:10 ` Oleg Nesterov
2013-10-31 10:53 ` [tip:core/locking] " tip-bot for Oleg Nesterov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).