* [BUG] perf_events: NMI watchdog event cannot be throttled
@ 2010-08-18 20:26 Stephane Eranian
2010-08-19 11:05 ` Peter Zijlstra
0 siblings, 1 reply; 6+ messages in thread
From: Stephane Eranian @ 2010-08-18 20:26 UTC (permalink / raw)
To: LKML
Cc: Peter Zijlstra, mingo, David S. Miller, Paul Mackerras,
Frédéric Weisbecker, eranian, perfmon2-devel
Hi,
I ran into an issue with the NMI watchdog not firing in a deadlock
situation. After some debugging I found the source of the problem.
The NMI watchdog is currently subject, like any other event, to interrupt
throttling. The heart of the problem is that if you are deadlocked on a CPU
with interrupts masked, the timer interrupt won't fire, and therefore the
hwc->interrupts field won't be reset. Then, depending on the max sampling
rate, you could eventually fail the max interrupt rate test in
__pfm_overflow_handler() and perf_events would throttle, i.e., stop, the NMI
watchdog event before the 5s delay to panic. Thus, you would never get the
panic. I ran into this problem myself.
This is a serious issue because perf_events must ensure the watchdog can
always fire, regardless of the interrupt masking situation.
It looks like one way of solving the problem would be to mark the NMI watchdog
event as immune to throttling. Since the event is internal to the kernel, we
could trust the event setup from perf_event_create_kernel_counter().
* Re: [BUG] perf_events: NMI watchdog event cannot be throttled
2010-08-18 20:26 [BUG] perf_events: NMI watchdog event cannot be throttled Stephane Eranian
@ 2010-08-19 11:05 ` Peter Zijlstra
2010-08-19 11:24 ` Stephane Eranian
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Peter Zijlstra @ 2010-08-19 11:05 UTC (permalink / raw)
To: Stephane Eranian
Cc: LKML, mingo, David S. Miller, Paul Mackerras,
Frédéric Weisbecker, eranian, perfmon2-devel
On Wed, 2010-08-18 at 22:26 +0200, Stephane Eranian wrote:
> Hi,
>
> I ran into an issue with the NMI watchdog not firing in a deadlock
> situation. After some debugging I found the source of the problem.
>
> The NMI watchdog is currently subject, like any other event, to interrupt
> throttling. The heart of the problem is that if you are deadlocked on a CPU
> with interrupts masked, the timer interrupt won't fire, and therefore the
> hwc->interrupts field won't be reset. Then, depending on the max sampling
> rate, you could eventually fail the max interrupt rate test in
> __pfm_overflow_handler() and perf_events would throttle, i.e., stop, the NMI
> watchdog event before the 5s delay to panic. Thus, you would never get the
> panic. I ran into this problem myself.
>
> This is a serious issue because perf_events must ensure the watchdog can
> always fire, regardless of the interrupt masking situation.
>
> It looks like one way of solving the problem would be to mark the NMI watchdog
> event as immune to throttling. Since the event is internal to the kernel, we
> could trust the event setup from perf_event_create_kernel_counter().
Something like so?
---
kernel/watchdog.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 613bc1f..e0fe6e4 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -206,6 +206,9 @@ void watchdog_overflow_callback(struct perf_event *event, int nmi,
 		 struct perf_sample_data *data,
 		 struct pt_regs *regs)
 {
+	/* Ensure the watchdog never gets throttled. */
+	event->hw.interrupts = 0;
+
 	if (__get_cpu_var(watchdog_nmi_touch) == true) {
 		__get_cpu_var(watchdog_nmi_touch) = false;
 		return;
* Re: [BUG] perf_events: NMI watchdog event cannot be throttled
2010-08-19 11:05 ` Peter Zijlstra
@ 2010-08-19 11:24 ` Stephane Eranian
2010-08-19 13:01 ` Stephane Eranian
2010-08-20 14:18 ` [tip:perf/urgent] watchdog: Don't throttle the watchdog tip-bot for Peter Zijlstra
2010-08-23 8:51 ` tip-bot for Peter Zijlstra
2 siblings, 1 reply; 6+ messages in thread
From: Stephane Eranian @ 2010-08-19 11:24 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, mingo, David S. Miller, Paul Mackerras,
Frédéric Weisbecker, eranian, perfmon2-devel
Yeah, that should probably fix it. Let me try it out.
* Re: [BUG] perf_events: NMI watchdog event cannot be throttled
2010-08-19 11:24 ` Stephane Eranian
@ 2010-08-19 13:01 ` Stephane Eranian
0 siblings, 0 replies; 6+ messages in thread
From: Stephane Eranian @ 2010-08-19 13:01 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, mingo, David S. Miller, Paul Mackerras,
Frédéric Weisbecker, eranian, perfmon2-devel
On Thu, Aug 19, 2010 at 1:24 PM, Stephane Eranian <eranian@google.com> wrote:
> Yeah, that should probably fix it. Let me try it out.
>
Works for me.
Thanks.
* [tip:perf/urgent] watchdog: Don't throttle the watchdog
2010-08-19 11:05 ` Peter Zijlstra
2010-08-19 11:24 ` Stephane Eranian
@ 2010-08-20 14:18 ` tip-bot for Peter Zijlstra
2010-08-23 8:51 ` tip-bot for Peter Zijlstra
2 siblings, 0 replies; 6+ messages in thread
From: tip-bot for Peter Zijlstra @ 2010-08-20 14:18 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, eranian, hpa, mingo, a.p.zijlstra, fweisbec, tglx,
mingo, dzickus
Commit-ID: b847b94fe2caff0f28a99af6353e9a27282e771a
Gitweb: http://git.kernel.org/tip/b847b94fe2caff0f28a99af6353e9a27282e771a
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Fri, 20 Aug 2010 11:49:15 +0200
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 20 Aug 2010 15:00:06 +0200
watchdog: Don't throttle the watchdog
Stephane reported that when the machine locks up, the regular ticks,
which are responsible for resetting the throttle count, stop too.
Hence the NMI watchdog can end up being throttled before it reports on
the locked up state, and we end up being sad..
Cure this by having the watchdog overflow reset its own throttle count.
Reported-by: Stephane Eranian <eranian@google.com>
Tested-by: Stephane Eranian <eranian@google.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1282215916.1926.4696.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/watchdog.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 613bc1f..0d53c8e 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -206,6 +206,9 @@ void watchdog_overflow_callback(struct perf_event *event, int nmi,
 		 struct perf_sample_data *data,
 		 struct pt_regs *regs)
 {
+	/* Ensure the watchdog never gets throttled */
+	event->hw.interrupts = 0;
+
 	if (__get_cpu_var(watchdog_nmi_touch) == true) {
 		__get_cpu_var(watchdog_nmi_touch) = false;
 		return;
* [tip:perf/urgent] watchdog: Don't throttle the watchdog
2010-08-19 11:05 ` Peter Zijlstra
2010-08-19 11:24 ` Stephane Eranian
2010-08-20 14:18 ` [tip:perf/urgent] watchdog: Don't throttle the watchdog tip-bot for Peter Zijlstra
@ 2010-08-23 8:51 ` tip-bot for Peter Zijlstra
2 siblings, 0 replies; 6+ messages in thread
From: tip-bot for Peter Zijlstra @ 2010-08-23 8:51 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, eranian, hpa, mingo, a.p.zijlstra, fweisbec, tglx,
mingo, dzickus
Commit-ID: c6db67cda735d8ace5f19c3831240e1408679790
Gitweb: http://git.kernel.org/tip/c6db67cda735d8ace5f19c3831240e1408679790
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Fri, 20 Aug 2010 11:49:15 +0200
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 23 Aug 2010 10:48:05 +0200
watchdog: Don't throttle the watchdog
Stephane reported that when the machine locks up, the regular ticks,
which are responsible for resetting the throttle count, stop too.
Hence the NMI watchdog can end up being throttled before it reports on
the locked up state, and we end up being sad..
Cure this by having the watchdog overflow reset its own throttle count.
Reported-by: Stephane Eranian <eranian@google.com>
Tested-by: Stephane Eranian <eranian@google.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1282215916.1926.4696.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/watchdog.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 613bc1f..0d53c8e 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -206,6 +206,9 @@ void watchdog_overflow_callback(struct perf_event *event, int nmi,
 		 struct perf_sample_data *data,
 		 struct pt_regs *regs)
 {
+	/* Ensure the watchdog never gets throttled */
+	event->hw.interrupts = 0;
+
 	if (__get_cpu_var(watchdog_nmi_touch) == true) {
 		__get_cpu_var(watchdog_nmi_touch) = false;
 		return;