* [BUG] perf_events: NMI watchdog event cannot be throttled
From: Stephane Eranian @ 2010-08-18 20:26 UTC
  To: LKML
  Cc: Peter Zijlstra, mingo, David S. Miller, Paul Mackerras,
	Frédéric Weisbecker, eranian, perfmon2-devel

Hi,

I ran into an issue with the NMI watchdog not firing in a deadlock
situation. After some debugging I found the source of the problem.

The NMI watchdog is currently subject, like any other event, to interrupt
throttling. The heart of the problem is that if you are deadlocked on a CPU
with interrupts masked, the timer interrupt won't fire, and therefore the
hwc->interrupts field won't be reset. Then, depending on the max sampling
rate, you could eventually fail the max interrupt rate test in
__pfm_overflow_handler(), and perf_events would throttle, i.e., stop, the NMI
watchdog event before the 5s delay to panic. Thus, you would never get the
panic. I ran into this problem myself.
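
To make the failure mode concrete, here is a minimal user-space sketch in C.
It is not kernel code. The names sim_event, sim_tick(), sim_overflow() and
first_throttled_overflow() are hypothetical, and the rate test
(hz * interrupts > max_sample_rate) is an assumption about the shape of the
check, chosen only to model "too many overflows without a tick in between".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-event state (models hwc->interrupts). */
struct sim_event {
	uint64_t interrupts;	/* overflows seen since the last tick */
	bool throttled;		/* true once the rate test has failed */
};

/* Models the timer tick: the only place the count is reset in this model. */
static void sim_tick(struct sim_event *ev)
{
	ev->interrupts = 0;
	ev->throttled = false;
}

/*
 * Models one counter overflow. Assumed rate test: the event is throttled
 * once hz * interrupts exceeds max_sample_rate. Returns false when the
 * event gets throttled on this overflow or already was throttled.
 */
static bool sim_overflow(struct sim_event *ev, uint64_t hz,
			 uint64_t max_sample_rate)
{
	if (ev->throttled)
		return false;
	ev->interrupts++;
	if (hz * ev->interrupts > max_sample_rate) {
		ev->throttled = true;
		return false;
	}
	return true;
}

/*
 * Delivers n overflows. Returns the 1-based index of the overflow that
 * trips the throttle, or 0 if none of them does.
 */
static unsigned int first_throttled_overflow(unsigned int n, bool ticks_running,
					     uint64_t hz,
					     uint64_t max_sample_rate)
{
	struct sim_event ev = { 0, false };
	unsigned int i;

	for (i = 1; i <= n; i++) {
		if (ticks_running)
			sim_tick(&ev);	/* a tick fires between overflows */
		if (!sim_overflow(&ev, hz, max_sample_rate))
			return i;
	}
	return 0;
}
```

With illustrative values hz = 1000 and max_sample_rate = 4000,
first_throttled_overflow(10, false, 1000, 4000) returns 5: with ticks masked,
the fifth overflow is throttled. With ticks_running set to true the same call
returns 0, because the count is reset before it can accumulate.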

This is a serious issue because perf_events must ensure the watchdog can
always fire, regardless of the interrupt masking situation.

Looks like one way of solving the problem would be to mark the NMI watchdog
event as immune to throttling. The event being internal to the kernel, we could
trust the event setup from perf_event_create_kernel_counter().


* Re: [BUG] perf_events: NMI watchdog event cannot be throttled
From: Peter Zijlstra @ 2010-08-19 11:05 UTC
  To: Stephane Eranian
  Cc: LKML, mingo, David S. Miller, Paul Mackerras,
	Frédéric Weisbecker, eranian, perfmon2-devel

On Wed, 2010-08-18 at 22:26 +0200, Stephane Eranian wrote:
> [...]
> Looks like one way of solving the problem would be to mark the NMI watchdog
> event as immune to throttling. The event being internal to the kernel, we could
> trust the event setup from perf_event_create_kernel_counter().

Something like so?

---
 kernel/watchdog.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 613bc1f..e0fe6e4 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -206,6 +206,9 @@ void watchdog_overflow_callback(struct perf_event *event, int nmi,
 		 struct perf_sample_data *data,
 		 struct pt_regs *regs)
 {
+	/* Ensure the watchdog never gets throttled. */
+	event->hw.interrupts = 0;
+
 	if (__get_cpu_var(watchdog_nmi_touch) == true) {
 		__get_cpu_var(watchdog_nmi_touch) = false;
 		return;
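
The effect of the patch above can be illustrated with a small user-space
sketch in C. This is a simplified model, not the kernel implementation: the
names wd_event, wd_rate_test_fails(), wd_overflow_callback() and
wd_overflows_handled() are hypothetical, the rate test
(hz * interrupts > max_sample_rate) is an assumed form of the check, and the
model simply stops at the first overflow that fails it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for event->hw; only the throttle count is modelled. */
struct wd_event {
	uint64_t interrupts;
};

/*
 * Assumed rate test, run on every overflow before the callback.
 * Returns true when the event would be throttled.
 */
static bool wd_rate_test_fails(struct wd_event *ev, uint64_t hz,
			       uint64_t max_sample_rate)
{
	ev->interrupts++;
	return hz * ev->interrupts > max_sample_rate;
}

/* Models the watchdog callback; reset_count selects the patched behaviour. */
static void wd_overflow_callback(struct wd_event *ev, bool reset_count)
{
	if (reset_count)
		ev->interrupts = 0;	/* mirrors event->hw.interrupts = 0 */
}

/*
 * Delivers up to n back-to-back overflows with no timer tick in between
 * and returns how many reach the callback before the event is throttled.
 */
static unsigned int wd_overflows_handled(unsigned int n, bool reset_count,
					 uint64_t hz, uint64_t max_sample_rate)
{
	struct wd_event ev = { 0 };
	unsigned int handled = 0;
	unsigned int i;

	for (i = 0; i < n; i++) {
		if (wd_rate_test_fails(&ev, hz, max_sample_rate))
			break;
		wd_overflow_callback(&ev, reset_count);
		handled++;
	}
	return handled;
}
```

With illustrative values hz = 1000 and max_sample_rate = 4000,
wd_overflows_handled(10, false, 1000, 4000) returns 4, while
wd_overflows_handled(10, true, 1000, 4000) returns 10. In this model the reset
helps only as long as a single overflow on its own passes the rate test.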



* Re: [BUG] perf_events: NMI watchdog event cannot be throttled
From: Stephane Eranian @ 2010-08-19 11:24 UTC
  To: Peter Zijlstra
  Cc: LKML, mingo, David S. Miller, Paul Mackerras,
	Frédéric Weisbecker, eranian, perfmon2-devel

Yeah, that should probably fix it. Let me try it out.




* Re: [BUG] perf_events: NMI watchdog event cannot be throttled
From: Stephane Eranian @ 2010-08-19 13:01 UTC
  To: Peter Zijlstra
  Cc: LKML, mingo, David S. Miller, Paul Mackerras,
	Frédéric Weisbecker, eranian, perfmon2-devel

On Thu, Aug 19, 2010 at 1:24 PM, Stephane Eranian <eranian@google.com> wrote:
> Yeah, that should probably fix it. Let me try it out.
>
Works for me.
Thanks.



* [tip:perf/urgent] watchdog: Don't throttle the watchdog
From: tip-bot for Peter Zijlstra @ 2010-08-20 14:18 UTC
  To: linux-tip-commits
  Cc: linux-kernel, eranian, hpa, mingo, a.p.zijlstra, fweisbec, tglx,
	mingo, dzickus

Commit-ID:  b847b94fe2caff0f28a99af6353e9a27282e771a
Gitweb:     http://git.kernel.org/tip/b847b94fe2caff0f28a99af6353e9a27282e771a
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Fri, 20 Aug 2010 11:49:15 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 20 Aug 2010 15:00:06 +0200

watchdog: Don't throttle the watchdog

Stephane reported that when the machine locks up, the regular ticks,
which are responsible to resetting the throttle count, stop too.

Hence the NMI watchdog can end up being throttled before it reports on
the locked up state, and we end up being sad..

Cure this by having the watchdog overflow reset its own throttle count.

Reported-by: Stephane Eranian <eranian@google.com>
Tested-by: Stephane Eranian <eranian@google.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1282215916.1926.4696.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/watchdog.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 613bc1f..0d53c8e 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -206,6 +206,9 @@ void watchdog_overflow_callback(struct perf_event *event, int nmi,
 		 struct perf_sample_data *data,
 		 struct pt_regs *regs)
 {
+	/* Ensure the watchdog never gets throttled */
+	event->hw.interrupts = 0;
+
 	if (__get_cpu_var(watchdog_nmi_touch) == true) {
 		__get_cpu_var(watchdog_nmi_touch) = false;
 		return;


* [tip:perf/urgent] watchdog: Don't throttle the watchdog
From: tip-bot for Peter Zijlstra @ 2010-08-23  8:51 UTC
  To: linux-tip-commits
  Cc: linux-kernel, eranian, hpa, mingo, a.p.zijlstra, fweisbec, tglx,
	mingo, dzickus

Commit-ID:  c6db67cda735d8ace5f19c3831240e1408679790
Gitweb:     http://git.kernel.org/tip/c6db67cda735d8ace5f19c3831240e1408679790
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Fri, 20 Aug 2010 11:49:15 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 23 Aug 2010 10:48:05 +0200

watchdog: Don't throttle the watchdog

Stephane reported that when the machine locks up, the regular ticks,
which are responsible to resetting the throttle count, stop too.

Hence the NMI watchdog can end up being throttled before it reports on
the locked up state, and we end up being sad..

Cure this by having the watchdog overflow reset its own throttle count.

Reported-by: Stephane Eranian <eranian@google.com>
Tested-by: Stephane Eranian <eranian@google.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1282215916.1926.4696.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/watchdog.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 613bc1f..0d53c8e 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -206,6 +206,9 @@ void watchdog_overflow_callback(struct perf_event *event, int nmi,
 		 struct perf_sample_data *data,
 		 struct pt_regs *regs)
 {
+	/* Ensure the watchdog never gets throttled */
+	event->hw.interrupts = 0;
+
 	if (__get_cpu_var(watchdog_nmi_touch) == true) {
 		__get_cpu_var(watchdog_nmi_touch) = false;
 		return;
