* [RFC 1/2] watchdog: update watchdog attributes atomically
@ 2013-07-19  9:04 Michal Hocko
  2013-07-19  9:04 ` [RFC 2/2] watchdog: update watchdog_tresh properly Michal Hocko
  2013-07-19 16:10 ` [RFC 1/2] watchdog: update watchdog attributes atomically Don Zickus
  0 siblings, 2 replies; 16+ messages in thread
From: Michal Hocko @ 2013-07-19  9:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Frederic Weisbecker, Don Zickus, Thomas Gleixner,
	Ingo Molnar

proc_dowatchdog doesn't synchronize multiple callers, which might lead
to confusion when two parallel callers race in watchdog_enable_all_cpus
and watchdog_disable_all_cpus (e.g. the watchdog gets enabled even
though watchdog_thresh has already been set to 0).

This patch adds a local mutex which synchronizes callers to the sysctl
handler.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 kernel/watchdog.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 1241d8c..2d64c02 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -520,13 +520,15 @@ int proc_dowatchdog(struct ctl_table *table, int write,
 		    void __user *buffer, size_t *lenp, loff_t *ppos)
 {
 	int err, old_thresh, old_enabled;
+	static DEFINE_MUTEX(watchdog_proc_mutex);
 
+	mutex_lock(&watchdog_proc_mutex);
 	old_thresh = ACCESS_ONCE(watchdog_thresh);
 	old_enabled = ACCESS_ONCE(watchdog_user_enabled);
 
 	err = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
 	if (err || !write)
-		return err;
+		goto out;
 
 	set_sample_period();
 	/*
@@ -544,7 +546,8 @@ int proc_dowatchdog(struct ctl_table *table, int write,
 		watchdog_thresh = old_thresh;
 		watchdog_user_enabled = old_enabled;
 	}
-
+out:
+	mutex_unlock(&watchdog_proc_mutex);
 	return err;
 }
 #endif /* CONFIG_SYSCTL */
-- 
1.8.3.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC 2/2] watchdog: update watchdog_tresh properly
  2013-07-19  9:04 [RFC 1/2] watchdog: update watchdog attributes atomically Michal Hocko
@ 2013-07-19  9:04 ` Michal Hocko
  2013-07-19 16:08   ` Don Zickus
  2013-07-22 11:45   ` [RFC -v2 " Michal Hocko
  2013-07-19 16:10 ` [RFC 1/2] watchdog: update watchdog attributes atomically Don Zickus
  1 sibling, 2 replies; 16+ messages in thread
From: Michal Hocko @ 2013-07-19  9:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Frederic Weisbecker, Don Zickus, Thomas Gleixner,
	Ingo Molnar

watchdog_thresh controls how often the nmi perf event counter checks
the per-cpu hrtimer_interrupts counter and blows up if the counter
hasn't changed since the last check. The counter is updated by the
per-cpu watchdog_hrtimer hrtimer, which is scheduled with a period of
2/5 of watchdog_thresh, guaranteeing that the hrtimer fires at least
twice per main period. Both the hrtimer and the perf event are started
together when the watchdog is enabled.

So far so good. But...

But what happens when watchdog_thresh is updated from sysctl handler?

proc_dowatchdog will set a new sampling period and hrtimer callback
(watchdog_timer_fn) will use the new value in the next round.
The problem, however, is that nobody tells the perf event that the
sampling period has changed so it is ticking with the period configured
when it has been set up.

This might result in an ear-ripping dissonance between the perf and
hrtimer parts if watchdog_thresh is increased. And even worse, it might
lead to KABOOM if the watchdog is configured to panic on such a
spurious lockup.

This patch fixes the issue by disabling the nmi perf event counter and
reinitializing it from scratch if the threshold value has changed. This
has the unpleasant side effect that the allocation of the new event
might theoretically fail, so the hard lockup detector would be disabled
for such cpus. On the other hand, such a memory allocation failure is
very unlikely because the original event is deallocated right before.
It would be much nicer if we just changed the perf event period, but
there doesn't seem to be any API to do that right now.

It is also unfortunate that perf_event_alloc uses a GFP_KERNEL
allocation unconditionally, so we cannot use on_each_cpu() and do the
same thing from the per-cpu context. The update from the current CPU
should be safe because perf_event_disable removes the event atomically
before it clears the per-cpu watchdog_ev, so it cannot change under the
running handler's feet.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 kernel/watchdog.c | 32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 2d64c02..75e0bef 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -486,7 +486,31 @@ static struct smp_hotplug_thread watchdog_threads = {
 	.unpark			= watchdog_enable,
 };
 
-static int watchdog_enable_all_cpus(void)
+static void watchdog_nmi_reenable(int cpu)
+{
+	/*
+	 * Make sure that the perf event counter will adapt to a new
+	 * sampling period. Updating the sampling period directly would
+	 * be much nicer but we do not have an API for that now so
+	 * let's use a big hammer.
+	 */
+	watchdog_nmi_disable(cpu);
+	watchdog_nmi_enable(cpu);
+}
+
+static void watchdog_nmi_reenable_all_cpus(void)
+{
+	int cpu;
+
+	get_online_cpus();
+	preempt_disable();
+	for_each_online_cpu(cpu)
+		watchdog_nmi_reenable(cpu);
+	preempt_enable();
+	put_online_cpus();
+}
+
+static int watchdog_enable_all_cpus(bool sample_period_changed)
 {
 	int err = 0;
 
@@ -496,6 +520,8 @@ static int watchdog_enable_all_cpus(void)
 			pr_err("Failed to create watchdog threads, disabled\n");
 		else
 			watchdog_running = 1;
+	} else if (sample_period_changed) {
+		watchdog_nmi_reenable_all_cpus();
 	}
 
 	return err;
@@ -537,7 +563,7 @@ int proc_dowatchdog(struct ctl_table *table, int write,
 	 * watchdog_*_all_cpus() function takes care of this.
 	 */
 	if (watchdog_user_enabled && watchdog_thresh)
-		err = watchdog_enable_all_cpus();
+		err = watchdog_enable_all_cpus(old_thresh != watchdog_thresh);
 	else
 		watchdog_disable_all_cpus();
 
@@ -565,5 +591,5 @@ void __init lockup_detector_init(void)
 #endif
 
 	if (watchdog_user_enabled)
-		watchdog_enable_all_cpus();
+		watchdog_enable_all_cpus(false);
 }
-- 
1.8.3.2



* Re: [RFC 2/2] watchdog: update watchdog_tresh properly
  2013-07-19  9:04 ` [RFC 2/2] watchdog: update watchdog_tresh properly Michal Hocko
@ 2013-07-19 16:08   ` Don Zickus
  2013-07-19 16:37     ` Michal Hocko
  2013-07-22 11:45   ` [RFC -v2 " Michal Hocko
  1 sibling, 1 reply; 16+ messages in thread
From: Don Zickus @ 2013-07-19 16:08 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, Andrew Morton, Frederic Weisbecker,
	Thomas Gleixner, Ingo Molnar

On Fri, Jul 19, 2013 at 11:04:59AM +0200, Michal Hocko wrote:
> watchdog_tresh controls how often nmi perf event counter checks per-cpu
> hrtimer_interrupts counter and blows up if the counter hasn't changed
> since the last check. The counter is updated by per-cpu watchdog_hrtimer
> hrtimer which is scheduled with 2/5 watchdog_thresh period which
> guarantees that hrtimer is scheduled 2 times per the main period. Both
> hrtimer and perf event are started together when the watchdog is
> enabled.
> 
> So far so good. But...
> 
> But what happens when watchdog_thresh is updated from sysctl handler?
> 
> proc_dowatchdog will set a new sampling period and hrtimer callback
> (watchdog_timer_fn) will use the new value in the next round.
> The problem, however, is that nobody tells the perf event that the
> sampling period has changed so it is ticking with the period configured
> when it has been set up.
> 
> This might result in an ear riping dissonance between perf and hrtimer
> parts if the watchdog_thresh is increased. And even worse it might lead
> to KABOOM if the watchdog is configured to panic on such a spurious
> lockup.

Heh.  Good point.

What if we keep it simpler.

if (old_thresh != watchdog_thresh)
  watchdog_disable_all_cpus()
  watchdog_enable_all_cpus()

The idea is that we are not changing thresholds that often and if we do, we
should probably sync up all the timers and threads again.  Safer to start
from scratch.

Thoughts?

Cheers,
Don


* Re: [RFC 1/2] watchdog: update watchdog attributes atomically
  2013-07-19  9:04 [RFC 1/2] watchdog: update watchdog attributes atomically Michal Hocko
  2013-07-19  9:04 ` [RFC 2/2] watchdog: update watchdog_tresh properly Michal Hocko
@ 2013-07-19 16:10 ` Don Zickus
  2013-07-19 16:33   ` Michal Hocko
  1 sibling, 1 reply; 16+ messages in thread
From: Don Zickus @ 2013-07-19 16:10 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, Andrew Morton, Frederic Weisbecker,
	Thomas Gleixner, Ingo Molnar

On Fri, Jul 19, 2013 at 11:04:58AM +0200, Michal Hocko wrote:
> proc_dowatchdog doesn't synchronize multiple callers which
> might lead to confusion when two parallel callers might confuse
> watchdog_enable_all_cpus resp. watchdog_disable_all_cpus (e.g. watchdog
> gets enabled even if watchdog_thresh was set to 0 already).
> 
> This patch adds a local mutex which synchronizes callers to the sysctl
> handler.

Looks fine by me, except one little nitpick..

> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> ---
>  kernel/watchdog.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 1241d8c..2d64c02 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -520,13 +520,15 @@ int proc_dowatchdog(struct ctl_table *table, int write,
>  		    void __user *buffer, size_t *lenp, loff_t *ppos)
>  {
>  	int err, old_thresh, old_enabled;
> +	static DEFINE_MUTEX(watchdog_proc_mutex);

Should we just make this global instead of hiding it as a static inside a
function.  I don't know the kernel rules for deciding which approach makes
sense.  I know it is the same result in either case...

Cheers,
Don


* Re: [RFC 1/2] watchdog: update watchdog attributes atomically
  2013-07-19 16:10 ` [RFC 1/2] watchdog: update watchdog attributes atomically Don Zickus
@ 2013-07-19 16:33   ` Michal Hocko
  2013-07-23 13:56     ` Don Zickus
  0 siblings, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2013-07-19 16:33 UTC (permalink / raw)
  To: Don Zickus
  Cc: linux-kernel, Andrew Morton, Frederic Weisbecker,
	Thomas Gleixner, Ingo Molnar

On Fri 19-07-13 12:10:48, Don Zickus wrote:
> On Fri, Jul 19, 2013 at 11:04:58AM +0200, Michal Hocko wrote:
> > proc_dowatchdog doesn't synchronize multiple callers which
> > might lead to confusion when two parallel callers might confuse
> > watchdog_enable_all_cpus resp. watchdog_disable_all_cpus (e.g. watchdog
> > gets enabled even if watchdog_thresh was set to 0 already).
> > 
> > This patch adds a local mutex which synchronizes callers to the sysctl
> > handler.
> 
> Looks fine by me, except one little nitpick..
> 
> > 
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> > ---
> >  kernel/watchdog.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > index 1241d8c..2d64c02 100644
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -520,13 +520,15 @@ int proc_dowatchdog(struct ctl_table *table, int write,
> >  		    void __user *buffer, size_t *lenp, loff_t *ppos)
> >  {
> >  	int err, old_thresh, old_enabled;
> > +	static DEFINE_MUTEX(watchdog_proc_mutex);
> 
> Should we just make this global instead of hiding it as a static inside a
> function.  I don't know the kernel rules for deciding which approach makes
> sense.  I know it is the same result in either case...

I've hidden it into the function to discourage from abusing it for
something else and because the usage is nicely focused in this
function. But I have no problem to pull it out.

-- 
Michal Hocko
SUSE Labs


* Re: [RFC 2/2] watchdog: update watchdog_tresh properly
  2013-07-19 16:08   ` Don Zickus
@ 2013-07-19 16:37     ` Michal Hocko
  2013-07-19 18:05       ` Don Zickus
  0 siblings, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2013-07-19 16:37 UTC (permalink / raw)
  To: Don Zickus
  Cc: linux-kernel, Andrew Morton, Frederic Weisbecker,
	Thomas Gleixner, Ingo Molnar

On Fri 19-07-13 12:08:52, Don Zickus wrote:
> On Fri, Jul 19, 2013 at 11:04:59AM +0200, Michal Hocko wrote:
> > watchdog_tresh controls how often nmi perf event counter checks per-cpu
> > hrtimer_interrupts counter and blows up if the counter hasn't changed
> > since the last check. The counter is updated by per-cpu watchdog_hrtimer
> > hrtimer which is scheduled with 2/5 watchdog_thresh period which
> > guarantees that hrtimer is scheduled 2 times per the main period. Both
> > hrtimer and perf event are started together when the watchdog is
> > enabled.
> > 
> > So far so good. But...
> > 
> > But what happens when watchdog_thresh is updated from sysctl handler?
> > 
> > proc_dowatchdog will set a new sampling period and hrtimer callback
> > (watchdog_timer_fn) will use the new value in the next round.
> > The problem, however, is that nobody tells the perf event that the
> > sampling period has changed so it is ticking with the period configured
> > when it has been set up.
> > 
> > This might result in an ear riping dissonance between perf and hrtimer
> > parts if the watchdog_thresh is increased. And even worse it might lead
> > to KABOOM if the watchdog is configured to panic on such a spurious
> > lockup.
> 
> Heh.  Good point.
> 
> What if we keep it simpler.
> 
> if (old_thresh != watchdog_thresh)
>   watchdog_disable_all_cpus()
>   wathcdog_enable_all_cpus()

Those would do nothing because of the watchdog_disabled checks. It is
also much heavier than necessary. I hope we can get a perf API which
just updates the perf event period directly so we do not even have to
call watchdog_nmi_{en,dis}able.

-- 
Michal Hocko
SUSE Labs


* Re: [RFC 2/2] watchdog: update watchdog_tresh properly
  2013-07-19 16:37     ` Michal Hocko
@ 2013-07-19 18:05       ` Don Zickus
  2013-07-20  8:42         ` Michal Hocko
  0 siblings, 1 reply; 16+ messages in thread
From: Don Zickus @ 2013-07-19 18:05 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, Andrew Morton, Frederic Weisbecker,
	Thomas Gleixner, Ingo Molnar

On Fri, Jul 19, 2013 at 06:37:50PM +0200, Michal Hocko wrote:
> On Fri 19-07-13 12:08:52, Don Zickus wrote:
> > On Fri, Jul 19, 2013 at 11:04:59AM +0200, Michal Hocko wrote:
> > > watchdog_tresh controls how often nmi perf event counter checks per-cpu
> > > hrtimer_interrupts counter and blows up if the counter hasn't changed
> > > since the last check. The counter is updated by per-cpu watchdog_hrtimer
> > > hrtimer which is scheduled with 2/5 watchdog_thresh period which
> > > guarantees that hrtimer is scheduled 2 times per the main period. Both
> > > hrtimer and perf event are started together when the watchdog is
> > > enabled.
> > > 
> > > So far so good. But...
> > > 
> > > But what happens when watchdog_thresh is updated from sysctl handler?
> > > 
> > > proc_dowatchdog will set a new sampling period and hrtimer callback
> > > (watchdog_timer_fn) will use the new value in the next round.
> > > The problem, however, is that nobody tells the perf event that the
> > > sampling period has changed so it is ticking with the period configured
> > > when it has been set up.
> > > 
> > > This might result in an ear riping dissonance between perf and hrtimer
> > > parts if the watchdog_thresh is increased. And even worse it might lead
> > > to KABOOM if the watchdog is configured to panic on such a spurious
> > > lockup.
> > 
> > Heh.  Good point.
> > 
> > What if we keep it simpler.
> > 
> > if (old_thresh != watchdog_thresh)
> >   watchdog_disable_all_cpus()
> >   wathcdog_enable_all_cpus()
> 
> Those would do nothing because of watchdog_disabled checks. It is also

I guess I am missing something because I don't see the watchdog_disabled
checks getting in the way.  If the watchdog is already enabled, then the
disable/enable approach should work correctly, no?  The watchdog_disabled
only blocks from re-disabling or re-enabling the watchdog IIUC.


> much more heavy than necessary. I hope we can get a perf API which just
> update the perf event period directly and we do not have to call even
> watchdog_nmi_{en,dis}able

Well the reason for the heaviness is, even with the proposed API you
suggest, there is a problem with the softlockup's hrtimer.  If you set the
watchdog_thresh high like say 60 seconds, then the hrtimer fires every 12
seconds or so (while the nmi watchdog fires every 60 seconds or so).

Now let's adjust the threshold down to 10 seconds.  The current code doesn't
reset the hrtimer to the new value until it goes off again (which in the worst
case is 12 seconds).  The API you propose would allow the nmi_watchdog to
fire in 10 seconds.  As a result you would have a potential to cause a
panic on a false lockup.

So if you are going to adjust the nmi_watchdog, you have to adjust the
softlockup value too.  Hence the heavy-ness.

Unless you have another solution for the above problem?

Cheers,
Don


* Re: [RFC 2/2] watchdog: update watchdog_tresh properly
  2013-07-19 18:05       ` Don Zickus
@ 2013-07-20  8:42         ` Michal Hocko
  0 siblings, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2013-07-20  8:42 UTC (permalink / raw)
  To: Don Zickus
  Cc: linux-kernel, Andrew Morton, Frederic Weisbecker,
	Thomas Gleixner, Ingo Molnar

On Fri 19-07-13 14:05:30, Don Zickus wrote:
> On Fri, Jul 19, 2013 at 06:37:50PM +0200, Michal Hocko wrote:
> > On Fri 19-07-13 12:08:52, Don Zickus wrote:
> > > On Fri, Jul 19, 2013 at 11:04:59AM +0200, Michal Hocko wrote:
> > > > watchdog_tresh controls how often nmi perf event counter checks per-cpu
> > > > hrtimer_interrupts counter and blows up if the counter hasn't changed
> > > > since the last check. The counter is updated by per-cpu watchdog_hrtimer
> > > > hrtimer which is scheduled with 2/5 watchdog_thresh period which
> > > > guarantees that hrtimer is scheduled 2 times per the main period. Both
> > > > hrtimer and perf event are started together when the watchdog is
> > > > enabled.
> > > > 
> > > > So far so good. But...
> > > > 
> > > > But what happens when watchdog_thresh is updated from sysctl handler?
> > > > 
> > > > proc_dowatchdog will set a new sampling period and hrtimer callback
> > > > (watchdog_timer_fn) will use the new value in the next round.
> > > > The problem, however, is that nobody tells the perf event that the
> > > > sampling period has changed so it is ticking with the period configured
> > > > when it has been set up.
> > > > 
> > > > This might result in an ear riping dissonance between perf and hrtimer
> > > > parts if the watchdog_thresh is increased. And even worse it might lead
> > > > to KABOOM if the watchdog is configured to panic on such a spurious
> > > > lockup.
> > > 
> > > Heh.  Good point.
> > > 
> > > What if we keep it simpler.
> > > 
> > > if (old_thresh != watchdog_thresh)
> > >   watchdog_disable_all_cpus()
> > >   wathcdog_enable_all_cpus()
> > 
> > Those would do nothing because of watchdog_disabled checks. It is also
> 
> I guess I am missing something because I don't see the watchdog_disabled
> checks getting in the way. 

You are not missing anything. I should read more carefully.

> If the watchdog is already enabled, then the disable/enable approach
> should work correctly, no?  The watchdog_disabled only blocks from
> re-disabling or re-enabling the watchdog IIUC.

Yes, you are right.
 
> > much more heavy than necessary. I hope we can get a perf API which just
> > update the perf event period directly and we do not have to call even
> > watchdog_nmi_{en,dis}able
> 
> Well the reason for the heaviness is, even with the proposed API you
> suggest, there is a problem with the softlockup's hrtimer.  If you set the
> watchdog_thresh high like say 60 seconds, then the hrtimer fires every 12
> seconds or so (while the nmi watchdog fires every 60 seconds or so).

True. I was so focused on the particular bug I was dealing with that I
didn't consider the other direction. My bad and thanks for pointing it
out.
You do not even have to play with setting the value high and then
low. It is sufficient to set the value lower than 4 (from the default
10) and you have the same problem.

> Now lets adjust the theshold down to 10 seconds.  The current code doesn't
> reset the hrtimer to the new value until it goes off again (which at worse
> case is 12 seconds).  The API you propose would allow the nmi_watchdog to
> fire in 10 seconds.  As a result you would have a potential to cause a
> panic on a false lockup.
> 
> So if you are going to adjust the nmi_watchdog, you have to adjust the
> softlockup value too.  Hence the heavy-ness.
> 
> Unless you have another solution for the above problem?

We can cancel and restart the hrtimer, I guess. I will look into it on
Monday.

Thanks!
-- 
Michal Hocko
SUSE Labs


* [RFC -v2 2/2] watchdog: update watchdog_tresh properly
  2013-07-19  9:04 ` [RFC 2/2] watchdog: update watchdog_tresh properly Michal Hocko
  2013-07-19 16:08   ` Don Zickus
@ 2013-07-22 11:45   ` Michal Hocko
  2013-07-22 12:47     ` Michal Hocko
  2013-07-22 14:32     ` [RFC -v3 " Michal Hocko
  1 sibling, 2 replies; 16+ messages in thread
From: Michal Hocko @ 2013-07-22 11:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Frederic Weisbecker, Don Zickus, Thomas Gleixner,
	Ingo Molnar

watchdog_thresh controls how often the nmi perf event counter checks
the per-cpu hrtimer_interrupts counter and blows up if the counter
hasn't changed since the last check. The counter is updated by the
per-cpu watchdog_hrtimer hrtimer, which is scheduled with a period of
2/5 of watchdog_thresh, guaranteeing that the hrtimer fires at least
twice per main period. Both the hrtimer and the perf event are started
together when the watchdog is enabled.

So far so good. But...

But what happens when watchdog_thresh is updated from sysctl handler?

proc_dowatchdog will set a new sampling period and hrtimer callback
(watchdog_timer_fn) will use the new value in the next round.
The problem, however, is that nobody tells the perf event that the
sampling period has changed so it is ticking with the period configured
when it has been set up.

This might result in an ear-ripping dissonance between the perf and
hrtimer parts if watchdog_thresh is increased. And even worse, it might
lead to KABOOM if the watchdog is configured to panic on such a
spurious lockup.

This patch fixes the issue by updating both the nmi perf event counter
and the hrtimers if the threshold value has changed.

The nmi one is disabled and then reinitialized from scratch. This has
the unpleasant side effect that the allocation of the new event might
theoretically fail, so the hard lockup detector would be disabled for
such cpus. On the other hand, such a memory allocation failure is very
unlikely because the original event is deallocated right before.
It would be much nicer if we just changed the perf event period, but
there doesn't seem to be any API to do that right now.
It is also unfortunate that perf_event_alloc uses a GFP_KERNEL
allocation unconditionally, so we cannot use on_each_cpu() and do the
same thing from the per-cpu context. The update from the current CPU
should be safe because perf_event_disable removes the event atomically
before it clears the per-cpu watchdog_ev, so it cannot change anything
under the running handler's feet.

The hrtimer is simply restarted (thanks to Don Zickus who pointed this
out) because we cannot rely on it firing and adapting to the new
sampling period before a new nmi event triggers (when the threshold is
decreased).

Changes since v1
- restart the hrtimer to ensure that it doesn't mess up the new nmi
  period, as pointed out by Don Zickus

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 kernel/watchdog.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 2d64c02..7108d3d 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -486,7 +486,50 @@ static struct smp_hotplug_thread watchdog_threads = {
 	.unpark			= watchdog_enable,
 };
 
-static int watchdog_enable_all_cpus(void)
+static void restart_watchdog_hrtimer(void *info)
+{
+	struct hrtimer *hrtimer = &__raw_get_cpu_var(watchdog_hrtimer);
+
+	hrtimer_restart(hrtimer);
+}
+
+static void update_timers(int cpu)
+{
+	/*
+	 * Make sure that the perf event counter will adapt to a new
+	 * sampling period. Updating the sampling period directly would
+	 * be much nicer but we do not have an API for that now so
+	 * let's use a big hammer.
+	 * Hrtimer will adopt the new period on the next tick but this
+	 * might be late already so we have to restart the timer as well.
+	 */
+	watchdog_nmi_disable(cpu);
+	if (cpu == smp_processor_id()) {
+		struct hrtimer *hrtimer = &__raw_get_cpu_var(watchdog_hrtimer);
+
+		hrtimer_restart(hrtimer);
+	} else {
+		struct call_single_data data = {
+			.func = restart_watchdog_hrtimer};
+
+		__smp_call_function_single(cpu, &data, 1);
+	}
+	watchdog_nmi_enable(cpu);
+}
+
+static void update_timers_all_cpus(void)
+{
+	int cpu;
+
+	get_online_cpus();
+	preempt_disable();
+	for_each_online_cpu(cpu)
+		update_timers(cpu);
+	preempt_enable();
+	put_online_cpus();
+}
+
+static int watchdog_enable_all_cpus(bool sample_period_changed)
 {
 	int err = 0;
 
@@ -496,6 +539,8 @@ static int watchdog_enable_all_cpus(void)
 			pr_err("Failed to create watchdog threads, disabled\n");
 		else
 			watchdog_running = 1;
+	} else if (sample_period_changed) {
+		update_timers_all_cpus();
 	}
 
 	return err;
@@ -537,7 +582,7 @@ int proc_dowatchdog(struct ctl_table *table, int write,
 	 * watchdog_*_all_cpus() function takes care of this.
 	 */
 	if (watchdog_user_enabled && watchdog_thresh)
-		err = watchdog_enable_all_cpus();
+		err = watchdog_enable_all_cpus(old_thresh != watchdog_thresh);
 	else
 		watchdog_disable_all_cpus();
 
@@ -565,5 +610,5 @@ void __init lockup_detector_init(void)
 #endif
 
 	if (watchdog_user_enabled)
-		watchdog_enable_all_cpus();
+		watchdog_enable_all_cpus(false);
 }
-- 
1.8.3.2



* Re: [RFC -v2 2/2] watchdog: update watchdog_tresh properly
  2013-07-22 11:45   ` [RFC -v2 " Michal Hocko
@ 2013-07-22 12:47     ` Michal Hocko
  2013-07-22 14:32     ` [RFC -v3 " Michal Hocko
  1 sibling, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2013-07-22 12:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Frederic Weisbecker, Don Zickus, Thomas Gleixner,
	Ingo Molnar

On Mon 22-07-13 13:45:02, Michal Hocko wrote:
[...]
> @@ -486,7 +486,50 @@ static struct smp_hotplug_thread watchdog_threads = {
>  	.unpark			= watchdog_enable,
>  };
>  
> -static int watchdog_enable_all_cpus(void)
> +static void restart_watchdog_hrtimer(void *info)
> +{
> +	struct hrtimer *hrtimer = &__raw_get_cpu_var(watchdog_hrtimer);
> +
> +	hrtimer_restart(hrtimer);

Ohh, scratch that. This is not sufficient because the new sampling
period is not considered. I will repost an updated version soon.

-- 
Michal Hocko
SUSE Labs


* [RFC -v3 2/2] watchdog: update watchdog_tresh properly
  2013-07-22 11:45   ` [RFC -v2 " Michal Hocko
  2013-07-22 12:47     ` Michal Hocko
@ 2013-07-22 14:32     ` Michal Hocko
  2013-07-23 13:53       ` Don Zickus
  1 sibling, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2013-07-22 14:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Frederic Weisbecker, Don Zickus, Thomas Gleixner,
	Ingo Molnar

watchdog_thresh controls how often the nmi perf event counter checks
the per-cpu hrtimer_interrupts counter and blows up if the counter
hasn't changed since the last check. The counter is updated by the
per-cpu watchdog_hrtimer hrtimer, which is scheduled with a period of
2/5 of watchdog_thresh, guaranteeing that the hrtimer fires at least
twice per main period. Both the hrtimer and the perf event are started
together when the watchdog is enabled.

So far so good. But...

But what happens when watchdog_thresh is updated from sysctl handler?

proc_dowatchdog will set a new sampling period and hrtimer callback
(watchdog_timer_fn) will use the new value in the next round.
The problem, however, is that nobody tells the perf event that the
sampling period has changed so it is ticking with the period configured
when it has been set up.

This might result in an ear-ripping dissonance between the perf and
hrtimer parts if watchdog_thresh is increased. And even worse, it might
lead to KABOOM if the watchdog is configured to panic on such a
spurious lockup.

This patch fixes the issue by updating both the nmi perf event counter
and the hrtimers if the threshold value has changed.

The nmi one is disabled and then reinitialized from scratch. This has
the unpleasant side effect that the allocation of the new event might
theoretically fail, so the hard lockup detector would be disabled for
such cpus. On the other hand, such a memory allocation failure is very
unlikely because the original event is deallocated right before.
It would be much nicer if we just changed the perf event period, but
there doesn't seem to be any API to do that right now.
It is also unfortunate that perf_event_alloc uses a GFP_KERNEL
allocation unconditionally, so we cannot use on_each_cpu() and do the
same thing from the per-cpu context. The update from the current CPU
should be safe because perf_event_disable removes the event atomically
before it clears the per-cpu watchdog_ev, so it cannot change anything
under the running handler's feet.

The hrtimer is simply restarted (thanks to Don Zickus who pointed this
out) if it is queued, because we cannot rely on it firing and adapting
to the new sampling period before a new nmi event triggers (when the
threshold is decreased).

Changes since v1
- restart the hrtimer to ensure that it doesn't mess up the new nmi
  period, as pointed out by Don Zickus

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 kernel/watchdog.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 50 insertions(+), 3 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 2d64c02..eb4ebb5 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -486,7 +486,52 @@ static struct smp_hotplug_thread watchdog_threads = {
 	.unpark			= watchdog_enable,
 };
 
-static int watchdog_enable_all_cpus(void)
+static void restart_watchdog_hrtimer(void *info)
+{
+	struct hrtimer *hrtimer = &__raw_get_cpu_var(watchdog_hrtimer);
+	int ret;
+
+	/*
+	 * No need to cancel and restart hrtimer if it is currently executing
+	 * because it will reprogram itself with the new period now.
+	 * We should never see it unqueued here because we are running per-cpu
+	 * with interrupts disabled.
+	 */
+	ret = hrtimer_try_to_cancel(hrtimer);
+	if (ret == 1)
+		hrtimer_start(hrtimer, ns_to_ktime(sample_period),
+				HRTIMER_MODE_REL_PINNED);
+}
+
+static void update_timers(int cpu)
+{
+	struct call_single_data data = {.func = restart_watchdog_hrtimer};
+	/*
+	 * Make sure that the perf event counter will adapt to a new
+	 * sampling period. Updating the sampling period directly would
+	 * be much nicer but we do not have an API for that now so
+	 * let's use a big hammer.
+	 * Hrtimer will adopt the new period on the next tick but this
+	 * might be late already so we have to restart the timer as well.
+	 */
+	watchdog_nmi_disable(cpu);
+	__smp_call_function_single(cpu, &data, 1);
+	watchdog_nmi_enable(cpu);
+}
+
+static void update_timers_all_cpus(void)
+{
+	int cpu;
+
+	get_online_cpus();
+	preempt_disable();
+	for_each_online_cpu(cpu)
+		update_timers(cpu);
+	preempt_enable();
+	put_online_cpus();
+}
+
+static int watchdog_enable_all_cpus(bool sample_period_changed)
 {
 	int err = 0;
 
@@ -496,6 +541,8 @@ static int watchdog_enable_all_cpus(void)
 			pr_err("Failed to create watchdog threads, disabled\n");
 		else
 			watchdog_running = 1;
+	} else if (sample_period_changed) {
+		update_timers_all_cpus();
 	}
 
 	return err;
@@ -537,7 +584,7 @@ int proc_dowatchdog(struct ctl_table *table, int write,
 	 * watchdog_*_all_cpus() function takes care of this.
 	 */
 	if (watchdog_user_enabled && watchdog_thresh)
-		err = watchdog_enable_all_cpus();
+		err = watchdog_enable_all_cpus(old_thresh != watchdog_thresh);
 	else
 		watchdog_disable_all_cpus();
 
@@ -565,5 +612,5 @@ void __init lockup_detector_init(void)
 #endif
 
 	if (watchdog_user_enabled)
-		watchdog_enable_all_cpus();
+		watchdog_enable_all_cpus(false);
 }
-- 
1.8.3.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC -v3 2/2] watchdog: update watchdog_tresh properly
  2013-07-22 14:32     ` [RFC -v3 " Michal Hocko
@ 2013-07-23 13:53       ` Don Zickus
  2013-07-23 14:07         ` Michal Hocko
  0 siblings, 1 reply; 16+ messages in thread
From: Don Zickus @ 2013-07-23 13:53 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, Andrew Morton, Frederic Weisbecker,
	Thomas Gleixner, Ingo Molnar

On Mon, Jul 22, 2013 at 04:32:46PM +0200, Michal Hocko wrote:
> The nmi one is disabled and then reinitialized from scratch. This
> has an unpleasant side effect that the allocation of the new event might
> fail theoretically so the hard lockup detector would be disabled for
> such cpus. On the other hand such a memory allocation failure is very
> unlikely because the original event is deallocated right before.
> It would be much nicer if we just changed perf event period but there
> doesn't seem to be any API to do that right now.
> It is also unfortunate that perf_event_alloc uses GFP_KERNEL allocation
> unconditionally so we cannot use on_each_cpu() and do the same thing
> from the per-cpu context. The update from the current CPU should be
> safe because perf_event_disable removes the event atomically before
> it clears the per-cpu watchdog_ev, so it cannot change anything under
> the running handler's feet.

I guess I don't have a problem with this.  I was hoping to have more
shared code with the regular stop/start routines but with the pmu bit
locking (to share pmus with oprofile), you really need to unregister
everything to stop the lockup detector.  This makes it a little too heavy
for a restart routine like this.

The only odd thing is I can't figure out which version you were using to
apply this patch.  I can't find old_thresh (though I understand the idea
of it).

Cheers,
Don

> 
> The hrtimer is simply restarted if it is queued (thanks to Don Zickus
> for pointing this out), because we cannot rely on it firing and adapting
> to the new sampling period before a new NMI event triggers (when the
> threshold is decreased).
> 
> Changes since v1
> - restart the hrtimer to ensure it picks up the new period before the
>   next NMI triggers, as pointed out by Don Zickus
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> ---
>  kernel/watchdog.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 50 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 2d64c02..eb4ebb5 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -486,7 +486,52 @@ static struct smp_hotplug_thread watchdog_threads = {
>  	.unpark			= watchdog_enable,
>  };
>  
> -static int watchdog_enable_all_cpus(void)
> +static void restart_watchdog_hrtimer(void *info)
> +{
> +	struct hrtimer *hrtimer = &__raw_get_cpu_var(watchdog_hrtimer);
> +	int ret;
> +
> +	/*
> +	 * No need to cancel and restart hrtimer if it is currently executing
> +	 * because it will reprogram itself with the new period now.
> +	 * We should never see it unqueued here because we are running per-cpu
> +	 * with interrupts disabled.
> +	 */
> +	ret = hrtimer_try_to_cancel(hrtimer);
> +	if (ret == 1)
> +		hrtimer_start(hrtimer, ns_to_ktime(sample_period),
> +				HRTIMER_MODE_REL_PINNED);
> +}
> +
> +static void update_timers(int cpu)
> +{
> +	struct call_single_data data = {.func = restart_watchdog_hrtimer};
> +	/*
> +	 * Make sure that the perf event counter adopts the new
> +	 * sampling period. Updating the sampling period directly would
> +	 * be much nicer, but we do not have an API for that now, so
> +	 * let's use a big hammer.
> +	 * The hrtimer adopts the new period on its next tick, but that
> +	 * might already be too late, so we have to restart the timer as well.
> +	 */
> +	watchdog_nmi_disable(cpu);
> +	__smp_call_function_single(cpu, &data, 1);
> +	watchdog_nmi_enable(cpu);
> +}
> +
> +static void update_timers_all_cpus(void)
> +{
> +	int cpu;
> +
> +	get_online_cpus();
> +	preempt_disable();
> +	for_each_online_cpu(cpu)
> +		update_timers(cpu);
> +	preempt_enable();
> +	put_online_cpus();
> +}
> +
> +static int watchdog_enable_all_cpus(bool sample_period_changed)
>  {
>  	int err = 0;
>  
> @@ -496,6 +541,8 @@ static int watchdog_enable_all_cpus(void)
>  			pr_err("Failed to create watchdog threads, disabled\n");
>  		else
>  			watchdog_running = 1;
> +	} else if (sample_period_changed) {
> +		update_timers_all_cpus();
>  	}
>  
>  	return err;
> @@ -537,7 +584,7 @@ int proc_dowatchdog(struct ctl_table *table, int write,
>  	 * watchdog_*_all_cpus() function takes care of this.
>  	 */
>  	if (watchdog_user_enabled && watchdog_thresh)
> -		err = watchdog_enable_all_cpus();
> +		err = watchdog_enable_all_cpus(old_thresh != watchdog_thresh);
>  	else
>  		watchdog_disable_all_cpus();
>  
> @@ -565,5 +612,5 @@ void __init lockup_detector_init(void)
>  #endif
>  
>  	if (watchdog_user_enabled)
> -		watchdog_enable_all_cpus();
> +		watchdog_enable_all_cpus(false);
>  }
> -- 
> 1.8.3.2
> 


* Re: [RFC 1/2] watchdog: update watchdog attributes atomically
  2013-07-19 16:33   ` Michal Hocko
@ 2013-07-23 13:56     ` Don Zickus
  0 siblings, 0 replies; 16+ messages in thread
From: Don Zickus @ 2013-07-23 13:56 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, Andrew Morton, Frederic Weisbecker,
	Thomas Gleixner, Ingo Molnar

On Fri, Jul 19, 2013 at 06:33:23PM +0200, Michal Hocko wrote:
> On Fri 19-07-13 12:10:48, Don Zickus wrote:
> > On Fri, Jul 19, 2013 at 11:04:58AM +0200, Michal Hocko wrote:
> > > proc_dowatchdog doesn't synchronize multiple callers which
> > > might lead to confusion when two parallel callers might confuse
> > > watchdog_enable_all_cpus resp. watchdog_disable_all_cpus (e.g. watchdog
> > > gets enabled even if watchdog_thresh was set to 0 already).
> > > 
> > > This patch adds a local mutex which synchronizes callers to the sysctl
> > > handler.
> > 
> > Looks fine by me, except one little nitpick..
> > 
> > > 
> > > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> > > ---
> > >  kernel/watchdog.c | 7 +++++--
> > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > > index 1241d8c..2d64c02 100644
> > > --- a/kernel/watchdog.c
> > > +++ b/kernel/watchdog.c
> > > @@ -520,13 +520,15 @@ int proc_dowatchdog(struct ctl_table *table, int write,
> > >  		    void __user *buffer, size_t *lenp, loff_t *ppos)
> > >  {
> > >  	int err, old_thresh, old_enabled;
> > > +	static DEFINE_MUTEX(watchdog_proc_mutex);
> > 
> > Should we just make this global instead of hiding it as a static inside a
> > function.  I don't know the kernel rules for deciding which approach makes
> > sense.  I know it is the same result in either case...
> 
> I've hidden it inside the function to discourage abusing it for
> something else, and because the usage is nicely contained in this
> function. But I have no problem pulling it out.

I understand what you are saying.  I just remember in the past being asked
not to do that, which is why I had the question.  I guess we can leave it
as is.

Cheers,
Don


* Re: [RFC -v3 2/2] watchdog: update watchdog_tresh properly
  2013-07-23 13:53       ` Don Zickus
@ 2013-07-23 14:07         ` Michal Hocko
  2013-07-23 14:44           ` Don Zickus
  0 siblings, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2013-07-23 14:07 UTC (permalink / raw)
  To: Don Zickus
  Cc: linux-kernel, Andrew Morton, Frederic Weisbecker,
	Thomas Gleixner, Ingo Molnar

On Tue 23-07-13 09:53:34, Don Zickus wrote:
> On Mon, Jul 22, 2013 at 04:32:46PM +0200, Michal Hocko wrote:
> > The nmi one is disabled and then reinitialized from scratch. This
> > has an unpleasant side effect that the allocation of the new event might
> > fail theoretically so the hard lockup detector would be disabled for
> > such cpus. On the other hand such a memory allocation failure is very
> > unlikely because the original event is deallocated right before.
> > It would be much nicer if we just changed perf event period but there
> > doesn't seem to be any API to do that right now.
> > It is also unfortunate that perf_event_alloc uses GFP_KERNEL allocation
> > unconditionally so we cannot use on_each_cpu() and do the same thing
> > from the per-cpu context. The update from the current CPU should be
> > safe because perf_event_disable removes the event atomically before
> > it clears the per-cpu watchdog_ev, so it cannot change anything under
> > the running handler's feet.
> 
> I guess I don't have a problem with this.  I was hoping to have more
> shared code with the regular stop/start routines but with the pmu bit
> locking (to share pmus with oprofile), you really need to unregister
> everything to stop the lockup detector.  This makes it a little too heavy
> for a restart routine like this.

I am not sure I understand the above. Regular stop/start covers all
the machinery; I have tried to reduce the restart to a bare minimum.
Do you find the current version heavier than the full disable_all &&
enable_all?

> The only odd thing is I can't figure out which version you were using to
> apply this patch.  I can't find old_thresh (though I understand the idea
> of it).

current Linus tree (linux-next 20130723 has it as well, AFAICS)
 
> Cheers,
> Don
-- 
Michal Hocko
SUSE Labs


* Re: [RFC -v3 2/2] watchdog: update watchdog_tresh properly
  2013-07-23 14:07         ` Michal Hocko
@ 2013-07-23 14:44           ` Don Zickus
  2013-07-23 14:51             ` Michal Hocko
  0 siblings, 1 reply; 16+ messages in thread
From: Don Zickus @ 2013-07-23 14:44 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, Andrew Morton, Frederic Weisbecker,
	Thomas Gleixner, Ingo Molnar

On Tue, Jul 23, 2013 at 04:07:29PM +0200, Michal Hocko wrote:
> On Tue 23-07-13 09:53:34, Don Zickus wrote:
> > On Mon, Jul 22, 2013 at 04:32:46PM +0200, Michal Hocko wrote:
> > > The nmi one is disabled and then reinitialized from scratch. This
> > > has an unpleasant side effect that the allocation of the new event might
> > > fail theoretically so the hard lockup detector would be disabled for
> > > such cpus. On the other hand such a memory allocation failure is very
> > > unlikely because the original event is deallocated right before.
> > > It would be much nicer if we just changed perf event period but there
> > > doesn't seem to be any API to do that right now.
> > > It is also unfortunate that perf_event_alloc uses GFP_KERNEL allocation
> > > unconditionally so we cannot use on_each_cpu() and do the same thing
> > > from the per-cpu context. The update from the current CPU should be
> > > safe because perf_event_disable removes the event atomically before
> > > it clears the per-cpu watchdog_ev, so it cannot change anything under
> > > the running handler's feet.
> > 
> > I guess I don't have a problem with this.  I was hoping to have more
> > shared code with the regular stop/start routines but with the pmu bit
> > locking (to share pmus with oprofile), you really need to unregister
> > everything to stop the lockup detector.  This makes it a little too heavy
> > for a restart routine like this.
> 
> I am not sure I understand the above. Regular stop/start is about all
> the machinery, I have tried to reduce the restarting to bare minimum.
> Do you find the current version heavier than the full disable_all &&
> enable_all?

No, I find your restart mechanism lighter than full disable_all.  I would
love to have the lockup detector just disable itself on stop and re-enable
on start.  But because of oprofile, the lockup detector has to free up
its event on stop and recreate it on start, which kinda sucks.

Anyway it was just an aside.

> 
> > The only odd thing is I can't figure out which version you were using to
> > apply this patch.  I can't find old_thresh (though I understand the idea
> > of it).
> 
> current Linus tree (linux-next - 20130723 - has it as well AFAICS)

Ok. Thanks.  Ah, I see.  I forgot Frederic modified pieces there.  The
threading keeps changing.  I see why you took your approach.

Should be fine.

Acked-by: Don Zickus <dzickus@redhat.com>


* Re: [RFC -v3 2/2] watchdog: update watchdog_tresh properly
  2013-07-23 14:44           ` Don Zickus
@ 2013-07-23 14:51             ` Michal Hocko
  0 siblings, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2013-07-23 14:51 UTC (permalink / raw)
  To: Don Zickus
  Cc: linux-kernel, Andrew Morton, Frederic Weisbecker,
	Thomas Gleixner, Ingo Molnar

On Tue 23-07-13 10:44:08, Don Zickus wrote:
> On Tue, Jul 23, 2013 at 04:07:29PM +0200, Michal Hocko wrote:
> > On Tue 23-07-13 09:53:34, Don Zickus wrote:
> > > On Mon, Jul 22, 2013 at 04:32:46PM +0200, Michal Hocko wrote:
> > > > The nmi one is disabled and then reinitialized from scratch. This
> > > > has an unpleasant side effect that the allocation of the new event might
> > > > fail theoretically so the hard lockup detector would be disabled for
> > > > such cpus. On the other hand such a memory allocation failure is very
> > > > unlikely because the original event is deallocated right before.
> > > > It would be much nicer if we just changed perf event period but there
> > > > doesn't seem to be any API to do that right now.
> > > > It is also unfortunate that perf_event_alloc uses GFP_KERNEL allocation
> > > > unconditionally so we cannot use on_each_cpu() and do the same thing
> > > > from the per-cpu context. The update from the current CPU should be
> > > > safe because perf_event_disable removes the event atomically before
> > > > it clears the per-cpu watchdog_ev, so it cannot change anything under
> > > > the running handler's feet.
> > > 
> > > I guess I don't have a problem with this.  I was hoping to have more
> > > shared code with the regular stop/start routines but with the pmu bit
> > > locking (to share pmus with oprofile), you really need to unregister
> > > everything to stop the lockup detector.  This makes it a little too heavy
> > > for a restart routine like this.
> > 
> > I am not sure I understand the above. Regular stop/start is about all
> > the machinery, I have tried to reduce the restarting to bare minimum.
> > Do you find the current version heavier than the full disable_all &&
> > enable_all?
> 
> No, I find your restart mechanism lighter than full disable_all.  I would
> love to have the lockup detector just disable itself on stop and re-enable
> on start.  But because of oprofile, the lockup has to free up its event
> on stop and recreate it on start, which kinda sucks.
> 
> Anyway it was just an aside.

Ohh, I see.
 
> > > The only odd thing is I can't figure out which version you were using to
> > > apply this patch.  I can't find old_thresh (though I understand the idea
> > > of it).
> > 
> > current Linus tree (linux-next - 20130723 - has it as well AFAICS)
> 
> Ok. Thanks.  Ah, I see.  I forgot Frederic modified pieces there.  The
> threading keeps changing.  I see why you took your approach.
> 
> Should be fine.
> 
> Acked-by: Don Zickus <dzickus@redhat.com>

Thanks! I will repost this without RFC if nobody else objects.

-- 
Michal Hocko
SUSE Labs


end of thread, other threads:[~2013-07-23 14:51 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-19  9:04 [RFC 1/2] watchdog: update watchdog attributes atomically Michal Hocko
2013-07-19  9:04 ` [RFC 2/2] watchdog: update watchdog_tresh properly Michal Hocko
2013-07-19 16:08   ` Don Zickus
2013-07-19 16:37     ` Michal Hocko
2013-07-19 18:05       ` Don Zickus
2013-07-20  8:42         ` Michal Hocko
2013-07-22 11:45   ` [RFC -v2 " Michal Hocko
2013-07-22 12:47     ` Michal Hocko
2013-07-22 14:32     ` [RFC -v3 " Michal Hocko
2013-07-23 13:53       ` Don Zickus
2013-07-23 14:07         ` Michal Hocko
2013-07-23 14:44           ` Don Zickus
2013-07-23 14:51             ` Michal Hocko
2013-07-19 16:10 ` [RFC 1/2] watchdog: update watchdog attributes atomically Don Zickus
2013-07-19 16:33   ` Michal Hocko
2013-07-23 13:56     ` Don Zickus
