All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8] nohz: Random fixes
@ 2013-05-20 16:01 Frederic Weisbecker
  2013-05-20 16:01 ` [PATCH 1/8] nohz: Warn if the machine can not perform nohz_full Frederic Weisbecker
                   ` (7 more replies)
  0 siblings, 8 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2013-05-20 16:01 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Don Zickus, Ingo Molnar, Peter Zijlstra,
	Srivatsa S. Bhat, Steven Rostedt, Paul E. McKenney,
	Thomas Gleixner, Borislav Petkov, Li Zhong, Kevin Hilman,
	Marcelo Tosatti, Gleb Natapov, Andrew Morton, Mike Galbraith,
	H. Peter Anvin

Hi,

A few fixes for nohz against -rc1. I'll likely send the non-rfc
part to Ingo tomorrow, unless anybody has concerns.

Thanks.

---
Frederic Weisbecker (6):
  vtime: Use consistent clocks among nohz accounting
  watchdog: Boot-disable by default on full dynticks
  kvm: Move guest entry/exit APIs to context_tracking
  kthread: Enable parking requests from setup() and unpark() callbacks
  watchdog: Rename confusing state variable
  watchdog: Fix internal state with boot user disabled watchdog

Li Zhong (1):
  nohz: Fix notifier return val that enforce timekeeping

Steven Rostedt (1):
  nohz: Warn if the machine can not perform nohz_full

 include/linux/context_tracking.h |   35 +++++++++++++++++++++++++++++++++++
 include/linux/kvm_host.h         |   37 +------------------------------------
 include/linux/nmi.h              |    2 +-
 include/linux/vtime.h            |    4 ++--
 kernel/context_tracking.c        |    1 -
 kernel/sched/core.c              |    2 +-
 kernel/sched/cputime.c           |    6 +++---
 kernel/smpboot.c                 |    6 ++++++
 kernel/sysctl.c                  |    4 ++--
 kernel/time/tick-sched.c         |    7 ++++++-
 kernel/watchdog.c                |   34 ++++++++++++++++++++--------------
 11 files changed, 77 insertions(+), 61 deletions(-)

-- 
1.7.5.4


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/8] nohz: Warn if the machine can not perform nohz_full
  2013-05-20 16:01 [PATCH 0/8] nohz: Random fixes Frederic Weisbecker
@ 2013-05-20 16:01 ` Frederic Weisbecker
  2013-05-20 16:01 ` [PATCH 2/8] vtime: Use consistent clocks among nohz accounting Frederic Weisbecker
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2013-05-20 16:01 UTC (permalink / raw)
  To: LKML
  Cc: Steven Rostedt, Paul E. McKenney, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, H. Peter Anvin, Peter Zijlstra,
	Frederic Weisbecker

From: Steven Rostedt <rostedt@goodmis.org>

If the user configures NO_HZ_FULL and defines nohz_full=XXX on the
kernel command line, or enables NO_HZ_FULL_ALL, but nohz fails
due to the machine having a unstable clock, warn about it.

We do not want users thinking that they are getting the benefit
of nohz when their machine can not support it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/time/tick-sched.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index bc67d42..cfc798b 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -178,6 +178,11 @@ static bool can_stop_full_tick(void)
 	 */
 	if (!sched_clock_stable) {
 		trace_tick_stop(0, "unstable sched clock\n");
+		/*
+		 * Don't allow the user to think they can get
+		 * full NO_HZ with this machine.
+		 */
+		WARN_ONCE(1, "NO_HZ FULL will not work with unstable sched clock");
 		return false;
 	}
 #endif
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/8] vtime: Use consistent clocks among nohz accounting
  2013-05-20 16:01 [PATCH 0/8] nohz: Random fixes Frederic Weisbecker
  2013-05-20 16:01 ` [PATCH 1/8] nohz: Warn if the machine can not perform nohz_full Frederic Weisbecker
@ 2013-05-20 16:01 ` Frederic Weisbecker
  2013-06-03  9:47   ` Stefan Seyfried
  2013-05-20 16:01 ` [PATCH 3/8] watchdog: Boot-disable by default on full dynticks Frederic Weisbecker
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2013-05-20 16:01 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Mike Galbraith

While computing the cputime delta of dynticks CPUs,
we are mixing up clocks of differents natures:

* local_clock() which takes care of unstable clock
sources and fix these if needed.

* sched_clock() which is the weaker version of
local_clock(). It doesn't compute any fixup in case
of unstable source.

If the clock source is stable, those two clocks are the
same and we can safely compute the difference against
two random points.

Otherwise it results in random deltas as sched_clock()
can randomly drift away, back or forward, from local_clock().

As a consequence, some strange behaviour with unstable tsc
has been observed such as non progressing constant zero cputime.
(The 'top' command showing no load).

Fix this by only using local_clock(), or its irq safe/remote
equivalent, in vtime code.

Reported-by: Mike Galbraith <efault@gmx.de>
Suggested-by: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/vtime.h  |    4 ++--
 kernel/sched/core.c    |    2 +-
 kernel/sched/cputime.c |    6 +++---
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index 71a5782..b1dd2db 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -34,7 +34,7 @@ static inline void vtime_user_exit(struct task_struct *tsk)
 }
 extern void vtime_guest_enter(struct task_struct *tsk);
 extern void vtime_guest_exit(struct task_struct *tsk);
-extern void vtime_init_idle(struct task_struct *tsk);
+extern void vtime_init_idle(struct task_struct *tsk, int cpu);
 #else
 static inline void vtime_account_irq_exit(struct task_struct *tsk)
 {
@@ -45,7 +45,7 @@ static inline void vtime_user_enter(struct task_struct *tsk) { }
 static inline void vtime_user_exit(struct task_struct *tsk) { }
 static inline void vtime_guest_enter(struct task_struct *tsk) { }
 static inline void vtime_guest_exit(struct task_struct *tsk) { }
-static inline void vtime_init_idle(struct task_struct *tsk) { }
+static inline void vtime_init_idle(struct task_struct *tsk, int cpu) { }
 #endif
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 58453b8..e1a27f9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4745,7 +4745,7 @@ void __cpuinit init_idle(struct task_struct *idle, int cpu)
 	 */
 	idle->sched_class = &idle_sched_class;
 	ftrace_graph_init_idle_task(idle, cpu);
-	vtime_init_idle(idle);
+	vtime_init_idle(idle, cpu);
 #if defined(CONFIG_SMP)
 	sprintf(idle->comm, "%s/%d", INIT_TASK_COMM, cpu);
 #endif
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index cc2dc3ee..b5ccba2 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -747,17 +747,17 @@ void arch_vtime_task_switch(struct task_struct *prev)
 
 	write_seqlock(&current->vtime_seqlock);
 	current->vtime_snap_whence = VTIME_SYS;
-	current->vtime_snap = sched_clock();
+	current->vtime_snap = sched_clock_cpu(smp_processor_id());
 	write_sequnlock(&current->vtime_seqlock);
 }
 
-void vtime_init_idle(struct task_struct *t)
+void vtime_init_idle(struct task_struct *t, int cpu)
 {
 	unsigned long flags;
 
 	write_seqlock_irqsave(&t->vtime_seqlock, flags);
 	t->vtime_snap_whence = VTIME_SYS;
-	t->vtime_snap = sched_clock();
+	t->vtime_snap = sched_clock_cpu(cpu);
 	write_sequnlock_irqrestore(&t->vtime_seqlock, flags);
 }
 
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/8] watchdog: Boot-disable by default on full dynticks
  2013-05-20 16:01 [PATCH 0/8] nohz: Random fixes Frederic Weisbecker
  2013-05-20 16:01 ` [PATCH 1/8] nohz: Warn if the machine can not perform nohz_full Frederic Weisbecker
  2013-05-20 16:01 ` [PATCH 2/8] vtime: Use consistent clocks among nohz accounting Frederic Weisbecker
@ 2013-05-20 16:01 ` Frederic Weisbecker
  2013-05-20 17:52   ` Don Zickus
  2013-05-20 16:01 ` [PATCH 4/8] kvm: Move guest entry/exit APIs to context_tracking Frederic Weisbecker
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2013-05-20 16:01 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Li Zhong, Don Zickus

When the watchdog runs, it prevents the full dynticks
CPUs from stopping their tick because the hard lockup
detector uses perf events internally, which in turn
rely on the periodic tick.

Since this is a rather confusing behaviour that is not
easy to track down and identify for those who want to
test CONFIG_NO_HZ_FULL, let's default disable the
watchdog on boot time when full dynticks is enabled.

The user can still enable it later on runtime using
proc or sysctl.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Don Zickus <dzickus@redhat.com>
---
 kernel/watchdog.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 05039e3..7e1a021 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -543,6 +543,12 @@ static struct smp_hotplug_thread watchdog_threads = {
 
 void __init lockup_detector_init(void)
 {
+#ifdef CONFIG_NO_HZ_FULL
+	watchdog_enabled = 0;
+	watchdog_disabled = 1;
+	pr_warning("Disabled lockup detectors by default because of full dynticks\n");
+	pr_warning("You can overwrite that with 'sysctl -w kernel.watchdog=1'\n");
+#endif
 	set_sample_period();
 	if (smpboot_register_percpu_thread(&watchdog_threads)) {
 		pr_err("Failed to create watchdog threads, disabled\n");
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/8] kvm: Move guest entry/exit APIs to context_tracking
  2013-05-20 16:01 [PATCH 0/8] nohz: Random fixes Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2013-05-20 16:01 ` [PATCH 3/8] watchdog: Boot-disable by default on full dynticks Frederic Weisbecker
@ 2013-05-20 16:01 ` Frederic Weisbecker
  2013-05-20 16:01 ` [PATCH 5/8] nohz: Fix notifier return val that enforce timekeeping Frederic Weisbecker
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2013-05-20 16:01 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Kevin Hilman,
	Marcelo Tosatti, Gleb Natapov

The kvm_host.h header file doesn't handle well
inclusion when archs don't support KVM.

This results in build crashes for such archs when they
want to implement context tracking because this subsystem
includes kvm_host.h in order to implement the
guest_enter/exit APIs but it doesn't handle KVM off case.

To fix this, move the guest_enter()/guest_exit()
declarations and generic implementation to the context
tracking headers. These generic APIs actually belong to
this subsystem, besides other domains boundary tracking
like user_enter() et al.

KVM now properly becomes a user of this library, not the
other buggy way around.

Reported-by: Kevin Hilman <khilman@linaro.org>
Reviewed-by: Kevin Hilman <khilman@linaro.org>
Tested-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
---
 include/linux/context_tracking.h |   35 +++++++++++++++++++++++++++++++++++
 include/linux/kvm_host.h         |   37 +------------------------------------
 kernel/context_tracking.c        |    1 -
 3 files changed, 36 insertions(+), 37 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 365f4a6..fc09d7b 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -3,6 +3,7 @@
 
 #include <linux/sched.h>
 #include <linux/percpu.h>
+#include <linux/vtime.h>
 #include <asm/ptrace.h>
 
 struct context_tracking {
@@ -19,6 +20,26 @@ struct context_tracking {
 	} state;
 };
 
+static inline void __guest_enter(void)
+{
+	/*
+	 * This is running in ioctl context so we can avoid
+	 * the call to vtime_account() with its unnecessary idle check.
+	 */
+	vtime_account_system(current);
+	current->flags |= PF_VCPU;
+}
+
+static inline void __guest_exit(void)
+{
+	/*
+	 * This is running in ioctl context so we can avoid
+	 * the call to vtime_account() with its unnecessary idle check.
+	 */
+	vtime_account_system(current);
+	current->flags &= ~PF_VCPU;
+}
+
 #ifdef CONFIG_CONTEXT_TRACKING
 DECLARE_PER_CPU(struct context_tracking, context_tracking);
 
@@ -35,6 +56,9 @@ static inline bool context_tracking_active(void)
 extern void user_enter(void);
 extern void user_exit(void);
 
+extern void guest_enter(void);
+extern void guest_exit(void);
+
 static inline enum ctx_state exception_enter(void)
 {
 	enum ctx_state prev_ctx;
@@ -57,6 +81,17 @@ extern void context_tracking_task_switch(struct task_struct *prev,
 static inline bool context_tracking_in_user(void) { return false; }
 static inline void user_enter(void) { }
 static inline void user_exit(void) { }
+
+static inline void guest_enter(void)
+{
+	__guest_enter();
+}
+
+static inline void guest_exit(void)
+{
+	__guest_exit();
+}
+
 static inline enum ctx_state exception_enter(void) { return 0; }
 static inline void exception_exit(enum ctx_state prev_ctx) { }
 static inline void context_tracking_task_switch(struct task_struct *prev,
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f0eea07..8db53cf 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -23,6 +23,7 @@
 #include <linux/ratelimit.h>
 #include <linux/err.h>
 #include <linux/irqflags.h>
+#include <linux/context_tracking.h>
 #include <asm/signal.h>
 
 #include <linux/kvm.h>
@@ -760,42 +761,6 @@ static inline int kvm_iommu_unmap_guest(struct kvm *kvm)
 }
 #endif
 
-static inline void __guest_enter(void)
-{
-	/*
-	 * This is running in ioctl context so we can avoid
-	 * the call to vtime_account() with its unnecessary idle check.
-	 */
-	vtime_account_system(current);
-	current->flags |= PF_VCPU;
-}
-
-static inline void __guest_exit(void)
-{
-	/*
-	 * This is running in ioctl context so we can avoid
-	 * the call to vtime_account() with its unnecessary idle check.
-	 */
-	vtime_account_system(current);
-	current->flags &= ~PF_VCPU;
-}
-
-#ifdef CONFIG_CONTEXT_TRACKING
-extern void guest_enter(void);
-extern void guest_exit(void);
-
-#else /* !CONFIG_CONTEXT_TRACKING */
-static inline void guest_enter(void)
-{
-	__guest_enter();
-}
-
-static inline void guest_exit(void)
-{
-	__guest_exit();
-}
-#endif /* !CONFIG_CONTEXT_TRACKING */
-
 static inline void kvm_guest_enter(void)
 {
 	unsigned long flags;
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 65349f0..85bdde1 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -15,7 +15,6 @@
  */
 
 #include <linux/context_tracking.h>
-#include <linux/kvm_host.h>
 #include <linux/rcupdate.h>
 #include <linux/sched.h>
 #include <linux/hardirq.h>
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 5/8] nohz: Fix notifier return val that enforce timekeeping
  2013-05-20 16:01 [PATCH 0/8] nohz: Random fixes Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2013-05-20 16:01 ` [PATCH 4/8] kvm: Move guest entry/exit APIs to context_tracking Frederic Weisbecker
@ 2013-05-20 16:01 ` Frederic Weisbecker
  2013-05-20 16:01 ` [RFC PATCH 6/8] kthread: Enable parking requests from setup() and unpark() callbacks Frederic Weisbecker
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2013-05-20 16:01 UTC (permalink / raw)
  To: LKML
  Cc: Li Zhong, Steven Rostedt, Paul E. McKenney, Ingo Molnar,
	Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Frederic Weisbecker

From: Li Zhong <zhong@linux.vnet.ibm.com>

In tick_nohz_cpu_down_callback() if the cpu is the one handling
timekeeping , we must return something that stops the CPU_DOWN_PREPARE
notifiers and then start notify CPU_DOWN_FAILED on the already called
notifier call backs.

However traditional errno values are not handled by the notifier unless
these are encapsulated using errno_to_notifier().

Hence the current -EINVAL is misinterpreted and converted to junk after
notifier_to_errno(), leaving the notifier subsystem to random behaviour
such as eventually allowing the cpu to go down.

Fix this by using the standard NOTIFY_BAD instead.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/time/tick-sched.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index cfc798b..58d8d4d 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -311,7 +311,7 @@ static int __cpuinit tick_nohz_cpu_down_callback(struct notifier_block *nfb,
 		 * we can't safely shutdown that CPU.
 		 */
 		if (have_nohz_full_mask && tick_do_timer_cpu == cpu)
-			return -EINVAL;
+			return NOTIFY_BAD;
 		break;
 	}
 	return NOTIFY_OK;
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC PATCH 6/8] kthread: Enable parking requests from setup() and unpark() callbacks
  2013-05-20 16:01 [PATCH 0/8] nohz: Random fixes Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2013-05-20 16:01 ` [PATCH 5/8] nohz: Fix notifier return val that enforce timekeeping Frederic Weisbecker
@ 2013-05-20 16:01 ` Frederic Weisbecker
  2013-05-21  5:34   ` anish singh
  2013-05-21  6:59   ` Srivatsa S. Bhat
  2013-05-20 16:01 ` [RFC PATCH 7/8] watchdog: Rename confusing state variable Frederic Weisbecker
  2013-05-20 16:01 ` [RFC PATCH 8/8] watchdog: Fix internal state with boot user disabled watchdog Frederic Weisbecker
  7 siblings, 2 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2013-05-20 16:01 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Srivatsa S. Bhat, Steven Rostedt,
	Paul E. McKenney, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Borislav Petkov, Li Zhong, Don Zickus

When the watchdog code is boot-disabled by the user, for example
through the 'nmi_watchdog=0' boot option, the setup() callback of
the watchdog kthread requests to park the task, and that until the
user later re-enables the watchdog through sysctl or procfs.

However the parking request is not well handled when done from
the setup() callback. After ->setup() is called, the generic smpboot
kthread loop directly tries to call the thread function or wait
for some event if ->thread_should_run() is false.

In the case of the watchdog kthread, ->thread_should_run() returns
false and the kthread goes to sleep and wait for the watchdog timer
to wake it up. But the timer is not enabled since the user requested
to disable the watchdog. We want the kthread to park instead of waiting
for events that can't happen.

As a result, later unpark requests after sysctl write through
'sysctl -w kernel.watchdog=1' won't wake up/unpark the task as
expected, since it's not parked anyway, leaving the value modified
without triggering any action.

We could workaround some solution in the watchdog code like forcing
one pass to the thread function and immediately return to park.

But supporting parking requests from ->setup() or ->unpark()
callbacks look like proper way to implement cancellation in
general. So let's fix it that way.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Don Zickus <dzickus@redhat.com>
---
 kernel/smpboot.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/kernel/smpboot.c b/kernel/smpboot.c
index 02fc5c9..3394ed0 100644
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -151,6 +151,12 @@ static int smpboot_thread_fn(void *data)
 			break;
 		}
 
+		/* Check if setup or unpark actually want us to park */
+		if (kthread_should_stop() || kthread_should_park()) {
+			preempt_enable();
+			continue;
+		}
+
 		if (!ht->thread_should_run(td->cpu)) {
 			preempt_enable();
 			schedule();
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC PATCH 7/8] watchdog: Rename confusing state variable
  2013-05-20 16:01 [PATCH 0/8] nohz: Random fixes Frederic Weisbecker
                   ` (5 preceding siblings ...)
  2013-05-20 16:01 ` [RFC PATCH 6/8] kthread: Enable parking requests from setup() and unpark() callbacks Frederic Weisbecker
@ 2013-05-20 16:01 ` Frederic Weisbecker
  2013-05-20 17:53   ` Don Zickus
  2013-05-20 16:01 ` [RFC PATCH 8/8] watchdog: Fix internal state with boot user disabled watchdog Frederic Weisbecker
  7 siblings, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2013-05-20 16:01 UTC (permalink / raw)
  To: LKML; +Cc: Frederic Weisbecker, Ingo Molnar, Don Zickus, Peter Zijlstra

We have two very conflicting state variable names in the
watchdog:

* watchdog_enabled: This one reflects the user interface. It's
set to 1 by default and can be overriden with boot options
or sysctl/procfs interface.

* watchdog_disabled: This is the internal toggle state that
tells if watchdog threads, timers and NMI events are currently
running or not. This state depends on the user settings and
hardware support. It's a convenient state latch.

Now we really need to find clearer names because those
are just too confusing to encourage deep review.

watchdog_enabled now becomes watchdog_user_enabled to reflect
its purpose as an interface.

watchdog_disabled becomes watchdog_running to suggest its
role as a pure internal state.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 include/linux/nmi.h |    2 +-
 kernel/sysctl.c     |    4 ++--
 kernel/watchdog.c   |   32 ++++++++++++++++----------------
 3 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index db50840..6a45fb5 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -46,7 +46,7 @@ static inline bool trigger_all_cpu_backtrace(void)
 #ifdef CONFIG_LOCKUP_DETECTOR
 int hw_nmi_is_cpu_stuck(struct pt_regs *);
 u64 hw_nmi_get_sample_period(int watchdog_thresh);
-extern int watchdog_enabled;
+extern int watchdog_user_enabled;
 extern int watchdog_thresh;
 struct ctl_table;
 extern int proc_dowatchdog(struct ctl_table *, int ,
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 9edcf45..b080565 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -801,7 +801,7 @@ static struct ctl_table kern_table[] = {
 #if defined(CONFIG_LOCKUP_DETECTOR)
 	{
 		.procname       = "watchdog",
-		.data           = &watchdog_enabled,
+		.data           = &watchdog_user_enabled,
 		.maxlen         = sizeof (int),
 		.mode           = 0644,
 		.proc_handler   = proc_dowatchdog,
@@ -828,7 +828,7 @@ static struct ctl_table kern_table[] = {
 	},
 	{
 		.procname       = "nmi_watchdog",
-		.data           = &watchdog_enabled,
+		.data           = &watchdog_user_enabled,
 		.maxlen         = sizeof (int),
 		.mode           = 0644,
 		.proc_handler   = proc_dowatchdog,
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 7e1a021..10776fe 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -29,9 +29,9 @@
 #include <linux/kvm_para.h>
 #include <linux/perf_event.h>
 
-int watchdog_enabled = 1;
+int watchdog_user_enabled = 1;
 int __read_mostly watchdog_thresh = 10;
-static int __read_mostly watchdog_disabled;
+static int __read_mostly watchdog_running = 1;
 static u64 __read_mostly sample_period;
 
 static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
@@ -63,7 +63,7 @@ static int __init hardlockup_panic_setup(char *str)
 	else if (!strncmp(str, "nopanic", 7))
 		hardlockup_panic = 0;
 	else if (!strncmp(str, "0", 1))
-		watchdog_enabled = 0;
+		watchdog_user_enabled = 0;
 	return 1;
 }
 __setup("nmi_watchdog=", hardlockup_panic_setup);
@@ -82,7 +82,7 @@ __setup("softlockup_panic=", softlockup_panic_setup);
 
 static int __init nowatchdog_setup(char *str)
 {
-	watchdog_enabled = 0;
+	watchdog_user_enabled = 0;
 	return 1;
 }
 __setup("nowatchdog", nowatchdog_setup);
@@ -90,7 +90,7 @@ __setup("nowatchdog", nowatchdog_setup);
 /* deprecated */
 static int __init nosoftlockup_setup(char *str)
 {
-	watchdog_enabled = 0;
+	watchdog_user_enabled = 0;
 	return 1;
 }
 __setup("nosoftlockup", nosoftlockup_setup);
@@ -158,7 +158,7 @@ void touch_all_softlockup_watchdogs(void)
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 void touch_nmi_watchdog(void)
 {
-	if (watchdog_enabled) {
+	if (watchdog_user_enabled) {
 		unsigned cpu;
 
 		for_each_present_cpu(cpu) {
@@ -347,7 +347,7 @@ static void watchdog_enable(unsigned int cpu)
 	hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
 	hrtimer->function = watchdog_timer_fn;
 
-	if (!watchdog_enabled) {
+	if (!watchdog_user_enabled) {
 		kthread_park(current);
 		return;
 	}
@@ -482,8 +482,8 @@ static void watchdog_enable_all_cpus(void)
 {
 	unsigned int cpu;
 
-	if (watchdog_disabled) {
-		watchdog_disabled = 0;
+	if (!watchdog_running) {
+		watchdog_running = 1;
 		for_each_online_cpu(cpu)
 			kthread_unpark(per_cpu(softlockup_watchdog, cpu));
 	}
@@ -493,8 +493,8 @@ static void watchdog_disable_all_cpus(void)
 {
 	unsigned int cpu;
 
-	if (!watchdog_disabled) {
-		watchdog_disabled = 1;
+	if (watchdog_running) {
+		watchdog_running = 0;
 		for_each_online_cpu(cpu)
 			kthread_park(per_cpu(softlockup_watchdog, cpu));
 	}
@@ -509,7 +509,7 @@ int proc_dowatchdog(struct ctl_table *table, int write,
 {
 	int ret;
 
-	if (watchdog_disabled < 0)
+	if (watchdog_running < 0)
 		return -ENODEV;
 
 	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
@@ -522,7 +522,7 @@ int proc_dowatchdog(struct ctl_table *table, int write,
 	 * disabled. The 'watchdog_disabled' variable check in
 	 * watchdog_*_all_cpus() function takes care of this.
 	 */
-	if (watchdog_enabled && watchdog_thresh)
+	if (watchdog_user_enabled && watchdog_thresh)
 		watchdog_enable_all_cpus();
 	else
 		watchdog_disable_all_cpus();
@@ -544,14 +544,14 @@ static struct smp_hotplug_thread watchdog_threads = {
 void __init lockup_detector_init(void)
 {
 #ifdef CONFIG_NO_HZ_FULL
-	watchdog_enabled = 0;
-	watchdog_disabled = 1;
+	watchdog_user_enabled = 0;
+	watchdog_running = 0;
 	pr_warning("Disabled lockup detectors by default because of full dynticks\n");
 	pr_warning("You can overwrite that with 'sysctl -w kernel.watchdog=1'\n");
 #endif
 	set_sample_period();
 	if (smpboot_register_percpu_thread(&watchdog_threads)) {
 		pr_err("Failed to create watchdog threads, disabled\n");
-		watchdog_disabled = -ENODEV;
+		watchdog_running = -ENODEV;
 	}
 }
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC PATCH 8/8] watchdog: Fix internal state with boot user disabled watchdog
  2013-05-20 16:01 [PATCH 0/8] nohz: Random fixes Frederic Weisbecker
                   ` (6 preceding siblings ...)
  2013-05-20 16:01 ` [RFC PATCH 7/8] watchdog: Rename confusing state variable Frederic Weisbecker
@ 2013-05-20 16:01 ` Frederic Weisbecker
  2013-05-20 17:54   ` Don Zickus
  7 siblings, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2013-05-20 16:01 UTC (permalink / raw)
  To: LKML; +Cc: Frederic Weisbecker, Don Zickus, Ingo Molnar, Peter Zijlstra

When the watchdog is disabled through the 'nmi_watchdog=0'
boot parameter, the watchdog_running internal state isn't
cleared. Hence further enablements through sysctl/procfs
are ignored because the subsystem spuriously thinks it's
already running.

Initialize it properly on boot.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 kernel/watchdog.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 10776fe..3fa5df20 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -31,7 +31,7 @@
 
 int watchdog_user_enabled = 1;
 int __read_mostly watchdog_thresh = 10;
-static int __read_mostly watchdog_running = 1;
+static int __read_mostly watchdog_running;
 static u64 __read_mostly sample_period;
 
 static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
@@ -545,10 +545,10 @@ void __init lockup_detector_init(void)
 {
 #ifdef CONFIG_NO_HZ_FULL
 	watchdog_user_enabled = 0;
-	watchdog_running = 0;
 	pr_warning("Disabled lockup detectors by default because of full dynticks\n");
 	pr_warning("You can overwrite that with 'sysctl -w kernel.watchdog=1'\n");
 #endif
+	watchdog_running = watchdog_user_enabled;
 	set_sample_period();
 	if (smpboot_register_percpu_thread(&watchdog_threads)) {
 		pr_err("Failed to create watchdog threads, disabled\n");
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/8] watchdog: Boot-disable by default on full dynticks
  2013-05-20 16:01 ` [PATCH 3/8] watchdog: Boot-disable by default on full dynticks Frederic Weisbecker
@ 2013-05-20 17:52   ` Don Zickus
  2013-05-20 18:14     ` Frederic Weisbecker
  0 siblings, 1 reply; 25+ messages in thread
From: Don Zickus @ 2013-05-20 17:52 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Steven Rostedt, Paul E. McKenney, Ingo Molnar,
	Andrew Morton, Thomas Gleixner, Peter Zijlstra, Li Zhong

On Mon, May 20, 2013 at 06:01:51PM +0200, Frederic Weisbecker wrote:
> When the watchdog runs, it prevents the full dynticks
> CPUs from stopping their tick because the hard lockup
> detector uses perf events internally, which in turn
> rely on the periodic tick.
> 
> Since this is a rather confusing behaviour that is not
> easy to track down and identify for those who want to
> test CONFIG_NO_HZ_FULL, let's default disable the
> watchdog on boot time when full dynticks is enabled.
> 
> The user can still enable it later on runtime using
> proc or sysctl.

I thought Peter committed a patch to perf so that this isn't needed any
more?

Cheers,
Don

> 
> Reported-by: Steven Rostedt <rostedt@goodmis.org>
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Li Zhong <zhong@linux.vnet.ibm.com>
> Cc: Don Zickus <dzickus@redhat.com>
> ---
>  kernel/watchdog.c |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 05039e3..7e1a021 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -543,6 +543,12 @@ static struct smp_hotplug_thread watchdog_threads = {
>  
>  void __init lockup_detector_init(void)
>  {
> +#ifdef CONFIG_NO_HZ_FULL
> +	watchdog_enabled = 0;
> +	watchdog_disabled = 1;
> +	pr_warning("Disabled lockup detectors by default because of full dynticks\n");
> +	pr_warning("You can overwrite that with 'sysctl -w kernel.watchdog=1'\n");
> +#endif
>  	set_sample_period();
>  	if (smpboot_register_percpu_thread(&watchdog_threads)) {
>  		pr_err("Failed to create watchdog threads, disabled\n");
> -- 
> 1.7.5.4
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 7/8] watchdog: Rename confusing state variable
  2013-05-20 16:01 ` [RFC PATCH 7/8] watchdog: Rename confusing state variable Frederic Weisbecker
@ 2013-05-20 17:53   ` Don Zickus
  0 siblings, 0 replies; 25+ messages in thread
From: Don Zickus @ 2013-05-20 17:53 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: LKML, Ingo Molnar, Peter Zijlstra

On Mon, May 20, 2013 at 06:01:55PM +0200, Frederic Weisbecker wrote:
> We have two very conflicting state variable names in the
> watchdog:
> 
> * watchdog_enabled: This one reflects the user interface. It's
> set to 1 by default and can be overriden with boot options
> or sysctl/procfs interface.
> 
> * watchdog_disabled: This is the internal toggle state that
> tells if watchdog threads, timers and NMI events are currently
> running or not. This state depends on the user settings and
> hardware support. It's a convenient state latch.
> 
> Now we really need to find clearer names because those
> are just too confusing to encourage deep review.
> 
> watchdog_enabled now becomes watchdog_user_enabled to reflect
> its purpose as an interface.
> 
> watchdog_disabled becomes watchdog_running to suggest its
> role as a pure internal state.

Heh.  Thanks for the cleanup.

Acked-by: Don Zickus <dzickus@redhat.com>

> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Don Zickus <dzickus@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> ---
>  include/linux/nmi.h |    2 +-
>  kernel/sysctl.c     |    4 ++--
>  kernel/watchdog.c   |   32 ++++++++++++++++----------------
>  3 files changed, 19 insertions(+), 19 deletions(-)
> 
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index db50840..6a45fb5 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -46,7 +46,7 @@ static inline bool trigger_all_cpu_backtrace(void)
>  #ifdef CONFIG_LOCKUP_DETECTOR
>  int hw_nmi_is_cpu_stuck(struct pt_regs *);
>  u64 hw_nmi_get_sample_period(int watchdog_thresh);
> -extern int watchdog_enabled;
> +extern int watchdog_user_enabled;
>  extern int watchdog_thresh;
>  struct ctl_table;
>  extern int proc_dowatchdog(struct ctl_table *, int ,
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 9edcf45..b080565 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -801,7 +801,7 @@ static struct ctl_table kern_table[] = {
>  #if defined(CONFIG_LOCKUP_DETECTOR)
>  	{
>  		.procname       = "watchdog",
> -		.data           = &watchdog_enabled,
> +		.data           = &watchdog_user_enabled,
>  		.maxlen         = sizeof (int),
>  		.mode           = 0644,
>  		.proc_handler   = proc_dowatchdog,
> @@ -828,7 +828,7 @@ static struct ctl_table kern_table[] = {
>  	},
>  	{
>  		.procname       = "nmi_watchdog",
> -		.data           = &watchdog_enabled,
> +		.data           = &watchdog_user_enabled,
>  		.maxlen         = sizeof (int),
>  		.mode           = 0644,
>  		.proc_handler   = proc_dowatchdog,
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 7e1a021..10776fe 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -29,9 +29,9 @@
>  #include <linux/kvm_para.h>
>  #include <linux/perf_event.h>
>  
> -int watchdog_enabled = 1;
> +int watchdog_user_enabled = 1;
>  int __read_mostly watchdog_thresh = 10;
> -static int __read_mostly watchdog_disabled;
> +static int __read_mostly watchdog_running = 1;
>  static u64 __read_mostly sample_period;
>  
>  static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
> @@ -63,7 +63,7 @@ static int __init hardlockup_panic_setup(char *str)
>  	else if (!strncmp(str, "nopanic", 7))
>  		hardlockup_panic = 0;
>  	else if (!strncmp(str, "0", 1))
> -		watchdog_enabled = 0;
> +		watchdog_user_enabled = 0;
>  	return 1;
>  }
>  __setup("nmi_watchdog=", hardlockup_panic_setup);
> @@ -82,7 +82,7 @@ __setup("softlockup_panic=", softlockup_panic_setup);
>  
>  static int __init nowatchdog_setup(char *str)
>  {
> -	watchdog_enabled = 0;
> +	watchdog_user_enabled = 0;
>  	return 1;
>  }
>  __setup("nowatchdog", nowatchdog_setup);
> @@ -90,7 +90,7 @@ __setup("nowatchdog", nowatchdog_setup);
>  /* deprecated */
>  static int __init nosoftlockup_setup(char *str)
>  {
> -	watchdog_enabled = 0;
> +	watchdog_user_enabled = 0;
>  	return 1;
>  }
>  __setup("nosoftlockup", nosoftlockup_setup);
> @@ -158,7 +158,7 @@ void touch_all_softlockup_watchdogs(void)
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>  void touch_nmi_watchdog(void)
>  {
> -	if (watchdog_enabled) {
> +	if (watchdog_user_enabled) {
>  		unsigned cpu;
>  
>  		for_each_present_cpu(cpu) {
> @@ -347,7 +347,7 @@ static void watchdog_enable(unsigned int cpu)
>  	hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
>  	hrtimer->function = watchdog_timer_fn;
>  
> -	if (!watchdog_enabled) {
> +	if (!watchdog_user_enabled) {
>  		kthread_park(current);
>  		return;
>  	}
> @@ -482,8 +482,8 @@ static void watchdog_enable_all_cpus(void)
>  {
>  	unsigned int cpu;
>  
> -	if (watchdog_disabled) {
> -		watchdog_disabled = 0;
> +	if (!watchdog_running) {
> +		watchdog_running = 1;
>  		for_each_online_cpu(cpu)
>  			kthread_unpark(per_cpu(softlockup_watchdog, cpu));
>  	}
> @@ -493,8 +493,8 @@ static void watchdog_disable_all_cpus(void)
>  {
>  	unsigned int cpu;
>  
> -	if (!watchdog_disabled) {
> -		watchdog_disabled = 1;
> +	if (watchdog_running) {
> +		watchdog_running = 0;
>  		for_each_online_cpu(cpu)
>  			kthread_park(per_cpu(softlockup_watchdog, cpu));
>  	}
> @@ -509,7 +509,7 @@ int proc_dowatchdog(struct ctl_table *table, int write,
>  {
>  	int ret;
>  
> -	if (watchdog_disabled < 0)
> +	if (watchdog_running < 0)
>  		return -ENODEV;
>  
>  	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> @@ -522,7 +522,7 @@ int proc_dowatchdog(struct ctl_table *table, int write,
>  	 * disabled. The 'watchdog_disabled' variable check in
>  	 * watchdog_*_all_cpus() function takes care of this.
>  	 */
> -	if (watchdog_enabled && watchdog_thresh)
> +	if (watchdog_user_enabled && watchdog_thresh)
>  		watchdog_enable_all_cpus();
>  	else
>  		watchdog_disable_all_cpus();
> @@ -544,14 +544,14 @@ static struct smp_hotplug_thread watchdog_threads = {
>  void __init lockup_detector_init(void)
>  {
>  #ifdef CONFIG_NO_HZ_FULL
> -	watchdog_enabled = 0;
> -	watchdog_disabled = 1;
> +	watchdog_user_enabled = 0;
> +	watchdog_running = 0;
>  	pr_warning("Disabled lockup detectors by default because of full dynticks\n");
>  	pr_warning("You can overwrite that with 'sysctl -w kernel.watchdog=1'\n");
>  #endif
>  	set_sample_period();
>  	if (smpboot_register_percpu_thread(&watchdog_threads)) {
>  		pr_err("Failed to create watchdog threads, disabled\n");
> -		watchdog_disabled = -ENODEV;
> +		watchdog_running = -ENODEV;
>  	}
>  }
> -- 
> 1.7.5.4
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 8/8] watchdog: Fix internal state with boot user disabled watchdog
  2013-05-20 16:01 ` [RFC PATCH 8/8] watchdog: Fix internal state with boot user disabled watchdog Frederic Weisbecker
@ 2013-05-20 17:54   ` Don Zickus
  0 siblings, 0 replies; 25+ messages in thread
From: Don Zickus @ 2013-05-20 17:54 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: LKML, Ingo Molnar, Peter Zijlstra

On Mon, May 20, 2013 at 06:01:56PM +0200, Frederic Weisbecker wrote:
> When the watchdog is disabled through the 'nmi_watchdog=0'
> boot parameter, the watchdog_running internal state isn't
> cleared. Hence further enablements through sysctl/procfs
> are ignored because the subsystem spuriously thinks it's
> already running.
> 
> Initialize it properly on boot.

Looks fine to me.  Can't think of a case where this breaks something.

Acked-by: Don Zickus <dzickus@redhat.com>

> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Don Zickus <dzickus@redhat.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> ---
>  kernel/watchdog.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 10776fe..3fa5df20 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -31,7 +31,7 @@
>  
>  int watchdog_user_enabled = 1;
>  int __read_mostly watchdog_thresh = 10;
> -static int __read_mostly watchdog_running = 1;
> +static int __read_mostly watchdog_running;
>  static u64 __read_mostly sample_period;
>  
>  static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
> @@ -545,10 +545,10 @@ void __init lockup_detector_init(void)
>  {
>  #ifdef CONFIG_NO_HZ_FULL
>  	watchdog_user_enabled = 0;
> -	watchdog_running = 0;
>  	pr_warning("Disabled lockup detectors by default because of full dynticks\n");
>  	pr_warning("You can overwrite that with 'sysctl -w kernel.watchdog=1'\n");
>  #endif
> +	watchdog_running = watchdog_user_enabled;
>  	set_sample_period();
>  	if (smpboot_register_percpu_thread(&watchdog_threads)) {
>  		pr_err("Failed to create watchdog threads, disabled\n");
> -- 
> 1.7.5.4
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/8] watchdog: Boot-disable by default on full dynticks
  2013-05-20 17:52   ` Don Zickus
@ 2013-05-20 18:14     ` Frederic Weisbecker
  0 siblings, 0 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2013-05-20 18:14 UTC (permalink / raw)
  To: Don Zickus
  Cc: LKML, Steven Rostedt, Paul E. McKenney, Ingo Molnar,
	Andrew Morton, Thomas Gleixner, Peter Zijlstra, Li Zhong

On Mon, May 20, 2013 at 01:52:29PM -0400, Don Zickus wrote:
> On Mon, May 20, 2013 at 06:01:51PM +0200, Frederic Weisbecker wrote:
> > When the watchdog runs, it prevents the full dynticks
> > CPUs from stopping their tick because the hard lockup
> > detector uses perf events internally, which in turn
> > rely on the periodic tick.
> > 
> > Since this is a rather confusing behaviour that is not
> > easy to track down and identify for those who want to
> > test CONFIG_NO_HZ_FULL, let's default disable the
> > watchdog on boot time when full dynticks is enabled.
> > 
> > The user can still enable it later on runtime using
> > proc or sysctl.
> 
> I thought Peter committed a patch to perf so that this isn't needed any
> more?

There is the patchset that converts perf_event_task_tick() to an hrtimer.
But I'm not sure it covers everything. Also this will be for the next merge
window. And in the end this moves the problem elsewhere: we still have timer
interrupts, they just come as specific hrtimers, although may be less often,
I haven't looked yet at the patchset.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 6/8] kthread: Enable parking requests from setup() and unpark() callbacks
  2013-05-20 16:01 ` [RFC PATCH 6/8] kthread: Enable parking requests from setup() and unpark() callbacks Frederic Weisbecker
@ 2013-05-21  5:34   ` anish singh
  2013-05-21  7:49     ` Srivatsa S. Bhat
  2013-05-21  6:59   ` Srivatsa S. Bhat
  1 sibling, 1 reply; 25+ messages in thread
From: anish singh @ 2013-05-21  5:34 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Srivatsa S. Bhat, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Don Zickus

On Mon, May 20, 2013 at 9:31 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> When the watchdog code is boot-disabled by the user, for example
> through the 'nmi_watchdog=0' boot option, the setup() callback of
> the watchdog kthread requests to park the task, and that until the
> user later re-enables the watchdog through sysctl or procfs.
>
> However the parking request is not well handled when done from
> the setup() callback. After ->setup() is called, the generic smpboot
> kthread loop directly tries to call the thread function or wait
> for some event if ->thread_should_run() is false.
>
> In the case of the watchdog kthread, ->thread_should_run() returns
> false and the kthread goes to sleep and wait for the watchdog timer
> to wake it up. But the timer is not enabled since the user requested
> to disable the watchdog. We want the kthread to park instead of waiting
> for events that can't happen.
>
> As a result, later unpark requests after sysctl write through
> 'sysctl -w kernel.watchdog=1' won't wake up/unpark the task as
> expected, since it's not parked anyway, leaving the value modified
> without triggering any action.
Out of curiosity, this can happen only for short period of time right?After
some time this will work right as the thread get back in action
after the schedule call.Then sysctl and procfs will work I think.
>
> We could workaround some solution in the watchdog code like forcing
> one pass to the thread function and immediately return to park.
>
> But supporting parking requests from ->setup() or ->unpark()
> callbacks look like proper way to implement cancellation in
> general. So let's fix it that way.
>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Li Zhong <zhong@linux.vnet.ibm.com>
> Cc: Don Zickus <dzickus@redhat.com>
> ---
>  kernel/smpboot.c |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> index 02fc5c9..3394ed0 100644
> --- a/kernel/smpboot.c
> +++ b/kernel/smpboot.c
> @@ -151,6 +151,12 @@ static int smpboot_thread_fn(void *data)
>                         break;
>                 }
>
> +               /* Check if setup or unpark actually want us to park */
> +               if (kthread_should_stop() || kthread_should_park()) {
> +                       preempt_enable();
> +                       continue;
> +               }
> +
>                 if (!ht->thread_should_run(td->cpu)) {
>                         preempt_enable();
>                         schedule();
> --
> 1.7.5.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 6/8] kthread: Enable parking requests from setup() and unpark() callbacks
  2013-05-20 16:01 ` [RFC PATCH 6/8] kthread: Enable parking requests from setup() and unpark() callbacks Frederic Weisbecker
  2013-05-21  5:34   ` anish singh
@ 2013-05-21  6:59   ` Srivatsa S. Bhat
  2013-06-05 16:33     ` Frederic Weisbecker
  1 sibling, 1 reply; 25+ messages in thread
From: Srivatsa S. Bhat @ 2013-05-21  6:59 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Steven Rostedt, Paul E. McKenney, Ingo Molnar,
	Thomas Gleixner, Peter Zijlstra, Borislav Petkov, Li Zhong,
	Don Zickus

On 05/20/2013 09:31 PM, Frederic Weisbecker wrote:
> When the watchdog code is boot-disabled by the user, for example
> through the 'nmi_watchdog=0' boot option, the setup() callback of
> the watchdog kthread requests to park the task, and that until the
> user later re-enables the watchdog through sysctl or procfs.
> 
> However the parking request is not well handled when done from
> the setup() callback. After ->setup() is called, the generic smpboot
> kthread loop directly tries to call the thread function or wait
> for some event if ->thread_should_run() is false.
> 
> In the case of the watchdog kthread, ->thread_should_run() returns
> false and the kthread goes to sleep and wait for the watchdog timer
> to wake it up. But the timer is not enabled since the user requested
> to disable the watchdog. We want the kthread to park instead of waiting
> for events that can't happen.
> 
> As a result, later unpark requests after sysctl write through
> 'sysctl -w kernel.watchdog=1' won't wake up/unpark the task as
> expected, since it's not parked anyway, leaving the value modified
> without triggering any action.
> 
> We could workaround some solution in the watchdog code like forcing
> one pass to the thread function and immediately return to park.
> 
> But supporting parking requests from ->setup() or ->unpark()

unpark() can already do a proper park, because immediately after
coming out of the parked state, the 'continue' statement helps
re-evaluate the stop/park condition.

So this fix is only for the ->setup() case.

> callbacks look like proper way to implement cancellation in
> general. So let's fix it that way.
>

Sounds good to me.

Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

But I wonder what nmi_watchdog=0 should actually mean, semantically..
The current code works as if the user asked us not to run the watchdog
threads, but it could as well be interpreted as if the user does not
want to run *any* watchdog-related *code*. In that case, ideally we
should *unregister* the watchdog threads, instead of just parking them.
And when the user enables them again via sysctl/procfs, we should
register the watchdog threads with the smpboot infrastructure.

I'm not saying that the current semantics is wrong, but if we really
implement it the other way I proposed above, then we won't have to deal
with weird issues like ->setup() code wanting to park, and watchdog
threads unparking and parking themselves on every CPU hotplug operation,
despite the fact that the user specified nmi_watchdog=0 on the kernel
command line.

Regards,
Srivatsa S. Bhat
 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Li Zhong <zhong@linux.vnet.ibm.com>
> Cc: Don Zickus <dzickus@redhat.com>
> ---
>  kernel/smpboot.c |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> index 02fc5c9..3394ed0 100644
> --- a/kernel/smpboot.c
> +++ b/kernel/smpboot.c
> @@ -151,6 +151,12 @@ static int smpboot_thread_fn(void *data)
>  			break;
>  		}
> 
> +		/* Check if setup or unpark actually want us to park */
> +		if (kthread_should_stop() || kthread_should_park()) {
> +			preempt_enable();
> +			continue;
> +		}
> +
>  		if (!ht->thread_should_run(td->cpu)) {
>  			preempt_enable();
>  			schedule();
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 6/8] kthread: Enable parking requests from setup() and unpark() callbacks
  2013-05-21  5:34   ` anish singh
@ 2013-05-21  7:49     ` Srivatsa S. Bhat
  2013-05-21  8:58       ` anish singh
  0 siblings, 1 reply; 25+ messages in thread
From: Srivatsa S. Bhat @ 2013-05-21  7:49 UTC (permalink / raw)
  To: anish singh
  Cc: Frederic Weisbecker, LKML, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Don Zickus

On 05/21/2013 11:04 AM, anish singh wrote:
> On Mon, May 20, 2013 at 9:31 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
>> When the watchdog code is boot-disabled by the user, for example
>> through the 'nmi_watchdog=0' boot option, the setup() callback of
>> the watchdog kthread requests to park the task, and that until the
>> user later re-enables the watchdog through sysctl or procfs.
>>
>> However the parking request is not well handled when done from
>> the setup() callback. After ->setup() is called, the generic smpboot
>> kthread loop directly tries to call the thread function or wait
>> for some event if ->thread_should_run() is false.
>>
>> In the case of the watchdog kthread, ->thread_should_run() returns
>> false and the kthread goes to sleep and wait for the watchdog timer
>> to wake it up. But the timer is not enabled since the user requested
>> to disable the watchdog. We want the kthread to park instead of waiting
>> for events that can't happen.
>>
>> As a result, later unpark requests after sysctl write through
>> 'sysctl -w kernel.watchdog=1' won't wake up/unpark the task as
>> expected, since it's not parked anyway, leaving the value modified
>> without triggering any action.
> Out of curiosity, this can happen only for short period of time right?After
> some time this will work right as the thread get back in action
> after the schedule call.Then sysctl and procfs will work I think.

kthread_unpark() can wake up a task only if the task is in TASK_PARKED
state. But since the above task would be in TASK_INTERRUPTIBLE state 
(since it is not parked), kthread_unpark() will be powerless to do anything.

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 6/8] kthread: Enable parking requests from setup() and unpark() callbacks
  2013-05-21  7:49     ` Srivatsa S. Bhat
@ 2013-05-21  8:58       ` anish singh
  2013-05-21  9:07         ` Srivatsa S. Bhat
  2013-05-22 15:18         ` Frederic Weisbecker
  0 siblings, 2 replies; 25+ messages in thread
From: anish singh @ 2013-05-21  8:58 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Frederic Weisbecker, LKML, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Don Zickus

On Tue, May 21, 2013 at 1:19 PM, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> On 05/21/2013 11:04 AM, anish singh wrote:
>> On Mon, May 20, 2013 at 9:31 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
>>> When the watchdog code is boot-disabled by the user, for example
>>> through the 'nmi_watchdog=0' boot option, the setup() callback of
>>> the watchdog kthread requests to park the task, and that until the
>>> user later re-enables the watchdog through sysctl or procfs.
>>>
>>> However the parking request is not well handled when done from
>>> the setup() callback. After ->setup() is called, the generic smpboot
>>> kthread loop directly tries to call the thread function or wait
>>> for some event if ->thread_should_run() is false.
>>>
>>> In the case of the watchdog kthread, ->thread_should_run() returns
>>> false and the kthread goes to sleep and wait for the watchdog timer
>>> to wake it up. But the timer is not enabled since the user requested
>>> to disable the watchdog. We want the kthread to park instead of waiting
>>> for events that can't happen.
>>>
>>> As a result, later unpark requests after sysctl write through
>>> 'sysctl -w kernel.watchdog=1' won't wake up/unpark the task as
>>> expected, since it's not parked anyway, leaving the value modified
>>> without triggering any action.
>> Out of curiosity, this can happen only for short period of time right?After
>> some time this will work right as the thread get back in action
>> after the schedule call.Then sysctl and procfs will work I think.
>
> kthread_unpark() can wake up a task only if the task is in TASK_PARKED
> state. But since the above task would be in TASK_INTERRUPTIBLE state
> (since it is not parked), kthread_unpark() will be powerless to do anything.
Yes but this will happen only for a short period of time right?
After the schdule() call the below code gets executed in while() loop.

if (kthread_should_park()) {
__set_current_state(TASK_RUNNING);
preempt_enable();
if (ht->park && td->status == HP_THREAD_ACTIVE) {
BUG_ON(td->cpu != smp_processor_id());
ht->park(td->cpu);
td->status = HP_THREAD_PARKED;
}
kthread_parkme();
/* We might have been woken for stop */
continue;
}

As we have already called kthread_park this above if() condition gets true and
it will park the thread wouldn't it?But this will happen after the schedule
call which is not right as mentioned by fredrick.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 6/8] kthread: Enable parking requests from setup() and unpark() callbacks
  2013-05-21  8:58       ` anish singh
@ 2013-05-21  9:07         ` Srivatsa S. Bhat
  2013-05-22 15:18         ` Frederic Weisbecker
  1 sibling, 0 replies; 25+ messages in thread
From: Srivatsa S. Bhat @ 2013-05-21  9:07 UTC (permalink / raw)
  To: anish singh
  Cc: Frederic Weisbecker, LKML, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Don Zickus

On 05/21/2013 02:28 PM, anish singh wrote:
> On Tue, May 21, 2013 at 1:19 PM, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> On 05/21/2013 11:04 AM, anish singh wrote:
>>> On Mon, May 20, 2013 at 9:31 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
>>>> When the watchdog code is boot-disabled by the user, for example
>>>> through the 'nmi_watchdog=0' boot option, the setup() callback of
>>>> the watchdog kthread requests to park the task, and that until the
>>>> user later re-enables the watchdog through sysctl or procfs.
>>>>
>>>> However the parking request is not well handled when done from
>>>> the setup() callback. After ->setup() is called, the generic smpboot
>>>> kthread loop directly tries to call the thread function or wait
>>>> for some event if ->thread_should_run() is false.
>>>>
>>>> In the case of the watchdog kthread, ->thread_should_run() returns
>>>> false and the kthread goes to sleep and wait for the watchdog timer
>>>> to wake it up. But the timer is not enabled since the user requested
>>>> to disable the watchdog. We want the kthread to park instead of waiting
>>>> for events that can't happen.
>>>>
>>>> As a result, later unpark requests after sysctl write through
>>>> 'sysctl -w kernel.watchdog=1' won't wake up/unpark the task as
>>>> expected, since it's not parked anyway, leaving the value modified
>>>> without triggering any action.
>>> Out of curiosity, this can happen only for short period of time right?After
>>> some time this will work right as the thread get back in action
>>> after the schedule call.Then sysctl and procfs will work I think.
>>
>> kthread_unpark() can wake up a task only if the task is in TASK_PARKED
>> state. But since the above task would be in TASK_INTERRUPTIBLE state
>> (since it is not parked), kthread_unpark() will be powerless to do anything.
> Yes but this will happen only for a short period of time right?
> After the schdule() call the below code gets executed in while() loop.
> 

schedule() is not just another function call from which you'll simply
return "after a short period of time". We have set the task's state to
TASK_INTERRUPTIBLE at the beginning of the loop. So the call to schedule()
will kick the task out of the runqueue, since it is not runnable any more.

You need somebody to do a wake_up_process() or something similar to make
the task runnable again, after which it gets back to the runqueue and gets
a chance to run. But the only one who can do wake_up_process() for this
watchdog task in this case, is the kthread_unpark() code, which needs the
task to be in TASK_PARKED state, not TASK_INTERRUPTIBLE. So the call to
wake_up_state() returns without doing anything, leaving the watchdog task
non-runnable. This is what Frederic referred to, when he said that
enabling nmi watchdog via sysctl/procfs won't do anything.

> if (kthread_should_park()) {
> __set_current_state(TASK_RUNNING);
> preempt_enable();
> if (ht->park && td->status == HP_THREAD_ACTIVE) {
> BUG_ON(td->cpu != smp_processor_id());
> ht->park(td->cpu);
> td->status = HP_THREAD_PARKED;
> }
> kthread_parkme();
> /* We might have been woken for stop */
> continue;
> }
> 
> As we have already called kthread_park this above if() condition gets true and
> it will park the thread wouldn't it?But this will happen after the schedule
> call which is not right as mentioned by fredrick.
> 

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 6/8] kthread: Enable parking requests from setup() and unpark() callbacks
  2013-05-21  8:58       ` anish singh
  2013-05-21  9:07         ` Srivatsa S. Bhat
@ 2013-05-22 15:18         ` Frederic Weisbecker
  1 sibling, 0 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2013-05-22 15:18 UTC (permalink / raw)
  To: anish singh
  Cc: Srivatsa S. Bhat, LKML, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Don Zickus

2013/5/21, anish singh <anish198519851985@gmail.com>:
> On Tue, May 21, 2013 at 1:19 PM, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> On 05/21/2013 11:04 AM, anish singh wrote:
>>> On Mon, May 20, 2013 at 9:31 PM, Frederic Weisbecker <fweisbec@gmail.com>
>>> wrote:
>>>> When the watchdog code is boot-disabled by the user, for example
>>>> through the 'nmi_watchdog=0' boot option, the setup() callback of
>>>> the watchdog kthread requests to park the task, and that until the
>>>> user later re-enables the watchdog through sysctl or procfs.
>>>>
>>>> However the parking request is not well handled when done from
>>>> the setup() callback. After ->setup() is called, the generic smpboot
>>>> kthread loop directly tries to call the thread function or wait
>>>> for some event if ->thread_should_run() is false.
>>>>
>>>> In the case of the watchdog kthread, ->thread_should_run() returns
>>>> false and the kthread goes to sleep and wait for the watchdog timer
>>>> to wake it up. But the timer is not enabled since the user requested
>>>> to disable the watchdog. We want the kthread to park instead of waiting
>>>> for events that can't happen.
>>>>
>>>> As a result, later unpark requests after sysctl write through
>>>> 'sysctl -w kernel.watchdog=1' won't wake up/unpark the task as
>>>> expected, since it's not parked anyway, leaving the value modified
>>>> without triggering any action.
>>> Out of curiosity, this can happen only for short period of time
>>> right?After
>>> some time this will work right as the thread get back in action
>>> after the schedule call.Then sysctl and procfs will work I think.
>>
>> kthread_unpark() can wake up a task only if the task is in TASK_PARKED
>> state. But since the above task would be in TASK_INTERRUPTIBLE state
>> (since it is not parked), kthread_unpark() will be powerless to do
>> anything.
> Yes but this will happen only for a short period of time right?
> After the schdule() call the below code gets executed in while() loop.
>
> if (kthread_should_park()) {
> __set_current_state(TASK_RUNNING);
> preempt_enable();
> if (ht->park && td->status == HP_THREAD_ACTIVE) {
> BUG_ON(td->cpu != smp_processor_id());
> ht->park(td->cpu);
> td->status = HP_THREAD_PARKED;
> }
> kthread_parkme();
> /* We might have been woken for stop */
> continue;
> }
>
> As we have already called kthread_park this above if() condition gets true
> and
> it will park the thread wouldn't it?But this will happen after the schedule
> call which is not right as mentioned by fredrick.
>

But we are scheduling in TASK_INTERRUPTIBLE mode so we are going to
sleep until some other thread wake us. But we are
unparked instead of being awoken. This just have no effect because
kthread_unpark() is a no-op on non-parked kthreads.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/8] vtime: Use consistent clocks among nohz accounting
  2013-05-20 16:01 ` [PATCH 2/8] vtime: Use consistent clocks among nohz accounting Frederic Weisbecker
@ 2013-06-03  9:47   ` Stefan Seyfried
  2013-06-03 13:48     ` Steven Rostedt
  2013-06-03 19:48     ` Frederic Weisbecker
  0 siblings, 2 replies; 25+ messages in thread
From: Stefan Seyfried @ 2013-06-03  9:47 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Steven Rostedt, Paul E. McKenney, Ingo Molnar,
	Thomas Gleixner, Peter Zijlstra, Borislav Petkov, Li Zhong,
	Mike Galbraith

Am 20.05.2013 18:01, schrieb Frederic Weisbecker:
> While computing the cputime delta of dynticks CPUs,
> we are mixing up clocks of differents natures:

[...]

> As a consequence, some strange behaviour with unstable tsc
> has been observed such as non progressing constant zero cputime.
> (The 'top' command showing no load).

This happens for example on my trusty ThinkPad X200s (family 6 model 23
stepping 10 Core 2 duo), seriously confusing its user (me :-).

> Fix this by only using local_clock(), or its irq safe/remote
> equivalent, in vtime code.
> 
> Reported-by: Mike Galbraith <efault@gmx.de>
> Suggested-by: Mike Galbraith <efault@gmx.de>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Li Zhong <zhong@linux.vnet.ibm.com>
> Cc: Mike Galbraith <efault@gmx.de>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>

FWIW:
Tested-by: Stefan Seyfried <seife+dev@b1-systems.com>

This patch fixes the 0% CPU issue on openSUSE Factory kernels for me.

Best regards,

	Stefan
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/8] vtime: Use consistent clocks among nohz accounting
  2013-06-03  9:47   ` Stefan Seyfried
@ 2013-06-03 13:48     ` Steven Rostedt
  2013-06-03 19:48     ` Frederic Weisbecker
  1 sibling, 0 replies; 25+ messages in thread
From: Steven Rostedt @ 2013-06-03 13:48 UTC (permalink / raw)
  To: Stefan Seyfried
  Cc: Frederic Weisbecker, LKML, Paul E. McKenney, Ingo Molnar,
	Thomas Gleixner, Peter Zijlstra, Borislav Petkov, Li Zhong,
	Mike Galbraith

On Mon, 2013-06-03 at 11:47 +0200, Stefan Seyfried wrote:

> FWIW:
> Tested-by: Stefan Seyfried <seife+dev@b1-systems.com>

FWIW, we can always use testers. Thanks for testing!

-- Steve



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/8] vtime: Use consistent clocks among nohz accounting
  2013-06-03  9:47   ` Stefan Seyfried
  2013-06-03 13:48     ` Steven Rostedt
@ 2013-06-03 19:48     ` Frederic Weisbecker
  2013-06-03 19:51       ` Stefan Seyfried
  1 sibling, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2013-06-03 19:48 UTC (permalink / raw)
  To: Stefan Seyfried
  Cc: LKML, Steven Rostedt, Paul E. McKenney, Ingo Molnar,
	Thomas Gleixner, Peter Zijlstra, Borislav Petkov, Li Zhong,
	Mike Galbraith

On Mon, Jun 03, 2013 at 11:47:17AM +0200, Stefan Seyfried wrote:
> Am 20.05.2013 18:01, schrieb Frederic Weisbecker:
> > While computing the cputime delta of dynticks CPUs,
> > we are mixing up clocks of differents natures:
> 
> [...]
> 
> > As a consequence, some strange behaviour with unstable tsc
> > has been observed such as non progressing constant zero cputime.
> > (The 'top' command showing no load).
> 
> This happens for example on my trusty ThinkPad X200s (family 6 model 23
> stepping 10 Core 2 duo), seriously confusing its user (me :-).
> 
> > Fix this by only using local_clock(), or its irq safe/remote
> > equivalent, in vtime code.
> > 
> > Reported-by: Mike Galbraith <efault@gmx.de>
> > Suggested-by: Mike Galbraith <efault@gmx.de>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: Li Zhong <zhong@linux.vnet.ibm.com>
> > Cc: Mike Galbraith <efault@gmx.de>
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> 
> FWIW:
> Tested-by: Stefan Seyfried <seife+dev@b1-systems.com>
> 
> This patch fixes the 0% CPU issue on openSUSE Factory kernels for me.

Thanks! The patch has been committed already so I can't add your Tested-by:
but feedbacks on testing are always appeciated.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/8] vtime: Use consistent clocks among nohz accounting
  2013-06-03 19:48     ` Frederic Weisbecker
@ 2013-06-03 19:51       ` Stefan Seyfried
  2013-06-03 20:12         ` Frederic Weisbecker
  0 siblings, 1 reply; 25+ messages in thread
From: Stefan Seyfried @ 2013-06-03 19:51 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Steven Rostedt, Paul E. McKenney, Ingo Molnar,
	Thomas Gleixner, Peter Zijlstra, Borislav Petkov, Li Zhong,
	Mike Galbraith

Am 03.06.2013 21:48, schrieb Frederic Weisbecker:
> On Mon, Jun 03, 2013 at 11:47:17AM +0200, Stefan Seyfried wrote:
>> FWIW:
>> Tested-by: Stefan Seyfried <seife+dev@b1-systems.com>
>>
>> This patch fixes the 0% CPU issue on openSUSE Factory kernels for me.
> 
> Thanks! The patch has been committed already so I can't add your Tested-by:
> but feedbacks on testing are always appeciated.

But it did not end up in Linus' tree yet. That would be more important
for me than the credits in the commit message :-)

Thanks,

	Stefan
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/8] vtime: Use consistent clocks among nohz accounting
  2013-06-03 19:51       ` Stefan Seyfried
@ 2013-06-03 20:12         ` Frederic Weisbecker
  0 siblings, 0 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2013-06-03 20:12 UTC (permalink / raw)
  To: Stefan Seyfried
  Cc: LKML, Steven Rostedt, Paul E. McKenney, Ingo Molnar,
	Thomas Gleixner, Peter Zijlstra, Borislav Petkov, Li Zhong,
	Mike Galbraith

On Mon, Jun 03, 2013 at 09:51:37PM +0200, Stefan Seyfried wrote:
> Am 03.06.2013 21:48, schrieb Frederic Weisbecker:
> > On Mon, Jun 03, 2013 at 11:47:17AM +0200, Stefan Seyfried wrote:
> >> FWIW:
> >> Tested-by: Stefan Seyfried <seife+dev@b1-systems.com>
> >>
> >> This patch fixes the 0% CPU issue on openSUSE Factory kernels for me.
> > 
> > Thanks! The patch has been committed already so I can't add your Tested-by:
> > but feedbacks on testing are always appeciated.
> 
> But it did not end up in Linus' tree yet. That would be more important
> for me than the credits in the commit message :-)

It will soon :)

> 
> Thanks,
> 
> 	Stefan
> -- 
> Stefan Seyfried
> Linux Consultant & Developer -- GPG Key: 0x731B665B
> 
> B1 Systems GmbH
> Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
> GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 6/8] kthread: Enable parking requests from setup() and unpark() callbacks
  2013-05-21  6:59   ` Srivatsa S. Bhat
@ 2013-06-05 16:33     ` Frederic Weisbecker
  0 siblings, 0 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2013-06-05 16:33 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: LKML, Steven Rostedt, Paul E. McKenney, Ingo Molnar,
	Thomas Gleixner, Peter Zijlstra, Borislav Petkov, Li Zhong,
	Don Zickus

2013/5/21 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>:
> On 05/20/2013 09:31 PM, Frederic Weisbecker wrote:
>> When the watchdog code is boot-disabled by the user, for example
>> through the 'nmi_watchdog=0' boot option, the setup() callback of
>> the watchdog kthread requests to park the task, and that until the
>> user later re-enables the watchdog through sysctl or procfs.
>>
>> However the parking request is not well handled when done from
>> the setup() callback. After ->setup() is called, the generic smpboot
>> kthread loop directly tries to call the thread function or wait
>> for some event if ->thread_should_run() is false.
>>
>> In the case of the watchdog kthread, ->thread_should_run() returns
>> false and the kthread goes to sleep and wait for the watchdog timer
>> to wake it up. But the timer is not enabled since the user requested
>> to disable the watchdog. We want the kthread to park instead of waiting
>> for events that can't happen.
>>
>> As a result, later unpark requests after sysctl write through
>> 'sysctl -w kernel.watchdog=1' won't wake up/unpark the task as
>> expected, since it's not parked anyway, leaving the value modified
>> without triggering any action.
>>
>> We could workaround some solution in the watchdog code like forcing
>> one pass to the thread function and immediately return to park.
>>
>> But supporting parking requests from ->setup() or ->unpark()
>
> unpark() can already do a proper park, because immediately after
> coming out of the parked state, the 'continue' statement helps
> re-evaluate the stop/park condition.

Right.

>
> So this fix is only for the ->setup() case.
>
>> callbacks look like proper way to implement cancellation in
>> general. So let's fix it that way.
>>
>
> Sounds good to me.
>
> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>
> But I wonder what nmi_watchdog=0 should actually mean, semantically..
> The current code works as if the user asked us not to run the watchdog
> threads, but it could as well be interpreted as if the user does not
> want to run *any* watchdog-related *code*. In that case, ideally we
> should *unregister* the watchdog threads, instead of just parking them.
> And when the user enables them again via sysctl/procfs, we should
> register the watchdog threads with the smpboot infrastructure.
>
> I'm not saying that the current semantics is wrong, but if we really
> implement it the other way I proposed above, then we won't have to deal
> with weird issues like ->setup() code wanting to park, and watchdog
> threads unparking and parking themselves on every CPU hotplug operation,
> despite the fact that the user specified nmi_watchdog=0 on the kernel
> command line.

That's an interesting point. It seems that using smpboot
register/unregister operations for sysctl control would be more
appropriate and less complicated. I'm going to give it a try.

Thanks!

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2013-06-05 16:33 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-20 16:01 [PATCH 0/8] nohz: Random fixes Frederic Weisbecker
2013-05-20 16:01 ` [PATCH 1/8] nohz: Warn if the machine can not perform nohz_full Frederic Weisbecker
2013-05-20 16:01 ` [PATCH 2/8] vtime: Use consistent clocks among nohz accounting Frederic Weisbecker
2013-06-03  9:47   ` Stefan Seyfried
2013-06-03 13:48     ` Steven Rostedt
2013-06-03 19:48     ` Frederic Weisbecker
2013-06-03 19:51       ` Stefan Seyfried
2013-06-03 20:12         ` Frederic Weisbecker
2013-05-20 16:01 ` [PATCH 3/8] watchdog: Boot-disable by default on full dynticks Frederic Weisbecker
2013-05-20 17:52   ` Don Zickus
2013-05-20 18:14     ` Frederic Weisbecker
2013-05-20 16:01 ` [PATCH 4/8] kvm: Move guest entry/exit APIs to context_tracking Frederic Weisbecker
2013-05-20 16:01 ` [PATCH 5/8] nohz: Fix notifier return val that enforce timekeeping Frederic Weisbecker
2013-05-20 16:01 ` [RFC PATCH 6/8] kthread: Enable parking requests from setup() and unpark() callbacks Frederic Weisbecker
2013-05-21  5:34   ` anish singh
2013-05-21  7:49     ` Srivatsa S. Bhat
2013-05-21  8:58       ` anish singh
2013-05-21  9:07         ` Srivatsa S. Bhat
2013-05-22 15:18         ` Frederic Weisbecker
2013-05-21  6:59   ` Srivatsa S. Bhat
2013-06-05 16:33     ` Frederic Weisbecker
2013-05-20 16:01 ` [RFC PATCH 7/8] watchdog: Rename confusing state variable Frederic Weisbecker
2013-05-20 17:53   ` Don Zickus
2013-05-20 16:01 ` [RFC PATCH 8/8] watchdog: Fix internal state with boot user disabled watchdog Frederic Weisbecker
2013-05-20 17:54   ` Don Zickus

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.