All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] timers nohz updates preview for 3.11
@ 2013-06-12 14:02 Frederic Weisbecker
  2013-06-12 14:02 ` [PATCH 1/6] sched: Disable lb_bias feature for full dynticks Frederic Weisbecker
                   ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Frederic Weisbecker @ 2013-06-12 14:02 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Li Zhong, Borislav Petkov, Josh Triplett, Don Zickus,
	Srivatsa S. Bhat, Anish Singh, H. Peter Anvin

Hi,

So here is a first batch of some patches I think could go for 3.11
Except for "rcu: Prevent CPU from stopping tick if awaited for quiescent state report"
which needs more thinking. I believe we can do more from the IPI to force a
quiescent state without the need to restart the tick.

For those who want to check through git:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	timers/core-preview

Thanks,
	Frederic
---

Frederic Weisbecker (5):
      sched: Disable lb_bias feature for full dynticks
      watchdog: Register / unregister watchdog kthreads on sysctl control
      watchdog: Boot-disable by default on full dynticks
      rcu: Prevent CPU from stopping tick if awaited for quiescent state report
      nohz: Remove obsolete check for full dynticks CPUs to be RCU nocbs

Steven Rostedt (1):
      nohz: Warn if the machine can not perform nohz_full


 include/linux/rcupdate.h |    3 +
 kernel/rcutree_plugin.h  |   43 +++++++++++++++++++++
 kernel/sched/fair.c      |   13 +++++-
 kernel/sched/features.h  |    3 +
 kernel/time/tick-sched.c |   20 +++++-----
 kernel/watchdog.c        |   93 ++++++++++++++++++++++++++-------------------
 6 files changed, 124 insertions(+), 51 deletions(-)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/6] sched: Disable lb_bias feature for full dynticks
  2013-06-12 14:02 [PATCH 0/6] timers nohz updates preview for 3.11 Frederic Weisbecker
@ 2013-06-12 14:02 ` Frederic Weisbecker
  2013-06-12 14:02 ` [PATCH 2/6] nohz: Warn if the machine can not perform nohz_full Frederic Weisbecker
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Frederic Weisbecker @ 2013-06-12 14:02 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Ingo Molnar, Li Zhong, Paul E. McKenney,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner, Borislav Petkov

If we run in full dynticks mode, we currently have no way to
correctly update the secondary decaying indexes of the CPU
load stats as it is typically maintained by update_cpu_load_active()
at each tick.

We have an available infrastructure that handles tickless loads
(cf: decay_load_missed) but it only works for idle tickless loads,
which only applies if the CPU hasn't run any real task but idle on
the tickless timeslice.

Until we can provide a sane mathematical solution to handle full
dynticks loads, lets simply deactivate the LB_BIAS sched feature
if CONFIG_NO_HZ_FULL as it is currently the only user of the decayed
load records.

The first load index that represents the current runqueue load weight
is still maintained and usable.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
---
 kernel/sched/fair.c     |   13 +++++++++++--
 kernel/sched/features.h |    3 +++
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c61a614..81b62d6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2922,6 +2922,15 @@ static unsigned long weighted_cpuload(const int cpu)
 	return cpu_rq(cpu)->load.weight;
 }
 
+static inline int sched_lb_bias(void)
+{
+#ifndef CONFIG_NO_HZ_FULL
+	return sched_feat(LB_BIAS);
+#else
+	return 0;
+#endif
+}
+
 /*
  * Return a low guess at the load of a migration-source cpu weighted
  * according to the scheduling class and "nice" value.
@@ -2934,7 +2943,7 @@ static unsigned long source_load(int cpu, int type)
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long total = weighted_cpuload(cpu);
 
-	if (type == 0 || !sched_feat(LB_BIAS))
+	if (type == 0 || !sched_lb_bias())
 		return total;
 
 	return min(rq->cpu_load[type-1], total);
@@ -2949,7 +2958,7 @@ static unsigned long target_load(int cpu, int type)
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long total = weighted_cpuload(cpu);
 
-	if (type == 0 || !sched_feat(LB_BIAS))
+	if (type == 0 || !sched_lb_bias())
 		return total;
 
 	return max(rq->cpu_load[type-1], total);
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 99399f8..635f902 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -43,7 +43,10 @@ SCHED_FEAT(ARCH_POWER, true)
 
 SCHED_FEAT(HRTICK, false)
 SCHED_FEAT(DOUBLE_TICK, false)
+
+#ifndef CONFIG_NO_HZ_FULL
 SCHED_FEAT(LB_BIAS, true)
+#endif
 
 /*
  * Decrement CPU power based on time not spent running tasks
-- 
1.7.5.4


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 2/6] nohz: Warn if the machine can not perform nohz_full
  2013-06-12 14:02 [PATCH 0/6] timers nohz updates preview for 3.11 Frederic Weisbecker
  2013-06-12 14:02 ` [PATCH 1/6] sched: Disable lb_bias feature for full dynticks Frederic Weisbecker
@ 2013-06-12 14:02 ` Frederic Weisbecker
  2013-06-12 14:02 ` [PATCH 3/6] watchdog: Register / unregister watchdog kthreads on sysctl control Frederic Weisbecker
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Frederic Weisbecker @ 2013-06-12 14:02 UTC (permalink / raw)
  To: LKML
  Cc: Steven Rostedt, Paul E. McKenney, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Frederic Weisbecker

From: Steven Rostedt <rostedt@goodmis.org>

If the user configures NO_HZ_FULL and defines nohz_full=XXX on the
kernel command line, or enables NO_HZ_FULL_ALL, but nohz fails
due to the machine having a unstable clock, warn about it.

We do not want users thinking that they are getting the benefit
of nohz when their machine can not support it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/time/tick-sched.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 0cf1c14..dbb8f76 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -178,6 +178,11 @@ static bool can_stop_full_tick(void)
 	 */
 	if (!sched_clock_stable) {
 		trace_tick_stop(0, "unstable sched clock\n");
+		/*
+		 * Don't allow the user to think they can get
+		 * full NO_HZ with this machine.
+		 */
+		WARN_ONCE(1, "NO_HZ FULL will not work with unstable sched clock");
 		return false;
 	}
 #endif
-- 
1.7.5.4


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 3/6] watchdog: Register / unregister watchdog kthreads on sysctl control
  2013-06-12 14:02 [PATCH 0/6] timers nohz updates preview for 3.11 Frederic Weisbecker
  2013-06-12 14:02 ` [PATCH 1/6] sched: Disable lb_bias feature for full dynticks Frederic Weisbecker
  2013-06-12 14:02 ` [PATCH 2/6] nohz: Warn if the machine can not perform nohz_full Frederic Weisbecker
@ 2013-06-12 14:02 ` Frederic Weisbecker
  2013-06-12 14:02 ` [PATCH 4/6] watchdog: Boot-disable by default on full dynticks Frederic Weisbecker
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Frederic Weisbecker @ 2013-06-12 14:02 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Srivatsa S. Bhat, Anish Singh,
	Steven Rostedt, Paul E. McKenney, Ingo Molnar, Thomas Gleixner,
	Peter Zijlstra, Borislav Petkov, Li Zhong, Don Zickus

The user activation/deactivation of the watchdog through boot parameters
or systcl is currently implemented with a dance involving kthreads parking
and unparking methods: the threads are unconditionally registered on
boot and they park as soon as the user want the watchdog to be disabled.

This method involves a few noisy details to handle though: the watchdog
kthreads may be unparked anytime due to hotplug operations, after which
the watchdog internals have to decide to park again if it is user-disabled.

As a result the setup() and unpark() methods need to be able to request a
reparking. This is not currently supported in the kthread infrastructure
so this piece of the watchdog code only works halfway.

Besides, unparking/reparking the watchdog kthreads consume unnecessary
cputime on hotplug operations when those could be simply ignored in the
first place.

As suggested by Srivatsa, let's instead only register the watchdog
threads when they are needed. This way we don't need to think about
hotplug operations and we don't burden the CPU onlining when the watchdog
is simply disabled.

Suggested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Anish Singh <anish198519851985@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Don Zickus <dzickus@redhat.com>
---
 kernel/watchdog.c |   87 ++++++++++++++++++++++++++++------------------------
 1 files changed, 47 insertions(+), 40 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 05039e3..52c9a9b 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -31,7 +31,7 @@
 
 int watchdog_enabled = 1;
 int __read_mostly watchdog_thresh = 10;
-static int __read_mostly watchdog_disabled;
+static int __read_mostly watchdog_disabled = 1;
 static u64 __read_mostly sample_period;
 
 static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
@@ -347,11 +347,6 @@ static void watchdog_enable(unsigned int cpu)
 	hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
 	hrtimer->function = watchdog_timer_fn;
 
-	if (!watchdog_enabled) {
-		kthread_park(current);
-		return;
-	}
-
 	/* Enable the perf event */
 	watchdog_nmi_enable(cpu);
 
@@ -374,6 +369,11 @@ static void watchdog_disable(unsigned int cpu)
 	watchdog_nmi_disable(cpu);
 }
 
+static void watchdog_cleanup(unsigned int cpu, bool online)
+{
+	watchdog_disable(cpu);
+}
+
 static int watchdog_should_run(unsigned int cpu)
 {
 	return __this_cpu_read(hrtimer_interrupts) !=
@@ -475,28 +475,40 @@ static int watchdog_nmi_enable(unsigned int cpu) { return 0; }
 static void watchdog_nmi_disable(unsigned int cpu) { return; }
 #endif /* CONFIG_HARDLOCKUP_DETECTOR */
 
-/* prepare/enable/disable routines */
-/* sysctl functions */
-#ifdef CONFIG_SYSCTL
-static void watchdog_enable_all_cpus(void)
+static struct smp_hotplug_thread watchdog_threads = {
+	.store			= &softlockup_watchdog,
+	.thread_should_run	= watchdog_should_run,
+	.thread_fn		= watchdog,
+	.thread_comm		= "watchdog/%u",
+	.setup			= watchdog_enable,
+	.cleanup		= watchdog_cleanup,
+	.park			= watchdog_disable,
+	.unpark			= watchdog_enable,
+};
+
+static int watchdog_enable_all_cpus(void)
 {
-	unsigned int cpu;
+	int err = 0;
 
 	if (watchdog_disabled) {
-		watchdog_disabled = 0;
-		for_each_online_cpu(cpu)
-			kthread_unpark(per_cpu(softlockup_watchdog, cpu));
+		err = smpboot_register_percpu_thread(&watchdog_threads);
+		if (err)
+			pr_err("Failed to create watchdog threads, disabled\n");
+		else
+			watchdog_disabled = 0;
 	}
+
+	return err;
 }
 
+/* prepare/enable/disable routines */
+/* sysctl functions */
+#ifdef CONFIG_SYSCTL
 static void watchdog_disable_all_cpus(void)
 {
-	unsigned int cpu;
-
 	if (!watchdog_disabled) {
 		watchdog_disabled = 1;
-		for_each_online_cpu(cpu)
-			kthread_park(per_cpu(softlockup_watchdog, cpu));
+		smpboot_unregister_percpu_thread(&watchdog_threads);
 	}
 }
 
@@ -507,14 +519,14 @@ static void watchdog_disable_all_cpus(void)
 int proc_dowatchdog(struct ctl_table *table, int write,
 		    void __user *buffer, size_t *lenp, loff_t *ppos)
 {
-	int ret;
+	int err, old_thresh, old_enabled;
 
-	if (watchdog_disabled < 0)
-		return -ENODEV;
+	old_thresh = ACCESS_ONCE(watchdog_thresh);
+	old_enabled = ACCESS_ONCE(watchdog_enabled);
 
-	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
-	if (ret || !write)
-		return ret;
+	err = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+	if (err || !write)
+		return err;
 
 	set_sample_period();
 	/*
@@ -523,29 +535,24 @@ int proc_dowatchdog(struct ctl_table *table, int write,
 	 * watchdog_*_all_cpus() function takes care of this.
 	 */
 	if (watchdog_enabled && watchdog_thresh)
-		watchdog_enable_all_cpus();
+		err = watchdog_enable_all_cpus();
 	else
 		watchdog_disable_all_cpus();
 
-	return ret;
+	/* Restore old values on failure */
+	if (err) {
+		watchdog_thresh = old_thresh;
+		watchdog_enabled = old_enabled;
+	}
+
+	return err;
 }
 #endif /* CONFIG_SYSCTL */
 
-static struct smp_hotplug_thread watchdog_threads = {
-	.store			= &softlockup_watchdog,
-	.thread_should_run	= watchdog_should_run,
-	.thread_fn		= watchdog,
-	.thread_comm		= "watchdog/%u",
-	.setup			= watchdog_enable,
-	.park			= watchdog_disable,
-	.unpark			= watchdog_enable,
-};
-
 void __init lockup_detector_init(void)
 {
 	set_sample_period();
-	if (smpboot_register_percpu_thread(&watchdog_threads)) {
-		pr_err("Failed to create watchdog threads, disabled\n");
-		watchdog_disabled = -ENODEV;
-	}
+
+	if (watchdog_enabled)
+		watchdog_enable_all_cpus();
 }
-- 
1.7.5.4


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 4/6] watchdog: Boot-disable by default on full dynticks
  2013-06-12 14:02 [PATCH 0/6] timers nohz updates preview for 3.11 Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2013-06-12 14:02 ` [PATCH 3/6] watchdog: Register / unregister watchdog kthreads on sysctl control Frederic Weisbecker
@ 2013-06-12 14:02 ` Frederic Weisbecker
  2013-06-12 17:03   ` Don Zickus
  2013-06-12 14:02 ` [PATCH 5/6] rcu: Prevent CPU from stopping tick if awaited for quiescent state report Frederic Weisbecker
  2013-06-12 14:02 ` [PATCH 6/6] nohz: Remove obsolete check for full dynticks CPUs to be RCU nocbs Frederic Weisbecker
  5 siblings, 1 reply; 27+ messages in thread
From: Frederic Weisbecker @ 2013-06-12 14:02 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Li Zhong, Don Zickus, Srivatsa S. Bhat, Anish Singh

When the watchdog runs, it prevents the full dynticks
CPUs from stopping their tick because the hard lockup
detector uses perf events internally, which in turn
rely on the periodic tick.

Since this is a rather confusing behaviour that is not
easy to track down and identify for those who want to
test CONFIG_NO_HZ_FULL, let's default disable the
watchdog on boot time when full dynticks is enabled.

The user can still enable it later on runtime using
proc or sysctl.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Anish Singh <anish198519851985@gmail.com>
---
 kernel/watchdog.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 52c9a9b..277d713 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -553,6 +553,14 @@ void __init lockup_detector_init(void)
 {
 	set_sample_period();
 
+#ifdef CONFIG_NO_HZ_FULL
+	if (watchdog_enabled) {
+		watchdog_enabled = 0;
+		pr_warning("Disabled lockup detectors by default for full dynticks\n");
+		pr_warning("You can reactivate it with 'sysctl -w kernel.watchdog=1'\n");
+	}
+#endif
+
 	if (watchdog_enabled)
 		watchdog_enable_all_cpus();
 }
-- 
1.7.5.4


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 5/6] rcu: Prevent CPU from stopping tick if awaited for quiescent state report
  2013-06-12 14:02 [PATCH 0/6] timers nohz updates preview for 3.11 Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2013-06-12 14:02 ` [PATCH 4/6] watchdog: Boot-disable by default on full dynticks Frederic Weisbecker
@ 2013-06-12 14:02 ` Frederic Weisbecker
  2013-06-12 14:02 ` [PATCH 6/6] nohz: Remove obsolete check for full dynticks CPUs to be RCU nocbs Frederic Weisbecker
  5 siblings, 0 replies; 27+ messages in thread
From: Frederic Weisbecker @ 2013-06-12 14:02 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Josh Triplett

When a CPU runs in full dynticks mode in the kernel, it is outside
the RCU user mode. If another CPU has started a grace period and is
waiting for this CPU to report a quiescent state, the lack of a tick
may extend the grace period. This is typically not a problem because
the kernel code is supposed to quickly resume to either:

* userspace, in this case we'll enter into RCU user mode and thus
get rid of our quiescent state report duty.

* schedule to idle, which involve the same as the userspace case

* schedule another task, but then this implies we have more than
one task in the runqueue and we kept the tick for the scheduler
to correctly handle the multitasking.

Now it's always good to consider a worst case which here could be
that the CPU eventually stays in the kernel longer than expected
and can then extend a grace period longer than we can afford.

Restarting the tick can be a good idea in this case:

* this way we can report we are outside a softirq if an RCU_bh qs
is pending.

* we can kick the rcu softirq if we need to report a RCU_sched bh, as
that involve a context switch.

RCU already sends an IPI to kick such annoying full dynticks CPUs.
Now this patch implements the other side: restart the tick from the
IPI if we need to report a quiescent state.

NOTE: we can probably do better and rather act from the IPI without
restarting the tick.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
---
 include/linux/rcupdate.h |    3 +++
 kernel/rcutree_plugin.h  |   43 +++++++++++++++++++++++++++++++++++++++++++
 kernel/time/tick-sched.c |    5 +++++
 3 files changed, 51 insertions(+), 0 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 4ccd68e..6e3c5cf 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -1006,5 +1006,8 @@ extern bool rcu_is_nocb_cpu(int cpu);
 static inline bool rcu_is_nocb_cpu(int cpu) { return false; }
 #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
 
+#ifdef CONFIG_NO_HZ_FULL
+extern bool rcu_can_stop_tick(void);
+#endif
 
 #endif /* __LINUX_RCUPDATE_H */
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 3db5a37..391386f 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -2350,3 +2350,46 @@ static void rcu_kick_nohz_cpu(int cpu)
 		smp_send_reschedule(cpu);
 #endif /* #ifdef CONFIG_NO_HZ_FULL */
 }
+
+#ifdef CONFIG_NO_HZ_FULL
+/*
+ * This pairs with rcu_kick_nohz_cpu. It is called from the
+ * irq exit path to check if the CPU needs to restart its tick
+ * to report a quiescent state after extending the grace period
+ * for too long.
+ */
+bool rcu_can_stop_tick(void)
+{
+	struct rcu_state *rsp;
+	struct rcu_data *rdp;
+
+	WARN_ON_ONCE(!irqs_disabled());
+
+	/* We are already in extended quiescent state */
+	if (rcu_is_cpu_idle())
+		return true;
+
+	/*
+	 * Note there is no guarantee that we'll see the new grace period
+	 * that the IPI sender wants us to see in the RCU global state. Some
+	 * ordering against the IPI send/receive and rsp->gpnum is probably
+	 * required to enforce that.
+	 *
+	 * Besides, note_new_gp_num() might ignore the new grace period if
+	 * the rnp lock is contended.
+	 *
+	 * Either we need to resend the ipi periodically if no progress is made
+	 * or we need to fix these ordering/locking issues for this code to be
+	 * correct.
+	 */
+	for_each_rcu_flavor(rsp) {
+		 rdp = this_cpu_ptr(rsp->rda);
+		 check_for_new_grace_period(rsp, rdp);
+
+		 if (rdp->qs_pending && !rdp->passed_quiesce)
+			 return false;
+	}
+
+	return true;
+}
+#endif
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index dbb8f76..917b871 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -170,6 +170,11 @@ static bool can_stop_full_tick(void)
 		return false;
 	}
 
+	if (!rcu_can_stop_tick()) {
+		trace_tick_stop(0, "RCU needs tick\n");
+		return false;
+	}
+
 	/* sched_clock_tick() needs us? */
 #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
 	/*
-- 
1.7.5.4


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 6/6] nohz: Remove obsolete check for full dynticks CPUs to be RCU nocbs
  2013-06-12 14:02 [PATCH 0/6] timers nohz updates preview for 3.11 Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2013-06-12 14:02 ` [PATCH 5/6] rcu: Prevent CPU from stopping tick if awaited for quiescent state report Frederic Weisbecker
@ 2013-06-12 14:02 ` Frederic Weisbecker
  5 siblings, 0 replies; 27+ messages in thread
From: Frederic Weisbecker @ 2013-06-12 14:02 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Li Zhong, Borislav Petkov

Building full dynticks now implies that all CPUs are forced
into RCU nocb mode through CONFIG_RCU_NOCB_CPU_ALL.

The dynamic check has become useless.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
---
 kernel/time/tick-sched.c |   10 ----------
 1 files changed, 0 insertions(+), 10 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 917b871..91e2ab9 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -356,16 +356,6 @@ void __init tick_nohz_init(void)
 	}
 
 	cpu_notifier(tick_nohz_cpu_down_callback, 0);
-
-	/* Make sure full dynticks CPU are also RCU nocbs */
-	for_each_cpu(cpu, nohz_full_mask) {
-		if (!rcu_is_nocb_cpu(cpu)) {
-			pr_warning("NO_HZ: CPU %d is not RCU nocb: "
-				   "cleared from nohz_full range", cpu);
-			cpumask_clear_cpu(cpu, nohz_full_mask);
-		}
-	}
-
 	cpulist_scnprintf(nohz_full_buf, sizeof(nohz_full_buf), nohz_full_mask);
 	pr_info("NO_HZ: Full dynticks CPUs: %s.\n", nohz_full_buf);
 }
-- 
1.7.5.4


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] watchdog: Boot-disable by default on full dynticks
  2013-06-12 14:02 ` [PATCH 4/6] watchdog: Boot-disable by default on full dynticks Frederic Weisbecker
@ 2013-06-12 17:03   ` Don Zickus
  2013-06-13 13:10     ` Frederic Weisbecker
  0 siblings, 1 reply; 27+ messages in thread
From: Don Zickus @ 2013-06-12 17:03 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Steven Rostedt, Paul E. McKenney, Ingo Molnar,
	Andrew Morton, Thomas Gleixner, Peter Zijlstra, Li Zhong,
	Srivatsa S. Bhat, Anish Singh

On Wed, Jun 12, 2013 at 04:02:36PM +0200, Frederic Weisbecker wrote:
> When the watchdog runs, it prevents the full dynticks
> CPUs from stopping their tick because the hard lockup
> detector uses perf events internally, which in turn
> rely on the periodic tick.
> 
> Since this is a rather confusing behaviour that is not
> easy to track down and identify for those who want to
> test CONFIG_NO_HZ_FULL, let's default disable the
> watchdog on boot time when full dynticks is enabled.
> 
> The user can still enable it later on runtime using
> proc or sysctl.

I thought we had a conversation awhile ago, where we agreed this was going
to be fixed for 3.11?  Didn't Peter find the patch and apply it to his
tree?  I am confused why this is still needed?

Cheers,
Don

> 
> Reported-by: Steven Rostedt <rostedt@goodmis.org>
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Li Zhong <zhong@linux.vnet.ibm.com>
> Cc: Don Zickus <dzickus@redhat.com>
> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> Cc: Anish Singh <anish198519851985@gmail.com>
> ---
>  kernel/watchdog.c |    8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 52c9a9b..277d713 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -553,6 +553,14 @@ void __init lockup_detector_init(void)
>  {
>  	set_sample_period();
>  
> +#ifdef CONFIG_NO_HZ_FULL
> +	if (watchdog_enabled) {
> +		watchdog_enabled = 0;
> +		pr_warning("Disabled lockup detectors by default for full dynticks\n");
> +		pr_warning("You can reactivate it with 'sysctl -w kernel.watchdog=1'\n");
> +	}
> +#endif
> +
>  	if (watchdog_enabled)
>  		watchdog_enable_all_cpus();
>  }
> -- 
> 1.7.5.4
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] watchdog: Boot-disable by default on full dynticks
  2013-06-12 17:03   ` Don Zickus
@ 2013-06-13 13:10     ` Frederic Weisbecker
  2013-06-13 14:02       ` Don Zickus
  0 siblings, 1 reply; 27+ messages in thread
From: Frederic Weisbecker @ 2013-06-13 13:10 UTC (permalink / raw)
  To: Don Zickus
  Cc: LKML, Steven Rostedt, Paul E. McKenney, Ingo Molnar,
	Andrew Morton, Thomas Gleixner, Peter Zijlstra, Li Zhong,
	Srivatsa S. Bhat, Anish Singh

On Wed, Jun 12, 2013 at 01:03:16PM -0400, Don Zickus wrote:
> On Wed, Jun 12, 2013 at 04:02:36PM +0200, Frederic Weisbecker wrote:
> > When the watchdog runs, it prevents the full dynticks
> > CPUs from stopping their tick because the hard lockup
> > detector uses perf events internally, which in turn
> > rely on the periodic tick.
> > 
> > Since this is a rather confusing behaviour that is not
> > easy to track down and identify for those who want to
> > test CONFIG_NO_HZ_FULL, let's default disable the
> > watchdog on boot time when full dynticks is enabled.
> > 
> > The user can still enable it later on runtime using
> > proc or sysctl.
> 
> I thought we had a conversation awhile ago, where we agreed this was going
> to be fixed for 3.11?  Didn't Peter find the patch and apply it to his
> tree?  I am confused why this is still needed?

We agreed on the patch but it hasn't been applied yet. I'm trying to get
a sane series of nohz patches before sending to Ingo.

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] watchdog: Boot-disable by default on full dynticks
  2013-06-13 13:10     ` Frederic Weisbecker
@ 2013-06-13 14:02       ` Don Zickus
  2013-06-13 14:22         ` Frederic Weisbecker
  2013-06-18 10:36         ` Peter Zijlstra
  0 siblings, 2 replies; 27+ messages in thread
From: Don Zickus @ 2013-06-13 14:02 UTC (permalink / raw)
  To: Frederic Weisbecker, Peter Zijlstra
  Cc: LKML, Steven Rostedt, Paul E. McKenney, Ingo Molnar,
	Andrew Morton, Thomas Gleixner, Li Zhong, Srivatsa S. Bhat,
	Anish Singh

On Thu, Jun 13, 2013 at 03:10:59PM +0200, Frederic Weisbecker wrote:
> On Wed, Jun 12, 2013 at 01:03:16PM -0400, Don Zickus wrote:
> > On Wed, Jun 12, 2013 at 04:02:36PM +0200, Frederic Weisbecker wrote:
> > > When the watchdog runs, it prevents the full dynticks
> > > CPUs from stopping their tick because the hard lockup
> > > detector uses perf events internally, which in turn
> > > rely on the periodic tick.
> > > 
> > > Since this is a rather confusing behaviour that is not
> > > easy to track down and identify for those who want to
> > > test CONFIG_NO_HZ_FULL, let's default disable the
> > > watchdog on boot time when full dynticks is enabled.
> > > 
> > > The user can still enable it later on runtime using
> > > proc or sysctl.
> > 
> > I thought we had a conversation awhile ago, where we agreed this was going
> > to be fixed for 3.11?  Didn't Peter find the patch and apply it to his
> > tree?  I am confused why this is still needed?
> 
> We agreed on the patch but it hasn't been applied yet. I'm trying to get
> a sane series of nohz patches before sending to Ingo.

Peter,

Where is this patch?

Cheers,
Don

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] watchdog: Boot-disable by default on full dynticks
  2013-06-13 14:02       ` Don Zickus
@ 2013-06-13 14:22         ` Frederic Weisbecker
  2013-06-13 14:45           ` Don Zickus
  2013-06-18 10:36         ` Peter Zijlstra
  1 sibling, 1 reply; 27+ messages in thread
From: Frederic Weisbecker @ 2013-06-13 14:22 UTC (permalink / raw)
  To: Don Zickus
  Cc: Peter Zijlstra, LKML, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Andrew Morton, Thomas Gleixner, Li Zhong,
	Srivatsa S. Bhat, Anish Singh

On Thu, Jun 13, 2013 at 10:02:07AM -0400, Don Zickus wrote:
> On Thu, Jun 13, 2013 at 03:10:59PM +0200, Frederic Weisbecker wrote:
> > On Wed, Jun 12, 2013 at 01:03:16PM -0400, Don Zickus wrote:
> > > On Wed, Jun 12, 2013 at 04:02:36PM +0200, Frederic Weisbecker wrote:
> > > > When the watchdog runs, it prevents the full dynticks
> > > > CPUs from stopping their tick because the hard lockup
> > > > detector uses perf events internally, which in turn
> > > > rely on the periodic tick.
> > > > 
> > > > Since this is a rather confusing behaviour that is not
> > > > easy to track down and identify for those who want to
> > > > test CONFIG_NO_HZ_FULL, let's default disable the
> > > > watchdog on boot time when full dynticks is enabled.
> > > > 
> > > > The user can still enable it later on runtime using
> > > > proc or sysctl.
> > > 
> > > I thought we had a conversation awhile ago, where we agreed this was going
> > > to be fixed for 3.11?  Didn't Peter find the patch and apply it to his
> > > tree?  I am confused why this is still needed?
> > 
> > We agreed on the patch but it hasn't been applied yet. I'm trying to get
> > a sane series of nohz patches before sending to Ingo.
> 
> Peter,
> 
> Where is this patch?

Which patch? The old version of the current one? It was part of a previous series
that needed improvements so it hasn't been applied yet.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] watchdog: Boot-disable by default on full dynticks
  2013-06-13 14:22         ` Frederic Weisbecker
@ 2013-06-13 14:45           ` Don Zickus
  2013-06-13 14:56             ` Frederic Weisbecker
  0 siblings, 1 reply; 27+ messages in thread
From: Don Zickus @ 2013-06-13 14:45 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, LKML, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Andrew Morton, Thomas Gleixner, Li Zhong,
	Srivatsa S. Bhat, Anish Singh

On Thu, Jun 13, 2013 at 04:22:12PM +0200, Frederic Weisbecker wrote:
> On Thu, Jun 13, 2013 at 10:02:07AM -0400, Don Zickus wrote:
> > On Thu, Jun 13, 2013 at 03:10:59PM +0200, Frederic Weisbecker wrote:
> > > On Wed, Jun 12, 2013 at 01:03:16PM -0400, Don Zickus wrote:
> > > > On Wed, Jun 12, 2013 at 04:02:36PM +0200, Frederic Weisbecker wrote:
> > > > > When the watchdog runs, it prevents the full dynticks
> > > > > CPUs from stopping their tick because the hard lockup
> > > > > detector uses perf events internally, which in turn
> > > > > rely on the periodic tick.
> > > > > 
> > > > > Since this is a rather confusing behaviour that is not
> > > > > easy to track down and identify for those who want to
> > > > > test CONFIG_NO_HZ_FULL, let's default disable the
> > > > > watchdog on boot time when full dynticks is enabled.
> > > > > 
> > > > > The user can still enable it later on runtime using
> > > > > proc or sysctl.
> > > > 
> > > > I thought we had a conversation awhile ago, where we agreed this was going
> > > > to be fixed for 3.11?  Didn't Peter find the patch and apply it to his
> > > > tree?  I am confused why this is still needed?
> > > 
> > > We agreed on the patch but it hasn't been applied yet. I'm trying to get
> > > a sane series of nohz patches before sending to Ingo.
> > 
> > Peter,
> > 
> > Where is this patch?
> 
> Which patch? The old version of the current one? It was part of a previous series
> that needed improvements so it hasn't been applied yet.

I guess I am confused.  I thought Peter said in an email awhile ago, that
he found Stephan's patch (that converted perf to use hrtimers) and applied
it to his tree.  Are you saying it was unapplied because it needed
improvements?

Cheers,
Don


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] watchdog: Boot-disable by default on full dynticks
  2013-06-13 14:45           ` Don Zickus
@ 2013-06-13 14:56             ` Frederic Weisbecker
  2013-06-13 15:20               ` Don Zickus
  0 siblings, 1 reply; 27+ messages in thread
From: Frederic Weisbecker @ 2013-06-13 14:56 UTC (permalink / raw)
  To: Don Zickus
  Cc: Peter Zijlstra, LKML, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Andrew Morton, Thomas Gleixner, Li Zhong,
	Srivatsa S. Bhat, Anish Singh

On Thu, Jun 13, 2013 at 10:45:15AM -0400, Don Zickus wrote:
> On Thu, Jun 13, 2013 at 04:22:12PM +0200, Frederic Weisbecker wrote:
> > On Thu, Jun 13, 2013 at 10:02:07AM -0400, Don Zickus wrote:
> > > On Thu, Jun 13, 2013 at 03:10:59PM +0200, Frederic Weisbecker wrote:
> > > > On Wed, Jun 12, 2013 at 01:03:16PM -0400, Don Zickus wrote:
> > > > > On Wed, Jun 12, 2013 at 04:02:36PM +0200, Frederic Weisbecker wrote:
> > > > > > When the watchdog runs, it prevents the full dynticks
> > > > > > CPUs from stopping their tick because the hard lockup
> > > > > > detector uses perf events internally, which in turn
> > > > > > rely on the periodic tick.
> > > > > > 
> > > > > > Since this is a rather confusing behaviour that is not
> > > > > > easy to track down and identify for those who want to
> > > > > > test CONFIG_NO_HZ_FULL, let's default disable the
> > > > > > watchdog on boot time when full dynticks is enabled.
> > > > > > 
> > > > > > The user can still enable it later on runtime using
> > > > > > proc or sysctl.
> > > > > 
> > > > > I thought we had a conversation awhile ago, where we agreed this was going
> > > > > to be fixed for 3.11?  Didn't Peter find the patch and apply it to his
> > > > > tree?  I am confused why this is still needed?
> > > > 
> > > > We agreed on the patch but it hasn't been applied yet. I'm trying to get
> > > > a sane series of nohz patches before sending to Ingo.
> > > 
> > > Peter,
> > > 
> > > Where is this patch?
> > 
> > Which patch? The old version of the current one? It was part of a previous series
> > that needed improvements so it hasn't been applied yet.
> 
> I guess I am confused.  I thought Peter said in an email awhile ago, that
> he found Stephan's patch (that converted perf to use hrtimers) and applied
> it to his tree.  Are you saying it was unapplied because it needed
> improvements?

I think it doesn't completely remove the perf tick. Besides it doesn't really
solve the problem since perf events will require periodic hrtimers to work,
which still defeats the purpose of full dynticks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] watchdog: Boot-disable by default on full dynticks
  2013-06-13 14:56             ` Frederic Weisbecker
@ 2013-06-13 15:20               ` Don Zickus
  2013-06-13 15:48                 ` Steven Rostedt
  0 siblings, 1 reply; 27+ messages in thread
From: Don Zickus @ 2013-06-13 15:20 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, LKML, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Andrew Morton, Thomas Gleixner, Li Zhong,
	Srivatsa S. Bhat, Anish Singh

On Thu, Jun 13, 2013 at 04:56:03PM +0200, Frederic Weisbecker wrote:
> > > > Peter,
> > > > 
> > > > Where is this patch?
> > > 
> > > Which patch? The old version of the current one? It was part of a previous series
> > > that needed improvements so it hasn't been applied yet.
> > 
> > I guess I am confused.  I thought Peter said in an email awhile ago, that
> > he found Stephan's patch (that converted perf to use hrtimers) and applied
> > it to his tree.  Are you saying it was unapplied because it needed
> > improvements?
> 
> I think it doesn't completely remove the perf tick. Besides it doesn't really
> solve the problem since perf events will require periodic hrtimers to work,
> which still defeats the purpose of full dynticks.

I don't know enough about how full dynticks work to even present a
solution.  But currently I was working with the Red Hat performance team
to enhance perf to help our customers diagnose performance problems
easier.

My fear is anyone who uses full dynticks and has issues, can't use perf to
help diagnose their problems because it will change the dynamics of the
problem.  And with the current huge drop in performance in cpu_idle (as
compared to RHEL-6's 2.6.32 kernel) due to what seems to be miscalculated
c-states, one might have a hard time evaluating if full dynticks is doing
the right thing or not.

Then again perhaps full dynticks isn't useful for distros like RHEL.

That's why I was hoping to solve the underlying problem as opposed to
accepting patches like this which work around the symptoms.

Again, my knowledge of full dynticks is poor, so I have almost no idea of
the complexities surrounding the problem and how hard it is to even solve
it.

Cheers,
Don


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] watchdog: Boot-disable by default on full dynticks
  2013-06-13 15:20               ` Don Zickus
@ 2013-06-13 15:48                 ` Steven Rostedt
  2013-06-13 16:21                   ` anish singh
  2013-06-14 13:49                   ` Don Zickus
  0 siblings, 2 replies; 27+ messages in thread
From: Steven Rostedt @ 2013-06-13 15:48 UTC (permalink / raw)
  To: Don Zickus
  Cc: Frederic Weisbecker, Peter Zijlstra, LKML, Paul E. McKenney,
	Ingo Molnar, Andrew Morton, Thomas Gleixner, Li Zhong,
	Srivatsa S. Bhat, Anish Singh

On Thu, 2013-06-13 at 11:20 -0400, Don Zickus wrote:

> I don't know enough about how full dynticks work to even present a
> solution.  But currently I was working with the Red Hat performance team
> to enhance perf to help our customers diagnose performance problems
> easier.
> 
> My fear is anyone who uses full dynticks and has issues, can't use perf to
> help diagnose their problems because it will change the dynamics of the
> problem.  And with the current huge drop in performance in cpu_idle (as
> compared to RHEL-6's 2.6.32 kernel) due to what seems to be miscalculated
> c-states, one might have a hard time evaluating if full dynticks is doing
> the right thing or not.

This needs to be fixed, but not for 3.11. Although, you can still use
ftrace to diagnose it.

> 
> Then again perhaps full dynticks isn't useful for distros like RHEL.

It will be very useful for RHEL. But its still very new, and I wouldn't
recommend using it in a production environment yet. There's still a few
issues that need to be worked out, including this one. When the issues
are fixed, then RHEL and other distributions will definitely want to
enable this.

> 
> That's why I was hoping to solve the underlying problem as opposed to
> accepting patches like this which work around the symptoms.

For now it's just to get things working as people expect it to. First
impressions are very important, and if someone enables it and sees it
makes no difference, they may from then on never trust it. The way to
handle that is to make sure it works when enabled, even if it disables
some other cool features. But as I said, it shouldn't be used in
production quite yet.

> 
> Again, my knowledge of full dynticks is poor, so I have almost no idea of
> the complexities surrounding the problem and how hard it is to even solve
> it.

The concept behind full dynamic ticks is very easy. When you set a given
CPU(s) to dynamic tick, when it only has a single task scheduled on that
CPU, it disables the periodic tick. This removes essentially *all*
latency from the kernel! That is, if the task is doing some complex
calculations, it wont be interrupted for kernel maintenance. A lot of
Red Hat customers would love to have this feature. It allows for
extremely low latency actions even without a real-time kernel. Heck, it
works without even kernel preemption.

Now removing the periodic tick is not a trivial task, and this is where
all our issues come from. In fact, we can not even completely remove the
tick yet, we just move it to 1 HZ instead of whatever the CONFIG_HZ is
set to. We have to handle everything that depends on that tick, which
includes perf, among other things.

-- Steve



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] watchdog: Boot-disable by default on full dynticks
  2013-06-13 15:48                 ` Steven Rostedt
@ 2013-06-13 16:21                   ` anish singh
  2013-06-13 17:16                     ` Steven Rostedt
  2013-06-14 13:49                   ` Don Zickus
  1 sibling, 1 reply; 27+ messages in thread
From: anish singh @ 2013-06-13 16:21 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Don Zickus, Frederic Weisbecker, Peter Zijlstra, LKML,
	Paul E. McKenney, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Li Zhong, Srivatsa S. Bhat

On Thu, Jun 13, 2013 at 9:18 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Thu, 2013-06-13 at 11:20 -0400, Don Zickus wrote:
>
>> I don't know enough about how full dynticks work to even present a
>> solution.  But currently I was working with the Red Hat performance team
>> to enhance perf to help our customers diagnose performance problems
>> easier.
>>
>> My fear is anyone who uses full dynticks and has issues, can't use perf to
>> help diagnose their problems because it will change the dynamics of the
>> problem.  And with the current huge drop in performance in cpu_idle (as
>> compared to RHEL-6's 2.6.32 kernel) due to what seems to be miscalculated
>> c-states, one might have a hard time evaluating if full dynticks is doing
>> the right thing or not.
>
> This needs to be fixed, but not for 3.11. Although, you can still use
> ftrace to diagnose it.
>
>>
>> Then again perhaps full dynticks isn't useful for distros like RHEL.
>
> It will be very useful for RHEL. But its still very new, and I wouldn't
> recommend using it in a production environment yet. There's still a few
> issues that need to be worked out, including this one. When the issues
> are fixed, then RHEL and other distributions will definitely want to
> enable this.
>
>>
>> That's why I was hoping to solve the underlying problem as opposed to
>> accepting patches like this which work around the symptoms.
>
> For now it's just to get things working as people expect it to. First
> impressions are very important, and if someone enables it and sees it
> makes no difference, they may from then on never trust it. The way to
> handle that is to make sure it works when enabled, even if it disables
> some other cool features. But as I said, it shouldn't be used in
> production quite yet.
>
>>
>> Again, my knowledge of full dynticks is poor, so I have almost no idea of
>> the complexities surrounding the problem and how hard it is to even solve
>> it.
>
> The concept behind full dynamic ticks is very easy. When you set a given
> CPU(s) to dynamic tick, when it only has a single task scheduled on that
> CPU, it disables the periodic tick. This removes essentially *all*

Documentation/timers/highres.txt states that
hrtimer_stop_sched_tick() is called when a CPU goes into idle state.Would
you mind elaborating "single task scheduled on that CPU"?
I am bit new to this so please excuse me if the question is too basic.
> latency from the kernel! That is, if the task is doing some complex
> calculations, it wont be interrupted for kernel maintenance. A lot of
> Red Hat customers would love to have this feature. It allows for
> extremely low latency actions even without a real-time kernel. Heck, it
> works without even kernel preemption.
>
> Now removing the periodic tick is not a trivial task, and this is where

You mean getting rid of period ticks in the kernel code is not easy as there
are many features such as perf is dependent on it right and that is why
instead of completely removing periodic ticks we just set the periodic tick
to happen at 1HZ instead of CONFIG_HZ value?
> all our issues come from. In fact, we can not even completely remove the
> tick yet, we just move it to 1 HZ instead of whatever the CONFIG_HZ is
> set to. We have to handle everything that depends on that tick, which
> includes perf, among other things.
>
> -- Steve
>
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] watchdog: Boot-disable by default on full dynticks
  2013-06-13 16:21                   ` anish singh
@ 2013-06-13 17:16                     ` Steven Rostedt
  2013-06-14  4:17                       ` anish singh
  0 siblings, 1 reply; 27+ messages in thread
From: Steven Rostedt @ 2013-06-13 17:16 UTC (permalink / raw)
  To: anish singh
  Cc: Don Zickus, Frederic Weisbecker, Peter Zijlstra, LKML,
	Paul E. McKenney, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Li Zhong, Srivatsa S. Bhat

On Thu, 2013-06-13 at 21:51 +0530, anish singh wrote:

> > The concept behind full dynamic ticks is very easy. When you set a given
> > CPU(s) to dynamic tick, when it only has a single task scheduled on that
> > CPU, it disables the periodic tick. This removes essentially *all*
> 
> Documentation/timers/highres.txt states that
> hrtimer_stop_sched_tick() is called when a CPU goes into idle state.Would
> you mind elaborating "single task scheduled on that CPU"?
> I am bit new to this so please excuse me if the question is too basic.

That's the old CONFIG_NO_HZ, which only disables the tick on idle. What
we are working on is to also disable the tick when there's only one task
running on a given CPU. Why have as schedule tick when there's nothing
to schedule?

3.10 now has new config options:

CONFIG_NO_HZ_PERIODIC - which is NO_HZ disabled
   (the old # CONFIG_NO_HZ not set)

CONFIG_NO_HZ_IDLE - which is what CONFIG_NO_HZ use to be.

Note, CONFIG_NO_HZ still exists and if set, will make CONFIG_NO_HZ_IDLE
the default. This was to keep the same configuration for people who
update their kernel and run make oldconfig.

Then there's the new:

CONFIG_NO_HZ_FULL - this enables CONFIG_NO_HZ_IDLE plus adds the new
feature with disabling the tick when only one task is running on a given
CPU.


> > latency from the kernel! That is, if the task is doing some complex
> > calculations, it wont be interrupted for kernel maintenance. A lot of
> > Red Hat customers would love to have this feature. It allows for
> > extremely low latency actions even without a real-time kernel. Heck, it
> > works without even kernel preemption.
> >
> > Now removing the periodic tick is not a trivial task, and this is where
> 
> You mean getting rid of period ticks in the kernel code is not easy as there
> are many features such as perf is dependent on it right and that is why
> instead of completely removing periodic ticks we just set the periodic tick
> to happen at 1HZ instead of CONFIG_HZ value?

IIRC, the reason for moving to 1 HZ is so that the scheduler doesn't get
confused with overflows. It still needs to handle time keeping for
managing how to schedule tasks according to CFS.

Everything else shouldn't depend on the tick... period.

-- Steve

> > all our issues come from. In fact, we can not even completely remove the
> > tick yet, we just move it to 1 HZ instead of whatever the CONFIG_HZ is
> > set to. We have to handle everything that depends on that tick, which
> > includes perf, among other things.
> >



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] watchdog: Boot-disable by default on full dynticks
  2013-06-13 17:16                     ` Steven Rostedt
@ 2013-06-14  4:17                       ` anish singh
  2013-06-14 12:26                         ` Paul E. McKenney
  0 siblings, 1 reply; 27+ messages in thread
From: anish singh @ 2013-06-14  4:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Don Zickus, Frederic Weisbecker, Peter Zijlstra, LKML,
	Paul E. McKenney, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Li Zhong, Srivatsa S. Bhat

On Thu, Jun 13, 2013 at 10:46 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Thu, 2013-06-13 at 21:51 +0530, anish singh wrote:
>
>> > The concept behind full dynamic ticks is very easy. When you set a given
>> > CPU(s) to dynamic tick, when it only has a single task scheduled on that
>> > CPU, it disables the periodic tick. This removes essentially *all*
>>
>> Documentation/timers/highres.txt states that
>> hrtimer_stop_sched_tick() is called when a CPU goes into idle state.Would
>> you mind elaborating "single task scheduled on that CPU"?
>> I am bit new to this so please excuse me if the question is too basic.
>
> That's the old CONFIG_NO_HZ, which only disables the tick on idle. What
> we are working on is to also disable the tick when there's only one task
> running on a given CPU. Why have as schedule tick when there's nothing
> to schedule?
>
> 3.10 now has new config options:
>
> CONFIG_NO_HZ_PERIODIC - which is NO_HZ disabled
>    (the old # CONFIG_NO_HZ not set)
>
> CONFIG_NO_HZ_IDLE - which is what CONFIG_NO_HZ use to be.
>
> Note, CONFIG_NO_HZ still exists and if set, will make CONFIG_NO_HZ_IDLE
> the default. This was to keep the same configuration for people who
> update their kernel and run make oldconfig.
>
> Then there's the new:
>
> CONFIG_NO_HZ_FULL - this enables CONFIG_NO_HZ_IDLE plus adds the new
> feature with disabling the tick when only one task is running on a given
> CPU.

Thanks and some more explanation in below documents.
Documentation/timers/NO_HZ.txt
Documentation/timers/highres.txt
>
>
>> > latency from the kernel! That is, if the task is doing some complex
>> > calculations, it wont be interrupted for kernel maintenance. A lot of
>> > Red Hat customers would love to have this feature. It allows for
>> > extremely low latency actions even without a real-time kernel. Heck, it
>> > works without even kernel preemption.
>> >
>> > Now removing the periodic tick is not a trivial task, and this is where
>>
>> You mean getting rid of period ticks in the kernel code is not easy as there
>> are many features such as perf is dependent on it right and that is why
>> instead of completely removing periodic ticks we just set the periodic tick
>> to happen at 1HZ instead of CONFIG_HZ value?
>
> IIRC, the reason for moving to 1 HZ is so that the scheduler doesn't get
> confused with overflows. It still needs to handle time keeping for
"overflows" meaning the tick happening at 1HZ?
However as I see here in this patch http://lwn.net/Articles/549592/ -
you have plans to move it to 0Hz then how does scheduler cope
with this?Does it not need this information to schedule a different
task when the current task on "adaptive-ticks CPU" is done?

Anyway doesn't "future works" should be part of No-hz.txt document?
> managing how to schedule tasks according to CFS.
>
> Everything else shouldn't depend on the tick... period.
>
> -- Steve
>
>> > all our issues come from. In fact, we can not even completely remove the
>> > tick yet, we just move it to 1 HZ instead of whatever the CONFIG_HZ is
>> > set to. We have to handle everything that depends on that tick, which
>> > includes perf, among other things.
>> >
>
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] watchdog: Boot-disable by default on full dynticks
  2013-06-14  4:17                       ` anish singh
@ 2013-06-14 12:26                         ` Paul E. McKenney
  2013-06-14 16:03                           ` anish singh
  0 siblings, 1 reply; 27+ messages in thread
From: Paul E. McKenney @ 2013-06-14 12:26 UTC (permalink / raw)
  To: anish singh
  Cc: Steven Rostedt, Don Zickus, Frederic Weisbecker, Peter Zijlstra,
	LKML, Ingo Molnar, Andrew Morton, Thomas Gleixner, Li Zhong,
	Srivatsa S. Bhat

On Fri, Jun 14, 2013 at 09:47:31AM +0530, anish singh wrote:
> On Thu, Jun 13, 2013 at 10:46 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> > On Thu, 2013-06-13 at 21:51 +0530, anish singh wrote:
> >
> >> > The concept behind full dynamic ticks is very easy. When you set a given
> >> > CPU(s) to dynamic tick, when it only has a single task scheduled on that
> >> > CPU, it disables the periodic tick. This removes essentially *all*
> >>
> >> Documentation/timers/highres.txt states that
> >> hrtimer_stop_sched_tick() is called when a CPU goes into idle state.Would
> >> you mind elaborating "single task scheduled on that CPU"?
> >> I am bit new to this so please excuse me if the question is too basic.
> >
> > That's the old CONFIG_NO_HZ, which only disables the tick on idle. What
> > we are working on is to also disable the tick when there's only one task
> > running on a given CPU. Why have as schedule tick when there's nothing
> > to schedule?
> >
> > 3.10 now has new config options:
> >
> > CONFIG_NO_HZ_PERIODIC - which is NO_HZ disabled
> >    (the old # CONFIG_NO_HZ not set)
> >
> > CONFIG_NO_HZ_IDLE - which is what CONFIG_NO_HZ use to be.
> >
> > Note, CONFIG_NO_HZ still exists and if set, will make CONFIG_NO_HZ_IDLE
> > the default. This was to keep the same configuration for people who
> > update their kernel and run make oldconfig.
> >
> > Then there's the new:
> >
> > CONFIG_NO_HZ_FULL - this enables CONFIG_NO_HZ_IDLE plus adds the new
> > feature with disabling the tick when only one task is running on a given
> > CPU.
> 
> Thanks and some more explanation in below documents.
> Documentation/timers/NO_HZ.txt
> Documentation/timers/highres.txt
> >
> >
> >> > latency from the kernel! That is, if the task is doing some complex
> >> > calculations, it wont be interrupted for kernel maintenance. A lot of
> >> > Red Hat customers would love to have this feature. It allows for
> >> > extremely low latency actions even without a real-time kernel. Heck, it
> >> > works without even kernel preemption.
> >> >
> >> > Now removing the periodic tick is not a trivial task, and this is where
> >>
> >> You mean getting rid of period ticks in the kernel code is not easy as there
> >> are many features such as perf is dependent on it right and that is why
> >> instead of completely removing periodic ticks we just set the periodic tick
> >> to happen at 1HZ instead of CONFIG_HZ value?
> >
> > IIRC, the reason for moving to 1 HZ is so that the scheduler doesn't get
> > confused with overflows. It still needs to handle time keeping for
> "overflows" meaning the tick happening at 1HZ?
> However as I see here in this patch http://lwn.net/Articles/549592/ -
> you have plans to move it to 0Hz then how does scheduler cope
> with this?Does it not need this information to schedule a different
> task when the current task on "adaptive-ticks CPU" is done?

When the current task completed, it will enter the kernel, allowing
the scheduler to take over.

If a second task awakens or is created, there will either be some sort
of interrupt to the CPU itself, or to some other CPU, and this other
CPU will IPI the first CPU.  Either way, the scheduler gains control
when it needs it -- but avoids continually interrupting the task.

> Anyway doesn't "future works" should be part of No-hz.txt document?

It does, please see the very last bullet of the document:

o	Some process-handling operations still require the occasional
	scheduling-clock tick.	These operations include calculating CPU
	load, maintaining sched average, computing CFS entity vruntime,
	computing avenrun, and carrying out load balancing.  They are
	currently accommodated by scheduling-clock tick every second
	or so.	On-going work will eliminate the need even for these
	infrequent scheduling-clock ticks.

Here, "the occasional scheduling-clock tick" is the 1Hz tick that
Steven mentioned.

							Thanx, Paul

> > managing how to schedule tasks according to CFS.
> >
> > Everything else shouldn't depend on the tick... period.
> >
> > -- Steve
> >
> >> > all our issues come from. In fact, we can not even completely remove the
> >> > tick yet, we just move it to 1 HZ instead of whatever the CONFIG_HZ is
> >> > set to. We have to handle everything that depends on that tick, which
> >> > includes perf, among other things.
> >> >
> >
> >
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] watchdog: Boot-disable by default on full dynticks
  2013-06-13 15:48                 ` Steven Rostedt
  2013-06-13 16:21                   ` anish singh
@ 2013-06-14 13:49                   ` Don Zickus
  2013-06-14 15:35                     ` Steven Rostedt
  1 sibling, 1 reply; 27+ messages in thread
From: Don Zickus @ 2013-06-14 13:49 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Frederic Weisbecker, Peter Zijlstra, LKML, Paul E. McKenney,
	Ingo Molnar, Andrew Morton, Thomas Gleixner, Li Zhong,
	Srivatsa S. Bhat, Anish Singh

On Thu, Jun 13, 2013 at 11:48:11AM -0400, Steven Rostedt wrote:
> On Thu, 2013-06-13 at 11:20 -0400, Don Zickus wrote:
> 
> > I don't know enough about how full dynticks work to even present a
> > solution.  But currently I was working with the Red Hat performance team
> > to enhance perf to help our customers diagnose performance problems
> > easier.
> > 
> > My fear is anyone who uses full dynticks and has issues, can't use perf to
> > help diagnose their problems because it will change the dynamics of the
> > problem.  And with the current huge drop in performance in cpu_idle (as
> > compared to RHEL-6's 2.6.32 kernel) due to what seems to be miscalculated
> > c-states, one might have a hard time evaluating if full dynticks is doing
> > the right thing or not.
> 
> This needs to be fixed, but not for 3.11. Although, you can still use
> ftrace to diagnose it.

Ok.  At least we both agree it shouldn't stay like this and needs fixing.

> 
> > 
> > Then again perhaps full dynticks isn't useful for distros like RHEL.
> 
> It will be very useful for RHEL. But its still very new, and I wouldn't
> recommend using it in a production environment yet. There's still a few
> issues that need to be worked out, including this one. When the issues
> are fixed, then RHEL and other distributions will definitely want to
> enable this.
> 
> > 
> > That's why I was hoping to solve the underlying problem as opposed to
> > accepting patches like this which work around the symptoms.
> 
> For now it's just to get things working as people expect it to. First
> impressions are very important, and if someone enables it and sees it
> makes no difference, they may from then on never trust it. The way to
> handle that is to make sure it works when enabled, even if it disables
> some other cool features. But as I said, it shouldn't be used in
> production quite yet.
> 
> > 
> > Again, my knowledge of full dynticks is poor, so I have almost no idea of
> > the complexities surrounding the problem and how hard it is to even solve
> > it.
> 
> The concept behind full dynamic ticks is very easy. When you set a given
> CPU(s) to dynamic tick, when it only has a single task scheduled on that
> CPU, it disables the periodic tick. This removes essentially *all*
> latency from the kernel! That is, if the task is doing some complex

Including SMMi latency? ;-)

> calculations, it wont be interrupted for kernel maintenance. A lot of
> Red Hat customers would love to have this feature. It allows for
> extremely low latency actions even without a real-time kernel. Heck, it
> works without even kernel preemption.

Interesting.

> 
> Now removing the periodic tick is not a trivial task, and this is where
> all our issues come from. In fact, we can not even completely remove the
> tick yet, we just move it to 1 HZ instead of whatever the CONFIG_HZ is
> set to. We have to handle everything that depends on that tick, which
> includes perf, among other things.

Which part of perf is dependent on the tick?  Just curious.

Cheers,
Don

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] watchdog: Boot-disable by default on full dynticks
  2013-06-14 13:49                   ` Don Zickus
@ 2013-06-14 15:35                     ` Steven Rostedt
  0 siblings, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2013-06-14 15:35 UTC (permalink / raw)
  To: Don Zickus
  Cc: Frederic Weisbecker, Peter Zijlstra, LKML, Paul E. McKenney,
	Ingo Molnar, Andrew Morton, Thomas Gleixner, Li Zhong,
	Srivatsa S. Bhat, Anish Singh

On Fri, 2013-06-14 at 09:49 -0400, Don Zickus wrote:
>  
> > The concept behind full dynamic ticks is very easy. When you set a given
> > CPU(s) to dynamic tick, when it only has a single task scheduled on that
> > CPU, it disables the periodic tick. This removes essentially *all*
> > latency from the kernel! That is, if the task is doing some complex
> 
> Including SMMi latency? ;-)

When you have SMI latencies, it's time to bitch at your vendor, not
us. ;-)


> > 
> > Now removing the periodic tick is not a trivial task, and this is where
> > all our issues come from. In fact, we can not even completely remove the
> > tick yet, we just move it to 1 HZ instead of whatever the CONFIG_HZ is
> > set to. We have to handle everything that depends on that tick, which
> > includes perf, among other things.
> 
> Which part of perf is dependent on the tick?  Just curious.

I'm not the one to answer this question, but it seems that it uses the
tick whenever perf is active. I had to disable watchdog when dynamic
tick was configured because it would permanently disable dynamic ticks
without letting the user know why. This is because watchdog uses perf
and enables it on boot and keeps it enabled.

-- Steve



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] watchdog: Boot-disable by default on full dynticks
  2013-06-14 12:26                         ` Paul E. McKenney
@ 2013-06-14 16:03                           ` anish singh
  2013-06-14 16:12                             ` Steven Rostedt
  0 siblings, 1 reply; 27+ messages in thread
From: anish singh @ 2013-06-14 16:03 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Steven Rostedt, Don Zickus, Frederic Weisbecker, Peter Zijlstra,
	LKML, Ingo Molnar, Andrew Morton, Thomas Gleixner, Li Zhong,
	Srivatsa S. Bhat

Thanks Paul & Steve for replying.
On Fri, Jun 14, 2013 at 5:56 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Fri, Jun 14, 2013 at 09:47:31AM +0530, anish singh wrote:
>> On Thu, Jun 13, 2013 at 10:46 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
>> > On Thu, 2013-06-13 at 21:51 +0530, anish singh wrote:
>> >
>> >> > The concept behind full dynamic ticks is very easy. When you set a given
>> >> > CPU(s) to dynamic tick, when it only has a single task scheduled on that
>> >> > CPU, it disables the periodic tick. This removes essentially *all*
>> >>
>> >> Documentation/timers/highres.txt states that
>> >> hrtimer_stop_sched_tick() is called when a CPU goes into idle state.Would
>> >> you mind elaborating "single task scheduled on that CPU"?
>> >> I am bit new to this so please excuse me if the question is too basic.
>> >
>> > That's the old CONFIG_NO_HZ, which only disables the tick on idle. What
>> > we are working on is to also disable the tick when there's only one task
>> > running on a given CPU. Why have as schedule tick when there's nothing
>> > to schedule?
>> >
>> > 3.10 now has new config options:
>> >
>> > CONFIG_NO_HZ_PERIODIC - which is NO_HZ disabled
>> >    (the old # CONFIG_NO_HZ not set)
>> >
>> > CONFIG_NO_HZ_IDLE - which is what CONFIG_NO_HZ use to be.
>> >
>> > Note, CONFIG_NO_HZ still exists and if set, will make CONFIG_NO_HZ_IDLE
>> > the default. This was to keep the same configuration for people who
>> > update their kernel and run make oldconfig.
>> >
>> > Then there's the new:
>> >
>> > CONFIG_NO_HZ_FULL - this enables CONFIG_NO_HZ_IDLE plus adds the new
>> > feature with disabling the tick when only one task is running on a given
>> > CPU.
>>
>> Thanks and some more explanation in below documents.
>> Documentation/timers/NO_HZ.txt
>> Documentation/timers/highres.txt
>> >
>> >
>> >> > latency from the kernel! That is, if the task is doing some complex
>> >> > calculations, it wont be interrupted for kernel maintenance. A lot of
>> >> > Red Hat customers would love to have this feature. It allows for
>> >> > extremely low latency actions even without a real-time kernel. Heck, it
>> >> > works without even kernel preemption.
>> >> >
>> >> > Now removing the periodic tick is not a trivial task, and this is where
>> >>
>> >> You mean getting rid of period ticks in the kernel code is not easy as there
>> >> are many features such as perf is dependent on it right and that is why
>> >> instead of completely removing periodic ticks we just set the periodic tick
>> >> to happen at 1HZ instead of CONFIG_HZ value?
>> >
>> > IIRC, the reason for moving to 1 HZ is so that the scheduler doesn't get
>> > confused with overflows. It still needs to handle time keeping for
>> "overflows" meaning the tick happening at 1HZ?
>> However as I see here in this patch http://lwn.net/Articles/549592/ -
>> you have plans to move it to 0Hz then how does scheduler cope
>> with this?Does it not need this information to schedule a different
>> task when the current task on "adaptive-ticks CPU" is done?
>
> When the current task completed, it will enter the kernel, allowing
> the scheduler to take over.
>
> If a second task awakens or is created, there will either be some sort
> of interrupt to the CPU itself, or to some other CPU, and this other
> CPU will IPI the first CPU.  Either way, the scheduler gains control
> when it needs it -- but avoids continually interrupting the task.
>
>> Anyway doesn't "future works" should be part of No-hz.txt document?
>
> It does, please see the very last bullet of the document:
>
> o       Some process-handling operations still require the occasional
>         scheduling-clock tick.  These operations include calculating CPU
>         load, maintaining sched average, computing CFS entity vruntime,
>         computing avenrun, and carrying out load balancing.  They are
>         currently accommodated by scheduling-clock tick every second
>         or so.  On-going work will eliminate the need even for these
>         infrequent scheduling-clock ticks.
>
> Here, "the occasional scheduling-clock tick" is the 1Hz tick that

May I know why we zeroed in on 1Hz? Is there any logical reason
or just because it is above 0Hz?
>
>                                                         Thanx, Paul
>
>> > managing how to schedule tasks according to CFS.
>> >
>> > Everything else shouldn't depend on the tick... period.
>> >
>> > -- Steve
>> >
>> >> > all our issues come from. In fact, we can not even completely remove the
>> >> > tick yet, we just move it to 1 HZ instead of whatever the CONFIG_HZ is
>> >> > set to. We have to handle everything that depends on that tick, which
>> >> > includes perf, among other things.
>> >> >
>> >
>> >
>>
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] watchdog: Boot-disable by default on full dynticks
  2013-06-14 16:03                           ` anish singh
@ 2013-06-14 16:12                             ` Steven Rostedt
  2013-06-14 16:22                               ` anish singh
  0 siblings, 1 reply; 27+ messages in thread
From: Steven Rostedt @ 2013-06-14 16:12 UTC (permalink / raw)
  To: anish singh
  Cc: Paul McKenney, Don Zickus, Frederic Weisbecker, Peter Zijlstra,
	LKML, Ingo Molnar, Andrew Morton, Thomas Gleixner, Li Zhong,
	Srivatsa S. Bhat

On Fri, 2013-06-14 at 21:33 +0530, anish singh wrote:

> May I know why we zeroed in on 1Hz? Is there any logical reason
> or just because it is above 0Hz?
> >

We had to keep a tick. What number would you have picked?

-- Steve



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] watchdog: Boot-disable by default on full dynticks
  2013-06-14 16:12                             ` Steven Rostedt
@ 2013-06-14 16:22                               ` anish singh
  0 siblings, 0 replies; 27+ messages in thread
From: anish singh @ 2013-06-14 16:22 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Paul McKenney, Don Zickus, Frederic Weisbecker, Peter Zijlstra,
	LKML, Ingo Molnar, Andrew Morton, Thomas Gleixner, Li Zhong,
	Srivatsa S. Bhat

On Fri, Jun 14, 2013 at 9:42 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Fri, 2013-06-14 at 21:33 +0530, anish singh wrote:
>
>> May I know why we zeroed in on 1Hz? Is there any logical reason
>> or just because it is above 0Hz?
>> >
>
> We had to keep a tick. What number would you have picked?

Great.I will just confirming my understanding.Thanks.
>
> -- Steve
>
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] watchdog: Boot-disable by default on full dynticks
  2013-06-13 14:02       ` Don Zickus
  2013-06-13 14:22         ` Frederic Weisbecker
@ 2013-06-18 10:36         ` Peter Zijlstra
  2013-06-18 12:04           ` Frederic Weisbecker
  1 sibling, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2013-06-18 10:36 UTC (permalink / raw)
  To: Don Zickus
  Cc: Frederic Weisbecker, LKML, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Andrew Morton, Thomas Gleixner, Li Zhong,
	Srivatsa S. Bhat, Anish Singh, eranian

On Thu, Jun 13, 2013 at 10:02:07AM -0400, Don Zickus wrote:
> On Thu, Jun 13, 2013 at 03:10:59PM +0200, Frederic Weisbecker wrote:
> > On Wed, Jun 12, 2013 at 01:03:16PM -0400, Don Zickus wrote:
> > > On Wed, Jun 12, 2013 at 04:02:36PM +0200, Frederic Weisbecker wrote:
> > > > When the watchdog runs, it prevents the full dynticks
> > > > CPUs from stopping their tick because the hard lockup
> > > > detector uses perf events internally, which in turn
> > > > rely on the periodic tick.
> > > > 
> > > > Since this is a rather confusing behaviour that is not
> > > > easy to track down and identify for those who want to
> > > > test CONFIG_NO_HZ_FULL, let's default disable the
> > > > watchdog on boot time when full dynticks is enabled.
> > > > 
> > > > The user can still enable it later on runtime using
> > > > proc or sysctl.
> > > 
> > > I thought we had a conversation awhile ago, where we agreed this was going
> > > to be fixed for 3.11?  Didn't Peter find the patch and apply it to his
> > > tree?  I am confused why this is still needed?
> > 
> > We agreed on the patch but it hasn't been applied yet. I'm trying to get
> > a sane series of nohz patches before sending to Ingo.
> 
> Peter,
> 
> Where is this patch?
> 

9e6302056f8029f438e853432a856b9f13de26a6
62b8563979273424d6ebe9201e34d1acc133ad4f

made it in, it does need a little additional work; something like the
below might be a start:

Aside from doing the NOHZ thing there's also a partial fix for event migration
since I now noticed we have per-cpu counters for various event types.

Very much needs more work but shows what would be needed

---
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -139,8 +139,11 @@ enum event_type_t {
  * perf_cgroup_events: >0 per-cpu cgroup events exist on this cpu
  */
 struct static_key_deferred perf_sched_events __read_mostly;
+
+/* do we really need the atomic thing? */
 static DEFINE_PER_CPU(atomic_t, perf_cgroup_events);
 static DEFINE_PER_CPU(atomic_t, perf_branch_stack_events);
+static DEFINE_PER_CPU(atomic_t, perf_freq_events);
 
 static atomic_t nr_mmap_events __read_mostly;
 static atomic_t nr_comm_events __read_mostly;
@@ -2791,6 +2794,17 @@ bool perf_event_can_stop_tick(void)
 }
 #endif
 
+void perf_kick_nohz_cpu(void)
+{
+	/* magic to drop out of NOHZ mode */
+}
+
+bool perf_event_needs_cpu(void)
+{
+	return atomic_read(&__get_cpu_var(perf_freq_events)) || 
+	       __this_cpu_read(perf_throttled_count);
+}
+
 void perf_event_task_tick(void)
 {
 	struct list_head *head = &__get_cpu_var(rotation_list);
@@ -3101,6 +3115,28 @@ static void free_event_rcu(struct rcu_he
 static void ring_buffer_put(struct ring_buffer *rb);
 static void ring_buffer_detach(struct perf_event *event, struct ring_buffer *rb);
 
+static void __account_event(struct perf_event *event)
+{
+	if (is_cgroup_event(event))
+		atomic_inc(&per_cpu(perf_cgroup_events, event->cpu));
+	if (has_branch_stack(event) && !(event->attach_state & PERF_ATTACH_TASK))
+		atomic_inc(&per_cpu(perf_branch_stack_events, event->cpu));
+	if (event->attr.freq) {
+		atomic_inc(&per_cpu(perf_freq_events, event->cpu));
+		perf_kick_nohz_cpu();
+	}
+}
+
+static void __unaccount_event(struct perf_event *event)
+{
+	if (is_cgroup_event(event))
+		atomic_dec(&per_cpu(perf_cgroup_events, event->cpu));
+	if (has_branch_stack(event) && !(event->attach_state & PERF_ATTACH_TASK))
+		atomic_dec(&per_cpu(perf_branch_stack_events, event->cpu));
+	if (event->attr.freq)
+		atomic_dec(&per_cpu(perf_freq_events, event->cpu));
+}
+
 static void free_event(struct perf_event *event)
 {
 	irq_work_sync(&event->pending);
@@ -3116,19 +3152,12 @@ static void free_event(struct perf_event
 			atomic_dec(&nr_task_events);
 		if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
 			put_callchain_buffers();
-		if (is_cgroup_event(event)) {
-			atomic_dec(&per_cpu(perf_cgroup_events, event->cpu));
+		if (is_cgroup_event(event))
 			static_key_slow_dec_deferred(&perf_sched_events);
-		}
-
-		if (has_branch_stack(event)) {
+		if (has_branch_stack(event))
 			static_key_slow_dec_deferred(&perf_sched_events);
-			/* is system-wide event */
-			if (!(event->attach_state & PERF_ATTACH_TASK)) {
-				atomic_dec(&per_cpu(perf_branch_stack_events,
-						    event->cpu));
-			}
-		}
+
+		__unaccount_event(event);
 	}
 
 	if (event->rb) {
@@ -5149,6 +5178,7 @@ static int __perf_event_overflow(struct
 		if (unlikely(throttle
 			     && hwc->interrupts >= max_samples_per_tick)) {
 			__this_cpu_inc(perf_throttled_count);
+			perf_kick_nohz_cpu(); /* XXX we're in NMI context */
 			hwc->interrupts = MAX_INTERRUPTS;
 			perf_log_throttle(event, 0);
 			ret = 1;
@@ -6544,6 +6574,7 @@ perf_event_alloc(struct perf_event_attr
 			atomic_inc(&nr_task_events);
 		if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) {
 			err = get_callchain_buffers();
+			/* XXX this error patch leaks the above counts */
 			if (err) {
 				free_event(event);
 				return ERR_PTR(err);
@@ -6845,11 +6876,20 @@ SYSCALL_DEFINE5(perf_event_open,
 		 * one more event:
 		 * - that has cgroup constraint on event->cpu
 		 * - that may need work on context switch
+		 *
+		 * assumes inherited counters don't have flags set.
 		 */
-		atomic_inc(&per_cpu(perf_cgroup_events, event->cpu));
 		static_key_slow_inc(&perf_sched_events.key);
 	}
 
+	if (!event->parent) {
+		/* 
+		 * XXX the above err_alloc will dec below zero for
+		 * perf_branch_stack_events.
+		 */ 
+		__account_event(event);
+	}
+
 	/*
 	 * Special case software events and allow them to be part of
 	 * any hardware group.
@@ -7083,6 +7123,7 @@ void perf_pmu_migrate_context(struct pmu
 		perf_remove_from_context(event);
 		put_ctx(src_ctx);
 		list_add(&event->event_entry, &events);
+		__unaccount_event(event);
 	}
 	mutex_unlock(&src_ctx->mutex);
 
@@ -7091,6 +7132,8 @@ void perf_pmu_migrate_context(struct pmu
 	mutex_lock(&dst_ctx->mutex);
 	list_for_each_entry_safe(event, tmp, &events, event_entry) {
 		list_del(&event->event_entry);
+		event->cpu = dst_cpu; /* XXX perf_install_in_context already does this; do we really need __account_event() _before_ that? */
+		__account_event(event);
 		if (event->state >= PERF_EVENT_STATE_OFF)
 			event->state = PERF_EVENT_STATE_INACTIVE;
 		perf_install_in_context(dst_ctx, event, dst_cpu);


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] watchdog: Boot-disable by default on full dynticks
  2013-06-18 10:36         ` Peter Zijlstra
@ 2013-06-18 12:04           ` Frederic Weisbecker
  2013-06-18 12:53             ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Frederic Weisbecker @ 2013-06-18 12:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Don Zickus, LKML, Steven Rostedt, Paul E. McKenney, Ingo Molnar,
	Andrew Morton, Thomas Gleixner, Li Zhong, Srivatsa S. Bhat,
	Anish Singh, eranian

On Tue, Jun 18, 2013 at 12:36:32PM +0200, Peter Zijlstra wrote:
> On Thu, Jun 13, 2013 at 10:02:07AM -0400, Don Zickus wrote:
> > On Thu, Jun 13, 2013 at 03:10:59PM +0200, Frederic Weisbecker wrote:
> > > On Wed, Jun 12, 2013 at 01:03:16PM -0400, Don Zickus wrote:
> > > > On Wed, Jun 12, 2013 at 04:02:36PM +0200, Frederic Weisbecker wrote:
> > > > > When the watchdog runs, it prevents the full dynticks
> > > > > CPUs from stopping their tick because the hard lockup
> > > > > detector uses perf events internally, which in turn
> > > > > rely on the periodic tick.
> > > > > 
> > > > > Since this is a rather confusing behaviour that is not
> > > > > easy to track down and identify for those who want to
> > > > > test CONFIG_NO_HZ_FULL, let's default disable the
> > > > > watchdog on boot time when full dynticks is enabled.
> > > > > 
> > > > > The user can still enable it later on runtime using
> > > > > proc or sysctl.
> > > > 
> > > > I thought we had a conversation awhile ago, where we agreed this was going
> > > > to be fixed for 3.11?  Didn't Peter find the patch and apply it to his
> > > > tree?  I am confused why this is still needed?
> > > 
> > > We agreed on the patch but it hasn't been applied yet. I'm trying to get
> > > a sane series of nohz patches before sending to Ingo.
> > 
> > Peter,
> > 
> > Where is this patch?
> > 
> 
> 9e6302056f8029f438e853432a856b9f13de26a6
> 62b8563979273424d6ebe9201e34d1acc133ad4f
> 
> made it in, it does need a little additional work; something like the
> below might be a start:
> 
> Aside from doing the NOHZ thing there's also a partial fix for event migration
> since I now noticed we have per-cpu counters for various event types.
> 
> Very much needs more work but shows what would be needed

I see. So what do you think, should I push my patch that deactivates
the watchdog to Ingo until this below patch gets ready? It would be nice to
have a solution for 3.11.

If you prefer I can work on getting your below proposal into shape. I'm on
vacancies next week but I can try to get it ready until the next merge window
closes.

Thanks.

> 
> ---
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -139,8 +139,11 @@ enum event_type_t {
>   * perf_cgroup_events: >0 per-cpu cgroup events exist on this cpu
>   */
>  struct static_key_deferred perf_sched_events __read_mostly;
> +
> +/* do we really need the atomic thing? */
>  static DEFINE_PER_CPU(atomic_t, perf_cgroup_events);
>  static DEFINE_PER_CPU(atomic_t, perf_branch_stack_events);
> +static DEFINE_PER_CPU(atomic_t, perf_freq_events);
>  
>  static atomic_t nr_mmap_events __read_mostly;
>  static atomic_t nr_comm_events __read_mostly;
> @@ -2791,6 +2794,17 @@ bool perf_event_can_stop_tick(void)
>  }
>  #endif
>  
> +void perf_kick_nohz_cpu(void)
> +{
> +	/* magic to drop out of NOHZ mode */
> +}
> +
> +bool perf_event_needs_cpu(void)
> +{
> +	return atomic_read(&__get_cpu_var(perf_freq_events)) || 
> +	       __this_cpu_read(perf_throttled_count);
> +}
> +
>  void perf_event_task_tick(void)
>  {
>  	struct list_head *head = &__get_cpu_var(rotation_list);
> @@ -3101,6 +3115,28 @@ static void free_event_rcu(struct rcu_he
>  static void ring_buffer_put(struct ring_buffer *rb);
>  static void ring_buffer_detach(struct perf_event *event, struct ring_buffer *rb);
>  
> +static void __account_event(struct perf_event *event)
> +{
> +	if (is_cgroup_event(event))
> +		atomic_inc(&per_cpu(perf_cgroup_events, event->cpu));
> +	if (has_branch_stack(event) && !(event->attach_state & PERF_ATTACH_TASK))
> +		atomic_inc(&per_cpu(perf_branch_stack_events, event->cpu));
> +	if (event->attr.freq) {
> +		atomic_inc(&per_cpu(perf_freq_events, event->cpu));
> +		perf_kick_nohz_cpu();
> +	}
> +}
> +
> +static void __unaccount_event(struct perf_event *event)
> +{
> +	if (is_cgroup_event(event))
> +		atomic_dec(&per_cpu(perf_cgroup_events, event->cpu));
> +	if (has_branch_stack(event) && !(event->attach_state & PERF_ATTACH_TASK))
> +		atomic_dec(&per_cpu(perf_branch_stack_events, event->cpu));
> +	if (event->attr.freq)
> +		atomic_dec(&per_cpu(perf_freq_events, event->cpu));
> +}
> +
>  static void free_event(struct perf_event *event)
>  {
>  	irq_work_sync(&event->pending);
> @@ -3116,19 +3152,12 @@ static void free_event(struct perf_event
>  			atomic_dec(&nr_task_events);
>  		if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
>  			put_callchain_buffers();
> -		if (is_cgroup_event(event)) {
> -			atomic_dec(&per_cpu(perf_cgroup_events, event->cpu));
> +		if (is_cgroup_event(event))
>  			static_key_slow_dec_deferred(&perf_sched_events);
> -		}
> -
> -		if (has_branch_stack(event)) {
> +		if (has_branch_stack(event))
>  			static_key_slow_dec_deferred(&perf_sched_events);
> -			/* is system-wide event */
> -			if (!(event->attach_state & PERF_ATTACH_TASK)) {
> -				atomic_dec(&per_cpu(perf_branch_stack_events,
> -						    event->cpu));
> -			}
> -		}
> +
> +		__unaccount_event(event);
>  	}
>  
>  	if (event->rb) {
> @@ -5149,6 +5178,7 @@ static int __perf_event_overflow(struct
>  		if (unlikely(throttle
>  			     && hwc->interrupts >= max_samples_per_tick)) {
>  			__this_cpu_inc(perf_throttled_count);
> +			perf_kick_nohz_cpu(); /* XXX we're in NMI context */
>  			hwc->interrupts = MAX_INTERRUPTS;
>  			perf_log_throttle(event, 0);
>  			ret = 1;
> @@ -6544,6 +6574,7 @@ perf_event_alloc(struct perf_event_attr
>  			atomic_inc(&nr_task_events);
>  		if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) {
>  			err = get_callchain_buffers();
> +			/* XXX this error patch leaks the above counts */
>  			if (err) {
>  				free_event(event);
>  				return ERR_PTR(err);
> @@ -6845,11 +6876,20 @@ SYSCALL_DEFINE5(perf_event_open,
>  		 * one more event:
>  		 * - that has cgroup constraint on event->cpu
>  		 * - that may need work on context switch
> +		 *
> +		 * assumes inherited counters don't have flags set.
>  		 */
> -		atomic_inc(&per_cpu(perf_cgroup_events, event->cpu));
>  		static_key_slow_inc(&perf_sched_events.key);
>  	}
>  
> +	if (!event->parent) {
> +		/* 
> +		 * XXX the above err_alloc will dec below zero for
> +		 * perf_branch_stack_events.
> +		 */ 
> +		__account_event(event);
> +	}
> +
>  	/*
>  	 * Special case software events and allow them to be part of
>  	 * any hardware group.
> @@ -7083,6 +7123,7 @@ void perf_pmu_migrate_context(struct pmu
>  		perf_remove_from_context(event);
>  		put_ctx(src_ctx);
>  		list_add(&event->event_entry, &events);
> +		__unaccount_event(event);
>  	}
>  	mutex_unlock(&src_ctx->mutex);
>  
> @@ -7091,6 +7132,8 @@ void perf_pmu_migrate_context(struct pmu
>  	mutex_lock(&dst_ctx->mutex);
>  	list_for_each_entry_safe(event, tmp, &events, event_entry) {
>  		list_del(&event->event_entry);
> +		event->cpu = dst_cpu; /* XXX perf_install_in_context already does this; do we really need __account_event() _before_ that? */
> +		__account_event(event);
>  		if (event->state >= PERF_EVENT_STATE_OFF)
>  			event->state = PERF_EVENT_STATE_INACTIVE;
>  		perf_install_in_context(dst_ctx, event, dst_cpu);
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] watchdog: Boot-disable by default on full dynticks
  2013-06-18 12:04           ` Frederic Weisbecker
@ 2013-06-18 12:53             ` Peter Zijlstra
  0 siblings, 0 replies; 27+ messages in thread
From: Peter Zijlstra @ 2013-06-18 12:53 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Don Zickus, LKML, Steven Rostedt, Paul E. McKenney, Ingo Molnar,
	Andrew Morton, Thomas Gleixner, Li Zhong, Srivatsa S. Bhat,
	Anish Singh, eranian

On Tue, Jun 18, 2013 at 02:04:38PM +0200, Frederic Weisbecker wrote:
> I see. So what do you think, should I push my patch that deactivates
> the watchdog to Ingo until this below patch gets ready? It would be nice to
> have a solution for 3.11.

I don't care too much either way.

> If you prefer I can work on getting your below proposal into shape. I'm on
> vacancies next week but I can try to get it ready until the next merge window
> closes.

Right, except for some 'fun' details the proposed patch isn't actually
that hard. Enjoy your holidays.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2013-06-18 12:53 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-12 14:02 [PATCH 0/6] timers nohz updates preview for 3.11 Frederic Weisbecker
2013-06-12 14:02 ` [PATCH 1/6] sched: Disable lb_bias feature for full dynticks Frederic Weisbecker
2013-06-12 14:02 ` [PATCH 2/6] nohz: Warn if the machine can not perform nohz_full Frederic Weisbecker
2013-06-12 14:02 ` [PATCH 3/6] watchdog: Register / unregister watchdog kthreads on sysctl control Frederic Weisbecker
2013-06-12 14:02 ` [PATCH 4/6] watchdog: Boot-disable by default on full dynticks Frederic Weisbecker
2013-06-12 17:03   ` Don Zickus
2013-06-13 13:10     ` Frederic Weisbecker
2013-06-13 14:02       ` Don Zickus
2013-06-13 14:22         ` Frederic Weisbecker
2013-06-13 14:45           ` Don Zickus
2013-06-13 14:56             ` Frederic Weisbecker
2013-06-13 15:20               ` Don Zickus
2013-06-13 15:48                 ` Steven Rostedt
2013-06-13 16:21                   ` anish singh
2013-06-13 17:16                     ` Steven Rostedt
2013-06-14  4:17                       ` anish singh
2013-06-14 12:26                         ` Paul E. McKenney
2013-06-14 16:03                           ` anish singh
2013-06-14 16:12                             ` Steven Rostedt
2013-06-14 16:22                               ` anish singh
2013-06-14 13:49                   ` Don Zickus
2013-06-14 15:35                     ` Steven Rostedt
2013-06-18 10:36         ` Peter Zijlstra
2013-06-18 12:04           ` Frederic Weisbecker
2013-06-18 12:53             ` Peter Zijlstra
2013-06-12 14:02 ` [PATCH 5/6] rcu: Prevent CPU from stopping tick if awaited for quiescent state report Frederic Weisbecker
2013-06-12 14:02 ` [PATCH 6/6] nohz: Remove obsolete check for full dynticks CPUs to be RCU nocbs Frederic Weisbecker

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.