linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs
@ 2012-10-29 20:27 Steven Rostedt
  2012-10-29 20:27 ` [PATCH 01/32] nohz: Move nohz load balancer selection into idle logic Steven Rostedt
                   ` (33 more replies)
  0 siblings, 34 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith

A while ago Frederic posted a series of patches to explore how to
implement nohz cpusets, where you can add a task to a cpuset and
mark the set as 'nohz'. When the task runs on a CPU and is the only
task scheduled there (nr_running == 1), the tick stops.
The idea is to give the task as little kernel interference as
possible. If the task doesn't make any system calls (and possibly
even if it does), no timer interrupt will bother it. By combining
isolcpus and a nohz cpuset, a task would be able to achieve true
CPU isolation.

This has long been asked for by the RT community. If a task
requires uninterrupted CPU time, this patch set would be able to
provide it, even without the full PREEMPT-RT patch set.

This patch set is not for inclusion. It is just to get the topic
at the forefront again. The design requires more work and more
discussion.

I ported Frederic's work to v3.7-rc3 and I'm posting it here so that
people can comment on it. I did only the minimal work needed to get it
to compile and boot; I haven't run any real tests with it yet. I may
have screwed some things up during the port, but that's OK, because
the patch set will most likely require a rewrite anyway.

Please have a look, and let's get this out the door.

-- Steve


Frederic Weisbecker (31):
      nohz: Move nohz load balancer selection into idle logic
      cpuset: Set up interface for nohz flag
      nohz: Try not to give the timekeeping duty to an adaptive tickless cpu
      x86: New cpuset nohz irq vector
      nohz: Adaptive tick stop and restart on nohz cpuset
      nohz/cpuset: Don't turn off the tick if rcu needs it
      nohz/cpuset: Wake up adaptive nohz CPU when a timer gets enqueued
      nohz/cpuset: Don't stop the tick if posix cpu timers are running
      nohz/cpuset: Restart tick when nohz flag is cleared on cpuset
      nohz/cpuset: Restart the tick if printk needs it
      rcu: Restart the tick on non-responding adaptive nohz CPUs
      rcu: Restart tick if we enqueue a callback in a nohz/cpuset CPU
      nohz: Generalize tickless cpu time accounting
      nohz/cpuset: Account user and system times in adaptive nohz mode
      nohz/cpuset: New API to flush cputimes on nohz cpusets
      nohz/cpuset: Flush cputime on threads in nohz cpusets when waiting leader
      nohz/cpuset: Flush cputimes on procfs stat file read
      nohz/cpuset: Flush cputimes for getrusage() and times() syscalls
      x86: Syscall hooks for nohz cpusets
      nohz: Don't restart the tick before scheduling to idle
      sched: Comment on rq->clock correctness in ttwu_do_wakeup() in nohz
      sched: Update rq clock on nohz CPU before migrating tasks
      sched: Update rq clock on nohz CPU before setting fair group shares
      sched: Update rq clock on tickless CPUs before calling check_preempt_curr()
      sched: Update rq clock earlier in unthrottle_cfs_rq
      sched: Update clock of nohz busiest rq before balancing
      sched: Update rq clock before idle balancing
      sched: Update nohz rq clock before searching busiest group on load balancing
      rcu: Switch to extended quiescent state in userspace from nohz cpuset
      nohz/cpuset: Disable under some configs
      nohz, not for merge: Add tickless tracing

Hakan Akkan (1):
      nohz/cpuset: enable addition&removal of cpus while in adaptive nohz mode

----
 arch/Kconfig                       |    3 +
 arch/x86/include/asm/entry_arch.h  |    3 +
 arch/x86/include/asm/hw_irq.h      |    7 +
 arch/x86/include/asm/irq_vectors.h |    2 +
 arch/x86/include/asm/smp.h         |   11 +-
 arch/x86/kernel/entry_64.S         |    4 +
 arch/x86/kernel/irqinit.c          |    4 +
 arch/x86/kernel/ptrace.c           |   11 +
 arch/x86/kernel/smp.c              |   28 +++
 fs/proc/array.c                    |    2 +
 include/linux/cpuset.h             |   35 ++++
 include/linux/kernel_stat.h        |    2 +
 include/linux/posix-timers.h       |    1 +
 include/linux/rcupdate.h           |    1 +
 include/linux/sched.h              |   10 +-
 include/linux/tick.h               |   72 +++++--
 init/Kconfig                       |    8 +
 kernel/cpuset.c                    |  144 ++++++++++++-
 kernel/exit.c                      |    8 +
 kernel/posix-cpu-timers.c          |   12 ++
 kernel/printk.c                    |   15 +-
 kernel/rcutree.c                   |   28 ++-
 kernel/sched/core.c                |   82 +++++++-
 kernel/sched/cputime.c             |   22 ++
 kernel/sched/fair.c                |   41 +++-
 kernel/sched/sched.h               |   18 ++
 kernel/softirq.c                   |    6 +-
 kernel/sys.c                       |    6 +
 kernel/time/tick-sched.c           |  398 ++++++++++++++++++++++++++++++++----
 kernel/time/timer_list.c           |    3 +-
 kernel/timer.c                     |    2 +-
 31 files changed, 912 insertions(+), 77 deletions(-)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 01/32] nohz: Move nohz load balancer selection into idle logic
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-30  8:32   ` Charles Wang
  2012-10-29 20:27 ` [PATCH 02/32] cpuset: Set up interface for nohz flag Steven Rostedt
                   ` (32 subsequent siblings)
  33 siblings, 1 reply; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0001-nohz-Move-nohz-load-balancer-selection-into-idle-log.patch --]
[-- Type: text/plain, Size: 2565 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

[ ** BUGGY PATCH: I need to put more thinking into this ** ]

We want the nohz load balancer to be an idle CPU, so move that
selection into the strict dyntick-idle logic.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
[ added movement of calc_load_exit_idle() ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/time/tick-sched.c |   11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index a402608..d6d16fe 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -372,9 +372,6 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 		 * the scheduler tick in nohz_restart_sched_tick.
 		 */
 		if (!ts->tick_stopped) {
-			nohz_balance_enter_idle(cpu);
-			calc_load_enter_idle();
-
 			ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
 			ts->tick_stopped = 1;
 		}
@@ -466,8 +463,11 @@ static void __tick_nohz_idle_enter(struct tick_sched *ts)
 			ts->idle_expires = expires;
 		}
 
-		if (!was_stopped && ts->tick_stopped)
+		if (!was_stopped && ts->tick_stopped) {
 			ts->idle_jiffies = ts->last_jiffies;
+			nohz_balance_enter_idle(cpu);
+			calc_load_enter_idle();
+		}
 	}
 }
 
@@ -573,7 +573,6 @@ static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
 	tick_do_update_jiffies64(now);
 	update_cpu_load_nohz();
 
-	calc_load_exit_idle();
 	touch_softlockup_watchdog();
 	/*
 	 * Cancel the scheduled timer and restore the tick
@@ -628,6 +627,8 @@ void tick_nohz_idle_exit(void)
 		tick_nohz_stop_idle(cpu, now);
 
 	if (ts->tick_stopped) {
+		nohz_balance_enter_idle(cpu);
+		calc_load_exit_idle();
 		tick_nohz_restart_sched_tick(ts, now);
 		tick_nohz_account_idle_ticks(ts);
 	}
-- 
1.7.10.4




* [PATCH 02/32] cpuset: Set up interface for nohz flag
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
  2012-10-29 20:27 ` [PATCH 01/32] nohz: Move nohz load balancer selection into idle logic Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-30 17:16   ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 03/32] nohz: Try not to give the timekeeping duty to an adaptive tickless cpu Steven Rostedt
                   ` (31 subsequent siblings)
  33 siblings, 1 reply; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0002-cpuset-Set-up-interface-for-nohz-flag.patch --]
[-- Type: text/plain, Size: 6621 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

Prepare the interface to implement the nohz cpuset flag.
Once set, this flag will tell the system to try to shut down
the periodic timer tick when possible.

We use a per-CPU refcounter here. As long as a CPU is contained
in at least one cpuset that has the nohz flag set, it is part of
the set of CPUs that run in adaptive nohz mode.

[ include build fix from Zen Lin ]

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 arch/Kconfig           |    3 +++
 include/linux/cpuset.h |   31 ++++++++++++++++++++++++++++
 init/Kconfig           |    8 ++++++++
 kernel/cpuset.c        |   53 +++++++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 94 insertions(+), 1 deletion(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 366ec06..8e2162f6 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -239,6 +239,9 @@ config HAVE_ARCH_JUMP_LABEL
 	bool
 
 config HAVE_ARCH_MUTEX_CPU_RELAX
+       bool
+
+config HAVE_CPUSETS_NO_HZ
 	bool
 
 config HAVE_RCU_TABLE_FREE
diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 838320f..7e7eb41 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -13,6 +13,7 @@
 #include <linux/nodemask.h>
 #include <linux/cgroup.h>
 #include <linux/mm.h>
+#include <linux/atomic.h>
 
 #ifdef CONFIG_CPUSETS
 
@@ -235,4 +236,34 @@ static inline bool put_mems_allowed(unsigned int seq)
 
 #endif /* !CONFIG_CPUSETS */
 
+#ifdef CONFIG_CPUSETS_NO_HZ
+
+DECLARE_PER_CPU(atomic_t, cpu_adaptive_nohz_ref);
+
+static inline bool cpuset_cpu_adaptive_nohz(int cpu)
+{
+	atomic_t *ref = &per_cpu(cpu_adaptive_nohz_ref, cpu);
+
+	if (atomic_add_return(0, ref) > 0)
+		return true;
+
+	return false;
+}
+
+static inline bool cpuset_adaptive_nohz(void)
+{
+	/*
+	 * We probably want to do atomic_read() when we read
+	 * locally to avoid the overhead of an ordered add.
+	 * For that we have to do the dec of the ref locally as
+	 * well.
+	 */
+	return cpuset_cpu_adaptive_nohz(smp_processor_id());
+}
+#else
+static inline bool cpuset_cpu_adaptive_nohz(int cpu) { return false; }
+static inline bool cpuset_adaptive_nohz(void) { return false; }
+
+#endif /* CONFIG_CPUSETS_NO_HZ */
+
 #endif /* _LINUX_CPUSET_H */
diff --git a/init/Kconfig b/init/Kconfig
index 6fdd6e3..ffdeeab 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -749,6 +749,14 @@ config PROC_PID_CPUSET
 	depends on CPUSETS
 	default y
 
+config CPUSETS_NO_HZ
+       bool "Tickless cpusets"
+       depends on CPUSETS && HAVE_CPUSETS_NO_HZ
+       help
+         This option lets you apply a nohz property to a cpuset such
+	 that the periodic timer tick tries to be avoided when possible on
+	 the concerned CPUs.
+
 config CGROUP_CPUACCT
 	bool "Simple CPU accounting cgroup subsystem"
 	help
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index f33c715..6319d8e 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -145,6 +145,7 @@ typedef enum {
 	CS_SCHED_LOAD_BALANCE,
 	CS_SPREAD_PAGE,
 	CS_SPREAD_SLAB,
+	CS_ADAPTIVE_NOHZ,
 } cpuset_flagbits_t;
 
 /* the type of hotplug event */
@@ -189,6 +190,11 @@ static inline int is_spread_slab(const struct cpuset *cs)
 	return test_bit(CS_SPREAD_SLAB, &cs->flags);
 }
 
+static inline int is_adaptive_nohz(const struct cpuset *cs)
+{
+	return test_bit(CS_ADAPTIVE_NOHZ, &cs->flags);
+}
+
 static struct cpuset top_cpuset = {
 	.flags = ((1 << CS_CPU_EXCLUSIVE) | (1 << CS_MEM_EXCLUSIVE)),
 };
@@ -1190,6 +1196,32 @@ static void cpuset_change_flag(struct task_struct *tsk,
 	cpuset_update_task_spread_flag(cgroup_cs(scan->cg), tsk);
 }
 
+#ifdef CONFIG_CPUSETS_NO_HZ
+
+DEFINE_PER_CPU(atomic_t, cpu_adaptive_nohz_ref);
+
+static void update_nohz_cpus(struct cpuset *old_cs, struct cpuset *cs)
+{
+	int cpu;
+	int val;
+
+	if (is_adaptive_nohz(old_cs) == is_adaptive_nohz(cs))
+		return;
+
+	for_each_cpu(cpu, cs->cpus_allowed) {
+		atomic_t *ref = &per_cpu(cpu_adaptive_nohz_ref, cpu);
+		if (is_adaptive_nohz(cs))
+			atomic_inc(ref);
+		else
+			atomic_dec(ref);
+	}
+}
+#else
+static inline void update_nohz_cpus(struct cpuset *old_cs, struct cpuset *cs)
+{
+}
+#endif
+
 /*
  * update_tasks_flags - update the spread flags of tasks in the cpuset.
  * @cs: the cpuset in which each task's spread flags needs to be changed
@@ -1255,6 +1287,8 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
 	spread_flag_changed = ((is_spread_slab(cs) != is_spread_slab(trialcs))
 			|| (is_spread_page(cs) != is_spread_page(trialcs)));
 
+	update_nohz_cpus(cs, trialcs);
+
 	mutex_lock(&callback_mutex);
 	cs->flags = trialcs->flags;
 	mutex_unlock(&callback_mutex);
@@ -1465,6 +1499,7 @@ typedef enum {
 	FILE_MEMORY_PRESSURE,
 	FILE_SPREAD_PAGE,
 	FILE_SPREAD_SLAB,
+	FILE_ADAPTIVE_NOHZ,
 } cpuset_filetype_t;
 
 static int cpuset_write_u64(struct cgroup *cgrp, struct cftype *cft, u64 val)
@@ -1504,6 +1539,11 @@ static int cpuset_write_u64(struct cgroup *cgrp, struct cftype *cft, u64 val)
 	case FILE_SPREAD_SLAB:
 		retval = update_flag(CS_SPREAD_SLAB, cs, val);
 		break;
+#ifdef CONFIG_CPUSETS_NO_HZ
+	case FILE_ADAPTIVE_NOHZ:
+		retval = update_flag(CS_ADAPTIVE_NOHZ, cs, val);
+		break;
+#endif
 	default:
 		retval = -EINVAL;
 		break;
@@ -1663,6 +1703,10 @@ static u64 cpuset_read_u64(struct cgroup *cont, struct cftype *cft)
 		return is_spread_page(cs);
 	case FILE_SPREAD_SLAB:
 		return is_spread_slab(cs);
+#ifdef CONFIG_CPUSETS_NO_HZ
+	case FILE_ADAPTIVE_NOHZ:
+		return is_adaptive_nohz(cs);
+#endif
 	default:
 		BUG();
 	}
@@ -1771,7 +1815,14 @@ static struct cftype files[] = {
 		.write_u64 = cpuset_write_u64,
 		.private = FILE_SPREAD_SLAB,
 	},
-
+#ifdef CONFIG_CPUSETS_NO_HZ
+	{
+		.name = "adaptive_nohz",
+		.read_u64 = cpuset_read_u64,
+		.write_u64 = cpuset_write_u64,
+		.private = FILE_ADAPTIVE_NOHZ,
+	},
+#endif
 	{
 		.name = "memory_pressure_enabled",
 		.flags = CFTYPE_ONLY_ON_ROOT,
-- 
1.7.10.4




* [PATCH 03/32] nohz: Try not to give the timekeeping duty to an adaptive tickless cpu
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
  2012-10-29 20:27 ` [PATCH 01/32] nohz: Move nohz load balancer selection into idle logic Steven Rostedt
  2012-10-29 20:27 ` [PATCH 02/32] cpuset: Set up interface for nohz flag Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-30 17:33   ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 04/32] x86: New cpuset nohz irq vector Steven Rostedt
                   ` (30 subsequent siblings)
  33 siblings, 1 reply; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0003-nohz-Try-not-to-give-the-timekeeping-duty-to-an-adap.patch --]
[-- Type: text/plain, Size: 3482 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

Try to give the timekeeping duty to a CPU that doesn't belong
to any nohz cpuset when possible, so that we increase the chance
for these nohz cpusets to run their CPUs out of periodic tick
mode.

[TODO: We need to find a way to ensure there is always one non-nohz
running CPU maintaining the timekeeping duty if all non-idle CPUs are
adaptive tickless]

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/tick-sched.c |   52 ++++++++++++++++++++++++++++++++++++----------
 1 file changed, 41 insertions(+), 11 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index d6d16fe..c7a78c6 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -20,6 +20,7 @@
 #include <linux/profile.h>
 #include <linux/sched.h>
 #include <linux/module.h>
+#include <linux/cpuset.h>
 
 #include <asm/irq_regs.h>
 
@@ -789,6 +790,45 @@ void tick_check_idle(int cpu)
 	tick_check_nohz(cpu);
 }
 
+#ifdef CONFIG_CPUSETS_NO_HZ
+
+/*
+ * Take the timer duty if nobody is taking care of it.
+ * If a CPU already does and it's in a nohz cpuset,
+ * then take the charge so that it can switch to nohz mode.
+ */
+static void tick_do_timer_check_handler(int cpu)
+{
+	int handler = tick_do_timer_cpu;
+
+	if (unlikely(handler == TICK_DO_TIMER_NONE)) {
+		tick_do_timer_cpu = cpu;
+	} else {
+		if (!cpuset_adaptive_nohz() &&
+		    cpuset_cpu_adaptive_nohz(handler))
+			tick_do_timer_cpu = cpu;
+	}
+}
+
+#else
+
+static void tick_do_timer_check_handler(int cpu)
+{
+#ifdef CONFIG_NO_HZ
+	/*
+	 * Check if the do_timer duty was dropped. We don't care about
+	 * concurrency: This happens only when the cpu in charge went
+	 * into a long sleep. If two cpus happen to assign themself to
+	 * this duty, then the jiffies update is still serialized by
+	 * xtime_lock.
+	 */
+	if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE))
+		tick_do_timer_cpu = cpu;
+#endif
+}
+
+#endif /* CONFIG_CPUSETS_NO_HZ */
+
 /*
  * High resolution timer specific code
  */
@@ -805,17 +845,7 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
 	ktime_t now = ktime_get();
 	int cpu = smp_processor_id();
 
-#ifdef CONFIG_NO_HZ
-	/*
-	 * Check if the do_timer duty was dropped. We don't care about
-	 * concurrency: This happens only when the cpu in charge went
-	 * into a long sleep. If two cpus happen to assign themself to
-	 * this duty, then the jiffies update is still serialized by
-	 * xtime_lock.
-	 */
-	if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE))
-		tick_do_timer_cpu = cpu;
-#endif
+	tick_do_timer_check_handler(cpu);
 
 	/* Check, if the jiffies need an update */
 	if (tick_do_timer_cpu == cpu)
-- 
1.7.10.4




* [PATCH 04/32] x86: New cpuset nohz irq vector
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (2 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 03/32] nohz: Try not to give the timekeeping duty to an adaptive tickless cpu Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-30 17:39   ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 05/32] nohz: Adaptive tick stop and restart on nohz cpuset Steven Rostedt
                   ` (29 subsequent siblings)
  33 siblings, 1 reply; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0004-x86-New-cpuset-nohz-irq-vector.patch --]
[-- Type: text/plain, Size: 7057 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

We need a way to send an IPI (remote or local) in order to
asynchronously restart the tick for CPUs in nohz adaptive mode.

This must be asynchronous so that we can trigger it with irqs
disabled. It must also be usable as a self-IPI, for example in
cases where we want to avoid random deadlock scenarios that could
arise from restarting the tick inline.

This only covers the x86 backend. The core tick restart function
will be defined in a later patch.

[CHECKME: Perhaps we instead need to use irq work for self IPIs.
But we also need a way to send async remote IPIs.]

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/entry_arch.h  |    3 +++
 arch/x86/include/asm/hw_irq.h      |    7 +++++++
 arch/x86/include/asm/irq_vectors.h |    2 ++
 arch/x86/include/asm/smp.h         |   11 ++++++++++-
 arch/x86/kernel/entry_64.S         |    4 ++++
 arch/x86/kernel/irqinit.c          |    4 ++++
 arch/x86/kernel/smp.c              |   24 ++++++++++++++++++++++++
 7 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/entry_arch.h b/arch/x86/include/asm/entry_arch.h
index 40afa00..7e8c38c 100644
--- a/arch/x86/include/asm/entry_arch.h
+++ b/arch/x86/include/asm/entry_arch.h
@@ -10,6 +10,9 @@
  * through the ICC by us (IPIs)
  */
 #ifdef CONFIG_SMP
+#ifdef CONFIG_CPUSETS_NO_HZ
+BUILD_INTERRUPT(cpuset_update_nohz_interrupt,CPUSET_UPDATE_NOHZ_VECTOR)
+#endif
 BUILD_INTERRUPT(reschedule_interrupt,RESCHEDULE_VECTOR)
 BUILD_INTERRUPT(call_function_interrupt,CALL_FUNCTION_VECTOR)
 BUILD_INTERRUPT(call_function_single_interrupt,CALL_FUNCTION_SINGLE_VECTOR)
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index eb92a6e..0d26ed7 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -35,6 +35,10 @@ extern void spurious_interrupt(void);
 extern void thermal_interrupt(void);
 extern void reschedule_interrupt(void);
 
+#ifdef CONFIG_CPUSETS_NO_HZ
+extern void cpuset_update_nohz_interrupt(void);
+#endif
+
 extern void invalidate_interrupt(void);
 extern void invalidate_interrupt0(void);
 extern void invalidate_interrupt1(void);
@@ -152,6 +156,9 @@ extern asmlinkage void smp_irq_move_cleanup_interrupt(void);
 #endif
 #ifdef CONFIG_SMP
 extern void smp_reschedule_interrupt(struct pt_regs *);
+#ifdef CONFIG_CPUSETS_NO_HZ
+extern void smp_cpuset_update_nohz_interrupt(struct pt_regs *);
+#endif
 extern void smp_call_function_interrupt(struct pt_regs *);
 extern void smp_call_function_single_interrupt(struct pt_regs *);
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 1508e51..f54dea8 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -112,6 +112,8 @@
 /* Xen vector callback to receive events in a HVM domain */
 #define XEN_HVM_EVTCHN_CALLBACK		0xf3
 
+#define CPUSET_UPDATE_NOHZ_VECTOR	0xf2
+
 /*
  * Local APIC timer IRQ vector is on a different priority level,
  * to work around the 'lost local interrupt if more than 2 IRQ
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 4f19a15..2c30bbd 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -71,7 +71,9 @@ struct smp_ops {
 
 	void (*stop_other_cpus)(int wait);
 	void (*smp_send_reschedule)(int cpu);
-
+#ifdef CONFIG_CPUSETS_NO_HZ
+	void (*smp_cpuset_update_nohz)(int cpu);
+#endif
 	int (*cpu_up)(unsigned cpu, struct task_struct *tidle);
 	int (*cpu_disable)(void);
 	void (*cpu_die)(unsigned int cpu);
@@ -140,6 +142,13 @@ static inline void smp_send_reschedule(int cpu)
 	smp_ops.smp_send_reschedule(cpu);
 }
 
+static inline void smp_cpuset_update_nohz(int cpu)
+{
+#ifdef CONFIG_CPUSETS_NO_HZ
+	smp_ops.smp_cpuset_update_nohz(cpu);
+#endif
+}
+
 static inline void arch_send_call_function_single_ipi(int cpu)
 {
 	smp_ops.send_call_func_single_ipi(cpu);
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index b51b2c7..6d5b77d 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1173,6 +1173,10 @@ apicinterrupt CALL_FUNCTION_VECTOR \
 	call_function_interrupt smp_call_function_interrupt
 apicinterrupt RESCHEDULE_VECTOR \
 	reschedule_interrupt smp_reschedule_interrupt
+#ifdef CONFIG_CPUSETS_NO_HZ
+apicinterrupt CPUSET_UPDATE_NOHZ_VECTOR \
+	cpuset_update_nohz_interrupt smp_cpuset_update_nohz_interrupt
+#endif
 #endif
 
 apicinterrupt ERROR_APIC_VECTOR \
diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
index 6e03b0d..394e9ec 100644
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -171,6 +171,10 @@ static void __init smp_intr_init(void)
 	 */
 	alloc_intr_gate(RESCHEDULE_VECTOR, reschedule_interrupt);
 
+#ifdef CONFIG_CPUSETS_NO_HZ
+	alloc_intr_gate(CPUSET_UPDATE_NOHZ_VECTOR, cpuset_update_nohz_interrupt);
+#endif
+
 	/* IPI for generic function call */
 	alloc_intr_gate(CALL_FUNCTION_VECTOR, call_function_interrupt);
 
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 48d2b7d..4c0b7d2 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -126,6 +126,17 @@ static void native_smp_send_reschedule(int cpu)
 	apic->send_IPI_mask(cpumask_of(cpu), RESCHEDULE_VECTOR);
 }
 
+#ifdef CONFIG_CPUSETS_NO_HZ
+static void native_smp_cpuset_update_nohz(int cpu)
+{
+	if (unlikely(cpu_is_offline(cpu))) {
+		WARN_ON(1);
+		return;
+	}
+	apic->send_IPI_mask(cpumask_of(cpu), CPUSET_UPDATE_NOHZ_VECTOR);
+}
+#endif
+
 void native_send_call_func_single_ipi(int cpu)
 {
 	apic->send_IPI_mask(cpumask_of(cpu), CALL_FUNCTION_SINGLE_VECTOR);
@@ -259,6 +270,16 @@ void smp_reschedule_interrupt(struct pt_regs *regs)
 	 */
 }
 
+#ifdef CONFIG_CPUSETS_NO_HZ
+void smp_cpuset_update_nohz_interrupt(struct pt_regs *regs)
+{
+	ack_APIC_irq();
+	irq_enter();
+	inc_irq_stat(irq_call_count);
+	irq_exit();
+}
+#endif
+
 void smp_call_function_interrupt(struct pt_regs *regs)
 {
 	ack_APIC_irq();
@@ -292,6 +313,9 @@ struct smp_ops smp_ops = {
 
 	.stop_other_cpus	= native_stop_other_cpus,
 	.smp_send_reschedule	= native_smp_send_reschedule,
+#ifdef CONFIG_CPUSETS_NO_HZ
+	.smp_cpuset_update_nohz = native_smp_cpuset_update_nohz,
+#endif
 
 	.cpu_up			= native_cpu_up,
 	.cpu_die		= native_cpu_die,
-- 
1.7.10.4




* [PATCH 05/32] nohz: Adaptive tick stop and restart on nohz cpuset
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (3 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 04/32] x86: New cpuset nohz irq vector Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-30 18:23   ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 06/32] nohz/cpuset: Dont turn off the tick if rcu needs it Steven Rostedt
                   ` (28 subsequent siblings)
  33 siblings, 1 reply; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0005-nohz-Adaptive-tick-stop-and-restart-on-nohz-cpuset.patch --]
[-- Type: text/plain, Size: 11057 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

When a CPU is included in a nohz cpuset, try to switch it to
nohz mode from the interrupt exit path if it is running a
single non-idle task.

Then restart the tick if necessary, i.e. if we enqueue a second
task while the timer is stopped, so that the scheduler tick is
rearmed.

[TODO: Handle the many things done from scheduler_tick()]

[ Included build fix from Geoff Levand ]

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/smp.c    |    2 ++
 include/linux/sched.h    |    6 ++++
 include/linux/tick.h     |   11 +++++-
 init/Kconfig             |    2 +-
 kernel/sched/core.c      |   24 +++++++++++++
 kernel/sched/sched.h     |   12 +++++++
 kernel/softirq.c         |    6 ++--
 kernel/time/tick-sched.c |   86 +++++++++++++++++++++++++++++++++++++++++-----
 8 files changed, 137 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 4c0b7d2..0bad72d 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -23,6 +23,7 @@
 #include <linux/interrupt.h>
 #include <linux/cpu.h>
 #include <linux/gfp.h>
+#include <linux/tick.h>
 
 #include <asm/mtrr.h>
 #include <asm/tlbflush.h>
@@ -275,6 +276,7 @@ void smp_cpuset_update_nohz_interrupt(struct pt_regs *regs)
 {
 	ack_APIC_irq();
 	irq_enter();
+	tick_nohz_check_adaptive();
 	inc_irq_stat(irq_call_count);
 	irq_exit();
 }
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 0dd42a0..749752e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2753,6 +2753,12 @@ static inline void inc_syscw(struct task_struct *tsk)
 #define TASK_SIZE_OF(tsk)	TASK_SIZE
 #endif
 
+#ifdef CONFIG_CPUSETS_NO_HZ
+extern bool sched_can_stop_tick(void);
+#else
+static inline bool sched_can_stop_tick(void) { return false; }
+#endif
+
 #ifdef CONFIG_MM_OWNER
 extern void mm_update_next_owner(struct mm_struct *mm);
 extern void mm_init_owner(struct mm_struct *mm, struct task_struct *p);
diff --git a/include/linux/tick.h b/include/linux/tick.h
index f37fceb..9b66fd3 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -124,11 +124,12 @@ static inline int tick_oneshot_mode_active(void) { return 0; }
 # ifdef CONFIG_NO_HZ
 extern void tick_nohz_idle_enter(void);
 extern void tick_nohz_idle_exit(void);
+extern void tick_nohz_restart_sched_tick(void);
 extern void tick_nohz_irq_exit(void);
 extern ktime_t tick_nohz_get_sleep_length(void);
 extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
 extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
-# else
+# else /* !NO_HZ */
 static inline void tick_nohz_idle_enter(void) { }
 static inline void tick_nohz_idle_exit(void) { }
 
@@ -142,4 +143,12 @@ static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return -1; }
 static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
 # endif /* !NO_HZ */
 
+#ifdef CONFIG_CPUSETS_NO_HZ
+extern void tick_nohz_check_adaptive(void);
+extern void tick_nohz_post_schedule(void);
+#else /* !CPUSETS_NO_HZ */
+static inline void tick_nohz_check_adaptive(void) { }
+static inline void tick_nohz_post_schedule(void) { }
+#endif /* CPUSETS_NO_HZ */
+
 #endif
diff --git a/init/Kconfig b/init/Kconfig
index ffdeeab..418e078 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -751,7 +751,7 @@ config PROC_PID_CPUSET
 
 config CPUSETS_NO_HZ
        bool "Tickless cpusets"
-       depends on CPUSETS && HAVE_CPUSETS_NO_HZ
+       depends on CPUSETS && HAVE_CPUSETS_NO_HZ && NO_HZ && HIGH_RES_TIMERS
        help
          This options let you apply a nohz property to a cpuset such
 	 that the periodic timer tick tries to be avoided when possible on
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d8927f..2716b79 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1196,6 +1196,29 @@ static void update_avg(u64 *avg, u64 sample)
 }
 #endif
 
+#ifdef CONFIG_CPUSETS_NO_HZ
+bool sched_can_stop_tick(void)
+{
+	struct rq *rq;
+
+	rq = this_rq();
+
+	/*
+	 * This is called right after cpuset_adaptive_nohz() that
+	 * uses atomic_add_return() so that we are ordered against
+	 * cpu_adaptive_nohz_ref. When inc_nr_running() sends an
+	 * IPI to this CPU, we are guaranteed to see the update on
+	 * nr_running.
+	 */
+
+	/* More than one running task need preemption */
+	if (rq->nr_running > 1)
+		return false;
+
+	return true;
+}
+#endif
+
 static void
 ttwu_stat(struct task_struct *p, int cpu, int wake_flags)
 {
@@ -1897,6 +1920,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	 * frame will be invalid.
 	 */
 	finish_task_switch(this_rq(), prev);
+	tick_nohz_post_schedule();
 }
 
 /*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 7a7db09..c6cd9ec 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1,6 +1,7 @@
 
 #include <linux/sched.h>
 #include <linux/mutex.h>
+#include <linux/cpuset.h>
 #include <linux/spinlock.h>
 #include <linux/stop_machine.h>
 
@@ -927,6 +928,17 @@ static inline u64 steal_ticks(u64 steal)
 static inline void inc_nr_running(struct rq *rq)
 {
 	rq->nr_running++;
+
+	if (rq->nr_running == 2) {
+		/*
+		 * cpuset_cpu_adaptive_nohz() uses atomic_add_return()
+		 * to order against rq->nr_running updates. This way
+		 * the CPU that receives the IPI is guaranteed to see
+		 * the update on nr_running without the rq->lock.
+		 */
+		if (cpuset_cpu_adaptive_nohz(rq->cpu))
+			smp_cpuset_update_nohz(rq->cpu);
+	}
 }
 
 static inline void dec_nr_running(struct rq *rq)
diff --git a/kernel/softirq.c b/kernel/softirq.c
index cc96bdc..e06b8eb 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -25,6 +25,7 @@
 #include <linux/smp.h>
 #include <linux/smpboot.h>
 #include <linux/tick.h>
+#include <linux/cpuset.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/irq.h>
@@ -307,7 +308,8 @@ void irq_enter(void)
 	int cpu = smp_processor_id();
 
 	rcu_irq_enter();
-	if (is_idle_task(current) && !in_interrupt()) {
+
+	if ((is_idle_task(current) || cpuset_adaptive_nohz()) && !in_interrupt()) {
 		/*
 		 * Prevent raise_softirq from needlessly waking up ksoftirqd
 		 * here, as softirq will be serviced on return from interrupt.
@@ -349,7 +351,7 @@ void irq_exit(void)
 
 #ifdef CONFIG_NO_HZ
 	/* Make sure that timer wheel updates are propagated */
-	if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
+	if (!in_interrupt())
 		tick_nohz_irq_exit();
 #endif
 	rcu_irq_exit();
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index c7a78c6..35047b2 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -512,6 +512,24 @@ void tick_nohz_idle_enter(void)
 	local_irq_enable();
 }
 
+static void tick_nohz_cpuset_stop_tick(struct tick_sched *ts)
+{
+#ifdef CONFIG_CPUSETS_NO_HZ
+	int cpu = smp_processor_id();
+
+	if (!cpuset_adaptive_nohz() || is_idle_task(current))
+		return;
+
+	if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
+		return;
+
+	if (!sched_can_stop_tick())
+		return;
+
+	tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
+#endif
+}
+
 /**
  * tick_nohz_irq_exit - update next tick event from interrupt exit
  *
@@ -524,10 +542,12 @@ void tick_nohz_irq_exit(void)
 {
 	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
 
-	if (!ts->inidle)
-		return;
-
-	__tick_nohz_idle_enter(ts);
+	if (ts->inidle) {
+		if (!need_resched())
+			__tick_nohz_idle_enter(ts);
+	} else {
+		tick_nohz_cpuset_stop_tick(ts);
+	}
 }
 
 /**
@@ -568,7 +588,7 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
 	}
 }
 
-static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
+static void __tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
 {
 	/* Update jiffies first */
 	tick_do_update_jiffies64(now);
@@ -584,6 +604,31 @@ static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
 	tick_nohz_restart(ts, now);
 }
 
+/**
+ * tick_nohz_restart_sched_tick - restart the tick for a tickless CPU
+ *
+ * Restart the tick when the CPU is in adaptive tickless mode.
+ */
+void tick_nohz_restart_sched_tick(void)
+{
+	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+	unsigned long flags;
+	ktime_t now;
+
+	local_irq_save(flags);
+
+	if (!ts->tick_stopped) {
+		local_irq_restore(flags);
+		return;
+	}
+
+	now = ktime_get();
+	__tick_nohz_restart_sched_tick(ts, now);
+
+	local_irq_restore(flags);
+}
+
+
 static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
 {
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING
@@ -630,7 +675,7 @@ void tick_nohz_idle_exit(void)
 	if (ts->tick_stopped) {
 		nohz_balance_enter_idle(cpu);
 		calc_load_exit_idle();
-		tick_nohz_restart_sched_tick(ts, now);
+		__tick_nohz_restart_sched_tick(ts, now);
 		tick_nohz_account_idle_ticks(ts);
 	}
 
@@ -791,7 +836,6 @@ void tick_check_idle(int cpu)
 }
 
 #ifdef CONFIG_CPUSETS_NO_HZ
-
 /*
  * Take the timer duty if nobody is taking care of it.
  * If a CPU already does and and it's in a nohz cpuset,
@@ -810,6 +854,31 @@ static void tick_do_timer_check_handler(int cpu)
 	}
 }
 
+void tick_nohz_check_adaptive(void)
+{
+	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+
+	if (cpuset_adaptive_nohz()) {
+		if (ts->tick_stopped && !is_idle_task(current)) {
+			if (!sched_can_stop_tick())
+				tick_nohz_restart_sched_tick();
+		}
+	}
+}
+
+void tick_nohz_post_schedule(void)
+{
+	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+
+	/*
+	 * No need to disable irqs here. The worst that can happen
+	 * is an irq that comes and restart the tick before us.
+	 * tick_nohz_restart_sched_tick() is irq safe.
+	 */
+	if (ts->tick_stopped)
+		tick_nohz_restart_sched_tick();
+}
+
 #else
 
 static void tick_do_timer_check_handler(int cpu)
@@ -856,6 +925,7 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
 	 * no valid regs pointer
 	 */
 	if (regs) {
+		int user = user_mode(regs);
 		/*
 		 * When we are idle and the tick is stopped, we have to touch
 		 * the watchdog as we might not schedule for a really long
@@ -869,7 +939,7 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
 			if (is_idle_task(current))
 				ts->idle_jiffies++;
 		}
-		update_process_times(user_mode(regs));
+		update_process_times(user);
 		profile_tick(CPU_PROFILING);
 	}
 
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 06/32] nohz/cpuset: Dont turn off the tick if rcu needs it
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (4 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 05/32] nohz: Adaptive tick stop and restart on nohz cpuset Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-30 18:30   ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 07/32] nohz/cpuset: Wake up adaptive nohz CPU when a timer gets enqueued Steven Rostedt
                   ` (27 subsequent siblings)
  33 siblings, 1 reply; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0006-nohz-cpuset-Don-t-turn-off-the-tick-if-rcu-needs-it.patch --]
[-- Type: text/plain, Size: 4034 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

If RCU is waiting for the current CPU to complete a grace
period, don't turn off the tick. Unlike dyntick-idle, we
are not necessarily going to enter an RCU extended quiescent
state, so we may need to keep the tick to note the current
CPU's quiescent states.

[added build fix from Zen Lin]
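
With this change the stop decision becomes two stacked conditions, as in the patch's can_stop_adaptive_tick(). A hedged userspace model (the `_model` name and bool parameters stand in for this_rq()->nr_running and rcu_pending()):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model of can_stop_adaptive_tick(): on top of the
 * scheduler's nr_running check, the tick must also be kept when RCU
 * still needs this CPU to report quiescent states for a pending
 * grace period. */
static bool can_stop_adaptive_tick_model(unsigned int nr_running,
					 bool rcu_pending)
{
	if (nr_running > 1)	/* sched_can_stop_tick() says no */
		return false;
	if (rcu_pending)	/* a grace period still needs this CPU */
		return false;
	return true;
}
```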

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/rcupdate.h |    1 +
 kernel/rcutree.c         |    3 +--
 kernel/time/tick-sched.c |   22 ++++++++++++++++++----
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 7c968e4..9804c3a 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -186,6 +186,7 @@ static inline int rcu_preempt_depth(void)
 extern void rcu_sched_qs(int cpu);
 extern void rcu_bh_qs(int cpu);
 extern void rcu_check_callbacks(int cpu, int user);
+extern int rcu_pending(int cpu);
 struct notifier_block;
 extern void rcu_idle_enter(void);
 extern void rcu_idle_exit(void);
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 74df86b..0dca81f 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -234,7 +234,6 @@ module_param(jiffies_till_next_fqs, ulong, 0644);
 
 static void force_qs_rnp(struct rcu_state *rsp, int (*f)(struct rcu_data *));
 static void force_quiescent_state(struct rcu_state *rsp);
-static int rcu_pending(int cpu);
 
 /*
  * Return the number of RCU-sched batches processed thus far for debug & stats.
@@ -2429,7 +2428,7 @@ static int __rcu_pending(struct rcu_state *rsp, struct rcu_data *rdp)
  * by the current CPU, returning 1 if so.  This function is part of the
  * RCU implementation; it is -not- an exported member of the RCU API.
  */
-static int rcu_pending(int cpu)
+int rcu_pending(int cpu)
 {
 	struct rcu_state *rsp;
 
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 35047b2..de3c8fe 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -512,9 +512,21 @@ void tick_nohz_idle_enter(void)
 	local_irq_enable();
 }
 
+#ifdef CONFIG_CPUSETS_NO_HZ
+static bool can_stop_adaptive_tick(void)
+{
+	if (!sched_can_stop_tick())
+		return false;
+
+	/* Is there a grace period to complete ? */
+	if (rcu_pending(smp_processor_id()))
+		return false;
+
+	return true;
+}
+
 static void tick_nohz_cpuset_stop_tick(struct tick_sched *ts)
 {
-#ifdef CONFIG_CPUSETS_NO_HZ
 	int cpu = smp_processor_id();
 
 	if (!cpuset_adaptive_nohz() || is_idle_task(current))
@@ -523,12 +535,14 @@ static void tick_nohz_cpuset_stop_tick(struct tick_sched *ts)
 	if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
 		return;
 
-	if (!sched_can_stop_tick())
+	if (!can_stop_adaptive_tick())
 		return;
 
 	tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
-#endif
 }
+#else
+static void tick_nohz_cpuset_stop_tick(struct tick_sched *ts) { }
+#endif
 
 /**
  * tick_nohz_irq_exit - update next tick event from interrupt exit
@@ -860,7 +874,7 @@ void tick_nohz_check_adaptive(void)
 
 	if (cpuset_adaptive_nohz()) {
 		if (ts->tick_stopped && !is_idle_task(current)) {
-			if (!sched_can_stop_tick())
+			if (!can_stop_adaptive_tick())
 				tick_nohz_restart_sched_tick();
 		}
 	}
-- 
1.7.10.4




* [PATCH 07/32] nohz/cpuset: Wake up adaptive nohz CPU when a timer gets enqueued
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (5 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 06/32] nohz/cpuset: Dont turn off the tick if rcu needs it Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 08/32] nohz/cpuset: Dont stop the tick if posix cpu timers are running Steven Rostedt
                   ` (26 subsequent siblings)
  33 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0007-nohz-cpuset-Wake-up-adaptive-nohz-CPU-when-a-timer-g.patch --]
[-- Type: text/plain, Size: 3389 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

Wake up a CPU when a timer list timer is enqueued there while
the CPU is in adaptive nohz mode. Sending it an IPI makes it
reconsider the next timer to program in light of the recent
updates.
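
The wakeup now dispatches on the target CPU's mode: the cpuset-nohz IPI for adaptive-nohz CPUs, the pre-existing idle wakeup otherwise. A sketch of that dispatch (the enum and the bool flag are invented for illustration; in the kernel the choice is made by wake_up_nohz_cpu()):

```c
#include <assert.h>
#include <stdbool.h>

enum wake_method { WAKE_IPI_NOHZ, WAKE_IDLE };

/* Model of wake_up_nohz_cpu(): when a timer is enqueued on a remote
 * CPU, send the cpuset-nohz IPI if the target is seen in adaptive
 * nohz mode, and otherwise fall back to the existing idle wakeup. */
static enum wake_method wake_up_nohz_cpu_model(bool target_adaptive_nohz)
{
	if (target_adaptive_nohz)
		return WAKE_IPI_NOHZ;	/* smp_cpuset_update_nohz(cpu) */
	return WAKE_IDLE;		/* wake_up_idle_cpu(cpu) */
}
```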

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/sched.h |    4 ++--
 kernel/sched/core.c   |   28 +++++++++++++++++++++++++++-
 kernel/timer.c        |    2 +-
 3 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 749752e..a41dd22 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1972,9 +1972,9 @@ static inline void idle_task_exit(void) {}
 #endif
 
 #if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
-extern void wake_up_idle_cpu(int cpu);
+extern void wake_up_nohz_cpu(int cpu);
 #else
-static inline void wake_up_idle_cpu(int cpu) { }
+static inline void wake_up_nohz_cpu(int cpu) { }
 #endif
 
 extern unsigned int sysctl_sched_latency;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2716b79..7b35eda 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -578,7 +578,7 @@ unlock:
  * account when the CPU goes back to idle and evaluates the timer
  * wheel for the next timer event.
  */
-void wake_up_idle_cpu(int cpu)
+static void wake_up_idle_cpu(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
 
@@ -608,6 +608,32 @@ void wake_up_idle_cpu(int cpu)
 		smp_send_reschedule(cpu);
 }
 
+static bool wake_up_cpuset_nohz_cpu(int cpu)
+{
+#ifdef CONFIG_CPUSETS_NO_HZ
+	/*
+	 * If the current CPU doesn't see the target as nohz
+	 * then it means the target hasn't seen itself nohz
+	 * yet either. In this case we don't send an IPI to the
+	 * target because it hasn't yet tried to stop the tick.
+	 * But if the nohz flag is set concurrently, the target
+	 * will find the newly enqueued timer once we release
+	 * the base->lock.
+	 */
+	if (cpuset_cpu_adaptive_nohz(cpu)) {
+		smp_cpuset_update_nohz(cpu);
+		return true;
+	}
+#endif
+	return false;
+}
+
+void wake_up_nohz_cpu(int cpu)
+{
+	if (!wake_up_cpuset_nohz_cpu(cpu))
+		wake_up_idle_cpu(cpu);
+}
+
 static inline bool got_nohz_idle_kick(void)
 {
 	int cpu = smp_processor_id();
diff --git a/kernel/timer.c b/kernel/timer.c
index 367d008..51e20ca 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -936,7 +936,7 @@ void add_timer_on(struct timer_list *timer, int cpu)
 	 * makes sure that a CPU on the way to idle can not evaluate
 	 * the timer wheel.
 	 */
-	wake_up_idle_cpu(cpu);
+	wake_up_nohz_cpu(cpu);
 	spin_unlock_irqrestore(&base->lock, flags);
 }
 EXPORT_SYMBOL_GPL(add_timer_on);
-- 
1.7.10.4




* [PATCH 08/32] nohz/cpuset: Dont stop the tick if posix cpu timers are running
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (6 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 07/32] nohz/cpuset: Wake up adaptive nohz CPU when a timer gets enqueued Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 09/32] nohz/cpuset: Restart tick when nohz flag is cleared on cpuset Steven Rostedt
                   ` (25 subsequent siblings)
  33 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0008-nohz-cpuset-Don-t-stop-the-tick-if-posix-cpu-timers-.patch --]
[-- Type: text/plain, Size: 3307 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

If either a per-thread or a per-process posix cpu timer is running,
don't stop the tick.

TODO: restart the tick if it is stopped and a posix cpu timer
gets enqueued. Also check whether we need a memory barrier for
the per-process posix timer, which can be enqueued from another
task of the group.
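
The new posix_cpu_timers_running() helper is an OR of the two timer sources. A minimal model, with an illustrative struct whose fields mirror the kernel checks (task_cputime_zero(&tsk->cputime_expires) and tsk->signal->cputimer.running):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for the fields posix_cpu_timers_running()
 * examines; the real kernel structures are far richer. */
struct task_model {
	bool thread_cputime_armed;	/* per-thread expiry is armed */
	bool process_cputimer_running;	/* per-process cputimer active */
};

/* Model of posix_cpu_timers_running(): the tick must stay on when
 * either timer source is live, since firing them relies on the tick. */
static bool posix_cpu_timers_running_model(const struct task_model *t)
{
	return t->thread_cputime_armed || t->process_cputimer_running;
}
```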

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/posix-timers.h |    1 +
 kernel/posix-cpu-timers.c    |   12 ++++++++++++
 kernel/time/tick-sched.c     |    4 ++++
 3 files changed, 17 insertions(+)

diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index 042058f..97480c2 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -119,6 +119,7 @@ int posix_timer_event(struct k_itimer *timr, int si_private);
 void posix_cpu_timer_schedule(struct k_itimer *timer);
 
 void run_posix_cpu_timers(struct task_struct *task);
+bool posix_cpu_timers_running(struct task_struct *tsk);
 void posix_cpu_timers_exit(struct task_struct *task);
 void posix_cpu_timers_exit_group(struct task_struct *task);
 
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index 125cb67..79d4c24 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -6,6 +6,7 @@
 #include <linux/posix-timers.h>
 #include <linux/errno.h>
 #include <linux/math64.h>
+#include <linux/cpuset.h>
 #include <asm/uaccess.h>
 #include <linux/kernel_stat.h>
 #include <trace/events/timer.h>
@@ -1274,6 +1275,17 @@ static inline int fastpath_timer_check(struct task_struct *tsk)
 	return 0;
 }
 
+bool posix_cpu_timers_running(struct task_struct *tsk)
+{
+	if (!task_cputime_zero(&tsk->cputime_expires))
+		return true;
+
+	if (tsk->signal->cputimer.running)
+		return true;
+
+	return false;
+}
+
 /*
  * This is called from the timer interrupt handler.  The irq handler has
  * already updated our counts.  We need to check if any timers fire now.
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index de3c8fe..0a5e650 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -21,6 +21,7 @@
 #include <linux/sched.h>
 #include <linux/module.h>
 #include <linux/cpuset.h>
+#include <linux/posix-timers.h>
 
 #include <asm/irq_regs.h>
 
@@ -518,6 +519,9 @@ static bool can_stop_adaptive_tick(void)
 	if (!sched_can_stop_tick())
 		return false;
 
+	if (posix_cpu_timers_running(current))
+		return false;
+
 	/* Is there a grace period to complete ? */
 	if (rcu_pending(smp_processor_id()))
 		return false;
-- 
1.7.10.4




* [PATCH 09/32] nohz/cpuset: Restart tick when nohz flag is cleared on cpuset
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (7 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 08/32] nohz/cpuset: Dont stop the tick if posix cpu timers are running Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-30 18:55   ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 10/32] nohz/cpuset: Restart the tick if printk needs it Steven Rostedt
                   ` (24 subsequent siblings)
  33 siblings, 1 reply; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0009-nohz-cpuset-Restart-tick-when-nohz-flag-is-cleared-o.patch --]
[-- Type: text/plain, Size: 3380 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

Issue an IPI to restart the tick on a CPU that belongs
to a cpuset when its nohz flag gets cleared.
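
The trigger condition lives in update_nohz_cpus(): each CPU carries a reference count of nohz cpusets containing it, and the restart IPI is sent only when a flag-clear drops that count to zero. A userspace sketch using C11 atomics in place of the kernel's atomic_dec_return() (the demo helpers are invented for illustration):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Model of the per-CPU cpu_adaptive_nohz_ref handling: the IPI is
 * needed exactly when this decrement takes the count to zero.
 * atomic_fetch_sub() returns the previous value, so previous == 1
 * means we just released the last reference. */
static bool clear_nohz_ref_needs_ipi(atomic_int *ref)
{
	return atomic_fetch_sub(ref, 1) == 1;
}

/* CPU was in exactly one nohz cpuset: clearing it must send the IPI. */
static bool demo_last_cpuset_cleared(void)
{
	atomic_int ref;
	atomic_init(&ref, 1);
	return clear_nohz_ref_needs_ipi(&ref);
}

/* CPU is still in another nohz cpuset: no IPI yet. */
static bool demo_still_referenced(void)
{
	atomic_int ref;
	atomic_init(&ref, 2);
	return clear_nohz_ref_needs_ipi(&ref);
}
```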

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/cpuset.h   |    2 ++
 kernel/cpuset.c          |   25 +++++++++++++++++++++++--
 kernel/time/tick-sched.c |    8 ++++++++
 3 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 7e7eb41..631968b 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -260,6 +260,8 @@ static inline bool cpuset_adaptive_nohz(void)
 	 */
 	return cpuset_cpu_adaptive_nohz(smp_processor_id());
 }
+
+extern void cpuset_exit_nohz_interrupt(void *unused);
 #else
 static inline bool cpuset_cpu_adaptive_nohz(int cpu) { return false; }
 static inline bool cpuset_adaptive_nohz(void) { return false; }
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 6319d8e..1b67e5b 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1200,6 +1200,14 @@ static void cpuset_change_flag(struct task_struct *tsk,
 
 DEFINE_PER_CPU(atomic_t, cpu_adaptive_nohz_ref);
 
+static void cpu_exit_nohz(int cpu)
+{
+	preempt_disable();
+	smp_call_function_single(cpu, cpuset_exit_nohz_interrupt,
+				 NULL, true);
+	preempt_enable();
+}
+
 static void update_nohz_cpus(struct cpuset *old_cs, struct cpuset *cs)
 {
 	int cpu;
@@ -1211,9 +1219,22 @@ static void update_nohz_cpus(struct cpuset *old_cs, struct cpuset *cs)
 	for_each_cpu(cpu, cs->cpus_allowed) {
 		atomic_t *ref = &per_cpu(cpu_adaptive_nohz_ref, cpu);
 		if (is_adaptive_nohz(cs))
-			atomic_inc(ref);
+			val = atomic_inc_return(ref);
 		else
-			atomic_dec(ref);
+			val = atomic_dec_return(ref);
+
+		if (!val) {
+			/*
+			 * The update to cpu_adaptive_nohz_ref must be
+			 * visible right away. So that once we restart the tick
+			 * from the IPI, it won't be stopped again due to cache
+			 * update lag.
+			 * FIXME: We probably need more to ensure this value is really
+			 * visible right away.
+			 */
+			smp_mb();
+			cpu_exit_nohz(cpu);
+		}
 	}
 }
 #else
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 0a5e650..de7de68 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -884,6 +884,14 @@ void tick_nohz_check_adaptive(void)
 	}
 }
 
+void cpuset_exit_nohz_interrupt(void *unused)
+{
+	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+
+	if (ts->tick_stopped && !is_idle_task(current))
+		tick_nohz_restart_adaptive();
+}
+
 void tick_nohz_post_schedule(void)
 {
 	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
-- 
1.7.10.4




* [PATCH 10/32] nohz/cpuset: Restart the tick if printk needs it
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (8 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 09/32] nohz/cpuset: Restart tick when nohz flag is cleared on cpuset Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-30 19:01   ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 11/32] rcu: Restart the tick on non-responding adaptive nohz CPUs Steven Rostedt
                   ` (23 subsequent siblings)
  33 siblings, 1 reply; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0010-nohz-cpuset-Restart-the-tick-if-printk-needs-it.patch --]
[-- Type: text/plain, Size: 2170 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

If we are in nohz adaptive mode and printk is called, the tick may
be stopped, so nothing wakes up the logger. We need to restart the
tick when that happens. Do this asynchronously by issuing a
tick-restart self-IPI, to avoid deadlocking against whatever lock
chain the printk caller currently holds.
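
The ordering in the patched wake_up_klogd() matters: the pending flag is set before the adaptive-nohz check, so a concurrent tick restart will still see work to do. A hedged model (the `_model` name, the bool parameters, and the return value indicating whether the self-IPI is requested are all illustrative):

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the patched wake_up_klogd(): only when a waiter exists is
 * the pending flag set, and the tick-restart self-IPI is requested
 * only on adaptive-nohz CPUs, where no tick may be running to notice
 * the pending work. */
static bool wake_up_klogd_model(bool waitqueue_active, bool adaptive_nohz,
				bool *printk_pending)
{
	if (!waitqueue_active)
		return false;
	*printk_pending = true;		/* PRINTK_PENDING_WAKEUP */
	return adaptive_nohz;		/* smp_cpuset_update_nohz(cpu) */
}
```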

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/printk.c |   15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index 2d607f4..bf9048d 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -42,6 +42,7 @@
 #include <linux/notifier.h>
 #include <linux/rculist.h>
 #include <linux/poll.h>
+#include <linux/cpuset.h>
 
 #include <asm/uaccess.h>
 
@@ -1977,8 +1978,20 @@ int printk_needs_cpu(int cpu)
 
 void wake_up_klogd(void)
 {
-	if (waitqueue_active(&log_wait))
+	unsigned long flags;
+
+	if (waitqueue_active(&log_wait)) {
 		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
+		/* Make it visible from any interrupt from now */
+		barrier();
+		/*
+		 * It's safe to check that even if interrupts are not disabled.
+		 * If we enable nohz adaptive mode concurrently, we'll see the
+		 * printk_pending value and thus keep a periodic tick behaviour.
+		 */
+		if (cpuset_adaptive_nohz())
+			smp_cpuset_update_nohz(smp_processor_id());
+	}
 }
 
 static void console_cont_flush(char *text, size_t size)
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 11/32] rcu: Restart the tick on non-responding adaptive nohz CPUs
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (9 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 10/32] nohz/cpuset: Restart the tick if printk needs it Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 12/32] rcu: Restart tick if we enqueue a callback in a nohz/cpuset CPU Steven Rostedt
                   ` (22 subsequent siblings)
  33 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0011-rcu-Restart-the-tick-on-non-responding-adaptive-nohz.patch --]
[-- Type: text/plain, Size: 2166 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

When a CPU in adaptive nohz mode fails to respond and complete
a grace period, send it a specific IPI so that it restarts the
tick and reports a quiescent state.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/rcutree.c |   18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 0dca81f..21664a3 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -53,6 +53,7 @@
 #include <linux/delay.h>
 #include <linux/stop_machine.h>
 #include <linux/random.h>
+#include <linux/cpuset.h>
 
 #include "rcutree.h"
 #include <trace/events/rcu.h>
@@ -798,6 +799,20 @@ static int dyntick_save_progress_counter(struct rcu_data *rdp)
 	return (rdp->dynticks_snap & 0x1) == 0;
 }
 
+static void cpuset_update_rcu_cpu(int cpu)
+{
+#ifdef CONFIG_CPUSETS_NO_HZ
+	unsigned long flags;
+
+	local_irq_save(flags);
+
+	if (cpuset_cpu_adaptive_nohz(cpu))
+		smp_cpuset_update_nohz(cpu);
+
+	local_irq_restore(flags);
+#endif
+}
+
 /*
  * Return true if the specified CPU has passed through a quiescent
  * state by virtue of being in or having passed through an dynticks
@@ -845,6 +860,9 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
 		rdp->offline_fqs++;
 		return 1;
 	}
+
+	cpuset_update_rcu_cpu(rdp->cpu);
+
 	return 0;
 }
 
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 12/32] rcu: Restart tick if we enqueue a callback in a nohz/cpuset CPU
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (10 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 11/32] rcu: Restart the tick on non-responding adaptive nohz CPUs Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 13/32] nohz: Generalize tickless cpu time accounting Steven Rostedt
                   ` (21 subsequent siblings)
  33 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0012-rcu-Restart-tick-if-we-enqueue-a-callback-in-a-nohz-.patch --]
[-- Type: text/plain, Size: 1940 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

If we enqueue an RCU callback, we need the CPU tick to stay
alive until the callback is taken care of by completing the
appropriate grace period.

Thus, on call_rcu(), send a self IPI that checks rcu_needs_cpu(),
so that we restore the periodic tick behaviour needed to process
the callback.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/rcutree.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 21664a3..7dce432 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -2081,6 +2081,13 @@ static void invoke_rcu_core(void)
 static void __call_rcu_core(struct rcu_state *rsp, struct rcu_data *rdp,
 			    struct rcu_head *head, unsigned long flags)
 {
+	/* Restart the timer if needed to handle the callbacks */
+	if (cpuset_adaptive_nohz()) {
+		/* Make updates on nxtlist visible to self IPI */
+		barrier();
+		smp_cpuset_update_nohz(smp_processor_id());
+	}
+
 	/*
 	 * If called from an extended quiescent state, invoke the RCU
 	 * core in order to force a re-evaluation of RCU's idleness.
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 13/32] nohz: Generalize tickless cpu time accounting
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (11 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 12/32] rcu: Restart tick if we enqueue a callback in a nohz/cpuset CPU Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 14/32] nohz/cpuset: Account user and system times in adaptive nohz mode Steven Rostedt
                   ` (20 subsequent siblings)
  33 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0013-nohz-Generalize-tickless-cpu-time-accounting.patch --]
[-- Type: text/plain, Size: 10985 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

When the CPU enters idle, it saves the jiffies stamp into
ts->idle_jiffies, increments this value by one on every timer
interrupt, and accounts "jiffies - ts->idle_jiffies" idle ticks
when it exits idle. This way we still account the idle CPU time
even if the tick is stopped.

This patch lays the groundwork to generalize this for user and
system accounting. ts->idle_jiffies becomes ts->saved_jiffies, and
a new member ts->saved_jiffies_whence records the context from which
we saved the jiffies: user, system or idle.

This is one more step toward making the tickless infrastructure
usable beyond the idle context.

For now only idle uses it, but further patches make use of it for
user and system time as well.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/kernel_stat.h |    2 ++
 include/linux/tick.h        |   45 ++++++++++++++++++++--------------
 kernel/sched/cputime.c      |   22 +++++++++++++++++
 kernel/time/tick-sched.c    |   57 ++++++++++++++++++++++++++++---------------
 kernel/time/timer_list.c    |    3 ++-
 5 files changed, 90 insertions(+), 39 deletions(-)

diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 36d12f0..88a44a3 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -122,7 +122,9 @@ static inline unsigned int kstat_cpu_irqs_sum(unsigned int cpu)
 extern unsigned long long task_delta_exec(struct task_struct *);
 
 extern void account_user_time(struct task_struct *, cputime_t, cputime_t);
+extern void account_user_ticks(struct task_struct *, unsigned long);
 extern void account_system_time(struct task_struct *, int, cputime_t, cputime_t);
+extern void account_system_ticks(struct task_struct *, unsigned long);
 extern void account_steal_time(cputime_t);
 extern void account_idle_time(cputime_t);
 
diff --git a/include/linux/tick.h b/include/linux/tick.h
index 9b66fd3..03b6edd 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -27,25 +27,33 @@ enum tick_nohz_mode {
 	NOHZ_MODE_HIGHRES,
 };
 
+enum tick_saved_jiffies {
+	JIFFIES_SAVED_NONE,
+	JIFFIES_SAVED_IDLE,
+	JIFFIES_SAVED_USER,
+	JIFFIES_SAVED_SYS,
+};
+
 /**
  * struct tick_sched - sched tick emulation and no idle tick control/stats
- * @sched_timer:	hrtimer to schedule the periodic tick in high
- *			resolution mode
- * @last_tick:		Store the last tick expiry time when the tick
- *			timer is modified for nohz sleeps. This is necessary
- *			to resume the tick timer operation in the timeline
- *			when the CPU returns from nohz sleep.
- * @tick_stopped:	Indicator that the idle tick has been stopped
- * @idle_jiffies:	jiffies at the entry to idle for idle time accounting
- * @idle_calls:		Total number of idle calls
- * @idle_sleeps:	Number of idle calls, where the sched tick was stopped
- * @idle_entrytime:	Time when the idle call was entered
- * @idle_waketime:	Time when the idle was interrupted
- * @idle_exittime:	Time when the idle state was left
- * @idle_sleeptime:	Sum of the time slept in idle with sched tick stopped
- * @iowait_sleeptime:	Sum of the time slept in idle with sched tick stopped, with IO outstanding
- * @sleep_length:	Duration of the current idle sleep
- * @do_timer_lst:	CPU was the last one doing do_timer before going idle
+ * @sched_timer:		hrtimer to schedule the periodic tick in high
+ *				resolution mode
+ * @last_tick:			Store the last tick expiry time when the tick
+ *				timer is modified for nohz sleeps. This is necessary
+ *				to resume the tick timer operation in the timeline
+ *				when the CPU returns from nohz sleep.
+ * @tick_stopped:		Indicator that the idle tick has been stopped
+ * @idle_calls:			Total number of idle calls
+ * @idle_sleeps:		Number of idle calls, where the sched tick was stopped
+ * @idle_entrytime:		Time when the idle call was entered
+ * @idle_waketime:		Time when the idle was interrupted
+ * @idle_exittime:		Time when the idle state was left
+ * @idle_sleeptime:		Sum of the time slept in idle with sched tick stopped
+ * @saved_jiffies:		Jiffies snapshot on tick stop for cpu time accounting
+ * @saved_jiffies_whence:	Area where we saved @saved_jiffies
+ * @iowait_sleeptime:		Sum of the time slept in idle with sched tick stopped, with IO outstanding
+ * @sleep_length:		Duration of the current idle sleep
+ * @do_timer_lst:		CPU was the last one doing do_timer before going idle
  */
 struct tick_sched {
 	struct hrtimer			sched_timer;
@@ -54,7 +62,6 @@ struct tick_sched {
 	ktime_t				last_tick;
 	int				inidle;
 	int				tick_stopped;
-	unsigned long			idle_jiffies;
 	unsigned long			idle_calls;
 	unsigned long			idle_sleeps;
 	int				idle_active;
@@ -62,6 +69,8 @@ struct tick_sched {
 	ktime_t				idle_waketime;
 	ktime_t				idle_exittime;
 	ktime_t				idle_sleeptime;
+	enum tick_saved_jiffies		saved_jiffies_whence;
+	unsigned long			saved_jiffies;
 	ktime_t				iowait_sleeptime;
 	ktime_t				sleep_length;
 	unsigned long			last_jiffies;
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 81b763b..b7a4d1a 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -166,6 +166,17 @@ void account_user_time(struct task_struct *p, cputime_t cputime,
 	acct_update_integrals(p);
 }
 
+void account_user_ticks(struct task_struct *p, unsigned long ticks)
+{
+	cputime_t delta_cputime, delta_scaled;
+
+	if (ticks) {
+		delta_cputime = jiffies_to_cputime(ticks);
+		delta_scaled = cputime_to_scaled(ticks);
+		account_user_time(p, delta_cputime, delta_scaled);
+	}
+}
+
 /*
  * Account guest cpu time to a process.
  * @p: the process that the cpu time gets accounted to
@@ -243,6 +254,17 @@ void account_system_time(struct task_struct *p, int hardirq_offset,
 	__account_system_time(p, cputime, cputime_scaled, index);
 }
 
+void account_system_ticks(struct task_struct *p, unsigned long ticks)
+{
+	cputime_t delta_cputime, delta_scaled;
+
+	if (ticks) {
+		delta_cputime = jiffies_to_cputime(ticks);
+		delta_scaled = cputime_to_scaled(ticks);
+		account_system_time(p, 0, delta_cputime, delta_scaled);
+	}
+}
+
 /*
  * Account for involuntary wait time.
  * @cputime: the cpu time spent in involuntary wait
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index de7de68..b8f3757 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -466,7 +466,8 @@ static void __tick_nohz_idle_enter(struct tick_sched *ts)
 		}
 
 		if (!was_stopped && ts->tick_stopped) {
-			ts->idle_jiffies = ts->last_jiffies;
+			ts->saved_jiffies = ts->last_jiffies;
+			ts->saved_jiffies_whence = JIFFIES_SAVED_IDLE;
 			nohz_balance_enter_idle(cpu);
 			calc_load_enter_idle();
 		}
@@ -647,22 +648,36 @@ void tick_nohz_restart_sched_tick(void)
 }
 
 
-static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
+static void tick_nohz_account_ticks(struct tick_sched *ts)
 {
-#ifndef CONFIG_VIRT_CPU_ACCOUNTING
 	unsigned long ticks;
 	/*
-	 * We stopped the tick in idle. Update process times would miss the
-	 * time we slept as update_process_times does only a 1 tick
-	 * accounting. Enforce that this is accounted to idle !
+	 * We stopped the tick. Update process times would miss the
+	 * time we ran tickless as update_process_times does only a 1 tick
+	 * accounting. Enforce that this is accounted to nohz timeslices.
 	 */
-	ticks = jiffies - ts->idle_jiffies;
+	ticks = jiffies - ts->saved_jiffies;
 	/*
 	 * We might be one off. Do not randomly account a huge number of ticks!
 	 */
-	if (ticks && ticks < LONG_MAX)
-		account_idle_ticks(ticks);
-#endif
+	if (ticks && ticks < LONG_MAX) {
+		switch (ts->saved_jiffies_whence) {
+		case JIFFIES_SAVED_IDLE:
+			account_idle_ticks(ticks);
+			break;
+		case JIFFIES_SAVED_USER:
+			account_user_ticks(current, ticks);
+			break;
+		case JIFFIES_SAVED_SYS:
+			account_system_ticks(current, ticks);
+			break;
+		case JIFFIES_SAVED_NONE:
+			break;
+		default:
+			WARN_ON_ONCE(1);
+		}
+	}
+	ts->saved_jiffies_whence = JIFFIES_SAVED_NONE;
 }
 
 /**
@@ -694,7 +709,9 @@ void tick_nohz_idle_exit(void)
 		nohz_balance_enter_idle(cpu);
 		calc_load_exit_idle();
 		__tick_nohz_restart_sched_tick(ts, now);
-		tick_nohz_account_idle_ticks(ts);
+#ifndef CONFIG_VIRT_CPU_ACCOUNTING
+		tick_nohz_account_ticks(ts);
+#endif
 	}
 
 	local_irq_enable();
@@ -742,7 +759,7 @@ static void tick_nohz_handler(struct clock_event_device *dev)
 	 */
 	if (ts->tick_stopped) {
 		touch_softlockup_watchdog();
-		ts->idle_jiffies++;
+		ts->saved_jiffies++;
 	}
 
 	update_process_times(user_mode(regs));
@@ -953,17 +970,17 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
 	if (regs) {
 		int user = user_mode(regs);
 		/*
-		 * When we are idle and the tick is stopped, we have to touch
-		 * the watchdog as we might not schedule for a really long
-		 * time. This happens on complete idle SMP systems while
-		 * waiting on the login prompt. We also increment the "start of
-		 * idle" jiffy stamp so the idle accounting adjustment we do
-		 * when we go busy again does not account too much ticks.
+		 * When the tick is stopped, we have to touch the watchdog
+		 * as we might not schedule for a really long time. This
+		 * happens on complete idle SMP systems while waiting on
+		 * the login prompt. We also increment the last jiffy stamp
+		 * recorded when we stopped the tick so the cpu time accounting
+		 * adjustment does not account too much ticks when we flush them.
 		 */
 		if (ts->tick_stopped) {
+			/* CHECKME: may be this is only needed in idle */
 			touch_softlockup_watchdog();
-			if (is_idle_task(current))
-				ts->idle_jiffies++;
+			ts->saved_jiffies++;
 		}
 		update_process_times(user);
 		profile_tick(CPU_PROFILING);
diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c
index af5a7e9..54705e3 100644
--- a/kernel/time/timer_list.c
+++ b/kernel/time/timer_list.c
@@ -169,7 +169,8 @@ static void print_cpu(struct seq_file *m, int cpu, u64 now)
 		P(nohz_mode);
 		P_ns(last_tick);
 		P(tick_stopped);
-		P(idle_jiffies);
+		/* CHECKME: Do we want saved_jiffies_whence as well? */
+		P(saved_jiffies);
 		P(idle_calls);
 		P(idle_sleeps);
 		P_ns(idle_entrytime);
-- 
1.7.10.4




* [PATCH 14/32] nohz/cpuset: Account user and system times in adaptive nohz mode
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (12 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 13/32] nohz: Generalize tickless cpu time accounting Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 15/32] nohz/cpuset: New API to flush cputimes on nohz cpusets Steven Rostedt
                   ` (19 subsequent siblings)
  33 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0014-nohz-cpuset-Account-user-and-system-times-in-adaptiv.patch --]
[-- Type: text/plain, Size: 8067 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

If we are not running the tick, we no longer regularly account
the user/system cputime at every jiffy.

To solve this, save a snapshot of the jiffies when we stop the tick
and keep track of where we saved it: user or system. On top of this,
account the cputime elapsed when we cross the kernel entry/exit
boundaries and when we restart the tick.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/tick.h     |   12 +++++
 kernel/sched/core.c      |    1 +
 kernel/time/tick-sched.c |  129 +++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 140 insertions(+), 2 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 03b6edd..598b492 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -153,11 +153,23 @@ static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
 # endif /* !NO_HZ */
 
 #ifdef CONFIG_CPUSETS_NO_HZ
+extern void tick_nohz_enter_kernel(void);
+extern void tick_nohz_exit_kernel(void);
+extern void tick_nohz_enter_exception(struct pt_regs *regs);
+extern void tick_nohz_exit_exception(struct pt_regs *regs);
 extern void tick_nohz_check_adaptive(void);
+extern void tick_nohz_pre_schedule(void);
 extern void tick_nohz_post_schedule(void);
+extern bool tick_nohz_account_tick(void);
 #else /* !CPUSETS_NO_HZ */
+static inline void tick_nohz_enter_kernel(void) { }
+static inline void tick_nohz_exit_kernel(void) { }
+static inline void tick_nohz_enter_exception(struct pt_regs *regs) { }
+static inline void tick_nohz_exit_exception(struct pt_regs *regs) { }
 static inline void tick_nohz_check_adaptive(void) { }
+static inline void tick_nohz_pre_schedule(void) { }
 static inline void tick_nohz_post_schedule(void) { }
+static inline bool tick_nohz_account_tick(void) { return false; }
 #endif /* CPUSETS_NO_HZ */
 
 #endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7b35eda..bebea17 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1771,6 +1771,7 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
 		    struct task_struct *next)
 {
 	trace_sched_switch(prev, next);
+	tick_nohz_pre_schedule();
 	sched_info_switch(prev, next);
 	perf_event_task_sched_out(prev, next);
 	fire_sched_out_preempt_notifiers(prev, next);
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index b8f3757..de8ba59 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -532,7 +532,13 @@ static bool can_stop_adaptive_tick(void)
 
 static void tick_nohz_cpuset_stop_tick(struct tick_sched *ts)
 {
+	struct pt_regs *regs = get_irq_regs();
 	int cpu = smp_processor_id();
+	int was_stopped;
+	int user = 0;
+
+	if (regs)
+		user = user_mode(regs);
 
 	if (!cpuset_adaptive_nohz() || is_idle_task(current))
 		return;
@@ -543,7 +549,36 @@ static void tick_nohz_cpuset_stop_tick(struct tick_sched *ts)
 	if (!can_stop_adaptive_tick())
 		return;
 
+	/*
+	 * If we stop the tick between the syscall exit hook and the actual
+	 * return to userspace, we'll think we are in system space (due to
+	 * user_mode() thinking so). And since we passed the syscall exit hook
+	 * already we won't realize we are in userspace. So the time spent
+	 * tickless would be spuriously accounted as belonging to system.
+	 *
+	 * To avoid this kind of problem, we only stop the tick from userspace
+	 * (until we find a better solution).
+	 * We can later enter the kernel and keep the tick stopped. But the place
+	 * where we stop the tick must be userspace.
+	 * We make an exception for kernel threads since they always execute in
+	 * kernel space.
+	 */
+	if (!user && current->mm)
+		return;
+
+	was_stopped = ts->tick_stopped;
 	tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
+
+	if (!was_stopped && ts->tick_stopped) {
+		WARN_ON_ONCE(ts->saved_jiffies_whence != JIFFIES_SAVED_NONE);
+		if (user)
+			ts->saved_jiffies_whence = JIFFIES_SAVED_USER;
+		else if (!current->mm)
+			ts->saved_jiffies_whence = JIFFIES_SAVED_SYS;
+
+		ts->saved_jiffies = jiffies;
+		set_thread_flag(TIF_NOHZ);
+	}
 }
 #else
 static void tick_nohz_cpuset_stop_tick(struct tick_sched *ts) { }
@@ -871,6 +906,68 @@ void tick_check_idle(int cpu)
 }
 
 #ifdef CONFIG_CPUSETS_NO_HZ
+void tick_nohz_exit_kernel(void)
+{
+	unsigned long flags;
+	struct tick_sched *ts;
+	unsigned long delta_jiffies;
+
+	if (!test_thread_flag(TIF_NOHZ))
+		return;
+
+	local_irq_save(flags);
+
+	ts = &__get_cpu_var(tick_cpu_sched);
+
+	WARN_ON_ONCE(!ts->tick_stopped);
+	WARN_ON_ONCE(ts->saved_jiffies_whence != JIFFIES_SAVED_SYS);
+
+	delta_jiffies = jiffies - ts->saved_jiffies;
+	account_system_ticks(current, delta_jiffies);
+
+	ts->saved_jiffies = jiffies;
+	ts->saved_jiffies_whence = JIFFIES_SAVED_USER;
+
+	local_irq_restore(flags);
+}
+
+void tick_nohz_enter_kernel(void)
+{
+	unsigned long flags;
+	struct tick_sched *ts;
+	unsigned long delta_jiffies;
+
+	if (!test_thread_flag(TIF_NOHZ))
+		return;
+
+	local_irq_save(flags);
+
+	ts = &__get_cpu_var(tick_cpu_sched);
+
+	WARN_ON_ONCE(!ts->tick_stopped);
+	WARN_ON_ONCE(ts->saved_jiffies_whence != JIFFIES_SAVED_USER);
+
+	delta_jiffies = jiffies - ts->saved_jiffies;
+	account_user_ticks(current, delta_jiffies);
+
+	ts->saved_jiffies = jiffies;
+	ts->saved_jiffies_whence = JIFFIES_SAVED_SYS;
+
+	local_irq_restore(flags);
+}
+
+void tick_nohz_enter_exception(struct pt_regs *regs)
+{
+	if (user_mode(regs))
+		tick_nohz_enter_kernel();
+}
+
+void tick_nohz_exit_exception(struct pt_regs *regs)
+{
+	if (user_mode(regs))
+		tick_nohz_exit_kernel();
+}
+
 /*
  * Take the timer duty if nobody is taking care of it.
  * If a CPU already does and and it's in a nohz cpuset,
@@ -889,6 +986,15 @@ static void tick_do_timer_check_handler(int cpu)
 	}
 }
 
+static void tick_nohz_restart_adaptive(void)
+{
+	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+
+	tick_nohz_account_ticks(ts);
+	tick_nohz_restart_sched_tick();
+	clear_thread_flag(TIF_NOHZ);
+}
+
 void tick_nohz_check_adaptive(void)
 {
 	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
@@ -896,7 +1002,7 @@ void tick_nohz_check_adaptive(void)
 	if (cpuset_adaptive_nohz()) {
 		if (ts->tick_stopped && !is_idle_task(current)) {
 			if (!can_stop_adaptive_tick())
-				tick_nohz_restart_sched_tick();
+				tick_nohz_restart_adaptive();
 		}
 	}
 }
@@ -909,6 +1015,26 @@ void cpuset_exit_nohz_interrupt(void *unused)
 		tick_nohz_restart_adaptive();
 }
 
+/*
+ * Flush cputime and clear hooks before context switch in case we
+ * haven't yet received the IPI that should take care of that.
+ */
+void tick_nohz_pre_schedule(void)
+{
+	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+
+	/*
+	 * We are holding the rq lock and if we restart the tick now
+	 * we could deadlock by acquiring the lock twice. Instead
+	 * we do that on post schedule time. For now do the cleanups
+	 * on the prev task.
+	 */
+	if (test_thread_flag(TIF_NOHZ)) {
+		tick_nohz_account_ticks(ts);
+		clear_thread_flag(TIF_NOHZ);
+	}
+}
+
 void tick_nohz_post_schedule(void)
 {
 	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
@@ -921,7 +1047,6 @@ void tick_nohz_post_schedule(void)
 	if (ts->tick_stopped)
 		tick_nohz_restart_sched_tick();
 }
-
 #else
 
 static void tick_do_timer_check_handler(int cpu)
-- 
1.7.10.4




* [PATCH 15/32] nohz/cpuset: New API to flush cputimes on nohz cpusets
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (13 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 14/32] nohz/cpuset: Account user and system times in adaptive nohz mode Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 16/32] nohz/cpuset: Flush cputime on threads in nohz cpusets when waiting leader Steven Rostedt
                   ` (18 subsequent siblings)
  33 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0015-nohz-cpuset-New-API-to-flush-cputimes-on-nohz-cpuset.patch --]
[-- Type: text/plain, Size: 5724 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

Provide a new API that sends an IPI to every CPU included in
a nohz cpuset in order to flush its cputimes. This is useful
for anyone who wants to read accurate cputimes on a nohz cpuset.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/cpuset.h   |    2 ++
 include/linux/tick.h     |    1 +
 kernel/cpuset.c          |   34 +++++++++++++++++++++++++++++++++-
 kernel/time/tick-sched.c |   21 ++++++++++++++++-----
 4 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 631968b..b6c2460 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -262,9 +262,11 @@ static inline bool cpuset_adaptive_nohz(void)
 }
 
 extern void cpuset_exit_nohz_interrupt(void *unused);
+extern void cpuset_nohz_flush_cputimes(void);
 #else
 static inline bool cpuset_cpu_adaptive_nohz(int cpu) { return false; }
 static inline bool cpuset_adaptive_nohz(void) { return false; }
+static inline void cpuset_nohz_flush_cputimes(void) { }
 
 #endif /* CONFIG_CPUSETS_NO_HZ */
 
diff --git a/include/linux/tick.h b/include/linux/tick.h
index 598b492..3c31d6e 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -161,6 +161,7 @@ extern void tick_nohz_check_adaptive(void);
 extern void tick_nohz_pre_schedule(void);
 extern void tick_nohz_post_schedule(void);
 extern bool tick_nohz_account_tick(void);
+extern void tick_nohz_flush_current_times(bool restart_tick);
 #else /* !CPUSETS_NO_HZ */
 static inline void tick_nohz_enter_kernel(void) { }
 static inline void tick_nohz_exit_kernel(void) { }
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 1b67e5b..84f9f2b 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -59,6 +59,7 @@
 #include <linux/mutex.h>
 #include <linux/workqueue.h>
 #include <linux/cgroup.h>
+#include <linux/tick.h>
 
 /*
  * Workqueue for cpuset related tasks.
@@ -1200,6 +1201,23 @@ static void cpuset_change_flag(struct task_struct *tsk,
 
 DEFINE_PER_CPU(atomic_t, cpu_adaptive_nohz_ref);
 
+static cpumask_t nohz_cpuset_mask;
+
+static void flush_cputime_interrupt(void *unused)
+{
+	tick_nohz_flush_current_times(false);
+}
+
+void cpuset_nohz_flush_cputimes(void)
+{
+	preempt_disable();
+	smp_call_function_many(&nohz_cpuset_mask, flush_cputime_interrupt,
+			       NULL, true);
+	preempt_enable();
+	/* Make the utime/stime updates visible */
+	smp_mb();
+}
+
 static void cpu_exit_nohz(int cpu)
 {
 	preempt_disable();
@@ -1223,7 +1241,15 @@ static void update_nohz_cpus(struct cpuset *old_cs, struct cpuset *cs)
 		else
 			val = atomic_dec_return(ref);
 
-		if (!val) {
+		if (val == 1) {
+			cpumask_set_cpu(cpu, &nohz_cpuset_mask);
+			/*
+			 * The mask update needs to be visible right away
+			 * so that this CPU is part of the cputime IPI
+			 * update right now.
+			 */
+			 smp_mb();
+		} else if (!val) {
 			/*
 			 * The update to cpu_adaptive_nohz_ref must be
 			 * visible right away. So that once we restart the tick
@@ -1234,6 +1260,12 @@ static void update_nohz_cpus(struct cpuset *old_cs, struct cpuset *cs)
 			 */
 			smp_mb();
 			cpu_exit_nohz(cpu);
+			/*
+			 * Now that the tick has been restarted and cputimes
+			 * flushed, we don't need anymore to be part of the
+			 * cputime flush IPI.
+			 */
+			cpumask_clear_cpu(cpu, &nohz_cpuset_mask);
 		}
 	}
 }
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index de8ba59..2627663 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -712,7 +712,6 @@ static void tick_nohz_account_ticks(struct tick_sched *ts)
 			WARN_ON_ONCE(1);
 		}
 	}
-	ts->saved_jiffies_whence = JIFFIES_SAVED_NONE;
 }
 
 /**
@@ -746,6 +745,7 @@ void tick_nohz_idle_exit(void)
 		__tick_nohz_restart_sched_tick(ts, now);
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING
 		tick_nohz_account_ticks(ts);
+		ts->saved_jiffies_whence = JIFFIES_SAVED_NONE;
 #endif
 	}
 
@@ -988,9 +988,7 @@ static void tick_do_timer_check_handler(int cpu)
 
 static void tick_nohz_restart_adaptive(void)
 {
-	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
-
-	tick_nohz_account_ticks(ts);
+	tick_nohz_flush_current_times(true);
 	tick_nohz_restart_sched_tick();
 	clear_thread_flag(TIF_NOHZ);
 }
@@ -1030,7 +1028,7 @@ void tick_nohz_pre_schedule(void)
 	 * on the prev task.
 	 */
 	if (test_thread_flag(TIF_NOHZ)) {
-		tick_nohz_account_ticks(ts);
+		tick_nohz_flush_current_times(true);
 		clear_thread_flag(TIF_NOHZ);
 	}
 }
@@ -1047,6 +1045,19 @@ void tick_nohz_post_schedule(void)
 	if (ts->tick_stopped)
 		tick_nohz_restart_sched_tick();
 }
+
+void tick_nohz_flush_current_times(bool restart_tick)
+{
+	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+
+	if (ts->tick_stopped) {
+		tick_nohz_account_ticks(ts);
+		if (restart_tick)
+			ts->saved_jiffies_whence = JIFFIES_SAVED_NONE;
+		else
+			ts->saved_jiffies = jiffies;
+	}
+}
 #else
 
 static void tick_do_timer_check_handler(int cpu)
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 16/32] nohz/cpuset: Flush cputime on threads in nohz cpusets when waiting leader
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (14 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 15/32] nohz/cpuset: New API to flush cputimes on nohz cpusets Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 17/32] nohz/cpuset: Flush cputimes on procfs stat file read Steven Rostedt
                   ` (17 subsequent siblings)
  33 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0016-nohz-cpuset-Flush-cputime-on-threads-in-nohz-cpusets.patch --]
[-- Type: text/plain, Size: 2039 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

When we wait for a zombie task, flush the cputimes on nohz cpusets
in case we are waiting for a group leader that has threads running
on nohz CPUs. This way thread_group_times() doesn't report stale
values.

<doubts>
If I understood the code correctly, by the time we call thread_group_times(),
we may have children that are still running, so this is necessary.
But I need to check more deeply.
</doubts>

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/exit.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/exit.c b/kernel/exit.c
index 346616c..154c26b 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -53,6 +53,7 @@
 #include <linux/oom.h>
 #include <linux/writeback.h>
 #include <linux/shm.h>
+#include <linux/cpuset.h>
 
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
@@ -1634,6 +1635,13 @@ repeat:
 	   (!wo->wo_pid || hlist_empty(&wo->wo_pid->tasks[wo->wo_type])))
 		goto notask;
 
+	/*
+	 * For cputime in sub-threads before adding them.
+	 * Must be called outside tasklist_lock lock because write lock
+	 * can be acquired under irqs disabled.
+	 */
+	cpuset_nohz_flush_cputimes();
+
 	set_current_state(TASK_INTERRUPTIBLE);
 	read_lock(&tasklist_lock);
 	tsk = current;
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 17/32] nohz/cpuset: Flush cputimes on procfs stat file read
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (15 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 16/32] nohz/cpuset: Flush cputime on threads in nohz cpusets when waiting leader Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 18/32] nohz/cpuset: Flush cputimes for getrusage() and times() syscalls Steven Rostedt
                   ` (16 subsequent siblings)
  33 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0017-nohz-cpuset-Flush-cputimes-on-procfs-stat-file-read.patch --]
[-- Type: text/plain, Size: 1533 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

When we read a process's procfs stat file, we need
to flush the cputimes of the tasks running in nohz
cpusets in case some children in the thread group are
running there.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 fs/proc/array.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index c1c207c..f7e1fdc 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -406,6 +406,8 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 	cutime = cstime = utime = stime = 0;
 	cgtime = gtime = 0;
 
+	/* For thread group times */
+	cpuset_nohz_flush_cputimes();
 	if (lock_task_sighand(task, &flags)) {
 		struct signal_struct *sig = task->signal;
 
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 18/32] nohz/cpuset: Flush cputimes for getrusage() and times() syscalls
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (16 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 17/32] nohz/cpuset: Flush cputimes on procfs stat file read Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 19/32] x86: Syscall hooks for nohz cpusets Steven Rostedt
                   ` (15 subsequent siblings)
  33 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0018-nohz-cpuset-Flush-cputimes-for-getrusage-and-times-s.patch --]
[-- Type: text/plain, Size: 1944 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

Both syscalls need to iterate through the thread group to get
the cputimes. As some threads of the group may be running in a
nohz cpuset, we need to flush the cputimes there.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sys.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/sys.c b/kernel/sys.c
index e6e0ece..b57ea9a 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -47,6 +47,7 @@
 #include <linux/syscalls.h>
 #include <linux/kprobes.h>
 #include <linux/user_namespace.h>
+#include <linux/cpuset.h>
 
 #include <linux/kmsg_dump.h>
 /* Move somewhere else to avoid recompiling? */
@@ -1045,6 +1046,8 @@ void do_sys_times(struct tms *tms)
 {
 	cputime_t tgutime, tgstime, cutime, cstime;
 
+	cpuset_nohz_flush_cputimes();
+
 	spin_lock_irq(&current->sighand->siglock);
 	thread_group_times(current, &tgutime, &tgstime);
 	cutime = current->signal->cutime;
@@ -1710,6 +1713,9 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
 		goto out;
 	}
 
+	/* For thread_group_times */
+	cpuset_nohz_flush_cputimes();
+
 	if (!lock_task_sighand(p, &flags))
 		return;
 
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 19/32] x86: Syscall hooks for nohz cpusets
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (17 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 18/32] nohz/cpuset: Flush cputimes for getrusage() and times() syscalls Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 20/32] nohz/cpuset: enable addition&removal of cpus while in adaptive nohz mode Steven Rostedt
                   ` (14 subsequent siblings)
  33 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0019-x86-Syscall-hooks-for-nohz-cpusets.patch --]
[-- Type: text/plain, Size: 2016 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

Add syscall hooks to notify syscall entry and exit on
CPUs running in adaptive nohz mode.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 arch/x86/kernel/ptrace.c |   11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index b00b33a..9c18e1e 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -22,6 +22,7 @@
 #include <linux/perf_event.h>
 #include <linux/hw_breakpoint.h>
 #include <linux/rcupdate.h>
+#include <linux/tick.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
@@ -1461,6 +1462,10 @@ long syscall_trace_enter(struct pt_regs *regs)
 {
 	long ret = 0;
 
+	/* Notify nohz task syscall early so the rest can use rcu */
+	/* (SDR: Does the rcu_user_exit() make this obsolete?) */
+	tick_nohz_enter_kernel();
+
 	rcu_user_exit();
 
 	/*
@@ -1528,4 +1533,10 @@ void syscall_trace_leave(struct pt_regs *regs)
 		tracehook_report_syscall_exit(regs, step);
 
 	rcu_user_enter();
+	/*
+	 * Notify nohz task exit syscall at last so the rest can
+	 * use rcu.
+	 * (SDR: does the above make this obsolete?)
+	 */
+	tick_nohz_exit_kernel();
 }
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 20/32] nohz/cpuset: enable addition&removal of cpus while in adaptive nohz mode
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (18 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 19/32] x86: Syscall hooks for nohz cpusets Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 21/32] nohz: Dont restart the tick before scheduling to idle Steven Rostedt
                   ` (13 subsequent siblings)
  33 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Hakan Akkan, Alessio Igor Bogani, Avi Kivity,
	Chris Metcalf, Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0020-nohz-cpuset-enable-addition-removal-of-cpus-while-in.patch --]
[-- Type: text/plain, Size: 6065 bytes --]

From: Hakan Akkan <hakanakkan@gmail.com>

Currently, modifying the cpuset.cpus mask of a cgroup does not
update the reference counters for adaptive nohz mode if the
cpuset already had cpuset.adaptive_nohz == 1. Fix it so that
cpus can be added to or removed from an adaptive_nohz cpuset.
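The scenario being fixed can be sketched with the cpuset filesystem. This is
a hypothetical configuration fragment: the mount point is assumed, and the
cpuset.adaptive_nohz file is the interface introduced earlier in this RFC
series, so names may change.

```shell
# Mount the cpuset hierarchy (legacy mount point assumed).
mount -t cgroup -o cpuset cpuset /dev/cpuset
mkdir /dev/cpuset/rt

# Create an adaptive-nohz cpuset on CPU 3.
echo 3 > /dev/cpuset/rt/cpuset.cpus
echo 1 > /dev/cpuset/rt/cpuset.adaptive_nohz

# Before this patch, changing the mask while adaptive_nohz is already
# set left the per-CPU cpu_adaptive_nohz_ref counters untouched; with
# it, CPU 2 gains a reference here and CPU 3 would drop one if removed.
echo 2-3 > /dev/cpuset/rt/cpuset.cpus
```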

Signed-off-by: Hakan Akkan <hakanakkan@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/cpuset.c |  111 ++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 73 insertions(+), 38 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 84f9f2b..218abc8 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -868,6 +868,8 @@ static void update_tasks_cpumask(struct cpuset *cs, struct ptr_heap *heap)
 	cgroup_scan_tasks(&scan);
 }
 
+static void update_nohz_cpus(struct cpuset *old_cs, struct cpuset *cs);
+
 /**
  * update_cpumask - update the cpus_allowed mask of a cpuset and all tasks in it
  * @cs: the cpuset to consider
@@ -908,6 +910,11 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 	if (cpumask_equal(cs->cpus_allowed, trialcs->cpus_allowed))
 		return 0;
 
+	/*
+	 * Update adaptive nohz bits.
+	 */
+	update_nohz_cpus(cs, trialcs);
+
 	retval = heap_init(&heap, PAGE_SIZE, GFP_KERNEL, NULL);
 	if (retval)
 		return retval;
@@ -1226,50 +1233,75 @@ static void cpu_exit_nohz(int cpu)
 	preempt_enable();
 }
 
-static void update_nohz_cpus(struct cpuset *old_cs, struct cpuset *cs)
+static void update_cpu_nohz_flag(int cpu, int adjust)
 {
-	int cpu;
+	atomic_t *ref = &per_cpu(cpu_adaptive_nohz_ref, cpu);
 	int val;
 
+	val = atomic_add_return(adjust, ref);
+
+	if (val == 1 && adjust > 0) {
+		cpumask_set_cpu(cpu, &nohz_cpuset_mask);
+		/*
+		 * The mask update needs to be visible right away
+		 * so that this CPU is part of the cputime IPI
+		 * update right now.
+		 */
+		 smp_mb();
+	} else if (!val) {
+		/*
+		 * The update to cpu_adaptive_nohz_ref must be
+		 * visible right away. So that once we restart the tick
+		 * from the IPI, it won't be stopped again due to cache
+		 * update lag.
+		 * FIXME: We probably need more to ensure this value is really
+		 * visible right away.
+		 */
+		smp_mb();
+		cpu_exit_nohz(cpu);
+		/*
+		 * Now that the tick has been restarted and cputimes
+		 * flushed, we don't need anymore to be part of the
+		 * cputime flush IPI.
+		 */
+		cpumask_clear_cpu(cpu, &nohz_cpuset_mask);
+	}
+}
+
+static void update_nohz_flag(struct cpuset *old_cs, struct cpuset *cs)
+{
+	int cpu;
+	int adjust;
+
 	if (is_adaptive_nohz(old_cs) == is_adaptive_nohz(cs))
 		return;
 
-	for_each_cpu(cpu, cs->cpus_allowed) {
-		atomic_t *ref = &per_cpu(cpu_adaptive_nohz_ref, cpu);
-		if (is_adaptive_nohz(cs))
-			val = atomic_inc_return(ref);
-		else
-			val = atomic_dec_return(ref);
-
-		if (val == 1) {
-			cpumask_set_cpu(cpu, &nohz_cpuset_mask);
-			/*
-			 * The mask update needs to be visible right away
-			 * so that this CPU is part of the cputime IPI
-			 * update right now.
-			 */
-			 smp_mb();
-		} else if (!val) {
-			/*
-			 * The update to cpu_adaptive_nohz_ref must be
-			 * visible right away. So that once we restart the tick
-			 * from the IPI, it won't be stopped again due to cache
-			 * update lag.
-			 * FIXME: We probably need more to ensure this value is really
-			 * visible right away.
-			 */
-			smp_mb();
-			cpu_exit_nohz(cpu);
-			/*
-			 * Now that the tick has been restarted and cputimes
-			 * flushed, we don't need anymore to be part of the
-			 * cputime flush IPI.
-			 */
-			cpumask_clear_cpu(cpu, &nohz_cpuset_mask);
-		}
-	}
+	adjust = is_adaptive_nohz(cs) ? 1 : -1;
+	for_each_cpu(cpu, cs->cpus_allowed)
+		update_cpu_nohz_flag(cpu, adjust);
+}
+
+static void update_nohz_cpus(struct cpuset *old_cs, struct cpuset *cs)
+{
+	int cpu;
+	cpumask_t cpus;
+
+	/*
+	 * Only bother if the cpuset has adaptive nohz
+	 */
+	if (!is_adaptive_nohz(cs))
+		return;
+
+	cpumask_xor(&cpus, old_cs->cpus_allowed, cs->cpus_allowed);
+
+	for_each_cpu(cpu, &cpus)
+		update_cpu_nohz_flag(cpu,
+			cpumask_test_cpu(cpu, cs->cpus_allowed) ? 1 : -1);
 }
 #else
+static inline void update_nohz_flag(struct cpuset *old_cs, struct cpuset *cs)
+{
+}
 static inline void update_nohz_cpus(struct cpuset *old_cs, struct cpuset *cs)
 {
 }
@@ -1340,7 +1372,7 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
 	spread_flag_changed = ((is_spread_slab(cs) != is_spread_slab(trialcs))
 			|| (is_spread_page(cs) != is_spread_page(trialcs)));
 
-	update_nohz_cpus(cs, trialcs);
+	update_nohz_flag(cs, trialcs);
 
 	mutex_lock(&callback_mutex);
 	cs->flags = trialcs->flags;
@@ -1965,7 +1997,8 @@ static struct cgroup_subsys_state *cpuset_create(struct cgroup *cont)
 /*
  * If the cpuset being removed has its flag 'sched_load_balance'
  * enabled, then simulate turning sched_load_balance off, which
- * will call async_rebuild_sched_domains().
+ * will call async_rebuild_sched_domains(). Also update adaptive
+ * nohz flag.
  */
 
 static void cpuset_destroy(struct cgroup *cont)
@@ -1975,6 +2008,8 @@ static void cpuset_destroy(struct cgroup *cont)
 	if (is_sched_load_balance(cs))
 		update_flag(CS_SCHED_LOAD_BALANCE, cs, 0);
 
+	update_flag(CS_ADAPTIVE_NOHZ, cs, 0);
+
 	number_of_cpusets--;
 	free_cpumask_var(cs->cpus_allowed);
 	kfree(cs);
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 21/32] nohz: Dont restart the tick before scheduling to idle
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (19 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 20/32] nohz/cpuset: enable addition&removal of cpus while in adaptive nohz mode Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 22/32] sched: Comment on rq->clock correctness in ttwu_do_wakeup() in nohz Steven Rostedt
                   ` (12 subsequent siblings)
  33 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0021-nohz-Don-t-restart-the-tick-before-scheduling-to-idl.patch --]
[-- Type: text/plain, Size: 2207 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

If we were running adaptive tickless but then schedule out and
enter the idle task, we don't need to restart the tick because
tick_nohz_idle_enter() is going to be called right away.

The only thing we need to do is save the jiffies, so that when
we later restart the tick we can account for the CPU time spent
tickless while idle.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/tick-sched.c |   18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 2627663..16267ee 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1036,14 +1036,18 @@ void tick_nohz_pre_schedule(void)
 void tick_nohz_post_schedule(void)
 {
 	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+	unsigned long flags;
 
-	/*
-	 * No need to disable irqs here. The worst that can happen
-	 * is an irq that comes and restart the tick before us.
-	 * tick_nohz_restart_sched_tick() is irq safe.
-	 */
-	if (ts->tick_stopped)
-		tick_nohz_restart_sched_tick();
+	local_irq_save(flags);
+	if (ts->tick_stopped) {
+		if (is_idle_task(current)) {
+			ts->saved_jiffies = jiffies;
+			ts->saved_jiffies_whence = JIFFIES_SAVED_IDLE;
+		} else {
+			tick_nohz_restart_sched_tick();
+		}
+	}
+	local_irq_restore(flags);
 }
 
 void tick_nohz_flush_current_times(bool restart_tick)
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 22/32] sched: Comment on rq->clock correctness in ttwu_do_wakeup() in nohz
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (20 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 21/32] nohz: Dont restart the tick before scheduling to idle Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 23/32] sched: Update rq clock on nohz CPU before migrating tasks Steven Rostedt
                   ` (11 subsequent siblings)
  33 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0022-sched-Comment-on-rq-clock-correctness-in-ttwu_do_wak.patch --]
[-- Type: text/plain, Size: 1572 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/core.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bebea17..783d5e4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1309,6 +1309,12 @@ ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
 	if (p->sched_class->task_woken)
 		p->sched_class->task_woken(rq, p);
 
+	/*
+	 * For adaptive nohz case: We called ttwu_activate()
+	 * which just updated the rq clock. There is an
+	 * exception with p->on_rq != 0 but in this case
+	 * we are not idle and rq->idle_stamp == 0
+	 */
 	if (rq->idle_stamp) {
 		u64 delta = rq->clock - rq->idle_stamp;
 		u64 max = 2*sysctl_sched_migration_cost;
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 23/32] sched: Update rq clock on nohz CPU before migrating tasks
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (21 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 22/32] sched: Comment on rq->clock correctness in ttwu_do_wakeup() in nohz Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 24/32] sched: Update rq clock on nohz CPU before setting fair group shares Steven Rostedt
                   ` (10 subsequent siblings)
  33 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0023-sched-Update-rq-clock-on-nohz-CPU-before-migrating-t.patch --]
[-- Type: text/plain, Size: 2200 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

The sched_class::put_prev_task() callbacks of the rt and fair
classes refer to the rq clock to update their runtime
statistics. A CPU running in tickless mode may carry a stale
value, so we need to update the clock there before migrating tasks.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/core.c  |    6 ++++++
 kernel/sched/sched.h |    6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 783d5e4..f0fa54d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4851,6 +4851,12 @@ static void migrate_tasks(unsigned int dead_cpu)
 	 */
 	rq->stop = NULL;
 
+	/*
+	 * ->put_prev_task() need to have an up-to-date value
+	 * of rq->clock[_task]
+	 */
+	update_nohz_rq_clock(rq);
+
 	for ( ; ; ) {
 		/*
 		 * There's this thread running, bail when that's the only
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c6cd9ec..1956494 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -948,6 +948,12 @@ static inline void dec_nr_running(struct rq *rq)
 
 extern void update_rq_clock(struct rq *rq);
 
+static inline void update_nohz_rq_clock(struct rq *rq)
+{
+	if (cpuset_cpu_adaptive_nohz(cpu_of(rq)))
+		update_rq_clock(rq);
+}
+
 extern void activate_task(struct rq *rq, struct task_struct *p, int flags);
 extern void deactivate_task(struct rq *rq, struct task_struct *p, int flags);
 
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 24/32] sched: Update rq clock on nohz CPU before setting fair group shares
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (22 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 23/32] sched: Update rq clock on nohz CPU before migrating tasks Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 25/32] sched: Update rq clock on tickless CPUs before calling check_preempt_curr() Steven Rostedt
                   ` (9 subsequent siblings)
  33 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0024-sched-Update-rq-clock-on-nohz-CPU-before-setting-fai.patch --]
[-- Type: text/plain, Size: 1913 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

We may update the execution time (sched_group_set_shares()->
	update_cfs_shares()->reweight_entity()->update_curr()) before
reweighting the entity after updating the group shares, and this
requires an up-to-date version of the runqueue clock. Let's update it
on the target CPU if it runs tickless, because scheduler_tick() is not
there to maintain it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/fair.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6b800a1..928c4cb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5264,6 +5264,11 @@ int sched_group_set_shares(struct task_group *tg, unsigned long shares)
 		se = tg->se[i];
 		/* Propagate contribution to hierarchy */
 		raw_spin_lock_irqsave(&rq->lock, flags);
+		/*
+		 * We may call update_curr() which needs an up-to-date
+		 * version of rq clock if the CPU runs tickless.
+		 */
+		update_nohz_rq_clock(rq);
 		for_each_sched_entity(se)
 			update_cfs_shares(group_cfs_rq(se));
 		raw_spin_unlock_irqrestore(&rq->lock, flags);
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 25/32] sched: Update rq clock on tickless CPUs before calling check_preempt_curr()
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (23 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 24/32] sched: Update rq clock on nohz CPU before setting fair group shares Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 26/32] sched: Update rq clock earlier in unthrottle_cfs_rq Steven Rostedt
                   ` (8 subsequent siblings)
  33 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0025-sched-Update-rq-clock-on-tickless-CPUs-before-callin.patch --]
[-- Type: text/plain, Size: 2396 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

check_preempt_wakeup() of the fair class needs an up-to-date sched
clock value to update the runtime stats of the current task.

When a task is woken up, activate_task() is usually called right before
ttwu_do_wakeup(), unless the task is already in the runqueue. In that
case we need to update the rq clock manually, because the CPU may run
tickless and ttwu_do_wakeup() calls check_preempt_wakeup().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/core.c |   17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f0fa54d..320abee 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1353,6 +1353,12 @@ static int ttwu_remote(struct task_struct *p, int wake_flags)
 
 	rq = __task_rq_lock(p);
 	if (p->on_rq) {
+		/*
+		 * Ensure check_preempt_curr() won't deal with a stale value
+		 * of rq clock if the CPU is tickless. BTW do we actually need
+		 * check_preempt_curr() to be called here?
+		 */
+		update_nohz_rq_clock(rq);
 		ttwu_do_wakeup(rq, p, wake_flags);
 		ret = 1;
 	}
@@ -1530,8 +1536,17 @@ static void try_to_wake_up_local(struct task_struct *p)
 	if (!(p->state & TASK_NORMAL))
 		goto out;
 
-	if (!p->on_rq)
+	if (!p->on_rq) {
 		ttwu_activate(rq, p, ENQUEUE_WAKEUP);
+	} else {
+		/*
+		 * Even if the task is on the runqueue we still
+		 * need to ensure check_preempt_curr() won't
+		 * deal with a stale rq clock value on a tickless
+		 * CPU
+		 */
+		update_nohz_rq_clock(rq);
+	}
 
 	ttwu_do_wakeup(rq, p, 0);
 	ttwu_stat(p, smp_processor_id(), 0);
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 26/32] sched: Update rq clock earlier in unthrottle_cfs_rq
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (24 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 25/32] sched: Update rq clock on tickless CPUs before calling check_preempt_curr() Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 27/32] sched: Update clock of nohz busiest rq before balancing Steven Rostedt
                   ` (7 subsequent siblings)
  33 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0026-sched-Update-rq-clock-earlier-in-unthrottle_cfs_rq.patch --]
[-- Type: text/plain, Size: 1818 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

In this function we make use of rq->clock right before updating it.
Let's call update_rq_clock() before that use instead, to avoid
reading a stale rq clock value.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/fair.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 928c4cb..f320922 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1667,15 +1667,16 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 	long task_delta;
 
 	se = cfs_rq->tg->se[cpu_of(rq_of(cfs_rq))];
-
 	cfs_rq->throttled = 0;
+
+	update_rq_clock(rq);
+
 	raw_spin_lock(&cfs_b->lock);
 	cfs_b->throttled_time += rq->clock - cfs_rq->throttled_timestamp;
 	list_del_rcu(&cfs_rq->throttled_list);
 	raw_spin_unlock(&cfs_b->lock);
 	cfs_rq->throttled_timestamp = 0;
 
-	update_rq_clock(rq);
 	/* update hierarchical throttle state */
 	walk_tg_tree_from(cfs_rq->tg, tg_nop, tg_unthrottle_up, (void *)rq);
 
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 27/32] sched: Update clock of nohz busiest rq before balancing
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (25 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 26/32] sched: Update rq clock earlier in unthrottle_cfs_rq Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 28/32] sched: Update rq clock before idle balancing Steven Rostedt
                   ` (6 subsequent siblings)
  33 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0027-sched-Update-clock-of-nohz-busiest-rq-before-balanci.patch --]
[-- Type: text/plain, Size: 2734 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

move_tasks() and active_load_balance_cpu_stop() both need
the busiest rq clock to be up to date because they may end
up calling can_migrate_task(), which uses rq->clock_task
to determine if the task running in the busiest runqueue
is cache hot.

Hence, if the busiest runqueue is tickless, update its clock
before reading it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
[ Forward port conflicts ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/sched/fair.c |   17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f320922..a63e641 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4231,6 +4231,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 {
 	int ld_moved, cur_ld_moved, active_balance = 0;
 	int lb_iterations, max_lb_iterations;
+	int clock_updated;
 	struct sched_group *group;
 	struct rq *busiest;
 	unsigned long flags;
@@ -4274,6 +4275,7 @@ redo:
 
 	ld_moved = 0;
 	lb_iterations = 1;
+	clock_updated = 0;
 	if (busiest->nr_running > 1) {
 		/*
 		 * Attempt to move tasks. If find_busiest_group has found
@@ -4297,6 +4299,14 @@ more_balance:
 		 */
 		cur_ld_moved = move_tasks(&env);
 		ld_moved += cur_ld_moved;
+
+		/*
+		 * move_tasks() may end up calling can_migrate_task(), which
+		 * requires an up-to-date value of the rq clock.
+		 */
+		update_nohz_rq_clock(busiest);
+		clock_updated = 1;
+
 		double_rq_unlock(env.dst_rq, busiest);
 		local_irq_restore(flags);
 
@@ -4392,6 +4402,13 @@ more_balance:
 				busiest->active_balance = 1;
 				busiest->push_cpu = this_cpu;
 				active_balance = 1;
+				/*
+				 * active_load_balance_cpu_stop may end up calling
+				 * can_migrate_task() which requires an uptodate
+				 * value of the rq clock.
+				 */
+				if (!clock_updated)
+					update_nohz_rq_clock(busiest);
 			}
 			raw_spin_unlock_irqrestore(&busiest->lock, flags);
 
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 28/32] sched: Update rq clock before idle balancing
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (26 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 27/32] sched: Update clock of nohz busiest rq before balancing Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 29/32] sched: Update nohz rq clock before searching busiest group on load balancing Steven Rostedt
                   ` (5 subsequent siblings)
  33 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0028-sched-Update-rq-clock-before-idle-balancing.patch --]
[-- Type: text/plain, Size: 1600 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

idle_balance() is called from schedule() right before we schedule the
idle task. It needs to record the idle timestamp at that time, and for
this the rq clock must be accurate. If the CPU runs tickless, we need
to update the rq clock manually.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/fair.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a63e641..89e816e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4470,6 +4470,7 @@ void idle_balance(int this_cpu, struct rq *this_rq)
 	int pulled_task = 0;
 	unsigned long next_balance = jiffies + HZ;
 
+	update_nohz_rq_clock(this_rq);
 	this_rq->idle_stamp = this_rq->clock;
 
 	if (this_rq->avg_idle < sysctl_sched_migration_cost)
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 29/32] sched: Update nohz rq clock before searching busiest group on load balancing
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (27 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 28/32] sched: Update rq clock before idle balancing Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 30/32] rcu: Switch to extended quiescent state in userspace from nohz cpuset Steven Rostedt
                   ` (4 subsequent siblings)
  33 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0029-sched-Update-nohz-rq-clock-before-searching-busiest-.patch --]
[-- Type: text/plain, Size: 1966 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

While load balancing an rq target, we look for the busiest group.
This operation may require an up-to-date rq clock if we end up calling
scale_rt_power(). To this end, update it manually if the target is
running tickless.

DOUBT: don't we actually also need this in vanilla kernel, in case
this_cpu is in dyntick-idle mode?

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/fair.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 89e816e..b1b9a20 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4252,6 +4252,19 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 
 	schedstat_inc(sd, lb_count[idle]);
 
+	/*
+	 * find_busiest_group() may need an up-to-date cpu clock
+	 * (see scale_rt_power()). If the CPU is nohz, its clock
+	 * may be stale.
+	 */
+	if (cpuset_cpu_adaptive_nohz(this_cpu)) {
+		local_irq_save(flags);
+		raw_spin_lock(&this_rq->lock);
+		update_rq_clock(this_rq);
+		raw_spin_unlock(&this_rq->lock);
+		local_irq_restore(flags);
+	}
+
 redo:
 	group = find_busiest_group(&env, balance);
 
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 30/32] rcu: Switch to extended quiescent state in userspace from nohz cpuset
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (28 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 29/32] sched: Update nohz rq clock before searching busiest group on load balancing Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 31/32] nohz/cpuset: Disable under some configs Steven Rostedt
                   ` (3 subsequent siblings)
  33 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0030-rcu-Switch-to-extended-quiescent-state-in-userspace-.patch --]
[-- Type: text/plain, Size: 5132 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

When we switch to adaptive nohz mode and run in userspace,
we can still receive IPIs from the RCU core if a grace period
has been started by another CPU, because we need to take part
in its completion.

However, running in userspace is similar to running in idle:
we don't make use of RCU there, so we can be considered to be
in an RCU extended quiescent state. The benefit of running in
that mode is that we are no longer disturbed by needless IPIs
coming from the RCU core.

To perform this, we just need to use the RCU extended quiescent
state APIs at the following points:

- kernel exit or tick stop in userspace: here we switch to the extended
quiescent state because we run in userspace without the tick.

- kernel entry or tick restart: here we exit the extended quiescent
state because either we enter the kernel and may make use of RCU
read side critical sections at any time, or we need the timer tick for
some reason and that takes care of RCU grace periods in the traditional
way.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/tick.h     |    3 +++
 kernel/time/tick-sched.c |   27 +++++++++++++++++++++++++--
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 3c31d6e..e2a49ad 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -153,6 +153,8 @@ static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
 # endif /* !NO_HZ */
 
 #ifdef CONFIG_CPUSETS_NO_HZ
+DECLARE_PER_CPU(int, nohz_task_ext_qs);
+
 extern void tick_nohz_enter_kernel(void);
 extern void tick_nohz_exit_kernel(void);
 extern void tick_nohz_enter_exception(struct pt_regs *regs);
@@ -160,6 +162,7 @@ extern void tick_nohz_exit_exception(struct pt_regs *regs);
 extern void tick_nohz_check_adaptive(void);
 extern void tick_nohz_pre_schedule(void);
 extern void tick_nohz_post_schedule(void);
+extern void tick_nohz_cpu_exit_qs(void);
 extern bool tick_nohz_account_tick(void);
 extern void tick_nohz_flush_current_times(bool restart_tick);
 #else /* !CPUSETS_NO_HZ */
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 16267ee..bdd40bb 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -571,10 +571,13 @@ static void tick_nohz_cpuset_stop_tick(struct tick_sched *ts)
 
 	if (!was_stopped && ts->tick_stopped) {
 		WARN_ON_ONCE(ts->saved_jiffies_whence != JIFFIES_SAVED_NONE);
-		if (user)
+		if (user) {
 			ts->saved_jiffies_whence = JIFFIES_SAVED_USER;
-		else if (!current->mm)
+			__get_cpu_var(nohz_task_ext_qs) = 1;
+			rcu_user_enter_irq();
+		} else if (!current->mm) {
 			ts->saved_jiffies_whence = JIFFIES_SAVED_SYS;
+		}
 
 		ts->saved_jiffies = jiffies;
 		set_thread_flag(TIF_NOHZ);
@@ -906,6 +909,8 @@ void tick_check_idle(int cpu)
 }
 
 #ifdef CONFIG_CPUSETS_NO_HZ
+DEFINE_PER_CPU(int, nohz_task_ext_qs);
+
 void tick_nohz_exit_kernel(void)
 {
 	unsigned long flags;
@@ -928,6 +933,9 @@ void tick_nohz_exit_kernel(void)
 	ts->saved_jiffies = jiffies;
 	ts->saved_jiffies_whence = JIFFIES_SAVED_USER;
 
+	__get_cpu_var(nohz_task_ext_qs) = 1;
+	rcu_user_enter();
+
 	local_irq_restore(flags);
 }
 
@@ -947,6 +955,11 @@ void tick_nohz_enter_kernel(void)
 	WARN_ON_ONCE(!ts->tick_stopped);
 	WARN_ON_ONCE(ts->saved_jiffies_whence != JIFFIES_SAVED_USER);
 
+	if (__get_cpu_var(nohz_task_ext_qs) == 1) {
+		__get_cpu_var(nohz_task_ext_qs) = 0;
+		rcu_user_exit();
+	}
+
 	delta_jiffies = jiffies - ts->saved_jiffies;
 	account_user_ticks(current, delta_jiffies);
 
@@ -956,6 +969,14 @@ void tick_nohz_enter_kernel(void)
 	local_irq_restore(flags);
 }
 
+void tick_nohz_cpu_exit_qs(void)
+{
+	if (__get_cpu_var(nohz_task_ext_qs)) {
+		rcu_user_exit_irq();
+		__get_cpu_var(nohz_task_ext_qs) = 0;
+	}
+}
+
 void tick_nohz_enter_exception(struct pt_regs *regs)
 {
 	if (user_mode(regs))
@@ -991,6 +1012,7 @@ static void tick_nohz_restart_adaptive(void)
 	tick_nohz_flush_current_times(true);
 	tick_nohz_restart_sched_tick();
 	clear_thread_flag(TIF_NOHZ);
+	tick_nohz_cpu_exit_qs();
 }
 
 void tick_nohz_check_adaptive(void)
@@ -1030,6 +1052,7 @@ void tick_nohz_pre_schedule(void)
 	if (test_thread_flag(TIF_NOHZ)) {
 		tick_nohz_flush_current_times(true);
 		clear_thread_flag(TIF_NOHZ);
+		/* FIXME: warn if we are in RCU idle mode */
 	}
 }
 
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 31/32] nohz/cpuset: Disable under some configs
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (29 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 30/32] rcu: Switch to extended quiescent state in userspace from nohz cpuset Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-29 20:27 ` [PATCH 32/32] nohz, not for merge: Add tickless tracing Steven Rostedt
                   ` (2 subsequent siblings)
  33 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

[-- Attachment #1: 0031-nohz-cpuset-Disable-under-some-configs.patch --]
[-- Type: text/plain, Size: 1782 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

This shows the various things that are not yet handled by
the nohz cpusets: perf events, irq work, irq time accounting.

But there are further things that have yet to be handled:
sched clock tick, runqueue clock, sched_class::task_tick(),
rq clock, cpu load, complete handling of cputimes, ...

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 init/Kconfig |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/init/Kconfig b/init/Kconfig
index 418e078..78e793c 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -751,7 +751,7 @@ config PROC_PID_CPUSET
 
 config CPUSETS_NO_HZ
        bool "Tickless cpusets"
-       depends on CPUSETS && HAVE_CPUSETS_NO_HZ && NO_HZ && HIGH_RES_TIMERS
+       depends on CPUSETS && HAVE_CPUSETS_NO_HZ && NO_HZ && HIGH_RES_TIMERS && !IRQ_TIME_ACCOUNTING
        help
         This option lets you apply a nohz property to a cpuset such
 	 that the periodic timer tick tries to be avoided when possible on
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 32/32] nohz, not for merge: Add tickless tracing
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (30 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 31/32] nohz/cpuset: Disable under some configs Steven Rostedt
@ 2012-10-29 20:27 ` Steven Rostedt
  2012-10-30 14:02 ` [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Gilad Ben-Yossef
  2012-11-02 14:23 ` Christoph Lameter
  33 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-29 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith

[-- Attachment #1: 0032-nohz-not-for-merge-Add-tickless-tracing.patch --]
[-- Type: text/plain, Size: 2974 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

Ad hoc tickless tracing.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 arch/x86/kernel/smp.c    |    2 ++
 kernel/cpuset.c          |    1 +
 kernel/time/tick-sched.c |    7 +++++++
 3 files changed, 10 insertions(+)

diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 0bad72d..45f2176 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -264,6 +264,7 @@ finish:
 void smp_reschedule_interrupt(struct pt_regs *regs)
 {
 	ack_APIC_irq();
+	trace_printk("IPI: Scheduler\n");
 	inc_irq_stat(irq_resched_count);
 	scheduler_ipi();
 	/*
@@ -276,6 +277,7 @@ void smp_cpuset_update_nohz_interrupt(struct pt_regs *regs)
 {
 	ack_APIC_irq();
 	irq_enter();
+	trace_printk("IPI: Nohz update\n");
 	tick_nohz_check_adaptive();
 	inc_irq_stat(irq_call_count);
 	irq_exit();
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 218abc8..84f099c 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1212,6 +1212,7 @@ static cpumask_t nohz_cpuset_mask;
 
 static void flush_cputime_interrupt(void *unused)
 {
+	trace_printk("IPI: flush cputime\n");
 	tick_nohz_flush_current_times(false);
 }
 
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index bdd40bb..db19c2d 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -376,6 +376,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 		if (!ts->tick_stopped) {
 			ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
 			ts->tick_stopped = 1;
+			trace_printk("Stop tick\n");
 		}
 
 		/*
@@ -581,6 +582,7 @@ static void tick_nohz_cpuset_stop_tick(struct tick_sched *ts)
 
 		ts->saved_jiffies = jiffies;
 		set_thread_flag(TIF_NOHZ);
+		trace_printk("set TIF_NOHZ\n");
 	}
 }
 #else
@@ -659,6 +661,7 @@ static void __tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
 	ts->idle_exittime = now;
 
 	tick_nohz_restart(ts, now);
+	trace_printk("Restart sched tick\n");
 }
 
 /**
@@ -1012,6 +1015,7 @@ static void tick_nohz_restart_adaptive(void)
 	tick_nohz_flush_current_times(true);
 	tick_nohz_restart_sched_tick();
 	clear_thread_flag(TIF_NOHZ);
+	trace_printk("clear TIF_NOHZ\n");
 	tick_nohz_cpu_exit_qs();
 }
 
@@ -1031,6 +1035,7 @@ void cpuset_exit_nohz_interrupt(void *unused)
 {
 	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
 
+	trace_printk("IPI: Nohz exit\n");
 	if (ts->tick_stopped && !is_idle_task(current))
 		tick_nohz_restart_adaptive();
 }
@@ -1052,6 +1057,7 @@ void tick_nohz_pre_schedule(void)
 	if (test_thread_flag(TIF_NOHZ)) {
 		tick_nohz_flush_current_times(true);
 		clear_thread_flag(TIF_NOHZ);
+		trace_printk("clear TIF_NOHZ\n");
 		/* FIXME: warn if we are in RCU idle mode */
 	}
 }
@@ -1147,6 +1153,7 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
 		}
 		update_process_times(user);
 		profile_tick(CPU_PROFILING);
+		trace_printk("tick\n");
 	}
 
 	hrtimer_forward(timer, now, tick_period);
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH 01/32] nohz: Move nohz load balancer selection into idle logic
  2012-10-29 20:27 ` [PATCH 01/32] nohz: Move nohz load balancer selection into idle logic Steven Rostedt
@ 2012-10-30  8:32   ` Charles Wang
  2012-10-30 15:39     ` Steven Rostedt
  0 siblings, 1 reply; 60+ messages in thread
From: Charles Wang @ 2012-10-30  8:32 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Clark Williams, Frederic Weisbecker, Li Zefan, Ingo Molnar,
	Paul E. McKenney, Mike Galbraith, Alessio Igor Bogani,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Daniel Lezcano,
	Geoff Levand, Gilad Ben Yossef, Hakan Akkan, Kevin Hilman,
	Max Krasnyansky, Stephen Hemminger, Sven-Thorsten Dietrich

calc_load_exit_idle depends on updated jiffies, so you shouldn't move 
this before tick_do_update_jiffies64.

And why should we call nohz_balance_enter_idle() in tick_nohz_idle_exit? 
It should be nohz_balance_exit_idle() here.

Regards,
Charles

On 10/30/2012 04:27 AM, Steven Rostedt wrote:
[snipped]
 > @@ -573,7 +573,6 @@ static void tick_nohz_restart_sched_tick(struct 
tick_sched *ts, ktime_t now)
 >  	tick_do_update_jiffies64(now);
 >  	update_cpu_load_nohz();
 >
 > -	calc_load_exit_idle();
 >  	touch_softlockup_watchdog();
 >  	/*
 >  	 * Cancel the scheduled timer and restore the tick
 > @@ -628,6 +627,8 @@ void tick_nohz_idle_exit(void)
 >  		tick_nohz_stop_idle(cpu, now);
 >
 >  	if (ts->tick_stopped) {
 > +		nohz_balance_enter_idle(cpu);
 > +		calc_load_exit_idle();
 >  		tick_nohz_restart_sched_tick(ts, now);
 >  		tick_nohz_account_idle_ticks(ts);
 >  	}


* Re: [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (31 preceding siblings ...)
  2012-10-29 20:27 ` [PATCH 32/32] nohz, not for merge: Add tickless tracing Steven Rostedt
@ 2012-10-30 14:02 ` Gilad Ben-Yossef
  2012-11-02 14:23 ` Christoph Lameter
  33 siblings, 0 replies; 60+ messages in thread
From: Gilad Ben-Yossef @ 2012-10-30 14:02 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Clark Williams, Frederic Weisbecker, Li Zefan, Ingo Molnar,
	Paul E. McKenney, Mike Galbraith

On Mon, Oct 29, 2012 at 10:27 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> A while ago Frederic posted a series of patches to get an idea on
> how to implement nohz cpusets.
<snip>
>  By using
> isocpus and nohz cpuset, a task would be able to achieve true cpu
> isolation.
>
> This has been long asked for by those in the RT community. If a task
> requires uninterruptible CPU time, this would be able to give a task
> that, even without the full PREEMPT-RT patch set.
>
> This patch set is not for inclusion. It is just to get the topic
> at the forefront again. The design requires more work and more
> discussion.
>

Three additional data points that might be of interest to the discussion:

1. AFAIK both Tilera and Cavium carry patch sets with similar
functionality in their respective kernels, so the idea has some real
world users already.

2. I tested a previous version of the same patch set (based on 3.3)
together with some fixes*, and measured the same latency, in cycles,
from a simple test program as from a version of that program running
bare metal with no OS. The same program running without this patch
saw latency three orders of magnitude higher. So, this certainly
shows some great potential.

3. Even if you don't care about latency at all, on massively
multi-core (or hyperscale, as I've read some people call it now)
systems, assigning a task to a single CPU can make a lot of sense
from a cache utilization perspective, etc.; if you do that, this
feature can give a performance boost to anything that is mostly CPU
bound, and perhaps for some workloads that are not so CPU bound as well.
Specifically, many high performance computing type of workloads come
to mind. So, this has the potential to be useful to both RT folks and
HPC folks, I think.

[*] A newer version patch set:
http://www.spinics.net/lists/linux-mm/msg33860.html and disabling the
part that sends IPI to update cputime for nohz/cpuset CPUs.

Thanks,
Gilad


--
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"If you take a class in large-scale robotics, can you end up in a situation
where the homework eats your dog?"
 -- Jean-Baptiste Queru


* Re: [PATCH 01/32] nohz: Move nohz load balancer selection into idle logic
  2012-10-30  8:32   ` Charles Wang
@ 2012-10-30 15:39     ` Steven Rostedt
  0 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-30 15:39 UTC (permalink / raw)
  To: muming.wq
  Cc: Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Stephen Hemminger,
	Sven-Thorsten Dietrich, Andrew Morton, linux-kernel, Alex Shi


On Tue, 2012-10-30 at 16:32 +0800, Charles Wang wrote:
> calc_load_exit_idle depends on updated jiffies, so you shouldn't move 
> this before tick_do_update_jiffies64.

OK, so it should be moved to the end of the if block. Note, that was a
change I made, as that function was added after Frederic wrote his
code. I looked into the code and it seemed that it should be moved for
the idle-exit case as well. But I agree with you that it should come
after the jiffies update.


> And why should we do nohz_balance_enter_idle in tick_nohz_idle_exit? 
> It's nohz_balance_exit_idle here.

OK, that's my fault as well. Frederic's original patch just moved
select_nohz_load_balancer(0). But commit c1cc017c59 "sched/nohz:
Clean up select_nohz_load_balancer()" replaced it with
nohz_balance_enter_idle(cpu) and removed the call on exit, so there
was nothing in the current code to replace.

Knowing this change would be flagged as "buggy", instead of just not
moving the call (and forgetting about it), I made the change anyway to
remind myself to talk about it :-)

My question now is: is there any reason to keep that call there? Or can
we just remove it as well?

-- Steve


> 
> Regards,
> Charles
> 
> On 10/30/2012 04:27 AM, Steven Rostedt wrote:
> [snipped]
>  > @@ -573,7 +573,6 @@ static void tick_nohz_restart_sched_tick(struct 
> tick_sched *ts, ktime_t now)
>  >  	tick_do_update_jiffies64(now);
>  >  	update_cpu_load_nohz();
>  >
>  > -	calc_load_exit_idle();
>  >  	touch_softlockup_watchdog();
>  >  	/*
>  >  	 * Cancel the scheduled timer and restore the tick
>  > @@ -628,6 +627,8 @@ void tick_nohz_idle_exit(void)
>  >  		tick_nohz_stop_idle(cpu, now);
>  >
>  >  	if (ts->tick_stopped) {
>  > +		nohz_balance_enter_idle(cpu);
>  > +		calc_load_exit_idle();
>  >  		tick_nohz_restart_sched_tick(ts, now);
>  >  		tick_nohz_account_idle_ticks(ts);
>  >  	}




* Re: [PATCH 02/32] cpuset: Set up interface for nohz flag
  2012-10-29 20:27 ` [PATCH 02/32] cpuset: Set up interface for nohz flag Steven Rostedt
@ 2012-10-30 17:16   ` Steven Rostedt
  0 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-30 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

On Mon, 2012-10-29 at 16:27 -0400, Steven Rostedt wrote:

>  #ifdef CONFIG_CPUSETS
>  
> @@ -235,4 +236,34 @@ static inline bool put_mems_allowed(unsigned int seq)
>  
>  #endif /* !CONFIG_CPUSETS */
>  
> +#ifdef CONFIG_CPUSETS_NO_HZ
> +
> +DECLARE_PER_CPU(atomic_t, cpu_adaptive_nohz_ref);
> +
> +static inline bool cpuset_cpu_adaptive_nohz(int cpu)
> +{
> +	atomic_t *ref = &per_cpu(cpu_adaptive_nohz_ref, cpu);
> +
> +	if (atomic_add_return(0, ref) > 0)

I'm assuming you do the atomic_add_return() for the implicit memory
barrier? Yuck!

Please comment this. I see that rcutree.c does the same thing without a
comment. Bad Paul, bad!


> +		return true;
> +
> +	return false;
> +}
> +
> +static inline bool cpuset_adaptive_nohz(void)
> +{
> +	/*
> +	 * We probably want to do atomic_read() when we read
> +	 * locally to avoid the overhead of an ordered add.
> +	 * For that we have to do the dec of the ref locally as
> +	 * well.

Does it matter if we miss the dec? What other synchronization is used?

	CPU 1					CPU 2
	------					-----
	var = atomic_add_return(0, ref)
						atomic_dec(ref);
	if (var > 0)

returns true.

For local cases, as this seems to be in a fast path, we should use
this_cpu_read() as well.
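
For reference, the trade-off being discussed -- an ordered read via
atomic_add_return(0) versus a plain atomic_read() -- can be modeled
with userspace C11 atomics. The names below mirror the patch, but
this is only an illustrative sketch, not kernel code:

```c
#include <assert.h>
#include <stdatomic.h>

/* Userspace model of the pattern under discussion (hypothetical). */
static atomic_int cpu_adaptive_nohz_ref;

/* Ordered read: atomic_fetch_add(&ref, 0) is a read-modify-write and
 * therefore fully ordered, like the kernel's atomic_add_return(0, ref).
 * This is what pairs with the nr_running update on the IPI path. */
static int nohz_check_ordered(void)
{
	return atomic_fetch_add(&cpu_adaptive_nohz_ref, 0) > 0;
}

/* Plain read: cheap but unordered -- only safe if the matching dec is
 * also done locally, which is exactly what the quoted comment and the
 * CPU 1 / CPU 2 race above are worrying about. */
static int nohz_check_plain(void)
{
	return atomic_load_explicit(&cpu_adaptive_nohz_ref,
				    memory_order_relaxed) > 0;
}
```

Both reads return the same value in a single-threaded test; the
difference only matters for the ordering guarantees against concurrent
updates from other CPUs.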

	
> +	 */
> +	return cpuset_cpu_adaptive_nohz(smp_processor_id());
> +}
> +#else
> +static inline bool cpuset_cpu_adaptive_nohz(int cpu) { return false; }
> +static inline bool cpuset_adaptive_nohz(void) { return false; }
> +
> +#endif /* CONFIG_CPUSETS_NO_HZ */
> +
>  #endif /* _LINUX_CPUSET_H */
> diff --git a/init/Kconfig b/init/Kconfig
> index 6fdd6e3..ffdeeab 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -749,6 +749,14 @@ config PROC_PID_CPUSET
>  	depends on CPUSETS
>  	default y
>  
> +config CPUSETS_NO_HZ
> +       bool "Tickless cpusets"
> +       depends on CPUSETS && HAVE_CPUSETS_NO_HZ
> +       help
> +         This options let you apply a nohz property to a cpuset such
> +	 that the periodic timer tick tries to be avoided when possible on
> +	 the concerned CPUs.
> +
>  config CGROUP_CPUACCT
>  	bool "Simple CPU accounting cgroup subsystem"
>  	help
> diff --git a/kernel/cpuset.c b/kernel/cpuset.c
> index f33c715..6319d8e 100644
> --- a/kernel/cpuset.c
> +++ b/kernel/cpuset.c
> @@ -145,6 +145,7 @@ typedef enum {
>  	CS_SCHED_LOAD_BALANCE,
>  	CS_SPREAD_PAGE,
>  	CS_SPREAD_SLAB,
> +	CS_ADAPTIVE_NOHZ,
>  } cpuset_flagbits_t;
>  
>  /* the type of hotplug event */
> @@ -189,6 +190,11 @@ static inline int is_spread_slab(const struct cpuset *cs)
>  	return test_bit(CS_SPREAD_SLAB, &cs->flags);
>  }
>  
> +static inline int is_adaptive_nohz(const struct cpuset *cs)
> +{
> +	return test_bit(CS_ADAPTIVE_NOHZ, &cs->flags);
> +}

We can move this into the #ifdef CONFIG_CPUSETS_NO_HZ as well, and have
the #else version just return zero. Why use test_bit() when we already
know the answer?
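
A sketch of that restructuring, modeled in plain userspace C so it
compiles on its own -- the struct and the test_bit() stub are
stand-ins for the kernel versions, and CONFIG_CPUSETS_NO_HZ is just a
macro here:

```c
#include <assert.h>

#define CS_ADAPTIVE_NOHZ 6
#define CONFIG_CPUSETS_NO_HZ

struct cpuset { unsigned long flags; };

/* stand-in for the kernel's test_bit() */
static int test_bit(int nr, const unsigned long *addr)
{
	return (int)((*addr >> nr) & 1UL);
}

#ifdef CONFIG_CPUSETS_NO_HZ
static inline int is_adaptive_nohz(const struct cpuset *cs)
{
	return test_bit(CS_ADAPTIVE_NOHZ, &cs->flags);
}
#else
/* Config off: the flag can never be set, so every caller can be
 * constant-folded instead of testing a bit we know is clear. */
static inline int is_adaptive_nohz(const struct cpuset *cs)
{
	return 0;
}
#endif
```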

> +
>  static struct cpuset top_cpuset = {
>  	.flags = ((1 << CS_CPU_EXCLUSIVE) | (1 << CS_MEM_EXCLUSIVE)),
>  };
> @@ -1190,6 +1196,32 @@ static void cpuset_change_flag(struct task_struct *tsk,
>  	cpuset_update_task_spread_flag(cgroup_cs(scan->cg), tsk);
>  }
>  
> +#ifdef CONFIG_CPUSETS_NO_HZ
> +
> +DEFINE_PER_CPU(atomic_t, cpu_adaptive_nohz_ref);
> +
> +static void update_nohz_cpus(struct cpuset *old_cs, struct cpuset *cs)
> +{
> +	int cpu;
> +	int val;
> +
> +	if (is_adaptive_nohz(old_cs) == is_adaptive_nohz(cs))
> +		return;
> +
> +	for_each_cpu(cpu, cs->cpus_allowed) {
> +		atomic_t *ref = &per_cpu(cpu_adaptive_nohz_ref, cpu);
> +		if (is_adaptive_nohz(cs))
> +			atomic_inc(ref);
> +		else
> +			atomic_dec(ref);
> +	}
> +}
> +#else
> +static inline void update_nohz_cpus(struct cpuset *old_cs, struct cpuset *cs)
> +{
> +}
> +#endif
> +

-- Steve




* Re: [PATCH 03/32] nohz: Try not to give the timekeeping duty to an adaptive tickless cpu
  2012-10-29 20:27 ` [PATCH 03/32] nohz: Try not to give the timekeeping duty to an adaptive tickless cpu Steven Rostedt
@ 2012-10-30 17:33   ` Steven Rostedt
  0 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-30 17:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Li Zefan, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Max Krasnyansky,
	Stephen Hemminger, Sven-Thorsten Dietrich

On Mon, 2012-10-29 at 16:27 -0400, Steven Rostedt wrote:

>  kernel/time/tick-sched.c |   52 ++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 41 insertions(+), 11 deletions(-)
> 
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index d6d16fe..c7a78c6 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -20,6 +20,7 @@
>  #include <linux/profile.h>
>  #include <linux/sched.h>
>  #include <linux/module.h>
> +#include <linux/cpuset.h>
>  
>  #include <asm/irq_regs.h>
>  
> @@ -789,6 +790,45 @@ void tick_check_idle(int cpu)
>  	tick_check_nohz(cpu);
>  }
>  
> +#ifdef CONFIG_CPUSETS_NO_HZ
> +
> +/*
> + * Take the timer duty if nobody is taking care of it.
> + * If a CPU already does and and it's in a nohz cpuset,

and and

> + * then take the charge so that it can switch to nohz mode.
> + */
> +static void tick_do_timer_check_handler(int cpu)
> +{
> +	int handler = tick_do_timer_cpu;
> +
> +	if (unlikely(handler == TICK_DO_TIMER_NONE)) {
> +		tick_do_timer_cpu = cpu;

I take it that this is the forced case (no one has it, so I'll take it)?

Perhaps we should do a:

	WARN_ON(cpuset_cpu_adaptive_nohz(cpu));


> +	} else {
> +		if (!cpuset_adaptive_nohz() &&
> +		    cpuset_cpu_adaptive_nohz(handler))
> +			tick_do_timer_cpu = cpu;

Shouldn't this be
		if (!cpuset_cpu_adaptive_nohz(cpu) && ...

Otherwise, we should have it go to smp_processor_id()?


OK, looking further down, the only caller of it passes
smp_processor_id() to the function. But we still shouldn't assume this.
Either, have the function use smp_processor_id() explicitly, or let it
use any cpu. Otherwise it just confuses people, and is prone to be buggy
if in the future something calls it with a cpu that's not the local cpu.
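
Something like the following illustrative-only userspace model, where
the stubs stand in for the kernel APIs and the cpu argument is gone
entirely:

```c
#include <assert.h>

#define TICK_DO_TIMER_NONE (-1)

static int tick_do_timer_cpu = TICK_DO_TIMER_NONE;
static int fake_this_cpu;	/* stub for the current CPU id */
static int nohz_cpu[4];		/* stub: which CPUs are adaptive nohz */

static int smp_processor_id(void)
{
	return fake_this_cpu;
}

static int cpuset_cpu_adaptive_nohz(int cpu)
{
	return nohz_cpu[cpu];
}

static void tick_do_timer_check_handler(void)
{
	int cpu = smp_processor_id();	/* local CPU, by construction */

	if (tick_do_timer_cpu == TICK_DO_TIMER_NONE) {
		tick_do_timer_cpu = cpu;
	} else if (!cpuset_cpu_adaptive_nohz(cpu) &&
		   cpuset_cpu_adaptive_nohz(tick_do_timer_cpu)) {
		/* take the duty so the nohz CPU can stop its tick */
		tick_do_timer_cpu = cpu;
	}
}
```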

-- Steve

> +	}
> +}
> +
> +#else
> +
> +static void tick_do_timer_check_handler(int cpu)
> +{
> +#ifdef CONFIG_NO_HZ
> +	/*
> +	 * Check if the do_timer duty was dropped. We don't care about
> +	 * concurrency: This happens only when the cpu in charge went
> +	 * into a long sleep. If two cpus happen to assign themself to
> +	 * this duty, then the jiffies update is still serialized by
> +	 * xtime_lock.
> +	 */
> +	if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE))
> +		tick_do_timer_cpu = cpu;
> +#endif
> +}
> +
> +#endif /* CONFIG_CPUSETS_NO_HZ */
> +
>  /*
>   * High resolution timer specific code
>   */
> @@ -805,17 +845,7 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
>  	ktime_t now = ktime_get();
>  	int cpu = smp_processor_id();
>  
> -#ifdef CONFIG_NO_HZ
> -	/*
> -	 * Check if the do_timer duty was dropped. We don't care about
> -	 * concurrency: This happens only when the cpu in charge went
> -	 * into a long sleep. If two cpus happen to assign themself to
> -	 * this duty, then the jiffies update is still serialized by
> -	 * xtime_lock.
> -	 */
> -	if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE))
> -		tick_do_timer_cpu = cpu;
> -#endif
> +	tick_do_timer_check_handler(cpu);
>  
>  	/* Check, if the jiffies need an update */
>  	if (tick_do_timer_cpu == cpu)




* Re: [PATCH 04/32] x86: New cpuset nohz irq vector
  2012-10-29 20:27 ` [PATCH 04/32] x86: New cpuset nohz irq vector Steven Rostedt
@ 2012-10-30 17:39   ` Steven Rostedt
  2012-10-30 23:51     ` Frederic Weisbecker
  0 siblings, 1 reply; 60+ messages in thread
From: Steven Rostedt @ 2012-10-30 17:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Stephen Hemminger,
	Sven-Thorsten Dietrich

On Mon, 2012-10-29 at 16:27 -0400, Steven Rostedt wrote:
> plain text document attachment
> (0004-x86-New-cpuset-nohz-irq-vector.patch)
> From: Frederic Weisbecker <fweisbec@gmail.com>
> 
> We need a way to send an IPI (remote or local) in order to
> asynchronously restart the tick for CPUs in nohz adaptive mode.
> 
> This must be asynchronous such that we can trigger it with irqs
> disabled. This must be usable as a self-IPI as well for example
> in cases where we want to avoid a random deadlock scenario while
> restarting the tick inline otherwise.
> 
> This only settles the x86 backend. The core tick restart function
> will be defined in a later patch.
> 
> [CHECKME: Perhaps we instead need to use irq work for self IPIs.
> But we also need a way to send async remote IPIs.]

Probably just use irq_work for self ipis, and normal ipis for other
CPUs.

Also, what reason do we have to force a task out of nohz? IOW, do we
really need this?

Also, perhaps we could just tag onto the schedule_ipi() function instead
of having to create a new IPI for all archs?

-- Steve

> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Alessio Igor Bogani <abogani@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Avi Kivity <avi@redhat.com>
> Cc: Chris Metcalf <cmetcalf@tilera.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Cc: Geoff Levand <geoff@infradead.org>
> Cc: Gilad Ben Yossef <gilad@benyossef.com>
> Cc: Hakan Akkan <hakanakkan@gmail.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Kevin Hilman <khilman@ti.com>
> Cc: Max Krasnyansky <maxk@qualcomm.com>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Stephen Hemminger <shemminger@vyatta.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>




* Re: [PATCH 05/32] nohz: Adaptive tick stop and restart on nohz cpuset
  2012-10-29 20:27 ` [PATCH 05/32] nohz: Adaptive tick stop and restart on nohz cpuset Steven Rostedt
@ 2012-10-30 18:23   ` Steven Rostedt
  0 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-30 18:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Stephen Hemminger,
	Sven-Thorsten Dietrich

On Mon, 2012-10-29 at 16:27 -0400, Steven Rostedt wrote:

> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1196,6 +1196,29 @@ static void update_avg(u64 *avg, u64 sample)
>  }
>  #endif
>  
> +#ifdef CONFIG_CPUSETS_NO_HZ
> +bool sched_can_stop_tick(void)
> +{
> +	struct rq *rq;
> +
> +	rq = this_rq();
> +
> +	/*
> +	 * This is called right after cpuset_adaptive_nohz() that

See below (for this caller).


> +	 * uses atomic_add_return() so that we are ordered against
> +	 * cpu_adaptive_nohz_ref. When inc_nr_running() sends an
> +	 * IPI to this CPU, we are guaranteed to see the update on
> +	 * nr_running.
> +	 */
> +
> +	/* More than one running task need preemption */
> +	if (rq->nr_running > 1)
> +		return false;
> +
> +	return true;
> +}
> +#endif
> +
>  static void
>  ttwu_stat(struct task_struct *p, int cpu, int wake_flags)
>  {
> @@ -1897,6 +1920,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
>  	 * frame will be invalid.
>  	 */
>  	finish_task_switch(this_rq(), prev);
> +	tick_nohz_post_schedule();
>  }
>  
>  /*
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 7a7db09..c6cd9ec 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1,6 +1,7 @@
>  
>  #include <linux/sched.h>
>  #include <linux/mutex.h>
> +#include <linux/cpuset.h>
>  #include <linux/spinlock.h>
>  #include <linux/stop_machine.h>
>  
> @@ -927,6 +928,17 @@ static inline u64 steal_ticks(u64 steal)
>  static inline void inc_nr_running(struct rq *rq)
>  {
>  	rq->nr_running++;
> +
> +	if (rq->nr_running == 2) {
> +		/*
> +		 * cpuset_cpu_adaptive_nohz() uses atomic_add_return()
> +		 * to order against rq->nr_running updates. This way
> +		 * the CPU that receives the IPI is guaranteed to see
> +		 * the update on nr_running without the rq->lock.
> +		 */
> +		if (cpuset_cpu_adaptive_nohz(rq->cpu))
> +			smp_cpuset_update_nohz(rq->cpu);
> +	}
>  }
>  
>  static inline void dec_nr_running(struct rq *rq)

Should we add one for dec_nr_running()? Or is this done elsewhere?
I would think that there's a good chance we could miss an opportunity
to stop the tick.
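
To make the question concrete, here is a hypothetical userspace model
of the symmetric path -- nothing here is from the patch set; the
counter and hook just mimic the inc_nr_running() logic quoted above:

```c
#include <assert.h>

struct rq_model {
	int nr_running;
	int kicks;	/* stands in for the nohz update IPI */
};

static void smp_cpuset_update_nohz_model(struct rq_model *rq)
{
	rq->kicks++;	/* reevaluate whether the tick can stop */
}

static void inc_nr_running_model(struct rq_model *rq)
{
	rq->nr_running++;
	/* a second runnable task means preemption: restart the tick */
	if (rq->nr_running == 2)
		smp_cpuset_update_nohz_model(rq);
}

static void dec_nr_running_model(struct rq_model *rq)
{
	rq->nr_running--;
	/* the missing mirror image: back to one runnable task, so the
	 * tick could be stopped again on the next reevaluation */
	if (rq->nr_running == 1)
		smp_cpuset_update_nohz_model(rq);
}
```

Without the check in the dec path, the CPU only gets another chance to
stop its tick when something else (e.g. the next tick itself) notices
nr_running dropped back to one.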


> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index cc96bdc..e06b8eb 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -25,6 +25,7 @@
>  #include <linux/smp.h>
>  #include <linux/smpboot.h>
>  #include <linux/tick.h>
> +#include <linux/cpuset.h>
>  
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/irq.h>
> @@ -307,7 +308,8 @@ void irq_enter(void)
>  	int cpu = smp_processor_id();
>  
>  	rcu_irq_enter();
> -	if (is_idle_task(current) && !in_interrupt()) {
> +
> +	if ((is_idle_task(current) || cpuset_adaptive_nohz()) && !in_interrupt()) {
>  		/*
>  		 * Prevent raise_softirq from needlessly waking up ksoftirqd
>  		 * here, as softirq will be serviced on return from interrupt.
> @@ -349,7 +351,7 @@ void irq_exit(void)
>  
>  #ifdef CONFIG_NO_HZ
>  	/* Make sure that timer wheel updates are propagated */
> -	if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
> +	if (!in_interrupt())
>  		tick_nohz_irq_exit();
>  #endif
>  	rcu_irq_exit();
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index c7a78c6..35047b2 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -512,6 +512,24 @@ void tick_nohz_idle_enter(void)
>  	local_irq_enable();
>  }
>  
> +static void tick_nohz_cpuset_stop_tick(struct tick_sched *ts)
> +{
> +#ifdef CONFIG_CPUSETS_NO_HZ
> +	int cpu = smp_processor_id();
> +
> +	if (!cpuset_adaptive_nohz() || is_idle_task(current))
> +		return;

The above is most likely true. Let's remove the memory barrier in
cpuset_adaptive_nohz() and just add an explicit one here, in the slow path.

	/* Before checking the below conditions, we must first
	 * make sure that the cpuset/nohz is active, so we do
	 * not miss a deactivating IPI. 
	 * ie. when nr_running == 2, an IPI is sent, and this
	 * code must see the nr_running changed after testing
	 * if the current CPU is adaptive nohz.
	 */
	smp_mb();

> +
> +	if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
> +		return;
> +
> +	if (!sched_can_stop_tick())
> +		return;
> +
> +	tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
> +#endif
> +}
> +
>  /**
>   * tick_nohz_irq_exit - update next tick event from interrupt exit
>   *
> @@ -524,10 +542,12 @@ void tick_nohz_irq_exit(void)
>  {
>  	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
>  
> -	if (!ts->inidle)
> -		return;
> -
> -	__tick_nohz_idle_enter(ts);
> +	if (ts->inidle) {
> +		if (!need_resched())
> +			__tick_nohz_idle_enter(ts);
> +	} else {
> +		tick_nohz_cpuset_stop_tick(ts);
> +	}
>  }
>  
>  /**
> @@ -568,7 +588,7 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
>  	}
>  }
>  
> -static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
> +static void __tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
>  {
>  	/* Update jiffies first */
>  	tick_do_update_jiffies64(now);
> @@ -584,6 +604,31 @@ static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
>  	tick_nohz_restart(ts, now);
>  }
>  
> +/**
> + * tick_nohz_restart_sched_tick - restart the tick for a tickless CPU
> + *
> + * Restart the tick when the CPU is in adaptive tickless mode.
> + */
> +void tick_nohz_restart_sched_tick(void)
> +{
> +	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
> +	unsigned long flags;
> +	ktime_t now;
> +
> +	local_irq_save(flags);
> +
> +	if (!ts->tick_stopped) {
> +		local_irq_restore(flags);
> +		return;
> +	}
> +
> +	now = ktime_get();
> +	__tick_nohz_restart_sched_tick(ts, now);
> +
> +	local_irq_restore(flags);
> +}
> +
> +
>  static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
>  {
>  #ifndef CONFIG_VIRT_CPU_ACCOUNTING
> @@ -630,7 +675,7 @@ void tick_nohz_idle_exit(void)
>  	if (ts->tick_stopped) {
>  		nohz_balance_enter_idle(cpu);
>  		calc_load_exit_idle();
> -		tick_nohz_restart_sched_tick(ts, now);
> +		__tick_nohz_restart_sched_tick(ts, now);
>  		tick_nohz_account_idle_ticks(ts);
>  	}
>  
> @@ -791,7 +836,6 @@ void tick_check_idle(int cpu)
>  }
>  
>  #ifdef CONFIG_CPUSETS_NO_HZ
> -
>  /*
>   * Take the timer duty if nobody is taking care of it.
>   * If a CPU already does and and it's in a nohz cpuset,
> @@ -810,6 +854,31 @@ static void tick_do_timer_check_handler(int cpu)
>  	}
>  }
>  
> +void tick_nohz_check_adaptive(void)
> +{
> +	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
> +
> +	if (cpuset_adaptive_nohz()) {

Add smp_mb() here too, with the same comment.


Another option may be to only add the memory barrier on the true case:

static inline bool cpuset_adaptive_nohz(void)
{
	if (atomic_read(this_cpu_read(cpu_adaptive_nohz_ref)) > 0) {
		/*
		 * In order not to miss cases where we need to enable
		 * the tick again, we must make sure that all new checks
		 * are visible after we find we are in adaptive nohz
		 * mode.
		 */
		smp_mb();
		return true;
	}
	return false;
}

The above will only force the expensive memory barrier when adaptive
nohz is enabled. All other cases will avoid that overhead. If we miss
disabling the tick, the next tick should disable it. But we must make
sure that we enable it when needed.

But as this is called in several fast paths for everyone, we need to
keep the overhead of the default case as low as possible.

> +		if (ts->tick_stopped && !is_idle_task(current)) {
> +			if (!sched_can_stop_tick())
> +				tick_nohz_restart_sched_tick();
> +		}
> +	}
> +}
> +
> +void tick_nohz_post_schedule(void)
> +{
> +	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
> +
> +	/*
> +	 * No need to disable irqs here. The worst that can happen
> +	 * is an irq that comes and restart the tick before us.
> +	 * tick_nohz_restart_sched_tick() is irq safe.
> +	 */
> +	if (ts->tick_stopped)
> +		tick_nohz_restart_sched_tick();
> +}
> +
>  #else
>  
>  static void tick_do_timer_check_handler(int cpu)
> @@ -856,6 +925,7 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
>  	 * no valid regs pointer
>  	 */
>  	if (regs) {
> +		int user = user_mode(regs);
>  		/*
>  		 * When we are idle and the tick is stopped, we have to touch
>  		 * the watchdog as we might not schedule for a really long
> @@ -869,7 +939,7 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
>  			if (is_idle_task(current))
>  				ts->idle_jiffies++;
>  		}
> -		update_process_times(user_mode(regs));
> +		update_process_times(user);

What's the purpose of this change?

-- Steve

>  		profile_tick(CPU_PROFILING);
>  	}
>  




* Re: [PATCH 06/32] nohz/cpuset: Dont turn off the tick if rcu needs it
  2012-10-29 20:27 ` [PATCH 06/32] nohz/cpuset: Dont turn off the tick if rcu needs it Steven Rostedt
@ 2012-10-30 18:30   ` Steven Rostedt
  0 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-30 18:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Stephen Hemminger,
	Sven-Thorsten Dietrich

On Mon, 2012-10-29 at 16:27 -0400, Steven Rostedt wrote:
> plain text document attachment
> (0006-nohz-cpuset-Don-t-turn-off-the-tick-if-rcu-needs-it.patch)
> From: Frederic Weisbecker <fweisbec@gmail.com>
> 
> If RCU is waiting for the current CPU to complete a grace
> period, don't turn off the tick. Unlike dynctik-idle, we
> are not necessarily going to enter into rcu extended quiescent
> state, so we may need to keep the tick to note current CPU's
> quiescent states.
> 
> [added build fix from Zen Lin]

I don't see anything obviously troubling in this patch.

Acked-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve

> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Alessio Igor Bogani <abogani@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Avi Kivity <avi@redhat.com>
> Cc: Chris Metcalf <cmetcalf@tilera.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Cc: Geoff Levand <geoff@infradead.org>
> Cc: Gilad Ben Yossef <gilad@benyossef.com>
> Cc: Hakan Akkan <hakanakkan@gmail.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Kevin Hilman <khilman@ti.com>
> Cc: Max Krasnyansky <maxk@qualcomm.com>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Stephen Hemminger <shemminger@vyatta.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>




* Re: [PATCH 09/32] nohz/cpuset: Restart tick when nohz flag is cleared on cpuset
  2012-10-29 20:27 ` [PATCH 09/32] nohz/cpuset: Restart tick when nohz flag is cleared on cpuset Steven Rostedt
@ 2012-10-30 18:55   ` Steven Rostedt
  0 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-10-30 18:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Stephen Hemminger,
	Sven-Thorsten Dietrich

On Mon, 2012-10-29 at 16:27 -0400, Steven Rostedt wrote:
> plain text document attachment
> (0009-nohz-cpuset-Restart-tick-when-nohz-flag-is-cleared-o.patch)
> From: Frederic Weisbecker <fweisbec@gmail.com>
> 
> Issue an IPI to restart the tick on a CPU that belongs
> to a cpuset when its nohz flag gets cleared.
> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Alessio Igor Bogani <abogani@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Avi Kivity <avi@redhat.com>
> Cc: Chris Metcalf <cmetcalf@tilera.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Cc: Geoff Levand <geoff@infradead.org>
> Cc: Gilad Ben Yossef <gilad@benyossef.com>
> Cc: Hakan Akkan <hakanakkan@gmail.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Kevin Hilman <khilman@ti.com>
> Cc: Max Krasnyansky <maxk@qualcomm.com>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Stephen Hemminger <shemminger@vyatta.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> ---
>  include/linux/cpuset.h   |    2 ++
>  kernel/cpuset.c          |   25 +++++++++++++++++++++++--
>  kernel/time/tick-sched.c |    8 ++++++++
>  3 files changed, 33 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
> index 7e7eb41..631968b 100644
> --- a/include/linux/cpuset.h
> +++ b/include/linux/cpuset.h
> @@ -260,6 +260,8 @@ static inline bool cpuset_adaptive_nohz(void)
>  	 */
>  	return cpuset_cpu_adaptive_nohz(smp_processor_id());
>  }
> +
> +extern void cpuset_exit_nohz_interrupt(void *unused);
>  #else
>  static inline bool cpuset_cpu_adaptive_nohz(int cpu) { return false; }
>  static inline bool cpuset_adaptive_nohz(void) { return false; }
> diff --git a/kernel/cpuset.c b/kernel/cpuset.c
> index 6319d8e..1b67e5b 100644
> --- a/kernel/cpuset.c
> +++ b/kernel/cpuset.c
> @@ -1200,6 +1200,14 @@ static void cpuset_change_flag(struct task_struct *tsk,
>  
>  DEFINE_PER_CPU(atomic_t, cpu_adaptive_nohz_ref);
>  
> +static void cpu_exit_nohz(int cpu)
> +{
> +	preempt_disable();
> +	smp_call_function_single(cpu, cpuset_exit_nohz_interrupt,
> +				 NULL, true);
> +	preempt_enable();
> +}
> +
>  static void update_nohz_cpus(struct cpuset *old_cs, struct cpuset *cs)
>  {
>  	int cpu;
> @@ -1211,9 +1219,22 @@ static void update_nohz_cpus(struct cpuset *old_cs, struct cpuset *cs)
>  	for_each_cpu(cpu, cs->cpus_allowed) {
>  		atomic_t *ref = &per_cpu(cpu_adaptive_nohz_ref, cpu);
>  		if (is_adaptive_nohz(cs))
> -			atomic_inc(ref);
> +			val = atomic_inc_return(ref);
>  		else
> -			atomic_dec(ref);
> +			val = atomic_dec_return(ref);
> +
> +		if (!val) {
> +			/*
> +			 * The update to cpu_adaptive_nohz_ref must be
> +			 * visible right away. So that once we restart the tick
> +			 * from the IPI, it won't be stopped again due to cache
> +			 * update lag.
> +			 * FIXME: We probably need more to ensure this value is really
> +			 * visible right away.

What more do you want? stomp_machine()??

> +			 */
> +			smp_mb();

The atomic_inc_return() and atomic_dec_return() already imply a
smp_mb().

Later patches change this code, so I won't dwell on this patch too much.


> +			cpu_exit_nohz(cpu);
> +		}
>  	}
>  }
>  #else
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 0a5e650..de7de68 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -884,6 +884,14 @@ void tick_nohz_check_adaptive(void)
>  	}
>  }
>  
> +void cpuset_exit_nohz_interrupt(void *unused)
> +{
> +	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
> +
> +	if (ts->tick_stopped && !is_idle_task(current))
> +		tick_nohz_restart_adaptive();

BTW, what a confusing name. "restart_adaptive()"? It sounds like we are
going to restart the adaptive code, like restarting NOHZ.

-- Steve

> +}
> +
>  void tick_nohz_post_schedule(void)
>  {
>  	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 10/32] nohz/cpuset: Restart the tick if printk needs it
  2012-10-29 20:27 ` [PATCH 10/32] nohz/cpuset: Restart the tick if printk needs it Steven Rostedt
@ 2012-10-30 19:01   ` Steven Rostedt
  2012-10-30 23:54     ` Frederic Weisbecker
  0 siblings, 1 reply; 60+ messages in thread
From: Steven Rostedt @ 2012-10-30 19:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Thomas Gleixner, Peter Zijlstra, Clark Williams,
	Frederic Weisbecker, Ingo Molnar, Paul E. McKenney,
	Mike Galbraith, Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Stephen Hemminger,
	Sven-Thorsten Dietrich

On Mon, 2012-10-29 at 16:27 -0400, Steven Rostedt wrote:
> plain text document attachment
> (0010-nohz-cpuset-Restart-the-tick-if-printk-needs-it.patch)
> From: Frederic Weisbecker <fweisbec@gmail.com>
> 
> If we are in nohz adaptive mode when printk is called, there is no
> tick to wake up the logger. We need to restart the tick when that
> happens. Do this asynchronously by issuing a tick restart self IPI
> to avoid deadlocking with the current random locking chain.
> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Alessio Igor Bogani <abogani@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Avi Kivity <avi@redhat.com>
> Cc: Chris Metcalf <cmetcalf@tilera.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Cc: Geoff Levand <geoff@infradead.org>
> Cc: Gilad Ben Yossef <gilad@benyossef.com>
> Cc: Hakan Akkan <hakanakkan@gmail.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Kevin Hilman <khilman@ti.com>
> Cc: Max Krasnyansky <maxk@qualcomm.com>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Stephen Hemminger <shemminger@vyatta.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> ---
>  kernel/printk.c |   15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/printk.c b/kernel/printk.c
> index 2d607f4..bf9048d 100644
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
> @@ -42,6 +42,7 @@
>  #include <linux/notifier.h>
>  #include <linux/rculist.h>
>  #include <linux/poll.h>
> +#include <linux/cpuset.h>
>  
>  #include <asm/uaccess.h>
>  
> @@ -1977,8 +1978,20 @@ int printk_needs_cpu(int cpu)
>  
>  void wake_up_klogd(void)
>  {
> -	if (waitqueue_active(&log_wait))
> +	unsigned long flags;
> +
> +	if (waitqueue_active(&log_wait)) {
>  		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
> +		/* Make it visible from any interrupt from now */
> +		barrier();
> +		/*
> +		 * It's safe to check that even if interrupts are not disabled.

Probably need to at least disable preemption. I don't see any
requirement that wake_up_klogd() needs to be called with preemption
disabled.

The this_cpu_or() doesn't care which CPU it triggers, but the enabling
of nohz does.

-- Steve

> +		 * If we enable nohz adaptive mode concurrently, we'll see the
> +		 * printk_pending value and thus keep a periodic tick behaviour.
> +		 */
> +		if (cpuset_adaptive_nohz())
> +			smp_cpuset_update_nohz(smp_processor_id());
> +	}
>  }
>  
>  static void console_cont_flush(char *text, size_t size)



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 04/32] x86: New cpuset nohz irq vector
  2012-10-30 17:39   ` Steven Rostedt
@ 2012-10-30 23:51     ` Frederic Weisbecker
  2012-10-31  0:07       ` Steven Rostedt
  0 siblings, 1 reply; 60+ messages in thread
From: Frederic Weisbecker @ 2012-10-30 23:51 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Clark Williams, Ingo Molnar, Paul E. McKenney, Mike Galbraith,
	Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Stephen Hemminger,
	Sven-Thorsten Dietrich

2012/10/30 Steven Rostedt <rostedt@goodmis.org>:
> On Mon, 2012-10-29 at 16:27 -0400, Steven Rostedt wrote:
>> plain text document attachment
>> (0004-x86-New-cpuset-nohz-irq-vector.patch)
>> From: Frederic Weisbecker <fweisbec@gmail.com>
>>
>> We need a way to send an IPI (remote or local) in order to
>> asynchronously restart the tick for CPUs in nohz adaptive mode.
>>
>> This must be asynchronous such that we can trigger it with irqs
>> disabled. This must be usable as a self-IPI as well for example
>> in cases where we want to avoid a random deadlock scenario while
>> restarting the tick inline otherwise.
>>
>> This only settles the x86 backend. The core tick restart function
>> will be defined in a later patch.
>>
>> [CHECKME: Perhaps we instead need to use irq work for self IPIs.
>> But we also need a way to send async remote IPIs.]
>
> Probably just use irq_work for self ipis, and normal ipis for other
> CPUs.

Right. And that's one more reason why we want to know if the arch
implements irq work with self ipis or not. If the arch can't, then we
just don't stop the tick.

> Also, what reason do we have to force a task out of nohz? IOW, do we
> really need this?

When a posix CPU timer is enqueued, when a new task is enqueued, etc...

>
> Also, perhaps we could just tag onto the schedule_ipi() function instead
> of having to create a new IPI for all archs?

irq work should be just fine. No need to add more overhead on the
schedule ipi I think.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 10/32] nohz/cpuset: Restart the tick if printk needs it
  2012-10-30 19:01   ` Steven Rostedt
@ 2012-10-30 23:54     ` Frederic Weisbecker
  0 siblings, 0 replies; 60+ messages in thread
From: Frederic Weisbecker @ 2012-10-30 23:54 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Clark Williams, Ingo Molnar, Paul E. McKenney, Mike Galbraith,
	Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Stephen Hemminger,
	Sven-Thorsten Dietrich

2012/10/30 Steven Rostedt <rostedt@goodmis.org>:
> Probably need to at least disable preemption. I don't see any
> requirement that wake_up_klogd() needs to be called with preemption
> disabled.
>
> The this_cpu_or() doesn't care which CPU it triggers, but the enabling
> of nohz does.

This patch is due to be replaced by the printk-in-nohz patchset
I'm working on. But it indeed needs to disable preemption as well, and
its irq work should be made per cpu.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 04/32] x86: New cpuset nohz irq vector
  2012-10-30 23:51     ` Frederic Weisbecker
@ 2012-10-31  0:07       ` Steven Rostedt
  2012-10-31  0:45         ` Frederic Weisbecker
  0 siblings, 1 reply; 60+ messages in thread
From: Steven Rostedt @ 2012-10-31  0:07 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Clark Williams, Ingo Molnar, Paul E. McKenney, Mike Galbraith,
	Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Stephen Hemminger,
	Sven-Thorsten Dietrich

On Wed, 2012-10-31 at 00:51 +0100, Frederic Weisbecker wrote:

> > Probably just use irq_work for self ipis, and normal ipis for other
> > CPUs.
> 
> Right. And that's one more reason why we want to know if the arch
> implements irq work with self ipis or not. If the arch can't, then we
> just don't stop the tick.

We can just allow certain archs to have cpuset/nohz. Make it depend on
features that you want (or makes nohz easier to implement).

> 
> > Also, what reason do we have to force a task out of nohz? IOW, do we
> > really need this?
> 
> When a posix CPU timer is enqueued, when a new task is enqueued, etc...

I was thinking about something other than itself. That is, who would
enqueue a posix cpu timer on the cpu other than the task running with
nohz on that cpu?

A new task would send the schedule ipi too. Which would enqueue the task
and take the cpu out of nohz, no?


> 
> >
> > Also, perhaps we could just tag onto the schedule_ipi() function instead
> > of having to create a new IPI for all archs?
> 
> irq work should be just fine. No need to add more overhead on the
> schedule ipi I think.

irq_work can send the work to another CPU right? This part I wasn't sure
about.

-- Steve



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 04/32] x86: New cpuset nohz irq vector
  2012-10-31  0:07       ` Steven Rostedt
@ 2012-10-31  0:45         ` Frederic Weisbecker
  0 siblings, 0 replies; 60+ messages in thread
From: Frederic Weisbecker @ 2012-10-31  0:45 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Clark Williams, Ingo Molnar, Paul E. McKenney, Mike Galbraith,
	Alessio Igor Bogani, Avi Kivity, Chris Metcalf,
	Christoph Lameter, Daniel Lezcano, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Kevin Hilman, Stephen Hemminger,
	Sven-Thorsten Dietrich

2012/10/31 Steven Rostedt <rostedt@goodmis.org>:
> On Wed, 2012-10-31 at 00:51 +0100, Frederic Weisbecker wrote:
>
>> > Probably just use irq_work for self ipis, and normal ipis for other
>> > CPUs.
>>
>> Right. And that's one more reason why we want to know if the arch
>> implements irq work with self ipis or not. If the arch can't, then we
>> just don't stop the tick.
>
> We can just allow certain archs to have cpuset/nohz. Make it depend on
> features that you want (or makes nohz easier to implement).

Right.

>>
>> > Also, what reason do we have to force a task out of nohz? IOW, do we
>> > really need this?
>>
>> When a posix CPU timer is enqueued, when a new task is enqueued, etc...
>
> I was thinking about something other than itself. That is, who would
> enqueue a posix cpu timer on the cpu other than the task running with
> nohz on that cpu?

If the posix cpu timer is process wide (ie: whole threadgroup) this can happen.

> A new task would send the schedule ipi too. Which would enqueue the task
> and take the cpu out of nohz, no?

Not if it's enqueued locally. And in this case we don't want to
restart the tick from the ttwu path in order to avoid funny locking
scenario. So a self IPI would do the trick.

>> irq work should be just fine. No need to add more overhead on the
>> schedule ipi I think.
>
> irq_work can send the work to another CPU right? This part I wasn't sure
> about.

"Claiming" a work itself can be a cross CPU competition: multiple CPUs
may want to queue the work at the same time, only one should succeed.
Once claimed though, the work can only be enqueued locally.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs
  2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
                   ` (32 preceding siblings ...)
  2012-10-30 14:02 ` [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Gilad Ben-Yossef
@ 2012-11-02 14:23 ` Christoph Lameter
  2012-11-02 14:37   ` Steven Rostedt
  2012-11-05 22:32   ` Frederic Weisbecker
  33 siblings, 2 replies; 60+ messages in thread
From: Christoph Lameter @ 2012-11-02 14:23 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Clark Williams, Frederic Weisbecker, Li Zefan, Ingo Molnar,
	Paul E. McKenney, Mike Galbraith

On Mon, 29 Oct 2012, Steven Rostedt wrote:

> A while ago Frederic posted a series of patches to get an idea on
> how to implement nohz cpusets. Where you can add a task to a cpuset
> and mark the set to be 'nohz'. When the task runs on a CPU and is
> the only task scheduled (nr_running == 1), the tick will stop.
> The idea is to give the task the least amount of kernel interference
> as possible. If the task doesn't do any system calls (and possibly
> even if it does), no timer interrupt will bother it. By using
> isolcpus and nohz cpuset, a task would be able to achieve true cpu
> isolation.

I thought isolcpus was on the way out? If there is no timer interrupt then
there will also be no scheduler activity. Why do we need both?

Also could we have this support without cpusets? There are multiple means
to do system segmentation (f.e. cgroups) and something like hz control is
pretty basic. Control via some cpumask like irq affinities in f.e.

	/sys/devices/system/cpu/nohz

or a per cpu flag in

/sys/devices/system/cpu/cpu0/hz

would be easier and not be tied to something like cpusets.

also it would be best to sync this conceptually with the processors
enabled for rcu processing.

Maybe have a series of cpumasks in /sys/devices/system/cpu/ ?

> This has been long asked for by those in the RT community. If a task
> requires uninterruptible CPU time, this would be able to give a task
> that, even without the full PREEMPT-RT patch set.

Also, those interested in low latency are very interested in this
feature, in particular in support for it without any preempt support
enabled in the kernel.



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs
  2012-11-02 14:23 ` Christoph Lameter
@ 2012-11-02 14:37   ` Steven Rostedt
  2012-11-02 14:50     ` David Nyström
  2012-11-02 15:03     ` Christoph Lameter
  2012-11-05 22:32   ` Frederic Weisbecker
  1 sibling, 2 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-11-02 14:37 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Clark Williams, Frederic Weisbecker, Li Zefan, Ingo Molnar,
	Paul E. McKenney, Mike Galbraith

On Fri, 2012-11-02 at 14:23 +0000, Christoph Lameter wrote:
> On Mon, 29 Oct 2012, Steven Rostedt wrote:
> 
> > A while ago Frederic posted a series of patches to get an idea on
> > how to implement nohz cpusets. Where you can add a task to a cpuset
> > and mark the set to be 'nohz'. When the task runs on a CPU and is
> > the only task scheduled (nr_running == 1), the tick will stop.
> > The idea is to give the task the least amount of kernel interference
> > as possible. If the task doesn't do any system calls (and possibly
> > even if it does), no timer interrupt will bother it. By using
> > isolcpus and nohz cpuset, a task would be able to achieve true cpu
> > isolation.
> 
> I thought isolcpus was on the way out? If there is no timer interrupt then
> there will also be no scheduler activity. Why do we need both?

I probably shouldn't have mentioned isolcpus. I was using that as
something that is general to get everything off of a cpu (irq affinity
for example).

> 
> Also could we have this support without cpusets? There are multiple means
> to do system segmentation (f.e. cgroups) and something like hz control is
> pretty basic. Control via some cpumask like irq affinities in f.e.
> 
> 	/sys/devices/system/cpu/nohz
> 
> or a per cpu flag in
> 
> /sys/devices/system/cpu/cpu0/hz
> 
> would be easier and not be tied to something like cpusets.

Frederic will have to answer this. I was just starting with his patches.
Note, we are holding off this work for now until Frederic's other work
is done (the irq_work and printk updates).

> 
> also it would be best to sync this conceptually with the processors
> enabled for rcu processing.

Processors can be disabled for rcu processing? Or are you talking about
Paul's new work of offloading rcu callbacks?

> 
> Maybe have a series of cpumasks in /sys/devices/system/cpu/ ?
> 
> > This has been long asked for by those in the RT community. If a task
> > requires uninterruptible CPU time, this would be able to give a task
> > that, even without the full PREEMPT-RT patch set.
> 
> Also, those interested in low latency are very interested in this
> feature, in particular in support for it without any preempt support
> enabled in the kernel.
> 

Yep understood. We really need to get things rolling.

-- Steve



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs
  2012-11-02 14:37   ` Steven Rostedt
@ 2012-11-02 14:50     ` David Nyström
  2012-11-02 15:03     ` Christoph Lameter
  1 sibling, 0 replies; 60+ messages in thread
From: David Nyström @ 2012-11-02 14:50 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Christoph Lameter, linux-kernel, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Clark Williams, Frederic Weisbecker, Li Zefan,
	Ingo Molnar, Paul E. McKenney, Mike Galbraith

On 11/02/2012 03:37 PM, Steven Rostedt wrote:
> On Fri, 2012-11-02 at 14:23 +0000, Christoph Lameter wrote:
>> On Mon, 29 Oct 2012, Steven Rostedt wrote:
>>
>>> A while ago Frederic posted a series of patches to get an idea on
>>> how to implement nohz cpusets. Where you can add a task to a cpuset
>>> and mark the set to be 'nohz'. When the task runs on a CPU and is
>>> the only task scheduled (nr_running == 1), the tick will stop.
>>> The idea is to give the task the least amount of kernel interference
>>> as possible. If the task doesn't do any system calls (and possibly
>>> even if it does), no timer interrupt will bother it. By using
>>> isolcpus and nohz cpuset, a task would be able to achieve true cpu
>>> isolation.
>>

One other aspect that this patch probably needs to address is the 
cache-line placement of irq spinlocks.

At least in 3.6, with !CONFIG_SPARSE_IRQ
--
struct  irq_desc irq_desc[NR_IRQS] __cacheline_aligned_in_smp = {
	[0 ... NR_IRQS-1] = {
		.handle_irq	= handle_bad_irq,
		.depth		= 1,
		.lock		= __RAW_SPIN_LOCK_UNLOCKED(irq_desc->lock),
	}
};
--

You are likely to get a cache miss in the top half of your low latency 
CPU anytime some other CPU has taken a spinlock which lies within the 
same cache line.

Or is my understanding of the __cacheline_aligned_in_smp declaration wrong ?

Br,
David


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs
  2012-11-02 14:37   ` Steven Rostedt
  2012-11-02 14:50     ` David Nyström
@ 2012-11-02 15:03     ` Christoph Lameter
  2012-11-02 15:14       ` Steven Rostedt
  2012-11-02 18:35       ` Paul E. McKenney
  1 sibling, 2 replies; 60+ messages in thread
From: Christoph Lameter @ 2012-11-02 15:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Clark Williams, Frederic Weisbecker, Li Zefan, Ingo Molnar,
	Paul E. McKenney, Mike Galbraith

On Fri, 2 Nov 2012, Steven Rostedt wrote:

> > also it would be best to sync this conceptually with the processors
> > enabled for rcu processing.
>
> Processors can be disabled for rcu processing? Or are you talking about
> Paul's new work of offloading rcu callbacks?

Yes. Paul's new work to remove rcu processing from processors. That needs
to be synced configuration wise somehow. It does not make sense to process
rcu callbacks on processors where the timer tick does not work anymore.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs
  2012-11-02 15:03     ` Christoph Lameter
@ 2012-11-02 15:14       ` Steven Rostedt
  2012-11-02 18:35       ` Paul E. McKenney
  1 sibling, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2012-11-02 15:14 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Clark Williams, Frederic Weisbecker, Li Zefan, Ingo Molnar,
	Paul E. McKenney, Mike Galbraith

On Fri, 2012-11-02 at 15:03 +0000, Christoph Lameter wrote:
> On Fri, 2 Nov 2012, Steven Rostedt wrote:
> 
> > > also it would be best to sync this conceptually with the processors
> > > enabled for rcu processing.
> >
> > Processors can be disabled for rcu processing? Or are you talking about
> > Paul's new work of offloading rcu callbacks?
> 
> Yes. Paul's new work to remove rcu processing from processors. That needs
> to be synced configuration wise somehow. It does not make sense to process
> rcu callbacks on processors where the timer tick does not work anymore.

Don't worry, Paul is working with us too ;-)

-- Steve



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs
  2012-11-02 15:03     ` Christoph Lameter
  2012-11-02 15:14       ` Steven Rostedt
@ 2012-11-02 18:35       ` Paul E. McKenney
  2012-11-02 20:16         ` Christoph Lameter
  1 sibling, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2012-11-02 18:35 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Steven Rostedt, linux-kernel, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Clark Williams, Frederic Weisbecker, Li Zefan,
	Ingo Molnar, Mike Galbraith

On Fri, Nov 02, 2012 at 03:03:01PM +0000, Christoph Lameter wrote:
> On Fri, 2 Nov 2012, Steven Rostedt wrote:
> 
> > > also it would be best to sync this conceptually with the processors
> > > enabled for rcu processing.
> >
> > Processors can be disabled for rcu processing? Or are you talking about
> > Paul's new work of offloading rcu callbacks?
> 
> Yes. Paul's new work to remove rcu processing from processors. That needs
> to be synced configuration wise somehow. It does not make sense to process
> rcu callbacks on processors where the timer tick does not work anymore.

In kernels built with CONFIG_FAST_NO_HZ=n, if there are callbacks,
then there will be a tick, with or without Frederic's adaptive ticks.
If CONFIG_FAST_NO_HZ=y, if there are callbacks but no tick, RCU will
arrange for a timer to allow RCU processing to proceed as needed, but
much longer than one tick in duration, and only until such time as the
RCU callbacks drain.

So, yes, people who need absolutely all jitter to be banished at whatever
cost would want both adaptive ticks and no-CBs CPUs, but not everyone
who wants adaptive ticks would necessarily want the burden of choosing
which CPUs get callbacks offloaded from and where they should be executed.

So I believe that these need to be controlled separately for the immediate
future.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs
  2012-11-02 18:35       ` Paul E. McKenney
@ 2012-11-02 20:16         ` Christoph Lameter
  2012-11-02 20:41           ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Christoph Lameter @ 2012-11-02 20:16 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, linux-kernel, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Clark Williams, Frederic Weisbecker, Li Zefan,
	Ingo Molnar, Mike Galbraith

On Fri, 2 Nov 2012, Paul E. McKenney wrote:

> So I believe that these need to be controlled separately for the immediate
> future.

Yes they do but the configurations are similar and it would be best if
these were cpumasks in standard locations instead of being specified at
boot time or in a cpuset.

Put the cpu masks into

/sys/devices/system/cpu/{nohz_cpus,rcu_cpus}

or so?

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs
  2012-11-02 20:16         ` Christoph Lameter
@ 2012-11-02 20:41           ` Paul E. McKenney
  2012-11-02 20:51             ` Steven Rostedt
  0 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2012-11-02 20:41 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Steven Rostedt, linux-kernel, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Clark Williams, Frederic Weisbecker, Li Zefan,
	Ingo Molnar, Mike Galbraith

On Fri, Nov 02, 2012 at 08:16:58PM +0000, Christoph Lameter wrote:
> On Fri, 2 Nov 2012, Paul E. McKenney wrote:
> 
> > So I believe that these need to be controlled separately for the immediate
> > future.
> 
> Yes they do but the configurations are similar and it would be best if
> these were cpumasks in standard locations instead of being specified at
> boot time or in a cpuset.
> 
> Put the cpu masks into
> 
> /sys/devices/system/cpu/{nohz_cpus,rcu_cpus}
> 
> or so?

The no-CBs mask would be read-only for some time -- changed only at
boot.  Longer term, I hope to allow run-time modification, but...

							Thanx, Paul


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs
  2012-11-02 20:41           ` Paul E. McKenney
@ 2012-11-02 20:51             ` Steven Rostedt
  2012-11-03  2:08               ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Steven Rostedt @ 2012-11-02 20:51 UTC (permalink / raw)
  To: paulmck
  Cc: Christoph Lameter, linux-kernel, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Clark Williams, Frederic Weisbecker, Li Zefan,
	Ingo Molnar, Mike Galbraith

On Fri, 2012-11-02 at 13:41 -0700, Paul E. McKenney wrote:

> The no-CBs mask would be read-only for some time -- changed only at
> boot.  Longer term, I hope to allow run-time modification, but...
> 

but what? You're not looking to retire already are you? ;-)

-- Steve



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs
  2012-11-02 20:51             ` Steven Rostedt
@ 2012-11-03  2:08               ` Paul E. McKenney
  2012-11-05 15:17                 ` Christoph Lameter
  0 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2012-11-03  2:08 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Christoph Lameter, linux-kernel, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Clark Williams, Frederic Weisbecker, Li Zefan,
	Ingo Molnar, Mike Galbraith

On Fri, Nov 02, 2012 at 04:51:50PM -0400, Steven Rostedt wrote:
> On Fri, 2012-11-02 at 13:41 -0700, Paul E. McKenney wrote:
> 
> > The no-CBs mask would be read-only for some time -- changed only at
> > boot.  Longer term, I hope to allow run-time modification, but...
> 
> but what? You're not looking to retire already are you? ;-)

Not for a few decades.  ;-)

But let's add the no-CBs mask to sysfs when I add the ability to run-time
modify that mask.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs
  2012-11-03  2:08               ` Paul E. McKenney
@ 2012-11-05 15:17                 ` Christoph Lameter
  2012-11-05 22:41                   ` Frederic Weisbecker
  0 siblings, 1 reply; 60+ messages in thread
From: Christoph Lameter @ 2012-11-05 15:17 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, linux-kernel, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Clark Williams, Frederic Weisbecker, Li Zefan,
	Ingo Molnar, Mike Galbraith

On Fri, 2 Nov 2012, Paul E. McKenney wrote:

> On Fri, Nov 02, 2012 at 04:51:50PM -0400, Steven Rostedt wrote:
> > On Fri, 2012-11-02 at 13:41 -0700, Paul E. McKenney wrote:
> >
> > > The no-CBs mask would be read-only for some time -- changed only at
> > > boot.  Longer term, I hope to allow run-time modification, but...
> >
> > but what? You're not looking to retire already are you? ;-)
>
> Not for a few decades.  ;-)
>
> But let's add the no-CBs mask to sysfs when I add the ability to run-time
> modify that mast.

Well we are creating a user ABI with the boot time option. It would be
best to get it right out of the door.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs
  2012-11-02 14:23 ` Christoph Lameter
  2012-11-02 14:37   ` Steven Rostedt
@ 2012-11-05 22:32   ` Frederic Weisbecker
  1 sibling, 0 replies; 60+ messages in thread
From: Frederic Weisbecker @ 2012-11-05 22:32 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Steven Rostedt, linux-kernel, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Clark Williams, Li Zefan, Ingo Molnar,
	Paul E. McKenney, Mike Galbraith

2012/11/2 Christoph Lameter <cl@linux.com>:
> Also, could we have this support without cpusets? There are multiple
> means of doing system segmentation (e.g. cgroups), and something like
> hz control is pretty basic. Control via a cpumask, like IRQ
> affinities, in e.g.
>
>         /sys/devices/system/cpu/nohz
>
> or a per-CPU flag in
>
> /sys/devices/system/cpu/cpu0/hz
>
> would be easier and not tied to something like cpusets.

You really don't want that cpuset interface, do you? ;-)

Yeah, I think I agree with you. This adds a dependency on
cpusets/cgroups, which I wish we could avoid if possible. A cpuset may
also be a bit counterintuitive for this use case. What if a CPU is
included in both a nohz cpuset and a non-nohz cpuset? What behaviour
should we adopt? An OR on the nohz flag, such that the CPU is
considered a nohz CPU as long as it is in at least one nohz cpuset? Or
should we only shut down the tick for the tasks attached to the nohz
cpusets? Do we really want that per-cgroup granularity and the
overhead / complexity that comes along with it?

No, I think we should stay simple and have a plain per-CPU property
for that, without involving cgroups at all.

So indeed a cpumask in /sys/devices/system/cpu/nohz looks like a
better interface.
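A cpumask file like the proposed /sys/devices/system/cpu/nohz would presumably take the kernel's standard CPU-list syntax (e.g. "1-3,5"). As a hedged illustration only (the sysfs path is a proposal in this thread, not an existing interface), here is a minimal user-space parser for that syntax:

```python
def parse_cpulist(s):
    """Parse a kernel-style CPU list such as '1-3,5' into a set of CPU ids."""
    cpus = set()
    for part in s.strip().split(","):
        if not part:
            continue  # tolerate empty entries / blank input
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))  # inclusive range
        else:
            cpus.add(int(part))
    return cpus

# e.g. parse_cpulist("1-3,5") yields {1, 2, 3, 5}
```

The same format is what tools would write back to such a file to change the mask at run time, if that ever becomes allowed.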

>> This has been long asked for by those in the RT community. If a task
>> requires uninterruptible CPU time, this would be able to give a task
>> that, even without the full PREEMPT-RT patch set.
>
> Also, those interested in low latency are very interested in this
> feature, in particular without any preempt support enabled in the
> kernel.

Sure, we are trying to make that full dynticks approach as generic as
possible.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs
  2012-11-05 15:17                 ` Christoph Lameter
@ 2012-11-05 22:41                   ` Frederic Weisbecker
  0 siblings, 0 replies; 60+ messages in thread
From: Frederic Weisbecker @ 2012-11-05 22:41 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Paul E. McKenney, Steven Rostedt, linux-kernel, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Clark Williams, Li Zefan,
	Ingo Molnar, Mike Galbraith

2012/11/5 Christoph Lameter <cl@linux.com>:
> On Fri, 2 Nov 2012, Paul E. McKenney wrote:
>
>> On Fri, Nov 02, 2012 at 04:51:50PM -0400, Steven Rostedt wrote:
>> > On Fri, 2012-11-02 at 13:41 -0700, Paul E. McKenney wrote:
>> >
>> > > The no-CBs mask would be read-only for some time -- changed only at
>> > > boot.  Longer term, I hope to allow run-time modification, but...
>> >
>> > but what? You're not looking to retire already are you? ;-)
>>
>> Not for a few decades.  ;-)
>>
>> But let's add the no-CBs mask to sysfs when I add the ability to run-time
>> modify that mask.
>
> Well, we are creating a user ABI with the boot-time option. It would be
> best to get it right out of the door.

I believe that a static setting through a boot option is already a nice
first step. Runtime tuning may involve dynamic migration and other
headaches. The nocb patch is tricky enough to review as it is ;)
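For context, a static boot-time setting of the kind discussed here looks like the following kernel command-line fragment. The parameter names are the ones that later appeared in mainline (rcu_nocbs= for callback offloading, isolcpus= for scheduler isolation) and are given only to illustrate the direction, not as quotes from this patch set:

```
# Illustrative kernel command line: offload RCU callbacks from
# CPUs 1-3 and keep the scheduler from balancing tasks onto them.
linux ... rcu_nocbs=1-3 isolcpus=1-3
```

Being boot-time only, such a mask is fixed for the life of the system, which is exactly the ABI concern raised above.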

^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2012-11-05 22:41 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
2012-10-29 20:27 ` [PATCH 01/32] nohz: Move nohz load balancer selection into idle logic Steven Rostedt
2012-10-30  8:32   ` Charles Wang
2012-10-30 15:39     ` Steven Rostedt
2012-10-29 20:27 ` [PATCH 02/32] cpuset: Set up interface for nohz flag Steven Rostedt
2012-10-30 17:16   ` Steven Rostedt
2012-10-29 20:27 ` [PATCH 03/32] nohz: Try not to give the timekeeping duty to an adaptive tickless cpu Steven Rostedt
2012-10-30 17:33   ` Steven Rostedt
2012-10-29 20:27 ` [PATCH 04/32] x86: New cpuset nohz irq vector Steven Rostedt
2012-10-30 17:39   ` Steven Rostedt
2012-10-30 23:51     ` Frederic Weisbecker
2012-10-31  0:07       ` Steven Rostedt
2012-10-31  0:45         ` Frederic Weisbecker
2012-10-29 20:27 ` [PATCH 05/32] nohz: Adaptive tick stop and restart on nohz cpuset Steven Rostedt
2012-10-30 18:23   ` Steven Rostedt
2012-10-29 20:27 ` [PATCH 06/32] nohz/cpuset: Dont turn off the tick if rcu needs it Steven Rostedt
2012-10-30 18:30   ` Steven Rostedt
2012-10-29 20:27 ` [PATCH 07/32] nohz/cpuset: Wake up adaptive nohz CPU when a timer gets enqueued Steven Rostedt
2012-10-29 20:27 ` [PATCH 08/32] nohz/cpuset: Dont stop the tick if posix cpu timers are running Steven Rostedt
2012-10-29 20:27 ` [PATCH 09/32] nohz/cpuset: Restart tick when nohz flag is cleared on cpuset Steven Rostedt
2012-10-30 18:55   ` Steven Rostedt
2012-10-29 20:27 ` [PATCH 10/32] nohz/cpuset: Restart the tick if printk needs it Steven Rostedt
2012-10-30 19:01   ` Steven Rostedt
2012-10-30 23:54     ` Frederic Weisbecker
2012-10-29 20:27 ` [PATCH 11/32] rcu: Restart the tick on non-responding adaptive nohz CPUs Steven Rostedt
2012-10-29 20:27 ` [PATCH 12/32] rcu: Restart tick if we enqueue a callback in a nohz/cpuset CPU Steven Rostedt
2012-10-29 20:27 ` [PATCH 13/32] nohz: Generalize tickless cpu time accounting Steven Rostedt
2012-10-29 20:27 ` [PATCH 14/32] nohz/cpuset: Account user and system times in adaptive nohz mode Steven Rostedt
2012-10-29 20:27 ` [PATCH 15/32] nohz/cpuset: New API to flush cputimes on nohz cpusets Steven Rostedt
2012-10-29 20:27 ` [PATCH 16/32] nohz/cpuset: Flush cputime on threads in nohz cpusets when waiting leader Steven Rostedt
2012-10-29 20:27 ` [PATCH 17/32] nohz/cpuset: Flush cputimes on procfs stat file read Steven Rostedt
2012-10-29 20:27 ` [PATCH 18/32] nohz/cpuset: Flush cputimes for getrusage() and times() syscalls Steven Rostedt
2012-10-29 20:27 ` [PATCH 19/32] x86: Syscall hooks for nohz cpusets Steven Rostedt
2012-10-29 20:27 ` [PATCH 20/32] nohz/cpuset: enable addition&removal of cpus while in adaptive nohz mode Steven Rostedt
2012-10-29 20:27 ` [PATCH 21/32] nohz: Dont restart the tick before scheduling to idle Steven Rostedt
2012-10-29 20:27 ` [PATCH 22/32] sched: Comment on rq->clock correctness in ttwu_do_wakeup() in nohz Steven Rostedt
2012-10-29 20:27 ` [PATCH 23/32] sched: Update rq clock on nohz CPU before migrating tasks Steven Rostedt
2012-10-29 20:27 ` [PATCH 24/32] sched: Update rq clock on nohz CPU before setting fair group shares Steven Rostedt
2012-10-29 20:27 ` [PATCH 25/32] sched: Update rq clock on tickless CPUs before calling check_preempt_curr() Steven Rostedt
2012-10-29 20:27 ` [PATCH 26/32] sched: Update rq clock earlier in unthrottle_cfs_rq Steven Rostedt
2012-10-29 20:27 ` [PATCH 27/32] sched: Update clock of nohz busiest rq before balancing Steven Rostedt
2012-10-29 20:27 ` [PATCH 28/32] sched: Update rq clock before idle balancing Steven Rostedt
2012-10-29 20:27 ` [PATCH 29/32] sched: Update nohz rq clock before searching busiest group on load balancing Steven Rostedt
2012-10-29 20:27 ` [PATCH 30/32] rcu: Switch to extended quiescent state in userspace from nohz cpuset Steven Rostedt
2012-10-29 20:27 ` [PATCH 31/32] nohz/cpuset: Disable under some configs Steven Rostedt
2012-10-29 20:27 ` [PATCH 32/32] nohz, not for merge: Add tickless tracing Steven Rostedt
2012-10-30 14:02 ` [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Gilad Ben-Yossef
2012-11-02 14:23 ` Christoph Lameter
2012-11-02 14:37   ` Steven Rostedt
2012-11-02 14:50     ` David Nyström
2012-11-02 15:03     ` Christoph Lameter
2012-11-02 15:14       ` Steven Rostedt
2012-11-02 18:35       ` Paul E. McKenney
2012-11-02 20:16         ` Christoph Lameter
2012-11-02 20:41           ` Paul E. McKenney
2012-11-02 20:51             ` Steven Rostedt
2012-11-03  2:08               ` Paul E. McKenney
2012-11-05 15:17                 ` Christoph Lameter
2012-11-05 22:41                   ` Frederic Weisbecker
2012-11-05 22:32   ` Frederic Weisbecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).