* [ANNOUNCE] 3.7-nohz1
From: Frederic Weisbecker @ 2012-12-20 18:32 UTC
  To: LKML
  Cc: Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Ingo Molnar, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
	Li Zhong

Hi,

So this is a new version of the nohz cpusets work, based on 3.7, except it doesn't use
cpusets anymore and I actually based it on the middle of the 3.8 merge window
in order to pick up the latest upstream full dynticks preparatory work: cputime cleanups,
RCU user mode, the context tracking subsystem, nohz code consolidation, ...

So the big changes since the last nohz cpuset release are:

* printk now uses irq work, so it doesn't rely on the tick anymore (provided
your arch implements irq work with IPIs or the like). This chunk has been proposed
for the 3.8 merge window: https://lkml.org/lkml/2012/12/17/177
Maybe Linus will pull it, maybe not. We'll see. In any case I've included it in this tree,
but I'm not reposting that part of the patchset to avoid spamming you. (A rough sketch of
the irq_work approach follows this list.)

* cputime doesn't rely on IPIs anymore. The reader now does a special computation to
remotely get the tickless cputime. (A simplified sketch of the reader side follows this list.)

* No more cpusets interface. Paul McKenney suggested that I start with a boot time
kernel parameter to define the full dynticks cpumask, and he was totally right: it
makes the code much simpler. That's a good way to start and it makes mainlining
easier. We can still add a runtime configuration later if necessary.

* Now there is always a CPU handling the timekeeping. This can be further optimized
and made more power-friendly; for now I did something simple and stupid. I guess we'll try to get
that into better shape with Hakan. But at least the timekeeping now works.

* It uses the new RCU callback offloading feature. This way a full dynticks CPU doesn't
need to keep the tick to handle local callbacks. This is still very experimental though.

* No more specific IPI vector for full dynticks. We just use the scheduler IPI.
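
To illustrate the printk change above, here is a rough sketch of the idea (this is not
the actual patch; the *_example names are made up for illustration): the klogd wakeup is
deferred through the irq_work API so printk() no longer needs the tick for it:

	#include <linux/irq_work.h>
	#include <linux/percpu.h>
	#include <linux/wait.h>

	static DECLARE_WAIT_QUEUE_HEAD(log_wait_example);

	/*
	 * Runs from the irq_work self-IPI (or from the next tick on archs
	 * that can't raise a self-IPI), where it is safe to wake sleepers.
	 */
	static void wake_up_klogd_func(struct irq_work *work)
	{
		wake_up_interruptible(&log_wait_example);
	}

	static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
		.func = wake_up_klogd_func,
	};

	/* Called from the printk() path, with preemption disabled */
	static void wake_up_klogd_example(void)
	{
		if (waitqueue_active(&log_wait_example))
			irq_work_queue(&__get_cpu_var(wake_up_klogd_work));
	}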
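
And to illustrate the IPI-less cputime read above, a simplified sketch of the reader
side (field names such as vtime_seqlock, prev_jiffies and prev_jiffies_whence are taken
from the cputime patches below; this is not the exact code):

	/*
	 * Remote reader: take a consistent utime/stime snapshot and add the
	 * jiffies elapsed since the task last flushed its time on a
	 * kernel <-> user transition, instead of IPI'ing the target CPU.
	 */
	void task_cputime_sketch(struct task_struct *t,
				 cputime_t *utime, cputime_t *stime)
	{
		unsigned int seq;
		long delta;

		do {
			seq = read_seqbegin(&t->vtime_seqlock);

			*utime = t->utime;
			*stime = t->stime;

			delta = jiffies - t->prev_jiffies;
			if (t->prev_jiffies_whence == JIFFIES_USER)
				*utime += jiffies_to_cputime(delta);
			else if (t->prev_jiffies_whence == JIFFIES_SYS)
				*stime += jiffies_to_cputime(delta);
			/* JIFFIES_SLEEPING: nothing is accruing, add nothing */
		} while (read_seqretry(&t->vtime_seqlock, seq));
	}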

The branch is:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	3.7-nohz1

There is still quite some work to do.

== How to use? ==

Select:
	CONFIG_NO_HZ
	CONFIG_RCU_USER_QS
	CONFIG_VIRT_CPU_ACCOUNTING_GEN
	CONFIG_RCU_NOCB_CPU
	CONFIG_NO_HZ_FULL

You always need at least one timekeeping CPU.

Let's imagine you have 4 CPUs. We keep CPU 0 as the CPU the RCU callbacks are offloaded to
and that handles the timekeeping. We set the rest as full dynticks. So you need the following
kernel parameters:

	rcu_nocbs=1-3 full_nohz=1-3

(Note the rcu_nocbs value must always be the same as the full_nohz one.)

Now if you want proper isolation you need to:

* Migrate your processes adequately
* Migrate your irqs to CPU 0
* Migrate the RCU nocb threads to CPU 0. Example with the above configuration:

	for p in $(ps -o pid= -C rcuo1,rcuo2,rcuo3)
	do
		taskset -cp 0 $p
	done

Then run what you want on the full dynticks CPUs. For best results, run one task
per CPU, mostly in userspace and mostly CPU bound (otherwise more IO = more kernel
mode execution = more chances to get IPIs, a restarted tick, workqueues, kthreads, etc...)

This page contains a good reminder for those interested in CPU isolation: https://github.com/gby/linux/wiki

But keep in mind that my tree is not yet ready for serious production.

Happy Christmas, new year or whatever end of the world.
---

Frederic Weisbecker (32):
      irq_work: Fix racy IRQ_WORK_BUSY flag setting
      irq_work: Fix racy check on work pending flag
      irq_work: Remove CONFIG_HAVE_IRQ_WORK
      nohz: Add API to check tick state
      irq_work: Don't stop the tick with pending works
      irq_work: Make self-IPIs optable
      printk: Wake up klogd using irq_work
      Merge branch 'nohz/printk-v8' into 3.7-nohz1-stage
      context_tracking: Add comments on interface and internals
      cputime: Generic on-demand virtual cputime accounting
      cputime: Allow dynamic switch between tick/virtual based cputime accounting
      cputime: Use accessors to read task cputime stats
      cputime: Safely read cputime of full dynticks CPUs
      nohz: Basic full dynticks interface
      nohz: Assign timekeeping duty to a non-full-nohz CPU
      nohz: Trace timekeeping update
      nohz: Wake up full dynticks CPUs when a timer gets enqueued
      rcu: Restart the tick on non-responding full dynticks CPUs
      sched: Comment on rq->clock correctness in ttwu_do_wakeup() in nohz
      sched: Update rq clock on nohz CPU before migrating tasks
      sched: Update rq clock on nohz CPU before setting fair group shares
      sched: Update rq clock on tickless CPUs before calling check_preempt_curr()
      sched: Update rq clock earlier in unthrottle_cfs_rq
      sched: Update clock of nohz busiest rq before balancing
      sched: Update rq clock before idle balancing
      sched: Update nohz rq clock before searching busiest group on load balancing
      nohz: Move nohz load balancer selection into idle logic
      nohz: Full dynticks mode
      nohz: Only stop the tick on RCU nocb CPUs
      nohz: Don't turn off the tick if rcu needs it
      nohz: Don't stop the tick if posix cpu timers are running
      nohz: Add some tracing

Steven Rostedt (2):
      irq_work: Flush work on CPU_DYING
      irq_work: Warn if there's still work on cpu_down

 arch/alpha/Kconfig                  |    1 -
 arch/alpha/kernel/osf_sys.c         |    6 +-
 arch/arm/Kconfig                    |    1 -
 arch/arm64/Kconfig                  |    1 -
 arch/blackfin/Kconfig               |    1 -
 arch/frv/Kconfig                    |    1 -
 arch/hexagon/Kconfig                |    1 -
 arch/mips/Kconfig                   |    1 -
 arch/parisc/Kconfig                 |    1 -
 arch/powerpc/Kconfig                |    1 -
 arch/s390/Kconfig                   |    1 -
 arch/s390/kernel/vtime.c            |    4 +-
 arch/sh/Kconfig                     |    1 -
 arch/sparc/Kconfig                  |    1 -
 arch/x86/Kconfig                    |    1 -
 arch/x86/kernel/apm_32.c            |   11 +-
 drivers/isdn/mISDN/stack.c          |    7 +-
 drivers/staging/iio/trigger/Kconfig |    1 -
 fs/binfmt_elf.c                     |    8 +-
 fs/binfmt_elf_fdpic.c               |    7 +-
 include/asm-generic/cputime.h       |    1 +
 include/linux/context_tracking.h    |   28 +++++
 include/linux/hardirq.h             |    4 +-
 include/linux/init_task.h           |    9 ++
 include/linux/irq_work.h            |   20 +++
 include/linux/kernel_stat.h         |    2 +-
 include/linux/posix-timers.h        |    1 +
 include/linux/printk.h              |    3 -
 include/linux/rcupdate.h            |    8 ++
 include/linux/sched.h               |   48 +++++++-
 include/linux/tick.h                |   26 ++++-
 include/linux/vtime.h               |   47 +++++---
 init/Kconfig                        |   22 +++-
 kernel/acct.c                       |    6 +-
 kernel/context_tracking.c           |   91 +++++++++++----
 kernel/cpu.c                        |    4 +-
 kernel/delayacct.c                  |    7 +-
 kernel/exit.c                       |    6 +-
 kernel/fork.c                       |    8 +-
 kernel/irq_work.c                   |  131 ++++++++++++++++-----
 kernel/posix-cpu-timers.c           |   39 +++++-
 kernel/printk.c                     |   36 +++---
 kernel/rcutree.c                    |   19 +++-
 kernel/rcutree_plugin.h             |   13 +--
 kernel/sched/core.c                 |   69 +++++++++++-
 kernel/sched/cputime.c              |  222 ++++++++++++++++++++++++++++++-----
 kernel/sched/fair.c                 |   42 +++++++-
 kernel/sched/sched.h                |   15 +++
 kernel/signal.c                     |   12 ++-
 kernel/softirq.c                    |   11 +-
 kernel/time/Kconfig                 |    9 ++
 kernel/time/tick-broadcast.c        |    3 +-
 kernel/time/tick-common.c           |    5 +-
 kernel/time/tick-sched.c            |  142 ++++++++++++++++++++---
 kernel/timer.c                      |    3 +-
 kernel/tsacct.c                     |   19 ++-
 56 files changed, 955 insertions(+), 233 deletions(-)


* [PATCH 01/24] context_tracking: Add comments on interface and internals
From: Frederic Weisbecker @ 2012-12-20 18:32 UTC
  To: LKML
  Cc: Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Ingo Molnar, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
	Li Zhong

This subsystem lacks explanations of its purpose and
design. Add the missing comments.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
---
 kernel/context_tracking.c |   73 ++++++++++++++++++++++++++++++++++++++------
 1 files changed, 63 insertions(+), 10 deletions(-)

diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index e0e07fd..9f6c38f 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -1,3 +1,19 @@
+/*
+ * Context tracking: Probe on high level context boundaries such as kernel
+ * and userspace. This includes syscalls and exceptions entry/exit.
+ *
+ * This is used by RCU to remove its dependency on the timer tick while a CPU
+ * runs in userspace.
+ *
+ *  Started by Frederic Weisbecker:
+ *
+ * Copyright (C) 2012 Red Hat, Inc., Frederic Weisbecker <fweisbec@redhat.com>
+ *
+ * Many thanks to Gilad Ben-Yossef, Paul McKenney, Ingo Molnar, Andrew Morton,
+ * Steven Rostedt, Peter Zijlstra for suggestions and improvements.
+ *
+ */
+
 #include <linux/context_tracking.h>
 #include <linux/rcupdate.h>
 #include <linux/sched.h>
@@ -6,8 +22,8 @@
 
 struct context_tracking {
 	/*
-	 * When active is false, hooks are not set to
-	 * minimize overhead: TIF flags are cleared
+	 * When active is false, hooks are unset in order
+	 * to minimize overhead: TIF flags are cleared
 	 * and calls to user_enter/exit are ignored. This
 	 * may be further optimized using static keys.
 	 */
@@ -24,6 +40,15 @@ static DEFINE_PER_CPU(struct context_tracking, context_tracking) = {
 #endif
 };
 
+/**
+ * user_enter - Inform the context tracking that the CPU is going to
+ *              enter userspace mode.
+ *
+ * This function must be called right before we switch from the kernel
+ * to userspace, when it's guaranteed the remaining kernel instructions
+ * to execute won't use any RCU read side critical section because this
+ * function sets RCU in extended quiescent state.
+ */
 void user_enter(void)
 {
 	unsigned long flags;
@@ -39,40 +64,68 @@ void user_enter(void)
 	if (in_interrupt())
 		return;
 
+	/* Kernel threads aren't supposed to go to userspace */
 	WARN_ON_ONCE(!current->mm);
 
 	local_irq_save(flags);
 	if (__this_cpu_read(context_tracking.active) &&
 	    __this_cpu_read(context_tracking.state) != IN_USER) {
 		__this_cpu_write(context_tracking.state, IN_USER);
+		/*
+		 * At this stage, only low level arch entry code remains and
+		 * then we'll run in userspace. We can assume there won't be
+		 * any RCU read-side critical section until the next call to
+		 * user_exit() or rcu_irq_enter(). Let's remove RCU's dependency
+		 * on the tick.
+		 */
 		rcu_user_enter();
 	}
 	local_irq_restore(flags);
 }
 
+
+/**
+ * user_exit - Inform the context tracking that the CPU is
+ *             exiting userspace mode and entering the kernel.
+ *
+ * This function must be called after we entered the kernel from userspace
+ * before any use of RCU read side critical section. This potentially include
+ * any high level kernel code like syscalls, exceptions, signal handling, etc...
+ *
+ * This call supports re-entrancy. This way it can be called from any exception
+ * handler without needing to know if we came from userspace or not.
+ */
 void user_exit(void)
 {
 	unsigned long flags;
 
-	/*
-	 * Some contexts may involve an exception occuring in an irq,
-	 * leading to that nesting:
-	 * rcu_irq_enter() rcu_user_exit() rcu_user_exit() rcu_irq_exit()
-	 * This would mess up the dyntick_nesting count though. And rcu_irq_*()
-	 * helpers are enough to protect RCU uses inside the exception. So
-	 * just return immediately if we detect we are in an IRQ.
-	 */
 	if (in_interrupt())
 		return;
 
 	local_irq_save(flags);
 	if (__this_cpu_read(context_tracking.state) == IN_USER) {
 		__this_cpu_write(context_tracking.state, IN_KERNEL);
+		/*
+		 * We are going to run code that may use RCU. Inform
+		 * RCU core about that (ie: we may need the tick again).
+		 */
 		rcu_user_exit();
 	}
 	local_irq_restore(flags);
 }
 
+
+/**
+ * context_tracking_task_switch - context switch the syscall hooks
+ *
+ * The context tracking uses the syscall slow path to implement its user-kernel
+ * boundaries hooks on syscalls. This way it doesn't impact the syscall fast
+ * path on CPUs that don't do context tracking.
+ *
+ * But we need to clear the flag on the previous task because it may later
+ * migrate to some CPU that doesn't do the context tracking. As such the TIF
+ * flag may not be desired there.
+ */
 void context_tracking_task_switch(struct task_struct *prev,
 			     struct task_struct *next)
 {
-- 
1.7.5.4



* [PATCH 02/24] cputime: Generic on-demand virtual cputime accounting
From: Frederic Weisbecker @ 2012-12-20 18:32 UTC
  To: LKML
  Cc: Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Ingo Molnar, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

If we want to stop the tick beyond the idle case, we need to be
able to account the cputime without using the tick.

Virtual based cputime accounting solves that problem by
hooking into kernel/user boundaries.

However, implementing CONFIG_VIRT_CPU_ACCOUNTING requires
setting low level hooks and involves more overhead. But
we already have a generic context tracking subsystem
that is required anyway for RCU by archs that want to
shut down the tick outside idle.

This patch implements a generic virtual based cputime
accounting that relies on these generic kernel/user hooks.

There are some upsides to doing this:

- This requires no arch code to implement CONFIG_VIRT_CPU_ACCOUNTING
if context tracking is already built (already necessary for RCU in full
tickless mode).

- We can rely on the generic context tracking subsystem to dynamically
(de)activate the hooks, so that we can switch anytime between virtual
and tick based accounting. This way we don't have the overhead
of the virtual accounting when the tick is running periodically.

And a few downsides:

- It relies on jiffies and the hooks are set in high level code. This
results in less precise cputime accounting than with a true native
virtual based cputime accounting, which hooks into low level code and
uses a CPU hardware clock. Precision is not the goal of this though.

- There is probably more overhead than with a native virtual based cputime
accounting. But this relies on hooks that are already set anyway.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/context_tracking.h |   28 +++++++++++
 include/linux/vtime.h            |    4 ++
 init/Kconfig                     |   11 ++++-
 kernel/context_tracking.c        |   22 ++-------
 kernel/sched/cputime.c           |   93 +++++++++++++++++++++++++++++++++++--
 5 files changed, 135 insertions(+), 23 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index e24339c..9f33fbc 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -3,12 +3,40 @@
 
 #ifdef CONFIG_CONTEXT_TRACKING
 #include <linux/sched.h>
+#include <linux/percpu.h>
+
+struct context_tracking {
+	/*
+	 * When active is false, hooks are unset in order
+	 * to minimize overhead: TIF flags are cleared
+	 * and calls to user_enter/exit are ignored. This
+	 * may be further optimized using static keys.
+	 */
+	bool active;
+	enum {
+		IN_KERNEL = 0,
+		IN_USER,
+	} state;
+};
+
+DECLARE_PER_CPU(struct context_tracking, context_tracking);
+
+static inline bool context_tracking_in_user(void)
+{
+	return __this_cpu_read(context_tracking.state) == IN_USER;
+}
+
+static inline bool context_tracking_active(void)
+{
+	return __this_cpu_read(context_tracking.active);
+}
 
 extern void user_enter(void);
 extern void user_exit(void);
 extern void context_tracking_task_switch(struct task_struct *prev,
 					 struct task_struct *next);
 #else
+static inline bool context_tracking_in_user(void) { return false; }
 static inline void user_enter(void) { }
 static inline void user_exit(void) { }
 static inline void context_tracking_task_switch(struct task_struct *prev,
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index ae30ab5..58392aa 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -17,6 +17,10 @@ static inline void vtime_account_system_irqsafe(struct task_struct *tsk) { }
 static inline void vtime_account(struct task_struct *tsk) { }
 #endif
 
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+static inline void arch_vtime_task_switch(struct task_struct *tsk) { }
+#endif
+
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
 extern void irqtime_account_irq(struct task_struct *tsk);
 #else
diff --git a/init/Kconfig b/init/Kconfig
index 60579d6..a64b3e8 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -340,7 +340,9 @@ config TICK_CPU_ACCOUNTING
 
 config VIRT_CPU_ACCOUNTING
 	bool "Deterministic task and CPU time accounting"
-	depends on HAVE_VIRT_CPU_ACCOUNTING
+	depends on HAVE_VIRT_CPU_ACCOUNTING || HAVE_CONTEXT_TRACKING
+	select VIRT_CPU_ACCOUNTING_GEN if !HAVE_VIRT_CPU_ACCOUNTING
+	default y if PPC64
 	help
 	  Select this option to enable more accurate task and CPU time
 	  accounting.  This is done by reading a CPU counter on each
@@ -363,6 +365,13 @@ config IRQ_TIME_ACCOUNTING
 
 endchoice
 
+config VIRT_CPU_ACCOUNTING_GEN
+	select CONTEXT_TRACKING
+	bool
+	help
+	  Implement a generic virtual based cputime accounting by using
+	  the context tracking subsystem.
+
 config BSD_PROCESS_ACCT
 	bool "BSD Process Accounting"
 	help
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 9f6c38f..ca1e073 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -17,24 +17,10 @@
 #include <linux/context_tracking.h>
 #include <linux/rcupdate.h>
 #include <linux/sched.h>
-#include <linux/percpu.h>
 #include <linux/hardirq.h>
 
-struct context_tracking {
-	/*
-	 * When active is false, hooks are unset in order
-	 * to minimize overhead: TIF flags are cleared
-	 * and calls to user_enter/exit are ignored. This
-	 * may be further optimized using static keys.
-	 */
-	bool active;
-	enum {
-		IN_KERNEL = 0,
-		IN_USER,
-	} state;
-};
 
-static DEFINE_PER_CPU(struct context_tracking, context_tracking) = {
+DEFINE_PER_CPU(struct context_tracking, context_tracking) = {
 #ifdef CONFIG_CONTEXT_TRACKING_FORCE
 	.active = true,
 #endif
@@ -70,7 +56,7 @@ void user_enter(void)
 	local_irq_save(flags);
 	if (__this_cpu_read(context_tracking.active) &&
 	    __this_cpu_read(context_tracking.state) != IN_USER) {
-		__this_cpu_write(context_tracking.state, IN_USER);
+		vtime_account_system(current);
 		/*
 		 * At this stage, only low level arch entry code remains and
 		 * then we'll run in userspace. We can assume there won't be
@@ -79,6 +65,7 @@ void user_enter(void)
 		 * on the tick.
 		 */
 		rcu_user_enter();
+		__this_cpu_write(context_tracking.state, IN_USER);
 	}
 	local_irq_restore(flags);
 }
@@ -104,12 +91,13 @@ void user_exit(void)
 
 	local_irq_save(flags);
 	if (__this_cpu_read(context_tracking.state) == IN_USER) {
-		__this_cpu_write(context_tracking.state, IN_KERNEL);
 		/*
 		 * We are going to run code that may use RCU. Inform
 		 * RCU core about that (ie: we may need the tick again).
 		 */
 		rcu_user_exit();
+		vtime_account_user(current);
+		__this_cpu_write(context_tracking.state, IN_KERNEL);
 	}
 	local_irq_restore(flags);
 }
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 293b202..da0a9e7 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -3,6 +3,7 @@
 #include <linux/tsacct_kern.h>
 #include <linux/kernel_stat.h>
 #include <linux/static_key.h>
+#include <linux/context_tracking.h>
 #include "sched.h"
 
 
@@ -495,10 +496,24 @@ void vtime_task_switch(struct task_struct *prev)
 #ifndef __ARCH_HAS_VTIME_ACCOUNT
 void vtime_account(struct task_struct *tsk)
 {
-	if (in_interrupt() || !is_idle_task(tsk))
-		vtime_account_system(tsk);
-	else
-		vtime_account_idle(tsk);
+	if (!in_interrupt()) {
+		/*
+		 * If we interrupted user, context_tracking_in_user()
+		 * is 1 because the context tracking don't hook
+		 * on irq entry/exit. This way we know if
+		 * we need to flush user time on kernel entry.
+		 */
+		if (context_tracking_in_user()) {
+			vtime_account_user(tsk);
+			return;
+		}
+
+		if (is_idle_task(tsk)) {
+			vtime_account_idle(tsk);
+			return;
+		}
+	}
+	vtime_account_system(tsk);
 }
 EXPORT_SYMBOL_GPL(vtime_account);
 #endif /* __ARCH_HAS_VTIME_ACCOUNT */
@@ -586,4 +601,72 @@ void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime
 	thread_group_cputime(p, &cputime);
 	cputime_adjust(&cputime, &p->signal->prev_cputime, ut, st);
 }
-#endif
+
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+static DEFINE_PER_CPU(long, last_jiffies) = INITIAL_JIFFIES;
+
+static cputime_t get_vtime_delta(void)
+{
+	long delta;
+
+	delta = jiffies - __this_cpu_read(last_jiffies);
+	__this_cpu_add(last_jiffies, delta);
+
+	return jiffies_to_cputime(delta);
+}
+
+void vtime_account_system(struct task_struct *tsk)
+{
+	cputime_t delta_cpu = get_vtime_delta();
+
+	account_system_time(tsk, irq_count(), delta_cpu, cputime_to_scaled(delta_cpu));
+}
+
+void vtime_account_user(struct task_struct *tsk)
+{
+	cputime_t delta_cpu = get_vtime_delta();
+
+	/*
+	 * This is an unfortunate hack: if we flush user time only on
+	 * irq entry, we miss the jiffies update and the time is spuriously
+	 * accounted to system time.
+	 */
+	if (context_tracking_in_user())
+		account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
+}
+
+void vtime_account_idle(struct task_struct *tsk)
+{
+	cputime_t delta_cpu = get_vtime_delta();
+
+	account_idle_time(delta_cpu);
+}
+
+static int __cpuinit vtime_cpu_notify(struct notifier_block *self,
+				      unsigned long action, void *hcpu)
+{
+	long cpu = (long)hcpu;
+	long *last_jiffies_cpu = per_cpu_ptr(&last_jiffies, cpu);
+
+	switch (action) {
+	case CPU_UP_PREPARE:
+	case CPU_UP_PREPARE_FROZEN:
+		/*
+		 * CHECKME: ensure that's visible by the CPU
+		 * once it wakes up
+		 */
+		*last_jiffies_cpu = jiffies;
+	default:
+		break;
+	}
+
+	return NOTIFY_OK;
+}
+
+static int __init init_vtime(void)
+{
+	cpu_notifier(vtime_cpu_notify, 0);
+	return 0;
+}
+early_initcall(init_vtime);
+#endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */
-- 
1.7.5.4



* [PATCH 03/24] cputime: Allow dynamic switch between tick/virtual based cputime accounting
From: Frederic Weisbecker @ 2012-12-20 18:32 UTC
  To: LKML
  Cc: Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Ingo Molnar, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

Allow dynamically switching between tick and virtual based cputime accounting.
This way we can provide a kind of "on-demand" virtual based cputime
accounting. In this mode, the kernel relies on the user hooks
subsystem to dynamically hook on kernel/user boundaries.

This is in preparation for being able to stop the timer tick beyond
idle. Doing so will depend on CONFIG_VIRT_CPU_ACCOUNTING, which makes
it possible to account the cputime without the tick by hooking into
kernel/user boundaries.

Depending on whether the tick is stopped or not, we can switch between
tick and vtime based accounting anytime in order to minimize the
overhead associated with the user hooks.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/kernel_stat.h |    2 +-
 include/linux/sched.h       |    4 +-
 include/linux/vtime.h       |    9 +++++++
 init/Kconfig                |    6 ++++
 kernel/fork.c               |    2 +-
 kernel/sched/cputime.c      |   57 ++++++++++++++++++++++++++++---------------
 kernel/time/tick-sched.c    |    5 +++-
 7 files changed, 60 insertions(+), 25 deletions(-)

diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 66b7078..ed5f6ed 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -127,7 +127,7 @@ extern void account_system_time(struct task_struct *, int, cputime_t, cputime_t)
 extern void account_steal_time(cputime_t);
 extern void account_idle_time(cputime_t);
 
-#ifdef CONFIG_VIRT_CPU_ACCOUNTING
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 static inline void account_process_tick(struct task_struct *tsk, int user)
 {
 	vtime_account_user(tsk);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 651b51a..547c1f0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -597,7 +597,7 @@ struct signal_struct {
 	cputime_t utime, stime, cutime, cstime;
 	cputime_t gtime;
 	cputime_t cgtime;
-#ifndef CONFIG_VIRT_CPU_ACCOUNTING
+#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 	struct cputime prev_cputime;
 #endif
 	unsigned long nvcsw, nivcsw, cnvcsw, cnivcsw;
@@ -1357,7 +1357,7 @@ struct task_struct {
 
 	cputime_t utime, stime, utimescaled, stimescaled;
 	cputime_t gtime;
-#ifndef CONFIG_VIRT_CPU_ACCOUNTING
+#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 	struct cputime prev_cputime;
 #endif
 	unsigned long nvcsw, nivcsw; /* context switch counts */
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index 58392aa..e57020d 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -10,11 +10,20 @@ extern void vtime_account_system_irqsafe(struct task_struct *tsk);
 extern void vtime_account_idle(struct task_struct *tsk);
 extern void vtime_account_user(struct task_struct *tsk);
 extern void vtime_account(struct task_struct *tsk);
+
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+extern bool vtime_accounting(void);
 #else
+static inline bool vtime_accounting(void) { return true; }
+#endif
+
+#else /* !CONFIG_VIRT_CPU_ACCOUNTING */
 static inline void vtime_task_switch(struct task_struct *prev) { }
 static inline void vtime_account_system(struct task_struct *tsk) { }
 static inline void vtime_account_system_irqsafe(struct task_struct *tsk) { }
+static inline void vtime_account_user(struct task_struct *tsk) { }
 static inline void vtime_account(struct task_struct *tsk) { }
+static inline bool vtime_accounting(void) { return false; }
 #endif
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
diff --git a/init/Kconfig b/init/Kconfig
index a64b3e8..9d7000a 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -342,6 +342,7 @@ config VIRT_CPU_ACCOUNTING
 	bool "Deterministic task and CPU time accounting"
 	depends on HAVE_VIRT_CPU_ACCOUNTING || HAVE_CONTEXT_TRACKING
 	select VIRT_CPU_ACCOUNTING_GEN if !HAVE_VIRT_CPU_ACCOUNTING
+	select VIRT_CPU_ACCOUNTING_NATIVE if HAVE_VIRT_CPU_ACCOUNTING
 	default y if PPC64
 	help
 	  Select this option to enable more accurate task and CPU time
@@ -367,11 +368,16 @@ endchoice
 
 config VIRT_CPU_ACCOUNTING_GEN
 	select CONTEXT_TRACKING
+	depends on VIRT_CPU_ACCOUNTING && HAVE_CONTEXT_TRACKING
 	bool
 	help
 	  Implement a generic virtual based cputime accounting by using
 	  the context tracking subsystem.
 
+config VIRT_CPU_ACCOUNTING_NATIVE
+	depends on VIRT_CPU_ACCOUNTING && HAVE_VIRT_CPU_ACCOUNTING
+	bool
+
 config BSD_PROCESS_ACCT
 	bool "BSD Process Accounting"
 	help
diff --git a/kernel/fork.c b/kernel/fork.c
index 3c31e87..a81efb8 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1221,7 +1221,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 
 	p->utime = p->stime = p->gtime = 0;
 	p->utimescaled = p->stimescaled = 0;
-#ifndef CONFIG_VIRT_CPU_ACCOUNTING
+#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 	p->prev_cputime.utime = p->prev_cputime.stime = 0;
 #endif
 #if defined(SPLIT_RSS_COUNTING)
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index da0a9e7..e1fcab4 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -317,8 +317,6 @@ out:
 	rcu_read_unlock();
 }
 
-#ifndef CONFIG_VIRT_CPU_ACCOUNTING
-
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
 /*
  * Account a tick to a process and cpustat
@@ -388,6 +386,7 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 						struct rq *rq) {}
 #endif /* CONFIG_IRQ_TIME_ACCOUNTING */
 
+#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 /*
  * Account a single tick of cpu time.
  * @p: the process that the cpu time gets accounted to
@@ -398,6 +397,11 @@ void account_process_tick(struct task_struct *p, int user_tick)
 	cputime_t one_jiffy_scaled = cputime_to_scaled(cputime_one_jiffy);
 	struct rq *rq = this_rq();
 
+	if (vtime_accounting()) {
+		vtime_account_user(p);
+		return;
+	}
+
 	if (sched_clock_irqtime) {
 		irqtime_account_process_tick(p, user_tick, rq);
 		return;
@@ -439,29 +443,13 @@ void account_idle_ticks(unsigned long ticks)
 
 	account_idle_time(jiffies_to_cputime(ticks));
 }
-
 #endif
 
+
 /*
  * Use precise platform statistics if available:
  */
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING
-void task_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st)
-{
-	*ut = p->utime;
-	*st = p->stime;
-}
-
-void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st)
-{
-	struct task_cputime cputime;
-
-	thread_group_cputime(p, &cputime);
-
-	*ut = cputime.utime;
-	*st = cputime.stime;
-}
-
 void vtime_account_system_irqsafe(struct task_struct *tsk)
 {
 	unsigned long flags;
@@ -517,8 +505,25 @@ void vtime_account(struct task_struct *tsk)
 }
 EXPORT_SYMBOL_GPL(vtime_account);
 #endif /* __ARCH_HAS_VTIME_ACCOUNT */
+#endif /* CONFIG_VIRT_CPU_ACCOUNTING */
 
-#else
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
+void task_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st)
+{
+	*ut = p->utime;
+	*st = p->stime;
+}
+
+void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st)
+{
+	struct task_cputime cputime;
+
+	thread_group_cputime(p, &cputime);
+
+	*ut = cputime.utime;
+	*st = cputime.stime;
+}
+#else /* !CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
 
 #ifndef nsecs_to_cputime
 # define nsecs_to_cputime(__nsecs)	nsecs_to_jiffies(__nsecs)
@@ -548,6 +553,12 @@ static void cputime_adjust(struct task_cputime *curr,
 {
 	cputime_t rtime, utime, total;
 
+	if (vtime_accounting()) {
+		*ut = curr->utime;
+		*st = curr->stime;
+		return;
+	}
+
 	utime = curr->utime;
 	total = utime + curr->stime;
 
@@ -601,6 +612,7 @@ void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime
 	thread_group_cputime(p, &cputime);
 	cputime_adjust(&cputime, &p->signal->prev_cputime, ut, st);
 }
+#endif /* !CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
 static DEFINE_PER_CPU(long, last_jiffies) = INITIAL_JIFFIES;
@@ -642,6 +654,11 @@ void vtime_account_idle(struct task_struct *tsk)
 	account_idle_time(delta_cpu);
 }
 
+bool vtime_accounting(void)
+{
+	return context_tracking_active();
+}
+
 static int __cpuinit vtime_cpu_notify(struct notifier_block *self,
 				      unsigned long action, void *hcpu)
 {
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index fb8e5e4..ad0e6fa 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -632,8 +632,11 @@ static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
 
 static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
 {
-#ifndef CONFIG_VIRT_CPU_ACCOUNTING
+#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 	unsigned long ticks;
+
+	if (vtime_accounting())
+		return;
 	/*
 	 * We stopped the tick in idle. Update process times would miss the
 	 * time we slept as update_process_times does only a 1 tick
-- 
1.7.5.4



* [PATCH 04/24] cputime: Use accessors to read task cputime stats
From: Frederic Weisbecker @ 2012-12-20 18:32 UTC
  To: LKML
  Cc: Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Ingo Molnar, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

This is in preparation for the full dynticks feature. While
remotely reading the cputime of a task running on a full
dynticks CPU, we'll need to do some extra computation. This
way we can account the time it spent tickless in userspace
since its last cputime snapshot.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 arch/alpha/kernel/osf_sys.c |    6 ++++--
 arch/x86/kernel/apm_32.c    |   11 ++++++-----
 drivers/isdn/mISDN/stack.c  |    7 ++++++-
 fs/binfmt_elf.c             |    8 ++++++--
 fs/binfmt_elf_fdpic.c       |    7 +++++--
 include/linux/sched.h       |   18 ++++++++++++++++++
 kernel/acct.c               |    6 ++++--
 kernel/cpu.c                |    4 +++-
 kernel/delayacct.c          |    7 +++++--
 kernel/exit.c               |    6 ++++--
 kernel/posix-cpu-timers.c   |   28 ++++++++++++++++++++++------
 kernel/sched/cputime.c      |    9 +++++----
 kernel/signal.c             |   12 ++++++++----
 kernel/tsacct.c             |   19 +++++++++++++------
 14 files changed, 109 insertions(+), 39 deletions(-)

diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
index 14db93e..dbc1760 100644
--- a/arch/alpha/kernel/osf_sys.c
+++ b/arch/alpha/kernel/osf_sys.c
@@ -1139,6 +1139,7 @@ struct rusage32 {
 SYSCALL_DEFINE2(osf_getrusage, int, who, struct rusage32 __user *, ru)
 {
 	struct rusage32 r;
+	cputime_t utime, stime;
 
 	if (who != RUSAGE_SELF && who != RUSAGE_CHILDREN)
 		return -EINVAL;
@@ -1146,8 +1147,9 @@ SYSCALL_DEFINE2(osf_getrusage, int, who, struct rusage32 __user *, ru)
 	memset(&r, 0, sizeof(r));
 	switch (who) {
 	case RUSAGE_SELF:
-		jiffies_to_timeval32(current->utime, &r.ru_utime);
-		jiffies_to_timeval32(current->stime, &r.ru_stime);
+		task_cputime(current, &utime, &stime);
+		jiffies_to_timeval32(utime, &r.ru_utime);
+		jiffies_to_timeval32(stime, &r.ru_stime);
 		r.ru_minflt = current->min_flt;
 		r.ru_majflt = current->maj_flt;
 		break;
diff --git a/arch/x86/kernel/apm_32.c b/arch/x86/kernel/apm_32.c
index d65464e..8d7012b 100644
--- a/arch/x86/kernel/apm_32.c
+++ b/arch/x86/kernel/apm_32.c
@@ -899,6 +899,7 @@ static void apm_cpu_idle(void)
 	static int use_apm_idle; /* = 0 */
 	static unsigned int last_jiffies; /* = 0 */
 	static unsigned int last_stime; /* = 0 */
+	cputime_t stime;
 
 	int apm_idle_done = 0;
 	unsigned int jiffies_since_last_check = jiffies - last_jiffies;
@@ -906,23 +907,23 @@ static void apm_cpu_idle(void)
 
 	WARN_ONCE(1, "deprecated apm_cpu_idle will be deleted in 2012");
 recalc:
+	task_cputime(current, NULL, &stime);
 	if (jiffies_since_last_check > IDLE_CALC_LIMIT) {
 		use_apm_idle = 0;
-		last_jiffies = jiffies;
-		last_stime = current->stime;
 	} else if (jiffies_since_last_check > idle_period) {
 		unsigned int idle_percentage;
 
-		idle_percentage = current->stime - last_stime;
+		idle_percentage = stime - last_stime;
 		idle_percentage *= 100;
 		idle_percentage /= jiffies_since_last_check;
 		use_apm_idle = (idle_percentage > idle_threshold);
 		if (apm_info.forbid_idle)
 			use_apm_idle = 0;
-		last_jiffies = jiffies;
-		last_stime = current->stime;
 	}
 
+	last_jiffies = jiffies;
+	last_stime = stime;
+
 	bucket = IDLE_LEAKY_MAX;
 
 	while (!need_resched()) {
diff --git a/drivers/isdn/mISDN/stack.c b/drivers/isdn/mISDN/stack.c
index 5f21f62..deda591 100644
--- a/drivers/isdn/mISDN/stack.c
+++ b/drivers/isdn/mISDN/stack.c
@@ -18,6 +18,7 @@
 #include <linux/slab.h>
 #include <linux/mISDNif.h>
 #include <linux/kthread.h>
+#include <linux/sched.h>
 #include "core.h"
 
 static u_int	*debug;
@@ -202,6 +203,9 @@ static int
 mISDNStackd(void *data)
 {
 	struct mISDNstack *st = data;
+#ifdef MISDN_MSG_STATS
+	cputime_t utime, stime;
+#endif
 	int err = 0;
 
 	sigfillset(&current->blocked);
@@ -303,9 +307,10 @@ mISDNStackd(void *data)
 	       "msg %d sleep %d stopped\n",
 	       dev_name(&st->dev->dev), st->msg_cnt, st->sleep_cnt,
 	       st->stopped_cnt);
+	task_cputime(st->thread, &utime, &stime);
 	printk(KERN_DEBUG
 	       "mISDNStackd daemon for %s utime(%ld) stime(%ld)\n",
-	       dev_name(&st->dev->dev), st->thread->utime, st->thread->stime);
+	       dev_name(&st->dev->dev), utime, stime);
 	printk(KERN_DEBUG
 	       "mISDNStackd daemon for %s nvcsw(%ld) nivcsw(%ld)\n",
 	       dev_name(&st->dev->dev), st->thread->nvcsw, st->thread->nivcsw);
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 6d7d164..0766a2b 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -33,6 +33,7 @@
 #include <linux/elf.h>
 #include <linux/utsname.h>
 #include <linux/coredump.h>
+#include <linux/sched.h>
 #include <asm/uaccess.h>
 #include <asm/param.h>
 #include <asm/page.h>
@@ -1320,8 +1321,11 @@ static void fill_prstatus(struct elf_prstatus *prstatus,
 		cputime_to_timeval(cputime.utime, &prstatus->pr_utime);
 		cputime_to_timeval(cputime.stime, &prstatus->pr_stime);
 	} else {
-		cputime_to_timeval(p->utime, &prstatus->pr_utime);
-		cputime_to_timeval(p->stime, &prstatus->pr_stime);
+		cputime_t utime, stime;
+
+		task_cputime(p, &utime, &stime);
+		cputime_to_timeval(utime, &prstatus->pr_utime);
+		cputime_to_timeval(stime, &prstatus->pr_stime);
 	}
 	cputime_to_timeval(p->signal->cutime, &prstatus->pr_cutime);
 	cputime_to_timeval(p->signal->cstime, &prstatus->pr_cstime);
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index dc84732..cb240dd 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -1375,8 +1375,11 @@ static void fill_prstatus(struct elf_prstatus *prstatus,
 		cputime_to_timeval(cputime.utime, &prstatus->pr_utime);
 		cputime_to_timeval(cputime.stime, &prstatus->pr_stime);
 	} else {
-		cputime_to_timeval(p->utime, &prstatus->pr_utime);
-		cputime_to_timeval(p->stime, &prstatus->pr_stime);
+		cputime_t utime, stime;
+
+		task_cputime(p, &utime, &stime);
+		cputime_to_timeval(utime, &prstatus->pr_utime);
+		cputime_to_timeval(stime, &prstatus->pr_stime);
 	}
 	cputime_to_timeval(p->signal->cutime, &prstatus->pr_cutime);
 	cputime_to_timeval(p->signal->cstime, &prstatus->pr_cstime);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 547c1f0..031afd0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1769,6 +1769,24 @@ static inline void put_task_struct(struct task_struct *t)
 		__put_task_struct(t);
 }
 
+static inline void task_cputime(struct task_struct *t,
+				cputime_t *utime, cputime_t *stime)
+{
+	if (utime)
+		*utime = t->utime;
+	if (stime)
+		*stime = t->stime;
+}
+
+static inline void task_cputime_scaled(struct task_struct *t,
+				       cputime_t *utimescaled,
+				       cputime_t *stimescaled)
+{
+	if (utimescaled)
+		*utimescaled = t->utimescaled;
+	if (stimescaled)
+		*stimescaled = t->stimescaled;
+}
 extern void task_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st);
 extern void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st);
 
diff --git a/kernel/acct.c b/kernel/acct.c
index 051e071..e8b1627 100644
--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -566,6 +566,7 @@ out:
 void acct_collect(long exitcode, int group_dead)
 {
 	struct pacct_struct *pacct = &current->signal->pacct;
+	cputime_t utime, stime;
 	unsigned long vsize = 0;
 
 	if (group_dead && current->mm) {
@@ -593,8 +594,9 @@ void acct_collect(long exitcode, int group_dead)
 		pacct->ac_flag |= ACORE;
 	if (current->flags & PF_SIGNALED)
 		pacct->ac_flag |= AXSIG;
-	pacct->ac_utime += current->utime;
-	pacct->ac_stime += current->stime;
+	task_cputime(current, &utime, &stime);
+	pacct->ac_utime += utime;
+	pacct->ac_stime += stime;
 	pacct->ac_minflt += current->min_flt;
 	pacct->ac_majflt += current->maj_flt;
 	spin_unlock_irq(&current->sighand->siglock);
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 3046a50..e5d5e8e 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -224,11 +224,13 @@ void clear_tasks_mm_cpumask(int cpu)
 static inline void check_for_tasks(int cpu)
 {
 	struct task_struct *p;
+	cputime_t utime, stime;
 
 	write_lock_irq(&tasklist_lock);
 	for_each_process(p) {
+		task_cputime(p, &utime, &stime);
 		if (task_cpu(p) == cpu && p->state == TASK_RUNNING &&
-		    (p->utime || p->stime))
+		    (utime || stime))
 			printk(KERN_WARNING "Task %s (pid = %d) is on cpu %d "
 				"(state = %ld, flags = %x)\n",
 				p->comm, task_pid_nr(p), cpu,
diff --git a/kernel/delayacct.c b/kernel/delayacct.c
index 418b3f7..d473988 100644
--- a/kernel/delayacct.c
+++ b/kernel/delayacct.c
@@ -106,6 +106,7 @@ int __delayacct_add_tsk(struct taskstats *d, struct task_struct *tsk)
 	unsigned long long t2, t3;
 	unsigned long flags;
 	struct timespec ts;
+	cputime_t utime, stime, stimescaled, utimescaled;
 
 	/* Though tsk->delays accessed later, early exit avoids
 	 * unnecessary returning of other data
@@ -114,12 +115,14 @@ int __delayacct_add_tsk(struct taskstats *d, struct task_struct *tsk)
 		goto done;
 
 	tmp = (s64)d->cpu_run_real_total;
-	cputime_to_timespec(tsk->utime + tsk->stime, &ts);
+	task_cputime(tsk, &utime, &stime);
+	cputime_to_timespec(utime + stime, &ts);
 	tmp += timespec_to_ns(&ts);
 	d->cpu_run_real_total = (tmp < (s64)d->cpu_run_real_total) ? 0 : tmp;
 
 	tmp = (s64)d->cpu_scaled_run_real_total;
-	cputime_to_timespec(tsk->utimescaled + tsk->stimescaled, &ts);
+	task_cputime_scaled(tsk, &utimescaled, &stimescaled);
+	cputime_to_timespec(utimescaled + stimescaled, &ts);
 	tmp += timespec_to_ns(&ts);
 	d->cpu_scaled_run_real_total =
 		(tmp < (s64)d->cpu_scaled_run_real_total) ? 0 : tmp;
diff --git a/kernel/exit.c b/kernel/exit.c
index 50d2e93..46481c0 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -97,6 +97,7 @@ static void __exit_signal(struct task_struct *tsk)
 	bool group_dead = thread_group_leader(tsk);
 	struct sighand_struct *sighand;
 	struct tty_struct *uninitialized_var(tty);
+	cputime_t utime, stime;
 
 	sighand = rcu_dereference_check(tsk->sighand,
 					lockdep_tasklist_lock_is_held());
@@ -135,8 +136,9 @@ static void __exit_signal(struct task_struct *tsk)
 		 * We won't ever get here for the group leader, since it
 		 * will have been the last reference on the signal_struct.
 		 */
-		sig->utime += tsk->utime;
-		sig->stime += tsk->stime;
+		task_cputime(tsk, &utime, &stime);
+		sig->utime += utime;
+		sig->stime += stime;
 		sig->gtime += tsk->gtime;
 		sig->min_flt += tsk->min_flt;
 		sig->maj_flt += tsk->maj_flt;
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index d738402..3d58bd5 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -154,11 +154,19 @@ static void bump_cpu_timer(struct k_itimer *timer,
 
 static inline cputime_t prof_ticks(struct task_struct *p)
 {
-	return p->utime + p->stime;
+	cputime_t utime, stime;
+
+	task_cputime(p, &utime, &stime);
+
+	return utime + stime;
 }
 static inline cputime_t virt_ticks(struct task_struct *p)
 {
-	return p->utime;
+	cputime_t utime;
+
+	task_cputime(p, &utime, NULL);
+
+	return utime;
 }
 
 static int
@@ -470,16 +478,21 @@ static void cleanup_timers(struct list_head *head,
  */
 void posix_cpu_timers_exit(struct task_struct *tsk)
 {
+	cputime_t utime, stime;
+
+	task_cputime(tsk, &utime, &stime);
 	cleanup_timers(tsk->cpu_timers,
-		       tsk->utime, tsk->stime, tsk->se.sum_exec_runtime);
+		       utime, stime, tsk->se.sum_exec_runtime);
 
 }
 void posix_cpu_timers_exit_group(struct task_struct *tsk)
 {
 	struct signal_struct *const sig = tsk->signal;
+	cputime_t utime, stime;
 
+	task_cputime(tsk, &utime, &stime);
 	cleanup_timers(tsk->signal->cpu_timers,
-		       tsk->utime + sig->utime, tsk->stime + sig->stime,
+		       utime + sig->utime, stime + sig->stime,
 		       tsk->se.sum_exec_runtime + sig->sum_sched_runtime);
 }
 
@@ -1223,11 +1236,14 @@ static inline int task_cputime_expired(const struct task_cputime *sample,
 static inline int fastpath_timer_check(struct task_struct *tsk)
 {
 	struct signal_struct *sig;
+	cputime_t utime, stime;
+
+	task_cputime(tsk, &utime, &stime);
 
 	if (!task_cputime_zero(&tsk->cputime_expires)) {
 		struct task_cputime task_sample = {
-			.utime = tsk->utime,
-			.stime = tsk->stime,
+			.utime = utime,
+			.stime = stime,
 			.sum_exec_runtime = tsk->se.sum_exec_runtime
 		};
 
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index e1fcab4..0603671 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -296,6 +296,7 @@ static __always_inline bool steal_account_process_tick(void)
 void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 {
 	struct signal_struct *sig = tsk->signal;
+	cputime_t utime, stime;
 	struct task_struct *t;
 
 	times->utime = sig->utime;
@@ -309,8 +310,9 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 
 	t = tsk;
 	do {
-		times->utime += t->utime;
-		times->stime += t->stime;
+		task_cputime(tsk, &utime, &stime);
+		times->utime += utime;
+		times->stime += stime;
 		times->sum_exec_runtime += task_sched_runtime(t);
 	} while_each_thread(tsk, t);
 out:
@@ -594,11 +596,10 @@ static void cputime_adjust(struct task_cputime *curr,
 void task_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st)
 {
 	struct task_cputime cputime = {
-		.utime = p->utime,
-		.stime = p->stime,
 		.sum_exec_runtime = p->se.sum_exec_runtime,
 	};
 
+	task_cputime(p, &cputime.utime, &cputime.stime);
 	cputime_adjust(&cputime, &p->prev_cputime, ut, st);
 }
 
diff --git a/kernel/signal.c b/kernel/signal.c
index a49c7f3..bc9e5cd 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1637,6 +1637,7 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 	unsigned long flags;
 	struct sighand_struct *psig;
 	bool autoreap = false;
+	cputime_t utime, stime;
 
 	BUG_ON(sig == -1);
 
@@ -1674,8 +1675,9 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 				       task_uid(tsk));
 	rcu_read_unlock();
 
-	info.si_utime = cputime_to_clock_t(tsk->utime + tsk->signal->utime);
-	info.si_stime = cputime_to_clock_t(tsk->stime + tsk->signal->stime);
+	task_cputime(tsk, &utime, &stime);
+	info.si_utime = cputime_to_clock_t(utime + tsk->signal->utime);
+	info.si_stime = cputime_to_clock_t(stime + tsk->signal->stime);
 
 	info.si_status = tsk->exit_code & 0x7f;
 	if (tsk->exit_code & 0x80)
@@ -1739,6 +1741,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 	unsigned long flags;
 	struct task_struct *parent;
 	struct sighand_struct *sighand;
+	cputime_t utime, stime;
 
 	if (for_ptracer) {
 		parent = tsk->parent;
@@ -1757,8 +1760,9 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 	info.si_uid = from_kuid_munged(task_cred_xxx(parent, user_ns), task_uid(tsk));
 	rcu_read_unlock();
 
-	info.si_utime = cputime_to_clock_t(tsk->utime);
-	info.si_stime = cputime_to_clock_t(tsk->stime);
+	task_cputime(tsk, &utime, &stime);
+	info.si_utime = cputime_to_clock_t(utime);
+	info.si_stime = cputime_to_clock_t(stime);
 
  	info.si_code = why;
  	switch (why) {
diff --git a/kernel/tsacct.c b/kernel/tsacct.c
index 625df0b..017181f 100644
--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -32,6 +32,7 @@ void bacct_add_tsk(struct user_namespace *user_ns,
 {
 	const struct cred *tcred;
 	struct timespec uptime, ts;
+	cputime_t utime, stime, utimescaled, stimescaled;
 	u64 ac_etime;
 
 	BUILD_BUG_ON(TS_COMM_LEN < TASK_COMM_LEN);
@@ -65,10 +66,15 @@ void bacct_add_tsk(struct user_namespace *user_ns,
 	stats->ac_ppid	 = pid_alive(tsk) ?
 		task_tgid_nr_ns(rcu_dereference(tsk->real_parent), pid_ns) : 0;
 	rcu_read_unlock();
-	stats->ac_utime = cputime_to_usecs(tsk->utime);
-	stats->ac_stime = cputime_to_usecs(tsk->stime);
-	stats->ac_utimescaled = cputime_to_usecs(tsk->utimescaled);
-	stats->ac_stimescaled = cputime_to_usecs(tsk->stimescaled);
+
+	task_cputime(tsk, &utime, &stime);
+	stats->ac_utime = cputime_to_usecs(utime);
+	stats->ac_stime = cputime_to_usecs(stime);
+
+	task_cputime_scaled(tsk, &utimescaled, &stimescaled);
+	stats->ac_utimescaled = cputime_to_usecs(utimescaled);
+	stats->ac_stimescaled = cputime_to_usecs(stimescaled);
+
 	stats->ac_minflt = tsk->min_flt;
 	stats->ac_majflt = tsk->maj_flt;
 
@@ -122,13 +128,14 @@ void xacct_add_tsk(struct taskstats *stats, struct task_struct *p)
 void acct_update_integrals(struct task_struct *tsk)
 {
 	if (likely(tsk->mm)) {
-		cputime_t time, dtime;
+		cputime_t time, dtime, stime, utime;
 		struct timeval value;
 		unsigned long flags;
 		u64 delta;
 
 		local_irq_save(flags);
-		time = tsk->stime + tsk->utime;
+		task_cputime(tsk, &utime, &stime);
+		time = stime + utime;
 		dtime = time - tsk->acct_timexpd;
 		jiffies_to_timeval(cputime_to_jiffies(dtime), &value);
 		delta = value.tv_sec;
-- 
1.7.5.4



* [PATCH 05/24] cputime: Safely read cputime of full dynticks CPUs
From: Frederic Weisbecker @ 2012-12-20 18:32 UTC
  To: LKML
  Cc: Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Ingo Molnar, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

While remotely reading the cputime of a task running on a
full dynticks CPU, the values stored in the utime/stime fields
of struct task_struct may be stale. They may reflect the
last kernel <-> user transition snapshot, so we need to add
the tickless time spent since that snapshot.

To fix this, flush the cputime of the dynticks CPUs on
kernel <-> user transitions and record the time / context
where we did this. Then, on top of this snapshot and the current
time, perform the fixup on the reader side from the task_times()
accessors.

FIXME: do the same for idle and guest time.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 arch/s390/kernel/vtime.c      |    4 +-
 include/asm-generic/cputime.h |    1 +
 include/linux/hardirq.h       |    4 +-
 include/linux/init_task.h     |    9 +++
 include/linux/sched.h         |   16 +++++
 include/linux/vtime.h         |   40 +++++++-------
 kernel/context_tracking.c     |    2 +-
 kernel/fork.c                 |    6 ++
 kernel/sched/cputime.c        |  123 ++++++++++++++++++++++++++++++-----------
 kernel/softirq.c              |    6 +-
 10 files changed, 151 insertions(+), 60 deletions(-)

diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c
index e84b8b6..e1718fb 100644
--- a/arch/s390/kernel/vtime.c
+++ b/arch/s390/kernel/vtime.c
@@ -127,7 +127,7 @@ void vtime_account_user(struct task_struct *tsk)
  * Update process times based on virtual cpu times stored by entry.S
  * to the lowcore fields user_timer, system_timer & steal_clock.
  */
-void vtime_account(struct task_struct *tsk)
+void vtime_account_irq_enter(struct task_struct *tsk)
 {
 	struct thread_info *ti = task_thread_info(tsk);
 	u64 timer, system;
@@ -148,7 +148,7 @@ void vtime_account(struct task_struct *tsk)
 EXPORT_SYMBOL_GPL(vtime_account);
 
 void vtime_account_system(struct task_struct *tsk)
-__attribute__((alias("vtime_account")));
+__attribute__((alias("vtime_account_irq_enter")));
 EXPORT_SYMBOL_GPL(vtime_account_system);
 
 void __kprobes vtime_stop_cpu(void)
diff --git a/include/asm-generic/cputime.h b/include/asm-generic/cputime.h
index 9a62937..3e704d5 100644
--- a/include/asm-generic/cputime.h
+++ b/include/asm-generic/cputime.h
@@ -10,6 +10,7 @@ typedef unsigned long __nocast cputime_t;
 #define cputime_to_jiffies(__ct)	(__force unsigned long)(__ct)
 #define cputime_to_scaled(__ct)		(__ct)
 #define jiffies_to_cputime(__hz)	(__force cputime_t)(__hz)
+#define jiffies_to_scaled(__hz)		(__force cputime_t)(__hz)
 
 typedef u64 __nocast cputime64_t;
 
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 624ef3f..7105d5c 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -153,7 +153,7 @@ extern void rcu_nmi_exit(void);
  */
 #define __irq_enter()					\
 	do {						\
-		vtime_account_irq_enter(current);	\
+		account_irq_enter_time(current);	\
 		add_preempt_count(HARDIRQ_OFFSET);	\
 		trace_hardirq_enter();			\
 	} while (0)
@@ -169,7 +169,7 @@ extern void irq_enter(void);
 #define __irq_exit()					\
 	do {						\
 		trace_hardirq_exit();			\
-		vtime_account_irq_exit(current);	\
+		account_irq_exit_time(current);		\
 		sub_preempt_count(HARDIRQ_OFFSET);	\
 	} while (0)
 
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 6d087c5..870f13e 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -10,6 +10,7 @@
 #include <linux/pid_namespace.h>
 #include <linux/user_namespace.h>
 #include <linux/securebits.h>
+#include <linux/seqlock.h>
 #include <net/net_namespace.h>
 
 #ifdef CONFIG_SMP
@@ -141,6 +142,13 @@ extern struct task_group root_task_group;
 # define INIT_PERF_EVENTS(tsk)
 #endif
 
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+#define INIT_VTIME(tsk)						\
+	.vtime_seqlock = __SEQLOCK_UNLOCKED(tsk.vtime_seqlock),	\
+	.prev_jiffies = INITIAL_JIFFIES, /* CHECKME */		\
+	.prev_jiffies_whence = JIFFIES_SYS,
+#endif
+
 #define INIT_TASK_COMM "swapper"
 
 /*
@@ -210,6 +218,7 @@ extern struct task_group root_task_group;
 	INIT_TRACE_RECURSION						\
 	INIT_TASK_RCU_PREEMPT(tsk)					\
 	INIT_CPUSET_SEQ							\
+	INIT_VTIME(tsk)							\
 }
 
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 031afd0..727b988 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1360,6 +1360,15 @@ struct task_struct {
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 	struct cputime prev_cputime;
 #endif
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+	seqlock_t vtime_seqlock;
+	long prev_jiffies;
+	enum {
+		JIFFIES_SLEEPING = 0,
+		JIFFIES_USER,
+		JIFFIES_SYS,
+	} prev_jiffies_whence;
+#endif
 	unsigned long nvcsw, nivcsw; /* context switch counts */
 	struct timespec start_time; 		/* monotonic time */
 	struct timespec real_start_time;	/* boot based time */
@@ -1769,6 +1778,12 @@ static inline void put_task_struct(struct task_struct *t)
 		__put_task_struct(t);
 }
 
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+extern void task_cputime(struct task_struct *t,
+			 cputime_t *utime, cputime_t *stime);
+extern void task_cputime_scaled(struct task_struct *t,
+				cputime_t *utimescaled, cputime_t *stimescaled);
+#else
 static inline void task_cputime(struct task_struct *t,
 				cputime_t *utime, cputime_t *stime)
 {
@@ -1787,6 +1802,7 @@ static inline void task_cputime_scaled(struct task_struct *t,
 	if (stimescaled)
 		*stimescaled = t->stimescaled;
 }
+#endif
 extern void task_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st);
 extern void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st);
 
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index e57020d..81c7d84 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -9,52 +9,52 @@ extern void vtime_account_system(struct task_struct *tsk);
 extern void vtime_account_system_irqsafe(struct task_struct *tsk);
 extern void vtime_account_idle(struct task_struct *tsk);
 extern void vtime_account_user(struct task_struct *tsk);
-extern void vtime_account(struct task_struct *tsk);
+extern void vtime_account_irq_enter(struct task_struct *tsk);
 
-#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
-extern bool vtime_accounting(void);
-#else
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 static inline bool vtime_accounting(void) { return true; }
 #endif
 
 #else /* !CONFIG_VIRT_CPU_ACCOUNTING */
+
 static inline void vtime_task_switch(struct task_struct *prev) { }
 static inline void vtime_account_system(struct task_struct *tsk) { }
 static inline void vtime_account_system_irqsafe(struct task_struct *tsk) { }
 static inline void vtime_account_user(struct task_struct *tsk) { }
-static inline void vtime_account(struct task_struct *tsk) { }
+static inline void vtime_account_irq_enter(struct task_struct *tsk) { }
 static inline bool vtime_accounting(void) { return false; }
 #endif
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
-static inline void arch_vtime_task_switch(struct task_struct *tsk) { }
+extern void arch_vtime_task_switch(struct task_struct *tsk);
+extern void vtime_account_irq_exit(struct task_struct *tsk);
+extern void vtime_user_enter(struct task_struct *tsk);
+extern bool vtime_accounting(void);
+#else
+static inline void vtime_account_irq_exit(struct task_struct *tsk)
+{
+	/* On hard|softirq exit we always account to hard|softirq cputime */
+	vtime_account_system(tsk);
+}
+static inline void vtime_enter_user(struct task_struct *tsk) { }
 #endif
 
+
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
 extern void irqtime_account_irq(struct task_struct *tsk);
 #else
 static inline void irqtime_account_irq(struct task_struct *tsk) { }
 #endif
 
-static inline void vtime_account_irq_enter(struct task_struct *tsk)
+static inline void account_irq_enter_time(struct task_struct *tsk)
 {
-	/*
-	 * Hardirq can interrupt idle task anytime. So we need vtime_account()
-	 * that performs the idle check in CONFIG_VIRT_CPU_ACCOUNTING.
-	 * Softirq can also interrupt idle task directly if it calls
-	 * local_bh_enable(). Such case probably don't exist but we never know.
-	 * Ksoftirqd is not concerned because idle time is flushed on context
-	 * switch. Softirqs in the end of hardirqs are also not a problem because
-	 * the idle time is flushed on hardirq time already.
-	 */
-	vtime_account(tsk);
+	vtime_account_irq_enter(tsk);
 	irqtime_account_irq(tsk);
 }
 
-static inline void vtime_account_irq_exit(struct task_struct *tsk)
+static inline void account_irq_exit_time(struct task_struct *tsk)
 {
-	/* On hard|softirq exit we always account to hard|softirq cputime */
-	vtime_account_system(tsk);
+	vtime_account_irq_exit(tsk);
 	irqtime_account_irq(tsk);
 }
 
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index ca1e073..bd2f2fc 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -56,7 +56,7 @@ void user_enter(void)
 	local_irq_save(flags);
 	if (__this_cpu_read(context_tracking.active) &&
 	    __this_cpu_read(context_tracking.state) != IN_USER) {
-		vtime_account_system(current);
+		vtime_user_enter(current);
 		/*
 		 * At this stage, only low level arch entry code remains and
 		 * then we'll run in userspace. We can assume there won't be
diff --git a/kernel/fork.c b/kernel/fork.c
index a81efb8..efafcba 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1224,6 +1224,12 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 	p->prev_cputime.utime = p->prev_cputime.stime = 0;
 #endif
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+	seqlock_init(&p->vtime_seqlock);
+	p->prev_jiffies_whence = JIFFIES_SLEEPING; /*CHECKME: idle tasks? */
+	p->prev_jiffies = jiffies;
+#endif
+
 #if defined(SPLIT_RSS_COUNTING)
 	memset(&p->rss_stat, 0, sizeof(p->rss_stat));
 #endif
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 0603671..3f25e60 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -484,7 +484,7 @@ void vtime_task_switch(struct task_struct *prev)
  * vtime_account().
  */
 #ifndef __ARCH_HAS_VTIME_ACCOUNT
-void vtime_account(struct task_struct *tsk)
+void vtime_account_irq_enter(struct task_struct *tsk)
 {
 	if (!in_interrupt()) {
 		/*
@@ -505,7 +505,7 @@ void vtime_account(struct task_struct *tsk)
 	}
 	vtime_account_system(tsk);
 }
-EXPORT_SYMBOL_GPL(vtime_account);
+EXPORT_SYMBOL_GPL(vtime_account_irq_enter);
 #endif /* __ARCH_HAS_VTIME_ACCOUNT */
 #endif /* CONFIG_VIRT_CPU_ACCOUNTING */
 
@@ -616,41 +616,67 @@ void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime
 #endif /* !CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
-static DEFINE_PER_CPU(long, last_jiffies) = INITIAL_JIFFIES;
-
-static cputime_t get_vtime_delta(void)
+static cputime_t get_vtime_delta(struct task_struct *tsk)
 {
 	long delta;
 
-	delta = jiffies - __this_cpu_read(last_jiffies);
-	__this_cpu_add(last_jiffies, delta);
+	delta = jiffies - tsk->prev_jiffies;
+	tsk->prev_jiffies += delta;
 
 	return jiffies_to_cputime(delta);
 }
 
-void vtime_account_system(struct task_struct *tsk)
+static void __vtime_account_system(struct task_struct *tsk)
 {
-	cputime_t delta_cpu = get_vtime_delta();
+	cputime_t delta_cpu = get_vtime_delta(tsk);
 
 	account_system_time(tsk, irq_count(), delta_cpu, cputime_to_scaled(delta_cpu));
 }
 
+void vtime_account_system(struct task_struct *tsk)
+{
+	write_seqlock(&tsk->vtime_seqlock);
+	__vtime_account_system(tsk);
+	write_sequnlock(&tsk->vtime_seqlock);
+}
+
+void vtime_account_irq_exit(struct task_struct *tsk)
+{
+	write_seqlock(&tsk->vtime_seqlock);
+	if (context_tracking_in_user())
+		tsk->prev_jiffies_whence = JIFFIES_USER;
+	__vtime_account_system(tsk);
+	write_sequnlock(&tsk->vtime_seqlock);
+}
+
 void vtime_account_user(struct task_struct *tsk)
 {
-	cputime_t delta_cpu = get_vtime_delta();
+	cputime_t delta_cpu = get_vtime_delta(tsk);
 
 	/*
 	 * This is an unfortunate hack: if we flush user time only on
 	 * irq entry, we miss the jiffies update and the time is spuriously
 	 * accounted to system time.
 	 */
-	if (context_tracking_in_user())
+	if (context_tracking_in_user()) {
+		write_seqlock(&tsk->vtime_seqlock);
+		tsk->prev_jiffies_whence = JIFFIES_SYS;
 		account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
+		write_sequnlock(&tsk->vtime_seqlock);
+	}
+}
+
+void vtime_user_enter(struct task_struct *tsk)
+{
+	write_seqlock(&tsk->vtime_seqlock);
+	tsk->prev_jiffies_whence = JIFFIES_USER;
+	__vtime_account_system(tsk);
+	write_sequnlock(&tsk->vtime_seqlock);
 }
 
 void vtime_account_idle(struct task_struct *tsk)
 {
-	cputime_t delta_cpu = get_vtime_delta();
+	cputime_t delta_cpu = get_vtime_delta(tsk);
 
 	account_idle_time(delta_cpu);
 }
@@ -660,31 +686,64 @@ bool vtime_accounting(void)
 	return context_tracking_active();
 }
 
-static int __cpuinit vtime_cpu_notify(struct notifier_block *self,
-				      unsigned long action, void *hcpu)
+void arch_vtime_task_switch(struct task_struct *prev)
 {
-	long cpu = (long)hcpu;
-	long *last_jiffies_cpu = per_cpu_ptr(&last_jiffies, cpu);
+	write_seqlock(&prev->vtime_seqlock);
+	prev->prev_jiffies_whence = JIFFIES_SLEEPING;
+	write_sequnlock(&prev->vtime_seqlock);
 
-	switch (action) {
-	case CPU_UP_PREPARE:
-	case CPU_UP_PREPARE_FROZEN:
-		/*
-		 * CHECKME: ensure that's visible by the CPU
-		 * once it wakes up
-		 */
-		*last_jiffies_cpu = jiffies;
-	default:
-		break;
-	}
+	write_seqlock(&current->vtime_seqlock);
+	current->prev_jiffies_whence = JIFFIES_SYS;
+	current->prev_jiffies = jiffies;
+	write_sequnlock(&current->vtime_seqlock);
+}
+
+void task_cputime(struct task_struct *t, cputime_t *utime, cputime_t *stime)
+{
+	unsigned int seq;
+	long delta;
+
+	do {
+		seq = read_seqbegin(&t->vtime_seqlock);
+
+		*utime = t->utime;
+		*stime = t->stime;
+
+		if (t->prev_jiffies_whence == JIFFIES_SLEEPING || 
+		    is_idle_task(t))
+			continue;
 
-	return NOTIFY_OK;
+		delta = jiffies - t->prev_jiffies;
+
+		if (t->prev_jiffies_whence == JIFFIES_USER)
+			*utime += delta;
+		else if (t->prev_jiffies_whence == JIFFIES_SYS)
+			*stime += delta;
+	} while (read_seqretry(&t->vtime_seqlock, seq));
 }
 
-static int __init init_vtime(void)
+void task_cputime_scaled(struct task_struct *t,
+			 cputime_t *utimescaled, cputime_t *stimescaled)
 {
-	cpu_notifier(vtime_cpu_notify, 0);
-	return 0;
+	unsigned int seq;
+	long delta;
+
+	do {
+		seq = read_seqbegin(&t->vtime_seqlock);
+
+		*utimescaled = t->utimescaled;
+		*stimescaled = t->stimescaled;
+
+		if (t->prev_jiffies_whence == JIFFIES_SLEEPING || 
+		    is_idle_task(t))
+			continue;
+
+		delta = jiffies - t->prev_jiffies;
+
+		if (t->prev_jiffies_whence == JIFFIES_USER)
+			*utimescaled += jiffies_to_scaled(delta);
+		else if (t->prev_jiffies_whence == JIFFIES_SYS)
+			*stimescaled += jiffies_to_scaled(delta);
+	} while (read_seqretry(&t->vtime_seqlock, seq));
 }
-early_initcall(init_vtime);
 #endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */
diff --git a/kernel/softirq.c b/kernel/softirq.c
index ed567ba..f5cc25f 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -221,7 +221,7 @@ asmlinkage void __do_softirq(void)
 	current->flags &= ~PF_MEMALLOC;
 
 	pending = local_softirq_pending();
-	vtime_account_irq_enter(current);
+	account_irq_enter_time(current);
 
 	__local_bh_disable((unsigned long)__builtin_return_address(0),
 				SOFTIRQ_OFFSET);
@@ -272,7 +272,7 @@ restart:
 
 	lockdep_softirq_exit();
 
-	vtime_account_irq_exit(current);
+	account_irq_exit_time(current);
 	__local_bh_enable(SOFTIRQ_OFFSET);
 	tsk_restore_flags(current, old_flags, PF_MEMALLOC);
 }
@@ -341,7 +341,7 @@ static inline void invoke_softirq(void)
  */
 void irq_exit(void)
 {
-	vtime_account_irq_exit(current);
+	account_irq_exit_time(current);
 	trace_hardirq_exit();
 	sub_preempt_count(IRQ_EXIT_OFFSET);
 	if (!in_interrupt() && local_softirq_pending())
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 06/24] nohz: Basic full dynticks interface
  2012-12-20 18:32 [ANNOUNCE] 3.7-nohz1 Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2012-12-20 18:32 ` [PATCH 05/24] cputime: Safely read cputime of full dynticks CPUs Frederic Weisbecker
@ 2012-12-20 18:32 ` Frederic Weisbecker
  2012-12-20 18:32 ` [PATCH 07/24] nohz: Assign timekeeping duty to a non-full-nohz CPU Frederic Weisbecker
                   ` (19 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Frederic Weisbecker @ 2012-12-20 18:32 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Ingo Molnar, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

Start with a very simple interface to define the full dynticks CPUs:
a boot-time cpumask passed through the "full_nohz="
kernel parameter.

Make sure you keep at least one CPU outside this range to handle
the timekeeping.

Also, the full_nohz= value must match the rcu_nocbs= value.
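
For illustration, here is a rough userspace approximation of the cpulist
parsing that the "full_nohz=" value goes through (the function name and
the plain bitmask are made up for the example; the kernel uses
cpulist_parse() and a cpumask):

	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>

	/* Parse a cpulist string such as "1-3" or "1,3,5" into a bitmask. */
	static unsigned long parse_cpulist(const char *str)
	{
		unsigned long mask = 0;
		char *s = strdup(str), *tok, *save = NULL;

		for (tok = strtok_r(s, ",", &save); tok; tok = strtok_r(NULL, ",", &save)) {
			int lo, hi;

			if (sscanf(tok, "%d-%d", &lo, &hi) != 2)
				lo = hi = atoi(tok);
			for (int cpu = lo; cpu <= hi; cpu++)
				mask |= 1UL << cpu;
		}
		free(s);
		return mask;
	}

	int main(void)
	{
		/* full_nohz=1-3 on a 4 CPU box: CPUs 1, 2 and 3 are full dynticks. */
		printf("mask = 0x%lx\n", parse_cpulist("1-3"));	/* prints 0xe */
		return 0;
	}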

Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/tick.h     |    7 +++++++
 kernel/time/Kconfig      |    9 +++++++++
 kernel/time/tick-sched.c |   23 +++++++++++++++++++++++
 3 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 553272e..2d4f6f0 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -157,6 +157,13 @@ static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return -1; }
 static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
 # endif /* !NO_HZ */
 
+#ifdef CONFIG_NO_HZ_FULL
+int tick_nohz_full_cpu(int cpu);
+#else
+static inline int tick_nohz_full_cpu(int cpu) { return 0; }
+#endif
+
+
 # ifdef CONFIG_CPU_IDLE_GOV_MENU
 extern void menu_hrtimer_cancel(void);
 # else
diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index 8601f0d..0a1bc72 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -70,6 +70,15 @@ config NO_HZ
 	  only trigger on an as-needed basis both when the system is
 	  busy and when the system is idle.
 
+config NO_HZ_FULL
+       bool "Full tickless system"
+       depends on NO_HZ && RCU_USER_QS && VIRT_CPU_ACCOUNTING_GEN && RCU_NOCB_CPU
+       select CONTEXT_TRACKING_FORCE
+       help
+         Try to be tickless everywhere, not just in idle. (You need
+         to set the full_nohz= boot parameter.)
+
+
 config HIGH_RES_TIMERS
 	bool "High Resolution Timer Support"
 	depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index ad0e6fa..fac9ba4 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -142,6 +142,29 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
 	profile_tick(CPU_PROFILING);
 }
 
+#ifdef CONFIG_NO_HZ_FULL
+static cpumask_var_t full_nohz_mask;
+bool have_full_nohz_mask;
+
+int tick_nohz_full_cpu(int cpu)
+{
+	if (!have_full_nohz_mask)
+		return 0;
+
+	return cpumask_test_cpu(cpu, full_nohz_mask);
+}
+
+/* Parse the boot-time nohz CPU list from the kernel parameters. */
+static int __init tick_nohz_full_setup(char *str)
+{
+	alloc_bootmem_cpumask_var(&full_nohz_mask);
+	have_full_nohz_mask = true;
+	cpulist_parse(str, full_nohz_mask);
+	return 1;
+}
+__setup("full_nohz=", tick_nohz_full_setup);
+#endif
+
 /*
  * NOHZ - aka dynamic tick functionality
  */
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 07/24] nohz: Assign timekeeping duty to a non-full-nohz CPU
  2012-12-20 18:32 [ANNOUNCE] 3.7-nohz1 Frederic Weisbecker
                   ` (5 preceding siblings ...)
  2012-12-20 18:32 ` [PATCH 06/24] nohz: Basic full dynticks interface Frederic Weisbecker
@ 2012-12-20 18:32 ` Frederic Weisbecker
  2012-12-21 16:13   ` Steven Rostedt
  2012-12-20 18:32 ` [PATCH 08/24] nohz: Trace timekeeping update Frederic Weisbecker
                   ` (18 subsequent siblings)
  25 siblings, 1 reply; 44+ messages in thread
From: Frederic Weisbecker @ 2012-12-20 18:32 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Ingo Molnar, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

This way the full nohz CPUs can safely run with the tick
stopped, with a guarantee that somebody else is taking
care of the jiffies and GTOD progression.

NOTE: this doesn't handle CPU hotplug. Also, we could do something
more elaborate with respect to power saving if more than one non-full-nohz
CPU is running. But let's use this KISS solution for now.
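
To make the assignment rule concrete, here is a small standalone sketch
of the duty hand-off this patch implements. The helpers and the CPU walk
below are made up for the example; the real changes are in the diff that
follows:

	#include <stdio.h>

	#define TICK_DO_TIMER_NONE	-1

	/* Made-up stand-in: CPUs 1-3 are full dynticks, CPU 0 is not. */
	static int full_nohz_cpu(int cpu)
	{
		return cpu >= 1 && cpu <= 3;
	}

	/*
	 * Only a CPU outside the full_nohz range may grab the do_timer duty,
	 * so the timekeeping always ends up on a housekeeping CPU (CPU 0 here).
	 */
	static int take_do_timer_duty(int owner, int cpu)
	{
		if (owner != TICK_DO_TIMER_NONE)
			return owner;			/* duty already assigned */
		if (full_nohz_cpu(cpu))
			return TICK_DO_TIMER_NONE;	/* full nohz CPUs never take it */
		return cpu;
	}

	int main(void)
	{
		int owner = TICK_DO_TIMER_NONE;

		/* Walk the CPUs in an arbitrary order: only CPU 0 can take the duty. */
		for (int cpu = 3; cpu >= 0; cpu--) {
			owner = take_do_timer_duty(owner, cpu);
			printf("cpu%d checked, do_timer owner: %d\n", cpu, owner);
		}
		return 0;
	}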

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/tick-broadcast.c |    3 ++-
 kernel/time/tick-common.c    |    5 ++++-
 kernel/time/tick-sched.c     |    7 ++++++-
 3 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index f113755..596c547 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -537,7 +537,8 @@ void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
 		bc->event_handler = tick_handle_oneshot_broadcast;
 
 		/* Take the do_timer update */
-		tick_do_timer_cpu = cpu;
+		if (!tick_nohz_full_cpu(cpu))
+			tick_do_timer_cpu = cpu;
 
 		/*
 		 * We must be careful here. There might be other CPUs
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index b1600a6..83f2bd9 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -163,7 +163,10 @@ static void tick_setup_device(struct tick_device *td,
 		 * this cpu:
 		 */
 		if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
-			tick_do_timer_cpu = cpu;
+			if (!tick_nohz_full_cpu(cpu))
+				tick_do_timer_cpu = cpu;
+			else
+				tick_do_timer_cpu = TICK_DO_TIMER_NONE;
 			tick_next_period = ktime_get();
 			tick_period = ktime_set(0, NSEC_PER_SEC / HZ);
 		}
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index fac9ba4..4a68b50 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -112,7 +112,8 @@ static void tick_sched_do_timer(ktime_t now)
 	 * this duty, then the jiffies update is still serialized by
 	 * jiffies_lock.
 	 */
-	if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE))
+	if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)
+	    && !tick_nohz_full_cpu(cpu))
 		tick_do_timer_cpu = cpu;
 #endif
 
@@ -512,6 +513,10 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 		return false;
 	}
 
+	/* If there are full nohz CPUs around, we need to keep the timekeeping duty */
+	if (have_full_nohz_mask && tick_do_timer_cpu == cpu)
+		return false;
+
 	return true;
 }
 
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 08/24] nohz: Trace timekeeping update
  2012-12-20 18:32 [ANNOUNCE] 3.7-nohz1 Frederic Weisbecker
                   ` (6 preceding siblings ...)
  2012-12-20 18:32 ` [PATCH 07/24] nohz: Assign timekeeping duty to a non-full-nohz CPU Frederic Weisbecker
@ 2012-12-20 18:32 ` Frederic Weisbecker
  2012-12-20 18:32 ` [PATCH 09/24] nohz: Wake up full dynticks CPUs when a timer gets enqueued Frederic Weisbecker
                   ` (17 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Frederic Weisbecker @ 2012-12-20 18:32 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Ingo Molnar, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

Not for merge. This may become a real tracepoint.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/tick-sched.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 4a68b50..73f339b 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -118,8 +118,10 @@ static void tick_sched_do_timer(ktime_t now)
 #endif
 
 	/* Check, if the jiffies need an update */
-	if (tick_do_timer_cpu == cpu)
+	if (tick_do_timer_cpu == cpu) {
+		trace_printk("do timekeeping\n");
 		tick_do_update_jiffies64(now);
+	}
 }
 
 static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 09/24] nohz: Wake up full dynticks CPUs when a timer gets enqueued
  2012-12-20 18:32 [ANNOUNCE] 3.7-nohz1 Frederic Weisbecker
                   ` (7 preceding siblings ...)
  2012-12-20 18:32 ` [PATCH 08/24] nohz: Trace timekeeping update Frederic Weisbecker
@ 2012-12-20 18:32 ` Frederic Weisbecker
  2012-12-20 18:32 ` [PATCH 10/24] rcu: Restart the tick on non-responding full dynticks CPUs Frederic Weisbecker
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Frederic Weisbecker @ 2012-12-20 18:32 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Ingo Molnar, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

Wake up a CPU when a timer list timer is enqueued there while
the CPU is in full dynticks mode. Sending it an IPI makes it
reconsider the next timer to program on top of the recent
updates.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/sched.h |    4 ++--
 kernel/sched/core.c   |   18 +++++++++++++++++-
 kernel/timer.c        |    2 +-
 3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 727b988..8a89dc6 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2038,9 +2038,9 @@ static inline void idle_task_exit(void) {}
 #endif
 
 #if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
-extern void wake_up_idle_cpu(int cpu);
+extern void wake_up_nohz_cpu(int cpu);
 #else
-static inline void wake_up_idle_cpu(int cpu) { }
+static inline void wake_up_nohz_cpu(int cpu) { }
 #endif
 
 extern unsigned int sysctl_sched_latency;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6271b89..1ca0a66 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -579,7 +579,7 @@ unlock:
  * account when the CPU goes back to idle and evaluates the timer
  * wheel for the next timer event.
  */
-void wake_up_idle_cpu(int cpu)
+static void wake_up_idle_cpu(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
 
@@ -609,6 +609,22 @@ void wake_up_idle_cpu(int cpu)
 		smp_send_reschedule(cpu);
 }
 
+static bool wake_up_full_nohz_cpu(int cpu)
+{
+	if (tick_nohz_full_cpu(cpu)) {
+		smp_send_reschedule(cpu);
+		return true;
+	}
+
+	return false;
+}
+
+void wake_up_nohz_cpu(int cpu)
+{
+	if (!wake_up_full_nohz_cpu(cpu))
+		wake_up_idle_cpu(cpu);
+}
+
 static inline bool got_nohz_idle_kick(void)
 {
 	int cpu = smp_processor_id();
diff --git a/kernel/timer.c b/kernel/timer.c
index ff3b516..970b57d 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -936,7 +936,7 @@ void add_timer_on(struct timer_list *timer, int cpu)
 	 * makes sure that a CPU on the way to idle can not evaluate
 	 * the timer wheel.
 	 */
-	wake_up_idle_cpu(cpu);
+	wake_up_nohz_cpu(cpu);
 	spin_unlock_irqrestore(&base->lock, flags);
 }
 EXPORT_SYMBOL_GPL(add_timer_on);
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 10/24] rcu: Restart the tick on non-responding full dynticks CPUs
  2012-12-20 18:32 [ANNOUNCE] 3.7-nohz1 Frederic Weisbecker
                   ` (8 preceding siblings ...)
  2012-12-20 18:32 ` [PATCH 09/24] nohz: Wake up full dynticks CPUs when a timer gets enqueued Frederic Weisbecker
@ 2012-12-20 18:32 ` Frederic Weisbecker
  2012-12-20 18:32 ` [PATCH 11/24] sched: Comment on rq->clock correctness in ttwu_do_wakeup() in nohz Frederic Weisbecker
                   ` (15 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Frederic Weisbecker @ 2012-12-20 18:32 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Ingo Molnar, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

When a CPU in full dynticks mode doesn't respond in time to complete
a grace period, send it a specific IPI so that it restarts
the tick and chases a quiescent state.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/rcutree.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index e441b77..302d360 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -53,6 +53,7 @@
 #include <linux/delay.h>
 #include <linux/stop_machine.h>
 #include <linux/random.h>
+#include <linux/tick.h>
 
 #include "rcutree.h"
 #include <trace/events/rcu.h>
@@ -743,6 +744,12 @@ static int dyntick_save_progress_counter(struct rcu_data *rdp)
 	return (rdp->dynticks_snap & 0x1) == 0;
 }
 
+static void rcu_kick_nohz_cpu(int cpu)
+{
+	if (tick_nohz_full_cpu(cpu))
+		smp_send_reschedule(cpu);
+}
+
 /*
  * Return true if the specified CPU has passed through a quiescent
  * state by virtue of being in or having passed through an dynticks
@@ -790,6 +797,9 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
 		rdp->offline_fqs++;
 		return 1;
 	}
+
+	rcu_kick_nohz_cpu(rdp->cpu);
+
 	return 0;
 }
 
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 11/24] sched: Comment on rq->clock correctness in ttwu_do_wakeup() in nohz
  2012-12-20 18:32 [ANNOUNCE] 3.7-nohz1 Frederic Weisbecker
                   ` (9 preceding siblings ...)
  2012-12-20 18:32 ` [PATCH 10/24] rcu: Restart the tick on non-responding full dynticks CPUs Frederic Weisbecker
@ 2012-12-20 18:32 ` Frederic Weisbecker
  2012-12-20 18:32 ` [PATCH 12/24] sched: Update rq clock on nohz CPU before migrating tasks Frederic Weisbecker
                   ` (14 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Frederic Weisbecker @ 2012-12-20 18:32 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Ingo Molnar, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

Just to avoid confusion.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/core.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1ca0a66..02bd005 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1279,6 +1279,12 @@ ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
 	if (p->sched_class->task_woken)
 		p->sched_class->task_woken(rq, p);
 
+	/*
+	 * For the adaptive nohz case: we called ttwu_activate(),
+	 * which just updated the rq clock. There is an
+	 * exception when p->on_rq != 0, but in that case
+	 * we are not idle and rq->idle_stamp == 0.
+	 */
 	if (rq->idle_stamp) {
 		u64 delta = rq->clock - rq->idle_stamp;
 		u64 max = 2*sysctl_sched_migration_cost;
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 12/24] sched: Update rq clock on nohz CPU before migrating tasks
  2012-12-20 18:32 [ANNOUNCE] 3.7-nohz1 Frederic Weisbecker
                   ` (10 preceding siblings ...)
  2012-12-20 18:32 ` [PATCH 11/24] sched: Comment on rq->clock correctness in ttwu_do_wakeup() in nohz Frederic Weisbecker
@ 2012-12-20 18:32 ` Frederic Weisbecker
  2012-12-20 18:33 ` [PATCH 13/24] sched: Update rq clock on nohz CPU before setting fair group shares Frederic Weisbecker
                   ` (13 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Frederic Weisbecker @ 2012-12-20 18:32 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Ingo Molnar, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

The sched_class::put_prev_task() callbacks of the rt and fair
classes refer to the rq clock to update their runtime
statistics. A CPU running in tickless mode may carry a stale value,
so we need to update the clock there.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/core.c  |    6 ++++++
 kernel/sched/sched.h |    7 +++++++
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 02bd005..5d9060a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4832,6 +4832,12 @@ static void migrate_tasks(unsigned int dead_cpu)
 	 */
 	rq->stop = NULL;
 
+	/*
+	 * ->put_prev_task() needs to have an up-to-date value
+	 * of rq->clock[_task]
+	 */
+	update_nohz_rq_clock(rq);
+
 	for ( ; ; ) {
 		/*
 		 * There's this thread running, bail when that's the only
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 5eca173..db3d4df 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3,6 +3,7 @@
 #include <linux/mutex.h>
 #include <linux/spinlock.h>
 #include <linux/stop_machine.h>
+#include <linux/tick.h>
 
 #include "cpupri.h"
 
@@ -951,6 +952,12 @@ static inline void dec_nr_running(struct rq *rq)
 
 extern void update_rq_clock(struct rq *rq);
 
+static inline void update_nohz_rq_clock(struct rq *rq)
+{
+	if (tick_nohz_full_cpu(cpu_of(rq)))
+		update_rq_clock(rq);
+}
+
 extern void activate_task(struct rq *rq, struct task_struct *p, int flags);
 extern void deactivate_task(struct rq *rq, struct task_struct *p, int flags);
 
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 13/24] sched: Update rq clock on nohz CPU before setting fair group shares
  2012-12-20 18:32 [ANNOUNCE] 3.7-nohz1 Frederic Weisbecker
                   ` (11 preceding siblings ...)
  2012-12-20 18:32 ` [PATCH 12/24] sched: Update rq clock on nohz CPU before migrating tasks Frederic Weisbecker
@ 2012-12-20 18:33 ` Frederic Weisbecker
  2012-12-20 18:33 ` [PATCH 14/24] sched: Update rq clock on tickless CPUs before calling check_preempt_curr() Frederic Weisbecker
                   ` (12 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Frederic Weisbecker @ 2012-12-20 18:33 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Ingo Molnar, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

We may update the execution time (sched_group_set_shares() ->
update_cfs_shares() -> reweight_entity() -> update_curr()) before
reweighting the entity after updating the group shares, and this requires
an up-to-date version of the runqueue clock. Let's update it on the target
CPU if it runs tickless, because scheduler_tick() is not there to maintain
it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/fair.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 59e072b..a6ddc1e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5837,6 +5837,12 @@ int sched_group_set_shares(struct task_group *tg, unsigned long shares)
 		se = tg->se[i];
 		/* Propagate contribution to hierarchy */
 		raw_spin_lock_irqsave(&rq->lock, flags);
+		/*
+		 * We may call update_curr() which needs an up-to-date
+		 * version of rq clock if the CPU runs tickless.
+		 */
+		update_nohz_rq_clock(rq);
+
 		for_each_sched_entity(se) {
 			update_cfs_shares(group_cfs_rq(se));
 			/* update contribution to parent */
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 14/24] sched: Update rq clock on tickless CPUs before calling check_preempt_curr()
  2012-12-20 18:32 [ANNOUNCE] 3.7-nohz1 Frederic Weisbecker
                   ` (12 preceding siblings ...)
  2012-12-20 18:33 ` [PATCH 13/24] sched: Update rq clock on nohz CPU before setting fair group shares Frederic Weisbecker
@ 2012-12-20 18:33 ` Frederic Weisbecker
  2012-12-20 18:33 ` [PATCH 15/24] sched: Update rq clock earlier in unthrottle_cfs_rq Frederic Weisbecker
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Frederic Weisbecker @ 2012-12-20 18:33 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Ingo Molnar, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

check_preempt_wakeup() in the fair class needs an up-to-date sched clock
value to update the runtime stats of the current task.

When a task is woken up, activate_task() is usually called right before
ttwu_do_wakeup(), unless the task is already on the runqueue. In that
case we need to update the rq clock manually, in case the CPU runs
tickless, because ttwu_do_wakeup() calls check_preempt_wakeup().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/core.c |   17 ++++++++++++++++-
 1 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5d9060a..9cbace7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1323,6 +1323,12 @@ static int ttwu_remote(struct task_struct *p, int wake_flags)
 
 	rq = __task_rq_lock(p);
 	if (p->on_rq) {
+		/*
+		 * Ensure check_preempt_curr() won't deal with a stale value
+		 * of rq clock if the CPU is tickless. BTW do we actually need
+		 * check_preempt_curr() to be called here?
+		 */
+		update_nohz_rq_clock(rq);
 		ttwu_do_wakeup(rq, p, wake_flags);
 		ret = 1;
 	}
@@ -1500,8 +1506,17 @@ static void try_to_wake_up_local(struct task_struct *p)
 	if (!(p->state & TASK_NORMAL))
 		goto out;
 
-	if (!p->on_rq)
+	if (!p->on_rq) {
 		ttwu_activate(rq, p, ENQUEUE_WAKEUP);
+	} else {
+		/*
+		 * Even if the task is on the runqueue we still
+		 * need to ensure check_preempt_curr() won't
+		 * deal with a stale rq clock value on a tickless
+		 * CPU
+		 */
+		update_nohz_rq_clock(rq);
+	}
 
 	ttwu_do_wakeup(rq, p, 0);
 	ttwu_stat(p, smp_processor_id(), 0);
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 15/24] sched: Update rq clock earlier in unthrottle_cfs_rq
  2012-12-20 18:32 [ANNOUNCE] 3.7-nohz1 Frederic Weisbecker
                   ` (13 preceding siblings ...)
  2012-12-20 18:33 ` [PATCH 14/24] sched: Update rq clock on tickless CPUs before calling check_preempt_curr() Frederic Weisbecker
@ 2012-12-20 18:33 ` Frederic Weisbecker
  2012-12-20 18:33 ` [PATCH 16/24] sched: Update clock of nohz busiest rq before balancing Frederic Weisbecker
                   ` (10 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Frederic Weisbecker @ 2012-12-20 18:33 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Ingo Molnar, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

In this function we use rq->clock right before the rq clock gets
updated. Call update_rq_clock() earlier instead, so that we avoid
using a stale rq clock value.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/fair.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a6ddc1e..b31f7ca 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2051,14 +2051,15 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 	long task_delta;
 
 	se = cfs_rq->tg->se[cpu_of(rq_of(cfs_rq))];
-
 	cfs_rq->throttled = 0;
+
+	update_rq_clock(rq);
+
 	raw_spin_lock(&cfs_b->lock);
 	cfs_b->throttled_time += rq->clock - cfs_rq->throttled_clock;
 	list_del_rcu(&cfs_rq->throttled_list);
 	raw_spin_unlock(&cfs_b->lock);
 
-	update_rq_clock(rq);
 	/* update hierarchical throttle state */
 	walk_tg_tree_from(cfs_rq->tg, tg_nop, tg_unthrottle_up, (void *)rq);
 
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 16/24] sched: Update clock of nohz busiest rq before balancing
  2012-12-20 18:32 [ANNOUNCE] 3.7-nohz1 Frederic Weisbecker
                   ` (14 preceding siblings ...)
  2012-12-20 18:33 ` [PATCH 15/24] sched: Update rq clock earlier in unthrottle_cfs_rq Frederic Weisbecker
@ 2012-12-20 18:33 ` Frederic Weisbecker
  2012-12-20 18:33 ` [PATCH 17/24] sched: Update rq clock before idle balancing Frederic Weisbecker
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Frederic Weisbecker @ 2012-12-20 18:33 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Ingo Molnar, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

move_tasks() and active_load_balance_cpu_stop() both need
the busiest rq clock to be up to date, because they may end
up calling can_migrate_task(), which uses rq->clock_task
to determine whether the task running on the busiest runqueue
is cache hot.

Hence, if the busiest runqueue is tickless, update its clock
before reading it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
[ Forward port conflicts ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/sched/fair.c |   17 +++++++++++++++++
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b31f7ca..347dd3f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4774,6 +4774,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 {
 	int ld_moved, cur_ld_moved, active_balance = 0;
 	int lb_iterations, max_lb_iterations;
+	int clock_updated;
 	struct sched_group *group;
 	struct rq *busiest;
 	unsigned long flags;
@@ -4817,6 +4818,7 @@ redo:
 
 	ld_moved = 0;
 	lb_iterations = 1;
+	clock_updated = 0;
 	if (busiest->nr_running > 1) {
 		/*
 		 * Attempt to move tasks. If find_busiest_group has found
@@ -4840,6 +4842,14 @@ more_balance:
 		 */
 		cur_ld_moved = move_tasks(&env);
 		ld_moved += cur_ld_moved;
+
+		/*
+		 * move_tasks() may end up calling can_migrate_task(), which
+		 * requires an up-to-date value of the rq clock.
+		 */
+		update_nohz_rq_clock(busiest);
+		clock_updated = 1;
+
 		double_rq_unlock(env.dst_rq, busiest);
 		local_irq_restore(flags);
 
@@ -4935,6 +4945,13 @@ more_balance:
 				busiest->active_balance = 1;
 				busiest->push_cpu = this_cpu;
 				active_balance = 1;
+				/*
+				 * active_load_balance_cpu_stop() may end up calling
+				 * can_migrate_task(), which requires an up-to-date
+				 * value of the rq clock.
+				 */
+				if (!clock_updated)
+					update_nohz_rq_clock(busiest);
 			}
 			raw_spin_unlock_irqrestore(&busiest->lock, flags);
 
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 17/24] sched: Update rq clock before idle balancing
  2012-12-20 18:32 [ANNOUNCE] 3.7-nohz1 Frederic Weisbecker
                   ` (15 preceding siblings ...)
  2012-12-20 18:33 ` [PATCH 16/24] sched: Update clock of nohz busiest rq before balancing Frederic Weisbecker
@ 2012-12-20 18:33 ` Frederic Weisbecker
  2012-12-20 18:33 ` [PATCH 18/24] sched: Update nohz rq clock before searching busiest group on load balancing Frederic Weisbecker
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Frederic Weisbecker @ 2012-12-20 18:33 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Ingo Molnar, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

idle_balance() is called from schedule() right before we schedule the
idle task. It needs to record the idle timestamp at that time, and for
this the rq clock must be accurate. If the CPU is running tickless,
we need to update the rq clock manually.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/fair.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 347dd3f..291e225 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5013,6 +5013,7 @@ void idle_balance(int this_cpu, struct rq *this_rq)
 	int pulled_task = 0;
 	unsigned long next_balance = jiffies + HZ;
 
+	update_nohz_rq_clock(this_rq);
 	this_rq->idle_stamp = this_rq->clock;
 
 	if (this_rq->avg_idle < sysctl_sched_migration_cost)
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 18/24] sched: Update nohz rq clock before searching busiest group on load balancing
  2012-12-20 18:32 [ANNOUNCE] 3.7-nohz1 Frederic Weisbecker
                   ` (16 preceding siblings ...)
  2012-12-20 18:33 ` [PATCH 17/24] sched: Update rq clock before idle balancing Frederic Weisbecker
@ 2012-12-20 18:33 ` Frederic Weisbecker
  2012-12-20 18:33 ` [PATCH 19/24] nohz: Move nohz load balancer selection into idle logic Frederic Weisbecker
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Frederic Weisbecker @ 2012-12-20 18:33 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Ingo Molnar, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

While load balancing an rq target, we look for the busiest group.
This operation may require an up-to-date rq clock if we end up calling
scale_rt_power(). To this end, update it manually if the target is
running tickless.

DOUBT: don't we actually also need this in the vanilla kernel, in case
this_cpu is in dyntick-idle mode?

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/fair.c |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 291e225..b1b791d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4795,6 +4795,19 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 
 	schedstat_inc(sd, lb_count[idle]);
 
+	/*
+	 * find_busiest_group() may need an up-to-date cpu clock
+	 * (see scale_rt_power()) to pick the busiest group. If
+	 * the CPU is nohz, its clock may be stale.
+	 */
+	if (tick_nohz_full_cpu(this_cpu)) {
+		local_irq_save(flags);
+		raw_spin_lock(&this_rq->lock);
+		update_rq_clock(this_rq);
+		raw_spin_unlock(&this_rq->lock);
+		local_irq_restore(flags);
+	}
+
 redo:
 	group = find_busiest_group(&env, balance);
 
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 19/24] nohz: Move nohz load balancer selection into idle logic
  2012-12-20 18:32 [ANNOUNCE] 3.7-nohz1 Frederic Weisbecker
                   ` (17 preceding siblings ...)
  2012-12-20 18:33 ` [PATCH 18/24] sched: Update nohz rq clock before searching busiest group on load balancing Frederic Weisbecker
@ 2012-12-20 18:33 ` Frederic Weisbecker
  2012-12-20 18:33 ` [PATCH 20/24] nohz: Full dynticks mode Frederic Weisbecker
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Frederic Weisbecker @ 2012-12-20 18:33 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Ingo Molnar, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

[ ** BUGGY PATCH: I need to put more thinking into this ** ]

We want the nohz load balancer to be an idle CPU, so
move that selection into the strict dyntick-idle logic.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
[ added movement of calc_load_exit_idle() ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/time/tick-sched.c |   11 ++++++-----
 1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 73f339b..1b607bce 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -442,9 +442,6 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 		 * the scheduler tick in nohz_restart_sched_tick.
 		 */
 		if (!ts->tick_stopped) {
-			nohz_balance_enter_idle(cpu);
-			calc_load_enter_idle();
-
 			ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
 			ts->tick_stopped = 1;
 		}
@@ -540,8 +537,11 @@ static void __tick_nohz_idle_enter(struct tick_sched *ts)
 			ts->idle_expires = expires;
 		}
 
-		if (!was_stopped && ts->tick_stopped)
+		if (!was_stopped && ts->tick_stopped) {
 			ts->idle_jiffies = ts->last_jiffies;
+			nohz_balance_enter_idle(cpu);
+			calc_load_enter_idle();
+		}
 	}
 }
 
@@ -649,7 +649,6 @@ static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
 	tick_do_update_jiffies64(now);
 	update_cpu_load_nohz();
 
-	calc_load_exit_idle();
 	touch_softlockup_watchdog();
 	/*
 	 * Cancel the scheduled timer and restore the tick
@@ -709,6 +708,8 @@ void tick_nohz_idle_exit(void)
 		tick_nohz_stop_idle(cpu, now);
 
 	if (ts->tick_stopped) {
+		nohz_balance_enter_idle(cpu);
+		calc_load_exit_idle();
 		tick_nohz_restart_sched_tick(ts, now);
 		tick_nohz_account_idle_ticks(ts);
 	}
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 20/24] nohz: Full dynticks mode
  2012-12-20 18:32 [ANNOUNCE] 3.7-nohz1 Frederic Weisbecker
                   ` (18 preceding siblings ...)
  2012-12-20 18:33 ` [PATCH 19/24] nohz: Move nohz load balancer selection into idle logic Frederic Weisbecker
@ 2012-12-20 18:33 ` Frederic Weisbecker
  2012-12-26  6:12   ` Namhyung Kim
  2012-12-20 18:33 ` [PATCH 21/24] nohz: Only stop the tick on RCU nocb CPUs Frederic Weisbecker
                   ` (5 subsequent siblings)
  25 siblings, 1 reply; 44+ messages in thread
From: Frederic Weisbecker @ 2012-12-20 18:33 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Ingo Molnar, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

When a CPU is in full dynticks mode, try to switch
it to nohz mode from the interrupt exit path if it is
running a single non-idle task.

Conversely, restart the tick if a second task gets enqueued
while the timer is stopped, so that the scheduler tick is
rearmed.

[TODO: Check remaining things to be done from scheduler_tick()]

[ Included build fix from Geoff Levand ]

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/sched.h    |    6 +++++
 include/linux/tick.h     |    2 +
 kernel/sched/core.c      |   22 ++++++++++++++++++++-
 kernel/sched/sched.h     |    8 +++++++
 kernel/softirq.c         |    5 ++-
 kernel/time/tick-sched.c |   47 ++++++++++++++++++++++++++++++++++++++++-----
 6 files changed, 81 insertions(+), 9 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8a89dc6..4ffac78 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2818,6 +2818,12 @@ static inline void inc_syscw(struct task_struct *tsk)
 #define TASK_SIZE_OF(tsk)	TASK_SIZE
 #endif
 
+#ifdef CONFIG_NO_HZ_FULL
+extern bool sched_can_stop_tick(void);
+#else
+static inline bool sched_can_stop_tick(void) { return false; }
+#endif
+
 #ifdef CONFIG_MM_OWNER
 extern void mm_update_next_owner(struct mm_struct *mm);
 extern void mm_init_owner(struct mm_struct *mm, struct task_struct *p);
diff --git a/include/linux/tick.h b/include/linux/tick.h
index 2d4f6f0..dfb90ea 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -159,8 +159,10 @@ static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
 
 #ifdef CONFIG_NO_HZ_FULL
 int tick_nohz_full_cpu(int cpu);
+extern void tick_nohz_full_check(void);
 #else
 static inline int tick_nohz_full_cpu(int cpu) { return 0; }
+static inline void tick_nohz_full_check(void) { }
 #endif
 
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9cbace7..9d821a3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1215,6 +1215,24 @@ static void update_avg(u64 *avg, u64 sample)
 }
 #endif
 
+#ifdef CONFIG_NO_HZ_FULL
+bool sched_can_stop_tick(void)
+{
+	struct rq *rq;
+
+	rq = this_rq();
+
+	/* Make sure rq->nr_running update is visible after the IPI */
+	smp_rmb();
+
+	/* More than one running task need preemption */
+	if (rq->nr_running > 1)
+		return false;
+
+	return true;
+}
+#endif
+
 static void
 ttwu_stat(struct task_struct *p, int cpu, int wake_flags)
 {
@@ -1357,7 +1375,8 @@ static void sched_ttwu_pending(void)
 
 void scheduler_ipi(void)
 {
-	if (llist_empty(&this_rq()->wake_list) && !got_nohz_idle_kick())
+	if (llist_empty(&this_rq()->wake_list) && !got_nohz_idle_kick()
+	    && !tick_nohz_full_cpu(smp_processor_id()))
 		return;
 
 	/*
@@ -1374,6 +1393,7 @@ void scheduler_ipi(void)
 	 * somewhat pessimize the simple resched case.
 	 */
 	irq_enter();
+	tick_nohz_full_check();
 	sched_ttwu_pending();
 
 	/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index db3d4df..f3d8f4a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -943,6 +943,14 @@ static inline u64 steal_ticks(u64 steal)
 static inline void inc_nr_running(struct rq *rq)
 {
 	rq->nr_running++;
+
+	if (rq->nr_running == 2) {
+		if (tick_nohz_full_cpu(rq->cpu)) {
+			/* Order rq->nr_running write against the IPI */
+			smp_wmb();
+			smp_send_reschedule(rq->cpu);
+		}
+	}
 }
 
 static inline void dec_nr_running(struct rq *rq)
diff --git a/kernel/softirq.c b/kernel/softirq.c
index f5cc25f..6342078 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -307,7 +307,8 @@ void irq_enter(void)
 	int cpu = smp_processor_id();
 
 	rcu_irq_enter();
-	if (is_idle_task(current) && !in_interrupt()) {
+
+	if ((is_idle_task(current) || tick_nohz_full_cpu(cpu)) && !in_interrupt()) {
 		/*
 		 * Prevent raise_softirq from needlessly waking up ksoftirqd
 		 * here, as softirq will be serviced on return from interrupt.
@@ -349,7 +350,7 @@ void irq_exit(void)
 
 #ifdef CONFIG_NO_HZ
 	/* Make sure that timer wheel updates are propagated */
-	if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
+	if (!in_interrupt())
 		tick_nohz_irq_exit();
 #endif
 	rcu_irq_exit();
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 1b607bce..c057a7e 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -585,6 +585,24 @@ void tick_nohz_idle_enter(void)
 	local_irq_enable();
 }
 
+static void tick_nohz_full_stop_tick(struct tick_sched *ts)
+{
+#ifdef CONFIG_NO_HZ_FULL
+	int cpu = smp_processor_id();
+
+	if (!tick_nohz_full_cpu(cpu) || is_idle_task(current))
+		return;
+
+	if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
+		return;
+
+	if (!sched_can_stop_tick())
+		return;
+
+	tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
+#endif
+}
+
 /**
  * tick_nohz_irq_exit - update next tick event from interrupt exit
  *
@@ -597,12 +615,15 @@ void tick_nohz_irq_exit(void)
 {
 	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
 
-	if (!ts->inidle)
-		return;
-
-	/* Cancel the timer because CPU already waken up from the C-states*/
-	menu_hrtimer_cancel();
-	__tick_nohz_idle_enter(ts);
+	if (ts->inidle) {
+		if (!need_resched()) {
+			/* Cancel the timer because CPU already waken up from the C-states*/
+			menu_hrtimer_cancel();
+			__tick_nohz_idle_enter(ts);
+		}
+	} else {
+		tick_nohz_full_stop_tick(ts);
+	}
 }
 
 /**
@@ -833,6 +854,20 @@ static inline void tick_check_nohz(int cpu) { }
 
 #endif /* NO_HZ */
 
+#ifdef CONFIG_NO_HZ_FULL
+void tick_nohz_full_check(void)
+{
+	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+
+	if (tick_nohz_full_cpu(smp_processor_id())) {
+		if (ts->tick_stopped && !is_idle_task(current)) {
+			if (!sched_can_stop_tick())
+				tick_nohz_restart_sched_tick(ts, ktime_get());
+		}
+	}
+}
+#endif /* CONFIG_NO_HZ_FULL */
+
 /*
  * Called from irq_enter to notify about the possible interruption of idle()
  */
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 21/24] nohz: Only stop the tick on RCU nocb CPUs
  2012-12-20 18:32 [ANNOUNCE] 3.7-nohz1 Frederic Weisbecker
                   ` (19 preceding siblings ...)
  2012-12-20 18:33 ` [PATCH 20/24] nohz: Full dynticks mode Frederic Weisbecker
@ 2012-12-20 18:33 ` Frederic Weisbecker
  2012-12-20 18:33 ` [PATCH 22/24] nohz: Don't turn off the tick if rcu needs it Frederic Weisbecker
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Frederic Weisbecker @ 2012-12-20 18:33 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Ingo Molnar, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On a full dynticks CPU, we want the RCU callbacks to be
offloaded to another CPU, otherwise we need to keep
the tick to wait for grace period completion.

Ensure the full dynticks CPU is also an rcu_nocb one.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/rcupdate.h |    7 +++++++
 kernel/rcutree.c         |    6 +++---
 kernel/rcutree_plugin.h  |   13 ++++---------
 kernel/time/tick-sched.c |   20 +++++++++++++++++---
 4 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 275aa3f..829312e 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -992,4 +992,11 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
 #define kfree_rcu(ptr, rcu_head)					\
 	__kfree_rcu(&((ptr)->rcu_head), offsetof(typeof(*(ptr)), rcu_head))
 
+#ifdef CONFIG_RCU_NOCB_CPU
+bool rcu_is_nocb_cpu(int cpu);
+#else
+static inline bool rcu_is_nocb_cpu(int cpu) { return false; };
+#endif
+
+
 #endif /* __LINUX_RCUPDATE_H */
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 302d360..e9e0ffa 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1589,7 +1589,7 @@ rcu_send_cbs_to_orphanage(int cpu, struct rcu_state *rsp,
 			  struct rcu_node *rnp, struct rcu_data *rdp)
 {
 	/* No-CBs CPUs do not have orphanable callbacks. */
-	if (is_nocb_cpu(rdp->cpu))
+	if (rcu_is_nocb_cpu(rdp->cpu))
 		return;
 
 	/*
@@ -2651,10 +2651,10 @@ static void _rcu_barrier(struct rcu_state *rsp)
 	 * corresponding CPU's preceding callbacks have been invoked.
 	 */
 	for_each_possible_cpu(cpu) {
-		if (!cpu_online(cpu) && !is_nocb_cpu(cpu))
+		if (!cpu_online(cpu) && !rcu_is_nocb_cpu(cpu))
 			continue;
 		rdp = per_cpu_ptr(rsp->rda, cpu);
-		if (is_nocb_cpu(cpu)) {
+		if (rcu_is_nocb_cpu(cpu)) {
 			_rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
 					   rsp->n_barrier_done);
 			atomic_inc(&rsp->barrier_cpu_count);
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index f6e5ec2..625b327 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -2160,7 +2160,7 @@ static int __init rcu_nocb_setup(char *str)
 __setup("rcu_nocbs=", rcu_nocb_setup);
 
 /* Is the specified CPU a no-CPUs CPU? */
-static bool is_nocb_cpu(int cpu)
+bool rcu_is_nocb_cpu(int cpu)
 {
 	if (have_rcu_nocb_mask)
 		return cpumask_test_cpu(cpu, rcu_nocb_mask);
@@ -2218,7 +2218,7 @@ static bool __call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *rhp,
 			    bool lazy)
 {
 
-	if (!is_nocb_cpu(rdp->cpu))
+	if (!rcu_is_nocb_cpu(rdp->cpu))
 		return 0;
 	__call_rcu_nocb_enqueue(rdp, rhp, &rhp->next, 1, lazy);
 	return 1;
@@ -2235,7 +2235,7 @@ static bool __maybe_unused rcu_nocb_adopt_orphan_cbs(struct rcu_state *rsp,
 	long qll = rsp->qlen_lazy;
 
 	/* If this is not a no-CBs CPU, tell the caller to do it the old way. */
-	if (!is_nocb_cpu(smp_processor_id()))
+	if (!rcu_is_nocb_cpu(smp_processor_id()))
 		return 0;
 	rsp->qlen = 0;
 	rsp->qlen_lazy = 0;
@@ -2275,7 +2275,7 @@ static bool nocb_cpu_expendable(int cpu)
 	 * If there are no no-CB CPUs or if this CPU is not a no-CB CPU,
 	 * then offlining this CPU is harmless.  Let it happen.
 	 */
-	if (!have_rcu_nocb_mask || is_nocb_cpu(cpu))
+	if (!have_rcu_nocb_mask || rcu_is_nocb_cpu(cpu))
 		return 1;
 
 	/* If no memory, play it safe and keep the CPU around. */
@@ -2456,11 +2456,6 @@ static void __init rcu_init_nocb(void)
 
 #else /* #ifdef CONFIG_RCU_NOCB_CPU */
 
-static bool is_nocb_cpu(int cpu)
-{
-	return false;
-}
-
 static bool __call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *rhp,
 			    bool lazy)
 {
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index c057a7e..2808e02 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -585,6 +585,19 @@ void tick_nohz_idle_enter(void)
 	local_irq_enable();
 }
 
+#ifdef CONFIG_NO_HZ_FULL
+static bool can_stop_full_tick(int cpu)
+{
+	if (!sched_can_stop_tick())
+		return false;
+
+	if (!rcu_is_nocb_cpu(cpu))
+		return false;
+
+	return true;
+}
+#endif
+
 static void tick_nohz_full_stop_tick(struct tick_sched *ts)
 {
 #ifdef CONFIG_NO_HZ_FULL
@@ -596,7 +609,7 @@ static void tick_nohz_full_stop_tick(struct tick_sched *ts)
 	if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
 		return;
 
-	if (!sched_can_stop_tick())
+	if (!can_stop_full_tick(cpu))
 		return;
 
 	tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
@@ -858,10 +871,11 @@ static inline void tick_check_nohz(int cpu) { }
 void tick_nohz_full_check(void)
 {
 	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+	int cpu = smp_processor_id();
 
-	if (tick_nohz_full_cpu(smp_processor_id())) {
+	if (tick_nohz_full_cpu(cpu)) {
 		if (ts->tick_stopped && !is_idle_task(current)) {
-			if (!sched_can_stop_tick())
+			if (!can_stop_full_tick(cpu))
 				tick_nohz_restart_sched_tick(ts, ktime_get());
 		}
 	}
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 22/24] nohz: Don't turn off the tick if rcu needs it
  2012-12-20 18:32 [ANNOUNCE] 3.7-nohz1 Frederic Weisbecker
                   ` (20 preceding siblings ...)
  2012-12-20 18:33 ` [PATCH 21/24] nohz: Only stop the tick on RCU nocb CPUs Frederic Weisbecker
@ 2012-12-20 18:33 ` Frederic Weisbecker
  2012-12-20 18:33 ` [PATCH 23/24] nohz: Don't stop the tick if posix cpu timers are running Frederic Weisbecker
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Frederic Weisbecker @ 2012-12-20 18:33 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Ingo Molnar, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

If RCU is waiting for the current CPU to complete a grace
period, don't turn off the tick. Unlike dyntick-idle, we
are not necessarily going to enter the RCU extended quiescent
state, so we may need to keep the tick to note the current
CPU's quiescent states.

[added build fix from Zen Lin]

CHECKME: OTOH we don't want to handle a locally started
grace period; that should be offloaded on rcu_nocb CPUs.
What we want is to be kicked if we stay in dynticks mode in
the kernel for too long (ie: to report a quiescent state).
rcu_pending() is perhaps overkill just for that.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/rcupdate.h |    1 +
 kernel/rcutree.c         |    3 +--
 kernel/time/tick-sched.c |    3 +++
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 829312e..2ebadac 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -211,6 +211,7 @@ static inline int rcu_preempt_depth(void)
 extern void rcu_sched_qs(int cpu);
 extern void rcu_bh_qs(int cpu);
 extern void rcu_check_callbacks(int cpu, int user);
+extern int rcu_pending(int cpu);
 struct notifier_block;
 extern void rcu_idle_enter(void);
 extern void rcu_idle_exit(void);
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index e9e0ffa..6ba3e02 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -232,7 +232,6 @@ module_param(jiffies_till_next_fqs, ulong, 0644);
 
 static void force_qs_rnp(struct rcu_state *rsp, int (*f)(struct rcu_data *));
 static void force_quiescent_state(struct rcu_state *rsp);
-static int rcu_pending(int cpu);
 
 /*
  * Return the number of RCU-sched batches processed thus far for debug & stats.
@@ -2521,7 +2520,7 @@ static int __rcu_pending(struct rcu_state *rsp, struct rcu_data *rdp)
  * by the current CPU, returning 1 if so.  This function is part of the
  * RCU implementation; it is -not- an exported member of the RCU API.
  */
-static int rcu_pending(int cpu)
+int rcu_pending(int cpu)
 {
 	struct rcu_state *rsp;
 
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 2808e02..0c9281a 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -594,6 +594,9 @@ static bool can_stop_full_tick(int cpu)
 	if (!rcu_is_nocb_cpu(cpu))
 		return false;
 
+	if (rcu_pending(cpu))
+		return false;
+
 	return true;
 }
 #endif
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 23/24] nohz: Don't stop the tick if posix cpu timers are running
  2012-12-20 18:32 [ANNOUNCE] 3.7-nohz1 Frederic Weisbecker
                   ` (21 preceding siblings ...)
  2012-12-20 18:33 ` [PATCH 22/24] nohz: Don't turn off the tick if rcu needs it Frederic Weisbecker
@ 2012-12-20 18:33 ` Frederic Weisbecker
  2012-12-20 18:33 ` [PATCH 24/24] nohz: Add some tracing Frederic Weisbecker
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Frederic Weisbecker @ 2012-12-20 18:33 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Ingo Molnar, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

If either a per thread or a per process posix cpu timer is running,
don't stop the tick.

TODO: restart the tick if it is stopped and a posix cpu timer gets
enqueued. Also check whether we need a memory barrier for the per
process posix timer, which can be enqueued from another task
of the group.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/posix-timers.h |    1 +
 kernel/posix-cpu-timers.c    |   11 +++++++++++
 kernel/time/tick-sched.c     |    4 ++++
 3 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index 042058f..97480c2 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -119,6 +119,7 @@ int posix_timer_event(struct k_itimer *timr, int si_private);
 void posix_cpu_timer_schedule(struct k_itimer *timer);
 
 void run_posix_cpu_timers(struct task_struct *task);
+bool posix_cpu_timers_running(struct task_struct *tsk);
 void posix_cpu_timers_exit(struct task_struct *task);
 void posix_cpu_timers_exit_group(struct task_struct *task);
 
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index 3d58bd5..d3b46b3 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -1266,6 +1266,17 @@ static inline int fastpath_timer_check(struct task_struct *tsk)
 	return 0;
 }
 
+bool posix_cpu_timers_running(struct task_struct *tsk)
+{
+	if (!task_cputime_zero(&tsk->cputime_expires))
+		return true;
+
+	if (tsk->signal->cputimer.running)
+		return true;
+
+	return false;
+}
+
 /*
  * This is called from the timer interrupt handler.  The irq handler has
  * already updated our counts.  We need to check if any timers fire now.
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 0c9281a..4f4fa13 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -21,6 +21,7 @@
 #include <linux/sched.h>
 #include <linux/module.h>
 #include <linux/irq_work.h>
+#include <linux/posix-timers.h>
 
 #include <asm/irq_regs.h>
 
@@ -597,6 +598,9 @@ static bool can_stop_full_tick(int cpu)
 	if (rcu_pending(cpu))
 		return false;
 
+	if (posix_cpu_timers_running(current))
+		return false;
+
 	return true;
 }
 #endif
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 24/24] nohz: Add some tracing
  2012-12-20 18:32 [ANNOUNCE] 3.7-nohz1 Frederic Weisbecker
                   ` (22 preceding siblings ...)
  2012-12-20 18:33 ` [PATCH 23/24] nohz: Don't stop the tick if posix cpu timers are running Frederic Weisbecker
@ 2012-12-20 18:33 ` Frederic Weisbecker
  2012-12-21  2:35 ` [ANNOUNCE] 3.7-nohz1 Steven Rostedt
  2012-12-21  5:20 ` Hakan Akkan
  25 siblings, 0 replies; 44+ messages in thread
From: Frederic Weisbecker @ 2012-12-20 18:33 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Ingo Molnar, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

Not for merge, just for debugging.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/tick-sched.c |   27 ++++++++++++++++++++++-----
 1 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 4f4fa13..f75a85f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -142,6 +142,7 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
 			ts->idle_jiffies++;
 	}
 #endif
+	trace_printk("tick\n");
 	update_process_times(user_mode(regs));
 	profile_tick(CPU_PROFILING);
 }
@@ -589,17 +590,30 @@ void tick_nohz_idle_enter(void)
 #ifdef CONFIG_NO_HZ_FULL
 static bool can_stop_full_tick(int cpu)
 {
-	if (!sched_can_stop_tick())
+	if (!sched_can_stop_tick()) {
+		trace_printk("Can't stop: sched\n");
 		return false;
+	}
 
-	if (!rcu_is_nocb_cpu(cpu))
+	if (!rcu_is_nocb_cpu(cpu)) {
+		trace_printk("Can't stop: not RCU nocb\n");
 		return false;
+	}
 
-	if (rcu_pending(cpu))
+	/*
+	 * Keep the tick if we are asked to report a quiescent state.
+	 * This must be further optimized (avoid checks for local callbacks,
+	 * ignore RCU in userspace, etc...
+	 */
+	if (rcu_pending(cpu)) {
+		trace_printk("Can't stop: RCU pending\n");
 		return false;
+	}
 
-	if (posix_cpu_timers_running(current))
+	if (posix_cpu_timers_running(current)) {
+		trace_printk("Can't stop: posix CPU timers running\n");
 		return false;
+	}
 
 	return true;
 }
@@ -613,12 +627,15 @@ static void tick_nohz_full_stop_tick(struct tick_sched *ts)
 	if (!tick_nohz_full_cpu(cpu) || is_idle_task(current))
 		return;
 
-	if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
+	if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE) {
+		trace_printk("Can't stop: NOHZ_MODE_INACTIVE\n");
 		return;
+	}
 
 	if (!can_stop_full_tick(cpu))
 		return;
 
+	trace_printk("Stop tick\n");
 	tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
 #endif
 }
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [ANNOUNCE] 3.7-nohz1
  2012-12-20 18:32 [ANNOUNCE] 3.7-nohz1 Frederic Weisbecker
                   ` (23 preceding siblings ...)
  2012-12-20 18:33 ` [PATCH 24/24] nohz: Add some tracing Frederic Weisbecker
@ 2012-12-21  2:35 ` Steven Rostedt
  2012-12-23 23:43   ` Frederic Weisbecker
  2012-12-21  5:20 ` Hakan Akkan
  25 siblings, 1 reply; 44+ messages in thread
From: Steven Rostedt @ 2012-12-21  2:35 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Alessio Igor Bogani, Andrew Morton, Avi Kivity,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Ingo Molnar, Paul E. McKenney, Paul Gortmaker,
	Peter Zijlstra, Thomas Gleixner, Li Zhong

On Thu, 2012-12-20 at 19:32 +0100, Frederic Weisbecker wrote:
> Hi,
> 

Nice work Frederic!

> So this is a new version of the nohz cpusets based on 3.7, except it's not using
> cpusets anymore and I actually based it on the middle of the 3.8 merge window
> in order to get latest upstream full dynticks preparatory work: cputime cleanups,
> RCU user mode, context tracking subsystem, nohz code consolidation, ...
> 
> So the big changes since the last nohz cpuset release are:
> 
> * printk now uses irq work so it doesn't rely on the tick anymore (provided
> your arch implements irq work with IPIs or alike). This chunk has been proposed
> for the 3.8 merge window: https://lkml.org/lkml/2012/12/17/177
> May be Linus will pull, may be not. We'll see. In any case I've included it in this tree
> but I'm not reposting this part of the patchset to avoid spamming you.
> 
> * cputime doesn't rely on IPIs anymore. Now the reader does a special computation to
> remotely get the tickless cputime.
> 
> * No more cpusets interface. Paul McKenney suggested me to start with a boot time
> kernel parameter to define the full dynticks cpumask. And he was totally right, it
> makes the code much more simple. That's a good way to start and to make the mainlining
> easier. We can still add a runtime configuration later if necessary.
> 
> * Now there is always a CPU handling the timekeeping. This can be further optimized
> and more power-friendly, I really did something simple-stupid. I guess we'll try to get
> that into a better shape with Hakan. But at least the timekeeping now works.
> 
> * It uses the new RCU callbacks offlining feature. This way a full dynticks CPU doesn't
> need to keep the tick to handle local callbacks. This is still very experimental though.
> 
> * No more specific IPI vector for full dynticks. We just use the scheduler ipi.
> 
> The branch is:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> 	3.7-nohz1
> 
> There is still quite some work to do.
> 
> == How to use? ==
> 
> Select:
> 	CONFIG_NO_HZ
> 	CONFIG_RCU_USER_QS
> 	CONFIG_VIRT_CPU_ACCOUNTING_GEN
> 	CONFIG_RCU_NOCB_CPU
> 	CONFIG_NO_HZ_FULL
> 
> You always need at least one timekeeping CPU.
> 
> Let's imagine you have 4 CPUs. We keep the CPU 0 to offline RCU callbacks there and to
> handle the timekeeping. We set the rest as full dynticks. So you need the following kernel
> parameters:
> 
> 	rcu_nocbs=1-3 full_nohz=1-3
> 
> (Note rcu_nocbs value must always be the same as full_nohz).

Why? You can't have: rcu_nocbs=1-4 full_nohz=1-3
  or: rcu_nocbs=1-3 full_nohz=1-4 ?

That needs to be fixed, either with a warning and/or by forcing the two
to be the same. That is, if they specify:

  rcu_nocbs=1-3 full_nohz=1-4

Then set rcu_nocbs=1-4 with a warning about it. Or simply set
 full_nohz=1-3.
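
Something along these lines would do (made-up names, just to show the
idea; it assumes both masks are reachable from a single init call, which
the current code doesn't provide):

	/*
	 * Hypothetical sanity check, not in the posted series: run once
	 * both "rcu_nocbs=" and "full_nohz=" have been parsed.
	 */
	static int __init full_nohz_verify_rcu_nocbs(void)
	{
		if (!cpumask_subset(full_nohz_mask, rcu_nocb_mask)) {
			pr_warn("NO_HZ: full_nohz CPUs not covered by rcu_nocbs, extending rcu_nocbs\n");
			cpumask_or(rcu_nocb_mask, rcu_nocb_mask, full_nohz_mask);
		}
		return 0;
	}
	early_initcall(full_nohz_verify_rcu_nocbs);

That way a typo on the command line degrades into a warning instead of a
CPU silently keeping its tick.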

-- Steve

> 
> Now if you want proper isolation you need to:
> 
> * Migrate your processes adequately
> * Migrate your irqs to CPU 0
> * Migrate the RCU nocb threads to CPU 0. Example with the above configuration:
> 
> 	for p in $(ps -o pid= -C rcuo1,rcuo2,rcuo3)
> 	do
> 		taskset -cp 0 $p
> 	done
> 
> Then run what you want on the full dynticks CPUs. For best results, run 1 task
> per CPU, mostly in userspace and mostly CPU bound (otherwise more IO = more kernel
> mode execution = more chances to get IPIs, tick restarted, workqueues, kthreads, etc...)
> 
> This page contains a good reminder for those interested in CPU isolation: https://github.com/gby/linux/wiki
> 
> But keep in mind that my tree is not yet ready for serious production.
> 



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 02/24] cputime: Generic on-demand virtual cputime accounting
  2012-12-20 18:32 ` [PATCH 02/24] cputime: Generic on-demand virtual cputime accounting Frederic Weisbecker
@ 2012-12-21  5:11   ` Steven Rostedt
  2012-12-26  8:19   ` Li Zhong
  1 sibling, 0 replies; 44+ messages in thread
From: Steven Rostedt @ 2012-12-21  5:11 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Alessio Igor Bogani, Andrew Morton, Avi Kivity,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Ingo Molnar, Paul E. McKenney, Paul Gortmaker,
	Peter Zijlstra, Thomas Gleixner

On Thu, 2012-12-20 at 19:32 +0100, Frederic Weisbecker wrote:

> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index 293b202..da0a9e7 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -3,6 +3,7 @@
>  #include <linux/tsacct_kern.h>
>  #include <linux/kernel_stat.h>
>  #include <linux/static_key.h>
> +#include <linux/context_tracking.h>
>  #include "sched.h"
>  
> 
> @@ -495,10 +496,24 @@ void vtime_task_switch(struct task_struct *prev)
>  #ifndef __ARCH_HAS_VTIME_ACCOUNT
>  void vtime_account(struct task_struct *tsk)
>  {
> -	if (in_interrupt() || !is_idle_task(tsk))
> -		vtime_account_system(tsk);
> -	else
> -		vtime_account_idle(tsk);
> +	if (!in_interrupt()) {
> +		/*
> +		 * If we interrupted user, context_tracking_in_user()
> +		 * is 1 because the context tracking don't hook
> +		 * on irq entry/exit. This way we know if
> +		 * we need to flush user time on kernel entry.
> +		 */
> +		if (context_tracking_in_user()) {
> +			vtime_account_user(tsk);
> +			return;
> +		}
> +
> +		if (is_idle_task(tsk)) {
> +			vtime_account_idle(tsk);
> +			return;
> +		}
> +	}
> +	vtime_account_system(tsk);
>  }
>  EXPORT_SYMBOL_GPL(vtime_account);
>  #endif /* __ARCH_HAS_VTIME_ACCOUNT */
> @@ -586,4 +601,72 @@ void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime
>  	thread_group_cputime(p, &cputime);
>  	cputime_adjust(&cputime, &p->signal->prev_cputime, ut, st);
>  }
> -#endif

Deleted #endif here.

> +
> +#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN

Added #ifdef here.

> +static DEFINE_PER_CPU(long, last_jiffies) = INITIAL_JIFFIES;
> +
> +static cputime_t get_vtime_delta(void)
> +{
> +	long delta;
> +
> +	delta = jiffies - __this_cpu_read(last_jiffies);
> +	__this_cpu_add(last_jiffies, delta);
> +
> +	return jiffies_to_cputime(delta);
> +}
> +
> +void vtime_account_system(struct task_struct *tsk)
> +{
> +	cputime_t delta_cpu = get_vtime_delta();
> +
> +	account_system_time(tsk, irq_count(), delta_cpu, cputime_to_scaled(delta_cpu));
> +}
> +
> +void vtime_account_user(struct task_struct *tsk)
> +{
> +	cputime_t delta_cpu = get_vtime_delta();
> +
> +	/*
> +	 * This is an unfortunate hack: if we flush user time only on
> +	 * irq entry, we miss the jiffies update and the time is spuriously
> +	 * accounted to system time.
> +	 */
> +	if (context_tracking_in_user())
> +		account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
> +}
> +
> +void vtime_account_idle(struct task_struct *tsk)
> +{
> +	cputime_t delta_cpu = get_vtime_delta();
> +
> +	account_idle_time(delta_cpu);
> +}
> +
> +static int __cpuinit vtime_cpu_notify(struct notifier_block *self,
> +				      unsigned long action, void *hcpu)
> +{
> +	long cpu = (long)hcpu;
> +	long *last_jiffies_cpu = per_cpu_ptr(&last_jiffies, cpu);
> +
> +	switch (action) {
> +	case CPU_UP_PREPARE:
> +	case CPU_UP_PREPARE_FROZEN:
> +		/*
> +		 * CHECKME: ensure that's visible by the CPU
> +		 * once it wakes up
> +		 */
> +		*last_jiffies_cpu = jiffies;
> +	default:
> +		break;
> +	}
> +
> +	return NOTIFY_OK;
> +}
> +
> +static int __init init_vtime(void)
> +{
> +	cpu_notifier(vtime_cpu_notify, 0);
> +	return 0;
> +}
> +early_initcall(init_vtime);
> +#endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */

Added #endif here

Hmm, there's a missing #endif somewhere, which must explain my error message:

  kernel/sched/cputime.c:448:0: error: unterminated #else

Looks like a possible mismerge.
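
My guess at the layout this is aiming for, once the missing #endif comes
back (simplified down to just the guards, so take it with a grain of salt):

	#ifdef CONFIG_VIRT_CPU_ACCOUNTING
		/* arch assisted precise (vtime) accounting */
	#else
		/* tick based accounting + cputime_adjust() */
	#endif /* CONFIG_VIRT_CPU_ACCOUNTING */

	#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
		/* generic, context tracking based vtime accounting */
	#endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */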

-- Steve




^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [ANNOUNCE] 3.7-nohz1
  2012-12-20 18:32 [ANNOUNCE] 3.7-nohz1 Frederic Weisbecker
                   ` (24 preceding siblings ...)
  2012-12-21  2:35 ` [ANNOUNCE] 3.7-nohz1 Steven Rostedt
@ 2012-12-21  5:20 ` Hakan Akkan
  25 siblings, 0 replies; 44+ messages in thread
From: Hakan Akkan @ 2012-12-21  5:20 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Alessio Igor Bogani, Andrew Morton, Avi Kivity,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Ingo Molnar, Paul E. McKenney, Paul Gortmaker, Peter Zijlstra,
	Steven Rostedt, Thomas Gleixner, Li Zhong

Hi,

On Thu, Dec 20, 2012 at 11:32 AM, Frederic Weisbecker
<fweisbec@gmail.com> wrote:
> Hi,
>
> So this is a new version of the nohz cpusets based on 3.7, except it's not using
> cpusets anymore and I actually based it on the middle of the 3.8 merge window
> in order to get latest upstream full dynticks preparatory work: cputime cleanups,
> RCU user mode, context tracking subsystem, nohz code consolidation, ...
>
> So the big changes since the last nohz cpuset release are:
>
> * printk now uses irq work so it doesn't rely on the tick anymore (provided
> your arch implements irq work with IPIs or alike). This chunk has been proposed
> for the 3.8 merge window: https://lkml.org/lkml/2012/12/17/177
> May be Linus will pull, may be not. We'll see. In any case I've included it in this tree
> but I'm not reposting this part of the patchset to avoid spamming you.
>
> * cputime doesn't rely on IPIs anymore. Now the reader does a special computation to
> remotely get the tickless cputime.
>
> * No more cpusets interface. Paul McKenney suggested me to start with a boot time
> kernel parameter to define the full dynticks cpumask. And he was totally right, it
> makes the code much more simple. That's a good way to start and to make the mainlining
> easier. We can still add a runtime configuration later if necessary.

It would be nice to have the runtime configuration ability. A percpu control
file such as /sys/devices/system/cpu/cpuX/isol could configure that cpu with
different levels of isolation. Users could echo bitmasks where each bit is
associated with a level of isolation: echo 0 disables all isolation, bit 1
disables RCU callbacks on that CPU, bit 2 isolates the CPU from the general
scheduler just like the isolcpus boot argument does, bit 3 pushes all irqs
away, bit 4 turns off the tick, etc.

I always hoped that someone would make isolcpus a runtime option, so I guess
it is time to get my hands dirty. Any pointers for this?
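
Something like the sketch below is what I have in mind. It is completely
hypothetical: the attribute name, the bit layout and the
cpu_apply_isolation() helper don't exist anywhere, only the sysfs plumbing
is real kernel API:

	static DEFINE_PER_CPU(unsigned long, cpu_isol_bits);

	static ssize_t isol_show(struct device *dev,
				 struct device_attribute *attr, char *buf)
	{
		return sprintf(buf, "%#lx\n", per_cpu(cpu_isol_bits, dev->id));
	}

	static ssize_t isol_store(struct device *dev,
				  struct device_attribute *attr,
				  const char *buf, size_t count)
	{
		unsigned long bits;

		if (kstrtoul(buf, 0, &bits))
			return -EINVAL;

		per_cpu(cpu_isol_bits, dev->id) = bits;
		/* hypothetical helper that would apply the requested isolation */
		cpu_apply_isolation(dev->id, bits);
		return count;
	}
	static DEVICE_ATTR(isol, 0644, isol_show, isol_store);

Each attribute would be registered with
device_create_file(get_cpu_device(cpu), &dev_attr_isol), and then
"echo 0xf > /sys/devices/system/cpu/cpu3/isol" would request full
isolation of CPU 3.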

>
> * Now there is always a CPU handling the timekeeping. This can be further optimized
> and more power-friendly, I really did something simple-stupid. I guess we'll try to get
> that into a better shape with Hakan. But at least the timekeeping now works.

Will look into it.

>
> * It uses the new RCU callbacks offlining feature. This way a full dynticks CPU doesn't
> need to keep the tick to handle local callbacks. This is still very experimental though.
>
> * No more specific IPI vector for full dynticks. We just use the scheduler ipi.
>
> The branch is:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
>         3.7-nohz1
>
> There is still quite some work to do.
>
> == How to use? ==
>
> Select:
>         CONFIG_NO_HZ
>         CONFIG_RCU_USER_QS
>         CONFIG_VIRT_CPU_ACCOUNTING_GEN
>         CONFIG_RCU_NOCB_CPU
>         CONFIG_NO_HZ_FULL
>
> You always need at least one timekeeping CPU.
>
> Let's imagine you have 4 CPUs. We keep the CPU 0 to offline RCU callbacks there and to
> handle the timekeeping. We set the rest as full dynticks. So you need the following kernel
> parameters:
>
>         rcu_nocbs=1-3 full_nohz=1-3
>
> (Note rcu_nocbs value must always be the same as full_nohz).
>
> Now if you want proper isolation you need to:
>
> * Migrate your processes adequately
> * Migrate your irqs to CPU 0
> * Migrate the RCU nocb threads to CPU 0. Example with the above configuration:
>
>         for p in $(ps -o pid= -C rcuo1,rcuo2,rcuo3)
>         do
>                 taskset -cp 0 $p
>         done
>
> Then run what you want on the full dynticks CPUs. For best results, run 1 task
> per CPU, mostly in userspace and mostly CPU bound (otherwise more IO = more kernel
> mode execution = more chances to get IPIs, tick restarted, workqueues, kthreads, etc...)
>
> This page contains a good reminder for those interested in CPU isolation: https://github.com/gby/linux/wiki
>
> But keep in mind that my tree is not yet ready for serious production.
>
> Happy Christmas, new year or whatever end of the world.
> ---
>
> Frederic Weisbecker (32):
>       irq_work: Fix racy IRQ_WORK_BUSY flag setting
>       irq_work: Fix racy check on work pending flag
>       irq_work: Remove CONFIG_HAVE_IRQ_WORK
>       nohz: Add API to check tick state
>       irq_work: Don't stop the tick with pending works
>       irq_work: Make self-IPIs optable
>       printk: Wake up klogd using irq_work
>       Merge branch 'nohz/printk-v8' into 3.7-nohz1-stage
>       context_tracking: Add comments on interface and internals
>       cputime: Generic on-demand virtual cputime accounting
>       cputime: Allow dynamic switch between tick/virtual based cputime accounting
>       cputime: Use accessors to read task cputime stats
>       cputime: Safely read cputime of full dynticks CPUs
>       nohz: Basic full dynticks interface
>       nohz: Assign timekeeping duty to a non-full-nohz CPU
>       nohz: Trace timekeeping update
>       nohz: Wake up full dynticks CPUs when a timer gets enqueued
>       rcu: Restart the tick on non-responding full dynticks CPUs
>       sched: Comment on rq->clock correctness in ttwu_do_wakeup() in nohz
>       sched: Update rq clock on nohz CPU before migrating tasks
>       sched: Update rq clock on nohz CPU before setting fair group shares
>       sched: Update rq clock on tickless CPUs before calling check_preempt_curr()
>       sched: Update rq clock earlier in unthrottle_cfs_rq
>       sched: Update clock of nohz busiest rq before balancing
>       sched: Update rq clock before idle balancing
>       sched: Update nohz rq clock before searching busiest group on load balancing
>       nohz: Move nohz load balancer selection into idle logic
>       nohz: Full dynticks mode
>       nohz: Only stop the tick on RCU nocb CPUs
>       nohz: Don't turn off the tick if rcu needs it
>       nohz: Don't stop the tick if posix cpu timers are running
>       nohz: Add some tracing
>
> Steven Rostedt (2):
>       irq_work: Flush work on CPU_DYING
>       irq_work: Warn if there's still work on cpu_down
>
>  arch/alpha/Kconfig                  |    1 -
>  arch/alpha/kernel/osf_sys.c         |    6 +-
>  arch/arm/Kconfig                    |    1 -
>  arch/arm64/Kconfig                  |    1 -
>  arch/blackfin/Kconfig               |    1 -
>  arch/frv/Kconfig                    |    1 -
>  arch/hexagon/Kconfig                |    1 -
>  arch/mips/Kconfig                   |    1 -
>  arch/parisc/Kconfig                 |    1 -
>  arch/powerpc/Kconfig                |    1 -
>  arch/s390/Kconfig                   |    1 -
>  arch/s390/kernel/vtime.c            |    4 +-
>  arch/sh/Kconfig                     |    1 -
>  arch/sparc/Kconfig                  |    1 -
>  arch/x86/Kconfig                    |    1 -
>  arch/x86/kernel/apm_32.c            |   11 +-
>  drivers/isdn/mISDN/stack.c          |    7 +-
>  drivers/staging/iio/trigger/Kconfig |    1 -
>  fs/binfmt_elf.c                     |    8 +-
>  fs/binfmt_elf_fdpic.c               |    7 +-
>  include/asm-generic/cputime.h       |    1 +
>  include/linux/context_tracking.h    |   28 +++++
>  include/linux/hardirq.h             |    4 +-
>  include/linux/init_task.h           |    9 ++
>  include/linux/irq_work.h            |   20 +++
>  include/linux/kernel_stat.h         |    2 +-
>  include/linux/posix-timers.h        |    1 +
>  include/linux/printk.h              |    3 -
>  include/linux/rcupdate.h            |    8 ++
>  include/linux/sched.h               |   48 +++++++-
>  include/linux/tick.h                |   26 ++++-
>  include/linux/vtime.h               |   47 +++++---
>  init/Kconfig                        |   22 +++-
>  kernel/acct.c                       |    6 +-
>  kernel/context_tracking.c           |   91 +++++++++++----
>  kernel/cpu.c                        |    4 +-
>  kernel/delayacct.c                  |    7 +-
>  kernel/exit.c                       |    6 +-
>  kernel/fork.c                       |    8 +-
>  kernel/irq_work.c                   |  131 ++++++++++++++++-----
>  kernel/posix-cpu-timers.c           |   39 +++++-
>  kernel/printk.c                     |   36 +++---
>  kernel/rcutree.c                    |   19 +++-
>  kernel/rcutree_plugin.h             |   13 +--
>  kernel/sched/core.c                 |   69 +++++++++++-
>  kernel/sched/cputime.c              |  222 ++++++++++++++++++++++++++++++-----
>  kernel/sched/fair.c                 |   42 +++++++-
>  kernel/sched/sched.h                |   15 +++
>  kernel/signal.c                     |   12 ++-
>  kernel/softirq.c                    |   11 +-
>  kernel/time/Kconfig                 |    9 ++
>  kernel/time/tick-broadcast.c        |    3 +-
>  kernel/time/tick-common.c           |    5 +-
>  kernel/time/tick-sched.c            |  142 ++++++++++++++++++++---
>  kernel/timer.c                      |    3 +-
>  kernel/tsacct.c                     |   19 ++-
>  56 files changed, 955 insertions(+), 233 deletions(-)

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 03/24] cputime: Allow dynamic switch between tick/virtual based cputime accounting
  2012-12-20 18:32 ` [PATCH 03/24] cputime: Allow dynamic switch between tick/virtual based " Frederic Weisbecker
@ 2012-12-21 15:05   ` Steven Rostedt
  2012-12-22 17:43     ` Frederic Weisbecker
  0 siblings, 1 reply; 44+ messages in thread
From: Steven Rostedt @ 2012-12-21 15:05 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Alessio Igor Bogani, Andrew Morton, Avi Kivity,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Ingo Molnar, Paul E. McKenney, Paul Gortmaker,
	Peter Zijlstra, Thomas Gleixner

On Thu, 2012-12-20 at 19:32 +0100, Frederic Weisbecker wrote:

> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index da0a9e7..e1fcab4 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -317,8 +317,6 @@ out:
>  	rcu_read_unlock();
>  }
>  
> -#ifndef CONFIG_VIRT_CPU_ACCOUNTING
> -
>  #ifdef CONFIG_IRQ_TIME_ACCOUNTING
>  /*
>   * Account a tick to a process and cpustat
> @@ -388,6 +386,7 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
>  						struct rq *rq) {}
>  #endif /* CONFIG_IRQ_TIME_ACCOUNTING */
>  
> +#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
>  /*
>   * Account a single tick of cpu time.
>   * @p: the process that the cpu time gets accounted to
> @@ -398,6 +397,11 @@ void account_process_tick(struct task_struct *p, int user_tick)
>  	cputime_t one_jiffy_scaled = cputime_to_scaled(cputime_one_jiffy);
>  	struct rq *rq = this_rq();
>  
> +	if (vtime_accounting()) {
> +		vtime_account_user(p);
> +		return;
> +	}
> +
>  	if (sched_clock_irqtime) {
>  		irqtime_account_process_tick(p, user_tick, rq);
>  		return;
> @@ -439,29 +443,13 @@ void account_idle_ticks(unsigned long ticks)
>  
>  	account_idle_time(jiffies_to_cputime(ticks));
>  }
> -
>  #endif
>  
> +
>  /*
>   * Use precise platform statistics if available:
>   */
>  #ifdef CONFIG_VIRT_CPU_ACCOUNTING
> -void task_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st)
> -{
> -	*ut = p->utime;
> -	*st = p->stime;
> -}
> -
> -void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st)
> -{
> -	struct task_cputime cputime;
> -
> -	thread_group_cputime(p, &cputime);
> -
> -	*ut = cputime.utime;
> -	*st = cputime.stime;
> -}
> -
>  void vtime_account_system_irqsafe(struct task_struct *tsk)
>  {
>  	unsigned long flags;
> @@ -517,8 +505,25 @@ void vtime_account(struct task_struct *tsk)
>  }
>  EXPORT_SYMBOL_GPL(vtime_account);
>  #endif /* __ARCH_HAS_VTIME_ACCOUNT */
> +#endif /* CONFIG_VIRT_CPU_ACCOUNTING */
>  
> -#else
> +#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
> +void task_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st)
> +{
> +	*ut = p->utime;
> +	*st = p->stime;
> +}
> +
> +void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st)
> +{
> +	struct task_cputime cputime;
> +
> +	thread_group_cputime(p, &cputime);
> +
> +	*ut = cputime.utime;
> +	*st = cputime.stime;
> +}
> +#else /* !CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
>  
>  #ifndef nsecs_to_cputime
>  # define nsecs_to_cputime(__nsecs)	nsecs_to_jiffies(__nsecs)
> @@ -548,6 +553,12 @@ static void cputime_adjust(struct task_cputime *curr,
>  {
>  	cputime_t rtime, utime, total;
>  
> +	if (vtime_accounting()) {
> +		*ut = curr->utime;
> +		*st = curr->stime;
> +		return;
> +	}
> +
>  	utime = curr->utime;
>  	total = utime + curr->stime;
>  
> @@ -601,6 +612,7 @@ void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime
>  	thread_group_cputime(p, &cputime);
>  	cputime_adjust(&cputime, &p->signal->prev_cputime, ut, st);
>  }
> +#endif /* !CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */

Ah, the missing #endif gets added back here.

-- Steve

>  
>  #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
>  static DEFINE_PER_CPU(long, last_jiffies) = INITIAL_JIFFIES;
> @@ -642,6 +654,11 @@ void vtime_account_idle(struct task_struct *tsk)
>  	account_idle_time(delta_cpu);
>  }
>  
> +bool vtime_accounting(void)
> +{
> +	return context_tracking_active();
> +}
> +
>  static int __cpuinit vtime_cpu_notify(struct notifier_block *self,
>  				      unsigned long action, void *hcpu)
>  {
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index fb8e5e4..ad0e6fa 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -632,8 +632,11 @@ static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
>  
>  static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
>  {
> -#ifndef CONFIG_VIRT_CPU_ACCOUNTING
> +#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
>  	unsigned long ticks;
> +
> +	if (vtime_accounting())
> +		return;
>  	/*
>  	 * We stopped the tick in idle. Update process times would miss the
>  	 * time we slept as update_process_times does only a 1 tick



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 05/24] cputime: Safely read cputime of full dynticks CPUs
  2012-12-20 18:32 ` [PATCH 05/24] cputime: Safely read cputime of full dynticks CPUs Frederic Weisbecker
@ 2012-12-21 15:09   ` Steven Rostedt
  2012-12-22 17:51     ` Frederic Weisbecker
  0 siblings, 1 reply; 44+ messages in thread
From: Steven Rostedt @ 2012-12-21 15:09 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Alessio Igor Bogani, Andrew Morton, Avi Kivity,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Ingo Molnar, Paul E. McKenney, Paul Gortmaker,
	Peter Zijlstra, Thomas Gleixner

On Thu, 2012-12-20 at 19:32 +0100, Frederic Weisbecker wrote:

> --- a/include/linux/init_task.h
> +++ b/include/linux/init_task.h
> @@ -10,6 +10,7 @@
>  #include <linux/pid_namespace.h>
>  #include <linux/user_namespace.h>
>  #include <linux/securebits.h>
> +#include <linux/seqlock.h>
>  #include <net/net_namespace.h>
>  
>  #ifdef CONFIG_SMP
> @@ -141,6 +142,13 @@ extern struct task_group root_task_group;
>  # define INIT_PERF_EVENTS(tsk)
>  #endif
>  
> +#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
> +#define INIT_VTIME(tsk)						\
> +	.vtime_seqlock = __SEQLOCK_UNLOCKED(tsk.vtime_seqlock),	\
> +	.prev_jiffies = INITIAL_JIFFIES, /* CHECKME */		\
> +	.prev_jiffies_whence = JIFFIES_SYS,

#else
# define INIT_VTIME(tsk)
#endif

Otherwise it fails to compile when CONFIG_VIRT_CPU_ACCOUNTING_GEN is not
set.

-- Steve

> +#endif
> +
>  #define INIT_TASK_COMM "swapper"
>  
>  /*
> @@ -210,6 +218,7 @@ extern struct task_group root_task_group;
>  	INIT_TRACE_RECURSION						\
>  	INIT_TASK_RCU_PREEMPT(tsk)					\
>  	INIT_CPUSET_SEQ							\
> +	INIT_VTIME(tsk)							\
>  }
>  
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 031afd0..727b988 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1360,6 +1360,15 @@ struct task_struct {
>  #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
>  	struct cputime prev_cputime;
>  #endif
> +#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
> +	seqlock_t vtime_seqlock;
> +	long prev_jiffies;
> +	enum {
> +		JIFFIES_SLEEPING = 0,
> +		JIFFIES_USER,
> +		JIFFIES_SYS,
> +	} prev_jiffies_whence;
> +#endif
>  	unsigned long nvcsw, nivcsw; /* context switch counts */
>  	struct timespec start_time; 		/* monotonic time */
>  	struct timespec real_start_time;	/* boot based time */
> @@ -1769,6 +1778,12 @@ static inline void put_task_struct(struct task_struct *t)
>  		__put_task_struct(t);
>  }
>  
> +#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
> +extern void task_cputime(struct task_struct *t,
> +			 cputime_t *utime, cputime_t *stime);
> +extern void task_cputime_scaled(struct task_struct *t,
> +				cputime_t *utimescaled, cputime_t *stimescaled);
> +#else
>  static inline void task_cputime(struct task_struct *t,
>  				cputime_t *utime, cputime_t *stime)
>  {
> @@ -1787,6 +1802,7 @@ static inline void task_cputime_scaled(struct task_struct *t,
>  	if (stimescaled)
>  		*stimescaled = t->stimescaled;
>  }
> +#endif
>  extern void task_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st);
>  extern void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st);
>  
> diff --git a/include/linux/vtime.h b/include/linux/vtime.h
> index e57020d..81c7d84 100644
> --- a/include/linux/vtime.h
> +++ b/include/linux/vtime.h
> @@ -9,52 +9,52 @@ extern void vtime_account_system(struct task_struct *tsk);
>  extern void vtime_account_system_irqsafe(struct task_struct *tsk);
>  extern void vtime_account_idle(struct task_struct *tsk);
>  extern void vtime_account_user(struct task_struct *tsk);
> -extern void vtime_account(struct task_struct *tsk);
> +extern void vtime_account_irq_enter(struct task_struct *tsk);
>  
> -#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
> -extern bool vtime_accounting(void);
> -#else
> +#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
>  static inline bool vtime_accounting(void) { return true; }
>  #endif
>  
>  #else /* !CONFIG_VIRT_CPU_ACCOUNTING */
> +
>  static inline void vtime_task_switch(struct task_struct *prev) { }
>  static inline void vtime_account_system(struct task_struct *tsk) { }
>  static inline void vtime_account_system_irqsafe(struct task_struct *tsk) { }
>  static inline void vtime_account_user(struct task_struct *tsk) { }
> -static inline void vtime_account(struct task_struct *tsk) { }
> +static inline void vtime_account_irq_enter(struct task_struct *tsk) { }
>  static inline bool vtime_accounting(void) { return false; }
>  #endif
>  
>  #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
> -static inline void arch_vtime_task_switch(struct task_struct *tsk) { }
> +extern void arch_vtime_task_switch(struct task_struct *tsk);
> +extern void vtime_account_irq_exit(struct task_struct *tsk);
> +extern void vtime_user_enter(struct task_struct *tsk);
> +extern bool vtime_accounting(void);
> +#else
> +static inline void vtime_account_irq_exit(struct task_struct *tsk)
> +{
> +	/* On hard|softirq exit we always account to hard|softirq cputime */
> +	vtime_account_system(tsk);
> +}
> +static inline void vtime_enter_user(struct task_struct *tsk) { }
>  #endif
>  
> +
>  #ifdef CONFIG_IRQ_TIME_ACCOUNTING
>  extern void irqtime_account_irq(struct task_struct *tsk);
>  #else
>  static inline void irqtime_account_irq(struct task_struct *tsk) { }
>  #endif
>  
> -static inline void vtime_account_irq_enter(struct task_struct *tsk)
> +static inline void account_irq_enter_time(struct task_struct *tsk)
>  {
> -	/*
> -	 * Hardirq can interrupt idle task anytime. So we need vtime_account()
> -	 * that performs the idle check in CONFIG_VIRT_CPU_ACCOUNTING.
> -	 * Softirq can also interrupt idle task directly if it calls
> -	 * local_bh_enable(). Such case probably don't exist but we never know.
> -	 * Ksoftirqd is not concerned because idle time is flushed on context
> -	 * switch. Softirqs in the end of hardirqs are also not a problem because
> -	 * the idle time is flushed on hardirq time already.
> -	 */
> -	vtime_account(tsk);
> +	vtime_account_irq_enter(tsk);
>  	irqtime_account_irq(tsk);
>  }
>  
> -static inline void vtime_account_irq_exit(struct task_struct *tsk)
> +static inline void account_irq_exit_time(struct task_struct *tsk)
>  {
> -	/* On hard|softirq exit we always account to hard|softirq cputime */
> -	vtime_account_system(tsk);
> +	vtime_account_irq_exit(tsk);
>  	irqtime_account_irq(tsk);
>  }
>  
> diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
> index ca1e073..bd2f2fc 100644
> --- a/kernel/context_tracking.c
> +++ b/kernel/context_tracking.c
> @@ -56,7 +56,7 @@ void user_enter(void)
>  	local_irq_save(flags);
>  	if (__this_cpu_read(context_tracking.active) &&
>  	    __this_cpu_read(context_tracking.state) != IN_USER) {
> -		vtime_account_system(current);
> +		vtime_user_enter(current);
>  		/*
>  		 * At this stage, only low level arch entry code remains and
>  		 * then we'll run in userspace. We can assume there won't be
> diff --git a/kernel/fork.c b/kernel/fork.c
> index a81efb8..efafcba 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1224,6 +1224,12 @@ static struct task_struct *copy_process(unsigned long clone_flags,
>  #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
>  	p->prev_cputime.utime = p->prev_cputime.stime = 0;
>  #endif
> +#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
> +	seqlock_init(&p->vtime_seqlock);
> +	p->prev_jiffies_whence = JIFFIES_SLEEPING; /*CHECKME: idle tasks? */
> +	p->prev_jiffies = jiffies;
> +#endif
> +
>  #if defined(SPLIT_RSS_COUNTING)
>  	memset(&p->rss_stat, 0, sizeof(p->rss_stat));
>  #endif
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index 0603671..3f25e60 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -484,7 +484,7 @@ void vtime_task_switch(struct task_struct *prev)
>   * vtime_account().
>   */
>  #ifndef __ARCH_HAS_VTIME_ACCOUNT
> -void vtime_account(struct task_struct *tsk)
> +void vtime_account_irq_enter(struct task_struct *tsk)
>  {
>  	if (!in_interrupt()) {
>  		/*
> @@ -505,7 +505,7 @@ void vtime_account(struct task_struct *tsk)
>  	}
>  	vtime_account_system(tsk);
>  }
> -EXPORT_SYMBOL_GPL(vtime_account);
> +EXPORT_SYMBOL_GPL(vtime_account_irq_enter);
>  #endif /* __ARCH_HAS_VTIME_ACCOUNT */
>  #endif /* CONFIG_VIRT_CPU_ACCOUNTING */
>  
> @@ -616,41 +616,67 @@ void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime
>  #endif /* !CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
>  
>  #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
> -static DEFINE_PER_CPU(long, last_jiffies) = INITIAL_JIFFIES;
> -
> -static cputime_t get_vtime_delta(void)
> +static cputime_t get_vtime_delta(struct task_struct *tsk)
>  {
>  	long delta;
>  
> -	delta = jiffies - __this_cpu_read(last_jiffies);
> -	__this_cpu_add(last_jiffies, delta);
> +	delta = jiffies - tsk->prev_jiffies;
> +	tsk->prev_jiffies += delta;
>  
>  	return jiffies_to_cputime(delta);
>  }
>  
> -void vtime_account_system(struct task_struct *tsk)
> +static void __vtime_account_system(struct task_struct *tsk)
>  {
> -	cputime_t delta_cpu = get_vtime_delta();
> +	cputime_t delta_cpu = get_vtime_delta(tsk);
>  
>  	account_system_time(tsk, irq_count(), delta_cpu, cputime_to_scaled(delta_cpu));
>  }
>  
> +void vtime_account_system(struct task_struct *tsk)
> +{
> +	write_seqlock(&tsk->vtime_seqlock);
> +	__vtime_account_system(tsk);
> +	write_sequnlock(&tsk->vtime_seqlock);
> +}
> +
> +void vtime_account_irq_exit(struct task_struct *tsk)
> +{
> +	write_seqlock(&tsk->vtime_seqlock);
> +	if (context_tracking_in_user())
> +		tsk->prev_jiffies_whence = JIFFIES_USER;
> +	__vtime_account_system(tsk);
> +	write_sequnlock(&tsk->vtime_seqlock);
> +}
> +
>  void vtime_account_user(struct task_struct *tsk)
>  {
> -	cputime_t delta_cpu = get_vtime_delta();
> +	cputime_t delta_cpu = get_vtime_delta(tsk);
>  
>  	/*
>  	 * This is an unfortunate hack: if we flush user time only on
>  	 * irq entry, we miss the jiffies update and the time is spuriously
>  	 * accounted to system time.
>  	 */
> -	if (context_tracking_in_user())
> +	if (context_tracking_in_user()) {
> +		write_seqlock(&tsk->vtime_seqlock);
> +		tsk->prev_jiffies_whence = JIFFIES_SYS;
>  		account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
> +		write_sequnlock(&tsk->vtime_seqlock);
> +	}
> +}
> +
> +void vtime_user_enter(struct task_struct *tsk)
> +{
> +	write_seqlock(&tsk->vtime_seqlock);
> +	tsk->prev_jiffies_whence = JIFFIES_USER;
> +	__vtime_account_system(tsk);
> +	write_sequnlock(&tsk->vtime_seqlock);
>  }
>  
>  void vtime_account_idle(struct task_struct *tsk)
>  {
> -	cputime_t delta_cpu = get_vtime_delta();
> +	cputime_t delta_cpu = get_vtime_delta(tsk);
>  
>  	account_idle_time(delta_cpu);
>  }
> @@ -660,31 +686,64 @@ bool vtime_accounting(void)
>  	return context_tracking_active();
>  }
>  
> -static int __cpuinit vtime_cpu_notify(struct notifier_block *self,
> -				      unsigned long action, void *hcpu)
> +void arch_vtime_task_switch(struct task_struct *prev)
>  {
> -	long cpu = (long)hcpu;
> -	long *last_jiffies_cpu = per_cpu_ptr(&last_jiffies, cpu);
> +	write_seqlock(&prev->vtime_seqlock);
> +	prev->prev_jiffies_whence = JIFFIES_SLEEPING;
> +	write_sequnlock(&prev->vtime_seqlock);
>  
> -	switch (action) {
> -	case CPU_UP_PREPARE:
> -	case CPU_UP_PREPARE_FROZEN:
> -		/*
> -		 * CHECKME: ensure that's visible by the CPU
> -		 * once it wakes up
> -		 */
> -		*last_jiffies_cpu = jiffies;
> -	default:
> -		break;
> -	}
> +	write_seqlock(&current->vtime_seqlock);
> +	current->prev_jiffies_whence = JIFFIES_SYS;
> +	current->prev_jiffies = jiffies;
> +	write_sequnlock(&current->vtime_seqlock);
> +}
> +
> +void task_cputime(struct task_struct *t, cputime_t *utime, cputime_t *stime)
> +{
> +	unsigned int seq;
> +	long delta;
> +
> +	do {
> +		seq = read_seqbegin(&t->vtime_seqlock);
> +
> +		*utime = t->utime;
> +		*stime = t->stime;
> +
> +		if (t->prev_jiffies_whence == JIFFIES_SLEEPING || 
> +		    is_idle_task(t))
> +			continue;
>  
> -	return NOTIFY_OK;
> +		delta = jiffies - t->prev_jiffies;
> +
> +		if (t->prev_jiffies_whence == JIFFIES_USER)
> +			*utime += delta;
> +		else if (t->prev_jiffies_whence == JIFFIES_SYS)
> +			*stime += delta;
> +	} while (read_seqretry(&t->vtime_seqlock, seq));
>  }
>  
> -static int __init init_vtime(void)
> +void task_cputime_scaled(struct task_struct *t,
> +			 cputime_t *utimescaled, cputime_t *stimescaled)
>  {
> -	cpu_notifier(vtime_cpu_notify, 0);
> -	return 0;
> +	unsigned int seq;
> +	long delta;
> +
> +	do {
> +		seq = read_seqbegin(&t->vtime_seqlock);
> +
> +		*utimescaled = t->utimescaled;
> +		*stimescaled = t->stimescaled;
> +
> +		if (t->prev_jiffies_whence == JIFFIES_SLEEPING || 
> +		    is_idle_task(t))
> +			continue;
> +
> +		delta = jiffies - t->prev_jiffies;
> +
> +		if (t->prev_jiffies_whence == JIFFIES_USER)
> +			*utimescaled += jiffies_to_scaled(delta);
> +		else if (t->prev_jiffies_whence == JIFFIES_SYS)
> +			*stimescaled += jiffies_to_scaled(delta);
> +	} while (read_seqretry(&t->vtime_seqlock, seq));
>  }
> -early_initcall(init_vtime);
>  #endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index ed567ba..f5cc25f 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -221,7 +221,7 @@ asmlinkage void __do_softirq(void)
>  	current->flags &= ~PF_MEMALLOC;
>  
>  	pending = local_softirq_pending();
> -	vtime_account_irq_enter(current);
> +	account_irq_enter_time(current);
>  
>  	__local_bh_disable((unsigned long)__builtin_return_address(0),
>  				SOFTIRQ_OFFSET);
> @@ -272,7 +272,7 @@ restart:
>  
>  	lockdep_softirq_exit();
>  
> -	vtime_account_irq_exit(current);
> +	account_irq_exit_time(current);
>  	__local_bh_enable(SOFTIRQ_OFFSET);
>  	tsk_restore_flags(current, old_flags, PF_MEMALLOC);
>  }
> @@ -341,7 +341,7 @@ static inline void invoke_softirq(void)
>   */
>  void irq_exit(void)
>  {
> -	vtime_account_irq_exit(current);
> +	account_irq_exit_time(current);
>  	trace_hardirq_exit();
>  	sub_preempt_count(IRQ_EXIT_OFFSET);
>  	if (!in_interrupt() && local_softirq_pending())




* Re: [PATCH 07/24] nohz: Assign timekeeping duty to a non-full-nohz CPU
  2012-12-20 18:32 ` [PATCH 07/24] nohz: Assign timekeeping duty to a non-full-nohz CPU Frederic Weisbecker
@ 2012-12-21 16:13   ` Steven Rostedt
  2012-12-22 16:39     ` Frederic Weisbecker
  0 siblings, 1 reply; 44+ messages in thread
From: Steven Rostedt @ 2012-12-21 16:13 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Alessio Igor Bogani, Andrew Morton, Avi Kivity,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Ingo Molnar, Paul E. McKenney, Paul Gortmaker,
	Peter Zijlstra, Thomas Gleixner

On Thu, 2012-12-20 at 19:32 +0100, Frederic Weisbecker wrote:

kernel/time/tick-sched.c:517:6: error: have_full_nohz_mask undeclared
(first use in this function)


> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -112,7 +112,8 @@ static void tick_sched_do_timer(ktime_t now)
>  	 * this duty, then the jiffies update is still serialized by
>  	 * jiffies_lock.
>  	 */
> -	if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE))
> +	if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)
> +	    && !tick_nohz_full_cpu(cpu))
>  		tick_do_timer_cpu = cpu;
>  #endif
>  
> @@ -512,6 +513,10 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
>  		return false;
>  	}
>  
> +	/* If there are full nohz CPUs around, we need to keep the timekeeping duty */
> +	if (have_full_nohz_mask && tick_do_timer_cpu == cpu)
> +		return false;
> +
>  	return true;
>  }
>  

Fold in:

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 4a68b50..00e5682 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -164,6 +164,8 @@ static int __init tick_nohz_full_setup(char *str)
 	return 1;
 }
 __setup("full_nohz=", tick_nohz_full_setup);
+#else
+#define have_full_nohz_mask 0
 #endif
 
 /*


-- Steve




* Re: [PATCH 07/24] nohz: Assign timekeeping duty to a non-full-nohz CPU
  2012-12-21 16:13   ` Steven Rostedt
@ 2012-12-22 16:39     ` Frederic Weisbecker
  2012-12-22 17:05       ` Steven Rostedt
  0 siblings, 1 reply; 44+ messages in thread
From: Frederic Weisbecker @ 2012-12-22 16:39 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Alessio Igor Bogani, Andrew Morton, Avi Kivity,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Ingo Molnar, Paul E. McKenney, Paul Gortmaker,
	Peter Zijlstra, Thomas Gleixner

2012/12/21 Steven Rostedt <rostedt@goodmis.org>:
> On Thu, 2012-12-20 at 19:32 +0100, Frederic Weisbecker wrote:
>
> kernel/time/tick-sched.c:517:6: error: have_full_nohz_mask undeclared
> (first use in this function)
>
>
>> --- a/kernel/time/tick-sched.c
>> +++ b/kernel/time/tick-sched.c
>> @@ -112,7 +112,8 @@ static void tick_sched_do_timer(ktime_t now)
>>        * this duty, then the jiffies update is still serialized by
>>        * jiffies_lock.
>>        */
>> -     if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE))
>> +     if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)
>> +         && !tick_nohz_full_cpu(cpu))
>>               tick_do_timer_cpu = cpu;
>>  #endif
>>
>> @@ -512,6 +513,10 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
>>               return false;
>>       }
>>
>> +     /* If there are full nohz CPUs around, we need to keep the timekeeping duty */
>> +     if (have_full_nohz_mask && tick_do_timer_cpu == cpu)
>> +             return false;
>> +
>>       return true;
>>  }
>>
>
> Fold in:
>
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 4a68b50..00e5682 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -164,6 +164,8 @@ static int __init tick_nohz_full_setup(char *str)
>         return 1;
>  }
>  __setup("full_nohz=", tick_nohz_full_setup);
> +#else
> +#define have_full_nohz_mask 0
>  #endif

Ah thanks! I'm folding your patch.
I can add your SOB, right?


* Re: [PATCH 07/24] nohz: Assign timekeeping duty to a non-full-nohz CPU
  2012-12-22 16:39     ` Frederic Weisbecker
@ 2012-12-22 17:05       ` Steven Rostedt
  0 siblings, 0 replies; 44+ messages in thread
From: Steven Rostedt @ 2012-12-22 17:05 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Alessio Igor Bogani, Andrew Morton, Avi Kivity,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Ingo Molnar, Paul E. McKenney, Paul Gortmaker,
	Peter Zijlstra, Thomas Gleixner

On Sat, 2012-12-22 at 17:39 +0100, Frederic Weisbecker wrote:

> Ah thanks! I'm folding your patch.
> I can add your SOB, right?

Sure, feel free to add my SOB on any of the changes I sent to you.

-- Steve




* Re: [PATCH 03/24] cputime: Allow dynamic switch between tick/virtual based cputime accounting
  2012-12-21 15:05   ` Steven Rostedt
@ 2012-12-22 17:43     ` Frederic Weisbecker
  0 siblings, 0 replies; 44+ messages in thread
From: Frederic Weisbecker @ 2012-12-22 17:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Alessio Igor Bogani, Andrew Morton, Avi Kivity,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Ingo Molnar, Paul E. McKenney, Paul Gortmaker,
	Peter Zijlstra, Thomas Gleixner

2012/12/21 Steven Rostedt <rostedt@goodmis.org>:
>>@@ -601,6 +612,7 @@ void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime
>>       thread_group_cputime(p, &cputime);
>>       cputime_adjust(&cputime, &p->signal->prev_cputime, ut, st);
>>  }
>> +#endif /* !CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
>
> Ah, the missing #endif gets added back here.

Right, I fixed the mid-state breakage for the next version.

Thanks for the report!


* Re: [PATCH 05/24] cputime: Safely read cputime of full dynticks CPUs
  2012-12-21 15:09   ` Steven Rostedt
@ 2012-12-22 17:51     ` Frederic Weisbecker
  0 siblings, 0 replies; 44+ messages in thread
From: Frederic Weisbecker @ 2012-12-22 17:51 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Alessio Igor Bogani, Andrew Morton, Avi Kivity,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Ingo Molnar, Paul E. McKenney, Paul Gortmaker,
	Peter Zijlstra, Thomas Gleixner

2012/12/21 Steven Rostedt <rostedt@goodmis.org>:
> On Thu, 2012-12-20 at 19:32 +0100, Frederic Weisbecker wrote:
>
>> --- a/include/linux/init_task.h
>> +++ b/include/linux/init_task.h
>> @@ -10,6 +10,7 @@
>>  #include <linux/pid_namespace.h>
>>  #include <linux/user_namespace.h>
>>  #include <linux/securebits.h>
>> +#include <linux/seqlock.h>
>>  #include <net/net_namespace.h>
>>
>>  #ifdef CONFIG_SMP
>> @@ -141,6 +142,13 @@ extern struct task_group root_task_group;
>>  # define INIT_PERF_EVENTS(tsk)
>>  #endif
>>
>> +#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
>> +#define INIT_VTIME(tsk)                                              \
>> +     .vtime_seqlock = __SEQLOCK_UNLOCKED(tsk.vtime_seqlock), \
>> +     .prev_jiffies = INITIAL_JIFFIES, /* CHECKME */          \
>> +     .prev_jiffies_whence = JIFFIES_SYS,
>
> #else
> # define INIT_VTIME(tsk)
> #endif
>
> Otherwise it fails to compile when CONFIG_VIRT_CPU_ACCOUNTING_GEN is not
> set.

Fixed for the next version, thanks!


* Re: [ANNOUNCE] 3.7-nohz1
  2012-12-21  2:35 ` [ANNOUNCE] 3.7-nohz1 Steven Rostedt
@ 2012-12-23 23:43   ` Frederic Weisbecker
  2012-12-30  3:56     ` Paul E. McKenney
  0 siblings, 1 reply; 44+ messages in thread
From: Frederic Weisbecker @ 2012-12-23 23:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Alessio Igor Bogani, Andrew Morton, Avi Kivity,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Ingo Molnar, Paul E. McKenney, Paul Gortmaker,
	Peter Zijlstra, Thomas Gleixner, Li Zhong

2012/12/21 Steven Rostedt <rostedt@goodmis.org>:
> On Thu, 2012-12-20 at 19:32 +0100, Frederic Weisbecker wrote:
>> Let's imagine you have 4 CPUs. We keep the CPU 0 to offline RCU callbacks there and to
>> handle the timekeeping. We set the rest as full dynticks. So you need the following kernel
>> parameters:
>>
>>       rcu_nocbs=1-3 full_nohz=1-3
>>
>> (Note rcu_nocbs value must always be the same as full_nohz).
>
> Why? You can't have: rcu_nocbs=1-4 full_nohz=1-3

That should be allowed.

>   or: rcu_nocbs=1-3 full_nohz=1-4 ?

But not that.

You need to have: rcu_nocbs & full_nohz == full_nohz. This is because
the tick is not there to maintain the local RCU callbacks anymore. So
this must be offloaded to the rcu_nocb threads.

I just have a doubt with rcu_nocb. Do we still need the tick to
complete the grace period for local rcu callbacks? I need to discuss
that with Paul.

>
> That needs to be fixed. Either with a warning, and/or to force the two
> to be the same. That is, if they specify:
>
>   rcu_nocbs=1-3 full_nohz=1-4
>
> Then set rcu_nocbs=1-4 with a warning about it. Or simply set
>  full_nohz=1-3.

Yep, will do.
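
Probably something along these lines, just to give the idea (a sketch
only -- the mask names full_nohz_mask and rcu_nocb_mask are assumptions
here, not necessarily what the tree ends up using):

	/*
	 * Warn and shrink full_nohz if it is not a subset of rcu_nocbs,
	 * ie. enforce rcu_nocbs & full_nohz == full_nohz at boot.
	 */
	static int __init full_nohz_check_rcu_nocbs(void)
	{
		if (!cpumask_subset(full_nohz_mask, rcu_nocb_mask)) {
			pr_warn("NOHZ: full_nohz must be a subset of rcu_nocbs, "
				"restricting full_nohz to the intersection\n");
			cpumask_and(full_nohz_mask, full_nohz_mask, rcu_nocb_mask);
		}
		return 0;
	}
	early_initcall(full_nohz_check_rcu_nocbs);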

Thanks!

>
> -- Steve
>
>>
>> Now if you want proper isolation you need to:
>>
>> * Migrate your processes adequately
>> * Migrate your irqs to CPU 0
>> * Migrate the RCU nocb threads to CPU 0. Example with the above configuration:
>>
>>       for p in $(ps -o pid= -C rcuo1,rcuo2,rcuo3)
>>       do
>>               taskset -cp 0 $p
>>       done
>>
>> Then run what you want on the full dynticks CPUs. For best results, run 1 task
>> per CPU, mostly in userspace and mostly CPU bound (otherwise more IO = more kernel
>> mode execution = more chances to get IPIs, tick restarted, workqueues, kthreads, etc...)
>>
>> This page contains a good reminder for those interested in CPU isolation: https://github.com/gby/linux/wiki
>>
>> But keep in mind that my tree is not yet ready for serious production.
>>
>
>


* Re: [PATCH 20/24] nohz: Full dynticks mode
  2012-12-20 18:33 ` [PATCH 20/24] nohz: Full dynticks mode Frederic Weisbecker
@ 2012-12-26  6:12   ` Namhyung Kim
  2012-12-26  7:02     ` Namhyung Kim
  2012-12-29 13:21     ` Frederic Weisbecker
  0 siblings, 2 replies; 44+ messages in thread
From: Namhyung Kim @ 2012-12-26  6:12 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Alessio Igor Bogani, Andrew Morton, Avi Kivity,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Ingo Molnar, Paul E. McKenney, Paul Gortmaker,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner

Hi Frederic,

On Thu, 20 Dec 2012 19:33:07 +0100, Frederic Weisbecker wrote:
> When a CPU is in full dynticks mode, try to switch
> it to nohz mode from the interrupt exit path if it is
> running a single non-idle task.
>
> Then restart the tick if necessary if we are enqueuing a
> second task while the timer is stopped, so that the scheduler
> tick is rearmed.
>
> [TODO: Check remaining things to be done from scheduler_tick()]
>
> [ Included build fix from Geoff Levand ]
[snip]
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index db3d4df..f3d8f4a 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -943,6 +943,14 @@ static inline u64 steal_ticks(u64 steal)
>  static inline void inc_nr_running(struct rq *rq)
>  {
>  	rq->nr_running++;
> +
> +	if (rq->nr_running == 2) {
> +		if (tick_nohz_full_cpu(rq->cpu)) {
> +			/* Order rq->nr_running write against the IPI */
> +			smp_wmb();
> +			smp_send_reschedule(rq->cpu);
> +		}
> +	}

This block should be guarded with #ifdef CONFIG_SMP, otherwise:

  CC      kernel/sched/core.o
In file included from /home/namhyung/project/linux/kernel/sched/core.c:85:0:
/home/namhyung/project/linux/kernel/sched/sched.h: In function ‘inc_nr_running’:
/home/namhyung/project/linux/kernel/sched/sched.h:960:28: error: ‘struct rq’ has no member named ‘cpu’
/home/namhyung/project/linux/kernel/sched/sched.h:963:26: error: ‘struct rq’ has no member named ‘cpu’


Thanks,
Namhyung

>  }


* Re: [PATCH 20/24] nohz: Full dynticks mode
  2012-12-26  6:12   ` Namhyung Kim
@ 2012-12-26  7:02     ` Namhyung Kim
  2012-12-29 13:21     ` Frederic Weisbecker
  1 sibling, 0 replies; 44+ messages in thread
From: Namhyung Kim @ 2012-12-26  7:02 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Alessio Igor Bogani, Andrew Morton, Avi Kivity,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Ingo Molnar, Paul E. McKenney, Paul Gortmaker,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Wed, 26 Dec 2012 15:12:22 +0900, Namhyung Kim wrote:
> Hi Frederic,
>
> On Thu, 20 Dec 2012 19:33:07 +0100, Frederic Weisbecker wrote:
>> When a CPU is in full dynticks mode, try to switch
>> it to nohz mode from the interrupt exit path if it is
>> running a single non-idle task.
>>
>> Then restart the tick if necessary if we are enqueuing a
>> second task while the timer is stopped, so that the scheduler
>> tick is rearmed.
>>
>> [TODO: Check remaining things to be done from scheduler_tick()]
>>
>> [ Included build fix from Geoff Levand ]
> [snip]
>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> index db3d4df..f3d8f4a 100644
>> --- a/kernel/sched/sched.h
>> +++ b/kernel/sched/sched.h
>> @@ -943,6 +943,14 @@ static inline u64 steal_ticks(u64 steal)
>>  static inline void inc_nr_running(struct rq *rq)
>>  {
>>  	rq->nr_running++;
>> +
>> +	if (rq->nr_running == 2) {
>> +		if (tick_nohz_full_cpu(rq->cpu)) {
>> +			/* Order rq->nr_running write against the IPI */
>> +			smp_wmb();
>> +			smp_send_reschedule(rq->cpu);
>> +		}
>> +	}
>
> This block should be guarded with #ifdef CONFIG_SMP, otherwise:

Or apply something like this:


diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 0cd913ed29de..8de13ca88a17 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -957,10 +957,10 @@ static inline void inc_nr_running(struct rq *rq)
 	rq->nr_running++;
 
 	if (rq->nr_running == 2) {
-		if (tick_nohz_full_cpu(rq->cpu)) {
+		if (tick_nohz_full_cpu(cpu_of(rq))) {
 			/* Order rq->nr_running write against the IPI */
 			smp_wmb();
-			smp_send_reschedule(rq->cpu);
+			smp_send_reschedule(cpu_of(rq));
 		}
 	}
 }
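
The rq->cpu access is the only thing cpu_of() changes here: it already
hides the SMP dependency, roughly like this (from kernel/sched/sched.h,
quoting from memory):

	static inline int cpu_of(struct rq *rq)
	{
	#ifdef CONFIG_SMP
		return rq->cpu;
	#else
		return 0;
	#endif
	}

so on UP the check collapses to tick_nohz_full_cpu(0). Whether
smp_send_reschedule() is even visible on UP builds is a separate
question, so an #ifdef or a config dependency may still be needed.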


Thanks,
Namhyung


* Re: [PATCH 02/24] cputime: Generic on-demand virtual cputime accounting
  2012-12-20 18:32 ` [PATCH 02/24] cputime: Generic on-demand virtual cputime accounting Frederic Weisbecker
  2012-12-21  5:11   ` Steven Rostedt
@ 2012-12-26  8:19   ` Li Zhong
  2012-12-29 13:15     ` Frederic Weisbecker
  1 sibling, 1 reply; 44+ messages in thread
From: Li Zhong @ 2012-12-26  8:19 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Alessio Igor Bogani, Andrew Morton, Avi Kivity,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Ingo Molnar, Paul E. McKenney, Paul Gortmaker,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Thu, 2012-12-20 at 19:32 +0100, Frederic Weisbecker wrote:
> If we want to stop the tick further than idle, we need to be
> able to account the cputime without using the tick.
> 
> Virtual based cputime accounting solves that problem by
> hooking into kernel/user boundaries.
> 
> However, implementing CONFIG_VIRT_CPU_ACCOUNTING requires
> setting low level hooks and involves more overhead. But
> we already have a generic context tracking subsystem
> that is required anyway, for RCU, by archs which want to
> shut down the tick outside idle.
> 
> This patch implements a generic virtual based cputime
> accounting that relies on these generic kernel/user hooks.
> 
> There are some upsides of doing this:
> 
> - This requires no arch code to implement CONFIG_VIRT_CPU_ACCOUNTING
> if context tracking is already built (already necessary for RCU in full
> tickless mode).
> 
> - We can rely on the generic context tracking subsystem to dynamically
> (de)activate the hooks, so that we can switch anytime between virtual
> and tick based accounting. This way we don't have the overhead
> of the virtual accounting when the tick is running periodically.
> 
> And a few downsides:
> 
> - It relies on jiffies and the hooks are set in high level code. This
> results in less precise cputime accounting than with a true native
> virtual based cputime accounting which hooks on low level code and uses
> a cpu hardware clock. Precision is not the goal of this though.
> 
> - There is probably more overhead than a native virtual based cputime
> accounting. But this relies on hooks that are already set anyway.
> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Alessio Igor Bogani <abogani@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Avi Kivity <avi@redhat.com>
> Cc: Chris Metcalf <cmetcalf@tilera.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Geoff Levand <geoff@infradead.org>
> Cc: Gilad Ben Yossef <gilad@benyossef.com>
> Cc: Hakan Akkan <hakanakkan@gmail.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> ---
>  include/linux/context_tracking.h |   28 +++++++++++
>  include/linux/vtime.h            |    4 ++
>  init/Kconfig                     |   11 ++++-
>  kernel/context_tracking.c        |   22 ++-------
>  kernel/sched/cputime.c           |   93 +++++++++++++++++++++++++++++++++++--
>  5 files changed, 135 insertions(+), 23 deletions(-)
> 
> diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
> index e24339c..9f33fbc 100644
> --- a/include/linux/context_tracking.h
> +++ b/include/linux/context_tracking.h
> @@ -3,12 +3,40 @@
> 
>  #ifdef CONFIG_CONTEXT_TRACKING
>  #include <linux/sched.h>
> +#include <linux/percpu.h>
> +
> +struct context_tracking {
> +	/*
> +	 * When active is false, hooks are unset in order
> +	 * to minimize overhead: TIF flags are cleared
> +	 * and calls to user_enter/exit are ignored. This
> +	 * may be further optimized using static keys.
> +	 */
> +	bool active;
> +	enum {
> +		IN_KERNEL = 0,
> +		IN_USER,
> +	} state;
> +};
> +
> +DECLARE_PER_CPU(struct context_tracking, context_tracking);
> +
> +static inline bool context_tracking_in_user(void)
> +{
> +	return __this_cpu_read(context_tracking.state) == IN_USER;
> +}
> +
> +static inline bool context_tracking_active(void)
> +{
> +	return __this_cpu_read(context_tracking.active);
> +}
> 
>  extern void user_enter(void);
>  extern void user_exit(void);
>  extern void context_tracking_task_switch(struct task_struct *prev,
>  					 struct task_struct *next);
>  #else
> +static inline bool context_tracking_in_user(void) { return false; }
>  static inline void user_enter(void) { }
>  static inline void user_exit(void) { }
>  static inline void context_tracking_task_switch(struct task_struct *prev,
> diff --git a/include/linux/vtime.h b/include/linux/vtime.h
> index ae30ab5..58392aa 100644
> --- a/include/linux/vtime.h
> +++ b/include/linux/vtime.h
> @@ -17,6 +17,10 @@ static inline void vtime_account_system_irqsafe(struct task_struct *tsk) { }
>  static inline void vtime_account(struct task_struct *tsk) { }
>  #endif
> 
> +#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
> +static inline void arch_vtime_task_switch(struct task_struct *tsk) { }
> +#endif
> +
>  #ifdef CONFIG_IRQ_TIME_ACCOUNTING
>  extern void irqtime_account_irq(struct task_struct *tsk);
>  #else
> diff --git a/init/Kconfig b/init/Kconfig
> index 60579d6..a64b3e8 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -340,7 +340,9 @@ config TICK_CPU_ACCOUNTING
> 
>  config VIRT_CPU_ACCOUNTING
>  	bool "Deterministic task and CPU time accounting"
> -	depends on HAVE_VIRT_CPU_ACCOUNTING
> +	depends on HAVE_VIRT_CPU_ACCOUNTING || HAVE_CONTEXT_TRACKING
> +	select VIRT_CPU_ACCOUNTING_GEN if !HAVE_VIRT_CPU_ACCOUNTING
> +	default y if PPC64

I saw 
"init/Kconfig:346:warning: defaults for choice values not supported"
on this line. So maybe we don't need it. And we already have 
        "default VIRT_CPU_ACCOUNTING if PPC64"

Thanks, Zhong

>  	help
>  	  Select this option to enable more accurate task and CPU time
>  	  accounting.  This is done by reading a CPU counter on each
> @@ -363,6 +365,13 @@ config IRQ_TIME_ACCOUNTING
> 
>  endchoice
> 
> +config VIRT_CPU_ACCOUNTING_GEN
> +	select CONTEXT_TRACKING
> +	bool
> +	help
> +	  Implement a generic virtual based cputime accounting by using
> +	  the context tracking subsystem.
> +
>  config BSD_PROCESS_ACCT
>  	bool "BSD Process Accounting"
>  	help
> diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
> index 9f6c38f..ca1e073 100644
> --- a/kernel/context_tracking.c
> +++ b/kernel/context_tracking.c
> @@ -17,24 +17,10 @@
>  #include <linux/context_tracking.h>
>  #include <linux/rcupdate.h>
>  #include <linux/sched.h>
> -#include <linux/percpu.h>
>  #include <linux/hardirq.h>
> 
> -struct context_tracking {
> -	/*
> -	 * When active is false, hooks are unset in order
> -	 * to minimize overhead: TIF flags are cleared
> -	 * and calls to user_enter/exit are ignored. This
> -	 * may be further optimized using static keys.
> -	 */
> -	bool active;
> -	enum {
> -		IN_KERNEL = 0,
> -		IN_USER,
> -	} state;
> -};
> 
> -static DEFINE_PER_CPU(struct context_tracking, context_tracking) = {
> +DEFINE_PER_CPU(struct context_tracking, context_tracking) = {
>  #ifdef CONFIG_CONTEXT_TRACKING_FORCE
>  	.active = true,
>  #endif
> @@ -70,7 +56,7 @@ void user_enter(void)
>  	local_irq_save(flags);
>  	if (__this_cpu_read(context_tracking.active) &&
>  	    __this_cpu_read(context_tracking.state) != IN_USER) {
> -		__this_cpu_write(context_tracking.state, IN_USER);
> +		vtime_account_system(current);
>  		/*
>  		 * At this stage, only low level arch entry code remains and
>  		 * then we'll run in userspace. We can assume there won't be
> @@ -79,6 +65,7 @@ void user_enter(void)
>  		 * on the tick.
>  		 */
>  		rcu_user_enter();
> +		__this_cpu_write(context_tracking.state, IN_USER);
>  	}
>  	local_irq_restore(flags);
>  }
> @@ -104,12 +91,13 @@ void user_exit(void)
> 
>  	local_irq_save(flags);
>  	if (__this_cpu_read(context_tracking.state) == IN_USER) {
> -		__this_cpu_write(context_tracking.state, IN_KERNEL);
>  		/*
>  		 * We are going to run code that may use RCU. Inform
>  		 * RCU core about that (ie: we may need the tick again).
>  		 */
>  		rcu_user_exit();
> +		vtime_account_user(current);
> +		__this_cpu_write(context_tracking.state, IN_KERNEL);
>  	}
>  	local_irq_restore(flags);
>  }
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index 293b202..da0a9e7 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -3,6 +3,7 @@
>  #include <linux/tsacct_kern.h>
>  #include <linux/kernel_stat.h>
>  #include <linux/static_key.h>
> +#include <linux/context_tracking.h>
>  #include "sched.h"
> 
> 
> @@ -495,10 +496,24 @@ void vtime_task_switch(struct task_struct *prev)
>  #ifndef __ARCH_HAS_VTIME_ACCOUNT
>  void vtime_account(struct task_struct *tsk)
>  {
> -	if (in_interrupt() || !is_idle_task(tsk))
> -		vtime_account_system(tsk);
> -	else
> -		vtime_account_idle(tsk);
> +	if (!in_interrupt()) {
> +		/*
> +		 * If we interrupted user, context_tracking_in_user()
> +		 * is 1 because the context tracking doesn't hook
> +		 * on irq entry/exit. This way we know if
> +		 * we need to flush user time on kernel entry.
> +		 */
> +		if (context_tracking_in_user()) {
> +			vtime_account_user(tsk);
> +			return;
> +		}
> +
> +		if (is_idle_task(tsk)) {
> +			vtime_account_idle(tsk);
> +			return;
> +		}
> +	}
> +	vtime_account_system(tsk);
>  }
>  EXPORT_SYMBOL_GPL(vtime_account);
>  #endif /* __ARCH_HAS_VTIME_ACCOUNT */
> @@ -586,4 +601,72 @@ void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime
>  	thread_group_cputime(p, &cputime);
>  	cputime_adjust(&cputime, &p->signal->prev_cputime, ut, st);
>  }
> -#endif
> +
> +#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
> +static DEFINE_PER_CPU(long, last_jiffies) = INITIAL_JIFFIES;
> +
> +static cputime_t get_vtime_delta(void)
> +{
> +	long delta;
> +
> +	delta = jiffies - __this_cpu_read(last_jiffies);
> +	__this_cpu_add(last_jiffies, delta);
> +
> +	return jiffies_to_cputime(delta);
> +}
> +
> +void vtime_account_system(struct task_struct *tsk)
> +{
> +	cputime_t delta_cpu = get_vtime_delta();
> +
> +	account_system_time(tsk, irq_count(), delta_cpu, cputime_to_scaled(delta_cpu));
> +}
> +
> +void vtime_account_user(struct task_struct *tsk)
> +{
> +	cputime_t delta_cpu = get_vtime_delta();
> +
> +	/*
> +	 * This is an unfortunate hack: if we flush user time only on
> +	 * irq entry, we miss the jiffies update and the time is spuriously
> +	 * accounted to system time.
> +	 */
> +	if (context_tracking_in_user())
> +		account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
> +}
> +
> +void vtime_account_idle(struct task_struct *tsk)
> +{
> +	cputime_t delta_cpu = get_vtime_delta();
> +
> +	account_idle_time(delta_cpu);
> +}
> +
> +static int __cpuinit vtime_cpu_notify(struct notifier_block *self,
> +				      unsigned long action, void *hcpu)
> +{
> +	long cpu = (long)hcpu;
> +	long *last_jiffies_cpu = per_cpu_ptr(&last_jiffies, cpu);
> +
> +	switch (action) {
> +	case CPU_UP_PREPARE:
> +	case CPU_UP_PREPARE_FROZEN:
> +		/*
> +		 * CHECKME: ensure that's visible by the CPU
> +		 * once it wakes up
> +		 */
> +		*last_jiffies_cpu = jiffies;
> +	default:
> +		break;
> +	}
> +
> +	return NOTIFY_OK;
> +}
> +
> +static int __init init_vtime(void)
> +{
> +	cpu_notifier(vtime_cpu_notify, 0);
> +	return 0;
> +}
> +early_initcall(init_vtime);
> +#endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */




* Re: [PATCH 02/24] cputime: Generic on-demand virtual cputime accounting
  2012-12-26  8:19   ` Li Zhong
@ 2012-12-29 13:15     ` Frederic Weisbecker
  0 siblings, 0 replies; 44+ messages in thread
From: Frederic Weisbecker @ 2012-12-29 13:15 UTC (permalink / raw)
  To: Li Zhong
  Cc: LKML, Alessio Igor Bogani, Andrew Morton, Chris Metcalf,
	Christoph Lameter, Geoff Levand, Gilad Ben Yossef, Hakan Akkan,
	Ingo Molnar, Paul E. McKenney, Paul Gortmaker, Peter Zijlstra,
	Steven Rostedt, Thomas Gleixner

2012/12/26 Li Zhong <zhong@linux.vnet.ibm.com>:
> On Thu, 2012-12-20 at 19:32 +0100, Frederic Weisbecker wrote:
>> diff --git a/init/Kconfig b/init/Kconfig
>> index 60579d6..a64b3e8 100644
>> --- a/init/Kconfig
>> +++ b/init/Kconfig
>> @@ -340,7 +340,9 @@ config TICK_CPU_ACCOUNTING
>>
>>  config VIRT_CPU_ACCOUNTING
>>       bool "Deterministic task and CPU time accounting"
>> -     depends on HAVE_VIRT_CPU_ACCOUNTING
>> +     depends on HAVE_VIRT_CPU_ACCOUNTING || HAVE_CONTEXT_TRACKING
>> +     select VIRT_CPU_ACCOUNTING_GEN if !HAVE_VIRT_CPU_ACCOUNTING
>> +     default y if PPC64
>
> I saw
> "init/Kconfig:346:warning: defaults for choice values not supported"
> on this line. So maybe we don't need it. And we already have
>         "default VIRT_CPU_ACCOUNTING if PPC64"
>
> Thanks, Zhong

Fixed for the next version, thanks!


* Re: [PATCH 20/24] nohz: Full dynticks mode
  2012-12-26  6:12   ` Namhyung Kim
  2012-12-26  7:02     ` Namhyung Kim
@ 2012-12-29 13:21     ` Frederic Weisbecker
  1 sibling, 0 replies; 44+ messages in thread
From: Frederic Weisbecker @ 2012-12-29 13:21 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: LKML, Alessio Igor Bogani, Andrew Morton, Avi Kivity,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Ingo Molnar, Paul E. McKenney, Paul Gortmaker,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner

2012/12/26 Namhyung Kim <namhyung@kernel.org>:
> Hi Frederic,
>
> On Thu, 20 Dec 2012 19:33:07 +0100, Frederic Weisbecker wrote:
>> When a CPU is in full dynticks mode, try to switch
>> it to nohz mode from the interrupt exit path if it is
>> running a single non-idle task.
>>
>> Then restart the tick if necessary if we are enqueuing a
>> second task while the timer is stopped, so that the scheduler
>> tick is rearmed.
>>
>> [TODO: Check remaining things to be done from scheduler_tick()]
>>
>> [ Included build fix from Geoff Levand ]
> [snip]
>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> index db3d4df..f3d8f4a 100644
>> --- a/kernel/sched/sched.h
>> +++ b/kernel/sched/sched.h
>> @@ -943,6 +943,14 @@ static inline u64 steal_ticks(u64 steal)
>>  static inline void inc_nr_running(struct rq *rq)
>>  {
>>       rq->nr_running++;
>> +
>> +     if (rq->nr_running == 2) {
>> +             if (tick_nohz_full_cpu(rq->cpu)) {
>> +                     /* Order rq->nr_running write against the IPI */
>> +                     smp_wmb();
>> +                     smp_send_reschedule(rq->cpu);
>> +             }
>> +     }
>
> This block should be guarded with #ifdef CONFIG_SMP, otherwise:
>
>   CC      kernel/sched/core.o
> In file included from /home/namhyung/project/linux/kernel/sched/core.c:85:0:
> /home/namhyung/project/linux/kernel/sched/sched.h: In function ‘inc_nr_running’:
> /home/namhyung/project/linux/kernel/sched/sched.h:960:28: error: ‘struct rq’ has no member named ‘cpu’
> /home/namhyung/project/linux/kernel/sched/sched.h:963:26: error: ‘struct rq’ has no member named ‘cpu’

Right, to fix this, I'm making it depend on CONFIG_NO_HZ_FULL and making
CONFIG_NO_HZ_FULL depend on SMP for the next version.
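
Roughly, the hunk would then become something like this (a sketch of
that plan, not the actual follow-up patch):

	static inline void inc_nr_running(struct rq *rq)
	{
		rq->nr_running++;

	#ifdef CONFIG_NO_HZ_FULL
		/* rq->cpu is safe here: NO_HZ_FULL will depend on SMP */
		if (rq->nr_running == 2 && tick_nohz_full_cpu(rq->cpu)) {
			/* Order rq->nr_running write against the IPI */
			smp_wmb();
			smp_send_reschedule(rq->cpu);
		}
	#endif
	}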

Thanks!


* Re: [ANNOUNCE] 3.7-nohz1
  2012-12-23 23:43   ` Frederic Weisbecker
@ 2012-12-30  3:56     ` Paul E. McKenney
  2013-01-04 23:42       ` Frederic Weisbecker
  0 siblings, 1 reply; 44+ messages in thread
From: Paul E. McKenney @ 2012-12-30  3:56 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Steven Rostedt, LKML, Alessio Igor Bogani, Andrew Morton,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Ingo Molnar, Paul Gortmaker,
	Peter Zijlstra, Thomas Gleixner, Li Zhong

On Mon, Dec 24, 2012 at 12:43:25AM +0100, Frederic Weisbecker wrote:
> 2012/12/21 Steven Rostedt <rostedt@goodmis.org>:
> > On Thu, 2012-12-20 at 19:32 +0100, Frederic Weisbecker wrote:
> >> Let's imagine you have 4 CPUs. We keep the CPU 0 to offline RCU callbacks there and to
> >> handle the timekeeping. We set the rest as full dynticks. So you need the following kernel
> >> parameters:
> >>
> >>       rcu_nocbs=1-3 full_nohz=1-3
> >>
> >> (Note rcu_nocbs value must always be the same as full_nohz).
> >
> > Why? You can't have: rcu_nocbs=1-4 full_nohz=1-3
> 
> That should be allowed.
> 
> >   or: rcu_nocbs=1-3 full_nohz=1-4 ?
> 
> But not that.
> 
> You need to have: rcu_nocbs & full_nohz == full_nohz. This is because
> the tick is not there to maintain the local RCU callbacks anymore. So
> this must be offloaded to the rcu_nocb threads.
> 
> I just have a doubt with rcu_nocb. Do we still need the tick to
> complete the grace period for local rcu callbacks? I need to discuss
> that with Paul.

The tick is only needed if rcu_needs_cpu() returns false.  Of course,
this means that if you don't invoke rcu_needs_cpu() before returning to
adaptive-idle usermode execution, you are correct that a full_nohz CPU
would also have to be a rcu_nocbs CPU.

That said, I am getting close to having an rcu_needs_cpu() that only
returns false if there are callbacks immediately ready to invoke, at
least if RCU_FAST_NO_HZ=y.
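
Concretely, whatever path stops the tick on a full_nohz CPU would keep
asking RCU first, much like the idle path already does.  A rough sketch
(the helper name is made up, and this assumes the current
rcu_needs_cpu(cpu, &delta) signature):

	static bool tick_nohz_full_can_stop_tick(int cpu)
	{
		unsigned long rcu_delta_jiffies;

		/* RCU still needs this CPU's tick to push callbacks along */
		if (rcu_needs_cpu(cpu, &rcu_delta_jiffies))
			return false;

		return true;
	}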

							Thanx, Paul

> > That needs to be fixed. Either with a warning, and/or to force the two
> > to be the same. That is, if they specify:
> >
> >   rcu_nocbs=1-3 full_nohz=1-4
> >
> > Then set rcu_nocbs=1-4 with a warning about it. Or simply set
> >  full_nohz=1-3.
> 
> Yep, will do.
> 
> Thanks!
> 
> >
> > -- Steve
> >
> >>
> >> Now if you want proper isolation you need to:
> >>
> >> * Migrate your processes adequately
> >> * Migrate your irqs to CPU 0
> >> * Migrate the RCU nocb threads to CPU 0. Example with the above configuration:
> >>
> >>       for p in $(ps -o pid= -C rcuo1,rcuo2,rcuo3)
> >>       do
> >>               taskset -cp 0 $p
> >>       done
> >>
> >> Then run what you want on the full dynticks CPUs. For best results, run 1 task
> >> per CPU, mostly in userspace and mostly CPU bound (otherwise more IO = more kernel
> >> mode execution = more chances to get IPIs, tick restarted, workqueues, kthreads, etc...)
> >>
> >> This page contains a good reminder for those interested in CPU isolation: https://github.com/gby/linux/wiki
> >>
> >> But keep in mind that my tree is not yet ready for serious production.
> >>
> >
> >
> 



* Re: [ANNOUNCE] 3.7-nohz1
  2012-12-30  3:56     ` Paul E. McKenney
@ 2013-01-04 23:42       ` Frederic Weisbecker
  2013-01-07 13:06         ` Paul E. McKenney
  0 siblings, 1 reply; 44+ messages in thread
From: Frederic Weisbecker @ 2013-01-04 23:42 UTC (permalink / raw)
  To: paulmck
  Cc: Steven Rostedt, LKML, Alessio Igor Bogani, Andrew Morton,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Ingo Molnar, Paul Gortmaker,
	Peter Zijlstra, Thomas Gleixner, Li Zhong

2012/12/30 Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> On Mon, Dec 24, 2012 at 12:43:25AM +0100, Frederic Weisbecker wrote:
>> 2012/12/21 Steven Rostedt <rostedt@goodmis.org>:
>> > On Thu, 2012-12-20 at 19:32 +0100, Frederic Weisbecker wrote:
>> >> Let's imagine you have 4 CPUs. We keep the CPU 0 to offline RCU callbacks there and to
>> >> handle the timekeeping. We set the rest as full dynticks. So you need the following kernel
>> >> parameters:
>> >>
>> >>       rcu_nocbs=1-3 full_nohz=1-3
>> >>
>> >> (Note rcu_nocbs value must always be the same as full_nohz).
>> >
>> > Why? You can't have: rcu_nocbs=1-4 full_nohz=1-3
>>
>> That should be allowed.
>>
>> >   or: rcu_nocbs=1-3 full_nohz=1-4 ?
>>
>> But not that.
>>
>> You need to have: rcu_nocbs & full_nohz == full_nohz. This is because
>> the tick is not there to maintain the local RCU callbacks anymore. So
>> this must be offloaded to the rcu_nocb threads.
>>
>> I just have a doubt with rcu_nocb. Do we still need the tick to
>> complete the grace period for local rcu callbacks? I need to discuss
>> that with Paul.
>
> The tick is only needed if rcu_needs_cpu() returns false.  Of course,
> this means that if you don't invoke rcu_needs_cpu() before returning to
> adaptive-idle usermode execution, you are correct that a full_nohz CPU
> would also have to be a rcu_nocbs CPU.
>
> That said, I am getting close to having an rcu_needs_cpu() that only
> returns false if there are callbacks immediately ready to invoke, at
> least if RCU_FAST_NO_HZ=y.

Ok. Also when a CPU enqueues a callback and starts a grace period, the
tick polls on the grace period completion. How is it handled with
rcu_nocbs CPUs? Does rcu_needs_cpu() return false until the grace
period is completed? If so I still need to restart the local tick
whenever a new callback is enqueued.

Thanks.


* Re: [ANNOUNCE] 3.7-nohz1
  2013-01-04 23:42       ` Frederic Weisbecker
@ 2013-01-07 13:06         ` Paul E. McKenney
  0 siblings, 0 replies; 44+ messages in thread
From: Paul E. McKenney @ 2013-01-07 13:06 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Steven Rostedt, LKML, Alessio Igor Bogani, Andrew Morton,
	Avi Kivity, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Ingo Molnar, Paul Gortmaker,
	Peter Zijlstra, Thomas Gleixner, Li Zhong

On Sat, Jan 05, 2013 at 12:42:53AM +0100, Frederic Weisbecker wrote:
> 2012/12/30 Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> > On Mon, Dec 24, 2012 at 12:43:25AM +0100, Frederic Weisbecker wrote:
> >> 2012/12/21 Steven Rostedt <rostedt@goodmis.org>:
> >> > On Thu, 2012-12-20 at 19:32 +0100, Frederic Weisbecker wrote:
> >> >> Let's imagine you have 4 CPUs. We keep the CPU 0 to offline RCU callbacks there and to
> >> >> handle the timekeeping. We set the rest as full dynticks. So you need the following kernel
> >> >> parameters:
> >> >>
> >> >>       rcu_nocbs=1-3 full_nohz=1-3
> >> >>
> >> >> (Note rcu_nocbs value must always be the same as full_nohz).
> >> >
> >> > Why? You can't have: rcu_nocbs=1-4 full_nohz=1-3
> >>
> >> That should be allowed.
> >>
> >> >   or: rcu_nocbs=1-3 full_nohz=1-4 ?
> >>
> >> But not that.
> >>
> >> You need to have: rcu_nocbs & full_nohz == full_nohz. This is because
> >> the tick is not there to maintain the local RCU callbacks anymore. So
> >> this must be offloaded to the rcu_nocb threads.
> >>
> >> I just have a doubt with rcu_nocb. Do we still need the tick to
> >> complete the grace period for local rcu callbacks? I need to discuss
> >> that with Paul.
> >
> > The tick is only needed if rcu_needs_cpu() returns false.  Of course,
> > this means that if you don't invoke rcu_needs_cpu() before returning to
> > adaptive-idle usermode execution, you are correct that a full_nohz CPU
> > would also have to be a rcu_nocbs CPU.
> >
> > That said, I am getting close to having an rcu_needs_cpu() that only
> > returns false if there are callbacks immediately ready to invoke, at
> > least if RCU_FAST_NO_HZ=y.
> 
> Ok. Also when a CPU enqueues a callback and starts a grace period, the
> tick polls on the grace period completion.

If RCU_FAST_NO_HZ=n, then yes, this is the case, but only for !rcu_nocbs
CPUs.

>                                            How is it handled with
> rcu_nocbs CPUs? Does rcu_needs_cpu() return false until the grace
> period is completed? If so I still need to restart the local tick
> whenever a new callback is enqueued.

Each rcu_nocbs CPU has a kthread, and that kthread is responsible for
making sure that any needed grace periods move forward.  In mainline, this
is done via CPU 0, which is required to be a !rcu_nocbs CPU.  In -rcu,
the no-CBs kthreads communicate with the grace-period kthread via the
rcu_node tree, so that if all CPUs are rcu_nocbs CPUs, rcu_needs_cpu()
will always return false, even if RCU_FAST_NO_HZ=n.

							Thanx, Paul



Thread overview: 44+ messages
2012-12-20 18:32 [ANNOUNCE] 3.7-nohz1 Frederic Weisbecker
2012-12-20 18:32 ` [PATCH 01/24] context_tracking: Add comments on interface and internals Frederic Weisbecker
2012-12-20 18:32 ` [PATCH 02/24] cputime: Generic on-demand virtual cputime accounting Frederic Weisbecker
2012-12-21  5:11   ` Steven Rostedt
2012-12-26  8:19   ` Li Zhong
2012-12-29 13:15     ` Frederic Weisbecker
2012-12-20 18:32 ` [PATCH 03/24] cputime: Allow dynamic switch between tick/virtual based " Frederic Weisbecker
2012-12-21 15:05   ` Steven Rostedt
2012-12-22 17:43     ` Frederic Weisbecker
2012-12-20 18:32 ` [PATCH 04/24] cputime: Use accessors to read task cputime stats Frederic Weisbecker
2012-12-20 18:32 ` [PATCH 05/24] cputime: Safely read cputime of full dynticks CPUs Frederic Weisbecker
2012-12-21 15:09   ` Steven Rostedt
2012-12-22 17:51     ` Frederic Weisbecker
2012-12-20 18:32 ` [PATCH 06/24] nohz: Basic full dynticks interface Frederic Weisbecker
2012-12-20 18:32 ` [PATCH 07/24] nohz: Assign timekeeping duty to a non-full-nohz CPU Frederic Weisbecker
2012-12-21 16:13   ` Steven Rostedt
2012-12-22 16:39     ` Frederic Weisbecker
2012-12-22 17:05       ` Steven Rostedt
2012-12-20 18:32 ` [PATCH 08/24] nohz: Trace timekeeping update Frederic Weisbecker
2012-12-20 18:32 ` [PATCH 09/24] nohz: Wake up full dynticks CPUs when a timer gets enqueued Frederic Weisbecker
2012-12-20 18:32 ` [PATCH 10/24] rcu: Restart the tick on non-responding full dynticks CPUs Frederic Weisbecker
2012-12-20 18:32 ` [PATCH 11/24] sched: Comment on rq->clock correctness in ttwu_do_wakeup() in nohz Frederic Weisbecker
2012-12-20 18:32 ` [PATCH 12/24] sched: Update rq clock on nohz CPU before migrating tasks Frederic Weisbecker
2012-12-20 18:33 ` [PATCH 13/24] sched: Update rq clock on nohz CPU before setting fair group shares Frederic Weisbecker
2012-12-20 18:33 ` [PATCH 14/24] sched: Update rq clock on tickless CPUs before calling check_preempt_curr() Frederic Weisbecker
2012-12-20 18:33 ` [PATCH 15/24] sched: Update rq clock earlier in unthrottle_cfs_rq Frederic Weisbecker
2012-12-20 18:33 ` [PATCH 16/24] sched: Update clock of nohz busiest rq before balancing Frederic Weisbecker
2012-12-20 18:33 ` [PATCH 17/24] sched: Update rq clock before idle balancing Frederic Weisbecker
2012-12-20 18:33 ` [PATCH 18/24] sched: Update nohz rq clock before searching busiest group on load balancing Frederic Weisbecker
2012-12-20 18:33 ` [PATCH 19/24] nohz: Move nohz load balancer selection into idle logic Frederic Weisbecker
2012-12-20 18:33 ` [PATCH 20/24] nohz: Full dynticks mode Frederic Weisbecker
2012-12-26  6:12   ` Namhyung Kim
2012-12-26  7:02     ` Namhyung Kim
2012-12-29 13:21     ` Frederic Weisbecker
2012-12-20 18:33 ` [PATCH 21/24] nohz: Only stop the tick on RCU nocb CPUs Frederic Weisbecker
2012-12-20 18:33 ` [PATCH 22/24] nohz: Don't turn off the tick if rcu needs it Frederic Weisbecker
2012-12-20 18:33 ` [PATCH 23/24] nohz: Don't stop the tick if posix cpu timers are running Frederic Weisbecker
2012-12-20 18:33 ` [PATCH 24/24] nohz: Add some tracing Frederic Weisbecker
2012-12-21  2:35 ` [ANNOUNCE] 3.7-nohz1 Steven Rostedt
2012-12-23 23:43   ` Frederic Weisbecker
2012-12-30  3:56     ` Paul E. McKenney
2013-01-04 23:42       ` Frederic Weisbecker
2013-01-07 13:06         ` Paul E. McKenney
2012-12-21  5:20 ` Hakan Akkan
