* [GIT PULL] nohz patches for 3.12 preview v3
@ 2013-08-01  0:31 Frederic Weisbecker
  2013-08-01  0:31 ` [PATCH 01/23] sched: Consolidate open coded preemptible() checks Frederic Weisbecker
                   ` (23 more replies)
  0 siblings, 24 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2013-08-01  0:31 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Mike Galbraith, Kevin Hilman, Martin Schwidefsky,
	Heiko Carstens, Geert Uytterhoeven, Alex Shi, Paul Turner,
	Vincent Guittot

Hi,

None of the patches from the previous v2 posting have changed.
I've just added two more in order to fix build failures reported
by Wu Fengguang:

      hardirq: Split preempt count mask definitions
      m68k: hardirq_count() only needs preempt_mask.h

If no comments arise, I'll send a pull request to Ingo in a few days.

Thanks.

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	timers/nohz-3.12-preview-v3
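
For convenience, a fetch example (assuming you run it from an
existing kernel tree):

	git fetch git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git timers/nohz-3.12-preview-v3
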
---

Frederic Weisbecker (23):
      sched: Consolidate open coded preemptible() checks
      context_tracking: Fix guest accounting with native vtime
      vtime: Update a few comments
      context_tracking: Fix runtime CPU off-case
      nohz: Only enable context tracking on full dynticks CPUs
      context_tracking: Remove full dynticks' hacky dependency on wide context tracking
      context_tracking: Ground setup for static key use
      context_tracking: Optimize main APIs off case with static key
      context_tracking: Optimize guest APIs off case with static key
      context_tracking: Optimize context switch off case with static keys
      context_tracking: User/kernel boundary cross trace events
      vtime: Remove a few unneeded generic vtime state checks
      vtime: Fix racy cputime delta update
      context_tracking: Split low level state headers
      hardirq: Split preempt count mask definitions
      m68k: hardirq_count() only needs preempt_mask.h
      vtime: Describe overridden functions in dedicated arch headers
      vtime: Optimize full dynticks accounting off case with static keys
      vtime: Always scale generic vtime accounting results
      vtime: Always debug check snapshot source _before_ updating it
      nohz: Rename a few state variables
      nohz: Optimize full dynticks state checks with static keys
      nohz: Optimize full dynticks's sched hooks with static keys


 arch/ia64/include/asm/Kbuild            |    1 +
 arch/m68k/include/asm/irqflags.h        |    2 +-
 arch/powerpc/include/asm/Kbuild         |    1 +
 arch/s390/include/asm/cputime.h         |    3 -
 arch/s390/include/asm/vtime.h           |    7 ++
 arch/s390/kernel/vtime.c                |    1 +
 include/linux/context_tracking.h        |  120 +++++++++++++++--------------
 include/linux/context_tracking_state.h  |   39 +++++++++
 include/linux/hardirq.h                 |  117 +----------------------------
 include/linux/preempt_mask.h            |  122 +++++++++++++++++++++++++++++
 include/linux/tick.h                    |   45 +++++++++--
 include/linux/vtime.h                   |   74 ++++++++++++++++--
 include/trace/events/context_tracking.h |   58 ++++++++++++++
 init/Kconfig                            |   28 +++++--
 kernel/context_tracking.c               |  128 ++++++++++++++++++-------------
 kernel/sched/core.c                     |    4 +-
 kernel/sched/cputime.c                  |   53 ++++---------
 kernel/time/Kconfig                     |    1 -
 kernel/time/tick-sched.c                |   56 ++++++--------
 19 files changed, 534 insertions(+), 326 deletions(-)


* [PATCH 01/23] sched: Consolidate open coded preemptible() checks
  2013-08-01  0:31 [GIT PULL] nohz patches for 3.12 preview v3 Frederic Weisbecker
@ 2013-08-01  0:31 ` Frederic Weisbecker
  2013-08-01  0:31 ` [PATCH 02/23] context_tracking: Fix guest accounting with native vtime Frederic Weisbecker
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2013-08-01  0:31 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Ingo Molnar, Li Zhong, Paul E. McKenney,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner, Borislav Petkov,
	Alex Shi, Paul Turner, Mike Galbraith, Vincent Guittot

preempt_schedule() and preempt_schedule_context() open
code their preemptibility checks.

Use the standard API instead for consolidation.
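
For reference, a simplified sketch of what the standard helper
expands to (based on the preemptible() definition in the
hardirq/preempt headers of that time, under CONFIG_PREEMPT_COUNT;
simplified, not taken from this patch):

	/* true when nothing prevents an immediate preemption */
	#define preemptible()	(preempt_count() == 0 && !irqs_disabled())

So the open coded "ti->preempt_count || irqs_disabled()" test is
exactly !preemptible().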

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Alex Shi <alex.shi@intel.com>
Cc: Paul Turner <pjt@google.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/context_tracking.c |    3 +--
 kernel/sched/core.c       |    4 +---
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 383f823..942835c 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -87,10 +87,9 @@ void user_enter(void)
  */
 void __sched notrace preempt_schedule_context(void)
 {
-	struct thread_info *ti = current_thread_info();
 	enum ctx_state prev_ctx;
 
-	if (likely(ti->preempt_count || irqs_disabled()))
+	if (likely(!preemptible()))
 		return;
 
 	/*
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b7c32cb..3fb7ace 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2510,13 +2510,11 @@ void __sched schedule_preempt_disabled(void)
  */
 asmlinkage void __sched notrace preempt_schedule(void)
 {
-	struct thread_info *ti = current_thread_info();
-
 	/*
 	 * If there is a non-zero preempt_count or interrupts are disabled,
 	 * we do not want to preempt the current task. Just return..
 	 */
-	if (likely(ti->preempt_count || irqs_disabled()))
+	if (likely(!preemptible()))
 		return;
 
 	do {
-- 
1.7.5.4



* [PATCH 02/23] context_tracking: Fix guest accounting with native vtime
  2013-08-01  0:31 [GIT PULL] nohz patches for 3.12 preview v3 Frederic Weisbecker
  2013-08-01  0:31 ` [PATCH 01/23] sched: Consolidate open coded preemptible() checks Frederic Weisbecker
@ 2013-08-01  0:31 ` Frederic Weisbecker
  2013-08-01  0:31 ` [PATCH 03/23] vtime: Update a few comments Frederic Weisbecker
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2013-08-01  0:31 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Mike Galbraith, Kevin Hilman

1) If context tracking is enabled together with native vtime accounting
(a combination that is only useful for development testing), we call
vtime_guest_enter() and vtime_guest_exit() on host <-> guest switches.
But those are stubs in this configuration, so cputime is not correctly
flushed on KVM context switches.

2) If context tracking runs but is disabled on some CPUs, those
CPUs end up calling __guest_enter()/__guest_exit(), which in turn
call vtime_account_system(). We don't want that because those CPUs
use tick based accounting.

Refactor the guest_enter()/guest_exit() code so that all combinations
finally work.
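
To summarize the intent, a rough pseudo-code sketch of the resulting
guest_enter() behavior per configuration (paraphrased from the diff
below, not verbatim):

	if (IS_ENABLED(CONFIG_VIRT_CPU_ACCOUNTING_GEN)) {
		/* out of line version */
		if (vtime_accounting_enabled())
			vtime_guest_enter(current);	/* full dynticks CPU */
		else
			current->flags |= PF_VCPU;	/* tick based CPU */
	} else {
		/* inline version: native vtime flush, or a stub on tick */
		vtime_account_system(current);
		current->flags |= PF_VCPU;
	}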

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
---
 include/linux/context_tracking.h |   52 ++++++++++++++++----------------------
 kernel/context_tracking.c        |    6 +++-
 2 files changed, 26 insertions(+), 32 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index fc09d7b..5984f25 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -20,25 +20,6 @@ struct context_tracking {
 	} state;
 };
 
-static inline void __guest_enter(void)
-{
-	/*
-	 * This is running in ioctl context so we can avoid
-	 * the call to vtime_account() with its unnecessary idle check.
-	 */
-	vtime_account_system(current);
-	current->flags |= PF_VCPU;
-}
-
-static inline void __guest_exit(void)
-{
-	/*
-	 * This is running in ioctl context so we can avoid
-	 * the call to vtime_account() with its unnecessary idle check.
-	 */
-	vtime_account_system(current);
-	current->flags &= ~PF_VCPU;
-}
 
 #ifdef CONFIG_CONTEXT_TRACKING
 DECLARE_PER_CPU(struct context_tracking, context_tracking);
@@ -56,9 +37,6 @@ static inline bool context_tracking_active(void)
 extern void user_enter(void);
 extern void user_exit(void);
 
-extern void guest_enter(void);
-extern void guest_exit(void);
-
 static inline enum ctx_state exception_enter(void)
 {
 	enum ctx_state prev_ctx;
@@ -81,21 +59,35 @@ extern void context_tracking_task_switch(struct task_struct *prev,
 static inline bool context_tracking_in_user(void) { return false; }
 static inline void user_enter(void) { }
 static inline void user_exit(void) { }
+static inline enum ctx_state exception_enter(void) { return 0; }
+static inline void exception_exit(enum ctx_state prev_ctx) { }
+static inline void context_tracking_task_switch(struct task_struct *prev,
+						struct task_struct *next) { }
+#endif /* !CONFIG_CONTEXT_TRACKING */
 
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+extern void guest_enter(void);
+extern void guest_exit(void);
+#else
 static inline void guest_enter(void)
 {
-	__guest_enter();
+	/*
+	 * This is running in ioctl context so we can avoid
+	 * the call to vtime_account() with its unnecessary idle check.
+	 */
+	vtime_account_system(current);
+	current->flags |= PF_VCPU;
 }
 
 static inline void guest_exit(void)
 {
-	__guest_exit();
+	/*
+	 * This is running in ioctl context so we can avoid
+	 * the call to vtime_account() with its unnecessary idle check.
+	 */
+	vtime_account_system(current);
+	current->flags &= ~PF_VCPU;
 }
-
-static inline enum ctx_state exception_enter(void) { return 0; }
-static inline void exception_exit(enum ctx_state prev_ctx) { }
-static inline void context_tracking_task_switch(struct task_struct *prev,
-						struct task_struct *next) { }
-#endif /* !CONFIG_CONTEXT_TRACKING */
+#endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */
 
 #endif
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 942835c..1f47119 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -141,12 +141,13 @@ void user_exit(void)
 	local_irq_restore(flags);
 }
 
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
 void guest_enter(void)
 {
 	if (vtime_accounting_enabled())
 		vtime_guest_enter(current);
 	else
-		__guest_enter();
+		current->flags |= PF_VCPU;
 }
 EXPORT_SYMBOL_GPL(guest_enter);
 
@@ -155,9 +156,10 @@ void guest_exit(void)
 	if (vtime_accounting_enabled())
 		vtime_guest_exit(current);
 	else
-		__guest_exit();
+		current->flags &= ~PF_VCPU;
 }
 EXPORT_SYMBOL_GPL(guest_exit);
+#endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */
 
 
 /**
-- 
1.7.5.4



* [PATCH 03/23] vtime: Update a few comments
  2013-08-01  0:31 [GIT PULL] nohz patches for 3.12 preview v3 Frederic Weisbecker
  2013-08-01  0:31 ` [PATCH 01/23] sched: Consolidate open coded preemptible() checks Frederic Weisbecker
  2013-08-01  0:31 ` [PATCH 02/23] context_tracking: Fix guest accounting with native vtime Frederic Weisbecker
@ 2013-08-01  0:31 ` Frederic Weisbecker
  2013-08-01  0:31 ` [PATCH 04/23] context_tracking: Fix runtime CPU off-case Frederic Weisbecker
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2013-08-01  0:31 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Mike Galbraith, Kevin Hilman

Update a stale comment from the old vtime era and document some
locking that might be non-obvious.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
---
 include/linux/context_tracking.h |   10 ++++------
 kernel/sched/cputime.c           |    7 +++++++
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 5984f25..d883ff0 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -72,8 +72,9 @@ extern void guest_exit(void);
 static inline void guest_enter(void)
 {
 	/*
-	 * This is running in ioctl context so we can avoid
-	 * the call to vtime_account() with its unnecessary idle check.
+	 * This is running in ioctl context so its safe
+	 * to assume that it's the stime pending cputime
+	 * to flush.
 	 */
 	vtime_account_system(current);
 	current->flags |= PF_VCPU;
@@ -81,10 +82,7 @@ static inline void guest_enter(void)
 
 static inline void guest_exit(void)
 {
-	/*
-	 * This is running in ioctl context so we can avoid
-	 * the call to vtime_account() with its unnecessary idle check.
-	 */
+	/* Flush the guest cputime we spent on the guest */
 	vtime_account_system(current);
 	current->flags &= ~PF_VCPU;
 }
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index a7959e0..223a35e 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -712,6 +712,13 @@ void vtime_user_enter(struct task_struct *tsk)
 
 void vtime_guest_enter(struct task_struct *tsk)
 {
+	/*
+	 * The flags must be updated under the lock with
+	 * the vtime_snap flush and update.
+	 * That enforces a right ordering and update sequence
+	 * synchronization against the reader (task_gtime())
+	 * that can thus safely catch up with a tickless delta.
+	 */
 	write_seqlock(&tsk->vtime_seqlock);
 	__vtime_account_system(tsk);
 	current->flags |= PF_VCPU;
-- 
1.7.5.4



* [PATCH 04/23] context_tracking: Fix runtime CPU off-case
  2013-08-01  0:31 [GIT PULL] nohz patches for 3.12 preview v3 Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2013-08-01  0:31 ` [PATCH 03/23] vtime: Update a few comments Frederic Weisbecker
@ 2013-08-01  0:31 ` Frederic Weisbecker
  2013-08-01  0:31 ` [PATCH 05/23] nohz: Only enable context tracking on full dynticks CPUs Frederic Weisbecker
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2013-08-01  0:31 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Mike Galbraith, Kevin Hilman

As long as context tracking is enabled on any CPU, even
a single one, all other CPUs need to keep track of their
user <-> kernel boundary crossings as well.

This is because a task can sleep while servicing an exception
that happened in the kernel or in userspace. When the task
eventually wakes up and returns from the exception, the CPU needs
to know whether we resume in userspace or in the kernel. exception_exit()
gets this information from exception_enter(), which saved the previous
state.

If the CPU where the exception happened didn't keep track of
this information, exception_exit() doesn't know which tracking
state to restore on the CPU where the task got migrated,
and we may return to userspace with the context tracking
subsystem thinking that we are in kernel mode.

This could be fixed in the long term by moving the context tracking
probes to the very low level arch fast path at the user <-> kernel
boundary, although even that is worrisome, as an exception can still
happen in the few instructions between the probe and the actual iret.

Also, we are not yet ready to set these probes in the fast path given
the potential overhead problem this induces.

So let's fix this by always enabling context tracking, even on CPUs
that are not in the full dynticks range. OTOH we can spare the
rcu_user_*() and vtime_user_*() calls there because the tick runs
on those CPUs, and we can drive the RCU state machine and cputime
accounting through it.
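
To illustrate, a hedged sketch of the migration scenario described
above (pseudo call sequence, not code from this patch):

	/* task runs on CPU 0, which is outside the full dynticks range */
	prev_state = exception_enter();	/* must still record IN_USER */
	/* handler sleeps; the task later wakes up migrated to CPU 1 */
	exception_exit(prev_state);	/* CPU 1 relies on CPU 0's record */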

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
---
 kernel/context_tracking.c |   52 ++++++++++++++++++++++++++++----------------
 1 files changed, 33 insertions(+), 19 deletions(-)

diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 1f47119..7b095de 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -54,17 +54,31 @@ void user_enter(void)
 	WARN_ON_ONCE(!current->mm);
 
 	local_irq_save(flags);
-	if (__this_cpu_read(context_tracking.active) &&
-	    __this_cpu_read(context_tracking.state) != IN_USER) {
+	if ( __this_cpu_read(context_tracking.state) != IN_USER) {
+		if (__this_cpu_read(context_tracking.active)) {
+			/*
+			 * At this stage, only low level arch entry code remains and
+			 * then we'll run in userspace. We can assume there won't be
+			 * any RCU read-side critical section until the next call to
+			 * user_exit() or rcu_irq_enter(). Let's remove RCU's dependency
+			 * on the tick.
+			 */
+			vtime_user_enter(current);
+			rcu_user_enter();
+		}
 		/*
-		 * At this stage, only low level arch entry code remains and
-		 * then we'll run in userspace. We can assume there won't be
-		 * any RCU read-side critical section until the next call to
-		 * user_exit() or rcu_irq_enter(). Let's remove RCU's dependency
-		 * on the tick.
+		 * Even if context tracking is disabled on this CPU, because it's outside
+		 * the full dynticks mask for example, we still have to keep track of the
+		 * context transitions and states to prevent inconsistency on those of
+		 * other CPUs.
+		 * If a task triggers an exception in userspace, sleep on the exception
+		 * handler and then migrate to another CPU, that new CPU must know where
+		 * the exception returns by the time we call exception_exit().
+		 * This information can only be provided by the previous CPU when it called
+		 * exception_enter().
+		 * OTOH we can spare the calls to vtime and RCU when context_tracking.active
+		 * is false because we know that CPU is not tickless.
 		 */
-		vtime_user_enter(current);
-		rcu_user_enter();
 		__this_cpu_write(context_tracking.state, IN_USER);
 	}
 	local_irq_restore(flags);
@@ -130,12 +144,14 @@ void user_exit(void)
 
 	local_irq_save(flags);
 	if (__this_cpu_read(context_tracking.state) == IN_USER) {
-		/*
-		 * We are going to run code that may use RCU. Inform
-		 * RCU core about that (ie: we may need the tick again).
-		 */
-		rcu_user_exit();
-		vtime_user_exit(current);
+		if (__this_cpu_read(context_tracking.active)) {
+			/*
+			 * We are going to run code that may use RCU. Inform
+			 * RCU core about that (ie: we may need the tick again).
+			 */
+			rcu_user_exit();
+			vtime_user_exit(current);
+		}
 		__this_cpu_write(context_tracking.state, IN_KERNEL);
 	}
 	local_irq_restore(flags);
@@ -178,8 +194,6 @@ EXPORT_SYMBOL_GPL(guest_exit);
 void context_tracking_task_switch(struct task_struct *prev,
 			     struct task_struct *next)
 {
-	if (__this_cpu_read(context_tracking.active)) {
-		clear_tsk_thread_flag(prev, TIF_NOHZ);
-		set_tsk_thread_flag(next, TIF_NOHZ);
-	}
+	clear_tsk_thread_flag(prev, TIF_NOHZ);
+	set_tsk_thread_flag(next, TIF_NOHZ);
 }
-- 
1.7.5.4



* [PATCH 05/23] nohz: Only enable context tracking on full dynticks CPUs
  2013-08-01  0:31 [GIT PULL] nohz patches for 3.12 preview v3 Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2013-08-01  0:31 ` [PATCH 04/23] context_tracking: Fix runtime CPU off-case Frederic Weisbecker
@ 2013-08-01  0:31 ` Frederic Weisbecker
  2013-08-01  0:31 ` [PATCH 06/23] context_tracking: Remove full dynticks' hacky dependency on wide context tracking Frederic Weisbecker
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2013-08-01  0:31 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Mike Galbraith, Kevin Hilman

The context tracking subsystem has the ability to selectively
enable tracking on any defined subset of CPUs. This means that
we can define a CPU range that doesn't run the context tracking
and another range that does.

Now what we want in practice is to enable the tracking on full
dynticks CPUs only. In order to do this, we just need to pass
our full dynticks CPU range selection from the full dynticks
subsystem to the context tracking subsystem.

This way we can spare the overhead of RCU user extended quiescent
states and vtime maintenance on the CPUs that are outside the
full dynticks range. Just keep in mind that the raw context tracking
itself is still necessary everywhere.
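
As a usage example, booting with:

	nohz_full=1-7

should now enable the tracking on CPUs 1-7 only; CPU 0 keeps the tick
and is spared the RCU user extended quiescent state and vtime work,
while still performing the raw state tracking described in the
previous patch.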

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
---
 include/linux/context_tracking.h |    2 ++
 kernel/context_tracking.c        |    5 +++++
 kernel/time/tick-sched.c         |    4 ++++
 3 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index d883ff0..1ae37c7 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -34,6 +34,8 @@ static inline bool context_tracking_active(void)
 	return __this_cpu_read(context_tracking.active);
 }
 
+extern void context_tracking_cpu_set(int cpu);
+
 extern void user_enter(void);
 extern void user_exit(void);
 
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 7b095de..72bcb25 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -26,6 +26,11 @@ DEFINE_PER_CPU(struct context_tracking, context_tracking) = {
 #endif
 };
 
+void context_tracking_cpu_set(int cpu)
+{
+	per_cpu(context_tracking.active, cpu) = true;
+}
+
 /**
  * user_enter - Inform the context tracking that the CPU is going to
  *              enter userspace mode.
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index e80183f..6d604fd 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -23,6 +23,7 @@
 #include <linux/irq_work.h>
 #include <linux/posix-timers.h>
 #include <linux/perf_event.h>
+#include <linux/context_tracking.h>
 
 #include <asm/irq_regs.h>
 
@@ -350,6 +351,9 @@ void __init tick_nohz_init(void)
 			return;
 	}
 
+	for_each_cpu(cpu, nohz_full_mask)
+		context_tracking_cpu_set(cpu);
+
 	cpu_notifier(tick_nohz_cpu_down_callback, 0);
 	cpulist_scnprintf(nohz_full_buf, sizeof(nohz_full_buf), nohz_full_mask);
 	pr_info("NO_HZ: Full dynticks CPUs: %s.\n", nohz_full_buf);
-- 
1.7.5.4



* [PATCH 06/23] context_tracking: Remove full dynticks' hacky dependency on wide context tracking
  2013-08-01  0:31 [GIT PULL] nohz patches for 3.12 preview v3 Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2013-08-01  0:31 ` [PATCH 05/23] nohz: Only enable context tracking on full dynticks CPUs Frederic Weisbecker
@ 2013-08-01  0:31 ` Frederic Weisbecker
  2013-08-01  0:31 ` [PATCH 07/23] context_tracking: Ground setup for static key use Frederic Weisbecker
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2013-08-01  0:31 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Mike Galbraith, Kevin Hilman

Now that the full dynticks subsystem only enables context tracking
on full dynticks CPUs, let's remove the dependency on CONTEXT_TRACKING_FORCE.

This dependency was a hack to enable context tracking widely for the
full dynticks subsystem until the latter became able to enable it in a
more CPU-finegrained fashion.

Now CONTEXT_TRACKING_FORCE only stands for testing on archs that
are working on context tracking support while full dynticks can't be
used yet due to unmet dependencies. It simulates a system where all CPUs
are full dynticks so that RCU user extended quiescent states and dynticks
cputime accounting can be tested on the given arch.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
---
 init/Kconfig        |   28 ++++++++++++++++++++++------
 kernel/time/Kconfig |    1 -
 2 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 247084b..ffbf5d7 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -527,13 +527,29 @@ config RCU_USER_QS
 config CONTEXT_TRACKING_FORCE
 	bool "Force context tracking"
 	depends on CONTEXT_TRACKING
-	default CONTEXT_TRACKING
+	default y if !NO_HZ_FULL
 	help
-	  Probe on user/kernel boundaries by default in order to
-	  test the features that rely on it such as userspace RCU extended
-	  quiescent states.
-	  This test is there for debugging until we have a real user like the
-	  full dynticks mode.
+	  The major pre-requirement for full dynticks to work is to
+	  support the context tracking subsystem. But there are also
+	  other dependencies to provide in order to make the full
+	  dynticks working.
+
+	  This option stands for testing when an arch implements the
+	  context tracking backend but doesn't yet fulfill all the
+	  requirements to make the full dynticks feature work.
+	  Without the full dynticks, there is no way to test the support
+	  for context tracking and the subsystems that rely on it: RCU
+	  userspace extended quiescent state and tickless cputime
+	  accounting. This option copes with the absence of the full
+	  dynticks subsystem by forcing the context tracking on all
+	  CPUs in the system.
+
+	  Say Y only if you're working on the development of an
+	  architecture backend for the context tracking.
+
+	  Say N otherwise, this option brings an overhead that you
+	  don't want in production.
+
 
 config RCU_FANOUT
 	int "Tree-based hierarchical RCU fanout value"
diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index 70f27e8..747bbc7 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -105,7 +105,6 @@ config NO_HZ_FULL
 	select RCU_USER_QS
 	select RCU_NOCB_CPU
 	select VIRT_CPU_ACCOUNTING_GEN
-	select CONTEXT_TRACKING_FORCE
 	select IRQ_WORK
 	help
 	 Adaptively try to shutdown the tick whenever possible, even when
-- 
1.7.5.4



* [PATCH 07/23] context_tracking: Ground setup for static key use
  2013-08-01  0:31 [GIT PULL] nohz patches for 3.12 preview v3 Frederic Weisbecker
                   ` (5 preceding siblings ...)
  2013-08-01  0:31 ` [PATCH 06/23] context_tracking: Remove full dynticks' hacky dependency on wide context tracking Frederic Weisbecker
@ 2013-08-01  0:31 ` Frederic Weisbecker
  2013-08-01  0:31 ` [PATCH 08/23] context_tracking: Optimize main APIs off case with static key Frederic Weisbecker
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2013-08-01  0:31 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Mike Galbraith, Kevin Hilman

Prepare for using a static key in the context tracking subsystem.
This will help optimize the off case for its many users:

* user_enter, user_exit, exception_enter, exception_exit, guest_enter,
  guest_exit, vtime_*()
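
For background, the generic jump label pattern this builds on (a
minimal sketch of the static key API of that era, not from this
patch; do_rarely_needed_work() is a placeholder):

	struct static_key key = STATIC_KEY_INIT_FALSE;

	if (static_key_false(&key))	/* compiles to a nop while the key is off */
		do_rarely_needed_work();

	static_key_slow_inc(&key);	/* runtime-patches the nop into a jump */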

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
---
 include/linux/context_tracking.h |    2 ++
 kernel/context_tracking.c        |   26 ++++++++++++++++++++------
 2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 1ae37c7..f9356eb 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -4,6 +4,7 @@
 #include <linux/sched.h>
 #include <linux/percpu.h>
 #include <linux/vtime.h>
+#include <linux/static_key.h>
 #include <asm/ptrace.h>
 
 struct context_tracking {
@@ -22,6 +23,7 @@ struct context_tracking {
 
 
 #ifdef CONFIG_CONTEXT_TRACKING
+extern struct static_key context_tracking_enabled;
 DECLARE_PER_CPU(struct context_tracking, context_tracking);
 
 static inline bool context_tracking_in_user(void)
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 72bcb25..f07505c 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -20,15 +20,16 @@
 #include <linux/hardirq.h>
 #include <linux/export.h>
 
-DEFINE_PER_CPU(struct context_tracking, context_tracking) = {
-#ifdef CONFIG_CONTEXT_TRACKING_FORCE
-	.active = true,
-#endif
-};
+struct static_key context_tracking_enabled = STATIC_KEY_INIT_FALSE;
+
+DEFINE_PER_CPU(struct context_tracking, context_tracking);
 
 void context_tracking_cpu_set(int cpu)
 {
-	per_cpu(context_tracking.active, cpu) = true;
+	if (!per_cpu(context_tracking.active, cpu)) {
+		per_cpu(context_tracking.active, cpu) = true;
+		static_key_slow_inc(&context_tracking_enabled);
+	}
 }
 
 /**
@@ -202,3 +203,16 @@ void context_tracking_task_switch(struct task_struct *prev,
 	clear_tsk_thread_flag(prev, TIF_NOHZ);
 	set_tsk_thread_flag(next, TIF_NOHZ);
 }
+
+#ifdef CONFIG_CONTEXT_TRACKING_FORCE
+static int __init context_tracking_init(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		context_tracking_cpu_set(cpu);
+
+	return 0;
+}
+early_initcall(context_tracking_init);
+#endif
-- 
1.7.5.4



* [PATCH 08/23] context_tracking: Optimize main APIs off case with static key
  2013-08-01  0:31 [GIT PULL] nohz patches for 3.12 preview v3 Frederic Weisbecker
                   ` (6 preceding siblings ...)
  2013-08-01  0:31 ` [PATCH 07/23] context_tracking: Ground setup for static key use Frederic Weisbecker
@ 2013-08-01  0:31 ` Frederic Weisbecker
  2013-08-01  0:31 ` [PATCH 09/23] context_tracking: Optimize guest " Frederic Weisbecker
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2013-08-01  0:31 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Mike Galbraith, Kevin Hilman

Optimize the user and exception entry/exit APIs with static
keys. This minimizes the overhead for those who enable
CONFIG_NO_HZ_FULL without always using it. Having no range
passed to nohz_full= should result in the probes being nopped
out (at least we hope so...).

If this proves not to be enough in the long term, we'll need
to bring in an exception slow path by re-routing the exception
handlers.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
---
 include/linux/context_tracking.h |   27 ++++++++++++++++++++++-----
 kernel/context_tracking.c        |   12 ++++++------
 2 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index f9356eb..e5ec0c9 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -38,23 +38,40 @@ static inline bool context_tracking_active(void)
 
 extern void context_tracking_cpu_set(int cpu);
 
-extern void user_enter(void);
-extern void user_exit(void);
+extern void context_tracking_user_enter(void);
+extern void context_tracking_user_exit(void);
+
+static inline void user_enter(void)
+{
+	if (static_key_false(&context_tracking_enabled))
+		context_tracking_user_enter();
+
+}
+static inline void user_exit(void)
+{
+	if (static_key_false(&context_tracking_enabled))
+		context_tracking_user_exit();
+}
 
 static inline enum ctx_state exception_enter(void)
 {
 	enum ctx_state prev_ctx;
 
+	if (!static_key_false(&context_tracking_enabled))
+		return 0;
+
 	prev_ctx = this_cpu_read(context_tracking.state);
-	user_exit();
+	context_tracking_user_exit();
 
 	return prev_ctx;
 }
 
 static inline void exception_exit(enum ctx_state prev_ctx)
 {
-	if (prev_ctx == IN_USER)
-		user_enter();
+	if (static_key_false(&context_tracking_enabled)) {
+		if (prev_ctx == IN_USER)
+			context_tracking_user_enter();
+	}
 }
 
 extern void context_tracking_task_switch(struct task_struct *prev,
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index f07505c..657f668 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -33,15 +33,15 @@ void context_tracking_cpu_set(int cpu)
 }
 
 /**
- * user_enter - Inform the context tracking that the CPU is going to
- *              enter userspace mode.
+ * context_tracking_user_enter - Inform the context tracking that the CPU is going to
+ *                               enter userspace mode.
  *
  * This function must be called right before we switch from the kernel
  * to userspace, when it's guaranteed the remaining kernel instructions
  * to execute won't use any RCU read side critical section because this
  * function sets RCU in extended quiescent state.
  */
-void user_enter(void)
+void context_tracking_user_enter(void)
 {
 	unsigned long flags;
 
@@ -131,8 +131,8 @@ EXPORT_SYMBOL_GPL(preempt_schedule_context);
 #endif /* CONFIG_PREEMPT */
 
 /**
- * user_exit - Inform the context tracking that the CPU is
- *             exiting userspace mode and entering the kernel.
+ * context_tracking_user_exit - Inform the context tracking that the CPU is
+ *                              exiting userspace mode and entering the kernel.
  *
  * This function must be called after we entered the kernel from userspace
  * before any use of RCU read side critical section. This potentially include
@@ -141,7 +141,7 @@ EXPORT_SYMBOL_GPL(preempt_schedule_context);
  * This call supports re-entrancy. This way it can be called from any exception
  * handler without needing to know if we came from userspace or not.
  */
-void user_exit(void)
+void context_tracking_user_exit(void)
 {
 	unsigned long flags;
 
-- 
1.7.5.4



* [PATCH 09/23] context_tracking: Optimize guest APIs off case with static key
  2013-08-01  0:31 [GIT PULL] nohz patches for 3.12 preview v3 Frederic Weisbecker
                   ` (7 preceding siblings ...)
  2013-08-01  0:31 ` [PATCH 08/23] context_tracking: Optimize main APIs off case with static key Frederic Weisbecker
@ 2013-08-01  0:31 ` Frederic Weisbecker
  2013-08-01  0:31 ` [PATCH 10/23] context_tracking: Optimize context switch off case with static keys Frederic Weisbecker
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2013-08-01  0:31 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Mike Galbraith, Kevin Hilman

Optimize the guest entry/exit APIs with static keys. This minimizes
the overhead for those who enable CONFIG_NO_HZ_FULL without
always using it. Having no range passed to nohz_full= should
result in the probe overhead being minimized.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
---
 include/linux/context_tracking.h |   19 +++++++++++++++++--
 kernel/context_tracking.c        |   23 ++---------------------
 kernel/sched/cputime.c           |    2 ++
 3 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index e5ec0c9..03a32b0 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -87,8 +87,23 @@ static inline void context_tracking_task_switch(struct task_struct *prev,
 #endif /* !CONFIG_CONTEXT_TRACKING */
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
-extern void guest_enter(void);
-extern void guest_exit(void);
+static inline void guest_enter(void)
+{
+	if (static_key_false(&context_tracking_enabled) &&
+	    vtime_accounting_enabled())
+		vtime_guest_enter(current);
+	else
+		current->flags |= PF_VCPU;
+}
+
+static inline void guest_exit(void)
+{
+	if (static_key_false(&context_tracking_enabled) &&
+	    vtime_accounting_enabled())
+		vtime_guest_exit(current);
+	else
+		current->flags &= ~PF_VCPU;
+}
 #else
 static inline void guest_enter(void)
 {
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 657f668..5afa36b 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -21,8 +21,10 @@
 #include <linux/export.h>
 
 struct static_key context_tracking_enabled = STATIC_KEY_INIT_FALSE;
+EXPORT_SYMBOL_GPL(context_tracking_enabled);
 
 DEFINE_PER_CPU(struct context_tracking, context_tracking);
+EXPORT_SYMBOL_GPL(context_tracking);
 
 void context_tracking_cpu_set(int cpu)
 {
@@ -163,27 +165,6 @@ void context_tracking_user_exit(void)
 	local_irq_restore(flags);
 }
 
-#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
-void guest_enter(void)
-{
-	if (vtime_accounting_enabled())
-		vtime_guest_enter(current);
-	else
-		current->flags |= PF_VCPU;
-}
-EXPORT_SYMBOL_GPL(guest_enter);
-
-void guest_exit(void)
-{
-	if (vtime_accounting_enabled())
-		vtime_guest_exit(current);
-	else
-		current->flags &= ~PF_VCPU;
-}
-EXPORT_SYMBOL_GPL(guest_exit);
-#endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */
-
-
 /**
  * context_tracking_task_switch - context switch the syscall callbacks
  * @prev: the task that is being switched out
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 223a35e..bb6b29a 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -724,6 +724,7 @@ void vtime_guest_enter(struct task_struct *tsk)
 	current->flags |= PF_VCPU;
 	write_sequnlock(&tsk->vtime_seqlock);
 }
+EXPORT_SYMBOL_GPL(vtime_guest_enter);
 
 void vtime_guest_exit(struct task_struct *tsk)
 {
@@ -732,6 +733,7 @@ void vtime_guest_exit(struct task_struct *tsk)
 	current->flags &= ~PF_VCPU;
 	write_sequnlock(&tsk->vtime_seqlock);
 }
+EXPORT_SYMBOL_GPL(vtime_guest_exit);
 
 void vtime_account_idle(struct task_struct *tsk)
 {
-- 
1.7.5.4



* [PATCH 10/23] context_tracking: Optimize context switch off case with static keys
  2013-08-01  0:31 [GIT PULL] nohz patches for 3.12 preview v3 Frederic Weisbecker
                   ` (8 preceding siblings ...)
  2013-08-01  0:31 ` [PATCH 09/23] context_tracking: Optimize guest " Frederic Weisbecker
@ 2013-08-01  0:31 ` Frederic Weisbecker
  2013-08-01  0:31 ` [PATCH 11/23] context_tracking: User/kernel boundary cross trace events Frederic Weisbecker
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2013-08-01  0:31 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Mike Galbraith, Kevin Hilman

There is no need for the syscall slow path work if no CPU is full
dynticks; nop out the context switch hook in that case.
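
For context, a sketch of the call site (context_switch() in
kernel/sched/core.c at this time):

	/* in context_switch() */
	context_tracking_task_switch(prev, next);	/* nops out when the key is off */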

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
---
 include/linux/context_tracking.h |   11 +++++++++--
 kernel/context_tracking.c        |    6 +++---
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 03a32b0..66a8397 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -40,6 +40,8 @@ extern void context_tracking_cpu_set(int cpu);
 
 extern void context_tracking_user_enter(void);
 extern void context_tracking_user_exit(void);
+extern void __context_tracking_task_switch(struct task_struct *prev,
+					   struct task_struct *next);
 
 static inline void user_enter(void)
 {
@@ -74,8 +76,12 @@ static inline void exception_exit(enum ctx_state prev_ctx)
 	}
 }
 
-extern void context_tracking_task_switch(struct task_struct *prev,
-					 struct task_struct *next);
+static inline void context_tracking_task_switch(struct task_struct *prev,
+						struct task_struct *next)
+{
+	if (static_key_false(&context_tracking_enabled))
+		__context_tracking_task_switch(prev, next);
+}
 #else
 static inline bool context_tracking_in_user(void) { return false; }
 static inline void user_enter(void) { }
@@ -104,6 +110,7 @@ static inline void guest_exit(void)
 	else
 		current->flags &= ~PF_VCPU;
 }
+
 #else
 static inline void guest_enter(void)
 {
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 5afa36b..ef21e4f 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -166,7 +166,7 @@ void context_tracking_user_exit(void)
 }
 
 /**
- * context_tracking_task_switch - context switch the syscall callbacks
+ * __context_tracking_task_switch - context switch the syscall callbacks
  * @prev: the task that is being switched out
  * @next: the task that is being switched in
  *
@@ -178,8 +178,8 @@ void context_tracking_user_exit(void)
  * migrate to some CPU that doesn't do the context tracking. As such the TIF
  * flag may not be desired there.
  */
-void context_tracking_task_switch(struct task_struct *prev,
-			     struct task_struct *next)
+void __context_tracking_task_switch(struct task_struct *prev,
+				    struct task_struct *next)
 {
 	clear_tsk_thread_flag(prev, TIF_NOHZ);
 	set_tsk_thread_flag(next, TIF_NOHZ);
-- 
1.7.5.4



* [PATCH 11/23] context_tracking: User/kernel boundary cross trace events
  2013-08-01  0:31 [GIT PULL] nohz patches for 3.12 preview v3 Frederic Weisbecker
                   ` (9 preceding siblings ...)
  2013-08-01  0:31 ` [PATCH 10/23] context_tracking: Optimize context switch off case with static keys Frederic Weisbecker
@ 2013-08-01  0:31 ` Frederic Weisbecker
  2013-08-01  0:31 ` [PATCH 12/23] vtime: Remove a few unneeded generic vtime state checks Frederic Weisbecker
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2013-08-01  0:31 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Mike Galbraith, Kevin Hilman

This can be useful to track all kernel/user round trips.
And it's also helpful to debug the context tracking subsystem.
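
As a usage example, the new events can be enabled through the usual
trace event interface (assuming debugfs is mounted at the standard
location):

	echo 1 > /sys/kernel/debug/tracing/events/context_tracking/enable
	cat /sys/kernel/debug/tracing/trace_pipe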

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
---
 include/trace/events/context_tracking.h |   58 +++++++++++++++++++++++++++++++
 kernel/context_tracking.c               |    5 +++
 2 files changed, 63 insertions(+), 0 deletions(-)
 create mode 100644 include/trace/events/context_tracking.h

diff --git a/include/trace/events/context_tracking.h b/include/trace/events/context_tracking.h
new file mode 100644
index 0000000..ce8007c
--- /dev/null
+++ b/include/trace/events/context_tracking.h
@@ -0,0 +1,58 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM context_tracking
+
+#if !defined(_TRACE_CONTEXT_TRACKING_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_CONTEXT_TRACKING_H
+
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(context_tracking_user,
+
+	TP_PROTO(int dummy),
+
+	TP_ARGS(dummy),
+
+	TP_STRUCT__entry(
+		__field( int,	dummy	)
+	),
+
+	TP_fast_assign(
+		__entry->dummy		= dummy;
+	),
+
+	TP_printk("%s", "")
+);
+
+/**
+ * user_enter - called when the kernel resumes to userspace
+ * @dummy:	dummy arg to make trace event macro happy
+ *
+ * This event occurs when the kernel resumes to userspace  after
+ * an exception or a syscall.
+ */
+DEFINE_EVENT(context_tracking_user, user_enter,
+
+	TP_PROTO(int dummy),
+
+	TP_ARGS(dummy)
+);
+
+/**
+ * user_exit - called when userspace enters the kernel
+ * @dummy:	dummy arg to make trace event macro happy
+ *
+ * This event occurs when userspace enters the kernel through
+ * an exception or a syscall.
+ */
+DEFINE_EVENT(context_tracking_user, user_exit,
+
+	TP_PROTO(int dummy),
+
+	TP_ARGS(dummy)
+);
+
+
+#endif /*  _TRACE_CONTEXT_TRACKING_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index ef21e4f..688efe4 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -20,6 +20,9 @@
 #include <linux/hardirq.h>
 #include <linux/export.h>
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/context_tracking.h>
+
 struct static_key context_tracking_enabled = STATIC_KEY_INIT_FALSE;
 EXPORT_SYMBOL_GPL(context_tracking_enabled);
 
@@ -64,6 +67,7 @@ void context_tracking_user_enter(void)
 	local_irq_save(flags);
 	if ( __this_cpu_read(context_tracking.state) != IN_USER) {
 		if (__this_cpu_read(context_tracking.active)) {
+			trace_user_enter(0);
 			/*
 			 * At this stage, only low level arch entry code remains and
 			 * then we'll run in userspace. We can assume there won't be
@@ -159,6 +163,7 @@ void context_tracking_user_exit(void)
 			 */
 			rcu_user_exit();
 			vtime_user_exit(current);
+			trace_user_exit(0);
 		}
 		__this_cpu_write(context_tracking.state, IN_KERNEL);
 	}
-- 
1.7.5.4



* [PATCH 12/23] vtime: Remove a few unneeded generic vtime state checks
  2013-08-01  0:31 [GIT PULL] nohz patches for 3.12 preview v3 Frederic Weisbecker
                   ` (10 preceding siblings ...)
  2013-08-01  0:31 ` [PATCH 11/23] context_tracking: User/kernel boundary cross trace events Frederic Weisbecker
@ 2013-08-01  0:31 ` Frederic Weisbecker
  2013-08-01  0:31 ` [PATCH 13/23] vtime: Fix racy cputime delta update Frederic Weisbecker
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2013-08-01  0:31 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Mike Galbraith, Kevin Hilman

Some generic vtime APIs check whether vtime accounting
is enabled on the local CPU before doing their work.

Some of these checks are unnecessary because all their callers
already take care of that. Let's remove them.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
---
 kernel/sched/cputime.c |   13 +------------
 1 files changed, 1 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index bb6b29a..5f273b47 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -664,9 +664,6 @@ static void __vtime_account_system(struct task_struct *tsk)
 
 void vtime_account_system(struct task_struct *tsk)
 {
-	if (!vtime_accounting_enabled())
-		return;
-
 	write_seqlock(&tsk->vtime_seqlock);
 	__vtime_account_system(tsk);
 	write_sequnlock(&tsk->vtime_seqlock);
@@ -686,12 +683,7 @@ void vtime_account_irq_exit(struct task_struct *tsk)
 
 void vtime_account_user(struct task_struct *tsk)
 {
-	cputime_t delta_cpu;
-
-	if (!vtime_accounting_enabled())
-		return;
-
-	delta_cpu = get_vtime_delta(tsk);
+	cputime_t delta_cpu = get_vtime_delta(tsk);
 
 	write_seqlock(&tsk->vtime_seqlock);
 	tsk->vtime_snap_whence = VTIME_SYS;
@@ -701,9 +693,6 @@ void vtime_account_user(struct task_struct *tsk)
 
 void vtime_user_enter(struct task_struct *tsk)
 {
-	if (!vtime_accounting_enabled())
-		return;
-
 	write_seqlock(&tsk->vtime_seqlock);
 	tsk->vtime_snap_whence = VTIME_USER;
 	__vtime_account_system(tsk);
-- 
1.7.5.4



* [PATCH 13/23] vtime: Fix racy cputime delta update
  2013-08-01  0:31 [GIT PULL] nohz patches for 3.12 preview v3 Frederic Weisbecker
                   ` (11 preceding siblings ...)
  2013-08-01  0:31 ` [PATCH 12/23] vtime: Remove a few unneeded generic vtime state checks Frederic Weisbecker
@ 2013-08-01  0:31 ` Frederic Weisbecker
  2013-08-01  0:31 ` [PATCH 14/23] context_tracking: Split low level state headers Frederic Weisbecker
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2013-08-01  0:31 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Mike Galbraith, Kevin Hilman

get_vtime_delta() must be called under the task vtime_seqlock
with the code that does the cputime accounting flush.

Otherwise the cputime reader can be fooled and run into
a race where it sees the snapshot update but misses the
cputime flush. As a result it can report a cputime that is
way too short.

Fix vtime_account_user(), which wasn't complying with that rule.
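
For reference, a simplified sketch of the reader side this protects
(modeled on the task_gtime() seqlock retry loop):

	do {
		seq = read_seqbegin(&t->vtime_seqlock);
		gtime = t->gtime;
		/* add the tickless delta computed from t->vtime_snap */
	} while (read_seqretry(&t->vtime_seqlock, seq));

If the delta snapshot is taken outside the write section, this loop
can observe the new snapshot without the matching flush and report a
too-short cputime.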

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
---
 kernel/sched/cputime.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 5f273b47..b62d5c0 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -683,9 +683,10 @@ void vtime_account_irq_exit(struct task_struct *tsk)
 
 void vtime_account_user(struct task_struct *tsk)
 {
-	cputime_t delta_cpu = get_vtime_delta(tsk);
+	cputime_t delta_cpu;
 
 	write_seqlock(&tsk->vtime_seqlock);
+	delta_cpu = get_vtime_delta(tsk);
 	tsk->vtime_snap_whence = VTIME_SYS;
 	account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
 	write_sequnlock(&tsk->vtime_seqlock);
-- 
1.7.5.4



* [PATCH 14/23] context_tracking: Split low level state headers
  2013-08-01  0:31 [GIT PULL] nohz patches for 3.12 preview v3 Frederic Weisbecker
                   ` (12 preceding siblings ...)
  2013-08-01  0:31 ` [PATCH 13/23] vtime: Fix racy cputime delta update Frederic Weisbecker
@ 2013-08-01  0:31 ` Frederic Weisbecker
  2013-08-01  0:31 ` [PATCH 15/23] hardirq: Split preempt count mask definitions Frederic Weisbecker
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2013-08-01  0:31 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Mike Galbraith, Kevin Hilman

We plan to use the context tracking static key in the inline
vtime APIs. For this we need to include the context tracking
headers from those of vtime.

However the vtime headers need to stay low level because they are
included in hardirq.h, which mostly contains standalone
definitions. But context_tracking.h includes sched.h for
a few task_struct references, so it wouldn't be sensible
to include it from vtime.h.

To solve this, let's split the context tracking headers and move
out the pure state definitions that only require a few low level
headers. We can then safely include that small part from vtime.h later.
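
The intended include layering then looks roughly like this (a sketch,
not verbatim from the patch):

	hardirq.h -> vtime.h -> context_tracking_state.h   (low level, safe)
	context_tracking.h -> sched.h + context_tracking_state.h   (high level)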

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
---
 include/linux/context_tracking.h       |   31 +------------------------
 include/linux/context_tracking_state.h |   39 ++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+), 30 deletions(-)
 create mode 100644 include/linux/context_tracking_state.h

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 66a8397..8b6eedb 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -2,40 +2,12 @@
 #define _LINUX_CONTEXT_TRACKING_H
 
 #include <linux/sched.h>
-#include <linux/percpu.h>
 #include <linux/vtime.h>
-#include <linux/static_key.h>
+#include <linux/context_tracking_state.h>
 #include <asm/ptrace.h>
 
-struct context_tracking {
-	/*
-	 * When active is false, probes are unset in order
-	 * to minimize overhead: TIF flags are cleared
-	 * and calls to user_enter/exit are ignored. This
-	 * may be further optimized using static keys.
-	 */
-	bool active;
-	enum ctx_state {
-		IN_KERNEL = 0,
-		IN_USER,
-	} state;
-};
-
 
 #ifdef CONFIG_CONTEXT_TRACKING
-extern struct static_key context_tracking_enabled;
-DECLARE_PER_CPU(struct context_tracking, context_tracking);
-
-static inline bool context_tracking_in_user(void)
-{
-	return __this_cpu_read(context_tracking.state) == IN_USER;
-}
-
-static inline bool context_tracking_active(void)
-{
-	return __this_cpu_read(context_tracking.active);
-}
-
 extern void context_tracking_cpu_set(int cpu);
 
 extern void context_tracking_user_enter(void);
@@ -83,7 +55,6 @@ static inline void context_tracking_task_switch(struct task_struct *prev,
 		__context_tracking_task_switch(prev, next);
 }
 #else
-static inline bool context_tracking_in_user(void) { return false; }
 static inline void user_enter(void) { }
 static inline void user_exit(void) { }
 static inline enum ctx_state exception_enter(void) { return 0; }
diff --git a/include/linux/context_tracking_state.h b/include/linux/context_tracking_state.h
new file mode 100644
index 0000000..0f1979d
--- /dev/null
+++ b/include/linux/context_tracking_state.h
@@ -0,0 +1,39 @@
+#ifndef _LINUX_CONTEXT_TRACKING_STATE_H
+#define _LINUX_CONTEXT_TRACKING_STATE_H
+
+#include <linux/percpu.h>
+#include <linux/static_key.h>
+
+struct context_tracking {
+	/*
+	 * When active is false, probes are unset in order
+	 * to minimize overhead: TIF flags are cleared
+	 * and calls to user_enter/exit are ignored. This
+	 * may be further optimized using static keys.
+	 */
+	bool active;
+	enum ctx_state {
+		IN_KERNEL = 0,
+		IN_USER,
+	} state;
+};
+
+#ifdef CONFIG_CONTEXT_TRACKING
+extern struct static_key context_tracking_enabled;
+DECLARE_PER_CPU(struct context_tracking, context_tracking);
+
+static inline bool context_tracking_in_user(void)
+{
+	return __this_cpu_read(context_tracking.state) == IN_USER;
+}
+
+static inline bool context_tracking_active(void)
+{
+	return __this_cpu_read(context_tracking.active);
+}
+#else
+static inline bool context_tracking_in_user(void) { return false; }
+static inline bool context_tracking_active(void) { return false; }
+#endif /* CONFIG_CONTEXT_TRACKING */
+
+#endif
-- 
1.7.5.4



* [PATCH 15/23] hardirq: Split preempt count mask definitions
  2013-08-01  0:31 [GIT PULL] nohz patches for 3.12 preview v3 Frederic Weisbecker
                   ` (13 preceding siblings ...)
  2013-08-01  0:31 ` [PATCH 14/23] context_tracking: Split low level state headers Frederic Weisbecker
@ 2013-08-01  0:31 ` Frederic Weisbecker
  2013-08-01  0:31 ` [PATCH 16/23] m68k: hardirq_count() only needs preempt_mask.h Frederic Weisbecker
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2013-08-01  0:31 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Mike Galbraith, Kevin Hilman

In order to use static keys with the vtime APIs, we'll need to
add the static key headers to vtime.h.

hardirq.h then becomes a problem because it needs vtime.h
for irqtime accounting in irq_enter/irq_exit, but it's
often included just to get the irq mask definitions for the
task preempt_count field and the APIs that come along:
in_interrupt(), in_hardirq(), etc...

Some very low level arch headers sometimes need these masks
and APIs, such as arch/m68k/include/asm/irqflags.h for example.
But they don't want to include hardirq.h if vtime.h, jump_label.h
and even workqueue.h come along with it. Including such a bloated
high level header from arch headers can quickly result in circular
header dependencies that crash the build.

So let's split hardirq.h in two parts:

* preempt_mask.h, which gathers all the preempt_count definitions
and the associated APIs. This one is considered low level and can
be safely included anywhere.

* hardirq.h, which includes the previous one and defines the irq
entry/exit APIs.

To avoid future circular header dependencies, the preempt_mask.h
inclusion can replace hardirq.h in files that don't implement low
level irq handlers but just need the atomic/context check APIs.
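
As a worked example, assuming the default layout documented in the
new header (HARDIRQ_SHIFT == 16) and CONFIG_PREEMPT_COUNT=y:

	/*
	 * One hardirq nesting over a preempt_disable() section:
	 * preempt_count() == HARDIRQ_OFFSET + PREEMPT_OFFSET == 0x00010001
	 */
	in_irq();	/* 0x00010000, non-zero: true */
	in_atomic();	/* (0x00010001 & ~PREEMPT_ACTIVE) != 0: true */
	preemptible();	/* preempt_count() != 0: false */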

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
---
 include/linux/hardirq.h      |  117 +---------------------------------------
 include/linux/preempt_mask.h |  122 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 123 insertions(+), 116 deletions(-)
 create mode 100644 include/linux/preempt_mask.h

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 05bcc09..ccfe17c 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -1,126 +1,11 @@
 #ifndef LINUX_HARDIRQ_H
 #define LINUX_HARDIRQ_H
 
-#include <linux/preempt.h>
+#include <linux/preempt_mask.h>
 #include <linux/lockdep.h>
 #include <linux/ftrace_irq.h>
 #include <linux/vtime.h>
-#include <asm/hardirq.h>
 
-/*
- * We put the hardirq and softirq counter into the preemption
- * counter. The bitmask has the following meaning:
- *
- * - bits 0-7 are the preemption count (max preemption depth: 256)
- * - bits 8-15 are the softirq count (max # of softirqs: 256)
- *
- * The hardirq count can in theory reach the same as NR_IRQS.
- * In reality, the number of nested IRQS is limited to the stack
- * size as well. For archs with over 1000 IRQS it is not practical
- * to expect that they will all nest. We give a max of 10 bits for
- * hardirq nesting. An arch may choose to give less than 10 bits.
- * m68k expects it to be 8.
- *
- * - bits 16-25 are the hardirq count (max # of nested hardirqs: 1024)
- * - bit 26 is the NMI_MASK
- * - bit 27 is the PREEMPT_ACTIVE flag
- *
- * PREEMPT_MASK: 0x000000ff
- * SOFTIRQ_MASK: 0x0000ff00
- * HARDIRQ_MASK: 0x03ff0000
- *     NMI_MASK: 0x04000000
- */
-#define PREEMPT_BITS	8
-#define SOFTIRQ_BITS	8
-#define NMI_BITS	1
-
-#define MAX_HARDIRQ_BITS 10
-
-#ifndef HARDIRQ_BITS
-# define HARDIRQ_BITS	MAX_HARDIRQ_BITS
-#endif
-
-#if HARDIRQ_BITS > MAX_HARDIRQ_BITS
-#error HARDIRQ_BITS too high!
-#endif
-
-#define PREEMPT_SHIFT	0
-#define SOFTIRQ_SHIFT	(PREEMPT_SHIFT + PREEMPT_BITS)
-#define HARDIRQ_SHIFT	(SOFTIRQ_SHIFT + SOFTIRQ_BITS)
-#define NMI_SHIFT	(HARDIRQ_SHIFT + HARDIRQ_BITS)
-
-#define __IRQ_MASK(x)	((1UL << (x))-1)
-
-#define PREEMPT_MASK	(__IRQ_MASK(PREEMPT_BITS) << PREEMPT_SHIFT)
-#define SOFTIRQ_MASK	(__IRQ_MASK(SOFTIRQ_BITS) << SOFTIRQ_SHIFT)
-#define HARDIRQ_MASK	(__IRQ_MASK(HARDIRQ_BITS) << HARDIRQ_SHIFT)
-#define NMI_MASK	(__IRQ_MASK(NMI_BITS)     << NMI_SHIFT)
-
-#define PREEMPT_OFFSET	(1UL << PREEMPT_SHIFT)
-#define SOFTIRQ_OFFSET	(1UL << SOFTIRQ_SHIFT)
-#define HARDIRQ_OFFSET	(1UL << HARDIRQ_SHIFT)
-#define NMI_OFFSET	(1UL << NMI_SHIFT)
-
-#define SOFTIRQ_DISABLE_OFFSET	(2 * SOFTIRQ_OFFSET)
-
-#ifndef PREEMPT_ACTIVE
-#define PREEMPT_ACTIVE_BITS	1
-#define PREEMPT_ACTIVE_SHIFT	(NMI_SHIFT + NMI_BITS)
-#define PREEMPT_ACTIVE	(__IRQ_MASK(PREEMPT_ACTIVE_BITS) << PREEMPT_ACTIVE_SHIFT)
-#endif
-
-#if PREEMPT_ACTIVE < (1 << (NMI_SHIFT + NMI_BITS))
-#error PREEMPT_ACTIVE is too low!
-#endif
-
-#define hardirq_count()	(preempt_count() & HARDIRQ_MASK)
-#define softirq_count()	(preempt_count() & SOFTIRQ_MASK)
-#define irq_count()	(preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK \
-				 | NMI_MASK))
-
-/*
- * Are we doing bottom half or hardware interrupt processing?
- * Are we in a softirq context? Interrupt context?
- * in_softirq - Are we currently processing softirq or have bh disabled?
- * in_serving_softirq - Are we currently processing softirq?
- */
-#define in_irq()		(hardirq_count())
-#define in_softirq()		(softirq_count())
-#define in_interrupt()		(irq_count())
-#define in_serving_softirq()	(softirq_count() & SOFTIRQ_OFFSET)
-
-/*
- * Are we in NMI context?
- */
-#define in_nmi()	(preempt_count() & NMI_MASK)
-
-#if defined(CONFIG_PREEMPT_COUNT)
-# define PREEMPT_CHECK_OFFSET 1
-#else
-# define PREEMPT_CHECK_OFFSET 0
-#endif
-
-/*
- * Are we running in atomic context?  WARNING: this macro cannot
- * always detect atomic context; in particular, it cannot know about
- * held spinlocks in non-preemptible kernels.  Thus it should not be
- * used in the general case to determine whether sleeping is possible.
- * Do not use in_atomic() in driver code.
- */
-#define in_atomic()	((preempt_count() & ~PREEMPT_ACTIVE) != 0)
-
-/*
- * Check whether we were atomic before we did preempt_disable():
- * (used by the scheduler, *after* releasing the kernel lock)
- */
-#define in_atomic_preempt_off() \
-		((preempt_count() & ~PREEMPT_ACTIVE) != PREEMPT_CHECK_OFFSET)
-
-#ifdef CONFIG_PREEMPT_COUNT
-# define preemptible()	(preempt_count() == 0 && !irqs_disabled())
-#else
-# define preemptible()	0
-#endif
 
 #if defined(CONFIG_SMP) || defined(CONFIG_GENERIC_HARDIRQS)
 extern void synchronize_irq(unsigned int irq);
diff --git a/include/linux/preempt_mask.h b/include/linux/preempt_mask.h
new file mode 100644
index 0000000..931bc61
--- /dev/null
+++ b/include/linux/preempt_mask.h
@@ -0,0 +1,122 @@
+#ifndef LINUX_PREEMPT_MASK_H
+#define LINUX_PREEMPT_MASK_H
+
+#include <linux/preempt.h>
+#include <asm/hardirq.h>
+
+/*
+ * We put the hardirq and softirq counter into the preemption
+ * counter. The bitmask has the following meaning:
+ *
+ * - bits 0-7 are the preemption count (max preemption depth: 256)
+ * - bits 8-15 are the softirq count (max # of softirqs: 256)
+ *
+ * The hardirq count can in theory reach the same as NR_IRQS.
+ * In reality, the number of nested IRQS is limited to the stack
+ * size as well. For archs with over 1000 IRQS it is not practical
+ * to expect that they will all nest. We give a max of 10 bits for
+ * hardirq nesting. An arch may choose to give less than 10 bits.
+ * m68k expects it to be 8.
+ *
+ * - bits 16-25 are the hardirq count (max # of nested hardirqs: 1024)
+ * - bit 26 is the NMI_MASK
+ * - bit 27 is the PREEMPT_ACTIVE flag
+ *
+ * PREEMPT_MASK: 0x000000ff
+ * SOFTIRQ_MASK: 0x0000ff00
+ * HARDIRQ_MASK: 0x03ff0000
+ *     NMI_MASK: 0x04000000
+ */
+#define PREEMPT_BITS	8
+#define SOFTIRQ_BITS	8
+#define NMI_BITS	1
+
+#define MAX_HARDIRQ_BITS 10
+
+#ifndef HARDIRQ_BITS
+# define HARDIRQ_BITS	MAX_HARDIRQ_BITS
+#endif
+
+#if HARDIRQ_BITS > MAX_HARDIRQ_BITS
+#error HARDIRQ_BITS too high!
+#endif
+
+#define PREEMPT_SHIFT	0
+#define SOFTIRQ_SHIFT	(PREEMPT_SHIFT + PREEMPT_BITS)
+#define HARDIRQ_SHIFT	(SOFTIRQ_SHIFT + SOFTIRQ_BITS)
+#define NMI_SHIFT	(HARDIRQ_SHIFT + HARDIRQ_BITS)
+
+#define __IRQ_MASK(x)	((1UL << (x))-1)
+
+#define PREEMPT_MASK	(__IRQ_MASK(PREEMPT_BITS) << PREEMPT_SHIFT)
+#define SOFTIRQ_MASK	(__IRQ_MASK(SOFTIRQ_BITS) << SOFTIRQ_SHIFT)
+#define HARDIRQ_MASK	(__IRQ_MASK(HARDIRQ_BITS) << HARDIRQ_SHIFT)
+#define NMI_MASK	(__IRQ_MASK(NMI_BITS)     << NMI_SHIFT)
+
+#define PREEMPT_OFFSET	(1UL << PREEMPT_SHIFT)
+#define SOFTIRQ_OFFSET	(1UL << SOFTIRQ_SHIFT)
+#define HARDIRQ_OFFSET	(1UL << HARDIRQ_SHIFT)
+#define NMI_OFFSET	(1UL << NMI_SHIFT)
+
+#define SOFTIRQ_DISABLE_OFFSET	(2 * SOFTIRQ_OFFSET)
+
+#ifndef PREEMPT_ACTIVE
+#define PREEMPT_ACTIVE_BITS	1
+#define PREEMPT_ACTIVE_SHIFT	(NMI_SHIFT + NMI_BITS)
+#define PREEMPT_ACTIVE	(__IRQ_MASK(PREEMPT_ACTIVE_BITS) << PREEMPT_ACTIVE_SHIFT)
+#endif
+
+#if PREEMPT_ACTIVE < (1 << (NMI_SHIFT + NMI_BITS))
+#error PREEMPT_ACTIVE is too low!
+#endif
+
+#define hardirq_count()	(preempt_count() & HARDIRQ_MASK)
+#define softirq_count()	(preempt_count() & SOFTIRQ_MASK)
+#define irq_count()	(preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK \
+				 | NMI_MASK))
+
+/*
+ * Are we doing bottom half or hardware interrupt processing?
+ * Are we in a softirq context? Interrupt context?
+ * in_softirq - Are we currently processing softirq or have bh disabled?
+ * in_serving_softirq - Are we currently processing softirq?
+ */
+#define in_irq()		(hardirq_count())
+#define in_softirq()		(softirq_count())
+#define in_interrupt()		(irq_count())
+#define in_serving_softirq()	(softirq_count() & SOFTIRQ_OFFSET)
+
+/*
+ * Are we in NMI context?
+ */
+#define in_nmi()	(preempt_count() & NMI_MASK)
+
+#if defined(CONFIG_PREEMPT_COUNT)
+# define PREEMPT_CHECK_OFFSET 1
+#else
+# define PREEMPT_CHECK_OFFSET 0
+#endif
+
+/*
+ * Are we running in atomic context?  WARNING: this macro cannot
+ * always detect atomic context; in particular, it cannot know about
+ * held spinlocks in non-preemptible kernels.  Thus it should not be
+ * used in the general case to determine whether sleeping is possible.
+ * Do not use in_atomic() in driver code.
+ */
+#define in_atomic()	((preempt_count() & ~PREEMPT_ACTIVE) != 0)
+
+/*
+ * Check whether we were atomic before we did preempt_disable():
+ * (used by the scheduler, *after* releasing the kernel lock)
+ */
+#define in_atomic_preempt_off() \
+		((preempt_count() & ~PREEMPT_ACTIVE) != PREEMPT_CHECK_OFFSET)
+
+#ifdef CONFIG_PREEMPT_COUNT
+# define preemptible()	(preempt_count() == 0 && !irqs_disabled())
+#else
+# define preemptible()	0
+#endif
+
+#endif /* LINUX_PREEMPT_MASK_H */
-- 
1.7.5.4



* [PATCH 16/23] m68k: hardirq_count() only needs preempt_mask.h
  2013-08-01  0:31 [GIT PULL] nohz patches for 3.12 preview v3 Frederic Weisbecker
                   ` (14 preceding siblings ...)
  2013-08-01  0:31 ` [PATCH 15/23] hardirq: Split preempt count mask definitions Frederic Weisbecker
@ 2013-08-01  0:31 ` Frederic Weisbecker
  2013-08-01  0:31 ` [PATCH 17/23] vtime: Describe overridden functions in dedicated arch headers Frederic Weisbecker
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2013-08-01  0:31 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Mike Galbraith, Kevin Hilman, Geert Uytterhoeven

The m68k irqflags implementation needs to check for hardirq
context in some cases.

As irqflags.h is a very low level header file, it's better to
include preempt_mask.h rather than hardirq.h when the
only purpose is to use the irq context APIs. This way we
can avoid future circular header dependencies once
vtime.h expands to use static keys.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
---
 arch/m68k/include/asm/irqflags.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/m68k/include/asm/irqflags.h b/arch/m68k/include/asm/irqflags.h
index 7ef4115..4c62755 100644
--- a/arch/m68k/include/asm/irqflags.h
+++ b/arch/m68k/include/asm/irqflags.h
@@ -3,7 +3,7 @@
 
 #include <linux/types.h>
 #ifdef CONFIG_MMU
-#include <linux/hardirq.h>
+#include <linux/preempt_mask.h>
 #endif
 #include <linux/preempt.h>
 #include <asm/thread_info.h>
-- 
1.7.5.4



* [PATCH 17/23] vtime: Describe overridden functions in dedicated arch headers
  2013-08-01  0:31 [GIT PULL] nohz patches for 3.12 preview v3 Frederic Weisbecker
                   ` (15 preceding siblings ...)
  2013-08-01  0:31 ` [PATCH 16/23] m68k: hardirq_count() only needs preempt_mask.h Frederic Weisbecker
@ 2013-08-01  0:31 ` Frederic Weisbecker
  2013-08-01  0:31 ` [PATCH 18/23] vtime: Optimize full dynticks accounting off case with static keys Frederic Weisbecker
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2013-08-01  0:31 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Mike Galbraith, Kevin Hilman, Martin Schwidefsky,
	Heiko Carstens

If the arch overrides some generic vtime APIs, let it describe
them in a dedicated, standalone header. This way it becomes
convenient to include that header from the generic vtime headers
without dragging irrelevant stuff into such a low level header.
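
For native vtime archs without any override (ia64 and powerpc in the
hunks below), the generic-y mechanism makes <asm/vtime.h> resolve to
the new, empty include/asm-generic/vtime.h, so the guarded include
always resolves:

	/* include/linux/vtime.h, as in the hunk below */
	#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
	#include <asm/vtime.h>	/* arch override, or the empty generic stub */
	#endif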

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
---
 arch/ia64/include/asm/Kbuild    |    1 +
 arch/powerpc/include/asm/Kbuild |    1 +
 arch/s390/include/asm/cputime.h |    3 ---
 arch/s390/include/asm/vtime.h   |    7 +++++++
 arch/s390/kernel/vtime.c        |    1 +
 include/linux/vtime.h           |    4 ++++
 6 files changed, 14 insertions(+), 3 deletions(-)
 create mode 100644 arch/s390/include/asm/vtime.h
 create mode 100644 include/asm-generic/vtime.h

diff --git a/arch/ia64/include/asm/Kbuild b/arch/ia64/include/asm/Kbuild
index 05b03ec..a3456f3 100644
--- a/arch/ia64/include/asm/Kbuild
+++ b/arch/ia64/include/asm/Kbuild
@@ -3,3 +3,4 @@ generic-y += clkdev.h
 generic-y += exec.h
 generic-y += kvm_para.h
 generic-y += trace_clock.h
+generic-y += vtime.h
\ No newline at end of file
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 650757c..704e6f1 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -2,3 +2,4 @@
 generic-y += clkdev.h
 generic-y += rwsem.h
 generic-y += trace_clock.h
+generic-y += vtime.h
\ No newline at end of file
diff --git a/arch/s390/include/asm/cputime.h b/arch/s390/include/asm/cputime.h
index d2ff4137..f65bd36 100644
--- a/arch/s390/include/asm/cputime.h
+++ b/arch/s390/include/asm/cputime.h
@@ -13,9 +13,6 @@
 #include <asm/div64.h>
 
 
-#define __ARCH_HAS_VTIME_ACCOUNT
-#define __ARCH_HAS_VTIME_TASK_SWITCH
-
 /* We want to use full resolution of the CPU timer: 2**-12 micro-seconds. */
 
 typedef unsigned long long __nocast cputime_t;
diff --git a/arch/s390/include/asm/vtime.h b/arch/s390/include/asm/vtime.h
new file mode 100644
index 0000000..af9896c
--- /dev/null
+++ b/arch/s390/include/asm/vtime.h
@@ -0,0 +1,7 @@
+#ifndef _S390_VTIME_H
+#define _S390_VTIME_H
+
+#define __ARCH_HAS_VTIME_ACCOUNT
+#define __ARCH_HAS_VTIME_TASK_SWITCH
+
+#endif /* _S390_VTIME_H */
diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c
index 9b9c1b7..abcfab5 100644
--- a/arch/s390/kernel/vtime.c
+++ b/arch/s390/kernel/vtime.c
@@ -19,6 +19,7 @@
 #include <asm/irq_regs.h>
 #include <asm/cputime.h>
 #include <asm/vtimer.h>
+#include <asm/vtime.h>
 #include <asm/irq.h>
 #include "entry.h"
 
diff --git a/include/asm-generic/vtime.h b/include/asm-generic/vtime.h
new file mode 100644
index 0000000..e69de29
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index b1dd2db..2ad0739 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -1,6 +1,10 @@
 #ifndef _LINUX_KERNEL_VTIME_H
 #define _LINUX_KERNEL_VTIME_H
 
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
+#include <asm/vtime.h>
+#endif
+
 struct task_struct;
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING
-- 
1.7.5.4



* [PATCH 18/23] vtime: Optimize full dynticks accounting off case with static keys
  2013-08-01  0:31 [GIT PULL] nohz patches for 3.12 preview v3 Frederic Weisbecker
                   ` (16 preceding siblings ...)
  2013-08-01  0:31 ` [PATCH 17/23] vtime: Describe overridden functions in dedicated arch headers Frederic Weisbecker
@ 2013-08-01  0:31 ` Frederic Weisbecker
  2013-08-01  0:31 ` [PATCH 19/23] vtime: Always scale generic vtime accounting results Frederic Weisbecker
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2013-08-01  0:31 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Mike Galbraith, Kevin Hilman

If no CPU is in the full dynticks range, we can avoid the full
dynticks cputime accounting through generic vtime, along with its
overhead, and use the traditional tick based accounting instead.

Let's do this and nop out the off case with static keys.
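
A sketch of what a call site then boils down to; example_irq_exit()
is a hypothetical caller, the wrappers are the ones introduced below:

	static void example_irq_exit(struct task_struct *tsk)
	{
		/*
		 * vtime_accounting_enabled() first tests the
		 * context_tracking_enabled static key: when no CPU is
		 * full dynticks, the branch is patched out and
		 * vtime_gen_account_irq_exit() is never called.
		 */
		vtime_account_irq_exit(tsk);
	}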

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
---
 include/linux/context_tracking.h |    6 +--
 include/linux/vtime.h            |   70 +++++++++++++++++++++++++++++++++-----
 kernel/sched/cputime.c           |   22 ++----------
 3 files changed, 67 insertions(+), 31 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 8b6eedb..655356a 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -66,8 +66,7 @@ static inline void context_tracking_task_switch(struct task_struct *prev,
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
 static inline void guest_enter(void)
 {
-	if (static_key_false(&context_tracking_enabled) &&
-	    vtime_accounting_enabled())
+	if (vtime_accounting_enabled())
 		vtime_guest_enter(current);
 	else
 		current->flags |= PF_VCPU;
@@ -75,8 +74,7 @@ static inline void guest_enter(void)
 
 static inline void guest_exit(void)
 {
-	if (static_key_false(&context_tracking_enabled) &&
-	    vtime_accounting_enabled())
+	if (vtime_accounting_enabled())
 		vtime_guest_exit(current);
 	else
 		current->flags &= ~PF_VCPU;
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index 2ad0739..f5b72b3 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -1,22 +1,68 @@
 #ifndef _LINUX_KERNEL_VTIME_H
 #define _LINUX_KERNEL_VTIME_H
 
+#include <linux/context_tracking_state.h>
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 #include <asm/vtime.h>
 #endif
 
+
 struct task_struct;
 
+/*
+ * vtime_accounting_enabled() definitions/declarations
+ */
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
+static inline bool vtime_accounting_enabled(void) { return true; }
+#endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
+
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+static inline bool vtime_accounting_enabled(void)
+{
+	if (static_key_false(&context_tracking_enabled)) {
+		if (context_tracking_active())
+			return true;
+	}
+
+	return false;
+}
+#endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */
+
+#ifndef CONFIG_VIRT_CPU_ACCOUNTING
+static inline bool vtime_accounting_enabled(void) { return false; }
+#endif /* !CONFIG_VIRT_CPU_ACCOUNTING */
+
+
+/*
+ * Common vtime APIs
+ */
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING
+
+#ifdef __ARCH_HAS_VTIME_TASK_SWITCH
 extern void vtime_task_switch(struct task_struct *prev);
+#else
+extern void vtime_common_task_switch(struct task_struct *prev);
+static inline void vtime_task_switch(struct task_struct *prev)
+{
+	if (vtime_accounting_enabled())
+		vtime_common_task_switch(prev);
+}
+#endif /* __ARCH_HAS_VTIME_TASK_SWITCH */
+
 extern void vtime_account_system(struct task_struct *tsk);
 extern void vtime_account_idle(struct task_struct *tsk);
 extern void vtime_account_user(struct task_struct *tsk);
-extern void vtime_account_irq_enter(struct task_struct *tsk);
 
-#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
-static inline bool vtime_accounting_enabled(void) { return true; }
-#endif
+#ifdef __ARCH_HAS_VTIME_ACCOUNT
+extern void vtime_account_irq_enter(struct task_struct *tsk);
+#else
+extern void vtime_common_account_irq_enter(struct task_struct *tsk);
+static inline void vtime_account_irq_enter(struct task_struct *tsk)
+{
+	if (vtime_accounting_enabled())
+		vtime_common_account_irq_enter(tsk);
+}
+#endif /* __ARCH_HAS_VTIME_ACCOUNT */
 
 #else /* !CONFIG_VIRT_CPU_ACCOUNTING */
 
@@ -24,14 +70,20 @@ static inline void vtime_task_switch(struct task_struct *prev) { }
 static inline void vtime_account_system(struct task_struct *tsk) { }
 static inline void vtime_account_user(struct task_struct *tsk) { }
 static inline void vtime_account_irq_enter(struct task_struct *tsk) { }
-static inline bool vtime_accounting_enabled(void) { return false; }
-#endif
+#endif /* !CONFIG_VIRT_CPU_ACCOUNTING */
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
 extern void arch_vtime_task_switch(struct task_struct *tsk);
-extern void vtime_account_irq_exit(struct task_struct *tsk);
-extern bool vtime_accounting_enabled(void);
+extern void vtime_gen_account_irq_exit(struct task_struct *tsk);
+
+static inline void vtime_account_irq_exit(struct task_struct *tsk)
+{
+	if (vtime_accounting_enabled())
+		vtime_gen_account_irq_exit(tsk);
+}
+
 extern void vtime_user_enter(struct task_struct *tsk);
+
 static inline void vtime_user_exit(struct task_struct *tsk)
 {
 	vtime_account_user(tsk);
@@ -39,7 +91,7 @@ static inline void vtime_user_exit(struct task_struct *tsk)
 extern void vtime_guest_enter(struct task_struct *tsk);
 extern void vtime_guest_exit(struct task_struct *tsk);
 extern void vtime_init_idle(struct task_struct *tsk, int cpu);
-#else
+#else /* !CONFIG_VIRT_CPU_ACCOUNTING_GEN  */
 static inline void vtime_account_irq_exit(struct task_struct *tsk)
 {
 	/* On hard|softirq exit we always account to hard|softirq cputime */
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index b62d5c0..0831b06 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -378,11 +378,8 @@ static inline void irqtime_account_process_tick(struct task_struct *p, int user_
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING
 
 #ifndef __ARCH_HAS_VTIME_TASK_SWITCH
-void vtime_task_switch(struct task_struct *prev)
+void vtime_common_task_switch(struct task_struct *prev)
 {
-	if (!vtime_accounting_enabled())
-		return;
-
 	if (is_idle_task(prev))
 		vtime_account_idle(prev);
 	else
@@ -404,11 +401,8 @@ void vtime_task_switch(struct task_struct *prev)
  * vtime_account().
  */
 #ifndef __ARCH_HAS_VTIME_ACCOUNT
-void vtime_account_irq_enter(struct task_struct *tsk)
+void vtime_common_account_irq_enter(struct task_struct *tsk)
 {
-	if (!vtime_accounting_enabled())
-		return;
-
 	if (!in_interrupt()) {
 		/*
 		 * If we interrupted user, context_tracking_in_user()
@@ -428,7 +422,7 @@ void vtime_account_irq_enter(struct task_struct *tsk)
 	}
 	vtime_account_system(tsk);
 }
-EXPORT_SYMBOL_GPL(vtime_account_irq_enter);
+EXPORT_SYMBOL_GPL(vtime_common_account_irq_enter);
 #endif /* __ARCH_HAS_VTIME_ACCOUNT */
 #endif /* CONFIG_VIRT_CPU_ACCOUNTING */
 
@@ -669,11 +663,8 @@ void vtime_account_system(struct task_struct *tsk)
 	write_sequnlock(&tsk->vtime_seqlock);
 }
 
-void vtime_account_irq_exit(struct task_struct *tsk)
+void vtime_gen_account_irq_exit(struct task_struct *tsk)
 {
-	if (!vtime_accounting_enabled())
-		return;
-
 	write_seqlock(&tsk->vtime_seqlock);
 	if (context_tracking_in_user())
 		tsk->vtime_snap_whence = VTIME_USER;
@@ -732,11 +723,6 @@ void vtime_account_idle(struct task_struct *tsk)
 	account_idle_time(delta_cpu);
 }
 
-bool vtime_accounting_enabled(void)
-{
-	return context_tracking_active();
-}
-
 void arch_vtime_task_switch(struct task_struct *prev)
 {
 	write_seqlock(&prev->vtime_seqlock);
-- 
1.7.5.4



* [PATCH 19/23] vtime: Always scale generic vtime accounting results
  2013-08-01  0:31 [GIT PULL] nohz patches for 3.12 preview v3 Frederic Weisbecker
                   ` (17 preceding siblings ...)
  2013-08-01  0:31 ` [PATCH 18/23] vtime: Optimize full dynticks accounting off case with static keys Frederic Weisbecker
@ 2013-08-01  0:31 ` Frederic Weisbecker
  2013-08-01  0:31 ` [PATCH 20/23] vtime: Always debug check snapshot source _before_ updating it Frederic Weisbecker
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2013-08-01  0:31 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Mike Galbraith, Kevin Hilman

The cputime accounting in full dynticks can be a subtle
mix of CPUs using tick based accounting and others using
generic vtime.

As long as the tick has a share in producing these stats, we
want to scale the result against the precise CFS accounting, as the
tick can miss tasks hiding between the periodic interrupts.
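
As a rough sketch of the scaling this re-enables, with made-up
numbers for illustration:

	/*
	 * Tick samples: stime = 2, utime = 2; precise CFS runtime:
	 * rtime = 3. cputime_adjust() then roughly computes:
	 *
	 *	stime = rtime * stime / (stime + utime) = 3 * 2 / 4
	 *	utime = rtime - stime
	 *
	 * so the tick samples are redistributed against the precise
	 * total instead of being reported raw.
	 */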

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
---
 kernel/sched/cputime.c |    6 ------
 1 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 0831b06..e9e742e 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -553,12 +553,6 @@ static void cputime_adjust(struct task_cputime *curr,
 {
 	cputime_t rtime, stime, utime, total;
 
-	if (vtime_accounting_enabled()) {
-		*ut = curr->utime;
-		*st = curr->stime;
-		return;
-	}
-
 	stime = curr->stime;
 	total = stime + curr->utime;
 
-- 
1.7.5.4



* [PATCH 20/23] vtime: Always debug check snapshot source _before_ updating it
  2013-08-01  0:31 [GIT PULL] nohz patches for 3.12 preview v3 Frederic Weisbecker
                   ` (18 preceding siblings ...)
  2013-08-01  0:31 ` [PATCH 19/23] vtime: Always scale generic vtime accounting results Frederic Weisbecker
@ 2013-08-01  0:31 ` Frederic Weisbecker
  2013-08-01  0:31 ` [PATCH 21/23] nohz: Rename a few state variables Frederic Weisbecker
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2013-08-01  0:31 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Mike Galbraith, Kevin Hilman

The vtime delta update performed by get_vtime_delta() always checks
that the source of the snapshot is valid.

Meanwhile the snapshot updaters that rely on get_vtime_delta() also
set the new snapshot origin. But some of them do this right before
the call to get_vtime_delta(), making its debug check useless.

This is easily fixable by moving the snapshot origin update after
the call to get_vtime_delta(). The order doesn't matter there otherwise.
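
For reference, the debug check in question is the snapshot origin
sanity test in get_vtime_delta() (a sketch; the exact body lives in
kernel/sched/cputime.c):

	static cputime_t get_vtime_delta(struct task_struct *tsk)
	{
		...
		/* Complain if we account from a sleeping snapshot */
		WARN_ON_ONCE(tsk->vtime_snap_whence == VTIME_SLEEPING);
		...
	}

Setting the new origin first would overwrite whatever bogus state
was there before, so the warning could never fire.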

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
---
 kernel/sched/cputime.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index e9e742e..c1d7493 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -660,9 +660,9 @@ void vtime_account_system(struct task_struct *tsk)
 void vtime_gen_account_irq_exit(struct task_struct *tsk)
 {
 	write_seqlock(&tsk->vtime_seqlock);
+	__vtime_account_system(tsk);
 	if (context_tracking_in_user())
 		tsk->vtime_snap_whence = VTIME_USER;
-	__vtime_account_system(tsk);
 	write_sequnlock(&tsk->vtime_seqlock);
 }
 
@@ -680,8 +680,8 @@ void vtime_account_user(struct task_struct *tsk)
 void vtime_user_enter(struct task_struct *tsk)
 {
 	write_seqlock(&tsk->vtime_seqlock);
-	tsk->vtime_snap_whence = VTIME_USER;
 	__vtime_account_system(tsk);
+	tsk->vtime_snap_whence = VTIME_USER;
 	write_sequnlock(&tsk->vtime_seqlock);
 }
 
-- 
1.7.5.4



* [PATCH 21/23] nohz: Rename a few state variables
  2013-08-01  0:31 [GIT PULL] nohz patches for 3.12 preview v3 Frederic Weisbecker
                   ` (19 preceding siblings ...)
  2013-08-01  0:31 ` [PATCH 20/23] vtime: Always debug check snapshot source _before_ updating it Frederic Weisbecker
@ 2013-08-01  0:31 ` Frederic Weisbecker
  2013-08-01  0:31 ` [PATCH 22/23] nohz: Optimize full dynticks state checks with static keys Frederic Weisbecker
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2013-08-01  0:31 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Mike Galbraith, Kevin Hilman

Rename the full dynticks cpumask and its running-state variable
to more exportable names.

These will be used later from global headers to optimize
the main full dynticks APIs in conjunction with static keys.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
---
 kernel/time/tick-sched.c |   42 +++++++++++++++++++++---------------------
 1 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 6d604fd..71735ea 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -149,8 +149,8 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
 }
 
 #ifdef CONFIG_NO_HZ_FULL
-static cpumask_var_t nohz_full_mask;
-bool have_nohz_full_mask;
+static cpumask_var_t tick_nohz_full_mask;
+bool tick_nohz_full_running;
 
 static bool can_stop_full_tick(void)
 {
@@ -239,11 +239,11 @@ static void nohz_full_kick_ipi(void *info)
  */
 void tick_nohz_full_kick_all(void)
 {
-	if (!have_nohz_full_mask)
+	if (!tick_nohz_full_running)
 		return;
 
 	preempt_disable();
-	smp_call_function_many(nohz_full_mask,
+	smp_call_function_many(tick_nohz_full_mask,
 			       nohz_full_kick_ipi, NULL, false);
 	preempt_enable();
 }
@@ -271,10 +271,10 @@ out:
 
 int tick_nohz_full_cpu(int cpu)
 {
-	if (!have_nohz_full_mask)
+	if (!tick_nohz_full_running)
 		return 0;
 
-	return cpumask_test_cpu(cpu, nohz_full_mask);
+	return cpumask_test_cpu(cpu, tick_nohz_full_mask);
 }
 
 /* Parse the boot-time nohz CPU list from the kernel parameters. */
@@ -282,18 +282,18 @@ static int __init tick_nohz_full_setup(char *str)
 {
 	int cpu;
 
-	alloc_bootmem_cpumask_var(&nohz_full_mask);
-	if (cpulist_parse(str, nohz_full_mask) < 0) {
+	alloc_bootmem_cpumask_var(&tick_nohz_full_mask);
+	if (cpulist_parse(str, tick_nohz_full_mask) < 0) {
 		pr_warning("NOHZ: Incorrect nohz_full cpumask\n");
 		return 1;
 	}
 
 	cpu = smp_processor_id();
-	if (cpumask_test_cpu(cpu, nohz_full_mask)) {
+	if (cpumask_test_cpu(cpu, tick_nohz_full_mask)) {
 		pr_warning("NO_HZ: Clearing %d from nohz_full range for timekeeping\n", cpu);
-		cpumask_clear_cpu(cpu, nohz_full_mask);
+		cpumask_clear_cpu(cpu, tick_nohz_full_mask);
 	}
-	have_nohz_full_mask = true;
+	tick_nohz_full_running = true;
 
 	return 1;
 }
@@ -311,7 +311,7 @@ static int tick_nohz_cpu_down_callback(struct notifier_block *nfb,
 		 * If we handle the timekeeping duty for full dynticks CPUs,
 		 * we can't safely shutdown that CPU.
 		 */
-		if (have_nohz_full_mask && tick_do_timer_cpu == cpu)
+		if (tick_nohz_full_running && tick_do_timer_cpu == cpu)
 			return NOTIFY_BAD;
 		break;
 	}
@@ -330,14 +330,14 @@ static int tick_nohz_init_all(void)
 	int err = -1;
 
 #ifdef CONFIG_NO_HZ_FULL_ALL
-	if (!alloc_cpumask_var(&nohz_full_mask, GFP_KERNEL)) {
+	if (!alloc_cpumask_var(&tick_nohz_full_mask, GFP_KERNEL)) {
 		pr_err("NO_HZ: Can't allocate full dynticks cpumask\n");
 		return err;
 	}
 	err = 0;
-	cpumask_setall(nohz_full_mask);
-	cpumask_clear_cpu(smp_processor_id(), nohz_full_mask);
-	have_nohz_full_mask = true;
+	cpumask_setall(tick_nohz_full_mask);
+	cpumask_clear_cpu(smp_processor_id(), tick_nohz_full_mask);
+	tick_nohz_full_running = true;
 #endif
 	return err;
 }
@@ -346,20 +346,20 @@ void __init tick_nohz_init(void)
 {
 	int cpu;
 
-	if (!have_nohz_full_mask) {
+	if (!tick_nohz_full_running) {
 		if (tick_nohz_init_all() < 0)
 			return;
 	}
 
-	for_each_cpu(cpu, nohz_full_mask)
+	for_each_cpu(cpu, tick_nohz_full_mask)
 		context_tracking_cpu_set(cpu);
 
 	cpu_notifier(tick_nohz_cpu_down_callback, 0);
-	cpulist_scnprintf(nohz_full_buf, sizeof(nohz_full_buf), nohz_full_mask);
+	cpulist_scnprintf(nohz_full_buf, sizeof(nohz_full_buf), tick_nohz_full_mask);
 	pr_info("NO_HZ: Full dynticks CPUs: %s.\n", nohz_full_buf);
 }
 #else
-#define have_nohz_full_mask (0)
+#define tick_nohz_full_running (0)
 #endif
 
 /*
@@ -737,7 +737,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 		return false;
 	}
 
-	if (have_nohz_full_mask) {
+	if (tick_nohz_full_running) {
 		/*
 		 * Keep the tick alive to guarantee timekeeping progression
 		 * if there are full dynticks CPUs around
-- 
1.7.5.4



* [PATCH 22/23] nohz: Optimize full dynticks state checks with static keys
  2013-08-01  0:31 [GIT PULL] nohz patches for 3.12 preview v3 Frederic Weisbecker
                   ` (20 preceding siblings ...)
  2013-08-01  0:31 ` [PATCH 21/23] nohz: Rename a few state variables Frederic Weisbecker
@ 2013-08-01  0:31 ` Frederic Weisbecker
  2013-08-01  0:31 ` [PATCH 23/23] nohz: Optimize full dynticks's sched hooks " Frederic Weisbecker
  2013-08-01  8:29 ` [GIT PULL] nohz patches for 3.12 preview v3 Peter Zijlstra
  23 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2013-08-01  0:31 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Mike Galbraith, Kevin Hilman

These APIs are frequently accessed, and priority is given
to optimizing the full dynticks off case in order to let
distros enable this feature without suffering significant
performance regressions.

Let's inline these APIs and optimize them with static keys.
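
A sketch of what the off case then costs at a hypothetical call
site, using the new inline helpers below:

	if (tick_nohz_full_cpu(cpu))
		tick_nohz_full_kick();

	/*
	 * With the context_tracking_enabled key disabled,
	 * static_key_false() in tick_nohz_full_enabled() is patched
	 * to fall through, so the whole sequence collapses to a
	 * never-taken branch: no cpumask test, no kick.
	 */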

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
---
 include/linux/tick.h     |   25 +++++++++++++++++++++++--
 kernel/time/tick-sched.c |   14 ++------------
 2 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 9180f4b..c60b079 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -10,6 +10,8 @@
 #include <linux/irqflags.h>
 #include <linux/percpu.h>
 #include <linux/hrtimer.h>
+#include <linux/context_tracking_state.h>
+#include <linux/cpumask.h>
 
 #ifdef CONFIG_GENERIC_CLOCKEVENTS
 
@@ -158,15 +160,34 @@ static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
 # endif /* !CONFIG_NO_HZ_COMMON */
 
 #ifdef CONFIG_NO_HZ_FULL
+extern bool tick_nohz_full_running;
+extern cpumask_var_t tick_nohz_full_mask;
+
+static inline bool tick_nohz_full_enabled(void)
+{
+	if (!static_key_false(&context_tracking_enabled))
+		return false;
+
+	return tick_nohz_full_running;
+}
+
+static inline bool tick_nohz_full_cpu(int cpu)
+{
+	if (!tick_nohz_full_enabled())
+		return false;
+
+	return cpumask_test_cpu(cpu, tick_nohz_full_mask);
+}
+
 extern void tick_nohz_init(void);
-extern int tick_nohz_full_cpu(int cpu);
 extern void tick_nohz_full_check(void);
 extern void tick_nohz_full_kick(void);
 extern void tick_nohz_full_kick_all(void);
 extern void tick_nohz_task_switch(struct task_struct *tsk);
 #else
 static inline void tick_nohz_init(void) { }
-static inline int tick_nohz_full_cpu(int cpu) { return 0; }
+static inline bool tick_nohz_full_enabled(void) { return false; }
+static inline bool tick_nohz_full_cpu(int cpu) { return false; }
 static inline void tick_nohz_full_check(void) { }
 static inline void tick_nohz_full_kick(void) { }
 static inline void tick_nohz_full_kick_all(void) { }
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 71735ea..6d6bd6e 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -149,7 +149,7 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
 }
 
 #ifdef CONFIG_NO_HZ_FULL
-static cpumask_var_t tick_nohz_full_mask;
+cpumask_var_t tick_nohz_full_mask;
 bool tick_nohz_full_running;
 
 static bool can_stop_full_tick(void)
@@ -269,14 +269,6 @@ out:
 	local_irq_restore(flags);
 }
 
-int tick_nohz_full_cpu(int cpu)
-{
-	if (!tick_nohz_full_running)
-		return 0;
-
-	return cpumask_test_cpu(cpu, tick_nohz_full_mask);
-}
-
 /* Parse the boot-time nohz CPU list from the kernel parameters. */
 static int __init tick_nohz_full_setup(char *str)
 {
@@ -358,8 +350,6 @@ void __init tick_nohz_init(void)
 	cpulist_scnprintf(nohz_full_buf, sizeof(nohz_full_buf), tick_nohz_full_mask);
 	pr_info("NO_HZ: Full dynticks CPUs: %s.\n", nohz_full_buf);
 }
-#else
-#define tick_nohz_full_running (0)
 #endif
 
 /*
@@ -737,7 +727,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 		return false;
 	}
 
-	if (tick_nohz_full_running) {
+	if (tick_nohz_full_enabled()) {
 		/*
 		 * Keep the tick alive to guarantee timekeeping progression
 		 * if there are full dynticks CPUs around
-- 
1.7.5.4



* [PATCH 23/23] nohz: Optimize full dynticks's sched hooks with static keys
  2013-08-01  0:31 [GIT PULL] nohz patches for 3.12 preview v3 Frederic Weisbecker
                   ` (21 preceding siblings ...)
  2013-08-01  0:31 ` [PATCH 22/23] nohz: Optimize full dynticks state checks with static keys Frederic Weisbecker
@ 2013-08-01  0:31 ` Frederic Weisbecker
  2013-08-01  8:29 ` [GIT PULL] nohz patches for 3.12 preview v3 Peter Zijlstra
  23 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2013-08-01  0:31 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Steven Rostedt, Paul E. McKenney,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Li Zhong, Mike Galbraith, Kevin Hilman

Scheduler IPIs and task context switches are serious fast paths.
Through the use of static keys, let's hide as much as we can of
the off-case impact of the full dynticks APIs called on these
sites.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
---
 include/linux/tick.h     |   20 ++++++++++++++++----
 kernel/time/tick-sched.c |    8 ++++----
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index c60b079..a7ef1d6 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -180,20 +180,32 @@ static inline bool tick_nohz_full_cpu(int cpu)
 }
 
 extern void tick_nohz_init(void);
-extern void tick_nohz_full_check(void);
+extern void __tick_nohz_full_check(void);
 extern void tick_nohz_full_kick(void);
 extern void tick_nohz_full_kick_all(void);
-extern void tick_nohz_task_switch(struct task_struct *tsk);
+extern void __tick_nohz_task_switch(struct task_struct *tsk);
 #else
 static inline void tick_nohz_init(void) { }
 static inline bool tick_nohz_full_enabled(void) { return false; }
 static inline bool tick_nohz_full_cpu(int cpu) { return false; }
-static inline void tick_nohz_full_check(void) { }
+static inline void __tick_nohz_full_check(void) { }
 static inline void tick_nohz_full_kick(void) { }
 static inline void tick_nohz_full_kick_all(void) { }
-static inline void tick_nohz_task_switch(struct task_struct *tsk) { }
+static inline void __tick_nohz_task_switch(struct task_struct *tsk) { }
 #endif
 
+static inline void tick_nohz_full_check(void)
+{
+	if (tick_nohz_full_enabled())
+		__tick_nohz_full_check();
+}
+
+static inline void tick_nohz_task_switch(struct task_struct *tsk)
+{
+	if (tick_nohz_full_enabled())
+		__tick_nohz_task_switch(tsk);
+}
+
 
 # ifdef CONFIG_CPU_IDLE_GOV_MENU
 extern void menu_hrtimer_cancel(void);
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 6d6bd6e..73997be 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -197,7 +197,7 @@ static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now);
  * Re-evaluate the need for the tick on the current CPU
  * and restart it if necessary.
  */
-void tick_nohz_full_check(void)
+void __tick_nohz_full_check(void)
 {
 	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
 
@@ -211,7 +211,7 @@ void tick_nohz_full_check(void)
 
 static void nohz_full_kick_work_func(struct irq_work *work)
 {
-	tick_nohz_full_check();
+	__tick_nohz_full_check();
 }
 
 static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
@@ -230,7 +230,7 @@ void tick_nohz_full_kick(void)
 
 static void nohz_full_kick_ipi(void *info)
 {
-	tick_nohz_full_check();
+	__tick_nohz_full_check();
 }
 
 /*
@@ -253,7 +253,7 @@ void tick_nohz_full_kick_all(void)
  * It might need the tick due to per task/process properties:
  * perf events, posix cpu timers, ...
  */
-void tick_nohz_task_switch(struct task_struct *tsk)
+void __tick_nohz_task_switch(struct task_struct *tsk)
 {
 	unsigned long flags;
 
-- 
1.7.5.4



* Re: [GIT PULL] nohz patches for 3.12 preview v3
  2013-08-01  0:31 [GIT PULL] nohz patches for 3.12 preview v3 Frederic Weisbecker
                   ` (22 preceding siblings ...)
  2013-08-01  0:31 ` [PATCH 23/23] nohz: Optimize full dynticks's sched hooks " Frederic Weisbecker
@ 2013-08-01  8:29 ` Peter Zijlstra
  2013-08-03 15:46   ` Frederic Weisbecker
  23 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2013-08-01  8:29 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Steven Rostedt, Paul E. McKenney, Ingo Molnar,
	Thomas Gleixner, Borislav Petkov, Li Zhong, Mike Galbraith,
	Kevin Hilman, Martin Schwidefsky, Heiko Carstens,
	Geert Uytterhoeven, Alex Shi, Paul Turner, Vincent Guittot

On Thu, Aug 01, 2013 at 02:31:17AM +0200, Frederic Weisbecker wrote:
> Hi,
> 
> So none of the patches from the previous v2 posting have changed.
> I've just added two more in order to fix build crashes reported
> by Wu Fengguang:
> 
>       hardirq: Split preempt count mask definitions
>       m68k: hardirq_count() only need preempt_mask.h
> 
> If no comment arise, I'll send a pull request to Ingo in a few days.

So I did a drive-by review and didn't spot anything really weird.

However, it would be very good to include some performance figures that
show what effect all this effort had.

I'm assuming it's good (TM) ;-)


* Re: [GIT PULL] nohz patches for 3.12 preview v3
  2013-08-01  8:29 ` [GIT PULL] nohz patches for 3.12 preview v3 Peter Zijlstra
@ 2013-08-03 15:46   ` Frederic Weisbecker
  0 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2013-08-03 15:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Steven Rostedt, Paul E. McKenney, Ingo Molnar,
	Thomas Gleixner, Borislav Petkov, Li Zhong, Mike Galbraith,
	Kevin Hilman, Martin Schwidefsky, Heiko Carstens,
	Geert Uytterhoeven, Alex Shi, Paul Turner, Vincent Guittot

On Thu, Aug 01, 2013 at 10:29:59AM +0200, Peter Zijlstra wrote:
> On Thu, Aug 01, 2013 at 02:31:17AM +0200, Frederic Weisbecker wrote:
> > Hi,
> > 
> > So none of the patches from the previous v2 posting have changed.
> > I've just added two more in order to fix build crashes reported
> > by Wu Fengguang:
> > 
> >       hardirq: Split preempt count mask definitions
> >       m68k: hardirq_count() only need preempt_mask.h
> > 
> > If no comment arise, I'll send a pull request to Ingo in a few days.
> 
> So I did a drive-by review and didn't spot anything really weird.
> 
> However, it would be very good to include some performance figures that
> show what effect all this effort had.
> 
> I'm assuming its good (TM) ;-)

Thanks for your review :)

So here are a few benchmarks. They have been generated with 50 runs of hackbench
under perf stat:

   perf stat -r 50 hackbench

Be careful with the results because my machine is not the best for such testing. It sometimes generates
profiles that outright contradict each other; there is some non-deterministic behaviour out there.
But still, the following results reveal some coherent tendencies.

So let's first compare the full dynticks dynamic off case (full dynticks built in, but no CPU in the full dynticks range)
before and after the patchset.

* Before patchset, NO_HZ_FULL=y but no CPUs defined in the full dynticks range

 Performance counter stats for './hackbench' (50 runs):

       2085,711227 task-clock                #    3,665 CPUs utilized            ( +-  0,20% )
            17 150 context-switches          #    0,008 M/sec                    ( +-  2,63% )
             2 787 cpu-migrations            #    0,001 M/sec                    ( +-  3,73% )
            24 642 page-faults               #    0,012 M/sec                    ( +-  0,13% )
     4 570 011 562 cycles                    #    2,191 GHz                      ( +-  0,27% ) [69,39%]
       615 260 904 stalled-cycles-frontend   #   13,46% frontend cycles idle     ( +-  0,41% ) [69,37%]
     2 858 928 434 stalled-cycles-backend    #   62,56% backend  cycles idle     ( +-  0,29% ) [69,44%]
     1 886 501 443 instructions              #    0,41  insns per cycle        
                                             #    1,52  stalled cycles per insn  ( +-  0,21% ) [69,46%]
       321 845 039 branches                  #  154,309 M/sec                    ( +-  0,27% ) [69,38%]
         8 086 056 branch-misses             #    2,51% of all branches          ( +-  0,40% ) [69,37%]

       0,569151871 seconds time elapsed


* After patchset, NO_HZ_FULL=y but no CPUs defined in the full dynticks range:

 Performance counter stats for './hackbench' (50 runs):

       1832,559228 task-clock                #    3,630 CPUs utilized            ( +-  0,88% )
            18 284 context-switches          #    0,010 M/sec                    ( +-  2,70% )
             2 877 cpu-migrations            #    0,002 M/sec                    ( +-  3,57% )
            24 643 page-faults               #    0,013 M/sec                    ( +-  0,12% )
     3 997 096 098 cycles                    #    2,181 GHz                      ( +-  0,94% ) [69,60%]
       497 409 356 stalled-cycles-frontend   #   12,44% frontend cycles idle     ( +-  0,45% ) [69,59%]
     2 828 069 690 stalled-cycles-backend    #   70,75% backend  cycles idle     ( +-  0,61% ) [69,69%]
     1 195 418 817 instructions              #    0,30  insns per cycle        
                                             #    2,37  stalled cycles per insn  ( +-  2,27% ) [69,75%]
       217 876 716 branches                  #  118,892 M/sec                    ( +-  3,13% ) [69,79%]
         5 895 622 branch-misses             #    2,71% of all branches          ( +-  0,44% ) [69,65%]

       0,504899242 seconds time elapsed


Looking at the elapsed times (0.569151871s before vs 0.504899242s after), that's roughly an 11% performance gain from the patchset.


Now let's compare the static full dynticks off case (NO_HZ_FULL=n) with the dynamic full dynticks off case (NO_HZ_FULL=y
but no CPU in the "nohz_full=" boot option, namely the profile above), both after the patchset:

* After patchset, CONFIG_NO_HZ_FULL=n, CONFIG_RCU_NOCB_CPU=n, CONFIG_TICK_CPU_ACCOUNTING=y, CONFIG_RCU_USER_QS=n:
  a classical dynticks idle kernel.

 Performance counter stats for './hackbench' (50 runs):

       1784,878581 task-clock                #    3,595 CPUs utilized            ( +-  0,26% )
            18 130 context-switches          #    0,010 M/sec                    ( +-  2,52% )
             2 850 cpu-migrations            #    0,002 M/sec                    ( +-  3,15% )
            24 683 page-faults               #    0,014 M/sec                    ( +-  0,11% )
     3 899 468 529 cycles                    #    2,185 GHz                      ( +-  0,31% ) [69,63%]
       476 759 654 stalled-cycles-frontend   #   12,23% frontend cycles idle     ( +-  0,59% ) [69,58%]
     2 789 090 317 stalled-cycles-backend    #   71,52% backend  cycles idle     ( +-  0,35% ) [69,51%]
     1 156 184 197 instructions              #    0,30  insns per cycle        
                                             #    2,41  stalled cycles per insn  ( +-  0,19% ) [69,54%]
       207 501 450 branches                  #  116,255 M/sec                    ( +-  0,21% ) [69,66%]
         5 029 776 branch-misses             #    2,42% of all branches          ( +-  0,32% ) [69,69%]

       0,496525053 seconds time elapsed


Compared to the dynamic off case above, it's 0.496525053s vs 0.504899242s: roughly a 1.6% performance loss.

To conclude, the patchset improves the current upstream situation a lot in any case (11% better for the dynamic off case).

But even after this patchset, there is still some work to do if we want to make the full dynticks dynamic off case nearly invisible
compared to a dynticks idle kernel, ie: we need to remove that 1.6% performance loss. Still, this seems to be good progress
in that direction.

Thanks.
