linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] cputime: Generic virtual based cputime accounting v4
@ 2012-11-03 16:09 Frederic Weisbecker
  2012-11-03 16:09 ` [PATCH 1/3] context_tracking: New context tracking subsystem Frederic Weisbecker
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Frederic Weisbecker @ 2012-11-03 16:09 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Andrew Morton, H. Peter Anvin, Ingo Molnar,
	Paul E. McKenney, Peter Zijlstra, Steven Rostedt,
	Thomas Gleixner

Hi,

I'm back on this patchset now that the necessary cputime cleanups are
merged, although more cputime consolidation, such as in the context
switch and tick paths, should still follow once I get time to clean up
the s390 part.

So this version of the generic vtime is essentially a rebase against
the latest changes (tip:sched/core). Once we get that in, we'll need to
handle the cputime read side when the write side is in nohz mode.
Probably no big deal, but let's move step by step, as usual.

Comments?

This can be fetched from:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	vtime/generic-v4


Frederic Weisbecker (3):
  context_tracking: New context tracking subsystem
  cputime: Allow dynamic switch between tick/virtual based cputime
    accounting
  cputime: Generic on-demand virtual cputime accounting

 arch/Kconfig                                       |   12 +-
 arch/ia64/include/asm/cputime.h                    |    5 +
 arch/ia64/kernel/time.c                            |    2 +-
 arch/powerpc/include/asm/cputime.h                 |    5 +
 arch/powerpc/kernel/time.c                         |    2 +-
 arch/s390/include/asm/cputime.h                    |    5 +
 arch/s390/kernel/vtime.c                           |    2 +-
 arch/x86/Kconfig                                   |    2 +-
 arch/x86/include/asm/{rcu.h => context_tracking.h} |   13 +-
 arch/x86/kernel/entry_64.S                         |    2 +-
 arch/x86/kernel/ptrace.c                           |    8 +-
 arch/x86/kernel/signal.c                           |    5 +-
 arch/x86/kernel/traps.c                            |    2 +-
 arch/x86/mm/fault.c                                |    2 +-
 include/linux/context_tracking.h                   |   46 ++++++
 include/linux/rcupdate.h                           |    2 -
 include/linux/sched.h                              |   13 +--
 include/linux/vtime.h                              |   14 ++
 init/Kconfig                                       |   41 ++++--
 kernel/Makefile                                    |    1 +
 kernel/context_tracking.c                          |   71 +++++++++
 kernel/fork.c                                      |    3 +-
 kernel/rcutree.c                                   |   64 +--------
 kernel/sched/core.c                                |    9 +-
 kernel/sched/cputime.c                             |  152 ++++++++++++++++----
 kernel/time/tick-sched.c                           |    5 +-
 26 files changed, 335 insertions(+), 153 deletions(-)
 rename arch/x86/include/asm/{rcu.h => context_tracking.h} (69%)
 create mode 100644 include/linux/context_tracking.h
 create mode 100644 kernel/context_tracking.c

-- 
1.7.5.4



* [PATCH 1/3] context_tracking: New context tracking subsystem
  2012-11-03 16:09 [PATCH 0/3] cputime: Generic virtual based cputime accounting v4 Frederic Weisbecker
@ 2012-11-03 16:09 ` Frederic Weisbecker
  2012-11-06  9:53   ` Gilad Ben-Yossef
  2012-11-03 16:09 ` [PATCH 2/3] cputime: Allow dynamic switch between tick/virtual based cputime accounting Frederic Weisbecker
  2012-11-03 16:09 ` [PATCH 3/3] cputime: Generic on-demand virtual " Frederic Weisbecker
  2 siblings, 1 reply; 6+ messages in thread
From: Frederic Weisbecker @ 2012-11-03 16:09 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Andrew Morton, H. Peter Anvin, Ingo Molnar,
	Paul E. McKenney, Peter Zijlstra, Steven Rostedt,
	Thomas Gleixner

Create a new subsystem that probes kernel boundaries in order to
track the transitions between context levels, starting with two
basic contexts: user and kernel.

This is an abstraction of some RCU code that uses such tracking
to implement its userspace extended quiescent state.

We need to pull this up from RCU into this new level of indirection
because the tracking is also going to be used to implement "on-demand"
generic virtual cputime accounting, a necessary step toward shutting
down the tick while still accounting the cputime.
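
For illustration only, a minimal sketch of how an arch syscall slow
path would use the new hooks (a hypothetical wrapper, not taken from
this series; the real x86 wiring is in the syscall_trace_enter() /
syscall_trace_leave() hunks below):

	/* Hypothetical sketch, not part of this patch */
	long arch_syscall_slow_path(struct pt_regs *regs)
	{
		long ret;

		user_exit();			/* user -> kernel transition */
		ret = handle_syscall(regs);	/* placeholder arch dispatch */
		user_enter();			/* about to resume userspace */
		return ret;
	}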

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 arch/Kconfig                                       |   12 ++--
 arch/x86/Kconfig                                   |    2 +-
 arch/x86/include/asm/{rcu.h => context_tracking.h} |   13 ++--
 arch/x86/kernel/entry_64.S                         |    2 +-
 arch/x86/kernel/ptrace.c                           |    8 +-
 arch/x86/kernel/signal.c                           |    5 +-
 arch/x86/kernel/traps.c                            |    2 +-
 arch/x86/mm/fault.c                                |    2 +-
 include/linux/context_tracking.h                   |   18 ++++
 include/linux/rcupdate.h                           |    2 -
 include/linux/sched.h                              |    8 --
 init/Kconfig                                       |   30 ++++----
 kernel/Makefile                                    |    1 +
 kernel/context_tracking.c                          |   83 ++++++++++++++++++++
 kernel/rcutree.c                                   |   64 +---------------
 kernel/sched/core.c                                |    9 +-
 16 files changed, 147 insertions(+), 114 deletions(-)
 rename arch/x86/include/asm/{rcu.h => context_tracking.h} (69%)
 create mode 100644 include/linux/context_tracking.h
 create mode 100644 kernel/context_tracking.c

diff --git a/arch/Kconfig b/arch/Kconfig
index 366ec06..3855e06 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -300,15 +300,15 @@ config SECCOMP_FILTER
 
 	  See Documentation/prctl/seccomp_filter.txt for details.
 
-config HAVE_RCU_USER_QS
+config HAVE_CONTEXT_TRACKING
 	bool
 	help
-	  Provide kernel entry/exit hooks necessary for userspace
+	  Provide kernel/user boundaries probes necessary for userspace
 	  RCU extended quiescent state. Syscalls need to be wrapped inside
-	  rcu_user_exit()-rcu_user_enter() through the slow path using
-	  TIF_NOHZ flag. Exceptions handlers must be wrapped as well. Irqs
-	  are already protected inside rcu_irq_enter/rcu_irq_exit() but
-	  preemption or signal handling on irq exit still need to be protected.
+	  user_exit()-user_enter() through the slow path using TIF_NOHZ flag.
+	  Exceptions handlers must be wrapped as well. Irqs are already
+	  protected inside rcu_irq_enter/rcu_irq_exit() but preemption or
+	  signal handling on irq exit still need to be protected.
 
 config HAVE_VIRT_CPU_ACCOUNTING
 	bool
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 46c3bff..110cfad 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -106,7 +106,7 @@ config X86
 	select KTIME_SCALAR if X86_32
 	select GENERIC_STRNCPY_FROM_USER
 	select GENERIC_STRNLEN_USER
-	select HAVE_RCU_USER_QS if X86_64
+	select HAVE_CONTEXT_TRACKING if X86_64
 	select HAVE_IRQ_TIME_ACCOUNTING
 	select GENERIC_KERNEL_THREAD
 	select GENERIC_KERNEL_EXECVE
diff --git a/arch/x86/include/asm/rcu.h b/arch/x86/include/asm/context_tracking.h
similarity index 69%
rename from arch/x86/include/asm/rcu.h
rename to arch/x86/include/asm/context_tracking.h
index d1ac07a..4d5b661 100644
--- a/arch/x86/include/asm/rcu.h
+++ b/arch/x86/include/asm/context_tracking.h
@@ -1,21 +1,20 @@
-#ifndef _ASM_X86_RCU_H
-#define _ASM_X86_RCU_H
+#ifndef _ASM_X86_CONTEXT_TRACKING_H
+#define _ASM_X86_CONTEXT_TRACKING_H
 
 #ifndef __ASSEMBLY__
-
-#include <linux/rcupdate.h>
+#include <linux/context_tracking.h>
 #include <asm/ptrace.h>
 
 static inline void exception_enter(struct pt_regs *regs)
 {
-	rcu_user_exit();
+	user_exit();
 }
 
 static inline void exception_exit(struct pt_regs *regs)
 {
-#ifdef CONFIG_RCU_USER_QS
+#ifdef CONFIG_CONTEXT_TRACKING
 	if (user_mode(regs))
-		rcu_user_enter();
+		user_enter();
 #endif
 }
 
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index b51b2c7..1a1c2ba 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -56,7 +56,7 @@
 #include <asm/ftrace.h>
 #include <asm/percpu.h>
 #include <asm/asm.h>
-#include <asm/rcu.h>
+#include <asm/context_tracking.h>
 #include <asm/smap.h>
 #include <linux/err.h>
 
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index eff5b8c..65b88a5 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -21,7 +21,7 @@
 #include <linux/signal.h>
 #include <linux/perf_event.h>
 #include <linux/hw_breakpoint.h>
-#include <linux/rcupdate.h>
+#include <linux/context_tracking.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
@@ -1461,7 +1461,7 @@ long syscall_trace_enter(struct pt_regs *regs)
 {
 	long ret = 0;
 
-	rcu_user_exit();
+	user_exit();
 
 	/*
 	 * If we stepped into a sysenter/syscall insn, it trapped in
@@ -1516,7 +1516,7 @@ void syscall_trace_leave(struct pt_regs *regs)
 	 * or do_notify_resume(), in which case we can be in RCU
 	 * user mode.
 	 */
-	rcu_user_exit();
+	user_exit();
 
 	audit_syscall_exit(regs);
 
@@ -1534,5 +1534,5 @@ void syscall_trace_leave(struct pt_regs *regs)
 	if (step || test_thread_flag(TIF_SYSCALL_TRACE))
 		tracehook_report_syscall_exit(regs, step);
 
-	rcu_user_enter();
+	user_enter();
 }
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 70b27ee..fbbb604 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -22,6 +22,7 @@
 #include <linux/uaccess.h>
 #include <linux/user-return-notifier.h>
 #include <linux/uprobes.h>
+#include <linux/context_tracking.h>
 
 #include <asm/processor.h>
 #include <asm/ucontext.h>
@@ -816,7 +817,7 @@ static void do_signal(struct pt_regs *regs)
 void
 do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags)
 {
-	rcu_user_exit();
+	user_exit();
 
 #ifdef CONFIG_X86_MCE
 	/* notify userspace of pending MCEs */
@@ -838,7 +839,7 @@ do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags)
 	if (thread_info_flags & _TIF_USER_RETURN_NOTIFY)
 		fire_user_return_notifiers();
 
-	rcu_user_enter();
+	user_enter();
 }
 
 void signal_fault(struct pt_regs *regs, void __user *frame, char *where)
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 8276dc6..eb85866 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -55,7 +55,7 @@
 #include <asm/i387.h>
 #include <asm/fpu-internal.h>
 #include <asm/mce.h>
-#include <asm/rcu.h>
+#include <asm/context_tracking.h>
 
 #include <asm/mach_traps.h>
 
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 8e13ecb..b0b1f1d 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -18,7 +18,7 @@
 #include <asm/pgalloc.h>		/* pgd_*(), ...			*/
 #include <asm/kmemcheck.h>		/* kmemcheck_*(), ...		*/
 #include <asm/fixmap.h>			/* VSYSCALL_START		*/
-#include <asm/rcu.h>			/* exception_enter(), ...	*/
+#include <asm/context_tracking.h>		/* exception_enter(), ...	*/
 
 /*
  * Page fault error code bits:
diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
new file mode 100644
index 0000000..e24339c
--- /dev/null
+++ b/include/linux/context_tracking.h
@@ -0,0 +1,18 @@
+#ifndef _LINUX_CONTEXT_TRACKING_H
+#define _LINUX_CONTEXT_TRACKING_H
+
+#ifdef CONFIG_CONTEXT_TRACKING
+#include <linux/sched.h>
+
+extern void user_enter(void);
+extern void user_exit(void);
+extern void context_tracking_task_switch(struct task_struct *prev,
+					 struct task_struct *next);
+#else
+static inline void user_enter(void) { }
+static inline void user_exit(void) { }
+static inline void context_tracking_task_switch(struct task_struct *prev,
+						struct task_struct *next) { }
+#endif /* !CONFIG_CONTEXT_TRACKING */
+
+#endif
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 7c968e4..f5034f2 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -197,8 +197,6 @@ extern void rcu_user_enter(void);
 extern void rcu_user_exit(void);
 extern void rcu_user_enter_after_irq(void);
 extern void rcu_user_exit_after_irq(void);
-extern void rcu_user_hooks_switch(struct task_struct *prev,
-				  struct task_struct *next);
 #else
 static inline void rcu_user_enter(void) { }
 static inline void rcu_user_exit(void) { }
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e1581a0..6c13fe3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1866,14 +1866,6 @@ static inline void rcu_copy_process(struct task_struct *p)
 
 #endif
 
-static inline void rcu_switch(struct task_struct *prev,
-			      struct task_struct *next)
-{
-#ifdef CONFIG_RCU_USER_QS
-	rcu_user_hooks_switch(prev, next);
-#endif
-}
-
 static inline void tsk_restore_flags(struct task_struct *task,
 				unsigned long orig_flags, unsigned long flags)
 {
diff --git a/init/Kconfig b/init/Kconfig
index 6fdd6e3..15e44e7 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -436,6 +436,19 @@ config TASK_IO_ACCOUNTING
 
 endmenu # "CPU/Task time and stats accounting"
 
+config CONTEXT_TRACKING
+       bool
+
+config CONTEXT_TRACKING_FORCE
+	bool "Force context tracking"
+	depends on CONTEXT_TRACKING
+	help
+	  Probe on user/kernel boundaries by default in order to
+	  test the features that rely on it such as userspace RCU extended
+	  quiescent states.
+	  This test is there for debugging until we have a real user like a
+	  full adaptive nohz option.
+
 menu "RCU Subsystem"
 
 choice
@@ -488,7 +501,8 @@ config PREEMPT_RCU
 
 config RCU_USER_QS
 	bool "Consider userspace as in RCU extended quiescent state"
-	depends on HAVE_RCU_USER_QS && SMP
+	depends on HAVE_CONTEXT_TRACKING && SMP
+	select CONTEXT_TRACKING
 	help
 	  This option sets hooks on kernel / userspace boundaries and
 	  puts RCU in extended quiescent state when the CPU runs in
@@ -502,20 +516,6 @@ config RCU_USER_QS
 
 	  If unsure say N
 
-config RCU_USER_QS_FORCE
-	bool "Force userspace extended QS by default"
-	depends on RCU_USER_QS
-	help
-	  Set the hooks in user/kernel boundaries by default in order to
-	  test this feature that treats userspace as an extended quiescent
-	  state until we have a real user like a full adaptive nohz option.
-
-	  Unless you want to hack and help the development of the full
-	  tickless feature, you shouldn't enable this option. It adds
-	  unnecessary overhead.
-
-	  If unsure say N
-
 config RCU_FANOUT
 	int "Tree-based hierarchical RCU fanout value"
 	range 2 64 if 64BIT
diff --git a/kernel/Makefile b/kernel/Makefile
index 0dfeca4..f90bbfc 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -110,6 +110,7 @@ obj-$(CONFIG_USER_RETURN_NOTIFIER) += user-return-notifier.o
 obj-$(CONFIG_PADATA) += padata.o
 obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
 obj-$(CONFIG_JUMP_LABEL) += jump_label.o
+obj-$(CONFIG_CONTEXT_TRACKING) += context_tracking.o
 
 $(obj)/configs.o: $(obj)/config_data.h
 
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
new file mode 100644
index 0000000..d7983ea
--- /dev/null
+++ b/kernel/context_tracking.c
@@ -0,0 +1,83 @@
+#include <linux/context_tracking.h>
+#include <linux/rcupdate.h>
+#include <linux/sched.h>
+#include <linux/percpu.h>
+#include <linux/hardirq.h>
+
+struct context_tracking {
+	/*
+	 * When active is false, hooks are not set to
+	 * minimize overhead: TIF flags are cleared
+	 * and calls to user_enter/exit are ignored. This
+	 * may be further optimized using static keys.
+	 */
+	bool active;
+	enum {
+		IN_KERNEL = 0,
+		IN_USER,
+	} state;
+};
+
+DEFINE_PER_CPU(struct context_tracking, context_tracking) = {
+#ifdef CONFIG_CONTEXT_TRACKING_FORCE
+	.active = true,
+#endif
+};
+
+void user_enter(void)
+{
+	unsigned long flags;
+
+	/*
+	 * Some contexts may involve an exception occurring in an irq,
+	 * leading to that nesting:
+	 * rcu_irq_enter() rcu_user_exit() rcu_user_exit() rcu_irq_exit()
+	 * This would mess up the dyntick_nesting count though. And rcu_irq_*()
+	 * helpers are enough to protect RCU uses inside the exception. So
+	 * just return immediately if we detect we are in an IRQ.
+	 */
+	if (in_interrupt())
+		return;
+
+	WARN_ON_ONCE(!current->mm);
+
+	local_irq_save(flags);
+	if (__this_cpu_read(context_tracking.active) &&
+	    __this_cpu_read(context_tracking.state) != IN_USER) {
+		__this_cpu_write(context_tracking.state, IN_USER);
+		rcu_user_enter();
+	}
+	local_irq_restore(flags);
+}
+
+void user_exit(void)
+{
+	unsigned long flags;
+
+	/*
+	 * Some contexts may involve an exception occurring in an irq,
+	 * leading to that nesting:
+	 * rcu_irq_enter() rcu_user_exit() rcu_user_exit() rcu_irq_exit()
+	 * This would mess up the dyntick_nesting count though. And rcu_irq_*()
+	 * helpers are enough to protect RCU uses inside the exception. So
+	 * just return immediately if we detect we are in an IRQ.
+	 */
+	if (in_interrupt())
+		return;
+
+	local_irq_save(flags);
+	if (__this_cpu_read(context_tracking.state) == IN_USER) {
+		__this_cpu_write(context_tracking.state, IN_KERNEL);
+		rcu_user_exit();
+	}
+	local_irq_restore(flags);
+}
+
+void context_tracking_task_switch(struct task_struct *prev,
+			     struct task_struct *next)
+{
+	if (__this_cpu_read(context_tracking.active)) {
+		clear_tsk_thread_flag(prev, TIF_NOHZ);
+		set_tsk_thread_flag(next, TIF_NOHZ);
+	}
+}
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 74df86b..d3700a4 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -207,9 +207,6 @@ EXPORT_SYMBOL_GPL(rcu_note_context_switch);
 DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = {
 	.dynticks_nesting = DYNTICK_TASK_EXIT_IDLE,
 	.dynticks = ATOMIC_INIT(1),
-#if defined(CONFIG_RCU_USER_QS) && !defined(CONFIG_RCU_USER_QS_FORCE)
-	.ignore_user_qs = true,
-#endif
 };
 
 static int blimit = 10;		/* Maximum callbacks per rcu_do_batch. */
@@ -416,29 +413,7 @@ EXPORT_SYMBOL_GPL(rcu_idle_enter);
  */
 void rcu_user_enter(void)
 {
-	unsigned long flags;
-	struct rcu_dynticks *rdtp;
-
-	/*
-	 * Some contexts may involve an exception occuring in an irq,
-	 * leading to that nesting:
-	 * rcu_irq_enter() rcu_user_exit() rcu_user_exit() rcu_irq_exit()
-	 * This would mess up the dyntick_nesting count though. And rcu_irq_*()
-	 * helpers are enough to protect RCU uses inside the exception. So
-	 * just return immediately if we detect we are in an IRQ.
-	 */
-	if (in_interrupt())
-		return;
-
-	WARN_ON_ONCE(!current->mm);
-
-	local_irq_save(flags);
-	rdtp = &__get_cpu_var(rcu_dynticks);
-	if (!rdtp->ignore_user_qs && !rdtp->in_user) {
-		rdtp->in_user = true;
-		rcu_eqs_enter(true);
-	}
-	local_irq_restore(flags);
+	rcu_eqs_enter(1);
 }
 
 /**
@@ -575,27 +550,7 @@ EXPORT_SYMBOL_GPL(rcu_idle_exit);
  */
 void rcu_user_exit(void)
 {
-	unsigned long flags;
-	struct rcu_dynticks *rdtp;
-
-	/*
-	 * Some contexts may involve an exception occuring in an irq,
-	 * leading to that nesting:
-	 * rcu_irq_enter() rcu_user_exit() rcu_user_exit() rcu_irq_exit()
-	 * This would mess up the dyntick_nesting count though. And rcu_irq_*()
-	 * helpers are enough to protect RCU uses inside the exception. So
-	 * just return immediately if we detect we are in an IRQ.
-	 */
-	if (in_interrupt())
-		return;
-
-	local_irq_save(flags);
-	rdtp = &__get_cpu_var(rcu_dynticks);
-	if (rdtp->in_user) {
-		rdtp->in_user = false;
-		rcu_eqs_exit(true);
-	}
-	local_irq_restore(flags);
+	rcu_eqs_exit(1);
 }
 
 /**
@@ -718,21 +673,6 @@ int rcu_is_cpu_idle(void)
 }
 EXPORT_SYMBOL(rcu_is_cpu_idle);
 
-#ifdef CONFIG_RCU_USER_QS
-void rcu_user_hooks_switch(struct task_struct *prev,
-			   struct task_struct *next)
-{
-	struct rcu_dynticks *rdtp;
-
-	/* Interrupts are disabled in context switch */
-	rdtp = &__get_cpu_var(rcu_dynticks);
-	if (!rdtp->ignore_user_qs) {
-		clear_tsk_thread_flag(prev, TIF_NOHZ);
-		set_tsk_thread_flag(next, TIF_NOHZ);
-	}
-}
-#endif /* #ifdef CONFIG_RCU_USER_QS */
-
 #if defined(CONFIG_PROVE_RCU) && defined(CONFIG_HOTPLUG_CPU)
 
 /*
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5dae0d2..e0d02ab 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -72,6 +72,7 @@
 #include <linux/slab.h>
 #include <linux/init_task.h>
 #include <linux/binfmts.h>
+#include <linux/context_tracking.h>
 
 #include <asm/switch_to.h>
 #include <asm/tlb.h>
@@ -1897,8 +1898,8 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	spin_release(&rq->lock.dep_map, 1, _THIS_IP_);
 #endif
 
+	context_tracking_task_switch(prev, next);
 	/* Here we just switch the register state and the stack. */
-	rcu_switch(prev, next);
 	switch_to(prev, next, prev);
 
 	barrier();
@@ -2931,9 +2932,9 @@ asmlinkage void __sched schedule_user(void)
 	 * we haven't yet exited the RCU idle mode. Do it here manually until
 	 * we find a better solution.
 	 */
-	rcu_user_exit();
+	user_exit();
 	schedule();
-	rcu_user_enter();
+	user_enter();
 }
 #endif
 
@@ -3038,7 +3039,7 @@ asmlinkage void __sched preempt_schedule_irq(void)
 	/* Catch callers which need to be fixed */
 	BUG_ON(ti->preempt_count || !irqs_disabled());
 
-	rcu_user_exit();
+	user_exit();
 	do {
 		add_preempt_count(PREEMPT_ACTIVE);
 		local_irq_enable();
-- 
1.7.5.4



* [PATCH 2/3] cputime: Allow dynamic switch between tick/virtual based cputime accounting
  2012-11-03 16:09 [PATCH 0/3] cputime: Generic virtual based cputime accounting v4 Frederic Weisbecker
  2012-11-03 16:09 ` [PATCH 1/3] context_tracking: New context tracking subsystem Frederic Weisbecker
@ 2012-11-03 16:09 ` Frederic Weisbecker
  2012-11-03 16:09 ` [PATCH 3/3] cputime: Generic on-demand virtual " Frederic Weisbecker
  2 siblings, 0 replies; 6+ messages in thread
From: Frederic Weisbecker @ 2012-11-03 16:09 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Andrew Morton, H. Peter Anvin, Ingo Molnar,
	Paul E. McKenney, Peter Zijlstra, Steven Rostedt,
	Thomas Gleixner

Allow dynamically switching between tick based and virtual based
cputime accounting. This way we can provide a kind of "on-demand"
virtual based cputime accounting. In this mode, the kernel relies
on the context tracking subsystem to dynamically hook on kernel/user
boundaries.

This is in preparation for being able to stop the timer tick beyond
idle. Doing so will depend on CONFIG_VIRT_CPU_ACCOUNTING, which makes
it possible to account the cputime without the tick by hooking on
kernel/user boundaries.

Depending on whether the tick is stopped or not, we can switch between
tick and vtime based accounting at any time, in order to minimize the
overhead associated with the user hooks.
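
Conceptually, the tick handler then dispatches on the current
accounting mode. A condensed sketch of the logic this patch adds to
account_process_tick() in kernel/sched/cputime.c (see the hunk below
for the real thing):

	void account_process_tick(struct task_struct *p, int user_tick)
	{
		if (vtime_accounting()) {
			/* vtime hooks are active: let the vtime code account */
			vtime_account_process_tick(p, user_tick);
			return;
		}
		/* ... otherwise fall back to plain tick based accounting ... */
	}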

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 arch/ia64/include/asm/cputime.h    |    5 ++++
 arch/ia64/kernel/time.c            |    2 +-
 arch/powerpc/include/asm/cputime.h |    5 ++++
 arch/powerpc/kernel/time.c         |    2 +-
 arch/s390/include/asm/cputime.h    |    5 ++++
 arch/s390/kernel/vtime.c           |    2 +-
 include/linux/sched.h              |    5 +---
 include/linux/vtime.h              |    7 ++++++
 kernel/fork.c                      |    3 +-
 kernel/sched/cputime.c             |   40 ++++++++++++++++-------------------
 kernel/time/tick-sched.c           |    5 ++-
 11 files changed, 48 insertions(+), 33 deletions(-)

diff --git a/arch/ia64/include/asm/cputime.h b/arch/ia64/include/asm/cputime.h
index 3deac95..49782fe 100644
--- a/arch/ia64/include/asm/cputime.h
+++ b/arch/ia64/include/asm/cputime.h
@@ -103,5 +103,10 @@ static inline void cputime_to_timeval(const cputime_t ct, struct timeval *val)
 #define cputime64_to_clock_t(__ct)	\
 	cputime_to_clock_t((__force cputime_t)__ct)
 
+static inline bool vtime_accounting(void)
+{
+	return true;
+}
+
 #endif /* CONFIG_VIRT_CPU_ACCOUNTING */
 #endif /* __IA64_CPUTIME_H */
diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index 5e48503..7b1fa3d 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -151,7 +151,7 @@ void __vtime_account_idle(struct task_struct *tsk)
  * Called from the timer interrupt handler to charge accumulated user time
  * to the current process.  Must be called with interrupts disabled.
  */
-void account_process_tick(struct task_struct *p, int user_tick)
+void vtime_account_process_tick(struct task_struct *p, int user_tick)
 {
 	vtime_account_user(p);
 }
diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h
index 487d46f..e84c2b3 100644
--- a/arch/powerpc/include/asm/cputime.h
+++ b/arch/powerpc/include/asm/cputime.h
@@ -228,6 +228,11 @@ static inline cputime_t clock_t_to_cputime(const unsigned long clk)
 
 #define cputime64_to_clock_t(ct)	cputime_to_clock_t((cputime_t)(ct))
 
+static inline bool vtime_accounting(void)
+{
+	return true;
+}
+
 #endif /* __KERNEL__ */
 #endif /* CONFIG_VIRT_CPU_ACCOUNTING */
 #endif /* __POWERPC_CPUTIME_H */
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 0db456f..1c9a3b8 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -363,7 +363,7 @@ void __vtime_account_idle(struct task_struct *tsk)
  * (i.e. since the last entry from usermode) so that
  * get_paca()->user_time_scaled is up to date.
  */
-void account_process_tick(struct task_struct *tsk, int user_tick)
+void vtime_account_process_tick(struct task_struct *tsk, int user_tick)
 {
 	cputime_t utime, utimescaled;
 
diff --git a/arch/s390/include/asm/cputime.h b/arch/s390/include/asm/cputime.h
index 023d5ae..a96161b 100644
--- a/arch/s390/include/asm/cputime.h
+++ b/arch/s390/include/asm/cputime.h
@@ -191,4 +191,9 @@ static inline int s390_nohz_delay(int cpu)
 
 #define arch_needs_cpu(cpu) s390_nohz_delay(cpu)
 
+static inline bool vtime_accounting(void)
+{
+	return true;
+}
+
 #endif /* _S390_CPUTIME_H */
diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c
index 783e988..ab180de 100644
--- a/arch/s390/kernel/vtime.c
+++ b/arch/s390/kernel/vtime.c
@@ -112,7 +112,7 @@ void vtime_task_switch(struct task_struct *prev)
 	S390_lowcore.system_timer = ti->system_timer;
 }
 
-void account_process_tick(struct task_struct *tsk, int user_tick)
+void vtime_account_process_tick(struct task_struct *tsk, int user_tick)
 {
 	if (do_account_vtime(tsk, HARDIRQ_OFFSET))
 		virt_timer_expire();
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6c13fe3..2f9bba0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -580,9 +580,7 @@ struct signal_struct {
 	cputime_t utime, stime, cutime, cstime;
 	cputime_t gtime;
 	cputime_t cgtime;
-#ifndef CONFIG_VIRT_CPU_ACCOUNTING
 	cputime_t prev_utime, prev_stime;
-#endif
 	unsigned long nvcsw, nivcsw, cnvcsw, cnivcsw;
 	unsigned long min_flt, maj_flt, cmin_flt, cmaj_flt;
 	unsigned long inblock, oublock, cinblock, coublock;
@@ -1339,9 +1337,8 @@ struct task_struct {
 
 	cputime_t utime, stime, utimescaled, stimescaled;
 	cputime_t gtime;
-#ifndef CONFIG_VIRT_CPU_ACCOUNTING
 	cputime_t prev_utime, prev_stime;
-#endif
+
 	unsigned long nvcsw, nivcsw; /* context switch counts */
 	struct timespec start_time; 		/* monotonic time */
 	struct timespec real_start_time;	/* boot based time */
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index 0c2a2d3..85a1f0f 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -8,12 +8,19 @@ extern void vtime_task_switch(struct task_struct *prev);
 extern void __vtime_account_system(struct task_struct *tsk);
 extern void vtime_account_system(struct task_struct *tsk);
 extern void __vtime_account_idle(struct task_struct *tsk);
+extern void vtime_account_process_tick(struct task_struct *tsk,
+				       int user_tick);
 extern void vtime_account(struct task_struct *tsk);
 #else
 static inline void vtime_task_switch(struct task_struct *prev) { }
 static inline void __vtime_account_system(struct task_struct *tsk) { }
 static inline void vtime_account_system(struct task_struct *tsk) { }
+
+static inline void
+vtime_account_process_tick(struct task_struct *tsk, int user_tick) { }
+
 static inline void vtime_account(struct task_struct *tsk) { }
+static inline bool vtime_accounting(void) { return false; }
 #endif
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
diff --git a/kernel/fork.c b/kernel/fork.c
index 8b20ab7..66bf627 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1221,9 +1221,8 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 
 	p->utime = p->stime = p->gtime = 0;
 	p->utimescaled = p->stimescaled = 0;
-#ifndef CONFIG_VIRT_CPU_ACCOUNTING
 	p->prev_utime = p->prev_stime = 0;
-#endif
+
 #if defined(SPLIT_RSS_COUNTING)
 	memset(&p->rss_stat, 0, sizeof(p->rss_stat));
 #endif
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 8d859da..ff608f6 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -288,8 +288,6 @@ static __always_inline bool steal_account_process_tick(void)
 	return false;
 }
 
-#ifndef CONFIG_VIRT_CPU_ACCOUNTING
-
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
 /*
  * Account a tick to a process and cpustat
@@ -369,6 +367,11 @@ void account_process_tick(struct task_struct *p, int user_tick)
 	cputime_t one_jiffy_scaled = cputime_to_scaled(cputime_one_jiffy);
 	struct rq *rq = this_rq();
 
+	if (vtime_accounting()) {
+		vtime_account_process_tick(p, user_tick);
+		return;
+	}
+
 	if (sched_clock_irqtime) {
 		irqtime_account_process_tick(p, user_tick, rq);
 		return;
@@ -411,28 +414,11 @@ void account_idle_ticks(unsigned long ticks)
 	account_idle_time(jiffies_to_cputime(ticks));
 }
 
-#endif
 
 /*
  * Use precise platform statistics if available:
  */
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING
-void task_times(struct task_struct *p, cputime_t *ut, cputime_t *st)
-{
-	*ut = p->utime;
-	*st = p->stime;
-}
-
-void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *st)
-{
-	struct task_cputime cputime;
-
-	thread_group_cputime(p, &cputime);
-
-	*ut = cputime.utime;
-	*st = cputime.stime;
-}
-
 void vtime_account_system(struct task_struct *tsk)
 {
 	unsigned long flags;
@@ -467,8 +453,7 @@ void vtime_account(struct task_struct *tsk)
 }
 EXPORT_SYMBOL_GPL(vtime_account);
 #endif /* __ARCH_HAS_VTIME_ACCOUNT */
-
-#else
+#endif /* CONFIG_VIRT_CPU_ACCOUNTING */
 
 #ifndef nsecs_to_cputime
 # define nsecs_to_cputime(__nsecs)	nsecs_to_jiffies(__nsecs)
@@ -492,6 +477,12 @@ void task_times(struct task_struct *p, cputime_t *ut, cputime_t *st)
 {
 	cputime_t rtime, utime = p->utime, total = utime + p->stime;
 
+	if (vtime_accounting()) {
+		*ut = p->utime;
+		*st = p->stime;
+		return;
+	}
+
 	/*
 	 * Use CFS's precise accounting:
 	 */
@@ -523,6 +514,12 @@ void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *st)
 
 	thread_group_cputime(p, &cputime);
 
+	if (vtime_accounting()) {
+		*ut = cputime.utime;
+		*st = cputime.stime;
+		return;
+	}
+
 	total = cputime.utime + cputime.stime;
 	rtime = nsecs_to_cputime(cputime.sum_exec_runtime);
 
@@ -537,4 +534,3 @@ void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *st)
 	*ut = sig->prev_utime;
 	*st = sig->prev_stime;
 }
-#endif
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index a402608..2c82751 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -586,8 +586,10 @@ static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
 
 static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
 {
-#ifndef CONFIG_VIRT_CPU_ACCOUNTING
 	unsigned long ticks;
+
+	if (vtime_accounting())
+		return;
 	/*
 	 * We stopped the tick in idle. Update process times would miss the
 	 * time we slept as update_process_times does only a 1 tick
@@ -599,7 +601,6 @@ static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
 	 */
 	if (ticks && ticks < LONG_MAX)
 		account_idle_ticks(ticks);
-#endif
 }
 
 /**
-- 
1.7.5.4



* [PATCH 3/3] cputime: Generic on-demand virtual cputime accounting
  2012-11-03 16:09 [PATCH 0/3] cputime: Generic virtual based cputime accounting v4 Frederic Weisbecker
  2012-11-03 16:09 ` [PATCH 1/3] context_tracking: New context tracking subsystem Frederic Weisbecker
  2012-11-03 16:09 ` [PATCH 2/3] cputime: Allow dynamic switch between tick/virtual based cputime accounting Frederic Weisbecker
@ 2012-11-03 16:09 ` Frederic Weisbecker
  2 siblings, 0 replies; 6+ messages in thread
From: Frederic Weisbecker @ 2012-11-03 16:09 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Andrew Morton, H. Peter Anvin, Ingo Molnar,
	Paul E. McKenney, Peter Zijlstra, Steven Rostedt,
	Thomas Gleixner

If we want to stop the tick beyond idle, we need to be
able to account the cputime without using the tick.

Virtual based cputime accounting solves that problem by
hooking into kernel/user boundaries.

However, implementing CONFIG_VIRT_CPU_ACCOUNTING natively
requires setting low-level arch hooks and involves more
overhead. But we already have a generic context tracking
subsystem that is required anyway for RCU by archs that
want to shut down the tick outside idle.

This patch implements a generic virtual based cputime
accounting that relies on these generic kernel/user hooks.
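
Roughly, the boundary wiring after this patch looks as follows (a
condensed sketch of the kernel/context_tracking.c hunk below; locking,
nesting and per-cpu state checks omitted):

	void user_enter(void)		/* kernel -> user */
	{
		__vtime_account_system(current);	/* flush pending kernel time */
		rcu_user_enter();
	}

	void user_exit(void)		/* user -> kernel */
	{
		rcu_user_exit();
		__vtime_account_user(current);	/* flush pending user time */
	}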

There are some upsides to doing this:

- No arch code is needed to implement CONFIG_VIRT_CPU_ACCOUNTING if
context tracking is already built (it is already necessary for RCU in
full tickless mode anyway).

- We can rely on the generic context tracking subsystem to dynamically
(de)activate the hooks, so that we can switch anytime between virtual
and tick based accounting. This way we don't pay the overhead of the
virtual accounting while the tick is running periodically.

And a few downsides:

- It relies on jiffies, and the hooks are set in high-level code. This
results in less precise cputime accounting than a true native virtual
based cputime accounting, which hooks into low-level code and uses a
CPU hardware clock. Precision is not the goal here though.

- There is probably more overhead than with a native virtual based
cputime accounting. But it relies on hooks that are already set anyway.
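
To make the precision trade-off concrete: the generic variant measures
elapsed time in jiffies (see get_vtime_delta() below), so with HZ=1000
the granularity is one jiffy, i.e. 1 ms, and a user/kernel round trip
shorter than a jiffy is folded entirely into whichever context owns the
next boundary crossing. Native implementations read a per-CPU hardware
clock on each boundary and are typically nanosecond-granular.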

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/context_tracking.h |   28 ++++++++++
 include/linux/vtime.h            |    7 +++
 init/Kconfig                     |   11 ++++-
 kernel/context_tracking.c        |   16 +-----
 kernel/sched/cputime.c           |  112 ++++++++++++++++++++++++++++++++++++--
 5 files changed, 154 insertions(+), 20 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index e24339c..3b63210 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -3,12 +3,40 @@
 
 #ifdef CONFIG_CONTEXT_TRACKING
 #include <linux/sched.h>
+#include <linux/percpu.h>
+
+struct context_tracking {
+	/*
+	 * When active is false, hooks are not set to
+	 * minimize overhead: TIF flags are cleared
+	 * and calls to user_enter/exit are ignored. This
+	 * may be further optimized using static keys.
+	 */
+	bool active;
+	enum {
+		IN_KERNEL = 0,
+		IN_USER,
+	} state;
+};
+
+DECLARE_PER_CPU(struct context_tracking, context_tracking);
+
+static inline bool context_tracking_in_user(void)
+{
+	return __this_cpu_read(context_tracking.state) == IN_USER;
+}
+
+static inline bool context_tracking_active(void)
+{
+	return __this_cpu_read(context_tracking.active);
+}
 
 extern void user_enter(void);
 extern void user_exit(void);
 extern void context_tracking_task_switch(struct task_struct *prev,
 					 struct task_struct *next);
 #else
+static inline bool context_tracking_in_user(void) { return false; }
 static inline void user_enter(void) { }
 static inline void user_exit(void) { }
 static inline void context_tracking_task_switch(struct task_struct *prev,
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index 85a1f0f..3ea63a1 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -23,6 +23,13 @@ static inline void vtime_account(struct task_struct *tsk) { }
 static inline bool vtime_accounting(void) { return false; }
 #endif
 
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+extern void __vtime_account_user(struct task_struct *tsk);
+extern bool vtime_accounting(void);
+#else
+static inline void __vtime_account_user(struct task_struct *tsk) { }
+#endif
+
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
 extern void irqtime_account_irq(struct task_struct *tsk);
 #else
diff --git a/init/Kconfig b/init/Kconfig
index 15e44e7..ad96572 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -344,7 +344,9 @@ config TICK_CPU_ACCOUNTING
 
 config VIRT_CPU_ACCOUNTING
 	bool "Deterministic task and CPU time accounting"
-	depends on HAVE_VIRT_CPU_ACCOUNTING
+	depends on HAVE_VIRT_CPU_ACCOUNTING || HAVE_CONTEXT_TRACKING
+	select VIRT_CPU_ACCOUNTING_GEN if !HAVE_VIRT_CPU_ACCOUNTING
+	default y if PPC64
 	help
 	  Select this option to enable more accurate task and CPU time
 	  accounting.  This is done by reading a CPU counter on each
@@ -367,6 +369,13 @@ config IRQ_TIME_ACCOUNTING
 
 endchoice
 
+config VIRT_CPU_ACCOUNTING_GEN
+	select CONTEXT_TRACKING
+	bool
+	help
+	  Implement a generic virtual based cputime accounting by using
+	  the context tracking subsystem.
+
 config BSD_PROCESS_ACCT
 	bool "BSD Process Accounting"
 	help
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index d7983ea..1a1ded6 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -1,22 +1,8 @@
 #include <linux/context_tracking.h>
 #include <linux/rcupdate.h>
 #include <linux/sched.h>
-#include <linux/percpu.h>
 #include <linux/hardirq.h>
 
-struct context_tracking {
-	/*
-	 * When active is false, hooks are not set to
-	 * minimize overhead: TIF flags are cleared
-	 * and calls to user_enter/exit are ignored. This
-	 * may be further optimized using static keys.
-	 */
-	bool active;
-	enum {
-		IN_KERNEL = 0,
-		IN_USER,
-	} state;
-};
 
 DEFINE_PER_CPU(struct context_tracking, context_tracking) = {
 #ifdef CONFIG_CONTEXT_TRACKING_FORCE
@@ -45,6 +31,7 @@ void user_enter(void)
 	if (__this_cpu_read(context_tracking.active) &&
 	    __this_cpu_read(context_tracking.state) != IN_USER) {
 		__this_cpu_write(context_tracking.state, IN_USER);
+		__vtime_account_system(current);
 		rcu_user_enter();
 	}
 	local_irq_restore(flags);
@@ -69,6 +56,7 @@ void user_exit(void)
 	if (__this_cpu_read(context_tracking.state) == IN_USER) {
 		__this_cpu_write(context_tracking.state, IN_KERNEL);
 		rcu_user_exit();
+		__vtime_account_user(current);
 	}
 	local_irq_restore(flags);
 }
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index ff608f6..53990e7 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -3,6 +3,7 @@
 #include <linux/tsacct_kern.h>
 #include <linux/kernel_stat.h>
 #include <linux/static_key.h>
+#include <linux/context_tracking.h>
 #include "sched.h"
 
 
@@ -444,11 +445,25 @@ void vtime_account(struct task_struct *tsk)
 
 	local_irq_save(flags);
 
-	if (in_interrupt() || !is_idle_task(tsk))
-		__vtime_account_system(tsk);
-	else
-		__vtime_account_idle(tsk);
-
+	if (!in_interrupt()) {
+		/*
+		 * If we interrupted user, context_tracking_in_user()
+		 * is 1 because context tracking doesn't hook
+		 * on irq entry/exit. This way we know if
+		 * we need to flush user time on kernel entry.
+		 */
+		if (context_tracking_in_user()) {
+			__vtime_account_user(tsk);
+			goto out;
+		}
+
+		if (is_idle_task(tsk)) {
+			__vtime_account_idle(tsk);
+			goto out;
+		}
+	}
+	__vtime_account_system(tsk);
+out:
 	local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(vtime_account);
@@ -534,3 +549,90 @@ void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *st)
 	*ut = sig->prev_utime;
 	*st = sig->prev_stime;
 }
+
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+static DEFINE_PER_CPU(long, last_jiffies) = INITIAL_JIFFIES;
+
+static cputime_t get_vtime_delta(void)
+{
+	long delta;
+
+	delta = jiffies - __this_cpu_read(last_jiffies);
+	__this_cpu_add(last_jiffies, delta);
+
+	return jiffies_to_cputime(delta);
+}
+
+void __vtime_account_system(struct task_struct *tsk)
+{
+	cputime_t delta_cpu = get_vtime_delta();
+
+	account_system_time(tsk, irq_count(), delta_cpu, cputime_to_scaled(delta_cpu));
+}
+
+void __vtime_account_user(struct task_struct *tsk)
+{
+	cputime_t delta_cpu = get_vtime_delta();
+
+	account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
+}
+
+void __vtime_account_idle(struct task_struct *tsk)
+{
+	cputime_t delta_cpu = get_vtime_delta();
+
+	account_idle_time(delta_cpu);
+}
+
+void vtime_task_switch(struct task_struct *prev)
+{
+	if (is_idle_task(prev))
+		__vtime_account_idle(prev);
+	else
+		__vtime_account_system(prev);
+}
+
+/*
+ * This is an unfortunate hack: if we flush user time only on
+ * irq entry, we miss the jiffies update and the time is spuriously
+ * accounted to system time.
+ */
+void vtime_account_process_tick(struct task_struct *p, int user_tick)
+{
+	if (context_tracking_in_user())
+		__vtime_account_user(p);
+}
+
+bool vtime_accounting(void)
+{
+	return context_tracking_active();
+}
+
+static int __cpuinit vtime_cpu_notify(struct notifier_block *self,
+				      unsigned long action, void *hcpu)
+{
+	long cpu = (long)hcpu;
+	long *last_jiffies_cpu = per_cpu_ptr(&last_jiffies, cpu);
+
+	switch (action) {
+	case CPU_UP_PREPARE:
+	case CPU_UP_PREPARE_FROZEN:
+		/*
+		 * CHECKME: ensure that's visible by the CPU
+		 * once it wakes up
+		 */
+		*last_jiffies_cpu = jiffies;
+	default:
+		break;
+	}
+
+	return NOTIFY_OK;
+}
+
+static int __init init_vtime(void)
+{
+	cpu_notifier(vtime_cpu_notify, 0);
+	return 0;
+}
+early_initcall(init_vtime);
+#endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */
-- 
1.7.5.4



* Re: [PATCH 1/3] context_tracking: New context tracking subsystem
  2012-11-03 16:09 ` [PATCH 1/3] context_tracking: New context tracking subsystem Frederic Weisbecker
@ 2012-11-06  9:53   ` Gilad Ben-Yossef
  2012-11-26 23:17     ` Frederic Weisbecker
  0 siblings, 1 reply; 6+ messages in thread
From: Gilad Ben-Yossef @ 2012-11-06  9:53 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Andrew Morton, H. Peter Anvin, Ingo Molnar,
	Paul E. McKenney, Peter Zijlstra, Steven Rostedt,
	Thomas Gleixner

On Sat, Nov 3, 2012 at 6:09 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> Create a new subsystem that probes on kernel boundaries
> to keep track of the transitions between level contexts
> with two basic initial contexts: user or kernel.
>
> This is an abstraction of some RCU code that use such tracking
> to implement its userspace extended quiescent state.
>
> We need to pull this up from RCU into this new level of indirection
> because this tracking is also going to be used to implement an "on
> demand" generic virtual cputime accounting. A necessary step to
> shutdown the tick while still accounting the cputime.
> ...
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 366ec06..3855e06 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -300,15 +300,15 @@ config SECCOMP_FILTER
>
>           See Documentation/prctl/seccomp_filter.txt for details.
>
> -config HAVE_RCU_USER_QS
> +config HAVE_CONTEXT_TRACKING
>         bool
>         help
> -         Provide kernel entry/exit hooks necessary for userspace
> +         Provide kernel/user boundaries probes necessary for userspace
>           RCU extended quiescent state. Syscalls need to be wrapped inside

A minor nitpick: if the whole point of the patch is to turn an RCU
specific mechanism into a generic one that RCU happens to use, then
the text needs to reflect that. How about:

"Provide kernel/user boundaries probes necessary for subsystems that
need it, such as userspace RCU extended quiescent state."


> -         rcu_user_exit()-rcu_user_enter() through the slow path using
> -         TIF_NOHZ flag. Exceptions handlers must be wrapped as well. Irqs
> -         are already protected inside rcu_irq_enter/rcu_irq_exit() but
> -         preemption or signal handling on irq exit still need to be protected.
> +         user_exit()-user_enter() through the slow path using TIF_NOHZ flag.
> +         Exceptions handlers must be wrapped as well. Irqs are already
> +         protected inside rcu_irq_enter/rcu_irq_exit() but preemption or
> +         signal handling on irq exit still need to be protected.
>

Thanks,
Gilad

-- 
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"If you take a class in large-scale robotics, can you end up in a
situation where the homework eats your dog?"
 -- Jean-Baptiste Queru


* Re: [PATCH 1/3] context_tracking: New context tracking subsystem
  2012-11-06  9:53   ` Gilad Ben-Yossef
@ 2012-11-26 23:17     ` Frederic Weisbecker
  0 siblings, 0 replies; 6+ messages in thread
From: Frederic Weisbecker @ 2012-11-26 23:17 UTC (permalink / raw)
  To: Gilad Ben-Yossef
  Cc: LKML, Andrew Morton, H. Peter Anvin, Ingo Molnar,
	Paul E. McKenney, Peter Zijlstra, Steven Rostedt,
	Thomas Gleixner

2012/11/6 Gilad Ben-Yossef <gilad@benyossef.com>:
> On Sat, Nov 3, 2012 at 6:09 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
>> diff --git a/arch/Kconfig b/arch/Kconfig
>> index 366ec06..3855e06 100644
>> --- a/arch/Kconfig
>> +++ b/arch/Kconfig
>> @@ -300,15 +300,15 @@ config SECCOMP_FILTER
>>
>>           See Documentation/prctl/seccomp_filter.txt for details.
>>
>> -config HAVE_RCU_USER_QS
>> +config HAVE_CONTEXT_TRACKING
>>         bool
>>         help
>> -         Provide kernel entry/exit hooks necessary for userspace
>> +         Provide kernel/user boundaries probes necessary for userspace
>>           RCU extended quiescent state. Syscalls need to be wrapped inside
>
> A minor nit pick: if whole point of the patch is to turn an RCU
> specific mechanism to a generic one
> that RCU happens to use, then the text needs to reflect that. How about:
>
> "Provide kernel/user boundaries probes necessary for subsystems that
> need it, such as userspace
> RCU extended quiescent state. "

Good point! I'm fixing this.

Thanks.


