* [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs
@ 2014-11-28 18:23 Frederic Weisbecker
  2014-11-28 18:23 ` [RFC PATCH 01/30] jiffies: Remove HZ > USEC_PER_SEC special case Frederic Weisbecker
                   ` (29 more replies)
  0 siblings, 30 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

Hi,

Thomas suggested storing the cpu and task stats in nanoseconds in order
to avoid back-and-forth conversions between cputime_t and nsecs.

This patchset does that (plus many fixes and cleanups).

There should be a performance impact for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
configurations, which account cputime from the arch's kernel entrypoint. As
this config always uses cputime_t as a time source, the conversion to nsecs
is required on each accounting update. This concerns only powerpc and s390
(ia64 also supports this mode but its cputime_t wraps nsecs so the conversion
is a noop there). I'm not sure how much this config is used on powerpc:
it doesn't appear in its defconfigs, and that mode is expected to be a
bit slower than tick based accounting anyway. But s390 only supports
this mode (no support for tick based accounting).

But on the other side of the balance, it simplifies the core code a bit.

The patchset isn't complete: I have yet to convert the posix cpu timers
code as well. I just want to post the current state before moving on
to the details.

I need your opinion on this patchset before going deeper. Is this
conversion the right direction to take?

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	cpustat/nsecs

HEAD: 3539960cb3be04512d5f601c8e83c8bab658e0c1

Thanks,
	Frederic
---

Frederic Weisbecker (30):
      jiffies: Remove HZ > USEC_PER_SEC special case
      time: Introduce jiffies64_to_nsecs()
      cputime: Introduce nsecs_to_cputime64()
      s390: Convert open coded idle time seqcount
      s390: Translate cputime magic constants to macros
      s390: Introduce cputime64_to_nsecs()
      cputime: Convert kcpustat to nsecs
      apm32: Fix cputime == jiffies assumption
      alpha: Fix jiffies based cputime assumption
      cputime: Convert guest time accounting to nsecs
      cputime: Special API to return old-typed cputime
      cputime: Convert task/group cputime to nsecs
      alpha: Convert obsolete cputime_t to nsecs
      x86: Convert obsolete cputime type to nsecs
      isdn: Convert obsolete cputime type to nsecs
      binfmt: Convert obsolete cputime type to nsecs
      acct: Convert obsolete cputime type to nsecs
      delaycct: Convert obsolete cputime type to nsecs
      tsacct: Convert obsolete cputime type to nsecs
      signal: Convert obsolete cputime type to nsecs
      cputime: Remove task_cputime_t_scaled
      u64_stats_sync: Introduce preempt-unsafe readers
      cputime: Convert irq_time_accounting to use u64_stats_sync
      cputime: Increment kcpustat directly on irqtime account
      cputime: Remove temporary irqtime states
      cputime: Push time to account_user_time() in nanosecs
      cputime: Push time to account_steal_time() in nanosecs
      cputime: Push time to account_idle_time() in nanosecs
      cputime: Push time to account_guest_time() in nanosecs
      cputime: Push time to account_system_time() in nanosecs


 arch/alpha/kernel/osf_sys.c        |  15 ++-
 arch/ia64/kernel/time.c            |   9 +-
 arch/powerpc/kernel/time.c         |  10 +-
 arch/s390/appldata/appldata_os.c   |  16 +--
 arch/s390/include/asm/cputime.h    |  52 +++++----
 arch/s390/include/asm/idle.h       |   3 +-
 arch/s390/kernel/idle.c            |  30 ++---
 arch/s390/kernel/vtime.c           |  15 ++-
 arch/x86/kernel/apm_32.c           |   6 +-
 drivers/cpufreq/cpufreq.c          |   6 +-
 drivers/cpufreq/cpufreq_governor.c |  14 +--
 drivers/isdn/mISDN/stack.c         |   4 +-
 drivers/macintosh/rack-meter.c     |   2 +-
 fs/binfmt_elf.c                    |  15 +--
 fs/binfmt_elf_fdpic.c              |  14 +--
 fs/compat_binfmt_elf.c             |  20 ++--
 fs/proc/array.c                    |  15 ++-
 fs/proc/stat.c                     |  68 +++++------
 fs/proc/uptime.c                   |   6 +-
 include/linux/cputime.h            |  22 ++++
 include/linux/init_task.h          |   2 +-
 include/linux/jiffies.h            |   2 +
 include/linux/kernel_stat.h        |  10 +-
 include/linux/sched.h              |  83 ++++++++++----
 include/linux/u64_stats_sync.h     |  29 +++--
 kernel/acct.c                      |   7 +-
 kernel/delayacct.c                 |   6 +-
 kernel/exit.c                      |   4 +-
 kernel/sched/cputime.c             | 224 ++++++++++++++++++-------------------
 kernel/sched/sched.h               |  44 ++------
 kernel/sched/stats.h               |   8 +-
 kernel/signal.c                    |  12 +-
 kernel/sys.c                       |  16 +--
 kernel/time/itimer.c               |   2 +-
 kernel/time/posix-cpu-timers.c     |  44 ++++----
 kernel/time/time.c                 |  21 +++-
 kernel/time/timeconst.bc           |   6 +
 kernel/tsacct.c                    |  18 +--
 38 files changed, 475 insertions(+), 405 deletions(-)


* [RFC PATCH 01/30] jiffies: Remove HZ > USEC_PER_SEC special case
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-11-28 18:23 ` [RFC PATCH 02/30] time: Introduce jiffies64_to_nsecs() Frederic Weisbecker
                   ` (28 subsequent siblings)
  29 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

HZ never goes much further than 1000 and a bit. And if we ever reach one
tick per microsecond, we might have a problem.

Let's stop maintaining this special case.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/time/time.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/kernel/time/time.c b/kernel/time/time.c
index a9ae20f..e44f6f1 100644
--- a/kernel/time/time.c
+++ b/kernel/time/time.c
@@ -260,10 +260,11 @@ EXPORT_SYMBOL(jiffies_to_msecs);
 
 unsigned int jiffies_to_usecs(const unsigned long j)
 {
-#if HZ <= USEC_PER_SEC && !(USEC_PER_SEC % HZ)
+	/* HZ usually doesn't go much further than MSEC_PER_SEC */
+	BUILD_BUG_ON(HZ >= USEC_PER_SEC);
+
+#if !(USEC_PER_SEC % HZ)
 	return (USEC_PER_SEC / HZ) * j;
-#elif HZ > USEC_PER_SEC && !(HZ % USEC_PER_SEC)
-	return (j + (HZ / USEC_PER_SEC) - 1)/(HZ / USEC_PER_SEC);
 #else
 # if BITS_PER_LONG == 32
 	return (HZ_TO_USEC_MUL32 * j) >> HZ_TO_USEC_SHR32;
@@ -543,10 +544,8 @@ unsigned long usecs_to_jiffies(const unsigned int u)
 {
 	if (u > jiffies_to_usecs(MAX_JIFFY_OFFSET))
 		return MAX_JIFFY_OFFSET;
-#if HZ <= USEC_PER_SEC && !(USEC_PER_SEC % HZ)
+#if !(USEC_PER_SEC % HZ)
 	return (u + (USEC_PER_SEC / HZ) - 1) / (USEC_PER_SEC / HZ);
-#elif HZ > USEC_PER_SEC && !(HZ % USEC_PER_SEC)
-	return u * (HZ / USEC_PER_SEC);
 #else
 	return (USEC_TO_HZ_MUL32 * u + USEC_TO_HZ_ADJ32)
 		>> USEC_TO_HZ_SHR32;
-- 
2.1.3
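
To illustrate the case that remains once the HZ > USEC_PER_SEC branch is gone, here is a minimal user-space sketch of the two conversions for an HZ that divides USEC_PER_SEC evenly. The sketch_ names and the runtime hz parameter are assumptions made for illustration; in the kernel HZ is a build-time constant and the check is a BUILD_BUG_ON().

```c
#include <assert.h>

#define SKETCH_USEC_PER_SEC 1000000UL

/* Hypothetical stand-in for jiffies_to_usecs() when USEC_PER_SEC % HZ == 0. */
static unsigned long sketch_jiffies_to_usecs(unsigned long j, unsigned long hz)
{
	/* Mirrors BUILD_BUG_ON(HZ >= USEC_PER_SEC), checked at run time here. */
	assert(hz > 0 && hz < SKETCH_USEC_PER_SEC);
	assert(SKETCH_USEC_PER_SEC % hz == 0);

	return (SKETCH_USEC_PER_SEC / hz) * j;
}

/* Hypothetical stand-in for usecs_to_jiffies(): rounds up to a whole jiffy. */
static unsigned long sketch_usecs_to_jiffies(unsigned long u, unsigned long hz)
{
	unsigned long usecs_per_jiffy;

	assert(hz > 0 && hz < SKETCH_USEC_PER_SEC);
	assert(SKETCH_USEC_PER_SEC % hz == 0);

	usecs_per_jiffy = SKETCH_USEC_PER_SEC / hz;
	return (u + usecs_per_jiffy - 1) / usecs_per_jiffy;
}
```

With hz = 250 a jiffy is 4000 usecs, so 4001 usecs rounds up to 2 jiffies. The sketch leaves out the MAX_JIFFY_OFFSET clamp that the real usecs_to_jiffies() applies first.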



* [RFC PATCH 02/30] time: Introduce jiffies64_to_nsecs()
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
  2014-11-28 18:23 ` [RFC PATCH 01/30] jiffies: Remove HZ > USEC_PER_SEC special case Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-11-28 18:23 ` [RFC PATCH 03/30] cputime: Introduce nsecs_to_cputime64() Frederic Weisbecker
                   ` (27 subsequent siblings)
  29 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

This will be needed for the cputime_t to nsec conversion.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/jiffies.h  |  2 ++
 kernel/time/time.c       | 10 ++++++++++
 kernel/time/timeconst.bc |  6 ++++++
 3 files changed, 18 insertions(+)

diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h
index c367cbd..1161e5c 100644
--- a/include/linux/jiffies.h
+++ b/include/linux/jiffies.h
@@ -288,6 +288,8 @@ static inline u64 jiffies_to_nsecs(const unsigned long j)
 	return (u64)jiffies_to_usecs(j) * NSEC_PER_USEC;
 }
 
+extern u64 jiffies64_to_nsecs(u64 j);
+
 extern unsigned long msecs_to_jiffies(const unsigned int m);
 extern unsigned long usecs_to_jiffies(const unsigned int u);
 extern unsigned long timespec_to_jiffies(const struct timespec *value);
diff --git a/kernel/time/time.c b/kernel/time/time.c
index e44f6f1..0fe3490 100644
--- a/kernel/time/time.c
+++ b/kernel/time/time.c
@@ -715,6 +715,16 @@ u64 nsec_to_clock_t(u64 x)
 #endif
 }
 
+u64 jiffies64_to_nsecs(u64 j)
+{
+#if !(NSEC_PER_SEC % HZ)
+	return (NSEC_PER_SEC / HZ) * j;
+#else
+	return div_u64(j * HZ_TO_NSEC_NUM, HZ_TO_NSEC_DEN);
+#endif
+}
+EXPORT_SYMBOL(jiffies64_to_nsecs);
+
 /**
  * nsecs_to_jiffies64 - Convert nsecs in u64 to jiffies64
  *
diff --git a/kernel/time/timeconst.bc b/kernel/time/timeconst.bc
index 511bdf2..a0ab148 100644
--- a/kernel/time/timeconst.bc
+++ b/kernel/time/timeconst.bc
@@ -98,6 +98,12 @@ define timeconst(hz) {
 		print "#define HZ_TO_USEC_DEN\t\t", hz/cd, "\n"
 		print "#define USEC_TO_HZ_NUM\t\t", hz/cd, "\n"
 		print "#define USEC_TO_HZ_DEN\t\t", 1000000/cd, "\n"
+
+		cd=gcd(hz,1000000000)
+		print "#define HZ_TO_NSEC_NUM\t\t", 1000000000/cd, "\n"
+		print "#define HZ_TO_NSEC_DEN\t\t", hz/cd, "\n"
+		print "#define NSEC_TO_HZ_NUM\t\t", hz/cd, "\n"
+		print "#define NSEC_TO_HZ_DEN\t\t", 1000000000/cd, "\n"
 		print "\n"
 
 		print "#endif /* KERNEL_TIMECONST_H */\n"
-- 
2.1.3
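
As an illustration of what the HZ_TO_NSEC_NUM / HZ_TO_NSEC_DEN pair computes, here is a minimal user-space sketch. It assumes a runtime hz parameter and hypothetical sketch_ names; in the kernel the fraction is reduced once by timeconst.bc at build time and the division goes through div_u64().

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000ULL

/* Greatest common divisor, playing the role of gcd() in timeconst.bc. */
static uint64_t sketch_gcd(uint64_t a, uint64_t b)
{
	while (b != 0) {
		uint64_t t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/* Hypothetical stand-in for jiffies64_to_nsecs() with hz chosen at run time. */
static uint64_t sketch_jiffies64_to_nsecs(uint64_t j, uint64_t hz)
{
	uint64_t cd, num, den;

	assert(hz > 0);

	/* Fast path: a jiffy is a whole number of nanoseconds. */
	if (SKETCH_NSEC_PER_SEC % hz == 0)
		return (SKETCH_NSEC_PER_SEC / hz) * j;

	/* Slow path: multiply by the reduced fraction NSEC_PER_SEC / hz. */
	cd = sketch_gcd(hz, SKETCH_NSEC_PER_SEC);
	num = SKETCH_NSEC_PER_SEC / cd;	/* corresponds to HZ_TO_NSEC_NUM */
	den = hz / cd;			/* corresponds to HZ_TO_NSEC_DEN */

	/* Like the patch, this does not guard j * num against overflow. */
	return (j * num) / den;
}
```

For hz = 1024 the gcd with 10^9 is 512, so the fraction reduces to 1953125 / 2 and a single jiffy truncates to 976562 ns.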



* [RFC PATCH 03/30] cputime: Introduce nsecs_to_cputime64()
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
  2014-11-28 18:23 ` [RFC PATCH 01/30] jiffies: Remove HZ > USEC_PER_SEC special case Frederic Weisbecker
  2014-11-28 18:23 ` [RFC PATCH 02/30] time: Introduce jiffies64_to_nsecs() Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-12-01 14:05   ` Martin Schwidefsky
  2014-11-28 18:23 ` [RFC PATCH 04/30] s390: Convert open coded idle time seqcount Frederic Weisbecker
                   ` (26 subsequent siblings)
  29 siblings, 1 reply; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

This will be needed for the conversion of kernel stat to nsecs.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/cputime.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/cputime.h b/include/linux/cputime.h
index f2eb2ee..a225ab9 100644
--- a/include/linux/cputime.h
+++ b/include/linux/cputime.h
@@ -13,4 +13,14 @@
 	usecs_to_cputime((__nsecs) / NSEC_PER_USEC)
 #endif
 
+#ifndef nsecs_to_cputime
+# define nsecs_to_cputime(__nsecs)	\
+	usecs_to_cputime((__nsecs) / NSEC_PER_USEC)
+#endif
+
+#ifndef nsecs_to_cputime64
+# define nsecs_to_cputime64(__nsecs)	\
+	((__force cputime64_t) nsecs_to_cputime(__nsecs))
+#endif
+
 #endif /* __LINUX_CPUTIME_H */
-- 
2.1.3



* [RFC PATCH 04/30] s390: Convert open coded idle time seqcount
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2014-11-28 18:23 ` [RFC PATCH 03/30] cputime: Introduce nsecs_to_cputime64() Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-12-01 13:46   ` Heiko Carstens
  2014-11-28 18:23 ` [RFC PATCH 05/30] s390: Translate cputime magic constants to macros Frederic Weisbecker
                   ` (25 subsequent siblings)
  29 siblings, 1 reply; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

s390 uses an open coded seqcount to synchronize idle time accounting.
Let's consolidate it with the standard API.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 arch/s390/include/asm/idle.h |  3 ++-
 arch/s390/kernel/idle.c      | 28 +++++++++++++++-------------
 2 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/arch/s390/include/asm/idle.h b/arch/s390/include/asm/idle.h
index 6af037f..113cd96 100644
--- a/arch/s390/include/asm/idle.h
+++ b/arch/s390/include/asm/idle.h
@@ -9,9 +9,10 @@
 
 #include <linux/types.h>
 #include <linux/device.h>
+#include <linux/seqlock.h>
 
 struct s390_idle_data {
-	unsigned int sequence;
+	seqcount_t seqcount;
 	unsigned long long idle_count;
 	unsigned long long idle_time;
 	unsigned long long clock_idle_enter;
diff --git a/arch/s390/kernel/idle.c b/arch/s390/kernel/idle.c
index 7559f1b..9b75577 100644
--- a/arch/s390/kernel/idle.c
+++ b/arch/s390/kernel/idle.c
@@ -36,15 +36,13 @@ void __kprobes enabled_wait(void)
 	psw_idle(idle, psw_mask);
 
 	/* Account time spent with enabled wait psw loaded as idle time. */
-	idle->sequence++;
-	smp_wmb();
+	write_seqcount_begin(&idle->seqcount);
 	idle_time = idle->clock_idle_exit - idle->clock_idle_enter;
 	idle->clock_idle_enter = idle->clock_idle_exit = 0ULL;
 	idle->idle_time += idle_time;
 	idle->idle_count++;
 	account_idle_time(idle_time);
-	smp_wmb();
-	idle->sequence++;
+	write_seqcount_end(&idle->seqcount);
 }
 
 static ssize_t show_idle_count(struct device *dev,
@@ -52,14 +50,15 @@ static ssize_t show_idle_count(struct device *dev,
 {
 	struct s390_idle_data *idle = &per_cpu(s390_idle, dev->id);
 	unsigned long long idle_count;
-	unsigned int sequence;
+	unsigned int seq;
 
 	do {
-		sequence = ACCESS_ONCE(idle->sequence);
+		seq = read_seqcount_begin(&idle->seqcount);
 		idle_count = ACCESS_ONCE(idle->idle_count);
 		if (ACCESS_ONCE(idle->clock_idle_enter))
 			idle_count++;
-	} while ((sequence & 1) || (ACCESS_ONCE(idle->sequence) != sequence));
+	} while (read_seqcount_retry(&idle->seqcount, seq));
+
 	return sprintf(buf, "%llu\n", idle_count);
 }
 DEVICE_ATTR(idle_count, 0444, show_idle_count, NULL);
@@ -69,16 +68,18 @@ static ssize_t show_idle_time(struct device *dev,
 {
 	struct s390_idle_data *idle = &per_cpu(s390_idle, dev->id);
 	unsigned long long now, idle_time, idle_enter, idle_exit;
-	unsigned int sequence;
+	unsigned int seq;
 
 	do {
 		now = get_tod_clock();
-		sequence = ACCESS_ONCE(idle->sequence);
+		seq = read_seqcount_begin(&idle->seqcount);
 		idle_time = ACCESS_ONCE(idle->idle_time);
 		idle_enter = ACCESS_ONCE(idle->clock_idle_enter);
 		idle_exit = ACCESS_ONCE(idle->clock_idle_exit);
-	} while ((sequence & 1) || (ACCESS_ONCE(idle->sequence) != sequence));
+	} while (read_seqcount_retry(&idle->seqcount, seq));
+
 	idle_time += idle_enter ? ((idle_exit ? : now) - idle_enter) : 0;
+
 	return sprintf(buf, "%llu\n", idle_time >> 12);
 }
 DEVICE_ATTR(idle_time_us, 0444, show_idle_time, NULL);
@@ -87,14 +88,15 @@ cputime64_t arch_cpu_idle_time(int cpu)
 {
 	struct s390_idle_data *idle = &per_cpu(s390_idle, cpu);
 	unsigned long long now, idle_enter, idle_exit;
-	unsigned int sequence;
+	unsigned int seq;
 
 	do {
 		now = get_tod_clock();
-		sequence = ACCESS_ONCE(idle->sequence);
+		seq = read_seqcount_begin(&idle->seqcount);
 		idle_enter = ACCESS_ONCE(idle->clock_idle_enter);
 		idle_exit = ACCESS_ONCE(idle->clock_idle_exit);
-	} while ((sequence & 1) || (ACCESS_ONCE(idle->sequence) != sequence));
+	} while (read_seqcount_retry(&idle->seqcount, seq));
+
 	return idle_enter ? ((idle_exit ?: now) - idle_enter) : 0;
 }
 
-- 
2.1.3
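
For readers unfamiliar with the pattern being consolidated, here is a rough user-space analogue of a sequence counter built on C11 atomics. It is only a sketch under stated assumptions: a single writer, hypothetical sketch_ names, and C11 fences in place of the kernel's barriers. It is not how seqcount_t is implemented.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical user-space analogue of struct s390_idle_data. */
struct sketch_idle_data {
	atomic_uint seq;		/* even: stable, odd: write in progress */
	_Atomic uint64_t idle_count;
	_Atomic uint64_t idle_time;
};

/* Writer side: the updates sit between two increments of the counter. */
static void sketch_account_idle(struct sketch_idle_data *d, uint64_t delta)
{
	unsigned int seq = atomic_load_explicit(&d->seq, memory_order_relaxed);
	uint64_t total = atomic_load_explicit(&d->idle_time, memory_order_relaxed);
	uint64_t count = atomic_load_explicit(&d->idle_count, memory_order_relaxed);

	assert((seq & 1) == 0);	/* single writer, no write already in flight */

	atomic_store_explicit(&d->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&d->idle_time, total + delta, memory_order_relaxed);
	atomic_store_explicit(&d->idle_count, count + 1, memory_order_relaxed);

	atomic_store_explicit(&d->seq, seq + 2, memory_order_release);
}

/* Reader side: retry until the counter is even and unchanged across the reads. */
static void sketch_read_idle(struct sketch_idle_data *d,
			     uint64_t *count, uint64_t *total)
{
	unsigned int begin, end;

	do {
		begin = atomic_load_explicit(&d->seq, memory_order_acquire);
		*count = atomic_load_explicit(&d->idle_count, memory_order_relaxed);
		*total = atomic_load_explicit(&d->idle_time, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&d->seq, memory_order_relaxed);
	} while ((begin & 1) || begin != end);
}
```

The loop condition is the same test the open coded s390 version performed by hand: retry when the sequence is odd or has changed since the first read.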



* [RFC PATCH 05/30] s390: Translate cputime magic constants to macros
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2014-11-28 18:23 ` [RFC PATCH 04/30] s390: Convert open coded idle time seqcount Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-12-01 13:47   ` Heiko Carstens
  2014-11-28 18:23 ` [RFC PATCH 06/30] s390: Introduce cputime64_to_nsecs() Frederic Weisbecker
                   ` (24 subsequent siblings)
  29 siblings, 1 reply; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

Make the code more self-explanatory by naming magic constants.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 arch/s390/include/asm/cputime.h | 47 +++++++++++++++++++++--------------------
 1 file changed, 24 insertions(+), 23 deletions(-)

diff --git a/arch/s390/include/asm/cputime.h b/arch/s390/include/asm/cputime.h
index f8c1969..820b38a 100644
--- a/arch/s390/include/asm/cputime.h
+++ b/arch/s390/include/asm/cputime.h
@@ -10,7 +10,8 @@
 #include <linux/types.h>
 #include <asm/div64.h>
 
-
+#define CPUTIME_PER_USEC 4096ULL
+#define CPUTIME_PER_SEC (CPUTIME_PER_USEC * USEC_PER_SEC)
 /* We want to use full resolution of the CPU timer: 2**-12 micro-seconds. */
 
 typedef unsigned long long __nocast cputime_t;
@@ -38,24 +39,24 @@ static inline unsigned long __div(unsigned long long n, unsigned long base)
  */
 static inline unsigned long cputime_to_jiffies(const cputime_t cputime)
 {
-	return __div((__force unsigned long long) cputime, 4096000000ULL / HZ);
+	return __div((__force unsigned long long) cputime, CPUTIME_PER_SEC / HZ);
 }
 
 static inline cputime_t jiffies_to_cputime(const unsigned int jif)
 {
-	return (__force cputime_t)(jif * (4096000000ULL / HZ));
+	return (__force cputime_t)(jif * (CPUTIME_PER_SEC / HZ));
 }
 
 static inline u64 cputime64_to_jiffies64(cputime64_t cputime)
 {
 	unsigned long long jif = (__force unsigned long long) cputime;
-	do_div(jif, 4096000000ULL / HZ);
+	do_div(jif, CPUTIME_PER_SEC / HZ);
 	return jif;
 }
 
 static inline cputime64_t jiffies64_to_cputime64(const u64 jif)
 {
-	return (__force cputime64_t)(jif * (4096000000ULL / HZ));
+	return (__force cputime64_t)(jif * (CPUTIME_PER_SEC / HZ));
 }
 
 /*
@@ -68,7 +69,7 @@ static inline unsigned int cputime_to_usecs(const cputime_t cputime)
 
 static inline cputime_t usecs_to_cputime(const unsigned int m)
 {
-	return (__force cputime_t)(m * 4096ULL);
+	return (__force cputime_t)(m * CPUTIME_PER_USEC);
 }
 
 #define usecs_to_cputime64(m)		usecs_to_cputime(m)
@@ -78,12 +79,12 @@ static inline cputime_t usecs_to_cputime(const unsigned int m)
  */
 static inline unsigned int cputime_to_secs(const cputime_t cputime)
 {
-	return __div((__force unsigned long long) cputime, 2048000000) >> 1;
+	return __div((__force unsigned long long) cputime, CPUTIME_PER_SEC / 2) >> 1;
 }
 
 static inline cputime_t secs_to_cputime(const unsigned int s)
 {
-	return (__force cputime_t)(s * 4096000000ULL);
+	return (__force cputime_t)(s * CPUTIME_PER_SEC);
 }
 
 /*
@@ -91,8 +92,8 @@ static inline cputime_t secs_to_cputime(const unsigned int s)
  */
 static inline cputime_t timespec_to_cputime(const struct timespec *value)
 {
-	unsigned long long ret = value->tv_sec * 4096000000ULL;
-	return (__force cputime_t)(ret + value->tv_nsec * 4096 / 1000);
+	unsigned long long ret = value->tv_sec * CPUTIME_PER_SEC;
+	return (__force cputime_t)(ret + (value->tv_nsec * CPUTIME_PER_USEC) / NSEC_PER_USEC);
 }
 
 static inline void cputime_to_timespec(const cputime_t cputime,
@@ -103,12 +104,12 @@ static inline void cputime_to_timespec(const cputime_t cputime,
 	register_pair rp;
 
 	rp.pair = __cputime >> 1;
-	asm ("dr %0,%1" : "+d" (rp) : "d" (2048000000UL));
-	value->tv_nsec = rp.subreg.even * 1000 / 4096;
+	asm ("dr %0,%1" : "+d" (rp) : "d" (CPUTIME_PER_SEC / 2));
+	value->tv_nsec = rp.subreg.even * NSEC_PER_USEC / CPUTIME_PER_USEC;
 	value->tv_sec = rp.subreg.odd;
 #else
-	value->tv_nsec = (__cputime % 4096000000ULL) * 1000 / 4096;
-	value->tv_sec = __cputime / 4096000000ULL;
+	value->tv_nsec = (__cputime % CPUTIME_PER_SEC) * NSEC_PER_USEC / CPUTIME_PER_USEC;
+	value->tv_sec = __cputime / CPUTIME_PER_SEC;
 #endif
 }
 
@@ -119,8 +120,8 @@ static inline void cputime_to_timespec(const cputime_t cputime,
  */
 static inline cputime_t timeval_to_cputime(const struct timeval *value)
 {
-	unsigned long long ret = value->tv_sec * 4096000000ULL;
-	return (__force cputime_t)(ret + value->tv_usec * 4096ULL);
+	unsigned long long ret = value->tv_sec * CPUTIME_PER_SEC;
+	return (__force cputime_t)(ret + value->tv_usec * CPUTIME_PER_USEC);
 }
 
 static inline void cputime_to_timeval(const cputime_t cputime,
@@ -131,12 +132,12 @@ static inline void cputime_to_timeval(const cputime_t cputime,
 	register_pair rp;
 
 	rp.pair = __cputime >> 1;
-	asm ("dr %0,%1" : "+d" (rp) : "d" (2048000000UL));
-	value->tv_usec = rp.subreg.even / 4096;
+	asm ("dr %0,%1" : "+d" (rp) : "d" (CPUTIME_PER_SEC / 2));
+	value->tv_usec = rp.subreg.even / CPUTIME_PER_USEC;
 	value->tv_sec = rp.subreg.odd;
 #else
-	value->tv_usec = (__cputime % 4096000000ULL) / 4096;
-	value->tv_sec = __cputime / 4096000000ULL;
+	value->tv_usec = (__cputime % CPUTIME_PER_SEC) / CPUTIME_PER_USEC;
+	value->tv_sec = __cputime / CPUTIME_PER_SEC;
 #endif
 }
 
@@ -146,13 +147,13 @@ static inline void cputime_to_timeval(const cputime_t cputime,
 static inline clock_t cputime_to_clock_t(cputime_t cputime)
 {
 	unsigned long long clock = (__force unsigned long long) cputime;
-	do_div(clock, 4096000000ULL / USER_HZ);
+	do_div(clock, CPUTIME_PER_SEC / USER_HZ);
 	return clock;
 }
 
 static inline cputime_t clock_t_to_cputime(unsigned long x)
 {
-	return (__force cputime_t)(x * (4096000000ULL / USER_HZ));
+	return (__force cputime_t)(x * (CPUTIME_PER_SEC / USER_HZ));
 }
 
 /*
@@ -161,7 +162,7 @@ static inline cputime_t clock_t_to_cputime(unsigned long x)
 static inline clock_t cputime64_to_clock_t(cputime64_t cputime)
 {
 	unsigned long long clock = (__force unsigned long long) cputime;
-	do_div(clock, 4096000000ULL / USER_HZ);
+	do_div(clock, CPUTIME_PER_SEC / USER_HZ);
 	return clock;
 }
 
-- 
2.1.3



* [RFC PATCH 06/30] s390: Introduce cputime64_to_nsecs()
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2014-11-28 18:23 ` [RFC PATCH 05/30] s390: Translate cputime magic constants to macros Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-12-01 12:24   ` Heiko Carstens
  2014-11-28 18:23 ` [RFC PATCH 07/30] cputime: Convert kcpustat to nsecs Frederic Weisbecker
                   ` (23 subsequent siblings)
  29 siblings, 1 reply; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

This will be needed for the conversion of kernel stat to nsecs.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 arch/s390/include/asm/cputime.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/s390/include/asm/cputime.h b/arch/s390/include/asm/cputime.h
index 820b38a..75ba96f 100644
--- a/arch/s390/include/asm/cputime.h
+++ b/arch/s390/include/asm/cputime.h
@@ -59,6 +59,11 @@ static inline cputime64_t jiffies64_to_cputime64(const u64 jif)
 	return (__force cputime64_t)(jif * (CPUTIME_PER_SEC / HZ));
 }
 
+static inline u64 cputime64_to_nsecs(cputime64_t cputime)
+{
+	return (__force u64)cputime * NSEC_PER_USEC / CPUTIME_PER_USEC;
+}
+
 /*
  * Convert cputime to microseconds and back.
  */
-- 
2.1.3
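
As a sanity check of the unit arithmetic: the s390 header states that cputime has a resolution of 2**-12 microseconds, so one microsecond is 4096 cputime units, and a value in nanoseconds is obtained by multiplying by NSEC_PER_USEC and dividing by CPUTIME_PER_USEC. A minimal user-space sketch, with hypothetical sketch_ names that are not kernel API:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CPUTIME_PER_USEC	4096ULL
#define SKETCH_NSEC_PER_USEC	1000ULL

/* cputime units (2**-12 usec each) to nanoseconds, truncating. */
static uint64_t sketch_cputime_to_nsecs(uint64_t cputime)
{
	/* The intermediate product must fit in 64 bits. */
	assert(cputime <= UINT64_MAX / SKETCH_NSEC_PER_USEC);

	return cputime * SKETCH_NSEC_PER_USEC / SKETCH_CPUTIME_PER_USEC;
}

/* Nanoseconds back to cputime units, truncating. */
static uint64_t sketch_nsecs_to_cputime(uint64_t nsecs)
{
	assert(nsecs <= UINT64_MAX / SKETCH_CPUTIME_PER_USEC);

	return nsecs * SKETCH_CPUTIME_PER_USEC / SKETCH_NSEC_PER_USEC;
}
```

One second is 4096000000 cputime units, which the sketch maps to 1000000000 ns. Multiplying before dividing keeps sub-microsecond precision at the cost of a smaller usable input range.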



* [RFC PATCH 07/30] cputime: Convert kcpustat to nsecs
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
                   ` (5 preceding siblings ...)
  2014-11-28 18:23 ` [RFC PATCH 06/30] s390: Introduce cputime64_to_nsecs() Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-12-01 14:14   ` Martin Schwidefsky
  2014-11-28 18:23 ` [RFC PATCH 08/30] apm32: Fix cputime == jiffies assumption Frederic Weisbecker
                   ` (22 subsequent siblings)
  29 siblings, 1 reply; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

Kernel cpu stats are stored in cputime_t, which is an architecture
defined type and hence a bit opaque: any operation on it requires
accessors and mutators.

Converting them to nsecs simplifies the code a little bit.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 arch/s390/appldata/appldata_os.c   | 16 ++++-----
 drivers/cpufreq/cpufreq.c          |  6 ++--
 drivers/cpufreq/cpufreq_governor.c | 14 ++------
 drivers/macintosh/rack-meter.c     |  2 +-
 fs/proc/stat.c                     | 68 +++++++++++++++++++-------------------
 fs/proc/uptime.c                   |  6 ++--
 kernel/sched/cputime.c             | 35 ++++++++++----------
 7 files changed, 69 insertions(+), 78 deletions(-)

diff --git a/arch/s390/appldata/appldata_os.c b/arch/s390/appldata/appldata_os.c
index 69b23b2..08b9e94 100644
--- a/arch/s390/appldata/appldata_os.c
+++ b/arch/s390/appldata/appldata_os.c
@@ -113,21 +113,21 @@ static void appldata_get_os_data(void *data)
 	j = 0;
 	for_each_online_cpu(i) {
 		os_data->os_cpu[j].per_cpu_user =
-			cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_USER]);
+			nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_USER]);
 		os_data->os_cpu[j].per_cpu_nice =
-			cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_NICE]);
+			nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_NICE]);
 		os_data->os_cpu[j].per_cpu_system =
-			cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_SYSTEM]);
+			nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_SYSTEM]);
 		os_data->os_cpu[j].per_cpu_idle =
-			cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IDLE]);
+			nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IDLE]);
 		os_data->os_cpu[j].per_cpu_irq =
-			cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IRQ]);
+			nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IRQ]);
 		os_data->os_cpu[j].per_cpu_softirq =
-			cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_SOFTIRQ]);
+			nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_SOFTIRQ]);
 		os_data->os_cpu[j].per_cpu_iowait =
-			cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IOWAIT]);
+			nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IOWAIT]);
 		os_data->os_cpu[j].per_cpu_steal =
-			cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_STEAL]);
+			nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_STEAL]);
 		os_data->os_cpu[j].cpu_id = i;
 		j++;
 	}
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 4473eba..e7df764 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -117,7 +117,7 @@ static inline u64 get_cpu_idle_time_jiffy(unsigned int cpu, u64 *wall)
 	u64 cur_wall_time;
 	u64 busy_time;
 
-	cur_wall_time = jiffies64_to_cputime64(get_jiffies_64());
+	cur_wall_time = jiffies64_to_nsecs(get_jiffies_64());
 
 	busy_time = kcpustat_cpu(cpu).cpustat[CPUTIME_USER];
 	busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_SYSTEM];
@@ -128,9 +128,9 @@ static inline u64 get_cpu_idle_time_jiffy(unsigned int cpu, u64 *wall)
 
 	idle_time = cur_wall_time - busy_time;
 	if (wall)
-		*wall = cputime_to_usecs(cur_wall_time);
+		*wall = div_u64(cur_wall_time, NSEC_PER_USEC);
 
-	return cputime_to_usecs(idle_time);
+	return div_u64(idle_time, NSEC_PER_USEC);
 }
 
 u64 get_cpu_idle_time(unsigned int cpu, u64 *wall, int io_busy)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 1b44496..75f9a389 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -92,20 +92,12 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 
 		if (ignore_nice) {
 			u64 cur_nice;
-			unsigned long cur_nice_jiffies;
 
 			cur_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE] -
-					 cdbs->prev_cpu_nice;
-			/*
-			 * Assumption: nice time between sampling periods will
-			 * be less than 2^32 jiffies for 32 bit sys
-			 */
-			cur_nice_jiffies = (unsigned long)
-					cputime64_to_jiffies64(cur_nice);
+				  cdbs->prev_cpu_nice;
 
-			cdbs->prev_cpu_nice =
-				kcpustat_cpu(j).cpustat[CPUTIME_NICE];
-			idle_time += jiffies_to_usecs(cur_nice_jiffies);
+			cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
+			idle_time += div_u64(cur_nice, NSEC_PER_USEC);
 		}
 
 		if (unlikely(!wall_time || wall_time < idle_time))
diff --git a/drivers/macintosh/rack-meter.c b/drivers/macintosh/rack-meter.c
index 4192901..6c5fb49 100644
--- a/drivers/macintosh/rack-meter.c
+++ b/drivers/macintosh/rack-meter.c
@@ -91,7 +91,7 @@ static inline cputime64_t get_cpu_idle_time(unsigned int cpu)
 	if (rackmeter_ignore_nice)
 		retval += kcpustat_cpu(cpu).cpustat[CPUTIME_NICE];
 
-	return retval;
+	return nsecs_to_cputime64(retval);
 }
 
 static void rackmeter_setup_i2s(struct rackmeter *rm)
diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index bf2d03f..9726e01 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -21,23 +21,23 @@
 
 #ifdef arch_idle_time
 
-static cputime64_t get_idle_time(int cpu)
+static u64 get_idle_time(int cpu)
 {
-	cputime64_t idle;
+	u64 idle;
 
 	idle = kcpustat_cpu(cpu).cpustat[CPUTIME_IDLE];
 	if (cpu_online(cpu) && !nr_iowait_cpu(cpu))
-		idle += arch_idle_time(cpu);
+		idle += cputime64_to_nsecs(arch_idle_time(cpu));
 	return idle;
 }
 
-static cputime64_t get_iowait_time(int cpu)
+static u64 get_iowait_time(int cpu)
 {
-	cputime64_t iowait;
+	u64 iowait;
 
 	iowait = kcpustat_cpu(cpu).cpustat[CPUTIME_IOWAIT];
 	if (cpu_online(cpu) && nr_iowait_cpu(cpu))
-		iowait += arch_idle_time(cpu);
+		iowait += cputime64_to_nsecs(arch_idle_time(cpu));
 	return iowait;
 }
 
@@ -45,32 +45,32 @@ static cputime64_t get_iowait_time(int cpu)
 
 static u64 get_idle_time(int cpu)
 {
-	u64 idle, idle_time = -1ULL;
+	u64 idle, idle_usecs = -1ULL;
 
 	if (cpu_online(cpu))
-		idle_time = get_cpu_idle_time_us(cpu, NULL);
+		idle_usecs = get_cpu_idle_time_us(cpu, NULL);
 
-	if (idle_time == -1ULL)
+	if (idle_usecs == -1ULL)
 		/* !NO_HZ or cpu offline so we can rely on cpustat.idle */
 		idle = kcpustat_cpu(cpu).cpustat[CPUTIME_IDLE];
 	else
-		idle = usecs_to_cputime64(idle_time);
+		idle = idle_usecs * NSEC_PER_USEC;
 
 	return idle;
 }
 
 static u64 get_iowait_time(int cpu)
 {
-	u64 iowait, iowait_time = -1ULL;
+	u64 iowait, iowait_usecs = -1ULL;
 
 	if (cpu_online(cpu))
-		iowait_time = get_cpu_iowait_time_us(cpu, NULL);
+		iowait_usecs = get_cpu_iowait_time_us(cpu, NULL);
 
-	if (iowait_time == -1ULL)
+	if (iowait_usecs == -1ULL)
 		/* !NO_HZ or cpu offline so we can rely on cpustat.iowait */
 		iowait = kcpustat_cpu(cpu).cpustat[CPUTIME_IOWAIT];
 	else
-		iowait = usecs_to_cputime64(iowait_time);
+		iowait = iowait_usecs * NSEC_PER_USEC;
 
 	return iowait;
 }
@@ -118,16 +118,16 @@ static int show_stat(struct seq_file *p, void *v)
 	sum += arch_irq_stat();
 
 	seq_puts(p, "cpu ");
-	seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(user));
-	seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(nice));
-	seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(system));
-	seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(idle));
-	seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(iowait));
-	seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(irq));
-	seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(softirq));
-	seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(steal));
-	seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest));
-	seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest_nice));
+	seq_put_decimal_ull(p, ' ', nsec_to_clock_t(user));
+	seq_put_decimal_ull(p, ' ', nsec_to_clock_t(nice));
+	seq_put_decimal_ull(p, ' ', nsec_to_clock_t(system));
+	seq_put_decimal_ull(p, ' ', nsec_to_clock_t(idle));
+	seq_put_decimal_ull(p, ' ', nsec_to_clock_t(iowait));
+	seq_put_decimal_ull(p, ' ', nsec_to_clock_t(irq));
+	seq_put_decimal_ull(p, ' ', nsec_to_clock_t(softirq));
+	seq_put_decimal_ull(p, ' ', nsec_to_clock_t(steal));
+	seq_put_decimal_ull(p, ' ', nsec_to_clock_t(guest));
+	seq_put_decimal_ull(p, ' ', nsec_to_clock_t(guest_nice));
 	seq_putc(p, '\n');
 
 	for_each_online_cpu(i) {
@@ -143,16 +143,16 @@ static int show_stat(struct seq_file *p, void *v)
 		guest = kcpustat_cpu(i).cpustat[CPUTIME_GUEST];
 		guest_nice = kcpustat_cpu(i).cpustat[CPUTIME_GUEST_NICE];
 		seq_printf(p, "cpu%d", i);
-		seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(user));
-		seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(nice));
-		seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(system));
-		seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(idle));
-		seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(iowait));
-		seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(irq));
-		seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(softirq));
-		seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(steal));
-		seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest));
-		seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest_nice));
+		seq_put_decimal_ull(p, ' ', nsec_to_clock_t(user));
+		seq_put_decimal_ull(p, ' ', nsec_to_clock_t(nice));
+		seq_put_decimal_ull(p, ' ', nsec_to_clock_t(system));
+		seq_put_decimal_ull(p, ' ', nsec_to_clock_t(idle));
+		seq_put_decimal_ull(p, ' ', nsec_to_clock_t(iowait));
+		seq_put_decimal_ull(p, ' ', nsec_to_clock_t(irq));
+		seq_put_decimal_ull(p, ' ', nsec_to_clock_t(softirq));
+		seq_put_decimal_ull(p, ' ', nsec_to_clock_t(steal));
+		seq_put_decimal_ull(p, ' ', nsec_to_clock_t(guest));
+		seq_put_decimal_ull(p, ' ', nsec_to_clock_t(guest_nice));
 		seq_putc(p, '\n');
 	}
 	seq_printf(p, "intr %llu", (unsigned long long)sum);
diff --git a/fs/proc/uptime.c b/fs/proc/uptime.c
index 33de567..812a7f9 100644
--- a/fs/proc/uptime.c
+++ b/fs/proc/uptime.c
@@ -11,17 +11,15 @@ static int uptime_proc_show(struct seq_file *m, void *v)
 {
 	struct timespec uptime;
 	struct timespec idle;
-	u64 idletime;
 	u64 nsec;
 	u32 rem;
 	int i;
 
-	idletime = 0;
+	nsec = 0;
 	for_each_possible_cpu(i)
-		idletime += (__force u64) kcpustat_cpu(i).cpustat[CPUTIME_IDLE];
+		nsec += (__force u64) kcpustat_cpu(i).cpustat[CPUTIME_IDLE];
 
 	get_monotonic_boottime(&uptime);
-	nsec = cputime64_to_jiffies64(idletime) * TICK_NSEC;
 	idle.tv_sec = div_u64_rem(nsec, NSEC_PER_SEC, &rem);
 	idle.tv_nsec = rem;
 	seq_printf(m, "%lu.%02lu %lu.%02lu\n",
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 8394b1e..6cfdc2b 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -76,7 +76,7 @@ void irqtime_account_irq(struct task_struct *curr)
 }
 EXPORT_SYMBOL_GPL(irqtime_account_irq);
 
-static int irqtime_account_hi_update(void)
+static int irqtime_account_hi_update(u64 threshold)
 {
 	u64 *cpustat = kcpustat_this_cpu->cpustat;
 	unsigned long flags;
@@ -85,13 +85,13 @@ static int irqtime_account_hi_update(void)
 
 	local_irq_save(flags);
 	latest_ns = this_cpu_read(cpu_hardirq_time);
-	if (nsecs_to_cputime64(latest_ns) > cpustat[CPUTIME_IRQ])
+	if (latest_ns - cpustat[CPUTIME_IRQ] > threshold)
 		ret = 1;
 	local_irq_restore(flags);
 	return ret;
 }
 
-static int irqtime_account_si_update(void)
+static int irqtime_account_si_update(u64 threshold)
 {
 	u64 *cpustat = kcpustat_this_cpu->cpustat;
 	unsigned long flags;
@@ -100,7 +100,7 @@ static int irqtime_account_si_update(void)
 
 	local_irq_save(flags);
 	latest_ns = this_cpu_read(cpu_softirq_time);
-	if (nsecs_to_cputime64(latest_ns) > cpustat[CPUTIME_SOFTIRQ])
+	if (latest_ns - cpustat[CPUTIME_SOFTIRQ] > threshold)
 		ret = 1;
 	local_irq_restore(flags);
 	return ret;
@@ -145,7 +145,7 @@ void account_user_time(struct task_struct *p, cputime_t cputime,
 	index = (task_nice(p) > 0) ? CPUTIME_NICE : CPUTIME_USER;
 
 	/* Add user time to cpustat. */
-	task_group_account_field(p, index, (__force u64) cputime);
+	task_group_account_field(p, index, cputime_to_nsecs(cputime));
 
 	/* Account for user time used */
 	acct_account_cputime(p);
@@ -170,11 +170,11 @@ static void account_guest_time(struct task_struct *p, cputime_t cputime,
 
 	/* Add guest time to cpustat. */
 	if (task_nice(p) > 0) {
-		cpustat[CPUTIME_NICE] += (__force u64) cputime;
-		cpustat[CPUTIME_GUEST_NICE] += (__force u64) cputime;
+		cpustat[CPUTIME_NICE] += cputime_to_nsecs(cputime);
+		cpustat[CPUTIME_GUEST_NICE] += cputime_to_nsecs(cputime);
 	} else {
-		cpustat[CPUTIME_USER] += (__force u64) cputime;
-		cpustat[CPUTIME_GUEST] += (__force u64) cputime;
+		cpustat[CPUTIME_USER] += cputime_to_nsecs(cputime);
+		cpustat[CPUTIME_GUEST] += cputime_to_nsecs(cputime);
 	}
 }
 
@@ -195,7 +195,7 @@ void __account_system_time(struct task_struct *p, cputime_t cputime,
 	account_group_system_time(p, cputime);
 
 	/* Add system time to cpustat. */
-	task_group_account_field(p, index, (__force u64) cputime);
+	task_group_account_field(p, index, cputime_to_nsecs(cputime));
 
 	/* Account for system time used */
 	acct_account_cputime(p);
@@ -236,7 +236,7 @@ void account_steal_time(cputime_t cputime)
 {
 	u64 *cpustat = kcpustat_this_cpu->cpustat;
 
-	cpustat[CPUTIME_STEAL] += (__force u64) cputime;
+	cpustat[CPUTIME_STEAL] += cputime_to_nsecs(cputime);
 }
 
 /*
@@ -249,9 +249,9 @@ void account_idle_time(cputime_t cputime)
 	struct rq *rq = this_rq();
 
 	if (atomic_read(&rq->nr_iowait) > 0)
-		cpustat[CPUTIME_IOWAIT] += (__force u64) cputime;
+		cpustat[CPUTIME_IOWAIT] += cputime_to_nsecs(cputime);
 	else
-		cpustat[CPUTIME_IDLE] += (__force u64) cputime;
+		cpustat[CPUTIME_IDLE] += cputime_to_nsecs(cputime);
 }
 
 static __always_inline bool steal_account_process_tick(void)
@@ -341,6 +341,7 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 {
 	cputime_t scaled = cputime_to_scaled(cputime_one_jiffy);
 	u64 cputime = (__force u64) cputime_one_jiffy;
+	u64 nsec = cputime_to_nsecs(cputime); //TODO: make that build time
 	u64 *cpustat = kcpustat_this_cpu->cpustat;
 
 	if (steal_account_process_tick())
@@ -349,10 +350,10 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 	cputime *= ticks;
 	scaled *= ticks;
 
-	if (irqtime_account_hi_update()) {
-		cpustat[CPUTIME_IRQ] += cputime;
-	} else if (irqtime_account_si_update()) {
-		cpustat[CPUTIME_SOFTIRQ] += cputime;
+	if (irqtime_account_hi_update(nsec)) {
+		cpustat[CPUTIME_IRQ] += nsec;
+	} else if (irqtime_account_si_update(nsec)) {
+		cpustat[CPUTIME_SOFTIRQ] += nsec;
 	} else if (this_cpu_ksoftirqd() == p) {
 		/*
 		 * ksoftirqd time do not get accounted in cpu_softirq_time.
-- 
2.1.3



* [RFC PATCH 08/30] apm32: Fix cputime == jiffies assumption
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
                   ` (6 preceding siblings ...)
  2014-11-28 18:23 ` [RFC PATCH 07/30] cputime: Convert kcpustat to nsecs Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-11-28 18:23 ` [RFC PATCH 09/30] alpha: Fix jiffies based cputime assumption Frederic Weisbecker
                   ` (21 subsequent siblings)
  29 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

That code wrongly assumes that cputime_t wraps jiffies. Let's use
the correct accessors/mutators.
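
To illustrate why the conversion matters, here is a minimal userspace
sketch of the idle percentage computation. All sketch_* names and the
nanosecond-based cputime unit are assumptions made for the example, not
the kernel's definitions; the real cputime_t unit is arch and config
dependent.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: cputime is assumed to count nanoseconds here. */
typedef uint64_t sketch_cputime_t;

#define SKETCH_HZ		100ULL
#define SKETCH_NSEC_PER_SEC	1000000000ULL

/* Convert the assumed nanosecond cputime to jiffies at SKETCH_HZ. */
static unsigned long sketch_cputime_to_jiffies(sketch_cputime_t ct)
{
	return (unsigned long)(ct / (SKETCH_NSEC_PER_SEC / SKETCH_HZ));
}

/*
 * Share of the elapsed jiffies spent in system time, in percent.
 * Both operands of the division must be in jiffies, hence the
 * conversion of the stime delta before it is used.
 */
static unsigned int sketch_idle_percentage(sketch_cputime_t stime,
					   sketch_cputime_t last_stime,
					   unsigned long jiffies_since_last_check)
{
	unsigned int pct;

	assert(jiffies_since_last_check != 0);
	pct = (unsigned int)sketch_cputime_to_jiffies(stime - last_stime);
	pct *= 100;
	pct /= jiffies_since_last_check;
	return pct;
}
```

Without the conversion, the raw delta would be divided by a jiffies
count although it is expressed in another unit, so the percentage would
only be meaningful when cputime_t happens to be jiffies.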

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 arch/x86/kernel/apm_32.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apm_32.c b/arch/x86/kernel/apm_32.c
index 5848744..b3949c3 100644
--- a/arch/x86/kernel/apm_32.c
+++ b/arch/x86/kernel/apm_32.c
@@ -920,7 +920,7 @@ recalc:
 	} else if (jiffies_since_last_check > idle_period) {
 		unsigned int idle_percentage;
 
-		idle_percentage = stime - last_stime;
+		idle_percentage = cputime_to_jiffies(stime - last_stime);
 		idle_percentage *= 100;
 		idle_percentage /= jiffies_since_last_check;
 		use_apm_idle = (idle_percentage > idle_threshold);
-- 
2.1.3



* [RFC PATCH 09/30] alpha: Fix jiffies based cputime assumption
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
                   ` (7 preceding siblings ...)
  2014-11-28 18:23 ` [RFC PATCH 08/30] apm32: Fix cputime == jiffies assumption Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-11-28 18:23 ` [RFC PATCH 10/30] cputime: Convert guest time accounting to nsecs Frederic Weisbecker
                   ` (20 subsequent siblings)
  29 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

That code wrongly assumes that cputime_t wraps jiffies. Let's use
the correct accessors/mutators.

In practice there should be no harm yet because alpha currently
only supports tick-based cputime accounting, which is always
jiffies-based.
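
As a rough illustration of the two-step conversion, here is a userspace
sketch. The sketch_* names, the HZ value of 100 and the nanosecond-based
cputime unit are assumptions made for the example; the timeval helper
only mirrors the general shape of a jiffies to timeval conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values and types, chosen only for this example. */
#define SKETCH_HZ		100UL
#define SKETCH_NSEC_PER_JIFFY	(1000000000ULL / SKETCH_HZ)

typedef uint64_t sketch_cputime_t;	/* assumed to count nanoseconds */

struct sketch_timeval32 {
	int32_t tv_sec;
	int32_t tv_usec;
};

/* Step 1: bring the cputime value into jiffies. */
static unsigned long sketch_cputime_to_jiffies(sketch_cputime_t ct)
{
	return (unsigned long)(ct / SKETCH_NSEC_PER_JIFFY);
}

/* Step 2: split a jiffies count into seconds and microseconds. */
static void sketch_jiffies_to_timeval32(unsigned long jiffies,
					struct sketch_timeval32 *value)
{
	assert(value != NULL);
	value->tv_usec = (int32_t)((jiffies % SKETCH_HZ) *
				   (1000000UL / SKETCH_HZ));
	value->tv_sec = (int32_t)(jiffies / SKETCH_HZ);
}
```

Passing the cputime value straight to the second helper only gives the
right result when cputime_t is itself a jiffies count, which is the
assumption this patch removes.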

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 arch/alpha/kernel/osf_sys.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
index f9c732e..6358718 100644
--- a/arch/alpha/kernel/osf_sys.c
+++ b/arch/alpha/kernel/osf_sys.c
@@ -1138,6 +1138,7 @@ SYSCALL_DEFINE2(osf_getrusage, int, who, struct rusage32 __user *, ru)
 {
 	struct rusage32 r;
 	cputime_t utime, stime;
+	unsigned int utime_jiffies, stime_jiffies;
 
 	if (who != RUSAGE_SELF && who != RUSAGE_CHILDREN)
 		return -EINVAL;
@@ -1146,14 +1147,18 @@ SYSCALL_DEFINE2(osf_getrusage, int, who, struct rusage32 __user *, ru)
 	switch (who) {
 	case RUSAGE_SELF:
 		task_cputime(current, &utime, &stime);
-		jiffies_to_timeval32(utime, &r.ru_utime);
-		jiffies_to_timeval32(stime, &r.ru_stime);
+		utime_jiffies = cputime_to_jiffies(utime);
+		stime_jiffies = cputime_to_jiffies(stime);
+		jiffies_to_timeval32(utime_jiffies, &r.ru_utime);
+		jiffies_to_timeval32(stime_jiffies, &r.ru_stime);
 		r.ru_minflt = current->min_flt;
 		r.ru_majflt = current->maj_flt;
 		break;
 	case RUSAGE_CHILDREN:
-		jiffies_to_timeval32(current->signal->cutime, &r.ru_utime);
-		jiffies_to_timeval32(current->signal->cstime, &r.ru_stime);
+		utime_jiffies = cputime_to_jiffies(current->signal->cutime);
+		stime_jiffies = cputime_to_jiffies(current->signal->cstime);
+		jiffies_to_timeval32(utime_jiffies, &r.ru_utime);
+		jiffies_to_timeval32(stime_jiffies, &r.ru_stime);
 		r.ru_minflt = current->signal->cmin_flt;
 		r.ru_majflt = current->signal->cmaj_flt;
 		break;
-- 
2.1.3



* [RFC PATCH 10/30] cputime: Convert guest time accounting to nsecs
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
                   ` (8 preceding siblings ...)
  2014-11-28 18:23 ` [RFC PATCH 09/30] alpha: Fix jiffies based cputime assumption Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-11-28 18:23 ` [RFC PATCH 11/30] cputime: Special API to return old-typed cputime Frederic Weisbecker
                   ` (19 subsequent siblings)
  29 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 fs/proc/array.c        |  6 +++---
 include/linux/sched.h  | 10 +++++-----
 kernel/sched/cputime.c |  6 +++---
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index cd3653e..e4a8ef1 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -384,7 +384,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 	unsigned long cmin_flt = 0, cmaj_flt = 0;
 	unsigned long  min_flt = 0,  maj_flt = 0;
 	cputime_t cutime, cstime, utime, stime;
-	cputime_t cgtime, gtime;
+	u64 cgtime, gtime;
 	unsigned long rsslim = 0;
 	char tcomm[sizeof(task->comm)];
 	unsigned long flags;
@@ -511,8 +511,8 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 	seq_put_decimal_ull(m, ' ', task->rt_priority);
 	seq_put_decimal_ull(m, ' ', task->policy);
 	seq_put_decimal_ull(m, ' ', delayacct_blkio_ticks(task));
-	seq_put_decimal_ull(m, ' ', cputime_to_clock_t(gtime));
-	seq_put_decimal_ll(m, ' ', cputime_to_clock_t(cgtime));
+	seq_put_decimal_ull(m, ' ', nsec_to_clock_t(gtime));
+	seq_put_decimal_ll(m, ' ', nsec_to_clock_t(cgtime));
 
 	if (mm && permitted) {
 		seq_put_decimal_ull(m, ' ', mm->start_data);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5e344bb..9e49bae 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -649,8 +649,8 @@ struct signal_struct {
 	 */
 	seqlock_t stats_lock;
 	cputime_t utime, stime, cutime, cstime;
-	cputime_t gtime;
-	cputime_t cgtime;
+	u64 gtime;
+	u64 cgtime;
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 	struct cputime prev_cputime;
 #endif
@@ -1366,7 +1366,7 @@ struct task_struct {
 	int __user *clear_child_tid;		/* CLONE_CHILD_CLEARTID */
 
 	cputime_t utime, stime, utimescaled, stimescaled;
-	cputime_t gtime;
+	u64 gtime;
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 	struct cputime prev_cputime;
 #endif
@@ -1866,7 +1866,7 @@ extern void task_cputime(struct task_struct *t,
 			 cputime_t *utime, cputime_t *stime);
 extern void task_cputime_scaled(struct task_struct *t,
 				cputime_t *utimescaled, cputime_t *stimescaled);
-extern cputime_t task_gtime(struct task_struct *t);
+extern u64 task_gtime(struct task_struct *t);
 #else
 static inline void task_cputime(struct task_struct *t,
 				cputime_t *utime, cputime_t *stime)
@@ -1887,7 +1887,7 @@ static inline void task_cputime_scaled(struct task_struct *t,
 		*stimescaled = t->stimescaled;
 }
 
-static inline cputime_t task_gtime(struct task_struct *t)
+static inline u64 task_gtime(struct task_struct *t)
 {
 	return t->gtime;
 }
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 6cfdc2b..f3701ab 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -166,7 +166,7 @@ static void account_guest_time(struct task_struct *p, cputime_t cputime,
 	p->utime += cputime;
 	p->utimescaled += cputime_scaled;
 	account_group_user_time(p, cputime);
-	p->gtime += cputime;
+	p->gtime += cputime_to_nsecs(cputime);
 
 	/* Add guest time to cpustat. */
 	if (task_nice(p) > 0) {
@@ -763,10 +763,10 @@ void vtime_init_idle(struct task_struct *t, int cpu)
 	write_sequnlock_irqrestore(&t->vtime_seqlock, flags);
 }
 
-cputime_t task_gtime(struct task_struct *t)
+u64 task_gtime(struct task_struct *t)
 {
 	unsigned int seq;
-	cputime_t gtime;
+	u64 gtime;
 
 	do {
 		seq = read_seqbegin(&t->vtime_seqlock);
-- 
2.1.3



* [RFC PATCH 11/30] cputime: Special API to return old-typed cputime
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
                   ` (9 preceding siblings ...)
  2014-11-28 18:23 ` [RFC PATCH 10/30] cputime: Convert guest time accounting to nsecs Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-11-28 18:23 ` [RFC PATCH 12/30] cputime: Convert task/group cputime to nsecs Frederic Weisbecker
                   ` (18 subsequent siblings)
  29 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

This API returns a task's cputime as cputime_t, and is introduced before
the cputime internals are converted to nsecs. Blindly switching all
cputime readers to this API now lets us later convert those call sites
to nsec-based cputime smoothly, one step at a time.
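
A minimal userspace sketch of how such a compatibility wrapper could
look once the internals store nsecs. This is an assumption about the
later stage, not the code in this patch (where the wrapper is a plain
pass-through); all sketch_* names and the jiffies-based old type are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical old-style type: counts jiffies at an assumed HZ of 100. */
typedef uint64_t sketch_cputime_t;
#define SKETCH_NSEC_PER_JIFFY	10000000ULL

/* Hypothetical task whose internal accounting is already in nsecs. */
struct sketch_task {
	uint64_t utime_ns;
	uint64_t stime_ns;
};

/* New-style getter: hands out nsecs, either pointer may be NULL. */
static void sketch_task_cputime(const struct sketch_task *t,
				uint64_t *utime, uint64_t *stime)
{
	if (utime)
		*utime = t->utime_ns;
	if (stime)
		*stime = t->stime_ns;
}

/*
 * Old-typed wrapper: unconverted readers keep calling this and still
 * get the old unit back, so they can be converted one at a time.
 */
static void sketch_task_cputime_t(const struct sketch_task *t,
				  sketch_cputime_t *utime,
				  sketch_cputime_t *stime)
{
	uint64_t ut, st;

	assert(t != NULL);
	sketch_task_cputime(t, &ut, &st);
	if (utime)
		*utime = ut / SKETCH_NSEC_PER_JIFFY;
	if (stime)
		*stime = st / SKETCH_NSEC_PER_JIFFY;
}
```

The point of the indirection is that each reader can drop the _t
variant independently once it is ready to consume nsecs.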

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 arch/alpha/kernel/osf_sys.c    |  2 +-
 arch/x86/kernel/apm_32.c       |  2 +-
 drivers/isdn/mISDN/stack.c     |  2 +-
 fs/binfmt_elf.c                |  6 +++---
 fs/binfmt_elf_fdpic.c          |  6 +++---
 include/linux/init_task.h      |  2 +-
 include/linux/sched.h          | 43 +++++++++++++++++++++++++++++++++++++----
 kernel/acct.c                  |  2 +-
 kernel/delayacct.c             |  4 ++--
 kernel/signal.c                |  4 ++--
 kernel/time/itimer.c           |  2 +-
 kernel/time/posix-cpu-timers.c | 44 +++++++++++++++++++++---------------------
 kernel/tsacct.c                |  6 +++---
 13 files changed, 80 insertions(+), 45 deletions(-)

diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
index 6358718..5451c10 100644
--- a/arch/alpha/kernel/osf_sys.c
+++ b/arch/alpha/kernel/osf_sys.c
@@ -1146,7 +1146,7 @@ SYSCALL_DEFINE2(osf_getrusage, int, who, struct rusage32 __user *, ru)
 	memset(&r, 0, sizeof(r));
 	switch (who) {
 	case RUSAGE_SELF:
-		task_cputime(current, &utime, &stime);
+		task_cputime_t(current, &utime, &stime);
 		utime_jiffies = cputime_to_jiffies(utime);
 		stime_jiffies = cputime_to_jiffies(stime);
 		jiffies_to_timeval32(utime_jiffies, &r.ru_utime);
diff --git a/arch/x86/kernel/apm_32.c b/arch/x86/kernel/apm_32.c
index b3949c3..bffca50 100644
--- a/arch/x86/kernel/apm_32.c
+++ b/arch/x86/kernel/apm_32.c
@@ -914,7 +914,7 @@ static int apm_cpu_idle(struct cpuidle_device *dev,
 	unsigned int bucket;
 
 recalc:
-	task_cputime(current, NULL, &stime);
+	task_cputime_t(current, NULL, &stime);
 	if (jiffies_since_last_check > IDLE_CALC_LIMIT) {
 		use_apm_idle = 0;
 	} else if (jiffies_since_last_check > idle_period) {
diff --git a/drivers/isdn/mISDN/stack.c b/drivers/isdn/mISDN/stack.c
index 9cb4b62..0a36617 100644
--- a/drivers/isdn/mISDN/stack.c
+++ b/drivers/isdn/mISDN/stack.c
@@ -306,7 +306,7 @@ mISDNStackd(void *data)
 	       "msg %d sleep %d stopped\n",
 	       dev_name(&st->dev->dev), st->msg_cnt, st->sleep_cnt,
 	       st->stopped_cnt);
-	task_cputime(st->thread, &utime, &stime);
+	task_cputime_t(st->thread, &utime, &stime);
 	printk(KERN_DEBUG
 	       "mISDNStackd daemon for %s utime(%ld) stime(%ld)\n",
 	       dev_name(&st->dev->dev), utime, stime);
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index d8fc060..84149e2 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1293,19 +1293,19 @@ static void fill_prstatus(struct elf_prstatus *prstatus,
 	prstatus->pr_pgrp = task_pgrp_vnr(p);
 	prstatus->pr_sid = task_session_vnr(p);
 	if (thread_group_leader(p)) {
-		struct task_cputime cputime;
+		struct task_cputime_t cputime;
 
 		/*
 		 * This is the record for the group leader.  It shows the
 		 * group-wide total, not its individual thread total.
 		 */
-		thread_group_cputime(p, &cputime);
+		thread_group_cputime_t(p, &cputime);
 		cputime_to_timeval(cputime.utime, &prstatus->pr_utime);
 		cputime_to_timeval(cputime.stime, &prstatus->pr_stime);
 	} else {
 		cputime_t utime, stime;
 
-		task_cputime(p, &utime, &stime);
+		task_cputime_t(p, &utime, &stime);
 		cputime_to_timeval(utime, &prstatus->pr_utime);
 		cputime_to_timeval(stime, &prstatus->pr_stime);
 	}
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index d3634bf..3dc8e5d 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -1336,19 +1336,19 @@ static void fill_prstatus(struct elf_prstatus *prstatus,
 	prstatus->pr_pgrp = task_pgrp_vnr(p);
 	prstatus->pr_sid = task_session_vnr(p);
 	if (thread_group_leader(p)) {
-		struct task_cputime cputime;
+		struct task_cputime_t cputime;
 
 		/*
 		 * This is the record for the group leader.  It shows the
 		 * group-wide total, not its individual thread total.
 		 */
-		thread_group_cputime(p, &cputime);
+		thread_group_cputime_t(p, &cputime);
 		cputime_to_timeval(cputime.utime, &prstatus->pr_utime);
 		cputime_to_timeval(cputime.stime, &prstatus->pr_stime);
 	} else {
 		cputime_t utime, stime;
 
-		task_cputime(p, &utime, &stime);
+		task_cputime_t(p, &utime, &stime);
 		cputime_to_timeval(utime, &prstatus->pr_utime);
 		cputime_to_timeval(stime, &prstatus->pr_stime);
 	}
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 77fc43f..9aaf2c9 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -50,7 +50,7 @@ extern struct fs_struct init_fs;
 	.cpu_timers	= INIT_CPU_TIMERS(sig.cpu_timers),		\
 	.rlim		= INIT_RLIMITS,					\
 	.cputimer	= { 						\
-		.cputime = INIT_CPUTIME,				\
+		.cputime = INIT_CPUTIME_T,				\
 		.running = 0,						\
 		.lock = __RAW_SPIN_LOCK_UNLOCKED(sig.cputimer.lock),	\
 	},								\
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9e49bae..83f77bf 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -511,6 +511,14 @@ struct task_cputime {
 	cputime_t stime;
 	unsigned long long sum_exec_runtime;
 };
+
+/* Temporary type to ease cputime_t to nsecs conversion */
+struct task_cputime_t {
+	cputime_t utime;
+	cputime_t stime;
+	unsigned long long sum_exec_runtime;
+};
+
 /* Alternate field names when used to cache expirations. */
 #define prof_exp	stime
 #define virt_exp	utime
@@ -523,6 +531,13 @@ struct task_cputime {
 		.sum_exec_runtime = 0,				\
 	}
 
+#define INIT_CPUTIME_T	\
+	(struct task_cputime_t) {				\
+		.utime = 0,					\
+		.stime = 0,					\
+		.sum_exec_runtime = 0,				\
+	}
+
 #ifdef CONFIG_PREEMPT_COUNT
 #define PREEMPT_DISABLED	(1 + PREEMPT_ENABLED)
 #else
@@ -549,7 +564,7 @@ struct task_cputime {
  * used for thread group CPU timer calculations.
  */
 struct thread_group_cputimer {
-	struct task_cputime cputime;
+	struct task_cputime_t cputime;
 	int running;
 	raw_spinlock_t lock;
 };
@@ -627,7 +642,7 @@ struct signal_struct {
 	struct thread_group_cputimer cputimer;
 
 	/* Earliest-expiration cache. */
-	struct task_cputime cputime_expires;
+	struct task_cputime_t cputime_expires;
 
 	struct list_head cpu_timers[3];
 
@@ -1385,7 +1400,7 @@ struct task_struct {
 /* mm fault and swap info: this can arguably be seen as either mm-specific or thread-specific */
 	unsigned long min_flt, maj_flt;
 
-	struct task_cputime cputime_expires;
+	struct task_cputime_t cputime_expires;
 	struct list_head cpu_timers[3];
 
 /* process credentials */
@@ -1892,6 +1907,20 @@ static inline u64 task_gtime(struct task_struct *t)
 	return t->gtime;
 }
 #endif
+
+static inline void task_cputime_t(struct task_struct *t,
+				  cputime_t *utime, cputime_t *stime)
+{
+	task_cputime(t, utime, stime);
+}
+
+static inline void task_cputime_t_scaled(struct task_struct *t,
+					 cputime_t *utimescaled,
+					 cputime_t *stimescaled)
+{
+	task_cputime_scaled(t, utimescaled, stimescaled);
+}
+
 extern void task_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st);
 extern void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st);
 
@@ -2892,7 +2921,13 @@ static __always_inline bool need_resched(void)
  * Thread group CPU time accounting.
  */
 void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times);
-void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times);
+void thread_group_cputimer(struct task_struct *tsk, struct task_cputime_t *times);
+
+static inline void thread_group_cputime_t(struct task_struct *tsk,
+					  struct task_cputime_t *times)
+{
+	thread_group_cputime(tsk, (struct task_cputime *)times);
+}
 
 static inline void thread_group_cputime_init(struct signal_struct *sig)
 {
diff --git a/kernel/acct.c b/kernel/acct.c
index 33738ef..acfa901 100644
--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -561,7 +561,7 @@ void acct_collect(long exitcode, int group_dead)
 		pacct->ac_flag |= ACORE;
 	if (current->flags & PF_SIGNALED)
 		pacct->ac_flag |= AXSIG;
-	task_cputime(current, &utime, &stime);
+	task_cputime_t(current, &utime, &stime);
 	pacct->ac_utime += utime;
 	pacct->ac_stime += stime;
 	pacct->ac_minflt += current->min_flt;
diff --git a/kernel/delayacct.c b/kernel/delayacct.c
index ef90b04..8cf2179 100644
--- a/kernel/delayacct.c
+++ b/kernel/delayacct.c
@@ -87,12 +87,12 @@ int __delayacct_add_tsk(struct taskstats *d, struct task_struct *tsk)
 	unsigned long flags, t1;
 	s64 tmp;
 
-	task_cputime(tsk, &utime, &stime);
+	task_cputime_t(tsk, &utime, &stime);
 	tmp = (s64)d->cpu_run_real_total;
 	tmp += cputime_to_nsecs(utime + stime);
 	d->cpu_run_real_total = (tmp < (s64)d->cpu_run_real_total) ? 0 : tmp;
 
-	task_cputime_scaled(tsk, &utimescaled, &stimescaled);
+	task_cputime_t_scaled(tsk, &utimescaled, &stimescaled);
 	tmp = (s64)d->cpu_scaled_run_real_total;
 	tmp += cputime_to_nsecs(utimescaled + stimescaled);
 	d->cpu_scaled_run_real_total =
diff --git a/kernel/signal.c b/kernel/signal.c
index 8f0876f..b0f1f39 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1656,7 +1656,7 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 				       task_uid(tsk));
 	rcu_read_unlock();
 
-	task_cputime(tsk, &utime, &stime);
+	task_cputime_t(tsk, &utime, &stime);
 	info.si_utime = cputime_to_clock_t(utime + tsk->signal->utime);
 	info.si_stime = cputime_to_clock_t(stime + tsk->signal->stime);
 
@@ -1741,7 +1741,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 	info.si_uid = from_kuid_munged(task_cred_xxx(parent, user_ns), task_uid(tsk));
 	rcu_read_unlock();
 
-	task_cputime(tsk, &utime, &stime);
+	task_cputime_t(tsk, &utime, &stime);
 	info.si_utime = cputime_to_clock_t(utime);
 	info.si_stime = cputime_to_clock_t(stime);
 
diff --git a/kernel/time/itimer.c b/kernel/time/itimer.c
index 8d262b4..1b9ddae 100644
--- a/kernel/time/itimer.c
+++ b/kernel/time/itimer.c
@@ -53,7 +53,7 @@ static void get_cpu_itimer(struct task_struct *tsk, unsigned int clock_id,
 	cval = it->expires;
 	cinterval = it->incr;
 	if (cval) {
-		struct task_cputime cputime;
+		struct task_cputime_t cputime;
 		cputime_t t;
 
 		thread_group_cputimer(tsk, &cputime);
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 492b986..429c782 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -116,7 +116,7 @@ static void bump_cpu_timer(struct k_itimer *timer,
  * Checks @cputime to see if all fields are zero.  Returns true if all fields
  * are zero, false if any field is nonzero.
  */
-static inline int task_cputime_zero(const struct task_cputime *cputime)
+static inline int task_cputime_zero(const struct task_cputime_t *cputime)
 {
 	if (!cputime->utime && !cputime->stime && !cputime->sum_exec_runtime)
 		return 1;
@@ -127,7 +127,7 @@ static inline unsigned long long prof_ticks(struct task_struct *p)
 {
 	cputime_t utime, stime;
 
-	task_cputime(p, &utime, &stime);
+	task_cputime_t(p, &utime, &stime);
 
 	return cputime_to_expires(utime + stime);
 }
@@ -135,7 +135,7 @@ static inline unsigned long long virt_ticks(struct task_struct *p)
 {
 	cputime_t utime;
 
-	task_cputime(p, &utime, NULL);
+	task_cputime_t(p, &utime, NULL);
 
 	return cputime_to_expires(utime);
 }
@@ -196,7 +196,7 @@ static int cpu_clock_sample(const clockid_t which_clock, struct task_struct *p,
 	return 0;
 }
 
-static void update_gt_cputime(struct task_cputime *a, struct task_cputime *b)
+static void update_gt_cputime(struct task_cputime_t *a, struct task_cputime_t *b)
 {
 	if (b->utime > a->utime)
 		a->utime = b->utime;
@@ -208,10 +208,10 @@ static void update_gt_cputime(struct task_cputime *a, struct task_cputime *b)
 		a->sum_exec_runtime = b->sum_exec_runtime;
 }
 
-void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
+void thread_group_cputimer(struct task_struct *tsk, struct task_cputime_t *times)
 {
 	struct thread_group_cputimer *cputimer = &tsk->signal->cputimer;
-	struct task_cputime sum;
+	struct task_cputime_t sum;
 	unsigned long flags;
 
 	if (!cputimer->running) {
@@ -221,7 +221,7 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
 		 * to synchronize the timer to the clock every time we start
 		 * it.
 		 */
-		thread_group_cputime(tsk, &sum);
+		thread_group_cputime_t(tsk, &sum);
 		raw_spin_lock_irqsave(&cputimer->lock, flags);
 		cputimer->running = 1;
 		update_gt_cputime(&cputimer->cputime, &sum);
@@ -240,21 +240,21 @@ static int cpu_clock_sample_group(const clockid_t which_clock,
 				  struct task_struct *p,
 				  unsigned long long *sample)
 {
-	struct task_cputime cputime;
+	struct task_cputime_t cputime;
 
 	switch (CPUCLOCK_WHICH(which_clock)) {
 	default:
 		return -EINVAL;
 	case CPUCLOCK_PROF:
-		thread_group_cputime(p, &cputime);
+		thread_group_cputime_t(p, &cputime);
 		*sample = cputime_to_expires(cputime.utime + cputime.stime);
 		break;
 	case CPUCLOCK_VIRT:
-		thread_group_cputime(p, &cputime);
+		thread_group_cputime_t(p, &cputime);
 		*sample = cputime_to_expires(cputime.utime);
 		break;
 	case CPUCLOCK_SCHED:
-		thread_group_cputime(p, &cputime);
+		thread_group_cputime_t(p, &cputime);
 		*sample = cputime.sum_exec_runtime;
 		break;
 	}
@@ -448,7 +448,7 @@ static void arm_timer(struct k_itimer *timer)
 {
 	struct task_struct *p = timer->it.cpu.task;
 	struct list_head *head, *listpos;
-	struct task_cputime *cputime_expires;
+	struct task_cputime_t *cputime_expires;
 	struct cpu_timer_list *const nt = &timer->it.cpu;
 	struct cpu_timer_list *next;
 
@@ -540,7 +540,7 @@ static int cpu_timer_sample_group(const clockid_t which_clock,
 				  struct task_struct *p,
 				  unsigned long long *sample)
 {
-	struct task_cputime cputime;
+	struct task_cputime_t cputime;
 
 	thread_group_cputimer(p, &cputime);
 	switch (CPUCLOCK_WHICH(which_clock)) {
@@ -772,7 +772,7 @@ static void posix_cpu_timer_get(struct k_itimer *timer, struct itimerspec *itp)
 		/*
 		 * Protect against sighand release/switch in exit/exec and
 		 * also make timer sampling safe if it ends up calling
-		 * thread_group_cputime().
+		 * thread_group_cputime_t().
 		 */
 		sighand = lock_task_sighand(p, &flags);
 		if (unlikely(sighand == NULL)) {
@@ -836,7 +836,7 @@ static void check_thread_timers(struct task_struct *tsk,
 {
 	struct list_head *timers = tsk->cpu_timers;
 	struct signal_struct *const sig = tsk->signal;
-	struct task_cputime *tsk_expires = &tsk->cputime_expires;
+	struct task_cputime_t *tsk_expires = &tsk->cputime_expires;
 	unsigned long long expires;
 	unsigned long soft;
 
@@ -936,7 +936,7 @@ static void check_process_timers(struct task_struct *tsk,
 	unsigned long long utime, ptime, virt_expires, prof_expires;
 	unsigned long long sum_sched_runtime, sched_expires;
 	struct list_head *timers = sig->cpu_timers;
-	struct task_cputime cputime;
+	struct task_cputime_t cputime;
 	unsigned long soft;
 
 	/*
@@ -1024,7 +1024,7 @@ void posix_cpu_timer_schedule(struct k_itimer *timer)
 	} else {
 		/*
 		 * Protect arm_timer() and timer sampling in case of call to
-		 * thread_group_cputime().
+		 * thread_group_cputime_t().
 		 */
 		sighand = lock_task_sighand(p, &flags);
 		if (unlikely(sighand == NULL)) {
@@ -1069,8 +1069,8 @@ out:
  * Returns true if any field of the former is greater than the corresponding
  * field of the latter if the latter field is set.  Otherwise returns false.
  */
-static inline int task_cputime_expired(const struct task_cputime *sample,
-					const struct task_cputime *expires)
+static inline int task_cputime_expired(const struct task_cputime_t *sample,
+					const struct task_cputime_t *expires)
 {
 	if (expires->utime && sample->utime >= expires->utime)
 		return 1;
@@ -1097,10 +1097,10 @@ static inline int fastpath_timer_check(struct task_struct *tsk)
 	struct signal_struct *sig;
 	cputime_t utime, stime;
 
-	task_cputime(tsk, &utime, &stime);
+	task_cputime_t(tsk, &utime, &stime);
 
 	if (!task_cputime_zero(&tsk->cputime_expires)) {
-		struct task_cputime task_sample = {
+		struct task_cputime_t task_sample = {
 			.utime = utime,
 			.stime = stime,
 			.sum_exec_runtime = tsk->se.sum_exec_runtime
@@ -1112,7 +1112,7 @@ static inline int fastpath_timer_check(struct task_struct *tsk)
 
 	sig = tsk->signal;
 	if (sig->cputimer.running) {
-		struct task_cputime group_sample;
+		struct task_cputime_t group_sample;
 
 		raw_spin_lock(&sig->cputimer.lock);
 		group_sample = sig->cputimer.cputime;
diff --git a/kernel/tsacct.c b/kernel/tsacct.c
index 975cb49..4133e94 100644
--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -66,11 +66,11 @@ void bacct_add_tsk(struct user_namespace *user_ns,
 		task_tgid_nr_ns(rcu_dereference(tsk->real_parent), pid_ns) : 0;
 	rcu_read_unlock();
 
-	task_cputime(tsk, &utime, &stime);
+	task_cputime_t(tsk, &utime, &stime);
 	stats->ac_utime = cputime_to_usecs(utime);
 	stats->ac_stime = cputime_to_usecs(stime);
 
-	task_cputime_scaled(tsk, &utimescaled, &stimescaled);
+	task_cputime_t_scaled(tsk, &utimescaled, &stimescaled);
 	stats->ac_utimescaled = cputime_to_usecs(utimescaled);
 	stats->ac_stimescaled = cputime_to_usecs(stimescaled);
 
@@ -154,7 +154,7 @@ void acct_update_integrals(struct task_struct *tsk)
 {
 	cputime_t utime, stime;
 
-	task_cputime(tsk, &utime, &stime);
+	task_cputime_t(tsk, &utime, &stime);
 	__acct_update_integrals(tsk, utime, stime);
 }
 
-- 
2.1.3



* [RFC PATCH 12/30] cputime: Convert task/group cputime to nsecs
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
                   ` (10 preceding siblings ...)
  2014-11-28 18:23 ` [RFC PATCH 11/30] cputime: Special API to return old-typed cputime Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-11-28 18:23 ` [RFC PATCH 13/30] alpha: Convert obsolete cputime_t " Frederic Weisbecker
                   ` (17 subsequent siblings)
  29 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

Now that most cputime readers use the transition API, which returns the
task cputime in old-style cputime_t, we can safely store the cputime in
nsecs.
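
As a rough illustration of the pattern (not the kernel code itself), the
userspace sketch below stores the times in nanoseconds and converts only in
a transition wrapper. All sketch_* names are hypothetical, and cputime_t is
assumed here to count ticks at HZ=100, which is only one of the possible
arch definitions.

```c
#include <stdint.h>

/* Assumption: cputime_t is tick based with HZ=100; real archs differ. */
#define SKETCH_HZ		100ULL
#define SKETCH_NSEC_PER_SEC	1000000000ULL

typedef uint64_t sketch_cputime_t;

/* Hypothetical stand-in for the task fields converted by this patch. */
struct sketch_task {
	uint64_t utime;	/* user time, stored in nanoseconds */
	uint64_t stime;	/* system time, stored in nanoseconds */
};

static inline sketch_cputime_t sketch_nsecs_to_cputime(uint64_t nsecs)
{
	/* Truncating division: sub-tick remainders are dropped. */
	return nsecs / (SKETCH_NSEC_PER_SEC / SKETCH_HZ);
}

/* New-style reader: returns nanoseconds; either pointer may be NULL. */
static inline void sketch_task_cputime(const struct sketch_task *t,
				       uint64_t *utime, uint64_t *stime)
{
	if (utime)
		*utime = t->utime;
	if (stime)
		*stime = t->stime;
}

/*
 * Transition reader: converts at the boundary so that callers which have
 * not been converted yet keep seeing old-style values.
 */
static inline void sketch_task_cputime_t(const struct sketch_task *t,
					 sketch_cputime_t *utime,
					 sketch_cputime_t *stime)
{
	uint64_t ut, st;

	sketch_task_cputime(t, &ut, &st);
	if (utime)
		*utime = sketch_nsecs_to_cputime(ut);
	if (stime)
		*stime = sketch_nsecs_to_cputime(st);
}
```

Under these assumptions, a task with 30000000 ns of user time reads back as
3 ticks through the wrapper, while converted callers get the full
nanosecond value from the new-style reader.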

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 arch/alpha/kernel/osf_sys.c |  4 +--
 fs/binfmt_elf.c             | 11 ++++++--
 fs/binfmt_elf_fdpic.c       |  4 +--
 fs/proc/array.c             |  9 +++++--
 include/linux/cputime.h     | 12 +++++++++
 include/linux/sched.h       | 51 ++++++++++++++++++++++++------------
 kernel/exit.c               |  4 +--
 kernel/sched/cputime.c      | 63 ++++++++++++++++++++++-----------------------
 kernel/sys.c                | 16 ++++++------
 9 files changed, 107 insertions(+), 67 deletions(-)

diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
index 5451c10..a6e4491 100644
--- a/arch/alpha/kernel/osf_sys.c
+++ b/arch/alpha/kernel/osf_sys.c
@@ -1155,8 +1155,8 @@ SYSCALL_DEFINE2(osf_getrusage, int, who, struct rusage32 __user *, ru)
 		r.ru_majflt = current->maj_flt;
 		break;
 	case RUSAGE_CHILDREN:
-		utime_jiffies = cputime_to_jiffies(current->signal->cutime);
-		stime_jiffies = cputime_to_jiffies(current->signal->cstime);
+		utime_jiffies = nsecs_to_jiffies(current->signal->cutime);
+		stime_jiffies = nsecs_to_jiffies(current->signal->cstime);
 		jiffies_to_timeval32(utime_jiffies, &r.ru_utime);
 		jiffies_to_timeval32(stime_jiffies, &r.ru_stime);
 		r.ru_minflt = current->signal->cmin_flt;
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 84149e2..646cfc3 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1283,6 +1283,8 @@ static void fill_note(struct memelfnote *note, const char *name, int type,
 static void fill_prstatus(struct elf_prstatus *prstatus,
 		struct task_struct *p, long signr)
 {
+	struct timeval tv;
+
 	prstatus->pr_info.si_signo = prstatus->pr_cursig = signr;
 	prstatus->pr_sigpend = p->pending.signal.sig[0];
 	prstatus->pr_sighold = p->blocked.sig[0];
@@ -1309,8 +1311,13 @@ static void fill_prstatus(struct elf_prstatus *prstatus,
 		cputime_to_timeval(utime, &prstatus->pr_utime);
 		cputime_to_timeval(stime, &prstatus->pr_stime);
 	}
-	cputime_to_timeval(p->signal->cutime, &prstatus->pr_cutime);
-	cputime_to_timeval(p->signal->cstime, &prstatus->pr_cstime);
+	tv = ns_to_timeval(p->signal->cutime);
+	prstatus->pr_cutime.tv_sec = tv.tv_sec;
+	prstatus->pr_cutime.tv_usec = tv.tv_usec;
+
+	tv = ns_to_timeval(p->signal->cstime);
+	prstatus->pr_cstime.tv_sec = tv.tv_sec;
+	prstatus->pr_cstime.tv_usec = tv.tv_usec;
 }
 
 static int fill_psinfo(struct elf_prpsinfo *psinfo, struct task_struct *p,
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index 3dc8e5d..fecdb6d 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -1352,8 +1352,8 @@ static void fill_prstatus(struct elf_prstatus *prstatus,
 		cputime_to_timeval(utime, &prstatus->pr_utime);
 		cputime_to_timeval(stime, &prstatus->pr_stime);
 	}
-	cputime_to_timeval(p->signal->cutime, &prstatus->pr_cutime);
-	cputime_to_timeval(p->signal->cstime, &prstatus->pr_cstime);
+	prstatus->pr_cutime = ns_to_timeval(p->signal->cutime);
+	prstatus->pr_cstime = ns_to_timeval(p->signal->cstime);
 
 	prstatus->pr_exec_fdpic_loadmap = p->mm->context.exec_fdpic_loadmap;
 	prstatus->pr_interp_fdpic_loadmap = p->mm->context.interp_fdpic_loadmap;
diff --git a/fs/proc/array.c b/fs/proc/array.c
index e4a8ef1..de4fe51 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -385,6 +385,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 	unsigned long  min_flt = 0,  maj_flt = 0;
 	cputime_t cutime, cstime, utime, stime;
 	u64 cgtime, gtime;
+	u64 nutime, nstime;
 	unsigned long rsslim = 0;
 	char tcomm[sizeof(task->comm)];
 	unsigned long flags;
@@ -439,7 +440,9 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 
 			min_flt += sig->min_flt;
 			maj_flt += sig->maj_flt;
-			thread_group_cputime_adjusted(task, &utime, &stime);
+			thread_group_cputime_adjusted(task, &nutime, &nstime);
+			utime = nsecs_to_cputime(nutime);
+			stime = nsecs_to_cputime(nstime);
 			gtime += sig->gtime;
 		}
 
@@ -455,7 +458,9 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 	if (!whole) {
 		min_flt = task->min_flt;
 		maj_flt = task->maj_flt;
-		task_cputime_adjusted(task, &utime, &stime);
+		task_cputime_adjusted(task, &nutime, &nstime);
+		utime = nsecs_to_cputime(nutime);
+		stime = nsecs_to_cputime(nstime);
 		gtime = task_gtime(task);
 	}
 
diff --git a/include/linux/cputime.h b/include/linux/cputime.h
index a225ab9..ff843a9 100644
--- a/include/linux/cputime.h
+++ b/include/linux/cputime.h
@@ -23,4 +23,16 @@
 	((__force cputime64_t) nsecs_to_cputime(__nsecs))
 #endif
 
+#ifndef nsecs_to_scaled
+static inline u64 nsecs_to_scaled(u64 nsecs)
+{
+	cputime_t cputime, scaled;
+
+	cputime = nsecs_to_cputime(nsecs);
+	scaled = cputime_to_scaled(cputime);
+
+	return cputime_to_nsecs(scaled);
+}
+#endif
+
 #endif /* __LINUX_CPUTIME_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 83f77bf..3be3b0b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -488,8 +488,8 @@ struct cpu_itimer {
  * Gathers a generic snapshot of user and system time.
  */
 struct cputime {
-	cputime_t utime;
-	cputime_t stime;
+	u64 utime;
+	u64 stime;
 };
 
 /**
@@ -507,8 +507,8 @@ struct cputime {
  * of them in parallel.
  */
 struct task_cputime {
-	cputime_t utime;
-	cputime_t stime;
+	u64 utime;
+	u64 stime;
 	unsigned long long sum_exec_runtime;
 };
 
@@ -663,7 +663,7 @@ struct signal_struct {
 	 * in __exit_signal, except for the group leader.
 	 */
 	seqlock_t stats_lock;
-	cputime_t utime, stime, cutime, cstime;
+	u64 utime, stime, cutime, cstime;
 	u64 gtime;
 	u64 cgtime;
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
@@ -1380,7 +1380,7 @@ struct task_struct {
 	int __user *set_child_tid;		/* CLONE_CHILD_SETTID */
 	int __user *clear_child_tid;		/* CLONE_CHILD_CLEARTID */
 
-	cputime_t utime, stime, utimescaled, stimescaled;
+	u64 utime, stime, utimescaled, stimescaled;
 	u64 gtime;
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 	struct cputime prev_cputime;
@@ -1878,13 +1878,13 @@ static inline void put_task_struct(struct task_struct *t)
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
 extern void task_cputime(struct task_struct *t,
-			 cputime_t *utime, cputime_t *stime);
+			 u64 *utime, u64 *stime);
 extern void task_cputime_scaled(struct task_struct *t,
-				cputime_t *utimescaled, cputime_t *stimescaled);
+				u64 *utimescaled, u64 *stimescaled);
 extern u64 task_gtime(struct task_struct *t);
 #else
 static inline void task_cputime(struct task_struct *t,
-				cputime_t *utime, cputime_t *stime)
+				u64 *utime, u64 *stime)
 {
 	if (utime)
 		*utime = t->utime;
@@ -1893,8 +1893,8 @@ static inline void task_cputime(struct task_struct *t,
 }
 
 static inline void task_cputime_scaled(struct task_struct *t,
-				       cputime_t *utimescaled,
-				       cputime_t *stimescaled)
+				       u64 *utimescaled,
+				       u64 *stimescaled)
 {
 	if (utimescaled)
 		*utimescaled = t->utimescaled;
@@ -1911,18 +1911,30 @@ static inline u64 task_gtime(struct task_struct *t)
 static inline void task_cputime_t(struct task_struct *t,
 				  cputime_t *utime, cputime_t *stime)
 {
-	task_cputime(t, utime, stime);
+	u64 ut, st;
+
+	task_cputime(t, &ut, &st);
+	if (utime)
+		*utime = nsecs_to_cputime(ut);
+	if (stime)
+		*stime = nsecs_to_cputime(st);
 }
 
 static inline void task_cputime_t_scaled(struct task_struct *t,
 					 cputime_t *utimescaled,
 					 cputime_t *stimescaled)
 {
-	task_cputime_scaled(t, utimescaled, stimescaled);
+	u64 ut, st;
+
+	task_cputime_scaled(t, &ut, &st);
+	if (utimescaled)
+		*utimescaled = nsecs_to_cputime(ut);
+	if (stimescaled)
+		*stimescaled = nsecs_to_cputime(st);
 }
 
-extern void task_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st);
-extern void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st);
+extern void task_cputime_adjusted(struct task_struct *p, u64 *ut, u64 *st);
+extern void thread_group_cputime_adjusted(struct task_struct *p, u64 *ut, u64 *st);
 
 /*
  * Per process flags
@@ -2924,9 +2936,14 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times);
 void thread_group_cputimer(struct task_struct *tsk, struct task_cputime_t *times);
 
 static inline void thread_group_cputime_t(struct task_struct *tsk,
-					  struct task_cputime_t *times)
+					  struct task_cputime_t *cputime)
 {
-	thread_group_cputime(tsk, (struct task_cputime *)times);
+	struct task_cputime times;
+
+	thread_group_cputime(tsk, &times);
+	cputime->utime = nsecs_to_cputime(times.utime);
+	cputime->stime = nsecs_to_cputime(times.stime);
+	cputime->sum_exec_runtime = times.sum_exec_runtime;
 }
 
 static inline void thread_group_cputime_init(struct signal_struct *sig)
diff --git a/kernel/exit.c b/kernel/exit.c
index 5d30019..9df0729 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -86,7 +86,7 @@ static void __exit_signal(struct task_struct *tsk)
 	bool group_dead = thread_group_leader(tsk);
 	struct sighand_struct *sighand;
 	struct tty_struct *uninitialized_var(tty);
-	cputime_t utime, stime;
+	u64 utime, stime;
 
 	sighand = rcu_dereference_check(tsk->sighand,
 					lockdep_tasklist_lock_is_held());
@@ -1022,7 +1022,7 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
 		struct signal_struct *psig;
 		struct signal_struct *sig;
 		unsigned long maxrss;
-		cputime_t tgutime, tgstime;
+		u64 tgutime, tgstime;
 
 		/*
 		 * The resource counters for the group leader are in its
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index f3701ab..eefe1ec 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -138,8 +138,8 @@ void account_user_time(struct task_struct *p, cputime_t cputime,
 	int index;
 
 	/* Add user time to process. */
-	p->utime += cputime;
-	p->utimescaled += cputime_scaled;
+	p->utime += cputime_to_nsecs(cputime);
+	p->utimescaled += cputime_to_nsecs(cputime_scaled);
 	account_group_user_time(p, cputime);
 
 	index = (task_nice(p) > 0) ? CPUTIME_NICE : CPUTIME_USER;
@@ -163,10 +163,10 @@ static void account_guest_time(struct task_struct *p, cputime_t cputime,
 	u64 *cpustat = kcpustat_this_cpu->cpustat;
 
 	/* Add guest time to process. */
-	p->utime += cputime;
-	p->utimescaled += cputime_scaled;
+	p->utime += cputime_to_nsecs(cputime);
+	p->utimescaled += cputime_to_nsecs(cputime_scaled);
 	account_group_user_time(p, cputime);
-	p->gtime += cptime_to_nsecs(cputime);
+	p->gtime += cputime_to_nsecs(cputime);
 
 	/* Add guest time to cpustat. */
 	if (task_nice(p) > 0) {
@@ -190,8 +190,8 @@ void __account_system_time(struct task_struct *p, cputime_t cputime,
 			cputime_t cputime_scaled, int index)
 {
 	/* Add system time to process. */
-	p->stime += cputime;
-	p->stimescaled += cputime_scaled;
+	p->stime += cputime_to_nsecs(cputime);
+	p->stimescaled += cputime_to_nsecs(cputime_scaled);
 	account_group_system_time(p, cputime);
 
 	/* Add system time to cpustat. */
@@ -286,7 +286,7 @@ static __always_inline bool steal_account_process_tick(void)
 void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 {
 	struct signal_struct *sig = tsk->signal;
-	cputime_t utime, stime;
+	u64 utime, stime;
 	struct task_struct *t;
 	unsigned int seq, nextseq;
 	unsigned long flags;
@@ -440,13 +440,13 @@ EXPORT_SYMBOL_GPL(vtime_common_account_irq_enter);
 
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
-void task_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st)
+void task_cputime_adjusted(struct task_struct *p, u64 *ut, u64 *st)
 {
 	*ut = p->utime;
 	*st = p->stime;
 }
 
-void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st)
+void thread_group_cputime_adjusted(struct task_struct *p, u64 *ut, u64 *st)
 {
 	struct task_cputime cputime;
 
@@ -515,7 +515,7 @@ void account_idle_ticks(unsigned long ticks)
  * Perform (stime * rtime) / total, but avoid multiplication overflow by
  * loosing precision when the numbers are big.
  */
-static cputime_t scale_stime(u64 stime, u64 rtime, u64 total)
+static u64 scale_stime(u64 stime, u64 rtime, u64 total)
 {
 	u64 scaled;
 
@@ -552,7 +552,7 @@ drop_precision:
 	 * followed by a 64/32->64 divide.
 	 */
 	scaled = div_u64((u64) (u32) stime * (u64) (u32) rtime, (u32)total);
-	return (__force cputime_t) scaled;
+	return scaled;
 }
 
 /*
@@ -564,12 +564,12 @@ drop_precision:
  * Normally a caller will only go through this loop once, or not
  * at all in case a previous caller updated counter the same jiffy.
  */
-static void cputime_advance(cputime_t *counter, cputime_t new)
+static void cputime_advance(u64 *counter, u64 new)
 {
-	cputime_t old;
+	u64 old;
 
 	while (new > (old = ACCESS_ONCE(*counter)))
-		cmpxchg_cputime(counter, old, new);
+		cmpxchg64(counter, old, new);
 }
 
 /*
@@ -578,9 +578,9 @@ static void cputime_advance(cputime_t *counter, cputime_t new)
  */
 static void cputime_adjust(struct task_cputime *curr,
 			   struct cputime *prev,
-			   cputime_t *ut, cputime_t *st)
+			   u64 *ut, u64 *st)
 {
-	cputime_t rtime, stime, utime;
+	u64 rtime, stime, utime;
 
 	/*
 	 * Tick based cputime accounting depend on random scheduling
@@ -592,7 +592,7 @@ static void cputime_adjust(struct task_cputime *curr,
 	 * Fix this by scaling these tick based values against the total
 	 * runtime accounted by the CFS scheduler.
 	 */
-	rtime = nsecs_to_cputime(curr->sum_exec_runtime);
+	rtime = curr->sum_exec_runtime;
 
 	/*
 	 * Update userspace visible utime/stime values only if actual execution
@@ -610,10 +610,9 @@ static void cputime_adjust(struct task_cputime *curr,
 	} else if (stime == 0) {
 		utime = rtime;
 	} else {
-		cputime_t total = stime + utime;
+		u64 total = stime + utime;
 
-		stime = scale_stime((__force u64)stime,
-				    (__force u64)rtime, (__force u64)total);
+		stime = scale_stime(stime, rtime, total);
 		utime = rtime - stime;
 	}
 
@@ -625,7 +624,7 @@ out:
 	*st = prev->stime;
 }
 
-void task_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st)
+void task_cputime_adjusted(struct task_struct *p, u64 *ut, u64 *st)
 {
 	struct task_cputime cputime = {
 		.sum_exec_runtime = p->se.sum_exec_runtime,
@@ -635,7 +634,7 @@ void task_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st)
 	cputime_adjust(&cputime, &p->prev_cputime, ut, st);
 }
 
-void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st)
+void thread_group_cputime_adjusted(struct task_struct *p, u64 *ut, u64 *st)
 {
 	struct task_cputime cputime;
 
@@ -787,9 +786,9 @@ u64 task_gtime(struct task_struct *t)
  */
 static void
 fetch_task_cputime(struct task_struct *t,
-		   cputime_t *u_dst, cputime_t *s_dst,
-		   cputime_t *u_src, cputime_t *s_src,
-		   cputime_t *udelta, cputime_t *sdelta)
+		   u64 *u_dst, u64 *s_dst,
+		   u64 *u_src, u64 *s_src,
+		   u64 *udelta, u64 *sdelta)
 {
 	unsigned int seq;
 	unsigned long long delta;
@@ -826,9 +825,9 @@ fetch_task_cputime(struct task_struct *t,
 }
 
 
-void task_cputime(struct task_struct *t, cputime_t *utime, cputime_t *stime)
+void task_cputime(struct task_struct *t, u64 *utime, u64 *stime)
 {
-	cputime_t udelta, sdelta;
+	u64 udelta, sdelta;
 
 	fetch_task_cputime(t, utime, stime, &t->utime,
 			   &t->stime, &udelta, &sdelta);
@@ -839,15 +838,15 @@ void task_cputime(struct task_struct *t, cputime_t *utime, cputime_t *stime)
 }
 
 void task_cputime_scaled(struct task_struct *t,
-			 cputime_t *utimescaled, cputime_t *stimescaled)
+			 u64 *utimescaled, u64 *stimescaled)
 {
-	cputime_t udelta, sdelta;
+	u64 udelta, sdelta;
 
 	fetch_task_cputime(t, utimescaled, stimescaled,
 			   &t->utimescaled, &t->stimescaled, &udelta, &sdelta);
 	if (utimescaled)
-		*utimescaled += cputime_to_scaled(udelta);
+		*utimescaled += nsecs_to_scaled(udelta);
 	if (stimescaled)
-		*stimescaled += cputime_to_scaled(sdelta);
+		*stimescaled += nsecs_to_scaled(sdelta);
 }
 #endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */
diff --git a/kernel/sys.c b/kernel/sys.c
index 1eaa2f0..aa9dab9 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -867,15 +867,15 @@ SYSCALL_DEFINE0(getegid)
 
 void do_sys_times(struct tms *tms)
 {
-	cputime_t tgutime, tgstime, cutime, cstime;
+	u64 tgutime, tgstime, cutime, cstime;
 
 	thread_group_cputime_adjusted(current, &tgutime, &tgstime);
 	cutime = current->signal->cutime;
 	cstime = current->signal->cstime;
-	tms->tms_utime = cputime_to_clock_t(tgutime);
-	tms->tms_stime = cputime_to_clock_t(tgstime);
-	tms->tms_cutime = cputime_to_clock_t(cutime);
-	tms->tms_cstime = cputime_to_clock_t(cstime);
+	tms->tms_utime = nsec_to_clock_t(tgutime);
+	tms->tms_stime = nsec_to_clock_t(tgstime);
+	tms->tms_cutime = nsec_to_clock_t(cutime);
+	tms->tms_cstime = nsec_to_clock_t(cstime);
 }
 
 SYSCALL_DEFINE1(times, struct tms __user *, tbuf)
@@ -1528,7 +1528,7 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
 {
 	struct task_struct *t;
 	unsigned long flags;
-	cputime_t tgutime, tgstime, utime, stime;
+	u64 tgutime, tgstime, utime, stime;
 	unsigned long maxrss = 0;
 
 	memset((char *)r, 0, sizeof (*r));
@@ -1584,8 +1584,8 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
 	unlock_task_sighand(p, &flags);
 
 out:
-	cputime_to_timeval(utime, &r->ru_utime);
-	cputime_to_timeval(stime, &r->ru_stime);
+	r->ru_utime = ns_to_timeval(utime);
+	r->ru_stime = ns_to_timeval(stime);
 
 	if (who != RUSAGE_CHILDREN) {
 		struct mm_struct *mm = get_task_mm(p);
-- 
2.1.3



* [RFC PATCH 13/30] alpha: Convert obsolete cputime_t to nsecs
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
                   ` (11 preceding siblings ...)
  2014-11-28 18:23 ` [RFC PATCH 12/30] cputime: Convert task/group cputime to nsecs Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-11-28 18:23 ` [RFC PATCH 14/30] x86: Convert obsolete cputime type " Frederic Weisbecker
                   ` (16 subsequent siblings)
  29 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 arch/alpha/kernel/osf_sys.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
index a6e4491..03ebb5f 100644
--- a/arch/alpha/kernel/osf_sys.c
+++ b/arch/alpha/kernel/osf_sys.c
@@ -1137,7 +1137,7 @@ struct rusage32 {
 SYSCALL_DEFINE2(osf_getrusage, int, who, struct rusage32 __user *, ru)
 {
 	struct rusage32 r;
-	cputime_t utime, stime;
+	u64 utime, stime;
 	unsigned int utime_jiffies, stime_jiffies;
 
 	if (who != RUSAGE_SELF && who != RUSAGE_CHILDREN)
@@ -1146,9 +1146,9 @@ SYSCALL_DEFINE2(osf_getrusage, int, who, struct rusage32 __user *, ru)
 	memset(&r, 0, sizeof(r));
 	switch (who) {
 	case RUSAGE_SELF:
-		task_cputime_t(current, &utime, &stime);
-		utime_jiffies = cputime_to_jiffies(utime);
-		stime_jiffies = cputime_to_jiffies(stime);
+		task_cputime(current, &utime, &stime);
+		utime_jiffies = nsecs_to_jiffies(utime);
+		stime_jiffies = nsecs_to_jiffies(stime);
 		jiffies_to_timeval32(utime_jiffies, &r.ru_utime);
 		jiffies_to_timeval32(stime_jiffies, &r.ru_stime);
 		r.ru_minflt = current->min_flt;
-- 
2.1.3



* [RFC PATCH 14/30] x86: Convert obsolete cputime type to nsecs
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
                   ` (12 preceding siblings ...)
  2014-11-28 18:23 ` [RFC PATCH 13/30] alpha: Convert obsolete cputime_t " Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-11-28 18:23 ` [RFC PATCH 15/30] isdn: " Frederic Weisbecker
                   ` (15 subsequent siblings)
  29 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 arch/x86/kernel/apm_32.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/apm_32.c b/arch/x86/kernel/apm_32.c
index bffca50..63504a4 100644
--- a/arch/x86/kernel/apm_32.c
+++ b/arch/x86/kernel/apm_32.c
@@ -906,21 +906,21 @@ static int apm_cpu_idle(struct cpuidle_device *dev,
 {
 	static int use_apm_idle; /* = 0 */
 	static unsigned int last_jiffies; /* = 0 */
-	static unsigned int last_stime; /* = 0 */
-	cputime_t stime;
+	static u64 last_stime; /* = 0 */
+	u64 stime;
 
 	int apm_idle_done = 0;
 	unsigned int jiffies_since_last_check = jiffies - last_jiffies;
 	unsigned int bucket;
 
 recalc:
-	task_cputime_t(current, NULL, &stime);
+	task_cputime(current, NULL, &stime);
 	if (jiffies_since_last_check > IDLE_CALC_LIMIT) {
 		use_apm_idle = 0;
 	} else if (jiffies_since_last_check > idle_period) {
 		unsigned int idle_percentage;
 
-		idle_percentage = cputime_to_jiffies(stime - last_stime);
+		idle_percentage = nsecs_to_jiffies(stime - last_stime);
 		idle_percentage *= 100;
 		idle_percentage /= jiffies_since_last_check;
 		use_apm_idle = (idle_percentage > idle_threshold);
-- 
2.1.3



* [RFC PATCH 15/30] isdn: Convert obsolete cputime type to nsecs
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
                   ` (13 preceding siblings ...)
  2014-11-28 18:23 ` [RFC PATCH 14/30] x86: Convert obsolete cputime type " Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-11-28 18:23 ` [RFC PATCH 16/30] binfmt: " Frederic Weisbecker
                   ` (14 subsequent siblings)
  29 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

I'm not sure whether the mISDN stats are ABI, but they display task cputime
as a raw cputime_t value regardless of the unit cputime_t wraps, which can
be jiffies, nsecs, usecs, or any other time unit. They also wrongly assume
that cputime_t is a long.

Given that this dump is broken anyway, let's just display the nanosecond
value and stick with that.
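
The format-specifier problem described above can be reproduced in
userspace. What follows is a minimal sketch, not code from the patch:
format_cputime() is a hypothetical helper, and uint64_t stands in for the
kernel's u64. It prints two nanosecond values with %llu after a cast to
unsigned long long, which is well defined whether long is 32 or 64 bits
wide, whereas %ld is not.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical userspace helper (not part of the patch): format utime and
 * stime, both in nanoseconds, the way the converted printk() does.
 * The cast to unsigned long long matches the %llu specifier on every ABI.
 */
static int format_cputime(char *buf, size_t len,
			  uint64_t utime, uint64_t stime)
{
	return snprintf(buf, len, "utime(%llu) stime(%llu)",
			(unsigned long long)utime,
			(unsigned long long)stime);
}
```

A value such as 5000000000 ns (five seconds) does not fit in a 32-bit
long, which is the case the old "%ld" format would have mishandled.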

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 drivers/isdn/mISDN/stack.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/isdn/mISDN/stack.c b/drivers/isdn/mISDN/stack.c
index 0a36617..b324474 100644
--- a/drivers/isdn/mISDN/stack.c
+++ b/drivers/isdn/mISDN/stack.c
@@ -203,7 +203,7 @@ mISDNStackd(void *data)
 {
 	struct mISDNstack *st = data;
 #ifdef MISDN_MSG_STATS
-	cputime_t utime, stime;
+	u64 utime, stime;
 #endif
 	int err = 0;
 
@@ -306,9 +306,9 @@ mISDNStackd(void *data)
 	       "msg %d sleep %d stopped\n",
 	       dev_name(&st->dev->dev), st->msg_cnt, st->sleep_cnt,
 	       st->stopped_cnt);
-	task_cputime_t(st->thread, &utime, &stime);
+	task_cputime(st->thread, &utime, &stime);
 	printk(KERN_DEBUG
-	       "mISDNStackd daemon for %s utime(%ld) stime(%ld)\n",
+	       "mISDNStackd daemon for %s utime(%llu) stime(%llu)\n",
 	       dev_name(&st->dev->dev), utime, stime);
 	printk(KERN_DEBUG
 	       "mISDNStackd daemon for %s nvcsw(%ld) nivcsw(%ld)\n",
-- 
2.1.3



* [RFC PATCH 16/30] binfmt: Convert obsolete cputime type to nsecs
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
                   ` (14 preceding siblings ...)
  2014-11-28 18:23 ` [RFC PATCH 15/30] isdn: " Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-11-28 18:23 ` [RFC PATCH 17/30] acct: " Frederic Weisbecker
                   ` (13 subsequent siblings)
  29 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 fs/binfmt_elf.c        | 26 ++++++++++----------------
 fs/binfmt_elf_fdpic.c  | 16 ++++++++--------
 fs/compat_binfmt_elf.c | 20 +++++++++++---------
 3 files changed, 29 insertions(+), 33 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 646cfc3..4123f23 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1283,8 +1283,6 @@ static void fill_note(struct memelfnote *note, const char *name, int type,
 static void fill_prstatus(struct elf_prstatus *prstatus,
 		struct task_struct *p, long signr)
 {
-	struct timeval tv;
-
 	prstatus->pr_info.si_signo = prstatus->pr_cursig = signr;
 	prstatus->pr_sigpend = p->pending.signal.sig[0];
 	prstatus->pr_sighold = p->blocked.sig[0];
@@ -1295,29 +1293,25 @@ static void fill_prstatus(struct elf_prstatus *prstatus,
 	prstatus->pr_pgrp = task_pgrp_vnr(p);
 	prstatus->pr_sid = task_session_vnr(p);
 	if (thread_group_leader(p)) {
-		struct task_cputime_t cputime;
+		struct task_cputime cputime;
 
 		/*
 		 * This is the record for the group leader.  It shows the
 		 * group-wide total, not its individual thread total.
 		 */
-		thread_group_cputime_t(p, &cputime);
-		cputime_to_timeval(cputime.utime, &prstatus->pr_utime);
-		cputime_to_timeval(cputime.stime, &prstatus->pr_stime);
+		thread_group_cputime(p, &cputime);
+		prstatus->pr_utime = ns_to_timeval(cputime.utime);
+		prstatus->pr_stime = ns_to_timeval(cputime.stime);
 	} else {
-		cputime_t utime, stime;
+		u64 utime, stime;
 
-		task_cputime_t(p, &utime, &stime);
-		cputime_to_timeval(utime, &prstatus->pr_utime);
-		cputime_to_timeval(stime, &prstatus->pr_stime);
+		task_cputime(p, &utime, &stime);
+		prstatus->pr_utime = ns_to_timeval(utime);
+		prstatus->pr_stime = ns_to_timeval(stime);
 	}
-	tv = ns_to_timeval(p->signal->cutime);
-	prstatus->pr_cutime.tv_sec = tv.tv_sec;
-	prstatus->pr_cutime.tv_usec = tv.tv_usec;
 
-	tv = ns_to_timeval(p->signal->cstime);
-	prstatus->pr_cstime.tv_sec = tv.tv_sec;
-	prstatus->pr_cstime.tv_usec = tv.tv_usec;
+	prstatus->pr_cutime = ns_to_timeval(p->signal->cutime);
+	prstatus->pr_cstime = ns_to_timeval(p->signal->cstime);
 }
 
 static int fill_psinfo(struct elf_prpsinfo *psinfo, struct task_struct *p,
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index fecdb6d..de4bb4c 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -1336,21 +1336,21 @@ static void fill_prstatus(struct elf_prstatus *prstatus,
 	prstatus->pr_pgrp = task_pgrp_vnr(p);
 	prstatus->pr_sid = task_session_vnr(p);
 	if (thread_group_leader(p)) {
-		struct task_cputime_t cputime;
+		struct task_cputime cputime;
 
 		/*
 		 * This is the record for the group leader.  It shows the
 		 * group-wide total, not its individual thread total.
 		 */
-		thread_group_cputime_t(p, &cputime);
-		cputime_to_timeval(cputime.utime, &prstatus->pr_utime);
-		cputime_to_timeval(cputime.stime, &prstatus->pr_stime);
+		thread_group_cputime(p, &cputime);
+		prstatus->pr_utime = ns_to_timeval(cputime.utime);
+		prstatus->pr_stime = ns_to_timeval(cputime.stime);
 	} else {
-		cputime_t utime, stime;
+		u64 utime, stime;
 
-		task_cputime_t(p, &utime, &stime);
-		cputime_to_timeval(utime, &prstatus->pr_utime);
-		cputime_to_timeval(stime, &prstatus->pr_stime);
+		task_cputime(p, &utime, &stime);
+		prstatus->pr_utime = ns_to_timeval(utime);
+		prstatus->pr_stime = ns_to_timeval(stime);
 	}
 	prstatus->pr_cutime = ns_to_timeval(p->signal->cutime);
 	prstatus->pr_cstime = ns_to_timeval(p->signal->cstime);
diff --git a/fs/compat_binfmt_elf.c b/fs/compat_binfmt_elf.c
index 4d24d17..73fa05d 100644
--- a/fs/compat_binfmt_elf.c
+++ b/fs/compat_binfmt_elf.c
@@ -52,21 +52,23 @@
 #define elf_prpsinfo	compat_elf_prpsinfo
 
 /*
- * Compat version of cputime_to_compat_timeval, perhaps this
+ * Compat version of ns_to_timeval, perhaps this
  * should be an inline in <linux/compat.h>.
  */
-static void cputime_to_compat_timeval(const cputime_t cputime,
-				      struct compat_timeval *value)
+static struct compat_timeval ns_to_compat_timeval(const s64 nsec)
 {
 	struct timeval tv;
-	cputime_to_timeval(cputime, &tv);
-	value->tv_sec = tv.tv_sec;
-	value->tv_usec = tv.tv_usec;
+	struct compat_timeval ctv;
+
+	tv = ns_to_timeval(nsec);
+	ctv.tv_sec = tv.tv_sec;
+	ctv.tv_usec = tv.tv_usec;
+
+	return ctv;
 }
 
-#undef cputime_to_timeval
-#define cputime_to_timeval cputime_to_compat_timeval
-
+#undef ns_to_timeval
+#define ns_to_timeval ns_to_compat_timeval
 
 /*
  * To use this file, asm/elf.h must define compat_elf_check_arch.
-- 
2.1.3



* [RFC PATCH 17/30] acct: Convert obsolete cputime type to nsecs
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
                   ` (15 preceding siblings ...)
  2014-11-28 18:23 ` [RFC PATCH 16/30] binfmt: " Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-11-28 18:23 ` [RFC PATCH 18/30] delaycct: " Frederic Weisbecker
                   ` (12 subsequent siblings)
  29 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/sched.h | 2 +-
 kernel/acct.c         | 9 +++++----
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3be3b0b..6c8c077 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -469,7 +469,7 @@ struct pacct_struct {
 	int			ac_flag;
 	long			ac_exitcode;
 	unsigned long		ac_mem;
-	cputime_t		ac_utime, ac_stime;
+	u64			ac_utime, ac_stime;
 	unsigned long		ac_minflt, ac_majflt;
 };
 
diff --git a/kernel/acct.c b/kernel/acct.c
index acfa901..50a0c4a 100644
--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -455,8 +455,8 @@ static void fill_ac(acct_t *ac)
 	spin_lock_irq(&current->sighand->siglock);
 	tty = current->signal->tty;	/* Safe as we hold the siglock */
 	ac->ac_tty = tty ? old_encode_dev(tty_devnum(tty)) : 0;
-	ac->ac_utime = encode_comp_t(jiffies_to_AHZ(cputime_to_jiffies(pacct->ac_utime)));
-	ac->ac_stime = encode_comp_t(jiffies_to_AHZ(cputime_to_jiffies(pacct->ac_stime)));
+	ac->ac_utime = encode_comp_t(nsec_to_AHZ(pacct->ac_utime));
+	ac->ac_stime = encode_comp_t(nsec_to_AHZ(pacct->ac_stime));
 	ac->ac_flag = pacct->ac_flag;
 	ac->ac_mem = encode_comp_t(pacct->ac_mem);
 	ac->ac_minflt = encode_comp_t(pacct->ac_minflt);
@@ -532,7 +532,7 @@ out:
 void acct_collect(long exitcode, int group_dead)
 {
 	struct pacct_struct *pacct = &current->signal->pacct;
-	cputime_t utime, stime;
+	u64 utime, stime;
 	unsigned long vsize = 0;
 
 	if (group_dead && current->mm) {
@@ -561,7 +561,8 @@ void acct_collect(long exitcode, int group_dead)
 		pacct->ac_flag |= ACORE;
 	if (current->flags & PF_SIGNALED)
 		pacct->ac_flag |= AXSIG;
-	task_cputime_t(current, &utime, &stime);
+
+	task_cputime(current, &utime, &stime);
 	pacct->ac_utime += utime;
 	pacct->ac_stime += stime;
 	pacct->ac_minflt += current->min_flt;
-- 
2.1.3



* [RFC PATCH 18/30] delaycct: Convert obsolete cputime type to nsecs
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
                   ` (16 preceding siblings ...)
  2014-11-28 18:23 ` [RFC PATCH 17/30] acct: " Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-11-28 18:23 ` [RFC PATCH 19/30] tsacct: " Frederic Weisbecker
                   ` (11 subsequent siblings)
  29 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/delayacct.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/delayacct.c b/kernel/delayacct.c
index 8cf2179..7719063 100644
--- a/kernel/delayacct.c
+++ b/kernel/delayacct.c
@@ -82,19 +82,19 @@ void __delayacct_blkio_end(void)
 
 int __delayacct_add_tsk(struct taskstats *d, struct task_struct *tsk)
 {
-	cputime_t utime, stime, stimescaled, utimescaled;
+	u64 utime, stime, stimescaled, utimescaled;
 	unsigned long long t2, t3;
 	unsigned long flags, t1;
 	s64 tmp;
 
-	task_cputime_t(tsk, &utime, &stime);
+	task_cputime(tsk, &utime, &stime);
 	tmp = (s64)d->cpu_run_real_total;
-	tmp += cputime_to_nsecs(utime + stime);
+	tmp += utime + stime;
 	d->cpu_run_real_total = (tmp < (s64)d->cpu_run_real_total) ? 0 : tmp;
 
-	task_cputime_t_scaled(tsk, &utimescaled, &stimescaled);
+	task_cputime_scaled(tsk, &utimescaled, &stimescaled);
 	tmp = (s64)d->cpu_scaled_run_real_total;
-	tmp += cputime_to_nsecs(utimescaled + stimescaled);
+	tmp += utimescaled + stimescaled;
 	d->cpu_scaled_run_real_total =
 		(tmp < (s64)d->cpu_scaled_run_real_total) ? 0 : tmp;
 
-- 
2.1.3



* [RFC PATCH 19/30] tsacct: Convert obsolete cputime type to nsecs
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
                   ` (17 preceding siblings ...)
  2014-11-28 18:23 ` [RFC PATCH 18/30] delaycct: " Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-11-28 18:23 ` [RFC PATCH 20/30] signal: " Frederic Weisbecker
                   ` (10 subsequent siblings)
  29 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/sched.h |  2 +-
 kernel/tsacct.c       | 24 ++++++++++++------------
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6c8c077..9ce575e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1523,7 +1523,7 @@ struct task_struct {
 #if defined(CONFIG_TASK_XACCT)
 	u64 acct_rss_mem1;	/* accumulated rss usage */
 	u64 acct_vm_mem1;	/* accumulated virtual memory usage */
-	cputime_t acct_timexpd;	/* stime + utime since last update */
+	u64 acct_timexpd;	/* stime + utime since last update */
 #endif
 #ifdef CONFIG_CPUSETS
 	nodemask_t mems_allowed;	/* Protected by alloc_lock */
diff --git a/kernel/tsacct.c b/kernel/tsacct.c
index 4133e94..a66881e 100644
--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -31,7 +31,7 @@ void bacct_add_tsk(struct user_namespace *user_ns,
 		   struct taskstats *stats, struct task_struct *tsk)
 {
 	const struct cred *tcred;
-	cputime_t utime, stime, utimescaled, stimescaled;
+	u64 utime, stime, utimescaled, stimescaled;
 	u64 delta;
 
 	BUILD_BUG_ON(TS_COMM_LEN < TASK_COMM_LEN);
@@ -66,13 +66,13 @@ void bacct_add_tsk(struct user_namespace *user_ns,
 		task_tgid_nr_ns(rcu_dereference(tsk->real_parent), pid_ns) : 0;
 	rcu_read_unlock();
 
-	task_cputime_t(tsk, &utime, &stime);
-	stats->ac_utime = cputime_to_usecs(utime);
-	stats->ac_stime = cputime_to_usecs(stime);
+	task_cputime(tsk, &utime, &stime);
+	stats->ac_utime = div_u64(utime, NSEC_PER_USEC);
+	stats->ac_stime = div_u64(stime, NSEC_PER_USEC);
 
-	task_cputime_t_scaled(tsk, &utimescaled, &stimescaled);
-	stats->ac_utimescaled = cputime_to_usecs(utimescaled);
-	stats->ac_stimescaled = cputime_to_usecs(stimescaled);
+	task_cputime_scaled(tsk, &utimescaled, &stimescaled);
+	stats->ac_utimescaled = div_u64(utimescaled, NSEC_PER_USEC);
+	stats->ac_stimescaled = div_u64(stimescaled, NSEC_PER_USEC);
 
 	stats->ac_minflt = tsk->min_flt;
 	stats->ac_majflt = tsk->maj_flt;
@@ -121,10 +121,10 @@ void xacct_add_tsk(struct taskstats *stats, struct task_struct *p)
 #undef MB
 
 static void __acct_update_integrals(struct task_struct *tsk,
-				    cputime_t utime, cputime_t stime)
+				    u64 utime, u64 stime)
 {
 	if (likely(tsk->mm)) {
-		cputime_t time, dtime;
+		u64 time, dtime;
 		struct timeval value;
 		unsigned long flags;
 		u64 delta;
@@ -132,7 +132,7 @@ static void __acct_update_integrals(struct task_struct *tsk,
 		local_irq_save(flags);
 		time = stime + utime;
 		dtime = time - tsk->acct_timexpd;
-		jiffies_to_timeval(cputime_to_jiffies(dtime), &value);
+		value = ns_to_timeval(dtime);
 		delta = value.tv_sec;
 		delta = delta * USEC_PER_SEC + value.tv_usec;
 
@@ -152,9 +152,9 @@ static void __acct_update_integrals(struct task_struct *tsk,
  */
 void acct_update_integrals(struct task_struct *tsk)
 {
-	cputime_t utime, stime;
+	u64 utime, stime;
 
-	task_cputime_t(tsk, &utime, &stime);
+	task_cputime(tsk, &utime, &stime);
 	__acct_update_integrals(tsk, utime, stime);
 }
 
-- 
2.1.3



* [RFC PATCH 20/30] signal: Convert obsolete cputime type to nsecs
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
                   ` (18 preceding siblings ...)
  2014-11-28 18:23 ` [RFC PATCH 19/30] tsacct: " Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-11-28 18:23 ` [RFC PATCH 21/30] cputime: Remove task_cputime_t_scaled Frederic Weisbecker
                   ` (9 subsequent siblings)
  29 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/signal.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index b0f1f39..d50006e 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1618,7 +1618,7 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 	unsigned long flags;
 	struct sighand_struct *psig;
 	bool autoreap = false;
-	cputime_t utime, stime;
+	u64 utime, stime;
 
 	BUG_ON(sig == -1);
 
@@ -1656,9 +1656,9 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 				       task_uid(tsk));
 	rcu_read_unlock();
 
-	task_cputime_t(tsk, &utime, &stime);
-	info.si_utime = cputime_to_clock_t(utime + tsk->signal->utime);
-	info.si_stime = cputime_to_clock_t(stime + tsk->signal->stime);
+	task_cputime(tsk, &utime, &stime);
+	info.si_utime = nsec_to_clock_t(utime + tsk->signal->utime);
+	info.si_stime = nsec_to_clock_t(stime + tsk->signal->stime);
 
 	info.si_status = tsk->exit_code & 0x7f;
 	if (tsk->exit_code & 0x80)
@@ -1722,7 +1722,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 	unsigned long flags;
 	struct task_struct *parent;
 	struct sighand_struct *sighand;
-	cputime_t utime, stime;
+	u64 utime, stime;
 
 	if (for_ptracer) {
 		parent = tsk->parent;
@@ -1741,9 +1741,9 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 	info.si_uid = from_kuid_munged(task_cred_xxx(parent, user_ns), task_uid(tsk));
 	rcu_read_unlock();
 
-	task_cputime_t(tsk, &utime, &stime);
-	info.si_utime = cputime_to_clock_t(utime);
-	info.si_stime = cputime_to_clock_t(stime);
+	task_cputime(tsk, &utime, &stime);
+	info.si_utime = nsec_to_clock_t(utime);
+	info.si_stime = nsec_to_clock_t(stime);
 
  	info.si_code = why;
  	switch (why) {
-- 
2.1.3



* [RFC PATCH 21/30] cputime: Remove task_cputime_t_scaled
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
                   ` (19 preceding siblings ...)
  2014-11-28 18:23 ` [RFC PATCH 20/30] signal: " Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-11-28 18:23 ` [RFC PATCH 22/30] u64_stats_sync: Introduce preempt-unsafe readers Frederic Weisbecker
                   ` (8 subsequent siblings)
  29 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

All readers of tsk->[us]timescaled have been converted to the new
nsec-based cputime type, so task_cputime_t_scaled() has no remaining
users and can be removed.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/sched.h | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9ce575e..659f068 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1920,19 +1920,6 @@ static inline void task_cputime_t(struct task_struct *t,
 		*stime = nsecs_to_cputime(st);
 }
 
-static inline void task_cputime_t_scaled(struct task_struct *t,
-					 cputime_t *utimescaled,
-					 cputime_t *stimescaled)
-{
-	u64 ut, st;
-
-	task_cputime_scaled(t, &ut, &st);
-	if (utimescaled)
-		*utimescaled = nsecs_to_cputime(ut);
-	if (stimescaled)
-		*stimescaled = nsecs_to_cputime(st);
-}
-
 extern void task_cputime_adjusted(struct task_struct *p, u64 *ut, u64 *st);
 extern void thread_group_cputime_adjusted(struct task_struct *p, u64 *ut, u64 *st);
 
-- 
2.1.3



* [RFC PATCH 22/30] u64_stats_sync: Introduce preempt-unsafe readers
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
                   ` (20 preceding siblings ...)
  2014-11-28 18:23 ` [RFC PATCH 21/30] cputime: Remove task_cputime_t_scaled Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-11-28 18:23 ` [RFC PATCH 23/30] cputime: Convert irq_time_accounting to use u64_stats_sync Frederic Weisbecker
                   ` (7 subsequent siblings)
  29 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

These readers are going to be used by irqtime accounting: the scheduler
reads irqtime from a fast path where preemption is already disabled.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/u64_stats_sync.h | 35 +++++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/include/linux/u64_stats_sync.h b/include/linux/u64_stats_sync.h
index 4b4439e..20c26dc 100644
--- a/include/linux/u64_stats_sync.h
+++ b/include/linux/u64_stats_sync.h
@@ -88,29 +88,40 @@ static inline void u64_stats_update_end(struct u64_stats_sync *syncp)
 #endif
 }
 
+/* Preempt-unsafe version of u64_stats_fetch_begin */
+static inline unsigned int __u64_stats_fetch_begin(const struct u64_stats_sync *syncp)
+{
+#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
+	return read_seqcount_begin(&syncp->seq);
+#endif
+	return 0;
+}
+
+/* Preempt-unsafe version of u64_stats_fetch_retry */
+static inline bool __u64_stats_fetch_retry(const struct u64_stats_sync *syncp,
+					 unsigned int start)
+{
+#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
+	return read_seqcount_retry(&syncp->seq, start);
+#endif
+	return false;
+}
+
 static inline unsigned int u64_stats_fetch_begin(const struct u64_stats_sync *syncp)
 {
-#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
-	return read_seqcount_begin(&syncp->seq);
-#else
-#if BITS_PER_LONG==32
+#if BITS_PER_LONG==32 && !defined(CONFIG_SMP)
 	preempt_disable();
 #endif
-	return 0;
-#endif
+	return __u64_stats_fetch_begin(syncp);
 }
 
 static inline bool u64_stats_fetch_retry(const struct u64_stats_sync *syncp,
 					 unsigned int start)
 {
-#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
-	return read_seqcount_retry(&syncp->seq, start);
-#else
-#if BITS_PER_LONG==32
+#if BITS_PER_LONG==32 && !defined(CONFIG_SMP)
 	preempt_enable();
 #endif
-	return false;
-#endif
+	return __u64_stats_fetch_retry(syncp, start);
 }
 
 /*
-- 
2.1.3



* [RFC PATCH 23/30] cputime: Convert irq_time_accounting to use u64_stats_sync
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
                   ` (21 preceding siblings ...)
  2014-11-28 18:23 ` [RFC PATCH 22/30] u64_stats_sync: Introduce preempt-unsafe readers Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-11-28 18:23 ` [RFC PATCH 24/30] cputime: Increment kcpustat directly on irqtime account Frederic Weisbecker
                   ` (6 subsequent siblings)
  29 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

Irqtime accounting internally open-codes what u64_stats_sync already
provides. Let's consolidate it with the relevant APIs.
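
For readers unfamiliar with the pattern being consolidated, here is a
rough userspace sketch of a sequence-counter protected pair of 64-bit
counters. It is an illustration under stated assumptions, not the kernel
implementation: struct irqtime_sketch, sketch_account() and
sketch_irq_time_read() are hypothetical names, C11 atomics stand in for
the kernel's seqcount_t and barriers, and a single writer is assumed.
The writer makes the sequence odd while it updates, and the reader
retries until it sees the same even value before and after its reads.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace analogue of struct cpu_irqtime. */
struct irqtime_sketch {
	_Atomic uint64_t hardirq_time;
	_Atomic uint64_t softirq_time;
	atomic_uint seq;	/* even: stable, odd: update in progress */
};

/* Writer side; assumes one writer, like a per-CPU update with irqs off. */
static void sketch_account(struct irqtime_sketch *s, bool hardirq,
			   uint64_t delta)
{
	_Atomic uint64_t *field = hardirq ? &s->hardirq_time
					  : &s->softirq_time;
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);
	uint64_t old;

	/* "update begin": make the sequence odd before touching the data */
	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	old = atomic_load_explicit(field, memory_order_relaxed);
	atomic_store_explicit(field, old + delta, memory_order_relaxed);

	/* "update end": make the sequence even again, publishing the data */
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader side: retry while an update is in progress or has intervened. */
static uint64_t sketch_irq_time_read(struct irqtime_sketch *s)
{
	unsigned int start;
	uint64_t sum;

	do {
		start = atomic_load_explicit(&s->seq, memory_order_acquire);
		sum = atomic_load_explicit(&s->hardirq_time,
					   memory_order_relaxed) +
		      atomic_load_explicit(&s->softirq_time,
					   memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while ((start & 1) ||
		 atomic_load_explicit(&s->seq, memory_order_relaxed) != start);

	return sum;
}
```

In the kernel helpers this patch switches to, the sequence counter is
only compiled in for BITS_PER_LONG==32 with CONFIG_SMP; elsewhere the
fetch helpers reduce to returning 0 and false, as the u64_stats_sync.h
hunk in patch 22 shows.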

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/sched/cputime.c | 24 +++++++++---------------
 kernel/sched/sched.h   | 47 +++++++++++++----------------------------------
 2 files changed, 22 insertions(+), 49 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index eefe1ec..f55633f 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -20,10 +20,8 @@
  * task when irq is in progress while we read rq->clock. That is a worthy
  * compromise in place of having locks on each irq in account_system_time.
  */
-DEFINE_PER_CPU(u64, cpu_hardirq_time);
-DEFINE_PER_CPU(u64, cpu_softirq_time);
+DEFINE_PER_CPU(struct cpu_irqtime, cpu_irqtime);
 
-static DEFINE_PER_CPU(u64, irq_start_time);
 static int sched_clock_irqtime;
 
 void enable_sched_clock_irqtime(void)
@@ -36,10 +34,6 @@ void disable_sched_clock_irqtime(void)
 	sched_clock_irqtime = 0;
 }
 
-#ifndef CONFIG_64BIT
-DEFINE_PER_CPU(seqcount_t, irq_time_seq);
-#endif /* CONFIG_64BIT */
-
 /*
  * Called before incrementing preempt_count on {soft,}irq_enter
  * and before decrementing preempt_count on {soft,}irq_exit.
@@ -56,10 +50,10 @@ void irqtime_account_irq(struct task_struct *curr)
 	local_irq_save(flags);
 
 	cpu = smp_processor_id();
-	delta = sched_clock_cpu(cpu) - __this_cpu_read(irq_start_time);
-	__this_cpu_add(irq_start_time, delta);
+	delta = sched_clock_cpu(cpu) - __this_cpu_read(cpu_irqtime.irq_start_time);
+	__this_cpu_add(cpu_irqtime.irq_start_time, delta);
 
-	irq_time_write_begin();
+	u64_stats_update_begin(this_cpu_ptr(&cpu_irqtime.stats_sync));
 	/*
 	 * We do not account for softirq time from ksoftirqd here.
 	 * We want to continue accounting softirq time to ksoftirqd thread
@@ -67,11 +61,11 @@ void irqtime_account_irq(struct task_struct *curr)
 	 * that do not consume any time, but still wants to run.
 	 */
 	if (hardirq_count())
-		__this_cpu_add(cpu_hardirq_time, delta);
+		__this_cpu_add(cpu_irqtime.hardirq_time, delta);
 	else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
-		__this_cpu_add(cpu_softirq_time, delta);
+		__this_cpu_add(cpu_irqtime.softirq_time, delta);
 
-	irq_time_write_end();
+	u64_stats_update_end(this_cpu_ptr(&cpu_irqtime.stats_sync));
 	local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(irqtime_account_irq);
@@ -84,7 +78,7 @@ static int irqtime_account_hi_update(u64 threshold)
 	int ret = 0;
 
 	local_irq_save(flags);
-	latest_ns = this_cpu_read(cpu_hardirq_time);
+	latest_ns = this_cpu_read(cpu_irqtime.hardirq_time);
 	if (latest_ns - cpustat[CPUTIME_IRQ] > threshold)
 		ret = 1;
 	local_irq_restore(flags);
@@ -99,7 +93,7 @@ static int irqtime_account_si_update(u64 threshold)
 	int ret = 0;
 
 	local_irq_save(flags);
-	latest_ns = this_cpu_read(cpu_softirq_time);
+	latest_ns = this_cpu_read(cpu_irqtime.softirq_time);
 	if (latest_ns - cpustat[CPUTIME_SOFTIRQ] > threshold)
 		ret = 1;
 	local_irq_restore(flags);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 24156c84..bb3e66f 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -8,6 +8,7 @@
 #include <linux/stop_machine.h>
 #include <linux/tick.h>
 #include <linux/slab.h>
+#include <linux/u64_stats_sync.h>
 
 #include "cpupri.h"
 #include "cpudeadline.h"
@@ -1521,49 +1522,27 @@ enum rq_nohz_flag_bits {
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
 
-DECLARE_PER_CPU(u64, cpu_hardirq_time);
-DECLARE_PER_CPU(u64, cpu_softirq_time);
+struct cpu_irqtime {
+	u64			hardirq_time;
+	u64			softirq_time;
+	u64			irq_start_time;
+	struct u64_stats_sync	stats_sync;
+};
 
-#ifndef CONFIG_64BIT
-DECLARE_PER_CPU(seqcount_t, irq_time_seq);
-
-static inline void irq_time_write_begin(void)
-{
-	__this_cpu_inc(irq_time_seq.sequence);
-	smp_wmb();
-}
-
-static inline void irq_time_write_end(void)
-{
-	smp_wmb();
-	__this_cpu_inc(irq_time_seq.sequence);
-}
+DECLARE_PER_CPU(struct cpu_irqtime, cpu_irqtime);
 
+/* Must be called with preemption disabled */
 static inline u64 irq_time_read(int cpu)
 {
 	u64 irq_time;
 	unsigned seq;
 
 	do {
-		seq = read_seqcount_begin(&per_cpu(irq_time_seq, cpu));
-		irq_time = per_cpu(cpu_softirq_time, cpu) +
-			   per_cpu(cpu_hardirq_time, cpu);
-	} while (read_seqcount_retry(&per_cpu(irq_time_seq, cpu), seq));
+		seq = __u64_stats_fetch_begin(&per_cpu(cpu_irqtime, cpu).stats_sync);
+		irq_time = per_cpu(cpu_irqtime.softirq_time, cpu) +
+			   per_cpu(cpu_irqtime.hardirq_time, cpu);
+	} while (__u64_stats_fetch_retry(&per_cpu(cpu_irqtime, cpu).stats_sync, seq));
 
 	return irq_time;
 }
-#else /* CONFIG_64BIT */
-static inline void irq_time_write_begin(void)
-{
-}
-
-static inline void irq_time_write_end(void)
-{
-}
-
-static inline u64 irq_time_read(int cpu)
-{
-	return per_cpu(cpu_softirq_time, cpu) + per_cpu(cpu_hardirq_time, cpu);
-}
-#endif /* CONFIG_64BIT */
 #endif /* CONFIG_IRQ_TIME_ACCOUNTING */
-- 
2.1.3



* [RFC PATCH 24/30] cputime: Increment kcpustat directly on irqtime account
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
                   ` (22 preceding siblings ...)
  2014-11-28 18:23 ` [RFC PATCH 23/30] cputime: Convert irq_time_accounting to use u64_stats_sync Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-12-01 14:41   ` Martin Schwidefsky
  2014-11-28 18:23 ` [RFC PATCH 25/30] cputime: Remove temporary irqtime states Frederic Weisbecker
                   ` (5 subsequent siblings)
  29 siblings, 1 reply; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

The irqtime is accounted in nsecs and stored in
cpu_irqtime.hardirq_time and cpu_irqtime.softirq_time. Once the
accumulated amount reaches a new jiffy, that jiffy gets accounted to the
kcpustat.

This was necessary when kcpustat was stored in cputime_t, which could at
worst have a jiffies granularity. But now kcpustat is stored in nsecs
so this whole discretization game with temporary irqtime storage has
become unnecessary.

We can now directly account the irqtime to the kcpustat.
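
To illustrate the idea, here is a minimal user-space sketch of the
resulting accounting. It is not the kernel code: struct cpu_model, the
model_*() helpers and the reduced enum are hypothetical names made up
for this example. It also assumes that a tick is skipped only once the
irq time already accounted covers a full tick.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced set of stat indexes for this sketch only. */
enum { CPUTIME_USER, CPUTIME_SOFTIRQ, CPUTIME_IRQ, NR_STATS };

struct cpu_model {
	uint64_t cpustat[NR_STATS];	/* all values in nanoseconds */
	uint64_t tick_skip;		/* irq time accounted since the last consumed tick */
};

/* Account an irq delta directly to the stats, with no intermediate storage. */
static void model_account_irq(struct cpu_model *c, int index, uint64_t delta_ns)
{
	assert(index == CPUTIME_IRQ || index == CPUTIME_SOFTIRQ);
	c->cpustat[index] += delta_ns;
	c->tick_skip += delta_ns;
}

/*
 * Assumption of this sketch: a tick is dropped only when at least one
 * tick worth of irq time has already been accounted, so the same time
 * is never counted twice.
 */
static int model_skip_tick(struct cpu_model *c, uint64_t tick_ns)
{
	if (c->tick_skip >= tick_ns) {
		c->tick_skip -= tick_ns;
		return 1;
	}
	return 0;
}

/* Tick path: account the tick to user time unless irq time covered it. */
static void model_tick(struct cpu_model *c, uint64_t tick_ns)
{
	if (model_skip_tick(c, tick_ns))
		return;
	c->cpustat[CPUTIME_USER] += tick_ns;
}
```

With a 4 ms tick, 3 ms of hardirq time does not cover the tick, so the
tick still goes to user time. A further 2 ms of softirq time brings the
budget to 5 ms, the next tick is skipped and 1 ms is carried over.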

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/sched/cputime.c | 48 ++++++++++++++++++++----------------------------
 kernel/sched/sched.h   |  1 +
 2 files changed, 21 insertions(+), 28 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index f55633f..6e3beba 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -40,6 +40,7 @@ void disable_sched_clock_irqtime(void)
  */
 void irqtime_account_irq(struct task_struct *curr)
 {
+	u64 *cpustat;
 	unsigned long flags;
 	s64 delta;
 	int cpu;
@@ -47,6 +48,8 @@ void irqtime_account_irq(struct task_struct *curr)
 	if (!sched_clock_irqtime)
 		return;
 
+	cpustat = kcpustat_this_cpu->cpustat;
+
 	local_irq_save(flags);
 
 	cpu = smp_processor_id();
@@ -60,42 +63,33 @@ void irqtime_account_irq(struct task_struct *curr)
 	 * in that case, so as not to confuse scheduler with a special task
 	 * that do not consume any time, but still wants to run.
 	 */
-	if (hardirq_count())
+	if (hardirq_count()) {
 		__this_cpu_add(cpu_irqtime.hardirq_time, delta);
-	else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
+		cpustat[CPUTIME_IRQ] += delta;
+		__this_cpu_add(cpu_irqtime.tick_skip, delta);
+	} else if (in_serving_softirq() && curr != this_cpu_ksoftirqd()) {
 		__this_cpu_add(cpu_irqtime.softirq_time, delta);
+		cpustat[CPUTIME_SOFTIRQ] += delta;
+		__this_cpu_add(cpu_irqtime.tick_skip, delta);
+	}
 
 	u64_stats_update_end(this_cpu_ptr(&cpu_irqtime.stats_sync));
 	local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(irqtime_account_irq);
 
-static int irqtime_account_hi_update(u64 threshold)
+static int irqtime_skip_tick(u64 cputime)
 {
-	u64 *cpustat = kcpustat_this_cpu->cpustat;
 	unsigned long flags;
-	u64 latest_ns;
+	u64 skip;
 	int ret = 0;
 
 	local_irq_save(flags);
-	latest_ns = this_cpu_read(cpu_irqtime.hardirq_time);
-	if (latest_ns - cpustat[CPUTIME_IRQ] > threshold)
-		ret = 1;
-	local_irq_restore(flags);
-	return ret;
-}
-
-static int irqtime_account_si_update(u64 threshold)
-{
-	u64 *cpustat = kcpustat_this_cpu->cpustat;
-	unsigned long flags;
-	u64 latest_ns;
-	int ret = 0;
-
-	local_irq_save(flags);
-	latest_ns = this_cpu_read(cpu_irqtime.softirq_time);
-	if (latest_ns - cpustat[CPUTIME_SOFTIRQ] > threshold)
+	skip = this_cpu_read(cpu_irqtime.tick_skip);
+	if (skip >= cputime) {
+		__this_cpu_sub(cpu_irqtime.tick_skip, cputime);
 		ret = 1;
+	}
 	local_irq_restore(flags);
 	return ret;
 }
@@ -336,7 +330,6 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 	cputime_t scaled = cputime_to_scaled(cputime_one_jiffy);
 	u64 cputime = (__force u64) cputime_one_jiffy;
 	u64 nsec = cputime_to_nsecs(cputime); //TODO: make that build time
-	u64 *cpustat = kcpustat_this_cpu->cpustat;
 
 	if (steal_account_process_tick())
 		return;
@@ -344,11 +337,10 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 	cputime *= ticks;
 	scaled *= ticks;
 
-	if (irqtime_account_hi_update(nsec)) {
-		cpustat[CPUTIME_IRQ] += nsec;
-	} else if (irqtime_account_si_update(nsec)) {
-		cpustat[CPUTIME_SOFTIRQ] += nsec;
-	} else if (this_cpu_ksoftirqd() == p) {
+	if (irqtime_skip_tick(nsec))
+		return;
+
+	if (this_cpu_ksoftirqd() == p) {
 		/*
 		 * ksoftirqd time do not get accounted in cpu_softirq_time.
 		 * So, we have to handle it separately here.
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index bb3e66f..f613053 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1526,6 +1526,7 @@ struct cpu_irqtime {
 	u64			hardirq_time;
 	u64			softirq_time;
 	u64			irq_start_time;
+	u64			tick_skip;
 	struct u64_stats_sync	stats_sync;
 };
 
-- 
2.1.3



* [RFC PATCH 25/30] cputime: Remove temporary irqtime states
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
                   ` (23 preceding siblings ...)
  2014-11-28 18:23 ` [RFC PATCH 24/30] cputime: Increment kcpustat directly on irqtime account Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-11-28 18:23 ` [RFC PATCH 26/30] cputime: Push time to account_user_time() in nanosecs Frederic Weisbecker
                   ` (4 subsequent siblings)
  29 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

Now that the temporary irqtime storage has become unnecessary, let's
remove it.

This involves moving the u64_stats_sync seqlock to the kcpustat directly
in order to keep irqtime reads from the scheduler coherent.

This seqlock can be used for the other kcpustat fields as well. The need
hasn't arisen yet, as nobody seems to complain about possibly erroneous
/proc/stat values due to 64-bit values being read in two passes on 32-bit
CPUs. But at least we are prepared for that.
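
For readers unfamiliar with the pattern, the retry loop that keeps such
reads coherent can be sketched in user space as below. This is only a
model: struct kcpustat_model and the model_*() helpers are hypothetical
names, and C11 atomics stand in for the kernel's plain u64 fields and
barriers so that the sketch stays well-defined outside the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct kcpustat_model {
	atomic_uint seq;		/* odd while an update is in flight */
	_Atomic uint64_t softirq_ns;
	_Atomic uint64_t hardirq_ns;
};

/* Writer side: a single writer is assumed (per-CPU data, interrupts off). */
static void model_add_hardirq(struct kcpustat_model *k, uint64_t delta_ns)
{
	unsigned int seq = atomic_load_explicit(&k->seq, memory_order_relaxed);
	uint64_t old = atomic_load_explicit(&k->hardirq_ns, memory_order_relaxed);

	assert((seq & 1) == 0);	/* no other update may already be in flight */

	/* Mark the update as started, then publish the new value. */
	atomic_store_explicit(&k->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&k->hardirq_ns, old + delta_ns, memory_order_relaxed);
	/* Mark the update as finished. */
	atomic_store_explicit(&k->seq, seq + 2, memory_order_release);
}

/* Reader side: retry until both values were read with no update in between. */
static uint64_t model_irq_time_read(struct kcpustat_model *k)
{
	unsigned int begin, end;
	uint64_t sum;

	do {
		begin = atomic_load_explicit(&k->seq, memory_order_acquire);
		sum = atomic_load_explicit(&k->softirq_ns, memory_order_relaxed) +
		      atomic_load_explicit(&k->hardirq_ns, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&k->seq, memory_order_relaxed);
	} while ((begin & 1) || begin != end);

	return sum;
}
```

The reader never blocks the writer. It only pays for a retry when an
update happens to overlap the read, which fits a path where the writer
runs on every irq entry and exit.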

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/kernel_stat.h |  2 ++
 kernel/sched/cputime.c      | 40 ++++++++++++++++++++++------------------
 kernel/sched/sched.h        | 22 ++++++----------------
 3 files changed, 30 insertions(+), 34 deletions(-)

diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 8422b4e..585ced4 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -10,6 +10,7 @@
 #include <linux/vtime.h>
 #include <asm/irq.h>
 #include <linux/cputime.h>
+#include <linux/u64_stats_sync.h>
 
 /*
  * 'kernel_stat.h' contains the definitions needed for doing
@@ -33,6 +34,7 @@ enum cpu_usage_stat {
 
 struct kernel_cpustat {
 	u64 cpustat[NR_STATS];
+	struct u64_stats_sync stats_sync;
 };
 
 struct kernel_stat {
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 6e3beba..f675008 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -8,6 +8,23 @@
 
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
+struct cpu_irqtime {
+	u64			irq_start_time;
+	u64			tick_skip;
+};
+static DEFINE_PER_CPU(struct cpu_irqtime, cpu_irqtime);
+
+static int sched_clock_irqtime;
+
+void enable_sched_clock_irqtime(void)
+{
+	sched_clock_irqtime = 1;
+}
+
+void disable_sched_clock_irqtime(void)
+{
+	sched_clock_irqtime = 0;
+}
 
 /*
  * There are no locks covering percpu hardirq/softirq time.
@@ -20,19 +37,6 @@
  * task when irq is in progress while we read rq->clock. That is a worthy
  * compromise in place of having locks on each irq in account_system_time.
  */
-DEFINE_PER_CPU(struct cpu_irqtime, cpu_irqtime);
-
-static int sched_clock_irqtime;
-
-void enable_sched_clock_irqtime(void)
-{
-	sched_clock_irqtime = 1;
-}
-
-void disable_sched_clock_irqtime(void)
-{
-	sched_clock_irqtime = 0;
-}
 
 /*
  * Called before incrementing preempt_count on {soft,}irq_enter
@@ -40,6 +44,7 @@ void disable_sched_clock_irqtime(void)
  */
 void irqtime_account_irq(struct task_struct *curr)
 {
+	struct kernel_cpustat *kcpustat;
 	u64 *cpustat;
 	unsigned long flags;
 	s64 delta;
@@ -48,7 +53,8 @@ void irqtime_account_irq(struct task_struct *curr)
 	if (!sched_clock_irqtime)
 		return;
 
-	cpustat = kcpustat_this_cpu->cpustat;
+	kcpustat = kcpustat_this_cpu;
+	cpustat = kcpustat->cpustat;
 
 	local_irq_save(flags);
 
@@ -56,7 +62,7 @@ void irqtime_account_irq(struct task_struct *curr)
 	delta = sched_clock_cpu(cpu) - __this_cpu_read(cpu_irqtime.irq_start_time);
 	__this_cpu_add(cpu_irqtime.irq_start_time, delta);
 
-	u64_stats_update_begin(this_cpu_ptr(&cpu_irqtime.stats_sync));
+	u64_stats_update_begin(&kcpustat->stats_sync);
 	/*
 	 * We do not account for softirq time from ksoftirqd here.
 	 * We want to continue accounting softirq time to ksoftirqd thread
@@ -64,16 +70,14 @@ void irqtime_account_irq(struct task_struct *curr)
 	 * that do not consume any time, but still wants to run.
 	 */
 	if (hardirq_count()) {
-		__this_cpu_add(cpu_irqtime.hardirq_time, delta);
 		cpustat[CPUTIME_IRQ] += delta;
 		__this_cpu_add(cpu_irqtime.tick_skip, delta);
 	} else if (in_serving_softirq() && curr != this_cpu_ksoftirqd()) {
-		__this_cpu_add(cpu_irqtime.softirq_time, delta);
 		cpustat[CPUTIME_SOFTIRQ] += delta;
 		__this_cpu_add(cpu_irqtime.tick_skip, delta);
 	}
 
-	u64_stats_update_end(this_cpu_ptr(&cpu_irqtime.stats_sync));
+	u64_stats_update_end(&kcpustat->stats_sync);
 	local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(irqtime_account_irq);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index f613053..1ca6c82 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -8,7 +8,7 @@
 #include <linux/stop_machine.h>
 #include <linux/tick.h>
 #include <linux/slab.h>
-#include <linux/u64_stats_sync.h>
+#include <linux/kernel_stat.h>
 
 #include "cpupri.h"
 #include "cpudeadline.h"
@@ -1521,28 +1521,18 @@ enum rq_nohz_flag_bits {
 #endif
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
-
-struct cpu_irqtime {
-	u64			hardirq_time;
-	u64			softirq_time;
-	u64			irq_start_time;
-	u64			tick_skip;
-	struct u64_stats_sync	stats_sync;
-};
-
-DECLARE_PER_CPU(struct cpu_irqtime, cpu_irqtime);
-
 /* Must be called with preemption disabled */
 static inline u64 irq_time_read(int cpu)
 {
+	struct kernel_cpustat *kcpustat = &kcpustat_cpu(cpu);
 	u64 irq_time;
 	unsigned seq;
 
 	do {
-		seq = __u64_stats_fetch_begin(&per_cpu(cpu_irqtime, cpu).stats_sync);
-		irq_time = per_cpu(cpu_irqtime.softirq_time, cpu) +
-			   per_cpu(cpu_irqtime.hardirq_time, cpu);
-	} while (__u64_stats_fetch_retry(&per_cpu(cpu_irqtime, cpu).stats_sync, seq));
+		seq = __u64_stats_fetch_begin(&kcpustat->stats_sync);
+		irq_time = kcpustat->cpustat[CPUTIME_SOFTIRQ] +
+			   kcpustat->cpustat[CPUTIME_IRQ];
+	} while (__u64_stats_fetch_retry(&kcpustat->stats_sync, seq));
 
 	return irq_time;
 }
-- 
2.1.3



* [RFC PATCH 26/30] cputime: Push time to account_user_time() in nanosecs
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
                   ` (24 preceding siblings ...)
  2014-11-28 18:23 ` [RFC PATCH 25/30] cputime: Remove temporary irqtime states Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-11-28 18:23 ` [RFC PATCH 27/30] cputime: Push time to account_steal_time() " Frederic Weisbecker
                   ` (3 subsequent siblings)
  29 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 arch/ia64/kernel/time.c     |  5 ++++-
 arch/powerpc/kernel/time.c  |  3 ++-
 arch/s390/kernel/vtime.c    |  6 ++++--
 include/linux/kernel_stat.h |  2 +-
 kernel/sched/cputime.c      | 24 +++++++++++++++---------
 kernel/sched/stats.h        |  4 ++--
 6 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index 9a0104a..0518131 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -89,8 +89,11 @@ void vtime_account_user(struct task_struct *tsk)
 	struct thread_info *ti = task_thread_info(tsk);
 
 	if (ti->ac_utime) {
+		u64 utime;
+		//TODO: cycle_to_nsec()
 		delta_utime = cycle_to_cputime(ti->ac_utime);
-		account_user_time(tsk, delta_utime, delta_utime);
+		utime = cputime_to_nsecs(delta_utime);
+		account_user_time(tsk, utime, utime);
 		ti->ac_utime = 0;
 	}
 }
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 7505599..c69c8cc 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -376,7 +376,8 @@ void vtime_account_user(struct task_struct *tsk)
 	get_paca()->user_time = 0;
 	get_paca()->user_time_scaled = 0;
 	get_paca()->utime_sspurr = 0;
-	account_user_time(tsk, utime, utimescaled);
+	account_user_time(tsk, cputime_to_nsecs(utime),
+			  cputime_to_nsecs(utimescaled));
 }
 
 #else /* ! CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c
index 7f0089d..0ec68b8 100644
--- a/arch/s390/kernel/vtime.c
+++ b/arch/s390/kernel/vtime.c
@@ -60,7 +60,7 @@ static inline int virt_timer_forward(u64 elapsed)
 static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
 {
 	struct thread_info *ti = task_thread_info(tsk);
-	u64 timer, clock, user, system, steal;
+	u64 timer, clock, user, system, steal, nsecs;
 
 	timer = S390_lowcore.last_update_timer;
 	clock = S390_lowcore.last_update_clock;
@@ -79,11 +79,13 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
 	user = S390_lowcore.user_timer - ti->user_timer;
 	S390_lowcore.steal_timer -= user;
 	ti->user_timer = S390_lowcore.user_timer;
-	account_user_time(tsk, user, user);
+	nsecs = cputime_to_nsecs(user);
+	account_user_time(tsk, nsecs, nsecs);
 
 	system = S390_lowcore.system_timer - ti->system_timer;
 	S390_lowcore.steal_timer -= system;
 	ti->system_timer = S390_lowcore.system_timer;
+	nsecs = cputime_to_nsecs(system);
 	account_system_time(tsk, hardirq_offset, system, system);
 
 	steal = S390_lowcore.steal_timer;
diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 585ced4..9ec9881 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -84,7 +84,7 @@ static inline unsigned int kstat_cpu_irqs_sum(unsigned int cpu)
  */
 extern unsigned long long task_delta_exec(struct task_struct *);
 
-extern void account_user_time(struct task_struct *, cputime_t, cputime_t);
+extern void account_user_time(struct task_struct *, u64, u64);
 extern void account_system_time(struct task_struct *, int, cputime_t, cputime_t);
 extern void account_steal_time(cputime_t);
 extern void account_idle_time(cputime_t);
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index f675008..02fd2e7 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -124,20 +124,19 @@ static inline void task_group_account_field(struct task_struct *p, int index,
  * @cputime: the cpu time spent in user space since the last update
  * @cputime_scaled: cputime scaled by cpu frequency
  */
-void account_user_time(struct task_struct *p, cputime_t cputime,
-		       cputime_t cputime_scaled)
+void account_user_time(struct task_struct *p, u64 cputime, u64 cputime_scaled)
 {
 	int index;
 
 	/* Add user time to process. */
-	p->utime += cputime_to_nsecs(cputime);
-	p->utimescaled += cputime_to_nsecs(cputime_scaled);
+	p->utime += cputime;
+	p->utimescaled += cputime_scaled;
 	account_group_user_time(p, cputime);
 
 	index = (task_nice(p) > 0) ? CPUTIME_NICE : CPUTIME_USER;
 
 	/* Add user time to cpustat. */
-	task_group_account_field(p, index, cputime_to_nsecs(cputime));
+	task_group_account_field(p, index, cputime);
 
 	/* Account for user time used */
 	acct_account_cputime(p);
@@ -157,7 +156,7 @@ static void account_guest_time(struct task_struct *p, cputime_t cputime,
 	/* Add guest time to process. */
 	p->utime += cputime_to_nsecs(cputime);
 	p->utimescaled += cputime_to_nsecs(cputime_scaled);
-	account_group_user_time(p, cputime);
+	account_group_user_time(p, cputime_to_nsecs(cputime));
 	p->gtime += cputime_to_nsecs(cputime);
 
 	/* Add guest time to cpustat. */
@@ -333,7 +332,7 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 {
 	cputime_t scaled = cputime_to_scaled(cputime_one_jiffy);
 	u64 cputime = (__force u64) cputime_one_jiffy;
-	u64 nsec = cputime_to_nsecs(cputime); //TODO: make that build time
+	u64 nsec, nsec_scaled;
 
 	if (steal_account_process_tick())
 		return;
@@ -341,6 +340,9 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 	cputime *= ticks;
 	scaled *= ticks;
 
+	nsec = cputime_to_nsecs(cputime);
+	nsec_scaled = cputime_to_nsecs(scaled);
+
 	if (irqtime_skip_tick(nsec))
 		return;
 
@@ -352,7 +354,7 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 		 */
 		__account_system_time(p, cputime, scaled, CPUTIME_SOFTIRQ);
 	} else if (user_tick) {
-		account_user_time(p, cputime, scaled);
+		account_user_time(p, nsec, nsec_scaled);
 	} else if (p == rq->idle) {
 		account_idle_time(cputime);
 	} else if (p->flags & PF_VCPU) { /* System time or guest time */
@@ -454,11 +456,15 @@ void thread_group_cputime_adjusted(struct task_struct *p, u64 *ut, u64 *st)
 void account_process_tick(struct task_struct *p, int user_tick)
 {
 	cputime_t one_jiffy_scaled = cputime_to_scaled(cputime_one_jiffy);
+	u64 nsec, nsec_scaled;
 	struct rq *rq = this_rq();
 
 	if (vtime_accounting_enabled())
 		return;
 
+	nsec = cputime_to_nsecs(cputime_one_jiffy); //TODO: Make that build time
+	nsec_scaled = cputime_to_nsecs(one_jiffy_scaled); //Ditto
+
 	if (sched_clock_irqtime) {
 		irqtime_account_process_tick(p, user_tick, rq, 1);
 		return;
@@ -468,7 +474,7 @@ void account_process_tick(struct task_struct *p, int user_tick)
 		return;
 
 	if (user_tick)
-		account_user_time(p, cputime_one_jiffy, one_jiffy_scaled);
+		account_user_time(p, nsec, nsec_scaled);
 	else if ((p != rq->idle) || (irq_count() != HARDIRQ_OFFSET))
 		account_system_time(p, HARDIRQ_OFFSET, cputime_one_jiffy,
 				    one_jiffy_scaled);
diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
index 4ab7043..649b38c 100644
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -208,7 +208,7 @@ static inline bool cputimer_running(struct task_struct *tsk)
  * running CPU and update the utime field there.
  */
 static inline void account_group_user_time(struct task_struct *tsk,
-					   cputime_t cputime)
+					   u64 cputime)
 {
 	struct thread_group_cputimer *cputimer = &tsk->signal->cputimer;
 
@@ -216,7 +216,7 @@ static inline void account_group_user_time(struct task_struct *tsk,
 		return;
 
 	raw_spin_lock(&cputimer->lock);
-	cputimer->cputime.utime += cputime;
+	cputimer->cputime.utime += nsecs_to_cputime(cputime);
 	raw_spin_unlock(&cputimer->lock);
 }
 
-- 
2.1.3



* [RFC PATCH 27/30] cputime: Push time to account_steal_time() in nanosecs
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
                   ` (25 preceding siblings ...)
  2014-11-28 18:23 ` [RFC PATCH 26/30] cputime: Push time to account_user_time() in nanosecs Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-11-28 18:23 ` [RFC PATCH 28/30] cputime: Push time to account_idle_time() " Frederic Weisbecker
                   ` (2 subsequent siblings)
  29 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 arch/powerpc/kernel/time.c  |  2 +-
 arch/s390/kernel/vtime.c    |  2 +-
 include/linux/kernel_stat.h |  2 +-
 kernel/sched/cputime.c      | 20 ++++++--------------
 4 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index c69c8cc..e44558c 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -346,7 +346,7 @@ void vtime_account_system(struct task_struct *tsk)
 	delta = vtime_delta(tsk, &sys_scaled, &stolen);
 	account_system_time(tsk, 0, delta, sys_scaled);
 	if (stolen)
-		account_steal_time(stolen);
+		account_steal_time(cputime_to_nsecs(stolen));
 }
 EXPORT_SYMBOL_GPL(vtime_account_system);
 
diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c
index 0ec68b8..a2dfba1 100644
--- a/arch/s390/kernel/vtime.c
+++ b/arch/s390/kernel/vtime.c
@@ -91,7 +91,7 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
 	steal = S390_lowcore.steal_timer;
 	if ((s64) steal > 0) {
 		S390_lowcore.steal_timer = 0;
-		account_steal_time(steal);
+		account_steal_time(cputime_to_nsecs(steal));
 	}
 
 	return virt_timer_forward(user + system);
diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 9ec9881..b164be9 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -86,7 +86,7 @@ extern unsigned long long task_delta_exec(struct task_struct *);
 
 extern void account_user_time(struct task_struct *, u64, u64);
 extern void account_system_time(struct task_struct *, int, cputime_t, cputime_t);
-extern void account_steal_time(cputime_t);
+extern void account_steal_time(u64);
 extern void account_idle_time(cputime_t);
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 02fd2e7..ee8a9bf 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -223,11 +223,11 @@ void account_system_time(struct task_struct *p, int hardirq_offset,
  * Account for involuntary wait time.
  * @cputime: the cpu time spent in involuntary wait
  */
-void account_steal_time(cputime_t cputime)
+void account_steal_time(u64 cputime)
 {
 	u64 *cpustat = kcpustat_this_cpu->cpustat;
 
-	cpustat[CPUTIME_STEAL] += cputime_to_nsecs(cputime);
+	cpustat[CPUTIME_STEAL] += cputime;
 }
 
 /*
@@ -250,21 +250,13 @@ static __always_inline bool steal_account_process_tick(void)
 #ifdef CONFIG_PARAVIRT
 	if (static_key_false(&paravirt_steal_enabled)) {
 		u64 steal;
-		cputime_t steal_ct;
 
 		steal = paravirt_steal_clock(smp_processor_id());
 		steal -= this_rq()->prev_steal_time;
+		this_rq()->prev_steal_time += steal;
+		account_steal_time(steal);
 
-		/*
-		 * cputime_t may be less precise than nsecs (eg: if it's
-		 * based on jiffies). Lets cast the result to cputime
-		 * granularity and account the rest on the next rounds.
-		 */
-		steal_ct = nsecs_to_cputime(steal);
-		this_rq()->prev_steal_time += cputime_to_nsecs(steal_ct);
-
-		account_steal_time(steal_ct);
-		return steal_ct;
+		return steal;
 	}
 #endif
 	return false;
@@ -489,7 +481,7 @@ void account_process_tick(struct task_struct *p, int user_tick)
  */
 void account_steal_ticks(unsigned long ticks)
 {
-	account_steal_time(jiffies_to_cputime(ticks));
+	account_steal_time(jiffies_to_nsecs(ticks));
 }
 
 /*
-- 
2.1.3



* [RFC PATCH 28/30] cputime: Push time to account_idle_time() in nanosecs
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
                   ` (26 preceding siblings ...)
  2014-11-28 18:23 ` [RFC PATCH 27/30] cputime: Push time to account_steal_time() " Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-11-28 18:23 ` [RFC PATCH 29/30] cputime: Push time to account_guest_time() " Frederic Weisbecker
  2014-11-28 18:24 ` [RFC PATCH 30/30] cputime: Push time to account_system_time() " Frederic Weisbecker
  29 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 arch/ia64/kernel/time.c     |  2 +-
 arch/powerpc/kernel/time.c  |  2 +-
 arch/s390/kernel/idle.c     |  2 +-
 include/linux/kernel_stat.h |  2 +-
 kernel/sched/cputime.c      | 10 +++++-----
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index 0518131..2f9e8f0 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -143,7 +143,7 @@ EXPORT_SYMBOL_GPL(vtime_account_system);
 
 void vtime_account_idle(struct task_struct *tsk)
 {
-	account_idle_time(vtime_delta(tsk));
+	account_idle_time(cputime_to_nsecs(vtime_delta(tsk)));
 }
 
 #endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index e44558c..cdd78a2 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -355,7 +355,7 @@ void vtime_account_idle(struct task_struct *tsk)
 	u64 delta, sys_scaled, stolen;
 
 	delta = vtime_delta(tsk, &sys_scaled, &stolen);
-	account_idle_time(delta + stolen);
+	account_idle_time(cputime_to_nsecs(delta + stolen));
 }
 
 /*
diff --git a/arch/s390/kernel/idle.c b/arch/s390/kernel/idle.c
index 9b75577..d2130ed 100644
--- a/arch/s390/kernel/idle.c
+++ b/arch/s390/kernel/idle.c
@@ -41,7 +41,7 @@ void __kprobes enabled_wait(void)
 	idle->clock_idle_enter = idle->clock_idle_exit = 0ULL;
 	idle->idle_time += idle_time;
 	idle->idle_count++;
-	account_idle_time(idle_time);
+	account_idle_time(cputime_to_nsecs(idle_time));
 	write_seqcount_end(&idle->seqcount);
 }
 
diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index b164be9..2b26786 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -87,7 +87,7 @@ extern unsigned long long task_delta_exec(struct task_struct *);
 extern void account_user_time(struct task_struct *, u64, u64);
 extern void account_system_time(struct task_struct *, int, cputime_t, cputime_t);
 extern void account_steal_time(u64);
-extern void account_idle_time(cputime_t);
+extern void account_idle_time(u64);
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 static inline void account_process_tick(struct task_struct *tsk, int user)
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index ee8a9bf..7d818bb 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -234,15 +234,15 @@ void account_steal_time(u64 cputime)
  * Account for idle time.
  * @cputime: the cpu time spent in idle wait
  */
-void account_idle_time(cputime_t cputime)
+void account_idle_time(u64 cputime)
 {
 	u64 *cpustat = kcpustat_this_cpu->cpustat;
 	struct rq *rq = this_rq();
 
 	if (atomic_read(&rq->nr_iowait) > 0)
-		cpustat[CPUTIME_IOWAIT] += cputime_to_nsecs(cputime);
+		cpustat[CPUTIME_IOWAIT] += cputime;
 	else
-		cpustat[CPUTIME_IDLE] += cputime_to_nsecs(cputime);
+		cpustat[CPUTIME_IDLE] += cputime;
 }
 
 static __always_inline bool steal_account_process_tick(void)
@@ -348,7 +348,7 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 	} else if (user_tick) {
 		account_user_time(p, nsec, nsec_scaled);
 	} else if (p == rq->idle) {
-		account_idle_time(cputime);
+		account_idle_time(nsec);
 	} else if (p->flags & PF_VCPU) { /* System time or guest time */
 		account_guest_time(p, cputime, scaled);
 	} else {
@@ -496,7 +496,7 @@ void account_idle_ticks(unsigned long ticks)
 		return;
 	}
 
-	account_idle_time(jiffies_to_cputime(ticks));
+	account_idle_time(jiffies_to_nsecs(ticks));
 }
 
 /*
-- 
2.1.3



* [RFC PATCH 29/30] cputime: Push time to account_guest_time() in nanosecs
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
                   ` (27 preceding siblings ...)
  2014-11-28 18:23 ` [RFC PATCH 28/30] cputime: Push time to account_idle_time() " Frederic Weisbecker
@ 2014-11-28 18:23 ` Frederic Weisbecker
  2014-11-28 18:24 ` [RFC PATCH 30/30] cputime: Push time to account_system_time() " Frederic Weisbecker
  29 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:23 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/sched/cputime.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 7d818bb..9d002f5 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -148,24 +148,24 @@ void account_user_time(struct task_struct *p, u64 cputime, u64 cputime_scaled)
  * @cputime: the cpu time spent in virtual machine since the last update
  * @cputime_scaled: cputime scaled by cpu frequency
  */
-static void account_guest_time(struct task_struct *p, cputime_t cputime,
-			       cputime_t cputime_scaled)
+static void account_guest_time(struct task_struct *p, u64 cputime,
+			       u64 cputime_scaled)
 {
 	u64 *cpustat = kcpustat_this_cpu->cpustat;
 
 	/* Add guest time to process. */
-	p->utime += cputime_to_nsecs(cputime);
-	p->utimescaled += cputime_to_nsecs(cputime_scaled);
-	account_group_user_time(p, cputime_to_nsecs(cputime));
-	p->gtime += cputime_to_nsecs(cputime);
+	p->utime += cputime;
+	p->utimescaled += cputime_scaled;
+	account_group_user_time(p, cputime);
+	p->gtime += cputime;
 
 	/* Add guest time to cpustat. */
 	if (task_nice(p) > 0) {
-		cpustat[CPUTIME_NICE] += cputime_to_nsecs(cputime);
-		cpustat[CPUTIME_GUEST_NICE] += cputime_to_nsecs(cputime);
+		cpustat[CPUTIME_NICE] += cputime;
+		cpustat[CPUTIME_GUEST_NICE] += cputime;
 	} else {
-		cpustat[CPUTIME_USER] += cputime_to_nsecs(cputime);
-		cpustat[CPUTIME_GUEST] += cputime_to_nsecs(cputime);
+		cpustat[CPUTIME_USER] += cputime;
+		cpustat[CPUTIME_GUEST] += cputime;
 	}
 }
 
@@ -205,7 +205,7 @@ void account_system_time(struct task_struct *p, int hardirq_offset,
 	int index;
 
 	if ((p->flags & PF_VCPU) && (irq_count() - hardirq_offset == 0)) {
-		account_guest_time(p, cputime, cputime_scaled);
+		account_guest_time(p, cputime_to_nsecs(cputime), cputime_to_nsecs(cputime_scaled));
 		return;
 	}
 
@@ -350,7 +350,7 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 	} else if (p == rq->idle) {
 		account_idle_time(nsec);
 	} else if (p->flags & PF_VCPU) { /* System time or guest time */
-		account_guest_time(p, cputime, scaled);
+		account_guest_time(p, nsec, nsec_scaled);
 	} else {
 		__account_system_time(p, cputime, scaled,	CPUTIME_SYSTEM);
 	}
-- 
2.1.3



* [RFC PATCH 30/30] cputime: Push time to account_system_time() in nanosecs
  2014-11-28 18:23 [RFC PATCH 00/30] cputime: Convert task/cpu cputime accounting to nsecs Frederic Weisbecker
                   ` (28 preceding siblings ...)
  2014-11-28 18:23 ` [RFC PATCH 29/30] cputime: Push time to account_guest_time() " Frederic Weisbecker
@ 2014-11-28 18:24 ` Frederic Weisbecker
  29 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2014-11-28 18:24 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel,
	Martin Schwidefsky

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 arch/ia64/kernel/time.c     |  2 +-
 arch/powerpc/kernel/time.c  |  3 ++-
 arch/s390/kernel/vtime.c    |  7 ++++---
 include/linux/kernel_stat.h |  2 +-
 kernel/sched/cputime.c      | 22 +++++++++++-----------
 kernel/sched/stats.h        |  4 ++--
 6 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index 2f9e8f0..f01bb2a 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -135,7 +135,7 @@ static cputime_t vtime_delta(struct task_struct *tsk)
 
 void vtime_account_system(struct task_struct *tsk)
 {
-	cputime_t delta = vtime_delta(tsk);
+	u64 delta = cputime_to_nsecs(vtime_delta(tsk));
 
 	account_system_time(tsk, 0, delta, delta);
 }
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index cdd78a2..bb839fb 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -344,7 +344,8 @@ void vtime_account_system(struct task_struct *tsk)
 	u64 delta, sys_scaled, stolen;
 
 	delta = vtime_delta(tsk, &sys_scaled, &stolen);
-	account_system_time(tsk, 0, delta, sys_scaled);
+	account_system_time(tsk, 0, cputime_to_nsecs(delta),
+			    cputime_to_nsecs(sys_scaled));
 	if (stolen)
 		account_steal_time(cputime_to_nsecs(stolen));
 }
diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c
index a2dfba1..0780700 100644
--- a/arch/s390/kernel/vtime.c
+++ b/arch/s390/kernel/vtime.c
@@ -86,7 +86,7 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
 	S390_lowcore.steal_timer -= system;
 	ti->system_timer = S390_lowcore.system_timer;
 	nsecs = cputime_to_nsecs(system);
-	account_system_time(tsk, hardirq_offset, system, system);
+	account_system_time(tsk, hardirq_offset, nsecs, nsecs);
 
 	steal = S390_lowcore.steal_timer;
 	if ((s64) steal > 0) {
@@ -128,7 +128,7 @@ void vtime_account_user(struct task_struct *tsk)
 void vtime_account_irq_enter(struct task_struct *tsk)
 {
 	struct thread_info *ti = task_thread_info(tsk);
-	u64 timer, system;
+	u64 timer, system, nsecs;
 
 	WARN_ON_ONCE(!irqs_disabled());
 
@@ -139,7 +139,8 @@ void vtime_account_irq_enter(struct task_struct *tsk)
 	system = S390_lowcore.system_timer - ti->system_timer;
 	S390_lowcore.steal_timer -= system;
 	ti->system_timer = S390_lowcore.system_timer;
-	account_system_time(tsk, 0, system, system);
+	nsecs = cputime_to_nsecs(system);
+	account_system_time(tsk, 0, nsecs, nsecs);
 
 	virt_timer_forward(system);
 }
diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 2b26786..ec7d6f0 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -85,7 +85,7 @@ static inline unsigned int kstat_cpu_irqs_sum(unsigned int cpu)
 extern unsigned long long task_delta_exec(struct task_struct *);
 
 extern void account_user_time(struct task_struct *, u64, u64);
-extern void account_system_time(struct task_struct *, int, cputime_t, cputime_t);
+extern void account_system_time(struct task_struct *, int, u64, u64);
 extern void account_steal_time(u64);
 extern void account_idle_time(u64);
 
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 9d002f5..5f3ff5c 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -177,16 +177,16 @@ static void account_guest_time(struct task_struct *p, u64 cputime,
  * @target_cputime64: pointer to cpustat field that has to be updated
  */
 static inline
-void __account_system_time(struct task_struct *p, cputime_t cputime,
-			cputime_t cputime_scaled, int index)
+void __account_system_time(struct task_struct *p, u64 cputime,
+			   u64 cputime_scaled, int index)
 {
 	/* Add system time to process. */
-	p->stime += cputime_to_nsecs(cputime);
-	p->stimescaled += cputime_to_nsecs(cputime_scaled);
+	p->stime += cputime;
+	p->stimescaled += cputime_scaled;
 	account_group_system_time(p, cputime);
 
 	/* Add system time to cpustat. */
-	task_group_account_field(p, index, cputime_to_nsecs(cputime));
+	task_group_account_field(p, index, cputime);
 
 	/* Account for system time used */
 	acct_account_cputime(p);
@@ -200,12 +200,12 @@ void __account_system_time(struct task_struct *p, cputime_t cputime,
  * @cputime_scaled: cputime scaled by cpu frequency
  */
 void account_system_time(struct task_struct *p, int hardirq_offset,
-			 cputime_t cputime, cputime_t cputime_scaled)
+			 u64 cputime, u64 cputime_scaled)
 {
 	int index;
 
 	if ((p->flags & PF_VCPU) && (irq_count() - hardirq_offset == 0)) {
-		account_guest_time(p, cputime_to_nsecs(cputime), cputime_to_nsecs(cputime_scaled));
+		account_guest_time(p, cputime, cputime_scaled);
 		return;
 	}
 
@@ -344,7 +344,7 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 		 * So, we have to handle it separately here.
 		 * Also, p->stime needs to be updated for ksoftirqd.
 		 */
-		__account_system_time(p, cputime, scaled, CPUTIME_SOFTIRQ);
+		__account_system_time(p, nsec, nsec_scaled, CPUTIME_SOFTIRQ);
 	} else if (user_tick) {
 		account_user_time(p, nsec, nsec_scaled);
 	} else if (p == rq->idle) {
@@ -352,7 +352,7 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 	} else if (p->flags & PF_VCPU) { /* System time or guest time */
 		account_guest_time(p, nsec, nsec_scaled);
 	} else {
-		__account_system_time(p, cputime, scaled,	CPUTIME_SYSTEM);
+		__account_system_time(p, nsec, nsec_scaled, CPUTIME_SYSTEM);
 	}
 }
 
@@ -468,8 +468,8 @@ void account_process_tick(struct task_struct *p, int user_tick)
 	if (user_tick)
 		account_user_time(p, nsec, nsec_scaled);
 	else if ((p != rq->idle) || (irq_count() != HARDIRQ_OFFSET))
-		account_system_time(p, HARDIRQ_OFFSET, cputime_one_jiffy,
-				    one_jiffy_scaled);
+		account_system_time(p, HARDIRQ_OFFSET, nsec,
+				    nsec_scaled);
 	else
 		account_idle_time(cputime_one_jiffy);
 }
diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
index 649b38c..0c88cbe 100644
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -231,7 +231,7 @@ static inline void account_group_user_time(struct task_struct *tsk,
  * running CPU and update the stime field there.
  */
 static inline void account_group_system_time(struct task_struct *tsk,
-					     cputime_t cputime)
+					     u64 cputime)
 {
 	struct thread_group_cputimer *cputimer = &tsk->signal->cputimer;
 
@@ -239,7 +239,7 @@ static inline void account_group_system_time(struct task_struct *tsk,
 		return;
 
 	raw_spin_lock(&cputimer->lock);
-	cputimer->cputime.stime += cputime;
+	cputimer->cputime.stime += nsecs_to_cputime(cputime);
 	raw_spin_unlock(&cputimer->lock);
 }
 
-- 
2.1.3



* Re: [RFC PATCH 06/30] s390: Introduce cputime64_to_nsecs()
  2014-11-28 18:23 ` [RFC PATCH 06/30] s390: Introduce cputime64_to_nsecs() Frederic Weisbecker
@ 2014-12-01 12:24   ` Heiko Carstens
  2014-12-01 13:58     ` Martin Schwidefsky
  2014-12-01 16:23     ` Frederic Weisbecker
  0 siblings, 2 replies; 49+ messages in thread
From: Heiko Carstens @ 2014-12-01 12:24 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Tony Luck, Peter Zijlstra, Benjamin Herrenschmidt,
	Thomas Gleixner, Oleg Nesterov, Paul Mackerras, Wu Fengguang,
	Ingo Molnar, Rik van Riel, Martin Schwidefsky

On Fri, Nov 28, 2014 at 07:23:36PM +0100, Frederic Weisbecker wrote:
> This will be needed for the conversion of kernel stat to nsecs.
> 
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> ---
>  arch/s390/include/asm/cputime.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/s390/include/asm/cputime.h b/arch/s390/include/asm/cputime.h
> index 820b38a..75ba96f 100644
> --- a/arch/s390/include/asm/cputime.h
> +++ b/arch/s390/include/asm/cputime.h
> @@ -59,6 +59,11 @@ static inline cputime64_t jiffies64_to_cputime64(const u64 jif)
>  	return (__force cputime64_t)(jif * (CPUTIME_PER_SEC / HZ));
>  }
> 
> +static inline u64 cputime64_to_nsecs(cputime64_t cputime)
> +{
> +	return (__force u64)cputime * CPUTIME_PER_USEC * NSEC_PER_USEC;
> +}
> +

This is incorrect. You probably wanted to write something like

	return (__force u64)cputime / CPUTIME_PER_USEC * NSEC_PER_USEC; ?

However we would still lose a lot of precision.
The correct algorithm to convert from cputime to nanoseconds can be found in
tod_to_ns() - see arch/s390/include/asm/timex.h

And if you see that rather complex algorithm, I doubt we want to have the
changes you propose. We need to have that calculation three times for each
irq (user, system and steal time) and would still have worse precision than
we have right now. Not talking about the additional wasted cpu cycles...
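
To make the difference concrete, here is a minimal standalone sketch (not
kernel code). It assumes CPUTIME_PER_USEC is 4096, i.e. one cputime unit is
2^-12 microseconds, so the exact ratio is 125 ns per 512 units. The three
helper names are made up for illustration, and the last helper only follows
the same split-and-scale idea as tod_to_ns(), not necessarily its exact code.

```c
#include <assert.h>
#include <stdint.h>

#define CPUTIME_PER_USEC 4096ULL	/* assumption: 2^12 units per usec */
#define NSEC_PER_USEC    1000ULL

/* 1000 ns per 4096 units reduces to 125 ns per 512 units. */
static_assert(NSEC_PER_USEC * 512 == CPUTIME_PER_USEC * 125,
	      "125/512 ratio requires 4096 units per usec");

/* As posted: multiplies where it should divide. */
static inline uint64_t to_nsecs_as_posted(uint64_t cputime)
{
	return cputime * CPUTIME_PER_USEC * NSEC_PER_USEC;
}

/* Divide first: right order of magnitude, but truncates to whole usecs. */
static inline uint64_t to_nsecs_divide_first(uint64_t cputime)
{
	return cputime / CPUTIME_PER_USEC * NSEC_PER_USEC;
}

/* floor(cputime * 125 / 512), split so the multiply cannot overflow. */
static inline uint64_t to_nsecs_scaled(uint64_t cputime)
{
	return (cputime >> 9) * 125 + (((cputime & 0x1ff) * 125) >> 9);
}
```

For an input of 6144 units (1.5 usecs), the divide-first variant returns 1000
while the scaled variant returns 1500.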

But I guess Martin wanted to comment on your patches anyway ;)



* Re: [RFC PATCH 04/30] s390: Convert open coded idle time seqcount
  2014-11-28 18:23 ` [RFC PATCH 04/30] s390: Convert open coded idle time seqcount Frederic Weisbecker
@ 2014-12-01 13:46   ` Heiko Carstens
  0 siblings, 0 replies; 49+ messages in thread
From: Heiko Carstens @ 2014-12-01 13:46 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Tony Luck, Peter Zijlstra, Benjamin Herrenschmidt,
	Thomas Gleixner, Oleg Nesterov, Paul Mackerras, Wu Fengguang,
	Ingo Molnar, Rik van Riel, Martin Schwidefsky

On Fri, Nov 28, 2014 at 07:23:34PM +0100, Frederic Weisbecker wrote:
> s390 uses open coded seqcount to synchronize idle time accounting.
> Lets consolidate it with the standard API.
> 
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> ---
>  arch/s390/include/asm/idle.h |  3 ++-
>  arch/s390/kernel/idle.c      | 28 +++++++++++++++-------------
>  2 files changed, 17 insertions(+), 14 deletions(-)

This is a nice cleanup. I'll apply it to the s390 tree, so it will
go upstream independently of this discussion.

Thanks,
Heiko



* Re: [RFC PATCH 05/30] s390: Translate cputime magic constants to macros
  2014-11-28 18:23 ` [RFC PATCH 05/30] s390: Translate cputime magic constants to macros Frederic Weisbecker
@ 2014-12-01 13:47   ` Heiko Carstens
  2014-12-01 16:23     ` Frederic Weisbecker
  0 siblings, 1 reply; 49+ messages in thread
From: Heiko Carstens @ 2014-12-01 13:47 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Tony Luck, Peter Zijlstra, Benjamin Herrenschmidt,
	Thomas Gleixner, Oleg Nesterov, Paul Mackerras, Wu Fengguang,
	Ingo Molnar, Rik van Riel, Martin Schwidefsky

On Fri, Nov 28, 2014 at 07:23:35PM +0100, Frederic Weisbecker wrote:
> Make the code more self-explanatory by naming magic constants.
> 
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> ---
>  arch/s390/include/asm/cputime.h | 47 +++++++++++++++++++++--------------------
>  1 file changed, 24 insertions(+), 23 deletions(-)

Same here, nice cleanup and will be applied to the s390 tree.

Thanks,
Heiko



* Re: [RFC PATCH 06/30] s390: Introduce cputime64_to_nsecs()
  2014-12-01 12:24   ` Heiko Carstens
@ 2014-12-01 13:58     ` Martin Schwidefsky
  2014-12-01 16:23     ` Frederic Weisbecker
  1 sibling, 0 replies; 49+ messages in thread
From: Martin Schwidefsky @ 2014-12-01 13:58 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Frederic Weisbecker, LKML, Tony Luck, Peter Zijlstra,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel

On Mon, 1 Dec 2014 13:24:52 +0100
Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

> On Fri, Nov 28, 2014 at 07:23:36PM +0100, Frederic Weisbecker wrote:
> > This will be needed for the conversion of kernel stat to nsecs.
> > 
> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> > Cc: Oleg Nesterov <oleg@redhat.com>
> > Cc: Paul Mackerras <paulus@samba.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Rik van Riel <riel@redhat.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Tony Luck <tony.luck@intel.com>
> > Cc: Wu Fengguang <fengguang.wu@intel.com>
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > ---
> >  arch/s390/include/asm/cputime.h | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/arch/s390/include/asm/cputime.h b/arch/s390/include/asm/cputime.h
> > index 820b38a..75ba96f 100644
> > --- a/arch/s390/include/asm/cputime.h
> > +++ b/arch/s390/include/asm/cputime.h
> > @@ -59,6 +59,11 @@ static inline cputime64_t jiffies64_to_cputime64(const u64 jif)
> >  	return (__force cputime64_t)(jif * (CPUTIME_PER_SEC / HZ));
> >  }
> > 
> > +static inline u64 cputime64_to_nsecs(cputime64_t cputime)
> > +{
> > +	return (__force u64)cputime * CPUTIME_PER_USEC * NSEC_PER_USEC;
> > +}
> > +
> 
> This is incorrect. You probably wanted to write something like
> 
> 	return (__force u64)cputime / CPUTIME_PER_USEC * NSEC_PER_USEC; ?
> 
> However we would still lose a lot of precision.
> The correct algorithm to convert from cputime to nanoseconds can be found in
> tod_to_ns() - see arch/s390/include/asm/timex.h
> 
> And if you see that rather complex algorithm, I doubt we want to have the
> changes you propose. We need to have that calculation three times for each
> irq (user, system and steal time) and would still have worse precision than
> we have right now. Not talking about the additional wasted cpu cycles...
> 
> But I guess Martin wanted to comment on your patches anyway ;)

The function that gets called most often is the accounting code for irq_enter
and irq_exit. Both are mapped to vtime_account_irq_enter; with a correct
implementation of cputime_to_nsecs the function gets 15 instructions longer.
The relevant code sequences:

Upstream code:
  10592e:       e3 10 02 e8 00 04       lg      %r1,744
  105934:       b2 09 f0 a0             stpt    160(%r15)
  105938:       e3 30 f0 a0 00 04       lg      %r3,160(%r15)
  10593e:       e3 10 02 d8 00 08       ag      %r1,728
  105944:       e3 30 02 e8 00 24       stg     %r3,744
  10594a:       b9 09 00 13             sgr     %r1,%r3
  10594e:       e3 10 02 d8 00 24       stg     %r1,728
  105954:       b9 04 00 41             lgr     %r4,%r1
  105958:       e3 40 c0 68 00 09       sg      %r4,104(%r12)
  10595e:       e3 30 02 e0 00 04       lg      %r3,736
  105964:       b9 09 00 34             sgr     %r3,%r4
  105968:       b9 04 00 54             lgr     %r5,%r4
  10596c:       e3 30 02 e0 00 24       stg     %r3,736
  105972:       a7 39 00 00             lghi    %r3,0
  105976:       e3 10 c0 68 00 24       stg     %r1,104(%r12)
  10597c:       b9 04 00 b4             lgr     %r11,%r4
  105980:       c0 e5 00 03 78 4c       brasl   %r14,174a18 <account_system_time

Patched code:
  105a3e:       e3 50 02 e8 00 04       lg      %r5,744
  105a44:       b2 09 f0 a0             stpt    160(%r15)
  105a48:       b9 04 00 15             lgr     %r1,%r5
  105a4c:       e3 50 f0 a0 00 04       lg      %r5,160(%r15)
  105a52:       e3 50 02 e8 00 24       stg     %r5,744
  105a58:       e3 10 02 d8 00 08       ag      %r1,728
  105a5e:       b9 e9 50 51             sgrk    %r5,%r1,%r5
  105a62:       e3 00 02 e0 00 04       lg      %r0,736
  105a68:       e3 50 02 d8 00 24       stg     %r5,728
  105a6e:       b9 04 00 15             lgr     %r1,%r5
  105a72:       e3 10 a0 68 00 09       sg      %r1,104(%r10)
  105a78:       b9 04 00 e1             lgr     %r14,%r1
  105a7c:       b9 04 00 81             lgr     %r8,%r1
  105a80:       eb 11 00 20 00 0c       srlg    %r1,%r1,32
  105a86:       ec 3e 20 bf 00 55       risbg   %r3,%r14,32,191,0
  105a8c:       eb 91 00 02 00 0d       sllg    %r9,%r1,2
  105a92:       eb c3 00 07 00 0d       sllg    %r12,%r3,7
  105a98:       eb b1 00 07 00 0d       sllg    %r11,%r1,7
  105a9e:       eb 43 00 02 00 0d       sllg    %r4,%r3,2
  105aa4:       b9 09 00 b9             sgr     %r11,%r9
  105aa8:       b9 e9 40 4c             sgrk    %r4,%r12,%r4
  105aac:       b9 08 00 43             agr     %r4,%r3
  105ab0:       b9 08 00 1b             agr     %r1,%r11
  105ab4:       b9 e9 e0 30             sgrk    %r3,%r0,%r14
  105ab8:       eb 11 00 17 00 0d       sllg    %r1,%r1,23
  105abe:       e3 30 02 e0 00 24       stg     %r3,736
  105ac4:       eb 44 00 09 00 0c       srlg    %r4,%r4,9
  105aca:       e3 50 a0 68 00 24       stg     %r5,104(%r10)
  105ad0:       b9 08 00 41             agr     %r4,%r1
  105ad4:       b9 04 00 54             lgr     %r5,%r4
  105ad8:       a7 39 00 00             lghi    %r3,0
  105adc:       c0 e5 00 03 78 02       brasl   %r14,174ae0 <account_system_time

The function is called two times for each interrupt and accounts
the system time only, which makes 2 * 15 more instructions for each
interrupt, while losing a small amount of precision. Imho not good.

The idea of cputime_t was to allow an architecture to define its preferred
format; for s390 this is a pure CPU timer delta. We do not lose *any*
precision as long as the CPU timer works correctly. From my point of view
this is a change for the worse.

On the positive side, there are some nice improvements in the patch
series. We will definitely pick up some of the patches.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* Re: [RFC PATCH 03/30] cputime: Introduce nsecs_to_cputime64()
  2014-11-28 18:23 ` [RFC PATCH 03/30] cputime: Introduce nsecs_to_cputime64() Frederic Weisbecker
@ 2014-12-01 14:05   ` Martin Schwidefsky
  0 siblings, 0 replies; 49+ messages in thread
From: Martin Schwidefsky @ 2014-12-01 14:05 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel

On Fri, 28 Nov 2014 19:23:33 +0100
Frederic Weisbecker <fweisbec@gmail.com> wrote:

> This will be needed for the conversion of kernel stat to nsecs.
> 
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> ---
>  include/linux/cputime.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/include/linux/cputime.h b/include/linux/cputime.h
> index f2eb2ee..a225ab9 100644
> --- a/include/linux/cputime.h
> +++ b/include/linux/cputime.h
> @@ -13,4 +13,14 @@
>  	usecs_to_cputime((__nsecs) / NSEC_PER_USEC)
>  #endif
> 
> +#ifndef nsecs_to_cputime
> +# define nsecs_to_cputime(__nsecs)	\
> +	usecs_to_cputime((__nsecs) / NSEC_PER_USEC)
> +#endif
> +
> +#ifndef nsecs_to_cputime64
> +# define nsecs_to_cputime64(__nsecs)	\
> +	((__force cputime64_t) nsecs_to_cputime(__nsecs))
> +#endif
> +
>  #endif /* __LINUX_CPUTIME_H */

For any architecture with a cputime_t finer than a microsecond,
the conversion to microseconds degrades the precision a lot.

I would prefer to see the compile fail for e.g. s390 instead
of silently introducing *broken* cputime values.
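
A small standalone sketch of that loss, assuming the s390 scale of 4096
cputime units per microsecond. Both helper names are hypothetical and this
is not the kernel implementation:

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_USEC    1000ULL
#define CPUTIME_PER_USEC 4096ULL	/* assumption: s390-like scale */

/* 4096 units per 1000 ns reduces to 512 units per 125 ns. */
static_assert(CPUTIME_PER_USEC * 125 == NSEC_PER_USEC * 512,
	      "512/125 ratio requires 4096 units per usec");

/* Generic fallback from the patch: round down to usecs, then scale up. */
static inline uint64_t nsecs_to_cputime_via_usecs(uint64_t nsecs)
{
	return (nsecs / NSEC_PER_USEC) * CPUTIME_PER_USEC;
}

/* Direct conversion: floor(nsecs * 512 / 125), split to avoid overflow. */
static inline uint64_t nsecs_to_cputime_direct(uint64_t nsecs)
{
	return (nsecs / 125) * 512 + ((nsecs % 125) * 512) / 125;
}
```

With 1999 ns as input, the fallback yields 4096 units (1 usec) while the
direct form yields 8187 units.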

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* Re: [RFC PATCH 07/30] cputime: Convert kcpustat to nsecs
  2014-11-28 18:23 ` [RFC PATCH 07/30] cputime: Convert kcpustat to nsecs Frederic Weisbecker
@ 2014-12-01 14:14   ` Martin Schwidefsky
  2014-12-01 16:10     ` Frederic Weisbecker
  0 siblings, 1 reply; 49+ messages in thread
From: Martin Schwidefsky @ 2014-12-01 14:14 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel

On Fri, 28 Nov 2014 19:23:37 +0100
Frederic Weisbecker <fweisbec@gmail.com> wrote:

> Kernel cpu stats are stored in cputime_t which is an architecture
> defined type, and hence a bit opaque and requiring accessors and mutators
> for any operation.
> 
> Converting them to nsecs simplifies the code a little bit.

Quite honestly I do not see much of an improvement here, one set of
functions (cputime_to_xxx) gets replaced by another (nsecs_to_xxx).

On the contrary for s390 I see a degradation, consider a hunk like
this:

> @@ -128,9 +128,9 @@ static inline u64 get_cpu_idle_time_jiffy(unsigned int cpu, u64 *wall)
> 
>  	idle_time = cur_wall_time - busy_time;
>  	if (wall)
> -		*wall = cputime_to_usecs(cur_wall_time);
> +		*wall = div_u64(cur_wall_time, NSEC_PER_USEC);
> 
> -	return cputime_to_usecs(idle_time);
> +	return div_u64(idle_time, NSEC_PER_USEC);
>  }
> 
>  u64 get_cpu_idle_time(unsigned int cpu, u64 *wall, int io_busy)

For s390 cputime_to_usecs is a shift; with the new code we now have a division.
Fortunately this is in a piece of code that s390 does not use.
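
As an illustration only, the two operations being compared look roughly like
this in plain C. The 12-bit shift assumes 4096 cputime units per microsecond,
and plain 64-bit division stands in for div_u64(). Both function names are
invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define CPUTIME_USEC_SHIFT 12		/* assumption: 2^12 units per usec */
#define CPUTIME_PER_USEC   4096ULL
#define NSEC_PER_USEC      1000ULL

static_assert((1ULL << CPUTIME_USEC_SHIFT) == CPUTIME_PER_USEC,
	      "shift and scale must describe the same unit");

/* cputime_t storage: usecs fall out of a single right shift. */
static inline uint64_t cputime_units_to_usecs(uint64_t cputime)
{
	return cputime >> CPUTIME_USEC_SHIFT;
}

/* nsecs storage: usecs need a division by 1000. */
static inline uint64_t nsecs_to_usecs(uint64_t nsecs)
{
	return nsecs / NSEC_PER_USEC;
}
```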

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* Re: [RFC PATCH 24/30] cputime: Increment kcpustat directly on irqtime account
  2014-11-28 18:23 ` [RFC PATCH 24/30] cputime: Increment kcpustat directly on irqtime account Frederic Weisbecker
@ 2014-12-01 14:41   ` Martin Schwidefsky
  2014-12-01 16:15     ` Frederic Weisbecker
  0 siblings, 1 reply; 49+ messages in thread
From: Martin Schwidefsky @ 2014-12-01 14:41 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel

On Fri, 28 Nov 2014 19:23:54 +0100
Frederic Weisbecker <fweisbec@gmail.com> wrote:

> The irqtime is accounted is nsecs and stored in
> cpu_irq_time.hardirq_time and cpu_irq_time.softirq_time. Once the
> accumulated amount reaches a new jiffy, this one gets accounted to the
> kcpustat.
> 
> This was necessary when kcpustat was stored in cputime_t, which could at
> worst have a jiffies granularity. But now kcpustat is stored in nsecs
> so this whole discretization game with temporary irqtime storage has
> become unnecessary.
> 
> We can now directly account the irqtime to the kcpustat.

Isn't the issue that two different approaches to cputime accounting
get mixed here? On the one hand a cputime_t based on jiffies, and on the
other CONFIG_IRQ_TIME_ACCOUNTING, which uses sched_clock_cpu() to create
the accounting deltas.
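
For readers following along, the buffering described in the quoted changelog
can be modelled roughly as below. This is a simplified userspace model, not
the scheduler code: the struct, the function names and the HZ=100 jiffy
length are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_JIFFY 10000000ULL	/* assumption: HZ = 100 */

struct irqtime_model {
	uint64_t pending_ns;	/* irq time not yet visible in the stats */
	uint64_t stat_jiffies;	/* jiffy-granular stat (old scheme) */
	uint64_t stat_ns;	/* nanosecond stat (new scheme) */
};

/* Old scheme: buffer the delta, publish only whole jiffies. */
static inline void irqtime_account_buffered(struct irqtime_model *m,
					    uint64_t delta_ns)
{
	assert(m);
	m->pending_ns += delta_ns;
	while (m->pending_ns >= NSEC_PER_JIFFY) {
		m->pending_ns -= NSEC_PER_JIFFY;
		m->stat_jiffies++;
	}
}

/* New scheme: the stat is in nsecs, so the delta is added directly. */
static inline void irqtime_account_direct(struct irqtime_model *m,
					  uint64_t delta_ns)
{
	assert(m);
	m->stat_ns += delta_ns;
}
```

Two 6 ms deltas leave the buffered stat at one jiffy with 2 ms still pending,
while the direct stat already shows the full 12 ms.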

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* Re: [RFC PATCH 07/30] cputime: Convert kcpustat to nsecs
  2014-12-01 14:14   ` Martin Schwidefsky
@ 2014-12-01 16:10     ` Frederic Weisbecker
  2014-12-01 16:48       ` Martin Schwidefsky
  0 siblings, 1 reply; 49+ messages in thread
From: Frederic Weisbecker @ 2014-12-01 16:10 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: LKML, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel

On Mon, Dec 01, 2014 at 03:14:02PM +0100, Martin Schwidefsky wrote:
> On Fri, 28 Nov 2014 19:23:37 +0100
> Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > Kernel cpu stats are stored in cputime_t which is an architecture
> > defined type, and hence a bit opaque and requiring accessors and mutators
> > for any operation.
> > 
> > Converting them to nsecs simplifies the code a little bit.
> 
> Quite honestly I do not see much of an improvement here, on set of
> functions (cputime_to_xxx) gets replaced by another (nsecs_to_xxx).

Well it's not just that. Irqtime accounting gets simplified (no more
temporary buffer getting flushed on tick), same goes for guest accounting.
cpufreq gets one less level of conversion. Some places also lost their cputime_t
accessors due to internal nsecs use.
Plus a few other simplifications here and there that I haven't yet finished
like VIRT_CPU_ACCOUNTING_GEN that won't need cputime_t accessors anymore.

Also once the patchset is complete, we should be able to remove a significant
part of the cputime_t accessors and mutators, once only archs use them, and
only for one or two conversions (probably cputime_to_nsecs() alone would be
enough). And there are many implementations of cputime_t: jiffies, nsecs,
powerpc and s390. Expect the removal of the jiffies and nsecs based cputime_t
plus the largest part of the powerpc and s390 implementations.

> 
> On the contrary for s390 I see a degradation, consider a hunk like
> this:
> 
> > @@ -128,9 +128,9 @@ static inline u64 get_cpu_idle_time_jiffy(unsigned int cpu, u64 *wall)
> > 
> >  	idle_time = cur_wall_time - busy_time;
> >  	if (wall)
> > -		*wall = cputime_to_usecs(cur_wall_time);
> > +		*wall = div_u64(cur_wall_time, NSEC_PER_USEC);
> > 
> > -	return cputime_to_usecs(idle_time);
> > +	return div_u64(idle_time, NSEC_PER_USEC);
> >  }
> > 
> >  u64 get_cpu_idle_time(unsigned int cpu, u64 *wall, int io_busy)
> 
> For s390 cputime_to_usecs is a shift, with the new code we now have a division.
> Fortunately this is in a piece of code that s390 does not use..

Speaking about the degradation in s390:

s390 is really a special case. And it would be a shame if it prevented a
real core cleanup, especially as it's entirely possible to keep a specific
treatment for s390 so that neither its performance nor its time precision
is impacted. We could simply accumulate the cputime in per-cpu values:

struct s390_cputime {
       cputime_t user, sys, softirq, hardirq, steal;
};

DEFINE_PER_CPU(struct s390_cputime, s390_cputime);

Then on irq entry/exit, just add the elapsed time to the relevant buffer
and account for real (through any of the account_..._time() functions) only on
tick and task switch. That way the costly operations (unit conversion and calls
to the account_..._time() functions) are deferred to a rarer yet periodic enough
event. This is what s390 already does for user/system time on user/kernel
boundaries.

This way we should even improve the situation compared to what we have
upstream. It's going to be faster because calling the accounting functions
can be costlier than simple per-cpu ops. And we also keep the cputime_t
granularity. For archs like s390, whose granularity is finer than nsecs,
we can have:

   u64 cputime_to_nsecs(cputime_t time, u64 *rem);

And to avoid remainder losses, we can do that from the tick:

    delta_cputime = this_cpu_read(s390_cputime.hardirq);
    delta_nsec = cputime_to_nsecs(delta_cputime, &rem);
    account_system_time(delta_nsec, HARDIRQ_OFFSET);
    this_cpu_write(s390_cputime.hardirq, rem);

Although I doubt that remainders below one nsec lost on each tick matter that
much. But if they do, they can be handled as above.
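
To make the remainder idea concrete, here is a minimal user-space sketch, not
the real s390 code. It assumes one cputime unit is 1/4096 usec, so that 512
units are exactly 125 ns. All sketch_* names are made up for illustration, and
a plain struct stands in for the per-cpu buffer. Note that this variant only
converts whole 125 ns chunks, so the carried remainder stays below 125 ns
rather than below one nsec, but nothing is ever lost.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one cputime unit is 1/4096 usec, so 512 units == 125 ns. */
#define SKETCH_UNITS_PER_CHUNK 512ULL
#define SKETCH_NSEC_PER_CHUNK  125ULL

/*
 * Convert only whole 125 ns chunks. What could not be converted is
 * returned through *rem, still expressed in cputime units.
 */
static uint64_t sketch_cputime_to_nsecs(uint64_t time, uint64_t *rem)
{
	*rem = time % SKETCH_UNITS_PER_CHUNK;
	return (time / SKETCH_UNITS_PER_CHUNK) * SKETCH_NSEC_PER_CHUNK;
}

/* Stand-in for one per-cpu bucket and the nsecs based stat it feeds. */
struct sketch_bucket {
	uint64_t pending;	/* cputime units not accounted yet */
	uint64_t accounted_ns;	/* what the core code gets to see */
};

/* Cheap path, e.g. irq entry/exit: a single addition. */
static void sketch_accumulate(struct sketch_bucket *b, uint64_t delta)
{
	b->pending += delta;
}

/* Rare path, e.g. the tick: convert, account, keep the remainder. */
static void sketch_flush(struct sketch_bucket *b)
{
	uint64_t rem;

	b->accounted_ns += sketch_cputime_to_nsecs(b->pending, &rem);
	assert(rem < SKETCH_UNITS_PER_CHUNK);
	b->pending = rem;
}
```

With this, accumulating 700 units and flushing accounts 125 ns and leaves 188
units pending; those are picked up by a later flush instead of being dropped.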

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 24/30] cputime: Increment kcpustat directly on irqtime account
  2014-12-01 14:41   ` Martin Schwidefsky
@ 2014-12-01 16:15     ` Frederic Weisbecker
  2014-12-01 16:50       ` Martin Schwidefsky
  0 siblings, 1 reply; 49+ messages in thread
From: Frederic Weisbecker @ 2014-12-01 16:15 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: LKML, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel

On Mon, Dec 01, 2014 at 03:41:28PM +0100, Martin Schwidefsky wrote:
> On Fri, 28 Nov 2014 19:23:54 +0100
> Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > The irqtime is accounted is nsecs and stored in
> > cpu_irq_time.hardirq_time and cpu_irq_time.softirq_time. Once the
> > accumulated amount reaches a new jiffy, this one gets accounted to the
> > kcpustat.
> > 
> > This was necessary when kcpustat was stored in cputime_t, which could at
> > worst have a jiffies granularity. But now kcpustat is stored in nsecs
> > so this whole discretization game with temporary irqtime storage has
> > become unnecessary.
> > 
> > We can now directly account the irqtime to the kcpustat.
> 
> Isn't the issue here that two different approaches to cputime accounting
> get mixed here? On the one hand a cputime_t based on jiffies and on the
> other CONFIG_IRQ_TIME_ACCOUNTING which uses sched_clock_cpu() to create
> the accounting deltas.

There is no other way really, because cputime_t can wrap a very coarse time
unit such as jiffies. And there is no way to account irqtime with jiffies,
since an IRQ is expected to last less than 1 ms.

So irqtime is measured with a high precision, nsecs based clock and periodically
accounted as cputime_t once we have accumulated enough for one cputime_t unit.

And converting cputime_t to nsecs simplifies that.
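
A rough user-space sketch of the difference, under the assumption of HZ=1000
and with made-up sketch_* names (this is a simplification, not the code in
kernel/sched/cputime.c): with a jiffies based stat the nsecs deltas have to sit
in a temporary buffer until a whole jiffy is available, whereas with an nsecs
based stat the delta goes straight in.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for the sketch: HZ = 1000, so one jiffy is 1,000,000 ns. */
#define SKETCH_HZ		1000ULL
#define SKETCH_NSEC_PER_JIFFY	(1000000000ULL / SKETCH_HZ)

struct sketch_irqtime {
	uint64_t pending_ns;	/* temporary buffer, old scheme only */
	uint64_t stat_jiffies;	/* jiffies based stat (old scheme) */
	uint64_t stat_ns;	/* nsecs based stat (new scheme) */
};

/* Old scheme, irq exit: the delta can only be buffered. */
static void sketch_irq_exit_old(struct sketch_irqtime *it, uint64_t delta_ns)
{
	it->pending_ns += delta_ns;
}

/* Old scheme, tick: move whole jiffies from the buffer to the stat. */
static void sketch_tick_old(struct sketch_irqtime *it)
{
	while (it->pending_ns >= SKETCH_NSEC_PER_JIFFY) {
		it->pending_ns -= SKETCH_NSEC_PER_JIFFY;
		it->stat_jiffies++;
	}
	assert(it->pending_ns < SKETCH_NSEC_PER_JIFFY);
}

/* New scheme, irq exit: account directly, no buffer and no tick work. */
static void sketch_irq_exit_new(struct sketch_irqtime *it, uint64_t delta_ns)
{
	it->stat_ns += delta_ns;
}
```

Three IRQs of 400 usec each end up as 1 jiffy plus 200 usec still buffered in
the old scheme, and as exactly 1,200,000 ns in the new one.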

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 06/30] s390: Introduce cputime64_to_nsecs()
  2014-12-01 12:24   ` Heiko Carstens
  2014-12-01 13:58     ` Martin Schwidefsky
@ 2014-12-01 16:23     ` Frederic Weisbecker
  1 sibling, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2014-12-01 16:23 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: LKML, Tony Luck, Peter Zijlstra, Benjamin Herrenschmidt,
	Thomas Gleixner, Oleg Nesterov, Paul Mackerras, Wu Fengguang,
	Ingo Molnar, Rik van Riel, Martin Schwidefsky

On Mon, Dec 01, 2014 at 01:24:52PM +0100, Heiko Carstens wrote:
> On Fri, Nov 28, 2014 at 07:23:36PM +0100, Frederic Weisbecker wrote:
> > This will be needed for the conversion of kernel stat to nsecs.
> > 
> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> > Cc: Oleg Nesterov <oleg@redhat.com>
> > Cc: Paul Mackerras <paulus@samba.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Rik van Riel <riel@redhat.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Tony Luck <tony.luck@intel.com>
> > Cc: Wu Fengguang <fengguang.wu@intel.com>
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > ---
> >  arch/s390/include/asm/cputime.h | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/arch/s390/include/asm/cputime.h b/arch/s390/include/asm/cputime.h
> > index 820b38a..75ba96f 100644
> > --- a/arch/s390/include/asm/cputime.h
> > +++ b/arch/s390/include/asm/cputime.h
> > @@ -59,6 +59,11 @@ static inline cputime64_t jiffies64_to_cputime64(const u64 jif)
> >  	return (__force cputime64_t)(jif * (CPUTIME_PER_SEC / HZ));
> >  }
> > 
> > +static inline u64 cputime64_to_nsecs(cputime64_t cputime)
> > +{
> > +	return (__force u64)cputime * CPUTIME_PER_USEC * NSEC_PER_USEC;
> > +}
> > +
> 
> This is incorrect. You probably wanted to write something like
> 
> 	return (__force u64)cputime / CPUTIME_PER_USEC * NSEC_PER_USEC; ?

You're right :-)

> 
> However we would still lose a lot of precision.
> The correct algorithm to convert from cputime to nanoseconds can be found in
> tod_to_ns() - see arch/s390/include/asm/timex.h
> 
> And if you see that rather complex algorithm, I doubt we want to have the
> changes you propose. We need to have that calculation three times for each
> irq (user, system and steal time) and would still have worse precision than
> we have right now. Not talking about the additional wasted cpu cycles...

Yeah indeed. So it would probably be better to accumulate the time in cputime_t
and flush it as nsecs on tick.
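
For reference, a small user-space sketch of the precision point. It assumes
CPUTIME_PER_USEC is 4096, so the exact factor is 1000/4096 = 125/512. The
second helper follows the split-multiply shape that tod_to_ns() uses as far as
I can tell; treat it as an illustration with made-up sketch_* names rather than
a copy of the s390 code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4096 cputime units per usec, as on s390. */
#define SKETCH_CPUTIME_PER_USEC	4096ULL
#define SKETCH_NSEC_PER_USEC	1000ULL

/* Divide first, then multiply: everything below 1 usec is thrown away. */
static uint64_t sketch_to_nsecs_truncating(uint64_t cputime)
{
	return cputime / SKETCH_CPUTIME_PER_USEC * SKETCH_NSEC_PER_USEC;
}

/*
 * Multiply by 125/512, handling the upper bits and the low 9 bits
 * separately so that the multiplication by 125 does not overflow
 * early and only the sub-nsec part is rounded down.
 */
static uint64_t sketch_to_nsecs_split(uint64_t cputime)
{
	return ((cputime >> 9) * 125) + (((cputime & 0x1ff) * 125) >> 9);
}

/* How many nsecs the truncating variant loses for a given value. */
static uint64_t sketch_conversion_gap(uint64_t cputime)
{
	uint64_t coarse = sketch_to_nsecs_truncating(cputime);
	uint64_t fine = sketch_to_nsecs_split(cputime);

	assert(fine >= coarse);
	return fine - coarse;
}
```

For 2048 units (half a usec) the truncating variant returns 0 while the split
variant returns 500, which is the kind of loss that would show up on every
short interrupt.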

> 
> But I guess Martin wanted to comment on your patches anyway ;)
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 05/30] s390: Translate cputime magic constants to macros
  2014-12-01 13:47   ` Heiko Carstens
@ 2014-12-01 16:23     ` Frederic Weisbecker
  0 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2014-12-01 16:23 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: LKML, Tony Luck, Peter Zijlstra, Benjamin Herrenschmidt,
	Thomas Gleixner, Oleg Nesterov, Paul Mackerras, Wu Fengguang,
	Ingo Molnar, Rik van Riel, Martin Schwidefsky

On Mon, Dec 01, 2014 at 02:47:30PM +0100, Heiko Carstens wrote:
> On Fri, Nov 28, 2014 at 07:23:35PM +0100, Frederic Weisbecker wrote:
> > Make the code more self-explanatory by naming magic constants.
> > 
> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> > Cc: Oleg Nesterov <oleg@redhat.com>
> > Cc: Paul Mackerras <paulus@samba.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Rik van Riel <riel@redhat.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Tony Luck <tony.luck@intel.com>
> > Cc: Wu Fengguang <fengguang.wu@intel.com>
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > ---
> >  arch/s390/include/asm/cputime.h | 47 +++++++++++++++++++++--------------------
> >  1 file changed, 24 insertions(+), 23 deletions(-)
> 
> Same here, nice cleanup and will be applied to the s390 tree. Thanks.

Oh thanks a lot!

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 07/30] cputime: Convert kcpustat to nsecs
  2014-12-01 16:10     ` Frederic Weisbecker
@ 2014-12-01 16:48       ` Martin Schwidefsky
  2014-12-01 17:15         ` Thomas Gleixner
  0 siblings, 1 reply; 49+ messages in thread
From: Martin Schwidefsky @ 2014-12-01 16:48 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel

On Mon, 1 Dec 2014 17:10:34 +0100
Frederic Weisbecker <fweisbec@gmail.com> wrote:

> Speaking about the degradation in s390:
> 
> s390 is really a special case. And it would be a shame if we prevent from a
> real core cleanup just for this special case especially as it's fairly possible
> to keep a specific treatment for s390 in order not to impact its performances
> and time precision. We could simply accumulate the cputime in per-cpu values:
> 
> struct s390_cputime {
>        cputime_t user, sys, softirq, hardirq, steal;
> }
> 
> DEFINE_PER_CPU(struct s390_cputime, s390_cputime);
> 
> Then on irq entry/exit, just add the accumulated time to the relevant buffer
> and account for real (through any account_...time() functions) only on tick
> and task switch. There the costly operations (unit conversion and call to
> account_...._time() functions) are deferred to a rarer yet periodic enough
> event. This is what s390 does already for user/system time and kernel
> boundaries.
> 
> This way we should even improve the situation compared to what we have
> upstream. It's going to be faster because calling the accounting functions
> can be costlier than simple per-cpu ops. And also we keep the cputime_t
> granularity. For archs like s390 which have a granularity higher than nsecs,
> we can have:
> 
>    u64 cputime_to_nsecs(cputime_t time, u64 *rem);
> 
> And to avoid remainder losses, we can do that from the tick:
> 
>     delta_cputime = this_cpu_read(s390_cputime.hardirq);
>     delta_nsec = cputime_to_nsecs(delta_cputime, &rem);
>     account_system_time(delta_nsec, HARDIRQ_OFFSET);
>     this_cpu_write(s390_cputime.hardirq, rem);
> 
> Although I doubt that remainders below one nsec lost each tick matter that much.
> But if it does, it's fairly possible to handle like above.
 
To make that work we would have to move some of the logic from account_system_time
to the architecture code. The decision whether a system time delta is guest time,
irq time, softirq time or simply system time is currently made in
kernel/sched/cputime.c.

As the conversion + the accounting is delayed to a regular tick, we would have
to split the accounting code into decision functions that pick the bucket a system
time delta should go to, and introduce new functions to account to the different
buckets.

Instead of a single account_system_time we would have account_guest_time,
account_system_time, account_system_time_irq and account_system_time_softirq.

In principle not a bad idea: that would make the interrupt path for s390 faster
as we would not have to call account_system_time, only the decision function,
which could be an inline function.
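
A minimal sketch of what such an inline decision function could look like. The
context flags, bucket names and sketch_* helpers are hypothetical stand-ins for
what the kernel derives from the preempt count and the task flags; this is not
the existing code in kernel/sched/cputime.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum sketch_bucket_id {
	SKETCH_SYSTEM,
	SKETCH_HARDIRQ,
	SKETCH_SOFTIRQ,
	SKETCH_GUEST,
	SKETCH_NR_BUCKETS,
};

/* Hypothetical snapshot of the context the delta was spent in. */
struct sketch_ctx {
	bool vcpu;		/* task is running a guest */
	bool in_hardirq;
	bool in_softirq;
};

/* Decision only: no conversion and no accounting call on this path. */
static inline enum sketch_bucket_id
sketch_pick_bucket(const struct sketch_ctx *ctx)
{
	if (ctx->in_hardirq)
		return SKETCH_HARDIRQ;
	if (ctx->in_softirq)
		return SKETCH_SOFTIRQ;
	if (ctx->vcpu)
		return SKETCH_GUEST;
	return SKETCH_SYSTEM;
}

/* Accumulate the delta, in arch units, into the bucket that was picked. */
static inline void sketch_accumulate(uint64_t *buckets,
				     enum sketch_bucket_id id, uint64_t delta)
{
	assert(id < SKETCH_NR_BUCKETS);
	buckets[id] += delta;
}
```

The per-bucket account functions would then run later, from the tick, over the
accumulated values.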

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 24/30] cputime: Increment kcpustat directly on irqtime account
  2014-12-01 16:15     ` Frederic Weisbecker
@ 2014-12-01 16:50       ` Martin Schwidefsky
  0 siblings, 0 replies; 49+ messages in thread
From: Martin Schwidefsky @ 2014-12-01 16:50 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Thomas Gleixner, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel

On Mon, 1 Dec 2014 17:15:45 +0100
Frederic Weisbecker <fweisbec@gmail.com> wrote:

> On Mon, Dec 01, 2014 at 03:41:28PM +0100, Martin Schwidefsky wrote:
> > On Fri, 28 Nov 2014 19:23:54 +0100
> > Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > 
> > > The irqtime is accounted is nsecs and stored in
> > > cpu_irq_time.hardirq_time and cpu_irq_time.softirq_time. Once the
> > > accumulated amount reaches a new jiffy, this one gets accounted to the
> > > kcpustat.
> > > 
> > > This was necessary when kcpustat was stored in cputime_t, which could at
> > > worst have a jiffies granularity. But now kcpustat is stored in nsecs
> > > so this whole discretization game with temporary irqtime storage has
> > > become unnecessary.
> > > 
> > > We can now directly account the irqtime to the kcpustat.
> > 
> > Isn't the issue here that two different approaches to cputime accounting
> > get mixed here? On the one hand a cputime_t based on jiffies and on the
> > other CONFIG_IRQ_TIME_ACCOUNTING which uses sched_clock_cpu() to create
> > the accounting deltas.
> 
> There is no other way really because cputime_t can wrap very low granularity
> time unit such as jiffies. And there is no way to account irqtime with jiffies
> since IRQ duration is supposed to be below 1 ms.
> 
> So irqtime is accounted with a high precision clock, nsecs based and periodically
> accounted as cputime_t once we accumulate enough for a cputime_t unit.
> 
> And turning cputime_t to nsecs simplifies that.

What would happen if cputime_t gets defined to nsecs for a jiffies based system?
The fact that a regular tick is used to create a cputime delta does not force
us to define cputime_t as a jiffies counter, no?

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 07/30] cputime: Convert kcpustat to nsecs
  2014-12-01 16:48       ` Martin Schwidefsky
@ 2014-12-01 17:15         ` Thomas Gleixner
  2014-12-01 17:27           ` Martin Schwidefsky
  2014-12-01 20:14           ` Christian Borntraeger
  0 siblings, 2 replies; 49+ messages in thread
From: Thomas Gleixner @ 2014-12-01 17:15 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Frederic Weisbecker, LKML, Tony Luck, Peter Zijlstra,
	Heiko Carstens, Benjamin Herrenschmidt, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel

On Mon, 1 Dec 2014, Martin Schwidefsky wrote:
> On Mon, 1 Dec 2014 17:10:34 +0100
> Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > Speaking about the degradation in s390:
> > 
> > s390 is really a special case. And it would be a shame if we prevent from a
> > real core cleanup just for this special case especially as it's fairly possible
> > to keep a specific treatment for s390 in order not to impact its performances
> > and time precision. We could simply accumulate the cputime in per-cpu values:
> > 
> > struct s390_cputime {
> >        cputime_t user, sys, softirq, hardirq, steal;
> > }
> > 
> > DEFINE_PER_CPU(struct s390_cputime, s390_cputime);
> > 
> > Then on irq entry/exit, just add the accumulated time to the relevant buffer
> > and account for real (through any account_...time() functions) only on tick
> > and task switch. There the costly operations (unit conversion and call to
> > account_...._time() functions) are deferred to a rarer yet periodic enough
> > event. This is what s390 does already for user/system time and kernel
> > boundaries.
> > 
> > This way we should even improve the situation compared to what we have
> > upstream. It's going to be faster because calling the accounting functions
> > can be costlier than simple per-cpu ops. And also we keep the cputime_t
> > granularity. For archs like s390 which have a granularity higher than nsecs,
> > we can have:
> > 
> >    u64 cputime_to_nsecs(cputime_t time, u64 *rem);
> > 
> > And to avoid remainder losses, we can do that from the tick:
> > 
> >     delta_cputime = this_cpu_read(s390_cputime.hardirq);
> >     delta_nsec = cputime_to_nsecs(delta_cputime, &rem);
> >     account_system_time(delta_nsec, HARDIRQ_OFFSET);
> >     this_cpu_write(s390_cputime.hardirq, rem);
> > 
> > Although I doubt that remainders below one nsec lost each tick matter that much.
> > But if it does, it's fairly possible to handle like above.
>  
> To make that work we would have to move some of the logic from account_system_time
> to the architecture code. The decision if a system time delta is guest time,
> irq time, softirq time or simply system time is currently done in 
> kernel/sched/cputime.c.
> 
> As the conversion + the accounting is delayed to a regular tick we would have
> to split the accounting code into decision functions which bucket a system time
> delta should go to and introduce new function to account to the different buckets.
> 
> Instead of a single account_system_time we would have account_guest_time,
> account_system_time, account_system_time_irq and account_system_time_softirq.
> 
> In principle not a bad idea, that would make the interrupt path for s390 faster
> as we would not have to call account_system_time, only the decision function
> which could be an inline function.

Why make this s390 specific?

We can decouple the accounting from the time accumulation for all
architectures.

struct cputime_record {
       u64 user, sys, softirq, hardirq, steal;
};

DEFINE_PER_CPU(struct cputime_record, cputime_record);

Now let account_xxx_time() just work on those per-cpu data
structures. It would just accumulate the deltas based on whatever
the architecture uses as a cputime source, with whatever resolution it
provides.

Then we collect the accumulated results for the various buckets on a
regular basis and convert them to nanoseconds. This is not even
required to happen at the tick; it could be done by some async worker and
on idle enter/exit.
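
Roughly, and only as a user-space sketch: the record is a plain struct instead
of per-cpu data, the conversion hook and every sketch_* name are hypothetical,
and the example hook assumes a clock source that counts microseconds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_stat {
	SKETCH_USER,
	SKETCH_SYS,
	SKETCH_SOFTIRQ,
	SKETCH_HARDIRQ,
	SKETCH_STEAL,
	SKETCH_NR_STATS,
};

/* Accumulated deltas, in whatever unit the architecture provides. */
struct sketch_cputime_record {
	uint64_t acc[SKETCH_NR_STATS];
};

/*
 * Hypothetical arch hook: convert arch units to nanoseconds and return
 * the part that could not be converted, in arch units, through *rem.
 */
typedef uint64_t (*sketch_to_ns_fn)(uint64_t units, uint64_t *rem);

/* Hot path: account_xxx_time() would boil down to one addition. */
static void sketch_account(struct sketch_cputime_record *rec,
			   enum sketch_stat idx, uint64_t delta)
{
	assert(idx < SKETCH_NR_STATS);
	rec->acc[idx] += delta;
}

/* Collector: could run from the tick, a worker or idle enter/exit. */
static void sketch_collect(struct sketch_cputime_record *rec,
			   uint64_t *stat_ns, sketch_to_ns_fn to_ns)
{
	assert(to_ns != NULL);
	for (int i = 0; i < SKETCH_NR_STATS; i++) {
		uint64_t rem = 0;

		stat_ns[i] += to_ns(rec->acc[i], &rem);
		rec->acc[i] = rem;	/* carry over what was not converted */
	}
}

/* Example hook for a clock source that counts microseconds. */
static uint64_t sketch_usec_to_ns(uint64_t units, uint64_t *rem)
{
	*rem = 0;
	return units * 1000;
}
```

Only the hook knows about the arch unit; the collector and everything behind
stat_ns deal in nanoseconds.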

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 07/30] cputime: Convert kcpustat to nsecs
  2014-12-01 17:15         ` Thomas Gleixner
@ 2014-12-01 17:27           ` Martin Schwidefsky
  2014-12-01 19:59             ` Frederic Weisbecker
  2014-12-01 20:14           ` Christian Borntraeger
  1 sibling, 1 reply; 49+ messages in thread
From: Martin Schwidefsky @ 2014-12-01 17:27 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Frederic Weisbecker, LKML, Tony Luck, Peter Zijlstra,
	Heiko Carstens, Benjamin Herrenschmidt, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel

On Mon, 1 Dec 2014 18:15:36 +0100 (CET)
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Mon, 1 Dec 2014, Martin Schwidefsky wrote:
> > On Mon, 1 Dec 2014 17:10:34 +0100
> > Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > 
> > > Speaking about the degradation in s390:
> > > 
> > > s390 is really a special case. And it would be a shame if we prevent from a
> > > real core cleanup just for this special case especially as it's fairly possible
> > > to keep a specific treatment for s390 in order not to impact its performances
> > > and time precision. We could simply accumulate the cputime in per-cpu values:
> > > 
> > > struct s390_cputime {
> > >        cputime_t user, sys, softirq, hardirq, steal;
> > > }
> > > 
> > > DEFINE_PER_CPU(struct s390_cputime, s390_cputime);
> > > 
> > > Then on irq entry/exit, just add the accumulated time to the relevant buffer
> > > and account for real (through any account_...time() functions) only on tick
> > > and task switch. There the costly operations (unit conversion and call to
> > > account_...._time() functions) are deferred to a rarer yet periodic enough
> > > event. This is what s390 does already for user/system time and kernel
> > > boundaries.
> > > 
> > > This way we should even improve the situation compared to what we have
> > > upstream. It's going to be faster because calling the accounting functions
> > > can be costlier than simple per-cpu ops. And also we keep the cputime_t
> > > granularity. For archs like s390 which have a granularity higher than nsecs,
> > > we can have:
> > > 
> > >    u64 cputime_to_nsecs(cputime_t time, u64 *rem);
> > > 
> > > And to avoid remainder losses, we can do that from the tick:
> > > 
> > >     delta_cputime = this_cpu_read(s390_cputime.hardirq);
> > >     delta_nsec = cputime_to_nsecs(delta_cputime, &rem);
> > >     account_system_time(delta_nsec, HARDIRQ_OFFSET);
> > >     this_cpu_write(s390_cputime.hardirq, rem);
> > > 
> > > Although I doubt that remainders below one nsec lost each tick matter that much.
> > > But if it does, it's fairly possible to handle like above.
> >  
> > To make that work we would have to move some of the logic from account_system_time
> > to the architecture code. The decision if a system time delta is guest time,
> > irq time, softirq time or simply system time is currently done in 
> > kernel/sched/cputime.c.
> > 
> > As the conversion + the accounting is delayed to a regular tick we would have
> > to split the accounting code into decision functions which bucket a system time
> > delta should go to and introduce new function to account to the different buckets.
> > 
> > Instead of a single account_system_time we would have account_guest_time,
> > account_system_time, account_system_time_irq and account_system_time_softirq.
> > 
> > In principle not a bad idea, that would make the interrupt path for s390 faster
> > as we would not have to call account_system_time, only the decision function
> > which could be an inline function.
> 
> Why make this s390 specific?
> 
> We can decouple the accounting from the time accumulation for all
> architectures.
> 
> struct cputime_record {
>        u64 user, sys, softirq, hardirq, steal;
> };
> 
> DEFINE_PER_CPU(struct cputime_record, cputime_record);
> 
> Now let account_xxx_time() just work on that per cpu data
> structures. That would just accumulate the deltas based on whatever
> the architecture uses as a cputime source with whatever resolution it
> provides.
> 
> Then we collect that accumulated results for the various buckets on a
> regular base and convert them to nano seconds. This is not even
> required to be at the tick, it could be done by some async worker and
> on idle enter/exit.

And leave the decision making in kernel/sched/cputime.c. Yes, that is good.
This would make the arch code and the account_xxx_time() functions care about
cputime_t, while all other common code would use nanoseconds. With the added
benefit that I do not have to change the low level code too much ;-)

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 07/30] cputime: Convert kcpustat to nsecs
  2014-12-01 17:27           ` Martin Schwidefsky
@ 2014-12-01 19:59             ` Frederic Weisbecker
  0 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2014-12-01 19:59 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Thomas Gleixner, LKML, Tony Luck, Peter Zijlstra, Heiko Carstens,
	Benjamin Herrenschmidt, Oleg Nesterov, Paul Mackerras,
	Wu Fengguang, Ingo Molnar, Rik van Riel

On Mon, Dec 01, 2014 at 06:27:38PM +0100, Martin Schwidefsky wrote:
> On Mon, 1 Dec 2014 18:15:36 +0100 (CET)
> Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> > On Mon, 1 Dec 2014, Martin Schwidefsky wrote:
> > > On Mon, 1 Dec 2014 17:10:34 +0100
> > > Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > > 
> > > > Speaking about the degradation in s390:
> > > > 
> > > > s390 is really a special case. And it would be a shame if we prevent from a
> > > > real core cleanup just for this special case especially as it's fairly possible
> > > > to keep a specific treatment for s390 in order not to impact its performances
> > > > and time precision. We could simply accumulate the cputime in per-cpu values:
> > > > 
> > > > struct s390_cputime {
> > > >        cputime_t user, sys, softirq, hardirq, steal;
> > > > }
> > > > 
> > > > DEFINE_PER_CPU(struct s390_cputime, s390_cputime);
> > > > 
> > > > Then on irq entry/exit, just add the accumulated time to the relevant buffer
> > > > and account for real (through any account_...time() functions) only on tick
> > > > and task switch. There the costly operations (unit conversion and call to
> > > > account_...._time() functions) are deferred to a rarer yet periodic enough
> > > > event. This is what s390 does already for user/system time and kernel
> > > > boundaries.
> > > > 
> > > > This way we should even improve the situation compared to what we have
> > > > upstream. It's going to be faster because calling the accounting functions
> > > > can be costlier than simple per-cpu ops. And also we keep the cputime_t
> > > > granularity. For archs like s390 which have a granularity higher than nsecs,
> > > > we can have:
> > > > 
> > > >    u64 cputime_to_nsecs(cputime_t time, u64 *rem);
> > > > 
> > > > And to avoid remainder losses, we can do that from the tick:
> > > > 
> > > >     delta_cputime = this_cpu_read(s390_cputime.hardirq);
> > > >     delta_nsec = cputime_to_nsecs(delta_cputime, &rem);
> > > >     account_system_time(delta_nsec, HARDIRQ_OFFSET);
> > > >     this_cpu_write(s390_cputime.hardirq, rem);
> > > > 
> > > > Although I doubt that remainders below one nsec lost each tick matter that much.
> > > > But if it does, it's fairly possible to handle like above.
> > >  
> > > To make that work we would have to move some of the logic from account_system_time
> > > to the architecture code. The decision if a system time delta is guest time,
> > > irq time, softirq time or simply system time is currently done in 
> > > kernel/sched/cputime.c.
> > > 
> > > As the conversion + the accounting is delayed to a regular tick we would have
> > > to split the accounting code into decision functions which bucket a system time
> > > delta should go to and introduce new function to account to the different buckets.
> > > 
> > > Instead of a single account_system_time we would have account_guest_time,
> > > account_system_time, account_system_time_irq and account_system_time_softirq.
> > > 
> > > In principle not a bad idea, that would make the interrupt path for s390 faster
> > > as we would not have to call account_system_time, only the decision function
> > > which could be an inline function.
> > 
> > Why make this s390 specific?
> > 
> > We can decouple the accounting from the time accumulation for all
> > architectures.
> > 
> > struct cputime_record {
> >        u64 user, sys, softirq, hardirq, steal;
> > };
> > 
> > DEFINE_PER_CPU(struct cputime_record, cputime_record);
> > 
> > Now let account_xxx_time() just work on that per cpu data
> > structures. That would just accumulate the deltas based on whatever
> > the architecture uses as a cputime source with whatever resolution it
> > provides.
> > 
> > Then we collect that accumulated results for the various buckets on a
> > regular base and convert them to nano seconds. This is not even
> > required to be at the tick, it could be done by some async worker and
> > on idle enter/exit.
> 
> And leave the decision making in kernel/sched/cputime.c. Yes, that is good.
> This would make the arch and the account_xxx_time() function care about
> cputime_t and all other common code would use nano-seconds. With the added
> benefit that I do not have to change the low level code too much ;-)

Yes that sounds really good. Besides, this whole machinery can also benefit
CONFIG_VIRT_CPU_ACCOUNTING_GEN, which is the same context entry/exit based
cputime accounting, except that it relies on context tracking.

Note the current differences between those two CONFIG_VIRT_CPU_ACCOUNTING
flavours:

_ CONFIG_VIRT_CPU_ACCOUNTING_NATIVE:
  * user/kernel boundaries: accumulate
  * irq boundaries: account
  * context_switch: account
  * tick: account (pending user time if any)
  * task_cputime(): direct access

_ CONFIG_VIRT_CPU_ACCOUNTING_GEN:
  * user/kernel boundaries: account
  * irq boundaries: account
  * context_switch: account
  * tick: ignore
  * task_cputime(): direct access + accumulate

More details about task_cputime(): tsk->[us]time are fetched through task_cputime().
In NATIVE, tsk->[us]time are periodically accounted (thanks to the tick),
so task_cputime() simply returns the fields as is.
In GEN the tick may be off, so task_cputime() returns tsk->[us]time + pending
accumulated time.

I'm recalling that because Thomas suggests that we don't _have_ to account
the accumulated time from the tick and indeed, this can be done from calls
to task_cputime() like we do for GEN. Of course this comes at the cost of
overhead on any access to the utime and stime fields of a task_struct. Thus
the pros and cons must be carefully weighed between tick overhead
and task_cputime() overhead. I'd personally be cautious and flush from
the tick, at least as a first step.

That is:

_ CONFIG_VIRT_CPU_ACCOUNTING (GEN || NATIVE)
  * user/kernel boundaries: accumulate
  * irq boundaries: accumulate
  * context_switch: account
  * tick: account on NATIVE
  * task_cputime: return accumulated on GEN

I can take care of that as a prelude to the conversion of cputime to nsecs.
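
To illustrate the task_cputime() difference between the two flavours, a tiny
user-space sketch with a made-up task structure and made-up sketch_* names
(only utime is shown, stime would work the same way):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped down stand-in for the relevant task fields. */
struct sketch_task {
	uint64_t utime_ns;		/* already accounted */
	uint64_t pending_utime_ns;	/* accumulated, not accounted yet */
};

/*
 * Reader side. When the tick flushes periodically (NATIVE), the field
 * can be returned as is. When the tick may be off (GEN), the pending
 * time has to be added on every read.
 */
static uint64_t sketch_task_utime(const struct sketch_task *t,
				  bool tick_flushes)
{
	if (tick_flushes)
		return t->utime_ns;
	return t->utime_ns + t->pending_utime_ns;
}

/* Flush side: tick on NATIVE, context switch on both flavours. */
static void sketch_flush_utime(struct sketch_task *t)
{
	assert(t != NULL);
	t->utime_ns += t->pending_utime_ns;
	t->pending_utime_ns = 0;
}
```

The extra addition on the GEN read path is the task_cputime() overhead that has
to be weighed against the flush work done from the tick.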

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 07/30] cputime: Convert kcpustat to nsecs
  2014-12-01 17:15         ` Thomas Gleixner
  2014-12-01 17:27           ` Martin Schwidefsky
@ 2014-12-01 20:14           ` Christian Borntraeger
  2014-12-01 20:21             ` Thomas Gleixner
  1 sibling, 1 reply; 49+ messages in thread
From: Christian Borntraeger @ 2014-12-01 20:14 UTC (permalink / raw)
  To: Thomas Gleixner, Martin Schwidefsky
  Cc: Frederic Weisbecker, LKML, Tony Luck, Peter Zijlstra,
	Heiko Carstens, Benjamin Herrenschmidt, Oleg Nesterov,
	Paul Mackerras, Wu Fengguang, Ingo Molnar, Rik van Riel

Am 01.12.2014 um 18:15 schrieb Thomas Gleixner:
> On Mon, 1 Dec 2014, Martin Schwidefsky wrote:
>> On Mon, 1 Dec 2014 17:10:34 +0100
>> Frederic Weisbecker <fweisbec@gmail.com> wrote:
>>
>>> Speaking about the degradation in s390:
>>>
>>> s390 is really a special case. And it would be a shame if we prevent from a
>>> real core cleanup just for this special case especially as it's fairly possible
>>> to keep a specific treatment for s390 in order not to impact its performances
>>> and time precision. We could simply accumulate the cputime in per-cpu values:
>>>
>>> struct s390_cputime {
>>>        cputime_t user, sys, softirq, hardirq, steal;
>>> }
>>>
>>> DEFINE_PER_CPU(struct s390_cputime, s390_cputime);
>>>
>>> Then on irq entry/exit, just add the accumulated time to the relevant buffer
>>> and account for real (through any account_...time() functions) only on tick
>>> and task switch. There the costly operations (unit conversion and call to
>>> account_...._time() functions) are deferred to a rarer yet periodic enough
>>> event. This is what s390 does already for user/system time and kernel
>>> boundaries.
>>>
>>> This way we should even improve the situation compared to what we have
>>> upstream. It's going to be faster because calling the accounting functions
>>> can be costlier than simple per-cpu ops. And also we keep the cputime_t
>>> granularity. For archs like s390 which have a granularity higher than nsecs,
>>> we can have:
>>>
>>>    u64 cputime_to_nsecs(cputime_t time, u64 *rem);
>>>
>>> And to avoid remainder losses, we can do that from the tick:
>>>
>>>     delta_cputime = this_cpu_read(s390_cputime.hardirq);
>>>     delta_nsec = cputime_to_nsecs(delta_cputime, &rem);
>>>     account_system_time(delta_nsec, HARDIRQ_OFFSET);
>>>     this_cpu_write(s390_cputime.hardirq, rem);
>>>
>>> Although I doubt that remainders below one nsec lost on each tick matter that
>>> much. But if they do, they can be handled as above.
>>  
>> To make that work we would have to move some of the logic from account_system_time
>> to the architecture code. The decision if a system time delta is guest time,
>> irq time, softirq time or simply system time is currently done in 
>> kernel/sched/cputime.c.
>>
>> As the conversion + the accounting is delayed to a regular tick we would have
>> to split the accounting code into decision functions that determine which
>> bucket a system time delta should go to, and introduce new functions to
>> account to the different buckets.
>>
>> Instead of a single account_system_time we would have account_guest_time,
>> account_system_time, account_system_time_irq and account_system_time_softirq.
>>
>> In principle not a bad idea, that would make the interrupt path for s390 faster
>> as we would not have to call account_system_time, only the decision function
>> which could be an inline function.
> 
> Why make this s390 specific?
> 
> We can decouple the accounting from the time accumulation for all
> architectures.
> 
> struct cputime_record {
>        u64 user, sys, softirq, hardirq, steal;
> };

Won't we need guest, nice, guest_nice as well?

> 
> DEFINE_PER_CPU(struct cputime_record, cputime_record);
> 
> Now let account_xxx_time() just work on those per-cpu data
> structures. That would just accumulate the deltas based on whatever
> the architecture uses as a cputime source with whatever resolution it
> provides.
> 
> Then we collect those accumulated results for the various buckets on a
> regular basis and convert them to nanoseconds. This is not even
> required to be at the tick, it could be done by some async worker and
> on idle enter/exit.
> 
> Thanks,
> 
> 	tglx
> 
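
The "accumulate cheaply, convert rarely" scheme discussed above can be modelled in plain userspace C. This is only a sketch under stated assumptions, not code from the patchset: it assumes a cputime unit of 1/4096 microsecond (s390-like granularity), uses an ordinary struct in place of a real per-cpu variable, and the names cputime_acc, acc_hardirq(), flush_hardirq() and cputime_to_nsecs_rem() are hypothetical. To stay lossless it converts only whole 512-unit chunks (exactly 125 ns each) and carries the rest over in cputime units, which is one possible reading of the "rem" argument proposed by Frederic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4096 cputime units per microsecond (s390-like). */
#define CPUTIME_PER_USEC 4096ULL
#define NSEC_PER_USEC    1000ULL
/* 4096 units == 1000 ns reduces to 512 units == 125 ns. */
#define CPUTIME_CHUNK    512ULL
#define NSEC_PER_CHUNK   125ULL

typedef uint64_t cputime_t;

/* Userspace stand-in for one CPU's slot of a per-cpu variable. */
struct cputime_acc {
	cputime_t hardirq;
};

/*
 * Convert whole chunks to nanoseconds and hand back the unconverted
 * part in cputime units, so the caller can keep it for the next round.
 */
static uint64_t cputime_to_nsecs_rem(cputime_t time, cputime_t *rem)
{
	assert(rem != NULL);
	*rem = time % CPUTIME_CHUNK;
	return (time / CPUTIME_CHUNK) * NSEC_PER_CHUNK;
}

/* Hot path (irq entry/exit): a single addition, no conversion. */
static void acc_hardirq(struct cputime_acc *acc, cputime_t delta)
{
	acc->hardirq += delta;
}

/*
 * Slow path (tick, task switch): returns the nanoseconds that would be
 * passed on to the core accounting and keeps the remainder accumulated.
 */
static uint64_t flush_hardirq(struct cputime_acc *acc)
{
	cputime_t rem;
	uint64_t ns = cputime_to_nsecs_rem(acc->hardirq, &rem);

	acc->hardirq = rem;
	return ns;
}
```

With this chunking, up to 511 units (just under 125 ns) can be held back until a later flush, but nothing is ever dropped. Accumulating 700 units and flushing yields 125 ns with 188 units carried over; adding 324 more units and flushing again yields another 125 ns with nothing left.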



* Re: [RFC PATCH 07/30] cputime: Convert kcpustat to nsecs
  2014-12-01 20:14           ` Christian Borntraeger
@ 2014-12-01 20:21             ` Thomas Gleixner
  0 siblings, 0 replies; 49+ messages in thread
From: Thomas Gleixner @ 2014-12-01 20:21 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Martin Schwidefsky, Frederic Weisbecker, LKML, Tony Luck,
	Peter Zijlstra, Heiko Carstens, Benjamin Herrenschmidt,
	Oleg Nesterov, Paul Mackerras, Wu Fengguang, Ingo Molnar,
	Rik van Riel

On Mon, 1 Dec 2014, Christian Borntraeger wrote:
> On 01.12.2014 at 18:15, Thomas Gleixner wrote:
> > Why make this s390 specific?
> > 
> > We can decouple the accounting from the time accumulation for all
> > architectures.
> > 
> > struct cputime_record {
> >        u64 user, sys, softirq, hardirq, steal;
> > };
> 
>  Won't we need guest, nice, guest_nice as well?

Yes. I just took the list from Frederic's example w/o staring into
the code.
 
Thanks,

	tglx
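
For illustration, here is a userspace sketch of what a record with the extra buckets plus an inline "decision function" (as Martin described) might look like. Everything in it is an assumption made for the example: the enum, struct acct_ctx, system_bucket(), record_add() and record_drain() are hypothetical names, the context flags stand in for whatever the kernel would actually test, and the precedence (hardirq, then softirq, then guest, then plain system time) is a simplification of the logic in kernel/sched/cputime.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One slot per kind of time, including guest, nice and guest_nice. */
enum cputime_bucket {
	CT_USER,
	CT_NICE,
	CT_SYSTEM,
	CT_SOFTIRQ,
	CT_HARDIRQ,
	CT_STEAL,
	CT_GUEST,
	CT_GUEST_NICE,
	CT_NR_BUCKETS
};

/* Stand-in for a per-cpu record; values are in arch cputime units. */
struct cputime_record {
	uint64_t bucket[CT_NR_BUCKETS];
};

/* Hypothetical snapshot of the context a system time delta was seen in. */
struct acct_ctx {
	bool in_hardirq;
	bool in_softirq;
	bool is_guest;
	bool is_niced;
};

/* Decision function: pick the bucket only, no accounting done here. */
static inline enum cputime_bucket system_bucket(const struct acct_ctx *c)
{
	if (c->in_hardirq)
		return CT_HARDIRQ;
	if (c->in_softirq)
		return CT_SOFTIRQ;
	if (c->is_guest)
		return c->is_niced ? CT_GUEST_NICE : CT_GUEST;
	return CT_SYSTEM;
}

/* Hot path: accumulate a raw delta into the chosen bucket. */
static inline void record_add(struct cputime_record *r,
			      enum cputime_bucket b, uint64_t delta)
{
	assert((unsigned int)b < CT_NR_BUCKETS);
	r->bucket[b] += delta;
}

/* Collector (tick, async worker, idle enter/exit): take and reset. */
static inline uint64_t record_drain(struct cputime_record *r,
				    enum cputime_bucket b)
{
	uint64_t v;

	assert((unsigned int)b < CT_NR_BUCKETS);
	v = r->bucket[b];
	r->bucket[b] = 0;
	return v;
}
```

The collector would then convert each drained value to nanoseconds in one place, which keeps the conversion cost out of the interrupt path regardless of the architecture's cputime source.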
