linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/10] vtime: Delay cputime accounting to tick / context switch
@ 2017-01-05 17:11 Frederic Weisbecker
  2017-01-05 17:11 ` [PATCH 01/10] powerpc32: Fix stale scaled stime on " Frederic Weisbecker
                   ` (11 more replies)
  0 siblings, 12 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2017-01-05 17:11 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Heiko Carstens, Martin Schwidefsky, Tony Luck,
	Fenghua Yu, Peter Zijlstra, Rik van Riel, Thomas Gleixner,
	Ingo Molnar, Stanislaw Gruszka, Wanpeng Li,
	Christian Borntraeger

This version is a rebase on top of latest Linus tree which includes
the fix 8f2b468aadc ("s390/vtime: correct system time accounting").

Also a small change: I have moved account_system_index_scaled() to s390
in patch "s390/cputime: delayed accounting of system time" because it is
the only user of the function.

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	vtime/acc-v2

Thanks,
	Frederic
---

Frederic Weisbecker (9):
      powerpc32: Fix stale scaled stime on context switch
      ia64: Fix wrong start cputime assignment on task switch
      cputime: Allow accounting system time using cpustat index
      cputime: Export account_guest_time
      powerpc: Prepare accounting structure for cputime flush on tick
      powerpc: Migrate stolen_time field to accounting structure
      powerpc/vtime: Accumulate cputime and account only on tick/task switch
      ia64: Accumulate cputime and account only on tick/task switch
      vtime: Rename vtime_account_user() to vtime_flush()

Martin Schwidefsky (1):
      s390/cputime: delayed accounting of system time


 arch/ia64/include/asm/thread_info.h   |   6 ++
 arch/ia64/kernel/time.c               |  66 +++++++++++-----
 arch/powerpc/include/asm/accounting.h |  14 +++-
 arch/powerpc/include/asm/paca.h       |   1 -
 arch/powerpc/kernel/asm-offsets.c     |   8 +-
 arch/powerpc/kernel/time.c            | 138 +++++++++++++++++++++-------------
 arch/powerpc/xmon/xmon.c              |   8 +-
 arch/s390/include/asm/lowcore.h       |  65 ++++++++--------
 arch/s390/include/asm/processor.h     |   3 +
 arch/s390/kernel/vtime.c              | 138 ++++++++++++++++++++++------------
 include/linux/kernel_stat.h           |   5 +-
 include/linux/vtime.h                 |   7 +-
 kernel/sched/cputime.c                |  16 ++--
 13 files changed, 302 insertions(+), 173 deletions(-)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 01/10] powerpc32: Fix stale scaled stime on context switch
  2017-01-05 17:11 [PATCH 00/10] vtime: Delay cputime accounting to tick / context switch Frederic Weisbecker
@ 2017-01-05 17:11 ` Frederic Weisbecker
  2017-01-14 10:00   ` [tip:sched/core] sched/cputime, " tip-bot for Frederic Weisbecker
  2017-01-05 17:11 ` [PATCH 02/10] ia64: Fix wrong start cputime assignment on task switch Frederic Weisbecker
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Frederic Weisbecker @ 2017-01-05 17:11 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Heiko Carstens, Martin Schwidefsky, Tony Luck,
	Fenghua Yu, Peter Zijlstra, Rik van Riel, Thomas Gleixner,
	Ingo Molnar, Stanislaw Gruszka, Wanpeng Li,
	Christian Borntraeger

On context switch with powerpc32, the cputime is accumulated in the
thread_info struct. So the switching-in task must move forward its
start time snapshot to the current time in order to later compute the
delta spent in system mode.

This is what we do for the normal cputime by initializing the starttime
field to the value of the previous task's starttime which got freshly
updated.

But we are missing the update of the scaled cputime start time. As a
result we may be accounting too much scaled cputime later.

Fix this by initializing the scaled cputime the same way we do for
normal cputime.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 arch/powerpc/kernel/time.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index bc2e08d..ce21650 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -407,6 +407,7 @@ void arch_vtime_task_switch(struct task_struct *prev)
 	struct cpu_accounting_data *acct = get_accounting(current);
 
 	acct->starttime = get_accounting(prev)->starttime;
+	acct->startspurr = get_accounting(prev)->startspurr;
 	acct->system_time = 0;
 	acct->user_time = 0;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 02/10] ia64: Fix wrong start cputime assignment on task switch
  2017-01-05 17:11 [PATCH 00/10] vtime: Delay cputime accounting to tick / context switch Frederic Weisbecker
  2017-01-05 17:11 ` [PATCH 01/10] powerpc32: Fix stale scaled stime on " Frederic Weisbecker
@ 2017-01-05 17:11 ` Frederic Weisbecker
  2017-01-14 10:00   ` [tip:sched/core] sched/cputime, ia64: Fix incorrect " tip-bot for Frederic Weisbecker
  2017-01-05 17:11 ` [PATCH 03/10] cputime: Allow accounting system time using cpustat index Frederic Weisbecker
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Frederic Weisbecker @ 2017-01-05 17:11 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Heiko Carstens, Martin Schwidefsky, Tony Luck,
	Fenghua Yu, Peter Zijlstra, Rik van Riel, Thomas Gleixner,
	Ingo Molnar, Stanislaw Gruszka, Wanpeng Li,
	Christian Borntraeger

On task switch we must initialize the current cputime of the next task
using the value of the previous task which got freshly updated.

But we are confusing that with doing the opposite, which should result
in wrong cputime accounting.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 arch/ia64/kernel/time.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index 71775b95..637e741 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -83,7 +83,7 @@ void arch_vtime_task_switch(struct task_struct *prev)
 	struct thread_info *pi = task_thread_info(prev);
 	struct thread_info *ni = task_thread_info(current);
 
-	pi->ac_stamp = ni->ac_stamp;
+	ni->ac_stamp = pi->ac_stamp;
 	ni->ac_stime = ni->ac_utime = 0;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 03/10] cputime: Allow accounting system time using cpustat index
  2017-01-05 17:11 [PATCH 00/10] vtime: Delay cputime accounting to tick / context switch Frederic Weisbecker
  2017-01-05 17:11 ` [PATCH 01/10] powerpc32: Fix stale scaled stime on " Frederic Weisbecker
  2017-01-05 17:11 ` [PATCH 02/10] ia64: Fix wrong start cputime assignment on task switch Frederic Weisbecker
@ 2017-01-05 17:11 ` Frederic Weisbecker
  2017-01-14 10:01   ` [tip:sched/core] sched/cputime: " tip-bot for Frederic Weisbecker
  2017-01-05 17:11 ` [PATCH 04/10] cputime: Export account_guest_time Frederic Weisbecker
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Frederic Weisbecker @ 2017-01-05 17:11 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Heiko Carstens, Martin Schwidefsky, Tony Luck,
	Fenghua Yu, Peter Zijlstra, Rik van Riel, Thomas Gleixner,
	Ingo Molnar, Stanislaw Gruszka, Wanpeng Li,
	Christian Borntraeger

In order to prepare for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE to delay
cputime accounting to the tick, let's provide APIs to account system
time to precise contexts: hardirq, softirq, pure system, ...

Inspired-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/kernel_stat.h |  2 ++
 kernel/sched/cputime.c      | 10 +++++-----
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 00f7768..14b63bb 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -80,6 +80,8 @@ static inline unsigned int kstat_cpu_irqs_sum(unsigned int cpu)
 
 extern void account_user_time(struct task_struct *, cputime_t);
 extern void account_system_time(struct task_struct *, int, cputime_t);
+extern void account_system_index_time(struct task_struct *, cputime_t,
+				      enum cpu_usage_stat);
 extern void account_steal_time(cputime_t);
 extern void account_idle_time(cputime_t);
 
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 7700a9c..aad4283 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -176,8 +176,8 @@ static void account_guest_time(struct task_struct *p, cputime_t cputime)
  * @cputime: the cpu time spent in kernel space since the last update
  * @index: pointer to cpustat field that has to be updated
  */
-static inline
-void __account_system_time(struct task_struct *p, cputime_t cputime, int index)
+void account_system_index_time(struct task_struct *p,
+			       cputime_t cputime, enum cpu_usage_stat index)
 {
 	/* Add system time to process. */
 	p->stime += cputime;
@@ -213,7 +213,7 @@ void account_system_time(struct task_struct *p, int hardirq_offset,
 	else
 		index = CPUTIME_SYSTEM;
 
-	__account_system_time(p, cputime, index);
+	account_system_index_time(p, cputime, index);
 }
 
 /*
@@ -400,7 +400,7 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 		 * So, we have to handle it separately here.
 		 * Also, p->stime needs to be updated for ksoftirqd.
 		 */
-		__account_system_time(p, cputime, CPUTIME_SOFTIRQ);
+		account_system_index_time(p, cputime, CPUTIME_SOFTIRQ);
 	} else if (user_tick) {
 		account_user_time(p, cputime);
 	} else if (p == rq->idle) {
@@ -408,7 +408,7 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 	} else if (p->flags & PF_VCPU) { /* System time or guest time */
 		account_guest_time(p, cputime);
 	} else {
-		__account_system_time(p, cputime, CPUTIME_SYSTEM);
+		account_system_index_time(p, cputime, CPUTIME_SYSTEM);
 	}
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 04/10] cputime: Export account_guest_time
  2017-01-05 17:11 [PATCH 00/10] vtime: Delay cputime accounting to tick / context switch Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2017-01-05 17:11 ` [PATCH 03/10] cputime: Allow accounting system time using cpustat index Frederic Weisbecker
@ 2017-01-05 17:11 ` Frederic Weisbecker
  2017-01-14 10:01   ` [tip:sched/core] sched/cputime: Export account_guest_time() tip-bot for Frederic Weisbecker
  2017-01-05 17:11 ` [PATCH 05/10] powerpc: Prepare accounting structure for cputime flush on tick Frederic Weisbecker
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Frederic Weisbecker @ 2017-01-05 17:11 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Heiko Carstens, Martin Schwidefsky, Tony Luck,
	Fenghua Yu, Peter Zijlstra, Rik van Riel, Thomas Gleixner,
	Ingo Molnar, Stanislaw Gruszka, Wanpeng Li,
	Christian Borntraeger

In order to prepare for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE to delay
cputime accounting to the tick, let's allow archs to account cputime
directly to gtime.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/kernel_stat.h | 1 +
 kernel/sched/cputime.c      | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 14b63bb..cfd6c0c 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -79,6 +79,7 @@ static inline unsigned int kstat_cpu_irqs_sum(unsigned int cpu)
 }
 
 extern void account_user_time(struct task_struct *, cputime_t);
+extern void account_guest_time(struct task_struct *, cputime_t);
 extern void account_system_time(struct task_struct *, int, cputime_t);
 extern void account_system_index_time(struct task_struct *, cputime_t,
 				      enum cpu_usage_stat);
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index aad4283..5813ee4 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -151,7 +151,7 @@ void account_user_time(struct task_struct *p, cputime_t cputime)
  * @p: the process that the cpu time gets accounted to
  * @cputime: the cpu time spent in virtual machine since the last update
  */
-static void account_guest_time(struct task_struct *p, cputime_t cputime)
+void account_guest_time(struct task_struct *p, cputime_t cputime)
 {
 	u64 *cpustat = kcpustat_this_cpu->cpustat;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 05/10] powerpc: Prepare accounting structure for cputime flush on tick
  2017-01-05 17:11 [PATCH 00/10] vtime: Delay cputime accounting to tick / context switch Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2017-01-05 17:11 ` [PATCH 04/10] cputime: Export account_guest_time Frederic Weisbecker
@ 2017-01-05 17:11 ` Frederic Weisbecker
  2017-01-14 10:02   ` [tip:sched/core] sched/cputime, " tip-bot for Frederic Weisbecker
  2017-01-05 17:11 ` [PATCH 06/10] powerpc: Migrate stolen_time field to accounting structure Frederic Weisbecker
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Frederic Weisbecker @ 2017-01-05 17:11 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Heiko Carstens, Martin Schwidefsky, Tony Luck,
	Fenghua Yu, Peter Zijlstra, Rik van Riel, Thomas Gleixner,
	Ingo Molnar, Stanislaw Gruszka, Wanpeng Li,
	Christian Borntraeger

In order to prepare for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE to delay
cputime accounting to the tick, provide finegrained accumulators to
powerpc in order to store the cputime until flushing.

While at it, normalize the name of several fields according to common
cputime naming.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 arch/powerpc/include/asm/accounting.h | 14 +++++++++++---
 arch/powerpc/kernel/asm-offsets.c     |  8 ++++----
 arch/powerpc/kernel/time.c            | 31 ++++++++++++++++---------------
 arch/powerpc/xmon/xmon.c              |  6 +++---
 4 files changed, 34 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/include/asm/accounting.h b/arch/powerpc/include/asm/accounting.h
index c133246..3abcf98 100644
--- a/arch/powerpc/include/asm/accounting.h
+++ b/arch/powerpc/include/asm/accounting.h
@@ -12,9 +12,17 @@
 
 /* Stuff for accurate time accounting */
 struct cpu_accounting_data {
-	unsigned long user_time;	/* accumulated usermode TB ticks */
-	unsigned long system_time;	/* accumulated system TB ticks */
-	unsigned long user_time_scaled;	/* accumulated usermode SPURR ticks */
+	/* Accumulated cputime values to flush on ticks*/
+	unsigned long utime;
+	unsigned long stime;
+	unsigned long utime_scaled;
+	unsigned long stime_scaled;
+	unsigned long gtime;
+	unsigned long hardirq_time;
+	unsigned long softirq_time;
+	unsigned long steal_time;
+	unsigned long idle_time;
+	/* Internal counters */
 	unsigned long starttime;	/* TB value snapshot */
 	unsigned long starttime_user;	/* TB value on exit to usermode */
 	unsigned long startspurr;	/* SPURR value snapshot */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 0601e6a..e505319 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -252,9 +252,9 @@ int main(void)
 	DEFINE(ACCOUNT_STARTTIME_USER,
 	       offsetof(struct paca_struct, accounting.starttime_user));
 	DEFINE(ACCOUNT_USER_TIME,
-	       offsetof(struct paca_struct, accounting.user_time));
+	       offsetof(struct paca_struct, accounting.utime));
 	DEFINE(ACCOUNT_SYSTEM_TIME,
-	       offsetof(struct paca_struct, accounting.system_time));
+	       offsetof(struct paca_struct, accounting.stime));
 	DEFINE(PACA_TRAP_SAVE, offsetof(struct paca_struct, trap_save));
 	DEFINE(PACA_NAPSTATELOST, offsetof(struct paca_struct, nap_state_lost));
 	DEFINE(PACA_SPRG_VDSO, offsetof(struct paca_struct, sprg_vdso));
@@ -265,9 +265,9 @@ int main(void)
 	DEFINE(ACCOUNT_STARTTIME_USER,
 	       offsetof(struct thread_info, accounting.starttime_user));
 	DEFINE(ACCOUNT_USER_TIME,
-	       offsetof(struct thread_info, accounting.user_time));
+	       offsetof(struct thread_info, accounting.utime));
 	DEFINE(ACCOUNT_SYSTEM_TIME,
-	       offsetof(struct thread_info, accounting.system_time));
+	       offsetof(struct thread_info, accounting.stime));
 #endif
 #endif /* CONFIG_PPC64 */
 
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index ce21650..17a2cd1 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -271,8 +271,8 @@ void accumulate_stolen_time(void)
 
 	sst = scan_dispatch_log(acct->starttime_user);
 	ust = scan_dispatch_log(acct->starttime);
-	acct->system_time -= sst;
-	acct->user_time -= ust;
+	acct->stime -= sst;
+	acct->utime -= ust;
 	local_paca->stolen_time += ust + sst;
 
 	local_paca->soft_enabled = save_soft_enabled;
@@ -281,10 +281,11 @@ void accumulate_stolen_time(void)
 static inline u64 calculate_stolen_time(u64 stop_tb)
 {
 	u64 stolen = 0;
+	struct cpu_accounting_data *acct = &local_paca->accounting;
 
 	if (get_paca()->dtl_ridx != be64_to_cpu(get_lppaca()->dtl_idx)) {
 		stolen = scan_dispatch_log(stop_tb);
-		get_paca()->accounting.system_time -= stolen;
+		acct->stime -= stolen;
 	}
 
 	stolen += get_paca()->stolen_time;
@@ -316,17 +317,17 @@ static unsigned long vtime_delta(struct task_struct *tsk,
 
 	now = mftb();
 	nowscaled = read_spurr(now);
-	acct->system_time += now - acct->starttime;
+	acct->stime += now - acct->starttime;
 	acct->starttime = now;
 	deltascaled = nowscaled - acct->startspurr;
 	acct->startspurr = nowscaled;
 
 	*stolen = calculate_stolen_time(now);
 
-	delta = acct->system_time;
-	acct->system_time = 0;
-	udelta = acct->user_time - acct->utime_sspurr;
-	acct->utime_sspurr = acct->user_time;
+	delta = acct->stime;
+	acct->stime = 0;
+	udelta = acct->utime - acct->utime_sspurr;
+	acct->utime_sspurr = acct->utime;
 
 	/*
 	 * Because we don't read the SPURR on every kernel entry/exit,
@@ -348,7 +349,7 @@ static unsigned long vtime_delta(struct task_struct *tsk,
 			*sys_scaled = deltascaled;
 		}
 	}
-	acct->user_time_scaled += user_scaled;
+	acct->utime_scaled += user_scaled;
 
 	return delta;
 }
@@ -387,10 +388,10 @@ void vtime_account_user(struct task_struct *tsk)
 	cputime_t utime, utimescaled;
 	struct cpu_accounting_data *acct = get_accounting(tsk);
 
-	utime = acct->user_time;
-	utimescaled = acct->user_time_scaled;
-	acct->user_time = 0;
-	acct->user_time_scaled = 0;
+	utime = acct->utime;
+	utimescaled = acct->utime_scaled;
+	acct->utime = 0;
+	acct->utime_scaled = 0;
 	acct->utime_sspurr = 0;
 	account_user_time(tsk, utime);
 	tsk->utimescaled += utimescaled;
@@ -408,8 +409,8 @@ void arch_vtime_task_switch(struct task_struct *prev)
 
 	acct->starttime = get_accounting(prev)->starttime;
 	acct->startspurr = get_accounting(prev)->startspurr;
-	acct->system_time = 0;
-	acct->user_time = 0;
+	acct->stime = 0;
+	acct->utime = 0;
 }
 #endif /* CONFIG_PPC32 */
 
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 9c0e17c..9f3b170 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2287,9 +2287,9 @@ static void dump_one_paca(int cpu)
 	DUMP(p, subcore_sibling_mask, "x");
 #endif
 
-	DUMP(p, accounting.user_time, "llx");
-	DUMP(p, accounting.system_time, "llx");
-	DUMP(p, accounting.user_time_scaled, "llx");
+	DUMP(p, accounting.utime, "llx");
+	DUMP(p, accounting.stime, "llx");
+	DUMP(p, accounting.utime_scaled, "llx");
 	DUMP(p, accounting.starttime, "llx");
 	DUMP(p, accounting.starttime_user, "llx");
 	DUMP(p, accounting.startspurr, "llx");
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 06/10] powerpc: Migrate stolen_time field to accounting structure
  2017-01-05 17:11 [PATCH 00/10] vtime: Delay cputime accounting to tick / context switch Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2017-01-05 17:11 ` [PATCH 05/10] powerpc: Prepare accounting structure for cputime flush on tick Frederic Weisbecker
@ 2017-01-05 17:11 ` Frederic Weisbecker
  2017-01-14 10:03   ` [tip:sched/core] sched/cputime, powerpc: Migrate stolen_time field to the " tip-bot for Frederic Weisbecker
  2017-01-05 17:11 ` [PATCH 07/10] powerpc/vtime: Accumulate cputime and account only on tick/task switch Frederic Weisbecker
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Frederic Weisbecker @ 2017-01-05 17:11 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Heiko Carstens, Martin Schwidefsky, Tony Luck,
	Fenghua Yu, Peter Zijlstra, Rik van Riel, Thomas Gleixner,
	Ingo Molnar, Stanislaw Gruszka, Wanpeng Li,
	Christian Borntraeger

That in order to gather all cputime accumulation to the same place.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 arch/powerpc/include/asm/paca.h | 1 -
 arch/powerpc/kernel/time.c      | 6 +++---
 arch/powerpc/xmon/xmon.c        | 2 +-
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 6a6792b..708c3e5 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -187,7 +187,6 @@ struct paca_struct {
 
 	/* Stuff for accurate time accounting */
 	struct cpu_accounting_data accounting;
-	u64 stolen_time;		/* TB ticks taken by hypervisor */
 	u64 dtl_ridx;			/* read index in dispatch log */
 	struct dtl_entry *dtl_curr;	/* pointer corresponding to dtl_ridx */
 
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 17a2cd1..714313e 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -273,7 +273,7 @@ void accumulate_stolen_time(void)
 	ust = scan_dispatch_log(acct->starttime);
 	acct->stime -= sst;
 	acct->utime -= ust;
-	local_paca->stolen_time += ust + sst;
+	acct->steal_time += ust + sst;
 
 	local_paca->soft_enabled = save_soft_enabled;
 }
@@ -288,8 +288,8 @@ static inline u64 calculate_stolen_time(u64 stop_tb)
 		acct->stime -= stolen;
 	}
 
-	stolen += get_paca()->stolen_time;
-	get_paca()->stolen_time = 0;
+	stolen += acct->steal_time;
+	acct->steal_time = 0;
 	return stolen;
 }
 
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 9f3b170..3f864c3 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2294,7 +2294,7 @@ static void dump_one_paca(int cpu)
 	DUMP(p, accounting.starttime_user, "llx");
 	DUMP(p, accounting.startspurr, "llx");
 	DUMP(p, accounting.utime_sspurr, "llx");
-	DUMP(p, stolen_time, "llx");
+	DUMP(p, accounting.steal_time, "llx");
 #undef DUMP
 
 	catch_memory_errors = 0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 07/10] powerpc/vtime: Accumulate cputime and account only on tick/task switch
  2017-01-05 17:11 [PATCH 00/10] vtime: Delay cputime accounting to tick / context switch Frederic Weisbecker
                   ` (5 preceding siblings ...)
  2017-01-05 17:11 ` [PATCH 06/10] powerpc: Migrate stolen_time field to accounting structure Frederic Weisbecker
@ 2017-01-05 17:11 ` Frederic Weisbecker
  2017-01-14 10:03   ` [tip:sched/core] sched/cputime, " tip-bot for Frederic Weisbecker
  2017-01-05 17:11 ` [PATCH 08/10] ia64: " Frederic Weisbecker
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Frederic Weisbecker @ 2017-01-05 17:11 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Heiko Carstens, Martin Schwidefsky, Tony Luck,
	Fenghua Yu, Peter Zijlstra, Rik van Riel, Thomas Gleixner,
	Ingo Molnar, Stanislaw Gruszka, Wanpeng Li,
	Christian Borntraeger

Currently CONFIG_VIRT_CPU_ACCOUNTING_NATIVE accounts the cputime on
any context boundary: irq entry/exit, guest entry/exit, context switch,
etc...

Calling functions such as account_system_time(), account_user_time()
and such can be costly, especially if they are called on many fastpath
such as twice per IRQ. Those functions do more than just accounting to
kcpustat and task cputime. Depending on the config, some subsystems can
perform unpleasant multiplications and divisions, among other things.

So lets accumulate the cputime instead and delay the accounting on ticks
and context switches only.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 arch/powerpc/kernel/time.c | 120 +++++++++++++++++++++++++++++----------------
 1 file changed, 77 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 714313e..4255e69 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -280,17 +280,10 @@ void accumulate_stolen_time(void)
 
 static inline u64 calculate_stolen_time(u64 stop_tb)
 {
-	u64 stolen = 0;
-	struct cpu_accounting_data *acct = &local_paca->accounting;
+	if (get_paca()->dtl_ridx != be64_to_cpu(get_lppaca()->dtl_idx))
+		return scan_dispatch_log(stop_tb);
 
-	if (get_paca()->dtl_ridx != be64_to_cpu(get_lppaca()->dtl_idx)) {
-		stolen = scan_dispatch_log(stop_tb);
-		acct->stime -= stolen;
-	}
-
-	stolen += acct->steal_time;
-	acct->steal_time = 0;
-	return stolen;
+	return 0;
 }
 
 #else /* CONFIG_PPC_SPLPAR */
@@ -306,27 +299,26 @@ static inline u64 calculate_stolen_time(u64 stop_tb)
  * or soft irq state.
  */
 static unsigned long vtime_delta(struct task_struct *tsk,
-				 unsigned long *sys_scaled,
-				 unsigned long *stolen)
+				 unsigned long *stime_scaled,
+				 unsigned long *steal_time)
 {
 	unsigned long now, nowscaled, deltascaled;
-	unsigned long udelta, delta, user_scaled;
+	unsigned long stime;
+	unsigned long utime, utime_scaled;
 	struct cpu_accounting_data *acct = get_accounting(tsk);
 
 	WARN_ON_ONCE(!irqs_disabled());
 
 	now = mftb();
 	nowscaled = read_spurr(now);
-	acct->stime += now - acct->starttime;
+	stime = now - acct->starttime;
 	acct->starttime = now;
 	deltascaled = nowscaled - acct->startspurr;
 	acct->startspurr = nowscaled;
 
-	*stolen = calculate_stolen_time(now);
+	*steal_time = calculate_stolen_time(now);
 
-	delta = acct->stime;
-	acct->stime = 0;
-	udelta = acct->utime - acct->utime_sspurr;
+	utime = acct->utime - acct->utime_sspurr;
 	acct->utime_sspurr = acct->utime;
 
 	/*
@@ -339,39 +331,54 @@ static unsigned long vtime_delta(struct task_struct *tsk,
 	 * the user ticks get saved up in paca->user_time_scaled to be
 	 * used by account_process_tick.
 	 */
-	*sys_scaled = delta;
-	user_scaled = udelta;
-	if (deltascaled != delta + udelta) {
-		if (udelta) {
-			*sys_scaled = deltascaled * delta / (delta + udelta);
-			user_scaled = deltascaled - *sys_scaled;
+	*stime_scaled = stime;
+	utime_scaled = utime;
+	if (deltascaled != stime + utime) {
+		if (utime) {
+			*stime_scaled = deltascaled * stime / (stime + utime);
+			utime_scaled = deltascaled - *stime_scaled;
 		} else {
-			*sys_scaled = deltascaled;
+			*stime_scaled = deltascaled;
 		}
 	}
-	acct->utime_scaled += user_scaled;
+	acct->utime_scaled += utime_scaled;
 
-	return delta;
+	return stime;
 }
 
 void vtime_account_system(struct task_struct *tsk)
 {
-	unsigned long delta, sys_scaled, stolen;
+	unsigned long stime, stime_scaled, steal_time;
+	struct cpu_accounting_data *acct = get_accounting(tsk);
 
-	delta = vtime_delta(tsk, &sys_scaled, &stolen);
-	account_system_time(tsk, 0, delta);
-	tsk->stimescaled += sys_scaled;
-	if (stolen)
-		account_steal_time(stolen);
+	stime = vtime_delta(tsk, &stime_scaled, &steal_time);
+
+	stime -= min(stime, steal_time);
+	acct->steal_time += steal_time;
+
+	if ((tsk->flags & PF_VCPU) && !irq_count()) {
+		acct->gtime += stime;
+		acct->utime_scaled += stime_scaled;
+	} else {
+		if (hardirq_count())
+			acct->hardirq_time += stime;
+		else if (in_serving_softirq())
+			acct->softirq_time += stime;
+		else
+			acct->stime += stime;
+
+		acct->stime_scaled += stime_scaled;
+	}
 }
 EXPORT_SYMBOL_GPL(vtime_account_system);
 
 void vtime_account_idle(struct task_struct *tsk)
 {
-	unsigned long delta, sys_scaled, stolen;
+	unsigned long stime, stime_scaled, steal_time;
+	struct cpu_accounting_data *acct = get_accounting(tsk);
 
-	delta = vtime_delta(tsk, &sys_scaled, &stolen);
-	account_idle_time(delta + stolen);
+	stime = vtime_delta(tsk, &stime_scaled, &steal_time);
+	acct->idle_time += stime + steal_time;
 }
 
 /*
@@ -385,16 +392,45 @@ void vtime_account_idle(struct task_struct *tsk)
  */
 void vtime_account_user(struct task_struct *tsk)
 {
-	cputime_t utime, utimescaled;
 	struct cpu_accounting_data *acct = get_accounting(tsk);
 
-	utime = acct->utime;
-	utimescaled = acct->utime_scaled;
+	if (acct->utime)
+		account_user_time(tsk, acct->utime);
+
+	if (acct->utime_scaled)
+		tsk->utimescaled += acct->utime_scaled;
+
+	if (acct->gtime)
+		account_guest_time(tsk, acct->gtime);
+
+	if (acct->steal_time)
+		account_steal_time(acct->steal_time);
+
+	if (acct->idle_time)
+		account_idle_time(acct->idle_time);
+
+	if (acct->stime)
+		account_system_index_time(tsk, acct->stime, CPUTIME_SYSTEM);
+
+	if (acct->stime_scaled)
+		tsk->stimescaled += acct->stime_scaled;
+
+	if (acct->hardirq_time)
+		account_system_index_time(tsk, acct->hardirq_time, CPUTIME_IRQ);
+
+	if (acct->softirq_time)
+		account_system_index_time(tsk, acct->softirq_time, CPUTIME_SOFTIRQ);
+
 	acct->utime = 0;
 	acct->utime_scaled = 0;
 	acct->utime_sspurr = 0;
-	account_user_time(tsk, utime);
-	tsk->utimescaled += utimescaled;
+	acct->gtime = 0;
+	acct->steal_time = 0;
+	acct->idle_time = 0;
+	acct->stime = 0;
+	acct->stime_scaled = 0;
+	acct->hardirq_time = 0;
+	acct->softirq_time = 0;
 }
 
 #ifdef CONFIG_PPC32
@@ -409,8 +445,6 @@ void arch_vtime_task_switch(struct task_struct *prev)
 
 	acct->starttime = get_accounting(prev)->starttime;
 	acct->startspurr = get_accounting(prev)->startspurr;
-	acct->stime = 0;
-	acct->utime = 0;
 }
 #endif /* CONFIG_PPC32 */
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 08/10] ia64: Accumulate cputime and account only on tick/task switch
  2017-01-05 17:11 [PATCH 00/10] vtime: Delay cputime accounting to tick / context switch Frederic Weisbecker
                   ` (6 preceding siblings ...)
  2017-01-05 17:11 ` [PATCH 07/10] powerpc/vtime: Accumulate cputime and account only on tick/task switch Frederic Weisbecker
@ 2017-01-05 17:11 ` Frederic Weisbecker
  2017-01-14 10:04   ` [tip:sched/core] sched/cputime, " tip-bot for Frederic Weisbecker
  2017-01-05 17:11 ` [PATCH 09/10] s390/cputime: delayed accounting of system time Frederic Weisbecker
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Frederic Weisbecker @ 2017-01-05 17:11 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Heiko Carstens, Martin Schwidefsky, Tony Luck,
	Fenghua Yu, Peter Zijlstra, Rik van Riel, Thomas Gleixner,
	Ingo Molnar, Stanislaw Gruszka, Wanpeng Li,
	Christian Borntraeger

Currently CONFIG_VIRT_CPU_ACCOUNTING_NATIVE accounts the cputime on
any context boundary: irq entry/exit, guest entry/exit, context switch,
etc...

Calling functions such as account_system_time(), account_user_time()
and such can be costly, especially if they are called on many fastpath
such as twice per IRQ. Those functions do more than just accounting to
kcpustat and task cputime. Depending on the config, some subsystems can
perform unpleasant multiplications and divisions, among other things.

So lets accumulate the cputime instead and delay the accounting on ticks
and context switches only.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 arch/ia64/include/asm/thread_info.h |  6 ++++
 arch/ia64/kernel/time.c             | 60 ++++++++++++++++++++++++++++---------
 2 files changed, 52 insertions(+), 14 deletions(-)

diff --git a/arch/ia64/include/asm/thread_info.h b/arch/ia64/include/asm/thread_info.h
index c702642..8742d74 100644
--- a/arch/ia64/include/asm/thread_info.h
+++ b/arch/ia64/include/asm/thread_info.h
@@ -27,6 +27,12 @@ struct thread_info {
 	mm_segment_t addr_limit;	/* user-level address space limit */
 	int preempt_count;		/* 0=premptable, <0=BUG; will also serve as bh-counter */
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
+	__u64 utime;
+	__u64 stime;
+	__u64 gtime;
+	__u64 hardirq_time;
+	__u64 softirq_time;
+	__u64 idle_time;
 	__u64 ac_stamp;
 	__u64 ac_leave;
 	__u64 ac_stime;
diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index 637e741..37f1b31 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -63,14 +63,39 @@ extern cputime_t cycle_to_cputime(u64 cyc);
 
 void vtime_account_user(struct task_struct *tsk)
 {
-	cputime_t delta_utime;
 	struct thread_info *ti = task_thread_info(tsk);
+	cputime_t delta;
 
-	if (ti->ac_utime) {
-		delta_utime = cycle_to_cputime(ti->ac_utime);
-		account_user_time(tsk, delta_utime);
-		ti->ac_utime = 0;
+	if (ti->utime)
+		account_user_time(tsk, cycle_to_cputime(ti->utime));
+
+	if (ti->gtime)
+		account_guest_time(tsk, cycle_to_cputime(ti->gtime));
+
+	if (ti->idle_time)
+		account_idle_time(cycle_to_cputime(ti->idle_time));
+
+	if (ti->stime) {
+		delta = cycle_to_cputime(ti->stime);
+		account_system_index_time(tsk, delta, CPUTIME_SYSTEM);
+	}
+
+	if (ti->hardirq_time) {
+		delta = cycle_to_cputime(ti->hardirq_time);
+		account_system_index_time(tsk, delta, CPUTIME_IRQ);
+	}
+
+	if (ti->softirq_time) {
+		delta = cycle_to_cputime(ti->softirq_time);
+		account_system_index_time(tsk, delta, CPUTIME_SOFTIRQ);
 	}
+
+	ti->utime = 0;
+	ti->gtime = 0;
+	ti->idle_time = 0;
+	ti->stime = 0;
+	ti->hardirq_time = 0;
+	ti->softirq_time = 0;
 }
 
 /*
@@ -91,18 +116,15 @@ void arch_vtime_task_switch(struct task_struct *prev)
  * Account time for a transition between system, hard irq or soft irq state.
  * Note that this function is called with interrupts enabled.
  */
-static cputime_t vtime_delta(struct task_struct *tsk)
+static __u64 vtime_delta(struct task_struct *tsk)
 {
 	struct thread_info *ti = task_thread_info(tsk);
-	cputime_t delta_stime;
-	__u64 now;
+	__u64 now, delta_stime;
 
 	WARN_ON_ONCE(!irqs_disabled());
 
 	now = ia64_get_itc();
-
-	delta_stime = cycle_to_cputime(ti->ac_stime + (now - ti->ac_stamp));
-	ti->ac_stime = 0;
+	delta_stime = now - ti->ac_stamp;
 	ti->ac_stamp = now;
 
 	return delta_stime;
@@ -110,15 +132,25 @@ static cputime_t vtime_delta(struct task_struct *tsk)
 
 void vtime_account_system(struct task_struct *tsk)
 {
-	cputime_t delta = vtime_delta(tsk);
+	struct thread_info *ti = task_thread_info(tsk);
+	__u64 stime = vtime_delta(tsk);
 
-	account_system_time(tsk, 0, delta);
+	if ((tsk->flags & PF_VCPU) && !irq_count())
+		ti->gtime += stime;
+	else if (hardirq_count())
+		ti->hardirq_time += stime;
+	else if (in_serving_softirq())
+		ti->softirq_time += stime;
+	else
+		ti->stime += stime;
 }
 EXPORT_SYMBOL_GPL(vtime_account_system);
 
 void vtime_account_idle(struct task_struct *tsk)
 {
-	account_idle_time(vtime_delta(tsk));
+	struct thread_info *ti = task_thread_info(tsk);
+
+	ti->idle_time += vtime_delta(tsk);
 }
 
 #endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 09/10] s390/cputime: delayed accounting of system time
  2017-01-05 17:11 [PATCH 00/10] vtime: Delay cputime accounting to tick / context switch Frederic Weisbecker
                   ` (7 preceding siblings ...)
  2017-01-05 17:11 ` [PATCH 08/10] ia64: " Frederic Weisbecker
@ 2017-01-05 17:11 ` Frederic Weisbecker
  2017-01-14 10:04   ` [tip:sched/core] sched/cputime, s390: Implement " tip-bot for Martin Schwidefsky
  2017-01-05 17:11 ` [PATCH 10/10] vtime: Rename vtime_account_user() to vtime_flush() Frederic Weisbecker
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Frederic Weisbecker @ 2017-01-05 17:11 UTC (permalink / raw)
  To: LKML
  Cc: Martin Schwidefsky, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Heiko Carstens, Tony Luck, Fenghua Yu,
	Peter Zijlstra, Rik van Riel, Thomas Gleixner, Ingo Molnar,
	Stanislaw Gruszka, Wanpeng Li, Christian Borntraeger,
	Frederic Weisbecker

From: Martin Schwidefsky <schwidefsky@de.ibm.com>

The account_system_time() function is called with a cputime that
occurred while running in the kernel. The function detects which
context the CPU is currently running in and accounts the time to
the correct bucket. This forces the arch code to account the
cputime for hardirq and softirq immediately.

Such accounting function can be costly and perform unwelcome divisions
and multiplications, among others.

The arch code can delay the accounting for system time. For s390
the accounting is done once per timer tick and for each task switch.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
[rebase against latest linus tree and move account_system_index_scaled()]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 arch/s390/include/asm/lowcore.h   |  65 +++++++++---------
 arch/s390/include/asm/processor.h |   3 +
 arch/s390/kernel/vtime.c          | 134 +++++++++++++++++++++++++-------------
 3 files changed, 124 insertions(+), 78 deletions(-)

diff --git a/arch/s390/include/asm/lowcore.h b/arch/s390/include/asm/lowcore.h
index 9bfad2a..61261e0e 100644
--- a/arch/s390/include/asm/lowcore.h
+++ b/arch/s390/include/asm/lowcore.h
@@ -85,53 +85,56 @@ struct lowcore {
 	__u64	mcck_enter_timer;		/* 0x02c0 */
 	__u64	exit_timer;			/* 0x02c8 */
 	__u64	user_timer;			/* 0x02d0 */
-	__u64	system_timer;			/* 0x02d8 */
-	__u64	steal_timer;			/* 0x02e0 */
-	__u64	last_update_timer;		/* 0x02e8 */
-	__u64	last_update_clock;		/* 0x02f0 */
-	__u64	int_clock;			/* 0x02f8 */
-	__u64	mcck_clock;			/* 0x0300 */
-	__u64	clock_comparator;		/* 0x0308 */
+	__u64	guest_timer;			/* 0x02d8 */
+	__u64	system_timer;			/* 0x02e0 */
+	__u64	hardirq_timer;			/* 0x02e8 */
+	__u64	softirq_timer;			/* 0x02f0 */
+	__u64	steal_timer;			/* 0x02f8 */
+	__u64	last_update_timer;		/* 0x0300 */
+	__u64	last_update_clock;		/* 0x0308 */
+	__u64	int_clock;			/* 0x0310 */
+	__u64	mcck_clock;			/* 0x0318 */
+	__u64	clock_comparator;		/* 0x0320 */
 
 	/* Current process. */
-	__u64	current_task;			/* 0x0310 */
-	__u8	pad_0x318[0x320-0x318];		/* 0x0318 */
-	__u64	kernel_stack;			/* 0x0320 */
+	__u64	current_task;			/* 0x0328 */
+	__u8	pad_0x318[0x320-0x318];		/* 0x0330 */
+	__u64	kernel_stack;			/* 0x0338 */
 
 	/* Interrupt, panic and restart stack. */
-	__u64	async_stack;			/* 0x0328 */
-	__u64	panic_stack;			/* 0x0330 */
-	__u64	restart_stack;			/* 0x0338 */
+	__u64	async_stack;			/* 0x0340 */
+	__u64	panic_stack;			/* 0x0348 */
+	__u64	restart_stack;			/* 0x0350 */
 
 	/* Restart function and parameter. */
-	__u64	restart_fn;			/* 0x0340 */
-	__u64	restart_data;			/* 0x0348 */
-	__u64	restart_source;			/* 0x0350 */
+	__u64	restart_fn;			/* 0x0358 */
+	__u64	restart_data;			/* 0x0360 */
+	__u64	restart_source;			/* 0x0368 */
 
 	/* Address space pointer. */
-	__u64	kernel_asce;			/* 0x0358 */
-	__u64	user_asce;			/* 0x0360 */
+	__u64	kernel_asce;			/* 0x0370 */
+	__u64	user_asce;			/* 0x0378 */
 
 	/*
 	 * The lpp and current_pid fields form a
 	 * 64-bit value that is set as program
 	 * parameter with the LPP instruction.
 	 */
-	__u32	lpp;				/* 0x0368 */
-	__u32	current_pid;			/* 0x036c */
+	__u32	lpp;				/* 0x0380 */
+	__u32	current_pid;			/* 0x0384 */
 
 	/* SMP info area */
-	__u32	cpu_nr;				/* 0x0370 */
-	__u32	softirq_pending;		/* 0x0374 */
-	__u64	percpu_offset;			/* 0x0378 */
-	__u64	vdso_per_cpu_data;		/* 0x0380 */
-	__u64	machine_flags;			/* 0x0388 */
-	__u32	preempt_count;			/* 0x0390 */
-	__u8	pad_0x0394[0x0398-0x0394];	/* 0x0394 */
-	__u64	gmap;				/* 0x0398 */
-	__u32	spinlock_lockval;		/* 0x03a0 */
-	__u32	fpu_flags;			/* 0x03a4 */
-	__u8	pad_0x03a8[0x0400-0x03a8];	/* 0x03a8 */
+	__u32	cpu_nr;				/* 0x0388 */
+	__u32	softirq_pending;		/* 0x038c */
+	__u64	percpu_offset;			/* 0x0390 */
+	__u64	vdso_per_cpu_data;		/* 0x0398 */
+	__u64	machine_flags;			/* 0x03a0 */
+	__u32	preempt_count;			/* 0x03a8 */
+	__u8	pad_0x03ac[0x03b0-0x03ac];	/* 0x03ac */
+	__u64	gmap;				/* 0x03b0 */
+	__u32	spinlock_lockval;		/* 0x03b8 */
+	__u32	fpu_flags;			/* 0x03bc */
+	__u8	pad_0x03c0[0x0400-0x03c0];	/* 0x03c0 */
 
 	/* Per cpu primary space access list */
 	__u32	paste[16];			/* 0x0400 */
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 6bca916..977a5b6 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -111,7 +111,10 @@ struct thread_struct {
 	unsigned int  acrs[NUM_ACRS];
         unsigned long ksp;              /* kernel stack pointer             */
 	unsigned long user_timer;	/* task cputime in user space */
+	unsigned long guest_timer;	/* task cputime in kvm guest */
 	unsigned long system_timer;	/* task cputime in kernel space */
+	unsigned long hardirq_timer;	/* task cputime in hardirq context */
+	unsigned long softirq_timer;	/* task cputime in softirq context */
 	unsigned long sys_call_table;	/* system call table address */
 	mm_segment_t mm_segment;
 	unsigned long gmap_addr;	/* address of last gmap fault. */
diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c
index 1b5c5ee..1a53e0b 100644
--- a/arch/s390/kernel/vtime.c
+++ b/arch/s390/kernel/vtime.c
@@ -90,14 +90,41 @@ static void update_mt_scaling(void)
 	__this_cpu_write(mt_scaling_jiffies, jiffies_64);
 }
 
+static inline u64 update_tsk_timer(unsigned long *tsk_vtime, u64 new)
+{
+	u64 delta;
+
+	delta = new - *tsk_vtime;
+	*tsk_vtime = new;
+	return delta;
+}
+
+
+static inline u64 scale_vtime(u64 vtime)
+{
+	u64 mult = __this_cpu_read(mt_scaling_mult);
+	u64 div = __this_cpu_read(mt_scaling_div);
+
+	if (smp_cpu_mtid)
+		return vtime * mult / div;
+	return vtime;
+}
+
+static void account_system_index_scaled(struct task_struct *p,
+					cputime_t cputime, cputime_t scaled,
+					enum cpu_usage_stat index)
+{
+	p->stimescaled += scaled;
+	account_system_index_time(p, cputime, index);
+}
+
 /*
  * Update process times based on virtual cpu times stored by entry.S
  * to the lowcore fields user_timer, system_timer & steal_clock.
  */
 static int do_account_vtime(struct task_struct *tsk)
 {
-	u64 timer, clock, user, system, steal;
-	u64 user_scaled, system_scaled;
+	u64 timer, clock, user, guest, system, hardirq, softirq, steal;
 
 	timer = S390_lowcore.last_update_timer;
 	clock = S390_lowcore.last_update_clock;
@@ -110,36 +137,53 @@ static int do_account_vtime(struct task_struct *tsk)
 #endif
 		: "=m" (S390_lowcore.last_update_timer),
 		  "=m" (S390_lowcore.last_update_clock));
-	S390_lowcore.system_timer += timer - S390_lowcore.last_update_timer;
-	S390_lowcore.steal_timer += S390_lowcore.last_update_clock - clock;
+	clock = S390_lowcore.last_update_clock - clock;
+	timer -= S390_lowcore.last_update_timer;
+
+	if (hardirq_count())
+		S390_lowcore.hardirq_timer += timer;
+	else
+		S390_lowcore.system_timer += timer;
 
 	/* Update MT utilization calculation */
 	if (smp_cpu_mtid &&
 	    time_after64(jiffies_64, this_cpu_read(mt_scaling_jiffies)))
 		update_mt_scaling();
 
-	user = S390_lowcore.user_timer - tsk->thread.user_timer;
-	S390_lowcore.steal_timer -= user;
-	tsk->thread.user_timer = S390_lowcore.user_timer;
+	/* Calculate cputime delta */
+	user = update_tsk_timer(&tsk->thread.user_timer,
+				READ_ONCE(S390_lowcore.user_timer));
+	guest = update_tsk_timer(&tsk->thread.guest_timer,
+				 READ_ONCE(S390_lowcore.guest_timer));
+	system = update_tsk_timer(&tsk->thread.system_timer,
+				  READ_ONCE(S390_lowcore.system_timer));
+	hardirq = update_tsk_timer(&tsk->thread.hardirq_timer,
+				   READ_ONCE(S390_lowcore.hardirq_timer));
+	softirq = update_tsk_timer(&tsk->thread.softirq_timer,
+				   READ_ONCE(S390_lowcore.softirq_timer));
+	S390_lowcore.steal_timer +=
+		clock - user - guest - system - hardirq - softirq;
 
-	system = S390_lowcore.system_timer - tsk->thread.system_timer;
-	S390_lowcore.steal_timer -= system;
-	tsk->thread.system_timer = S390_lowcore.system_timer;
-
-	user_scaled = user;
-	system_scaled = system;
-	/* Do MT utilization scaling */
-	if (smp_cpu_mtid) {
-		u64 mult = __this_cpu_read(mt_scaling_mult);
-		u64 div = __this_cpu_read(mt_scaling_div);
+	/* Push account value */
+	if (user) {
+		account_user_time(tsk, user);
+		tsk->utimescaled += scale_vtime(user);
+	}
 
-		user_scaled = (user_scaled * mult) / div;
-		system_scaled = (system_scaled * mult) / div;
+	if (guest) {
+		account_guest_time(tsk, guest);
+		tsk->utimescaled += scale_vtime(guest);
 	}
-	account_user_time(tsk, user);
-	tsk->utimescaled += user_scaled;
-	account_system_time(tsk, 0, system);
-	tsk->stimescaled += system_scaled;
+
+	if (system)
+		account_system_index_scaled(tsk, system, scale_vtime(system),
+					    CPUTIME_SYSTEM);
+	if (hardirq)
+		account_system_index_scaled(tsk, hardirq, scale_vtime(hardirq),
+					    CPUTIME_IRQ);
+	if (softirq)
+		account_system_index_scaled(tsk, softirq, scale_vtime(softirq),
+					    CPUTIME_SOFTIRQ);
 
 	steal = S390_lowcore.steal_timer;
 	if ((s64) steal > 0) {
@@ -147,16 +191,22 @@ static int do_account_vtime(struct task_struct *tsk)
 		account_steal_time(steal);
 	}
 
-	return virt_timer_forward(user + system);
+	return virt_timer_forward(user + guest + system + hardirq + softirq);
 }
 
 void vtime_task_switch(struct task_struct *prev)
 {
 	do_account_vtime(prev);
 	prev->thread.user_timer = S390_lowcore.user_timer;
+	prev->thread.guest_timer = S390_lowcore.guest_timer;
 	prev->thread.system_timer = S390_lowcore.system_timer;
+	prev->thread.hardirq_timer = S390_lowcore.hardirq_timer;
+	prev->thread.softirq_timer = S390_lowcore.softirq_timer;
 	S390_lowcore.user_timer = current->thread.user_timer;
+	S390_lowcore.guest_timer = current->thread.guest_timer;
 	S390_lowcore.system_timer = current->thread.system_timer;
+	S390_lowcore.hardirq_timer = current->thread.hardirq_timer;
+	S390_lowcore.softirq_timer = current->thread.softirq_timer;
 }
 
 /*
@@ -176,32 +226,22 @@ void vtime_account_user(struct task_struct *tsk)
  */
 void vtime_account_irq_enter(struct task_struct *tsk)
 {
-	u64 timer, system, system_scaled;
+	u64 timer;
 
 	timer = S390_lowcore.last_update_timer;
 	S390_lowcore.last_update_timer = get_vtimer();
-	S390_lowcore.system_timer += timer - S390_lowcore.last_update_timer;
-
-	/* Update MT utilization calculation */
-	if (smp_cpu_mtid &&
-	    time_after64(jiffies_64, this_cpu_read(mt_scaling_jiffies)))
-		update_mt_scaling();
-
-	system = S390_lowcore.system_timer - tsk->thread.system_timer;
-	S390_lowcore.steal_timer -= system;
-	tsk->thread.system_timer = S390_lowcore.system_timer;
-	system_scaled = system;
-	/* Do MT utilization scaling */
-	if (smp_cpu_mtid) {
-		u64 mult = __this_cpu_read(mt_scaling_mult);
-		u64 div = __this_cpu_read(mt_scaling_div);
-
-		system_scaled = (system_scaled * mult) / div;
-	}
-	account_system_time(tsk, 0, system);
-	tsk->stimescaled += system_scaled;
-
-	virt_timer_forward(system);
+	timer -= S390_lowcore.last_update_timer;
+
+	if ((tsk->flags & PF_VCPU) && (irq_count() == 0))
+		S390_lowcore.guest_timer += timer;
+	else if (hardirq_count())
+		S390_lowcore.hardirq_timer += timer;
+	else if (in_serving_softirq())
+		S390_lowcore.softirq_timer += timer;
+	else
+		S390_lowcore.system_timer += timer;
+
+	virt_timer_forward(timer);
 }
 EXPORT_SYMBOL_GPL(vtime_account_irq_enter);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 10/10] vtime: Rename vtime_account_user() to vtime_flush()
  2017-01-05 17:11 [PATCH 00/10] vtime: Delay cputime accounting to tick / context switch Frederic Weisbecker
                   ` (8 preceding siblings ...)
  2017-01-05 17:11 ` [PATCH 09/10] s390/cputime: delayed accounting of system time Frederic Weisbecker
@ 2017-01-05 17:11 ` Frederic Weisbecker
  2017-01-14 10:05   ` [tip:sched/core] sched/cputime: " tip-bot for Frederic Weisbecker
  2017-01-09  8:13 ` [PATCH 00/10] vtime: Delay cputime accounting to tick / context switch Martin Schwidefsky
  2017-01-10 11:45 ` Thomas Gleixner
  11 siblings, 1 reply; 34+ messages in thread
From: Frederic Weisbecker @ 2017-01-05 17:11 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Heiko Carstens, Martin Schwidefsky, Tony Luck,
	Fenghua Yu, Peter Zijlstra, Rik van Riel, Thomas Gleixner,
	Ingo Molnar, Stanislaw Gruszka, Wanpeng Li,
	Christian Borntraeger

CONFIG_VIRT_CPU_ACCOUNTING_NATIVE used to accumulate user time and
account it on ticks and context switches only through the
vtime_account_user() function.

Now this model has been generalized on the 3 archs for all kind of
cputime (system, irq, ...) and all the cputime flushing happens under
vtime_account_user().

So let's rename this function to better reflect its new role.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 arch/ia64/kernel/time.c     | 2 +-
 arch/powerpc/kernel/time.c  | 6 ++----
 arch/s390/kernel/vtime.c    | 2 +-
 include/linux/kernel_stat.h | 2 +-
 include/linux/vtime.h       | 7 +++++--
 kernel/sched/cputime.c      | 4 +---
 6 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index 37f1b31..d040f12 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -61,7 +61,7 @@ static struct clocksource *itc_clocksource;
 
 extern cputime_t cycle_to_cputime(u64 cyc);
 
-void vtime_account_user(struct task_struct *tsk)
+void vtime_flush(struct task_struct *tsk)
 {
 	struct thread_info *ti = task_thread_info(tsk);
 	cputime_t delta;
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 4255e69..02e9730 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -382,15 +382,13 @@ void vtime_account_idle(struct task_struct *tsk)
 }
 
 /*
- * Transfer the user time accumulated in the paca
- * by the exception entry and exit code to the generic
- * process user time records.
+ * Account the whole cputime accumulated in the paca
  * Must be called with interrupts disabled.
  * Assumes that vtime_account_system/idle() has been called
  * recently (i.e. since the last entry from usermode) so that
  * get_paca()->user_time_scaled is up to date.
  */
-void vtime_account_user(struct task_struct *tsk)
+void vtime_flush(struct task_struct *tsk)
 {
 	struct cpu_accounting_data *acct = get_accounting(tsk);
 
diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c
index 1a53e0b..0a9e5d6 100644
--- a/arch/s390/kernel/vtime.c
+++ b/arch/s390/kernel/vtime.c
@@ -214,7 +214,7 @@ void vtime_task_switch(struct task_struct *prev)
  * accounting system time in order to correctly compute
  * the stolen time accounting.
  */
-void vtime_account_user(struct task_struct *tsk)
+void vtime_flush(struct task_struct *tsk)
 {
 	if (do_account_vtime(tsk))
 		virt_timer_expire();
diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index cfd6c0c..c3e38de 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -89,7 +89,7 @@ extern void account_idle_time(cputime_t);
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 static inline void account_process_tick(struct task_struct *tsk, int user)
 {
-	vtime_account_user(tsk);
+	vtime_flush(tsk);
 }
 #else
 extern void account_process_tick(struct task_struct *, int user);
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index aa9bfea..0681fe2 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -58,27 +58,28 @@ static inline void vtime_task_switch(struct task_struct *prev)
 
 extern void vtime_account_system(struct task_struct *tsk);
 extern void vtime_account_idle(struct task_struct *tsk);
-extern void vtime_account_user(struct task_struct *tsk);
 
 #else /* !CONFIG_VIRT_CPU_ACCOUNTING */
 
 static inline void vtime_task_switch(struct task_struct *prev) { }
 static inline void vtime_account_system(struct task_struct *tsk) { }
-static inline void vtime_account_user(struct task_struct *tsk) { }
 #endif /* !CONFIG_VIRT_CPU_ACCOUNTING */
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
 extern void arch_vtime_task_switch(struct task_struct *tsk);
+extern void vtime_account_user(struct task_struct *tsk);
 extern void vtime_user_enter(struct task_struct *tsk);
 
 static inline void vtime_user_exit(struct task_struct *tsk)
 {
 	vtime_account_user(tsk);
 }
+
 extern void vtime_guest_enter(struct task_struct *tsk);
 extern void vtime_guest_exit(struct task_struct *tsk);
 extern void vtime_init_idle(struct task_struct *tsk, int cpu);
 #else /* !CONFIG_VIRT_CPU_ACCOUNTING_GEN  */
+static inline void vtime_account_user(struct task_struct *tsk) { }
 static inline void vtime_user_enter(struct task_struct *tsk) { }
 static inline void vtime_user_exit(struct task_struct *tsk) { }
 static inline void vtime_guest_enter(struct task_struct *tsk) { }
@@ -93,9 +94,11 @@ static inline void vtime_account_irq_exit(struct task_struct *tsk)
 	/* On hard|softirq exit we always account to hard|softirq cputime */
 	vtime_account_system(tsk);
 }
+extern void vtime_flush(struct task_struct *tsk);
 #else /* !CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
 static inline void vtime_account_irq_enter(struct task_struct *tsk) { }
 static inline void vtime_account_irq_exit(struct task_struct *tsk) { }
+static inline void vtime_flush(struct task_struct *tsk) { }
 #endif
 
 
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 5813ee4..f7c14cc 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -437,9 +437,7 @@ void vtime_common_task_switch(struct task_struct *prev)
 	else
 		vtime_account_system(prev);
 
-#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
-	vtime_account_user(prev);
-#endif
+	vtime_flush(prev);
 	arch_vtime_task_switch(prev);
 }
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 00/10] vtime: Delay cputime accounting to tick / context switch
  2017-01-05 17:11 [PATCH 00/10] vtime: Delay cputime accounting to tick / context switch Frederic Weisbecker
                   ` (9 preceding siblings ...)
  2017-01-05 17:11 ` [PATCH 10/10] vtime: Rename vtime_account_user() to vtime_flush() Frederic Weisbecker
@ 2017-01-09  8:13 ` Martin Schwidefsky
  2017-01-10 11:45 ` Thomas Gleixner
  11 siblings, 0 replies; 34+ messages in thread
From: Martin Schwidefsky @ 2017-01-09  8:13 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Heiko Carstens, Tony Luck, Fenghua Yu, Peter Zijlstra,
	Rik van Riel, Thomas Gleixner, Ingo Molnar, Stanislaw Gruszka,
	Wanpeng Li, Christian Borntraeger

On Thu,  5 Jan 2017 18:11:40 +0100
Frederic Weisbecker <fweisbec@gmail.com> wrote:

> This version is a rebase on top of latest Linus tree which includes
> the fix 8f2b468aadc ("s390/vtime: correct system time accounting").
> 
> Also a small change: I have moved account_system_index_scaled() to s390
> in patch "s390/cputime: delayed accounting of system time" because it is
> the only user of the function.
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> 	vtime/acc-v2

That looks good, I get sensible numbers for s390. Thanks for doing this!

For the s390 parts:
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 00/10] vtime: Delay cputime accounting to tick / context switch
  2017-01-05 17:11 [PATCH 00/10] vtime: Delay cputime accounting to tick / context switch Frederic Weisbecker
                   ` (10 preceding siblings ...)
  2017-01-09  8:13 ` [PATCH 00/10] vtime: Delay cputime accounting to tick / context switch Martin Schwidefsky
@ 2017-01-10 11:45 ` Thomas Gleixner
  2017-01-10 15:21   ` Frederic Weisbecker
  11 siblings, 1 reply; 34+ messages in thread
From: Thomas Gleixner @ 2017-01-10 11:45 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Heiko Carstens, Martin Schwidefsky, Tony Luck, Fenghua Yu,
	Peter Zijlstra, Rik van Riel, Ingo Molnar, Stanislaw Gruszka,
	Wanpeng Li, Christian Borntraeger

On Thu, 5 Jan 2017, Frederic Weisbecker wrote:

> This version is a rebase on top of latest Linus tree which includes
> the fix 8f2b468aadc ("s390/vtime: correct system time accounting").
> 
> Also a small change: I have moved account_system_index_scaled() to s390
> in patch "s390/cputime: delayed accounting of system time" because it is
> the only user of the function.
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> 	vtime/acc-v2

Acked-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 00/10] vtime: Delay cputime accounting to tick / context switch
  2017-01-10 11:45 ` Thomas Gleixner
@ 2017-01-10 15:21   ` Frederic Weisbecker
  0 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2017-01-10 15:21 UTC (permalink / raw)
  To: Thomas Gleixner, Martin Schwidefsky
  Cc: LKML, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Heiko Carstens, Tony Luck, Fenghua Yu, Peter Zijlstra,
	Rik van Riel, Ingo Molnar, Stanislaw Gruszka, Wanpeng Li,
	Christian Borntraeger

On Tue, Jan 10, 2017 at 12:45:43PM +0100, Thomas Gleixner wrote:
> On Thu, 5 Jan 2017, Frederic Weisbecker wrote:
> 
> > This version is a rebase on top of latest Linus tree which includes
> > the fix 8f2b468aadc ("s390/vtime: correct system time accounting").
> > 
> > Also a small change: I have moved account_system_index_scaled() to s390
> > in patch "s390/cputime: delayed accounting of system time" because it is
> > the only user of the function.
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> > 	vtime/acc-v2
> 
> Acked-by: Thomas Gleixner <tglx@linutronix.de>

Thanks for your reviews! I added the acks on the series.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [tip:sched/core] sched/cputime, powerpc32: Fix stale scaled stime on context switch
  2017-01-05 17:11 ` [PATCH 01/10] powerpc32: Fix stale scaled stime on " Frederic Weisbecker
@ 2017-01-14 10:00   ` tip-bot for Frederic Weisbecker
  0 siblings, 0 replies; 34+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2017-01-14 10:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tony.luck, mingo, tglx, riel, sgruszka, wanpeng.li, fenghua.yu,
	fweisbec, benh, linux-kernel, schwidefsky, borntraeger,
	heiko.carstens, peterz, mpe, torvalds, hpa, paulus

Commit-ID:  90d08ba2b9b4be4aeca6a5b5a4b09fbcde30194d
Gitweb:     http://git.kernel.org/tip/90d08ba2b9b4be4aeca6a5b5a4b09fbcde30194d
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Thu, 5 Jan 2017 18:11:41 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 14 Jan 2017 09:54:11 +0100

sched/cputime, powerpc32: Fix stale scaled stime on context switch

On context switch with powerpc32, the cputime is accumulated in the
thread_info struct. So the switching-in task must move forward its
start time snapshot to the current time in order to later compute the
delta spent in system mode.

This is what we do for the normal cputime by initializing the starttime
field to the value of the previous task's starttime which got freshly
updated.

But we are missing the update of the scaled cputime start time. As a
result we may be accounting too much scaled cputime later.

Fix this by initializing the scaled cputime the same way we do for
normal cputime.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1483636310-6557-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/powerpc/kernel/time.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index bc2e08d..ce21650 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -407,6 +407,7 @@ void arch_vtime_task_switch(struct task_struct *prev)
 	struct cpu_accounting_data *acct = get_accounting(current);
 
 	acct->starttime = get_accounting(prev)->starttime;
+	acct->startspurr = get_accounting(prev)->startspurr;
 	acct->system_time = 0;
 	acct->user_time = 0;
 }

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [tip:sched/core] sched/cputime, ia64: Fix incorrect start cputime assignment on task switch
  2017-01-05 17:11 ` [PATCH 02/10] ia64: Fix wrong start cputime assignment on task switch Frederic Weisbecker
@ 2017-01-14 10:00   ` tip-bot for Frederic Weisbecker
  0 siblings, 0 replies; 34+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2017-01-14 10:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: wanpeng.li, benh, mpe, mingo, tglx, riel, fweisbec, peterz,
	sgruszka, borntraeger, linux-kernel, paulus, fenghua.yu,
	heiko.carstens, schwidefsky, tony.luck, torvalds, hpa

Commit-ID:  8388d21468e7e7656867b67ab2ec98a78c9ad799
Gitweb:     http://git.kernel.org/tip/8388d21468e7e7656867b67ab2ec98a78c9ad799
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Thu, 5 Jan 2017 18:11:42 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 14 Jan 2017 09:54:11 +0100

sched/cputime, ia64: Fix incorrect start cputime assignment on task switch

On task switch we must initialize the current cputime of the next task
using the value of the previous task which got freshly updated.

But we are confusing that with doing the opposite, which should result
in incorrect cputime accounting.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1483636310-6557-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/ia64/kernel/time.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index 71775b95..637e741 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -83,7 +83,7 @@ void arch_vtime_task_switch(struct task_struct *prev)
 	struct thread_info *pi = task_thread_info(prev);
 	struct thread_info *ni = task_thread_info(current);
 
-	pi->ac_stamp = ni->ac_stamp;
+	ni->ac_stamp = pi->ac_stamp;
 	ni->ac_stime = ni->ac_utime = 0;
 }
 

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [tip:sched/core] sched/cputime: Allow accounting system time using cpustat index
  2017-01-05 17:11 ` [PATCH 03/10] cputime: Allow accounting system time using cpustat index Frederic Weisbecker
@ 2017-01-14 10:01   ` tip-bot for Frederic Weisbecker
  0 siblings, 0 replies; 34+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2017-01-14 10:01 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: sgruszka, fenghua.yu, borntraeger, mpe, torvalds, riel,
	tony.luck, mingo, heiko.carstens, linux-kernel, schwidefsky, hpa,
	wanpeng.li, paulus, benh, tglx, peterz, fweisbec

Commit-ID:  c31cc6a5187e8b09ccee34f81728a90f80e872e7
Gitweb:     http://git.kernel.org/tip/c31cc6a5187e8b09ccee34f81728a90f80e872e7
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Thu, 5 Jan 2017 18:11:43 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 14 Jan 2017 09:54:11 +0100

sched/cputime: Allow accounting system time using cpustat index

In order to prepare for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y to delay
cputime accounting to the tick, let's provide APIs to account system
time to precise contexts: hardirq, softirq, pure system, ...

Inspired-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1483636310-6557-4-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/kernel_stat.h |  2 ++
 kernel/sched/cputime.c      | 10 +++++-----
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 00f7768..14b63bb 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -80,6 +80,8 @@ static inline unsigned int kstat_cpu_irqs_sum(unsigned int cpu)
 
 extern void account_user_time(struct task_struct *, cputime_t);
 extern void account_system_time(struct task_struct *, int, cputime_t);
+extern void account_system_index_time(struct task_struct *, cputime_t,
+				      enum cpu_usage_stat);
 extern void account_steal_time(cputime_t);
 extern void account_idle_time(cputime_t);
 
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 7700a9c..aad4283 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -176,8 +176,8 @@ static void account_guest_time(struct task_struct *p, cputime_t cputime)
  * @cputime: the cpu time spent in kernel space since the last update
  * @index: pointer to cpustat field that has to be updated
  */
-static inline
-void __account_system_time(struct task_struct *p, cputime_t cputime, int index)
+void account_system_index_time(struct task_struct *p,
+			       cputime_t cputime, enum cpu_usage_stat index)
 {
 	/* Add system time to process. */
 	p->stime += cputime;
@@ -213,7 +213,7 @@ void account_system_time(struct task_struct *p, int hardirq_offset,
 	else
 		index = CPUTIME_SYSTEM;
 
-	__account_system_time(p, cputime, index);
+	account_system_index_time(p, cputime, index);
 }
 
 /*
@@ -400,7 +400,7 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 		 * So, we have to handle it separately here.
 		 * Also, p->stime needs to be updated for ksoftirqd.
 		 */
-		__account_system_time(p, cputime, CPUTIME_SOFTIRQ);
+		account_system_index_time(p, cputime, CPUTIME_SOFTIRQ);
 	} else if (user_tick) {
 		account_user_time(p, cputime);
 	} else if (p == rq->idle) {
@@ -408,7 +408,7 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 	} else if (p->flags & PF_VCPU) { /* System time or guest time */
 		account_guest_time(p, cputime);
 	} else {
-		__account_system_time(p, cputime, CPUTIME_SYSTEM);
+		account_system_index_time(p, cputime, CPUTIME_SYSTEM);
 	}
 }
 

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [tip:sched/core] sched/cputime: Export account_guest_time()
  2017-01-05 17:11 ` [PATCH 04/10] cputime: Export account_guest_time Frederic Weisbecker
@ 2017-01-14 10:01   ` tip-bot for Frederic Weisbecker
  0 siblings, 0 replies; 34+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2017-01-14 10:01 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: heiko.carstens, hpa, torvalds, borntraeger, fenghua.yu, mingo,
	peterz, tony.luck, fweisbec, tglx, mpe, riel, sgruszka,
	schwidefsky, paulus, wanpeng.li, linux-kernel, benh

Commit-ID:  1213699ab426608ff1925ab263dd6925102bb92a
Gitweb:     http://git.kernel.org/tip/1213699ab426608ff1925ab263dd6925102bb92a
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Thu, 5 Jan 2017 18:11:44 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 14 Jan 2017 09:54:11 +0100

sched/cputime: Export account_guest_time()

In order to prepare for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y to delay
cputime accounting to the tick, let's allow archs to account cputime
directly to gtime.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1483636310-6557-5-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/kernel_stat.h | 1 +
 kernel/sched/cputime.c      | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 14b63bb..cfd6c0c 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -79,6 +79,7 @@ static inline unsigned int kstat_cpu_irqs_sum(unsigned int cpu)
 }
 
 extern void account_user_time(struct task_struct *, cputime_t);
+extern void account_guest_time(struct task_struct *, cputime_t);
 extern void account_system_time(struct task_struct *, int, cputime_t);
 extern void account_system_index_time(struct task_struct *, cputime_t,
 				      enum cpu_usage_stat);
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index aad4283..5813ee4 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -151,7 +151,7 @@ void account_user_time(struct task_struct *p, cputime_t cputime)
  * @p: the process that the cpu time gets accounted to
  * @cputime: the cpu time spent in virtual machine since the last update
  */
-static void account_guest_time(struct task_struct *p, cputime_t cputime)
+void account_guest_time(struct task_struct *p, cputime_t cputime)
 {
 	u64 *cpustat = kcpustat_this_cpu->cpustat;
 

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [tip:sched/core] sched/cputime, powerpc: Prepare accounting structure for cputime flush on tick
  2017-01-05 17:11 ` [PATCH 05/10] powerpc: Prepare accounting structure for cputime flush on tick Frederic Weisbecker
@ 2017-01-14 10:02   ` tip-bot for Frederic Weisbecker
  0 siblings, 0 replies; 34+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2017-01-14 10:02 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: schwidefsky, wanpeng.li, benh, torvalds, borntraeger, peterz,
	linux-kernel, hpa, fweisbec, tony.luck, tglx, mpe,
	heiko.carstens, mingo, paulus, fenghua.yu, riel, sgruszka

Commit-ID:  8c8b73c4811f2b5e458a7418dca07d2ef85c7db1
Gitweb:     http://git.kernel.org/tip/8c8b73c4811f2b5e458a7418dca07d2ef85c7db1
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Thu, 5 Jan 2017 18:11:45 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 14 Jan 2017 09:54:12 +0100

sched/cputime, powerpc: Prepare accounting structure for cputime flush on tick

In order to prepare for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y to delay
cputime accounting to the tick, provide finegrained accumulators to
powerpc in order to store the cputime until flushing.

While at it, normalize the name of several fields according to common
cputime naming.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1483636310-6557-6-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/powerpc/include/asm/accounting.h | 14 +++++++++++---
 arch/powerpc/kernel/asm-offsets.c     |  8 ++++----
 arch/powerpc/kernel/time.c            | 31 ++++++++++++++++---------------
 arch/powerpc/xmon/xmon.c              |  6 +++---
 4 files changed, 34 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/include/asm/accounting.h b/arch/powerpc/include/asm/accounting.h
index c133246..3abcf98 100644
--- a/arch/powerpc/include/asm/accounting.h
+++ b/arch/powerpc/include/asm/accounting.h
@@ -12,9 +12,17 @@
 
 /* Stuff for accurate time accounting */
 struct cpu_accounting_data {
-	unsigned long user_time;	/* accumulated usermode TB ticks */
-	unsigned long system_time;	/* accumulated system TB ticks */
-	unsigned long user_time_scaled;	/* accumulated usermode SPURR ticks */
+	/* Accumulated cputime values to flush on ticks*/
+	unsigned long utime;
+	unsigned long stime;
+	unsigned long utime_scaled;
+	unsigned long stime_scaled;
+	unsigned long gtime;
+	unsigned long hardirq_time;
+	unsigned long softirq_time;
+	unsigned long steal_time;
+	unsigned long idle_time;
+	/* Internal counters */
 	unsigned long starttime;	/* TB value snapshot */
 	unsigned long starttime_user;	/* TB value on exit to usermode */
 	unsigned long startspurr;	/* SPURR value snapshot */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 0601e6a..e505319 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -252,9 +252,9 @@ int main(void)
 	DEFINE(ACCOUNT_STARTTIME_USER,
 	       offsetof(struct paca_struct, accounting.starttime_user));
 	DEFINE(ACCOUNT_USER_TIME,
-	       offsetof(struct paca_struct, accounting.user_time));
+	       offsetof(struct paca_struct, accounting.utime));
 	DEFINE(ACCOUNT_SYSTEM_TIME,
-	       offsetof(struct paca_struct, accounting.system_time));
+	       offsetof(struct paca_struct, accounting.stime));
 	DEFINE(PACA_TRAP_SAVE, offsetof(struct paca_struct, trap_save));
 	DEFINE(PACA_NAPSTATELOST, offsetof(struct paca_struct, nap_state_lost));
 	DEFINE(PACA_SPRG_VDSO, offsetof(struct paca_struct, sprg_vdso));
@@ -265,9 +265,9 @@ int main(void)
 	DEFINE(ACCOUNT_STARTTIME_USER,
 	       offsetof(struct thread_info, accounting.starttime_user));
 	DEFINE(ACCOUNT_USER_TIME,
-	       offsetof(struct thread_info, accounting.user_time));
+	       offsetof(struct thread_info, accounting.utime));
 	DEFINE(ACCOUNT_SYSTEM_TIME,
-	       offsetof(struct thread_info, accounting.system_time));
+	       offsetof(struct thread_info, accounting.stime));
 #endif
 #endif /* CONFIG_PPC64 */
 
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index ce21650..17a2cd1 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -271,8 +271,8 @@ void accumulate_stolen_time(void)
 
 	sst = scan_dispatch_log(acct->starttime_user);
 	ust = scan_dispatch_log(acct->starttime);
-	acct->system_time -= sst;
-	acct->user_time -= ust;
+	acct->stime -= sst;
+	acct->utime -= ust;
 	local_paca->stolen_time += ust + sst;
 
 	local_paca->soft_enabled = save_soft_enabled;
@@ -281,10 +281,11 @@ void accumulate_stolen_time(void)
 static inline u64 calculate_stolen_time(u64 stop_tb)
 {
 	u64 stolen = 0;
+	struct cpu_accounting_data *acct = &local_paca->accounting;
 
 	if (get_paca()->dtl_ridx != be64_to_cpu(get_lppaca()->dtl_idx)) {
 		stolen = scan_dispatch_log(stop_tb);
-		get_paca()->accounting.system_time -= stolen;
+		acct->stime -= stolen;
 	}
 
 	stolen += get_paca()->stolen_time;
@@ -316,17 +317,17 @@ static unsigned long vtime_delta(struct task_struct *tsk,
 
 	now = mftb();
 	nowscaled = read_spurr(now);
-	acct->system_time += now - acct->starttime;
+	acct->stime += now - acct->starttime;
 	acct->starttime = now;
 	deltascaled = nowscaled - acct->startspurr;
 	acct->startspurr = nowscaled;
 
 	*stolen = calculate_stolen_time(now);
 
-	delta = acct->system_time;
-	acct->system_time = 0;
-	udelta = acct->user_time - acct->utime_sspurr;
-	acct->utime_sspurr = acct->user_time;
+	delta = acct->stime;
+	acct->stime = 0;
+	udelta = acct->utime - acct->utime_sspurr;
+	acct->utime_sspurr = acct->utime;
 
 	/*
 	 * Because we don't read the SPURR on every kernel entry/exit,
@@ -348,7 +349,7 @@ static unsigned long vtime_delta(struct task_struct *tsk,
 			*sys_scaled = deltascaled;
 		}
 	}
-	acct->user_time_scaled += user_scaled;
+	acct->utime_scaled += user_scaled;
 
 	return delta;
 }
@@ -387,10 +388,10 @@ void vtime_account_user(struct task_struct *tsk)
 	cputime_t utime, utimescaled;
 	struct cpu_accounting_data *acct = get_accounting(tsk);
 
-	utime = acct->user_time;
-	utimescaled = acct->user_time_scaled;
-	acct->user_time = 0;
-	acct->user_time_scaled = 0;
+	utime = acct->utime;
+	utimescaled = acct->utime_scaled;
+	acct->utime = 0;
+	acct->utime_scaled = 0;
 	acct->utime_sspurr = 0;
 	account_user_time(tsk, utime);
 	tsk->utimescaled += utimescaled;
@@ -408,8 +409,8 @@ void arch_vtime_task_switch(struct task_struct *prev)
 
 	acct->starttime = get_accounting(prev)->starttime;
 	acct->startspurr = get_accounting(prev)->startspurr;
-	acct->system_time = 0;
-	acct->user_time = 0;
+	acct->stime = 0;
+	acct->utime = 0;
 }
 #endif /* CONFIG_PPC32 */
 
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 9c0e17c..9f3b170 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2287,9 +2287,9 @@ static void dump_one_paca(int cpu)
 	DUMP(p, subcore_sibling_mask, "x");
 #endif
 
-	DUMP(p, accounting.user_time, "llx");
-	DUMP(p, accounting.system_time, "llx");
-	DUMP(p, accounting.user_time_scaled, "llx");
+	DUMP(p, accounting.utime, "llx");
+	DUMP(p, accounting.stime, "llx");
+	DUMP(p, accounting.utime_scaled, "llx");
 	DUMP(p, accounting.starttime, "llx");
 	DUMP(p, accounting.starttime_user, "llx");
 	DUMP(p, accounting.startspurr, "llx");

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [tip:sched/core] sched/cputime, powerpc: Migrate stolen_time field to the accounting structure
  2017-01-05 17:11 ` [PATCH 06/10] powerpc: Migrate stolen_time field to accounting structure Frederic Weisbecker
@ 2017-01-14 10:03   ` tip-bot for Frederic Weisbecker
  0 siblings, 0 replies; 34+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2017-01-14 10:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, fenghua.yu, wanpeng.li, riel, torvalds, benh,
	peterz, sgruszka, tglx, mingo, paulus, mpe, borntraeger,
	tony.luck, heiko.carstens, hpa, schwidefsky, fweisbec

Commit-ID:  f828c3d0aebab130a19d36336b50afa3414fa0bc
Gitweb:     http://git.kernel.org/tip/f828c3d0aebab130a19d36336b50afa3414fa0bc
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Thu, 5 Jan 2017 18:11:46 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 14 Jan 2017 09:54:12 +0100

sched/cputime, powerpc: Migrate stolen_time field to the accounting structure

That in order to gather all cputime accumulation to the same place.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1483636310-6557-7-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/powerpc/include/asm/paca.h | 1 -
 arch/powerpc/kernel/time.c      | 6 +++---
 arch/powerpc/xmon/xmon.c        | 2 +-
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 6a6792b..708c3e5 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -187,7 +187,6 @@ struct paca_struct {
 
 	/* Stuff for accurate time accounting */
 	struct cpu_accounting_data accounting;
-	u64 stolen_time;		/* TB ticks taken by hypervisor */
 	u64 dtl_ridx;			/* read index in dispatch log */
 	struct dtl_entry *dtl_curr;	/* pointer corresponding to dtl_ridx */
 
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 17a2cd1..714313e 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -273,7 +273,7 @@ void accumulate_stolen_time(void)
 	ust = scan_dispatch_log(acct->starttime);
 	acct->stime -= sst;
 	acct->utime -= ust;
-	local_paca->stolen_time += ust + sst;
+	acct->steal_time += ust + sst;
 
 	local_paca->soft_enabled = save_soft_enabled;
 }
@@ -288,8 +288,8 @@ static inline u64 calculate_stolen_time(u64 stop_tb)
 		acct->stime -= stolen;
 	}
 
-	stolen += get_paca()->stolen_time;
-	get_paca()->stolen_time = 0;
+	stolen += acct->steal_time;
+	acct->steal_time = 0;
 	return stolen;
 }
 
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 9f3b170..3f864c3 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2294,7 +2294,7 @@ static void dump_one_paca(int cpu)
 	DUMP(p, accounting.starttime_user, "llx");
 	DUMP(p, accounting.startspurr, "llx");
 	DUMP(p, accounting.utime_sspurr, "llx");
-	DUMP(p, stolen_time, "llx");
+	DUMP(p, accounting.steal_time, "llx");
 #undef DUMP
 
 	catch_memory_errors = 0;

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [tip:sched/core] sched/cputime, powerpc/vtime: Accumulate cputime and account only on tick/task switch
  2017-01-05 17:11 ` [PATCH 07/10] powerpc/vtime: Accumulate cputime and account only on tick/task switch Frederic Weisbecker
@ 2017-01-14 10:03   ` tip-bot for Frederic Weisbecker
  0 siblings, 0 replies; 34+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2017-01-14 10:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: paulus, tony.luck, fenghua.yu, tglx, wanpeng.li, mpe, riel,
	mingo, borntraeger, hpa, fweisbec, linux-kernel, peterz,
	torvalds, benh, heiko.carstens, sgruszka, schwidefsky

Commit-ID:  a19ff1a2cc9227f82e97836a8ee3e593f622eaf9
Gitweb:     http://git.kernel.org/tip/a19ff1a2cc9227f82e97836a8ee3e593f622eaf9
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Thu, 5 Jan 2017 18:11:47 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 14 Jan 2017 09:54:12 +0100

sched/cputime, powerpc/vtime: Accumulate cputime and account only on tick/task switch

Currently CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y accounts the cputime on
any context boundary: irq entry/exit, guest entry/exit, context switch,
etc...

Calling functions such as account_system_time(), account_user_time()
and such can be costly, especially if they are called on many fastpath
such as twice per IRQ. Those functions do more than just accounting to
kcpustat and task cputime. Depending on the config, some subsystems can
perform unpleasant multiplications and divisions, among other things.

So lets accumulate the cputime instead and delay the accounting on ticks
and context switches only.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1483636310-6557-8-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/powerpc/kernel/time.c | 120 +++++++++++++++++++++++++++++----------------
 1 file changed, 77 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 714313e..4255e69 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -280,17 +280,10 @@ void accumulate_stolen_time(void)
 
 static inline u64 calculate_stolen_time(u64 stop_tb)
 {
-	u64 stolen = 0;
-	struct cpu_accounting_data *acct = &local_paca->accounting;
-
-	if (get_paca()->dtl_ridx != be64_to_cpu(get_lppaca()->dtl_idx)) {
-		stolen = scan_dispatch_log(stop_tb);
-		acct->stime -= stolen;
-	}
+	if (get_paca()->dtl_ridx != be64_to_cpu(get_lppaca()->dtl_idx))
+		return scan_dispatch_log(stop_tb);
 
-	stolen += acct->steal_time;
-	acct->steal_time = 0;
-	return stolen;
+	return 0;
 }
 
 #else /* CONFIG_PPC_SPLPAR */
@@ -306,27 +299,26 @@ static inline u64 calculate_stolen_time(u64 stop_tb)
  * or soft irq state.
  */
 static unsigned long vtime_delta(struct task_struct *tsk,
-				 unsigned long *sys_scaled,
-				 unsigned long *stolen)
+				 unsigned long *stime_scaled,
+				 unsigned long *steal_time)
 {
 	unsigned long now, nowscaled, deltascaled;
-	unsigned long udelta, delta, user_scaled;
+	unsigned long stime;
+	unsigned long utime, utime_scaled;
 	struct cpu_accounting_data *acct = get_accounting(tsk);
 
 	WARN_ON_ONCE(!irqs_disabled());
 
 	now = mftb();
 	nowscaled = read_spurr(now);
-	acct->stime += now - acct->starttime;
+	stime = now - acct->starttime;
 	acct->starttime = now;
 	deltascaled = nowscaled - acct->startspurr;
 	acct->startspurr = nowscaled;
 
-	*stolen = calculate_stolen_time(now);
+	*steal_time = calculate_stolen_time(now);
 
-	delta = acct->stime;
-	acct->stime = 0;
-	udelta = acct->utime - acct->utime_sspurr;
+	utime = acct->utime - acct->utime_sspurr;
 	acct->utime_sspurr = acct->utime;
 
 	/*
@@ -339,39 +331,54 @@ static unsigned long vtime_delta(struct task_struct *tsk,
 	 * the user ticks get saved up in paca->user_time_scaled to be
 	 * used by account_process_tick.
 	 */
-	*sys_scaled = delta;
-	user_scaled = udelta;
-	if (deltascaled != delta + udelta) {
-		if (udelta) {
-			*sys_scaled = deltascaled * delta / (delta + udelta);
-			user_scaled = deltascaled - *sys_scaled;
+	*stime_scaled = stime;
+	utime_scaled = utime;
+	if (deltascaled != stime + utime) {
+		if (utime) {
+			*stime_scaled = deltascaled * stime / (stime + utime);
+			utime_scaled = deltascaled - *stime_scaled;
 		} else {
-			*sys_scaled = deltascaled;
+			*stime_scaled = deltascaled;
 		}
 	}
-	acct->utime_scaled += user_scaled;
+	acct->utime_scaled += utime_scaled;
 
-	return delta;
+	return stime;
 }
 
 void vtime_account_system(struct task_struct *tsk)
 {
-	unsigned long delta, sys_scaled, stolen;
+	unsigned long stime, stime_scaled, steal_time;
+	struct cpu_accounting_data *acct = get_accounting(tsk);
+
+	stime = vtime_delta(tsk, &stime_scaled, &steal_time);
+
+	stime -= min(stime, steal_time);
+	acct->steal_time += steal_time;
 
-	delta = vtime_delta(tsk, &sys_scaled, &stolen);
-	account_system_time(tsk, 0, delta);
-	tsk->stimescaled += sys_scaled;
-	if (stolen)
-		account_steal_time(stolen);
+	if ((tsk->flags & PF_VCPU) && !irq_count()) {
+		acct->gtime += stime;
+		acct->utime_scaled += stime_scaled;
+	} else {
+		if (hardirq_count())
+			acct->hardirq_time += stime;
+		else if (in_serving_softirq())
+			acct->softirq_time += stime;
+		else
+			acct->stime += stime;
+
+		acct->stime_scaled += stime_scaled;
+	}
 }
 EXPORT_SYMBOL_GPL(vtime_account_system);
 
 void vtime_account_idle(struct task_struct *tsk)
 {
-	unsigned long delta, sys_scaled, stolen;
+	unsigned long stime, stime_scaled, steal_time;
+	struct cpu_accounting_data *acct = get_accounting(tsk);
 
-	delta = vtime_delta(tsk, &sys_scaled, &stolen);
-	account_idle_time(delta + stolen);
+	stime = vtime_delta(tsk, &stime_scaled, &steal_time);
+	acct->idle_time += stime + steal_time;
 }
 
 /*
@@ -385,16 +392,45 @@ void vtime_account_idle(struct task_struct *tsk)
  */
 void vtime_account_user(struct task_struct *tsk)
 {
-	cputime_t utime, utimescaled;
 	struct cpu_accounting_data *acct = get_accounting(tsk);
 
-	utime = acct->utime;
-	utimescaled = acct->utime_scaled;
+	if (acct->utime)
+		account_user_time(tsk, acct->utime);
+
+	if (acct->utime_scaled)
+		tsk->utimescaled += acct->utime_scaled;
+
+	if (acct->gtime)
+		account_guest_time(tsk, acct->gtime);
+
+	if (acct->steal_time)
+		account_steal_time(acct->steal_time);
+
+	if (acct->idle_time)
+		account_idle_time(acct->idle_time);
+
+	if (acct->stime)
+		account_system_index_time(tsk, acct->stime, CPUTIME_SYSTEM);
+
+	if (acct->stime_scaled)
+		tsk->stimescaled += acct->stime_scaled;
+
+	if (acct->hardirq_time)
+		account_system_index_time(tsk, acct->hardirq_time, CPUTIME_IRQ);
+
+	if (acct->softirq_time)
+		account_system_index_time(tsk, acct->softirq_time, CPUTIME_SOFTIRQ);
+
 	acct->utime = 0;
 	acct->utime_scaled = 0;
 	acct->utime_sspurr = 0;
-	account_user_time(tsk, utime);
-	tsk->utimescaled += utimescaled;
+	acct->gtime = 0;
+	acct->steal_time = 0;
+	acct->idle_time = 0;
+	acct->stime = 0;
+	acct->stime_scaled = 0;
+	acct->hardirq_time = 0;
+	acct->softirq_time = 0;
 }
 
 #ifdef CONFIG_PPC32
@@ -409,8 +445,6 @@ void arch_vtime_task_switch(struct task_struct *prev)
 
 	acct->starttime = get_accounting(prev)->starttime;
 	acct->startspurr = get_accounting(prev)->startspurr;
-	acct->stime = 0;
-	acct->utime = 0;
 }
 #endif /* CONFIG_PPC32 */
 

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [tip:sched/core] sched/cputime, ia64: Accumulate cputime and account only on tick/task switch
  2017-01-05 17:11 ` [PATCH 08/10] ia64: " Frederic Weisbecker
@ 2017-01-14 10:04   ` tip-bot for Frederic Weisbecker
  0 siblings, 0 replies; 34+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2017-01-14 10:04 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: wanpeng.li, hpa, tony.luck, riel, heiko.carstens, peterz,
	sgruszka, benh, torvalds, tglx, mpe, linux-kernel, schwidefsky,
	mingo, fweisbec, paulus, fenghua.yu, borntraeger

Commit-ID:  7dd582305d19fd178bb42ecd1666285ecfb1657a
Gitweb:     http://git.kernel.org/tip/7dd582305d19fd178bb42ecd1666285ecfb1657a
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Thu, 5 Jan 2017 18:11:48 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 14 Jan 2017 09:54:12 +0100

sched/cputime, ia64: Accumulate cputime and account only on tick/task switch

Currently CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y accounts the cputime on
any context boundary: irq entry/exit, guest entry/exit, context switch,
etc...

Calling functions such as account_system_time(), account_user_time()
and such can be costly, especially if they are called on many fastpath
such as twice per IRQ. Those functions do more than just accounting to
kcpustat and task cputime. Depending on the config, some subsystems can
perform unpleasant multiplications and divisions, among other things.

So lets accumulate the cputime instead and delay the accounting on ticks
and context switches only.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1483636310-6557-9-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/ia64/include/asm/thread_info.h |  6 ++++
 arch/ia64/kernel/time.c             | 62 ++++++++++++++++++++++++++++---------
 2 files changed, 53 insertions(+), 15 deletions(-)

diff --git a/arch/ia64/include/asm/thread_info.h b/arch/ia64/include/asm/thread_info.h
index c702642..8742d74 100644
--- a/arch/ia64/include/asm/thread_info.h
+++ b/arch/ia64/include/asm/thread_info.h
@@ -27,6 +27,12 @@ struct thread_info {
 	mm_segment_t addr_limit;	/* user-level address space limit */
 	int preempt_count;		/* 0=premptable, <0=BUG; will also serve as bh-counter */
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
+	__u64 utime;
+	__u64 stime;
+	__u64 gtime;
+	__u64 hardirq_time;
+	__u64 softirq_time;
+	__u64 idle_time;
 	__u64 ac_stamp;
 	__u64 ac_leave;
 	__u64 ac_stime;
diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index 637e741..37f1b31 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -63,14 +63,39 @@ extern cputime_t cycle_to_cputime(u64 cyc);
 
 void vtime_account_user(struct task_struct *tsk)
 {
-	cputime_t delta_utime;
 	struct thread_info *ti = task_thread_info(tsk);
+	cputime_t delta;
 
-	if (ti->ac_utime) {
-		delta_utime = cycle_to_cputime(ti->ac_utime);
-		account_user_time(tsk, delta_utime);
-		ti->ac_utime = 0;
+	if (ti->utime)
+		account_user_time(tsk, cycle_to_cputime(ti->utime));
+
+	if (ti->gtime)
+		account_guest_time(tsk, cycle_to_cputime(ti->gtime));
+
+	if (ti->idle_time)
+		account_idle_time(cycle_to_cputime(ti->idle_time));
+
+	if (ti->stime) {
+		delta = cycle_to_cputime(ti->stime);
+		account_system_index_time(tsk, delta, CPUTIME_SYSTEM);
+	}
+
+	if (ti->hardirq_time) {
+		delta = cycle_to_cputime(ti->hardirq_time);
+		account_system_index_time(tsk, delta, CPUTIME_IRQ);
+	}
+
+	if (ti->softirq_time) {
+		delta = cycle_to_cputime(ti->softirq_time);
+		account_system_index_time(tsk, delta, CPUTIME_SOFTIRQ);
 	}
+
+	ti->utime = 0;
+	ti->gtime = 0;
+	ti->idle_time = 0;
+	ti->stime = 0;
+	ti->hardirq_time = 0;
+	ti->softirq_time = 0;
 }
 
 /*
@@ -91,18 +116,15 @@ void arch_vtime_task_switch(struct task_struct *prev)
  * Account time for a transition between system, hard irq or soft irq state.
  * Note that this function is called with interrupts enabled.
  */
-static cputime_t vtime_delta(struct task_struct *tsk)
+static __u64 vtime_delta(struct task_struct *tsk)
 {
 	struct thread_info *ti = task_thread_info(tsk);
-	cputime_t delta_stime;
-	__u64 now;
+	__u64 now, delta_stime;
 
 	WARN_ON_ONCE(!irqs_disabled());
 
 	now = ia64_get_itc();
-
-	delta_stime = cycle_to_cputime(ti->ac_stime + (now - ti->ac_stamp));
-	ti->ac_stime = 0;
+	delta_stime = now - ti->ac_stamp;
 	ti->ac_stamp = now;
 
 	return delta_stime;
@@ -110,15 +132,25 @@ static cputime_t vtime_delta(struct task_struct *tsk)
 
 void vtime_account_system(struct task_struct *tsk)
 {
-	cputime_t delta = vtime_delta(tsk);
-
-	account_system_time(tsk, 0, delta);
+	struct thread_info *ti = task_thread_info(tsk);
+	__u64 stime = vtime_delta(tsk);
+
+	if ((tsk->flags & PF_VCPU) && !irq_count())
+		ti->gtime += stime;
+	else if (hardirq_count())
+		ti->hardirq_time += stime;
+	else if (in_serving_softirq())
+		ti->softirq_time += stime;
+	else
+		ti->stime += stime;
 }
 EXPORT_SYMBOL_GPL(vtime_account_system);
 
 void vtime_account_idle(struct task_struct *tsk)
 {
-	account_idle_time(vtime_delta(tsk));
+	struct thread_info *ti = task_thread_info(tsk);
+
+	ti->idle_time += vtime_delta(tsk);
 }
 
 #endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [tip:sched/core] sched/cputime, s390: Implement delayed accounting of system time
  2017-01-05 17:11 ` [PATCH 09/10] s390/cputime: delayed accounting of system time Frederic Weisbecker
@ 2017-01-14 10:04   ` tip-bot for Martin Schwidefsky
  0 siblings, 0 replies; 34+ messages in thread
From: tip-bot for Martin Schwidefsky @ 2017-01-14 10:04 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: heiko.carstens, torvalds, wanpeng.li, paulus, borntraeger, tglx,
	hpa, riel, peterz, sgruszka, tony.luck, benh, mingo, fweisbec,
	schwidefsky, linux-kernel, mpe, fenghua.yu

Commit-ID:  b7394a5f4ce9542666cc68422c3594ea854adc2c
Gitweb:     http://git.kernel.org/tip/b7394a5f4ce9542666cc68422c3594ea854adc2c
Author:     Martin Schwidefsky <schwidefsky@de.ibm.com>
AuthorDate: Thu, 5 Jan 2017 18:11:49 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 14 Jan 2017 09:54:12 +0100

sched/cputime, s390: Implement delayed accounting of system time

The account_system_time() function is called with a cputime that
occurred while running in the kernel. The function detects which
context the CPU is currently running in and accounts the time to
the correct bucket. This forces the arch code to account the
cputime for hardirq and softirq immediately.

Such accounting function can be costly and perform unwelcome divisions
and multiplications, among others.

The arch code can delay the accounting for system time. For s390
the accounting is done once per timer tick and for each task switch.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
[ Rebase against latest linus tree and move account_system_index_scaled(). ]
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1483636310-6557-10-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/s390/include/asm/lowcore.h   |  65 +++++++++---------
 arch/s390/include/asm/processor.h |   3 +
 arch/s390/kernel/vtime.c          | 136 ++++++++++++++++++++++++--------------
 3 files changed, 125 insertions(+), 79 deletions(-)

diff --git a/arch/s390/include/asm/lowcore.h b/arch/s390/include/asm/lowcore.h
index 9bfad2a..61261e0 100644
--- a/arch/s390/include/asm/lowcore.h
+++ b/arch/s390/include/asm/lowcore.h
@@ -85,53 +85,56 @@ struct lowcore {
 	__u64	mcck_enter_timer;		/* 0x02c0 */
 	__u64	exit_timer;			/* 0x02c8 */
 	__u64	user_timer;			/* 0x02d0 */
-	__u64	system_timer;			/* 0x02d8 */
-	__u64	steal_timer;			/* 0x02e0 */
-	__u64	last_update_timer;		/* 0x02e8 */
-	__u64	last_update_clock;		/* 0x02f0 */
-	__u64	int_clock;			/* 0x02f8 */
-	__u64	mcck_clock;			/* 0x0300 */
-	__u64	clock_comparator;		/* 0x0308 */
+	__u64	guest_timer;			/* 0x02d8 */
+	__u64	system_timer;			/* 0x02e0 */
+	__u64	hardirq_timer;			/* 0x02e8 */
+	__u64	softirq_timer;			/* 0x02f0 */
+	__u64	steal_timer;			/* 0x02f8 */
+	__u64	last_update_timer;		/* 0x0300 */
+	__u64	last_update_clock;		/* 0x0308 */
+	__u64	int_clock;			/* 0x0310 */
+	__u64	mcck_clock;			/* 0x0318 */
+	__u64	clock_comparator;		/* 0x0320 */
 
 	/* Current process. */
-	__u64	current_task;			/* 0x0310 */
-	__u8	pad_0x318[0x320-0x318];		/* 0x0318 */
-	__u64	kernel_stack;			/* 0x0320 */
+	__u64	current_task;			/* 0x0328 */
+	__u8	pad_0x318[0x320-0x318];		/* 0x0330 */
+	__u64	kernel_stack;			/* 0x0338 */
 
 	/* Interrupt, panic and restart stack. */
-	__u64	async_stack;			/* 0x0328 */
-	__u64	panic_stack;			/* 0x0330 */
-	__u64	restart_stack;			/* 0x0338 */
+	__u64	async_stack;			/* 0x0340 */
+	__u64	panic_stack;			/* 0x0348 */
+	__u64	restart_stack;			/* 0x0350 */
 
 	/* Restart function and parameter. */
-	__u64	restart_fn;			/* 0x0340 */
-	__u64	restart_data;			/* 0x0348 */
-	__u64	restart_source;			/* 0x0350 */
+	__u64	restart_fn;			/* 0x0358 */
+	__u64	restart_data;			/* 0x0360 */
+	__u64	restart_source;			/* 0x0368 */
 
 	/* Address space pointer. */
-	__u64	kernel_asce;			/* 0x0358 */
-	__u64	user_asce;			/* 0x0360 */
+	__u64	kernel_asce;			/* 0x0370 */
+	__u64	user_asce;			/* 0x0378 */
 
 	/*
 	 * The lpp and current_pid fields form a
 	 * 64-bit value that is set as program
 	 * parameter with the LPP instruction.
 	 */
-	__u32	lpp;				/* 0x0368 */
-	__u32	current_pid;			/* 0x036c */
+	__u32	lpp;				/* 0x0380 */
+	__u32	current_pid;			/* 0x0384 */
 
 	/* SMP info area */
-	__u32	cpu_nr;				/* 0x0370 */
-	__u32	softirq_pending;		/* 0x0374 */
-	__u64	percpu_offset;			/* 0x0378 */
-	__u64	vdso_per_cpu_data;		/* 0x0380 */
-	__u64	machine_flags;			/* 0x0388 */
-	__u32	preempt_count;			/* 0x0390 */
-	__u8	pad_0x0394[0x0398-0x0394];	/* 0x0394 */
-	__u64	gmap;				/* 0x0398 */
-	__u32	spinlock_lockval;		/* 0x03a0 */
-	__u32	fpu_flags;			/* 0x03a4 */
-	__u8	pad_0x03a8[0x0400-0x03a8];	/* 0x03a8 */
+	__u32	cpu_nr;				/* 0x0388 */
+	__u32	softirq_pending;		/* 0x038c */
+	__u64	percpu_offset;			/* 0x0390 */
+	__u64	vdso_per_cpu_data;		/* 0x0398 */
+	__u64	machine_flags;			/* 0x03a0 */
+	__u32	preempt_count;			/* 0x03a8 */
+	__u8	pad_0x03ac[0x03b0-0x03ac];	/* 0x03ac */
+	__u64	gmap;				/* 0x03b0 */
+	__u32	spinlock_lockval;		/* 0x03b8 */
+	__u32	fpu_flags;			/* 0x03bc */
+	__u8	pad_0x03c0[0x0400-0x03c0];	/* 0x03c0 */
 
 	/* Per cpu primary space access list */
 	__u32	paste[16];			/* 0x0400 */
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 6bca916..977a5b6 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -111,7 +111,10 @@ struct thread_struct {
 	unsigned int  acrs[NUM_ACRS];
         unsigned long ksp;              /* kernel stack pointer             */
 	unsigned long user_timer;	/* task cputime in user space */
+	unsigned long guest_timer;	/* task cputime in kvm guest */
 	unsigned long system_timer;	/* task cputime in kernel space */
+	unsigned long hardirq_timer;	/* task cputime in hardirq context */
+	unsigned long softirq_timer;	/* task cputime in softirq context */
 	unsigned long sys_call_table;	/* system call table address */
 	mm_segment_t mm_segment;
 	unsigned long gmap_addr;	/* address of last gmap fault. */
diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c
index 1b5c5ee..1a53e0b 100644
--- a/arch/s390/kernel/vtime.c
+++ b/arch/s390/kernel/vtime.c
@@ -90,14 +90,41 @@ static void update_mt_scaling(void)
 	__this_cpu_write(mt_scaling_jiffies, jiffies_64);
 }
 
+static inline u64 update_tsk_timer(unsigned long *tsk_vtime, u64 new)
+{
+	u64 delta;
+
+	delta = new - *tsk_vtime;
+	*tsk_vtime = new;
+	return delta;
+}
+
+
+static inline u64 scale_vtime(u64 vtime)
+{
+	u64 mult = __this_cpu_read(mt_scaling_mult);
+	u64 div = __this_cpu_read(mt_scaling_div);
+
+	if (smp_cpu_mtid)
+		return vtime * mult / div;
+	return vtime;
+}
+
+static void account_system_index_scaled(struct task_struct *p,
+					cputime_t cputime, cputime_t scaled,
+					enum cpu_usage_stat index)
+{
+	p->stimescaled += scaled;
+	account_system_index_time(p, cputime, index);
+}
+
 /*
  * Update process times based on virtual cpu times stored by entry.S
  * to the lowcore fields user_timer, system_timer & steal_clock.
  */
 static int do_account_vtime(struct task_struct *tsk)
 {
-	u64 timer, clock, user, system, steal;
-	u64 user_scaled, system_scaled;
+	u64 timer, clock, user, guest, system, hardirq, softirq, steal;
 
 	timer = S390_lowcore.last_update_timer;
 	clock = S390_lowcore.last_update_clock;
@@ -110,36 +137,53 @@ static int do_account_vtime(struct task_struct *tsk)
 #endif
 		: "=m" (S390_lowcore.last_update_timer),
 		  "=m" (S390_lowcore.last_update_clock));
-	S390_lowcore.system_timer += timer - S390_lowcore.last_update_timer;
-	S390_lowcore.steal_timer += S390_lowcore.last_update_clock - clock;
+	clock = S390_lowcore.last_update_clock - clock;
+	timer -= S390_lowcore.last_update_timer;
+
+	if (hardirq_count())
+		S390_lowcore.hardirq_timer += timer;
+	else
+		S390_lowcore.system_timer += timer;
 
 	/* Update MT utilization calculation */
 	if (smp_cpu_mtid &&
 	    time_after64(jiffies_64, this_cpu_read(mt_scaling_jiffies)))
 		update_mt_scaling();
 
-	user = S390_lowcore.user_timer - tsk->thread.user_timer;
-	S390_lowcore.steal_timer -= user;
-	tsk->thread.user_timer = S390_lowcore.user_timer;
-
-	system = S390_lowcore.system_timer - tsk->thread.system_timer;
-	S390_lowcore.steal_timer -= system;
-	tsk->thread.system_timer = S390_lowcore.system_timer;
-
-	user_scaled = user;
-	system_scaled = system;
-	/* Do MT utilization scaling */
-	if (smp_cpu_mtid) {
-		u64 mult = __this_cpu_read(mt_scaling_mult);
-		u64 div = __this_cpu_read(mt_scaling_div);
+	/* Calculate cputime delta */
+	user = update_tsk_timer(&tsk->thread.user_timer,
+				READ_ONCE(S390_lowcore.user_timer));
+	guest = update_tsk_timer(&tsk->thread.guest_timer,
+				 READ_ONCE(S390_lowcore.guest_timer));
+	system = update_tsk_timer(&tsk->thread.system_timer,
+				  READ_ONCE(S390_lowcore.system_timer));
+	hardirq = update_tsk_timer(&tsk->thread.hardirq_timer,
+				   READ_ONCE(S390_lowcore.hardirq_timer));
+	softirq = update_tsk_timer(&tsk->thread.softirq_timer,
+				   READ_ONCE(S390_lowcore.softirq_timer));
+	S390_lowcore.steal_timer +=
+		clock - user - guest - system - hardirq - softirq;
+
+	/* Push account value */
+	if (user) {
+		account_user_time(tsk, user);
+		tsk->utimescaled += scale_vtime(user);
+	}
 
-		user_scaled = (user_scaled * mult) / div;
-		system_scaled = (system_scaled * mult) / div;
+	if (guest) {
+		account_guest_time(tsk, guest);
+		tsk->utimescaled += scale_vtime(guest);
 	}
-	account_user_time(tsk, user);
-	tsk->utimescaled += user_scaled;
-	account_system_time(tsk, 0, system);
-	tsk->stimescaled += system_scaled;
+
+	if (system)
+		account_system_index_scaled(tsk, system, scale_vtime(system),
+					    CPUTIME_SYSTEM);
+	if (hardirq)
+		account_system_index_scaled(tsk, hardirq, scale_vtime(hardirq),
+					    CPUTIME_IRQ);
+	if (softirq)
+		account_system_index_scaled(tsk, softirq, scale_vtime(softirq),
+					    CPUTIME_SOFTIRQ);
 
 	steal = S390_lowcore.steal_timer;
 	if ((s64) steal > 0) {
@@ -147,16 +191,22 @@ static int do_account_vtime(struct task_struct *tsk)
 		account_steal_time(steal);
 	}
 
-	return virt_timer_forward(user + system);
+	return virt_timer_forward(user + guest + system + hardirq + softirq);
 }
 
 void vtime_task_switch(struct task_struct *prev)
 {
 	do_account_vtime(prev);
 	prev->thread.user_timer = S390_lowcore.user_timer;
+	prev->thread.guest_timer = S390_lowcore.guest_timer;
 	prev->thread.system_timer = S390_lowcore.system_timer;
+	prev->thread.hardirq_timer = S390_lowcore.hardirq_timer;
+	prev->thread.softirq_timer = S390_lowcore.softirq_timer;
 	S390_lowcore.user_timer = current->thread.user_timer;
+	S390_lowcore.guest_timer = current->thread.guest_timer;
 	S390_lowcore.system_timer = current->thread.system_timer;
+	S390_lowcore.hardirq_timer = current->thread.hardirq_timer;
+	S390_lowcore.softirq_timer = current->thread.softirq_timer;
 }
 
 /*
@@ -176,32 +226,22 @@ void vtime_account_user(struct task_struct *tsk)
  */
 void vtime_account_irq_enter(struct task_struct *tsk)
 {
-	u64 timer, system, system_scaled;
+	u64 timer;
 
 	timer = S390_lowcore.last_update_timer;
 	S390_lowcore.last_update_timer = get_vtimer();
-	S390_lowcore.system_timer += timer - S390_lowcore.last_update_timer;
-
-	/* Update MT utilization calculation */
-	if (smp_cpu_mtid &&
-	    time_after64(jiffies_64, this_cpu_read(mt_scaling_jiffies)))
-		update_mt_scaling();
-
-	system = S390_lowcore.system_timer - tsk->thread.system_timer;
-	S390_lowcore.steal_timer -= system;
-	tsk->thread.system_timer = S390_lowcore.system_timer;
-	system_scaled = system;
-	/* Do MT utilization scaling */
-	if (smp_cpu_mtid) {
-		u64 mult = __this_cpu_read(mt_scaling_mult);
-		u64 div = __this_cpu_read(mt_scaling_div);
-
-		system_scaled = (system_scaled * mult) / div;
-	}
-	account_system_time(tsk, 0, system);
-	tsk->stimescaled += system_scaled;
-
-	virt_timer_forward(system);
+	timer -= S390_lowcore.last_update_timer;
+
+	if ((tsk->flags & PF_VCPU) && (irq_count() == 0))
+		S390_lowcore.guest_timer += timer;
+	else if (hardirq_count())
+		S390_lowcore.hardirq_timer += timer;
+	else if (in_serving_softirq())
+		S390_lowcore.softirq_timer += timer;
+	else
+		S390_lowcore.system_timer += timer;
+
+	virt_timer_forward(timer);
 }
 EXPORT_SYMBOL_GPL(vtime_account_irq_enter);
 

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [tip:sched/core] sched/cputime: Rename vtime_account_user() to vtime_flush()
  2017-01-05 17:11 ` [PATCH 10/10] vtime: Rename vtime_account_user() to vtime_flush() Frederic Weisbecker
@ 2017-01-14 10:05   ` tip-bot for Frederic Weisbecker
  0 siblings, 0 replies; 34+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2017-01-14 10:05 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: riel, linux-kernel, heiko.carstens, schwidefsky, paulus,
	fweisbec, torvalds, peterz, sgruszka, mpe, benh, wanpeng.li,
	tglx, borntraeger, mingo, fenghua.yu, hpa, tony.luck

Commit-ID:  c8d7dabf8f91fadd265e6eb87afb201d14ea299b
Gitweb:     http://git.kernel.org/tip/c8d7dabf8f91fadd265e6eb87afb201d14ea299b
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Thu, 5 Jan 2017 18:11:50 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 14 Jan 2017 09:54:13 +0100

sched/cputime: Rename vtime_account_user() to vtime_flush()

CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y used to accumulate user time and
account it on ticks and context switches only through the
vtime_account_user() function.

Now this model has been generalized on the 3 archs for all kind of
cputime (system, irq, ...) and all the cputime flushing happens under
vtime_account_user().

So let's rename this function to better reflect its new role.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1483636310-6557-11-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/ia64/kernel/time.c     | 2 +-
 arch/powerpc/kernel/time.c  | 6 ++----
 arch/s390/kernel/vtime.c    | 2 +-
 include/linux/kernel_stat.h | 2 +-
 include/linux/vtime.h       | 7 +++++--
 kernel/sched/cputime.c      | 4 +---
 6 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index 37f1b31..d040f12 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -61,7 +61,7 @@ static struct clocksource *itc_clocksource;
 
 extern cputime_t cycle_to_cputime(u64 cyc);
 
-void vtime_account_user(struct task_struct *tsk)
+void vtime_flush(struct task_struct *tsk)
 {
 	struct thread_info *ti = task_thread_info(tsk);
 	cputime_t delta;
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 4255e69..02e9730 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -382,15 +382,13 @@ void vtime_account_idle(struct task_struct *tsk)
 }
 
 /*
- * Transfer the user time accumulated in the paca
- * by the exception entry and exit code to the generic
- * process user time records.
+ * Account the whole cputime accumulated in the paca
  * Must be called with interrupts disabled.
  * Assumes that vtime_account_system/idle() has been called
  * recently (i.e. since the last entry from usermode) so that
  * get_paca()->user_time_scaled is up to date.
  */
-void vtime_account_user(struct task_struct *tsk)
+void vtime_flush(struct task_struct *tsk)
 {
 	struct cpu_accounting_data *acct = get_accounting(tsk);
 
diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c
index 1a53e0b..0a9e5d6 100644
--- a/arch/s390/kernel/vtime.c
+++ b/arch/s390/kernel/vtime.c
@@ -214,7 +214,7 @@ void vtime_task_switch(struct task_struct *prev)
  * accounting system time in order to correctly compute
  * the stolen time accounting.
  */
-void vtime_account_user(struct task_struct *tsk)
+void vtime_flush(struct task_struct *tsk)
 {
 	if (do_account_vtime(tsk))
 		virt_timer_expire();
diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index cfd6c0c..c3e38de 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -89,7 +89,7 @@ extern void account_idle_time(cputime_t);
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 static inline void account_process_tick(struct task_struct *tsk, int user)
 {
-	vtime_account_user(tsk);
+	vtime_flush(tsk);
 }
 #else
 extern void account_process_tick(struct task_struct *, int user);
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index aa9bfea..0681fe2 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -58,27 +58,28 @@ static inline void vtime_task_switch(struct task_struct *prev)
 
 extern void vtime_account_system(struct task_struct *tsk);
 extern void vtime_account_idle(struct task_struct *tsk);
-extern void vtime_account_user(struct task_struct *tsk);
 
 #else /* !CONFIG_VIRT_CPU_ACCOUNTING */
 
 static inline void vtime_task_switch(struct task_struct *prev) { }
 static inline void vtime_account_system(struct task_struct *tsk) { }
-static inline void vtime_account_user(struct task_struct *tsk) { }
 #endif /* !CONFIG_VIRT_CPU_ACCOUNTING */
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
 extern void arch_vtime_task_switch(struct task_struct *tsk);
+extern void vtime_account_user(struct task_struct *tsk);
 extern void vtime_user_enter(struct task_struct *tsk);
 
 static inline void vtime_user_exit(struct task_struct *tsk)
 {
 	vtime_account_user(tsk);
 }
+
 extern void vtime_guest_enter(struct task_struct *tsk);
 extern void vtime_guest_exit(struct task_struct *tsk);
 extern void vtime_init_idle(struct task_struct *tsk, int cpu);
 #else /* !CONFIG_VIRT_CPU_ACCOUNTING_GEN  */
+static inline void vtime_account_user(struct task_struct *tsk) { }
 static inline void vtime_user_enter(struct task_struct *tsk) { }
 static inline void vtime_user_exit(struct task_struct *tsk) { }
 static inline void vtime_guest_enter(struct task_struct *tsk) { }
@@ -93,9 +94,11 @@ static inline void vtime_account_irq_exit(struct task_struct *tsk)
 	/* On hard|softirq exit we always account to hard|softirq cputime */
 	vtime_account_system(tsk);
 }
+extern void vtime_flush(struct task_struct *tsk);
 #else /* !CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
 static inline void vtime_account_irq_enter(struct task_struct *tsk) { }
 static inline void vtime_account_irq_exit(struct task_struct *tsk) { }
+static inline void vtime_flush(struct task_struct *tsk) { }
 #endif
 
 
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 5813ee4..f7c14cc 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -437,9 +437,7 @@ void vtime_common_task_switch(struct task_struct *prev)
 	else
 		vtime_account_system(prev);
 
-#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
-	vtime_account_user(prev);
-#endif
+	vtime_flush(prev);
 	arch_vtime_task_switch(prev);
 }
 #endif

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 09/10] s390/cputime: delayed accounting of system time
  2016-12-20 14:13               ` Martin Schwidefsky
@ 2016-12-20 14:30                 ` Frederic Weisbecker
  0 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2016-12-20 14:30 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: LKML, Tony Luck, Wanpeng Li, Peter Zijlstra, Michael Ellerman,
	Heiko Carstens, Benjamin Herrenschmidt, Thomas Gleixner,
	Paul Mackerras, Ingo Molnar, Fenghua Yu, Rik van Riel,
	Stanislaw Gruszka

On Tue, Dec 20, 2016 at 03:13:53PM +0100, Martin Schwidefsky wrote:
> On Wed, 14 Dec 2016 02:44:46 +0100
> Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > On Tue, Dec 13, 2016 at 02:21:21PM +0100, Martin Schwidefsky wrote:
> > > On Tue, 13 Dec 2016 12:13:22 +0100
> > > Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:
> > >   
> > > > On Mon, 12 Dec 2016 16:02:30 +0100
> > > > Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > > >   
> > > > > On Mon, Dec 12, 2016 at 11:27:54AM +0100, Martin Schwidefsky wrote:    
> > > > > > 3) The call to vtime_flush in account_process_tick is done in irq context from
> > > > > >    update_process_times. hardirq_offset==1 is also correct.      
> > > > > 
> > > > > Let's see this for example:
> > > > > 
> > > > > +       if ((tsk->flags & PF_VCPU) && (irq_count() - hardirq_offset == 0))
> > > > > +               S390_lowcore.guest_timer += timer;
> > > > > 
> > > > > If the tick is interrupting guest, we have accounted the guest time on tick IRQ entry.
> > > > > Now we are in the middle of the tick interrupt and since hardirq_offset is 1, we
> > > > > are taking the above path by accounting half of the tick-IRQ time as guest, which is wrong,
> > > > > it's actually IRQ time.    
> > > > 
> > > > Hmm, you got me there. The system time from irq_enter until account_process_tick
> > > > is reached is indeed IRQ time. It is not much but it is incorrect. The best fix
> > > > would be to rip out the accounting of the system time from account_process_tick
> > > > as irq_enter / irq_exit will do system time accounting anyway. To do that
> > > > do_account_vtime needs to be split, because for the task switch we need to
> > > > account the system time of the previous task.  
> > > 
> > > New patch for the delayed cputime account. I can not just rip out system time
> > > accounting from account_process_tick after all, I need a sync point for the
> > > steal time calculation. It basically is the same patch as before but with a new
> > > helper update_tsk_timer, the removal of hardirq_offset and a simplification
> > > for do_account_vtime: the last accounting delta is either hardirq time for
> > > the tick or system time for the task switch.
> > > 
> > > Keeping my finger crossed..  
> > 
> > The patch looks good. But you might want to remove the hardirq_offset in a
> > separate patch to queue for this merge window (I'm not sure if it needs a
> > stable tag, the argument may be there since the beginning).
> > 
> > Because the rest depends on the series that is unlikely to be queued in this
> > merge window at this stage.
> 
> I just pushed two fixes to the linux-s390:fixes tree which will be merged
> eventually after the first -rc candidate for 4.10 is released.
> 
> This includes "s390/vtime: correct system time accounting" which fixes the
> hardirq_offset bug for s390.
> 
> The link to the fixes-tree in case you want to look at the patch:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git fixes

Thanks a lot!

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 09/10] s390/cputime: delayed accounting of system time
  2016-12-14  1:44             ` Frederic Weisbecker
@ 2016-12-20 14:13               ` Martin Schwidefsky
  2016-12-20 14:30                 ` Frederic Weisbecker
  0 siblings, 1 reply; 34+ messages in thread
From: Martin Schwidefsky @ 2016-12-20 14:13 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Tony Luck, Wanpeng Li, Peter Zijlstra, Michael Ellerman,
	Heiko Carstens, Benjamin Herrenschmidt, Thomas Gleixner,
	Paul Mackerras, Ingo Molnar, Fenghua Yu, Rik van Riel,
	Stanislaw Gruszka

On Wed, 14 Dec 2016 02:44:46 +0100
Frederic Weisbecker <fweisbec@gmail.com> wrote:

> On Tue, Dec 13, 2016 at 02:21:21PM +0100, Martin Schwidefsky wrote:
> > On Tue, 13 Dec 2016 12:13:22 +0100
> > Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:
> >   
> > > On Mon, 12 Dec 2016 16:02:30 +0100
> > > Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > >   
> > > > On Mon, Dec 12, 2016 at 11:27:54AM +0100, Martin Schwidefsky wrote:    
> > > > > 3) The call to vtime_flush in account_process_tick is done in irq context from
> > > > >    update_process_times. hardirq_offset==1 is also correct.      
> > > > 
> > > > Let's see this for example:
> > > > 
> > > > +       if ((tsk->flags & PF_VCPU) && (irq_count() - hardirq_offset == 0))
> > > > +               S390_lowcore.guest_timer += timer;
> > > > 
> > > > If the tick is interrupting guest, we have accounted the guest time on tick IRQ entry.
> > > > Now we are in the middle of the tick interrupt and since hardirq_offset is 1, we
> > > > are taking the above path by accounting half of the tick-IRQ time as guest, which is wrong,
> > > > it's actually IRQ time.    
> > > 
> > > Hmm, you got me there. The system time from irq_enter until account_process_tick
> > > is reached is indeed IRQ time. It is not much but it is incorrect. The best fix
> > > would be to rip out the accounting of the system time from account_process_tick
> > > as irq_enter / irq_exit will do system time accounting anyway. To do that
> > > do_account_vtime needs to be split, because for the task switch we need to
> > > account the system time of the previous task.  
> > 
> > New patch for the delayed cputime account. I can not just rip out system time
> > accounting from account_process_tick after all, I need a sync point for the
> > steal time calculation. It basically is the same patch as before but with a new
> > helper update_tsk_timer, the removal of hardirq_offset and a simplification
> > for do_account_vtime: the last accounting delta is either hardirq time for
> > the tick or system time for the task switch.
> > 
> > Keeping my finger crossed..  
> 
> The patch looks good. But you might want to remove the hardirq_offset in a
> separate patch to queue for this merge window (I'm not sure if it needs a
> stable tag, the argument may be there since the beginning).
> 
> Because the rest depends on the series that is unlikely to be queued in this
> merge window at this stage.

I just pushed two fixes to the linux-s390:fixes tree which will be merged
eventually after the first -rc candidate for 4.10 is released.

This includes "s390/vtime: correct system time accounting" which fixes the
hardirq_offset bug for s390.

The link to the fixes-tree in case you want to look at the patch:

git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git fixes

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 09/10] s390/cputime: delayed accounting of system time
  2016-12-13 13:21           ` Martin Schwidefsky
@ 2016-12-14  1:44             ` Frederic Weisbecker
  2016-12-20 14:13               ` Martin Schwidefsky
  0 siblings, 1 reply; 34+ messages in thread
From: Frederic Weisbecker @ 2016-12-14  1:44 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: LKML, Tony Luck, Wanpeng Li, Peter Zijlstra, Michael Ellerman,
	Heiko Carstens, Benjamin Herrenschmidt, Thomas Gleixner,
	Paul Mackerras, Ingo Molnar, Fenghua Yu, Rik van Riel,
	Stanislaw Gruszka

On Tue, Dec 13, 2016 at 02:21:21PM +0100, Martin Schwidefsky wrote:
> On Tue, 13 Dec 2016 12:13:22 +0100
> Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:
> 
> > On Mon, 12 Dec 2016 16:02:30 +0100
> > Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > 
> > > On Mon, Dec 12, 2016 at 11:27:54AM +0100, Martin Schwidefsky wrote:  
> > > > 3) The call to vtime_flush in account_process_tick is done in irq context from
> > > >    update_process_times. hardirq_offset==1 is also correct.    
> > > 
> > > Let's see this for example:
> > > 
> > > +       if ((tsk->flags & PF_VCPU) && (irq_count() - hardirq_offset == 0))
> > > +               S390_lowcore.guest_timer += timer;
> > > 
> > > If the tick is interrupting guest, we have accounted the guest time on tick IRQ entry.
> > > Now we are in the middle of the tick interrupt and since hardirq_offset is 1, we
> > > are taking the above path by accounting half of the tick-IRQ time as guest, which is wrong,
> > > it's actually IRQ time.  
> > 
> > Hmm, you got me there. The system time from irq_enter until account_process_tick
> > is reached is indeed IRQ time. It is not much but it is incorrect. The best fix
> > would be to rip out the accounting of the system time from account_process_tick
> > as irq_enter / irq_exit will do system time accounting anyway. To do that
> > do_account_vtime needs to be split, because for the task switch we need to
> > account the system time of the previous task.
> 
> New patch for the delayed cputime account. I can not just rip out system time
> accounting from account_process_tick after all, I need a sync point for the
> steal time calculation. It basically is the same patch as before but with a new
> helper update_tsk_timer, the removal of hardirq_offset and a simplification
> for do_account_vtime: the last accounting delta is either hardirq time for
> the tick or system time for the task switch.
> 
> Keeping my finger crossed..

The patch looks good. But you might want to remove the hardirq_offset in a
separate patch to queue for this merge window (I'm not sure if it needs a
stable tag, the argument may be there since the beginning).

Because the rest depends on the series that is unlikely to be queued in this
merge window at this stage.

Thanks!

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 09/10] s390/cputime: delayed accounting of system time
  2016-12-13 11:13         ` Martin Schwidefsky
  2016-12-13 13:21           ` Martin Schwidefsky
@ 2016-12-13 14:38           ` Frederic Weisbecker
  1 sibling, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2016-12-13 14:38 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: LKML, Tony Luck, Wanpeng Li, Peter Zijlstra, Michael Ellerman,
	Heiko Carstens, Benjamin Herrenschmidt, Thomas Gleixner,
	Paul Mackerras, Ingo Molnar, Fenghua Yu, Rik van Riel,
	Stanislaw Gruszka

On Tue, Dec 13, 2016 at 12:13:22PM +0100, Martin Schwidefsky wrote:
> On Mon, 12 Dec 2016 16:02:30 +0100
> Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > On Mon, Dec 12, 2016 at 11:27:54AM +0100, Martin Schwidefsky wrote:
> > > 3) The call to vtime_flush in account_process_tick is done in irq context from
> > >    update_process_times. hardirq_offset==1 is also correct.  
> > 
> > Let's see this for example:
> > 
> > +       if ((tsk->flags & PF_VCPU) && (irq_count() - hardirq_offset == 0))
> > +               S390_lowcore.guest_timer += timer;
> > 
> > If the tick is interrupting guest, we have accounted the guest time on tick IRQ entry.
> > Now we are in the middle of the tick interrupt and since hardirq_offset is 1, we
> > are taking the above path by accounting half of the tick-IRQ time as guest, which is wrong,
> > it's actually IRQ time.
> 
> Hmm, you got me there. The system time from irq_enter until account_process_tick
> is reached is indeed IRQ time. It is not much but it is incorrect. The best fix
> would be to rip out the accounting of the system time from account_process_tick
> as irq_enter / irq_exit will do system time accounting anyway. To do that
> do_account_vtime needs to be split, because for the task switch we need to
> account the system time of the previous task.

Exactly!

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 09/10] s390/cputime: delayed accounting of system time
  2016-12-13 11:13         ` Martin Schwidefsky
@ 2016-12-13 13:21           ` Martin Schwidefsky
  2016-12-14  1:44             ` Frederic Weisbecker
  2016-12-13 14:38           ` Frederic Weisbecker
  1 sibling, 1 reply; 34+ messages in thread
From: Martin Schwidefsky @ 2016-12-13 13:21 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Tony Luck, Wanpeng Li, Peter Zijlstra, Michael Ellerman,
	Heiko Carstens, Benjamin Herrenschmidt, Thomas Gleixner,
	Paul Mackerras, Ingo Molnar, Fenghua Yu, Rik van Riel,
	Stanislaw Gruszka

On Tue, 13 Dec 2016 12:13:22 +0100
Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:

> On Mon, 12 Dec 2016 16:02:30 +0100
> Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > On Mon, Dec 12, 2016 at 11:27:54AM +0100, Martin Schwidefsky wrote:  
> > > 3) The call to vtime_flush in account_process_tick is done in irq context from
> > >    update_process_times. hardirq_offset==1 is also correct.    
> > 
> > Let's see this for example:
> > 
> > +       if ((tsk->flags & PF_VCPU) && (irq_count() - hardirq_offset == 0))
> > +               S390_lowcore.guest_timer += timer;
> > 
> > If the tick is interrupting guest, we have accounted the guest time on tick IRQ entry.
> > Now we are in the middle of the tick interrupt and since hardirq_offset is 1, we
> > are taking the above path by accounting half of the tick-IRQ time as guest, which is wrong,
> > it's actually IRQ time.  
> 
> Hmm, you got me there. The system time from irq_enter until account_process_tick
> is reached is indeed IRQ time. It is not much but it is incorrect. The best fix
> would be to rip out the accounting of the system time from account_process_tick
> as irq_enter / irq_exit will do system time accounting anyway. To do that
> do_account_vtime needs to be split, because for the task switch we need to
> account the system time of the previous task.

New patch for the delayed cputime account. I can not just rip out system time
accounting from account_process_tick after all, I need a sync point for the
steal time calculation. It basically is the same patch as before but with a new
helper update_tsk_timer, the removal of hardirq_offset and a simplification
for do_account_vtime: the last accounting delta is either hardirq time for
the tick or system time for the task switch.

Keeping my finger crossed..

--
>From 2d7ca61909c438f947d27110b7b2a2b67a44607a Mon Sep 17 00:00:00 2001
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
Date: Sat, 3 Dec 2016 15:56:50 +0100
Subject: [PATCH 1/2] s390/cputime: delayed accounting of system time

The account_system_time() function is called with a cputime that
occurred while running in the kernel. The function detects which
context the CPU is currently running in and accounts the time to
the correct bucket. This forces the arch code to account the
cputime for hardirq and softirq immediately.

Such accounting function can be costly and perform unwelcome divisions
and multiplications, among others.

The arch code can delay the accounting for system time. For s390
the accounting is done once per timer tick and for each task switch.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
[rebase against latest cputime tree, massaged changelog accordingly]
---
 arch/s390/include/asm/lowcore.h   |  65 +++++++++---------
 arch/s390/include/asm/processor.h |   3 +
 arch/s390/kernel/vtime.c          | 134 +++++++++++++++++++++++---------------
 3 files changed, 120 insertions(+), 82 deletions(-)

diff --git a/arch/s390/include/asm/lowcore.h b/arch/s390/include/asm/lowcore.h
index 9bfad2a..61261e0 100644
--- a/arch/s390/include/asm/lowcore.h
+++ b/arch/s390/include/asm/lowcore.h
@@ -85,53 +85,56 @@ struct lowcore {
 	__u64	mcck_enter_timer;		/* 0x02c0 */
 	__u64	exit_timer;			/* 0x02c8 */
 	__u64	user_timer;			/* 0x02d0 */
-	__u64	system_timer;			/* 0x02d8 */
-	__u64	steal_timer;			/* 0x02e0 */
-	__u64	last_update_timer;		/* 0x02e8 */
-	__u64	last_update_clock;		/* 0x02f0 */
-	__u64	int_clock;			/* 0x02f8 */
-	__u64	mcck_clock;			/* 0x0300 */
-	__u64	clock_comparator;		/* 0x0308 */
+	__u64	guest_timer;			/* 0x02d8 */
+	__u64	system_timer;			/* 0x02e0 */
+	__u64	hardirq_timer;			/* 0x02e8 */
+	__u64	softirq_timer;			/* 0x02f0 */
+	__u64	steal_timer;			/* 0x02f8 */
+	__u64	last_update_timer;		/* 0x0300 */
+	__u64	last_update_clock;		/* 0x0308 */
+	__u64	int_clock;			/* 0x0310 */
+	__u64	mcck_clock;			/* 0x0318 */
+	__u64	clock_comparator;		/* 0x0320 */
 
 	/* Current process. */
-	__u64	current_task;			/* 0x0310 */
-	__u8	pad_0x318[0x320-0x318];		/* 0x0318 */
-	__u64	kernel_stack;			/* 0x0320 */
+	__u64	current_task;			/* 0x0328 */
+	__u8	pad_0x318[0x320-0x318];		/* 0x0330 */
+	__u64	kernel_stack;			/* 0x0338 */
 
 	/* Interrupt, panic and restart stack. */
-	__u64	async_stack;			/* 0x0328 */
-	__u64	panic_stack;			/* 0x0330 */
-	__u64	restart_stack;			/* 0x0338 */
+	__u64	async_stack;			/* 0x0340 */
+	__u64	panic_stack;			/* 0x0348 */
+	__u64	restart_stack;			/* 0x0350 */
 
 	/* Restart function and parameter. */
-	__u64	restart_fn;			/* 0x0340 */
-	__u64	restart_data;			/* 0x0348 */
-	__u64	restart_source;			/* 0x0350 */
+	__u64	restart_fn;			/* 0x0358 */
+	__u64	restart_data;			/* 0x0360 */
+	__u64	restart_source;			/* 0x0368 */
 
 	/* Address space pointer. */
-	__u64	kernel_asce;			/* 0x0358 */
-	__u64	user_asce;			/* 0x0360 */
+	__u64	kernel_asce;			/* 0x0370 */
+	__u64	user_asce;			/* 0x0378 */
 
 	/*
 	 * The lpp and current_pid fields form a
 	 * 64-bit value that is set as program
 	 * parameter with the LPP instruction.
 	 */
-	__u32	lpp;				/* 0x0368 */
-	__u32	current_pid;			/* 0x036c */
+	__u32	lpp;				/* 0x0380 */
+	__u32	current_pid;			/* 0x0384 */
 
 	/* SMP info area */
-	__u32	cpu_nr;				/* 0x0370 */
-	__u32	softirq_pending;		/* 0x0374 */
-	__u64	percpu_offset;			/* 0x0378 */
-	__u64	vdso_per_cpu_data;		/* 0x0380 */
-	__u64	machine_flags;			/* 0x0388 */
-	__u32	preempt_count;			/* 0x0390 */
-	__u8	pad_0x0394[0x0398-0x0394];	/* 0x0394 */
-	__u64	gmap;				/* 0x0398 */
-	__u32	spinlock_lockval;		/* 0x03a0 */
-	__u32	fpu_flags;			/* 0x03a4 */
-	__u8	pad_0x03a8[0x0400-0x03a8];	/* 0x03a8 */
+	__u32	cpu_nr;				/* 0x0388 */
+	__u32	softirq_pending;		/* 0x038c */
+	__u64	percpu_offset;			/* 0x0390 */
+	__u64	vdso_per_cpu_data;		/* 0x0398 */
+	__u64	machine_flags;			/* 0x03a0 */
+	__u32	preempt_count;			/* 0x03a8 */
+	__u8	pad_0x03ac[0x03b0-0x03ac];	/* 0x03ac */
+	__u64	gmap;				/* 0x03b0 */
+	__u32	spinlock_lockval;		/* 0x03b8 */
+	__u32	fpu_flags;			/* 0x03bc */
+	__u8	pad_0x03c0[0x0400-0x03c0];	/* 0x03c0 */
 
 	/* Per cpu primary space access list */
 	__u32	paste[16];			/* 0x0400 */
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 9c00351..6f07907 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -111,7 +111,10 @@ struct thread_struct {
 	unsigned int  acrs[NUM_ACRS];
         unsigned long ksp;              /* kernel stack pointer             */
 	unsigned long user_timer;	/* task cputime in user space */
+	unsigned long guest_timer;	/* task cputime in kvm guest */
 	unsigned long system_timer;	/* task cputime in kernel space */
+	unsigned long hardirq_timer;	/* task cputime in hardirq context */
+	unsigned long softirq_timer;	/* task cputime in softirq context */
 	unsigned long sys_call_table;	/* system call table address */
 	mm_segment_t mm_segment;
 	unsigned long gmap_addr;	/* address of last gmap fault. */
diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c
index 6b246aa..65dc7b6 100644
--- a/arch/s390/kernel/vtime.c
+++ b/arch/s390/kernel/vtime.c
@@ -90,14 +90,33 @@ static void update_mt_scaling(void)
 	__this_cpu_write(mt_scaling_jiffies, jiffies_64);
 }
 
+static inline u64 update_tsk_timer(unsigned long *tsk_vtime, u64 new)
+{
+	u64 delta;
+
+	delta = new - *tsk_vtime;
+	*tsk_vtime = new;
+	return delta;
+}
+
+
+static inline u64 scale_vtime(u64 vtime)
+{
+	u64 mult = __this_cpu_read(mt_scaling_mult);
+	u64 div = __this_cpu_read(mt_scaling_div);
+
+	if (smp_cpu_mtid)
+		return vtime * mult / div;
+	return vtime;
+}
+
 /*
  * Update process times based on virtual cpu times stored by entry.S
  * to the lowcore fields user_timer, system_timer & steal_clock.
  */
-static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
+static int do_account_vtime(struct task_struct *tsk)
 {
-	u64 timer, clock, user, system, steal;
-	u64 user_scaled, system_scaled;
+	u64 timer, clock, user, guest, system, hardirq, softirq, steal;
 
 	timer = S390_lowcore.last_update_timer;
 	clock = S390_lowcore.last_update_clock;
@@ -110,36 +129,53 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
 #endif
 		: "=m" (S390_lowcore.last_update_timer),
 		  "=m" (S390_lowcore.last_update_clock));
-	S390_lowcore.system_timer += timer - S390_lowcore.last_update_timer;
-	S390_lowcore.steal_timer += S390_lowcore.last_update_clock - clock;
+	clock = S390_lowcore.last_update_clock - clock;
+	timer -= S390_lowcore.last_update_timer;
+
+	if (hardirq_count())
+		S390_lowcore.hardirq_timer += timer;
+	else
+		S390_lowcore.system_timer += timer;
 
 	/* Update MT utilization calculation */
 	if (smp_cpu_mtid &&
 	    time_after64(jiffies_64, this_cpu_read(mt_scaling_jiffies)))
 		update_mt_scaling();
 
-	user = S390_lowcore.user_timer - tsk->thread.user_timer;
-	S390_lowcore.steal_timer -= user;
-	tsk->thread.user_timer = S390_lowcore.user_timer;
-
-	system = S390_lowcore.system_timer - tsk->thread.system_timer;
-	S390_lowcore.steal_timer -= system;
-	tsk->thread.system_timer = S390_lowcore.system_timer;
-
-	user_scaled = user;
-	system_scaled = system;
-	/* Do MT utilization scaling */
-	if (smp_cpu_mtid) {
-		u64 mult = __this_cpu_read(mt_scaling_mult);
-		u64 div = __this_cpu_read(mt_scaling_div);
+	/* Calculate cputime delta */
+	user = update_tsk_timer(&tsk->thread.user_timer,
+				READ_ONCE(S390_lowcore.user_timer));
+	guest = update_tsk_timer(&tsk->thread.guest_timer,
+				 READ_ONCE(S390_lowcore.guest_timer));
+	system = update_tsk_timer(&tsk->thread.system_timer,
+				  READ_ONCE(S390_lowcore.system_timer));
+	hardirq = update_tsk_timer(&tsk->thread.hardirq_timer,
+				   READ_ONCE(S390_lowcore.hardirq_timer));
+	softirq = update_tsk_timer(&tsk->thread.softirq_timer,
+				   READ_ONCE(S390_lowcore.softirq_timer));
+	S390_lowcore.steal_timer +=
+		clock - user - guest - system - hardirq - softirq;
+
+	/* Push account value */
+	if (user) {
+		account_user_time(tsk, user);
+		tsk->utimescaled += scale_vtime(user);
+	}
 
-		user_scaled = (user_scaled * mult) / div;
-		system_scaled = (system_scaled * mult) / div;
+	if (guest) {
+		account_guest_time(tsk, guest);
+		tsk->utimescaled += scale_vtime(guest);
 	}
-	account_user_time(tsk, user);
-	tsk->utimescaled += user_scaled;
-	account_system_time(tsk, hardirq_offset, system);
-	tsk->stimescaled += system_scaled;
+
+	if (system)
+		account_system_index_scaled(tsk, system, scale_vtime(system),
+					    CPUTIME_SYSTEM);
+	if (hardirq)
+		account_system_index_scaled(tsk, hardirq, scale_vtime(hardirq),
+					    CPUTIME_IRQ);
+	if (softirq)
+		account_system_index_scaled(tsk, softirq, scale_vtime(softirq),
+					    CPUTIME_SOFTIRQ);
 
 	steal = S390_lowcore.steal_timer;
 	if ((s64) steal > 0) {
@@ -147,16 +183,22 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
 		account_steal_time(steal);
 	}
 
-	return virt_timer_forward(user + system);
+	return virt_timer_forward(user + guest + system + hardirq + softirq);
 }
 
 void vtime_task_switch(struct task_struct *prev)
 {
-	do_account_vtime(prev, 0);
+	do_account_vtime(prev);
 	prev->thread.user_timer = S390_lowcore.user_timer;
+	prev->thread.guest_timer = S390_lowcore.guest_timer;
 	prev->thread.system_timer = S390_lowcore.system_timer;
+	prev->thread.hardirq_timer = S390_lowcore.hardirq_timer;
+	prev->thread.softirq_timer = S390_lowcore.softirq_timer;
 	S390_lowcore.user_timer = current->thread.user_timer;
+	S390_lowcore.guest_timer = current->thread.guest_timer;
 	S390_lowcore.system_timer = current->thread.system_timer;
+	S390_lowcore.hardirq_timer = current->thread.hardirq_timer;
+	S390_lowcore.softirq_timer = current->thread.softirq_timer;
 }
 
 /*
@@ -166,7 +208,7 @@ void vtime_task_switch(struct task_struct *prev)
  */
 void vtime_account_user(struct task_struct *tsk)
 {
-	if (do_account_vtime(tsk, HARDIRQ_OFFSET))
+	if (do_account_vtime(tsk))
 		virt_timer_expire();
 }
 
@@ -176,32 +218,22 @@ void vtime_account_user(struct task_struct *tsk)
  */
 void vtime_account_irq_enter(struct task_struct *tsk)
 {
-	u64 timer, system, system_scaled;
+	u64 timer;
 
 	timer = S390_lowcore.last_update_timer;
 	S390_lowcore.last_update_timer = get_vtimer();
-	S390_lowcore.system_timer += timer - S390_lowcore.last_update_timer;
-
-	/* Update MT utilization calculation */
-	if (smp_cpu_mtid &&
-	    time_after64(jiffies_64, this_cpu_read(mt_scaling_jiffies)))
-		update_mt_scaling();
-
-	system = S390_lowcore.system_timer - tsk->thread.system_timer;
-	S390_lowcore.steal_timer -= system;
-	tsk->thread.system_timer = S390_lowcore.system_timer;
-	system_scaled = system;
-	/* Do MT utilization scaling */
-	if (smp_cpu_mtid) {
-		u64 mult = __this_cpu_read(mt_scaling_mult);
-		u64 div = __this_cpu_read(mt_scaling_div);
-
-		system_scaled = (system_scaled * mult) / div;
-	}
-	account_system_time(tsk, 0, system);
-	tsk->stimescaled += system_scaled;
-
-	virt_timer_forward(system);
+	timer -= S390_lowcore.last_update_timer;
+
+	if ((tsk->flags & PF_VCPU) && (irq_count() == 0))
+		S390_lowcore.guest_timer += timer;
+	else if (hardirq_count())
+		S390_lowcore.hardirq_timer += timer;
+	else if (in_serving_softirq())
+		S390_lowcore.softirq_timer += timer;
+	else
+		S390_lowcore.system_timer += timer;
+
+	virt_timer_forward(timer);
 }
 EXPORT_SYMBOL_GPL(vtime_account_irq_enter);
 
-- 
2.8.4


-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 09/10] s390/cputime: delayed accounting of system time
  2016-12-12 15:02       ` Frederic Weisbecker
@ 2016-12-13 11:13         ` Martin Schwidefsky
  2016-12-13 13:21           ` Martin Schwidefsky
  2016-12-13 14:38           ` Frederic Weisbecker
  0 siblings, 2 replies; 34+ messages in thread
From: Martin Schwidefsky @ 2016-12-13 11:13 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Tony Luck, Wanpeng Li, Peter Zijlstra, Michael Ellerman,
	Heiko Carstens, Benjamin Herrenschmidt, Thomas Gleixner,
	Paul Mackerras, Ingo Molnar, Fenghua Yu, Rik van Riel,
	Stanislaw Gruszka

On Mon, 12 Dec 2016 16:02:30 +0100
Frederic Weisbecker <fweisbec@gmail.com> wrote:

> On Mon, Dec 12, 2016 at 11:27:54AM +0100, Martin Schwidefsky wrote:
> > 3) The call to vtime_flush in account_process_tick is done in irq context from
> >    update_process_times. hardirq_offset==1 is also correct.  
> 
> Let's see this for example:
> 
> +       if ((tsk->flags & PF_VCPU) && (irq_count() - hardirq_offset == 0))
> +               S390_lowcore.guest_timer += timer;
> 
> If the tick is interrupting guest, we have accounted the guest time on tick IRQ entry.
> Now we are in the middle of the tick interrupt and since hardirq_offset is 1, we
> are taking the above path by accounting half of the tick-IRQ time as guest, which is wrong,
> it's actually IRQ time.

Hmm, you got me there. The system time from irq_enter until account_process_tick
is reached is indeed IRQ time. It is not much but it is incorrect. The best fix
would be to rip out the accounting of the system time from account_process_tick
as irq_enter / irq_exit will do system time accounting anyway. To do that
do_account_vtime needs to be split, because for the task switch we need to
account the system time of the previous task.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 09/10] s390/cputime: delayed accounting of system time
  2016-12-12 10:27     ` Martin Schwidefsky
@ 2016-12-12 15:02       ` Frederic Weisbecker
  2016-12-13 11:13         ` Martin Schwidefsky
  0 siblings, 1 reply; 34+ messages in thread
From: Frederic Weisbecker @ 2016-12-12 15:02 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: LKML, Tony Luck, Wanpeng Li, Peter Zijlstra, Michael Ellerman,
	Heiko Carstens, Benjamin Herrenschmidt, Thomas Gleixner,
	Paul Mackerras, Ingo Molnar, Fenghua Yu, Rik van Riel,
	Stanislaw Gruszka

On Mon, Dec 12, 2016 at 11:27:54AM +0100, Martin Schwidefsky wrote:
> On Sat, 10 Dec 2016 02:48:06 +0100
> Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > We should get rid of the hardirq_offset argument, it doesn't really make sense
> > anymore. Also it makes the accounting buggy now. It's called from the tick
> > through account_user_time() with hardirq_offset=1, so the irq time is incorrectly
> > accumulated as system time. Guest time may be incorrect too.
> > 
> > In fact it may have been buggy even before this patchset because vtime_account_user()
> > isn't only called from the tick but also from task switch, and hardirq_offset remains 1
> > for those two cases. Not good.
> 
> For s390 the do_account_vtime function is called from vtime_task_switch and vtime_flush.
> 1) vtime_task_switch is exclusively called from finish_task_switch outside of irq context.
>    The call to do_account_vtime with hardirq_offset==0 from vtime_task_switch is correct.

Yes that one is fine.

> 2) The call to vtime_flush in vtime_common_task_switch is irrelevant for s390 as we
>    define __ARCH_HAS_VTIME_TASK_SWITCH

That's right, I missed that. And now I remember that special case remains because s390 has its
own way to account idle time.

> 3) The call to vtime_flush in account_process_tick is done in irq context from
>    update_process_times. hardirq_offset==1 is also correct.

Let's see this for example:

+       if ((tsk->flags & PF_VCPU) && (irq_count() - hardirq_offset == 0))
+               S390_lowcore.guest_timer += timer;

If the tick is interrupting guest, we have accounted the guest time on tick IRQ entry.
Now we are in the middle of the tick interrupt and since hardirq_offset is 1, we
are taking the above path by accounting half of the tick-IRQ time as guest, which is wrong,
it's actually IRQ time.

> > >  void vtime_task_switch(struct task_struct *prev)
> > >  {
> > >  	do_account_vtime(prev, 0);  
> > 
> > This call should be removed, the task switch already calls vtime_account_user().
> 
> The vtime_account_user function is empty for s390..

That's right.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 09/10] s390/cputime: delayed accounting of system time
  2016-12-10  1:48   ` Frederic Weisbecker
@ 2016-12-12 10:27     ` Martin Schwidefsky
  2016-12-12 15:02       ` Frederic Weisbecker
  0 siblings, 1 reply; 34+ messages in thread
From: Martin Schwidefsky @ 2016-12-12 10:27 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Tony Luck, Wanpeng Li, Peter Zijlstra, Michael Ellerman,
	Heiko Carstens, Benjamin Herrenschmidt, Thomas Gleixner,
	Paul Mackerras, Ingo Molnar, Fenghua Yu, Rik van Riel,
	Stanislaw Gruszka

On Sat, 10 Dec 2016 02:48:06 +0100
Frederic Weisbecker <fweisbec@gmail.com> wrote:

> On Tue, Dec 06, 2016 at 03:32:22AM +0100, Frederic Weisbecker wrote:
> > From: Martin Schwidefsky <schwidefsky@de.ibm.com>
> > 
> > The account_system_time() function is called with a cputime that
> > occurred while running in the kernel. The function detects which
> > context the CPU is currently running in and accounts the time to
> > the correct bucket. This forces the arch code to account the
> > cputime for hardirq and softirq immediately.
> > 
> > Such accounting function can be costly and perform unwelcome divisions
> > and multiplications, among others.
> > 
> > The arch code can delay the accounting for system time. For s390
> > the accounting is done once per timer tick and for each task switch.
> > 
> > Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Cc: Paul Mackerras <paulus@samba.org>
> > Cc: Michael Ellerman <mpe@ellerman.id.au>
> > Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> > Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> > Cc: Tony Luck <tony.luck@intel.com>
> > Cc: Fenghua Yu <fenghua.yu@intel.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Rik van Riel <riel@redhat.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Stanislaw Gruszka <sgruszka@redhat.com>
> > Cc: Wanpeng Li <wanpeng.li@hotmail.com>
> > [rebase against latest cputime tree, massaged changelog accordingly]
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>  
> 
> Looking at this patch again, I think I need to do another pass on it.
> Comments below:
> 
> >  /*
> >   * Update process times based on virtual cpu times stored by entry.S
> >   * to the lowcore fields user_timer, system_timer & steal_clock.
> >   */
> >  static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
> >  {
> > -	u64 timer, clock, user, system, steal;
> > -	u64 user_scaled, system_scaled;
> > +	u64 timer, clock, user, guest, system, hardirq, softirq, steal;
> >  
> >  	timer = S390_lowcore.last_update_timer;
> >  	clock = S390_lowcore.last_update_clock;
> > @@ -110,36 +119,57 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
> >  #endif
> >  		: "=m" (S390_lowcore.last_update_timer),
> >  		  "=m" (S390_lowcore.last_update_clock));
> > -	S390_lowcore.system_timer += timer - S390_lowcore.last_update_timer;
> > -	S390_lowcore.steal_timer += S390_lowcore.last_update_clock - clock;
> > +	clock = S390_lowcore.last_update_clock - clock;
> > +	timer -= S390_lowcore.last_update_timer;
> > +
> > +	if ((tsk->flags & PF_VCPU) && (irq_count() - hardirq_offset == 0))
> > +		S390_lowcore.guest_timer += timer;
> > +	else if (hardirq_count() - hardirq_offset)
> > +		S390_lowcore.hardirq_timer += timer;  
> 
> We should get rid of the hardirq_offset argument, it doesn't really make sense
> anymore. Also it makes the accounting buggy now. It's called from the tick
> through account_user_time() with hardirq_offset=1, so the irq time is incorrectly
> accumulated as system time. Guest time may be incorrect too.
> 
> In fact it may have been buggy even before this patchset because vtime_account_user()
> isn't only called from the tick but also from task switch, and hardirq_offset remains 1
> for those two cases. Not good.

For s390 the do_account_vtime function is called from vtime_task_switch and vtime_flush.
1) vtime_task_switch is exclusively called from finish_task_switch outside of irq context.
   The call to do_account_vtime with hardirq_offset==0 from vtime_task_switch is correct.
2) The call to vtime_flush in vtime_common_task_switch is irrelevant for s390 as we
   define __ARCH_HAS_VTIME_TASK_SWITCH
3) The call to vtime_flush in account_process_tick is done in irq context from
   update_process_times. hardirq_offset==1 is also correct.

As far as s390 is concerned that looks good.

> > +	else if (in_serving_softirq())
> > +		S390_lowcore.softirq_timer += timer;
> > +	else
> > +		S390_lowcore.system_timer += timer;
> >  
> >  	/* Update MT utilization calculation */
> >  	if (smp_cpu_mtid &&
> >  	    time_after64(jiffies_64, this_cpu_read(mt_scaling_jiffies)))
> >  		update_mt_scaling();
> >  
> > +	/* Calculate cputime delta */
> >  	user = S390_lowcore.user_timer - tsk->thread.user_timer;
> > -	S390_lowcore.steal_timer -= user;
> >  	tsk->thread.user_timer = S390_lowcore.user_timer;
> > -
> > +	guest = S390_lowcore.guest_timer - tsk->thread.guest_timer;
> > +	tsk->thread.guest_timer = S390_lowcore.guest_timer;
> >  	system = S390_lowcore.system_timer - tsk->thread.system_timer;
> > -	S390_lowcore.steal_timer -= system;
> >  	tsk->thread.system_timer = S390_lowcore.system_timer;
> > +	hardirq = S390_lowcore.hardirq_timer - tsk->thread.hardirq_timer;
> > +	tsk->thread.hardirq_timer = S390_lowcore.hardirq_timer;
> > +	softirq = S390_lowcore.softirq_timer - tsk->thread.softirq_timer;
> > +	tsk->thread.softirq_timer = S390_lowcore.softirq_timer;
> > +	S390_lowcore.steal_timer +=
> > +		clock - user - guest - system - hardirq - softirq;
> >  
> > -	user_scaled = user;
> > -	system_scaled = system;
> > -	/* Do MT utilization scaling */
> > -	if (smp_cpu_mtid) {
> > -		u64 mult = __this_cpu_read(mt_scaling_mult);
> > -		u64 div = __this_cpu_read(mt_scaling_div);
> > +	/* Push account value */
> > +	if (user) {
> > +		account_user_time(tsk, user);
> > +		tsk->utimescaled += scale_vtime(user);
> > +	}
> >  
> > -		user_scaled = (user_scaled * mult) / div;
> > -		system_scaled = (system_scaled * mult) / div;
> > +	if (guest) {
> > +		account_guest_time(tsk, guest);
> > +		tsk->utimescaled += scale_vtime(guest);
> >  	}
> > -	account_user_time(tsk, user);
> > -	tsk->utimescaled += user_scaled;
> > -	account_system_time(tsk, hardirq_offset, system);
> > -	tsk->stimescaled += system_scaled;
> > +
> > +	if (system)
> > +		account_system_index_scaled(tsk, system, scale_vtime(system),
> > +					    CPUTIME_SYSTEM);
> > +	if (hardirq)
> > +		account_system_index_scaled(tsk, hardirq, scale_vtime(hardirq),
> > +					    CPUTIME_IRQ);
> > +	if (softirq)
> > +		account_system_index_scaled(tsk, softirq, scale_vtime(softirq),
> > +					    CPUTIME_SOFTIRQ);
> >  
> >  	steal = S390_lowcore.steal_timer;
> >  	if ((s64) steal > 0) {
> > @@ -147,16 +177,22 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
> >  		account_steal_time(steal);
> >  	}
> >  
> > -	return virt_timer_forward(user + system);
> > +	return virt_timer_forward(user + guest + system + hardirq + softirq);
> >  }
> >  
> >  void vtime_task_switch(struct task_struct *prev)
> >  {
> >  	do_account_vtime(prev, 0);  
> 
> This call should be removed, the task switch already calls vtime_account_user().

The vtime_account_user function is empty for s390..
 
> >  	prev->thread.user_timer = S390_lowcore.user_timer;
> > +	prev->thread.guest_timer = S390_lowcore.guest_timer;
> >  	prev->thread.system_timer = S390_lowcore.system_timer;
> > +	prev->thread.hardirq_timer = S390_lowcore.hardirq_timer;
> > +	prev->thread.softirq_timer = S390_lowcore.softirq_timer;
> >  	S390_lowcore.user_timer = current->thread.user_timer;
> > +	S390_lowcore.guest_timer = current->thread.guest_timer;
> >  	S390_lowcore.system_timer = current->thread.system_timer;
> > +	S390_lowcore.hardirq_timer = current->thread.hardirq_timer;
> > +	S390_lowcore.softirq_timer = current->thread.softirq_timer;
> >  }  
> 


-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 09/10] s390/cputime: delayed accounting of system time
  2016-12-06  2:32 ` [PATCH 09/10] s390/cputime: delayed accounting of system time Frederic Weisbecker
@ 2016-12-10  1:48   ` Frederic Weisbecker
  2016-12-12 10:27     ` Martin Schwidefsky
  0 siblings, 1 reply; 34+ messages in thread
From: Frederic Weisbecker @ 2016-12-10  1:48 UTC (permalink / raw)
  To: LKML
  Cc: Martin Schwidefsky, Tony Luck, Wanpeng Li, Peter Zijlstra,
	Michael Ellerman, Heiko Carstens, Benjamin Herrenschmidt,
	Thomas Gleixner, Paul Mackerras, Ingo Molnar, Fenghua Yu,
	Rik van Riel, Stanislaw Gruszka

On Tue, Dec 06, 2016 at 03:32:22AM +0100, Frederic Weisbecker wrote:
> From: Martin Schwidefsky <schwidefsky@de.ibm.com>
> 
> The account_system_time() function is called with a cputime that
> occurred while running in the kernel. The function detects which
> context the CPU is currently running in and accounts the time to
> the correct bucket. This forces the arch code to account the
> cputime for hardirq and softirq immediately.
> 
> Such accounting function can be costly and perform unwelcome divisions
> and multiplications, among others.
> 
> The arch code can delay the accounting for system time. For s390
> the accounting is done once per timer tick and for each task switch.
> 
> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Stanislaw Gruszka <sgruszka@redhat.com>
> Cc: Wanpeng Li <wanpeng.li@hotmail.com>
> [rebase against latest cputime tree, massaged changelog accordingly]
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>

Looking at this patch again, I think I need to do another pass on it.
Comments below:

>  /*
>   * Update process times based on virtual cpu times stored by entry.S
>   * to the lowcore fields user_timer, system_timer & steal_clock.
>   */
>  static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
>  {
> -	u64 timer, clock, user, system, steal;
> -	u64 user_scaled, system_scaled;
> +	u64 timer, clock, user, guest, system, hardirq, softirq, steal;
>  
>  	timer = S390_lowcore.last_update_timer;
>  	clock = S390_lowcore.last_update_clock;
> @@ -110,36 +119,57 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
>  #endif
>  		: "=m" (S390_lowcore.last_update_timer),
>  		  "=m" (S390_lowcore.last_update_clock));
> -	S390_lowcore.system_timer += timer - S390_lowcore.last_update_timer;
> -	S390_lowcore.steal_timer += S390_lowcore.last_update_clock - clock;
> +	clock = S390_lowcore.last_update_clock - clock;
> +	timer -= S390_lowcore.last_update_timer;
> +
> +	if ((tsk->flags & PF_VCPU) && (irq_count() - hardirq_offset == 0))
> +		S390_lowcore.guest_timer += timer;
> +	else if (hardirq_count() - hardirq_offset)
> +		S390_lowcore.hardirq_timer += timer;

We should get rid of the hardirq_offset argument, it doesn't really make sense
anymore. Also it makes the accounting buggy now. It's called from the tick
through account_user_time() with hardirq_offset=1, so the irq time is incorrectly
accumulated as system time. Guest time may be incorrect too.

In fact it may have been buggy even before this patchset because vtime_account_user()
isn't only called from the tick but also from task switch, and hardirq_offset remains 1
for those two cases. Not good.

> +	else if (in_serving_softirq())
> +		S390_lowcore.softirq_timer += timer;
> +	else
> +		S390_lowcore.system_timer += timer;
>  
>  	/* Update MT utilization calculation */
>  	if (smp_cpu_mtid &&
>  	    time_after64(jiffies_64, this_cpu_read(mt_scaling_jiffies)))
>  		update_mt_scaling();
>  
> +	/* Calculate cputime delta */
>  	user = S390_lowcore.user_timer - tsk->thread.user_timer;
> -	S390_lowcore.steal_timer -= user;
>  	tsk->thread.user_timer = S390_lowcore.user_timer;
> -
> +	guest = S390_lowcore.guest_timer - tsk->thread.guest_timer;
> +	tsk->thread.guest_timer = S390_lowcore.guest_timer;
>  	system = S390_lowcore.system_timer - tsk->thread.system_timer;
> -	S390_lowcore.steal_timer -= system;
>  	tsk->thread.system_timer = S390_lowcore.system_timer;
> +	hardirq = S390_lowcore.hardirq_timer - tsk->thread.hardirq_timer;
> +	tsk->thread.hardirq_timer = S390_lowcore.hardirq_timer;
> +	softirq = S390_lowcore.softirq_timer - tsk->thread.softirq_timer;
> +	tsk->thread.softirq_timer = S390_lowcore.softirq_timer;
> +	S390_lowcore.steal_timer +=
> +		clock - user - guest - system - hardirq - softirq;
>  
> -	user_scaled = user;
> -	system_scaled = system;
> -	/* Do MT utilization scaling */
> -	if (smp_cpu_mtid) {
> -		u64 mult = __this_cpu_read(mt_scaling_mult);
> -		u64 div = __this_cpu_read(mt_scaling_div);
> +	/* Push account value */
> +	if (user) {
> +		account_user_time(tsk, user);
> +		tsk->utimescaled += scale_vtime(user);
> +	}
>  
> -		user_scaled = (user_scaled * mult) / div;
> -		system_scaled = (system_scaled * mult) / div;
> +	if (guest) {
> +		account_guest_time(tsk, guest);
> +		tsk->utimescaled += scale_vtime(guest);
>  	}
> -	account_user_time(tsk, user);
> -	tsk->utimescaled += user_scaled;
> -	account_system_time(tsk, hardirq_offset, system);
> -	tsk->stimescaled += system_scaled;
> +
> +	if (system)
> +		account_system_index_scaled(tsk, system, scale_vtime(system),
> +					    CPUTIME_SYSTEM);
> +	if (hardirq)
> +		account_system_index_scaled(tsk, hardirq, scale_vtime(hardirq),
> +					    CPUTIME_IRQ);
> +	if (softirq)
> +		account_system_index_scaled(tsk, softirq, scale_vtime(softirq),
> +					    CPUTIME_SOFTIRQ);
>  
>  	steal = S390_lowcore.steal_timer;
>  	if ((s64) steal > 0) {
> @@ -147,16 +177,22 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
>  		account_steal_time(steal);
>  	}
>  
> -	return virt_timer_forward(user + system);
> +	return virt_timer_forward(user + guest + system + hardirq + softirq);
>  }
>  
>  void vtime_task_switch(struct task_struct *prev)
>  {
>  	do_account_vtime(prev, 0);

This call should be removed, the task switch already calls vtime_account_user().

>  	prev->thread.user_timer = S390_lowcore.user_timer;
> +	prev->thread.guest_timer = S390_lowcore.guest_timer;
>  	prev->thread.system_timer = S390_lowcore.system_timer;
> +	prev->thread.hardirq_timer = S390_lowcore.hardirq_timer;
> +	prev->thread.softirq_timer = S390_lowcore.softirq_timer;
>  	S390_lowcore.user_timer = current->thread.user_timer;
> +	S390_lowcore.guest_timer = current->thread.guest_timer;
>  	S390_lowcore.system_timer = current->thread.system_timer;
> +	S390_lowcore.hardirq_timer = current->thread.hardirq_timer;
> +	S390_lowcore.softirq_timer = current->thread.softirq_timer;
>  }

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 09/10] s390/cputime: delayed accounting of system time
  2016-12-06  2:32 [PATCH 00/10] vtime: Delay cputime accounting to tick Frederic Weisbecker
@ 2016-12-06  2:32 ` Frederic Weisbecker
  2016-12-10  1:48   ` Frederic Weisbecker
  0 siblings, 1 reply; 34+ messages in thread
From: Frederic Weisbecker @ 2016-12-06  2:32 UTC (permalink / raw)
  To: LKML
  Cc: Martin Schwidefsky, Tony Luck, Wanpeng Li, Peter Zijlstra,
	Michael Ellerman, Heiko Carstens, Benjamin Herrenschmidt,
	Thomas Gleixner, Paul Mackerras, Frederic Weisbecker,
	Ingo Molnar, Fenghua Yu, Rik van Riel, Stanislaw Gruszka

From: Martin Schwidefsky <schwidefsky@de.ibm.com>

The account_system_time() function is called with a cputime that
occurred while running in the kernel. The function detects which
context the CPU is currently running in and accounts the time to
the correct bucket. This forces the arch code to account the
cputime for hardirq and softirq immediately.

Such accounting function can be costly and perform unwelcome divisions
and multiplications, among others.

The arch code can delay the accounting for system time. For s390
the accounting is done once per timer tick and for each task switch.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
[rebase against latest cputime tree, massaged changelog accordingly]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 arch/s390/include/asm/lowcore.h   |  65 +++++++++++-----------
 arch/s390/include/asm/processor.h |   3 +
 arch/s390/kernel/vtime.c          | 112 +++++++++++++++++++++++---------------
 3 files changed, 106 insertions(+), 74 deletions(-)

diff --git a/arch/s390/include/asm/lowcore.h b/arch/s390/include/asm/lowcore.h
index 9bfad2a..61261e0e 100644
--- a/arch/s390/include/asm/lowcore.h
+++ b/arch/s390/include/asm/lowcore.h
@@ -85,53 +85,56 @@ struct lowcore {
 	__u64	mcck_enter_timer;		/* 0x02c0 */
 	__u64	exit_timer;			/* 0x02c8 */
 	__u64	user_timer;			/* 0x02d0 */
-	__u64	system_timer;			/* 0x02d8 */
-	__u64	steal_timer;			/* 0x02e0 */
-	__u64	last_update_timer;		/* 0x02e8 */
-	__u64	last_update_clock;		/* 0x02f0 */
-	__u64	int_clock;			/* 0x02f8 */
-	__u64	mcck_clock;			/* 0x0300 */
-	__u64	clock_comparator;		/* 0x0308 */
+	__u64	guest_timer;			/* 0x02d8 */
+	__u64	system_timer;			/* 0x02e0 */
+	__u64	hardirq_timer;			/* 0x02e8 */
+	__u64	softirq_timer;			/* 0x02f0 */
+	__u64	steal_timer;			/* 0x02f8 */
+	__u64	last_update_timer;		/* 0x0300 */
+	__u64	last_update_clock;		/* 0x0308 */
+	__u64	int_clock;			/* 0x0310 */
+	__u64	mcck_clock;			/* 0x0318 */
+	__u64	clock_comparator;		/* 0x0320 */
 
 	/* Current process. */
-	__u64	current_task;			/* 0x0310 */
-	__u8	pad_0x318[0x320-0x318];		/* 0x0318 */
-	__u64	kernel_stack;			/* 0x0320 */
+	__u64	current_task;			/* 0x0328 */
+	__u8	pad_0x318[0x320-0x318];		/* 0x0330 */
+	__u64	kernel_stack;			/* 0x0338 */
 
 	/* Interrupt, panic and restart stack. */
-	__u64	async_stack;			/* 0x0328 */
-	__u64	panic_stack;			/* 0x0330 */
-	__u64	restart_stack;			/* 0x0338 */
+	__u64	async_stack;			/* 0x0340 */
+	__u64	panic_stack;			/* 0x0348 */
+	__u64	restart_stack;			/* 0x0350 */
 
 	/* Restart function and parameter. */
-	__u64	restart_fn;			/* 0x0340 */
-	__u64	restart_data;			/* 0x0348 */
-	__u64	restart_source;			/* 0x0350 */
+	__u64	restart_fn;			/* 0x0358 */
+	__u64	restart_data;			/* 0x0360 */
+	__u64	restart_source;			/* 0x0368 */
 
 	/* Address space pointer. */
-	__u64	kernel_asce;			/* 0x0358 */
-	__u64	user_asce;			/* 0x0360 */
+	__u64	kernel_asce;			/* 0x0370 */
+	__u64	user_asce;			/* 0x0378 */
 
 	/*
 	 * The lpp and current_pid fields form a
 	 * 64-bit value that is set as program
 	 * parameter with the LPP instruction.
 	 */
-	__u32	lpp;				/* 0x0368 */
-	__u32	current_pid;			/* 0x036c */
+	__u32	lpp;				/* 0x0380 */
+	__u32	current_pid;			/* 0x0384 */
 
 	/* SMP info area */
-	__u32	cpu_nr;				/* 0x0370 */
-	__u32	softirq_pending;		/* 0x0374 */
-	__u64	percpu_offset;			/* 0x0378 */
-	__u64	vdso_per_cpu_data;		/* 0x0380 */
-	__u64	machine_flags;			/* 0x0388 */
-	__u32	preempt_count;			/* 0x0390 */
-	__u8	pad_0x0394[0x0398-0x0394];	/* 0x0394 */
-	__u64	gmap;				/* 0x0398 */
-	__u32	spinlock_lockval;		/* 0x03a0 */
-	__u32	fpu_flags;			/* 0x03a4 */
-	__u8	pad_0x03a8[0x0400-0x03a8];	/* 0x03a8 */
+	__u32	cpu_nr;				/* 0x0388 */
+	__u32	softirq_pending;		/* 0x038c */
+	__u64	percpu_offset;			/* 0x0390 */
+	__u64	vdso_per_cpu_data;		/* 0x0398 */
+	__u64	machine_flags;			/* 0x03a0 */
+	__u32	preempt_count;			/* 0x03a8 */
+	__u8	pad_0x03ac[0x03b0-0x03ac];	/* 0x03ac */
+	__u64	gmap;				/* 0x03b0 */
+	__u32	spinlock_lockval;		/* 0x03b8 */
+	__u32	fpu_flags;			/* 0x03bc */
+	__u8	pad_0x03c0[0x0400-0x03c0];	/* 0x03c0 */
 
 	/* Per cpu primary space access list */
 	__u32	paste[16];			/* 0x0400 */
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 9c00351..6f07907 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -111,7 +111,10 @@ struct thread_struct {
 	unsigned int  acrs[NUM_ACRS];
         unsigned long ksp;              /* kernel stack pointer             */
 	unsigned long user_timer;	/* task cputime in user space */
+	unsigned long guest_timer;	/* task cputime in kvm guest */
 	unsigned long system_timer;	/* task cputime in kernel space */
+	unsigned long hardirq_timer;	/* task cputime in hardirq context */
+	unsigned long softirq_timer;	/* task cputime in softirq context */
 	unsigned long sys_call_table;	/* system call table address */
 	mm_segment_t mm_segment;
 	unsigned long gmap_addr;	/* address of last gmap fault. */
diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c
index 6b246aa..0fdcaca 100644
--- a/arch/s390/kernel/vtime.c
+++ b/arch/s390/kernel/vtime.c
@@ -90,14 +90,23 @@ static void update_mt_scaling(void)
 	__this_cpu_write(mt_scaling_jiffies, jiffies_64);
 }
 
+static inline u64 scale_vtime(u64 vtime)
+{
+	u64 mult = __this_cpu_read(mt_scaling_mult);
+	u64 div = __this_cpu_read(mt_scaling_div);
+
+	if (smp_cpu_mtid)
+		return vtime * mult / div;
+	return vtime;
+}
+
 /*
  * Update process times based on virtual cpu times stored by entry.S
  * to the lowcore fields user_timer, system_timer & steal_clock.
  */
 static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
 {
-	u64 timer, clock, user, system, steal;
-	u64 user_scaled, system_scaled;
+	u64 timer, clock, user, guest, system, hardirq, softirq, steal;
 
 	timer = S390_lowcore.last_update_timer;
 	clock = S390_lowcore.last_update_clock;
@@ -110,36 +119,57 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
 #endif
 		: "=m" (S390_lowcore.last_update_timer),
 		  "=m" (S390_lowcore.last_update_clock));
-	S390_lowcore.system_timer += timer - S390_lowcore.last_update_timer;
-	S390_lowcore.steal_timer += S390_lowcore.last_update_clock - clock;
+	clock = S390_lowcore.last_update_clock - clock;
+	timer -= S390_lowcore.last_update_timer;
+
+	if ((tsk->flags & PF_VCPU) && (irq_count() - hardirq_offset == 0))
+		S390_lowcore.guest_timer += timer;
+	else if (hardirq_count() - hardirq_offset)
+		S390_lowcore.hardirq_timer += timer;
+	else if (in_serving_softirq())
+		S390_lowcore.softirq_timer += timer;
+	else
+		S390_lowcore.system_timer += timer;
 
 	/* Update MT utilization calculation */
 	if (smp_cpu_mtid &&
 	    time_after64(jiffies_64, this_cpu_read(mt_scaling_jiffies)))
 		update_mt_scaling();
 
+	/* Calculate cputime delta */
 	user = S390_lowcore.user_timer - tsk->thread.user_timer;
-	S390_lowcore.steal_timer -= user;
 	tsk->thread.user_timer = S390_lowcore.user_timer;
-
+	guest = S390_lowcore.guest_timer - tsk->thread.guest_timer;
+	tsk->thread.guest_timer = S390_lowcore.guest_timer;
 	system = S390_lowcore.system_timer - tsk->thread.system_timer;
-	S390_lowcore.steal_timer -= system;
 	tsk->thread.system_timer = S390_lowcore.system_timer;
+	hardirq = S390_lowcore.hardirq_timer - tsk->thread.hardirq_timer;
+	tsk->thread.hardirq_timer = S390_lowcore.hardirq_timer;
+	softirq = S390_lowcore.softirq_timer - tsk->thread.softirq_timer;
+	tsk->thread.softirq_timer = S390_lowcore.softirq_timer;
+	S390_lowcore.steal_timer +=
+		clock - user - guest - system - hardirq - softirq;
 
-	user_scaled = user;
-	system_scaled = system;
-	/* Do MT utilization scaling */
-	if (smp_cpu_mtid) {
-		u64 mult = __this_cpu_read(mt_scaling_mult);
-		u64 div = __this_cpu_read(mt_scaling_div);
+	/* Push account value */
+	if (user) {
+		account_user_time(tsk, user);
+		tsk->utimescaled += scale_vtime(user);
+	}
 
-		user_scaled = (user_scaled * mult) / div;
-		system_scaled = (system_scaled * mult) / div;
+	if (guest) {
+		account_guest_time(tsk, guest);
+		tsk->utimescaled += scale_vtime(guest);
 	}
-	account_user_time(tsk, user);
-	tsk->utimescaled += user_scaled;
-	account_system_time(tsk, hardirq_offset, system);
-	tsk->stimescaled += system_scaled;
+
+	if (system)
+		account_system_index_scaled(tsk, system, scale_vtime(system),
+					    CPUTIME_SYSTEM);
+	if (hardirq)
+		account_system_index_scaled(tsk, hardirq, scale_vtime(hardirq),
+					    CPUTIME_IRQ);
+	if (softirq)
+		account_system_index_scaled(tsk, softirq, scale_vtime(softirq),
+					    CPUTIME_SOFTIRQ);
 
 	steal = S390_lowcore.steal_timer;
 	if ((s64) steal > 0) {
@@ -147,16 +177,22 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
 		account_steal_time(steal);
 	}
 
-	return virt_timer_forward(user + system);
+	return virt_timer_forward(user + guest + system + hardirq + softirq);
 }
 
 void vtime_task_switch(struct task_struct *prev)
 {
 	do_account_vtime(prev, 0);
 	prev->thread.user_timer = S390_lowcore.user_timer;
+	prev->thread.guest_timer = S390_lowcore.guest_timer;
 	prev->thread.system_timer = S390_lowcore.system_timer;
+	prev->thread.hardirq_timer = S390_lowcore.hardirq_timer;
+	prev->thread.softirq_timer = S390_lowcore.softirq_timer;
 	S390_lowcore.user_timer = current->thread.user_timer;
+	S390_lowcore.guest_timer = current->thread.guest_timer;
 	S390_lowcore.system_timer = current->thread.system_timer;
+	S390_lowcore.hardirq_timer = current->thread.hardirq_timer;
+	S390_lowcore.softirq_timer = current->thread.softirq_timer;
 }
 
 /*
@@ -176,32 +212,22 @@ void vtime_account_user(struct task_struct *tsk)
  */
 void vtime_account_irq_enter(struct task_struct *tsk)
 {
-	u64 timer, system, system_scaled;
+	u64 timer;
 
 	timer = S390_lowcore.last_update_timer;
 	S390_lowcore.last_update_timer = get_vtimer();
-	S390_lowcore.system_timer += timer - S390_lowcore.last_update_timer;
-
-	/* Update MT utilization calculation */
-	if (smp_cpu_mtid &&
-	    time_after64(jiffies_64, this_cpu_read(mt_scaling_jiffies)))
-		update_mt_scaling();
-
-	system = S390_lowcore.system_timer - tsk->thread.system_timer;
-	S390_lowcore.steal_timer -= system;
-	tsk->thread.system_timer = S390_lowcore.system_timer;
-	system_scaled = system;
-	/* Do MT utilization scaling */
-	if (smp_cpu_mtid) {
-		u64 mult = __this_cpu_read(mt_scaling_mult);
-		u64 div = __this_cpu_read(mt_scaling_div);
-
-		system_scaled = (system_scaled * mult) / div;
-	}
-	account_system_time(tsk, 0, system);
-	tsk->stimescaled += system_scaled;
-
-	virt_timer_forward(system);
+	timer -= S390_lowcore.last_update_timer;
+
+	if ((tsk->flags & PF_VCPU) && (irq_count() == 0))
+		S390_lowcore.guest_timer += timer;
+	else if (hardirq_count())
+		S390_lowcore.hardirq_timer += timer;
+	else if (in_serving_softirq())
+		S390_lowcore.softirq_timer += timer;
+	else
+		S390_lowcore.system_timer += timer;
+
+	virt_timer_forward(timer);
 }
 EXPORT_SYMBOL_GPL(vtime_account_irq_enter);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2017-01-14 11:14 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-05 17:11 [PATCH 00/10] vtime: Delay cputime accounting to tick / context switch Frederic Weisbecker
2017-01-05 17:11 ` [PATCH 01/10] powerpc32: Fix stale scaled stime on " Frederic Weisbecker
2017-01-14 10:00   ` [tip:sched/core] sched/cputime, " tip-bot for Frederic Weisbecker
2017-01-05 17:11 ` [PATCH 02/10] ia64: Fix wrong start cputime assignment on task switch Frederic Weisbecker
2017-01-14 10:00   ` [tip:sched/core] sched/cputime, ia64: Fix incorrect " tip-bot for Frederic Weisbecker
2017-01-05 17:11 ` [PATCH 03/10] cputime: Allow accounting system time using cpustat index Frederic Weisbecker
2017-01-14 10:01   ` [tip:sched/core] sched/cputime: " tip-bot for Frederic Weisbecker
2017-01-05 17:11 ` [PATCH 04/10] cputime: Export account_guest_time Frederic Weisbecker
2017-01-14 10:01   ` [tip:sched/core] sched/cputime: Export account_guest_time() tip-bot for Frederic Weisbecker
2017-01-05 17:11 ` [PATCH 05/10] powerpc: Prepare accounting structure for cputime flush on tick Frederic Weisbecker
2017-01-14 10:02   ` [tip:sched/core] sched/cputime, " tip-bot for Frederic Weisbecker
2017-01-05 17:11 ` [PATCH 06/10] powerpc: Migrate stolen_time field to accounting structure Frederic Weisbecker
2017-01-14 10:03   ` [tip:sched/core] sched/cputime, powerpc: Migrate stolen_time field to the " tip-bot for Frederic Weisbecker
2017-01-05 17:11 ` [PATCH 07/10] powerpc/vtime: Accumulate cputime and account only on tick/task switch Frederic Weisbecker
2017-01-14 10:03   ` [tip:sched/core] sched/cputime, " tip-bot for Frederic Weisbecker
2017-01-05 17:11 ` [PATCH 08/10] ia64: " Frederic Weisbecker
2017-01-14 10:04   ` [tip:sched/core] sched/cputime, " tip-bot for Frederic Weisbecker
2017-01-05 17:11 ` [PATCH 09/10] s390/cputime: delayed accounting of system time Frederic Weisbecker
2017-01-14 10:04   ` [tip:sched/core] sched/cputime, s390: Implement " tip-bot for Martin Schwidefsky
2017-01-05 17:11 ` [PATCH 10/10] vtime: Rename vtime_account_user() to vtime_flush() Frederic Weisbecker
2017-01-14 10:05   ` [tip:sched/core] sched/cputime: " tip-bot for Frederic Weisbecker
2017-01-09  8:13 ` [PATCH 00/10] vtime: Delay cputime accounting to tick / context switch Martin Schwidefsky
2017-01-10 11:45 ` Thomas Gleixner
2017-01-10 15:21   ` Frederic Weisbecker
  -- strict thread matches above, loose matches on Subject: below --
2016-12-06  2:32 [PATCH 00/10] vtime: Delay cputime accounting to tick Frederic Weisbecker
2016-12-06  2:32 ` [PATCH 09/10] s390/cputime: delayed accounting of system time Frederic Weisbecker
2016-12-10  1:48   ` Frederic Weisbecker
2016-12-12 10:27     ` Martin Schwidefsky
2016-12-12 15:02       ` Frederic Weisbecker
2016-12-13 11:13         ` Martin Schwidefsky
2016-12-13 13:21           ` Martin Schwidefsky
2016-12-14  1:44             ` Frederic Weisbecker
2016-12-20 14:13               ` Martin Schwidefsky
2016-12-20 14:30                 ` Frederic Weisbecker
2016-12-13 14:38           ` Frederic Weisbecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).