linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Tony Luck <tony.luck@intel.com>,
	Wanpeng Li <wanpeng.li@hotmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Paul Mackerras <paulus@samba.org>, Ingo Molnar <mingo@kernel.org>,
	Fenghua Yu <fenghua.yu@intel.com>, Rik van Riel <riel@redhat.com>,
	Stanislaw Gruszka <sgruszka@redhat.com>
Subject: Re: [PATCH 09/10] s390/cputime: delayed accounting of system time
Date: Tue, 13 Dec 2016 14:21:21 +0100	[thread overview]
Message-ID: <20161213142121.382ce2ef@mschwideX1> (raw)
In-Reply-To: <20161213121322.5fdbbb28@mschwideX1>

On Tue, 13 Dec 2016 12:13:22 +0100
Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:

> On Mon, 12 Dec 2016 16:02:30 +0100
> Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > On Mon, Dec 12, 2016 at 11:27:54AM +0100, Martin Schwidefsky wrote:  
> > > 3) The call to vtime_flush in account_process_tick is done in irq context from
> > >    update_process_times. hardirq_offset==1 is also correct.    
> > 
> > Let's see this for example:
> > 
> > +       if ((tsk->flags & PF_VCPU) && (irq_count() - hardirq_offset == 0))
> > +               S390_lowcore.guest_timer += timer;
> > 
> > If the tick is interrupting guest, we have accounted the guest time on tick IRQ entry.
> > Now we are in the middle of the tick interrupt and since hardirq_offset is 1, we
> > are taking the above path by accounting half of the tick-IRQ time as guest, which is wrong,
> > it's actually IRQ time.  
> 
> Hmm, you got me there. The system time from irq_enter until account_process_tick
> is reached is indeed IRQ time. It is not much but it is incorrect. The best fix
> would be to rip out the accounting of the system time from account_process_tick
> as irq_enter / irq_exit will do system time accounting anyway. To do that
> do_account_vtime needs to be split, because for the task switch we need to
> account the system time of the previous task.

New patch for the delayed cputime account. I can not just rip out system time
accounting from account_process_tick after all, I need a sync point for the
steal time calculation. It basically is the same patch as before but with a new
helper update_tsk_timer, the removal of hardirq_offset and a simplification
for do_account_vtime: the last accounting delta is either hardirq time for
the tick or system time for the task switch.

Keeping my finger crossed..

--
>From 2d7ca61909c438f947d27110b7b2a2b67a44607a Mon Sep 17 00:00:00 2001
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
Date: Sat, 3 Dec 2016 15:56:50 +0100
Subject: [PATCH 1/2] s390/cputime: delayed accounting of system time

The account_system_time() function is called with a cputime that
occurred while running in the kernel. The function detects which
context the CPU is currently running in and accounts the time to
the correct bucket. This forces the arch code to account the
cputime for hardirq and softirq immediately.

Such accounting function can be costly and perform unwelcome divisions
and multiplications, among others.

The arch code can delay the accounting for system time. For s390
the accounting is done once per timer tick and for each task switch.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
[rebase against latest cputime tree, massaged changelog accordingly]
---
 arch/s390/include/asm/lowcore.h   |  65 +++++++++---------
 arch/s390/include/asm/processor.h |   3 +
 arch/s390/kernel/vtime.c          | 134 +++++++++++++++++++++++---------------
 3 files changed, 120 insertions(+), 82 deletions(-)

diff --git a/arch/s390/include/asm/lowcore.h b/arch/s390/include/asm/lowcore.h
index 9bfad2a..61261e0 100644
--- a/arch/s390/include/asm/lowcore.h
+++ b/arch/s390/include/asm/lowcore.h
@@ -85,53 +85,56 @@ struct lowcore {
 	__u64	mcck_enter_timer;		/* 0x02c0 */
 	__u64	exit_timer;			/* 0x02c8 */
 	__u64	user_timer;			/* 0x02d0 */
-	__u64	system_timer;			/* 0x02d8 */
-	__u64	steal_timer;			/* 0x02e0 */
-	__u64	last_update_timer;		/* 0x02e8 */
-	__u64	last_update_clock;		/* 0x02f0 */
-	__u64	int_clock;			/* 0x02f8 */
-	__u64	mcck_clock;			/* 0x0300 */
-	__u64	clock_comparator;		/* 0x0308 */
+	__u64	guest_timer;			/* 0x02d8 */
+	__u64	system_timer;			/* 0x02e0 */
+	__u64	hardirq_timer;			/* 0x02e8 */
+	__u64	softirq_timer;			/* 0x02f0 */
+	__u64	steal_timer;			/* 0x02f8 */
+	__u64	last_update_timer;		/* 0x0300 */
+	__u64	last_update_clock;		/* 0x0308 */
+	__u64	int_clock;			/* 0x0310 */
+	__u64	mcck_clock;			/* 0x0318 */
+	__u64	clock_comparator;		/* 0x0320 */
 
 	/* Current process. */
-	__u64	current_task;			/* 0x0310 */
-	__u8	pad_0x318[0x320-0x318];		/* 0x0318 */
-	__u64	kernel_stack;			/* 0x0320 */
+	__u64	current_task;			/* 0x0328 */
+	__u8	pad_0x318[0x320-0x318];		/* 0x0330 */
+	__u64	kernel_stack;			/* 0x0338 */
 
 	/* Interrupt, panic and restart stack. */
-	__u64	async_stack;			/* 0x0328 */
-	__u64	panic_stack;			/* 0x0330 */
-	__u64	restart_stack;			/* 0x0338 */
+	__u64	async_stack;			/* 0x0340 */
+	__u64	panic_stack;			/* 0x0348 */
+	__u64	restart_stack;			/* 0x0350 */
 
 	/* Restart function and parameter. */
-	__u64	restart_fn;			/* 0x0340 */
-	__u64	restart_data;			/* 0x0348 */
-	__u64	restart_source;			/* 0x0350 */
+	__u64	restart_fn;			/* 0x0358 */
+	__u64	restart_data;			/* 0x0360 */
+	__u64	restart_source;			/* 0x0368 */
 
 	/* Address space pointer. */
-	__u64	kernel_asce;			/* 0x0358 */
-	__u64	user_asce;			/* 0x0360 */
+	__u64	kernel_asce;			/* 0x0370 */
+	__u64	user_asce;			/* 0x0378 */
 
 	/*
 	 * The lpp and current_pid fields form a
 	 * 64-bit value that is set as program
 	 * parameter with the LPP instruction.
 	 */
-	__u32	lpp;				/* 0x0368 */
-	__u32	current_pid;			/* 0x036c */
+	__u32	lpp;				/* 0x0380 */
+	__u32	current_pid;			/* 0x0384 */
 
 	/* SMP info area */
-	__u32	cpu_nr;				/* 0x0370 */
-	__u32	softirq_pending;		/* 0x0374 */
-	__u64	percpu_offset;			/* 0x0378 */
-	__u64	vdso_per_cpu_data;		/* 0x0380 */
-	__u64	machine_flags;			/* 0x0388 */
-	__u32	preempt_count;			/* 0x0390 */
-	__u8	pad_0x0394[0x0398-0x0394];	/* 0x0394 */
-	__u64	gmap;				/* 0x0398 */
-	__u32	spinlock_lockval;		/* 0x03a0 */
-	__u32	fpu_flags;			/* 0x03a4 */
-	__u8	pad_0x03a8[0x0400-0x03a8];	/* 0x03a8 */
+	__u32	cpu_nr;				/* 0x0388 */
+	__u32	softirq_pending;		/* 0x038c */
+	__u64	percpu_offset;			/* 0x0390 */
+	__u64	vdso_per_cpu_data;		/* 0x0398 */
+	__u64	machine_flags;			/* 0x03a0 */
+	__u32	preempt_count;			/* 0x03a8 */
+	__u8	pad_0x03ac[0x03b0-0x03ac];	/* 0x03ac */
+	__u64	gmap;				/* 0x03b0 */
+	__u32	spinlock_lockval;		/* 0x03b8 */
+	__u32	fpu_flags;			/* 0x03bc */
+	__u8	pad_0x03c0[0x0400-0x03c0];	/* 0x03c0 */
 
 	/* Per cpu primary space access list */
 	__u32	paste[16];			/* 0x0400 */
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 9c00351..6f07907 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -111,7 +111,10 @@ struct thread_struct {
 	unsigned int  acrs[NUM_ACRS];
         unsigned long ksp;              /* kernel stack pointer             */
 	unsigned long user_timer;	/* task cputime in user space */
+	unsigned long guest_timer;	/* task cputime in kvm guest */
 	unsigned long system_timer;	/* task cputime in kernel space */
+	unsigned long hardirq_timer;	/* task cputime in hardirq context */
+	unsigned long softirq_timer;	/* task cputime in softirq context */
 	unsigned long sys_call_table;	/* system call table address */
 	mm_segment_t mm_segment;
 	unsigned long gmap_addr;	/* address of last gmap fault. */
diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c
index 6b246aa..65dc7b6 100644
--- a/arch/s390/kernel/vtime.c
+++ b/arch/s390/kernel/vtime.c
@@ -90,14 +90,33 @@ static void update_mt_scaling(void)
 	__this_cpu_write(mt_scaling_jiffies, jiffies_64);
 }
 
+static inline u64 update_tsk_timer(unsigned long *tsk_vtime, u64 new)
+{
+	u64 delta;
+
+	delta = new - *tsk_vtime;
+	*tsk_vtime = new;
+	return delta;
+}
+
+
+static inline u64 scale_vtime(u64 vtime)
+{
+	u64 mult = __this_cpu_read(mt_scaling_mult);
+	u64 div = __this_cpu_read(mt_scaling_div);
+
+	if (smp_cpu_mtid)
+		return vtime * mult / div;
+	return vtime;
+}
+
 /*
  * Update process times based on virtual cpu times stored by entry.S
  * to the lowcore fields user_timer, system_timer & steal_clock.
  */
-static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
+static int do_account_vtime(struct task_struct *tsk)
 {
-	u64 timer, clock, user, system, steal;
-	u64 user_scaled, system_scaled;
+	u64 timer, clock, user, guest, system, hardirq, softirq, steal;
 
 	timer = S390_lowcore.last_update_timer;
 	clock = S390_lowcore.last_update_clock;
@@ -110,36 +129,53 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
 #endif
 		: "=m" (S390_lowcore.last_update_timer),
 		  "=m" (S390_lowcore.last_update_clock));
-	S390_lowcore.system_timer += timer - S390_lowcore.last_update_timer;
-	S390_lowcore.steal_timer += S390_lowcore.last_update_clock - clock;
+	clock = S390_lowcore.last_update_clock - clock;
+	timer -= S390_lowcore.last_update_timer;
+
+	if (hardirq_count())
+		S390_lowcore.hardirq_timer += timer;
+	else
+		S390_lowcore.system_timer += timer;
 
 	/* Update MT utilization calculation */
 	if (smp_cpu_mtid &&
 	    time_after64(jiffies_64, this_cpu_read(mt_scaling_jiffies)))
 		update_mt_scaling();
 
-	user = S390_lowcore.user_timer - tsk->thread.user_timer;
-	S390_lowcore.steal_timer -= user;
-	tsk->thread.user_timer = S390_lowcore.user_timer;
-
-	system = S390_lowcore.system_timer - tsk->thread.system_timer;
-	S390_lowcore.steal_timer -= system;
-	tsk->thread.system_timer = S390_lowcore.system_timer;
-
-	user_scaled = user;
-	system_scaled = system;
-	/* Do MT utilization scaling */
-	if (smp_cpu_mtid) {
-		u64 mult = __this_cpu_read(mt_scaling_mult);
-		u64 div = __this_cpu_read(mt_scaling_div);
+	/* Calculate cputime delta */
+	user = update_tsk_timer(&tsk->thread.user_timer,
+				READ_ONCE(S390_lowcore.user_timer));
+	guest = update_tsk_timer(&tsk->thread.guest_timer,
+				 READ_ONCE(S390_lowcore.guest_timer));
+	system = update_tsk_timer(&tsk->thread.system_timer,
+				  READ_ONCE(S390_lowcore.system_timer));
+	hardirq = update_tsk_timer(&tsk->thread.hardirq_timer,
+				   READ_ONCE(S390_lowcore.hardirq_timer));
+	softirq = update_tsk_timer(&tsk->thread.softirq_timer,
+				   READ_ONCE(S390_lowcore.softirq_timer));
+	S390_lowcore.steal_timer +=
+		clock - user - guest - system - hardirq - softirq;
+
+	/* Push account value */
+	if (user) {
+		account_user_time(tsk, user);
+		tsk->utimescaled += scale_vtime(user);
+	}
 
-		user_scaled = (user_scaled * mult) / div;
-		system_scaled = (system_scaled * mult) / div;
+	if (guest) {
+		account_guest_time(tsk, guest);
+		tsk->utimescaled += scale_vtime(guest);
 	}
-	account_user_time(tsk, user);
-	tsk->utimescaled += user_scaled;
-	account_system_time(tsk, hardirq_offset, system);
-	tsk->stimescaled += system_scaled;
+
+	if (system)
+		account_system_index_scaled(tsk, system, scale_vtime(system),
+					    CPUTIME_SYSTEM);
+	if (hardirq)
+		account_system_index_scaled(tsk, hardirq, scale_vtime(hardirq),
+					    CPUTIME_IRQ);
+	if (softirq)
+		account_system_index_scaled(tsk, softirq, scale_vtime(softirq),
+					    CPUTIME_SOFTIRQ);
 
 	steal = S390_lowcore.steal_timer;
 	if ((s64) steal > 0) {
@@ -147,16 +183,22 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
 		account_steal_time(steal);
 	}
 
-	return virt_timer_forward(user + system);
+	return virt_timer_forward(user + guest + system + hardirq + softirq);
 }
 
 void vtime_task_switch(struct task_struct *prev)
 {
-	do_account_vtime(prev, 0);
+	do_account_vtime(prev);
 	prev->thread.user_timer = S390_lowcore.user_timer;
+	prev->thread.guest_timer = S390_lowcore.guest_timer;
 	prev->thread.system_timer = S390_lowcore.system_timer;
+	prev->thread.hardirq_timer = S390_lowcore.hardirq_timer;
+	prev->thread.softirq_timer = S390_lowcore.softirq_timer;
 	S390_lowcore.user_timer = current->thread.user_timer;
+	S390_lowcore.guest_timer = current->thread.guest_timer;
 	S390_lowcore.system_timer = current->thread.system_timer;
+	S390_lowcore.hardirq_timer = current->thread.hardirq_timer;
+	S390_lowcore.softirq_timer = current->thread.softirq_timer;
 }
 
 /*
@@ -166,7 +208,7 @@ void vtime_task_switch(struct task_struct *prev)
  */
 void vtime_account_user(struct task_struct *tsk)
 {
-	if (do_account_vtime(tsk, HARDIRQ_OFFSET))
+	if (do_account_vtime(tsk))
 		virt_timer_expire();
 }
 
@@ -176,32 +218,22 @@ void vtime_account_user(struct task_struct *tsk)
  */
 void vtime_account_irq_enter(struct task_struct *tsk)
 {
-	u64 timer, system, system_scaled;
+	u64 timer;
 
 	timer = S390_lowcore.last_update_timer;
 	S390_lowcore.last_update_timer = get_vtimer();
-	S390_lowcore.system_timer += timer - S390_lowcore.last_update_timer;
-
-	/* Update MT utilization calculation */
-	if (smp_cpu_mtid &&
-	    time_after64(jiffies_64, this_cpu_read(mt_scaling_jiffies)))
-		update_mt_scaling();
-
-	system = S390_lowcore.system_timer - tsk->thread.system_timer;
-	S390_lowcore.steal_timer -= system;
-	tsk->thread.system_timer = S390_lowcore.system_timer;
-	system_scaled = system;
-	/* Do MT utilization scaling */
-	if (smp_cpu_mtid) {
-		u64 mult = __this_cpu_read(mt_scaling_mult);
-		u64 div = __this_cpu_read(mt_scaling_div);
-
-		system_scaled = (system_scaled * mult) / div;
-	}
-	account_system_time(tsk, 0, system);
-	tsk->stimescaled += system_scaled;
-
-	virt_timer_forward(system);
+	timer -= S390_lowcore.last_update_timer;
+
+	if ((tsk->flags & PF_VCPU) && (irq_count() == 0))
+		S390_lowcore.guest_timer += timer;
+	else if (hardirq_count())
+		S390_lowcore.hardirq_timer += timer;
+	else if (in_serving_softirq())
+		S390_lowcore.softirq_timer += timer;
+	else
+		S390_lowcore.system_timer += timer;
+
+	virt_timer_forward(timer);
 }
 EXPORT_SYMBOL_GPL(vtime_account_irq_enter);
 
-- 
2.8.4


-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

  reply	other threads:[~2016-12-13 13:21 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-06  2:32 [PATCH 00/10] vtime: Delay cputime accounting to tick Frederic Weisbecker
2016-12-06  2:32 ` [FIX][PATCH 01/10] powerpc32: Fix stale scaled stime on context switch Frederic Weisbecker
2016-12-06  2:32 ` [FIX][PATCH 02/10] ia64: Fix wrong start cputime assignment on task switch Frederic Weisbecker
2016-12-06  2:32 ` [PATCH 03/10] cputime: Allow accounting system time using cpustat index Frederic Weisbecker
2016-12-06  2:32 ` [PATCH 04/10] cputime: Export account_guest_time Frederic Weisbecker
2016-12-06  2:32 ` [PATCH 05/10] powerpc: Prepare accounting structure for cputime flush on tick Frederic Weisbecker
2016-12-06  2:32 ` [PATCH 06/10] powerpc: Migrate stolen_time field to accounting structure Frederic Weisbecker
2016-12-06  2:32 ` [PATCH 07/10] powerpc/vtime: Accumulate cputime and account only on tick/task switch Frederic Weisbecker
2016-12-06  2:32 ` [PATCH 08/10] ia64: " Frederic Weisbecker
2016-12-06  2:32 ` [PATCH 09/10] s390/cputime: delayed accounting of system time Frederic Weisbecker
2016-12-10  1:48   ` Frederic Weisbecker
2016-12-12 10:27     ` Martin Schwidefsky
2016-12-12 15:02       ` Frederic Weisbecker
2016-12-13 11:13         ` Martin Schwidefsky
2016-12-13 13:21           ` Martin Schwidefsky [this message]
2016-12-14  1:44             ` Frederic Weisbecker
2016-12-20 14:13               ` Martin Schwidefsky
2016-12-20 14:30                 ` Frederic Weisbecker
2016-12-13 14:38           ` Frederic Weisbecker
2016-12-06  2:32 ` [PATCH 10/10] vtime: Rename vtime_account_user() to vtime_flush() Frederic Weisbecker
2016-12-06  4:20 ` [PATCH 00/10] vtime: Delay cputime accounting to tick Paul Mackerras
2016-12-06  7:04   ` Martin Schwidefsky
2016-12-06 14:34   ` Frederic Weisbecker
2016-12-06  8:40 ` Christian Borntraeger
2017-01-05 17:11 [PATCH 00/10] vtime: Delay cputime accounting to tick / context switch Frederic Weisbecker
2017-01-05 17:11 ` [PATCH 09/10] s390/cputime: delayed accounting of system time Frederic Weisbecker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161213142121.382ce2ef@mschwideX1 \
    --to=schwidefsky@de.ibm.com \
    --cc=benh@kernel.crashing.org \
    --cc=fenghua.yu@intel.com \
    --cc=fweisbec@gmail.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=paulus@samba.org \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=sgruszka@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=wanpeng.li@hotmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).