linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2] sched,time: reduce nohz_full syscall overhead 40%
@ 2016-01-30  3:36 riel
  2016-01-30  3:36 ` [PATCH 1/4] sched,time: remove non-power-of-two divides from __acct_update_integrals riel
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: riel @ 2016-01-30  3:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: tglx, mingo, luto, fweisbec, peterz, clark

Running with nohz_full introduces a fair amount of overhead.
Specifically, various things that are usually done from the
timer interrupt are now done at syscall, irq, and guest
entry and exit times.

However, some of the code that is called every single time
has only ever worked at jiffy resolution. The code in
__acct_update_integrals was also doing some unnecessary
calculations.

Getting rid of the unnecessary calculations, without
changing any of the functionality in __acct_update_integrals
gets us about an 11% win.

Not calling the time statistics updating code more than
once per jiffy, as is done on housekeeping CPUs and on
all the CPUs of a non-nohz_full system, shaves off a
further 30%.

I tested this series with a microbenchmark calling
an invalid syscall number ten million times in a row,
on a nohz_full cpu.

Run times for the microbenchmark:

4.4				3.8 seconds
4.5-rc1				3.7 seconds
4.5-rc1 + first patch		3.3 seconds
4.5-rc1 + first 3 patches	3.1 seconds
4.5-rc1 + all patches		2.3 seconds
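
For reference, the microbenchmark described above could look roughly like the sketch below. The actual benchmark source is not part of this posting, so the helper names `run_invalid_syscalls` and `time_invalid_syscalls` are hypothetical, and using -1 as the invalid syscall number is an assumption.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: issue an invalid syscall `iterations` times and
 * return how many of the calls failed. On an unfiltered kernel every
 * call is expected to fail with ENOSYS.
 */
long run_invalid_syscalls(long iterations)
{
	long failed = 0;
	long i;

	assert(iterations >= 0);
	for (i = 0; i < iterations; i++) {
		/* Assumption: -1 is never a valid syscall number. */
		if (syscall(-1) == -1)
			failed++;
	}
	return failed;
}

/* Hypothetical helper: wall-clock seconds spent in the loop above. */
double time_invalid_syscalls(long iterations)
{
	struct timespec start, end;

	clock_gettime(CLOCK_MONOTONIC, &start);
	run_invalid_syscalls(iterations);
	clock_gettime(CLOCK_MONOTONIC, &end);

	return (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling `time_invalid_syscalls(10000000)` from a `main` that is pinned to a nohz_full CPU (for example with `taskset -c <cpu>`) would approximate the setup behind the numbers above.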


* [PATCH 1/4] sched,time: remove non-power-of-two divides from __acct_update_integrals
  2016-01-30  3:36 [PATCH 0/2] sched,time: reduce nohz_full syscall overhead 40% riel
@ 2016-01-30  3:36 ` riel
  2016-01-30  4:56   ` kbuild test robot
  2016-01-30 14:44   ` Frederic Weisbecker
  2016-01-30  3:36 ` [PATCH 2/4] acct,time: change indentation in __acct_update_integrals riel
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 16+ messages in thread
From: riel @ 2016-01-30  3:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: tglx, mingo, luto, fweisbec, peterz, clark

From: Rik van Riel <riel@redhat.com>

When running a microbenchmark calling an invalid syscall number
in a loop, on a nohz_full CPU, we spend a full 9% of our CPU
time in __acct_update_integrals.

This function converts cputime_t to jiffies, to a timeval, only to
convert the timeval back to microseconds before discarding it.

This patch leaves __acct_update_integrals functionally equivalent,
but speeds things up by about 12%, with 10 million calls to an
invalid syscall number dropping from 3.7 to 3.25 seconds.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 kernel/tsacct.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/kernel/tsacct.c b/kernel/tsacct.c
index 975cb49e32bf..41667b23dbd0 100644
--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -93,9 +93,9 @@ void xacct_add_tsk(struct taskstats *stats, struct task_struct *p)
 {
 	struct mm_struct *mm;
 
-	/* convert pages-usec to Mbyte-usec */
-	stats->coremem = p->acct_rss_mem1 * PAGE_SIZE / MB;
-	stats->virtmem = p->acct_vm_mem1 * PAGE_SIZE / MB;
+	/* convert pages-nsec/KB to Mbyte-usec, see __acct_update_integrals */
+	stats->coremem = p->acct_rss_mem1 * PAGE_SIZE / (1000 * KB);
+	stats->virtmem = p->acct_vm_mem1 * PAGE_SIZE / (1000 * KB);
 	mm = get_task_mm(p);
 	if (mm) {
 		/* adjust to KB unit */
@@ -125,22 +125,21 @@ static void __acct_update_integrals(struct task_struct *tsk,
 {
 	if (likely(tsk->mm)) {
 		cputime_t time, dtime;
-		struct timeval value;
 		unsigned long flags;
 		u64 delta;
 
 		local_irq_save(flags);
 		time = stime + utime;
 		dtime = time - tsk->acct_timexpd;
-		jiffies_to_timeval(cputime_to_jiffies(dtime), &value);
-		delta = value.tv_sec;
-		delta = delta * USEC_PER_SEC + value.tv_usec;
+		delta = cputime_to_nsecs(dtime);
 
-		if (delta == 0)
+		if (delta < TICK_NSEC)
 			goto out;
+
 		tsk->acct_timexpd = time;
-		tsk->acct_rss_mem1 += delta * get_mm_rss(tsk->mm);
-		tsk->acct_vm_mem1 += delta * tsk->mm->total_vm;
+		/* The final unit will be Mbyte-usecs, see xacct_add_tsk */
+		tsk->acct_rss_mem1 += delta * get_mm_rss(tsk->mm) / 1024;
+		tsk->acct_vm_mem1 += delta * tsk->mm->total_vm / 1024;
 	out:
 		local_irq_restore(flags);
 	}
-- 
2.5.0


* [PATCH 2/4] acct,time: change indentation in __acct_update_integrals
  2016-01-30  3:36 [PATCH 0/2] sched,time: reduce nohz_full syscall overhead 40% riel
  2016-01-30  3:36 ` [PATCH 1/4] sched,time: remove non-power-of-two divides from __acct_update_integrals riel
@ 2016-01-30  3:36 ` riel
  2016-01-30 16:15   ` Frederic Weisbecker
  2016-01-30  3:36 ` [PATCH 3/4] time,acct: drop irq save & restore from __acct_update_integrals riel
  2016-01-30  3:36 ` [PATCH 4/4] sched,time: only call account_{user,sys,guest,idle}_time once a jiffy riel
  3 siblings, 1 reply; 16+ messages in thread
From: riel @ 2016-01-30  3:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: tglx, mingo, luto, fweisbec, peterz, clark

From: Rik van Riel <riel@redhat.com>

Change the indentation in __acct_update_integrals to make the function
a little easier to read.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Rik van Riel <riel@redhat.com>
---
 kernel/tsacct.c | 41 +++++++++++++++++++++--------------------
 1 file changed, 21 insertions(+), 20 deletions(-)

diff --git a/kernel/tsacct.c b/kernel/tsacct.c
index 41667b23dbd0..8908f8b1d26e 100644
--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -123,26 +123,27 @@ void xacct_add_tsk(struct taskstats *stats, struct task_struct *p)
 static void __acct_update_integrals(struct task_struct *tsk,
 				    cputime_t utime, cputime_t stime)
 {
-	if (likely(tsk->mm)) {
-		cputime_t time, dtime;
-		unsigned long flags;
-		u64 delta;
-
-		local_irq_save(flags);
-		time = stime + utime;
-		dtime = time - tsk->acct_timexpd;
-		delta = cputime_to_nsecs(dtime);
-
-		if (delta < TICK_NSEC)
-			goto out;
-
-		tsk->acct_timexpd = time;
-		/* The final unit will be Mbyte-usecs, see xacct_add_tsk */
-		tsk->acct_rss_mem1 += delta * get_mm_rss(tsk->mm) / 1024;
-		tsk->acct_vm_mem1 += delta * tsk->mm->total_vm / 1024;
-	out:
-		local_irq_restore(flags);
-	}
+	cputime_t time, dtime;
+	unsigned long flags;
+	u64 delta;
+
+	if (unlikely(!tsk->mm))
+		return;
+
+	local_irq_save(flags);
+	time = stime + utime;
+	dtime = time - tsk->acct_timexpd;
+	delta = cputime_to_nsecs(dtime);
+
+	if (delta < TICK_NSEC)
+		goto out;
+
+	tsk->acct_timexpd = time;
+	/* The final unit will be Mbyte-usecs, see xacct_add_tsk */
+	tsk->acct_rss_mem1 += delta * get_mm_rss(tsk->mm) / 1024;
+	tsk->acct_vm_mem1 += delta * tsk->mm->total_vm / 1024;
+out:
+	local_irq_restore(flags);
 }
 
 /**
-- 
2.5.0


* [PATCH 3/4] time,acct: drop irq save & restore from __acct_update_integrals
  2016-01-30  3:36 [PATCH 0/2] sched,time: reduce nohz_full syscall overhead 40% riel
  2016-01-30  3:36 ` [PATCH 1/4] sched,time: remove non-power-of-two divides from __acct_update_integrals riel
  2016-01-30  3:36 ` [PATCH 2/4] acct,time: change indentation in __acct_update_integrals riel
@ 2016-01-30  3:36 ` riel
  2016-01-30 16:24   ` Frederic Weisbecker
  2016-01-30  3:36 ` [PATCH 4/4] sched,time: only call account_{user,sys,guest,idle}_time once a jiffy riel
  3 siblings, 1 reply; 16+ messages in thread
From: riel @ 2016-01-30  3:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: tglx, mingo, luto, fweisbec, peterz, clark

From: Rik van Riel <riel@redhat.com>

It looks like all the call paths that lead to __acct_update_integrals
already have irqs disabled, and __acct_update_integrals does not need
to disable irqs itself.

This is very convenient since about half the CPU time left in this
function was spent in local_irq_save alone.

Performance of a microbenchmark that calls an invalid syscall
ten million times in a row on a nohz_full CPU improves 21% vs.
4.5-rc1 with both the removal of divisions from __acct_update_integrals
and this patch, with runtime dropping from 3.7 to 2.9 seconds.

With these patches applied, the highest remaining cpu user in
the trace is native_sched_clock, which is addressed in the next
patch.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Rik van Riel <riel@redhat.com>
---
 kernel/tsacct.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/kernel/tsacct.c b/kernel/tsacct.c
index 8908f8b1d26e..b2663d699a72 100644
--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -124,26 +124,22 @@ static void __acct_update_integrals(struct task_struct *tsk,
 				    cputime_t utime, cputime_t stime)
 {
 	cputime_t time, dtime;
-	unsigned long flags;
 	u64 delta;
 
 	if (unlikely(!tsk->mm))
 		return;
 
-	local_irq_save(flags);
 	time = stime + utime;
 	dtime = time - tsk->acct_timexpd;
 	delta = cputime_to_nsecs(dtime);
 
 	if (delta < TICK_NSEC)
-		goto out;
+		return;
 
 	tsk->acct_timexpd = time;
 	/* The final unit will be Mbyte-usecs, see xacct_add_tsk */
 	tsk->acct_rss_mem1 += delta * get_mm_rss(tsk->mm) / 1024;
 	tsk->acct_vm_mem1 += delta * tsk->mm->total_vm / 1024;
-out:
-	local_irq_restore(flags);
 }
 
 /**
-- 
2.5.0


* [PATCH 4/4] sched,time: only call account_{user,sys,guest,idle}_time once a jiffy
  2016-01-30  3:36 [PATCH 0/2] sched,time: reduce nohz_full syscall overhead 40% riel
                   ` (2 preceding siblings ...)
  2016-01-30  3:36 ` [PATCH 3/4] time,acct: drop irq save & restore from __acct_update_integrals riel
@ 2016-01-30  3:36 ` riel
  3 siblings, 0 replies; 16+ messages in thread
From: riel @ 2016-01-30  3:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: tglx, mingo, luto, fweisbec, peterz, clark

From: Rik van Riel <riel@redhat.com>

After removing __acct_update_integrals from the profile,
native_sched_clock remains as the top CPU user. This can be
reduced by only calling account_{user,sys,guest,idle}_time
once per jiffy for long running tasks on nohz_full CPUs.

This will reduce timing accuracy on nohz_full CPUs to jiffy
based sampling, just like on normal CPUs. It results in
totally removing native_sched_clock from the profile, and
significantly speeding up the syscall entry and exit path,
as well as irq entry and exit, and kvm guest entry & exit.

This code relies on another CPU advancing jiffies when the
system is busy. On a nohz_full system, this is done by a
housekeeping CPU.
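
The once-per-jiffy gating described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `fake_jiffies`, `struct fake_task`, `account_on_syscall` and `simulate` are hypothetical stand-ins, and the real code also has to deal with the seqcount and the sched_clock based delta.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's global tick counter. */
unsigned long fake_jiffies;

struct fake_task {
	unsigned long vtime_jiffies;	/* jiffy of the last accounting update */
	unsigned long updates;		/* how often the expensive path ran */
};

/* Returns true at most once per jiffy for a given task. */
bool jiffies_changed(struct fake_task *tsk, unsigned long now)
{
	assert(tsk != NULL);
	if (tsk->vtime_jiffies == now)
		return false;

	tsk->vtime_jiffies = now;
	return true;
}

/* Models a syscall entry: the expensive update runs only on a new jiffy. */
void account_on_syscall(struct fake_task *tsk)
{
	if (!jiffies_changed(tsk, fake_jiffies))
		return;

	tsk->updates++;		/* stands in for the clock read and accounting */
}

/* Run `calls_per_tick` syscalls in each of `ticks` jiffies; count updates. */
unsigned long simulate(unsigned long ticks, unsigned long calls_per_tick)
{
	struct fake_task tsk = { .vtime_jiffies = fake_jiffies, .updates = 0 };
	unsigned long t, c;

	for (t = 0; t < ticks; t++) {
		fake_jiffies++;		/* done by a housekeeping CPU in reality */
		for (c = 0; c < calls_per_tick; c++)
			account_on_syscall(&tsk);
	}
	return tsk.updates;
}
```

In this model the number of expensive updates tracks the number of elapsed jiffies rather than the number of syscalls, which is the effect the patch is after.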

A microbenchmark calling an invalid syscall number 10 million
times in a row speeds up an additional 30% over the numbers
with just the previous patches, for a total speedup of about
40% over 4.4 and 4.5-rc1.

Run times for the microbenchmark:

4.4				3.8 seconds
4.5-rc1				3.7 seconds
4.5-rc1 + first patch		3.3 seconds
4.5-rc1 + first 3 patches	3.1 seconds
4.5-rc1 + all patches		2.3 seconds

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 include/linux/sched.h  |  1 +
 kernel/sched/cputime.c | 35 +++++++++++++++++++++++++++++------
 2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index a10494a94cc3..019c3af98503 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1532,6 +1532,7 @@ struct task_struct {
 	struct prev_cputime prev_cputime;
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
 	seqcount_t vtime_seqcount;
+	unsigned long vtime_jiffies;
 	unsigned long long vtime_snap;
 	enum {
 		/* Task is sleeping or running in a CPU with VTIME inactive */
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index b2ab2ffb1adc..923c110319b1 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -668,6 +668,15 @@ void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime
 #endif /* !CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+static bool vtime_jiffies_changed(struct task_struct *tsk, unsigned long now)
+{
+	if (tsk->vtime_jiffies == now)
+		return false;
+
+	tsk->vtime_jiffies = now;
+	return true;
+}
+
 static unsigned long long vtime_delta(struct task_struct *tsk)
 {
 	unsigned long long clock;
@@ -699,6 +708,9 @@ static void __vtime_account_system(struct task_struct *tsk)
 
 void vtime_account_system(struct task_struct *tsk)
 {
+	if (!vtime_jiffies_changed(tsk, jiffies))
+		return;
+
 	write_seqcount_begin(&tsk->vtime_seqcount);
 	__vtime_account_system(tsk);
 	write_seqcount_end(&tsk->vtime_seqcount);
@@ -707,7 +719,8 @@ void vtime_account_system(struct task_struct *tsk)
 void vtime_gen_account_irq_exit(struct task_struct *tsk)
 {
 	write_seqcount_begin(&tsk->vtime_seqcount);
-	__vtime_account_system(tsk);
+	if (vtime_jiffies_changed(tsk, jiffies))
+		__vtime_account_system(tsk);
 	if (context_tracking_in_user())
 		tsk->vtime_snap_whence = VTIME_USER;
 	write_seqcount_end(&tsk->vtime_seqcount);
@@ -718,16 +731,19 @@ void vtime_account_user(struct task_struct *tsk)
 	cputime_t delta_cpu;
 
 	write_seqcount_begin(&tsk->vtime_seqcount);
-	delta_cpu = get_vtime_delta(tsk);
 	tsk->vtime_snap_whence = VTIME_SYS;
-	account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
+	if (vtime_jiffies_changed(tsk, jiffies)) {
+		delta_cpu = get_vtime_delta(tsk);
+		account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
+	}
 	write_seqcount_end(&tsk->vtime_seqcount);
 }
 
 void vtime_user_enter(struct task_struct *tsk)
 {
 	write_seqcount_begin(&tsk->vtime_seqcount);
-	__vtime_account_system(tsk);
+	if (vtime_jiffies_changed(tsk, jiffies))
+		__vtime_account_system(tsk);
 	tsk->vtime_snap_whence = VTIME_USER;
 	write_seqcount_end(&tsk->vtime_seqcount);
 }
@@ -742,7 +758,8 @@ void vtime_guest_enter(struct task_struct *tsk)
 	 * that can thus safely catch up with a tickless delta.
 	 */
 	write_seqcount_begin(&tsk->vtime_seqcount);
-	__vtime_account_system(tsk);
+	if (vtime_jiffies_changed(tsk, jiffies))
+		__vtime_account_system(tsk);
 	current->flags |= PF_VCPU;
 	write_seqcount_end(&tsk->vtime_seqcount);
 }
@@ -759,8 +776,12 @@ EXPORT_SYMBOL_GPL(vtime_guest_exit);
 
 void vtime_account_idle(struct task_struct *tsk)
 {
-	cputime_t delta_cpu = get_vtime_delta(tsk);
+	cputime_t delta_cpu;
+
+	if (!vtime_jiffies_changed(tsk, jiffies))
+		return;
 
+	delta_cpu = get_vtime_delta(tsk);
 	account_idle_time(delta_cpu);
 }
 
@@ -773,6 +794,7 @@ void arch_vtime_task_switch(struct task_struct *prev)
 	write_seqcount_begin(&current->vtime_seqcount);
 	current->vtime_snap_whence = VTIME_SYS;
 	current->vtime_snap = sched_clock_cpu(smp_processor_id());
+	current->vtime_jiffies = jiffies;
 	write_seqcount_end(&current->vtime_seqcount);
 }
 
@@ -784,6 +806,7 @@ void vtime_init_idle(struct task_struct *t, int cpu)
 	write_seqcount_begin(&t->vtime_seqcount);
 	t->vtime_snap_whence = VTIME_SYS;
 	t->vtime_snap = sched_clock_cpu(cpu);
+	t->vtime_jiffies = jiffies;
 	write_seqcount_end(&t->vtime_seqcount);
 	local_irq_restore(flags);
 }
-- 
2.5.0


* Re: [PATCH 1/4] sched,time: remove non-power-of-two divides from __acct_update_integrals
  2016-01-30  3:36 ` [PATCH 1/4] sched,time: remove non-power-of-two divides from __acct_update_integrals riel
@ 2016-01-30  4:56   ` kbuild test robot
  2016-01-30 14:44   ` Frederic Weisbecker
  1 sibling, 0 replies; 16+ messages in thread
From: kbuild test robot @ 2016-01-30  4:56 UTC (permalink / raw)
  To: riel; +Cc: kbuild-all, linux-kernel, tglx, mingo, luto, fweisbec, peterz, clark

[-- Attachment #1: Type: text/plain, Size: 905 bytes --]

Hi Rik,

[auto build test ERROR on tip/sched/core]
[also build test ERROR on v4.5-rc1 next-20160129]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/riel-redhat-com/sched-time-remove-non-power-of-two-divides-from-__acct_update_integrals/20160130-114019
config: i386-defconfig (attached as .config)
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   kernel/built-in.o: In function `xacct_add_tsk':
>> (.text+0x906b0): undefined reference to `__udivdi3'
   kernel/built-in.o: In function `xacct_add_tsk':
   (.text+0x906e3): undefined reference to `__udivdi3'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 23958 bytes --]


* Re: [PATCH 1/4] sched,time: remove non-power-of-two divides from __acct_update_integrals
  2016-01-30  3:36 ` [PATCH 1/4] sched,time: remove non-power-of-two divides from __acct_update_integrals riel
  2016-01-30  4:56   ` kbuild test robot
@ 2016-01-30 14:44   ` Frederic Weisbecker
  2016-01-30 17:53     ` Rik van Riel
  1 sibling, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2016-01-30 14:44 UTC (permalink / raw)
  To: riel; +Cc: linux-kernel, tglx, mingo, luto, peterz, clark

On Fri, Jan 29, 2016 at 10:36:02PM -0500, riel@redhat.com wrote:
> From: Rik van Riel <riel@redhat.com>
> 
> When running a microbenchmark calling an invalid syscall number
> in a loop, on a nohz_full CPU, we spend a full 9% of our CPU
> time in __acct_update_integrals.
> 
> This function converts cputime_t to jiffies, to a timeval, only to
> convert the timeval back to microseconds before discarding it.
> 
> This patch leaves __acct_update_integrals functionally equivalent,
> but speeds things up by about 12%, with 10 million calls to an
> invalid syscall number dropping from 3.7 to 3.25 seconds.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
>  kernel/tsacct.c | 19 +++++++++----------
>  1 file changed, 9 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/tsacct.c b/kernel/tsacct.c
> index 975cb49e32bf..41667b23dbd0 100644
> --- a/kernel/tsacct.c
> +++ b/kernel/tsacct.c
> @@ -93,9 +93,9 @@ void xacct_add_tsk(struct taskstats *stats, struct task_struct *p)
>  {
>  	struct mm_struct *mm;
>  
> -	/* convert pages-usec to Mbyte-usec */
> -	stats->coremem = p->acct_rss_mem1 * PAGE_SIZE / MB;
> -	stats->virtmem = p->acct_vm_mem1 * PAGE_SIZE / MB;
> +	/* convert pages-nsec/KB to Mbyte-usec, see __acct_update_integrals */
> +	stats->coremem = p->acct_rss_mem1 * PAGE_SIZE / (1000 * KB);
> +	stats->virtmem = p->acct_vm_mem1 * PAGE_SIZE / (1000 * KB);
>  	mm = get_task_mm(p);
>  	if (mm) {
>  		/* adjust to KB unit */
> @@ -125,22 +125,21 @@ static void __acct_update_integrals(struct task_struct *tsk,
>  {
>  	if (likely(tsk->mm)) {
>  		cputime_t time, dtime;
> -		struct timeval value;
>  		unsigned long flags;
>  		u64 delta;
>  
>  		local_irq_save(flags);
>  		time = stime + utime;
>  		dtime = time - tsk->acct_timexpd;
> -		jiffies_to_timeval(cputime_to_jiffies(dtime), &value);
> -		delta = value.tv_sec;
> -		delta = delta * USEC_PER_SEC + value.tv_usec;
> +		delta = cputime_to_nsecs(dtime);

You might want to add a comment specifying why we don't call cputime_to_usecs()
directly (because we optimize if delta < TICK_NSEC).

Although this has a good impact on nohz_full, it might have a tiny bad one on !nohz_full
because now we first convert jiffies to nsecs (which implies a multiplication by 1000)
that we later divide again by 1000. Now this is ok because I plan to convert tsk->utime/stime
to nsecs and thus remove most of the cputime_t use and conversions everywhere.
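
The `delta < TICK_NSEC` early exit discussed here can be modelled in userspace as follows. This is a sketch, not the kernel code: `struct acct_state`, `update_integral` and the `TICK_NSEC_ASSUMED` value (which assumes HZ=1000) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: HZ=1000, so one tick is one million nanoseconds. */
#define TICK_NSEC_ASSUMED 1000000ULL

struct acct_state {
	uint64_t acct_timexpd;	/* CPU time (ns) already folded into the integral */
	uint64_t rss_integral;	/* accumulated (ns * pages) >> 10 */
};

/* Returns 1 if the integral was updated, 0 if the call was skipped. */
int update_integral(struct acct_state *s, uint64_t cputime_ns,
		    uint64_t rss_pages)
{
	uint64_t delta;

	assert(s != NULL);
	delta = cputime_ns - s->acct_timexpd;

	/* Less than a tick since the last update: do nothing yet. */
	if (delta < TICK_NSEC_ASSUMED)
		return 0;

	s->acct_timexpd = cputime_ns;
	s->rss_integral += (delta * rss_pages) >> 10;
	return 1;
}
```

Because `acct_timexpd` is left untouched on the early return, the skipped time is not lost; it is folded in by the first later call that sees at least a full tick of accumulated CPU time.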

>  
> -		if (delta == 0)
> +		if (delta < TICK_NSEC)
>  			goto out;


> +
>  		tsk->acct_timexpd = time;
> -		tsk->acct_rss_mem1 += delta * get_mm_rss(tsk->mm);
> -		tsk->acct_vm_mem1 += delta * tsk->mm->total_vm;
> +		/* The final unit will be Mbyte-usecs, see xacct_add_tsk */
> +		tsk->acct_rss_mem1 += delta * get_mm_rss(tsk->mm) / 1024;
> +		tsk->acct_vm_mem1 += delta * tsk->mm->total_vm / 1024;

The use of 1024 and the change on MB above are confusing me. Why are we doing that?

Thanks.

>  	out:
>  		local_irq_restore(flags);
>  	}
> -- 
> 2.5.0
> 


* Re: [PATCH 2/4] acct,time: change indentation in __acct_update_integrals
  2016-01-30  3:36 ` [PATCH 2/4] acct,time: change indentation in __acct_update_integrals riel
@ 2016-01-30 16:15   ` Frederic Weisbecker
  0 siblings, 0 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2016-01-30 16:15 UTC (permalink / raw)
  To: riel; +Cc: linux-kernel, tglx, mingo, luto, peterz, clark

On Fri, Jan 29, 2016 at 10:36:03PM -0500, riel@redhat.com wrote:
> From: Rik van Riel <riel@redhat.com>
> 
> Change the indentation in __acct_update_integrals to make the function
> a little easier to read.
> 
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Rik van Riel <riel@redhat.com>

ACK.


* Re: [PATCH 3/4] time,acct: drop irq save & restore from __acct_update_integrals
  2016-01-30  3:36 ` [PATCH 3/4] time,acct: drop irq save & restore from __acct_update_integrals riel
@ 2016-01-30 16:24   ` Frederic Weisbecker
  0 siblings, 0 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2016-01-30 16:24 UTC (permalink / raw)
  To: riel; +Cc: linux-kernel, tglx, mingo, luto, peterz, clark

On Fri, Jan 29, 2016 at 10:36:04PM -0500, riel@redhat.com wrote:
> From: Rik van Riel <riel@redhat.com>
> 
> It looks like all the call paths that lead to __acct_update_integrals
> already have irqs disabled, and __acct_update_integrals does not need
> to disable irqs itself.
> 
> This is very convenient since about half the CPU time left in this
> function was spent in local_irq_save alone.
> 
> Performance of a microbenchmark that calls an invalid syscall
> ten million times in a row on a nohz_full CPU improves 21% vs.
> 4.5-rc1 with both the removal of divisions from __acct_update_integrals
> and this patch, with runtime dropping from 3.7 to 2.9 seconds.
> 
> With these patches applied, the highest remaining cpu user in
> the trace is native_sched_clock, which is addressed in the next
> patch.
> 
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
>  kernel/tsacct.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/kernel/tsacct.c b/kernel/tsacct.c
> index 8908f8b1d26e..b2663d699a72 100644
> --- a/kernel/tsacct.c
> +++ b/kernel/tsacct.c
> @@ -124,26 +124,22 @@ static void __acct_update_integrals(struct task_struct *tsk,
>  				    cputime_t utime, cputime_t stime)
>  {
>  	cputime_t time, dtime;
> -	unsigned long flags;
>  	u64 delta;
>  
>  	if (unlikely(!tsk->mm))
>  		return;
>  
> -	local_irq_save(flags);
>  	time = stime + utime;
>  	dtime = time - tsk->acct_timexpd;
>  	delta = cputime_to_nsecs(dtime);
>  
>  	if (delta < TICK_NSEC)
> -		goto out;
> +		return;
>  
>  	tsk->acct_timexpd = time;
>  	/* The final unit will be Mbyte-usecs, see xacct_add_tsk */
>  	tsk->acct_rss_mem1 += delta * get_mm_rss(tsk->mm) / 1024;
>  	tsk->acct_vm_mem1 += delta * tsk->mm->total_vm / 1024;
> -out:
> -	local_irq_restore(flags);
>  }

I think you need this as well, because do_exit() probably doesn't have
irqs disabled at that point:

diff --git a/kernel/tsacct.c b/kernel/tsacct.c
index 975cb49..12c6047 100644
--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -153,9 +153,12 @@ static void __acct_update_integrals(struct task_struct *tsk,
 void acct_update_integrals(struct task_struct *tsk)
 {
 	cputime_t utime, stime;
+	unsigned long flags;
 
 	task_cputime(tsk, &utime, &stime);
+	local_irq_save(flags);
 	__acct_update_integrals(tsk, utime, stime);
+	local_irq_restore(flags);
 }
 
 /**


* Re: [PATCH 1/4] sched,time: remove non-power-of-two divides from __acct_update_integrals
  2016-01-30 14:44   ` Frederic Weisbecker
@ 2016-01-30 17:53     ` Rik van Riel
  2016-02-01 11:30       ` Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Rik van Riel @ 2016-01-30 17:53 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: linux-kernel, tglx, mingo, luto, peterz, clark

On 01/30/2016 09:44 AM, Frederic Weisbecker wrote:
> On Fri, Jan 29, 2016 at 10:36:02PM -0500, riel@redhat.com wrote:
>> From: Rik van Riel <riel@redhat.com>
>>
>> When running a microbenchmark calling an invalid syscall number
>> in a loop, on a nohz_full CPU, we spend a full 9% of our CPU
>> time in __acct_update_integrals.
>>
>> This function converts cputime_t to jiffies, to a timeval, only to
>> convert the timeval back to microseconds before discarding it.
>>
>> This patch leaves __acct_update_integrals functionally equivalent,
>> but speeds things up by about 12%, with 10 million calls to an
>> invalid syscall number dropping from 3.7 to 3.25 seconds.
>>
>> Signed-off-by: Rik van Riel <riel@redhat.com>
>> ---
>>  kernel/tsacct.c | 19 +++++++++----------
>>  1 file changed, 9 insertions(+), 10 deletions(-)
>>
>> diff --git a/kernel/tsacct.c b/kernel/tsacct.c
>> index 975cb49e32bf..41667b23dbd0 100644
>> --- a/kernel/tsacct.c
>> +++ b/kernel/tsacct.c
>> @@ -93,9 +93,9 @@ void xacct_add_tsk(struct taskstats *stats, struct task_struct *p)
>>  {
>>  	struct mm_struct *mm;
>>  
>> -	/* convert pages-usec to Mbyte-usec */
>> -	stats->coremem = p->acct_rss_mem1 * PAGE_SIZE / MB;
>> -	stats->virtmem = p->acct_vm_mem1 * PAGE_SIZE / MB;
>> +	/* convert pages-nsec/KB to Mbyte-usec, see __acct_update_integrals */
>> +	stats->coremem = p->acct_rss_mem1 * PAGE_SIZE / (1000 * KB);
>> +	stats->virtmem = p->acct_vm_mem1 * PAGE_SIZE / (1000 * KB);
>>  	mm = get_task_mm(p);
>>  	if (mm) {
>>  		/* adjust to KB unit */
>> @@ -125,22 +125,21 @@ static void __acct_update_integrals(struct task_struct *tsk,
>>  {
>>  	if (likely(tsk->mm)) {
>>  		cputime_t time, dtime;
>> -		struct timeval value;
>>  		unsigned long flags;
>>  		u64 delta;
>>  
>>  		local_irq_save(flags);
>>  		time = stime + utime;
>>  		dtime = time - tsk->acct_timexpd;
>> -		jiffies_to_timeval(cputime_to_jiffies(dtime), &value);
>> -		delta = value.tv_sec;
>> -		delta = delta * USEC_PER_SEC + value.tv_usec;
>> +		delta = cputime_to_nsecs(dtime);
> 
> You might want to add a comment specifying why we don't call cputime_to_usecs()
> directly (because we optimize if delta < TICK_NSEC).
> 
> Although this has a good impact on nohz_full, it might have a tiny bad one on !nohz_full
> because now we first convert jiffies to nsecs (which implies a multiplication by 1000)
> that we later divide again by 1000. Now this is ok because I plan to convert tsk->utime/stime
> to nsecs and thus remove most of the cputime_t use and conversions everywhere.

Isn't cputime_t in nanoseconds even on !nohz_full systems nowadays,
due to sched_clock?

Also, a multiplication is essentially instantaneous compared to
a division, which is why Peter suggested going this way around.

>>  
>> -		if (delta == 0)
>> +		if (delta < TICK_NSEC)
>>  			goto out;
> 
> 
>> +
>>  		tsk->acct_timexpd = time;
>> -		tsk->acct_rss_mem1 += delta * get_mm_rss(tsk->mm);
>> -		tsk->acct_vm_mem1 += delta * tsk->mm->total_vm;
>> +		/* The final unit will be Mbyte-usecs, see xacct_add_tsk */
>> +		tsk->acct_rss_mem1 += delta * get_mm_rss(tsk->mm) / 1024;
>> +		tsk->acct_vm_mem1 += delta * tsk->mm->total_vm / 1024;
> 
> The use of 1024 and the change on MB above are confusing me. Why are we doing that?
> 
> Thanks.

So the compiler can just do a right shift in the frequently called
code, leaving no divide at all in __acct_update_integrals.
Reducing the value here also seems useful for preventing
overflows.

The divide is saved for when the statistics are read out to
userspace.
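
As a worked example of the unit conversion, the sketch below compares the old and new arithmetic in userspace. The macro and function names are hypothetical, and a 4096-byte page size is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for illustration only. */
#define ASSUMED_PAGE_SIZE 4096ULL
#define KB_BYTES 1024ULL
#define MB_BYTES (1024ULL * KB_BYTES)

/* Old scheme: accumulate pages * usec, convert to Mbyte-usec at readout. */
uint64_t old_readout(uint64_t delta_usec, uint64_t pages)
{
	uint64_t acc = delta_usec * pages;

	return acc * ASSUMED_PAGE_SIZE / MB_BYTES;
}

/* New scheme, hot path: pages * nsec, scaled down by a shift (no divide). */
uint64_t new_accumulate(uint64_t delta_nsec, uint64_t pages)
{
	return (delta_nsec * pages) >> 10;
}

/* New scheme, readout: the remaining factor of 1000 * KB is divided out here. */
uint64_t new_readout(uint64_t acc)
{
	assert(acc <= UINT64_MAX / ASSUMED_PAGE_SIZE);	/* no overflow below */
	return acc * ASSUMED_PAGE_SIZE / (1000ULL * KB_BYTES);
}
```

For one second of CPU time with 256 resident pages (1 MB at the assumed page size), both `old_readout(1000000, 256)` and `new_readout(new_accumulate(1000000000ULL, 256))` come out at 1000000 Mbyte-usec: the shift by 10 in the hot path and the division by 1000 * KB at readout together replace the old division by MB plus the nsec-to-usec conversion.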

-- 
All rights reversed


* Re: [PATCH 1/4] sched,time: remove non-power-of-two divides from __acct_update_integrals
  2016-01-30 17:53     ` Rik van Riel
@ 2016-02-01 11:30       ` Peter Zijlstra
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Zijlstra @ 2016-02-01 11:30 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Frederic Weisbecker, linux-kernel, tglx, mingo, luto, clark

On Sat, Jan 30, 2016 at 12:53:09PM -0500, Rik van Riel wrote:
> >> +		delta = cputime_to_nsecs(dtime);
> > 
> > You might want to add a comment specifying why we don't call cputime_to_usecs()
> > directly (because we optimize if delta < TICK_NSEC).
> > 
> > Although this has a good impact on nohz_full, it might have a tiny bad one on !nohz_full
> > because now we first convert jiffies to nsecs (which implies a multiplication by 1000)
> > that we later divide again by 1000. Now this is ok because I plan to convert tsk->utime/stime
> > to nsecs and thus remove most of the cputime_t use and conversions everywhere.
> 
> Isn't cputime_t in nanoseconds even on !nohz_full systems nowadays,
> due to sched_clock?

Don't think so, we still use jiffy accounting for !NOHZ_FULL.

> Also, a multiplication is essentially instantaneous compared to
> a division, which is why Peter suggested going this way around.

Yep.


* Re: [PATCH 2/4] acct,time: change indentation in __acct_update_integrals
  2016-02-11  1:08 ` [PATCH 2/4] acct,time: change indentation in __acct_update_integrals riel
@ 2016-02-11  1:23   ` Joe Perches
  0 siblings, 0 replies; 16+ messages in thread
From: Joe Perches @ 2016-02-11  1:23 UTC (permalink / raw)
  To: riel, linux-kernel
  Cc: fweisbec, tglx, mingo, luto, peterz, clark, eric.dumazet

On Wed, 2016-02-10 at 20:08 -0500, riel@redhat.com wrote:
> Change the indentation in __acct_update_integrals to make the function
> a little easier to read.

trivia:

> diff --git a/kernel/tsacct.c b/kernel/tsacct.c
[]
> @@ -125,31 +125,32 @@ void xacct_add_tsk(struct taskstats *stats, struct task_struct *p)
[]
> +	if (!likely(tsk->mm))
> +		return;

Using

	if (unlikely(!tsk->mm))
		return;

would be a lot more common.

(~150:1 in the kernel sources)
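
A small userspace sketch of the two spellings, assuming GCC or Clang for `__builtin_expect`. The macro definitions mirror the usual kernel ones; `struct fake_task` and the two helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Hypothetical stand-in for the one task_struct field that matters here. */
struct fake_task {
	void *mm;
};

/* Spelling used in the posted patch. */
bool skip_as_posted(const struct fake_task *tsk)
{
	assert(tsk != NULL);
	if (!likely(tsk->mm))
		return true;
	return false;
}

/* Spelling suggested in this reply. */
bool skip_as_suggested(const struct fake_task *tsk)
{
	assert(tsk != NULL);
	if (unlikely(!tsk->mm))
		return true;
	return false;
}
```

Both functions return the same result for every input and both hint that a NULL `mm` is the rare case, so the difference is one of readability and convention rather than behaviour.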


* [PATCH 2/4] acct,time: change indentation in __acct_update_integrals
  2016-02-11  1:08 [PATCH 0/4 v6] sched,time: reduce nohz_full syscall overhead 40% riel
@ 2016-02-11  1:08 ` riel
  2016-02-11  1:23   ` Joe Perches
  0 siblings, 1 reply; 16+ messages in thread
From: riel @ 2016-02-11  1:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: fweisbec, tglx, mingo, luto, peterz, clark, eric.dumazet

From: Rik van Riel <riel@redhat.com>

Change the indentation in __acct_update_integrals to make the function
a little easier to read.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@redhat.com>
---
 kernel/tsacct.c | 51 ++++++++++++++++++++++++++-------------------------
 1 file changed, 26 insertions(+), 25 deletions(-)

diff --git a/kernel/tsacct.c b/kernel/tsacct.c
index 460ee2bbfef3..d12e815b7bcd 100644
--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -125,31 +125,32 @@ void xacct_add_tsk(struct taskstats *stats, struct task_struct *p)
 static void __acct_update_integrals(struct task_struct *tsk,
 				    cputime_t utime, cputime_t stime)
 {
-	if (likely(tsk->mm)) {
-		cputime_t time, dtime;
-		unsigned long flags;
-		u64 delta;
-
-		local_irq_save(flags);
-		time = stime + utime;
-		dtime = time - tsk->acct_timexpd;
-		/* Avoid division: cputime_t is often in nanoseconds already. */
-		delta = cputime_to_nsecs(dtime);
-
-		if (delta < TICK_NSEC)
-			goto out;
-
-		tsk->acct_timexpd = time;
-		/*
-		 * Divide by 1024 to avoid overflow, and to avoid division.
-		 * The final unit reported to userspace is Mbyte-usecs,
-		 * the rest of the math is done in xacct_add_tsk.
-		 */
-		tsk->acct_rss_mem1 += delta * get_mm_rss(tsk->mm) >> 10;
-		tsk->acct_vm_mem1 += delta * tsk->mm->total_vm >> 10;
-	out:
-		local_irq_restore(flags);
-	}
+	cputime_t time, dtime;
+	unsigned long flags;
+	u64 delta;
+
+	if (!likely(tsk->mm))
+		return;
+
+	local_irq_save(flags);
+	time = stime + utime;
+	dtime = time - tsk->acct_timexpd;
+	/* Avoid division: cputime_t is often in nanoseconds already. */
+	delta = cputime_to_nsecs(dtime);
+
+	if (delta < TICK_NSEC)
+		goto out;
+
+	tsk->acct_timexpd = time;
+	/*
+	 * Divide by 1024 to avoid overflow, and to avoid division.
+	 * The final unit reported to userspace is Mbyte-usecs,
+	 * the rest of the math is done in xacct_add_tsk.
+	 */
+	tsk->acct_rss_mem1 += delta * get_mm_rss(tsk->mm) >> 10;
+	tsk->acct_vm_mem1 += delta * tsk->mm->total_vm >> 10;
+out:
+	local_irq_restore(flags);
 }
 
 /**
-- 
2.5.0
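
As a rough illustration of the control flow after this patch, the following
user-space sketch models the early return, the per-tick threshold, and the
shift by 10. It is not the kernel function: struct mm_model, struct task_model,
acct_update_model() and TICK_NSEC_ASSUMED (one tick at an assumed HZ=1000) are
hypothetical, times are taken directly in nanoseconds, and the irq save/restore
is omitted because it has no user-space equivalent.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: one tick at HZ=1000, expressed in nanoseconds. */
#define TICK_NSEC_ASSUMED 1000000ULL

/* Hypothetical stand-ins for the mm and task fields the patch touches. */
struct mm_model {
	uint64_t rss_pages;	/* models get_mm_rss(tsk->mm) */
	uint64_t total_vm;	/* models tsk->mm->total_vm */
};

struct task_model {
	struct mm_model *mm;
	uint64_t acct_timexpd;
	uint64_t acct_rss_mem1;
	uint64_t acct_vm_mem1;
};

static void acct_update_model(struct task_model *tsk,
			      uint64_t utime_ns, uint64_t stime_ns)
{
	uint64_t time, delta;

	/* Early return replaces the old outer if-block. */
	if (!tsk->mm)
		return;

	time = stime_ns + utime_ns;
	delta = time - tsk->acct_timexpd;

	/* Less than one tick has elapsed: nothing to account yet. */
	if (delta < TICK_NSEC_ASSUMED)
		return;

	tsk->acct_timexpd = time;
	/* '*' binds tighter than '>>', so this is (delta * pages) >> 10. */
	tsk->acct_rss_mem1 += (delta * tsk->mm->rss_pages) >> 10;
	tsk->acct_vm_mem1 += (delta * tsk->mm->total_vm) >> 10;
}
```

With 1024 resident pages and 2 ms of accumulated time, the sketch adds
2000000 * 1024 >> 10 = 2000000 to acct_rss_mem1.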


* [PATCH 2/4] acct,time: change indentation in __acct_update_integrals
  2016-02-02 17:19 [PATCH 0/4 v5] sched,time: reduce nohz_full syscall overhead 40% riel
@ 2016-02-02 17:19 ` riel
  0 siblings, 0 replies; 16+ messages in thread
From: riel @ 2016-02-02 17:19 UTC (permalink / raw)
  To: linux-kernel; +Cc: fweisbec, tglx, mingo, luto, peterz, clark, eric.dumazet

From: Rik van Riel <riel@redhat.com>

Change the indentation in __acct_update_integrals to make the function
a little easier to read.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@redhat.com>
---
 kernel/tsacct.c | 51 ++++++++++++++++++++++++++-------------------------
 1 file changed, 26 insertions(+), 25 deletions(-)

diff --git a/kernel/tsacct.c b/kernel/tsacct.c
index 460ee2bbfef3..d12e815b7bcd 100644
--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -125,31 +125,32 @@ void xacct_add_tsk(struct taskstats *stats, struct task_struct *p)
 static void __acct_update_integrals(struct task_struct *tsk,
 				    cputime_t utime, cputime_t stime)
 {
-	if (likely(tsk->mm)) {
-		cputime_t time, dtime;
-		unsigned long flags;
-		u64 delta;
-
-		local_irq_save(flags);
-		time = stime + utime;
-		dtime = time - tsk->acct_timexpd;
-		/* Avoid division: cputime_t is often in nanoseconds already. */
-		delta = cputime_to_nsecs(dtime);
-
-		if (delta < TICK_NSEC)
-			goto out;
-
-		tsk->acct_timexpd = time;
-		/*
-		 * Divide by 1024 to avoid overflow, and to avoid division.
-		 * The final unit reported to userspace is Mbyte-usecs,
-		 * the rest of the math is done in xacct_add_tsk.
-		 */
-		tsk->acct_rss_mem1 += delta * get_mm_rss(tsk->mm) >> 10;
-		tsk->acct_vm_mem1 += delta * tsk->mm->total_vm >> 10;
-	out:
-		local_irq_restore(flags);
-	}
+	cputime_t time, dtime;
+	unsigned long flags;
+	u64 delta;
+
+	if (!likely(tsk->mm))
+		return;
+
+	local_irq_save(flags);
+	time = stime + utime;
+	dtime = time - tsk->acct_timexpd;
+	/* Avoid division: cputime_t is often in nanoseconds already. */
+	delta = cputime_to_nsecs(dtime);
+
+	if (delta < TICK_NSEC)
+		goto out;
+
+	tsk->acct_timexpd = time;
+	/*
+	 * Divide by 1024 to avoid overflow, and to avoid division.
+	 * The final unit reported to userspace is Mbyte-usecs,
+	 * the rest of the math is done in xacct_add_tsk.
+	 */
+	tsk->acct_rss_mem1 += delta * get_mm_rss(tsk->mm) >> 10;
+	tsk->acct_vm_mem1 += delta * tsk->mm->total_vm >> 10;
+out:
+	local_irq_restore(flags);
 }
 
 /**
-- 
2.5.0


* [PATCH 2/4] acct,time: change indentation in __acct_update_integrals
  2016-02-01 19:21 [PATCH 0/4 v4] sched,time: reduce nohz_full syscall overhead 40% riel
@ 2016-02-01 19:21 ` riel
  0 siblings, 0 replies; 16+ messages in thread
From: riel @ 2016-02-01 19:21 UTC (permalink / raw)
  To: linux-kernel; +Cc: tglx, peterz, fweisbec, clark, luto, mingo

From: Rik van Riel <riel@redhat.com>

Change the indentation in __acct_update_integrals to make the function
a little easier to read.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@redhat.com>
---
 kernel/tsacct.c | 51 ++++++++++++++++++++++++++-------------------------
 1 file changed, 26 insertions(+), 25 deletions(-)

diff --git a/kernel/tsacct.c b/kernel/tsacct.c
index 460ee2bbfef3..d12e815b7bcd 100644
--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -125,31 +125,32 @@ void xacct_add_tsk(struct taskstats *stats, struct task_struct *p)
 static void __acct_update_integrals(struct task_struct *tsk,
 				    cputime_t utime, cputime_t stime)
 {
-	if (likely(tsk->mm)) {
-		cputime_t time, dtime;
-		unsigned long flags;
-		u64 delta;
-
-		local_irq_save(flags);
-		time = stime + utime;
-		dtime = time - tsk->acct_timexpd;
-		/* Avoid division: cputime_t is often in nanoseconds already. */
-		delta = cputime_to_nsecs(dtime);
-
-		if (delta < TICK_NSEC)
-			goto out;
-
-		tsk->acct_timexpd = time;
-		/*
-		 * Divide by 1024 to avoid overflow, and to avoid division.
-		 * The final unit reported to userspace is Mbyte-usecs,
-		 * the rest of the math is done in xacct_add_tsk.
-		 */
-		tsk->acct_rss_mem1 += delta * get_mm_rss(tsk->mm) >> 10;
-		tsk->acct_vm_mem1 += delta * tsk->mm->total_vm >> 10;
-	out:
-		local_irq_restore(flags);
-	}
+	cputime_t time, dtime;
+	unsigned long flags;
+	u64 delta;
+
+	if (!likely(tsk->mm))
+		return;
+
+	local_irq_save(flags);
+	time = stime + utime;
+	dtime = time - tsk->acct_timexpd;
+	/* Avoid division: cputime_t is often in nanoseconds already. */
+	delta = cputime_to_nsecs(dtime);
+
+	if (delta < TICK_NSEC)
+		goto out;
+
+	tsk->acct_timexpd = time;
+	/*
+	 * Divide by 1024 to avoid overflow, and to avoid division.
+	 * The final unit reported to userspace is Mbyte-usecs,
+	 * the rest of the math is done in xacct_add_tsk.
+	 */
+	tsk->acct_rss_mem1 += delta * get_mm_rss(tsk->mm) >> 10;
+	tsk->acct_vm_mem1 += delta * tsk->mm->total_vm >> 10;
+out:
+	local_irq_restore(flags);
 }
 
 /**
-- 
2.5.0


* [PATCH 2/4] acct,time: change indentation in __acct_update_integrals
  2016-02-01  2:12 [PATCH 0/4 v3] sched,time: reduce nohz_full syscall overhead 40% riel
@ 2016-02-01  2:12 ` riel
  0 siblings, 0 replies; 16+ messages in thread
From: riel @ 2016-02-01  2:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: fweisbec, tglx, mingo, luto, peterz, clark

From: Rik van Riel <riel@redhat.com>

Change the indentation in __acct_update_integrals to make the function
a little easier to read.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@redhat.com>
---
 kernel/tsacct.c | 51 ++++++++++++++++++++++++++-------------------------
 1 file changed, 26 insertions(+), 25 deletions(-)

diff --git a/kernel/tsacct.c b/kernel/tsacct.c
index 1b121a2f1c55..9c23584c76c4 100644
--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -123,31 +123,32 @@ void xacct_add_tsk(struct taskstats *stats, struct task_struct *p)
 static void __acct_update_integrals(struct task_struct *tsk,
 				    cputime_t utime, cputime_t stime)
 {
-	if (likely(tsk->mm)) {
-		cputime_t time, dtime;
-		unsigned long flags;
-		u64 delta;
-
-		local_irq_save(flags);
-		time = stime + utime;
-		dtime = time - tsk->acct_timexpd;
-		/* Avoid division: cputime_t is often in nanoseconds already. */
-		delta = cputime_to_nsecs(dtime);
-
-		if (delta < TICK_NSEC)
-			goto out;
-
-		tsk->acct_timexpd = time;
-		/*
-		 * Divide by 1024 to avoid overflow, and to avoid division.
-		 * The final unit reported to userspace is Mbyte-usecs,
-		 * the rest of the math is done in xacct_add_tsk.
-		 */
-		tsk->acct_rss_mem1 += delta * get_mm_rss(tsk->mm) >> 10;
-		tsk->acct_vm_mem1 += delta * tsk->mm->total_vm >> 10;
-	out:
-		local_irq_restore(flags);
-	}
+	cputime_t time, dtime;
+	unsigned long flags;
+	u64 delta;
+
+	if (!likely(tsk->mm))
+		return;
+
+	local_irq_save(flags);
+	time = stime + utime;
+	dtime = time - tsk->acct_timexpd;
+	/* Avoid division: cputime_t is often in nanoseconds already. */
+	delta = cputime_to_nsecs(dtime);
+
+	if (delta < TICK_NSEC)
+		goto out;
+
+	tsk->acct_timexpd = time;
+	/*
+	 * Divide by 1024 to avoid overflow, and to avoid division.
+	 * The final unit reported to userspace is Mbyte-usecs,
+	 * the rest of the math is done in xacct_add_tsk.
+	 */
+	tsk->acct_rss_mem1 += delta * get_mm_rss(tsk->mm) >> 10;
+	tsk->acct_vm_mem1 += delta * tsk->mm->total_vm >> 10;
+out:
+	local_irq_restore(flags);
 }
 
 /**
-- 
2.5.0


Thread overview: 16+ messages
2016-01-30  3:36 [PATCH 0/2] sched,time: reduce nohz_full syscall overhead 40% riel
2016-01-30  3:36 ` [PATCH 1/4] sched,time: remove non-power-of-two divides from __acct_update_integrals riel
2016-01-30  4:56   ` kbuild test robot
2016-01-30 14:44   ` Frederic Weisbecker
2016-01-30 17:53     ` Rik van Riel
2016-02-01 11:30       ` Peter Zijlstra
2016-01-30  3:36 ` [PATCH 2/4] acct,time: change indentation in __acct_update_integrals riel
2016-01-30 16:15   ` Frederic Weisbecker
2016-01-30  3:36 ` [PATCH 3/4] time,acct: drop irq save & restore from __acct_update_integrals riel
2016-01-30 16:24   ` Frederic Weisbecker
2016-01-30  3:36 ` [PATCH 4/4] sched,time: only call account_{user,sys,guest,idle}_time once a jiffy riel
2016-02-01  2:12 [PATCH 0/4 v3] sched,time: reduce nohz_full syscall overhead 40% riel
2016-02-01  2:12 ` [PATCH 2/4] acct,time: change indentation in __acct_update_integrals riel
2016-02-01 19:21 [PATCH 0/4 v4] sched,time: reduce nohz_full syscall overhead 40% riel
2016-02-01 19:21 ` [PATCH 2/4] acct,time: change indentation in __acct_update_integrals riel
2016-02-02 17:19 [PATCH 0/4 v5] sched,time: reduce nohz_full syscall overhead 40% riel
2016-02-02 17:19 ` [PATCH 2/4] acct,time: change indentation in __acct_update_integrals riel
2016-02-11  1:08 [PATCH 0/4 v6] sched,time: reduce nohz_full syscall overhead 40% riel
2016-02-11  1:08 ` [PATCH 2/4] acct,time: change indentation in __acct_update_integrals riel
2016-02-11  1:23   ` Joe Perches
