From: Rik van Riel <riel@redhat.com>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: linux-kernel@vger.kernel.org, tglx@linutronix.de,
	mingo@kernel.org, luto@amacapital.net, peterz@infradead.org,
	clark@redhat.com
Subject: Re: [PATCH 1/4] sched,time: remove non-power-of-two divides from __acct_update_integrals
Date: Sat, 30 Jan 2016 12:53:09 -0500
Message-ID: <56ACF885.8070203@redhat.com>
In-Reply-To: <20160130144403.GB32581@lerouge>

On 01/30/2016 09:44 AM, Frederic Weisbecker wrote:
> On Fri, Jan 29, 2016 at 10:36:02PM -0500, riel@redhat.com wrote:
>> From: Rik van Riel <riel@redhat.com>
>>
>> When running a microbenchmark calling an invalid syscall number
>> in a loop, on a nohz_full CPU, we spend a full 9% of our CPU
>> time in __acct_update_integrals.
>>
>> This function converts cputime_t to jiffies, to a timeval, only to
>> convert the timeval back to microseconds before discarding it.
>>
>> This patch leaves __acct_update_integrals functionally equivalent,
>> but speeds things up by about 12%, with 10 million calls to an
>> invalid syscall number dropping from 3.7 to 3.25 seconds.
>>
>> Signed-off-by: Rik van Riel <riel@redhat.com>
>> ---
>>  kernel/tsacct.c | 19 +++++++++----------
>>  1 file changed, 9 insertions(+), 10 deletions(-)
>>
>> diff --git a/kernel/tsacct.c b/kernel/tsacct.c
>> index 975cb49e32bf..41667b23dbd0 100644
>> --- a/kernel/tsacct.c
>> +++ b/kernel/tsacct.c
>> @@ -93,9 +93,9 @@ void xacct_add_tsk(struct taskstats *stats, struct task_struct *p)
>>  {
>>  	struct mm_struct *mm;
>>  
>> -	/* convert pages-usec to Mbyte-usec */
>> -	stats->coremem = p->acct_rss_mem1 * PAGE_SIZE / MB;
>> -	stats->virtmem = p->acct_vm_mem1 * PAGE_SIZE / MB;
>> +	/* convert pages-nsec/KB to Mbyte-usec, see __acct_update_integrals */
>> +	stats->coremem = p->acct_rss_mem1 * PAGE_SIZE / (1000 * KB);
>> +	stats->virtmem = p->acct_vm_mem1 * PAGE_SIZE / (1000 * KB);
>>  	mm = get_task_mm(p);
>>  	if (mm) {
>>  		/* adjust to KB unit */
>> @@ -125,22 +125,21 @@ static void __acct_update_integrals(struct task_struct *tsk,
>>  {
>>  	if (likely(tsk->mm)) {
>>  		cputime_t time, dtime;
>> -		struct timeval value;
>>  		unsigned long flags;
>>  		u64 delta;
>>  
>>  		local_irq_save(flags);
>>  		time = stime + utime;
>>  		dtime = time - tsk->acct_timexpd;
>> -		jiffies_to_timeval(cputime_to_jiffies(dtime), &value);
>> -		delta = value.tv_sec;
>> -		delta = delta * USEC_PER_SEC + value.tv_usec;
>> +		delta = cputime_to_nsecs(dtime);
> 
> You might want to add a comment specifying why we don't call cputime_to_usecs()
> directly (because we optimize if delta < TICK_NSEC).
> 
> Although this has a good impact on nohz_full, it might have a tiny bad one on !nohz_full
> because now we first convert jiffies to nsecs (which implies a multiplication by 1000)
> that we later divide again by 1000. Now this is ok because I plan to convert tsk->utime/stime
> to nsecs and thus remove most of the cputime_t use and conversions everywhere.

Isn't cputime_t in nanoseconds even on !nohz_full systems nowadays,
due to sched_clock?

Also, a multiplication is essentially instantaneous compared to
a division, which is why Peter suggested going this way around.
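
To illustrate the trade-off, here is a rough userspace sketch of the
two conversion paths for the tick-based case Frederic describes. The
names and the 1000 Hz tick rate are assumptions made up for this
example; this is not the kernel code.

```c
#include <stdint.h>

/* Assumptions for this sketch only: a 1000 Hz tick and made-up names. */
#define SKETCH_HZ            1000ULL
#define SKETCH_USEC_PER_SEC  1000000ULL
#define SKETCH_NSEC_PER_SEC  1000000000ULL
#define SKETCH_TICK_NSEC     (SKETCH_NSEC_PER_SEC / SKETCH_HZ) /* compile-time constant */

/*
 * Old path: ticks -> (seconds, microseconds) pair -> microseconds.
 * Needs a divide and a remainder by the tick rate on every call.
 */
static uint64_t sketch_old_delta_usec(uint64_t ticks)
{
	uint64_t sec = ticks / SKETCH_HZ;
	uint64_t usec = (ticks % SKETCH_HZ) * (SKETCH_USEC_PER_SEC / SKETCH_HZ);

	return sec * SKETCH_USEC_PER_SEC + usec;
}

/* New path: a single multiply takes ticks straight to nanoseconds. */
static uint64_t sketch_new_delta_nsec(uint64_t ticks)
{
	return ticks * SKETCH_TICK_NSEC;
}

/* Early exit modelled on the patch: skip deltas shorter than one tick. */
static int sketch_skip_update(uint64_t delta_nsec)
{
	return delta_nsec < SKETCH_TICK_NSEC;
}
```

Both paths describe the same interval; the nanosecond value is simply
1000 times the microsecond one, so no division is needed until the
result is reported.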

>>  
>> -		if (delta == 0)
>> +		if (delta < TICK_NSEC)
>>  			goto out;
> 
> 
>> +
>>  		tsk->acct_timexpd = time;
>> -		tsk->acct_rss_mem1 += delta * get_mm_rss(tsk->mm);
>> -		tsk->acct_vm_mem1 += delta * tsk->mm->total_vm;
>> +		/* The final unit will be Mbyte-usecs, see xacct_add_tsk */
>> +		tsk->acct_rss_mem1 += delta * get_mm_rss(tsk->mm) / 1024;
>> +		tsk->acct_vm_mem1 += delta * tsk->mm->total_vm / 1024;
> 
> The use of 1024 and the change on MB above are confusing me. Why are we doing that?
> 
> Thanks.

So the compiler can just do a right shift in the frequently called
code, and have no divide at all left in __acct_update_integrals.
Dividing here also keeps the accumulated value smaller, which helps
prevent overflows.

The divide is saved for when the statistics are read out to
userspace.
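
A rough userspace sketch of the unit bookkeeping, in case it helps.
The function names and the 4096-byte page size are assumptions for
this example only; the arithmetic mirrors the two hunks quoted above.

```c
#include <stdint.h>

#define SKETCH_KB         1024ULL
#define SKETCH_PAGE_SIZE  4096ULL  /* assumption: 4 KB pages */

/*
 * Hot path: accumulate pages * nsec / 1024.
 * Dividing an unsigned value by 1024 is a right shift by 10,
 * so no real divide instruction is needed here.
 */
static uint64_t sketch_accumulate(uint64_t acc, uint64_t delta_nsec,
				  uint64_t pages)
{
	return acc + delta_nsec * pages / 1024;
}

/*
 * Readout path: convert the accumulator to Mbyte-usec.
 * pages * PAGE_SIZE gives bytes; the 1024 applied earlier and the
 * KB here together turn bytes into Mbytes; the 1000 turns nsec
 * into usec. This is the only place a non-power-of-two divide runs.
 */
static uint64_t sketch_to_mbyte_usec(uint64_t acc)
{
	return acc * SKETCH_PAGE_SIZE / (1000 * SKETCH_KB);
}
```

For example, 256 pages (1 Mbyte at 4 KB per page) held for 1,000,000
nsec accumulates 250000, which reads out as 1000 Mbyte-usec.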

-- 
All rights reversed
