linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Xunlei Pang <xlpang@linux.alibaba.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	tglx@linutronix.de, frederic@kernel.org, lcapitulino@redhat.com,
	torvalds@linux-foundation.org, linux-kernel@vger.kernel.org,
	hpa@zytor.com, tj@kernel.org, linux-tip-commits@vger.kernel.org
Subject: Re: [tip:sched/core] sched/cputime: Ensure accurate utime and stime ratio in cputime_adjust()
Date: Tue, 24 Jul 2018 21:28:48 +0800	[thread overview]
Message-ID: <dca42bb1-5480-cec0-0bc8-b5ac3c208177@linux.alibaba.com> (raw)
In-Reply-To: <20180723092118.GZ2494@hirez.programming.kicks-ass.net>

On 7/23/18 5:21 PM, Peter Zijlstra wrote:
> On Tue, Jul 17, 2018 at 12:08:36PM +0800, Xunlei Pang wrote:
>> The trace data corresponds to the last sample period:
>> trace entry 1:
>>              cat-20755 [022] d...  1370.106496: cputime_adjust: task
>> tick-based utime 362560000000 stime 2551000000, scheduler rtime 333060702626
>>              cat-20755 [022] d...  1370.106497: cputime_adjust: result:
>> old utime 330729718142 stime 2306983867, new utime 330733635372 stime
>> 2327067254
>>
>> trace entry 2:
>>              cat-20773 [005] d...  1371.109825: cputime_adjust: task
>> tick-based utime 362567000000 stime 3547000000, scheduler rtime 334063718912
>>              cat-20773 [005] d...  1371.109826: cputime_adjust: result:
>> old utime 330733635372 stime 2327067254, new utime 330827229702 stime
>> 3236489210
>>
>> 1) expected behaviour
>> Let's compare the last two trace entries(all the data below is in ns):
>> task tick-based utime: 362560000000->362567000000 increased 7000000
>> task tick-based stime: 2551000000  ->3547000000   increased 996000000
>> scheduler rtime:       333060702626->334063718912 increased 1003016286
>>
>> The application actually runs almost 100%sys at the moment, we can
>> use the task tick-based utime and stime increased to double check:
>> 996000000/(7000000+996000000) > 99%sys
>>
>> 2) the current cputime_adjust() inaccurate result
>> But for the current cputime_adjust(), we get the following adjusted
>> utime and stime increase in this sample period:
>> adjusted utime: 330733635372->330827229702 increased 93594330
>> adjusted stime: 2327067254  ->3236489210   increased 909421956
>>
>> so 909421956/(93594330+909421956)=91%sys as the shell script shows above.
>>
>> 3) root cause
>> The root cause of the issue is that the current cputime_adjust() always
>> passes the whole times to scale_stime() to split the whole utime and
>> stime. In this patch, we pass all the increased deltas in 1) within
>> user's sample period to scale_stime() instead and accumulate the
>> corresponding results to the previous saved adjusted utime and stime,
>> so guarantee the accurate usr and sys increase within the user sample
>> period.
> 
> But why it this a problem?
> 
> Since its sample based there's really nothing much you can guarantee.
> What if your test program were to run in userspace for 50% of the time
> but is so constructed to always be in kernel space when the tick
> happens?
> 
> Then you would 'expect' it to be 50% user and 50% sys, but you're also
> not getting that.
> 
> This stuff cannot be perfect, and the current code provides 'sensible'
> numbers over the long run for most programs. Why muck with it?
> 

Basically I am ok with the current implementation, except for one
scenario we've met: when kernel went wrong for some reason with 100% sys
suddenly for seconds(even trigger softlockup), the statistics monitor
didn't reflect the fact, which confused people. One example with our
per-cgroup top, we ever noticed "20% usr, 80% sys" displayed while in
fact the kernel was in some busy loop(100% sys) at that moment, and the
tick based time are of course all sys samples in such case.

  reply	other threads:[~2018-07-24 13:30 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-09 14:58 [PATCH v2] sched/cputime: Ensure accurate utime and stime ratio in cputime_adjust() Xunlei Pang
2018-07-15 23:36 ` [tip:sched/core] " tip-bot for Xunlei Pang
2018-07-16 13:37   ` Peter Zijlstra
2018-07-16 16:11     ` Frederic Weisbecker
2018-07-16 17:41     ` Ingo Molnar
2018-07-17  4:08       ` Xunlei Pang
2018-07-23  9:21         ` Peter Zijlstra
2018-07-24 13:28           ` Xunlei Pang [this message]
2018-07-16 13:35 ` [PATCH v2] " Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=dca42bb1-5480-cec0-0bc8-b5ac3c208177@linux.alibaba.com \
    --to=xlpang@linux.alibaba.com \
    --cc=frederic@kernel.org \
    --cc=hpa@zytor.com \
    --cc=lcapitulino@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-tip-commits@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).