From: Stanislaw Gruszka <sgruszka@redhat.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>,
Peter Zijlstra <peterz@infradead.org>,
hpa@zytor.com, rostedt@goodmis.org, akpm@linux-foundation.org,
tglx@linutronix.de,
Linus Torvalds <torvalds@linux-foundation.org>,
linux-kernel@vger.kernel.org
Subject: Re: [RFC 4/4] cputime: remove scaling
Date: Thu, 11 Apr 2013 10:36:35 +0200
Message-ID: <20130411083634.GB1380@redhat.com>
In-Reply-To: <20130410120228.GC8083@gmail.com>

On Wed, Apr 10, 2013 at 02:02:28PM +0200, Ingo Molnar wrote:
>
> * Stanislaw Gruszka <sgruszka@redhat.com> wrote:
>
> > Scaling cputime causes problems. A bunch of them were fixed, but it is still
> > possible to hit a multiplication overflow, which makes the {u,s}time values
> > incorrect. This problem has no good solution in the kernel.
>
> Wasn't 128-bit math a solution to the overflow problems? 128-bit math isn't nice,
> but at least for multiplication it's defensible.

Unfortunately, 128-bit division is needed as well. Though in 99.9% of cases
it will go through the 64-bit fast path.

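To illustrate what "128-bit math with a 64-bit fast path" could look like, here
is a minimal user-space sketch. It is not kernel code: the name scale_time_128
is hypothetical, and it assumes a GCC/Clang toolchain that provides
__builtin_mul_overflow and unsigned __int128 (which is not available on 32-bit
targets). It also assumes total != 0 and that the quotient fits in 64 bits.

```c
#include <stdint.h>

/*
 * Hypothetical user-space sketch: compute (time * rtime) / total
 * without losing bits when the product does not fit in 64 bits.
 * Assumes total != 0 and that the quotient fits in 64 bits, which
 * holds whenever time <= total.
 */
static uint64_t scale_time_128(uint64_t rtime, uint64_t total, uint64_t time)
{
	uint64_t prod;

	/* Fast path: the product fits in 64 bits, so one 64-bit division is enough. */
	if (!__builtin_mul_overflow(time, rtime, &prod))
		return prod / total;

	/* Slow path: full 128-bit product followed by a 128-bit division. */
	return (uint64_t)(((unsigned __int128)time * rtime) / total);
}
```

The slow path is the costly part, since a 128-bit by 64-bit division usually
ends up in a library helper rather than a single instruction.
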
> > This patch removes the scaling code and exports the raw {u,s}time values. Procps
> > programs can use the newly introduced sum_exec_runtime to find the precisely
> > calculated process CPU time and scale the utime and stime values accordingly.
> >
> > Unfortunately times(2) syscall has no such option.
> >
> > This change affects kernels compiled without CONFIG_VIRT_CPU_ACCOUNTING_*.
>
> So, the concern here is that 'top hiding' code can now hide again. It's also that
> we are not really solving the problem, we are pushing it to user-space - which in
> the best case gets updated to solve the problem in some similar fashion - and in
> the worst case does not get updated or does it in a buggy way.
>
> So while user-space has it a bit easier because it can do floating point math, is
> there really no workable solution to the current kernel side integer overflow bug?

I do not see any. Basically, everything we have either makes the problem
less reproducible or just defers it. The best solution I found, apart from
full 128-bit math, is something like this (dropping precision when the
values are big enough that overflow would happen):

u64 _scale_time(u64 rtime, u64 total, u64 time)
{
	const int zero_bits = clzll(time) + clzll(rtime);
	u64 scaled;

	if (zero_bits < 64) {
		/* Drop precision */
		const int drop_bits = 64 - zero_bits;

		time >>= drop_bits;
		rtime >>= drop_bits;
		total >>= 2*drop_bits;

		if (total == 0)
			return time;
	}

	scaled = (time * rtime) / total;

	return scaled;
}

It defers the problem for quite a long period. My testing script detects a failure at:

FAIL!
rtime:  1954463459156  <- 22621 days (one thread, CONFIG_HZ=1000)
total:  1771603722423
stime:  354320744484
kernel: 391351504748   <- kernel value
python: 390892691830   <- correct value

For one thread this is fine, but for 512 threads the inaccuracy will show up
after only 40 days (because too many bits of the "total" variable are dropped).

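The failing case above can be reproduced in user space with a sketch along
these lines. The names scale_time_drop, scale_time_exact and shr64 are
hypothetical, the clzll helper is assumed to wrap the GCC/Clang
__builtin_clzll, and the shift guard is an addition that is not in the
snippet above. It assumes total != 0.

```c
#include <stdint.h>

/* Count leading zeros; return 64 for 0, where the builtin is undefined. */
static int clzll(uint64_t x)
{
	return x ? __builtin_clzll(x) : 64;
}

/* Added guard (assumption): shifting a 64-bit value by >= 64 is undefined in C. */
static uint64_t shr64(uint64_t x, int n)
{
	return n < 64 ? x >> n : 0;
}

/* Precision-dropping scaling, transcribed from the snippet in this mail. */
static uint64_t scale_time_drop(uint64_t rtime, uint64_t total, uint64_t time)
{
	const int zero_bits = clzll(time) + clzll(rtime);

	if (zero_bits < 64) {
		/* Drop precision so that time * rtime fits in 64 bits. */
		const int drop_bits = 64 - zero_bits;

		time = shr64(time, drop_bits);
		rtime = shr64(rtime, drop_bits);
		total = shr64(total, 2 * drop_bits);
		if (total == 0)
			return time;
	}
	return (time * rtime) / total;
}

/* Exact reference using the GCC/Clang unsigned __int128 extension. */
static uint64_t scale_time_exact(uint64_t rtime, uint64_t total, uint64_t time)
{
	return (uint64_t)(((unsigned __int128)time * rtime) / total);
}
```

For rtime = 1954463459156, total = 1771603722423 and stime = 354320744484,
drop_bits is 16, so "total" is shifted right by 32 bits and only the value 412
remains. scale_time_drop() then returns 391351504748 while scale_time_exact()
returns 390892691830, matching the "kernel" and "python" lines in the report.
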
> I really prefer robust kernel side accounting/instrumentation.

We have CONFIG_IRQ_TIME_ACCOUNTING and CONFIG_VIRT_CPU_ACCOUNTING_GEN.
Perhaps we can switch to using one of those options by default. I wonder
whether the additional performance cost associated with them is really
something we should care about. Are there any measurements that show they
make performance worse?

Stanislaw