From: Oleg Nesterov <oleg@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Andrew Fox <afox@redhat.com>,
Stephen Johnston <sjohnsto@redhat.com>,
linux-kernel@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Stanislaw Gruszka <sgruszka@redhat.com>
Subject: Re: [PATCH] sched/cputime: make scale_stime() more precise
Date: Fri, 19 Jul 2019 16:03:25 +0200 [thread overview]
Message-ID: <20190719140325.GA31938@redhat.com> (raw)
In-Reply-To: <20190719110349.GG3419@hirez.programming.kicks-ass.net>
On 07/19, Peter Zijlstra wrote:
>
> On Thu, Jul 18, 2019 at 03:18:34PM +0200, Oleg Nesterov wrote:
> >
> > $ ./stime 300000
> > start=300000000000000
> > ut(diff)/st(diff): 299994875 ( 0) 300009124 (2000)
> > ut(diff)/st(diff): 299994875 ( 0) 300011124 (2000)
> > ut(diff)/st(diff): 299994875 ( 0) 300013124 (2000)
> > ut(diff)/st(diff): 299994875 ( 0) 300015124 (2000)
> > ut(diff)/st(diff): 299994875 ( 0) 300017124 (2000)
> > ut(diff)/st(diff): 299994875 ( 0) 300019124 (2000)
> > ut(diff)/st(diff): 299994875 ( 0) 300021124 (2000)
> > ut(diff)/st(diff): 299994875 ( 0) 300023124 (2000)
> > ut(diff)/st(diff): 299994875 ( 0) 300025124 (2000)
> > ut(diff)/st(diff): 299994875 ( 0) 300027124 (2000)
> > ut(diff)/st(diff): 299994875 ( 0) 300029124 (2000)
> > ut(diff)/st(diff): 299996875 (2000) 300029124 ( 0)
> > ut(diff)/st(diff): 299998875 (2000) 300029124 ( 0)
> > ut(diff)/st(diff): 300000875 (2000) 300029124 ( 0)
> > ut(diff)/st(diff): 300002875 (2000) 300029124 ( 0)
> > ut(diff)/st(diff): 300004875 (2000) 300029124 ( 0)
> > ut(diff)/st(diff): 300006875 (2000) 300029124 ( 0)
> > ut(diff)/st(diff): 300008875 (2000) 300029124 ( 0)
> > ut(diff)/st(diff): 300010875 (2000) 300029124 ( 0)
> > ut(diff)/st(diff): 300012055 (1180) 300029944 ( 820)
> > ut(diff)/st(diff): 300012055 ( 0) 300031944 (2000)
> > ut(diff)/st(diff): 300012055 ( 0) 300033944 (2000)
> > ut(diff)/st(diff): 300012055 ( 0) 300035944 (2000)
> > ut(diff)/st(diff): 300012055 ( 0) 300037944 (2000)
> >
> > shows the problem even when sum_exec_runtime is not that big: 300000 secs.
> >
> > The new implementation of scale_stime() does the additional div64_u64_rem()
> > in a loop but see the comment, as long it is used by cputime_adjust() this
> > can happen only once.
>
> That only shows something after long long staring :/ There's no words on
> what the output actually means or what would've been expected.
Sorry, I should have explained it in more details,
> Also, your example is incomplete; the below is a test for scale_stime();
> from this we can see that the division results in too large a number,
> but, important for our use-case in cputime_adjust(), it is a step
> function (due to loss in precision) and for every plateau we shift
> runtime into the wrong bucket.
Yes.
> Your proposed function works; but is atrocious,
Agreed,
> esp. on 32bit.
Yes... but lets compare it with the current implementation. To simplify,
lets look at the "less generic" version I showed in reply to this patch:
static u64 scale_stime(u64 stime, u64 rtime, u64 total)
{
u64 res = 0, div, rem;
if (ilog2(stime) + ilog2(rtime) > 62) {
div = div64_u64_rem(rtime, total, &rem);
res += div * stime;
rtime = rem;
int shift = ilog2(stime) + ilog2(rtime) - 62;
if (shift > 0) {
rtime >>= shift;
total >>= shift;
if (!total)
return res;
}
}
return res + div64_u64(stime * rtime, total);
}
So, if stime * rtime overflows it does div64_u64() twice while the
current version does a single div_u64() == do_div() (on 32bit).
Even a single div64_u64() is more expensive than do_div() but afaics
a) not too much and b) only if divisor == total doesn't fit in 32bit
and I think this is unlikely.
So I'd say it makes scale_stime() approximately twice more expensive
on 32bit. But hopefully fixe the problem.
> Included below is also an x86_64 implementation in 2 instructions.
But we need the arch-neutral implementation anyway, the code above
is the best I could invent.
But see below!
> I'm still trying see if there's anything saner we can do...
Oh, please, it is not that I like my solution very much, I would like
to see something more clever.
> static noinline u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 c)
> {
> u64 q;
> asm ("mulq %2; divq %3" : "=a" (q) : "a" (a), "rm" (b), "rm" (c) : "rdx");
> return q;
> }
Heh. I have to admit that I didn't know that divq divides 128bit by
64bit. gcc calls the __udivti3 intrinsic in this case so I wrongly
came to conclusion this is not simple even on x86_64. Plus the fact
that linux/math64.h only has mul_u64_u64_shr()...
IIUC, the new helper above is not "safe", it generates an exception
if the result doesn't fit in 64bit. But scale_stime() can safely use
it because stime < total.
So may be we can do
static u64 scale_stime(u64 stime, u64 rtime, u64 total)
{
u64 res = 0, div, rem;
#ifdef mul_u64_u64_div_u64
return mul_u64_u64_div_u64(stime, rtime, total);
#endif
if (ilog2(stime) + ilog2(rtime) > 62) {
div = div64_u64_rem(rtime, total, &rem);
res += div * stime;
rtime = rem;
int shift = ilog2(stime) + ilog2(rtime) - 62;
if (shift > 0) {
rtime >>= shift;
total >>= shift;
if (!total)
return res;
}
}
return res + div64_u64(stime * rtime, total);
}
?
Oleg.
next prev parent reply other threads:[~2019-07-19 14:03 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-07-18 13:18 [PATCH] sched/cputime: make scale_stime() more precise Oleg Nesterov
2019-07-18 13:21 ` Oleg Nesterov
2019-07-18 14:55 ` Oleg Nesterov
2019-07-19 11:03 ` Peter Zijlstra
2019-07-19 13:47 ` Peter Zijlstra
2019-07-19 14:37 ` Oleg Nesterov
2019-07-22 19:56 ` Peter Zijlstra
2019-07-23 14:00 ` Oleg Nesterov
2019-07-23 14:29 ` Oleg Nesterov
2019-07-19 14:03 ` Oleg Nesterov [this message]
2019-07-22 19:45 ` Peter Zijlstra
2019-07-22 10:52 ` Stanislaw Gruszka
2019-07-22 20:00 ` Peter Zijlstra
2019-07-23 9:37 ` Stanislaw Gruszka
2020-01-22 16:46 ` Oleg Nesterov
2020-01-23 13:05 ` Oleg Nesterov
2020-01-24 15:42 ` Oleg Nesterov
2020-01-27 12:28 ` [PATCH v2] " Oleg Nesterov
2020-05-15 17:24 ` Oleg Nesterov
2020-05-19 17:25 ` Peter Zijlstra
2020-05-19 18:33 ` Linus Torvalds
2020-05-19 18:42 ` Peter Zijlstra
2020-05-19 19:11 ` Peter Zijlstra
2020-05-19 19:51 ` Linus Torvalds
2020-05-20 15:24 ` Oleg Nesterov
2020-05-20 15:36 ` Peter Zijlstra
2020-05-20 20:10 ` Peter Zijlstra
2020-05-21 13:26 ` Oleg Nesterov
2020-06-16 12:21 ` [tip: sched/core] sched/cputime: Improve cputime_adjust() tip-bot2 for Oleg Nesterov
-- strict thread matches above, loose matches on Subject: below --
2019-07-18 13:15 [PATCH] sched/cputime: make scale_stime() more precise Oleg Nesterov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190719140325.GA31938@redhat.com \
--to=oleg@redhat.com \
--cc=afox@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=sgruszka@redhat.com \
--cc=sjohnsto@redhat.com \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).