linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Stanislaw Gruszka <sgruszka@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>, Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andrew Fox <afox@redhat.com>,
	Stephen Johnston <sjohnsto@redhat.com>,
	linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH] sched/cputime: make scale_stime() more precise
Date: Tue, 23 Jul 2019 11:37:42 +0200	[thread overview]
Message-ID: <20190723093741.GB2955@redhat.com> (raw)
In-Reply-To: <20190722200034.GJ6698@worktop.programming.kicks-ass.net>

On Mon, Jul 22, 2019 at 10:00:34PM +0200, Peter Zijlstra wrote:
> On Mon, Jul 22, 2019 at 12:52:41PM +0200, Stanislaw Gruszka wrote:
> > On Fri, Jul 19, 2019 at 01:03:49PM +0200, Peter Zijlstra wrote:
> > > > shows the problem even when sum_exec_runtime is not that big: 300000 secs.
> > > > 
> > > > The new implementation of scale_stime() does the additional div64_u64_rem()
> > > > in a loop but see the comment, as long it is used by cputime_adjust() this
> > > > can happen only once.
> > > 
> > > That only shows something after long long staring :/ There's no words on
> > > what the output actually means or what would've been expected.
> > > 
> > > Also, your example is incomplete; the below is a test for scale_stime();
> > > from this we can see that the division results in too large a number,
> > > but, important for our use-case in cputime_adjust(), it is a step
> > > function (due to loss in precision) and for every plateau we shift
> > > runtime into the wrong bucket.
> > > 
> > > Your proposed function works; but is atrocious, esp. on 32bit. That
> > > said, before we 'fixed' it, it had similar horrible divisions in, see
> > > commit 55eaa7c1f511 ("sched: Avoid cputime scaling overflow").
> > > 
> > > Included below is also an x86_64 implementation in 2 instructions.
> > > 
> > > I'm still trying see if there's anything saner we can do...
> > 
> > I was always proponent of removing scaling and export raw values
> > and sum_exec_runtime. But that has obvious drawback, reintroduce
> > 'top hiding' issue.
> 
> I think (but didn't grep) that we actually export sum_exec_runtime in
> /proc/ *somewhere*.

Yes, it is, in /proc/[PID]/schedstat for CONFIG_SCHEDSTAT=y and in
/proc/[PID]/sched for CONFIG_SCHED_DEBUG=y

> > But maybe we can export raw values in separate file i.e.
> > /proc/[pid]/raw_cpu_times ? So applications that require more precise
> > cputime values for very long-living processes can use this file.
> 
> There are no raw cpu_times, there are system and user samples, and
> samples are, by their very nature, an approximation. We just happen to
> track the samples in TICK_NSEC resolution these days, but they're still
> ticks (except on s390 and maybe other archs, which do time accounting in
> the syscall path).
> 
> But I think you'll find x86 people are quite opposed to doing TSC reads
> in syscall entry and exit :-)

By 'raw' I meant values that are stored in task_struct without
processing by cputime_adjust() -> scale_stime(). Idea is that
user space can make scaling using floating point, so do not have
to drop precision if numbers are big. 

That was discussed on my RFC and PATCH series posts:
https://lore.kernel.org/lkml/1364489605-5443-1-git-send-email-sgruszka@redhat.com/
https://lore.kernel.org/lkml/1365066635-2959-1-git-send-email-sgruszka@redhat.com/

There was objection that even if kernel does not tell what utime/stime
values mean exactly, users already got used to scaled behaviour and
changing this is 'semantic' ABI breakage. And obviously this would make
'top hiding' worm working again for unpatched top.

So perhaps we can add new exports of not-scaled utime/stime and achieve
the same goal without changing the meaning of existing fields. From
other hand, maybe nowadays better tools exist to provide exact cputimes
to user space i.e. 'perf stat' or bpf, so proposed addition is unneeded.

Stanislaw

  reply	other threads:[~2019-07-23  9:38 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-18 13:18 [PATCH] sched/cputime: make scale_stime() more precise Oleg Nesterov
2019-07-18 13:21 ` Oleg Nesterov
2019-07-18 14:55 ` Oleg Nesterov
2019-07-19 11:03 ` Peter Zijlstra
2019-07-19 13:47   ` Peter Zijlstra
2019-07-19 14:37     ` Oleg Nesterov
2019-07-22 19:56       ` Peter Zijlstra
2019-07-23 14:00         ` Oleg Nesterov
2019-07-23 14:29           ` Oleg Nesterov
2019-07-19 14:03   ` Oleg Nesterov
2019-07-22 19:45     ` Peter Zijlstra
2019-07-22 10:52   ` Stanislaw Gruszka
2019-07-22 20:00     ` Peter Zijlstra
2019-07-23  9:37       ` Stanislaw Gruszka [this message]
2020-01-22 16:46 ` Oleg Nesterov
2020-01-23 13:05   ` Oleg Nesterov
2020-01-24 15:42   ` Oleg Nesterov
2020-01-27 12:28 ` [PATCH v2] " Oleg Nesterov
2020-05-15 17:24   ` Oleg Nesterov
2020-05-19 17:25   ` Peter Zijlstra
2020-05-19 18:33     ` Linus Torvalds
2020-05-19 18:42       ` Peter Zijlstra
2020-05-19 19:11       ` Peter Zijlstra
2020-05-19 19:51         ` Linus Torvalds
2020-05-20 15:24     ` Oleg Nesterov
2020-05-20 15:36       ` Peter Zijlstra
2020-05-20 20:10         ` Peter Zijlstra
2020-05-21 13:26           ` Oleg Nesterov
2020-06-16 12:21     ` [tip: sched/core] sched/cputime: Improve cputime_adjust() tip-bot2 for Oleg Nesterov
  -- strict thread matches above, loose matches on Subject: below --
2019-07-18 13:15 [PATCH] sched/cputime: make scale_stime() more precise Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190723093741.GB2955@redhat.com \
    --to=sgruszka@redhat.com \
    --cc=afox@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=sjohnsto@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).