linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Stanislaw Gruszka <sgruszka@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Peter Zijlstra" <peterz@infradead.org>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	"Ingo Molnar" <mingo@kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	linux-tip-commits@vger.kernel.org
Subject: Re: [tip:sched/core] sched: Lower chances of cputime scaling overflow
Date: Tue, 16 Apr 2013 12:40:07 +0200	[thread overview]
Message-ID: <20130416104007.GA620@redhat.com> (raw)
In-Reply-To: <CA+55aFwMeAe6Mb9EPeT4Ke0W36z-XUK3BZvbOt+dMzdbJ3F1Ew@mail.gmail.com>

On Sat, Apr 13, 2013 at 11:44:54AM -0700, Linus Torvalds wrote:
> PS. This is the "Make sure 'total' fits in 32 bits first" version. Not
> really tested, but it's just changing the order of operations a bit.
> 
>     /* We know one of the values has a bit set in the high 32 bits */
>     for (;;) {
>         /* Make sure "rtime" is the bigger of stime/rtime */
>         if (stime > rtime) {
>             u64 tmp = rtime; rtime = stime; stime = tmp;
>         }
> 
>         /* Make sure 'total' fits in 32 bits */
>         if (total >> 32)
>                 goto drop_precision;
> 
>         /* Does rtime (and thus stime) fit in 32 bits? */
>         if (!(rtime >> 32))
>                 break;
> 
>         /* Can we just balance rtime/stime rather than dropping bits? */
>         if (stime >> 31)
>             goto drop_precision;
> 
>         /* We can grow stime and shrink rtime and try to make them both fit */
>         stime <<= 1;
>         rtime >>= 1;
>         continue;
> 
> drop_precision:
>         /* We drop from rtime, it has more bits than stime */
>         rtime >>= 1;
>         total >>= 1;
>     }

It also also pass 0.1% relative error on my tests. Decreasing error
threshold to 0.02% failed. I didn't check other values or measure how
frequent 0.02% fail on each version, I assume this one is better :-)

So regarding relative error this algorithm is fine, there is no
multiplication overflow error which make scaled numbers bogus. Then
I looked on this algorithm regarding context how it is used ...

Raw stime/rtime/total values will increase in jiffies resolution,
so do scaled_stime if we do not drop precision. For bigger numbers,
since we drop precision, scaled_stime will grow in chunks. How big
the chunk depend on how overall big numbers are and stime/total ratio.
For example: stime = 0.5*total, 128 threads, after 1 year of CPU
execution chunk will be 1024 jiffies.

We use scaled stime value this way:

	if (total)
		stime = scale_stime(stime, rtime, total);
	else
		stime = rtime;

	/*
	 * If the tick based count grows faster than the scheduler one,
	 * the result of the scaling may go backward.
	 * Let's enforce monotonicity.
	 */
	prev->stime = max(prev->stime, stime);
	prev->utime = max(prev->utime, rtime - prev->stime);

	*ut = prev->utime;
	*st = prev->stime;

Since rtime increase, but scaled stime not, stime will be accounted
as prev->utime. Then after chunk jiffies, stime will grow and we will
get it accounted in prev->stime. As result we will account more cpu
time than actually process do. This error will accumulate depending
how frequently cputime_adjust(process) will be called.
 
As solution for this we could just stop accounting if prev->stime +
prev->utime are already bigger than rtime. For example:

 	rtime = nsecs_to_cputime(curr->sum_exec_runtime);
 
+	/*
+	 * Update userspace visible utime/stime values only if actual execution
+	 * time is bigger than already exported. Note that can happen, that we 
+	 * provided bigger values due to scaling inaccuracy on big numbers.
+	 */
+	if (prev->stime + prev->utime >= rtime)
+		goto out;
+
 	if (total)
 		stime = scale_stime(stime, rtime, total);
 	else
@@ -573,6 +581,7 @@ static void cputime_adjust(struct task_cputime *curr,
 	prev->stime = max(prev->stime, stime);
 	prev->utime = max(prev->utime, rtime - prev->stime);
 
+out:
 	*ut = prev->utime;
 	*st = prev->stime;
 }
 
This should help with erroneously accounting more CPU time than process
actually use. As disadvantage userspace will see CPU time increase in
chunks, but I think this is better than see values much bigger than
correct ones (and for 99.9% user cases it does not matter).

Stanislaw
 

  reply	other threads:[~2013-04-16 10:39 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <tip-d9a3c9823a2e6a543eb7807fb3d15d8233817ec5@git.kernel.org>
2013-03-26 14:01 ` [tip:sched/core] sched: Lower chances of cputime scaling overflow Stanislaw Gruszka
2013-03-26 14:19   ` Frederic Weisbecker
2013-03-26 16:54     ` Stanislaw Gruszka
2013-04-10 12:51     ` Ingo Molnar
2013-04-10 15:28       ` Frederic Weisbecker
2013-04-10 17:32         ` Ingo Molnar
2013-04-11  8:04           ` Stanislaw Gruszka
2013-04-11 13:45   ` Peter Zijlstra
2013-04-11 14:50     ` Stanislaw Gruszka
2013-04-11 17:31       ` Peter Zijlstra
2013-04-11 15:38     ` Linus Torvalds
2013-04-11 18:07       ` Peter Zijlstra
2013-04-11 18:22         ` Frederic Weisbecker
2013-04-11 18:26           ` Frederic Weisbecker
2013-04-11 18:22         ` Linus Torvalds
2013-04-12  7:55       ` Peter Zijlstra
2013-04-13 14:49         ` Stanislaw Gruszka
2013-04-13 18:44           ` Linus Torvalds
2013-04-16 10:40             ` Stanislaw Gruszka [this message]
2013-04-30 14:03             ` Stanislaw Gruszka
2013-04-13 14:55       ` Stanislaw Gruszka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130416104007.GA620@redhat.com \
    --to=sgruszka@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=fweisbec@gmail.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-tip-commits@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).