From: Paul Turner <pjt@google.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Segall <bsegall@google.com>,
linux-kernel@vger.kernel.org, Venki Pallipadi <venki@google.com>,
Srivatsa Vaddagiri <vatsa@in.ibm.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>,
Mike Galbraith <efault@gmx.de>,
Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>,
Ingo Molnar <mingo@elte.hu>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Morten Rasmussen <Morten.Rasmussen@arm.com>,
Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Subject: Re: [PATCH 14/16] sched: make __update_entity_runnable_avg() fast
Date: Wed, 11 Jul 2012 17:15:04 -0700 [thread overview]
Message-ID: <CAPM31R+RYvLLN0mK8VBOe7QTxCxEEuLEZ1OokeLAB=D0cxqJxQ@mail.gmail.com> (raw)
In-Reply-To: <1341917490.3462.119.camel@twins>
So I've been trying to dig up the little proglets that originally
computed this stuff, since some specific adjustments were made but for
the life of me[*] I cannot find it, so I am stuck trying to reverse
engineer it like you :-).
[*] Including some over-night greps on all my source trees.
The short answer is I can explain some of the differences, but not
all; I suspect that perhaps I generated things using a wonky table.
Will update the tables with the tweaked numbers below for next
posting.
Updated values:
inverses for fixed point multiplication by y^k
0: ffffffff
1: fa83b2da
2: f5257d14
3: efe4b99a
4: eac0c6e6
5: e5b906e6
6: e0ccdeeb
7: dbfbb796
8: d744fcc9
9: d2a81d91
10: ce248c14
11: c9b9bd85
12: c5672a10
13: c12c4cc9
14: bd08a39e
15: b8fbaf46
16: b504f333
17: b123f581
18: ad583ee9
19: a9a15ab4
20: a5fed6a9
21: a2704302
22: 9ef5325f
23: 9b8d39b9
24: 9837f050
25: 94f4efa8
26: 91c3d373
27: 8ea4398a
28: 8b95c1e3
29: 88980e80
30: 85aac367
31: 82cd8698
[ Delta versus previous is purely double vs float, no surprises on this one ]
convergence
345> 47765
[ This is accounting for the fact that we're not getting to use a
perfect value of y, but it is the value we will converge to with max
individual updates and our fastmult y^1 approximation ]
sum y^n
[ Where:
exact_n = exact_{n-1} * exact_y + 1024.0
floor_n = FLOOR( floor_{n-1} * exact_y * 1024.0 )
shift_n = approximating exact_n using shift/div for mult
fastmul1 = approximating exact_n using inverse mult of y^1 recursively
fastmul2 = \sum 1024*y^n using inverse mult of y^k
Error terms for the approximations:
sum y^n
exact floor shift fastmul1 fastmul2
1: 1002 -0 0 0 0
2: 1983 -1 0 1 1
3: 2942 -1 -1 1 1
4: 3881 -1 -2 1 1
5: 4800 -2 -3 1 1
6: 5699 -2 -4 1 1
7: 6579 -3 -5 1 1
8: 7440 -3 -6 1 1
9: 8283 -4 -6 2 2
10: 9108 -5 -7 1 2
11: 9914 -5 -8 1 2
12: 10704 -6 -9 1 2
13: 11477 -7 -9 2 3
14: 12233 -7 -10 2 3
15: 12973 -7 -11 2 3
16: 13697 -7 -12 2 3
17: 14405 -7 -13 1 3
18: 15099 -8 -14 1 3
19: 15777 -8 -16 0 3
20: 16441 -8 -17 0 3
21: 17091 -9 -18 0 3
22: 17727 -9 -18 1 4
23: 18349 -9 -20 0 3
24: 18958 -9 -20 1 4
25: 19554 -9 -21 1 4
26: 20137 -9 -22 1 4
27: 20707 -9 -24 1 4
28: 21266 -10 -25 1 4
29: 21812 -10 -27 0 3
30: 22347 -11 -28 1 4
31: 22870 -11 -30 0 3
The concern here is that we don't want approximations that
over-estimate to make possible exceeding our converged max load sum
above, which was accumulated using only single y^n steps.
For this reason I prefer the most conservative floor approximation
which never over-estimates, with errors <0.1%. I think this is what I
chose previously (the first terms all align), but I can't explain the
divergence for higher n.
(Exact values)
exact floor shift fastmul1 fastmul2
1: 1002 1002 1002 1002 1002
2: 1983 1982 1982 1983 1983
3: 2942 2941 2941 2943 2943
4: 3881 3880 3879 3882 3882
5: 4800 4798 4797 4801 4801
6: 5699 5697 5695 5700 5700
7: 6579 6576 6574 6580 6580
8: 7440 7437 7434 7441 7441
9: 8283 8279 8276 8284 8284
10: 9108 9103 9100 9108 9109
11: 9914 9909 9906 9915 9916
12: 10704 10698 10695 10705 10706
13: 11477 11470 11467 11478 11479
14: 12233 12226 12222 12234 12235
15: 12973 12966 12961 12974 12975
16: 13697 13690 13684 13698 13699
17: 14405 14398 14392 14406 14408
18: 15099 15091 15084 15099 15101
19: 15777 15769 15761 15777 15780
20: 16441 16433 16424 16441 16444
21: 17091 17082 17073 17091 17094
22: 17727 17718 17708 17727 17730
23: 18349 18340 18329 18349 18352
24: 18958 18949 18937 18958 18961
25: 19554 19545 19532 19554 19557
26: 20137 20128 20114 20137 20140
27: 20707 20698 20683 20708 20711
28: 21266 21256 21240 21266 21269
29: 21812 21802 21785 21812 21815
30: 22347 22336 22318 22347 22350
31: 22870 22859 22840 22870 22873
And for posterity, a simple generator so that I don't lose it again:
#include <math.h>
#include <stdio.h>
#define SRR(x, y) (((x) + (1UL << ((y) - 1))) >> (y))
#define N 32
#define WMULT_SHIFT 32
const long WMULT_CONST = ((1UL << N) - 1);
const double y = .97857206208770013451;
long approx_decay(int c) {
return (c * 4008) >> 12;
}
long mult_inv_array[N];
void calc_mult_inv() {
int i;
double yn = 0;
printf("inverses\n");
for (i = 0; i < N; i++) {
yn = (double)WMULT_CONST * pow(y, i);
mult_inv_array[i] = yn;
printf("%d: %8lx\n", i, mult_inv_array[i]);
}
printf("\n");
}
long mult_inv(long c, int n) {
return SRR(c * runnable_avg_yN_inv[n], WMULT_SHIFT);
}
void calc_yn_sum(int n)
{
int i;
double sum = 0, sum_fl = 0, diff = 0;
long approx = 0, approx_fm = 0, approx_fm2 = 0;
printf("sum y^n\n");
printf(" %8s %8s %8s %8s %8s\n", "exact", "floor", "shift",
"fastmul1", "fastmul2");
for (i = 1; i < n; i++) {
sum = (y * sum + y * 1024);
sum_fl = floor(y * sum_fl+ y * 1024);
approx = approx_decay(approx) + approx_decay(1024);
approx_fm = mult_inv(approx_fm, 1) + mult_inv(1024, 1);
approx_fm2 += mult_inv(1024, i);
/*diff = sum;*/
printf("%2d: %8.0f %8.0f %8ld %8ld %8ld\n", i, sum,
sum_fl - diff,
approx - (long)diff,
approx_fm - (long)diff,
approx_fm2 - (long)diff);
}
printf("\n");
}
void calc_conv(long n) {
long old_n;
int i = -1;
printf("convergence\n");
do {
old_n = n;
n = mult_inv(n, 1) + 1024;
i++;
} while (n != old_n);
printf("%d> %ld\n", i - 1, n);
printf("\n");
}
void main() {
calc_mult_inv();
calc_conv(1024);
calc_yn_sum(N);
}
next prev parent reply other threads:[~2012-07-12 0:15 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-06-28 2:24 [PATCH 00/16] Series short description Paul Turner
2012-06-28 2:24 ` [PATCH 09/16] sched: normalize tg load contributions against runnable time Paul Turner
2012-06-29 7:26 ` Namhyung Kim
2012-07-04 19:48 ` Peter Zijlstra
2012-07-06 11:52 ` Peter Zijlstra
2012-07-12 1:08 ` Andre Noll
2012-07-12 0:02 ` Paul Turner
2012-07-06 12:23 ` Peter Zijlstra
2012-06-28 2:24 ` [PATCH 06/16] sched: account for blocked load waking back up Paul Turner
2012-06-28 2:24 ` [PATCH 02/16] sched: maintain per-rq runnable averages Paul Turner
2012-06-28 2:24 ` [PATCH 01/16] sched: track the runnable average on a per-task entitiy basis Paul Turner
2012-06-28 6:06 ` Namhyung Kim
2012-07-12 0:14 ` Paul Turner
2012-07-04 15:32 ` Peter Zijlstra
2012-07-12 0:12 ` Paul Turner
2012-06-28 2:24 ` [PATCH 04/16] sched: maintain the load contribution of blocked entities Paul Turner
2012-06-29 1:27 ` Namhyung Kim
2012-06-28 2:24 ` [PATCH 07/16] sched: aggregate total task_group load Paul Turner
2012-06-28 2:24 ` [PATCH 05/16] sched: add an rq migration call-back to sched_class Paul Turner
2012-06-29 1:32 ` Namhyung Kim
2012-06-28 2:24 ` [PATCH 08/16] sched: compute load contribution by a group entity Paul Turner
2012-06-28 2:24 ` [PATCH 03/16] sched: aggregate load contributed by task entities on parenting cfs_rq Paul Turner
2012-06-28 6:33 ` Namhyung Kim
2012-07-04 15:28 ` Peter Zijlstra
2012-07-06 14:53 ` Peter Zijlstra
2012-07-09 9:15 ` Ingo Molnar
2012-06-28 2:24 ` [PATCH 11/16] sched: replace update_shares weight distribution with per-entity computation Paul Turner
2012-06-28 2:24 ` [PATCH 16/16] sched: introduce temporary FAIR_GROUP_SCHED dependency for load-tracking Paul Turner
2012-06-28 2:24 ` [PATCH 12/16] sched: refactor update_shares_cpu() -> update_blocked_avgs() Paul Turner
2012-06-29 7:28 ` Namhyung Kim
2012-07-12 0:03 ` Paul Turner
2012-07-05 11:58 ` Peter Zijlstra
2012-07-12 0:11 ` Paul Turner
2012-07-12 14:40 ` Peter Zijlstra
2012-06-28 2:24 ` [PATCH 15/16] sched: implement usage tracking Paul Turner
2012-06-28 2:24 ` [PATCH 13/16] sched: update_cfs_shares at period edge Paul Turner
2012-06-28 2:24 ` [PATCH 10/16] sched: maintain runnable averages across throttled periods Paul Turner
2012-06-28 2:24 ` [PATCH 14/16] sched: make __update_entity_runnable_avg() fast Paul Turner
2012-07-04 15:41 ` Peter Zijlstra
2012-07-04 17:20 ` Peter Zijlstra
2012-07-09 20:18 ` Benjamin Segall
2012-07-10 10:51 ` Peter Zijlstra
2012-07-12 0:15 ` Paul Turner [this message]
2012-07-12 14:30 ` Peter Zijlstra
2012-07-04 16:51 ` Peter Zijlstra
2012-08-23 14:14 [patch 00/16] sched: per-entity load-tracking pjt
2012-08-23 14:14 ` [patch 14/16] sched: make __update_entity_runnable_avg() fast pjt
2012-08-24 8:28 ` Namhyung Kim
2012-08-28 22:18 ` Paul Turner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAPM31R+RYvLLN0mK8VBOe7QTxCxEEuLEZ1OokeLAB=D0cxqJxQ@mail.gmail.com' \
--to=pjt@google.com \
--cc=Morten.Rasmussen@arm.com \
--cc=bsegall@google.com \
--cc=efault@gmx.de \
--cc=kamalesh@linux.vnet.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=nikunj@linux.vnet.ibm.com \
--cc=paulmck@linux.vnet.ibm.com \
--cc=peterz@infradead.org \
--cc=svaidy@linux.vnet.ibm.com \
--cc=vatsa@in.ibm.com \
--cc=venki@google.com \
--cc=vincent.guittot@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.