* [PATCH] sched/fair: Optimize sum computation with a lookup table
@ 2016-04-08 2:07 Yuyang Du
From: Yuyang Du @ 2016-04-08 2:07 UTC
To: peterz, mingo, linux-kernel
Cc: bsegall, pjt, morten.rasmussen, vincent.guittot,
dietmar.eggemann, Yuyang Du
__compute_runnable_contrib() uses a loop to compute the sum, whereas a
table lookup can do it faster, in constant time.
Signed-off-by: Yuyang Du <yuyang.du@intel.com>
---
kernel/sched/fair.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b8cc1c3..32bc4ef 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2603,6 +2603,15 @@ static const u32 runnable_avg_yN_sum[] = {
};
/*
+ * Precomputed \Sum y^k { 1<=k<=n, where n%32=0 }. Values are rolled down to
+ * lower integers.
+ */
+static const u32 __accumulated_sum_N32[] = {
+ 0, 23371, 35056, 40899, 43820, 45281,
+ 46011, 46376, 46559, 46650, 46696, 46719,
+};
+
+/*
* Approximate:
* val * y^n, where y^32 ~= 0.5 (~1 scheduling period)
*/
@@ -2650,14 +2659,9 @@ static u32 __compute_runnable_contrib(u64 n)
else if (unlikely(n >= LOAD_AVG_MAX_N))
return LOAD_AVG_MAX;
- /* Compute \Sum k^n combining precomputed values for k^i, \Sum k^j */
- do {
- contrib /= 2; /* y^LOAD_AVG_PERIOD = 1/2 */
- contrib += runnable_avg_yN_sum[LOAD_AVG_PERIOD];
-
- n -= LOAD_AVG_PERIOD;
- } while (n > LOAD_AVG_PERIOD);
-
+ /* Since n < LOAD_AVG_MAX_N, n/LOAD_AVG_PERIOD < 11 */
+ contrib = __accumulated_sum_N32[n/LOAD_AVG_PERIOD];
+ n %= LOAD_AVG_PERIOD;
contrib = decay_load(contrib, n);
return contrib + runnable_avg_yN_sum[n];
}
--
1.7.9.5
* Re: [PATCH] sched/fair: Optimize sum computation with a lookup table
From: Joe Perches @ 2016-04-08 10:31 UTC
To: Yuyang Du, peterz, mingo, linux-kernel
Cc: bsegall, pjt, morten.rasmussen, vincent.guittot, dietmar.eggemann
On Fri, 2016-04-08 at 10:07 +0800, Yuyang Du wrote:
> __compute_runnable_contrib() uses a loop to compute sum, whereas a
> table lookup can do it faster in a constant time.
Perhaps this becomes rather fragile code overly dependent on the
current #define values of LOAD_AVG_MAX_N and LOAD_AVG_PERIOD.
Perhaps this comment just above the definitions of LOAD_AVG_MAX_N
and LOAD_AVG_PERIOD should be updated to include this new table:
* Note: The tables runnable_avg_yN_inv and runnable_avg_yN_sum are
* dependent on this value.
Perhaps the __ prefix for __accumulated_sum_N32 is odd, as neither of
the runnable_avg_yN_ tables is __ prefixed.
* Re: [PATCH] sched/fair: Optimize sum computation with a lookup table
From: Peter Zijlstra @ 2016-04-08 10:44 UTC
To: Yuyang Du
Cc: mingo, linux-kernel, bsegall, pjt, morten.rasmussen,
vincent.guittot, dietmar.eggemann
On Fri, Apr 08, 2016 at 10:07:20AM +0800, Yuyang Du wrote:
> __compute_runnable_contrib() uses a loop to compute sum, whereas a
> table loopup can do it faster in a constant time.
> - /* Compute \Sum k^n combining precomputed values for k^i, \Sum k^j */
> - do {
> - contrib /= 2; /* y^LOAD_AVG_PERIOD = 1/2 */
> - contrib += runnable_avg_yN_sum[LOAD_AVG_PERIOD];
> -
> - n -= LOAD_AVG_PERIOD;
> - } while (n > LOAD_AVG_PERIOD);
> -
> + /* Since n < LOAD_AVG_MAX_N, n/LOAD_AVG_PERIOD < 11 */
> + contrib = __accumulated_sum_N32[n/LOAD_AVG_PERIOD];
> + n %= LOAD_AVG_PERIOD;
> contrib = decay_load(contrib, n);
> return contrib + runnable_avg_yN_sum[n];
You replace a simple loop with a DIV instruction and a potential extra
cache miss.
Is that really faster? What is the median 'n' for which we run that
loop? IOW how many loops do we normally do?
And remember that while recent Intel chips are really good at divisions,
not everybody is (and even then they're still slow).
* Re: [PATCH] sched/fair: Optimize sum computation with a lookup table
From: Peter Zijlstra @ 2016-04-08 10:45 UTC
To: Yuyang Du
Cc: mingo, linux-kernel, bsegall, pjt, morten.rasmussen,
vincent.guittot, dietmar.eggemann
On Fri, Apr 08, 2016 at 12:44:32PM +0200, Peter Zijlstra wrote:
> On Fri, Apr 08, 2016 at 10:07:20AM +0800, Yuyang Du wrote:
> > __compute_runnable_contrib() uses a loop to compute sum, whereas a
> > table loopup can do it faster in a constant time.
>
> > - /* Compute \Sum k^n combining precomputed values for k^i, \Sum k^j */
> > - do {
> > - contrib /= 2; /* y^LOAD_AVG_PERIOD = 1/2 */
> > - contrib += runnable_avg_yN_sum[LOAD_AVG_PERIOD];
> > -
> > - n -= LOAD_AVG_PERIOD;
> > - } while (n > LOAD_AVG_PERIOD);
> > -
> > + /* Since n < LOAD_AVG_MAX_N, n/LOAD_AVG_PERIOD < 11 */
> > + contrib = __accumulated_sum_N32[n/LOAD_AVG_PERIOD];
> > + n %= LOAD_AVG_PERIOD;
> > contrib = decay_load(contrib, n);
> > return contrib + runnable_avg_yN_sum[n];
>
> You replace a simple loop with a DIV instruction and a potential extra
> cachemiss.
Duh, LOAD_AVG_PERIOD is 32, that's a simple shift.
* Re: [PATCH] sched/fair: Optimize sum computation with a lookup table
From: Peter Zijlstra @ 2016-04-08 10:54 UTC
To: Joe Perches
Cc: Yuyang Du, mingo, linux-kernel, bsegall, pjt, morten.rasmussen,
vincent.guittot, dietmar.eggemann
On Fri, Apr 08, 2016 at 03:31:41AM -0700, Joe Perches wrote:
> On Fri, 2016-04-08 at 10:07 +0800, Yuyang Du wrote:
> > __compute_runnable_contrib() uses a loop to compute sum, whereas a
> > table lookup can do it faster in a constant time.
>
> Perhaps this becomes rather fragile code overly dependent on the
> current #define values of LOAD_AVG_MAX_N and LOAD_AVG_PERIOD.
Too late for that, these tables already heavily depend on these values.
Commit 5b51f2f80b3b ("sched: Make __update_entity_runnable_avg() fast"),
which introduced these tables, includes a program to re-compute the
values if we ever need to change them.
It might make sense for Yuyang to update that program to also
compute this new table and include the whole new program in the
Changelog of this patch.
* Re: [PATCH] sched/fair: Optimize sum computation with a lookup table
From: Morten Rasmussen @ 2016-04-08 11:30 UTC
To: Yuyang Du
Cc: peterz, mingo, linux-kernel, bsegall, pjt, vincent.guittot,
dietmar.eggemann
On Fri, Apr 08, 2016 at 10:07:20AM +0800, Yuyang Du wrote:
> __compute_runnable_contrib() uses a loop to compute sum, whereas a
> table loopup can do it faster in a constant time.
>
> Signed-off-by: Yuyang Du <yuyang.du@intel.com>
> ---
> kernel/sched/fair.c | 20 ++++++++++++--------
> 1 file changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index b8cc1c3..32bc4ef 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2603,6 +2603,15 @@ static const u32 runnable_avg_yN_sum[] = {
> };
>
> /*
> + * Precomputed \Sum y^k { 1<=k<=n, where n%32=0). Values are rolled down to
> + * lower integers.
> + */
> +static const u32 __accumulated_sum_N32[] = {
> + 0, 23371, 35056, 40899, 43820, 45281,
> + 46011, 46376, 46559, 46650, 46696, 46719,
> +};
Could we come up with a name that better indicates that there is
relation between this new table and runnable_avg_yN_sum[]?
runnable_avg_N32_sum[]?
> +
> +/*
> * Approximate:
> * val * y^n, where y^32 ~= 0.5 (~1 scheduling period)
> */
> @@ -2650,14 +2659,9 @@ static u32 __compute_runnable_contrib(u64 n)
> else if (unlikely(n >= LOAD_AVG_MAX_N))
> return LOAD_AVG_MAX;
>
> - /* Compute \Sum k^n combining precomputed values for k^i, \Sum k^j */
> - do {
> - contrib /= 2; /* y^LOAD_AVG_PERIOD = 1/2 */
> - contrib += runnable_avg_yN_sum[LOAD_AVG_PERIOD];
> -
> - n -= LOAD_AVG_PERIOD;
> - } while (n > LOAD_AVG_PERIOD);
> -
> + /* Since n < LOAD_AVG_MAX_N, n/LOAD_AVG_PERIOD < 11 */
> + contrib = __accumulated_sum_N32[n/LOAD_AVG_PERIOD];
> + n %= LOAD_AVG_PERIOD;
> contrib = decay_load(contrib, n);
> return contrib + runnable_avg_yN_sum[n];
> }
Looks okay to me, but as PeterZ already said, it would be good to have
the code that computes those values.
With that, feel free to put my acked/reviewed-by on it if you like.
* Re: [PATCH] sched/fair: Optimize sum computation with a lookup table
From: Juri Lelli @ 2016-04-08 16:22 UTC
To: Peter Zijlstra
Cc: Joe Perches, Yuyang Du, mingo, linux-kernel, bsegall, pjt,
morten.rasmussen, vincent.guittot, dietmar.eggemann
Hi,
On 08/04/16 12:54, Peter Zijlstra wrote:
> On Fri, Apr 08, 2016 at 03:31:41AM -0700, Joe Perches wrote:
> > On Fri, 2016-04-08 at 10:07 +0800, Yuyang Du wrote:
> > > __compute_runnable_contrib() uses a loop to compute sum, whereas a
> > > table lookup can do it faster in a constant time.
> >
> > Perhaps this becomes rather fragile code overly dependent on the
> > current #define values of LOAD_AVG_MAX_N and LOAD_AVG_PERIOD.
>
> Too late for that, these tables already heavily depend on these values.
>
> Commit 5b51f2f80b3b ("sched: Make __update_entity_runnable_avg() fast"),
> which introduced these tables, includes a program to re-compute the
> values if we ever need to change them.
>
> It might maybe make sense for Yuyang to update that program to also
> compute this new table and include the whole new program in the
> Changelog of this patch.
>
+1 on this. That program turns out to be really useful if one needs to
recompute tables.
Best,
- Juri