* [PATCH] sched/fair: Optimize sum computation with a lookup table
@ 2016-04-08  2:07 Yuyang Du
  2016-04-08 10:31 ` Joe Perches
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Yuyang Du @ 2016-04-08  2:07 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: bsegall, pjt, morten.rasmussen, vincent.guittot,
	dietmar.eggemann, Yuyang Du

__compute_runnable_contrib() uses a loop to compute the sum, whereas a
table lookup can do it faster, in constant time.

Signed-off-by: Yuyang Du <yuyang.du@intel.com>
---
 kernel/sched/fair.c |   20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b8cc1c3..32bc4ef 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2603,6 +2603,15 @@ static const u32 runnable_avg_yN_sum[] = {
 };
 
 /*
+ * Precomputed \Sum y^k (1 <= k <= n, where n % 32 == 0). Values are rolled
+ * down to lower integers.
+ */
+static const u32 __accumulated_sum_N32[] = {
+	    0, 23371, 35056, 40899, 43820, 45281,
+	46011, 46376, 46559, 46650, 46696, 46719,
+};
+
+/*
  * Approximate:
  *   val * y^n,    where y^32 ~= 0.5 (~1 scheduling period)
  */
@@ -2650,14 +2659,9 @@ static u32 __compute_runnable_contrib(u64 n)
 	else if (unlikely(n >= LOAD_AVG_MAX_N))
 		return LOAD_AVG_MAX;
 
-	/* Compute \Sum k^n combining precomputed values for k^i, \Sum k^j */
-	do {
-		contrib /= 2; /* y^LOAD_AVG_PERIOD = 1/2 */
-		contrib += runnable_avg_yN_sum[LOAD_AVG_PERIOD];
-
-		n -= LOAD_AVG_PERIOD;
-	} while (n > LOAD_AVG_PERIOD);
-
+	/* Since n < LOAD_AVG_MAX_N, n/LOAD_AVG_PERIOD < 11 */
+	contrib = __accumulated_sum_N32[n/LOAD_AVG_PERIOD];
+	n %= LOAD_AVG_PERIOD;
 	contrib = decay_load(contrib, n);
 	return contrib + runnable_avg_yN_sum[n];
 }
-- 
1.7.9.5


* Re: [PATCH] sched/fair: Optimize sum computation with a lookup table
  2016-04-08  2:07 [PATCH] sched/fair: Optimize sum computation with a lookup table Yuyang Du
@ 2016-04-08 10:31 ` Joe Perches
  2016-04-08 10:54   ` Peter Zijlstra
  2016-04-08 10:44 ` Peter Zijlstra
  2016-04-08 11:30 ` Morten Rasmussen
  2 siblings, 1 reply; 7+ messages in thread
From: Joe Perches @ 2016-04-08 10:31 UTC (permalink / raw)
  To: Yuyang Du, peterz, mingo, linux-kernel
  Cc: bsegall, pjt, morten.rasmussen, vincent.guittot, dietmar.eggemann

On Fri, 2016-04-08 at 10:07 +0800, Yuyang Du wrote:
> __compute_runnable_contrib() uses a loop to compute the sum, whereas a
> table lookup can do it faster, in constant time.

Perhaps this becomes rather fragile code overly dependent on the
current #define values of LOAD_AVG_MAX_N and LOAD_AVG_PERIOD.

Perhaps this comment just above the definitions of LOAD_AVG_MAX_N
and LOAD_AVG_PERIOD should be updated to include this new table:

 * Note: The tables runnable_avg_yN_inv and runnable_avg_yN_sum are
 * dependent on this value.

Perhaps the __ prefix for __accumulated_sum_N32 is odd, as neither of
the runnable_avg_yN_ tables is __ prefixed.


* Re: [PATCH] sched/fair: Optimize sum computation with a lookup table
  2016-04-08  2:07 [PATCH] sched/fair: Optimize sum computation with a lookup table Yuyang Du
  2016-04-08 10:31 ` Joe Perches
@ 2016-04-08 10:44 ` Peter Zijlstra
  2016-04-08 10:45   ` Peter Zijlstra
  2016-04-08 11:30 ` Morten Rasmussen
  2 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2016-04-08 10:44 UTC (permalink / raw)
  To: Yuyang Du
  Cc: mingo, linux-kernel, bsegall, pjt, morten.rasmussen,
	vincent.guittot, dietmar.eggemann

On Fri, Apr 08, 2016 at 10:07:20AM +0800, Yuyang Du wrote:
> __compute_runnable_contrib() uses a loop to compute the sum, whereas a
> table lookup can do it faster, in constant time.

> -	/* Compute \Sum k^n combining precomputed values for k^i, \Sum k^j */
> -	do {
> -		contrib /= 2; /* y^LOAD_AVG_PERIOD = 1/2 */
> -		contrib += runnable_avg_yN_sum[LOAD_AVG_PERIOD];
> -
> -		n -= LOAD_AVG_PERIOD;
> -	} while (n > LOAD_AVG_PERIOD);
> -
> +	/* Since n < LOAD_AVG_MAX_N, n/LOAD_AVG_PERIOD < 11 */
> +	contrib = __accumulated_sum_N32[n/LOAD_AVG_PERIOD];
> +	n %= LOAD_AVG_PERIOD;
>  	contrib = decay_load(contrib, n);
>  	return contrib + runnable_avg_yN_sum[n];

You replace a simple loop with a DIV instruction and a potential extra
cachemiss.

Is that really faster? What is the median 'n' for which we run that
loop? IOW how many loops do we normally do?

And remember that while recent Intel chips are really good at divisions,
not everybody is (and even then they're still slow).


* Re: [PATCH] sched/fair: Optimize sum computation with a lookup table
  2016-04-08 10:44 ` Peter Zijlstra
@ 2016-04-08 10:45   ` Peter Zijlstra
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2016-04-08 10:45 UTC (permalink / raw)
  To: Yuyang Du
  Cc: mingo, linux-kernel, bsegall, pjt, morten.rasmussen,
	vincent.guittot, dietmar.eggemann

On Fri, Apr 08, 2016 at 12:44:32PM +0200, Peter Zijlstra wrote:
> On Fri, Apr 08, 2016 at 10:07:20AM +0800, Yuyang Du wrote:
> > __compute_runnable_contrib() uses a loop to compute the sum, whereas a
> > table lookup can do it faster, in constant time.
> 
> > -	/* Compute \Sum k^n combining precomputed values for k^i, \Sum k^j */
> > -	do {
> > -		contrib /= 2; /* y^LOAD_AVG_PERIOD = 1/2 */
> > -		contrib += runnable_avg_yN_sum[LOAD_AVG_PERIOD];
> > -
> > -		n -= LOAD_AVG_PERIOD;
> > -	} while (n > LOAD_AVG_PERIOD);
> > -
> > +	/* Since n < LOAD_AVG_MAX_N, n/LOAD_AVG_PERIOD < 11 */
> > +	contrib = __accumulated_sum_N32[n/LOAD_AVG_PERIOD];
> > +	n %= LOAD_AVG_PERIOD;
> >  	contrib = decay_load(contrib, n);
> >  	return contrib + runnable_avg_yN_sum[n];
> 
> You replace a simple loop with a DIV instruction and a potential extra
> cachemiss.

Duh, LOAD_AVG_PERIOD is 32, that's a simple shift.


* Re: [PATCH] sched/fair: Optimize sum computation with a lookup table
  2016-04-08 10:31 ` Joe Perches
@ 2016-04-08 10:54   ` Peter Zijlstra
  2016-04-08 16:22     ` Juri Lelli
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2016-04-08 10:54 UTC (permalink / raw)
  To: Joe Perches
  Cc: Yuyang Du, mingo, linux-kernel, bsegall, pjt, morten.rasmussen,
	vincent.guittot, dietmar.eggemann

On Fri, Apr 08, 2016 at 03:31:41AM -0700, Joe Perches wrote:
> On Fri, 2016-04-08 at 10:07 +0800, Yuyang Du wrote:
> > __compute_runnable_contrib() uses a loop to compute the sum, whereas a
> > table lookup can do it faster, in constant time.
> 
> Perhaps this becomes rather fragile code overly dependent on the
> current #define values of LOAD_AVG_MAX_N and LOAD_AVG_PERIOD.

Too late for that, these tables already heavily depend on these values.

Commit 5b51f2f80b3b ("sched: Make __update_entity_runnable_avg() fast"),
which introduced these tables, includes a program to re-compute the
values if we ever need to change them.

It might make sense for Yuyang to update that program to also compute
this new table and include the whole new program in the Changelog of
this patch.


* Re: [PATCH] sched/fair: Optimize sum computation with a lookup table
  2016-04-08  2:07 [PATCH] sched/fair: Optimize sum computation with a lookup table Yuyang Du
  2016-04-08 10:31 ` Joe Perches
  2016-04-08 10:44 ` Peter Zijlstra
@ 2016-04-08 11:30 ` Morten Rasmussen
  2 siblings, 0 replies; 7+ messages in thread
From: Morten Rasmussen @ 2016-04-08 11:30 UTC (permalink / raw)
  To: Yuyang Du
  Cc: peterz, mingo, linux-kernel, bsegall, pjt, vincent.guittot,
	dietmar.eggemann

On Fri, Apr 08, 2016 at 10:07:20AM +0800, Yuyang Du wrote:
> __compute_runnable_contrib() uses a loop to compute the sum, whereas a
> table lookup can do it faster, in constant time.
> 
> Signed-off-by: Yuyang Du <yuyang.du@intel.com>
> ---
>  kernel/sched/fair.c |   20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index b8cc1c3..32bc4ef 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2603,6 +2603,15 @@ static const u32 runnable_avg_yN_sum[] = {
>  };
>  
>  /*
> + * Precomputed \Sum y^k (1 <= k <= n, where n % 32 == 0). Values are rolled
> + * down to lower integers.
> + */
> +static const u32 __accumulated_sum_N32[] = {
> +	    0, 23371, 35056, 40899, 43820, 45281,
> +	46011, 46376, 46559, 46650, 46696, 46719,
> +};

Could we come up with a name that better indicates the relation between
this new table and runnable_avg_yN_sum[]? runnable_avg_N32_sum[]?

> +
> +/*
>   * Approximate:
>   *   val * y^n,    where y^32 ~= 0.5 (~1 scheduling period)
>   */
> @@ -2650,14 +2659,9 @@ static u32 __compute_runnable_contrib(u64 n)
>  	else if (unlikely(n >= LOAD_AVG_MAX_N))
>  		return LOAD_AVG_MAX;
>  
> -	/* Compute \Sum k^n combining precomputed values for k^i, \Sum k^j */
> -	do {
> -		contrib /= 2; /* y^LOAD_AVG_PERIOD = 1/2 */
> -		contrib += runnable_avg_yN_sum[LOAD_AVG_PERIOD];
> -
> -		n -= LOAD_AVG_PERIOD;
> -	} while (n > LOAD_AVG_PERIOD);
> -
> +	/* Since n < LOAD_AVG_MAX_N, n/LOAD_AVG_PERIOD < 11 */
> +	contrib = __accumulated_sum_N32[n/LOAD_AVG_PERIOD];
> +	n %= LOAD_AVG_PERIOD;
>  	contrib = decay_load(contrib, n);
>  	return contrib + runnable_avg_yN_sum[n];
>  }

Looks okay to me, but as PeterZ already said, it would be good to have
the code that computes those values.

With that, feel free to put my acked/reviewed-by on it if you like.


* Re: [PATCH] sched/fair: Optimize sum computation with a lookup table
  2016-04-08 10:54   ` Peter Zijlstra
@ 2016-04-08 16:22     ` Juri Lelli
  0 siblings, 0 replies; 7+ messages in thread
From: Juri Lelli @ 2016-04-08 16:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Joe Perches, Yuyang Du, mingo, linux-kernel, bsegall, pjt,
	morten.rasmussen, vincent.guittot, dietmar.eggemann

Hi,

On 08/04/16 12:54, Peter Zijlstra wrote:
> On Fri, Apr 08, 2016 at 03:31:41AM -0700, Joe Perches wrote:
> > On Fri, 2016-04-08 at 10:07 +0800, Yuyang Du wrote:
> > > __compute_runnable_contrib() uses a loop to compute the sum, whereas a
> > > table lookup can do it faster, in constant time.
> > 
> > Perhaps this becomes rather fragile code overly dependent on the
> > current #define values of LOAD_AVG_MAX_N and LOAD_AVG_PERIOD.
> 
> Too late for that, these tables already heavily depend on these values.
> 
> Commit 5b51f2f80b3b ("sched: Make __update_entity_runnable_avg() fast"),
> which introduced these tables, includes a program to re-compute the
> values if we ever need to change them.
> 
> It might make sense for Yuyang to update that program to also compute
> this new table and include the whole new program in the Changelog of
> this patch.
> 

+1 on this. That program turns out to be really useful if one needs to
recompute tables.

Best,

- Juri

