* [PATCH] sched/fair: Consider RT/IRQ pressure in capacity_spare_wake
@ 2017-11-09 18:52 Joel Fernandes
  2017-11-10  8:29 ` Vincent Guittot
  2017-11-20 11:43 ` Dietmar Eggemann
  0 siblings, 2 replies; 10+ messages in thread
From: Joel Fernandes @ 2017-11-09 18:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes, Dietmar Eggemann, Vincent Guittot,
	Morten Rasmussen, Brendan Jackman, Srinivas Pandruvada, Len Brown,
	Rafael J. Wysocki, Viresh Kumar, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Patrick Bellasi, Chris Redpath, Steve Muckle,
	Steven Rostedt, Saravana Kannan, Vikram Mulukutla, Rohit Jain,
	Atish Patra, EAS Dev, Android Kernel

capacity_spare_wake in the slow path influences the choice of the
idlest group, as we search for groups with maximum spare capacity. In
scenarios where RT pressure is high, a suboptimal group can be chosen,
hurting the performance of the task being woken up.

Several tests with results are included below to show improvements with
this change.

1) Hackbench on Pixel 2 Android device (4x4 ARM64 Octa core)
------------------------------------------------------------
Here we have RT activity running on big CPU cluster induced with rt-app,
and running hackbench in parallel. The RT tasks are bound to 4 CPUs on
the big cluster (cpu 4,5,6,7) and have 100ms periodicity with
runtime=20ms sleep=80ms.
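
As a sketch, the rt-app workload described above (20ms run / 80ms sleep,
100ms period, bound to CPUs 4-7) might be expressed with an rt-app JSON
fragment along these lines; the task name, SCHED_FIFO policy, duration
and exact grammar details are assumptions about the rt-app version used,
and times are in microseconds per rt-app convention:

```json
{
    "tasks": {
        "rt_thread": {
            "instance": 4,
            "policy": "SCHED_FIFO",
            "cpus": [4, 5, 6, 7],
            "run": 20000,
            "sleep": 80000
        }
    },
    "global": {
        "duration": 30,
        "default_policy": "SCHED_OTHER"
    }
}
```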

Hackbench shows a big improvement when the number of tasks is 8 (+30.7%)
and 32 (+11.6%). Note: data is completion time in seconds (lower is
better). The number of loops for 8 and 16 tasks is 50000, and for 32
tasks it is 20000.
+--------+-----+-------+-------------------+---------------------------+
| groups | fds | tasks | Without Patch     | With Patch                |
+--------+-----+-------+---------+---------+-----------------+---------+
|        |     |       | Mean    | Stdev   | Mean            | Stdev   |
|        |     |       +-------------------+-----------------+---------+
|      1 |   8 |     8 | 1.0534  | 0.13722 | 0.7293 (+30.7%) | 0.02653 |
|      2 |   8 |    16 | 1.6219  | 0.16631 | 1.6391 (-1%)    | 0.24001 |
|      4 |   8 |    32 | 1.2538  | 0.13086 | 1.1080 (+11.6%) | 0.16201 |
+--------+-----+-------+---------+---------+-----------------+---------+

2) Rohit ran the barrier.c test (details below) with the following improvements:
------------------------------------------------------------------------
This was Rohit's original use case for a patch he posted at [1]; however,
his recent tests showed that my patch can replace his slow-path changes
[1] and there is no need to selectively scan/skip CPUs in
find_idlest_group_cpu in the slow path to get the improvement he sees.

barrier.c (OpenMP code) is used as a micro-benchmark. It does a number
of iterations with a barrier sync at the end of each for loop.

Here barrier.c is run along with ping on CPUs 0 and 1 as:
'ping -l 10000 -q -s 10 -f hostX'

barrier.c can be found at:
http://www.spinics.net/lists/kernel/msg2506955.html

Following are the results for iterations per second with this
micro-benchmark (higher is better), on a 2-socket, 44-core, 88-thread
Intel x86 machine:
+--------+------------------+---------------------------+
|Threads | Without patch    | With patch                |
|        |                  |                           |
+--------+--------+---------+-----------------+---------+
|        | Mean   | Std Dev | Mean            | Std Dev |
+--------+--------+---------+-----------------+---------+
|1       | 539.36 | 60.16   | 572.54 (+6.15%) | 40.95   |
|2       | 481.01 | 19.32   | 530.64 (+10.32%)| 56.16   |
|4       | 474.78 | 22.28   | 479.46 (+0.99%) | 18.89   |
|8       | 450.06 | 24.91   | 447.82 (-0.50%) | 12.36   |
|16      | 436.99 | 22.57   | 441.88 (+1.12%) | 7.39    |
|32      | 388.28 | 55.59   | 429.4  (+10.59%)| 31.14   |
|64      | 314.62 | 6.33    | 311.81 (-0.89%) | 11.99   |
+--------+--------+---------+-----------------+---------+

3) ping+hackbench test on bare-metal server (Rohit ran this test)
----------------------------------------------------------------
Here hackbench is run in threaded mode along with ping running on CPUs
0 and 1 as:
'ping -l 10000 -q -s 10 -f hostX'

This test is run on a 2-socket, 20-core, 40-thread Intel x86 machine.
The number of loops is 10000 and runtime is in seconds (lower is better).

+--------------+-----------------+--------------------------+
|Task Groups   | Without patch   |  With patch              |
|              +-------+---------+----------------+---------+
|(Groups of 40)| Mean  | Std Dev |  Mean          | Std Dev |
+--------------+-------+---------+----------------+---------+
|1             | 0.851 | 0.007   |  0.828 (+2.77%)| 0.032   |
|2             | 1.083 | 0.203   |  1.087 (-0.37%)| 0.246   |
|4             | 1.601 | 0.051   |  1.611 (-0.62%)| 0.055   |
|8             | 2.837 | 0.060   |  2.827 (+0.35%)| 0.031   |
|16            | 5.139 | 0.133   |  5.107 (+0.63%)| 0.085   |
|25            | 7.569 | 0.142   |  7.503 (+0.88%)| 0.143   |
+--------------+-------+---------+----------------+---------+

[1] https://patchwork.kernel.org/patch/9991635/

Matt Fleming also ran cyclictest and several different hackbench tests
on his test machines to sanity-check that the patch doesn't harm any
of his use cases.

Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Brendan Jackman <brendan.jackman@arm.com>
Tested-by: Rohit Jain <rohit.k.jain@oracle.com>
Tested-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Joel Fernandes <joelaf@google.com>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 56f343b8e749..ba9609407cb9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5724,7 +5724,7 @@ static int cpu_util_wake(int cpu, struct task_struct *p);
 
 static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
 {
-	return capacity_orig_of(cpu) - cpu_util_wake(cpu, p);
+	return max_t(long, capacity_of(cpu) - cpu_util_wake(cpu, p), 0);
 }
 
 /*
-- 
2.15.0.448.gf294e3d99a-goog

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] sched/fair: Consider RT/IRQ pressure in capacity_spare_wake
  2017-11-09 18:52 [PATCH] sched/fair: Consider RT/IRQ pressure in capacity_spare_wake Joel Fernandes
@ 2017-11-10  8:29 ` Vincent Guittot
  2017-11-16 21:53   ` Joel Fernandes
  2017-11-20 11:43 ` Dietmar Eggemann
  1 sibling, 1 reply; 10+ messages in thread
From: Vincent Guittot @ 2017-11-10  8:29 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Dietmar Eggemann, Morten Rasmussen, Brendan Jackman,
	Srinivas Pandruvada, Len Brown, Rafael J. Wysocki, Viresh Kumar,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Patrick Bellasi,
	Chris Redpath, Steve Muckle, Steven Rostedt, Saravana Kannan,
	Vikram Mulukutla, Rohit Jain, Atish Patra, EAS Dev,
	Android Kernel

On 9 November 2017 at 19:52, Joel Fernandes <joelaf@google.com> wrote:
> capacity_spare_wake in the slow path influences the choice of the
> idlest group, as we search for groups with maximum spare capacity. In
> scenarios where RT pressure is high, a suboptimal group can be chosen,
> hurting the performance of the task being woken up.
>
> Several tests with results are included below to show improvements with
> this change.
>
> 1) Hackbench on Pixel 2 Android device (4x4 ARM64 Octa core)

"4x4 ARM64 Octa core" is confusing. At least for me, 4x4 means 16 cores :-)

> ------------------------------------------------------------
> Here we have RT activity running on big CPU cluster induced with rt-app,
> and running hackbench in parallel. The RT tasks are bound to 4 CPUs on
> the big cluster (cpu 4,5,6,7) and have 100ms periodicity with
> runtime=20ms sleep=80ms.
>
> Hackbench shows a big improvement when the number of tasks is 8 (+30.7%)
> and 32 (+11.6%). Note: data is completion time in seconds (lower is
> better). The number of loops for 8 and 16 tasks is 50000, and for 32
> tasks it is 20000.
> +--------+-----+-------+-------------------+---------------------------+
> | groups | fds | tasks | Without Patch     | With Patch                |
> +--------+-----+-------+---------+---------+-----------------+---------+
> |        |     |       | Mean    | Stdev   | Mean            | Stdev   |
> |        |     |       +-------------------+-----------------+---------+
> |      1 |   8 |     8 | 1.0534  | 0.13722 | 0.7293 (+30.7%) | 0.02653 |
> |      2 |   8 |    16 | 1.6219  | 0.16631 | 1.6391 (-1%)    | 0.24001 |
> |      4 |   8 |    32 | 1.2538  | 0.13086 | 1.1080 (+11.6%) | 0.16201 |
> +--------+-----+-------+---------+---------+-----------------+---------+

Out of curiosity, do you know why you don't see any improvement for
16 tasks but only for 8 and 32 tasks?

>
> 2) Rohit ran the barrier.c test (details below) with the following improvements:
> ------------------------------------------------------------------------
> This was Rohit's original use case for a patch he posted at [1]; however,
> his recent tests showed that my patch can replace his slow-path changes
> [1] and there is no need to selectively scan/skip CPUs in
> find_idlest_group_cpu in the slow path to get the improvement he sees.
>
> barrier.c (OpenMP code) is used as a micro-benchmark. It does a number
> of iterations with a barrier sync at the end of each for loop.
>
> Here barrier.c is run along with ping on CPUs 0 and 1 as:
> 'ping -l 10000 -q -s 10 -f hostX'
>
> barrier.c can be found at:
> http://www.spinics.net/lists/kernel/msg2506955.html
>
> Following are the results for iterations per second with this
> micro-benchmark (higher is better), on a 2-socket, 44-core, 88-thread
> Intel x86 machine:
> +--------+------------------+---------------------------+
> |Threads | Without patch    | With patch                |
> |        |                  |                           |
> +--------+--------+---------+-----------------+---------+
> |        | Mean   | Std Dev | Mean            | Std Dev |
> +--------+--------+---------+-----------------+---------+
> |1       | 539.36 | 60.16   | 572.54 (+6.15%) | 40.95   |
> |2       | 481.01 | 19.32   | 530.64 (+10.32%)| 56.16   |
> |4       | 474.78 | 22.28   | 479.46 (+0.99%) | 18.89   |
> |8       | 450.06 | 24.91   | 447.82 (-0.50%) | 12.36   |
> |16      | 436.99 | 22.57   | 441.88 (+1.12%) | 7.39    |
> |32      | 388.28 | 55.59   | 429.4  (+10.59%)| 31.14   |
> |64      | 314.62 | 6.33    | 311.81 (-0.89%) | 11.99   |
> +--------+--------+---------+-----------------+---------+
>
> 3) ping+hackbench test on bare-metal server (Rohit ran this test)
> ----------------------------------------------------------------
> Here hackbench is run in threaded mode along with ping running on CPUs
> 0 and 1 as:
> 'ping -l 10000 -q -s 10 -f hostX'
>
> This test is run on a 2-socket, 20-core, 40-thread Intel x86 machine.
> The number of loops is 10000 and runtime is in seconds (lower is better).
>
> +--------------+-----------------+--------------------------+
> |Task Groups   | Without patch   |  With patch              |
> |              +-------+---------+----------------+---------+
> |(Groups of 40)| Mean  | Std Dev |  Mean          | Std Dev |
> +--------------+-------+---------+----------------+---------+
> |1             | 0.851 | 0.007   |  0.828 (+2.77%)| 0.032   |
> |2             | 1.083 | 0.203   |  1.087 (-0.37%)| 0.246   |
> |4             | 1.601 | 0.051   |  1.611 (-0.62%)| 0.055   |
> |8             | 2.837 | 0.060   |  2.827 (+0.35%)| 0.031   |
> |16            | 5.139 | 0.133   |  5.107 (+0.63%)| 0.085   |
> |25            | 7.569 | 0.142   |  7.503 (+0.88%)| 0.143   |
> +--------------+-------+---------+----------------+---------+
>
> [1] https://patchwork.kernel.org/patch/9991635/
>
> Matt Fleming also ran cyclictest and several different hackbench tests
> on his test machines to sanity-check that the patch doesn't harm any
> of his use cases.
>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Morten Rasmussen <morten.rasmussen@arm.com>
> Cc: Brendan Jackman <brendan.jackman@arm.com>
> Tested-by: Rohit Jain <rohit.k.jain@oracle.com>
> Tested-by: Matt Fleming <matt@codeblueprint.co.uk>
> Signed-off-by: Joel Fernandes <joelaf@google.com>
> ---
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 56f343b8e749..ba9609407cb9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5724,7 +5724,7 @@ static int cpu_util_wake(int cpu, struct task_struct *p);
>
>  static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
>  {
> -       return capacity_orig_of(cpu) - cpu_util_wake(cpu, p);
> +       return max_t(long, capacity_of(cpu) - cpu_util_wake(cpu, p), 0);

Makes sense.

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>


>  }
>
>  /*
> --
> 2.15.0.448.gf294e3d99a-goog
>


* Re: [PATCH] sched/fair: Consider RT/IRQ pressure in capacity_spare_wake
  2017-11-10  8:29 ` Vincent Guittot
@ 2017-11-16 21:53   ` Joel Fernandes
  2017-11-17  8:49     ` Vincent Guittot
  0 siblings, 1 reply; 10+ messages in thread
From: Joel Fernandes @ 2017-11-16 21:53 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, Dietmar Eggemann, Morten Rasmussen, Brendan Jackman,
	Srinivas Pandruvada, Len Brown, Rafael J. Wysocki, Viresh Kumar,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Patrick Bellasi,
	Chris Redpath, Steve Muckle, Steven Rostedt, Saravana Kannan,
	Vikram Mulukutla, Rohit Jain, Atish Patra, EAS Dev,
	Android Kernel

Hi Vincent,

Thanks a lot for your reply, and sorry for the late response. Actually I
just started paternity leave, so that's why the delay. My working hours
are completely random at the moment :-)

On Fri, Nov 10, 2017 at 12:29 AM, Vincent Guittot
<vincent.guittot@linaro.org> wrote:
> On 9 November 2017 at 19:52, Joel Fernandes <joelaf@google.com> wrote:
>> capacity_spare_wake in the slow path influences the choice of the
>> idlest group, as we search for groups with maximum spare capacity. In
>> scenarios where RT pressure is high, a suboptimal group can be chosen,
>> hurting the performance of the task being woken up.
>>
>> Several tests with results are included below to show improvements with
>> this change.
>>
>> 1) Hackbench on Pixel 2 Android device (4x4 ARM64 Octa core)
>
> "4x4 ARM64 Octa core" is confusing. At least for me, 4x4 means 16 cores :-)

Sure, I'll fix it. I meant 4 big and 4 LITTLE CPUs :)

>
>> ------------------------------------------------------------
>> Here we have RT activity running on big CPU cluster induced with rt-app,
>> and running hackbench in parallel. The RT tasks are bound to 4 CPUs on
>> the big cluster (cpu 4,5,6,7) and have 100ms periodicity with
>> runtime=20ms sleep=80ms.
>>
>> Hackbench shows a big improvement when the number of tasks is 8 (+30.7%)
>> and 32 (+11.6%). Note: data is completion time in seconds (lower is
>> better). The number of loops for 8 and 16 tasks is 50000, and for 32
>> tasks it is 20000.
>> +--------+-----+-------+-------------------+---------------------------+
>> | groups | fds | tasks | Without Patch     | With Patch                |
>> +--------+-----+-------+---------+---------+-----------------+---------+
>> |        |     |       | Mean    | Stdev   | Mean            | Stdev   |
>> |        |     |       +-------------------+-----------------+---------+
>> |      1 |   8 |     8 | 1.0534  | 0.13722 | 0.7293 (+30.7%) | 0.02653 |
>> |      2 |   8 |    16 | 1.6219  | 0.16631 | 1.6391 (-1%)    | 0.24001 |
>> |      4 |   8 |    32 | 1.2538  | 0.13086 | 1.1080 (+11.6%) | 0.16201 |
>> +--------+-----+-------+---------+---------+-----------------+---------+
>
> Out of curiosity, do you know why you don't see any improvement for
> 16 tasks but only for 8 and 32 tasks?

Yes, I'm not fully sure why 16 tasks didn't show that much improvement.
I can try to trace it when I get a chance. Generally for this test, with
a larger number of tasks, the improvement is smaller. However, you're
right to point out that the improvement with 32 tasks is greater than
with 16 for this test.

[..]
>>  kernel/sched/fair.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 56f343b8e749..ba9609407cb9 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5724,7 +5724,7 @@ static int cpu_util_wake(int cpu, struct task_struct *p);
>>
>>  static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
>>  {
>> -       return capacity_orig_of(cpu) - cpu_util_wake(cpu, p);
>> +       return max_t(long, capacity_of(cpu) - cpu_util_wake(cpu, p), 0);
>
> Makes sense.
>
> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>

Thanks!

- Joel


* Re: [PATCH] sched/fair: Consider RT/IRQ pressure in capacity_spare_wake
  2017-11-16 21:53   ` Joel Fernandes
@ 2017-11-17  8:49     ` Vincent Guittot
  2017-12-12  0:43       ` Joel Fernandes
  0 siblings, 1 reply; 10+ messages in thread
From: Vincent Guittot @ 2017-11-17  8:49 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Dietmar Eggemann, Morten Rasmussen, Brendan Jackman,
	Srinivas Pandruvada, Len Brown, Rafael J. Wysocki, Viresh Kumar,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Patrick Bellasi,
	Chris Redpath, Steve Muckle, Steven Rostedt, Saravana Kannan,
	Vikram Mulukutla, Rohit Jain, Atish Patra, EAS Dev,
	Android Kernel

On 16 November 2017 at 22:53, Joel Fernandes <joelaf@google.com> wrote:
> Hi Vincent,
>
> Thanks a lot for your reply, and sorry for the late response. Actually I
> just started paternity leave, so that's why the delay. My working hours

Congratulations!

> are completely random at the moment :-)
>
> On Fri, Nov 10, 2017 at 12:29 AM, Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
>> On 9 November 2017 at 19:52, Joel Fernandes <joelaf@google.com> wrote:
>>> capacity_spare_wake in the slow path influences the choice of the
>>> idlest group, as we search for groups with maximum spare capacity. In
>>> scenarios where RT pressure is high, a suboptimal group can be chosen,
>>> hurting the performance of the task being woken up.
>>>
>>> Several tests with results are included below to show improvements with
>>> this change.
>>>
>>> 1) Hackbench on Pixel 2 Android device (4x4 ARM64 Octa core)
>>
>> "4x4 ARM64 Octa core" is confusing. At least for me, 4x4 means 16 cores :-)
>
> Sure, I'll fix it. I meant 4 big and 4 LITTLE CPUs :)
>
>>
>>> ------------------------------------------------------------
>>> Here we have RT activity running on big CPU cluster induced with rt-app,
>>> and running hackbench in parallel. The RT tasks are bound to 4 CPUs on
>>> the big cluster (cpu 4,5,6,7) and have 100ms periodicity with
>>> runtime=20ms sleep=80ms.
>>>
>>> Hackbench shows a big improvement when the number of tasks is 8 (+30.7%)
>>> and 32 (+11.6%). Note: data is completion time in seconds (lower is
>>> better). The number of loops for 8 and 16 tasks is 50000, and for 32
>>> tasks it is 20000.
>>> +--------+-----+-------+-------------------+---------------------------+
>>> | groups | fds | tasks | Without Patch     | With Patch                |
>>> +--------+-----+-------+---------+---------+-----------------+---------+
>>> |        |     |       | Mean    | Stdev   | Mean            | Stdev   |
>>> |        |     |       +-------------------+-----------------+---------+
>>> |      1 |   8 |     8 | 1.0534  | 0.13722 | 0.7293 (+30.7%) | 0.02653 |
>>> |      2 |   8 |    16 | 1.6219  | 0.16631 | 1.6391 (-1%)    | 0.24001 |
>>> |      4 |   8 |    32 | 1.2538  | 0.13086 | 1.1080 (+11.6%) | 0.16201 |
>>> +--------+-----+-------+---------+---------+-----------------+---------+
>>
>> Out of curiosity, do you know why you don't see any improvement for
>> 16 tasks but only for 8 and 32 tasks?
>
> Yes, I'm not fully sure why 16 tasks didn't show that much improvement.

Yes. This is just to make sure that there is no unexpected side effect.

> I can try to trace it when I get a chance. Generally for this test,
> with a larger number of tasks, the improvement is smaller. However,
> you're right to point out that the improvement with 32 tasks is
> greater than with 16 for this test.
>
> [..]
>>>  kernel/sched/fair.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 56f343b8e749..ba9609407cb9 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -5724,7 +5724,7 @@ static int cpu_util_wake(int cpu, struct task_struct *p);
>>>
>>>  static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
>>>  {
>>> -       return capacity_orig_of(cpu) - cpu_util_wake(cpu, p);
>>> +       return max_t(long, capacity_of(cpu) - cpu_util_wake(cpu, p), 0);
>>
>> Makes sense.
>>
>> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
>
> Thanks!
>
> - Joel


* Re: [PATCH] sched/fair: Consider RT/IRQ pressure in capacity_spare_wake
  2017-11-09 18:52 [PATCH] sched/fair: Consider RT/IRQ pressure in capacity_spare_wake Joel Fernandes
  2017-11-10  8:29 ` Vincent Guittot
@ 2017-11-20 11:43 ` Dietmar Eggemann
  1 sibling, 0 replies; 10+ messages in thread
From: Dietmar Eggemann @ 2017-11-20 11:43 UTC (permalink / raw)
  To: Joel Fernandes, linux-kernel
  Cc: Vincent Guittot, Morten Rasmussen, Brendan Jackman,
	Srinivas Pandruvada, Len Brown, Rafael J. Wysocki, Viresh Kumar,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Patrick Bellasi,
	Chris Redpath, Steve Muckle, Steven Rostedt, Saravana Kannan,
	Vikram Mulukutla, Rohit Jain, Atish Patra, EAS Dev,
	Android Kernel

On 11/09/2017 06:52 PM, Joel Fernandes wrote:

[...]

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 56f343b8e749..ba9609407cb9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5724,7 +5724,7 @@ static int cpu_util_wake(int cpu, struct task_struct *p);
>   
>   static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
>   {
> -	return capacity_orig_of(cpu) - cpu_util_wake(cpu, p);
> +	return max_t(long, capacity_of(cpu) - cpu_util_wake(cpu, p), 0);
>   }
>   
>   /*
> 

Looks good to me. Maybe you could mention in the patch header that you
switch capacity_orig_of() for capacity_of(), since it's only a tiny diff
in the hunk.

Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>


* Re: [PATCH] sched/fair: Consider RT/IRQ pressure in capacity_spare_wake
  2017-11-17  8:49     ` Vincent Guittot
@ 2017-12-12  0:43       ` Joel Fernandes
  2017-12-13 20:00         ` Joel Fernandes
  0 siblings, 1 reply; 10+ messages in thread
From: Joel Fernandes @ 2017-12-12  0:43 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, Dietmar Eggemann, Morten Rasmussen, Brendan Jackman,
	Srinivas Pandruvada, Len Brown, Rafael J. Wysocki, Viresh Kumar,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Patrick Bellasi,
	Chris Redpath, Steve Muckle, Steven Rostedt, Saravana Kannan,
	Vikram Mulukutla, Rohit Jain, Atish Patra, EAS Dev,
	Android Kernel

Hi Vincent,

>>
>>>
>>>> ------------------------------------------------------------
>>>> Here we have RT activity running on big CPU cluster induced with rt-app,
>>>> and running hackbench in parallel. The RT tasks are bound to 4 CPUs on
>>>> the big cluster (cpu 4,5,6,7) and have 100ms periodicity with
>>>> runtime=20ms sleep=80ms.
>>>>
>>>> Hackbench shows a big improvement when the number of tasks is 8 (+30.7%)
>>>> and 32 (+11.6%). Note: data is completion time in seconds (lower is
>>>> better). The number of loops for 8 and 16 tasks is 50000, and for 32
>>>> tasks it is 20000.
>>>> +--------+-----+-------+-------------------+---------------------------+
>>>> | groups | fds | tasks | Without Patch     | With Patch                |
>>>> +--------+-----+-------+---------+---------+-----------------+---------+
>>>> |        |     |       | Mean    | Stdev   | Mean            | Stdev   |
>>>> |        |     |       +-------------------+-----------------+---------+
>>>> |      1 |   8 |     8 | 1.0534  | 0.13722 | 0.7293 (+30.7%) | 0.02653 |
>>>> |      2 |   8 |    16 | 1.6219  | 0.16631 | 1.6391 (-1%)    | 0.24001 |
>>>> |      4 |   8 |    32 | 1.2538  | 0.13086 | 1.1080 (+11.6%) | 0.16201 |
>>>> +--------+-----+-------+---------+---------+-----------------+---------+
>>>
>>> Out of curiosity, do you know why you don't see any improvement for
>>> 16 tasks but only for 8 and 32 tasks?
>>
>> Yes, I'm not fully sure why 16 tasks didn't show that much improvement.
>
>> Yes. This is just to make sure that there is no unexpected side effect.

Just got back from vacation. I tried to reproduce these results, but it
looks like our product kernel changed enough that I am not able to
replicate them exactly, and I don't recall the tree I ran them on. I
will redo these tests and share my data in the next rev. Worst case, I
can probably drop this test, since there are other hackbench tests in
this patch that show improvements. But I'll give it a shot to make sure
there are no side effects from this. Thanks.

- Joel


* Re: [PATCH] sched/fair: Consider RT/IRQ pressure in capacity_spare_wake
  2017-12-12  0:43       ` Joel Fernandes
@ 2017-12-13 20:00         ` Joel Fernandes
  2017-12-14 15:46           ` Vincent Guittot
  0 siblings, 1 reply; 10+ messages in thread
From: Joel Fernandes @ 2017-12-13 20:00 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, Dietmar Eggemann, Morten Rasmussen, Brendan Jackman,
	Srinivas Pandruvada, Len Brown, Rafael J. Wysocki, Viresh Kumar,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Patrick Bellasi,
	Chris Redpath, Steve Muckle, Steven Rostedt, Saravana Kannan,
	Vikram Mulukutla, Rohit Jain, Atish Patra, EAS Dev,
	Android Kernel

On Mon, Dec 11, 2017 at 4:43 PM, Joel Fernandes <joelaf@google.com> wrote:
> Hi Vincent,
>
>>>
>>>>
>>>>> ------------------------------------------------------------
>>>>> Here we have RT activity running on big CPU cluster induced with rt-app,
>>>>> and running hackbench in parallel. The RT tasks are bound to 4 CPUs on
>>>>> the big cluster (cpu 4,5,6,7) and have 100ms periodicity with
>>>>> runtime=20ms sleep=80ms.
>>>>>
>>>>> Hackbench shows a big improvement when the number of tasks is 8 (+30.7%)
>>>>> and 32 (+11.6%). Note: data is completion time in seconds (lower is
>>>>> better). The number of loops for 8 and 16 tasks is 50000, and for 32
>>>>> tasks it is 20000.
>>>>> +--------+-----+-------+-------------------+---------------------------+
>>>>> | groups | fds | tasks | Without Patch     | With Patch                |
>>>>> +--------+-----+-------+---------+---------+-----------------+---------+
>>>>> |        |     |       | Mean    | Stdev   | Mean            | Stdev   |
>>>>> |        |     |       +-------------------+-----------------+---------+
>>>>> |      1 |   8 |     8 | 1.0534  | 0.13722 | 0.7293 (+30.7%) | 0.02653 |
>>>>> |      2 |   8 |    16 | 1.6219  | 0.16631 | 1.6391 (-1%)    | 0.24001 |
>>>>> |      4 |   8 |    32 | 1.2538  | 0.13086 | 1.1080 (+11.6%) | 0.16201 |
>>>>> +--------+-----+-------+---------+---------+-----------------+---------+
>>>>
>>>> Out of curiosity, do you know why you don't see any improvement for
>>>> 16 tasks but only for 8 and 32 tasks?
>>>
>>> Yes, I'm not fully sure why 16 tasks didn't show that much improvement.
>>
>> Yes. This is just to make sure that there is no unexpected side effect.
>

It could have been sloppy testing: I could have hit thermal throttling
or forgotten to stop the Android runtime before running the test.
Looking at my old data, the case for 16 tasks has higher completion
times than 32 tasks, which doesn't make sense. Sorry about that. I was
careful this time: I recreated the product tree, applied the patch, and
ran the same test as in this patch. The data prefixed with "with" is
with the patch and "without" is without the patch.

The naming of the Test column is "<test>-<numFDs>-<numGroups>". Data
is completion time of hackbench in seconds.

RUN 1:

Test          Mean             Median            Stddev
with-f4-1g    0.67645 (+3.7%)  0.68000 (+3.8%)   0.025755
with-f4-2g    1.0685  (-0.3%)  1.0570  (+1%)     0.044122
with-f4-4g    1.7558  (+0.7%)  1.7685  (+0.08%)  0.096015

without-f4-1g  0.70255  0.70750  0.025330
without-f4-2g  1.0653   1.0680   0.040300
without-f4-4g  1.7688   1.7670   0.046341

RUN 2:

Test           Mean             Median           Stddev
with-f4-1g     0.68100 (+1%)    0.67800 (+2%)    0.025543
with-f4-2g     1.0242  (+1.5%)  1.0260  (+1.5%)  0.042886
with-f4-4g     1.6100  (+3%)    1.6075  (+3.7%)  0.052677

without-f4-1g  0.68840  0.69150  0.030988
without-f4-2g  1.0400   1.0420   0.034288
without-f4-4g  1.6636   1.6670   0.056963


Let me know what you think, thanks.

- Joel


* Re: [PATCH] sched/fair: Consider RT/IRQ pressure in capacity_spare_wake
  2017-12-13 20:00         ` Joel Fernandes
@ 2017-12-14 15:46           ` Vincent Guittot
  2017-12-14 17:08             ` Joel Fernandes
  0 siblings, 1 reply; 10+ messages in thread
From: Vincent Guittot @ 2017-12-14 15:46 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Dietmar Eggemann, Morten Rasmussen, Brendan Jackman,
	Srinivas Pandruvada, Len Brown, Rafael J. Wysocki, Viresh Kumar,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Patrick Bellasi,
	Chris Redpath, Steve Muckle, Steven Rostedt, Saravana Kannan,
	Vikram Mulukutla, Rohit Jain, Atish Patra, EAS Dev,
	Android Kernel

Hi Joel,

On 13 December 2017 at 21:00, Joel Fernandes <joelaf@google.com> wrote:
> On Mon, Dec 11, 2017 at 4:43 PM, Joel Fernandes <joelaf@google.com> wrote:
>> Hi Vincent,
>>
>>>>
>>>>>
>>>>>> ------------------------------------------------------------
>>>>>> Here we have RT activity running on big CPU cluster induced with rt-app,
>>>>>> and running hackbench in parallel. The RT tasks are bound to 4 CPUs on
>>>>>> the big cluster (cpu 4,5,6,7) and have 100ms periodicity with
>>>>>> runtime=20ms sleep=80ms.
>>>>>>
>>>>>> Hackbench shows a big improvement when the number of tasks is 8 (+30.7%)
>>>>>> and 32 (+11.6%). Note: data is completion time in seconds (lower is
>>>>>> better). The number of loops for 8 and 16 tasks is 50000, and for 32
>>>>>> tasks it is 20000.
>>>>>> +--------+-----+-------+-------------------+---------------------------+
>>>>>> | groups | fds | tasks | Without Patch     | With Patch                |
>>>>>> +--------+-----+-------+---------+---------+-----------------+---------+
>>>>>> |        |     |       | Mean    | Stdev   | Mean            | Stdev   |
>>>>>> |        |     |       +-------------------+-----------------+---------+
>>>>>> |      1 |   8 |     8 | 1.0534  | 0.13722 | 0.7293 (+30.7%) | 0.02653 |
>>>>>> |      2 |   8 |    16 | 1.6219  | 0.16631 | 1.6391 (-1%)    | 0.24001 |
>>>>>> |      4 |   8 |    32 | 1.2538  | 0.13086 | 1.1080 (+11.6%) | 0.16201 |
>>>>>> +--------+-----+-------+---------+---------+-----------------+---------+
>>>>>
>>>>> Out of curiosity, do you know why you don't see any improvement for
>>>>> 16 tasks but only for 8 and 32 tasks?
>>>>
>>>> Yes, I'm not fully sure why 16 tasks didn't show that much improvement.
>>>
>>> Yes. This is just to make sure that there is no unexpected side effect.
>>
>
> It could have been sloppy testing: I could have hit thermal throttling
> or forgotten to stop the Android runtime before running the test.
> Looking at my old data, the case for 16 tasks has higher completion
> times than 32 tasks, which doesn't make sense. Sorry about that. I was
> careful this time: I recreated the product tree, applied the patch, and
> ran the same test as in this patch. The data prefixed with "with" is
> with the patch and "without" is without the patch.
>
> The naming of the Test column is "<test>-<numFDs>-<numGroups>". Data
> is completion time of hackbench in seconds.
>
> RUN 1:
>
> Test         Mean             Median            Stddev
> with-f4-1g  0.67645 (+3.7%)  0.68000 (+3.8%)  0.025755
> with-f4-2g  1.0685  (-0.3%)  1.0570 (+1%)       0.044122
> with-f4-4g  1.7558  (+0.7%)  1.7685 (+0.08%)    0.096015
>
> without-f4-1g  0.70255  0.70750  0.025330
> without-f4-2g  1.0653  1.0680  0.040300
> without-f4-4g  1.7688  1.7670  0.046341
>
> RUN 2:
>
> Test         Mean          Median          Stddev
> with-f4-1g  0.68100 (+1%)  0.67800 (+2%)   0.025543
> with-f4-2g  1.0242 (+1.5%) 1.0260 (+1.5%)  0.042886
> with-f4-4g  1.6100 (+3%)   1.6075 (+3.7%)  0.052677
>
> without-f4-1g  0.68840  0.69150  0.030988
> without-f4-2g  1.0400  1.0420  0.034288
> without-f4-4g  1.6636  1.6670  0.056963
>
>
> Let me know what you think, thanks.

The improvement has decreased compared to previous results, and there
is instability between your runs; as an example, run 2 without the
patch does better than run 1 with the patch for 2g and 4g.
Could you run tests on an SMP Linux kernel instead of big/LITTLE
Android, in order to have a saner test environment and remove some
possible disturbances?

Vincent
>
> - Joel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] sched/fair: Consider RT/IRQ pressure in capacity_spare_wake
  2017-12-14 15:46           ` Vincent Guittot
@ 2017-12-14 17:08             ` Joel Fernandes
  2017-12-14 17:16               ` Vincent Guittot
  0 siblings, 1 reply; 10+ messages in thread
From: Joel Fernandes @ 2017-12-14 17:08 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, Dietmar Eggemann, Morten Ramussen, Brendan Jackman,
	Srinivas Pandruvada, Len Brown, Rafael J. Wysocki, Viresh Kumar,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Patrick Bellasi,
	Chris Redpath, Steve Muckle, Steven Rostedt, Saravana Kannan,
	Vikram Mulukutla, Rohit Jain, Atish Patra, EAS Dev,
	Android Kernel

Hi Vincent,
Thanks for your reply.

On Thu, Dec 14, 2017 at 7:46 AM, Vincent Guittot
<vincent.guittot@linaro.org> wrote:
> Hi Joel,
>
> On 13 December 2017 at 21:00, Joel Fernandes <joelaf@google.com> wrote:
>> On Mon, Dec 11, 2017 at 4:43 PM, Joel Fernandes <joelaf@google.com> wrote:
>>> Hi Vincent,
>>>
>>>>>
>>>>>>
>>>>>>> ------------------------------------------------------------
>>>>>>> Here we have RT activity running on big CPU cluster induced with rt-app,
>>>>>>> and running hackbench in parallel. The RT tasks are bound to 4 CPUs on
>>>>>>> the big cluster (cpu 4,5,6,7) and have 100ms periodicity with
>>>>>>> runtime=20ms sleep=80ms.
>>>>>>>
>>>>>>> Hackbench shows big benefit (30%) improvement when number of tasks is 8
>>>>>>> and 32: Note: data is completion time in seconds (lower is better).
>>>>>>> Number of loops for 8 and 16 tasks is 50000, and for 32 tasks its 20000.
>>>>>>> +--------+-----+-------+-------------------+---------------------------+
>>>>>>> | groups | fds | tasks | Without Patch     | With Patch                |
>>>>>>> +--------+-----+-------+---------+---------+-----------------+---------+
>>>>>>> |        |     |       | Mean    | Stdev   | Mean            | Stdev   |
>>>>>>> |        |     |       +-------------------+-----------------+---------+
>>>>>>> |      1 |   8 |     8 | 1.0534  | 0.13722 | 0.7293 (+30.7%) | 0.02653 |
>>>>>>> |      2 |   8 |    16 | 1.6219  | 0.16631 | 1.6391 (-1%)    | 0.24001 |
>>>>>>> |      4 |   8 |    32 | 1.2538  | 0.13086 | 1.1080 (+11.6%) | 0.16201 |
>>>>>>> +--------+-----+-------+---------+---------+-----------------+---------+
>>>>>>
>>>>>>  Out of curiosity, do you know why you don't see any improvement for
>>>>>> 16 tasks but only for 8 and 32 tasks?
>>>>>
>>>>> Yes, I'm not fully sure why 16 tasks didn't show that much improvement.
>>>>
>>>> Yes. This is just to make sure that there is no unexpected side effect.
>>>
>>
>> It could have been sloppy testing: I could have hit thermal
>> throttling or forgotten to stop the Android runtime before running
>> the test. Looking at my old data, the case for 16 tasks has higher
>> completion times than 32 tasks, which doesn't make sense. Sorry about
>> that. I was careful this time: I recreated the product tree, applied
>> the patch, and ran the same test as in this patch. The data prefixed
>> with "with" is with the patch and "without" is without the patch.
>>
>> The naming of the Test column is "<test>-<numFDs>-<numGroups>". Data
>> is completion time of hackbench in seconds.
>>
>> RUN 1:
>>
>> Test         Mean             Median            Stddev
>> with-f4-1g  0.67645 (+3.7%)  0.68000 (+3.8%)  0.025755
>> with-f4-2g  1.0685  (-0.3%)  1.0570 (+1%)       0.044122
>> with-f4-4g  1.7558  (+0.7%)  1.7685 (+0.08%)    0.096015
>>
>> without-f4-1g  0.70255  0.70750  0.025330
>> without-f4-2g  1.0653  1.0680  0.040300
>> without-f4-4g  1.7688  1.7670  0.046341
>>
>> RUN 2:
>>
>> Test         Mean          Median          Stddev
>> with-f4-1g  0.68100 (+1%)  0.67800 (+2%)   0.025543
>> with-f4-2g  1.0242 (+1.5%) 1.0260 (+1.5%)  0.042886
>> with-f4-4g  1.6100 (+3%)   1.6075 (+3.7%)  0.052677
>>
>> without-f4-1g  0.68840  0.69150  0.030988
>> without-f4-2g  1.0400  1.0420  0.034288
>> without-f4-4g  1.6636  1.6670  0.056963
>>
>>
>> Let me know what you think, thanks.
>
> The improvement has decreased compared to previous results, and there

Yes, but the previous result was invalid, as I mentioned; I controlled
the environment better this time. The previous result showed 4g
completing quicker than 2g, which wasn't very meaningful.

> is instability between your runs; as an example, run 2 without the
> patch does better than run 1 with the patch for 2g and 4g.

That's true. The improvement percentage isn't stable.

> Could you run tests on an SMP Linux kernel instead of big/LITTLE
> Android, in order to have a saner test environment and remove some
> possible disturbances?

Would it be OK with you if I just dropped this synthetic test from the
patch, since there are other hackbench results (case 3) from Rohit
which are on SMP?

Thanks,

- Joel


* Re: [PATCH] sched/fair: Consider RT/IRQ pressure in capacity_spare_wake
  2017-12-14 17:08             ` Joel Fernandes
@ 2017-12-14 17:16               ` Vincent Guittot
  0 siblings, 0 replies; 10+ messages in thread
From: Vincent Guittot @ 2017-12-14 17:16 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Dietmar Eggemann, Morten Ramussen, Brendan Jackman,
	Srinivas Pandruvada, Len Brown, Rafael J. Wysocki, Viresh Kumar,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Patrick Bellasi,
	Chris Redpath, Steve Muckle, Steven Rostedt, Saravana Kannan,
	Vikram Mulukutla, Rohit Jain, Atish Patra, EAS Dev,
	Android Kernel

On 14 December 2017 at 18:08, Joel Fernandes <joelaf@google.com> wrote:
> Hi Vincent,
> Thanks for your reply.
>
> On Thu, Dec 14, 2017 at 7:46 AM, Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
>> Hi Joel,
>>
>> On 13 December 2017 at 21:00, Joel Fernandes <joelaf@google.com> wrote:
>>> On Mon, Dec 11, 2017 at 4:43 PM, Joel Fernandes <joelaf@google.com> wrote:
>>>> Hi Vincent,
>>>>
>>>>>>
>>>>>>>
>>>>>>>> ------------------------------------------------------------
>>>>>>>> Here we have RT activity running on big CPU cluster induced with rt-app,
>>>>>>>> and running hackbench in parallel. The RT tasks are bound to 4 CPUs on
>>>>>>>> the big cluster (cpu 4,5,6,7) and have 100ms periodicity with
>>>>>>>> runtime=20ms sleep=80ms.
>>>>>>>>
>>>>>>>> Hackbench shows big benefit (30%) improvement when number of tasks is 8
>>>>>>>> and 32: Note: data is completion time in seconds (lower is better).
>>>>>>>> Number of loops for 8 and 16 tasks is 50000, and for 32 tasks its 20000.
>>>>>>>> +--------+-----+-------+-------------------+---------------------------+
>>>>>>>> | groups | fds | tasks | Without Patch     | With Patch                |
>>>>>>>> +--------+-----+-------+---------+---------+-----------------+---------+
>>>>>>>> |        |     |       | Mean    | Stdev   | Mean            | Stdev   |
>>>>>>>> |        |     |       +-------------------+-----------------+---------+
>>>>>>>> |      1 |   8 |     8 | 1.0534  | 0.13722 | 0.7293 (+30.7%) | 0.02653 |
>>>>>>>> |      2 |   8 |    16 | 1.6219  | 0.16631 | 1.6391 (-1%)    | 0.24001 |
>>>>>>>> |      4 |   8 |    32 | 1.2538  | 0.13086 | 1.1080 (+11.6%) | 0.16201 |
>>>>>>>> +--------+-----+-------+---------+---------+-----------------+---------+
>>>>>>>
>>>>>>>  Out of curiosity, do you know why you don't see any improvement for
>>>>>>> 16 tasks but only for 8 and 32 tasks?
>>>>>>
>>>>>> Yes, I'm not fully sure why 16 tasks didn't show that much improvement.
>>>>>
>>>>> Yes. This is just to make sure that there is no unexpected side effect.
>>>>
>>>
>>> It could have been sloppy testing: I could have hit thermal
>>> throttling or forgotten to stop the Android runtime before running
>>> the test. Looking at my old data, the case for 16 tasks has higher
>>> completion times than 32 tasks, which doesn't make sense. Sorry about
>>> that. I was careful this time: I recreated the product tree, applied
>>> the patch, and ran the same test as in this patch. The data prefixed
>>> with "with" is with the patch and "without" is without the patch.
>>>
>>> The naming of the Test column is "<test>-<numFDs>-<numGroups>". Data
>>> is completion time of hackbench in seconds.
>>>
>>> RUN 1:
>>>
>>> Test         Mean             Median            Stddev
>>> with-f4-1g  0.67645 (+3.7%)  0.68000 (+3.8%)  0.025755
>>> with-f4-2g  1.0685  (-0.3%)  1.0570 (+1%)       0.044122
>>> with-f4-4g  1.7558  (+0.7%)  1.7685 (+0.08%)    0.096015
>>>
>>> without-f4-1g  0.70255  0.70750  0.025330
>>> without-f4-2g  1.0653  1.0680  0.040300
>>> without-f4-4g  1.7688  1.7670  0.046341
>>>
>>> RUN 2:
>>>
>>> Test         Mean          Median          Stddev
>>> with-f4-1g  0.68100 (+1%)  0.67800 (+2%)   0.025543
>>> with-f4-2g  1.0242 (+1.5%) 1.0260 (+1.5%)  0.042886
>>> with-f4-4g  1.6100 (+3%)   1.6075 (+3.7%)  0.052677
>>>
>>> without-f4-1g  0.68840  0.69150  0.030988
>>> without-f4-2g  1.0400  1.0420  0.034288
>>> without-f4-4g  1.6636  1.6670  0.056963
>>>
>>>
>>> Let me know what you think, thanks.
>>
>> The improvement has decreased compared to previous results, and there
>
> Yes, but the previous result was invalid, as I mentioned; I controlled
> the environment better this time. The previous result showed 4g
> completing quicker than 2g, which wasn't very meaningful.

Yes. It was just to highlight that we no longer see improvements for
this test with the new results.

>
>> is instability between your runs; as an example, run 2 without the
>> patch does better than run 1 with the patch for 2g and 4g.
>
> That's true. The improvement percentage isn't stable.
>
>> Could you run tests on an SMP Linux kernel instead of big/LITTLE
>> Android, in order to have a saner test environment and remove some
>> possible disturbances?
>
> Would it be OK with you if I just dropped this synthetic test from the
> patch, since there are other hackbench results (case 3) from Rohit
> which are on SMP?

Yes, you can probably remove it, as there is no improvement and the
other tests show improvements.


>
> Thanks,
>
> - Joel


end of thread, other threads:[~2017-12-14 17:17 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-09 18:52 [PATCH] sched/fair: Consider RT/IRQ pressure in capacity_spare_wake Joel Fernandes
2017-11-10  8:29 ` Vincent Guittot
2017-11-16 21:53   ` Joel Fernandes
2017-11-17  8:49     ` Vincent Guittot
2017-12-12  0:43       ` Joel Fernandes
2017-12-13 20:00         ` Joel Fernandes
2017-12-14 15:46           ` Vincent Guittot
2017-12-14 17:08             ` Joel Fernandes
2017-12-14 17:16               ` Vincent Guittot
2017-11-20 11:43 ` Dietmar Eggemann
