linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] sched/fair: Fix the nohz.next_balance update mess
@ 2017-02-05  9:57 Wanpeng Li
  2017-02-06  8:07 ` Vincent Guittot
  0 siblings, 1 reply; 5+ messages in thread
From: Wanpeng Li @ 2017-02-05  9:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: Wanpeng Li, Vincent Guittot, Peter Zijlstra, Ingo Molnar

From: Wanpeng Li <wanpeng.li@hotmail.com>

The commit: 
  c5afb6a87f2 ("sched/fair: Fix nohz.next_balance update")

intends to update nohz.next_balance in two steps.

1) The ILB CPU utilizes next_balance variable in nohz_idle_balance() 
   to gather the shortest next balance of other idle CPUs before 
   updating nohz.next_balance. 
2) The ILB CPU updates the nohz.next_balance according to its own 
   next_balance after load balance on behalf of other idle CPUs.

However, there is a mess which breaks the original intention of the 
first step, every idle CPUs update nohz.next_balance during ILB CPU 
on behalf of them to do load balance, and then the ILB CPU utilizes 
next_balance variable in nohz_idle_balance() to gather the shortest 
next balance of other idle CPUs before updating nohz.next_balance.

This patch fixes it by don't update nohz.next_balance for other idle 
CPUs when ILB CPU on behalf of them to do load balance. 

Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
CC: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 kernel/sched/fair.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 274c747..83948a4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8750,7 +8750,8 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
 		 * balance for itself and we need to update the
 		 * nohz.next_balance accordingly.
 		 */
-		if ((idle == CPU_IDLE) && time_after(nohz.next_balance, rq->next_balance))
+		if ((idle == CPU_IDLE) && time_after(nohz.next_balance, rq->next_balance) &&
+			!test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_rq()->cpu)))
 			nohz.next_balance = rq->next_balance;
 #endif
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] sched/fair: Fix the nohz.next_balance update mess
  2017-02-05  9:57 [PATCH] sched/fair: Fix the nohz.next_balance update mess Wanpeng Li
@ 2017-02-06  8:07 ` Vincent Guittot
  2017-02-06  8:33   ` Wanpeng Li
  0 siblings, 1 reply; 5+ messages in thread
From: Vincent Guittot @ 2017-02-06  8:07 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: linux-kernel, Wanpeng Li, Peter Zijlstra, Ingo Molnar

Hi Wanpeng

On 5 February 2017 at 10:57, Wanpeng Li <kernellwp@gmail.com> wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
>
> The commit:
>   c5afb6a87f2 ("sched/fair: Fix nohz.next_balance update")
>
> intends to update nohz.next_balance in two steps.
>
> 1) The ILB CPU utilizes next_balance variable in nohz_idle_balance()
>    to gather the shortest next balance of other idle CPUs before
>    updating nohz.next_balance.
> 2) The ILB CPU updates the nohz.next_balance according to its own
>    next_balance after load balance on behalf of other idle CPUs.
>
> However, there is a mess which breaks the original intention of the

Have you got details of the mess that this generates ?

> first step, every idle CPUs update nohz.next_balance during ILB CPU
> on behalf of them to do load balance, and then the ILB CPU utilizes
> next_balance variable in nohz_idle_balance() to gather the shortest
> next balance of other idle CPUs before updating nohz.next_balance.
>
> This patch fixes it by don't update nohz.next_balance for other idle
> CPUs when ILB CPU on behalf of them to do load balance.

But how do you take into account the next balance of other idle CPUs ?

>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> CC: Ingo Molnar <mingo@kernel.org>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
>  kernel/sched/fair.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 274c747..83948a4 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8750,7 +8750,8 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
>                  * balance for itself and we need to update the
>                  * nohz.next_balance accordingly.
>                  */
> -               if ((idle == CPU_IDLE) && time_after(nohz.next_balance, rq->next_balance))
> +               if ((idle == CPU_IDLE) && time_after(nohz.next_balance, rq->next_balance) &&
> +                       !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_rq()->cpu)))
>                         nohz.next_balance = rq->next_balance;

With this change only the ILB CPU will update the nohz.next_balance
but what about the next_balance of other idle CPUs ?
The nohz.next_balance must be the next_balance of all idle CPU not only the ILB.
So an idle CPU (other than the ILB) will have to wait for the ILB
CPU's period evcen if it has shorter load balance period

Vincent

>  #endif
>         }
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] sched/fair: Fix the nohz.next_balance update mess
  2017-02-06  8:07 ` Vincent Guittot
@ 2017-02-06  8:33   ` Wanpeng Li
  2017-02-06 13:23     ` Vincent Guittot
  0 siblings, 1 reply; 5+ messages in thread
From: Wanpeng Li @ 2017-02-06  8:33 UTC (permalink / raw)
  To: Vincent Guittot; +Cc: linux-kernel, Wanpeng Li, Peter Zijlstra, Ingo Molnar

Hi Vincent,
2017-02-06 16:07 GMT+08:00 Vincent Guittot <vincent.guittot@linaro.org>:
> Hi Wanpeng
>
> On 5 February 2017 at 10:57, Wanpeng Li <kernellwp@gmail.com> wrote:
>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> The commit:
>>   c5afb6a87f2 ("sched/fair: Fix nohz.next_balance update")
>>
>> intends to update nohz.next_balance in two steps.
>>
>> 1) The ILB CPU utilizes next_balance variable in nohz_idle_balance()
>>    to gather the shortest next balance of other idle CPUs before
>>    updating nohz.next_balance.
>> 2) The ILB CPU updates the nohz.next_balance according to its own
>>    next_balance after load balance on behalf of other idle CPUs.
>>
>> However, there is a mess which breaks the original intention of the
>
> Have you got details of the mess that this generates ?
>
>> first step, every idle CPUs update nohz.next_balance during ILB CPU
>> on behalf of them to do load balance, and then the ILB CPU utilizes
>> next_balance variable in nohz_idle_balance() to gather the shortest
>> next balance of other idle CPUs before updating nohz.next_balance.
>>
>> This patch fixes it by don't update nohz.next_balance for other idle
>> CPUs when ILB CPU on behalf of them to do load balance.
>
> But how do you take into account the next balance of other idle CPUs ?

The step 1) which I describe above for your original commit takes it
into account. In addition, please refers to the comments which you
added(rebalance_domains()) in the original commit:

/*
 * If this CPU has been elected to perform the nohz idle
 * balance. Other idle CPUs have already rebalanced with
 * nohz_idle_balance() and nohz.next_balance has been
 * updated accordingly. This CPU is now running the idle load
 * balance for itself and we need to update the
 * nohz.next_balance accordingly.
 */

>
>>
>> Cc: Vincent Guittot <vincent.guittot@linaro.org>
>> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
>> CC: Ingo Molnar <mingo@kernel.org>
>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> ---
>>  kernel/sched/fair.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 274c747..83948a4 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -8750,7 +8750,8 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
>>                  * balance for itself and we need to update the
>>                  * nohz.next_balance accordingly.
>>                  */
>> -               if ((idle == CPU_IDLE) && time_after(nohz.next_balance, rq->next_balance))
>> +               if ((idle == CPU_IDLE) && time_after(nohz.next_balance, rq->next_balance) &&
>> +                       !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_rq()->cpu)))
>>                         nohz.next_balance = rq->next_balance;
>
> With this change only the ILB CPU will update the nohz.next_balance
> but what about the next_balance of other idle CPUs ?
> The nohz.next_balance must be the next_balance of all idle CPU not only the ILB.
> So an idle CPU (other than the ILB) will have to wait for the ILB
> CPU's period evcen if it has shorter load balance period

Ditto.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] sched/fair: Fix the nohz.next_balance update mess
  2017-02-06  8:33   ` Wanpeng Li
@ 2017-02-06 13:23     ` Vincent Guittot
  2017-02-06 22:06       ` Wanpeng Li
  0 siblings, 1 reply; 5+ messages in thread
From: Vincent Guittot @ 2017-02-06 13:23 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: linux-kernel, Wanpeng Li, Peter Zijlstra, Ingo Molnar

On 6 February 2017 at 09:33, Wanpeng Li <kernellwp@gmail.com> wrote:
> Hi Vincent,
> 2017-02-06 16:07 GMT+08:00 Vincent Guittot <vincent.guittot@linaro.org>:
>> Hi Wanpeng
>>
>> On 5 February 2017 at 10:57, Wanpeng Li <kernellwp@gmail.com> wrote:
>>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>>
>>> The commit:
>>>   c5afb6a87f2 ("sched/fair: Fix nohz.next_balance update")
>>>
>>> intends to update nohz.next_balance in two steps.
>>>
>>> 1) The ILB CPU utilizes next_balance variable in nohz_idle_balance()
>>>    to gather the shortest next balance of other idle CPUs before
>>>    updating nohz.next_balance.
>>> 2) The ILB CPU updates the nohz.next_balance according to its own
>>>    next_balance after load balance on behalf of other idle CPUs.
>>>
>>> However, there is a mess which breaks the original intention of the
>>
>> Have you got details of the mess that this generates ?
>>
>>> first step, every idle CPUs update nohz.next_balance during ILB CPU
>>> on behalf of them to do load balance, and then the ILB CPU utilizes
>>> next_balance variable in nohz_idle_balance() to gather the shortest
>>> next balance of other idle CPUs before updating nohz.next_balance.
>>>
>>> This patch fixes it by don't update nohz.next_balance for other idle
>>> CPUs when ILB CPU on behalf of them to do load balance.
>>
>> But how do you take into account the next balance of other idle CPUs ?
>
> The step 1) which I describe above for your original commit takes it
> into account. In addition, please refers to the comments which you
> added(rebalance_domains()) in the original commit:

yes sorry I mixed rebalance_domains and  nohz_idle_balance code

But are you sure that this additional condition will change anything ?

When an ILB is triggered, it means that nohz.next_balance is before jiffies.
Then, for all Idle CPUs (except the ILB CPU), the rq->next_balance
will be for sure after nohz.next_balance once we have finished the
for_each_domain loop of rebalance_domain() so it can't trig
nohz.next_balance = rq->next_balance  and the current condition if
fine.

Then, we set nohz.next_balance to the next balance for the idle CPUs
(except ILB CPU) at the end of nohz_idle_balance,

Then, we run rebalance_domains() for ILB CPU and only now,
rq->next_balance can be before nohz.next_balance

When you said " there is a mess which breaks the original intention of
the first step", have you seen such wrong behavior or have you got a
use case in mind ?

Regards,
Vincent

>
> /*
>  * If this CPU has been elected to perform the nohz idle
>  * balance. Other idle CPUs have already rebalanced with
>  * nohz_idle_balance() and nohz.next_balance has been
>  * updated accordingly. This CPU is now running the idle load
>  * balance for itself and we need to update the
>  * nohz.next_balance accordingly.
>  */
>
>>
>>>
>>> Cc: Vincent Guittot <vincent.guittot@linaro.org>
>>> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
>>> CC: Ingo Molnar <mingo@kernel.org>
>>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>>> ---
>>>  kernel/sched/fair.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 274c747..83948a4 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -8750,7 +8750,8 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
>>>                  * balance for itself and we need to update the
>>>                  * nohz.next_balance accordingly.
>>>                  */
>>> -               if ((idle == CPU_IDLE) && time_after(nohz.next_balance, rq->next_balance))
>>> +               if ((idle == CPU_IDLE) && time_after(nohz.next_balance, rq->next_balance) &&
>>> +                       !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_rq()->cpu)))
>>>                         nohz.next_balance = rq->next_balance;
>>
>> With this change only the ILB CPU will update the nohz.next_balance
>> but what about the next_balance of other idle CPUs ?
>> The nohz.next_balance must be the next_balance of all idle CPU not only the ILB.
>> So an idle CPU (other than the ILB) will have to wait for the ILB
>> CPU's period evcen if it has shorter load balance period
>
> Ditto.
>
> Regards,
> Wanpeng Li

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] sched/fair: Fix the nohz.next_balance update mess
  2017-02-06 13:23     ` Vincent Guittot
@ 2017-02-06 22:06       ` Wanpeng Li
  0 siblings, 0 replies; 5+ messages in thread
From: Wanpeng Li @ 2017-02-06 22:06 UTC (permalink / raw)
  To: Vincent Guittot; +Cc: linux-kernel, Wanpeng Li, Peter Zijlstra, Ingo Molnar

2017-02-06 21:23 GMT+08:00 Vincent Guittot <vincent.guittot@linaro.org>:
> On 6 February 2017 at 09:33, Wanpeng Li <kernellwp@gmail.com> wrote:
>> Hi Vincent,
>> 2017-02-06 16:07 GMT+08:00 Vincent Guittot <vincent.guittot@linaro.org>:
>>> Hi Wanpeng
>>>
>>> On 5 February 2017 at 10:57, Wanpeng Li <kernellwp@gmail.com> wrote:
>>>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>>>
>>>> The commit:
>>>>   c5afb6a87f2 ("sched/fair: Fix nohz.next_balance update")
>>>>
>>>> intends to update nohz.next_balance in two steps.
>>>>
>>>> 1) The ILB CPU utilizes next_balance variable in nohz_idle_balance()
>>>>    to gather the shortest next balance of other idle CPUs before
>>>>    updating nohz.next_balance.
>>>> 2) The ILB CPU updates the nohz.next_balance according to its own
>>>>    next_balance after load balance on behalf of other idle CPUs.
>>>>
>>>> However, there is a mess which breaks the original intention of the
>>>
>>> Have you got details of the mess that this generates ?
>>>
>>>> first step, every idle CPUs update nohz.next_balance during ILB CPU
>>>> on behalf of them to do load balance, and then the ILB CPU utilizes
>>>> next_balance variable in nohz_idle_balance() to gather the shortest
>>>> next balance of other idle CPUs before updating nohz.next_balance.
>>>>
>>>> This patch fixes it by don't update nohz.next_balance for other idle
>>>> CPUs when ILB CPU on behalf of them to do load balance.
>>>
>>> But how do you take into account the next balance of other idle CPUs ?
>>
>> The step 1) which I describe above for your original commit takes it
>> into account. In addition, please refers to the comments which you
>> added(rebalance_domains()) in the original commit:
>
> yes sorry I mixed rebalance_domains and  nohz_idle_balance code
>
> But are you sure that this additional condition will change anything ?
>
> When an ILB is triggered, it means that nohz.next_balance is before jiffies.
> Then, for all Idle CPUs (except the ILB CPU), the rq->next_balance
> will be for sure after nohz.next_balance once we have finished the
> for_each_domain loop of rebalance_domain() so it can't trig
> nohz.next_balance = rq->next_balance  and the current condition if
> fine.

Oh, I miss it. Thanks for your pointing out.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-02-06 22:06 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-05  9:57 [PATCH] sched/fair: Fix the nohz.next_balance update mess Wanpeng Li
2017-02-06  8:07 ` Vincent Guittot
2017-02-06  8:33   ` Wanpeng Li
2017-02-06 13:23     ` Vincent Guittot
2017-02-06 22:06       ` Wanpeng Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).