* [PATCH] sched/fair: Fix the nohz.next_balance update mess @ 2017-02-05 9:57 Wanpeng Li 2017-02-06 8:07 ` Vincent Guittot 0 siblings, 1 reply; 5+ messages in thread From: Wanpeng Li @ 2017-02-05 9:57 UTC (permalink / raw) To: linux-kernel; +Cc: Wanpeng Li, Vincent Guittot, Peter Zijlstra, Ingo Molnar From: Wanpeng Li <wanpeng.li@hotmail.com> The commit: c5afb6a87f2 ("sched/fair: Fix nohz.next_balance update") intends to update nohz.next_balance in two steps. 1) The ILB CPU utilizes next_balance variable in nohz_idle_balance() to gather the shortest next balance of other idle CPUs before updating nohz.next_balance. 2) The ILB CPU updates the nohz.next_balance according to its own next_balance after load balance on behalf of other idle CPUs. However, there is a mess which breaks the original intention of the first step, every idle CPUs update nohz.next_balance during ILB CPU on behalf of them to do load balance, and then the ILB CPU utilizes next_balance variable in nohz_idle_balance() to gather the shortest next balance of other idle CPUs before updating nohz.next_balance. This patch fixes it by don't update nohz.next_balance for other idle CPUs when ILB CPU on behalf of them to do load balance. Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> CC: Ingo Molnar <mingo@kernel.org> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> --- kernel/sched/fair.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 274c747..83948a4 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8750,7 +8750,8 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle) * balance for itself and we need to update the * nohz.next_balance accordingly. */ - if ((idle == CPU_IDLE) && time_after(nohz.next_balance, rq->next_balance)) + if ((idle == CPU_IDLE) && time_after(nohz.next_balance, rq->next_balance) && + !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_rq()->cpu))) nohz.next_balance = rq->next_balance; #endif } -- 2.7.4 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] sched/fair: Fix the nohz.next_balance update mess 2017-02-05 9:57 [PATCH] sched/fair: Fix the nohz.next_balance update mess Wanpeng Li @ 2017-02-06 8:07 ` Vincent Guittot 2017-02-06 8:33 ` Wanpeng Li 0 siblings, 1 reply; 5+ messages in thread From: Vincent Guittot @ 2017-02-06 8:07 UTC (permalink / raw) To: Wanpeng Li; +Cc: linux-kernel, Wanpeng Li, Peter Zijlstra, Ingo Molnar Hi Wanpeng On 5 February 2017 at 10:57, Wanpeng Li <kernellwp@gmail.com> wrote: > From: Wanpeng Li <wanpeng.li@hotmail.com> > > The commit: > c5afb6a87f2 ("sched/fair: Fix nohz.next_balance update") > > intends to update nohz.next_balance in two steps. > > 1) The ILB CPU utilizes next_balance variable in nohz_idle_balance() > to gather the shortest next balance of other idle CPUs before > updating nohz.next_balance. > 2) The ILB CPU updates the nohz.next_balance according to its own > next_balance after load balance on behalf of other idle CPUs. > > However, there is a mess which breaks the original intention of the Have you got details of the mess that this generates ? > first step, every idle CPUs update nohz.next_balance during ILB CPU > on behalf of them to do load balance, and then the ILB CPU utilizes > next_balance variable in nohz_idle_balance() to gather the shortest > next balance of other idle CPUs before updating nohz.next_balance. > > This patch fixes it by don't update nohz.next_balance for other idle > CPUs when ILB CPU on behalf of them to do load balance. But how do you take into account the next balance of other idle CPUs ? > > Cc: Vincent Guittot <vincent.guittot@linaro.org> > Cc: Peter Zijlstra (Intel) <peterz@infradead.org> > CC: Ingo Molnar <mingo@kernel.org> > Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> > --- > kernel/sched/fair.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 274c747..83948a4 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -8750,7 +8750,8 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle) > * balance for itself and we need to update the > * nohz.next_balance accordingly. > */ > - if ((idle == CPU_IDLE) && time_after(nohz.next_balance, rq->next_balance)) > + if ((idle == CPU_IDLE) && time_after(nohz.next_balance, rq->next_balance) && > + !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_rq()->cpu))) > nohz.next_balance = rq->next_balance; With this change only the ILB CPU will update the nohz.next_balance but what about the next_balance of other idle CPUs ? The nohz.next_balance must be the next_balance of all idle CPU not only the ILB. So an idle CPU (other than the ILB) will have to wait for the ILB CPU's period evcen if it has shorter load balance period Vincent > #endif > } > -- > 2.7.4 > ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] sched/fair: Fix the nohz.next_balance update mess 2017-02-06 8:07 ` Vincent Guittot @ 2017-02-06 8:33 ` Wanpeng Li 2017-02-06 13:23 ` Vincent Guittot 0 siblings, 1 reply; 5+ messages in thread From: Wanpeng Li @ 2017-02-06 8:33 UTC (permalink / raw) To: Vincent Guittot; +Cc: linux-kernel, Wanpeng Li, Peter Zijlstra, Ingo Molnar Hi Vincent, 2017-02-06 16:07 GMT+08:00 Vincent Guittot <vincent.guittot@linaro.org>: > Hi Wanpeng > > On 5 February 2017 at 10:57, Wanpeng Li <kernellwp@gmail.com> wrote: >> From: Wanpeng Li <wanpeng.li@hotmail.com> >> >> The commit: >> c5afb6a87f2 ("sched/fair: Fix nohz.next_balance update") >> >> intends to update nohz.next_balance in two steps. >> >> 1) The ILB CPU utilizes next_balance variable in nohz_idle_balance() >> to gather the shortest next balance of other idle CPUs before >> updating nohz.next_balance. >> 2) The ILB CPU updates the nohz.next_balance according to its own >> next_balance after load balance on behalf of other idle CPUs. >> >> However, there is a mess which breaks the original intention of the > > Have you got details of the mess that this generates ? > >> first step, every idle CPUs update nohz.next_balance during ILB CPU >> on behalf of them to do load balance, and then the ILB CPU utilizes >> next_balance variable in nohz_idle_balance() to gather the shortest >> next balance of other idle CPUs before updating nohz.next_balance. >> >> This patch fixes it by don't update nohz.next_balance for other idle >> CPUs when ILB CPU on behalf of them to do load balance. > > But how do you take into account the next balance of other idle CPUs ? The step 1) which I describe above for your original commit takes it into account. In addition, please refers to the comments which you added(rebalance_domains()) in the original commit: /* * If this CPU has been elected to perform the nohz idle * balance. Other idle CPUs have already rebalanced with * nohz_idle_balance() and nohz.next_balance has been * updated accordingly. This CPU is now running the idle load * balance for itself and we need to update the * nohz.next_balance accordingly. */ > >> >> Cc: Vincent Guittot <vincent.guittot@linaro.org> >> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> >> CC: Ingo Molnar <mingo@kernel.org> >> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> >> --- >> kernel/sched/fair.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 274c747..83948a4 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -8750,7 +8750,8 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle) >> * balance for itself and we need to update the >> * nohz.next_balance accordingly. >> */ >> - if ((idle == CPU_IDLE) && time_after(nohz.next_balance, rq->next_balance)) >> + if ((idle == CPU_IDLE) && time_after(nohz.next_balance, rq->next_balance) && >> + !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_rq()->cpu))) >> nohz.next_balance = rq->next_balance; > > With this change only the ILB CPU will update the nohz.next_balance > but what about the next_balance of other idle CPUs ? > The nohz.next_balance must be the next_balance of all idle CPU not only the ILB. > So an idle CPU (other than the ILB) will have to wait for the ILB > CPU's period evcen if it has shorter load balance period Ditto. Regards, Wanpeng Li ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] sched/fair: Fix the nohz.next_balance update mess 2017-02-06 8:33 ` Wanpeng Li @ 2017-02-06 13:23 ` Vincent Guittot 2017-02-06 22:06 ` Wanpeng Li 0 siblings, 1 reply; 5+ messages in thread From: Vincent Guittot @ 2017-02-06 13:23 UTC (permalink / raw) To: Wanpeng Li; +Cc: linux-kernel, Wanpeng Li, Peter Zijlstra, Ingo Molnar On 6 February 2017 at 09:33, Wanpeng Li <kernellwp@gmail.com> wrote: > Hi Vincent, > 2017-02-06 16:07 GMT+08:00 Vincent Guittot <vincent.guittot@linaro.org>: >> Hi Wanpeng >> >> On 5 February 2017 at 10:57, Wanpeng Li <kernellwp@gmail.com> wrote: >>> From: Wanpeng Li <wanpeng.li@hotmail.com> >>> >>> The commit: >>> c5afb6a87f2 ("sched/fair: Fix nohz.next_balance update") >>> >>> intends to update nohz.next_balance in two steps. >>> >>> 1) The ILB CPU utilizes next_balance variable in nohz_idle_balance() >>> to gather the shortest next balance of other idle CPUs before >>> updating nohz.next_balance. >>> 2) The ILB CPU updates the nohz.next_balance according to its own >>> next_balance after load balance on behalf of other idle CPUs. >>> >>> However, there is a mess which breaks the original intention of the >> >> Have you got details of the mess that this generates ? >> >>> first step, every idle CPUs update nohz.next_balance during ILB CPU >>> on behalf of them to do load balance, and then the ILB CPU utilizes >>> next_balance variable in nohz_idle_balance() to gather the shortest >>> next balance of other idle CPUs before updating nohz.next_balance. >>> >>> This patch fixes it by don't update nohz.next_balance for other idle >>> CPUs when ILB CPU on behalf of them to do load balance. >> >> But how do you take into account the next balance of other idle CPUs ? > > The step 1) which I describe above for your original commit takes it > into account. In addition, please refers to the comments which you > added(rebalance_domains()) in the original commit: yes sorry I mixed rebalance_domains and nohz_idle_balance code But are you sure that this additional condition will change anything ? When an ILB is triggered, it means that nohz.next_balance is before jiffies. Then, for all Idle CPUs (except the ILB CPU), the rq->next_balance will be for sure after nohz.next_balance once we have finished the for_each_domain loop of rebalance_domain() so it can't trig nohz.next_balance = rq->next_balance and the current condition if fine. Then, we set nohz.next_balance to the next balance for the idle CPUs (except ILB CPU) at the end of nohz_idle_balance, Then, we run rebalance_domains() for ILB CPU and only now, rq->next_balance can be before nohz.next_balance When you said " there is a mess which breaks the original intention of the first step", have you seen such wrong behavior or have you got a use case in mind ? Regards, Vincent > > /* > * If this CPU has been elected to perform the nohz idle > * balance. Other idle CPUs have already rebalanced with > * nohz_idle_balance() and nohz.next_balance has been > * updated accordingly. This CPU is now running the idle load > * balance for itself and we need to update the > * nohz.next_balance accordingly. > */ > >> >>> >>> Cc: Vincent Guittot <vincent.guittot@linaro.org> >>> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> >>> CC: Ingo Molnar <mingo@kernel.org> >>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> >>> --- >>> kernel/sched/fair.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >>> index 274c747..83948a4 100644 >>> --- a/kernel/sched/fair.c >>> +++ b/kernel/sched/fair.c >>> @@ -8750,7 +8750,8 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle) >>> * balance for itself and we need to update the >>> * nohz.next_balance accordingly. >>> */ >>> - if ((idle == CPU_IDLE) && time_after(nohz.next_balance, rq->next_balance)) >>> + if ((idle == CPU_IDLE) && time_after(nohz.next_balance, rq->next_balance) && >>> + !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_rq()->cpu))) >>> nohz.next_balance = rq->next_balance; >> >> With this change only the ILB CPU will update the nohz.next_balance >> but what about the next_balance of other idle CPUs ? >> The nohz.next_balance must be the next_balance of all idle CPU not only the ILB. >> So an idle CPU (other than the ILB) will have to wait for the ILB >> CPU's period evcen if it has shorter load balance period > > Ditto. > > Regards, > Wanpeng Li ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] sched/fair: Fix the nohz.next_balance update mess 2017-02-06 13:23 ` Vincent Guittot @ 2017-02-06 22:06 ` Wanpeng Li 0 siblings, 0 replies; 5+ messages in thread From: Wanpeng Li @ 2017-02-06 22:06 UTC (permalink / raw) To: Vincent Guittot; +Cc: linux-kernel, Wanpeng Li, Peter Zijlstra, Ingo Molnar 2017-02-06 21:23 GMT+08:00 Vincent Guittot <vincent.guittot@linaro.org>: > On 6 February 2017 at 09:33, Wanpeng Li <kernellwp@gmail.com> wrote: >> Hi Vincent, >> 2017-02-06 16:07 GMT+08:00 Vincent Guittot <vincent.guittot@linaro.org>: >>> Hi Wanpeng >>> >>> On 5 February 2017 at 10:57, Wanpeng Li <kernellwp@gmail.com> wrote: >>>> From: Wanpeng Li <wanpeng.li@hotmail.com> >>>> >>>> The commit: >>>> c5afb6a87f2 ("sched/fair: Fix nohz.next_balance update") >>>> >>>> intends to update nohz.next_balance in two steps. >>>> >>>> 1) The ILB CPU utilizes next_balance variable in nohz_idle_balance() >>>> to gather the shortest next balance of other idle CPUs before >>>> updating nohz.next_balance. >>>> 2) The ILB CPU updates the nohz.next_balance according to its own >>>> next_balance after load balance on behalf of other idle CPUs. >>>> >>>> However, there is a mess which breaks the original intention of the >>> >>> Have you got details of the mess that this generates ? >>> >>>> first step, every idle CPUs update nohz.next_balance during ILB CPU >>>> on behalf of them to do load balance, and then the ILB CPU utilizes >>>> next_balance variable in nohz_idle_balance() to gather the shortest >>>> next balance of other idle CPUs before updating nohz.next_balance. >>>> >>>> This patch fixes it by don't update nohz.next_balance for other idle >>>> CPUs when ILB CPU on behalf of them to do load balance. >>> >>> But how do you take into account the next balance of other idle CPUs ? >> >> The step 1) which I describe above for your original commit takes it >> into account. In addition, please refers to the comments which you >> added(rebalance_domains()) in the original commit: > > yes sorry I mixed rebalance_domains and nohz_idle_balance code > > But are you sure that this additional condition will change anything ? > > When an ILB is triggered, it means that nohz.next_balance is before jiffies. > Then, for all Idle CPUs (except the ILB CPU), the rq->next_balance > will be for sure after nohz.next_balance once we have finished the > for_each_domain loop of rebalance_domain() so it can't trig > nohz.next_balance = rq->next_balance and the current condition if > fine. Oh, I miss it. Thanks for your pointing out. Regards, Wanpeng Li ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2017-02-06 22:06 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-02-05 9:57 [PATCH] sched/fair: Fix the nohz.next_balance update mess Wanpeng Li 2017-02-06 8:07 ` Vincent Guittot 2017-02-06 8:33 ` Wanpeng Li 2017-02-06 13:23 ` Vincent Guittot 2017-02-06 22:06 ` Wanpeng Li
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).