* [PATCH 0/2] avoid spurious blocked load update @ 2021-11-12 9:58 Vincent Guittot 2021-11-12 9:58 ` [PATCH 1/2] sched/fair: skip newidle update stats Vincent Guittot 2021-11-16 23:48 ` [PATCH 0/2] avoid spurious blocked load update Tim Chen 0 siblings, 2 replies; 11+ messages in thread From: Vincent Guittot @ 2021-11-12 9:58 UTC (permalink / raw) To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, linux-kernel, tim.c.chen Cc: joel, Vincent Guittot This patchset is a follow up of : https://lore.kernel.org/lkml/20211019123537.17146-1-vincent.guittot@linaro.org/ It ensures that newly idle load balance will not kick the update of blocked load if it skips the load balance because avg_idle is too short. It also makes sure that rq->next_balance doesn't go in the past when updated. Tim Chen (1): sched: sched: Fix rq->next_balance time updated to earlier than current time Vincent Guittot (1): sched/fair: skip newidle update stats kernel/sched/fair.c | 25 ++++++++++++++++++++----- 1 file changed, 20 insertions(+), 5 deletions(-) -- 2.17.1 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/2] sched/fair: skip newidle update stats 2021-11-12 9:58 [PATCH 0/2] avoid spurious blocked load update Vincent Guittot @ 2021-11-12 9:58 ` Vincent Guittot 2021-11-12 14:29 ` Peter Zijlstra 2021-11-12 14:55 ` Peter Zijlstra 2021-11-16 23:48 ` [PATCH 0/2] avoid spurious blocked load update Tim Chen 1 sibling, 2 replies; 11+ messages in thread From: Vincent Guittot @ 2021-11-12 9:58 UTC (permalink / raw) To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, linux-kernel, tim.c.chen Cc: joel, Vincent Guittot In case we skip the newly idle LB entirely or we abort it because we are going to exceed the avg_idle, we have to make sure to not start an update of the blocked load when entering idle Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> --- kernel/sched/fair.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 13950beb01a2..a162b0ec8963 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -10861,7 +10861,7 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf) int this_cpu = this_rq->cpu; u64 t0, t1, curr_cost = 0; struct sched_domain *sd; - int pulled_task = 0; + int pulled_task = 0, early_stop = 0; update_misfit_status(NULL, this_rq); @@ -10898,8 +10898,16 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf) if (!READ_ONCE(this_rq->rd->overload) || (sd && this_rq->avg_idle < sd->max_newidle_lb_cost)) { - if (sd) + if (sd) { update_next_balance(sd, &next_balance); + + /* + * We skip new idle LB because there is not enough + * time before next wake up. Make sure that we will + * not kick NOHZ_NEWILB_KICK + */ + early_stop = 1; + } rcu_read_unlock(); goto out; @@ -10918,8 +10926,10 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf) update_next_balance(sd, &next_balance); - if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) + if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) { + early_stop = 1; break; + } if (sd->flags & SD_BALANCE_NEWIDLE) { @@ -10969,7 +10979,7 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf) if (pulled_task) this_rq->idle_stamp = 0; - else + else if (!early_stop) nohz_newidle_balance(this_rq); rq_repin_lock(this_rq, rf); -- 2.17.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] sched/fair: skip newidle update stats 2021-11-12 9:58 ` [PATCH 1/2] sched/fair: skip newidle update stats Vincent Guittot @ 2021-11-12 14:29 ` Peter Zijlstra 2021-11-12 14:47 ` Vincent Guittot 2021-11-12 14:55 ` Peter Zijlstra 1 sibling, 1 reply; 11+ messages in thread From: Peter Zijlstra @ 2021-11-12 14:29 UTC (permalink / raw) To: Vincent Guittot Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, linux-kernel, tim.c.chen, joel On Fri, Nov 12, 2021 at 10:58:56AM +0100, Vincent Guittot wrote: > In case we skip the newly idle LB entirely or we abort it because we are > going to exceed the avg_idle, we have to make sure to not start an update > of the blocked load when entering idle > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> > --- > kernel/sched/fair.c | 18 ++++++++++++++---- > 1 file changed, 14 insertions(+), 4 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 13950beb01a2..a162b0ec8963 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -10861,7 +10861,7 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf) > int this_cpu = this_rq->cpu; > u64 t0, t1, curr_cost = 0; > struct sched_domain *sd; > - int pulled_task = 0; > + int pulled_task = 0, early_stop = 0; > > update_misfit_status(NULL, this_rq); > > @@ -10898,8 +10898,16 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf) > if (!READ_ONCE(this_rq->rd->overload) || > (sd && this_rq->avg_idle < sd->max_newidle_lb_cost)) { > > - if (sd) > + if (sd) { > update_next_balance(sd, &next_balance); > + > + /* > + * We skip new idle LB because there is not enough > + * time before next wake up. Make sure that we will > + * not kick NOHZ_NEWILB_KICK > + */ > + early_stop = 1; > + } > rcu_read_unlock(); > > goto out; > @@ -10918,8 +10926,10 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf) > > update_next_balance(sd, &next_balance); > > - if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) > + if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) { > + early_stop = 1; > break; > + } > > if (sd->flags & SD_BALANCE_NEWIDLE) { > > @@ -10969,7 +10979,7 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf) > > if (pulled_task) > this_rq->idle_stamp = 0; > - else > + else if (!early_stop) > nohz_newidle_balance(this_rq); > > rq_repin_lock(this_rq, rf); Urgh code flow is a mess... Let me see if I can fix some of that. Anyway, does nohz_newidle_balance() want to loose it's ->avg_idle test with this on? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] sched/fair: skip newidle update stats 2021-11-12 14:29 ` Peter Zijlstra @ 2021-11-12 14:47 ` Vincent Guittot 2021-11-12 15:29 ` Peter Zijlstra 0 siblings, 1 reply; 11+ messages in thread From: Vincent Guittot @ 2021-11-12 14:47 UTC (permalink / raw) To: Peter Zijlstra Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, linux-kernel, tim.c.chen, joel On Fri, 12 Nov 2021 at 15:29, Peter Zijlstra <peterz@infradead.org> wrote: > > On Fri, Nov 12, 2021 at 10:58:56AM +0100, Vincent Guittot wrote: > > In case we skip the newly idle LB entirely or we abort it because we are > > going to exceed the avg_idle, we have to make sure to not start an update > > of the blocked load when entering idle > > > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> > > --- > > kernel/sched/fair.c | 18 ++++++++++++++---- > > 1 file changed, 14 insertions(+), 4 deletions(-) > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > index 13950beb01a2..a162b0ec8963 100644 > > --- a/kernel/sched/fair.c > > +++ b/kernel/sched/fair.c > > @@ -10861,7 +10861,7 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf) > > int this_cpu = this_rq->cpu; > > u64 t0, t1, curr_cost = 0; > > struct sched_domain *sd; > > - int pulled_task = 0; > > + int pulled_task = 0, early_stop = 0; > > > > update_misfit_status(NULL, this_rq); > > > > @@ -10898,8 +10898,16 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf) > > if (!READ_ONCE(this_rq->rd->overload) || > > (sd && this_rq->avg_idle < sd->max_newidle_lb_cost)) { > > > > - if (sd) > > + if (sd) { > > update_next_balance(sd, &next_balance); > > + > > + /* > > + * We skip new idle LB because there is not enough > > + * time before next wake up. Make sure that we will > > + * not kick NOHZ_NEWILB_KICK > > + */ > > + early_stop = 1; > > + } > > rcu_read_unlock(); > > > > goto out; > > @@ -10918,8 +10926,10 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf) > > > > update_next_balance(sd, &next_balance); > > > > - if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) > > + if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) { > > + early_stop = 1; > > break; > > + } > > > > if (sd->flags & SD_BALANCE_NEWIDLE) { > > > > @@ -10969,7 +10979,7 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf) > > > > if (pulled_task) > > this_rq->idle_stamp = 0; > > - else > > + else if (!early_stop) > > nohz_newidle_balance(this_rq); > > > > rq_repin_lock(this_rq, rf); > > Urgh code flow is a mess... Let me see if I can fix some of that. yeah, I haven't find a better way > > Anyway, does nohz_newidle_balance() want to loose it's ->avg_idle test > with this on? This test still covers cases with short newly idle balance. Being conservative, people never complained that the update of blocked load average of idle CPUs doesn't happen often enough. It's most often the opposite ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] sched/fair: skip newidle update stats 2021-11-12 14:47 ` Vincent Guittot @ 2021-11-12 15:29 ` Peter Zijlstra 2021-11-12 16:00 ` Vincent Guittot 0 siblings, 1 reply; 11+ messages in thread From: Peter Zijlstra @ 2021-11-12 15:29 UTC (permalink / raw) To: Vincent Guittot Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, linux-kernel, tim.c.chen, joel On Fri, Nov 12, 2021 at 03:47:21PM +0100, Vincent Guittot wrote: > On Fri, 12 Nov 2021 at 15:29, Peter Zijlstra <peterz@infradead.org> wrote: > > > > On Fri, Nov 12, 2021 at 10:58:56AM +0100, Vincent Guittot wrote: > > > In case we skip the newly idle LB entirely or we abort it because we are > > > going to exceed the avg_idle, we have to make sure to not start an update > > > of the blocked load when entering idle > > > > > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> > > > --- > > > kernel/sched/fair.c | 18 ++++++++++++++---- > > > 1 file changed, 14 insertions(+), 4 deletions(-) > > > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > > index 13950beb01a2..a162b0ec8963 100644 > > > --- a/kernel/sched/fair.c > > > +++ b/kernel/sched/fair.c > > > @@ -10861,7 +10861,7 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf) > > > int this_cpu = this_rq->cpu; > > > u64 t0, t1, curr_cost = 0; > > > struct sched_domain *sd; > > > - int pulled_task = 0; > > > + int pulled_task = 0, early_stop = 0; > > > > > > update_misfit_status(NULL, this_rq); > > > > > > @@ -10898,8 +10898,16 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf) > > > if (!READ_ONCE(this_rq->rd->overload) || > > > (sd && this_rq->avg_idle < sd->max_newidle_lb_cost)) { > > > > > > - if (sd) > > > + if (sd) { > > > update_next_balance(sd, &next_balance); > > > + > > > + /* > > > + * We skip new idle LB because there is not enough > > > + * time before next wake up. Make sure that we will > > > + * not kick NOHZ_NEWILB_KICK > > > + */ > > > + early_stop = 1; > > > + } > > > rcu_read_unlock(); > > > > > > goto out; > > Anyway, does nohz_newidle_balance() want to loose it's ->avg_idle test > > with this on? > > This test still covers cases with short newly idle balance. Being > conservative, people never complained that the update of blocked load > average of idle CPUs doesn't happen often enough. It's most often the > opposite Well, per commit c5b0a7eefc70 ("sched/fair: Remove sysctl_sched_migration_cost condition") combined with the above change, we no longer call nohz_newidle_balance() in exactly that condition, right? Or are we worried about that !overload case? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] sched/fair: skip newidle update stats 2021-11-12 15:29 ` Peter Zijlstra @ 2021-11-12 16:00 ` Vincent Guittot 0 siblings, 0 replies; 11+ messages in thread From: Vincent Guittot @ 2021-11-12 16:00 UTC (permalink / raw) To: Peter Zijlstra Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, linux-kernel, tim.c.chen, joel On Fri, 12 Nov 2021 at 16:29, Peter Zijlstra <peterz@infradead.org> wrote: > > On Fri, Nov 12, 2021 at 03:47:21PM +0100, Vincent Guittot wrote: > > On Fri, 12 Nov 2021 at 15:29, Peter Zijlstra <peterz@infradead.org> wrote: > > > > > > On Fri, Nov 12, 2021 at 10:58:56AM +0100, Vincent Guittot wrote: > > > > In case we skip the newly idle LB entirely or we abort it because we are > > > > going to exceed the avg_idle, we have to make sure to not start an update > > > > of the blocked load when entering idle > > > > > > > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> > > > > --- > > > > kernel/sched/fair.c | 18 ++++++++++++++---- > > > > 1 file changed, 14 insertions(+), 4 deletions(-) > > > > > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > > > index 13950beb01a2..a162b0ec8963 100644 > > > > --- a/kernel/sched/fair.c > > > > +++ b/kernel/sched/fair.c > > > > @@ -10861,7 +10861,7 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf) > > > > int this_cpu = this_rq->cpu; > > > > u64 t0, t1, curr_cost = 0; > > > > struct sched_domain *sd; > > > > - int pulled_task = 0; > > > > + int pulled_task = 0, early_stop = 0; > > > > > > > > update_misfit_status(NULL, this_rq); > > > > > > > > @@ -10898,8 +10898,16 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf) > > > > if (!READ_ONCE(this_rq->rd->overload) || > > > > (sd && this_rq->avg_idle < sd->max_newidle_lb_cost)) { > > > > > > > > - if (sd) > > > > + if (sd) { > > > > update_next_balance(sd, &next_balance); > > > > + > > > > + /* > > > > + * We skip new idle LB because there is not enough > > > > + * time before next wake up. Make sure that we will > > > > + * not kick NOHZ_NEWILB_KICK > > > > + */ > > > > + early_stop = 1; > > > > + } > > > > rcu_read_unlock(); > > > > > > > > goto out; > > > > Anyway, does nohz_newidle_balance() want to loose it's ->avg_idle test > > > with this on? > > > > This test still covers cases with short newly idle balance. Being > > conservative, people never complained that the update of blocked load > > average of idle CPUs doesn't happen often enough. It's most often the > > opposite > > Well, per commit c5b0a7eefc70 ("sched/fair: Remove > sysctl_sched_migration_cost condition") combined with the above change, > we no longer call nohz_newidle_balance() in exactly that condition, > right? > > Or are we worried about that !overload case? we can do a complete newly idle LB but have this_rq->avg_idle < sysctl_sched_migration_cost. In this case, the condition will continue to skip update of other idle CPUs ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] sched/fair: skip newidle update stats 2021-11-12 9:58 ` [PATCH 1/2] sched/fair: skip newidle update stats Vincent Guittot 2021-11-12 14:29 ` Peter Zijlstra @ 2021-11-12 14:55 ` Peter Zijlstra 2021-11-12 14:55 ` Peter Zijlstra 1 sibling, 1 reply; 11+ messages in thread From: Peter Zijlstra @ 2021-11-12 14:55 UTC (permalink / raw) To: Vincent Guittot Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, linux-kernel, tim.c.chen, joel Subject: sched/fair: Simplify newidle_balance() From: Peter Zijlstra <peterz@infradead.org> Date: Fri Nov 12 15:46:26 CET 2021 Move rq_{un,re}pin_lock() next to raw_spin_rq_{un,}lock(). Remove all rcu_read_{,un}lock(), since we have preempt/irqs disabled over the whole function and those hold off RCU (again). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- kernel/sched/fair.c | 23 ++++++++--------------- 1 file changed, 8 insertions(+), 15 deletions(-) --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -10884,15 +10884,6 @@ static int newidle_balance(struct rq *th if (!cpu_active(this_cpu)) return 0; - /* - * This is OK, because current is on_cpu, which avoids it being picked - * for load-balance and preemption/IRQs are still disabled avoiding - * further scheduler activity on it and we're being very careful to - * re-start the picking loop. - */ - rq_unpin_lock(this_rq, rf); - - rcu_read_lock(); sd = rcu_dereference_check_sched_domain(this_rq->sd); if (!READ_ONCE(this_rq->rd->overload) || @@ -10908,18 +10899,22 @@ static int newidle_balance(struct rq *th */ early_stop = 1; } - rcu_read_unlock(); goto out; } - rcu_read_unlock(); + /* + * This is OK, because current is on_cpu, which avoids it being picked + * for load-balance and preemption/IRQs are still disabled avoiding + * further scheduler activity on it and we're being very careful to + * re-start the picking loop. + */ + rq_unpin_lock(this_rq, rf); raw_spin_rq_unlock(this_rq); t0 = sched_clock_cpu(this_cpu); update_blocked_averages(this_cpu); - rcu_read_lock(); for_each_domain(this_cpu, sd) { int continue_balancing = 1; u64 domain_cost; @@ -10953,9 +10948,9 @@ static int newidle_balance(struct rq *th this_rq->ttwu_pending) break; } - rcu_read_unlock(); raw_spin_rq_lock(this_rq); + rq_repin_lock(this_rq, rf); if (curr_cost > this_rq->max_idle_balance_cost) this_rq->max_idle_balance_cost = curr_cost; @@ -10982,8 +10977,6 @@ static int newidle_balance(struct rq *th else if (!early_stop) nohz_newidle_balance(this_rq); - rq_repin_lock(this_rq, rf); - return pulled_task; } ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] sched/fair: skip newidle update stats 2021-11-12 14:55 ` Peter Zijlstra @ 2021-11-12 14:55 ` Peter Zijlstra 2021-11-12 16:05 ` Vincent Guittot 0 siblings, 1 reply; 11+ messages in thread From: Peter Zijlstra @ 2021-11-12 14:55 UTC (permalink / raw) To: Vincent Guittot Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, linux-kernel, tim.c.chen, joel Subject: sched/fair: Reflow newidle_balance() From: Peter Zijlstra <peterz@infradead.org> Date: Fri Nov 12 15:46:08 CET 2021 The control flow in newidle_balance() is a little convoluted, attempt simplification. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- kernel/sched/fair.c | 21 ++++++++------------- 1 file changed, 8 insertions(+), 13 deletions(-) --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -10858,10 +10858,10 @@ static inline void nohz_newidle_balance( static int newidle_balance(struct rq *this_rq, struct rq_flags *rf) { unsigned long next_balance = jiffies + HZ; + int pulled_task = 0, timeout = 0; int this_cpu = this_rq->cpu; u64 t0, t1, curr_cost = 0; struct sched_domain *sd; - int pulled_task = 0, early_stop = 0; update_misfit_status(NULL, this_rq); @@ -10889,17 +10889,9 @@ static int newidle_balance(struct rq *th if (!READ_ONCE(this_rq->rd->overload) || (sd && this_rq->avg_idle < sd->max_newidle_lb_cost)) { - if (sd) { + if (sd) update_next_balance(sd, &next_balance); - /* - * We skip new idle LB because there is not enough - * time before next wake up. Make sure that we will - * not kick NOHZ_NEWILB_KICK - */ - early_stop = 1; - } - goto out; } @@ -10922,7 +10914,7 @@ static int newidle_balance(struct rq *th update_next_balance(sd, &next_balance); if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) { - early_stop = 1; + timeout = 1; break; } @@ -10967,6 +10959,11 @@ static int newidle_balance(struct rq *th if (this_rq->nr_running != this_rq->cfs.h_nr_running) pulled_task = -1; + if (pulled_task || timeout) + goto out; + + nohz_newidle_balance(this_rq); + out: /* Move the next balance forward */ if (time_after(this_rq->next_balance, next_balance)) @@ -10974,8 +10971,6 @@ static int newidle_balance(struct rq *th if (pulled_task) this_rq->idle_stamp = 0; - else if (!early_stop) - nohz_newidle_balance(this_rq); return pulled_task; } ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] sched/fair: skip newidle update stats 2021-11-12 14:55 ` Peter Zijlstra @ 2021-11-12 16:05 ` Vincent Guittot 0 siblings, 0 replies; 11+ messages in thread From: Vincent Guittot @ 2021-11-12 16:05 UTC (permalink / raw) To: Peter Zijlstra Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, linux-kernel, tim.c.chen, joel On Fri, 12 Nov 2021 at 15:55, Peter Zijlstra <peterz@infradead.org> wrote: > > > Subject: sched/fair: Reflow newidle_balance() > From: Peter Zijlstra <peterz@infradead.org> > Date: Fri Nov 12 15:46:08 CET 2021 > > The control flow in newidle_balance() is a little convoluted, attempt > simplification. > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > --- > kernel/sched/fair.c | 21 ++++++++------------- > 1 file changed, 8 insertions(+), 13 deletions(-) > > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -10858,10 +10858,10 @@ static inline void nohz_newidle_balance( > static int newidle_balance(struct rq *this_rq, struct rq_flags *rf) > { > unsigned long next_balance = jiffies + HZ; > + int pulled_task = 0, timeout = 0; > int this_cpu = this_rq->cpu; > u64 t0, t1, curr_cost = 0; > struct sched_domain *sd; > - int pulled_task = 0, early_stop = 0; > > update_misfit_status(NULL, this_rq); > > @@ -10889,17 +10889,9 @@ static int newidle_balance(struct rq *th > if (!READ_ONCE(this_rq->rd->overload) || > (sd && this_rq->avg_idle < sd->max_newidle_lb_cost)) { > > - if (sd) { > + if (sd) > update_next_balance(sd, &next_balance); > > - /* > - * We skip new idle LB because there is not enough > - * time before next wake up. Make sure that we will > - * not kick NOHZ_NEWILB_KICK > - */ > - early_stop = 1; > - } > - > goto out; > } > > @@ -10922,7 +10914,7 @@ static int newidle_balance(struct rq *th > update_next_balance(sd, &next_balance); > > if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) { > - early_stop = 1; > + timeout = 1; > break; > } > > @@ -10967,6 +10959,11 @@ static int newidle_balance(struct rq *th > if (this_rq->nr_running != this_rq->cfs.h_nr_running) > pulled_task = -1; > > + if (pulled_task || timeout) > + goto out; > + > + nohz_newidle_balance(this_rq); maybe if (!pulled_task && !timeout) nohz_newidle_balance(this_rq); > + > out: > /* Move the next balance forward */ > if (time_after(this_rq->next_balance, next_balance)) > @@ -10974,8 +10971,6 @@ static int newidle_balance(struct rq *th > > if (pulled_task) > this_rq->idle_stamp = 0; > - else if (!early_stop) > - nohz_newidle_balance(this_rq); > > return pulled_task; > } ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/2] avoid spurious blocked load update 2021-11-12 9:58 [PATCH 0/2] avoid spurious blocked load update Vincent Guittot 2021-11-12 9:58 ` [PATCH 1/2] sched/fair: skip newidle update stats Vincent Guittot @ 2021-11-16 23:48 ` Tim Chen 2021-11-18 14:40 ` Vincent Guittot 1 sibling, 1 reply; 11+ messages in thread From: Tim Chen @ 2021-11-16 23:48 UTC (permalink / raw) To: Vincent Guittot, mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, linux-kernel Cc: joel On Fri, 2021-11-12 at 10:58 +0100, Vincent Guittot wrote: > This patchset is a follow up of : > https://lore.kernel.org/lkml/20211019123537.17146-1-vincent.guittot@linaro.org/ > > It ensures that newly idle load balance will not kick the update of > blocked load if it skips the load balance because avg_idle is too > short. > It also makes sure that rq->next_balance doesn't go in the past when > updated. > > Tim Chen (1): > sched: sched: Fix rq->next_balance time updated to earlier than > current time > > Vincent Guittot (1): > sched/fair: skip newidle update stats > > kernel/sched/fair.c | 25 ++++++++++++++++++++----- > 1 file changed, 20 insertions(+), 5 deletions(-) > Vincent, Got some data back from the benchmark team. To my surprise, the skip_new_idle_update_stats patch actually makes things a little worse. Relative Performance (higher better) 5.15 rc4 vanilla (cgroup disabled) 100% 5.15 rc4 vanilla (cgroup enabled) 96% patch v2 96% patch v3 96% patch v3 +skip_new_idle_update_stats 93.7% patch v3 +skip_new_idle_update_stats +Fix rq->next_balance_time 93.7% The cpu utilization actually is the similar compared with having just the v3 patch. In both cases they are 81% user 12% kernel 2% idle 5% waiting for IO Profile on key functions in load balancing shows a little more cpu utilization, which is unexpected as we are cutting short the newidle_balance. patch v3 0.56% [k] __update_load_avg_cfs_rq 0.51% [k] update_load_avg 0.39% [k] update_blocked_averages 0.36% [k] __update_load_avg_se 0.05% [k] newidle_balance patch v3 + skip_new_idle_update_stats 0.58% [k] __update_load_avg_cfs_rq 0.53% [k] update_load_avg 0.40% [k] update_blocked_averages 0.37% [k] __update_load_avg_se 0.06% [k] newidle_balance Context switch frequency is lower by 4% with the skip_new_idle_update_stats patch. Thanks. Tim ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/2] avoid spurious blocked load update 2021-11-16 23:48 ` [PATCH 0/2] avoid spurious blocked load update Tim Chen @ 2021-11-18 14:40 ` Vincent Guittot 0 siblings, 0 replies; 11+ messages in thread From: Vincent Guittot @ 2021-11-18 14:40 UTC (permalink / raw) To: Tim Chen Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, linux-kernel, joel On Wed, 17 Nov 2021 at 00:48, Tim Chen <tim.c.chen@linux.intel.com> wrote: > > On Fri, 2021-11-12 at 10:58 +0100, Vincent Guittot wrote: > > This patchset is a follow up of : > > https://lore.kernel.org/lkml/20211019123537.17146-1-vincent.guittot@linaro.org/ > > > > It ensures that newly idle load balance will not kick the update of > > blocked load if it skips the load balance because avg_idle is too > > short. > > It also makes sure that rq->next_balance doesn't go in the past when > > updated. > > > > Tim Chen (1): > > sched: sched: Fix rq->next_balance time updated to earlier than > > current time > > > > Vincent Guittot (1): > > sched/fair: skip newidle update stats > > > > kernel/sched/fair.c | 25 ++++++++++++++++++++----- > > 1 file changed, 20 insertions(+), 5 deletions(-) > > > > Vincent, > > Got some data back from the benchmark team. > To my surprise, the skip_new_idle_update_stats patch > actually makes things a little worse. > > Relative Performance > (higher better) > 5.15 rc4 vanilla (cgroup disabled) 100% > 5.15 rc4 vanilla (cgroup enabled) 96% > patch v2 96% > patch v3 96% > patch v3 > +skip_new_idle_update_stats 93.7% > patch v3 > +skip_new_idle_update_stats > +Fix rq->next_balance_time 93.7% > Yeah, that looks surprising. patch skip_new_idle_update_stats only ensures that the cpu will not run an update of the blocked average of idle cpus before entering idle but outside newidle_balance if it thinks that a task is about to wake up soon. The end result is that we run less often _nohz_idle_balance(cpu_rq(cpu), NOHZ_STATS_KICK, CPU_IDLE); before entering idle. > The cpu utilization actually is the similar compared with > having just the v3 patch. In both cases they are > 81% user > 12% kernel > 2% idle > 5% waiting for IO > > > Profile on key functions > in load balancing shows a little more cpu utilization, > which is unexpected as we are cutting short > the newidle_balance. > > patch v3 > 0.56% [k] __update_load_avg_cfs_rq > 0.51% [k] update_load_avg > 0.39% [k] update_blocked_averages > 0.36% [k] __update_load_avg_se > 0.05% [k] newidle_balance > > patch v3 + skip_new_idle_update_stats > 0.58% [k] __update_load_avg_cfs_rq > 0.53% [k] update_load_avg > 0.40% [k] update_blocked_averages > 0.37% [k] __update_load_avg_se > 0.06% [k] newidle_balance > > Context switch frequency is lower by 4% with the skip_new_idle_update_stats > patch. > > Thanks. > > Tim > ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2021-11-18 14:40 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-11-12 9:58 [PATCH 0/2] avoid spurious blocked load update Vincent Guittot 2021-11-12 9:58 ` [PATCH 1/2] sched/fair: skip newidle update stats Vincent Guittot 2021-11-12 14:29 ` Peter Zijlstra 2021-11-12 14:47 ` Vincent Guittot 2021-11-12 15:29 ` Peter Zijlstra 2021-11-12 16:00 ` Vincent Guittot 2021-11-12 14:55 ` Peter Zijlstra 2021-11-12 14:55 ` Peter Zijlstra 2021-11-12 16:05 ` Vincent Guittot 2021-11-16 23:48 ` [PATCH 0/2] avoid spurious blocked load update Tim Chen 2021-11-18 14:40 ` Vincent Guittot
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).